Nine elements of multi-core processor design

Source: eechina | Last updated: 2015-04-23
Like SMT, CMP aims to exploit coarse-grained parallelism in computation. CMP can be seen as a natural outgrowth of large-scale integrated circuit technology: once chip capacity is large enough, the SMP (symmetric multiprocessor) or DSM (distributed shared memory) nodes of a large-scale parallel machine can be integrated onto a single chip, with each processor executing a different thread or process in parallel. In a single-chip multiprocessor based on the SMP structure, the processors communicate through an off-chip cache or off-chip shared memory; in one based on the DSM structure, they communicate through an on-chip high-speed crossbar network connected to distributed memories.
 
Since SMP and DSM are already mature technologies, the CMP structure is comparatively easy to design, although it places higher demands on back-end design and on the chip manufacturing process. For this reason, CMP became the first of the "future" high-performance processor structures to be applied in commercial CPUs.
 
Although multi-core designs can exploit the many benefits of increased integration to raise chip performance dramatically, they also, quite visibly, pull a number of formerly system-level problems into the processor itself.
 
1 Core structure: homogeneous or heterogeneous
 
The structure of a CMP falls into two categories: homogeneous, in which all internal cores are identical, and heterogeneous, in which they differ. Studying which core structures best serve which applications is therefore crucial to realizing the performance of future microprocessors. The structure of the core itself determines the area, power consumption, and performance of the entire chip, and how well it inherits and develops the achievements of traditional processors directly affects both the performance and the implementation schedule of a multi-core design. At the same time, according to Amdahl's law, a program's speedup is limited by its serial portion, so in theory a heterogeneous structure should offer better performance.
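The limit that Amdahl's law places on homogeneous designs can be made concrete with a small sketch (the 90%-parallel workload below is an illustrative assumption, not a figure from the article):

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Amdahl's law: overall speedup on n_cores when only
    parallel_fraction of the program can be parallelized."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cores)

# Even a 90%-parallel program on 16 identical cores falls far short of 16x,
# which is why a heterogeneous design that speeds up the serial 10% can win:
print(round(amdahl_speedup(0.9, 16), 2))   # 6.4
```

The serial term dominates as core counts grow, which is the theoretical argument for pairing a few strong cores (for serial code) with many simple ones.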
 
The instruction set used by each core is also important to the implementation of the system. Whether the cores use the same instruction set or different ones, and whether each core can run an operating system, are likewise open research questions.
 
2 Program Execution Model  
 
The first issue in multi-core processor design is to choose a program execution model. The applicability of the program execution model determines whether the multi-core processor can provide the highest performance at the lowest cost. The program execution model is the interface between compiler designers and system implementers. Compiler designers decide how to convert a high-level language program into a target machine language program according to a program execution model; system implementers decide how to effectively implement the program execution model on a specific target machine. When the target machine is a multi-core architecture, the questions that arise are: How does the multi-core architecture support important program execution models? Are there other program execution models that are more suitable for multi-core architectures? To what extent can these program execution models meet the needs of applications and be accepted by users?  
 
3 Cache Design: Multi-level Cache Design and Consistency Issues  
 
The speed gap between the processor and main memory is a prominent problem for CMP, so multi-level caches must be used to bridge it. Existing CMPs variously share the primary cache, the secondary cache, or only main memory. The usual arrangement is a shared-secondary-cache structure: each processor core has a private primary cache, and all cores share a secondary cache.
 
The cache architecture design is also directly related to the overall system performance. However, in the CMP structure, whether shared cache or unique cache is better, whether to build multi-level cache on a chip, and how many levels of cache to build, etc., have a great impact on the size, power consumption, layout, performance and operating efficiency of the entire chip, so these are all issues that need to be carefully studied and discussed.  
 
On the other hand, multi-level caches raise consistency issues. The cache consistency model and mechanism chosen have a significant impact on overall CMP performance. Consistency models widely used in traditional multiprocessor systems include the sequential consistency model, the weak consistency model, and the release consistency model; the corresponding consistency mechanisms are mainly the bus snooping protocol and the directory-based protocol. Most current CMP systems use a bus-based snooping protocol.
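The bus-snooping idea can be sketched with a toy MSI-style model (class and state names below are illustrative, not a real protocol implementation): every private cache watches bus transactions, invalidating or downgrading its copy of a line as other cores read and write it.

```python
# Toy bus-snooping model: each core's private cache tracks the
# coherence state of one cache line (Modified / Shared / Invalid).
M, S, I = "Modified", "Shared", "Invalid"

class SnoopyCache:
    def __init__(self):
        self.state = I

    def read(self, bus):
        if self.state == I:            # read miss: broadcast on the bus
            for other in bus:
                if other is not self and other.state == M:
                    other.state = S    # owner writes back and downgrades
            self.state = S
        return self.state

    def write(self, bus):
        for other in bus:              # write: invalidate all other copies
            if other is not self:
                other.state = I
        self.state = M

bus = [SnoopyCache() for _ in range(4)]
bus[0].write(bus)          # core 0 takes the line in Modified state
bus[1].read(bus)           # core 1's read miss forces a downgrade
print(bus[0].state, bus[1].state)   # Shared Shared
```

Every transaction is broadcast to every cache, which is exactly why snooping works well on a shared bus but scales poorly as core counts grow.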
 
4 Inter-core Communication Technology  
 
Programs executed by the CPU cores of a CMP processor sometimes need to share and synchronize data, so its hardware structure must support inter-core communication. An efficient communication mechanism is an important guarantee for the high performance of a CMP processor. Currently, there are two mainstream on-chip efficient communication mechanisms: one is a cache structure based on bus sharing, and the other is an on-chip interconnect structure. 
 
In the bus-shared cache structure, the CPU cores share a secondary or tertiary cache that holds frequently used data, and communicate over the bus connecting the cores. The advantages of this scheme are a simple structure and high communication speed; the disadvantage is that a bus-based structure scales poorly.
 
The structure based on on-chip interconnection means that each CPU core has an independent processing unit and cache, and each CPU core is connected together through a cross switch or on-chip network. Each CPU core communicates through messages. The advantages of this structure are good scalability and guaranteed data bandwidth; the disadvantages are complex hardware structure and large software changes.  
 
Perhaps the result of the competition between the two is not to replace each other but to cooperate with each other, for example, using on-chip networks globally and buses locally to achieve a balance between performance and complexity.  
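The message-passing style of the on-chip-interconnect model can be sketched with two threads standing in for cores, each with its own mailbox (the queue-per-core layout and tags below are illustrative assumptions):

```python
# Sketch of message-style inter-core communication: instead of sharing
# a cache, each "core" owns a mailbox and exchanges explicit messages.
import threading, queue

mailbox = [queue.Queue() for _ in range(2)]   # one inbox per core
results = []

def core0():
    mailbox[1].put(("data", 42))              # send a message to core 1
    tag, value = mailbox[0].get()             # block until the reply arrives
    results.append((tag, value))

def core1():
    tag, value = mailbox[1].get()             # receive from core 0
    mailbox[0].put(("ack", value + 1))        # reply with a message

threads = [threading.Thread(target=f) for f in (core0, core1)]
for t in threads: t.start()
for t in threads: t.join()
print(results)   # [('ack', 43)]
```

Because all communication is explicit, this style scales to large networks of cores, at the cost of more complex hardware and larger changes to software, just as the section notes.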
 
5 Bus Design  
 
In traditional microprocessors, cache misses and memory accesses hurt the CPU's execution efficiency, and the efficiency of the bus interface unit (BIU) determines how severe that impact is. When multiple CPU cores request memory at the same time, or their private caches miss simultaneously, the BIU's arbitration among these requests and its mechanism for converting them into external memory accesses determine the overall performance of the CMP system. It is therefore important to find an efficient multi-port BIU structure that converts the cores' single-word accesses to main memory into more efficient burst accesses, together with the burst length that is optimal for overall CMP efficiency and an efficient arbitration mechanism for multi-port BIU access.
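Why merging single-word accesses into bursts pays off can be shown with a back-of-the-envelope cycle count (the setup and per-word latencies below are assumed, illustrative numbers): a burst amortizes the fixed arbitration/row-activation cost over many words.

```python
SETUP_CYCLES = 10    # assumed fixed cost per memory transaction
WORD_CYCLES = 2      # assumed cost per transferred word

def single_word_accesses(n_words):
    # Each word pays the full transaction setup cost again.
    return n_words * (SETUP_CYCLES + WORD_CYCLES)

def burst_access(n_words):
    # One setup cost amortized over the whole burst.
    return SETUP_CYCLES + n_words * WORD_CYCLES

print(single_word_accesses(8), burst_access(8))   # 96 26
```

The gap widens with burst length, but longer bursts also hold the bus longer and may fetch unneeded words, which is why the optimal burst length is itself a design question.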
 
6 Operating system design: task scheduling, interrupt handling, synchronization and mutual exclusion  
 
For multi-core CPUs, optimizing the operating system task scheduling algorithm is the key to ensuring efficiency. General task scheduling algorithms include global queue scheduling and local queue scheduling. The former means that the operating system maintains a global task waiting queue. When a CPU core in the system is idle, the operating system selects a ready task from the global task waiting queue and starts executing it on this core. The advantage of this method is that the CPU core utilization rate is high. The latter means that the operating system maintains a local task waiting queue for each CPU core. When a CPU core in the system is idle, it selects an appropriate task from the task waiting queue of the core to execute. The advantage of this method is that tasks basically do not need to be switched between multiple CPU cores, which is conducive to improving the local cache hit rate of the CPU core. At present, most multi-core CPU operating systems use a task scheduling algorithm based on a global queue.  
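The two policies reduce to different data structures, which a minimal sketch makes plain (task names and queue contents below are illustrative; a real OS scheduler is far more involved):

```python
# Toy contrast between global-queue and local-queue task scheduling.
from collections import deque

# Global queue: one shared ready queue; any idle core pops the next task,
# so utilization is high but a task may migrate between cores.
tasks = deque(["T0", "T1", "T2", "T3"])

def global_pick(core_id, global_queue):
    return global_queue.popleft() if global_queue else None

# Local queues: each core serves only its own queue, so a task tends to
# stay on one core and keep that core's private cache warm.
local_queues = {0: deque(["T0", "T2"]), 1: deque(["T1", "T3"])}

def local_pick(core_id, queues):
    q = queues[core_id]
    return q.popleft() if q else None

print(global_pick(0, tasks), local_pick(1, local_queues))   # T0 T1
```

The trade-off in the paragraph above falls directly out of the structures: the global queue never leaves a core idle while work exists, while the local queues preserve cache affinity at the risk of load imbalance.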
 
Interrupt handling in a multi-core processor differs greatly from the single-core case. The cores need to communicate with each other through interrupts, so the per-core local interrupt controllers, together with a global interrupt controller responsible for arbitrating the distribution of interrupts among the cores, must also be integrated on chip.
 
In addition, a multi-core CPU is a multi-tasking system. Because different tasks compete for shared resources, the system must provide synchronization and mutual exclusion mechanisms. Traditional single-core mechanisms cannot meet the needs of multiple cores; hardware-provided atomic "read-modify-write" operations or other synchronization primitives are required to guarantee correctness.
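The "read-modify-write" idea can be sketched as a compare-and-swap (CAS) primitive; here a lock stands in for the single atomic instruction that real hardware provides (the class below is an illustration of the pattern, not a hardware model):

```python
import threading

class AtomicInt:
    def __init__(self, value=0):
        self.value = value
        self._lock = threading.Lock()   # stand-in for hardware atomicity

    def compare_and_swap(self, expected, new):
        with self._lock:                # read-modify-write as one indivisible step
            if self.value == expected:
                self.value = new
                return True
            return False

    def increment(self):
        while True:                     # classic retry loop built on CAS
            old = self.value
            if self.compare_and_swap(old, old + 1):
                return

counter = AtomicInt()
workers = [threading.Thread(target=lambda: [counter.increment() for _ in range(1000)])
           for _ in range(4)]
for t in workers: t.start()
for t in workers: t.join()
print(counter.value)   # 4000 -- no increments lost despite 4 concurrent writers
```

Without the atomic step, concurrent `read, add, write` sequences could interleave and lose updates; the CAS retry loop is the building block beneath locks, semaphores, and lock-free structures alike.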
 
7 Low power design  
 
The rapid development of semiconductor technology has pushed microprocessor integration ever higher, and with it processor surface temperatures, with power density roughly doubling every three years. Low-power and thermally optimized design have therefore become core issues in microprocessor research, and the multi-core structure of CMP makes its power consumption a crucial research topic.
 
Low-power design is a multi-level problem that must be studied at several levels, including the operating-system, algorithm, architecture, and circuit levels. The methods at each level yield different results: in general, the higher the level of abstraction, the greater the achievable reduction in power consumption and temperature.
 
8 Memory Wall  
 
To keep the chip's cores fully busy, the chip must at minimum be supplied with memory bandwidth that matches its performance. On-chip caches solve part of the problem, but as performance grows further, other means of increasing memory interface bandwidth are needed, such as raising per-pin bandwidth (DDR, DDR2, QDR, XDR, etc.); likewise, the system must provide memory capable of delivering that bandwidth. Packaging requirements therefore keep rising: although package pin counts grow by about 20% per year, this does not fully solve the problem and drives up cost. How to provide a high-bandwidth, low-latency memory interface remains an important problem that must be solved.
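The scale of the bandwidth-matching requirement can be estimated with a simple model (all parameters below are assumed, illustrative values, not measurements): off-chip traffic is roughly the cores' aggregate cache-miss rate times the cache-line size.

```python
# Rough estimate of the off-chip memory bandwidth an n-core chip needs,
# assuming every cache miss fetches one full cache line.
def required_bandwidth_gb_s(n_cores, freq_ghz, accesses_per_cycle,
                            miss_rate, line_bytes):
    misses_per_s = n_cores * freq_ghz * 1e9 * accesses_per_cycle * miss_rate
    return misses_per_s * line_bytes / 1e9

# 8 cores at 2 GHz, 1 memory access per cycle, 2% miss rate, 64-byte lines:
print(round(required_bandwidth_gb_s(8, 2.0, 1.0, 0.02, 64), 2))   # 20.48
```

The demand scales linearly with core count while pin counts grow only slowly, which is the arithmetic behind the memory wall described above.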
 
9 Reliability and safety design  
 
As technology advances, processors have penetrated every aspect of modern society, yet significant security risks remain. On one hand, the reliability of the processor structure itself is declining: with ever-finer feature sizes, high-speed clock design, and low supply voltages, design margins are increasingly hard to guarantee, and failure rates are gradually rising. On the other hand, malicious third-party attacks are growing in both number and sophistication, which has become a society-wide problem. Improving reliability and security has accordingly attracted much attention in computer architecture research.
 
In the future, structures such as CMP, in which multiple processes execute simultaneously within a processor chip, will become mainstream. Coupled with growing hardware complexity and the possibility of design errors, the inside of the processor chip can no longer be assumed safe. There is still a long way to go in safety and reliability design.