Clock Low Power Techniques in SoC Design

Publisher: 书卷气息 Latest update time: 2011-12-24 Source: chinaaet Keywords: SoC

1 Overview

SoC designs are growing steadily more complex, and so are their internal clock networks. A single SoC usually contains several clock domains, and the dynamic power attributable to the clock network has become an active research topic in recent years. This power has two sources: (1) The clock network supplies the clock signal to every sequential cell in the chip, so the clock frequency determines the dynamic power of those cells and of the logic connected to them. Turning off the clock eliminates that dynamic power. (2) The clock network itself consumes a large amount of dynamic power, for two reasons: 1) it is the largest interconnect network in the chip and its load is huge, consisting of the interconnect capacitance plus the many delay cells inserted to balance clock tree skew; 2) it has the highest toggle rate of any net in the chip, and the toggle rate directly determines the dynamic power of the interconnect and of the standard cells it drives.
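Both effects follow from the standard CMOS switching-power relation P = alpha * C * Vdd^2 * f. The short sketch below only illustrates that relation; the capacitance, voltage and frequency values are hypothetical and are not taken from rsthu1.

```python
def dynamic_power(alpha, c_load, vdd, freq):
    """Switching power in watts: P = alpha * C * Vdd^2 * f.

    alpha is the average number of charge/discharge cycles per clock
    period (1.0 for a free-running clock net, 0.0 for a stopped one).
    """
    return alpha * c_load * vdd ** 2 * freq


# Hypothetical values for illustration only (not from the article).
C_CLK = 200e-12   # switched capacitance of the clock network, farads
VDD = 1.2         # supply voltage, volts
F_FULL = 200e6    # full-speed clock frequency, hertz

p_full = dynamic_power(1.0, C_CLK, VDD, F_FULL)       # clock running at full speed
p_half = dynamic_power(1.0, C_CLK, VDD, F_FULL / 2)   # clock frequency halved
p_gated = dynamic_power(0.0, C_CLK, VDD, F_FULL)      # clock turned off

print(f"full speed : {p_full * 1e3:.2f} mW")
print(f"half speed : {p_half * 1e3:.2f} mW")
print(f"clock off  : {p_gated * 1e3:.2f} mW")
```

In this model, halving the frequency halves the power and stopping the clock removes it, while a larger C or a higher toggle rate raises it proportionally.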

To address these two sources of clock-related dynamic power, this article studies and implements three low-power clock techniques.

2 Dynamic Clock Management

The workload of an SoC varies widely. Some applications require every module in the chip, while others require only some of them. Some applications need the chip to run at full speed, while others can run at a very low operating frequency [1]. Accordingly, dynamic clock management has two aspects: dynamically switching the clocks of the chip's internal modules on and off, and dynamically configuring the clock frequency of those modules.

This article uses rsthu1, an audio and video decoding SoC, as an example to introduce the dynamic clock management applied in system-level design.

To implement dynamic clock management in rsthu1, four operating modes are defined at the system level, as shown in Table 1, where a solid circle indicates that a module is turned on and a hollow circle indicates that it is turned off.

In normal mode, the high-speed clock HCLK drives the four main modules of the system: Risc0, Risc1, Decoder, and BE (Bit Engine). The system runs at full speed and performs audio and video decoding. In low-speed mode, the low-speed clock VCLK drives the same four modules. The system can still run simple applications, so it keeps operating while the lower clock frequency reduces its dynamic power. In idle mode, only the operating system is kept running: VCLK drives Risc0 and the clocks of all other modules are turned off, which eliminates the dynamic power of every module except Risc0. In sleep mode, the clocks of all modules are turned off, which eliminates the dynamic power that would otherwise be consumed while the system is doing no work.
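The four modes can be written down as a small lookup table. The sketch below encodes the mode descriptions above and compares the summed module clock frequencies per mode. The HCLK and VCLK frequencies are hypothetical (the article does not state them), and the sketch assumes every module presents the same switched capacitance, which is a crude simplification.

```python
MODULES = ("Risc0", "Risc1", "Decoder", "BE")

# Clock source of each module in each mode, following the text above.
# None means the module clock is turned off.
MODES = {
    "normal":    {m: "HCLK" for m in MODULES},
    "low_speed": {m: "VCLK" for m in MODULES},
    "idle":      {"Risc0": "VCLK", "Risc1": None, "Decoder": None, "BE": None},
    "sleep":     {m: None for m in MODULES},
}

# Hypothetical frequencies in MHz, chosen only for illustration.
CLOCK_MHZ = {"HCLK": 200.0, "VCLK": 25.0}


def total_clock_mhz(mode):
    """Sum of the clock frequencies delivered to all modules in a mode."""
    return sum(CLOCK_MHZ[src] for src in MODES[mode].values() if src is not None)


def relative_clock_activity(mode):
    """Clock activity of a mode normalized to normal mode."""
    return total_clock_mhz(mode) / total_clock_mhz("normal")


for name in MODES:
    print(f"{name:10s} {relative_clock_activity(name):.5f}")
```

Under these assumptions the activity falls monotonically from normal to sleep mode, which is the qualitative trend the mode definitions are designed to produce.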

Prime Power, a power analysis tool from Synopsys, is used to perform RTL-level power analysis based on the simulation waveforms of the four operating modes. The results are shown in Table 2.

With dynamic clock management applied at the system level, the system dynamic power decreases step by step across the normal, low-speed, idle and sleep modes, and the power optimization effect is clear.

3 Gated Clock

The following statements often appear in RTL code:

always @(posedge CLK)
begin
    if (EN == 1)
        Data_out <= Data_in;  // non-blocking assignment for a clocked register
end

If the above code is synthesized directly, the circuit structure shown in Figure 1 is generated. The control signal that governs register updates is placed in front of the register's data input, and the register state is updated by controlling whether new data is accepted. In this structure the register's clock pin keeps toggling even when the register state is not updated, which wastes dynamic power.

In the structure of Figure 2, the control signal is placed in front of the register's clock pin, and the register state is updated by controlling whether the clock reaches the register. Unlike the circuit of Figure 1, the clock of the Figure 2 circuit does not toggle when the register state is not updated, which eliminates the wasted dynamic power. Because a single gated clock cell replaces multiple MUXes, power consumption is reduced further.
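The difference between the two structures can be illustrated with a cycle-level model. The sketch below is a behavioral approximation, not a description of the synthesized netlist: it counts the clock edges that reach the register's clock pin in each structure, and the enable and data sequences are hypothetical. It ignores the latch that a real clock-gating cell uses to keep glitches on EN off the gated clock.

```python
def simulate(enables, data_in, gated):
    """Return (output trace, clock edges seen at the register clock pin)."""
    q = 0
    clock_pin_edges = 0
    trace = []
    for en, d in zip(enables, data_in):
        if gated:
            # Gated clock: the edge reaches the register only when EN is 1.
            if en:
                clock_pin_edges += 1
                q = d
        else:
            # Enable MUX: the register is clocked every cycle and reloads
            # its own value when EN is 0.
            clock_pin_edges += 1
            q = d if en else q
        trace.append(q)
    return trace, clock_pin_edges


# Hypothetical stimulus: the register is updated in 3 of 8 cycles.
enables = [1, 0, 0, 0, 1, 0, 0, 1]
data_in = [5, 6, 7, 8, 9, 10, 11, 12]

mux_trace, mux_edges = simulate(enables, data_in, gated=False)
gated_trace, gated_edges = simulate(enables, data_in, gated=True)

print("same outputs:", mux_trace == gated_trace)
print("clock edges at register, MUX structure  :", mux_edges)
print("clock edges at register, gated structure:", gated_edges)
```

Both structures produce the same register contents, but the gated version clocks the register only in the cycles where EN is 1, so the saving grows as the enable becomes less frequent.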

The gated clock cell can be inserted during logic synthesis using Power Compiler, the power optimization tool from Synopsys. The advantages are as follows [3]: (1) the RTL code does not need to be modified, because Power Compiler automatically detects the RTL statements where a gated clock can be inserted; and (2) the gated clock cell is inserted into the gate-level netlist automatically during logic synthesis.

Power Compiler was used to perform gated clock synthesis on rsthu1, and Prime Power was used to perform power consumption analysis. The results are shown in Table 3. It can be seen that the total power consumption was reduced by 34.52% by using gated clock technology during logic synthesis, and the power consumption optimization effect was obvious.

4 Low Power Clock Tree Synthesis

Observing how a clock tree grows shows that its growth takes two forms, horizontal expansion and vertical extension, as shown in Figure 3, where Arrow1 and Arrow3 are vertical extensions and Arrow2 is a horizontal expansion.

Ordinary clock tree synthesis aims to minimize clock skew: it increases vertical extension, reduces horizontal expansion, and invests more buffers so that the delay of each clock path can be adjusted at a finer granularity. This approach obtains a smaller clock skew at the expense of a larger clock tree. The resulting clock tree is shown in Figure 4 (a).

For power reasons, a smaller clock tree is desirable. The size of the clock tree can be reduced effectively by reducing its vertical extension and increasing its horizontal expansion, as shown in Figure 4 (b). However, because it has fewer buffers, the flat clock tree can adjust the delay of each clock path only at a coarser granularity than the deep clock tree, which results in a larger clock skew. Low-power clock tree synthesis that targets a smaller clock tree therefore comes at the cost of some increase in clock skew.
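The effect of a flatter tree on buffer count can be estimated with an idealized model. The sketch below assumes a tree in which every buffer drives as many loads as the maximum fan-out allows; it ignores wire load, placement and the extra delay cells a real tool inserts to balance skew, and the sink count and fan-out values are hypothetical.

```python
def buffer_tree(num_sinks, max_fanout):
    """Return (levels, buffers) for an idealized clock buffer tree.

    Each buffer drives at most max_fanout loads; max_fanout must be >= 2.
    """
    if max_fanout < 2:
        raise ValueError("max_fanout must be at least 2")
    levels = 0
    buffers = 0
    loads = num_sinks
    while loads > 1:
        # Number of buffers needed to drive the current level of loads.
        loads = (loads + max_fanout - 1) // max_fanout
        buffers += loads
        levels += 1
    return levels, buffers


# Hypothetical design with 10000 clock sinks.
deep_levels, deep_buffers = buffer_tree(10000, 8)    # low fan-out, deep tree
flat_levels, flat_buffers = buffer_tree(10000, 32)   # high fan-out, flat tree

print(f"max fan-out 8 : {deep_levels} levels, {deep_buffers} buffers")
print(f"max fan-out 32: {flat_levels} levels, {flat_buffers} buffers")
```

In this model, raising the maximum fan-out gives fewer levels and fewer buffers, so less capacitance toggles on every clock edge, but there are also fewer points at which path delays can be tuned, which matches the skew trade-off described above.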

When performing clock tree synthesis, the back-end tool can constrain the clock tree structure through synthesis parameters, as shown in Table 4.

For the fast clock HCLK of rsthu1, low-power clock tree synthesis aimed at reducing the clock tree size is used, and the results are shown in Table 5. The maximum fan-out is increased, while the total path delay and the upper limit of the number of buffers at each level are reduced. After the maximum fan-out is increased, the clock tree size is reduced by 20.21%, while the clock skew increases by only 0.023 ns. The skew degradation caused by reducing the clock tree size is therefore acceptable.

5 Conclusion

Many low-power clock techniques are now available that can further reduce the power consumed by the clock network in SoC designs. Future research should explore them more broadly and in greater depth.
