USTC/Huawei Noah takes action! Chip performance ≠ layout score, EDA physical design framework is fully open source

Last updated: 2024-08-12

Contributed by the ChiPBench team
Quantum Bit | Public Account QbitAI

Chip physical layout has a new evaluation standard that directly targets performance indicators!

USTC MIRA Lab and Huawei Noah's Ark Lab jointly released a new evaluation framework and dataset, which is completely open source.

This standard addresses the mismatch between layout metrics and final end-to-end performance, where a layout scores well on paper yet delivers poor PPA (power, performance, and area).

In chip design, electronic design automation (EDA) is a crucial link, known in the industry as the "mother of chips", and chip physical layout (placement) is one of its key steps.

Chip physical layout is NP-hard. Researchers have tried to apply AI to it, but an effective evaluation standard has been lacking.

Traditional evaluation relies on proxy metrics, which are easy to calculate but often differ significantly from the chip's final end-to-end performance.

To bridge this gap, USTC MIRA Lab and Huawei Noah's Ark Lab jointly released an evaluation framework called ChiPBench, along with related datasets.

With the launch of ChiPBench, the authors also found that current chip layout algorithms have many shortcomings, a signal to researchers that it is time to develop new ones.

Chip design process faces challenges

In line with Moore's Law, the scale of integrated circuits (ICs) has grown exponentially, bringing unprecedented challenges to chip design.

To cope with this growing complexity, EDA tools came into being, providing great help to hardware engineers.

EDA tools can automatically complete each step in the chip design workflow, including high-level synthesis, logic synthesis, physical design, testing and verification.

Among these steps, chip layout is an important one and can be divided into two sub-stages: macro layout and standard cell layout.

Macro layout is a critical problem in very large scale integration (VLSI) physical design. It mainly involves arranging larger components such as SRAMs and clock generators, often referred to as macros.

This stage has a significant impact on the overall layout of the chip and on important design parameters such as wirelength, power consumption and area.

The subsequent standard cell layout stage needs to deal with the arrangement of more and smaller standard cells, which are the basic components of digital design.

Usually, this stage uses methods such as combinatorial optimization to place the cells, shorten the connections between them, lay a good foundation for subsequent routing, and, to some extent, improve interconnect timing.

Chip layout has traditionally been done by hand by expert designers, which is labor-intensive and demands a great deal of expert prior knowledge.

Therefore, many design automation methods, especially artificial intelligence-based algorithms, have been developed to automate this process.

However, because the chip design workflow is long, these algorithms are usually evaluated on intermediate proxy metrics that are easy to calculate (such as half-perimeter wirelength, HPWL, and layout cell density), and these metrics often deviate from the end-to-end performance (i.e., the PPA of the final design).
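To show why HPWL is so cheap to compute, here is a minimal sketch of the metric. HPWL for one net is the half-perimeter of the bounding box around its pins, and the total is the sum over all nets. The function names and the toy netlist below are illustrative assumptions, not part of ChiPBench or of any placer's API.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) pin location


def net_hpwl(pins: List[Point]) -> float:
    """Half-perimeter of the bounding box enclosing one net's pins."""
    if len(pins) < 2:
        return 0.0  # a net with fewer than two pins has no wire to estimate
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets: Dict[str, List[Point]]) -> float:
    """Sum of per-net HPWL over the whole netlist."""
    return sum(net_hpwl(pins) for pins in nets.values())


# Hypothetical two-net example, coordinates in arbitrary units.
example_nets = {
    "n1": [(0.0, 0.0), (3.0, 4.0)],
    "n2": [(1.0, 1.0), (2.0, 5.0), (6.0, 2.0)],
}
print(total_hpwl(example_nets))
```

The computation only needs pin coordinates. It knows nothing about routing detours, cell delays, or clock trees, which is one intuition for why it can drift away from the final PPA.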

On the one hand, because the chip design workflow is lengthy, obtaining the end-to-end performance of a given chip layout requires a lot of engineering work. At the same time, the authors found that end-to-end performance usually cannot be obtained directly with existing open source EDA tools and datasets.

For the above reasons, existing AI-based chip layout algorithms use simple and easily available intermediate proxy metrics to train and evaluate the learned models.

On the other hand, since PPA indicators reflect many aspects that were not fully considered in previous stages, there is a serious gap between proxy indicators and the final PPA targets.

This gap greatly limits the application of existing AI-based layout algorithms in actual industrial scenarios.

End-to-end chip performance estimation

The authors believe that this gap is due to the oversimplification of early datasets.

The widely used Bookshelf format is a representative example. Layout results in this format cannot be passed on to subsequent design stages, so no valid final design can be produced from them.

Some later datasets provide LEF/DEF files and the other files needed to run subsequent stages, but they still contain a limited number of circuits and lack information required by some open source tools such as OpenROAD.

For example, buffer cell definitions required for clock tree synthesis are missing from the library files, and layer definitions in the LEF files are incomplete, which blocks the routing phase.

To address these issues, the authors constructed a dataset containing comprehensive physical implementation information for the entire flow.

The dataset covers designs from a range of different domains, including components such as CPUs, GPUs, network interfaces, image processing techniques, IoT devices, cryptographic units, and microcontrollers.

The authors executed six state-of-the-art AI-based chip physical layout algorithms on these designs and fed the results of each single-point algorithm into the physical implementation workflow through a standard input/output format to obtain the final PPA results.

The initial dataset is generated from Verilog files as raw input. OpenROAD performs logic synthesis to convert these high-level descriptions into netlists, which describe in detail the electrical connections between circuit elements.

OpenROAD's integrated floorplanning tool then uses this netlist to configure the physical layout of the circuit on the die.

OpenROAD converts the design generated in the floorplanning phase into LEF/DEF files so that layout algorithms can be applied to it.

The authors also ran the entire EDA design flow through OpenROAD, generating data for the subsequent stages, including layout, clock tree synthesis and routing.

The ChiPBench dataset contains all the design kits required for each stage of the physical design flow.

When an algorithm for the layout phase is evaluated, the output files from the previous phase are used as its input. The algorithm processes these input files and generates corresponding output files, which are then integrated back into the OpenROAD design flow.
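The hand-off described here can be pictured as a small harness: every placer receives the same floorplanned design, writes a placed design, and the rest of the flow turns that into a PPA report. The sketch below is a hypothetical illustration only. `PPAReport`, `Placer`, `DownstreamFlow`, and `evaluate_placers` are assumed names, not ChiPBench's or OpenROAD's actual interfaces, and the callables stand in for real tool invocations.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class PPAReport:
    """End-to-end metrics collected after the full flow (hypothetical container)."""
    wns: float    # worst negative slack
    tns: float    # total negative slack
    area: float
    power: float


# A placer maps the path of a floorplanned DEF file to the path of a placed DEF file.
Placer = Callable[[str], str]
# The downstream flow (clock tree synthesis, routing, sign-off) maps a placed DEF to metrics.
DownstreamFlow = Callable[[str], PPAReport]


def evaluate_placers(
    floorplan_def: str,
    placers: Dict[str, Placer],
    downstream: DownstreamFlow,
) -> Dict[str, PPAReport]:
    """Run every placer on the same input, then score each result end to end."""
    results: Dict[str, PPAReport] = {}
    for name, place in placers.items():
        placed_def = place(floorplan_def)       # only the layout stage is swapped
        results[name] = downstream(placed_def)  # all later stages stay identical
    return results
```

Because only the layout stage changes between runs, any difference in the resulting reports can be attributed to the placer rather than to the rest of the flow.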

Ultimately, the flow reports performance metrics including total negative slack (TNS), worst negative slack (WNS), area, and power consumption, providing a comprehensive end-to-end performance evaluation.
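For readers unfamiliar with the two timing metrics, the sketch below derives them from a list of per-endpoint slack values. It assumes the slacks have already been extracted from a timing report. The function name `wns_tns` is illustrative, and reporting 0 when no endpoint violates timing is one common convention that individual tools may not follow.

```python
from typing import Iterable, Tuple


def wns_tns(endpoint_slacks: Iterable[float]) -> Tuple[float, float]:
    """Return (WNS, TNS) for a set of timing endpoints.

    WNS is the most negative endpoint slack; TNS is the sum of all
    negative endpoint slacks. Both are 0.0 when every endpoint meets timing
    (an assumed convention).
    """
    negative = [s for s in endpoint_slacks if s < 0.0]
    wns = min(negative, default=0.0)
    tns = sum(negative, 0.0)
    return wns, tns


# Hypothetical slacks in nanoseconds: two endpoints violate timing.
print(wns_tns([0.5, -0.25, -0.5, 1.0]))
```

WNS shows how far the single worst path misses its deadline, while TNS shows how widespread the violations are, so two layouts with the same WNS can still differ greatly in TNS.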

This approach measures how an algorithm for one specific stage affects the final optimized chip design. It keeps the evaluation metrics consistent and avoids the limitations of relying solely on the simplified metrics of a single stage.

This evaluation method supports the optimization and development of various algorithms by ensuring that algorithmic improvements translate into actual performance improvements in chip designs. At the same time, by serving as a powerful framework for testing and improvement, it promotes the development of more efficient and effective open source EDA tools.

Chip layout requires development of new algorithms

Using the above workflow, the authors evaluated multiple AI-based chip layout algorithms, including SA, WireMask-EA, DREAMPlace, AutoDMP, MaskPlace, ChiPFormer, and the default algorithm in OpenROAD.

The authors performed an end-to-end evaluation of these algorithms and reported the final performance metrics.

In addition, the correlation analysis results show that the correlation between MacroHPWL and the final performance indicators is very weak, which indicates that optimizing MacroHPWL has a very limited impact on these performance indicators.

Wirelength is also only weakly correlated with WNS and TNS. So even when a single-point algorithm succeeds in optimizing an intermediate indicator such as wirelength, it may improve only one aspect of PPA in the final physical implementation and fail to optimize PPA as a whole.
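A correlation analysis of this kind can be reproduced on one's own results with a few lines of code. The article does not say which coefficient the authors used, so the sketch below is an assumption: it implements both Pearson correlation and Spearman rank correlation in plain Python, with illustrative function names.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0.0 or vy == 0.0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(vx * vy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

Passing one proxy value per layout as `x` (for example wirelength) and the matching end-to-end value as `y` (for example WNS) gives a coefficient between -1 and 1. A value near 0 is what "weak correlation" means in this context.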

Therefore, there is a need to find more appropriate intermediate indicators that can be better associated with actual PPA targets.

The evaluation results reveal inconsistencies between the intermediate metrics emphasized by current mainstream layout algorithms and the final performance results. These findings highlight the need to develop layout algorithms from a new perspective.