Arm v9 squeezes out its toothpaste: the X2 super core doubles machine learning performance, and the small core finally gets its first update in 4 years

Latest update time:2021-09-03 16:09
Mengchen, reporting from Aofei Temple
QbitAI (Quantum Bit) report | Public account: QbitAI

The Arm v9 architecture is arguably Arm's biggest upgrade in 10 years.

Following last month's release of the server-side Neoverse V1 and N2 platforms, the first batch of consumer CPUs has finally been unveiled.

The lineup consists of the Cortex-X2 super core, the Cortex-A710 big core, and the Cortex-A510 small core, which replace the X1, A78, and A55 respectively.

It is worth mentioning that the small core series was last updated in 2017.

The X2 super core and the A510 small core are now 64-bit only, and only the A710 remains compatible with 32-bit code.

Arm said this compatibility is kept for the Chinese mobile market, because only China still has a large number of 32-bit mobile apps.

Arm plans to drop 32-bit support entirely by 2023, so apps that are not upgraded by then will be left behind.
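For developers who want to know whether an app still ships 32-bit native code, one rough check is to read the ELF header of each bundled native library. The sketch below is only an illustration and is not from Arm's announcement: it relies on the standard ELF header layout (class byte at offset 4, machine field at offset 18), and the function names and the file path in the usage comment are hypothetical.

```python
import struct

ELF_MAGIC = b"\x7fELF"
WORD_SIZE = {1: 32, 2: 64}             # EI_CLASS: ELFCLASS32 / ELFCLASS64
MACHINE = {40: "arm", 183: "aarch64"}  # e_machine: EM_ARM / EM_AARCH64


def describe_elf(header: bytes) -> tuple[int, str]:
    """Return (word size in bits, machine name) from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF header")
    bits = WORD_SIZE.get(header[4])
    if bits is None:
        raise ValueError("unknown ELF class")
    # EI_DATA (offset 5): 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return bits, MACHINE.get(machine, f"other({machine})")


def needs_64bit_port(headers: list[bytes]) -> bool:
    """True if there are native libraries and every one of them is a 32-bit build."""
    return bool(headers) and all(describe_elf(h)[0] == 32 for h in headers)


# Usage (hypothetical path inside an unpacked app package):
# with open("lib/armeabi-v7a/libexample.so", "rb") as f:
#     print(describe_elf(f.read(20)))
```

An app with no native libraries at all is reported as not needing a port here, on the assumption that its code is handled by the platform runtime rather than compiled for one instruction set.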

Complete solution for large, medium and small cores

Since last year, Arm has kept the A series on its PPA (performance, power, and area) design philosophy.

The big-core A700 series mainly handles sustained multi-core workloads, while the small-core A500 series takes care of light and background tasks where efficiency comes first.

The X-series super core is allowed to keep growing in area and power in order to reach higher single-core performance and cope with bursty workloads.

Let’s take a look at how much toothpaste was squeezed out this time.

Super Core X2: Double the machine learning performance

Compared with the X1, the X2 doubles machine learning performance and improves integer performance by 16%.

Specific improvements include:

Branch prediction is decoupled from instruction fetch, which effectively reduces MPKI (misses per thousand instructions).

The dispatch (scheduling) stage is shortened from 2 cycles to 1, which reduces the overall pipeline depth from 11 cycles to 10.

Arm said that although this change is harder to engineer and costs additional power and area, the significant performance gain makes it worthwhile.

The ROB (reorder buffer) is increased by 30%, improving out-of-order execution capabilities.

Support for SVE2 (Scalable Vector Extension 2) makes code easier for developers to write and debug.
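One reason SVE2 can simplify development is that code is written without hard-coding the hardware vector width: a loop processes however many lanes the core provides and masks off the unused lanes in the final iteration. The following is a plain-Python simulation of that vector-length-agnostic pattern, not real SVE2 code. The function name and parameters are hypothetical; the only architectural fact used is that SVE vector lengths are multiples of 128 bits up to 2048 bits.

```python
def vla_add(a, b, vector_bits=128, lane_bits=32):
    """Add two equal-length lists the way a vector-length-agnostic loop would."""
    if vector_bits % 128 or not 128 <= vector_bits <= 2048:
        raise ValueError("vector length must be a multiple of 128 bits, up to 2048")
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")

    lanes = vector_bits // lane_bits  # elements handled per simulated vector op
    n = len(a)
    out = [0] * n
    i = 0
    while i < n:
        # The predicate marks which lanes still fall inside the array,
        # so no separate scalar tail loop is needed.
        predicate = [i + lane < n for lane in range(lanes)]
        for lane, active in enumerate(predicate):
            if active:
                out[i + lane] = a[i + lane] + b[i + lane]
        i += lanes
    return out


# The same code gives the same answer on "hardware" of any vector length.
for bits in (128, 256, 512, 2048):
    assert vla_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], bits) == [11, 22, 33, 44, 55]
```

Because nothing in the loop depends on a fixed width, the same source would not need per-width variants or hand-written tail handling, which is the kind of saving the article attributes to SVE2.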

In addition to high-end mobile phones, the X-series super core will also be used in large-screen computing devices such as notebooks.

Big core A710: 30% efficiency improvement, 10% performance improvement

The A710 continues to balance performance and efficiency, and it shares several design changes with the X2, such as improved branch prediction, a shorter pipeline, and SVE2 support.

One notable difference is that the width of the Macro-OP cache is reduced from 6 in the A78 to 5, mainly to save power and improve efficiency.

There are also some improvements to make communication between CPU cores, DSU and memory more efficient.

Small core A510: The first update in 4 years, can merge cores

The small core series continues to use in-order execution, unlike the out-of-order design of Icestorm, the efficiency core in Apple's M1. Arm said that the in-order design is the most power-efficient.
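The difference between the two approaches can be seen in a toy model. The sketch below is a deliberately simplified single-issue simulation, not a model of the A510 or of Icestorm: the instruction names and latencies are made up, and it assumes every dependency refers to an instruction in the program. An in-order core may only issue the oldest waiting instruction, while an out-of-order core may issue any waiting instruction whose inputs are ready.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    latency: int       # cycles until the result is available (hypothetical values)
    deps: tuple = ()   # names of instructions whose results are needed


def cycles_to_finish(program, out_of_order):
    """Cycle at which the last result is ready, issuing at most one instruction per cycle."""
    ready_at = {}              # name -> cycle when its result becomes available
    pending = list(program)
    cycle = 0
    while pending:
        # In-order: only the oldest instruction may issue. Out-of-order: any may.
        window = pending if out_of_order else pending[:1]
        pick = next(
            (ins for ins in window
             if all(ready_at.get(d, float("inf")) <= cycle for d in ins.deps)),
            None,
        )
        if pick is not None:
            ready_at[pick.name] = cycle + pick.latency
            pending.remove(pick)
        cycle += 1
    return max(ready_at.values(), default=0)


program = [
    Instr("load_a", 4),
    Instr("add_b", 1, ("load_a",)),  # must wait for the slow load
    Instr("mul_c", 1),               # independent work
    Instr("mul_d", 1),               # independent work
]

print(cycles_to_finish(program, out_of_order=False))  # 7
print(cycles_to_finish(program, out_of_order=True))   # 5
```

In this model the in-order core idles while the load completes, whereas the out-of-order core fills those cycles with the independent multiplies. The extra tracking hardware that makes this possible is what costs power and area, which is the trade-off behind Arm's choice for the small core.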

The biggest change is that two cores can be merged together to form a cluster.

This can reduce area, and L2 cache, L2 TLB, etc. can be shared among the merged cores.

Due to the four-year gap, the performance improvement of A510 is relatively large compared to the previous generation A55, ranging from 35% to 62%.

Configurable clustering mode

All of these CPUs can be combined in different cluster configurations through the new DynamIQ Shared Unit, the DSU-110.

The new DSU-110 supports up to 16MB of L3 cache and clusters of up to 8 Cortex-X2 cores.
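As a rough illustration of what a configurable cluster means, the sketch below checks a proposed core mix against the two limits quoted above. It is not Arm tooling: the class and field names are hypothetical, and treating 8 as the limit on the total number of cores of any type is an assumption, since the article only states the limit for Cortex-X2 cores.

```python
from dataclasses import dataclass

MAX_CORES = 8    # from the article (stated for X2 cores; assumed here for any mix)
MAX_L3_MB = 16   # from the article


@dataclass(frozen=True)
class ClusterConfig:
    x2: int = 0      # number of Cortex-X2 super cores
    a710: int = 0    # number of Cortex-A710 big cores
    a510: int = 0    # number of Cortex-A510 small cores
    l3_mb: int = 0   # shared L3 cache size in MB

    def problems(self) -> list[str]:
        """Return a list of reasons this configuration exceeds the quoted limits."""
        issues = []
        total = self.x2 + self.a710 + self.a510
        if min(self.x2, self.a710, self.a510, self.l3_mb) < 0:
            issues.append("core counts and cache size must be non-negative")
        if total == 0:
            issues.append("a cluster needs at least one core")
        if total > MAX_CORES:
            issues.append(f"{total} cores exceeds the limit of {MAX_CORES}")
        if self.l3_mb > MAX_L3_MB:
            issues.append(f"{self.l3_mb}MB L3 exceeds the limit of {MAX_L3_MB}MB")
        return issues


# An illustrative phone-style layout: 1 super core, 3 big cores, 4 small cores.
phone = ClusterConfig(x2=1, a710=3, a510=4, l3_mb=8)
print(phone.problems())  # []
```

A laptop-style configuration could instead use eight X2 cores with the full 16MB of L3, which passes the same check.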

This configurable cluster approach can meet different market needs from high-end smartphones and laptops to digital TVs and wearable devices.

It will take some time for new CPUs to appear on the market. Chip providers such as Qualcomm generally release new products at the end of the year.

Therefore, mobile phones, notebooks and other products based on Arm v9 architecture will be seen in 2022.

Huawei may switch to RISC-V

Currently, Nvidia's $40 billion acquisition of Arm is still in progress, and it is still unknown whether the Arm v9 architecture can eventually be licensed to Huawei.

The partners listed at the end of the Arm v9 release page include statements from Chinese manufacturers such as Xiaomi, OPPO, and Vivo, but Huawei is not among them.

Huawei is also actively looking for alternatives, such as the Hi3861, the latest HarmonyOS (Hongmeng) development board announced by Huawei HiSilicon.

Although Huawei has not explicitly disclosed the model of the main chip, the development environment requires RISC-V toolchain components.

RISC-V is a fully open-source instruction set architecture released under the permissive BSD license. Companies can use it for free and add their own instruction set extensions without having to share them.

Reference links:
[1] https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/first-armv9-cpu-cores
[2] https://www.anandtech.com/show/16693/arm-announces-mobile-armv9-cpu-microarchitectures-cortexx2-cortexa710-cortexa510/6
[3] https://device.harmonyos.com/cn/docs/start/introduce/oem_quickstart_3861_build-0000001054781998


This article is original content from QbitAI (量子位), a signed account of the NetEase News special content incentive plan. Reproduction without the account's authorization is prohibited.
