Detailed explanation of AWS Graviton4
At the recent AWS re:Invent 2023, Amazon launched its fourth generation of custom in-house server processors: Graviton4. Developed by Annapurna Labs, Amazon's Israel-based chip design subsidiary, the chip combines the latest Arm Neoverse IP with custom IP aimed primarily at improving scalability and accelerator connectivity.
Amazon Web Services CEO Adam Selipsky announced Graviton4, the company's latest custom server processor, during his keynote. He said: "Graviton4 is the most powerful and energy-efficient chip we have ever built. It has 50% more cores and 75% more memory bandwidth than Graviton3. Graviton4 is on average 30% faster than Graviton3, and for some workloads the gains are even larger: 40% faster for database applications and 45% faster for Java applications."
Alongside Graviton4, Selipsky announced new Graviton4-based R8g instances for EC2. R8g instances are available in preview now, with additional Graviton4-based instances expected early next year.
One non-financial way to gauge a product line's health is to use its development cadence as a proxy. The first in-house Graviton chip launched at re:Invent 2018. Amazon followed with the second-generation Graviton2 the next year, moving to Arm's Neoverse N1 cores. Graviton3 arrived in November 2021, bringing a number of changes to the chip architecture and packaging: it moved to Arm's new Neoverse V platform and adopted a chiplet architecture.
Graviton4 Introduction
At re:Invent 2023, AWS Senior Principal Engineer Ali Saidi provided additional details about the new chip. Architecturally, Graviton4 differs considerably from its predecessor. Graviton3 used Arm's Neoverse V1 core, which implements the Armv8.4-A ISA and brought SVE support (2×256-bit). Graviton4 is the first Graviton to move to Arm's latest Neoverse V2 core, which implements the Armv9.0-A ISA.
Graviton4 integrates 96 cores, 1.5 times as many as Graviton3's 64. All cores are connected by Arm's CMN-700 mesh interconnect. Feeding the additional cores requires more memory bandwidth, so Annapurna Labs increased the number of memory channels by 50% and raised their data rate, going from 8x DDR5-4800 to 12x DDR5-5600. This lifts the theoretical peak bandwidth from 307.2 GB/s to 537.6 GB/s, and per-core bandwidth from 4.8 GB/s on Graviton3 to 5.6 GB/s, a 17% improvement per core. On the I/O side, Graviton4 triples the number of PCIe lanes: Graviton3 offered 32 lanes of PCIe 5.0, while the new chip provides 96.
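The bandwidth figures above follow from channels × transfer rate × bytes per transfer. A minimal sketch, assuming the usual 64-bit (8-byte) data width per DDR5 channel with ECC bits excluded and decimal GB/s; the function names are illustrative only:

```python
BYTES_PER_TRANSFER = 8  # assumption: 64-bit data path per DDR5 channel, ECC excluded


def peak_bandwidth_gbps(channels: int, mega_transfers_per_s: int) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (decimal units)."""
    return channels * mega_transfers_per_s * BYTES_PER_TRANSFER / 1000


def per_core_bandwidth_gbps(channels: int, mts: int, cores: int) -> float:
    """Peak bandwidth divided evenly across all cores."""
    return peak_bandwidth_gbps(channels, mts) / cores


g3 = peak_bandwidth_gbps(8, 4800)    # Graviton3: 8x DDR5-4800
g4 = peak_bandwidth_gbps(12, 5600)   # Graviton4: 12x DDR5-5600
g3_core = per_core_bandwidth_gbps(8, 4800, 64)
g4_core = per_core_bandwidth_gbps(12, 5600, 96)

print(f"Graviton3: {g3:.1f} GB/s total, {g3_core:.1f} GB/s per core")
print(f"Graviton4: {g4:.1f} GB/s total, {g4_core:.1f} GB/s per core")
print(f"Total uplift: {g4 / g3:.2f}x, per-core uplift: {g4_core / g3_core:.2f}x")
```

The 1.75x total uplift matches the "75% more memory bandwidth" in Selipsky's quote, while the per-core gain is only about 1.17x because the core count grew by 50% at the same time.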
The V2's L2 cache is weakly inclusive of the L1 and is organized as an 8-way set-associative cache split across four banks. Arm offers V2 in two L2 configurations: 1 MiB and 2 MiB.
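To make the cache geometry concrete, the number of sets follows from capacity / (ways × line size). A sketch, assuming 64-byte cache lines and an even split of sets across the four banks; neither detail is stated above, and the real bank-selection scheme may differ:

```python
LINE_BYTES = 64  # assumption: 64-byte cache lines
WAYS = 8         # 8-way set-associative, as described above
BANKS = 4        # four banks, as described above
MIB = 1024 * 1024


def l2_geometry(capacity_bytes: int) -> dict:
    """Derive set count and address-bit split for a set-associative cache."""
    sets = capacity_bytes // (WAYS * LINE_BYTES)
    return {
        "sets": sets,
        "sets_per_bank": sets // BANKS,  # assumes sets are divided evenly across banks
        "offset_bits": (LINE_BYTES - 1).bit_length(),  # byte-within-line bits
        "index_bits": (sets - 1).bit_length(),         # set-selection bits
    }


for size_mib in (1, 2):
    print(f"{size_mib} MiB L2:", l2_geometry(size_mib * MIB))
```

Under these assumptions, doubling capacity at fixed associativity doubles the number of sets, which costs one extra index bit rather than more ways.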
For Graviton4, Amazon chose the larger 2 MiB option, doubling the per-core L2 capacity of Graviton3. "When we looked at the actual workloads, we noticed that their working sets didn't fit into the cache that we had, so each core now has 2 MiB of L2 cache," Saidi noted. With 96 cores on the chip, that adds up to 192 MiB of L2 cache. Saidi confirmed that, as on Graviton3, the L3 cache is distributed and shared among all cores. The L3 capacity was not officially disclosed, but this article puts it at 96 MiB.
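Saidi's reasoning amounts to a simple capacity check: does a core's working set fit in its private L2? A minimal sketch; the 1 MiB Graviton3 figure is implied by the "doubling" above, and the 1.5 MiB working set is a hypothetical example value, not a measured workload:

```python
MIB = 1024 * 1024

CHIPS = {
    # name: (cores, private L2 per core in bytes)
    "Graviton3": (64, 1 * MIB),
    "Graviton4": (96, 2 * MIB),
}


def total_l2_mib(name: str) -> int:
    """Aggregate L2 capacity across all cores, in MiB."""
    cores, l2 = CHIPS[name]
    return cores * l2 // MIB


def fits_in_l2(name: str, working_set_bytes: int) -> bool:
    """Crude check: ignores L1/L3, instruction footprint, and conflict misses."""
    _, l2 = CHIPS[name]
    return working_set_bytes <= l2


working_set = int(1.5 * MIB)  # hypothetical per-core working set
for chip in CHIPS:
    verdict = "fits" if fits_in_l2(chip, working_set) else "spills to L3"
    print(f"{chip}: {total_l2_mib(chip)} MiB total L2, working set {verdict}")
```

A real working set needs headroom below the nominal capacity, so this check is an upper bound on what actually stays resident.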
As with Graviton3, Graviton4 uses a 7-chiplet design, although the PCIe chiplets are arranged slightly differently on the package. DDR5 controller chiplets, each with three channels, sit to the east and west of the main compute die, and PCIe chiplets sit to the north and south. Perhaps the most notable difference from Graviton3 is the placement of the PCIe controller chiplets: they no longer abut the compute die, so the expensive embedded bridge between the two is gone. Given the nature of the PCIe interface, this is unlikely to have much impact on performance while it reduces packaging costs, which may have been the motivation for the change.
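The package layout can be cross-checked against the memory configuration: 12 DDR5 channels at three channels per memory chiplet implies four memory chiplets, which together with the compute die and two PCIe chiplets gives seven. A sketch under that reading; placing two memory chiplets on each side is an inference, and the data structure is purely illustrative:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chiplet:
    kind: str               # "compute", "ddr5", or "pcie"
    side: str               # position relative to the compute die
    ddr5_channels: int = 0  # memory channels served by this chiplet


package = [Chiplet("compute", "center")]
# Assumption: two DDR5 controller chiplets on each of the east and west sides.
package += [Chiplet("ddr5", side, ddr5_channels=3)
            for side in ("east", "east", "west", "west")]
# One PCIe chiplet to the north and one to the south of the compute die.
package += [Chiplet("pcie", side) for side in ("north", "south")]

total_channels = sum(c.ddr5_channels for c in package)
print(f"{len(package)} chiplets, {total_channels} DDR5 channels")
```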
With the new chip, Amazon addresses most of the shortcomings that rival Ampere had criticized.
Each generation has a different focus
One point Saidi made during his re:Invent 2023 presentation is that each Graviton generation has had a distinct primary problem to solve. With Graviton1, the main focus was proof of concept: "When we started using Graviton1, the focus was to prove that you could have an alternative architecture in EC2; you could configure instances the same way, use security groups, run various workloads, and they would work as you expected."
With Graviton2, the focus shifts to better general-purpose computing, increasing the number of applicable workloads. "With Graviton2, we've significantly increased those workloads. We're seeing people running Java applications, key-value stores, databases and a host of other workloads."
With Graviton3, the focus shifts to higher performance, especially in HPC and machine learning applications. This was achieved by moving to the Neoverse V series and introducing SVE support and greater SIMD width.
"With Graviton4, our focus now is on increasing scale, increasing applicability again. We have customers coming to us and telling us, 'I've moved all my databases to Graviton. I'm currently using 32 vCPUs, and I think in the next one to two years, as my business grows, I may end up using 64 vCPUs. But you don't offer anything bigger.' So with Graviton4, we now offer that choice."
With Graviton4, the baseline grows by 50% to 96 cores in a single socket, offered as a 24xlarge instance with 96 vCPUs. For applications that must scale further, the new chip introduces multi-socket coherency: two Graviton4 chips can be connected to form a system with three times the cores and three times the DRAM of Graviton3. Because Graviton4's memory data rate is also higher, the system's total theoretical peak bandwidth grows even more, to 3.5 times that of Graviton3.
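The dual-socket ratios follow directly from the per-socket numbers. A sketch comparing one Graviton3 socket with two Graviton4 sockets, reusing the channel counts and DDR5 data rates given earlier, again assuming 8 bytes per transfer per channel, and treating channel count as a proxy for DRAM capacity (equal DIMMs per channel):

```python
BYTES_PER_TRANSFER = 8  # assumption: 64-bit data width per DDR5 channel


def system(sockets: int, cores: int, channels: int, mts: int) -> dict:
    """Aggregate cores, memory channels, and peak bandwidth (GB/s) for a system."""
    return {
        "cores": sockets * cores,
        "channels": sockets * channels,
        "bandwidth_gbps": sockets * channels * mts * BYTES_PER_TRANSFER / 1000,
    }


g3 = system(sockets=1, cores=64, channels=8, mts=4800)
g4x2 = system(sockets=2, cores=96, channels=12, mts=5600)

for key in g3:
    print(f"{key}: {g4x2[key] / g3[key]:.2f}x")
```

Cores and channels scale by exactly 3x, while bandwidth reaches 3.5x because each channel also runs at 5600 MT/s instead of 4800 MT/s.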
Alongside Neoverse N2 and V1, Arm introduced the CMN-700 Coherent Mesh Network, which is the basis of Graviton4's mesh interconnect. One of its features is multi-chip coherency, and it also supports CCIX 2.0 and CXL. The slide below appears to show three CCIX links between the two sockets; it is unclear whether this is purely illustrative or whether the chip actually integrates three bidirectional links.
While the Graviton team was developing Graviton4, another team was developing the Nitro chip in parallel. Saidi explained that this allowed them to co-develop the two and make some additional optimizations. The resulting dual-socket platform can operate in several modes: as two non-coherent virtualized systems, one coherent virtualized system, two bare-metal systems, or one bare-metal system. One reason for these configurations is that coherency can be turned off when it is not needed, yielding additional energy savings.
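The four operating modes form a 2×2 matrix: virtualized or bare metal, crossed with two independent (non-coherent) systems or one coherent system. A hypothetical model of that matrix; the names are illustrative and do not correspond to any AWS or Nitro API:

```python
from enum import Enum
from itertools import product


class Exposure(Enum):
    VIRTUAL = "virtualized"
    BARE_METAL = "bare-metal"


class Topology(Enum):
    TWO_NON_COHERENT = 2  # each socket runs as its own system
    ONE_COHERENT = 1      # both sockets form a single coherent system


def coherency_link_enabled(topology: Topology) -> bool:
    # Cross-socket coherency is only needed when the sockets form one system;
    # otherwise it can be switched off to save energy.
    return topology is Topology.ONE_COHERENT


modes = list(product(Exposure, Topology))
for exposure, topology in modes:
    state = "on" if coherency_link_enabled(topology) else "off"
    print(f"{topology.value} {exposure.value} system(s), coherency {state}")
```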
Performance
AWS also shared some performance results. These benchmarks compare Graviton3-based R7g instances with Graviton4-based R8g instances in similar configurations.
In a MySQL test driven by the HammerDB load generator, Graviton4 improved performance by 40% over Graviton3. In an Nginx load-balancing test, Graviton4 was 30% faster, and in a Groovy/Grails web application it improved performance by more than 40%. In a test of the popular Redis key-value store using two load generators and a latency tester, Saidi reported a 25% performance improvement.
Comparing all Graviton generations on the same Groovy and MySQL workloads, Saidi noted performance improvements approaching 4x, or more, over the original Graviton chip launched in 2018.
R8g instances powered by Graviton4 are now available in preview, with general availability planned for early 2024.
Original English text:
https://fuse.wikichip.org/news/7633/amazon-debuts-4th-gen-graviton/
*Disclaimer: This article was written by its original author, and its content reflects the author's personal views. Semiconductor Industry Watch reprints it only to present a different point of view; this does not mean Semiconductor Industry Watch agrees with or endorses these views. If you have any objections, please contact Semiconductor Industry Watch.
This is issue 3610 of "Semiconductor Industry Observation."