Unveiling the Cortex-A55: Why is it such an important processor for the future digital world?
Have you heard about ARM's powerful new CPUs? They are the ARM Cortex-A75 and Cortex-A55, the first Cortex-A series processors based on the newly announced DynamIQ technology. ARM's latest high-performance processor, the Cortex-A75, is covered in a separate in-depth article. This time we will discuss the Cortex-A55 and why it is crucial to the future of the digital world.
Born into a prestigious, well-proven family
⁜Higher performance to meet the needs of artificial intelligence tasks, which are a major focus of current and future ARM IP
To understand the true potential of the Cortex-A55, let’s take a quick look back at its predecessor, the ARM Cortex-A53. With more than 1.5 billion units shipped, it remains the industry’s most widely shipped 64-bit Cortex-A series CPU today. Launched in 2012, the Cortex-A53 combines performance, low power consumption and area scalability with a versatile feature set, which has let it serve a wide range of markets: high-end smartphones, network infrastructure, automotive infotainment, advanced driver assistance systems (ADAS), digital TVs, entry-level mobile and consumer devices, and even satellites.
Yet a lot has changed in the world around us since 2012. Emerging trends point to a digital world that is both connected and intelligent. From fully autonomous self-driving cars to smart apps on all kinds of devices, it is a foregone conclusion that artificial intelligence (AI) and machine learning (ML) will become part of our daily lives. The spread of Internet of Things (IoT) applications means an explosion of “things” that constantly generate, consume, and interact with data. Augmented, virtual, and mixed reality (AR, VR, and MR) are set to revolutionize the way we interact with each other and with machines, merging the physical and digital worlds.
Over the past two years, ARM engineers have been working on a successor to the Cortex-A53 to meet the needs of these emerging technologies. Our goal was a CPU with greatly improved performance, efficiency and scalability, plus the advanced features that future applications from edge to cloud will require. And we did it.
Overall performance improvement
⁜ Cortex-A55 achieves comprehensive performance improvement
The Cortex-A55 is based on the latest ARMv8.2 architecture and builds on its predecessor, pushing performance further while keeping power consumption at the same level as the Cortex-A53. Compared with the Cortex-A53, it offers:
➤ Up to twice the memory performance of the Cortex-A53 at the same frequency and process
➤ 15% higher performance than the Cortex-A53 at the same frequency and process
➤ More than ten times the configurability of the Cortex-A53
These gains come from re-examining the Cortex-A53's existing design decisions and challenging them:
1. The branch predictor has been completely redesigned, with neural-network-based elements incorporated into its algorithm to improve prediction accuracy. A zero-cycle branch predictor has also been added to further reduce bubbles in the pipeline, shortening the idle time between instructions.
2. The L2 cache is now private to each CPU, which cuts L2 access latency by more than 50% compared with the Cortex-A53. The L2 cache also runs at the same frequency as the CPU, which further reduces latency and significantly improves performance across a range of benchmarks.
3. A new Level 3 (L3) cache is shared by all Cortex-A55 CPUs in the cluster. It gives the DynamIQ cluster more memory capacity close to the CPUs, improving performance and reducing system power. The L3 cache is part of the DynamIQ Shared Unit (DSU), a new functional unit in DynamIQ processors.
4. 8-bit integer matrix multiplications account for more than 85% of the computation in neural networks. New architectural instructions in the Cortex-A55 NEON pipeline allow it to perform sixteen 8-bit integer operations per cycle. The new instructions also enable eight 16-bit (half-precision) floating-point operations per cycle, add rounding double multiply-accumulate (MAC) operations, and speed up color-space conversion.
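The article does not describe the internals of the redesigned branch predictor. To illustrate what "neural network elements" in a branch predictor can mean, here is a minimal sketch of the classic perceptron predictor from the academic literature, assuming a single global history register and a small table of weight vectors indexed by the branch address. The class name, history length, table size and training-threshold heuristic are our assumptions, not Cortex-A55 parameters, and the zero-cycle predictor is not modeled.

```python
class PerceptronPredictor:
    """Classic perceptron branch predictor (illustrative; not the Cortex-A55 design)."""

    def __init__(self, history_len=16, table_size=256):
        self.history_len = history_len
        self.table_size = table_size
        # One weight vector per table entry: a bias weight plus one weight per history bit.
        self.weights = [[0] * (history_len + 1) for _ in range(table_size)]
        # Global branch history, encoded as +1 (taken) / -1 (not taken).
        self.history = [-1] * history_len
        # Training threshold heuristic theta = floor(1.93 * h + 14).
        self.threshold = int(1.93 * history_len + 14)

    def _output(self, pc):
        # Dot product of the selected weight vector with the history (bias input is +1).
        w = self.weights[pc % self.table_size]
        return w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))

    def predict(self, pc):
        # Predict "taken" when the perceptron output is non-negative.
        return self._output(pc) >= 0

    def update(self, pc, taken):
        y = self._output(pc)
        t = 1 if taken else -1
        # Train on a misprediction, or when the output magnitude shows low confidence.
        if (y >= 0) != taken or abs(y) <= self.threshold:
            w = self.weights[pc % self.table_size]
            w[0] += t
            for i, hi in enumerate(self.history):
                w[i + 1] += t * hi
        # Shift the actual outcome into the global history.
        self.history = self.history[1:] + [t]


# Usage: a strictly alternating branch at one (hypothetical) address.
p = PerceptronPredictor()
correct = 0
for i in range(1000):
    taken = i % 2 == 0
    if i >= 900:
        correct += int(p.predict(0x400) == taken)
    p.update(0x400, taken)
print(f"correct predictions in the last 100 branches: {correct}")
```

A real predictor would also saturate the weights to a fixed bit width and pipeline the dot product; the sketch keeps unbounded Python integers for readability.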
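The article does not name the new 8-bit instructions. On ARMv8.2-A cores they are generally understood to be the Dot Product extension (SDOT/UDOT); treat that mapping as our assumption. The sketch below emulates the documented semantics of the signed form on one 128-bit register: each 32-bit accumulator lane adds the dot product of four pairs of signed bytes, so a single instruction performs 16 multiplies. Whether the hardware completes one such instruction per cycle is the article's claim, not something the sketch demonstrates. The function name `sdot_4s` is hypothetical.

```python
def sdot_4s(acc, a, b):
    """Emulate the semantics of SDOT Vd.4S, Vn.16B, Vm.16B in plain Python.

    acc: four signed 32-bit accumulator lanes.
    a, b: sixteen signed 8-bit values each (one 128-bit NEON register).
    Each 32-bit lane i accumulates the dot product of bytes 4*i .. 4*i+3.
    """
    if len(acc) != 4 or len(a) != 16 or len(b) != 16:
        raise ValueError("expected 4 accumulators and two 16-element vectors")
    if any(not -128 <= x <= 127 for x in list(a) + list(b)):
        raise ValueError("inputs must be signed 8-bit values")
    result = []
    for lane in range(4):
        total = acc[lane]
        for k in range(4):
            total += a[4 * lane + k] * b[4 * lane + k]
        # Wrap to 32 bits, as the destination register would (no saturation).
        total = (total + 2**31) % 2**32 - 2**31
        result.append(total)
    return result


print(sdot_4s([0, 0, 0, 0], list(range(16)), [1] * 16))  # [6, 22, 38, 54]
```

Accumulating into 32-bit lanes is what lets long int8 dot products (the core of quantized matrix multiplication) run without intermediate overflow.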
Compared with Cortex-A53, a significant improvement in performance
⁜Cortex-A55 continues to lead in power and thermal efficiency
These improvements to the branch predictor, the NEON and FP units, and memory latency are just some of the reasons for the Cortex-A55's dramatic performance gains, which it achieves while keeping power consumption similar to the Cortex-A53's. All in all, the Cortex-A55 improves energy efficiency by 15%. In many product designs power matters even more than performance, and at the same performance level the Cortex-A55 consumes 30% less power than the Cortex-A53!
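To see why a private L2 running at the CPU clock and a shared L3 reduce effective memory latency, the standard average memory access time (AMAT) formula can be evaluated for a few cache hierarchies. All latencies and miss rates below are hypothetical round numbers chosen for illustration, not Cortex-A53 or Cortex-A55 measurements, and the function name `amat` is our own.

```python
def amat(levels, memory_latency):
    """Average memory access time (AMAT) in CPU cycles.

    levels: (hit_latency_cycles, local_miss_rate) pairs, ordered from L1 outward.
    memory_latency: cycles to reach DRAM once the last cache level misses.
    """
    total = memory_latency
    # Fold from the outermost level inward: AMAT_i = hit_i + miss_i * AMAT_(i+1).
    for hit_latency, miss_rate in reversed(levels):
        total = hit_latency + miss_rate * total
    return total


L1 = (4, 0.10)  # hypothetical L1: 4-cycle hit, 10% miss rate
DRAM = 200      # hypothetical DRAM latency in CPU cycles

# Hypothetical L2 clocked at half the CPU frequency: a 12-cycle array costs 24 CPU cycles.
half_speed_l2 = amat([L1, (24, 0.30)], DRAM)
# The same L2 running at the CPU frequency.
full_speed_l2 = amat([L1, (12, 0.30)], DRAM)
# Add a hypothetical shared L3: 40-cycle hit, 50% local miss rate.
with_l3 = amat([L1, (12, 0.30), (40, 0.50)], DRAM)

for name, value in [("half-speed L2", half_speed_l2),
                    ("full-speed L2", full_speed_l2),
                    ("full-speed L2 + L3", with_l3)]:
    print(f"{name}: {value:.1f} cycles")
```

With these made-up inputs the AMAT falls from about 12.4 to 11.2 cycles when the L2 runs at full speed, and to about 9.4 cycles once the L3 catches part of the L2 misses; real gains depend entirely on the workload's miss rates.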
The Cortex-A55 can sustain its performance far longer than today’s Cortex-A53 solutions. This is critical for the user experience in AR, VR and MR, which are expected to dominate the future mobile market. These use cases are already highly threaded and have strict latency requirements: according to industry research, motion-to-photon latency must be kept at 20 milliseconds or less to avoid nausea and dizziness. Today’s CPUs can reach the performance needed for 20-millisecond latency, but thermal limitations prevent them from sustaining it for long. With the Cortex-A55, future VR devices can sustain that performance for longer.
⁜Advanced features and higher performance to meet the needs of the infrastructure market
Industry-leading efficiency sets the Cortex-A55 apart in the infrastructure market, where applications such as Power over Ethernet (PoE) wireless access points and thermally constrained automotive solutions mounted on rear-view mirrors can leverage the thermally efficient Cortex-A55 to deliver the highest performance within a specific thermal envelope. In 5G Remote Radio Heads (RRHs), the Cortex-A55 CPU is also able to maximize network throughput within a specific power envelope.
Scaling from edge to cloud
⁜ The right size and computing performance to meet every need
In addition to performance and efficiency, the Cortex-A55 is highly scalable in both physical die size and compute performance. It offers many RTL configuration options, giving it ten times the configurability of the Cortex-A53: more than 3,000 unique configurations in total, making it the most scalable Cortex-A CPU ever.
The Cortex-A55 keeps the Cortex-A53's flexibility, with options such as NEON, Crypto and ECC (error correction code), and adds practical new configuration options. For example, the private L2 cache can be configured from 64KB to 256KB and can bring a 10% performance improvement, so it is likely to become the default choice in many markets. It is nonetheless optional, so it can be left out to reduce die size in size-sensitive markets such as the Internet of Things.
⁜ Detailed explanation of new features in DynamIQ Shared Unit (DSU)
The DSU is shared by the Cortex-A55 and the Cortex-A75. It offers more configuration options, so it can be tailored to each application. For example, the L3 cache shared between the CPUs can be configured from 0KB up to a maximum of 4MB. It also supports a choice of AMBA 5 ACE or CHI interfaces, so it can be used in a wider range of systems. The Accelerator Coherency Port (ACP) and low-latency peripheral port (PP) are also integrated into the DSU, allowing tightly coupled accelerators to be attached alongside the Cortex-A55's general-purpose compute. These features, together with the machine learning capabilities of the Cortex-A55, allow more computation to be performed closer to the edge, for example in IoT gateway applications.
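The configuration space described above (core count, optional private L2, L3 size, system interface, NEON, Crypto, ECC) can be pictured as a set of constraints that an SoC build script checks. The sketch below is hypothetical throughout: the `ClusterConfig` type and `validate` function are invented for illustration, the 128KB intermediate L2 step and the Crypto-requires-NEON rule are assumptions, and the authoritative option lists live in the Cortex-A55 and DSU technical reference manuals.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical option sets for illustration only.
L2_SIZES_KB = {0, 64, 128, 256}  # 0 = private L2 omitted; 128KB step is an assumption
L3_MAX_KB = 4096                 # shared L3 from 0KB up to 4MB
INTERFACES = {"ACE", "CHI"}      # AMBA 5 ACE or CHI system interface
MAX_CPUS = 8                     # CPUs per DynamIQ cluster


@dataclass(frozen=True)
class ClusterConfig:
    num_cpus: int
    l2_kb: int
    l3_kb: int
    interface: str
    neon: bool = True
    crypto: bool = False
    ecc: bool = False


def validate(cfg: ClusterConfig) -> list[str]:
    """Return a list of problems; an empty list means the configuration is accepted."""
    errors = []
    if not 1 <= cfg.num_cpus <= MAX_CPUS:
        errors.append(f"num_cpus must be between 1 and {MAX_CPUS}")
    if cfg.l2_kb not in L2_SIZES_KB:
        errors.append(f"l2_kb must be one of {sorted(L2_SIZES_KB)}")
    if not 0 <= cfg.l3_kb <= L3_MAX_KB:
        errors.append(f"l3_kb must be between 0 and {L3_MAX_KB}")
    if cfg.interface not in INTERFACES:
        errors.append(f"interface must be one of {sorted(INTERFACES)}")
    if cfg.crypto and not cfg.neon:
        # Assumed dependency: the Crypto option builds on the NEON unit.
        errors.append("crypto requires neon")
    return errors


# A minimal, size-sensitive IoT-style cluster: one CPU, no private L2, no L3.
iot = ClusterConfig(num_cpus=1, l2_kb=0, l3_kb=0, interface="ACE", neon=False)
# A larger gateway-style cluster with Crypto, ECC and a 2MB L3.
gateway = ClusterConfig(num_cpus=4, l2_kb=256, l3_kb=2048, interface="CHI",
                        crypto=True, ecc=True)
print(validate(iot), validate(gateway))
```

Returning a list of messages rather than raising on the first problem lets a build script report every invalid option at once.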
Includes many advanced features for emerging applications
⁜Accelerate the application of artificial intelligence in various fields
It’s no secret that artificial intelligence will become more commonplace, and by extension our devices will increasingly run machine learning tasks. There are many ways to implement machine learning processing on a chip, but the CPU has a unique advantage: as a general-purpose processor, it can run whatever AI workloads the chip needs to handle. Machine learning and AI algorithms are still changing rapidly, so fixed-function hardware is not only expensive but also quickly outdated.
Improvements to the Cortex-A55 NEON pipeline and the new machine learning instructions give the Cortex-A55 much higher machine learning performance on matrix multiplication than the Cortex-A53. The recently released ARM Compute Library, a set of low-level software functions optimized for ARM Cortex-A NEON and Mali GPU IP, can also run on the Cortex-A55 NEON to further improve its machine learning performance!
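The reason 8-bit matrix multiplication matters is that neural network weights and activations are often quantized to int8, multiplied with integer accumulation, and scaled back to floating point. The sketch below shows one common scheme, symmetric per-tensor quantization, in plain Python. It is not the ARM Compute Library's API or kernel design; the function names are hypothetical, and the inner integer loop stands in for what NEON dot-product instructions would compute in hardware.

```python
def scale_for(values):
    """Symmetric per-tensor scale so that the largest magnitude maps to 127."""
    peak = max(abs(v) for v in values)
    return peak / 127 if peak else 1.0


def quantize(row, scale):
    """Round each value to the nearest int8 step and clamp to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in row]


def int8_matmul(a, b):
    """Approximate a @ b (a is n x k, b is k x m) using int8 inputs and integer accumulation."""
    sa = scale_for([x for row in a for x in row])
    sb = scale_for([x for row in b for x in row])
    qa = [quantize(row, sa) for row in a]
    qb = [quantize(row, sb) for row in b]
    n, k, m = len(a), len(b), len(b[0])
    out = []
    for i in range(n):
        row = []
        for j in range(m):
            acc = 0  # integer accumulator, as a 32-bit dot-product lane would hold
            for p in range(k):
                acc += qa[i][p] * qb[p][j]
            row.append(acc * sa * sb)  # dequantize back to floating point
        out.append(row)
    return out


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[0.5, -1.0], [1.5, 2.0]]
print(int8_matmul(a, b))  # close to the exact result [[3.5, 3.0], [7.5, 5.0]]
```

The result is approximate by design: each input is rounded to one of 255 levels, which is usually an acceptable trade for the much higher throughput of 8-bit arithmetic.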
⁜Cortex-A55 enables safer autonomous systems
The Cortex-A55 also offers high reliability, availability and serviceability (RAS) features, which suit it to fields such as infrastructure and automotive. For the automotive market, the Cortex-A55's functional safety has been improved. It provides optional ECC and parity protection on every cache level, and it supports "data poisoning", which defers the handling of detected but uncorrectable errors until the affected data is actually used, making systems more resilient. It is also the first Cortex-A series CPU developed with a new design process aimed at avoiding systematic failures, making it well suited to ASIL D applications when paired with the Cortex-R52.
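The article names ECC, parity and data poisoning but not the specific code ARM uses. As an illustration, here is a minimal sketch of a generic single-error-correct, double-error-detect (SECDED) Hamming code on 4-bit words (real caches protect much wider granules), plus a "poison" tag that lets uncorrectable data be moved around and only raises an error when it is consumed. The function names and the dictionary-based word are hypothetical.

```python
class PoisonedDataError(Exception):
    """Raised only when poisoned data is actually consumed."""


def encode(nibble):
    """SECDED-encode 4 data bits into 8 bits: Hamming(7,4) plus an overall parity bit."""
    c = [0] * 8  # c[1..7]: Hamming code positions, c[0]: overall parity
    c[3], c[5], c[6], c[7] = [(nibble >> i) & 1 for i in range(4)]
    c[1] = c[3] ^ c[5] ^ c[7]  # parity over positions with bit 0 set
    c[2] = c[3] ^ c[6] ^ c[7]  # parity over positions with bit 1 set
    c[4] = c[5] ^ c[6] ^ c[7]  # parity over positions with bit 2 set
    c[0] = sum(c[1:]) % 2
    return c


def decode(codeword):
    """Return (nibble, status); status is 'ok', 'corrected' or 'poisoned'."""
    c = list(codeword)
    syndrome = ((c[1] ^ c[3] ^ c[5] ^ c[7])
                | (c[2] ^ c[3] ^ c[6] ^ c[7]) << 1
                | (c[4] ^ c[5] ^ c[6] ^ c[7]) << 2)
    overall_odd = sum(c) % 2 == 1
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd:
        c[syndrome] ^= 1  # single-bit error; syndrome 0 means c[0] itself flipped
        status = "corrected"
    else:
        return None, "poisoned"  # two bits flipped: detected but uncorrectable
    return c[3] | c[5] << 1 | c[6] << 2 | c[7] << 3, status


def load(codeword):
    """Read a word; uncorrectable data is tagged as poisoned instead of trapping now."""
    value, status = decode(codeword)
    return {"value": value, "poisoned": status == "poisoned"}


def use(word):
    """Consume a word; the deferred error is raised only at this point."""
    if word["poisoned"]:
        raise PoisonedDataError("uncorrectable error reached a consumer")
    return word["value"]


cw = encode(0b1011)
cw[5] ^= 1         # one flipped bit: corrected transparently
print(decode(cw))  # (11, 'corrected')
cw[6] ^= 1         # a second flipped bit: detected, not correctable
word = load(cw)    # no error yet; the word only carries a poison tag
```

Deferring the error to `use` mirrors the idea in the text: a corrupted line can be evicted or copied without bringing the system down, and only the consumer that actually depends on the bad data has to handle it.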
Deeply embedded advanced power management features
⁜Advanced power management features to improve energy efficiency
The Cortex-A55 has many new power features, such as hardware-controlled state transitions that switch from ON to OFF faster. It can also autonomously power down the L3 cache depending on the running application. For heavy workloads that need more memory, such as VR, the L3 cache is fully powered on; for light workloads that fit entirely in the L1 and L2 caches, such as music playback, it is turned off. Two additional power modes cover workloads between these extremes.
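The article does not describe how the L3 power mode is chosen or name the two intermediate modes. The sketch below assumes a simple policy, purely for illustration: power only as much of the L3 as the part of the working set that overflows L1 and L2 needs. The mode names "quarter" and "half", their fractions, and the working-set estimate are hypothetical placeholders, not DSU behavior.

```python
# Hypothetical L3 power modes: name and fraction of the L3 left powered on.
L3_MODES = [("off", 0.0), ("quarter", 0.25), ("half", 0.5), ("full", 1.0)]


def choose_l3_mode(working_set_kb, private_kb, l3_kb):
    """Pick the smallest L3 power mode that still holds the working set.

    working_set_kb: estimated working set of the running workload (assumed known).
    private_kb: total L1 + L2 capacity available to the workload.
    l3_kb: full size of the shared L3 cache.
    """
    if working_set_kb <= private_kb:
        return "off"  # everything fits in L1/L2, so the L3 can be powered down
    spill_kb = working_set_kb - private_kb
    for name, fraction in L3_MODES[1:]:
        if spill_kb <= fraction * l3_kb:
            return name
    return "full"  # working set exceeds even the full L3; keep it all powered


for workload, ws_kb in [("music playback", 150), ("web browsing", 600), ("VR", 6000)]:
    print(workload, choose_l3_mode(ws_kb, private_kb=192, l3_kb=2048))
```

With these hypothetical sizes, music playback leaves the L3 off, a medium workload needs only a quarter of it, and VR keeps it fully on, matching the behavior the paragraph describes.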
It is now also possible to place individual CPUs, or groups of CPUs, in their own independent voltage domains within the cluster, allowing finer-grained dynamic voltage and frequency scaling (DVFS). This has two major benefits. First, designers can tune the system further for the best balance of performance and power savings. Second, DynamIQ systems can more closely match the varying thermal limits of a device, so performance can be maximized.
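A rough way to see why independent voltage domains help is the classic CMOS dynamic power relation P ≈ C·V²·f. The sketch below compares a cluster where all CPUs share one voltage (set by the most demanding CPU) with one where each CPU gets the minimum voltage for its own frequency. The operating-point table is hypothetical, leakage power is ignored, and only voltage sharing is modeled.

```python
# Hypothetical operating points: frequency in GHz -> minimum stable voltage in volts.
OPP = {0.6: 0.60, 1.0: 0.70, 1.4: 0.80, 1.8: 0.95}


def dynamic_power(freq_ghz, volt, cap=1.0):
    """Relative dynamic power P = C * V^2 * f (activity factor folded into cap)."""
    return cap * volt ** 2 * freq_ghz


def cluster_power(freqs, shared_voltage):
    """Total dynamic power of CPUs running at the given frequencies."""
    if shared_voltage:
        # One voltage domain: every CPU sits at the voltage the fastest CPU needs.
        v = max(OPP[f] for f in freqs)
        return sum(dynamic_power(f, v) for f in freqs)
    # Independent domains: each CPU gets the lowest voltage for its own frequency.
    return sum(dynamic_power(f, OPP[f]) for f in freqs)


load = [1.8, 0.6, 0.6, 0.6]  # one busy CPU, three lightly loaded CPUs
print(f"shared domain:   {cluster_power(load, True):.3f}")
print(f"per-CPU domains: {cluster_power(load, False):.3f}")
```

Because power scales with the square of voltage, letting the three lightly loaded CPUs drop to their own lower voltage saves noticeably more than their lower frequency alone would suggest.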
A new era of big.LITTLE processing
Since its introduction in 2011, big.LITTLE technology has become synonymous with heterogeneous processing: two out of every three ARMv8 Android devices on the market today rely on it to optimize power and performance. DynamIQ big.LITTLE is the next generation of this heterogeneous computing technology, built for DynamIQ systems.
It enables designers to build a fully integrated solution in which Cortex-A75 "big" CPUs and Cortex-A55 "LITTLE" CPUs sit physically in a single CPU cluster. All software thread migrations between the big and LITTLE CPUs, and the cache snoops they cause, now take place within the cluster. Compared with the Cortex-A73, the Cortex-A75 can run at higher frequencies while still maintaining a continuous DVFS curve with the Cortex-A55, an important design requirement for big.LITTLE systems. Together, these features can significantly improve peak performance, sustained performance and intelligent functions compared with the previous generation of big.LITTLE technology.
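DynamIQ keeps thread migrations inside one cluster, but the operating system still decides when a thread should move between big and LITTLE CPUs. Below is a minimal sketch of one common approach: a utilization threshold with hysteresis, so that a task does not bounce back and forth between core types. The thresholds and the `Task` type are hypothetical, and production schedulers such as Linux's Energy Aware Scheduling use far richer energy and capacity models.

```python
from dataclasses import dataclass

# Hypothetical thresholds on a task's normalized utilization (0.0 to 1.0).
UP_THRESHOLD = 0.8    # migrate to a big CPU above this
DOWN_THRESHOLD = 0.3  # migrate back to a LITTLE CPU below this


@dataclass
class Task:
    name: str
    on_big: bool = False


def place(task: Task, utilization: float) -> str:
    """Decide where the task runs next; the gap between thresholds adds hysteresis."""
    if not task.on_big and utilization > UP_THRESHOLD:
        task.on_big = True
    elif task.on_big and utilization < DOWN_THRESHOLD:
        task.on_big = False
    return "big" if task.on_big else "LITTLE"


ui = Task("ui-render")
trace = [place(ui, u) for u in [0.2, 0.9, 0.5, 0.5, 0.1]]
print(trace)  # ['LITTLE', 'big', 'big', 'big', 'LITTLE']
```

The medium-load samples (0.5) leave the task where it is; only a clear rise or fall in load triggers a migration, which keeps the number of cross-CPU moves, and the cache snoops they cause, low.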
⁜DynamIQ big.LITTLE enables a richer user experience
Quad-core and octa-core Cortex-A53 solutions are common in today's mid-range mobile and consumer markets. However, as advanced use cases such as artificial intelligence and virtual reality spread from the high-end market to the mid-range, manufacturers need to deliver higher performance and intelligent functions at lower cost. DynamIQ big.LITTLE meets this demand with new heterogeneous CPU configurations, such as 1 Cortex-A75 + 3 Cortex-A55 (1 big + 3 LITTLE) and 1 Cortex-A75 + 7 Cortex-A55 (1 big + 7 LITTLE). These configurations can deliver more than twice the single-threaded performance of 4-core and 8-core Cortex-A55 designs of similar die size.
Infrastructure and Mobile System-on-Chip (SoC) Design Guide Now Available
ARM has a long history of investing heavily in example SoC designs to validate our IP. As the ARM IP portfolio has grown, so has the complexity and scope of these example systems. This work covers everything from SoC architecture to detailed pre-production analysis. ARM delivers this knowledge in the form of "System Guides".
In addition to the new CPUs, ARM is also delivering a variety of new system guides covering both mobile and infrastructure systems:
➤ CoreLink SGM-775 System Guide for Mobile: designed and optimized for the Cortex-A75, Cortex-A55 and Mali-G72
➤ The SGM-775 includes documentation, models and software and is available free of charge to ARM partners.
When can we expect Cortex-A55 based devices to be available?
The release of the Cortex-A55 is exciting. Its significant advances in performance, energy efficiency and scalability will make it ARM's next most-shipped Cortex-A series CPU. But the excitement does not stop there: a large number of ARM partners in the ecosystem have already licensed the Cortex-A55, and we look forward to the new intelligent computing solutions they will release in the coming months. Although we cannot predict exactly what form Cortex-A55-based devices will take, we can be sure that the future, from 2018 onward, will be extremely exciting!