A new direction for mobile processors: will integrating more GPU cores become mainstream? - EEWORLD

High-end mobile devices place ever-higher demands on multimedia and other visual experiences, prompting mobile processor vendors to integrate many more GPU cores. The aim is to use the GPU's parallel computing capability to offload work from the CPU and thereby improve graphics and visual performance.

In the global consumer market, smartphones and tablets are undoubtedly the hottest products. According to Gartner's latest forecast, mobile phone shipments will exceed 1.8 billion units in 2013, up 3.7% from 2012, while tablet shipments will reach 184 million units, up 42.7%, a sign of rapid growth.

Notably, high-end mobile devices keep innovating, and the visual experience they offer now approaches that of personal computers (PCs) and televisions: rich, smooth two-dimensional (2D) and three-dimensional (3D) graphical user interfaces (GUIs), Retina-class image quality, fast web page rendering and photography, and increasingly realistic 3D games.

Delivering these experiences on a small mobile device raises the design bar considerably. Take 3D games as an example: to match the gaming experience of PCs and TVs, mobile devices must improve visual effects such as physics, dynamic lighting, high-dynamic-range (HDR) textures, advanced shadows, geometric detail, subsurface scattering, and dynamic reflections.

Fortunately, mobile processor architecture, the most critical piece, keeps advancing. Beyond the arrival of heterogeneous multi-core architectures that combine central processing units (CPUs) and graphics processing units (GPUs), both the number of GPU cores and their processing power have grown significantly, making the GPU the biggest contributor to a smooth, long-lasting visual experience. The following sections analyze recent changes and developments in advanced GPU architectures.

Heterogeneous multi-core SoCs are the unstoppable path to richer graphics

More and more mid-range and high-end mobile devices use processors with quad-core CPUs. NVIDIA's Tegra series, for example, has used a 4+1 multi-core architecture since Tegra 3: four performance cores plus one power-saving core. The latest Tegra 4 keeps the 4+1 arrangement but upgrades the CPU cores from the previous generation's Cortex-A9 to Cortex-A15; Tegra 4i, by contrast, still uses Cortex-A9 (r4) cores.

Although more CPU cores mean higher processing performance, CPUs are built for sequential processing, so taking advantage of additional cores requires explicitly multithreaded applications, which are harder to write. GPUs, in contrast, are designed for parallel processing, and on highly parallel workloads such as graphics their performance can scale almost linearly with core count, so adding GPU cores pays off far more than adding CPU cores.
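
The scaling argument above can be made concrete with Amdahl's law, a standard model brought in here for illustration (the article itself does not cite it). In this minimal Python sketch, the parallel fractions are hypothetical values, not measurements of any real application or GPU workload.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup over one core predicted by Amdahl's law.

    parallel_fraction is the share of the work that can run in parallel (0..1);
    the remaining serial share always runs on a single core.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical parallel fractions chosen for illustration only.
    workloads = {
        "general app, 5% serial": 0.95,
        "per-pixel shading, 0.1% serial": 0.999,
    }
    for name, p in workloads.items():
        for n in (4, 72):
            print(f"{name}: {n:>2} cores -> {amdahl_speedup(p, n):5.1f}x")
```

With a 5% serial share, going from 4 to 72 cores lifts the predicted speedup only from about 3.5x to about 15.8x, whereas a workload that is 99.9% parallel, like independent per-pixel shading, reaches about 67x on 72 cores. Real GPUs also face memory-bandwidth and scheduling limits that this idealized model ignores.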

Under these circumstances, heterogeneous multi-core architectures that integrate CPU and GPU have become inevitable. With more GPU cores, developers have more room and flexibility to create richer graphics effects, finer details, and more vivid scenes, greatly improving the mobile visual and gaming experience.

The Tegra 4 GPU subsystem is a good example. It raises the number of GeForce GPU cores from 12 in the previous generation to 72, and the six-fold increase in cores also brings six times the graphics performance of Tegra 3. Please refer to Table 1 for the difference in GPU performance between Tegra 4 and Tegra 3. In terms of configuration, the architecture includes vertex shaders and pixel shaders: the former let engineers customize how the vertices in a scene are transformed, and the latter control the shading calculation for each pixel on the screen.
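
To illustrate what the two shader stages compute, here is a CPU-side Python sketch. It is not Tegra or OpenGL ES code: real shaders run on the GPU and are written in a shading language such as GLSL ES. The vertex stage multiplies each vertex by a 4x4 model-view-projection matrix, and the pixel stage applies simple Lambertian diffuse lighting, chosen here only as a representative per-pixel calculation; the function names are hypothetical.

```python
import math

Vec3 = tuple[float, float, float]
Vec4 = tuple[float, float, float, float]
Mat4 = list[list[float]]  # row-major 4x4 matrix


def vertex_shader(position: Vec3, mvp: Mat4) -> Vec4:
    """Per-vertex stage: transform an object-space position into clip space."""
    v = (position[0], position[1], position[2], 1.0)  # homogeneous coordinates
    return tuple(sum(mvp[row][col] * v[col] for col in range(4)) for row in range(4))


def _normalize(v: Vec3) -> Vec3:
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return tuple(c / length for c in v)


def pixel_shader(normal: Vec3, light_dir: Vec3, albedo: Vec3) -> Vec3:
    """Per-pixel stage: Lambertian diffuse shading, albedo * max(0, N . L)."""
    n = _normalize(normal)
    l = _normalize(light_dir)
    n_dot_l = max(0.0, sum(a * b for a, b in zip(n, l)))
    return tuple(channel * n_dot_l for channel in albedo)


if __name__ == "__main__":
    translate_x10 = [
        [1.0, 0.0, 0.0, 10.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]
    print(vertex_shader((1.0, 2.0, 3.0), translate_x10))  # (11.0, 2.0, 3.0, 1.0)
    print(pixel_shader((0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (1.0, 0.5, 0.25)))
```

On a GPU, the same small functions run for every vertex and every pixel independently, which is why adding shader cores raises throughput so directly.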

Going further, Tegra 4 divides its 72 GeForce cores into 24 vertex shaders and 48 pixel shaders. Every four vertex shaders form a Vertex Processing Engine (VPE), giving six VPEs, each with a 16KB, 96-entry cache that effectively reduces the need to fetch data from off-chip memory. At the same clock speed, each new GeForce core delivers 1.5 times the performance of its Tegra 3 counterpart, and the number of vertex shaders differs by a factor of six between the two generations, so vertex performance differs by 1.5 × 6 = 9 times. In addition, Tegra 4 has four pixel pipelines (Pixel Fragment Shader Pipelines), each subdivided into three arithmetic logic units (ALUs), and each ALU consists of four GeForce cores (that is, pixel shaders). In operation, the ALU is the lowest-level unit and is called a Multi-Function Unit (MFU), so Tegra 4 has twelve MFUs in total. The MFUs can execute special functions, including trigonometric functions, logarithms, reciprocals, and square roots, as well as MOV (the assembly-language copy instruction) (Figures 1 and 2).
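
The core-count arithmetic in the paragraph above can be checked with a short Python sketch. The class and function names are hypothetical, and the Tegra 3 vertex-shader count of 4 is inferred from the stated six-fold ratio (24 / 6) rather than taken from a specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuConfig:
    """Core counts for a Tegra-4-style GPU, using the figures from the text."""
    vertex_shaders: int = 24
    vertex_shaders_per_vpe: int = 4
    pixel_pipelines: int = 4
    alus_per_pipeline: int = 3
    cores_per_alu: int = 4

    @property
    def vpes(self) -> int:
        return self.vertex_shaders // self.vertex_shaders_per_vpe

    @property
    def mfus(self) -> int:
        # Each ALU is treated as one Multi-Function Unit (MFU).
        return self.pixel_pipelines * self.alus_per_pipeline

    @property
    def pixel_shaders(self) -> int:
        return self.mfus * self.cores_per_alu

    @property
    def total_cores(self) -> int:
        return self.vertex_shaders + self.pixel_shaders


def vertex_throughput_ratio(per_core_gain: float, new_cores: int, old_cores: int) -> float:
    """Relative vertex throughput at equal clock: per-core gain x core-count ratio."""
    return per_core_gain * new_cores / old_cores


if __name__ == "__main__":
    tegra4 = GpuConfig()
    print(tegra4.vpes, tegra4.mfus, tegra4.pixel_shaders, tegra4.total_cores)  # 6 12 48 72
    # Tegra 3 vertex-shader count inferred from the six-fold ratio: 24 / 6 = 4.
    print(vertex_throughput_ratio(1.5, 24, 4))  # 9.0
```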

Figure 1. Logical graphics processing pipeline of Tegra 4

Figure 2. Tegra 4 GPU architecture block diagram

Architectural design plays an important role in reducing multi-core SoC power consumption

For mobile devices, battery life matters as much as performance and functionality. Even among quad-core mobile chips, performance and power consumption differ depending on the underlying architecture. Tegra 4, for example, uses ARM's most advanced CPU core (Cortex-A15). Its variable symmetric multiprocessing (vSMP) architecture lets the four performance cores deliver their full processing power when needed and automatically enables or disables each core according to the workload, saving significant power.

To improve battery life, Tegra 4 continues Tegra 3's power-saving approach by adding a fifth processor core to the chip, though its name has changed from Companion Core to Battery Saver Core. When the device is handling low-demand tasks such as processing email in the background, syncing social apps, or playing video and music, the system shuts down the performance cores and runs programs on the battery saver core.
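
The 4+1 behavior described in the two paragraphs above can be sketched as a simple scheduling policy. This is a hypothetical illustration, not NVIDIA's actual algorithm: the load units, the 0.6 and 0.3 thresholds, and the function name are assumptions. It shows two ideas from the text, running on the battery saver core under light load and enabling only as many performance cores as the workload needs, plus hysteresis so the system does not oscillate between modes.

```python
import math


def select_cores(demand: float, on_saver_core: bool,
                 up_threshold: float = 0.6, down_threshold: float = 0.3,
                 max_perf_cores: int = 4) -> tuple[bool, int]:
    """Hypothetical 4+1 scheduling policy.

    demand: CPU load in units of one performance core (2.5 means two and a
    half cores' worth of work). Returns (use_saver_core, active_perf_cores).
    """
    if demand < 0:
        raise ValueError("demand must be non-negative")
    # Hysteresis: leaving the saver core needs more load than returning to it,
    # so a load hovering near one threshold does not bounce between modes.
    threshold = up_threshold if on_saver_core else down_threshold
    if demand <= threshold:
        return True, 0  # performance cores off, battery saver core runs alone
    # vSMP-style scaling: power up just enough performance cores for the load.
    active = min(max_perf_cores, max(1, math.ceil(demand)))
    return False, active


if __name__ == "__main__":
    on_saver = True
    for load in (0.2, 0.5, 0.8, 2.5, 6.0, 0.4, 0.1):
        on_saver, cores = select_cores(load, on_saver)
        mode = "saver core" if on_saver else f"{cores} performance core(s)"
        print(f"load {load:>3}: {mode}")
```

In the usage loop, a load of 0.5 keeps the device on the saver core, but a later load of 0.4 stays on one performance core, because the threshold for returning to the saver core is lower than the threshold for leaving it.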

From a chip-design standpoint, multi-core processors inevitably run into major bottlenecks in memory bandwidth and overall system power. To address this, Tegra 4 uses a dual-channel (2 × 32-bit) memory subsystem. In addition, to reduce accesses to external memory, the Tegra 4 GPU architecture includes dedicated caches for vertices, pixels, and textures, so that as much computation as possible completes on-chip, improving processing efficiency and reducing power consumption.
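
Why a second channel and on-chip caches help can be shown with back-of-the-envelope bandwidth arithmetic. In this Python sketch, the 1600 MT/s transfer rate and the 75% cache hit rate are hypothetical values chosen for illustration, not Tegra 4 specifications; only the 2 × 32-bit channel configuration comes from the text, and the function names are hypothetical.

```python
def peak_bandwidth_gb_s(channels: int, bus_width_bits: int, mega_transfers_per_s: float) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 10^9 bytes)."""
    bytes_per_transfer = channels * bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9


def off_chip_traffic_gb_s(requested_gb_s: float, cache_hit_rate: float) -> float:
    """Portion of requested traffic that misses on-chip caches and goes to DRAM."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be in [0, 1]")
    return requested_gb_s * (1.0 - cache_hit_rate)


if __name__ == "__main__":
    rate = 1600  # hypothetical transfer rate in MT/s
    print(f"single 32-bit channel: {peak_bandwidth_gb_s(1, 32, rate):.1f} GB/s")
    print(f"dual 32-bit channels:  {peak_bandwidth_gb_s(2, 32, rate):.1f} GB/s")
    # Hypothetical: the GPU requests 10 GB/s and on-chip caches hit 75% of the time.
    print(f"DRAM traffic with caches: {off_chip_traffic_gb_s(10.0, 0.75):.1f} GB/s")
```

Doubling the channels doubles peak bandwidth (6.4 to 12.8 GB/s at the assumed rate), while caching cuts the traffic that must cross the chip boundary, which is also where much of the memory-related power goes.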

Another important way to reduce system-on-chip (SoC) power is advanced power management. Tegra 4 combines several techniques, including multiple levels of clock gating, display request grouping, and dynamic voltage and frequency scaling (DVFS), to minimize power in different usage scenarios.
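
DVFS saves power because dynamic CMOS power grows with the square of supply voltage and linearly with clock frequency, P = a * C * V^2 * f, where a is the switching activity factor; clock gating attacks the same equation by driving a toward zero for idle blocks. The following Python sketch evaluates that textbook model. The capacitance and the two voltage/frequency operating points are hypothetical, not Tegra 4's actual DVFS table, and leakage power is ignored.

```python
def dynamic_power_watts(c_eff_farads: float, voltage: float, freq_hz: float,
                        activity: float = 1.0) -> float:
    """Classic CMOS switching-power model: P = activity * C * V^2 * f.

    activity is the fraction of the capacitance that switches each cycle;
    a fully clock-gated block has activity 0. Leakage power is ignored.
    """
    return activity * c_eff_farads * voltage ** 2 * freq_hz


if __name__ == "__main__":
    C_EFF = 1e-9  # hypothetical effective switched capacitance (1 nF)
    high = dynamic_power_watts(C_EFF, 1.1, 1.9e9)  # hypothetical "performance" point
    low = dynamic_power_watts(C_EFF, 0.9, 1.0e9)   # hypothetical "efficiency" point
    gated = dynamic_power_watts(C_EFF, 1.1, 1.9e9, activity=0.0)
    print(f"high: {high:.2f} W, low: {low:.2f} W ({low / high:.0%} of high), gated: {gated} W")
```

With these assumed numbers, lowering the voltage from 1.1 V to 0.9 V while roughly halving the frequency cuts dynamic power to about 35% of the high operating point, noticeably more than the frequency reduction alone would.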

Computational photography architecture upgrades the imaging performance of mobile devices

Looking at GPU architecture from the application side, today's users rely heavily on mobile devices for photos and video and want professional-quality results. Unlike dedicated cameras, however, phones and tablets inherently cannot accommodate large lenses. To get high-quality images, they must rely on more advanced image processing, or even use computational algorithms to construct the image.

To enhance the consumer mobile imaging experience and truly capture fleeting moments, Tegra 4 includes the Chimera computational photography architecture, which combines the processing power of the CPU, GPU, and image signal processor (ISP) so that device makers can greatly enhance mobile imaging. With this architecture, mobile devices can instantly capture high-quality always-on high-dynamic-range (HDR) photos and videos, shoot HDR panoramas, and use continuous tap-to-track, among other functions.
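
The article does not describe how Chimera merges exposures, so the following Python sketch shows a generic technique instead: a weighted average of bracketed frames, in which each pixel value is divided by its frame's exposure time to estimate scene radiance, and mid-tone pixels are trusted more than dark or clipped ones. It assumes a linear sensor response and pixel values normalized to [0, 1]; the function name and the "hat" weighting are illustrative choices.

```python
def merge_exposures(frames: list[list[float]], exposure_times: list[float]) -> list[float]:
    """Merge bracketed frames into per-pixel relative radiance estimates.

    Assumes a linear sensor response (pixel value ~ radiance * exposure time)
    with values normalized to [0, 1] and clipping at both ends.
    """
    if not frames or len(frames) != len(exposure_times):
        raise ValueError("need one exposure time per frame")
    if any(len(frame) != len(frames[0]) for frame in frames):
        raise ValueError("all frames must have the same number of pixels")
    if any(t <= 0 for t in exposure_times):
        raise ValueError("exposure times must be positive")
    shortest = exposure_times.index(min(exposure_times))
    merged = []
    for i in range(len(frames[0])):
        weighted_sum = 0.0
        weight_total = 0.0
        for frame, t in zip(frames, exposure_times):
            z = frame[i]
            # "Hat" weight: trust mid-tones, ignore fully dark or clipped pixels.
            w = max(0.0, 1.0 - abs(2.0 * z - 1.0))
            weighted_sum += w * (z / t)
            weight_total += w
        if weight_total > 0.0:
            merged.append(weighted_sum / weight_total)
        else:
            # Every frame is clipped here: fall back to the shortest exposure.
            merged.append(frames[shortest][i] / exposure_times[shortest])
    return merged


if __name__ == "__main__":
    # Two hypothetical 2-pixel frames with exposure times 1 and 4 (arbitrary units).
    short_exp = [0.1, 0.5]
    long_exp = [0.4, 1.0]  # the second pixel is clipped in the long exposure
    print(merge_exposures([short_exp, long_exp], [1.0, 4.0]))
```

The clipped pixel in the long exposure gets zero weight, so its estimate comes entirely from the short exposure, while the dark pixel benefits from the cleaner long exposure.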

Take HDR panoramas, which create effects usually achievable only with wide-angle or "fisheye" lenses on expensive DSLR cameras. The Chimera architecture captures the scene while the camera moves; the user need not sweep in one fixed direction but can move left, right, up, down, or diagonally, capturing images from more angles and in any order to "paint" a panorama in real time. Continuous tap-to-track lets the user tap a person or object in the scene to set exposure and lock onto it when taking a photo. The camera then automatically keeps tracking the locked subject, whether the subject moves or the user reframes the shot from a better angle. Continuous tap-to-track also adjusts exposure as the camera moves, avoiding underexposure or overexposure of the subject or background.
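
The exposure half of tap-to-track can be illustrated with spot metering: average the brightness only inside the region the user tapped, then compute how many exposure stops (EV) would bring it to a mid-gray target. This Python sketch is hypothetical; the 0.18 target, the (x, y, width, height) region format, and the function name are assumptions, and in a real tracker the region would follow the subject from frame to frame (tracking itself is not shown).

```python
import math


def spot_meter_ev(luma: list[list[float]], roi: tuple[int, int, int, int],
                  target: float = 0.18) -> float:
    """Exposure correction, in stops (EV), that brings the ROI's mean luminance to target.

    luma holds linear luminance values in (0, 1]; roi is (x, y, width, height).
    A positive result means "increase exposure"; negative means "decrease".
    """
    x, y, w, h = roi
    if w <= 0 or h <= 0 or x < 0 or y < 0 or y + h > len(luma) or x + w > len(luma[0]):
        raise ValueError("ROI must lie inside the image")
    values = [luma[row][col] for row in range(y, y + h) for col in range(x, x + w)]
    mean = sum(values) / len(values)
    if mean <= 0.0:
        raise ValueError("ROI is completely dark; cannot meter")
    return math.log2(target / mean)


if __name__ == "__main__":
    # Hypothetical 4x4 frame: a dark 2x2 subject on a bright background.
    frame = [[0.045 if r in (1, 2) and c in (1, 2) else 0.9 for c in range(4)]
             for r in range(4)]
    print(f"whole frame: {spot_meter_ev(frame, (0, 0, 4, 4)):+.2f} EV")
    print(f"tapped subject: {spot_meter_ev(frame, (1, 1, 2, 2)):+.2f} EV")
```

Metering the whole frame suggests reducing exposure by almost 2 EV because of the bright background, while metering only the tapped subject suggests +2 EV, which is why locking exposure to the chosen subject avoids an underexposed subject.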

Going a step further, the Chimera architecture can achieve what was previously out of reach because it processes images at roughly 100 billion mathematical operations per second. It also draws on many advanced algorithms, including computational techniques used in X-ray computed tomography (CT) scanners, deep-space telescopes, and spy satellites, solving previously intractable problems so that action shots look much as the human eye sees the world, across many different scenes and locations and under widely varying lighting.
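
To put the 100-billion-operations figure in perspective, the following Python sketch divides it by a pixel rate. The 1080p, 30 fps capture mode is a hypothetical choice for illustration; the article does not say which resolutions or frame rates Chimera targets, and the function name is hypothetical.

```python
def ops_per_pixel(ops_per_second: float, width: int, height: int, fps: float) -> float:
    """Average operations available per pixel per frame for a given throughput."""
    pixels_per_second = width * height * fps
    return ops_per_second / pixels_per_second


if __name__ == "__main__":
    # ~100 billion operations per second (figure from the text);
    # 1080p at 30 fps is a hypothetical capture mode chosen for illustration.
    budget = ops_per_pixel(100e9, 1920, 1080, 30)
    print(f"about {budget:.0f} operations per pixel per frame")
```

Under that assumption, the budget works out to roughly 1,600 operations per pixel per frame.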

Heterogeneous multi-core SoCs expand the range of applications

The advantages of heterogeneous multi-core SoC architectures are obvious. As the technology matures under the push of the mobile market, more embedded applications are starting to adopt these mobile processors. One of the fastest-growing markets is automotive electronics, especially in-vehicle infotainment (IVI) systems, digital instrument clusters, and driver assistance, all of which depend on more powerful GPUs and CPUs.

IVI systems require realistic 3D maps and terrain, stylish and smooth user interfaces, and feature-rich audio systems. With Tegra mobile processors, already proven in mobile applications, automakers can bring these features into vehicles more quickly. On the visual processing side, NVIDIA has developed a Visual Computing Module (VCM) based on the Tegra mobile processor specifically for automakers.

Whereas other in-vehicle electronic systems have long update cycles, drivers expect their IVI systems to offer an experience similar to that of mobile applications. With the modular VCM design, carmakers can independently develop and integrate rapidly evolving mobile processor technology and then quickly build IVI systems for different car models, significantly reducing development time and cost.

For example, Audi has adopted the VCM in its networked Audi MIB system, which lets the Audi Connect platform update Google Earth imagery and the 360-degree panoramic images of Google Maps Street View at any time. It can also deliver other online data, such as real-time fuel prices, weather forecasts, and Google local search results.

Software development support and tools become criteria for choosing CPUs and GPUs

Beyond advanced hardware features, software development support and tools are also key factors when developers choose a GPU or CPU. As mentioned earlier, the flexible architecture of the Tegra series lets developers use custom algorithms to tailor GPU processing for better visual effects. In addition, the Tegra Android Developer Pack supports CPU sampling profiling (Tegra Profiler) and GPU analysis (PerfHUD ES), and Nsight Tegra provides a native Android development environment, helping developers reach their goals more easily and quickly.

As GPU graphics performance rises, mobile and in-vehicle devices can be expected to deliver better visual experiences. Another trend worth noting is that demand for 3D scenes, high-definition displays, and fast-response games in the browser will emerge, driven by technologies such as HTML5 and WebGL.

In fact, HTML5 can already make use of GPUs, and more and more browsers use GPU acceleration to improve their rendering performance. The era of powerful visual content on cross-device, cross-platform websites is arriving; it will bring great business opportunities, though the challenges are considerable.
