Apple launches M4 chip, is it mediocre?

Latest update time:2024-05-08


Source: Content compiled by Semiconductor Industry Observation (ID: icbank) from Apple.


Apple today announced M4, the latest chip that powers the new iPad Pro. Built on second-generation 3nm technology, M4 is a system-on-a-chip (SoC) that improves the industry-leading power efficiency of Apple silicon and enables the incredibly thin and light design of iPad Pro. It also features an all-new display engine that drives the breakthrough Ultra Retina XDR display on iPad Pro for stunning precision, color, and brightness.


The new chip's CPU has up to 10 cores, while the new 10-core GPU builds on the next-generation GPU architecture introduced with the M3 and brings dynamic caching, hardware-accelerated ray tracing, and hardware-accelerated mesh shading to iPad for the first time. M4 has Apple's fastest Neural Engine ever, capable of performing up to 38 trillion operations per second, which is faster than the neural processing unit of any AI PC today. Combined with faster memory bandwidth, next-generation machine learning (ML) accelerators in the CPU, and a high-performance GPU, M4 makes the new iPad Pro an extremely powerful AI device.


“The new iPad Pro with M4 is a great example of how best-in-class custom silicon can be built to enable breakthrough products,” said Johny Srouji, Apple’s senior vice president of hardware technologies. “The energy-efficient performance of M4, along with its new display engine, makes iPad Pro's thin and light design and game-changing display possible, while fundamental improvements in the CPU, GPU, Neural Engine, and memory system make M4 ideal for the latest applications that leverage artificial intelligence. All in all, this new chip makes iPad Pro the most powerful device in its class.”



TSMC’s second generation 3nm process


M4 is composed of 28 billion transistors and is built using second-generation 3nm technology, further improving the power efficiency of Apple chips. M4 also features an all-new display engine engineered to achieve the precision, color accuracy, and brightness uniformity of the Ultra Retina XDR display, a state-of-the-art display created by combining the light from two OLED panels.


From this description we can be fairly confident about the process involved: Apple's "second-generation 3nm" wording lines up exactly with TSMC's second-generation 3nm process, N3E. This enhanced version of the 3nm node is something of a sidegrade from the N3B process used by the M3 series of chips: N3E is not as dense as N3B but, according to TSMC, it offers slightly better performance and power characteristics. The difference is small enough that architecture plays a bigger role, but in the race for energy efficiency Apple will take any advantage it can get.



Over the years, Apple has established itself as a launch partner for TSMC's new process nodes, and Apple appears to be the first company to launch N3E process chips. However, they won't be the last, as nearly all of TSMC's high-performance customers are expected to adopt N3E next year. So, as usual, Apple's immediate advantage in chip manufacturing is only temporary.


Apple's early lead may also explain why we are seeing the M4 debut in the iPad (one of Apple's lower-volume devices) rather than in the MacBook line. At some point TSMC's N3E capacity will catch up, and then some. I won't hazard a guess at Apple's plans for the Mac lineup at that point: I can't see Apple ending production of the M3 chips so soon, but that leaves the company in the awkward position of selling M3 machines while the M4 exists.


The die size of the new chip has not been announced (nor have die photos been released), but the total transistor count is 28 billion, only slightly more than in the M3, which suggests Apple has not invested heavily in new hardware.


M4 CPU architecture: four performance cores, six efficiency cores


Starting with the CPU, we are faced with the mystery of Apple's M4 CPU core design. Apple is tight-lipped, and the lack of performance comparisons with the M3 means we have little information on how the two CPU designs compare. So it remains to be seen whether the M4 represents a watershed moment in Apple's CPU design (a new Monsoon/A11) or a minor update similar to the Everest CPU cores in the A17. Of course, we hope for the former, but without more details we will work with what we know.


Apple's brief keynote segment on the SoC noted that both the performance and efficiency cores implement improved branch prediction and, in the case of the performance cores, wider decode and execution engines. However, these are the same broad claims Apple made for the M3, so they do not in themselves indicate a new CPU architecture.


According to Apple, the M4 has a new CPU with up to 10 cores, comprising up to four performance cores and now six efficiency cores. The next-generation cores feature improved branch prediction, with wider decode and execution engines for the performance cores and a deeper execution engine for the efficiency cores. Both types of cores also feature enhanced next-generation machine learning accelerators.



Compared with the powerful M2 in the previous-generation iPad Pro, the M4 delivers 1.5 times the CPU performance. Whether you're processing complex orchestral files in Logic Pro or adding demanding effects to 4K video in LumaFusion, M4 improves the performance of your entire professional workflow.


However, what Apple claims is unique about the M4 CPU is that both CPU core types feature "next-generation machine learning accelerators." This ties in closely with Apple's broader focus on ML/AI performance in the M4, although the company did not detail exactly what these accelerators are used for. Since the NPU does the heavy lifting, the purpose of AI enhancements on the CPU cores is less about overall throughput and more about handling light inference workloads mixed into more general workloads, without spending the time and resources to hand them off to the dedicated NPU.


An educated guess is that Apple has updated its poorly documented AMX matrix units, which have been part of the M-series SoCs since the beginning. However, recent AMX versions already support common ML number formats such as FP16, BF16, and INT8, so if Apple has made a change here, it is not a simple matter of adding (more) common formats. Meanwhile, if it is AMX, it would be a bit surprising to see Apple mention it at all, since the company is very secretive about these units.


Another reasonable possibility is that Apple has changed the SIMD units within its CPU cores to add common ML number formats, since developers have more direct access to these units. At the same time, Apple has been pushing developers toward higher-level frameworks from the start (which is how AMX is accessed), so this could go either way.


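Either way, whatever CPU cores underpin the M4, one thing is certain: there are more of them. The full M4 configuration has four performance cores and six efficiency cores, two more efficiency cores than the M3. Cut-down iPad models get a 3P+6E configuration, while higher-tier configurations get the full 4P+6E, so the performance impact is likely to be noticeable.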


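All else being equal, adding two efficiency cores will not dramatically improve CPU performance over the M3's 4P+4E configuration. But Apple's efficiency cores should not be underestimated either, since their use of out-of-order execution makes them relatively powerful in their own right. Especially when fixed workloads can stay on the efficiency cores rather than being promoted to the performance cores, there is a lot of room for energy efficiency gains.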
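The core-count argument can be made concrete with a rough throughput model. The sketch below assumes, purely for illustration, that one efficiency core delivers a fixed fraction of a performance core's throughput and that work scales perfectly across cores. The `e_core_ratio` values are hypothetical guesses, not Apple figures.

```python
def relative_throughput(p_cores: int, e_cores: int, e_core_ratio: float) -> float:
    """Idealized multithreaded throughput, in units of one performance core.

    e_core_ratio is a hypothetical figure: the throughput of one efficiency
    core relative to one performance core. Perfect scaling is assumed.
    """
    return p_cores + e_cores * e_core_ratio


def uplift(e_core_ratio: float) -> float:
    """Fractional gain of a 4P+6E layout over a 4P+4E layout at equal clocks."""
    m3_like = relative_throughput(4, 4, e_core_ratio)
    m4_like = relative_throughput(4, 6, e_core_ratio)
    return m4_like / m3_like - 1.0


for ratio in (0.25, 0.33, 0.5):  # illustrative guesses only
    print(f"E-core ratio {ratio:.2f}: {uplift(ratio):.1%} more throughput")
```

Under these assumptions the two extra efficiency cores add somewhere between roughly 10% and 17% of multithreaded throughput, which fits the article's point that the gain from core count alone is modest.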


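Beyond that, Apple has not published any detailed performance charts for the new SoC or its CPU cores, so there is little hard data to discuss. The company does claim that the M4's CPU performance is 50% faster than the M2's. This is presumably for multithreaded workloads that can take advantage of the M4's higher CPU core count. Apple also claimed in its keynote that the M4 can deliver M2-level performance at half the power, which seems plausible given the combination of process node improvements, architectural improvements, and the increased CPU core count.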


As always, however, we will have to see what independent benchmarks show.


M4 GPU architecture: ray tracing and dynamic caching


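M4's new 10-core GPU builds on the next-generation graphics architecture of the M3 family of chips. It features dynamic caching, an Apple innovation that allocates local memory dynamically in hardware and in real time, significantly increasing the average utilization of the GPU. This significantly improves performance for the most demanding professional applications and games.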


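Compared with the CPU situation on the M4, the GPU situation is far more straightforward. Having only recently introduced a new GPU architecture in the M3 (Apple does not iterate on this core type as often as it does on CPUs), Apple has all but confirmed that the GPU in the M4 uses the same architecture as the one in the M3.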



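With 10 GPU cores, the high-level configuration is the same as on the M3. Whether that means the various blocks and caches are truly identical to the M3's remains to be seen, but Apple has made no claims about the M4's GPU performance that could in any way be read as it being superior to the M3's GPU. Indeed, the iPad's smaller form factor and more limited cooling mean the GPU will be thermally constrained under any sustained workload, especially compared with what the M3 can do in an actively cooled device such as the 14-inch MacBook Pro.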


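In any case, this means the M4 comes with all the major new architectural features introduced with the M3 GPU: ray tracing, mesh shading, and dynamic caching. Apple also emphasizes that hardware-accelerated ray tracing comes to iPad for the first time, enabling more realistic shadows and reflections in games and other graphically rich experiences. Hardware-accelerated mesh shading is also built into the GPU, delivering greater capability and efficiency in geometry processing and enabling more visually complex scenes in games and graphics-intensive apps. M4 brings a big boost to professional rendering performance in apps such as Octane, which is now four times faster than on M2.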


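We will not dwell on ray tracing here, but mesh shading is a significant next-generation approach to geometry processing. Dynamic caching, meanwhile, is Apple's term for the refined memory allocation technique on its M-series chips, which avoids over-allocating memory to the GPU from Apple's unified memory pool.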


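With these improvements to the CPU and GPU, M4 maintains the industry-leading performance per watt of Apple silicon. M4 can deliver the same performance as M2 using just half the power. Compared with the latest PC chip in a thin and light laptop, M4 can deliver the same performance using just a quarter of the power.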
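As a quick check on what these power claims imply, the following minimal sketch converts a "same performance at a fraction of the power" statement into a performance-per-watt multiple. The function name and parameters are illustrative, and the result only describes the single operating point Apple chose to quote, not peak performance.

```python
def perf_per_watt_gain(power_fraction: float, perf_ratio: float = 1.0) -> float:
    """Performance-per-watt multiple implied by a power claim.

    power_fraction: the new chip's power as a fraction of the reference chip's.
    perf_ratio: the new chip's performance relative to the reference (1.0 = same).
    """
    if power_fraction <= 0:
        raise ValueError("power_fraction must be positive")
    return perf_ratio / power_fraction


# Same performance as M2 at half the power.
print(perf_per_watt_gain(0.5))
# Same performance as the unnamed PC chip at a quarter of the power.
print(perf_per_watt_gain(0.25))
```

Read this way, the claims amount to a 2x efficiency advantage over M2 and a 4x advantage over the PC chip at that iso-performance point.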


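Beyond GPU rendering, the M4 also inherits the M3's updated media engine block, which, coming from the M2, is a relatively significant change for iPad use. Most notably, the M3/M4 media engine adds support for decoding AV1, the next-generation open video codec. While Apple is more than happy to pay royalties for HEVC/H.265 to ensure it is available within its ecosystem, the royalty-free AV1 codec is expected to see significant use in the coming years, and the iPad Pro is now better placed to use the latest codec (or at least does not have to decode AV1 inefficiently in software).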


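What is new on the display side of the M4, however, is the display engine. This block is responsible for compositing images and driving the displays attached to the device. Apple has never drawn much attention to it, but when the company does update it, the update usually brings some immediate feature improvements.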



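The key change here appears to be enabling Apple's new sandwiched "tandem" OLED panel configuration, which debuts in the iPad Pro. The iPad's Ultra Retina XDR display stacks two OLED panels directly on top of one another so that the display can cumulatively reach Apple's brightness target of 1600 nits, something a single OLED panel evidently cannot do. This in turn requires a display controller that knows how to drive the panels, not just running a set of mirrored displays but also accounting for the performance losses that come from one panel sitting beneath the other.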


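Although not directly relevant to the iPad Pro, it will be interesting to see whether Apple has used this opportunity to increase the total number of displays the M4 can drive, since vanilla M-series SoCs have generally been limited to two displays, much to the consternation of MacBook users. The fact that the M4 can drive the tandem OLED panels and an external 6K display is promising, but we will have to see how this translates to the Mac ecosystem when the M4 lands in a Mac.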


M4 NPU architecture: something new, something faster


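Arguably the biggest focus of Apple's M4 SoC is the company's NPU, also known as the Neural Engine. The company has been shipping a 16-core design since the M1 (and smaller designs on A-series chips before that), with each generation delivering a modest performance increase. With the M4 generation, however, Apple says it is delivering a much bigger leap.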



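Still a 16-core design, the M4 NPU is rated at 38 TOPS, just over twice the 18 TOPS of the Neural Engine in the M3. Coincidentally, that is only a few TOPS more than the Neural Engine in the A17. As a baseline claim, then, Apple is promoting the M4 NPU as far more powerful than the NPU in the M3, never mind the M2 that powered previous iPads, or, going back even further, 60 times faster than the A11's NPU.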


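Unfortunately, the devil is (once again) in the details, because Apple has not listed the all-important precision information: whether the figure is based on INT16, INT8, or INT4 precision. As the de facto precision for ML inference at present, INT8 is the most likely candidate, especially since that is what Apple quoted for the A17 last year. But freely mixing precisions, or even just not disclosing them, is a headache to say the least, and it makes like-for-like specification comparisons difficult.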
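To see why undisclosed precision matters, here is a small sketch that compares the headline TOPS ratio with a precision-normalized one. It rests on two loudly stated assumptions that Apple has not confirmed: that peak throughput scales inversely with operand width (so an INT8 figure is twice the 16-bit figure on the same hardware), and that the M4 number is INT8 while the M3 number is 16-bit.

```python
M4_TOPS = 38.0  # Apple's quoted figure; precision not disclosed
M3_TOPS = 18.0  # M3 figure cited in the article; precision not disclosed


def headline_ratio(new_tops: float, old_tops: float) -> float:
    """Ratio of the two quoted figures, ignoring precision entirely."""
    return new_tops / old_tops


def normalized_ratio(new_tops: float, old_tops: float,
                     new_bits: int, old_bits: int) -> float:
    """Ratio after converting the new figure to the old figure's precision.

    Assumption (hypothetical): throughput scales inversely with operand
    width, so TOPS at new_bits is rescaled by new_bits / old_bits.
    """
    new_equivalent = new_tops * new_bits / old_bits
    return new_equivalent / old_tops


print(f"headline:   {headline_ratio(M4_TOPS, M3_TOPS):.2f}x")
# Hypothetical case: M4 quoted at INT8, M3 quoted at a 16-bit format.
print(f"normalized: {normalized_ratio(M4_TOPS, M3_TOPS, 8, 16):.2f}x")
```

Under that hypothetical pairing, the apparent 2.1x jump shrinks to roughly 1.06x at matched precision, which is exactly the kind of ambiguity the missing precision data creates.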


Regardless, the M4 NPU is expected to deliver a significant improvement in AI performance, similar to what already happened with the A17, even if most of the gain comes from INT8 support rather than INT16/FP16 support. Since Apple was one of the first chip vendors to launch a consumer-grade SoC with what we now call an NPU, the company is not afraid to make a big deal of it, especially in comparison to what is happening in the PC market. And because Apple offers a complete hardware/software ecosystem, it has the advantage of being able to shape its software around its own NPU, rather than waiting for a killer app to be invented for it.


According to Apple's description, the M4 has an extremely fast Neural Engine, an IP block in the chip specifically designed to accelerate AI workloads. It is Apple's most powerful Neural Engine ever, capable of performing 38 trillion operations per second, 60 times faster than the first Neural Engine in the A11 Bionic. The Neural Engine, along with next-generation machine learning accelerators in the CPU, a high-performance GPU, and higher-bandwidth unified memory, makes the M4 an extremely powerful AI chip. With AI features in iPadOS, such as Live Captions for real-time audio captions and Visual Look Up for identifying objects in videos and photos, the new iPad Pro allows users to complete AI tasks quickly on the device.


iPad Pro with M4 makes it easy to separate a subject from its background in 4K video in Final Cut Pro with just one tap, and automatically create a musical score in StaffPad in real time just by listening to someone play the piano. Inference workloads can be completed efficiently and privately with minimal impact on application memory, application responsiveness, and battery life. The Neural Engine in M4 is Apple's most powerful neural engine yet, more powerful than any neural processing unit in any AI PC today.


M4 memory: using faster LPDDR5X


Last but not least, the memory capabilities of the M4 SoC have also been significantly improved. Given the memory bandwidth figures Apple quotes for the M4 (120GB/sec), all signs point to them finally adopting LPDDR5X in their new SoCs.


LPDDR5X is a mid-cycle update to the LPDDR5 standard and offers higher memory speeds than LPDDR5, which tops out at 6400 MT/sec. While LPDDR5X is currently capable of speeds up to 8533 MT/sec (and faster speeds are coming), the Apple M4's 120GB/sec figure puts its memory speed at around LPDDR5X-7700.
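The relationship between quoted bandwidth and memory speed can be checked with a short calculation. The sketch below assumes a 128-bit memory interface, which is an assumption carried over from earlier base M-series chips and is not stated in the article, and it treats 1 GB as 10^9 bytes.

```python
def transfer_rate_mts(bandwidth_gb_s: float, bus_width_bits: int) -> float:
    """Data rate in MT/s needed to reach a given bandwidth.

    bandwidth_gb_s uses decimal gigabytes (1 GB = 1e9 bytes).
    """
    bytes_per_transfer = bus_width_bits / 8
    return bandwidth_gb_s * 1e9 / bytes_per_transfer / 1e6


def bandwidth_gb_s(rate_mts: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s for a given data rate and bus width."""
    return rate_mts * 1e6 * (bus_width_bits / 8) / 1e9


BUS_WIDTH_BITS = 128  # assumed, not confirmed for the M4

print(transfer_rate_mts(120, BUS_WIDTH_BITS))   # rate implied by 120 GB/s
print(bandwidth_gb_s(6400, BUS_WIDTH_BITS))     # LPDDR5-6400 ceiling
print(bandwidth_gb_s(8533, BUS_WIDTH_BITS))     # LPDDR5X-8533 ceiling
```

On this assumed bus, a strict reading of 120GB/sec works out to about 7500 MT/sec, a little below the article's estimate but clearly beyond LPDDR5's 6400 MT/sec limit (about 102GB/sec). The gap may simply reflect rounding in Apple's quoted figure.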


Since the M4 will be available on the iPad first, we don't yet know its maximum memory capacity. The M3 can accommodate up to 24GB of memory, and while Apple is unlikely to step back on this front, there's no indication whether they'll be able to increase the memory to 32GB. Meanwhile, the iPad Pros will all come with 8GB or 16GB of RAM, depending on the specific model.


Original link

https://www.anandtech.com/show/21387/apple-announces-m4-soc-latest-and-greatest-starts-on-ipad-pro



*Disclaimer: This article is original by the author. The content of the article is the personal opinion of the author. The reprinting by Semiconductor Industry Watch is only to convey a different point of view. It does not mean that Semiconductor Industry Watch agrees or supports the view. If you have any objections, please contact Semiconductor Industry Watch.


