Will chips undergo major changes?
Source: Compiled from semiengineering by Semiconductor Industry Observer (ID: icbank).
The chip industry is moving toward domain-specific computing, while artificial intelligence (AI) is moving in the opposite direction, and this gap may force major changes in future chip and system architectures.
Behind this split is the difference in how long it takes to design hardware versus software. In the 18 months since ChatGPT launched globally, a large number of software startups have explored new architectures and techniques, and that trend is likely to continue given how rapidly the tasks being mapped onto hardware are changing. Producing a custom chip, by contrast, usually takes more than 18 months.
In a stable world, where software doesn’t change much over time, it is worthwhile to customize hardware to meet the exact needs of an application or workload. This is one of the main drivers behind RISC-V, where the processor ISA can be tailored to a given task. With AI’s many variations, however, hardware may be outdated by the time it reaches mass production. Unless the specification is constantly updated, hardware optimized for a single application is unlikely to reach the market quickly enough to be useful.
As a result, there is an increased risk that a domain-specific AI chip will fail on its first try. While this is being fixed, generative AI will continue to advance.
But that doesn’t spell the end of custom silicon. Data centers are deploying a growing number of processing architectures, each of which handles specific tasks better than a single general-purpose CPU. “As data center AI workloads proliferate, even the last bastion of vanilla computing power is crumbling as data center chips and systems are forced to adapt to the rapidly evolving landscape,” said Steve Roddy, chief marketing officer at Quadric.
But it does point to an architecture that balances ultra-fast, low-power silicon with more general-purpose chips or chiplets.
“In AI, there’s a huge push to make things as general and programmable as possible because no one knows when the next LLM thing is going to come along and completely change the way they do things,” said Elad Alon, CEO of Blue Cheetah. “The more you get stuck in a rut, the more likely you are to miss the boat. At the same time, it’s become clear that it’s almost impossible to meet the computational power, and therefore the power and energy requirements, required with a completely general system. There’s a huge push to customize hardware to be more efficient at specific things that are known today.”
The challenge is mapping software efficiently onto this heterogeneous array of processors, something the industry hasn’t fully gotten to grips with yet. The more processor architectures coexist, the harder the mapping problem becomes. “In a modern chip, you have a GPU, you have a neural processing unit, and you have the main processor cores,” said Frank Schirrmeister, vice president of solutions and business development at Arteris (now executive director of strategic programs and system solutions at Synopsys). “You have at least three compute options, and you have to decide where to put things and set up the appropriate abstraction layers. We used to call this hardware/software co-design. When you port an algorithm, or part of one, to an NPU or a GPU, you rejigger the software to move more of the execution to the more efficient implementation. There’s still a common component in the compute that supports the different elements.”
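To make that mapping problem concrete, here is a minimal sketch of the placement decision Schirrmeister describes, written in Python. The engine names, throughput numbers, launch overheads, and the cost model itself are purely hypothetical placeholders, not any real toolchain or vendor API.

```python
# A minimal sketch of the placement problem: deciding, per kernel, which of
# several compute engines should run it. Engine names, throughput figures,
# and launch overheads are hypothetical placeholders, not real hardware data.
from dataclasses import dataclass

@dataclass
class Kernel:
    name: str
    flops: float        # arithmetic work in the kernel
    bytes_moved: float  # data that must be transferred to/from the engine

ENGINES = {
    "cpu": {"flops_per_s": 2e11, "bytes_per_s": 5e10, "launch_s": 1e-6},
    "gpu": {"flops_per_s": 2e13, "bytes_per_s": 9e11, "launch_s": 1e-5},
    "npu": {"flops_per_s": 5e13, "bytes_per_s": 4e11, "launch_s": 2e-5},
}

def place(kernel: Kernel) -> str:
    """Pick the engine with the lowest estimated runtime for this kernel."""
    def est_time(spec):
        return (spec["launch_s"]
                + kernel.flops / spec["flops_per_s"]
                + kernel.bytes_moved / spec["bytes_per_s"])
    return min(ENGINES, key=lambda name: est_time(ENGINES[name]))

if __name__ == "__main__":
    graph = [
        Kernel("tokenizer",        flops=1e5,  bytes_moved=1e5),
        Kernel("embedding_lookup", flops=1e6,  bytes_moved=5e8),
        Kernel("attention_matmul", flops=5e11, bytes_moved=2e8),
    ]
    for k in graph:
        print(f"{k.name:17s} -> {place(k)}")
```

With these made-up numbers, the tiny control-heavy kernel lands on the CPU, the bandwidth-bound lookup on the GPU, and the compute-dense matmul on the NPU — the kind of split-by-strength decision the quote is describing.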
Chasing the leaders
The rise of AI owes much to the processing power of GPUs, and the functions required for graphics processing are very close to those required at the core of AI. In addition, software toolchains were created so that non-graphics functions could be mapped onto the architecture, making NVIDIA GPUs the easiest processors to target.
“When someone becomes a market leader, they may be the only player in the market, and everyone tries to react to it,” said Chris Mueth, business manager for new opportunities at Keysight. “But that doesn’t mean it’s the optimal architecture. We may not know that yet. GPUs are good for certain applications, like doing repetitive math operations, and they’re hard to beat. If you optimize your software to work with a GPU, it’s very fast.”
Being the leader in general-purpose accelerators comes with its own constraints. “If you’re building a general-purpose accelerator, you need to think about future-proofing,” said Russell Klein, director of advanced synthesis programs at Siemens EDA. “When NVIDIA sat down to build the TPU, they had to make sure that it would address the broadest possible market, which meant that anyone who came up with a new neural network needed to be able to put it on this accelerator and run it. If you’re building something for a specific application, you don’t have to think about future-proofing very much. I might want to build in a little flexibility so that I have the ability to solve problems. But if I’m fixed to one specific implementation that does one job really well, then in 18 months someone’s going to come up with a completely new algorithm. The good news is that I’ll be ahead of everyone else with my custom implementation, until they can catch up with their own. There’s only so much we can do with off-the-shelf hardware.”
But specificity can also be built in layers. “Part of the IP delivery is a hardware abstraction layer that’s exposed to software in a standardized way,” Schirrmeister said. “A graphics core is useless without the middleware. Application specificity moves up in the abstraction. If you look at CUDA, the compute capabilities of the NVIDIA core itself are fairly general. CUDA is the abstraction layer, and then on top of it there are libraries for all kinds of domains, such as biology. That’s great because the application specificity moves up to a higher level.”
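A small sketch of what “specificity moving up the stack” can look like in code: the hardware-facing layer exposes only a generic primitive (here, a matmul), and a domain-level function is built entirely on top of it. The class and function names below are illustrative stand-ins, not any vendor’s real API.

```python
# A sketch of specificity moving up the stack: the hardware-facing layer
# exposes only a generic primitive (matmul), and a domain library is built
# on top of it. Names are illustrative, not any vendor's real API.
import numpy as np

class GenericAccelerator:
    """Stands in for a general-purpose compute layer (a CUDA-like role)."""

    def matmul(self, a: np.ndarray, b: np.ndarray) -> np.ndarray:
        # On real hardware this call would dispatch to the device.
        return a @ b

def sequence_similarity(acc: GenericAccelerator, embeddings: np.ndarray) -> np.ndarray:
    """Domain-level operation (pairwise similarity) expressed purely through
    the generic matmul primitive, so the silicon underneath can stay general."""
    return acc.matmul(embeddings, embeddings.T)

if __name__ == "__main__":
    acc = GenericAccelerator()
    emb = np.random.randn(8, 64)
    print(sequence_similarity(acc, emb).shape)  # (8, 8)
```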
These abstraction layers have been very important in the past. “Arm consolidated the software ecosystem on top of the application processor,” said Sharad Chole, chief scientist and co-founder of Expedera. “After that, heterogeneous computing allowed everyone to build their own add-ons on top of that software stack. For example, Qualcomm’s stack is completely independent of Apple’s stack. If you extend it, there’s an interface that you can use to get better performance or a better power profile. And then there’s room for coprocessors. These coprocessors allow you to differentiate beyond what heterogeneous computing alone provides, because you can add or remove one, or build a newer coprocessor, without having to spin a new application processor, which is much more expensive.”
Economics are a big factor. “The proliferation of fully programmable devices that accept C++ or other high-level languages, as well as function-specific GPUs, GPNPUs, and DSPs, has reduced the need for specialized, fixed-function, and financially risky hardware acceleration blocks in new designs,” said Quadric’s Roddy.
It’s as much a technical question as a business one. “Someone might say, I’m going to do this very specific target application, in which case I know I’m going to do the following things in the AI or other stack, and then you just make them work,” said Blue Cheetah’s Alon. “If that market is big enough, then it can be an interesting option for a company. But for an AI accelerator or AI chip startup, it’s a much trickier bet. If there’s not enough of a market to justify the entire investment, then you have to anticipate the capabilities that are needed for a market that doesn’t exist yet. It’s really a mix of what business model and bet you’re taking, and therefore what technical strategy you can take to optimize it as much as possible.”
The case for dedicated hardware
Hardware implementations require choices. “If we could standardize neural networks and say this is all we’re going to do, you would still have to consider the number of parameters, the number of operations that are necessary, and the latency that’s required,” said Expedera’s Chole. “That’s never been the case, especially for AI. In the beginning, we started with 224 x 224 postage-stamp images, then moved to HD, and now we’re moving to 4K. It’s the same with LLMs. We started with 300-million-parameter models, like BERT, and now we’re moving toward billions and even trillions of parameters. Initially we started with just language translation models, like token prediction models. Now we have multimodal models that support language, vision, and audio at the same time. The workloads are constantly evolving, and that’s the game of catch-up that’s happening.”
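As a rough illustration of that resolution growth, the back-of-envelope arithmetic below shows how the work in a single convolution layer scales from 224 x 224 inputs to 4K. The layer shape (3 input channels, 64 output channels, 3x3 kernel) is an assumed, illustrative example, not a measurement of any particular network.

```python
# Back-of-envelope arithmetic for the input-resolution growth described above.
# The layer shape (3 input channels, 64 output channels, 3x3 kernel) is an
# illustrative assumption, not taken from any specific model.

def conv_macs(h, w, cin=3, cout=64, k=3):
    """Multiply-accumulates for one stride-1 convolution layer."""
    return h * w * cin * cout * k * k

for name, (h, w) in {"224 x 224": (224, 224),
                     "HD (1080p)": (1080, 1920),
                     "4K": (2160, 3840)}.items():
    print(f"{name:11s}: {conv_macs(h, w) / 1e9:5.1f} GMACs per layer")
# Prints roughly 0.1, 3.6, and 14.3 GMACs -- more than two orders of magnitude
# of growth from the same layer, before parameter counts grow at all.
```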
There are many aspects of existing architectures that are questionable. “A key part of designing a good system is to find significant bottlenecks in system performance and find ways to accelerate them,” said Dave Fick, CEO and co-founder of Mythic. “AI is an exciting and far-reaching technology. However, it requires performance levels of trillions of operations per second and memory bandwidths that standard cache and DRAM architectures simply cannot support. This combination of practicality and challenge makes AI a prime candidate for a dedicated hardware unit.”
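A quick, assumption-laden estimate makes the bandwidth point concrete: for a dense model, generating each token requires streaming essentially all of the weights from memory, so the token rate is bounded by bandwidth divided by model size. The model size and bandwidth figures below are illustrative round numbers, not benchmarks.

```python
# Rough estimate of why standard DRAM becomes the bottleneck for LLM decode:
# token rate ~= memory bandwidth / total weight bytes for a dense model.
# All figures are assumed, illustrative round numbers (70B params, FP16).

def tokens_per_second(params, bytes_per_param, bandwidth_bytes_per_s):
    weight_bytes = params * bytes_per_param
    return bandwidth_bytes_per_s / weight_bytes

PARAMS = 70e9  # assumed dense 70B-parameter model
for memory, bw in {"commodity DDR (~80 GB/s)": 80e9,
                   "HBM stack (~3 TB/s)": 3e12}.items():
    print(f"{memory:25s}: ~{tokens_per_second(PARAMS, 2, bw):5.1f} tokens/s")
```

On these numbers, commodity DRAM sustains well under one token per second, while high-bandwidth memory reaches tens of tokens per second, which is why cache-and-DRAM architectures alone cannot keep up.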
There just aren’t enough general-purpose devices to meet demand, which may be a factor that forces the industry to start adopting more efficient hardware solutions. “The field of generative AI is moving very fast,” said Chole. “There’s nothing that can meet the hardware requirements in terms of cost and power. Nothing. Even GPUs are not shipping in sufficient quantities. There are orders, but not enough shipments. That’s the problem that everyone sees. There’s not enough compute power to really support the workloads of generative AI.”
Chiplets may help alleviate this problem. “The coming tsunami of chiplets will accelerate this shift in the data center,” Roddy said. “The ability to mix and match fully programmable CPUs, GPUs, GPNPUs (general purpose programmable NPUs), and other processing engines to accomplish specific tasks will impact the data center first as chiplet packages replace monolithic integrated circuits, then slowly radiate to higher-volume, more cost-sensitive markets as chiplet packaging costs inevitably decrease with increased production volumes.”
Multiple markets, multiple trade-offs
While most of the attention has been on the massive data centers that train new models, the ultimate gains will accrue to the devices that use those models for inference. Those devices can’t afford the huge power budgets used for training. “The hardware for training AI is somewhat standardized,” said Marc Swinnen, director of product marketing at Ansys. “You buy an NVIDIA chip, and that’s how you train AI. But once you’ve built the model, how do you execute it in the end application, perhaps at the edge? That’s typically a chip tailored to a specific implementation of that AI algorithm. The only way to get a high-speed, low-power AI model is to build a custom chip for it. AI is going to be a huge driver for custom hardware that executes these models.”
They have a host of similar decisions to make. “Not every AI accelerator is created equal,” said Mythic’s Fick. “There are a lot of great ideas about how to address the memory and performance challenges that AI presents. In particular, there are new data types that go all the way down to 4-bit floating point or even 1-bit precision. Analog compute can be used to get extremely high memory bandwidth, which improves performance and energy efficiency. Others are looking at stripping neural networks down to the most critical bits to save memory and compute. All of these techniques will produce hardware that is strong in some areas and weak in others. This means greater hardware and software co-optimization, and the need to build an ecosystem with a variety of AI processing options.”
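As a small illustration of the low-precision direction Fick mentions, here is a minimal sketch of symmetric 4-bit weight quantization using NumPy. Real accelerators typically use per-channel or per-group scales and packed int4 storage; this only shows the core precision-versus-memory trade-off.

```python
# A minimal sketch of symmetric 4-bit weight quantization with NumPy.
# Real accelerators use per-channel/per-group scales and packed int4 storage;
# this shows only the core precision-versus-memory trade-off.
import numpy as np

def quantize_int4(weights: np.ndarray):
    """Map float weights to integers in [-7, 7] plus one float scale."""
    scale = np.abs(weights).max() / 7.0
    q = np.clip(np.round(weights / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    w = np.random.randn(256, 256).astype(np.float32)
    q, s = quantize_int4(w)
    err = np.abs(w - dequantize(q, s)).mean()
    print(f"mean absolute error after 4-bit round trip: {err:.4f}")
    # Storage per weight drops from 32 bits to 4, cutting memory traffic 8x
    # at the cost of a small reconstruction error.
```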
This is where the interests of AI and RISC-V converge. “Software tasks like LLMs will be dominant enough to drive new hardware architectures, but they will not stop differentiation completely, at least not in the short term,” said Dieter Therssen, CEO of Sigasi. “Even customization of RISC-V is based on the need to do some CNN or LLM processing. A key factor here is how AI is deployed. Currently, there are too many ways to do it, so convergence remains out of reach.”
Conclusion
AI is new and evolving so fast that no one has a definitive answer. What is the best architecture for existing applications? Will future applications look similar enough that existing architectures will simply scale? This seems like a very naive prediction, but it may be the best option for many companies today.
The GPU and the software abstractions built on top of it have made the rapid rise of AI possible. It provides an adequate framework for the expansion we have seen, but that does not mean it is the most efficient platform. Model development has been forced to some extent in the direction of existing hardware support, but as more architectures become available, AI and model development may diverge based on available hardware resources and their demand for power. Power is likely to become the dominant factor between the two, as current predictions are that AI will soon consume a large portion of the world's power generation capacity. This situation cannot continue.
Reference Links
https://semiengineering.com/when-to-expect-domain-specific-ai/