IC Design Experience of Microprocessor Master

Publisher: 技术掌门 · Last updated: 2012-02-12

Even across the entire IC industry, it is rare to find someone who loves talking about technology as much as Dr. Chris Rowen, or who can explain it so simply yet so profoundly. He has lived through nearly all the ups and downs of the microprocessor, past and present: he was one of the pioneers of the RISC architecture, took part in the great growth years at Intel and MIPS, and now, armed with a new SoC architecture concept, he is quietly reshaping our digital era.

1. I read your book "Complex SoC Design" and really liked Chapter 8. (Chris Rowen: "The Future of SoC Design," haha.) Yes, you still remember it. It is fascinating because all of your predictions have come true. So my question is: how do you see SoC design, and the technical trends around it, six years from now?

Chris Rowen: Wow, that's a good question! I think the general direction of this market is fairly clear. If you look at the basic trends at the market level and at the technology level, you can see where they overlap. On the technology side, Moore's Law remains an economic driving force. The really interesting part of Moore's Law is that density keeps increasing: every two and a half to three years the density of silicon doubles, which means the cost per function is nearly halved, allowing the digital portion of RF products to become more and more integrated. It also means that whole systems keep shrinking. So whether it is a computer or a consumer electronics device, every system is headed toward integration on a single chip. That is where it gets interesting. In the past, you could build generic memory, generic processors, generic RF, and so on, and then assemble them into a very powerful, specific system. Today things are turned upside down: you would rather combine a collection of different functions on one chip. In that sense you are still building a specific chip, but the challenge has grown, because the chip itself must focus ever more tightly on a specific application, while everything else, such as the application processor and the internal buses, must also become smaller, stronger, and faster!

Another paradox is that Moore's Law no longer delivers much power improvement at the transistor level. In the past, as things shrank, power naturally dropped, so engineers hardly needed to think about chip architecture. Now, if engineers want to optimize power consumption, they must first optimize the architecture. The engineer has to ask: how can I complete this computation more efficiently? For example, by using fewer gates or fewer compute cycles, or by shutting down a subsystem entirely when its task is not running. In short, power management has had to become intelligent.
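The architectural levers Rowen describes can be put into rough numbers. Below is a toy energy model (all figures are illustrative assumptions, not real silicon data) comparing a general-purpose design that grinds through many cycles against a specialized unit that finishes quickly and is clock-gated the rest of the time:

```python
def energy_nj(active_cycles, idle_cycles, dynamic_nj_per_cycle=2.0,
              idle_nj_per_cycle=0.5, clock_gated=False):
    """Energy in nanojoules for one subsystem over a workload (toy model)."""
    active = active_cycles * dynamic_nj_per_cycle
    # With clock gating, the idle clock tree stops toggling and idle cost vanishes.
    idle = 0.0 if clock_gated else idle_cycles * idle_nj_per_cycle
    return active + idle

# Same task, two designs: a general core needing 10x the cycles, versus a
# specialized unit that finishes fast and is gated off for the remainder.
general = energy_nj(active_cycles=10_000, idle_cycles=0)
special = energy_nj(active_cycles=1_000, idle_cycles=9_000, clock_gated=True)
print(general, special)  # 20000.0 2000.0
```

The point of the sketch is that both levers matter: fewer cycles of work and aggressive shutdown of idle subsystems each cut energy, and together they compound.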

So, for example, if you want to build a mobile phone, you have to pay attention to the different usage scenarios — listening to music, watching YouTube videos, sending text messages, surfing the Internet, talking on the phone — which are completely different workloads. You have to shut off every subsystem that is not in use: be more careful, and more proactive. So for a chip architect or a system designer, this is the best of times, because there is so much to do. But for a transistor guy, it is truly the worst of times! Everything has moved up to the system or application level. That is the big thing happening on the technology side.

2. What will happen to the market in the next few years?

Chris Rowen: Speaking of the market, I think the biggest trend is that everything has gone mobile, because people's lifestyles have completely changed. When you can carry so many devices with you, you want to be connected to the Internet constantly. The impact shows up not only in the devices themselves but also in the wireless infrastructure and in cloud computing, and the economic opportunity is very, very profound. At the level of this device, for example (Chris picked up his iPhone to demonstrate), wireless connection bandwidth will grow by at least 30 times, because a rich enough entertainment experience may need tens or even hundreds of megabits. In every region of the world there are more and more high-end users; China is a vivid example. And not just China: in India, South America, Africa, and the Caribbean, everyone wants to be connected all the time.

So you can project people's demand quite well. With roughly a 10-fold increase in the broadband population and each user demanding 30 times the bandwidth, you get a 300-fold aggregate bandwidth requirement, and every layer of the system has to meet it. For wireless infrastructure manufacturers — Huawei, for example — the opportunity is huge. But manufacturers are not going to collect 300 times the revenue. They may earn more, but nowhere near 300 times more. So they must deliver a dramatic increase in bandwidth while significantly cutting capital and operating costs.
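The back-of-the-envelope math above is worth making explicit, since it drives the whole infrastructure argument (the 10x and 30x factors are the interview's own figures):

```python
# Aggregate bandwidth demand compounds multiplicatively:
# more users times more bandwidth per user.
user_growth = 10       # ~10x growth in the broadband population (interview's figure)
per_user_demand = 30   # ~30x bandwidth demand per user (interview's figure)
aggregate = user_growth * per_user_demand
print(aggregate)  # 300 -> every layer of the network must scale ~300x
```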

3. So what will happen next in SoC design?

Chris Rowen: You can look at wireless base stations as an example. Traditionally, they are expensive. You can find some general-purpose chips, general-purpose DSPs, and general-purpose FPGAs. But today, in order to meet the bandwidth requirements, you need more highly customized SoCs, and the requirements for chip platforms and software are also rising rapidly. So this will lead to higher integration, more DSPs on each chip, and more software programs embedded on each DSP, and even an explosion of software content.

Interestingly, the power consumption of every part of the network infrastructure is huge. So even on green, energy-saving grounds alone, it is extremely important to shrink it down to a more tightly integrated system. The base station gets significantly smaller, which means the entire base station can become a small box at the top of the tower — something far simpler to install than the traditional equipment.

Of course, at the system level, once you reduce cost you naturally reduce power consumption as well, so there is a very positive relationship between the two. The key is silicon integration. This is also why Tensilica has grown so quickly into one of the world's leading DSP core suppliers.

You can even see this change reflected in cloud computing, because now you need 300 times the bandwidth, which puts higher demands on video services, video compression, Internet database search, social networking, and so on. And all of these things are really complex applications.

But the interesting thing is that they are all parallel applications, and that is good news. One thing that happened in the computer industry is that the speed of a single microprocessor became hard to increase. Through the 1990s, Intel rode dramatic, exponential gains in single-processor performance, but they soon found that when processor frequencies reached about 3.5 to 4 GHz, power density hit a bottleneck. So they turned to multi-core technology.

Fortunately, most of what people want to do can be processed in parallel. When you do an Internet database search, you really can set up multiple cores, multiple chips, even multiple systems, because a query is usually fanned out to multiple locations. So in Internet cloud computing, the opportunities for using multiple cores are extremely broad.
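The fan-out pattern Rowen describes can be sketched in a few lines: one query is dispatched to several shards (standing in for cores, chips, or whole systems) in parallel and the partial results are merged. The shard data and function names here are invented for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

SHARDS = [  # each shard holds one slice of the index (toy data)
    {"soc": 3, "dsp": 1},
    {"soc": 2, "fpga": 5},
    {"dsp": 4, "risc": 2},
]

def search_shard(shard, term):
    """Look up the hit count for a term in one shard's slice."""
    return shard.get(term, 0)

def parallel_search(term):
    """Fan the query out to all shards concurrently, then merge the hits."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        hits = pool.map(lambda s: search_shard(s, term), SHARDS)
        return sum(hits)

print(parallel_search("dsp"))  # 5 (merged from shards holding 1 and 4 hits)
```

Each shard's lookup is independent, which is exactly why this workload maps so naturally onto multiple cores and machines.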

And there is a real question: how do you get power consumption low enough per unit of performance, per MIPS? Put another way, what connects designing the mobile device with the longest battery life to designing the most scalable server? It is all about power consumption, not peak performance.

4. So how does Tensilica overcome the power consumption challenge? How does it differ from its competitors?

Chris Rowen: Let me give you an example. Tensilica believes in optimizing processors for specific tasks: optimize the pipeline, optimize the interfaces, optimize the design hierarchy, and then put multiple cores together to build a multi-core system. This ability to optimize has a huge impact. In this afternoon's session I will talk about a specialized processor, a Turbo decoder. Turbo is a special algorithm that can extract useful information from a very noisy signal. In a single cycle, this decoder does the work of roughly 30,000 — yes, 30,000 — RISC instructions. A general-purpose processor retires about one instruction per cycle, while this specialized processor does the equivalent of 30,000. Of course, that is an extreme example, but it shows that when you know exactly what your problem is, you can do incredible things: massively parallel, and therefore incredibly efficient.
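The mechanism behind that instruction-count collapse can be illustrated with a tiny example of my own construction (not Tensilica's actual instruction-extension code): a fused 4-wide multiply-accumulate counts as one "custom op", while the scalar equivalent pays one op per multiply and one per add:

```python
def scalar_mac(acc, xs, ws):
    """General-purpose style: one op per multiply, one op per add."""
    ops = 0
    for x, w in zip(xs, ws):
        acc += x * w
        ops += 2  # one multiply + one add
    return acc, ops

def fused_mac4(acc, xs, ws):
    """Specialized style: a single 4-wide MAC 'instruction' does it all."""
    assert len(xs) == len(ws) == 4
    return acc + sum(x * w for x, w in zip(xs, ws)), 1  # one custom op

xs, ws = [1, 2, 3, 4], [5, 6, 7, 8]
print(scalar_mac(0, xs, ws))  # (70, 8): same result, 8 general-purpose ops
print(fused_mac4(0, xs, ws))  # (70, 1): same result, 1 specialized op
```

Scale the width of the fused operation up and pipeline several of them, and ratios in the thousands, like the Turbo decoder's, become plausible.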

The same principles can be applied at all levels, to various other categories of dedicated DSPs, wireless receivers, general-purpose DSPs for baseband and audio, and to customers who want to perform video processing or other graphics compression, security operations, network protocol processing, and deeply embedded control (Deeply Embedded Control) widely used in RF.

Tensilica is particularly focused on capabilities that can be specifically optimized and that really facilitate the use of multiple cores. And because of that, we are differentiated from the traditional CPU old guys. Like Intel, ARM, MIPS, or whoever. They all face the same physical problem, that Moore's Law gave them more transistors, but it didn't give them better power control, right?

They rarely think about parallelism. We, on the contrary, work very hard at the application level to find it. In cloud computing you really can split tasks into many subtasks, but when I play a game here (Chris picked up his iPhone again to demonstrate), I am genuinely limited — one finger can only do one thing at a time. So at the application-processor level you really can't get much benefit. MIPS, ARM, and even Intel all face the problem that current silicon technology does not let them multitask those workloads effectively. And that is what we are good at.

We see this market growing rapidly — our shipments grew by about 70% last year. We are trying to cover the entire DPP (Data-Plane Processor) space, including DSP, audio and video, security, and deeply embedded control, which sits well outside the scope of application processors. So we often find ourselves on the same chip as MIPS, ARM, or Intel. You know, we are really the factory workers of the chip (Chris suddenly laughed)! Because there are so many different processors and so many different tasks in the data plane, there are many opportunities — many sockets — for those small, efficient processors.

This complementary relationship with the application processor even allows the application processor to be shut down completely while real-time tasks such as signal processing are running. For multimedia, the application processor can certainly do the work, but an optimized dedicated audio DSP gives us 4 to 5 times the efficiency: smaller area, yet higher throughput per unit time. And there are plenty of audio and video processors to choose from. So at almost any point, a system or SoC designer can look at the application scenario and decide which tasks to offload to which processor.

That's why I think we have been so successful in audio. When you are designing a cell phone, or an e-reader, or a set-top box, or a digital TV, or a digital camera, you say: here is a scenario where I need to do a lot of audio work. So that offloading is designed into the basic architecture from the start.

Moreover, we can automatically generate both the hardware and the software for these application-specific processors, including very comprehensive software libraries for audio and baseband. So whether they are veterans or novices, customers can find all the hardware and software they need in our catalog — integrated audio, integrated baseband, or various other functions — to get to market as quickly as possible.

5. So what are the specific applications of Tensilica?

Chris Rowen: This afternoon I will talk about mobile phones. It is a huge market, one that can absorb the bandwidth demand I mentioned earlier, and it is right now upgrading from 3G to 4G, so everyone is focused on LTE — not only because LTE looks like the eventual winning standard, but also because it is very similar to WiMAX. We have been able to provide reference designs that help customers build their own customized multi-core LTE phones and seize the market. That is just one example of our entry into the field.

We are also working on a very similar digital TV demodulator, because people want to design a universal digital TV receiver that serves both mobile applications and the living room. The big problem is that there are many different video standards and formats around the world, and everyone really wants one video chip that can handle everything. We are going to design one. The principle is the same: take some DSPs and special-purpose cores, optimize the most computation-intensive tasks, and exploit our most important capability — generating processors with very low power consumption, together with software tools that make them as easy to program as the most established general-purpose DSPs in the world. Customers told us just last night that the main reason DSPs are so popular is that they are programmable — TI's DSPs, for example. We keep working to make the compiler more powerful and the programming model simpler, so programmers have less to worry about. We have also improved visibility into the processor's pipeline design, and under this architecture it is very hard to generate incorrect code.

So we have a very efficient processor. But "efficiency" is a slippery word. Traditionally it means minimum gate count, minimum power consumption, blah blah. But efficiency is also time to market: how many engineers does it take to deploy the system? What is the cost per line of code? What is the revenue per engineer-hour? These are important measures of efficiency alongside silicon efficiency, and I think we have driven both sides very well. The kind of architecture we just discussed is also particularly suited to high-volume markets — mobile devices, living-room devices, digital cameras — and these are areas where we do very well: three of the top four manufacturers, and six of the top ten, in these areas are our customers.

We have strong knowledge in DPP, but the same impact is starting to happen in cloud computing. Of course, cloud computing is still slow to change, partly because it is not so sensitive to power consumption, but I think it will still have an impact overall.

6. Will you use this architecture in many other fields, such as digital television and wired communications?

Chris Rowen: Absolutely. What matters is having an architecture that can optimize processors for different applications. We have also found that even across new markets, many of the requirements are similar. So the same HiFi tools and the same audio DSP can be deployed in the world's best smartphones and also in the best digital TVs and Blu-ray disc players — because it is very small and very fast, and those requirements are the same everywhere.

Likewise, if you look inside the Atlas LTE platform, its main building block, the BBE16, is probably the fastest DSP core in the world — and it is also used in the digital TV demodulation subsystem, again because it is fast, easy to program, and power-efficient. So you can see common needs between the phone and the living room, across both media processors and baseband processors.

7. I saw you said that chip integration will focus on RF, storage and digital circuits. Do you think it is possible to merge the three into one?

Chris Rowen: Yeah. From a semiconductor process technology perspective, I think some things will happen at the transistor and device-optimization level, so in some cases you can make trade-offs. In particular, we are working with many customers to simplify RF circuits: with as much of the work done in digital processors as possible, you can partially dissolve the boundary between RF and digital. Because digital costs fall along a much steeper curve than RF costs, there is strong motivation to do this, and we will rely more and more on effective solutions in digital.

The same thing happens with memory. People occasionally combine them on one die, but it is not a simple combination — memory is fabricated on a process optimized quite differently from logic. So I believe multi-chip packaging will become more and more important, especially stacking dies on top of one another: a memory die on top of a digital die, with an RF die on top of that. That may well be the most cost-effective approach. There may also be compromise process technologies that put all three on a single silicon die. It depends on your application — how much memory you need, how much RF.

In the end, though, I think we will stick with three different process technologies and rely on packaging to integrate them. That does not mean that merely squeezing the three together gives you a system — there are still physical requirements, such as the battery. But in general, the physical size will keep shrinking.

There is one huge constraint, though: people's fingers cannot be made smaller, and neither can their eyes. So there are practical limits on how small a device can usefully be. How small we go at the component level ultimately corresponds to how small a screen and buttons people will accept. So in the end this is really more about cost.

8. In your book, you also predicted the future of FPGA. A few days ago, Xilinx announced that it would embed ARM's Cortex A9 core. Do you think this is a new trend? Will it compete with Tensilica's DPU?

Chris Rowen: Actually... not much. I mean, this work of embedding processors into FPGAs has been going on for about 10 years. When Altera announced their ARM-based embedded solution — let me think — that was about 8 years ago, right? (Larry: That's right!)

So it is just like any system that is looking for one chip, or three chips, to put together — occasionally you happen to end up with one digital chip that has everything. That said, the FPGA guys face a fundamental challenge: FPGAs are very versatile and can do many things, but that versatility comes at a price — focused on any one task, an FPGA is not very efficient. So if you want to use a processor really effectively, I suspect you would rather embed something more hardened alongside it than an FPGA.

I think this is a very natural step. Xilinx did the same thing before with PowerPC, right? It changed nothing in the original architecture, nor did it achieve any logical merger between the CPU and the FPGA fabric — partly because they had no software tool model for such a merger.

Of course, FPGAs are easy to configure and cheap to start with, so they occupy a part of the market, especially low-volume, low-development-cost designs. That is why we see many FPGA designs in the market — but the total volume those designs ship is very small. It is really a niche. To put it bluntly, even though many engineers use FPGAs, almost all of those designs are low-volume.

So what I mean is: FPGAs matter, but they are not Tensilica's focus. We focus on high volume and on helping customers who are trying to save every bit of silicon in their designs. The two are some distance apart, though they occasionally overlap — base stations, for example. In the past, many base stations carried heavy FPGA content, much of it Altera's. Gradually, driven by capacity, cost, and power requirements, we have seen more and more of those designs shift from FPGAs to more highly integrated chip solutions.

9. I saw a talk you gave in IEEE Design & Test. You said that if we want to enter the field of massive parallelism in embedded system design, there are some problems that must be solved for configurable multi-core processor SoCs. A few years ago, you also mentioned that Intel's biggest problem is how to configure multi-core processors for general computing applications. Do you still think that multi-core processors are in trouble?

Chris Rowen: These are... actually two separate things. There is indeed a major challenge for multi-core: finding enough threads to run. But it is not Intel's problem alone; it is a question of how applications are structured and invoked on today's devices. Even when I open my own laptop and look at how many threads are actually ready to run, there are very few. The way operating systems, user interfaces, and applications are typically written does not maximize the thread count at all.

So at the basic architectural level, what you can do is expose more runnable threads and exploit the parallelism fully. Of course, there are many structural limits at the application level. It is easy to get a quad-core, eight-core, or sixteen-core processor now, but on the PC side, compared with the server side, there are relatively few places to find those threads. So one major trend is the gradual restructuring of operating systems and applications.

The other, equally important trend is deciding which tasks can be moved into the data plane. Think about what typically goes to a data-plane processor: communication subsystems such as the wireless channel; storage systems and data distribution; security and redundancy; or specialized network processors for packet streams, which may carry video or audio. These workloads are inherently much more parallel.

So I see two kinds of parallel reorganization here. One is the hunt for more threaded applications everywhere. The other is to keep as much offloadable parallelism as possible in the overall system and push it down into the data plane. I actually think parallelism is easier to extract at the data plane, so the number of cores used effectively there is far greater than the number used at the application layer alone. This is why we think we are on the right track: focusing on the data plane lets us scale core counts much faster than our brothers who focus only on the application layer.
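Amdahl's law makes this point quantitative (my framing, not Rowen's): the payoff from adding cores depends on the fraction of the workload that parallelizes, which is why data-plane tasks scale to many more cores than application-level code. The fractions below are illustrative assumptions:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: overall speedup given the parallelizable fraction."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# Application layer: perhaps half the work parallelizes -> 4 cores gain little.
app = amdahl_speedup(0.5, 4)
# Data plane: ~99% parallel (e.g. per-packet work) -> 16 cores really pay off.
data_plane = amdahl_speedup(0.99, 16)
print(round(app, 2), round(data_plane, 2))  # 1.6 13.91
```

With only half the work parallel, even infinite cores cap the speedup at 2x, while a 99%-parallel data-plane task keeps scaling — exactly the asymmetry between application-layer and data-plane multi-core described above.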

10. So it’s not a problem on mobile phones?

Chris Rowen: Well, you could say that. It has become quite easy — take LTE baseband as an example. Our Atlas platform can use seven or eight cores, depending on how you deploy it. DoCoMo and its partners NEC, Fujitsu, and Panasonic have announced and described their LTE baseband architecture in detail: the first generation uses 8 to 10 cores. Another partner, Blue Wonder Communication, has also launched an 8-to-10-core LTE baseband. So there are now three different LTE basebands, and all three use about 8 cores. There is plenty of room for parallel solutions at this level.

Looking at the next generation of LTE, there is roughly another factor of six in performance to find. Some of it will come from making a single core faster, but most of it is tied to multi-core. So it is easy to find cases where 20 or more cores are used effectively for a single function such as baseband. Compare that with the guys working on application processors: if things are going well they play with two cores, and if they are feeling very good, four. Multi-core simply has completely different opportunities in the data plane than at the application layer.

11. Last question: You helped to establish the RISC architecture at Stanford, and later you were also the co-founder of MIPS. So, how do you see the future of the RISC architecture? Will it still be a war between ARM and MIPS, or will there be some new big events?

Chris Rowen: Well... in principle, the terms of the ideal-architecture debate have completely changed. The old CISC-versus-RISC argument was really just General-Purpose Architecture A versus General-Purpose Architecture B. RISC won a round or two because, with the transistor budgets of the day, its simplicity was a real advantage. But Moore's Law then delivered so many transistors that you could spend as many as you liked on a simple decoder or pipeline, and no one cared. A RISC decoder might take 10,000 gates and a CISC decoder 50,000 — it is all about the same now.

But I think there is a revolution going on that is more profound than general-purpose architectures fighting each other. What if we compare general-purpose architectures against a whole family of special-purpose architectures? Almost any time a product is designed around a specific need, the special-purpose architecture wins. RISC beat CISC for a while by being perhaps 2x more efficient; architectures tailored to a specific application are 5 to 10 times more efficient than any general-purpose architecture.

So the world can no longer be divided simply into my general-purpose architecture versus yours. Of course, for very diffuse, generic workloads — a laptop, say, where one moment you are watching a video and the next you are running Word, playing a game, or editing an Excel spreadsheet — you still need a general-purpose processor: a well-balanced all-rounder that cannot be too specialized.

But at the end of the day, you have to deal with a world where there are all sorts of different tasks, and each task is unique, and more importantly, as you put more and more systems on a chip because of Moore's Law, you find that there are enough processors for various specific application subsystems.

So for me, the future of computing is not about creating yet another general-purpose architecture but about assembling collections of special-purpose architectures: an audio subsystem, a video subsystem, a baseband subsystem, a storage subsystem, and, yes, an application-processor subsystem. Only one of them needs a general-purpose architecture; all the others will be specialized. In principle, Moore's Law brings multi-core, and multi-core brings special-purpose solutions. Heterogeneous multi-core is the new architecture, and I think it will become mainstream. Companies like Intel, ARM, and MIPS will certainly still have a large market, but only in application processors. In essence, general purpose will eventually give way to specific purpose.
