New challenges facing chips

Latest update time: 2018-03-01

Source: This article was translated by Semiconductor Industry Observer from Semiengineering, with thanks.

As chips enter markets such as automotive, cloud computing and the industrial Internet of Things, reliability has become a major concern for developers. It is also proving harder and harder to keep a chip performing its intended functions over time.

In the past, chip reliability was generally treated as a foundry problem. Chips designed for computers and mobile phones were expected to run at peak performance for an average of two to four years. After that, functionality began to decline and users upgraded to the next version of the product, which offered more functions, better performance and longer standby time.

But the question is no longer that simple as chips move into new or less mature electronics markets such as automotive, machine learning, the Internet of Things (IoT) and Industrial Internet of Things (IIoT), virtual and augmented reality, home automation, the cloud, cryptocurrency mining and more.

Each end market has its own needs and characteristics that affect how, and under which conditions, a chip is used, which in turn has a significant impact on aging, safety and other issues. Consider the following:

Reliability is no longer measured in years. The use cases are changing dramatically. Today’s cars are idle 90% to 95% of the time, but self-driving cars may only be idle 5% to 10% of the time. This affects the architecture of electronics and the potential business models for developing technology.
As edge electronics become more complex, people’s definitions of functionality and “good enough” change. In the past, if a camera on a drone or robot was damaged or dirty, it would usually be replaced. But as the electronics in edge devices become more complex, it is possible to compensate for a broken camera while keeping the system functional enough. On the other hand, because system tolerances are tighter, what is acceptable in a less complex system may not be acceptable in a complex one.
More factors affect aging and quality modeling than in the past, and some of them may not be apparent while the chip is being developed. A known good chip may behave differently when packaged with other chips than when it is mounted on a PCB.

Across the electronics world, use cases are changing. Even within data centers, which have historically been very conservative when it comes to adopting new technologies and approaches, this is happening.

“Aging is a function of clock speed and power consumption,” said Simon Segars, Arm CEO. “In the past, servers were used only occasionally and sat idle most of the time. But when you move to the cloud, the design criteria need to be different, because aging depends on how long the chip is in use. This raises a lot of questions about how to design for longevity.”

At the start of the millennium, average server utilization was about 5% to 15%, continuing a trend from the 1990s, when IT managers were reluctant to run more than one or two applications on a single commodity server for fear of failures. Then two things changed. First, energy costs began to rise. Second, and perhaps more importantly, companies reorganized so that IT departments, rather than facilities departments, became responsible for their own energy costs. Both factors led to a surge in sales of virtualization software to raise server utilization, which meant fewer racks of servers to power and cool.
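
The consolidation arithmetic behind that shift can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the function name `servers_needed` and the fleet size of 100 servers are assumptions made for the example, and it treats all servers as identical and all workloads as freely movable, which real data centers are not.

```python
def servers_needed(current_servers, util_before_pct, util_after_pct):
    """Servers required to carry the same total work at a higher utilization.

    Utilization is given in whole percent so the arithmetic stays in integers.
    Assumes identical servers and perfectly movable workloads (a simplification).
    """
    if not 0 < util_before_pct <= util_after_pct <= 100:
        raise ValueError("expected 0 < before <= after <= 100")
    total_work = current_servers * util_before_pct
    # Integer ceiling division: round up to a whole server.
    return (total_work + util_after_pct - 1) // util_after_pct


if __name__ == "__main__":
    # Hypothetical fleet: 100 servers at 10% utilization, consolidated to 50%.
    print(servers_needed(100, 10, 50))
```

Under these assumptions, the work of 100 servers running at 10% fits on 20 servers running at 50%. The same calculation also shows why each remaining chip is stressed far more of the time.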

The cloud takes this operational efficiency to a higher level. Cloud operations are designed to maximize utilization by balancing computing jobs across the data center. This significantly raises utilization across every server in the data center, not just within a single rack, and servers can be shut down quickly when they are not needed. This approach saves energy but has a significant impact on the degradation and aging of electronic circuits.

“Chips are aging faster and failing,” said Magdy Abadir, vice president of marketing at Helic. “They might be missing clocks, showing extra jitter or suffering dielectric breakdown. At any time, something you’re worried about could happen. Many aging models were developed in the days when electronics were used only occasionally, but now chips are always running, and the blocks inside the chip are heating up, so aging is accelerated, and aged chips do all kinds of weird things. Many companies haven’t revised their aging models yet. They’re assuming these devices will last three or four years, but they may fail sooner. Given how small the design margins are to begin with, aging can throw them off.”
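
One common way to put numbers on this is the Arrhenius model, which reliability engineers use to relate temperature to the rate of thermally activated wear-out. The sketch below is not taken from the article or from any vendor's aging model. The 0.7 eV activation energy, the temperatures and the duty cycles are illustrative assumptions, and the function names are hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K
HOURS_PER_YEAR = 8760


def arrhenius_acceleration(t_ref_c, t_hot_c, activation_energy_ev=0.7):
    """Aging rate at t_hot_c relative to t_ref_c (both in degrees Celsius).

    The activation energy is an assumed, mechanism-dependent value.
    """
    t_ref_k = t_ref_c + 273.15
    t_hot_k = t_hot_c + 273.15
    exponent = activation_energy_ev / BOLTZMANN_EV_PER_K * (1 / t_ref_k - 1 / t_hot_k)
    return math.exp(exponent)


def effective_stress_hours(calendar_years, duty_cycle, acceleration=1.0):
    """Hours of reference-condition stress accumulated over the calendar time."""
    return calendar_years * HOURS_PER_YEAR * duty_cycle * acceleration


if __name__ == "__main__":
    # Hypothetical comparison: occasional use at 55 C vs. always-on at 85 C.
    occasional = effective_stress_hours(3, duty_cycle=0.10)
    always_on = effective_stress_hours(
        3, duty_cycle=1.0, acceleration=arrhenius_acceleration(55, 85)
    )
    print(f"occasional: {occasional:.0f} h, always-on: {always_on:.0f} h")
```

With these assumed numbers, a part that runs continuously at 85 °C accumulates roughly 80 times the reference stress hours of one that runs 10% of the time at 55 °C. That is the kind of gap an aging model built for occasional use would miss.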

Chip utilization trends are also changing in the automotive sector and will continue until fully autonomous cars that can replace human drivers emerge. Cars are processing more and more data, some of which is streamed from sensors such as radar, lidar, and cameras. All of this data needs to be processed in a shorter time and with higher accuracy than in the past, which puts tremendous pressure on electronic devices.

“The reliability requirement for ADAS is at least 15 years, not two to five years,” said Norman Chang, chief technologist at ANSYS. “Aging is not just about time, but also about negative bias temperature instability (NBTI), electromigration related to heat, electrostatic discharge (ESD), and thermal coupling.”

Figure 1: Thermal modeling of chip and package. Source: ANSYS

While many automotive Tier 1 suppliers build chips that can withstand extreme temperatures, mechanical shock, and all kinds of noise, advanced-node CMOS has never been subjected to these types of stresses over long periods. Many industry insiders confirm that automakers are developing 10/7nm chips to manage all this data, and are working at leading-edge nodes to avoid obsolescence of their designs, which are typically used in recent generations of cars. The problem is that there is very little real-world data to prove that these devices can operate reliably over time under all environmental conditions.

“You have to design differently,” Segars said. “One school of thought is that you’re going to need fewer cars because they won’t be sitting idle all the time. But the other school of thought is that autonomous cars are going to go faster and faster, and they’re going to wear out faster, and eventually everything is going to wear out. The challenge is making sure the electronics don’t wear out before the mechanical parts, and that requires designing differently. That includes everything from taking noise seriously to reducing peak current.”

Thinner insulation layer, thinner substrate

One irony of increasing chip reliability is that it runs counter to 50 years of semiconductor development, in which feature sizes have shrunk every two years to reduce cost. That means thinner dielectrics, finer lines, greater dynamic power and ever-thinner substrates. At the most advanced nodes, this leads to higher leakage current, more noise, more electromigration and other electrical effects.

“From a circuit perspective, you know you have to take process variations into account,” said André Lange, division manager for quality and reliability at Fraunhofer EAS. “But from a design-for-function perspective, it’s about what happens when you deal with known defects in the system. If you look at an autonomous car, there’s a central processing unit that decides which information to use from which sensor. One of them might be dirty or not functioning.”

This makes degradation modeling more complex because it needs to be done in the context of the system. “Many things can cause circuit degradation, whether it’s NBTI or more defects per given area or larger process variations,” Lange said. A big challenge, he noted, is determining what causes defects without having large volumes of data available.

Figure 2: What went wrong. Source: Fraunhofer

Different methods

Process variation increases with each new node. Over the past decade, smartphones have driven the scaling roadmap forward (the iPhone was launched in 2007). Now, the largest users of advanced node technology are servers for data mining, machine learning, AI, and the cloud.

The link between process variation and reliability is well documented, but the presence of variation makes aging more difficult to model accurately. Many different approaches have been proposed to address this problem, ranging from complex statistical modeling and simulation to placing sensors on the chip or in the package.

“When there is a heat source, you have to use local and global ‘random walk’ methods to track the temperature,” said Ralph Iverson, chief R&D engineer for 5nm at Synopsys. “In the case of a random walk, the voltage is the average of the voltages around it, so the delta is zero.”
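
The averaging property Iverson describes can be illustrated with a toy Monte Carlo solver. The sketch below is an illustration under stated assumptions, not Synopsys' method: it uses a fixed square grid with known values on the border, and the names `walk_to_boundary` and `estimate_value` are hypothetical. Each walk steps to a random neighbor until it reaches the border, and the average of the border values it lands on estimates the value at the starting point.

```python
import random

STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def walk_to_boundary(start, size, boundary_value, rng):
    """Step to random neighbors until the walk hits the border of the grid.

    Grid points run from 0 to size on both axes; points with a coordinate
    equal to 0 or size are the border, where the value is known.
    """
    x, y = start
    while 0 < x < size and 0 < y < size:
        dx, dy = rng.choice(STEPS)
        x += dx
        y += dy
    return boundary_value(x, y)


def estimate_value(start, size, boundary_value, walks=5000, seed=0):
    """Average the border values reached by many independent random walks."""
    rng = random.Random(seed)
    total = sum(walk_to_boundary(start, size, boundary_value, rng) for _ in range(walks))
    return total / walks


if __name__ == "__main__":
    # Hypothetical border condition: the value rises linearly from 0 at the
    # left edge to 1 at the right edge, as a voltage or a temperature might.
    ramp = lambda x, y: x / 10
    print(estimate_value((3, 5), 10, ramp))
```

For the linear ramp used here, the exact answer at the point (3, 5) is 0.3, and the estimate converges toward it as the number of walks grows. The appeal of the approach is that it gives the value at one point of interest without solving for the whole grid.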

This helps with modeling, but according to Iverson, resistivity is not always clean at 5nm and below. There are surface effects, the data does not necessarily represent the copper interconnect, and more localized data is needed to make a judgment. Because this uncertainty is difficult to abstract, hybrid methods are beginning to emerge.

“Bipolar CMOS DMOS (BCD) has been well investigated in the automotive industry, but we are also seeing requirements and demand for advanced CMOS,” said Mick Tegethoff, director of AMS product marketing at Mentor, a Siemens Business. “We are seeing more interest from foundries, and EDA companies are simulating aging due to stress. Is that enough? Any kind of modeling is an approximation of the real world, so you do circuit simulation and build a chip that lasts as long as possible, but then you need to do physical testing or something like putting it in an oven to create physical stress. There are a lot of electronic products that are being tested like this.”

Analog vs. Digital

To date, most aging/degradation modeling has focused on digital circuits. Analog provides a different perspective on aging.

“Companies with leading-edge silicon at the heart of their products have a good understanding of aging and process excursions, so they don’t go in blindly,” said Oliver King, Moortec’s chief technology officer. “Analog has a lot of variable effects. A digital chip might simply stop working, but an analog circuit might just get slightly worse or become slightly flawed, so you have to adjust for that. Traditional analog developers don’t push geometries as hard as digital developers. Electromigration is still an issue, current density is an issue, but there aren’t as many aging effects. Still, chips need to be more proactive about repairing themselves and deciding whether to take action.”

Frank Ferro, senior director of product management at Rambus, has a similar view: “With the physical layer (PHY), the biggest challenge is the ambient temperature. As the temperature increases, the performance drifts, so you need to recalibrate. For consumers, there is such a thing as the ‘Christmas test’. You store a PlayStation or other electronic device in the garage when it is cold, and then turn it on on Christmas morning, and the circuit needs to be able to work immediately from cold. This is similar to the storage system in a car or base station. Aging will have an impact on these systems, and you need to recalibrate the system to mitigate these effects.”
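
The recalibration idea can be sketched as a simple drift monitor. This is a minimal illustration, not Rambus' implementation: the class name `DriftMonitor`, the 10 °C threshold and the temperature readings are all assumptions, and a real PHY would track more than temperature.

```python
class DriftMonitor:
    """Flags when the temperature has moved far enough to warrant recalibration."""

    def __init__(self, threshold_c=10.0):
        self.threshold_c = threshold_c  # assumed drift tolerance in degrees C
        self.calibrated_at_c = None     # temperature at the last calibration

    def needs_recalibration(self, temperature_c):
        # A device that was never calibrated (a cold start) always needs it.
        if self.calibrated_at_c is None:
            return True
        return abs(temperature_c - self.calibrated_at_c) >= self.threshold_c

    def recalibrate(self, temperature_c):
        # Placeholder for the real calibration routine; only records the point.
        self.calibrated_at_c = temperature_c


if __name__ == "__main__":
    monitor = DriftMonitor()
    # Hypothetical readings: cold garage start, then warming up in use.
    for reading in (-5.0, 0.0, 12.0, 30.0, 33.0):
        if monitor.needs_recalibration(reading):
            monitor.recalibrate(reading)
            print(f"recalibrated at {reading} C")
```

Triggering on drift from the last calibration point, rather than on a fixed schedule, covers both the cold start and the gradual warm-up that Ferro describes.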

Ferro said the physical layer goes through the same qualifications as digital components, including aging and testing for voltage and temperature variations. But the physical layer is designed to scale with those variations, which are difficult to design into digital circuits, especially at advanced nodes, where margining has an impact on power and performance.

Analog circuits are often designed based on what are called “mission profiles.” A specific function in a self-driving car, for example, defines a mission profile, and the IP for that function is designed against it.

“One of the big things we see is that there’s not just one scenario, depending on how they operate,” said Art Schaldenbrand, senior marketing manager for the IC and PCB division at Cadence. “There are many ways that a device can fail, so we look at what might fail under different stresses. Ten percent bias temperature instability (BTI) of the device might cause a failure, but that’s the worst possible stress. So we need better ways to express degradation. A finFET is stressed differently than a planar device, so different phenomena need to be simulated.”
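
A mission profile can be represented as a list of operating phases, each with a share of the service life and a relative stress level. The sketch below is illustrative only: the `Phase` fields, the phase names and the stress factors are hypothetical, and it assumes stress simply adds up linearly across phases, which real degradation mechanisms such as BTI do not strictly obey.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 8760


@dataclass(frozen=True)
class Phase:
    name: str
    share_of_life: float  # fraction of the service life spent in this phase
    stress_factor: float  # aging rate relative to a reference condition (assumed)


def equivalent_hours(profile, service_years):
    """Reference-condition stress hours contributed by each phase."""
    total_share = sum(phase.share_of_life for phase in profile)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("phase shares must sum to 1")
    hours = service_years * HOURS_PER_YEAR
    return {p.name: hours * p.share_of_life * p.stress_factor for p in profile}


if __name__ == "__main__":
    # Hypothetical 15-year automotive profile with made-up stress factors.
    profile = [
        Phase("parked", 0.10, 0.2),
        Phase("city driving", 0.60, 1.0),
        Phase("highway, hot ambient", 0.30, 4.0),
    ]
    for name, hours in equivalent_hours(profile, 15).items():
        print(f"{name}: {hours:.0f} equivalent hours")
```

Breaking the total down by phase shows which operating condition dominates the stress budget, which is the information a designer needs before deciding where to add margin.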

Packaging and other unknowns

As Moore's Law slows, more and more companies are turning to advanced packaging to improve performance and provide more design flexibility. Until now, it has not been entirely clear how to model advanced packaging to determine stress and aging. This is partly because there are so many packages to choose from that no one is sure which one is the best. It is also partly because many of these packages are relatively new and the internals of the package need to be explored over time.

“The package may be too close to other components or have stress from the other side,” Helic’s Abadir said. “That needs to be modeled. Even before it ages, it has to be modeled for aging because the effects are increasing. So placement is important because if you move it, you change the resonant frequency. There is no easy way. You have to analyze and design it, and if you find a problem, you may need to move it.”

There are other anomalies in complex designs that can affect reliability over time. For example, some usage models may turn the circuit on and off more frequently than others, which can stress the circuit.

“If something sits idle for too long, it ages differently than other circuits,” said Jushan Xie, senior software architect at Cadence. “The smaller the device, the stronger the aging effect. The more stress, the faster the aging.”

How all of this will be handled is not entirely clear. At least some of it will involve new materials and technologies.

“For power electronics, this is driving a shift from silicon-based devices to silicon carbide and gallium nitride (GaN), which can operate at higher switching frequencies, with higher efficiency and at higher temperatures,” said John Parry, electronics product marketing manager at Mentor. “In some applications, this can allow the power electronics to be placed closer to the motor drive, into a higher temperature environment. In other cases, the ability of the semiconductor to withstand higher temperatures means less cooling is required. However, the semiconductor must be packaged, and the package must also be able to withstand higher temperatures. There has been huge investment in new technologies, such as sintered silver as a die attach material, rather than using traditional wire bonding, so the packaging of power devices such as IGBTs has undergone a huge change in materials, processing technology and design.”

Conclusion

Aging, stress, and other effects become increasingly problematic as designs move to advanced nodes or are used for longer periods in new, safety-critical markets.

“It depends on the questions that customers are asking today,” Fraunhofer’s Lange said. “You talk to different people and their starting points are different, but the questions are coming much more frequently. Many are only at the beginning. They see higher voltages and higher temperatures and do some experiments to infer overstress. But understanding how the degradation affects the entire circuit is much more difficult. There is still a lot of work to be done for complex chips.”

But as attention to these problems grows, so will the investment in solving them. Chip designers are only now beginning to pay attention to degradation modeling and aging. As with power consumption a decade ago, all of this will change.

Original link: https://semiengineering.com/chip-aging-accelerates/
