Analog or digital circuits for machine learning: how to choose?
Source: Translated from eenews by the public account Semiconductor Industry Observer (ID: icbank); original article by Avi Baum.
In recent years, we have witnessed the rise of “deep learning,” a field that attempts to achieve a level of reasoning and intelligence similar to human behavior.
The mathematical formulations underlying Artificial Neural Networks (ANNs) have been refined in step with the development of physical devices that can run these networks efficiently. Computers and human brains are often compared, yet their underlying structures are very different. One obvious characteristic of neural networks is their cellular nature, so the basic "cell" structure is a natural aspect to explore - not least because it is repeated a great many times. Hence the importance of its efficiency, which will be the focus of this article.
ANN Theory Guide
The basis of artificial neural networks is a large number of elements called neurons, usually arranged in densely connected bundles. In short, a neuron is a unit with multiple inputs and a single output. The output is a direct function of the inputs, with each input receiving a different level of "attention" in its contribution to the output; this level of "attention" is usually called a weight. In addition, the output may carry a threshold effect, whereby the neuron responds (is "fired") only when the threshold is exceeded. The inputs of the downstream neurons connected to a "firing" neuron are driven in turn, and this process propagates through the network until it reaches the final output.
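To make this concrete, here is a minimal sketch of the weighted-sum-plus-threshold neuron just described (in Python; the weights, inputs, and threshold are illustrative values of our own choosing, not figures from the article):

```python
import numpy as np

def neuron(inputs, weights, threshold=0.0):
    """Weighted sum of inputs followed by a simple firing threshold.

    The neuron "fires" (outputs 1) only when the weighted sum of its
    inputs exceeds the threshold; otherwise it stays silent (outputs 0).
    """
    activation = np.dot(weights, inputs)  # each input gets its own "attention" (weight)
    return 1.0 if activation > threshold else 0.0

# Example: three inputs, the second weighted most heavily.
x = np.array([0.2, 0.9, 0.1])
w = np.array([0.5, 1.0, -0.3])
print(neuron(x, w, threshold=0.5))  # 0.1 + 0.9 - 0.03 = 0.97 > 0.5, so it fires: 1.0
```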
Figure 1: A biologically inspired neuron (left) and its artificial, conceptual equivalent (right). Dendrites serve as inputs, axons as outputs, and aggregation occurs within the "cell."
When defining equivalent models, the most common approach is to use a weighted sum with nonlinearities applied to the outputs. This approach is very useful in capturing the essence of a concept in a simple yet meaningful way. However, when trying to capture finer aspects of biological behavior, more complex models are sought. These reflect additional properties that may lead to a more complete description of neurons and, for practical reasons, may provide implementation alternatives that overcome some of the performance barriers inherent to the basic representation.
Options for simulating neuron behavior involve time-domain, frequency-domain, and amplitude-domain representations, each of which can be expressed in closed mathematical form as described below. Three representations are common: a straightforward discrete model, which expresses the neuron as a weighted sum of its inputs (Fig. 2a); an impulse model (Fig. 2b), in which a train of spikes represents activity and the temporal rate of the spikes determines the firing level - of the three, this is the closest to actual nerve-cell activity in the human body; and a continuous model (Fig. 2c).
Figure 2: Mathematical representation of (a) discrete, (b) impulse, and (c) continuous models.
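The formulas in Figure 2 are not reproduced in the text; a plausible rendering of the three models, under the usual textbook conventions (our reconstruction, not the article's exact notation), is:

$$\text{(a) discrete:}\quad y = \varphi\Big(\sum_{i=1}^{N} w_i x_i + b\Big)$$

$$\text{(b) impulse:}\quad r_{out} = \varphi\Big(\sum_{i=1}^{N} w_i r_i\Big)$$

$$\text{(c) continuous:}\quad \tau\,\frac{dv(t)}{dt} = -v(t) + \sum_{i=1}^{N} w_i x_i(t)$$

Here $\varphi$ is the nonlinearity, $w_i$ the weights, $x_i$ the inputs, $r_i$ the input spike rates, and $v(t)$ the continuously integrated activation with time constant $\tau$.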
Analog and digital implementation
The various approaches to neuron implementation need to address two fundamental issues: (i) processing - the part responsible for computing the weighted sum of the inputs and the output; and (ii) data transfer - the part responsible for moving and storing the data.
While digital implementation is the more common choice in modern large-scale IC design, recent approaches are increasingly realized with analog circuits. The digital implementation of a neuron is based on multiply-accumulate circuits: each operation reads an input and a weight and produces an intermediate result, and this step is repeated many times. Once the summation is complete, a nonlinearity is applied to the accumulated value, and the result is presented as the neuron output. A result is therefore available once every N cycles, and must be stored thereafter.
Figure 3: Digital circuit building blocks
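As a rough illustration of the digital flow just described - one multiply-accumulate per cycle, then a nonlinearity once the sum completes - consider the sketch below. The 8-bit quantization and the ReLU nonlinearity are assumptions for the example, not details taken from the article:

```python
import numpy as np

def digital_neuron(inputs_q, weights_q, scale):
    """Sequential multiply-accumulate, the way a digital circuit iterates.

    inputs_q, weights_q: 8-bit integer (quantized) arrays. One
    (input, weight) pair is consumed per "cycle"; the result is ready
    only after all N cycles, after which the nonlinearity is applied.
    """
    acc = np.int32(0)                        # wide accumulator to avoid overflow
    for x, w in zip(inputs_q, weights_q):    # one MAC per clock cycle
        acc += np.int32(x) * np.int32(w)
    y = float(acc) * scale                   # rescale back to the real-valued domain
    return max(y, 0.0)                       # nonlinearity (ReLU assumed here)

x = np.array([12, -3, 45], dtype=np.int8)
w = np.array([7, 20, -2], dtype=np.int8)
print(digital_neuron(x, w, scale=1 / 128))   # accumulates 84 - 60 - 90 = -66, then clips to 0.0
```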
Analog circuits use the continuity of physical signals to perform the summation directly at some physical level (for example, summing voltage potentials, or summing currents), yielding continuous signals that are free of the finite word-length representation problem.
Figure 4: Functional blocks built from analog circuits (continuous operation)
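A rough behavioral model of this continuous analog idea - weights encoded as conductances, inputs as voltages, with the branch currents adding for free on a shared wire by Kirchhoff's current law - might look like the following. The component values are illustrative assumptions:

```python
import numpy as np

def analog_sum_node(voltages, conductances):
    """Behavioral model of a current-summing node.

    Each input voltage drives a branch whose conductance encodes the
    weight; the branch currents add on the shared node, so the weighted
    sum costs no explicit arithmetic in the circuit itself.
    """
    currents = voltages * conductances   # I_i = G_i * V_i per branch
    return np.sum(currents)              # node current = the weighted sum

v = np.array([0.3, 0.7, 0.1])            # input voltages in volts (illustrative)
g = np.array([2e-6, 5e-6, 1e-6])         # weight conductances in siemens (illustrative)
print(analog_sum_node(v, g))             # 4.2e-06 A flowing out of the node
```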
Another variation of analog circuits is pulse-based circuits, which utilize the concept of a train of pulses of constant amplitude. In this case, the level of excitation depends on the pulse rate. This concept is closely analogous to the activity of neurons in the brain.
Figure 5: Functional blocks constructed from analog circuits (pulse operation)
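The pulse-based idea can also be sketched behaviorally: activity is carried by the rate of constant-amplitude pulses, so decoding reduces to counting pulses in a time window. The window length and maximum rate below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_rate(level, window_ms, max_rate_hz=1000):
    """Encode a level in [0, 1] as a train of constant-amplitude pulses."""
    p_spike = level * max_rate_hz / 1000     # per-millisecond firing probability
    return rng.random(window_ms) < p_spike   # one candidate slot per ms; True marks a pulse

def decode_rate(spikes, window_ms, max_rate_hz=1000):
    """Recover the level: the pulse count over the window gives the rate."""
    rate_hz = spikes.sum() / (window_ms / 1000)
    return rate_hz / max_rate_hz

train = encode_rate(0.6, window_ms=200)      # amplitude is constant; only timing carries information
print(decode_rate(train, window_ms=200))     # approximately 0.6
```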
In the analog case, data storage is a very significant challenge. It can be solved by converting to the digital domain, which requires some kind of analog-to-digital conversion at the output and a digital-to-analog conversion when the stored data is consumed again. Alternatively, the output can be fed directly to the next stage, avoiding any storage operation; the latter approach is efficient if the design can support the required bandwidth, and where necessary some capacitance can be added to allow bandwidth control. (Note: Figures 3, 4, and 5 each show one option for implementing the corresponding method and do not contain all implementation details.)
Performance
When studying the performance of the various approaches, it becomes apparent that the digital approach, although well established, is limited by CMOS technology barriers: transistor threshold voltages of ~0.4V, standard-cell operating frequencies below ~3GHz, and process-dependent limits on maximum clock frequency and duty cycle. For a single 8-bit multiply-accumulate operation, this results in a lower bound of ~100fJ at the processing node.
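For orientation, the dominant term behind such a bound is the standard dynamic switching energy of CMOS logic (textbook CMOS theory, not a figure from the article):

$$E_{dyn} \approx \alpha \, C \, V_{dd}^{2}$$

where $\alpha$ is the activity factor, $C$ the switched capacitance, and $V_{dd}$ the supply voltage. Since $V_{dd}$ cannot drop much below the ~0.4V threshold floor, the energy per operation eventually stops scaling down with voltage.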
In contrast, analog circuits are theoretically bounded only by thermal noise, at approximately 0.01fJ - four orders of magnitude below the digital bound. There is therefore a case for building circuits around analog computing structures. In practice, however, deployment is challenged by several issues: delivering data to the large number of computing cells described, parasitic effects associated with their interconnection, storing the outputs effectively, and ultimately translating the scheme into large-scale design flows and mass-production technologies. Reported achievable computing-cell energies are 1~10fJ. In these implementations the computing cell's energy becomes negligible in practice; the total energy is dominated by the surrounding circuits and storage cells.
In summary, efficiency gains of 10x~100x relative to functional blocks built from digital circuits are achievable at small scale, but as the scale of the unit increases, the advantage declines rapidly.
Figure 6: Operational domains of the different approaches
Figure 6 gives a qualitative description of the different approaches. The efficiency loss of the analog circuit is mainly due to implementation losses (i.e., the detector circuit has some internal noise of its own, which reduces the signal-to-noise ratio and requires extra margin); in this respect the pulse approach has a lower detection threshold. When the analog solution is scaled up, noise coupling is observed, and the effect grows with the size of the solution (it is more significant in the continuous approach). The digital approach is far less affected by this coupling. The remaining energy gap between analog and digital stems from voltage levels and operating frequencies, both of which are much higher in the digital case.
In fact, large-scale circuit design has matured over the past few decades, and the industry experience accumulated cannot easily be set aside. Scalability and productization concerns have therefore largely limited the ability of analog-based solutions to become the mainstream approach to general problems. In addition, at the system level, the minor contributors cannot be ignored: once the computational unit's contribution has been reduced to a reasonable level, further improvement there matters less and less.
System level
So far, the discussion has been devoted to building blocks at the functional-block level. However, it would be incomplete to ignore the rest of the system. A system-level analysis should consider all contributors and recognize that, beyond a certain point, further improvement of the basic processing element becomes negligible. Such is the case with the energy distribution.
To date, state-of-the-art solutions strive to reach 0.1~1TOPS/W when running machine learning tasks, which is equivalent to 1~10pJ per operation. As mentioned before, with a digital neuron implementation at ~0.1pJ per operation, 90%~99% of the energy is consumed elsewhere: in memory cells, control structures, and the bus architecture. It is therefore crucial to exploit the potential of architectural transformation; the energy recovered by switching to an analog compute scheme alone is capped at 10% of the total energy consumed.
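The arithmetic behind that 10% cap is simple; taking the article's figures at face value:

$$0.1{\sim}1\,\text{TOPS/W} \;\Rightarrow\; 1{\sim}10\,\text{pJ/op}, \qquad \frac{E_{compute}}{E_{total}} = \frac{0.1\,\text{pJ}}{1{\sim}10\,\text{pJ}} = 1\%{\sim}10\%$$

So even if analog computation drove the compute energy all the way to zero, at most one tenth of the total energy budget would be recovered.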
Comparison
The following table lists some key properties of the various methods and summarizes most of the points made above.
Table 1: Comparison of analog and digital circuit-based neural networks
Conclusion
In summary, it is clear that the dynamic nature of machine learning will lead to new and interesting technologies that will gradually mature and meet various market needs.
Analog solutions show great potential as neural-network computing engines. Once mature, they are likely to become a complementary element in various neural computing solutions and may resolve some challenging cases. However, given their limited scalability, their sensitivity to the technology node, and the fact that they address a relatively narrow subset of applications - ones where digital solutions can already serve effectively - it is difficult to foresee analog-based solutions becoming a flexible replacement and dominating the field.
Original link: http://www.eenewsanalog.com/news/analog-and-digital-circuits-machine-learning