A Popular-Science Guide to the FPGA Chip Industry
Source: reprinted by Semiconductor Industry Observer from Yushi Capital; our thanks to the original authors.
Overview of China's FPGA Chip Industry
FPGA chip definition and physical structure
FPGA chips emerged as semi-custom circuits within the field of application-specific integrated circuits (ASICs), overcoming both the inflexibility of fully custom circuits and the limited gate counts of earlier programmable devices.
The FPGA (Field Programmable Gate Array) evolved from earlier programmable devices (PAL, GAL) and is a semi-custom, programmable integrated circuit.
Inventor: Xilinx co-founder Ross Freeman invented the FPGA integrated-circuit architecture in 1984; the world's first commercial FPGA was the Xilinx XC2064.
Rather than being locked into a single fixed signal-processing pattern, FPGA chips can be reconfigured to take on new tasks (computing tasks, communication tasks, and so on). They are more flexible than application-specific chips (ASICs) and, compared with traditional programmable devices, can implement larger-scale circuits realizing multiple functions.
Physical structure: an FPGA chip consists mainly of three parts: IOEs (input/output elements), LABs (logic array blocks, which Xilinx calls configurable logic blocks, or CLBs), and the interconnect (internal routing).
FPGA chip characteristics and classification
FPGA chips have significant advantages in real-time performance (fast signal processing) and flexibility, and occupy an irreplaceable position in fields such as deep learning. At the same time, they are difficult to develop for.
FPGA chips have the following characteristics:
Flexible design: the FPGA is a hardware-reconfigurable chip architecture containing large numbers of input/output pins and internal flip-flops.
Strong compatibility: FPGA chips are compatible with large-scale integrated circuits such as CMOS and TTL, and can work together to complete computing tasks.
Parallel computing: an FPGA's internal structure can be built into as many pipeline stages as a data packet has processing steps, with different pipelines handling different packets, achieving both pipeline parallelism and data parallelism.
Strong applicability: among special-purpose circuits, FPGAs offer one of the shortest development cycles and lowest application risks (some customers can obtain a suitable FPGA chip without investing in their own chip development).
Improving status: early on, FPGAs mainly served as substitutes for ASIC chips in certain application scenarios; more recently, with the data-center expansion of leading Internet companies such as Microsoft, the range of FPGA applications has broadened.
FPGA manufacturers mainly provide FPGA chips based on two types of technologies: Flash technology and SRAM technology (Static Random-access Memory).
Both technologies support system-level programming, offer high computing performance, and reach very high system gate densities.
Core differences:
1. Flash-based programmable devices are non-volatile: the stored configuration is retained after power is removed.
2. SRAM-based FPGA chips are volatile (the configuration is lost at power-off), but SRAM is the most widely used FPGA architecture.
Comparison of FPGA chips with other mainstream chips
The CPU is a general-purpose device; by comparison, the FPGA architecture is oriented more toward computing efficiency. Using FPGA parallel computing to process vision algorithms can greatly increase computation speed and reduce latency.
FPGA chips compared to CPU chips
CPU processing calculation instruction flow:
The CPU receives task instructions through a dedicated decoder. This happens in two steps: instruction fetch (the CPU pulls the next instruction from memory dedicated to storing instructions) and instruction decoding ("translation": the instruction is converted according to fixed rules and passed to the computing unit). The computing unit is built from transistors (the CPU's basic components), whose "on" and "off" states correspond to the machine-code digits "1" and "0".
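To make the fetch-decode-execute sequence above concrete, here is a minimal, purely illustrative Python sketch of a toy processor loop; the three-instruction "machine code" and the register names are invented for this example and do not correspond to any real instruction set.

```python
# Toy illustration of the CPU's fetch -> decode -> execute cycle described above.
# The instruction encoding and register file are invented for this sketch.

program = [
    ("LOAD", "r0", 5),           # put the constant 5 into register r0
    ("LOAD", "r1", 7),           # put the constant 7 into register r1
    ("ADD",  "r2", "r0", "r1"),  # r2 = r0 + r1, carried out by the ALU
]

registers = {"r0": 0, "r1": 0, "r2": 0}
pc = 0  # program counter: which instruction to fetch next

while pc < len(program):
    instruction = program[pc]                            # 1. fetch from instruction memory
    opcode, operands = instruction[0], instruction[1:]   # 2. decode ("translate")
    if opcode == "LOAD":                                  # 3. execute on the register file / ALU
        dest, value = operands
        registers[dest] = value
    elif opcode == "ADD":
        dest, a, b = operands
        registers[dest] = registers[a] + registers[b]
    pc += 1   # move on; a real CPU may also branch or be interrupted here

print(registers)  # {'r0': 5, 'r1': 7, 'r2': 12}
```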
CPU processing calculation instruction characteristics:
•The physical structure of the CPU includes the control unit (instruction fetch and decode), the cache (temporary instruction storage), and the computing unit, the ALU (which occupies only about 20% of the CPU's die area).
•The CPU is a general-purpose computing task processing core that can handle computing requests from multiple devices and can terminate the current calculation at any time and switch to other calculations.
•The logic control unit and instruction translation structure are relatively complex, and the computing task can be continued from the interruption point, sacrificing computing efficiency for high versatility.
Comparison between CPU vision algorithm and FPGA vision algorithm:
•CPU architecture: when a CPU processes a vision algorithm it must execute instructions in a fixed order: only after the first instruction has been applied to the entire image does the second instruction begin to run. For a 4-step algorithm in which each step takes 10 milliseconds over the image, the whole algorithm takes about 40 milliseconds.
•FPGA architecture: an FPGA processes vision algorithms in a massively parallel fashion and can run the 4 operation steps simultaneously on different pixels of the image. If a single step takes 10 milliseconds, the FPGA needs only about 10 milliseconds to complete the overall vision processing of the image, so its image-processing speed is significantly faster than the CPU's.
•“FPGA+CPU” architecture: In this architecture, the image is transmitted between the CPU and FPGA, and the overall algorithm processing time including the transmission time is still lower than that of the pure CPU architecture.
• Algorithm example: take a convolution-filter image-sharpening task in which the system must then threshold the result to produce a binary image. Under a CPU architecture, the convolution step must finish over the whole image before the thresholding step can begin; an FPGA can run both steps of the same algorithm concurrently, raising convolution speed by roughly 20 times compared with the CPU architecture (a minimal software sketch of this convolution-then-threshold flow appears after this list).
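Below is a minimal NumPy/SciPy sketch of the convolution-then-threshold pipeline described in the last bullet. The sharpening kernel, image size, and threshold value are arbitrary illustration choices; on a CPU the two steps run one after the other over the whole frame, whereas an FPGA would stream each pixel through both steps back-to-back.

```python
import numpy as np
from scipy.signal import convolve2d

# Illustrative sharpening kernel and threshold; both values are arbitrary examples.
sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]], dtype=float)

image = np.random.rand(480, 640)   # stand-in for a grayscale camera frame

# Step 1 (CPU style): convolve the *entire* image first ...
sharpened = convolve2d(image, sharpen, mode="same", boundary="symm")

# Step 2: ... only then threshold it into a binary image.
binary = (sharpened > 0.5).astype(np.uint8)

# On an FPGA both steps would be laid out as hardware stages, so each pixel
# flows through convolution and thresholding back-to-back instead of waiting
# for the whole frame to finish step 1.
print(binary.shape, binary.dtype)
```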
As a graphics-processing device, the GPU has a higher peak computing performance. Over the long term, as machine learning moves toward processing single data items with many instructions in parallel, the FPGA is superior to the GPU in flexibility and power consumption.
FPGA chips compared to GPU chips
GPU physical structure:
The GPU is a graphics processor that performs calculations for various computer-graphics rendering operations (such as vertex setup, lighting and shading, and pixel operations). A standard GPU includes a 2D engine, a 3D engine, a video-processing engine, a video-memory management unit, and so on. The 3D engine includes T&L (transform and lighting) units, pixel shaders, etc.
GPU processing calculation instruction flow:
•Vertex processing: The GPU reads the 3D graphics vertex data, determines the 3D graphics shape and positional relationship based on the appearance data, and builds the 3D graphics skeleton.
• Rasterization: the displayed image is made of pixels, so the system must algorithmically convert the graphic's points and lines into pixels; this conversion of vector graphics into pixels is the rasterization step (a toy software sketch of this step follows this list).
•Texture Mapping: Texture mapping is used to map deformable surfaces to generate realistic graphics.
•Pixel processing: The GPU calculates and processes the pixels that have been rasterized to determine the final properties of the pixels, which is usually done through the Pixel Shader.
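As a rough illustration of the rasterization step above, the following Python sketch converts one triangle, described by three vertices, into covered pixels on a small grid using the standard edge-function test. The vertex coordinates and grid size are arbitrary example values; a real GPU performs this test massively in parallel in fixed-function hardware.

```python
import numpy as np

# Toy rasterizer: turn one triangle (a vector description) into pixels,
# i.e. the "rasterization calculation" described above.
v0, v1, v2 = (2.0, 2.0), (13.0, 4.0), (6.0, 13.0)   # example vertices on a 16x16 grid

def edge(a, b, p):
    """Signed-area (edge-function) test for pixel centre p against edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

grid = np.zeros((16, 16), dtype=np.uint8)
for y in range(16):
    for x in range(16):
        p = (x + 0.5, y + 0.5)                   # sample at the pixel centre
        inside = (edge(v0, v1, p) >= 0 and
                  edge(v1, v2, p) >= 0 and
                  edge(v2, v0, p) >= 0)
        grid[y, x] = 1 if inside else 0          # covered pixels become part of the image

print(grid)   # 1s mark the pixels the triangle rasterizes to
```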
Comparison of GPU and FPGA features:
• Peak performance: GPU peak compute (around 10 TFLOPS) is significantly higher than FPGA peak compute (under 1 TFLOPS). The GPU is a hand-optimized circuit built from a standard-cell library and relies on deep pipelining and similar techniques. FPGA design resources, by contrast, are limited: the chosen device fixes the ceiling on logic resources (and floating-point operations consume many of them), the logic elements are SRAM-based look-up tables, and routing resources are constrained.
•Memory interface: the bandwidth of the GPU's memory interfaces (GDDR and other graphics memory) is higher than that of the DDR (double-data-rate synchronous DRAM) interfaces typically used with FPGAs, better matching the frequent memory accesses of machine learning.
•Flexibility: an FPGA's hardware can be programmed for a specific application, whereas a GPU's hardware resources cannot be changed once the design is finalized. If machine learning moves toward processing single data items with many instructions in parallel, the FPGA's hardware flexibility will better meet that need.
•Power consumption: the GPU's average power draw (around 200 W) is far higher than the FPGA's (around 10 W), so the FPGA largely avoids the heat-dissipation problem.
ASIC chips are highly specialized, and the non-recurring cost of development (tape-out) is extremely high. In the early stage of 5G commercialization, FPGAs can seize the market on the strength of their flexibility, but in large-scale mass-production scenarios ASIC chips have the stronger competitive advantage.
FPGA chips compared to ASIC chips
Differences between ASIC and FPGA development processes:
•An ASIC must be designed from standard cells. When functional or performance requirements change, the ASIC design must be reworked and re-spun, so the design process carries high time and economic costs.
•FPGAs contain prefabricated gates and flip-flops plus programmable interconnect, which allows the chip's functions to be reconfigured; ASIC chips, by contrast, rarely offer any reconfiguration capability.
The difference between ASIC and FPGA in economic cost and time cost:
•ASIC design involves large fixed costs; its production wastes less material and its recurring (per-unit) costs are lower than an FPGA's, but its non-recurring costs are far higher (averaging over a million US dollars).
•FPGA recurring costs are higher than those of a comparable ASIC. In large-scale mass production, the unit cost of an ASIC keeps falling as output grows, and its total cost ends up significantly below the FPGA's (a back-of-envelope break-even sketch follows this list).
•FPGA does not require waiting for the chip tape-out cycle and can be used directly after programming. Compared with ASIC, it helps companies save time to market.
•In the immature stage of technology, FPGA architecture supports flexible changes in chip functions, which helps reduce device product costs and risks and is more suitable for the market environment in the early stage of 5G commercialization.
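The trade-off described in the first two bullets reduces to a simple break-even calculation. The NRE and per-unit price figures in the sketch below are illustrative assumptions rather than data from the article.

```python
# Break-even volume between an off-the-shelf FPGA and an ASIC tape-out.
# All dollar figures are illustrative assumptions for this sketch.
asic_nre  = 1_500_000   # non-recurring cost: masks, tape-out, verification
asic_unit = 8.0         # recurring cost per ASIC once in volume production
fpga_nre  = 0           # no tape-out needed; the part is bought off the shelf
fpga_unit = 60.0        # recurring cost per FPGA of comparable capability

def total_cost(nre, unit, volume):
    return nre + unit * volume

# Volume at which the ASIC's lower unit cost pays back its large NRE:
break_even = asic_nre / (fpga_unit - asic_unit)
print(f"break-even at about {break_even:,.0f} units")   # ~28,800 units

for volume in (1_000, 10_000, 100_000):
    print(volume,
          "FPGA:", total_cost(fpga_nre, fpga_unit, volume),
          "ASIC:", total_cost(asic_nre, asic_unit, volume))
# At low volumes the FPGA wins outright; at high volumes the ASIC's total cost
# drops well below the FPGA's, matching the bullets above.
```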
Analysis of China's FPGA chip industry chain
FPGA chips form an important segment of the artificial-intelligence chip market. The industry chain is long and narrow, and as midstream players, FPGA manufacturers hold strong bargaining power over both upstream software and hardware suppliers and downstream customer companies.
The industrial chain of China's FPGA chip industry is composed of upstream underlying algorithm design companies, EDA tool suppliers, wafer foundries, special materials and equipment suppliers, midstream various FPGA chip manufacturers, packaging and testing manufacturers, and downstream application scenario customer companies including visual industry manufacturers, automobile manufacturers, communication service providers, cloud data centers, etc.
Analysis of the upstream of China's FPGA chip industry chain
As programmable devices, FPGA chips involve fewer tape-out runs and are less dependent on upstream foundries, but they do require professional design software and support from the underlying algorithm architecture.
Underlying algorithm architecture design company
FPGA chip design has a low reliance on the underlying algorithm architecture, and upstream algorithm suppliers have limited bargaining power over midstream FPGA chip R&D and manufacturing companies. Overseas algorithm architecture design companies include Qualcomm, ARM, Google, Microsoft, IBM, etc.
Specialized software vendors
FPGA chip companies complete their designs with EDA and other development software (Quartus, Vivado, and the like). The world-class EDA vendors (such as Synopsys) charge chip developers high module licensing fees. Few companies in the Chinese market can supply EDA products; representatives include Xinhe Electronics, Huada Jiutian, and Boda Micro Technology. Chinese EDA companies started R&D late, and the stability and maturity of their software still need improvement, while purchasing overseas EDA products is costly for Chinese FPGA developers. In the long run, domestic EDA companies need to close the gap with their overseas counterparts and offer midstream chip companies affordably priced EDA products.
Wafer Foundry
There are currently about 30 mainstream wafer fabs in China, covering 8-inch wafers and 12-inch wafers in terms of specifications. Among them, there are more 8-inch wafer fabs than 12-inch wafer fabs. Taking Wuhan Xinxin, SMIC, and Tsinghua Unigroup as examples of China's local 12-inch wafer fabs, the average monthly production capacity is about 65,000 wafers. Foreign manufacturers that have set up wafer fabs in China include Intel, Hynix, etc. The development speed of Chinese wafer fabs is relatively fast. For example, Wuhan Xinxin's 12-inch wafer has an average monthly production capacity of 200,000 wafers, which exceeds Hynix's average monthly production capacity of 160,000 wafers.
Analysis of the midstream of China's FPGA chip industry chain
Midstream companies in China's FPGA chip industry have large profit margins. As R&D capabilities accumulate and the application market matures, the midstream industry structure may undergo fission, shifting from developing hardware and device R&D business to developing software and platform building business.
FPGA chip products can quickly enter the application market and are irreplaceable. At present, the application scenarios are relatively scattered. As the technology matures, terminal manufacturers may consider using ASIC chips to replace FPGA chips to reduce costs (ASIC mass production costs are lower than FPGA).
FPGA chips have huge profit margins:
Compared with CPU, GPU, ASIC and other products, FPGA chips have a higher profit margin. The profit margin of medium and low density million-gate and ten-million-gate FPGA chip R&D companies is close to 50% (refer to the iPhone's gross profit margin of nearly 50%). The profit margin of high-density billion-gate FPGA chip R&D companies is close to 70% (take Xilinx and Altera acquired by Intel as examples).
China's midstream enterprises are facing a market potential release node
Compared with giants such as Xilinx and Intel, China started late in FPGA research and development, but its R&D progress has gradually caught up (the gap with the world's leading manufacturers has been shortened from 3 generations to about 2 generations).
Since 2017, China's FPGA sector has entered a critical stage of development (a comprehensive transition from reverse design to forward design). Against the backdrop of intensified China-US trade frictions, midstream companies in China's FPGA industry that have completed their initial accumulation face good development opportunities. Compared with the global integrated-circuit market of over US$460 billion, the FPGA market is relatively small, leaving room for incremental growth.
The industry structure may change
As the concentration of midstream enterprises in the FPGA industry increases, the industry structure may undergo fission. Chinese enterprises can adjust their market strategies, shifting from hardware R&D to software design, and from device R&D to platform construction.
Analysis of downstream industry chain of China's FPGA chip industry
The downstream application market of China's FPGA chip industry covers a wide range, with electronic communications and consumer electronics occupying the top position. It has huge development potential in many fields such as industrial control, robot control, video control, autonomous driving and servers.
FPGA manufacturers focus on the communications market and consumer electronics scenarios
The Chinese FPGA application market is dominated by consumer electronics and communications. Domestic chips lag behind foreign high-end products in terms of product hardware performance and are not yet competitive in the high-end civilian market, but in the short term, shipments in LED display, industrial vision and other fields are relatively high. With the technological breakthroughs of Chinese companies and the maturity of 5G technology, Chinese FPGA manufacturers may achieve high market share growth in the communications field.
Automotive and data center applications follow closely behind
After 2025, edge computing technology and cloud computing technology will be fully rolled out in smart transportation networks and supercomputing centers, and the growth rate of the FPGA application market in the fields of autonomous driving and data centers will exceed that of the communications and consumer electronics markets.
Growth of FPGA chip downstream application market size:
In 2018, communications, consumer electronics, and automobiles accounted for more than 80% of the total global demand for FPGA chips, and the market size continued to expand. As core components of 5G base stations, automotive terminal equipment, and edge computing equipment, FPGA devices have significant acceleration effects and face deterministic incremental demand in the downstream market. As the strength of local midstream companies increases, domestic FPGA chip products may enter the downstream market with low prices in the long term, reducing the cost of downstream companies purchasing high-end programmable devices.
Market size of China's FPGA chip industry
The demand for existing FPGA chips in application scenarios continues to increase. The development of 5G and artificial intelligence technologies is driving the expansion of China's FPGA market and stimulating the release of incremental demand.
FPGA chip industry market size
With the expansion of downstream application markets, the market size of China's FPGA industry continues to increase. In 2018, the scale of China's FPGA market was close to 14 billion yuan.
The development of 5G new air interface communication technology and machine learning technology will further stimulate the expansion of China's FPGA market. It is estimated that in 2023, the scale of China's FPGA chip market will be close to 46 billion yuan.
The global FPGA market size potential will be released, mainly due to the following factors:
Downstream application scenarios tend to be extensive: FPGA chips are more flexible than ASICs, can save tape-out time costs, and have a short time to market. Application scenarios have expanded from communication transceivers, consumer electronics, etc. to automotive electronics, data centers, high-performance computing, industrial vision, medical testing, etc. In the short term, China's FPGA application scenarios will remain dispersed, and there is room for expansion in both the existing market and the incremental market.
Irreplaceability in some application scenarios: in scenarios where the technology is still unstable, flexibility requirements are high, and volumes are small, FPGA chips hold R&D- and manufacturing-cost advantages that ASICs, CPUs, and GPUs cannot match (the device can be programmed in the field to meet specific needs).
Global Market Share Analysis:
Significant demand in the Asia-Pacific market
The Asia-Pacific market is the main application market for FPGAs, accounting for more than 40% of the global market share. By the end of 2018, the scale of China's FPGA market was close to 14 billion yuan, and it faces a large incremental demand space as 5G communication infrastructure is rolled out.
Leading North American companies dominate the market
In North America, Xilinx and Intel (which acquired Altera) maintain a duopoly in the FPGA market. In China’s FPGA market, Xilinx’s share exceeds 50%, while Intel’s share is close to 30%.
FPGA chip technology analysis
Computing tasks: FPGA can be used to process multiple computing-intensive tasks. Relying on the pipeline parallel structure system, FPGA has a technical advantage over GPU and CPU in terms of the latency of returning computing results.
Computation-intensive tasks: Matrix operations, machine vision, image processing, search engine ranking, asymmetric encryption and other types of operations are computationally intensive tasks. Such computing tasks can be offloaded from the CPU to the FPGA for execution.
FPGA performs computationally intensive tasks:
• Computing performance relative to CPU: For example, the performance of Stratix series FPGAs for integer multiplication is comparable to that of a 20-core CPU, and for floating-point multiplication, its performance is comparable to that of an 8-core CPU.
• Computing performance relative to GPU: for integer and floating-point multiplication, the FPGA's raw performance is an order of magnitude below the GPU's, but the gap can be narrowed by instantiating additional multipliers and floating-point blocks.
FPGA core advantages in executing computationally intensive tasks:
Tasks such as search-engine ranking and image processing impose strict limits on result-return time, so the latency of each computing step must be kept down. Traditional GPU acceleration requires large data batches, pushing latency to the millisecond level; with FPGA acceleration, PCIe latency can be brought down to the microsecond level, and with longer-term technology development, CPU-FPGA data-transfer latency may fall below 100 nanoseconds.
FPGA architecture advantage: an FPGA can build as many pipeline stages as a data packet has processing steps (pipeline parallelism), and each packet can be output as soon as it has passed through the pipeline. The GPU's data-parallel mode relies on different processing units handling different packets, and their inputs and outputs must be kept in lockstep. For streaming computing tasks, the FPGA's pipeline-parallel structure therefore has a natural latency advantage, as the toy timing model below illustrates.
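The following Python timing model contrasts the two modes; the stage count, per-stage time, packet arrival rate, and batch size are invented numbers chosen only to make the latency difference visible.

```python
# Toy latency model contrasting FPGA pipeline parallelism with GPU-style
# batched data parallelism. All numbers are invented for illustration.
stage_time  = 10e-6   # seconds each processing step takes
stages      = 4       # steps every packet must pass through
arrival_gap = 10e-6   # a new packet arrives every 10 microseconds
batch_size  = 64      # the batched device waits for a full batch before it runs

# FPGA-style pipeline: one hardware stage per step, so each packet leaves
# 'stages * stage_time' after it arrives, regardless of what else is in flight.
fpga_latency = stages * stage_time

# GPU-style batching: the first packet of a batch must wait for the remaining
# (batch_size - 1) packets to arrive, then the whole batch is processed and
# all results are emitted together.
gpu_latency = (batch_size - 1) * arrival_gap + stages * stage_time

print(f"FPGA pipeline latency : {fpga_latency * 1e6:.0f} us")   # 40 us
print(f"GPU batched latency   : {gpu_latency * 1e6:.0f} us")    # 670 us
```

Throughput of the two devices can be similar; the difference the sketch highlights is per-packet latency for streaming workloads.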
Communication tasks: FPGA is used to process communication-intensive tasks without being restricted by the network card. It performs better than CPU solutions in terms of packet throughput and latency, and has stronger latency stability.
Communication-intensive tasks: Symmetric encryption, firewalls, network virtualization and other operations are communication-intensive computing tasks. Communication-intensive data processing is less complex than computing-intensive data processing and is easily limited by communication hardware equipment.
Advantages of FPGA in executing communication-intensive tasks:
1. Throughput advantage:
The CPU solution must receive data through a network card when processing communication-intensive tasks, and is easily limited by network-card performance (few network cards can process 64-byte packets at line rate, and the number of PCIe network-card slots per CPU and motherboard is limited).
The GPU solution (high computing performance) lacks network ports to process data packets for communication-intensive tasks, and needs to rely on network cards to collect data packets. The data throughput is limited by the CPU and network card, and the latency is relatively long.
An FPGA can connect directly to 40 Gbps or 100 Gbps network links and process packets of all sizes at line rate, which also reduces the cost of network cards and switches.
2. Latency advantage:
The CPU solution collects data packets through the network card and sends the calculation results to the network card. Due to the performance limitations of the network card, under the DPDK packet processing framework, the CPU processing communication-intensive tasks has a latency of nearly 5 microseconds, and the CPU latency stability is weak. Under high load conditions, the latency may exceed tens of microseconds, causing uncertainty in task scheduling.
Because the FPGA needs no instruction stream, it can guarantee stable, extremely low latency; an FPGA-CPU heterogeneous arrangement extends FPGA solutions to more complex terminal devices.
Deployment method: FPGA deployment includes cluster and distributed deployment, gradually transitioning from centralized to distributed. Under different deployment methods, server communication efficiency and fault conduction effects vary.
FPGA power-consumption burden when embedded: embedding FPGAs adds little to a server's overall power consumption. For example, in Microsoft's Catapult project for FPGA-accelerated machine translation, the acceleration modules' aggregate computing power reached 103 Tops/W, equivalent to the computing power of 100,000 GPUs; by comparison, embedding a single FPGA increases a server's overall power consumption by only about 30 W.
Features and limitations of FPGA deployment methods:
1. Cluster deployment, characteristics and limitations: FPGA chips form a dedicated cluster, in effect a supercomputer built from FPGA accelerator cards (for example, early Virtex-series experimental boards carried 6 FPGAs on a single board, and one server held 4 such boards).
• Dedicated cluster mode cannot achieve communication between FPGAs on different machines;
•Other machines in the data center need to send tasks to the FPGA cluster, which may cause network delays;
• Single point failures limit the overall acceleration capabilities of the data center
2. Distributed deployment with network cable connection: To ensure the homogeneity of data center servers (which cannot be met by ASIC solutions), this deployment solution embeds FPGAs in different servers and connects them through a dedicated network, which can solve problems such as single-point fault transmission and network delay.
•Similar to the cluster deployment mode, this mode does not support communication between FPGAs on different machines;
•Servers equipped with FPGA chips are highly customized and have high operation and maintenance costs.
3. Shared server network deployment: In this deployment mode, the FPGA is placed between the network card and the switch, which can greatly improve the acceleration network function and realize storage virtualization.
FPGA sets up a virtual network card for each virtual machine, and the data plane function of the virtual switch is moved to the FPGA, eliminating the need for the CPU or physical network card to participate in the process of sending and receiving network data packets.
The solution significantly improves virtual-machine network performance (to 25 Gbps) and cuts data-transmission network latency by roughly a factor of 10.
Shared deployment: In the shared server network deployment mode, FPGA accelerators help reduce data transmission latency, maintain data center latency stability, and significantly improve virtual machine network performance.
FPGA acceleration of Bing search ranking in the shared server-network deployment mode: in this mode, Bing search ranking uses 10 Gbps dedicated network links for communication, with each group consisting of 8 FPGAs: some extract signal features, some compute feature expressions, and some compute document scores, ultimately forming a "Robot as a Service" (RaaS) platform. Under the FPGA acceleration solution, Bing search latency drops sharply and latency stability follows a normal distribution; in this deployment mode, the remote-FPGA communication latency is negligible relative to the search latency.
Azure's FPGA server deployment model: Azure adopts the shared server-network FPGA deployment model to address the high cost of network and storage virtualization. As network speeds reach 40 Gbps, the CPU cost of network and storage virtualization soars (each CPU core can handle only about 100 Mbps of virtualized throughput). By deploying FPGAs between the network cards and the switches, the network connection is extended across the entire data center; with a lightweight transport layer, latency within the same server rack is kept under 3 microseconds, and latency to any FPGA rack in the same data center under 20 microseconds.
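The 100 Mbps-per-core figure above makes the economics easy to check with one line of arithmetic. In the sketch below, the per-core throughput comes from the paragraph, while the core count per server is an assumed value.

```python
# Back-of-envelope: CPU cost of software network virtualization at 40 Gbps,
# using the ~100 Mbps-per-core figure cited above.
line_rate_mbps   = 40_000   # 40 Gbps network connection
per_core_mbps    = 100      # throughput one CPU core can virtualize in software
cores_per_server = 32       # assumed core count per server (illustrative)

cores_needed = line_rate_mbps / per_core_mbps
print(f"{cores_needed:.0f} cores of pure packet processing "
      f"(~{cores_needed / cores_per_server:.0f} whole servers' worth of CPUs) "
      "versus one FPGA sitting between the NIC and the switch")
# -> 400 cores, i.e. roughly a dozen servers consumed just by virtualization
```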
Acceleration layer: Relying on the advantages of high bandwidth and low latency, FPGA can form the data center acceleration layer between the network switching layer and the server software, and achieve super-linear performance improvement as the scale of distributed accelerators expands.
Data-center acceleration layer: FPGAs are embedded in the data center's acceleration plane, located between the network switching layer (rack-level and higher-layer switches) and traditional server software (software running on the CPU).
Advantages of the acceleration layer:
•The FPGA acceleration layer is responsible for providing network acceleration and storage virtualization acceleration support for each server (providing cloud services). The remaining resources of the acceleration layer can be used for computing tasks such as deep neural networks (DNNs).
•As the number of FPGA accelerators in the distributed network grows, the performance improvement for neural networks shows superlinear characteristics.
Principle of the acceleration layer's performance gain: with a single FPGA, the on-chip memory of one die cannot hold the full model, so weights must be fetched continuously from DRAM, and DRAM performance becomes the bottleneck. The acceleration layer instead uses a large number of FPGAs, each supporting a single layer (or part of a layer) of the neural-network model; the weights then fit entirely in on-chip memory, breaking the DRAM bottleneck and letting the FPGA's compute performance be fully exploited (a back-of-envelope illustration follows). The acceleration layer must avoid splitting computing tasks too finely, which would unbalance computation and communication.
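A rough, roofline-style calculation shows why keeping the weights in on-chip memory matters. Every figure below (weights touched per inference, DRAM bandwidth, aggregate on-chip SRAM bandwidth) is an assumed order-of-magnitude value for illustration, not data from the article.

```python
# Back-of-envelope for the DRAM-bottleneck argument above: if every weight of
# a neural-network layer must be fetched from DRAM per inference, bandwidth
# (not compute) sets the speed limit. All figures below are assumptions.
weights_mb  = 200    # weights touched per inference (MB)
dram_gbps   = 20     # usable DRAM bandwidth of one FPGA card (GB/s)
onchip_tbps = 5      # aggregate on-chip SRAM bandwidth across many FPGAs (TB/s)

dram_bound_time   = (weights_mb / 1e3) / dram_gbps            # seconds per inference
onchip_bound_time = (weights_mb / 1e3) / (onchip_tbps * 1e3)

print(f"DRAM-bound   : {dram_bound_time * 1e3:.1f} ms per inference")    # 10.0 ms
print(f"SRAM-resident: {onchip_bound_time * 1e6:.0f} us per inference")  # 40 us
# Spreading the model across many FPGAs lets the weights stay in on-chip SRAM,
# which is the acceleration-layer effect described above -- provided the split
# is not so fine that communication starts to dominate.
```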
eFPGA: Embedded eFPGA technology is superior to traditional FPGA embedded solutions in terms of performance, cost, power consumption, profitability, etc., and can provide flexible solutions for different application scenarios and different market segments.
eFPGA technology drivers:
The economic trend of increasing design complexity and decreasing equipment costs has triggered market demand for eFPGA technology.
Increased device design complexity: the software tools involved in SoC design and implementation are becoming more complex (for example, Imagination Technologies provides the PowerVR graphics interface and an Eclipse-based integrated development environment to meet customers' demand for complete development solutions); engineering time is growing (compilation, synthesis, and mapping times: the larger the FPGA, the longer the compile); and unit costs are rising (an FPGA chip can cost on the order of 100 times an ASIC of the same specification).
The unit functional cost of equipment keeps falling: at the end of the 20th century, the average selling price of an FPGA was high (over 1,000 yuan), and in the traditional model, integrating an FPGA alongside an ASIC increased the ASIC's die area and package size, raised complexity, and made early hybrid devices expensive. In the 21st century, FPGAs are used more for prototyping and pre-production design than in mass-produced hybrid devices, their cost has kept falling relative to traditional integration (down to roughly 100 yuan at the low end), and their application is flexible.
eFPGA technology advantages:
1. Better quality: The SoC design of eFPGA IP core and other functional modules performs better in terms of power consumption, performance, size, cost, etc. compared with traditional FPGA embedded ASIC solutions.
2. More convenient: downstream application demands change quickly, and the reprogrammability of eFPGA helps design engineers update the SoC, so products stay competitive in the market for longer and revenue and profitability improve markedly. Under an eFPGA solution the SoC can operate efficiently while being quickly updated to support new interface standards on the one hand, and quickly gaining new functions to serve segmented markets on the other.
3. More energy-efficient: Embedding eFPGA technology in SoC design can improve overall performance while reducing overall power consumption. Using the reprogrammable nature of eFPGA technology, engineers can reconfigure solutions for specific problems based on hardware, thereby improving design performance and reducing power consumption.
Cloud computing: FPGA technology does not rely on instructions or shared memory, and provides low-latency streaming communication capabilities in cloud computing network interconnection systems, which can widely meet the acceleration needs between virtual machines and processes.
FPGA cloud-computing task-execution flow: mainstream data centers use FPGAs as accelerator cards for computing-intensive tasks. Xilinx and Altera have launched OpenCL-based high-level programming models. In this model, the CPU writes the task data to DRAM and notifies the FPGA to execute; the FPGA completes the computation and writes the results to DRAM, from which they finally return to the CPU.
Room for FPGA cloud-computing performance upgrades: limited by current engineering practice, data-center FPGA-CPU communication is mostly mediated by DRAM, through a sequence of writing DRAM, starting the kernel, and reading DRAM back (FPGA-attached DRAM transfers data more slowly than CPU DRAM), with latency approaching 2 milliseconds (OpenCL, with memory shared between multiple kernels). There is room to improve CPU-FPGA communication latency: with PCIe DMA, efficient direct communication is possible, with latency as low as about 1 microsecond (a latency-budget sketch follows).
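The latency-budget sketch below compares the two paths. The roughly 2 ms and 1 microsecond round-trip totals come from the paragraph above; the per-step breakdown inside each path is an illustrative assumption.

```python
# Latency budget sketch for the two CPU <-> FPGA communication paths described
# above. The ~2 ms and ~1 us totals come from the text; the per-step breakdown
# is an illustrative assumption.
opencl_via_dram = {
    "CPU writes task data into FPGA-side DRAM": 900e-6,
    "host starts the OpenCL kernel":            100e-6,
    "kernel runs and writes results to DRAM":   600e-6,
    "CPU reads results back from DRAM":         400e-6,
}
direct_pcie_dma = {
    "CPU posts descriptor, PCIe DMA moves data": 0.5e-6,
    "FPGA computes and DMAs the result back":    0.5e-6,
}

for name, path in (("OpenCL via DRAM", opencl_via_dram),
                   ("Direct PCIe DMA", direct_pcie_dma)):
    total = sum(path.values())
    print(f"{name}: {total * 1e6:.1f} us round trip")
# -> roughly 2000 us versus 1 us, matching the figures cited in the text
```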
New FPGA cloud-computing communication and scheduling model: under the new communication model, the FPGA and CPU no longer rely on a shared-memory structure, and the acceleration units communicate with host software at high speed through pipes. Cloud data-center tasks are relatively uniform and highly repetitive, consisting mainly of virtual-platform networking and storage (communication tasks) and machine learning plus symmetric and asymmetric encryption and decryption (computing tasks), with fairly complex algorithms. Under the new scheduling model, CPU computing tasks tend to become fragmented; in the long run, cloud-platform computing centers may be built primarily around FPGAs, with complex computing tasks offloaded from the FPGA to the CPU (the reverse of the traditional model, in which the CPU offloads tasks to the FPGA).
Competition among global FPGA manufacturers
Competition in the global FPGA chip market is highly concentrated, with leading manufacturers occupying the "air supremacy". New entrants are providing momentum for industry development through product innovation, and the demand for the intelligent market may push FPGA technology into the mainstream.
The global FPGA market is dominated by four giants: Xilinx, Intel (acquired Altera), Lattice, and Microsemi. The four major manufacturers monopolize more than 9,000 patented technologies and hold the "air supremacy" in the industry.
By the end of 2018, Xilinx ranked first in the global FPGA market (49%), Intel (Altera) accounted for more than 30%, and Lattice and Microsemi accounted for more than 5% of the global market. By comparison, Chinese manufacturers as a whole held less than 3% of the global FPGA market.
Since the formation of the FPGA chip industry, more than 70 companies have participated in the competition worldwide, and new start-ups have emerged in an endless stream (such as Achronix Semiconductor, MathStar, etc.). Product innovation provides momentum for the development of the industry. In addition to traditional programmable logic devices (pure digital logic properties), the innovation speed of new programmable logic devices (mixed signal properties, analog properties) has accelerated. For example, Cypress Semiconductor has developed a configurable mixed signal circuit PSoC (Programmable System on Chip), and Actel has launched Fusion (programmable mixed signal chip). In addition, some start-ups have launched field programmable analog arrays FPAA (Field Programmable Analog Array), etc.
As demand in the intelligent-hardware market changes and evolves, the market risk of highly customized chips (SoC ASICs) has risen sharply because of their large non-recurring investment and long R&D cycles. FPGAs, by contrast, have advantages in parallel computing tasks and can replace some ASICs in high-performance, multi-channel applications. Demand for multi-channel computing in artificial intelligence is pushing FPGA technology toward the mainstream.
Given the FPGA's advantages at smaller production volumes (on the order of 50,000 units, below which a tape-out is hard to justify) and in multi-channel special-purpose computing equipment (radar, aerospace equipment), some downstream application markets are replacing ASIC solutions with FPGAs.
Driving factors of China's FPGA chip industry
The construction of 5G communication system increases the demand for FPGA chips
Communication scenarios are the most widely used scenarios for FPGA chips in the downstream of the industrial chain (accounting for about 40%). With the development of 5G communication technology and the upgrading of hardware equipment (base station antenna transceiver innovation), FPGAs are driven by strong market demand.
The large-scale commercial use of 5G communications is imminent, which will drive the use of FPGA chips and release room for price increases.
New base station antenna transceiver uses FPGA chip
Under the technical conditions of Massive MIMO base stations in the 5G era, the number of base station transceiver channels has been increased from 16T16R (dual-mode solution) to a maximum of 128T128R, and FPGA chips can be used to implement multi-channel signal beamforming. For example, the 64-channel millimeter-wave MIMO full DBF transceiver intermediate frequency and baseband subsystems use Xilinx Kintex-7 series FPGAs. The intermediate frequency and baseband subsystems are superimposed to achieve universal wireless access functions.
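At its core, multi-channel beamforming is a per-channel complex multiply followed by a sum across antennas, exactly the kind of fixed, highly parallel arithmetic an FPGA maps well onto. The Python sketch below shows narrowband receive beamforming for one beam; the channel count, element spacing, and steering angle are arbitrary illustration values, not parameters of the transceiver described above.

```python
import numpy as np

# Minimal narrowband receive-beamforming sketch for a multi-channel transceiver:
# one complex weight per antenna channel, then a sum across channels.
channels  = 64        # e.g. one 64-channel transceiver subsystem (example value)
spacing   = 0.5       # element spacing in wavelengths (example value)
steer_deg = 20.0      # direction the beam should point toward (example value)

n = np.arange(channels)
phase = 2 * np.pi * spacing * n * np.sin(np.radians(steer_deg))
weights = np.exp(1j * phase)   # steering vector used as the beamforming weights

# Fake per-channel baseband snapshot: a unit signal arriving from the steering
# direction plus a little noise (in a real system this comes from the ADCs).
rng = np.random.default_rng(0)
snapshot = np.exp(1j * phase) + 0.1 * (rng.standard_normal(channels)
                                       + 1j * rng.standard_normal(channels))

# Beamformed output sample: conjugate the weights, multiply, accumulate.
beam_sample = np.vdot(weights, snapshot) / channels
print(abs(beam_sample))   # close to 1.0 -> the array is "listening" toward 20 degrees
```

On an FPGA, each channel's multiply-accumulate would be laid out as parallel DSP blocks so that all channels are combined every sample clock.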
Industry experts with more than 10 years of experience in product development and algorithm research in the FPGA chip industry said that FPGA has advantages over CPU and GPU in terms of power consumption and computing speed. Communication equipment companies will increase the application of FPGA devices in core equipment such as base station antenna transceivers (for example, leading mobile communication equipment manufacturer Comba Communication has embedded FPGA chips in its new transceiver products).
The global FPGA communications market is growing rapidly
As of the end of 2018, the global FPGA communication market accounted for nearly 45% of the overall application market. From 2020 to 2025, the global FPGA communication market size is expected to grow at a compound annual growth rate of nearly 10%.
5G infrastructure will use FPGA devices as core components
The growth of the 5G communication market is certain. Related infrastructure (computer rooms, macro base stations, micro base stations, etc.) has penetrated multiple fields such as the Internet of Things and edge computing. 5G infrastructure projects use FPGA as a core component, which has promoted the release of FPGA price increases.
•In the next 10 years, the number of small base stations may exceed 10,000, and the number of base stations will drive the increase in the use of FPGA devices.
•5G MIMO base stations face the need for high-concurrency data processing, and the overall FPGA usage of a single base station has increased (from 2 to 3 pieces in the 4G era to 4 to 5 pieces in the 5G era).
•At present, the average price of FPGA for base stations is less than 100 yuan, and factors such as increased technological complexity are pushing prices higher (> 100 yuan).
Large-scale commercial use of autonomous driving increases demand for mass production
In the field of autonomous driving, ADAS systems, sensor systems, in-vehicle communication systems, entertainment information systems and other sectors have generated incremental demand for FPGA chip products, and the world's leading FPGA manufacturers are actively deploying in the autonomous driving track.
FPGA giant is optimistic about the autonomous driving track
As of the end of 2018, the global automotive semiconductor industry market size was close to US$40 billion, of which FPGA applications in the automotive semiconductor field accounted for only about 2.5%. Autonomous driving systems place higher demands on on-board chips, and the demand for main control chips has expanded from traditional GPUs to ASIC, FPGA and other chip types. At this stage, the application of FPGA chips in hardware devices such as on-board cameras and sensors has become mature. In addition, thanks to programming flexibility, FPGA chips are widely used in the field of lidar. Autonomous driving vehicles are highly dependent on hardware devices such as sensors and cameras and software systems such as in-vehicle networks, and there is a significant demand for the number of FPGA chips. Leading FPGA manufacturers (such as Xilinx) have seized the intelligent driving track and gradually increased cooperation with car companies and Internet of Vehicles companies. As of the end of 2018, the number of models embedded with Xilinx FPGA solutions has expanded to 111.
FPGA has a wide range of applications in the field of autonomous driving systems
In the field of autonomous driving, FPGA chips can be applied to ADAS systems, LiDAR, automatic parking systems, motor control, in-vehicle infotainment systems, driver-information systems, and other areas, giving them a wide range of applications. Take the Magic Vision intelligent automatic parking system as a concrete example: the system connects the FPGA chip to the in-vehicle CAN bus, to communication components such as Bluetooth and SD cards, and, through MCUs, to cameras and sensor devices. FPGA giant Xilinx is actively expanding in the ADAS field. In the long run, ADAS systems will become more complex (front-view cameras, driver-monitoring cameras, surround-view cameras, short-range radar, long-range LiDAR, and so on), which will drive FPGA usage. Around 2025, autonomous driving will enter the stage of large-scale commercialization, and the integration of FPGAs with automotive electronics and in-vehicle software systems will continue to advance.
Constraints of China's FPGA Chip Industry
FPGA design talent team is lacking
The threshold for FPGA chip design is high (higher than CPU, memory, and DSP). Chinese local manufacturers started late and are in the early stages of building an industrial ecosystem, with a weak foundation in talent resource reserves.
Compared with the international market, China's FPGA chip design talent pool is insufficient
China's FPGA talent pool is about 1/10 of that in the United States
According to the "China Integrated Circuit Industry Talent White Paper" released by the China International Talent Exchange Foundation and other institutions, by the end of 2018 China's integrated-circuit industry employed about 400,000 people, industry demand is expected to exceed 700,000 by 2020, and the talent gap exceeds 300,000. In the FPGA sector, leading American manufacturers such as Intel, Xilinx, and Lattice, together with universities and research institutions, have nearly 10,000 specialists. By comparison, China lacks FPGA design and R&D talent: leading manufacturers such as Unigroup Tongchuang, Gowin Semiconductor, and Anlu Technology each average fewer than 200 R&D staff, and the industry's overall talent pool is under 1,000, which has become a core factor constraining the technological development and product upgrades of China's FPGA chip industry.
The industry started late and there is a lack of linkage between industry, academia and research institutes
China's FPGA industry started in 2000, while the United States had a background of research and development since the 1980s. In 2010, China's FPGA chips were put into mass production. American universities and chip manufacturers have close cooperation and transfer a large amount of technology to enterprises. In comparison, Chinese enterprises lack experience in cooperation with universities and other research institutions, and the industry-university-research cooperation is insufficient. Most of the core talents in the industry are introduced from overseas.
Lack of R&D capabilities restricts corporate growth
The world's leading FPGA manufacturers, relying on accumulated patents, talent cultivation, and a roughly 20-year head start over Chinese companies, firmly occupy the industry's first tier globally. The FPGA industry has a high entry barrier, and it is difficult for leading Chinese companies to gain a latecomer advantage. At this stage, Xilinx has entered the development phase for 7-nanometer, billion-gate high-end FPGA products, while leading Chinese manufacturers such as Tsinghua Unigroup and Gowin Semiconductor have begun work on 28-nanometer, ten-million-gate-class (around 70 million gates) mid- and high-density FPGAs, about 2 to 3 generations behind the world's top level and in urgent need of talent resources.
China's FPGA chip industry policies and regulations
Policy Analysis
In order to further guide the orderly development of the FPGA industry and highlight the strategic position of the integrated circuit industry, national policy departments have integrated industry, market, and user resources to create a policy foundation for Chinese integrated circuit companies to develop towards the goal of becoming one of the world's first-tier companies.
Since the 12th Five-Year Plan, the country has emphasized the status of the integrated circuit industry as a leading industry, and paid more attention to the driving force of chip technology development on the transformation and upgrading of industrial manufacturing and the development of information technology. The country has introduced a number of favorable policies from the perspectives of market demand, supply, industrial chain structure, and value chain.
Development Trend of China's FPGA Chip Industry
FPGA chip design complexity continues to increase
From 2016 to 2018, the share of high-performance, high-security programmable chip design projects in the global FPGA R&D field rose, and FPGA design complexity has increased; the growth in security-feature design is one example.
As the demand for security features increases, the design complexity of high-performance FPGA chips increases
Increase in safety-critical standards and guidelines
The increase in demand for safety features can be reflected in the increase in safety-critical standards and guidelines. In 2016 and before, most FPGA development projects were based on one safety-critical standard. In 2018 and beyond, more FPGA R&D projects were developed based on one or more safety-critical standards and guidelines.
Safety assurance hardware module design projects increased
Security assurance hardware module designs are mostly used in encryption keys, digital rights management keys, passwords, biometric reference data, etc. Compared with 2016, the proportion of global FPGA security feature module design projects increased significantly in 2018 (an increase of more than 5%). The improvement of security features increases the demand for design verification and the complexity of verification.
Other design projects increase chip verification complexity
① Increase in the number of embedded processor cores: Compared with 2016, more FPGA designs tended to be SoC-class designs in 2018. In 2018, more than 40% of FPGA designs contained 2 or more embedded processors, and nearly 15% of FPGA designs contained 4 or more embedded processors. SoC-class designs increase the complexity of the verification process.
② Increase in the number of asynchronous clock domains: In 2018, approximately 90% of FPGA design projects contained two or more asynchronous clock domains. The verification requirements for multiple asynchronous clock domains increased the verification workload (verification models tended to be more complex and code exceptions increased).
Widely used in machine learning reinforcement projects
Medical diagnosis, industrial vision and other fields have a growing demand for machine learning and are facing challenges brought by the evolution of neural networks. Compared with CPUs and GPUs, FPGA technology is more adaptable to non-fixed and non-standard design platforms, and is more integrated with machine learning.
FPGA chips are more suitable for non-fixed and non-standard machine learning evolution environments
FPGAs excel in machine learning
• For performance comparison, please refer to Xilinx public test results
Regarding the performance of GPU and FPGA in the field of machine learning, Xilinx has published the benchmark comparison results of reVISION series FPGA chips and NVIDIA Tegra X1 series GPU chips. The data shows that the FPGA solution is 6 times better than the GPU solution in terms of image capture speed per unit power consumption, and 42 times better than the GPU solution in terms of computer vision processing frame rate. At the same time, the FPGA latency is 1/5 of the GPU latency.
• Energy efficiency comparison between Xilinx FPGA and Intel chip
Compared with Intel's Arria 10 SoC series devices, Xilinx FPGA devices can improve the computing efficiency of deep learning and computer vision by 3 to 7 times.
Enterprises adopt new architecture (visual data transmission to FPGA-accelerated edge server clusters)
•FPGA optimized for stream processing
FPGA solutions can optimize stream processing (one of the big data processing technologies) for video analysis and deep learning reasoning. Based on its flexible and programmable characteristics, FPGA solutions can meet reconfiguration requirements and are suitable for common models such as inventory management, fraud control, and facial recognition, as well as complex models such as tracking, natural language interaction, and emotion detection.
• Startups actively adopt FPGA solutions
Startups such as Megh Computing and PointR.ai are actively adopting FPGA solutions to establish new video data processing architectures, leveraging the advantages of compact, low-power computing modules.
References: TouBao Research Institute, YuShi Capital Research Institute