Source: Translated from ZDNet by Semiconductor Industry Observer.
The field of AI chips is booming, with innovations coming from many startups alongside the established vendors' products. You may not have heard of NeuReality before, but after today you are likely to hear more about it.
NeuReality is a startup founded in Israel in 2019. Today it announced the launch of the NR1-P, a novel AI-centric inference platform. That is a bold claim for a previously unknown company, and one it reached in a short time, even if the NR1-P is only the first of several planned implementations.
ZDNet recently reached out to NeuReality CEO and co-founder Moshe Tanach to learn more.
Founded by Industry Veterans
Tanach has more than 20 years of experience in the semiconductor and systems space, working on solutions ranging from computing and wireless to data center networking and storage.
He and his co-founders, Tzvika Shmueli, VP of Operations, and Yossi Kasus, VP of VLSI, go back a long way and between them bring impressive experience from key industry positions.
NeuReality's founders have extensive experience at companies such as Habana Labs, Intel, Marvell, and Mellanox, and Xilinx is an important partner as well, Tanach explained. At this point, the NR1-P is a prototype implemented on Xilinx FPGAs; the ultimate goal is to implement it as a system on chip (SoC).
NeuReality has already begun demonstrating the NR1-P to customers and partners, though it has not revealed their names. The company says the prototype platform lets it validate its technology and lets customers integrate it into their data centers and other facilities.
Tanach summarized NeuReality's philosophy as designing systems and semiconductors from the outside in: "You need to understand the system. Look at what Qualcomm does: they build phones and base stations in order to make the best phone chip."
From the beginning, NeuReality chose to focus solely on inference workloads. As Tanach points out, training AI models has received a great deal of attention and has produced very expensive computer systems that excel at training.
But when you push AI into real-life applications, you need to care about how models are deployed and used, which is where inference comes in. If you run inference on those same expensive systems, the cost of each AI operation stays high, and it is difficult to solve both problems at once.
This philosophy is one reason they brought Dr. Naveen Rao, former general manager of Intel's AI products group, onto the NeuReality board. Rao founded Nervana, which Intel acquired in 2016; while at Intel, he oversaw two product lines, one for training and one for inference.
Uncertainty in computing
As Tanach puts it, Rao appreciates NeuReality’s “fresh perspective.” But what exactly does that mean? NR1-P relies heavily on FPGA solutions, so the partnership with Xilinx is important. Tanach points out that Xilinx is more than just programmable logic and FPGAs:
“When you look at how their advanced FPGAs are built, they are a system on a chip. They have an ARM processor built into their latest Versal ACAP technology. They also integrate an array of VLIW engines that you can program. We can build a very powerful 16-card server chassis.”
NeuReality implemented the NR1-P in a Xilinx FPGA, so they didn’t have to manufacture anything – just build the chassis. As Tanach noted, they worked with Xilinx and came up with an autonomous inference engine that’s implemented inside the FPGA. An SoC is in development and will be available in early 2022.
This means the NR1-P is not targeting embedded devices, since using FPGAs there is impractical. Even with its SoCs, NeuReality will continue to target near-edge solutions:
“Edge devices require more optimized solutions that are specifically designed for the needs of the device. You need processing in microwatts or milliwatts, or less than 50 milliwatts. There is, however, uncertainty in computing. The current trend is to push more and more application computing to the cloud, but we are starting to see the pendulum swing back.
Look at the deal Microsoft made with AT&T to build a number of data centers in AT&T facilities in the US to bring more computing power closer to the edge. Many IoT devices will not be able to embed AI capabilities for cost and power reasons, so they will need a computing server closer to their location to serve them. Going all the way to the cloud and back introduces high latency.”
Object-oriented hardware architecture
Tanach said NeuReality's "secret sauce" is simple in concept:
Other deep learning accelerators may do a good job of offloading neural network processing from applications, but they are PCIe devices: they have to be installed in servers throughout the data center, and they cost a lot.
The CPU remains the center of such a system; even when work is offloaded, the CPU runs the drivers for the device. This is not the case with NeuReality. The NR1-P is an autonomous device connected directly to the network. All data path functions are implemented in hardware, so none of them needs to run in software, eliminating this bottleneck along with the need for additional devices. Tanach calls this object-oriented hardware:
“The main object here is the AI compute engine. For a long time, we have been using object-oriented software, which changed the way we write things. We wrapped the main objects with the required functionality. Now it’s time to develop hardware to do the same. If you want to invest in an AI compute engine, make it a top priority.”
Another topic Tanach touched on was communication protocols. He pointed out that inference solutions such as Nvidia's use REST APIs, which makes network communication very expensive. NeuReality has alternative implementations that it will disclose later.
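Tanach's cost point can be made concrete with a small sketch. The payload below only loosely follows the general shape of common HTTP inference APIs, and the tensor name and values are hypothetical; the point it illustrates is that serving inference over REST means JSON-encoding every tensor on every request, inflating each message well beyond the raw data it carries and adding serialization work on both ends:

```python
import json

def build_inference_request(input_values):
    # Illustrative REST-style inference payload: the tensor must be
    # serialized to JSON text on every request, then parsed again on
    # the server before any computation happens. This per-request
    # overhead is the kind of cost a network-attached inference
    # device could avoid with a leaner wire protocol.
    payload = {
        "inputs": [{
            "name": "input_0",                 # hypothetical tensor name
            "shape": [1, len(input_values)],
            "datatype": "FP32",
            "data": input_values,
        }]
    }
    return json.dumps(payload)

body = build_inference_request([0.1, 0.2, 0.3])
# The JSON text is several times larger than the 12 bytes of raw FP32 data.
print(len(body), "bytes of JSON for", 3 * 4, "bytes of tensor data")
```

In a real deployment the same serialization tax is paid per request at data-center scale, which is why the choice of wire protocol matters for cost per inference.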
Last but not least, elasticity and utilization matter in cloud data centers, and Tanach said existing deep learning accelerators are left out of that equation: connecting to Kubernetes and communicating with the orchestrator is all done on the CPU hosting the accelerators. NeuReality integrates these functions into its device.
Tanach added that all of this makes the cost of AI inference operations very low, in terms of both capital and operating expenditure. For now, the FPGA version can be used in data centers and in places like 5G base stations where power budgets are limited. There will be two types of SoC: one for data centers, and another with reduced cost and power specifications to serve nodes closer to the edge.
NeuReality claims 15 times better performance per dollar than the GPUs and ASICs of deep learning accelerator vendors. Asked to substantiate this claim, Tanach said the company uses MLPerf as the basis for internal benchmarking, and added that NeuReality will share proposed updates to MLPerf soon.
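To be clear about what such a claim measures: performance per dollar is simply throughput divided by system cost, so a 15x advantage can come from higher throughput, lower cost, or a mix of both. The numbers below are entirely hypothetical (NeuReality has not published pricing or throughput figures); the sketch only shows the arithmetic behind the metric:

```python
def perf_per_dollar(inferences_per_sec: float, system_cost_usd: float) -> float:
    """Inference throughput per dollar of system cost (illustrative metric)."""
    return inferences_per_sec / system_cost_usd

# Hypothetical numbers for illustration only; not vendor figures.
gpu_server = perf_per_dollar(inferences_per_sec=10_000, system_cost_usd=30_000)
nr1_system = perf_per_dollar(inferences_per_sec=50_000, system_cost_usd=10_000)
advantage = nr1_system / gpu_server
print(f"relative performance per dollar: {advantage:.0f}x")  # -> 15x
```

Benchmarks like MLPerf standardize the throughput side of this ratio; the cost side still depends on each vendor's pricing, which is why such claims are hard to verify externally.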
In addition to delivering its SoC, NeuReality is also working on its software stack. The goal is to work with whatever machine learning framework people use, whether PyTorch, TensorFlow, or anything else. Tanach noted that ONNX makes that much easier, and NeuReality is investing in software.
He went on to add that the future of AI compute offloading is to offload the entire pipeline, and NeuReality's software stack will support the computational graph representations that enable this. On the customer side, NeuReality is targeting three market segments:
Hyperscalers and the next wave of cloud service providers; solution providers that build data centers for clients such as the military, government, and financial industries; and, last but not least, OEMs.
Today’s report follows NeuReality’s emergence from stealth in February 2021, when it raised $8 million in seed funding. Granted, it is still early days for NeuReality, but the company’s background and progress make it worth keeping an eye on.