Source: compiled by Semiconductor Industry Observer (ID: icbank) from "allaboutcircuit", thank you.
When it comes to AI/ML classification, developers often turn to GPU-based accelerators rather than general-purpose processors (CPUs). These developers must make a significant investment in specialized hardware, yet the value of that hardware often drops when the next generation of algorithms emerges.
NVIDIA touts that its V100 has 32 times the training throughput of a regular CPU.
ThirdAI, a startup dedicated to reducing the cost of AI deep learning, says they have a better way.
ThirdAI recently raised $6 million to further its own deep learning approach. After spinning out of Rice University, the company launched SLIDE (Sub-Linear Deep Learning Engine), an algorithm deployed on general-purpose CPUs that aims to counter the dominance of GPUs.
The SLIDE algorithm is said to achieve faster training results on a CPU than on a hardware accelerator such as the NVIDIA V100. What do higher-performance CPUs mean for next-generation GPUs?
ThirdAI was co-founded by Associate Professor Anshumali Shrivastava, and its success stems from research at Rice University.
Anshumali Shrivastava, co-founder of ThirdAI
Shrivastava, who has a background in mathematics, has long been interested in artificial intelligence and machine learning, especially in rethinking how AI can be developed more efficiently. While at Rice University, he studied ways to make deep learning more efficient. He founded ThirdAI in April with several Rice graduate students.
Shrivastava said ThirdAI’s technology aims to be “a smarter approach to deep learning,” using algorithmic and software innovations to make general-purpose central processing units (CPUs) faster than graphics processing units at training large neural networks. Many companies abandoned CPUs for this work several years ago in favor of GPUs, which can render high-resolution images and video much more quickly. The downside, he added, is that GPUs do not have much memory, and users often run into bottlenecks when trying to develop AI.
“When we look at the deep learning landscape, we see that most of the technology is from the 1980s, and most of the market (about 80%) uses graphics processing units, but invests in expensive hardware and expensive engineers, and then waits for the magic of AI to happen,” he said.
He and his team looked at how artificial intelligence might be developed in the future and wanted to create a cost-effective alternative to graphics processing units. Their algorithm, the "Sublinear Deep Learning Engine," uses CPUs without the need for specialized acceleration hardware.
Swaroop "Kittu" Kolluri, founder and managing partner of Neotribe, said the technology is still in its early stages. He added that current methods are laborious, expensive and slow, and can run into problems if a company is running language models that require more memory, for example.
“That’s where ThirdAI comes in, where you can have your cake and eat it, too,” Kolluri said. “That’s why we want to invest. It’s not just compute, it’s memory, and ThirdAI is going to enable anyone to do it, which is going to be a game changer. As the technology around deep learning starts to get more sophisticated, anything is possible.”
Artificial intelligence is already at the stage where it can tackle some of the most difficult problems, such as those in health care and seismic processing, but he noted that the climate impact of running AI models is itself a problem.
“Training a deep learning model can be more expensive than owning five cars for a lifetime,” Shrivastava said. “We need to take these costs into account as we continue to scale AI.”
Initial university research showed results comparable to GPU hardware, but was hampered by cache thrashing. That’s when Intel stepped in. Shrivastava explained:
“They [Intel] told us they could work with us to make it train faster, and they were right. With their help, we got about 50 percent better results.”
SLIDE, or Sub-Linear Deep Learning Engine, is a "smart" algorithm that has the potential to replace hardware accelerators for large-scale deep learning applications. Ultimately, ThirdAI's goal is to squeeze more out of processors through algorithmic and software innovation.
SLIDE is said to deliver training results 3.5 times faster than the best available TensorFlow GPU setup and 10 times faster than TensorFlow on a CPU. Although the CPU used by the researchers is unnamed, it is described as a modest "44-core" CPU.
The 22-core Intel Xeon E5-2699 v4, with 22 cores and 44 threads, is the closest match to the unnamed processor used by the Rice University researchers. Regardless of the exact CPU, SLIDE claims to be a breakthrough algorithm for AI training. So how does it work?
At the most basic level, SLIDE uses sampled hash tables, specifically a modified form of Locality Sensitive Hashing (LSH), to quickly look up the IDs of the neurons to activate, rather than computing the entire network matrix one entry at a time. It combines this technique with another called adaptive dropout, which is used to improve classification performance in neural networks.
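To make the idea concrete, here is a toy sketch of LSH-based neuron sampling using SimHash-style random projections. This is illustrative only, not ThirdAI's actual implementation; the layer sizes, hash length, and all names are made-up assumptions. Each neuron's weight vector is hashed into a bucket once, and at lookup time the input is hashed the same way so that only colliding neurons are treated as active:

```python
import numpy as np

rng = np.random.default_rng(0)

DIM = 64          # input dimensionality (assumed)
N_NEURONS = 1024  # layer width (assumed)
N_BITS = 8        # hash code length -> 2**8 buckets (assumed)

# Random hyperplanes shared by every hash computation.
planes = rng.standard_normal((N_BITS, DIM))

def simhash(v):
    """Sign of projections onto random hyperplanes, packed into a bucket id."""
    bits = (planes @ v) > 0
    return int(np.packbits(bits, bitorder="little")[0])

# Build the hash table once: bucket id -> list of neuron ids.
weights = rng.standard_normal((N_NEURONS, DIM))
table = {}
for nid, w in enumerate(weights):
    table.setdefault(simhash(w), []).append(nid)

# Query: hash the input and compute activations only for the
# neurons in the matching bucket, not for the whole layer.
x = rng.standard_normal(DIM)
active = table.get(simhash(x), [])
activations = weights[active] @ x

print(f"{len(active)} of {N_NEURONS} neurons activated")
```

Because similar vectors tend to land in the same bucket, the neurons retrieved are roughly those whose weights align with the input, which is exactly the sparsification effect the paragraph above describes.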
Using hashing to sample specific neurons
Because it can query specific neurons, SLIDE is said to overcome a major limitation in AI deep learning: batch size.
SLIDE maintains its time advantage regardless of batch size
By combining multi-core CPU processing and optimizations with locality-sensitive hashing (LSH) and adaptive dropout, SLIDE achieves O(1), or constant, time complexity regardless of the batch size.
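A back-of-the-envelope cost model shows why this matters. With made-up layer sizes (these numbers are illustrative assumptions, not measurements from SLIDE), the hashed lookup costs a fixed number of operations per sample, so the speedup over a dense forward pass stays constant as the batch grows:

```python
DIM = 64          # input dimensionality (assumed)
N_NEURONS = 1024  # layer width (assumed)
N_BITS = 8        # hash code length (assumed)
AVG_ACTIVE = N_NEURONS // 2**N_BITS  # ~4 neurons per bucket on average

def dense_cost(batch):
    # Every neuron computes a dot product with every input.
    return batch * N_NEURONS * DIM

def hashed_cost(batch):
    # Fixed hashing cost plus dot products for the sampled neurons only.
    return batch * (N_BITS * DIM + AVG_ACTIVE * DIM)

for b in (32, 256, 2048):
    speedup = dense_cost(b) / hashed_cost(b)
    print(f"batch {b}: ~{speedup:.0f}x fewer multiply-adds")
```

The per-sample work depends on the hash length and bucket occupancy, not on the layer width or the batch size, which is the sub-linear behavior the name SLIDE refers to.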
Thanks to this design, the company has received investment from Neotribe Ventures, Cervin Ventures and Firebolt Ventures, which will be used to hire more employees and invest in computing resources.
Hardware accelerators are expensive, with high-end platforms costing more than $100,000 (compared with $4,115 for an E5-2699 v4). Despite the high cost, demand for high-performance GPUs has strengthened the hand of manufacturers such as NVIDIA.
However, as AI training datasets continue to grow, so do the matrix multiplications required to reach convergence. And when AI models change, investments in specialized hardware built for current models can quickly lose their value.
Finally, as cost comes to dominate engineering decisions, the ability to run industrial-scale deep learning on general-purpose processors could be something of a holy grail. If SLIDE continues to prove viable, companies like Intel may reap the rewards in the long term.