Former Director of Dell EMC China Research Institute Starts a Business: Launches AI Acceleration Virtualization Platform, Free for Developers
Lei Gang from Aofei Temple
Quantum Bit Report | Public Account QbitAI
Have you felt the awkward state of AI acceleration today?
Accelerators are used exclusively rather than virtualized, costs are high, heterogeneous accelerator management and scheduling are lacking, solutions are hard to build, and users are easily locked into a single vendor.
For AI developers, neither virtualized use of accelerator computing resources nor the existing scheduling and management software is friendly.
So now, several veterans in the field of virtualized computing have built an initial solution and officially released it on GitHub, free for developers to download and use.
This is the newly launched OrionAI computing platform.
AI Accelerator Virtualization
The entire OrionAI computing platform includes two major components: AI accelerator virtualization software and heterogeneous accelerator management and scheduling software.
Among them, the OrionAI accelerator virtualization software not only lets users use and share local accelerator resources, but also lets applications transparently use remote accelerator resources, all without modifying any code.
This breaks the physical boundaries of resource scheduling and builds a more efficient resource pool.
The heterogeneous accelerator management and scheduling software likewise allows user applications to run transparently on a variety of different accelerators without code changes.
Ultimately, it helps users better leverage the advantages of a variety of different accelerators and build a more efficient heterogeneous resource pool.
The newly launched OrionAI computing platform Community Edition v1.0 supports virtualization of NVIDIA GPUs and is available for trial by leading AI, internet, and public cloud customers. Developers can download and use it for free.
AI Acceleration Pain Points
Why did the OrionAI computing platform come about?
The platform's creators note that with the rapid development and popularization of AI technology, more and more customers are adopting high-performance AI accelerators, including GPUs, FPGAs, and AI ASIC chips.
At the same time, more and more customers need efficient AI accelerator virtualization software to improve the utilization of accelerator resources, as well as efficient heterogeneous accelerator management and scheduling software to better utilize a variety of different accelerators, improve performance, reduce costs, and avoid vendor lock-in.
With this, however, come the two major pain points mentioned at the beginning.
First, AI accelerators are relatively expensive.
Take the well-known NVIDIA V100 GPU as an example: it is priced at around 80,000 RMB, and a high-performance FPGA card also costs about 50,000 RMB.
Secondly, due to the lack of efficient and economical AI accelerator virtualization solutions, most companies currently have to use the above-mentioned expensive accelerator resources exclusively, resulting in low resource utilization and high costs.
According to data disclosed by AWS at re:Invent 2018, GPU utilization on AWS is only 10% to 30%.
When a physical machine has only one GPU and no GPU virtualization solution, users can only dedicate the GPU to a single virtual machine, so it cannot be shared by multiple virtual machines.
So several veterans in the field of accelerated virtualization decided to test the waters and eventually launched their own solution: OrionAI computing platform v1.0.
Solution Details
The platform lets users share local and remote GPU resources among multiple virtual machines or containers.
Typical scenarios for using the OrionAI platform include:
First, multiple virtual machines or containers share the local GPU.
Users only need to replace the CUDA runtime in the virtual machine or container with the Orion Runtime.
The user's AI applications and the deep learning frameworks used (TensorFlow, PyTorch, etc.) do not require any changes and can run just like in the native CUDA operating environment.
At the same time, users need to run the Orion service (Orion Server) on the physical server, which will take over the physical GPU and virtualize the physical GPU into multiple Orion vGPUs.
AI applications running on different virtual machines will be assigned to different Orion vGPUs, which will significantly improve the utilization of physical GPUs.
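The sharing scheme above can be pictured with a toy sketch. The classes and numbers below are illustrative assumptions, not VirtAI's actual implementation: one physical GPU's memory is carved into vGPU slices, and each container gets its own slice, so several containers share a single card.

```python
# Toy sketch (illustrative only): partition one physical GPU's memory
# into vGPU slices and assign each container its own slice.

class PhysicalGPU:
    def __init__(self, name, mem_gb):
        self.name = name
        self.mem_gb = mem_gb
        self.free_gb = mem_gb   # unallocated memory on the card

class VGPU:
    def __init__(self, phys, mem_gb, owner):
        self.phys = phys        # backing physical GPU
        self.mem_gb = mem_gb    # memory quota for this slice
        self.owner = owner      # container that holds the slice

def carve_vgpu(phys, mem_gb, owner):
    """Carve a vGPU slice out of a physical GPU for one container."""
    if mem_gb > phys.free_gb:
        raise RuntimeError("not enough free GPU memory")
    phys.free_gb -= mem_gb
    return VGPU(phys, mem_gb, owner)

p40 = PhysicalGPU("Tesla P40", mem_gb=24)
slices = [carve_vgpu(p40, 8, owner=c) for c in ("ctr-a", "ctr-b", "ctr-c")]
print([(s.owner, s.mem_gb) for s in slices])  # three containers share one card
print(p40.free_gb)                            # 0 GB left on the physical GPU
```

In the real system the slicing is done by Orion Server, and each container's application sees its slice as an ordinary GPU through Orion Runtime.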
Second, multiple virtual machines or containers share a remote GPU.
Users can run virtual machines/containers on servers without GPUs, and AI applications can use the Orion vGPU on another server through the Orion Runtime without modification.
In this way, the user's AI application can be deployed on any server in the data center, greatly improving the flexibility of resource allocation and management.
Third, a single virtual machine or container uses GPUs that span multiple physical servers.
Through Orion Runtime, users' virtual machines/containers can use GPU resources across multiple physical machines without modifying AI applications and frameworks.
Today, an AI application may need 64 or even more GPUs to train a model, yet no single physical server can fully meet that demand.
With Orion Runtime, applications can use GPUs on multiple physical servers without modification, such as 16 servers with 4 GPUs each.
In this way, user GPU resources can become a true data center-level resource pool.
Users' AI applications can transparently use the GPU resources on any server, greatly improving resource utilization and management scheduling flexibility.
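The pooling idea can be sketched in a few lines. The host and GPU names below are made up for illustration; the point is that 16 servers with 4 GPUs each appear to a job as one flat 64-GPU pool.

```python
# Toy sketch (illustrative only): aggregate the GPUs of many servers
# into one data-center-level pool that a single job can draw from.

servers = {f"host-{i:02d}": ["gpu0", "gpu1", "gpu2", "gpu3"] for i in range(16)}

# Flatten into a single pool of (server, gpu) pairs.
pool = [(host, gpu) for host, gpus in servers.items() for gpu in gpus]
print(len(pool))  # 64 GPUs visible as one pool

def grab(pool, n):
    """Hand the first n free GPUs to a job, regardless of host boundaries."""
    job, rest = pool[:n], pool[n:]
    return job, rest

job_gpus, pool = grab(pool, 64)
print(len(job_gpus), len(pool))  # one job spans all 16 servers
```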
The GPU resources allocated to users through the Orion AI Platform, whether local GPU resources or remote GPU resources, are software-defined and allocated on demand.
These resources are different from the resources obtained through hardware virtualization technology. Their allocation and release can be completed instantly without restarting the virtual machine or container.
For example, when a user starts a virtual machine, if the user does not need to run AI applications, Orion AI Platform will not allocate GPU resources to the virtual machine.
When a user needs to run a large training task, for example, requiring 16 Orion vGPUs, the Orion AI Platform will instantly allocate 16 Orion vGPUs to the virtual machine.
When the user completes training and only needs one Orion vGPU for inference, the Orion AI Platform can instantly release 15 Orion vGPUs.
It is worth mentioning that all the above resource allocation and release do not require the virtual machine to be restarted.
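The allocate-16-then-release-15 flow described above can be modeled as a simple pool. The class and method names here are hypothetical, not the real Orion API; the point is that vGPUs attach to and detach from a running VM with no restart.

```python
# Toy sketch (hypothetical names, not the real Orion API): a
# software-defined vGPU pool that grows and shrinks a running VM's
# allocation on demand.

class VGPUPool:
    def __init__(self, total):
        self.free = total                 # idle vGPUs in the pool
        self.attached = {}                # vm_id -> number of vGPUs held

    def allocate(self, vm_id, n):
        """Attach n vGPUs to a running VM instantly."""
        if n > self.free:
            raise RuntimeError("pool exhausted")
        self.free -= n
        self.attached[vm_id] = self.attached.get(vm_id, 0) + n

    def release(self, vm_id, n):
        """Detach up to n vGPUs and return them to the pool."""
        n = min(n, self.attached.get(vm_id, 0))
        self.attached[vm_id] -= n
        self.free += n

pool = VGPUPool(total=16)
pool.allocate("vm-1", 16)    # big training job takes the whole pool
pool.release("vm-1", 15)     # drop to a single vGPU for inference
print(pool.attached["vm-1"], pool.free)  # 1 vGPU kept, 15 back in the pool
```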
Technical details and benchmarks
What are the technical details behind the above solution?
Under the hood, Orion Runtime exposes an API that is fully compatible with the CUDA Runtime, so user applications run without modification.
Orion Runtime intercepts the application's calls to the CUDA Runtime and forwards them to Orion Server.
Orion Server executes these calls on the physical GPU and returns the results to Orion Runtime.
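This intercept-forward-execute-return loop is a classic API-remoting pattern. The sketch below uses made-up call names and a direct function call in place of the network hop; it is not VirtAI's real protocol, only the shape of the idea: the client runtime serializes each "CUDA-like" call, the server executes it, and the result comes back as if the call had run locally.

```python
# Toy sketch of API remoting (hypothetical names, not Orion's protocol):
# a client-side runtime intercepts calls, a server executes them.

import json

def server_dispatch(request_bytes):
    """Stand-in for Orion Server: execute the forwarded call and reply."""
    req = json.loads(request_bytes)
    handlers = {
        "malloc": lambda size: f"devptr@{size}",   # pretend allocation
        "memcpy": lambda dst, src: len(src),       # pretend copy; bytes moved
    }
    result = handlers[req["call"]](*req["args"])
    return json.dumps({"result": result}).encode()

class ClientRuntime:
    """Stand-in for Orion Runtime: same call names, remote execution."""
    def _forward(self, call, *args):
        request = json.dumps({"call": call, "args": args}).encode()
        reply = server_dispatch(request)  # in reality: sent over the network
        return json.loads(reply)["result"]

    def malloc(self, size):
        return self._forward("malloc", size)

    def memcpy(self, dst, src):
        return self._forward("memcpy", dst, src)

rt = ClientRuntime()
ptr = rt.malloc(1024)
moved = rt.memcpy(ptr, "abcd")
print(ptr, moved)  # the app never knows the calls ran on a "remote" server
```

Because the client exposes the same call surface as the native runtime, the application and its frameworks need no changes, which is exactly the compatibility property claimed above.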
OrionAI computing platform v1.0 also announced performance comparison results.
Let’s look at the configuration first:
GPU server configuration: dual-socket Intel Xeon Gold 6132, 128 GB memory, a single NVIDIA Tesla P40.
Performance test set: TensorFlow v1.12, official benchmark, no code modification, test using synthetic data.
"Native GPU" means running the performance test on a physical GPU without using a virtual machine or container;
"Orion Local Container" runs the performance test in a container with Orion Runtime installed, and Orion Server runs on the same physical machine;
"Orion Local KVM" runs the performance test in a KVM virtual machine with Orion Runtime installed, and Orion Server runs on the same physical machine;
"Orion Remote – 25G RDMA" runs the performance test on a physical machine without a GPU, while Orion Server runs on a physical machine with a GPU; the two machines are connected via a 25G RDMA network card.
The final comparison results are as follows:
The data shows that the performance loss introduced by Orion Runtime and Orion Server is very small compared to running on a physical GPU.
Especially when using a remote GPU through a network connection, the OrionAI computing platform has made a lot of optimizations to make its performance very close to that of using a local GPU.
The Builders of the OrionAI Computing Platform
Finally, let me introduce the creators behind the OrionAI computing platform:
VirtAI Tech.
Founded in January 2019, it focuses on AI accelerator virtualization software, as well as heterogeneous AI accelerator management and scheduling software.
There are three main founders, all of whom are senior veterans in the field.
Wang Kun, CEO of VirtAI Tech, holds a Ph.D. from the Department of Computer Science at the University of Science and Technology of China.
Before founding VirtAI Tech, Dr. Wang served as Director of the Dell EMC China Research Institute, managing and leading all of Dell EMC's research teams in Greater China.
He has long worked on computer architecture, GPU and FPGA virtualization, and distributed systems; he was among the first in the industry to push FPGA virtualization research and has more than ten years of experience in this field.
Chen Fei, CTO of VirtAI Tech, holds a Ph.D. from the Institute of Computing Technology, Chinese Academy of Sciences.
Before founding VirtAI Tech, Dr. Chen was Chief Scientist of the Dell EMC China Research Institute, where he long worked on high-performance computing, computer architecture, and GPU and FPGA virtualization.
Zou Mao, Chief Architect of VirtAI Tech, holds a Ph.D. from the University of Science and Technology of China.
Before founding VirtAI Tech, Dr. Zou was a senior researcher at the Dell EMC China Research Institute, where he long worked on computer architecture and GPU virtualization.
Portal
OrionAI computing platform Community Edition v1.0:
Official website: https://virtai.tech/
GitHub: https://github.com/virtaitech/orion
- End -