Alibaba DAMO Academy experts reveal: AI algorithm shortens genetic analysis of suspected pneumonia cases to half an hour
We look forward to more "Huoshenshan speed"
Text | Li Yuchen
To this day, the development of the epidemic is still the most pressing issue on everyone’s mind.
As of 24:00 on February 1, the National Health Commission had received reports of a cumulative 14,380 confirmed cases and 19,544 suspected cases from 31 provinces (autonomous regions, municipalities) and the Xinjiang Production and Construction Corps.
Despite the severity of the epidemic, the good news is that AI is becoming an important supporting force for frontline medical staff.
On February 1, the Zhejiang Provincial Center for Disease Control and Prevention launched an automated whole-genome detection and analysis platform. Using an AI algorithm developed by Alibaba DAMO Academy, the platform cuts the genetic analysis of suspected cases from several hours to half an hour, significantly reducing diagnosis time while accurately detecting virus mutations.
1
Limitations of nucleic acid testing methods
After so many days of intensive popular science coverage, one thing we do know is that the clinical manifestations of pneumonia caused by the new coronavirus are not much different from those of influenza.
Therefore, the results of the nucleic acid test kit, which medical staff refer to as the "little box", are very important for diagnosis.
Genome sequencing is an essential first step in developing nucleic acid test kits, and it takes several days. Once sequencing is complete, development of the kits proceeds very quickly.
So, is having enough test kits sufficient? Far from it.
Luo Guangxiang, professor in the Department of Pathogenic Biology at Peking University School of Medicine and tenured professor of molecular virology in the Department of Microbiology at the University of Alabama at Birmingham School of Medicine, said earlier that nucleic acid test kits cannot be used on their own and must be paired with PCR instruments. Such instruments are available only in larger hospitals. Community and county hospitals may not have them yet, so their samples have to be sent to hospitals or CDCs in central cities for testing.
As a result, in the early days the nucleic acid test for the new coronavirus could be carried out only at the Hubei Provincial Center for Disease Control and Prevention, and production of test kits was relatively small. This made it difficult to confirm suspected cases, and those patients could not receive timely treatment.
In addition, the new coronavirus itself is also very "difficult to deal with".
Public information shows that the virus has one of the longest genomes among RNA viruses, with a total length of 29,847 bp. Clinical diagnosis requires comparing the patient's sample against the virus's gene sequence.
Dr. Gu Fei, an algorithm expert at DAMO Academy, said that hospitals currently rely on nucleic acid testing methods, which detect only part of the virus's genes. If the virus mutates, cases may be missed.
In an interview with Leiphone.com, an insider explained that nucleic acid testing is a molecular biology technique used to amplify specific DNA fragments. It uses the polymerase chain reaction (PCR) to greatly amplify trace amounts of DNA, thereby detecting viruses that carry specific gene fragments. In other words, this method examines only part of the sample's genome.
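To illustrate why a test that looks at only a few fragments can miss a mutated virus, here is a minimal Python sketch. It is a deliberately simplified toy, not a model of any real kit: the sequences are invented, the two "target sites" stand in for primer binding regions, and the assay is assumed to require an exact match (real PCR tolerates some mismatches). The function names are hypothetical.

```python
def targeted_assay(sample: str, forward_site: str, reverse_site: str) -> bool:
    """Toy targeted test: positive only if both sites occur exactly, in order."""
    start = sample.find(forward_site)
    if start == -1:
        return False
    return sample.find(reverse_site, start + len(forward_site)) != -1


def point_differences(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Toy whole-sequence comparison: list (position, reference base, sample base)."""
    if len(reference) != len(sample):
        raise ValueError("this toy comparison only handles equal-length sequences")
    return [(i, r, s) for i, (r, s) in enumerate(zip(reference, sample)) if r != s]


# Invented 34-base "reference" and two invented target sites within it.
reference = "ATGGCTAGCTTACGGATCCGTTAACGCTAGGCTA"
forward_site = "GCTAGCTT"
reverse_site = "CGCTAGGC"

# A single-base change inside the forward target site (position 6, A -> G).
mutant_in_site = reference[:6] + "G" + reference[7:]
# A single-base change outside both target sites (position 15, A -> T).
mutant_outside = reference[:15] + "T" + reference[16:]

print(targeted_assay(reference, forward_site, reverse_site))       # True
print(targeted_assay(mutant_in_site, forward_site, reverse_site))  # False: missed
print(point_differences(reference, mutant_in_site))                # [(6, 'A', 'G')]
print(targeted_assay(mutant_outside, forward_site, reverse_site))  # True
print(point_differences(reference, mutant_outside))                # [(15, 'A', 'T')]
```

In this toy, the targeted test either misses the mutated sample or reports it as positive without revealing the change, while the full comparison pinpoints the altered base in both cases.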
2
AI Algorithms Become a Breakthrough
In the face of the severe anti-epidemic situation, AI has become a powerful tool to break the deadlock.
The automated whole-genome detection and analysis platform is a high-throughput sequencing platform jointly developed by the Zhejiang Provincial Center for Disease Control and Prevention, Alibaba DAMO Academy's medical AI team, and Jieyi Biotechnology Co., Ltd. Its breakthrough is a sharp reduction in detection time.
Leiphone.com interviewed an algorithm expert from Alibaba DAMO Academy regarding the platform’s cooperation details and future application plans.
Q: What are the main steps and stages of the traditional viral gene analysis process?
A: Generally speaking, the entire process covers: sample labeling and packaging; nucleic acid extraction; preparation of fluorescent quantitative PCR system; machine testing; and data report analysis.
Q: How much manpower did DAMO Academy invest in such a platform? When did you start working on it? How long did it take?
A: After the outbreak, DAMO Academy assigned more than ten people to develop this new platform. For example, we analyzed the characteristics of the new coronavirus's genes and optimized the algorithms using data from public data sets such as PDB. Our algorithm experts also went to the front line at the Zhejiang Provincial Center for Disease Control and Prevention to work directly with the two partners on the platform.
Q: Since this is an AI algorithm, accuracy is a concern. Do DAMO Academy's algorithms and models have to account for accuracy at each stage of application?
A: So far there have been no inaccuracies. In the detection process, the algorithm is required to achieve 100% accuracy.
Q: The detection and analysis of viral genes are two different tasks. How do you collaborate?
A: Jieyi Bio has developed a fully automatic high-throughput sequencing library construction instrument, which cuts what used to be 12 hours of manual work to 2 hours. Put simply, the test results are "digitized" and then analyzed by the algorithm developed by DAMO Academy. Each sequencing run generates a huge amount of data, and a series of optimized algorithms speeds up the processing of each sample. At this stage, computing power and algorithms are equally important.
Q: For a platform that both reduces medical staff's workload and improves efficiency, how will it be applied in the future, and through what channels will cooperation take place?
A: Next, the whole-genome detection and analysis platform will be rolled out widely across the province. DAMO Academy will work with its partners to promote the technology nationwide. We cannot disclose whether hospitals in other provinces and cities are seeking to use it.
3
The finishing touch: distributed design algorithm
During sequence alignment, DAMO Academy added a distributed design to the algorithm to improve alignment efficiency, cutting the time for sample gene analysis from several hours to half an hour. In the virus sequence assembly stage, it used a distributed de Bruijn graph algorithm, which can also accurately detect mutated viruses, and cut assembly time from between 30 minutes and 1 hour to 15-30 minutes.
In addition, unlike traditional nucleic acid detection methods, the platform can capture the full picture of the virus: it performs whole-genome sequence analysis and comparison on virus samples from suspected cases, avoiding missed detections caused by virus mutations.
The virus detection and mutation detection parts of the analysis are based mainly on open source algorithms, with distributed algorithms designed to accelerate the process. After the virus sequence is assembled, a model using a BiLSTM+DNN approach is trained to predict the secondary structure of the viral proteins.
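The de Bruijn graph idea mentioned above can be shown with a small, single-machine Python sketch. This is a textbook illustration under simplifying assumptions (error-free reads, full coverage, no repeated (k-1)-mers), not DAMO Academy's implementation; the article does not describe how the work was distributed, and the toy genome and function names here are invented.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each distinct k-mer in the reads adds one edge."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm: walk every edge exactly once."""
    in_degree = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            in_degree[node] += 1
    nodes = set(graph) | set(in_degree)
    if not nodes:
        return []
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next(
        (n for n in sorted(nodes) if len(graph.get(n, [])) - in_degree[n] == 1),
        min(nodes),
    )
    remaining = {node: list(successors) for node, successors in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def assemble(reads, k):
    """Rebuild a sequence by spelling out the Eulerian path."""
    path = eulerian_path(build_de_bruijn(reads, k))
    if not path:
        return ""
    return path[0] + "".join(node[-1] for node in path[1:])


def make_reads(genome, read_length=8):
    """Hypothetical helper: every overlapping window of the toy genome."""
    return [genome[i:i + read_length] for i in range(len(genome) - read_length + 1)]


toy_genome = "ATGGCGTACCTGAAT"   # invented sequence
toy_mutant = "ATGGCGTTCCTGAAT"   # same sequence with one base changed

print(assemble(make_reads(toy_genome), k=5))  # ATGGCGTACCTGAAT
print(assemble(make_reads(toy_mutant), k=5))  # ATGGCGTTCCTGAAT
```

Because the graph is built from the sample's own reads rather than from fixed target sites, a mutated base simply appears in the assembled sequence. Extracting k-mers from each read is independent of every other read, which is one reason this step lends itself to being spread across many machines.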
At the same time, DAMO Academy is also researching sequence-based protein three-dimensional structure prediction models and drug screening models.
Dr. Sun Yi, head of gene sequencing at Zhejiang Provincial Center for Disease Control and Prevention, said: "The platform, based on Alibaba Cloud's powerful computing power and DAMO Academy's new algorithm, can provide support for virus analysis. Based on this platform, in the future, the detection range can cover all confirmed cases in a short period of time, which also lays a solid foundation for subsequent vaccine and drug research and development."
4
A technology war that concerns all citizens
In the fight against the epidemic, a battle that concerns the entire nation, domestic technology giants stepped forward at the first opportunity.
To help accelerate the development of new drugs and vaccines, Alibaba Cloud previously announced that it would open all of its AI computing power to public research institutions around the world for free.
The Chinese Center for Disease Control and Prevention has already successfully isolated the virus, but developing new drugs and vaccines requires a large amount of data analysis, large-scale literature screening and scientific supercomputing. Alibaba Cloud's AI computing power can support viral gene sequencing, new drug development, protein screening and other tasks, helping research institutions shorten the research and development cycle.
In addition to the whole-genome detection and analysis platform, DAMO Academy also built and launched an "intelligent epidemic robot" within five days during the Spring Festival. It is currently serving Zhejiang Province's public service and management platform for the new pneumonia.
Today (February 2), Wuhan's Huoshenshan Hospital, built in 9 days through the efforts of 7,000 people, was officially delivered. As epidemic prevention work deepens, we believe the whole-genome detection and analysis platform will use the power of AI to bring more confidence to the public and to medical workers, just as Huoshenshan Hospital has.