Yitu Medical CEO Interprets the First Chinese-Language NLP-Assisted Diagnosis Research Published in Nature Medicine
Text | Wang Yi
Report from Leiphone.com (leiphone-sz)
Leiphone.com note: Yitu Medical, which started out as a medical image analysis company, has been quietly working on medical NLP for more than two years. The Chinese-language AI-assisted diagnosis system it developed with Guangzhou Women and Children's Medical Center recently drew wide industry attention, because the accompanying paper is described as the world's first on NLP for Chinese electronic medical records to appear in a top medical journal. Leiphone.com was the first to interview Yitu Medical CEO Ni Hao about the technical details behind the system and the company's strategy and thinking in NLP.
On February 12, 2019, the medical research journal Nature Medicine published online a paper titled "Evaluation and accurate diagnoses of pediatric diseases using artificial intelligence." It is the first paper in a top medical journal on clinical diagnosis through natural language processing of Chinese free-text electronic medical records.
The paper describes the technology of combining deep learning with knowledge graphs to deconstruct clinical electronic medical record data, form an intelligent disease database, and build an auxiliary diagnosis model based on it. In other words, with this technology, computers can "understand" medical records and make preliminary diagnoses.
The "intelligent disease database" is the core result of this research, and it leaves considerable room for building further systems. Beyond the auxiliary diagnosis model described above, systems for intelligent triage, assisted consultation and similar tasks can be built on the same database, which could help ease the shortage and uneven distribution of medical resources and support supply-side reform in healthcare.
The technology and the paper were jointly developed and written by Professor Xia Huimin of Guangzhou Women and Children's Medical Center, Professor Zhang Kang of the University of California, San Diego, Dr. Liang Huiying of the Guangzhou Women and Children's Medical Center Data Center, Director Sun Xin of the Medical Department, and Director He Liya of the Pediatric Clinic, together with research teams from Yitu Medical, Kangrui Intelligent Technology, and the Guangdong Key Laboratory of Regenerative Medicine.
More than 6,000 schemas, 55 diseases
Ni Hao, CEO of Yitu Medical, said the paper describes results the company has accumulated in NLP over the past two years. During that time, Yitu Medical carried out a great deal of groundwork, including building knowledge graphs, cleaning and annotating data for structuring, designing the annotation system, and selecting algorithms.
The system works in two stages. First, guided by the medical knowledge graph, deep learning is used to deconstruct clinical electronic medical record data according to defined rules, converting unstructured text into structured data and building the intelligent disease database. Then diagnostic models are built on top of that database. In this paper, the team built an auxiliary diagnosis system that reads patient medical records and gives doctors diagnosis and treatment suggestions.
Specifically, to build the disease database, the team first constructed a medical knowledge graph from existing materials such as medical guidelines and expert consensus documents. Based on the knowledge graph, it then used deep learning to deconstruct the training electronic medical records according to "standard deconstruction schemas". These schemas were developed jointly by Yitu Medical and the experts and department directors of Guangzhou Women and Children's Medical Center to describe all the meaningful features of a given disease.
Different dimensions of the same disease (such as diagnosis, family history, chief complaint, laboratory tests, imaging examinations, and ultrasound examinations) are built as independent schemas. Yitu Medical said it worked with more than 30 senior pediatricians and more than 10 informatics researchers to build more than 6,000 schemas, set up base models, and train them on a large amount of data to form the "intelligent disease database" mentioned above. The database now covers 55 diseases and is under continuous testing and iteration.
Ni Hao gave a more concrete explanation of the process. The system's purpose is to extract information points from the raw electronic medical record data according to the schemas, and then to structure and standardize them. To do this, a model built on an LSTM with an attention mechanism extracts information by repeatedly "asking questions" of the text. For example, when deconstructing the sentence "A mass is visible in the left upper lobe of the lung", the system asks in turn: "Is it the left upper lobe of the lung?" "Is there a mass in the left upper lobe of the lung?" and so on. In effect, asking the questions is how the model scans the text.
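The "asking questions" idea can be illustrated with a minimal sketch. It is not the team's model: the article describes an attention-based LSTM, whereas this example substitutes simple regular expressions for the learned component. The `SchemaItem` class, the `IMAGING_SCHEMA` list, and the field names are all hypothetical and exist only to show how a schema turns free text into a structured record.

```python
import re
from dataclasses import dataclass


@dataclass
class SchemaItem:
    """One 'question' asked of the text (hypothetical structure)."""
    name: str     # structured field to fill in
    pattern: str  # regex standing in for the learned attention model


# Hypothetical schema for a chest imaging finding; not taken from the paper.
IMAGING_SCHEMA = [
    SchemaItem("location", r"(left|right) (upper|middle|lower) lobe"),
    SchemaItem("finding", r"\b(mass|nodule|effusion)\b"),
]


def extract(text, schema):
    """Ask every schema question of the text and collect the answers."""
    lowered = text.lower()
    record = {}
    for item in schema:
        match = re.search(item.pattern, lowered)
        # None means the question found no answer in this text.
        record[item.name] = match.group(0) if match else None
    return record


record = extract("A mass is visible in the left upper lobe of the lung", IMAGING_SCHEMA)
print(record)
```

Pattern matching of this kind cannot handle negation ("no mass is visible") or paraphrase, which is one reason a learned model would be preferred over fixed patterns in practice.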
After the disease database was built, the team used a hierarchical logistic regression classifier to build a diagnostic model. Ni Hao said that the difference between this model and other systems is that it uses a hierarchical structure for judgment.
The first level of classification is organ-based: the diagnosis is first assigned to a broad organ system (respiratory, nervous, digestive, and so on). The second level refines this into organ subsystems and more specific diagnostic groups (such as upper and lower respiratory tract). In addition, the design of the diagnostic hierarchy was adjusted to fit clinical practice as closely as possible by grouping along pathophysiological or etiological lines (infectious, inflammatory, traumatic, neoplastic, and so on).
[Figure: Hierarchical structure of the diagnostic model]
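A two-level version of this hierarchical judgment can be sketched as follows. This is an illustration under stated assumptions, not the published model: the four binary features, the toy records, the diagnosis labels, and the class names `SoftmaxRegression` and `HierarchicalClassifier` are invented for the example, and the real hierarchy has more levels and far more data. A first logistic regression picks the organ system, and a second one trained only on that system's records picks the diagnosis.

```python
import math


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class SoftmaxRegression:
    """Multinomial logistic regression with per-sample gradient descent."""

    def __init__(self, labels, n_features):
        self.labels = sorted(set(labels))
        # One weight row per label; the last entry of each row is the bias.
        self.weights = [[0.0] * (n_features + 1) for _ in self.labels]

    def predict_proba(self, x):
        xb = list(x) + [1.0]
        return softmax([sum(w * v for w, v in zip(row, xb)) for row in self.weights])

    def fit(self, X, y, lr=0.5, epochs=300):
        for _ in range(epochs):
            for x, label in zip(X, y):
                probs = self.predict_proba(x)
                xb = list(x) + [1.0]
                for k, row in enumerate(self.weights):
                    error = probs[k] - (1.0 if self.labels[k] == label else 0.0)
                    for j, v in enumerate(xb):
                        row[j] -= lr * error * v
        return self

    def predict(self, x):
        probs = self.predict_proba(x)
        return self.labels[probs.index(max(probs))]


class HierarchicalClassifier:
    """Level 1 chooses the organ system; level 2 chooses a diagnosis within it."""

    def fit(self, X, systems, diagnoses):
        n = len(X[0])
        self.top = SoftmaxRegression(systems, n).fit(X, systems)
        self.leaf = {}
        for system in sorted(set(systems)):
            rows = [(x, d) for x, s, d in zip(X, systems, diagnoses) if s == system]
            xs, ds = zip(*rows)
            self.leaf[system] = SoftmaxRegression(ds, n).fit(xs, ds)
        return self

    def predict(self, x):
        system = self.top.predict(x)
        return system, self.leaf[system].predict(x)


# Toy structured records (invented): [cough, wheeze, vomiting, abdominal pain]
X = [[1, 0, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
systems = ["respiratory", "respiratory", "digestive", "digestive"]
diagnoses = ["upper respiratory infection", "asthma", "gastroenteritis", "constipation"]

model = HierarchicalClassifier().fit(X, systems, diagnoses)
print(model.predict([1, 1, 0, 0]))
print(model.predict([0, 0, 1, 1]))
```

One consequence of this design is that each second-level classifier only has to separate diagnoses within a single organ system, so an error at the first level cannot be corrected further down.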
1.3 million training cases, 88.5% diagnostic accuracy
The model's training data is concentrated in pediatrics. Ni Hao said pediatrics was chosen for a very simple reason: to help address the shortage of pediatricians. In addition, because children often cannot describe their symptoms accurately, pediatrics is nicknamed the "mute department", which makes model design more challenging. Ni Hao believes that migrating a system trained on pediatric data to adult departments would rely on similar technology and would not be difficult.
According to Dr. Liang Huiying of the Guangzhou Women and Children's Medical Center Data Center, between January 2016 and June 2017 the team collected 1.3 million outpatient electronic medical records from nearly 600,000 patients. The average patient age was 2.5 years; 40% were girls and 60% were boys. The records covered 55 diseases, including gastroenterological and respiratory diseases, accounting for more than 80% of common pediatric diseases as well as several critical conditions such as meningitis.
Ni Hao said that 1.3 million training records is a very large volume. Among papers published in top journals, training sets in the tens of thousands are rare, and most contain hundreds or thousands of records. He credited the size of this dataset to the data infrastructure and outpatient capacity of Guangzhou Women and Children's Medical Center.
According to Leiphone.com, the outpatient volume of Guangzhou Women and Children's Medical Center ranks among the top ten hospitals in China, with 4.7 million outpatient visits in 2017 alone, and the resulting data is concentrated in women's and children's health. The hospital's informatization effort dates back to 2015, and its outpatient medical records are now fully interconnected.
In the validation phase, the team randomly sampled 12,000 electronic medical records and selected 20 doctors, who were divided into five groups by seniority for a human-machine comparison. The model's accuracy was 0.885, higher than that of the two most junior groups (0.841 and 0.839) and close to, but not above, that of the third group.
The system is now integrated into the hospital's outpatient system, and doctors can get auxiliary diagnosis results with one click after entering a medical record. Since its launch in May 2018, it has served 33 doctors: 6 senior doctors, 13 associate senior doctors, and 14 attending doctors. It has been called 64,000 times, including 30,000 times in the first 20 days of January 2019 alone. Liang Huiying estimated that a monthly volume of 10,000 calls is equivalent to the outpatient workload of 5 resident doctors.
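The figures quoted here can be combined in a back-of-envelope calculation. The sketch below uses only the numbers in the article, plus one labeled assumption: that the call rate seen in the first 20 days of January 2019 holds for a 30-day month. The variable names are illustrative, and the result is an extrapolation, not a figure reported by the team.

```python
# Figures quoted in the article
calls_in_period = 30_000   # calls in the first 20 days of January 2019
days_in_period = 20
# Liang Huiying's equivalence: 10,000 calls per month ~ 5 resident doctors
calls_per_resident_per_month = 10_000 / 5

daily_calls = calls_in_period / days_in_period

# Assumption (not from the article): the same daily rate over a 30-day month
projected_monthly_calls = daily_calls * 30
resident_equivalents = projected_monthly_calls / calls_per_resident_per_month

print(daily_calls, projected_monthly_calls, resident_equivalents)
```

Under that assumption, the January rate works out to about 1,500 calls a day, or roughly 22.5 resident-equivalents a month on Liang Huiying's ratio.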
A step toward multimodal medical data processing
Ni Hao said that adding NLP will bring great value to medicine, because medical data is inherently multimodal. When a patient is treated at a hospital, the visit generates imaging data, electronic medical record data, structured laboratory data, and more. If artificial intelligence is to give doctors comprehensive diagnostic and treatment assistance in the future, its ability to understand data in each of these modalities is essential.
"What is the greatest significance of this experiment? In essence, we have provided a complete theoretical system and practical methods for auxiliary diagnosis using original electronic medical records that are applicable to clinical environments." Ni Hao said, "Of course, this method cannot be said to be complete, but it is relatively complete and proven to be effective among the theoretical systems currently available in the world."
Looking ahead, Ni Hao said he is in no hurry to roll the system out widely, and hopes to use the clinical settings at Guangzhou Women and Children's Medical Center to improve its performance and cover more diseases. The hospital's Internet hospital has already launched, offering a full set of online services such as online consultation, registration, and robot-assisted consultation. Yitu Medical works closely with the hospital as a technology provider on that project, which gives its technology more room to be applied and a more diverse source of data samples.
Ni Hao said the system will later be combined with speech recognition, so that the doctor's questions and the patient's complaints are transcribed in real time and the electronic medical record is generated as soon as the conversation ends. Combined with the auxiliary diagnosis system, this would let the doctor obtain the patient's likely diseases and suggested next examinations with one click, further reducing doctors' workload and improving the efficiency of diagnosis and treatment.