Key technologies sit at the top and industry knowledge at the bottom. Once the barrier between the laboratory and the real world is broken, technology is no longer confined to itself but connects with a wide range of external scenarios, which ultimately lets iFLYTEK's AI technology move quickly from research and development to large-scale deployment.
Author | Yue Xing Qing Mu
Editor | Cen Feng
Natural language processing (NLP) has long been one of the hardest problems artificial intelligence has set out to solve.
It was not until 2006 that the vast body of electronic text accumulated during the Internet era at the end of the last century, combined with the arrival of deep learning, finally put machine translation, and NLP more broadly, on the fast track.
Deep learning continues the probabilistic tradition of statistical methods, but it largely does away with feature engineering, which demands extensive expert knowledge.
Yet Sheng Zhichao has found that even now, more than a decade later, deploying NLP applications built on deep learning still requires letting go of an obsession with technology and returning to industry expertise.
This is the most valuable lesson he has learned in his eight years of working on NLP at iFLYTEK.
After graduating from Fudan University in 2011, Sheng Zhichao joined a startup doing NLP research. After more than two years of hands-on work, he wanted a larger platform where he could use technology to create real social value. At the time, iFLYTEK was already well known in speech AI thanks to its newly released iFLYTEK Input Method and voice cloud. Starting from front-end text prosody prediction and text association in speech synthesis, the company had already begun exploring NLP and had put it into practice in voice interaction and machine translation.
For someone eager to use technology to create real social value, choosing a company whose goal was to "build a better world with artificial intelligence" was a natural step.
On the eve of dawn, transformation became the only way forward
The history of NLP is almost as long as that of computers and artificial intelligence (AI). Because NLP is by nature a bridge that lets humans and computers communicate effectively in natural language, it gives rise to an interesting phenomenon: whenever we start exploring perceptual intelligence, cognitive intelligence is always involved.
iFLYTEK's own history bears this out.
In speech synthesis, front-end text prosody prediction is closely tied to text. So when iFLYTEK began working on speech in its early days, it also entered the field of NLP. At first, however, this work was limited to text prediction, language models for speech recognition, and text retrieval.
In 2005, iFLYTEK established its AI Research Institute and officially designated NLP, together with speech synthesis, evaluation, and recognition, as its core research directions.
From then on, iFLYTEK's attempts to put NLP into practice moved forward, but with many setbacks.
Speech technologies, by contrast, flourished. By 2005, speech evaluation technology had largely matured, and the Mandarin proficiency testing system had passed appraisal by the National Language Commission. In 2008, speech synthesis surpassed the speaking level of ordinary people for the first time, and it went on to win the international English speech synthesis competition for many consecutive years.
In many text-based areas, however, including knowledge graphs, semantic retrieval, SMS classification, and text customer service, deployments mostly ended in failure because the technology was immature and migration costs were high.
"At that time, everyone was really starting from the technology and trying to match it to possible future use cases, and gradually we found that this path was especially difficult."
The lessons of this hard-won exploration were confirmed by later practice: perhaps it was time to reverse this way of thinking.
If "carrying a carrot around in search of a hole", that is, holding a solution and hunting for a problem, does not work, then the answer is to do the opposite.
The idea of letting real business scenarios and needs drive the refinement of technology began to turn the difficult situation around.
In 2014, neural machine translation models based on the encoder-decoder architecture appeared, and machine translation officially entered the era of deep learning.
That same year, Wei Si, chief scientist of the iFLYTEK AI Research Institute, keenly recognized that if the company wanted to build its own technological edge in the industry, it had to form a "dual-wheel drive" of data plus models, and that deep learning was the key to making this approach succeed.
Sheng Zhichao, still new to the company, experienced this turning point in iFLYTEK's NLP development firsthand. At the beginning of 2015, the NLP cognitive group he belonged to formed a seven-person task force to launch iFLYTEK's application of deep learning to NLP. The team first gathered every relevant paper it could find and split into several "paper reading" groups to study different directions. Members then walked one another through the code while trying to reproduce the models and algorithms from the papers.
In this way, the seven-person task force successfully applied deep learning to NLP and quickly rolled it out across the company.
"At that time, our exploration was ahead of many universities and peers," Sheng Zhichao recalled, adding that the team's mutual trust, cohesion, and shared determination were indispensable to its success. Today, the original seven members have long since become core leaders in iFLYTEK's various business lines.
By applying deep learning and shifting to a mindset of scenario-driven technology refinement, iFLYTEK's NLP was finally about to pass from the eve of dawn into daylight.
From scenarios to industries
When preparing for a role, many outstanding actors go to the character's real workplace or living environment early on to "experience life", striving to lose themselves completely in the character when performing.
This approach is simple but valuable, and it closely mirrors the path Sheng Zhichao took in putting NLP into practice.
In September 2014, just ten days after joining the company, Sheng Zhichao was assigned to the iFLYTEK Beijing Research Institute to take part in the R&D and deployment of Chinese composition review technology.
Essay review has two parts: scoring and correction. Scoring assigns the essay a score, while correction requires a comprehensive assessment along dimensions such as whether the grammar is used correctly, whether the expression is sophisticated, and whether the content meets the main requirements of the prompt.
Scoring is technically the simpler of the two; correction is more complex because it involves cognition.
What counts as sophisticated expression and vocabulary varies greatly from one learning stage to the next, from primary school to middle school, high school, and university. Corrections therefore have to be "defined" specifically for the circumstances of each stage.
As Sheng Zhichao put it, "Review technology is not only about scoring but also about giving reasonable feedback. It has to be modularized according to scenario knowledge and broken down layer by layer in order to produce reasonably scientific scores and the feedback users actually want."
"When the iFLYTEK Intelligent Learning Network was first launched, its essay review technology had some problems," said Sheng Zhichao, calling it an experience he will never forget.
At the time, the school required that in an exam covering 1,000 students, not a single student's paper could be graded incorrectly. But deep learning and traditional machine learning are both statistical models: they optimize for overall probabilities and do not account for each individual student.
And so the problem arose.
The English composition question provided an opening passage and asked students to continue writing. The machine, however, treated that opening as part of the answer to be graded, so one essay received a score even though the student had left it blank. Grading exams is a serious matter, and an error like this does irreparable damage to the objectivity and fairness of an exam, both for the teachers and for Sheng Zhichao himself.
Looking back, the root cause of this failed deployment was a mismatch between the metrics the team focused on and the metrics users actually cared about in the real scenario.
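The metric mismatch described above can be illustrated with a toy calculation. The Python sketch below compares an aggregate metric (mean absolute error over all students) with a per-student worst-case check, and adds a guard that strips the exam's given opening passage and withholds a machine score from blank answers. All function names, the `min_words` threshold, the toy scorer, and the manual-review policy are hypothetical illustrations, not iFLYTEK's actual grading system.

```python
def mean_abs_error(pred, gold):
    """Average per-student error: the kind of aggregate metric a statistical model is tuned on."""
    return sum(abs(p - g) for p, g in zip(pred, gold)) / len(gold)


def worst_case_error(pred, gold):
    """Largest single-student error: what a zero-tolerance exam actually cares about."""
    return max(abs(p - g) for p, g in zip(pred, gold))


def strip_given_opening(answer, opening):
    """Remove the opening passage supplied by the exam, keeping only the student's own text."""
    text = answer.strip()
    given = opening.strip()
    if given and text.startswith(given):
        text = text[len(given):]
    return text.strip()


def grade_essay(answer, opening, scorer, min_words=1):
    """Score only the student's continuation.

    Hypothetical policy: answers with fewer than `min_words` words of the
    student's own text get no machine score (None) and go to a human grader.
    """
    own_text = strip_given_opening(answer, opening)
    if len(own_text.split()) < min_words:
        return None
    return scorer(own_text)
```

With 1,000 students where 999 are scored exactly and one blank paper receives 12 points instead of 0, the mean absolute error is only 0.012 points, which looks excellent, while the worst-case error is the full 12 points. A check like the second one is closer to what the school actually demanded.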
After that, Sheng Zhichao and his team began to "experience learning life" often and over long periods, talking with teachers, students, parents, and others involved in learning, in an effort to fully and genuinely understand and define every detail of the needs in education.
"If you want to truly apply your knowledge in education, you must first forget your original identity and become a student, a parent, or a teacher."
Sheng Zhichao's idea echoes the key lesson Zhang Sanfeng gives Zhang Wuji when teaching him Tai Chi in Jin Yong's novel The Heaven Sword and Dragon Saber: "Tai Chi values its intent, not its moves. Once you have forgotten all the moves, you have mastered Tai Chi."
In 2016, Sheng Zhichao and his team finally succeeded in applying essay review technology to the college entrance examination and the senior high school entrance examination. It was the first time in China that educational assessment technology had been used in large-scale formal examinations.
If this solved a problem in only one specific educational scenario, the later work on teaching students in accordance with their aptitude and on personalized learning shows iFLYTEK's determination to put down deep roots in education.
At the beginning of 2020, Sheng Zhichao returned to education and began tackling the harder direction of personalized learning, that is, teaching students in accordance with their aptitude.
Sheng Zhichao admits that he, too, was once a student and spent more than 20 years in school. Even as a top student, he could not distill his own experience into advice other learners could use. The reason may not be fully captured by the simple remark, "after all, the right learning method differs from person to person."
Perhaps it points instead to a beautiful ideal born thousands of years ago: "teach students in accordance with their aptitude, and teach all without distinction." People have pursued this goal for millennia, and Sheng Zhichao and his team are now moving closer to it step by step.
Personalized programs that take each learner's knowledge level into account and provide customized, dynamic teaching strategies have taken up the mission of "reducing burdens and increasing efficiency".
Take question recommendation as an example. In place of the "sea of questions" drilling strategy long favored by teachers and students, it aims to offer a path to "effective practice". This draws on a whole set of technologies, including cognitive diagnosis, deep learning, and knowledge graphs.
Drawing on the "zone of proximal development" theory proposed by the psychologist Lev Vygotsky, the logic of unlocking students' potential through personalized questions is easy to grasp: questions recommended for a student's current level should be neither so hard that they discourage the student nor so easy that they waste time. In Sheng Zhichao's words, these are learning resources that are "within reach with a little effort."
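The "neither too hard nor too easy" rule described above can be sketched as a filter on predicted success rates. The example below assumes a simple Rasch (one-parameter item response theory) model, in which the chance of answering correctly depends only on the gap between a student's ability and a question's difficulty. The article does not say which model iFLYTEK uses; the band limits (0.5 to 0.85), the target rate (0.7), and all names here are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Question:
    qid: str
    difficulty: float  # assumed to be on the same logit scale as student ability


def p_correct(ability: float, difficulty: float) -> float:
    """Rasch (1PL IRT) probability that a student answers the question correctly."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def recommend_zpd(ability, questions, low=0.5, high=0.85, target=0.7, k=3):
    """Return up to k question ids whose predicted success rate lies in [low, high],
    ordered by closeness to the target rate (an assumed proxy for the ZPD)."""
    scored = []
    for q in questions:
        p = p_correct(ability, q.difficulty)
        if low <= p <= high:  # inside the assumed "within reach" band
            scored.append((abs(p - target), q.qid))
    scored.sort()  # closest to the target success rate first
    return [qid for _, qid in scored[:k]]
```

Questions the student would almost certainly get right (too easy) or almost certainly get wrong (too hard) fall outside the band and are never recommended; among the rest, those closest to the target success rate come first.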
However, pinpointing the resources each student can reach "with a little jump" is not easy. It requires modeling each student's cognitive state with knowledge graphs.
iFLYTEK has long worked on knowledge graph technology. It began R&D in 2013 and in 2016 won first place in NIST TAC KBP 2016, an international knowledge graph construction evaluation, giving it seven years of accumulated knowledge graph expertise.
The figure shows an example of cognitive modeling for one student: red marks poorly mastered knowledge points, yellow marks moderately mastered ones, and green marks well-mastered ones.
The student starts with the green knowledge points, then moves on to the yellow ones, and finally the red ones, forming a learning path unique to that student. This step-by-step approach not only improves learning efficiency but also makes teaching genuinely individualized.
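The green-then-yellow-then-red ordering described above can be sketched as a topological sort over a prerequisite graph, with ready knowledge points ordered by mastery color. The sketch assumes mastery scores in [0, 1], color thresholds of 0.4 and 0.75, and a graph given as a dictionary mapping each knowledge point to its prerequisites. These names and thresholds are hypothetical, and the rule that prerequisites take precedence over the color order is an added assumption the article does not specify.

```python
import heapq

# Lower rank = studied earlier (green first, red last), as described in the article.
COLOR_ORDER = {"green": 0, "yellow": 1, "red": 2}


def mastery_color(score, weak=0.4, strong=0.75):
    """Map a mastery score in [0, 1] to a color; thresholds are assumptions."""
    if score >= strong:
        return "green"
    if score >= weak:
        return "yellow"
    return "red"


def learning_path(mastery, prereqs):
    """Order knowledge points so prerequisites come first and, among points that
    are ready, green precedes yellow precedes red (higher mastery first within a color).

    mastery: {point: score}; prereqs: {point: [points it depends on]}.
    Every point named in prereqs must also appear in mastery.
    """
    indegree = {p: 0 for p in mastery}
    dependents = {p: [] for p in mastery}
    for point, reqs in prereqs.items():
        for req in reqs:
            indegree[point] += 1
            dependents[req].append(point)

    def priority(p):
        return (COLOR_ORDER[mastery_color(mastery[p])], -mastery[p], p)

    ready = [priority(p) for p in mastery if indegree[p] == 0]
    heapq.heapify(ready)
    path = []
    while ready:
        _, _, point = heapq.heappop(ready)
        path.append(point)
        for nxt in dependents[point]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:  # all prerequisites now scheduled
                heapq.heappush(ready, priority(nxt))
    if len(path) != len(mastery):
        raise ValueError("prerequisite graph contains a cycle")
    return path
```

For example, with fractions (0.9) and decimals (0.8) green, ratios (0.6) yellow and depending on fractions, and percentages (0.3) red and depending on ratios and decimals, the path comes out as fractions, decimals, ratios, percentages.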
The methodology of going deep into scenarios and industries has been validated most fully in education. AI will foreseeably keep transforming production and daily life, and even major historical problems that have never been solved may find new solutions through AI.
New solutions to major historical issues
Major historical problems tied to people's basic needs, such as education, healthcare, and justice, are themselves bundles of complex problems. So even if AI can offer a solution, it will not rest on any single technology; it must be the combined force of a complex system.
"Take the AI learning machine for education as an example. It involves a whole series of related technologies, such as voice interaction and evaluation, image and text recognition, cognitive understanding, knowledge graphs, and multidimensional learner profiling." Sheng Zhichao is right. Beyond the cognitive diagnosis and knowledge graphs used in the personalized learning step described above, completing even an ordinary learning loop is far more complicated than one might imagine:
A student photographs his or her completed homework with an AI learning machine and uploads it. Image and text recognition first corrects the page curvature and removes noise from the photo, then recognizes homework full of printed text, handwriting, and even formulas. Next, NLP and other technologies infer the answers from the questions and the information in the text and mark corrections. For questions answered incorrectly, knowledge-graph-based technology recommends practice questions within the student's zone of proximal development, based on the knowledge points involved.
In this way, the key technologies along the innovation chain are deeply integrated and linked into a systematic learning loop.
Looking back, we find that the deep integration of key technologies also rests on crucial foundations: breakthroughs in single-point core technologies and crossing the threshold into practical application.
The practice of multilingual interaction may help confirm this conclusion.
Today, voice has become the key gateway for human-computer interaction in the era of the Internet of Everything. Voice input, voice search, and voice interaction have become standard features of smart products such as phones, cars, and toys. At the same time, the Belt and Road national strategy depends on communication across languages, which highlights the value of multilingual translation technology. Making multilingual intelligent speech and language technology practical, however, is far from easy.
The linguistic phenomena unique to each language are highly complex, accumulated research and investment in less widely spoken languages are insufficient, and training data is scarce. These problems are real and unavoidable.
The team chose to confront these challenges head-on and overcome them one by one.
On the data side, iFLYTEK built a multilingual data annotation platform based on human-machine collaboration. On the algorithm side, it focused on a unified end-to-end multilingual modeling framework, unsupervised and weakly supervised training, and multi-task collaborative optimization for speech and image translation. To improve R&D and training efficiency, it built a platform for automated multilingual model training and customized optimization, enabling multilingual systems to be developed in batches and eliminating time-consuming, labor-intensive manual work.
These efforts eventually paid off. On October 26, 2021, the team from the HIT-iFLYTEK Joint Laboratory (HFL) ranked first on XTREME, an authoritative multilingual understanding benchmark organized by Google to comprehensively evaluate models' multilingual understanding and cross-lingual transfer, with an overall average score of 84.1 and the best results in three of its four tracks. Then, on November 10, the international low-resource multilingual speech recognition competition OpenASR concluded. The joint team of iFLYTEK and the National Engineering Laboratory for Speech and Language Information Processing at USTC (USTC-NELSLIP) competed in all 15 languages under the constrained condition and 7 languages under the unconstrained condition, and took first place in every one of them.
Even with breakthroughs in single-point core technologies, the application threshold crossed, and key technologies along the innovation chain deeply integrated, "systematic innovation" has not yet formed a closed loop in the strict sense.
After all, although the path to solving problems has gradually become clearer, deciding "what problem to solve" remains the root difficulty troubling these scientists.
Education, healthcare, justice, urban ecology: each of these words carries enormous weight, and the core issues behind them cannot be summed up in a few words, whether "reducing burdens and increasing efficiency", "teaching students in accordance with their aptitude", and "balancing resources" in education, or "quality of medical care" and "patient experience" in healthcare...
Turning these major systemic propositions into scientific problems may mean returning to the essence of NLP, and of cognitive intelligence more broadly:
defining the problem.
"Each of the 360 trades has its own experts. How do we define the problems and knowledge characteristics of each industry, and how do we build a framework that lets the model be continuously replicated and applied across industries?" This is the challenge Sheng Zhichao and his team face, and it is also the key to iFLYTEK's future breakthroughs.
When the ability to turn major systemic propositions into scientific problems grows stronger, and single-point core technologies keep breaking through while being deeply integrated and organically connected, systemic innovation can truly become a new solution to grand historical problems.
An infinitely widening neural network
When we spoke with the CV group of the iFLYTEK AI Research Institute, we likened iFLYTEK to a very wide and deep generative neural network.
A typical generative neural network consists of an input layer, an encoding layer, and an output layer. For an AI company, the input is the three elements of AI (computing power, data, and algorithms); the output is technology and products; and the encoding layer is the company's organizational structure, its technical methodology, and its talent.
The institute's three research directions, CV, cognition, and speech, are independent of one another yet deeply integrated, giving outstanding talent an equal and open platform where they can hone their skills, develop their potential, and learn from one another.
But this reveals only half the secret of the encoding layer in iFLYTEK's neural network. The other half may lie in the NLP deployment path of Sheng Zhichao and his team: whether in the unavoidable transformation at the start or the later refinement in scenarios such as education and healthcare, the core was always the same task, namely defining and building a genuine understanding of different industries.
By understanding industries and defining problems, iFLYTEK is no longer limited by its existing capabilities when choosing a direction, and the width of its neural network can keep expanding without bound.
Key technologies sit at the top and industry knowledge at the bottom. Once the barrier between the laboratory and the real world is broken, technology is no longer confined to itself but connects with a wide range of external scenarios, which ultimately lets iFLYTEK's AI technology move quickly from research and development to large-scale deployment. There is also reason to believe that the mission of "building a better world with artificial intelligence" is far more than empty talk.