What other hurdles do large models have to overcome in order to become "productive"?
Last updated: 2023-04-26
"The first wave of large AI models is now up for review, and a 'vase' that cannot actually deliver productivity will likely find it hard to pass."
Author | Dong Zibo
Editor | Cen Feng
In 2023, the large-AI-model craze has carried straight from Q1 into Q2.
Domestically, everyone from the big manufacturers to startups is testing the waters. Over the past few months a flurry of large-model products has been "submitted" for review, a riot of blossoms that grows ever more dazzling.
Onlookers see only the large models blooming everywhere, but amid the excitement some have begun to ask:
"Large AI models are so powerful, but what are they actually for?"
It is true that many of the large models released to market are still half-baked: they remain far from "usable", let alone "easy to use".
While some large-model products were still wrestling with "earnest nonsense", weak semantic understanding, and the difficulty of landing in real scenarios, OpenAI was the first to spot the adoption pain points: together with Microsoft it launched Copilot, aimed squarely at office efficiency, firing the starting gun for large models' march toward productivity.
In the popular imagination, AI should handle mechanical, repetitive daily work; offer distinctive analytical perspectives and creative inspiration; and give targeted advice and help in specific fields such as education, healthcare, and law, making work and life easier and services more attentive and considerate.
So what kind of large AI model can really help users "get work done" and further raise the efficiency of everyday office and production tasks? What kind of large model truly counts as a productivity tool, and what secret sauce does it need to meet that bar?
For the industry, if these questions go unanswered, a market bottleneck is only a matter of time; conversely, whoever first supplies the market with large-model products that genuinely improve efficiency will seize the lead.
01
Understanding & Memory: The Watershed of Large-Model Productivity
Memory and comprehension are arguably the most hard-core contest of strength among the large-model products now mushrooming.
A model's comprehension is rooted in its natural language processing: it must identify semantics clearly, including colloquialisms and humor grounded in the local linguistic context. That is essential for understanding what users want and for completing text generation and creation.
The stronger a model's memory, that is, its ability to sustain multiple rounds of dialogue, the more detail users can pack into describing their needs, and the more complex the tasks the AI can complete.
These two hardest subjects in the large-model contest are also the key guarantees of large-model productivity. More to the point, if large models are to truly help people "work", their memory and comprehension must meet an even higher bar.
Yet both "understanding" and "memory" are hard problems for today's large models to improve on. On one side lie huge market pain points; on the other stands a technical "high wall" that is difficult to scale. Until this tension is resolved, AI productivity will keep running into a bottleneck.
First, to tackle AI's weak semantic understanding, the AI scientists at Kunlun Wanwei and Singularity Intelligence turned to an unconventional method: the Monte Carlo tree search algorithm.
Monte Carlo tree search (MCTS) is, simply put, a reinforcement-learning algorithm based on random simulation. People outside AI may not know the name, but it was the secret weapon behind AlphaGo's victories over Go masters Lee Sedol and Ke Jie. Its core idea is to search randomly from each node of a tree structure and converge on the optimal decision.
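Generic MCTS runs four phases per iteration: select a promising node, expand an untried move, simulate a random playout, and backpropagate the result. Here is a minimal sketch on an invented counting game; this toy, and every name in it, is our own illustration of the general algorithm, not Tiangong's code:

```python
import math
import random

# Toy domain for illustration: from a running total of 0, two players
# alternately add 1 or 2; whoever reaches exactly 10 wins.
TARGET = 10

def legal_moves(total):
    return [m for m in (1, 2) if total + m <= TARGET]

class Node:
    def __init__(self, total, parent=None, move=None):
        self.total = total            # game state reached at this node
        self.parent = parent
        self.move = move              # the move that led here
        self.children = []
        self.untried = legal_moves(total)
        self.wins = 0.0               # wins for the player who moved into this node
        self.visits = 0

def uct_child(node):
    # Selection rule: exploit high win rates, explore rarely visited children.
    return max(node.children, key=lambda c:
               c.wins / c.visits + math.sqrt(2 * math.log(node.visits) / c.visits))

def rollout(total):
    """Random playout; True if the player to move from `total` wins."""
    if total == TARGET:
        return False                  # previous player already won
    mover_wins = True
    while True:
        total += random.choice(legal_moves(total))
        if total == TARGET:
            return mover_wins
        mover_wins = not mover_wins

def mcts_best_move(total, iterations=3000):
    root = Node(total)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes via UCT.
        while not node.untried and node.children:
            node = uct_child(node)
        # 2. Expansion: try one untried move, if any remain.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.total + move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from the new state.
        mover_wins = rollout(node.total)
        # 4. Backpropagation: credit alternates between the two players.
        win = not mover_wins          # a win for the player who moved into `node`
        while node is not None:
            node.visits += 1
            node.wins += win
            win = not win
            node = node.parent
    # Recommend the most-visited move, the standard MCTS choice.
    return max(root.children, key=lambda c: c.visits).move
```

In this game the winning strategy is to leave the opponent at a total whose distance from 10 is divisible by 3, and with a few thousand iterations the search reliably discovers it without being told the rule.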
In "Tiangong", the large language model jointly released by Kunlun Wanwei and Singularity Intelligence, Monte Carlo tree search lets the AI "think twice before acting": based on the user's past conversation history and current input, the AI generates a large number of candidate responses, then, combined with NLP techniques, selects the best one and returns it to the user.
Combining Monte Carlo tree search with natural language processing greatly strengthens the decoder's safety and accuracy, letting Tiangong respond quickly and accurately to instructions in relatively complex tasks and scenarios and output high-quality answers.
To test Tiangong's semantic understanding, Leifeng.com asked it: "What is the Monte Carlo tree search algorithm?" Its answer was fairly clear and satisfactory:
Another benefit of applying MCTS to a conversational AI is that the model learns when to change topic or ask questions in a dialogue, guiding users to refine their prompts and get better replies.
For example, Leifeng.com deliberately posed a question so broad it was hard to answer. Tiangong did not fall into the "trap", and instead narrowed the question's scope by proactively asking follow-ups:
To test Tiangong's grasp of Chinese semantics, Leifeng.com asked it about the emotional tone of a classical poem. It must be said, Tiangong captured it quite well:
Beyond solid semantic understanding, Tiangong's "literary flair" is also a pleasant surprise. With a little polish, its output could become a fine short essay:
Its translations likewise show a fluent command of Chinese and English; even when it describes poems in English, the "original flavor" of the Chinese comes through:
On memory, Tiangong is even more surprising: it can handle more than 20 rounds of dialogue and supports long texts of over 10,000 words, which alone leaves many peer products far behind.
For example, in the following exchange Tiangong held its own across a sustained dialogue and even understood the football meme "Saudi Football King".
Behind this conversational stamina lie Tiangong's "deep pockets": backed by one of China's largest GPU clusters, it has abundant resources to guarantee operation and response speed, while keeping user data security and user experience stable and reliable.
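Mechanically, a chatbot's multi-turn "memory" usually comes down to managing a context window: recent turns are replayed to the model until a budget runs out. A naive sketch under names we invented for illustration (real systems budget in tokens rather than characters, and may summarize old turns instead of dropping them):

```python
def build_prompt(history, new_message, max_chars=2000):
    """Keep as many recent turns as fit in a fixed context budget,
    dropping the oldest first. A deliberately naive illustration."""
    turns = list(history) + [new_message]
    # Drop oldest turns until the running transcript fits the budget.
    while len(turns) > 1 and sum(len(t) for t in turns) > max_chars:
        turns.pop(0)
    return "\n".join(turns)
```

Under this scheme, "supporting 20+ rounds" is ultimately a question of how large a context the model and its serving infrastructure can afford, which is where GPU resources enter the picture.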
Understanding and memory together can be seen as the watershed of large-model productivity: deeply understanding user needs while completing continuous conversations in real time. Only a model that crosses this threshold can begin to guarantee productivity for its users.
02
Scenario optimization & model robustness: easy to use = usable + reliable
Most reporters who have tried writing articles with AI, especially in certain vertical fields, share a vague worry: if the AI "earnestly talks nonsense" at some key piece of information and nobody notices, the result could be a serious accident.
True, the "hallucination" problem of large models can be suppressed to a degree with knowledge graphs and the Monte Carlo tree search mentioned above; but in professional fields, if scenario optimization is done poorly and training data quality is low, even the cleverest AI cannot cook a meal without rice.
If people cannot use AI with peace of mind, they might as well do the job themselves, which is exactly why many keep their distance from AI. And without users, there is no way to gather enough data to keep training and refining the model: a vicious cycle.
Hallucinations will not be cured overnight, but for today's large AI models to be "easy to use", they must first be "usable" and "reliable". To land in vertical scenarios such as work and education, a large model needs a few "signature skills".
First, the data must have both quality and quantity. On one hand, there must be enough data to support the demands of model training; on the other, the data must be high-quality enough, or the trained model is easily "led astray" by bad data, which can even make training counterproductive.
Second comes model robustness: the model's "resistance" when anomalies occur or bad data appears. The more robust a model, the less its stability and effectiveness suffer from internal and external disturbances, the more "reliable" it naturally is, and the wider the range of scenarios in which it can raise users' productivity.
To truly help users "work", Tiangong has put real effort into both points.
First, Kunlun Wanwei and Singularity Intelligence cleaned and filtered tens of trillions of raw tokens layer by layer, distilling three trillion words of high-quality data for Tiangong's training.
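"Layer-by-layer cleaning and filtering" of web-scale text typically chains a series of simple filters. A hypothetical miniature of such a pipeline follows; the stages and thresholds are our assumptions, not a description of the actual Tiangong pipeline:

```python
import hashlib

def clean_corpus(docs, min_chars=50, max_symbol_ratio=0.3):
    """Illustrative corpus filter: length floor, a junk-character
    heuristic, then exact deduplication by hash."""
    seen, kept = set(), []
    for doc in docs:
        text = doc.strip()
        if len(text) < min_chars:
            continue                  # layer 1: too short to be useful
        junk = sum(not c.isalnum() and not c.isspace() for c in text)
        if junk / len(text) > max_symbol_ratio:
            continue                  # layer 2: likely markup/encoding debris
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue                  # layer 3: exact duplicate
        seen.add(digest)
        kept.append(text)
    return kept
```

Production pipelines add many more layers (language identification, near-duplicate detection, quality classifiers), but the shape is the same: each pass discards a slice of the raw data, and only a small fraction survives as training text.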
Second, Kunlun Wanwei has been building out its AI footprint since 2020. With its four open-source AIGC models ("Tiangong Qiaohui", "Tiangong Yuefu", "Tiangong Miaobi", and "Tiangong Zhima") as its banner, its open-source community has gathered hundreds of AI scientists and accumulated deep open-source strength.
On this foundation, Tiangong has been fine-tuned for specific scenarios on top of large-scale training, so it can cope with more scenarios and provide efficient, personalized help.
Whether the work is legal, medical, or financial, Tiangong can assist from a professional perspective:
Tiangong also handles educational scenarios with ease. Mathematics, physics, history, or politics: its tutoring is professional across subjects, saving parents a great deal of time:
In addition, large-model products often lag behind the times, with databases that cannot absorb the latest information. This is another frequent criticism from outside: if an AI cannot supply the latest knowledge, how can it solve users' ever-changing problems?
The real-time currency of AI dialogue has therefore become an important criterion for judging whether a large model can deliver productivity.
On this dimension, Tiangong draws on the powerful emergent intelligence of large models: connected to a real-time knowledge base, it iterates its knowledge in real time, so users can obtain the latest information through AI and no longer "fall behind the times":
03
Hundred-billion-parameter models: one may not be enough
When discussing large-model capability, one concept is unavoidable: "emergence".
Simply put, "emergence" means that once a pre-trained model's parameter count crosses a certain scale, its performance suddenly leaps, and it may even acquire abilities it was never explicitly trained for.
By the industry's rough consensus, 50-60 billion parameters is the threshold at which emergence appears in pre-trained large models, and the larger the parameter scale, the more capable the model.
Hundreds of billions of parameters have therefore become the "standard configuration" of large models. Many products now bill themselves as "100-billion-parameter models", using parameter count as proof of strength.
But at this point, some have raised a question:
if a large model is to deliver productivity, is one hundred-billion-parameter model enough?
For Kunlun Wanwei and Singularity Intelligence, the underlying architecture of their ideal AI large model pairs two hundred-billion-parameter models: a "100-billion pre-trained base model" and a "100-billion RLHF model".
The former, the hundred-billion pre-trained base model, mainly handles general natural language processing tasks, covering functions such as language generation, text classification, and machine translation.
The latter, the hundred-billion RLHF (Reinforcement Learning from Human Feedback) model, uses human feedback on the AI's outputs to improve performance through reinforcement learning.
If the pre-trained base model is a gifted "student" who has read ten thousand books, the RLHF model is the "student" who keeps improving through trial and error while working through problems.
ChatGPT's rapid progress has made RLHF standard equipment for many large models. Tiangong's system of a pre-trained base model plus an RLHF model, with the two models complementing and cooperating with each other, is likewise a deliberate choice.
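The first stage of a typical RLHF pipeline is fitting a reward model to pairwise human preferences; the reinforcement-learning stage then optimizes the policy against that reward. Below is a toy sketch of the preference-fitting step with a linear reward model and the Bradley-Terry loss; the features and names are invented for illustration and say nothing about Tiangong's actual implementation:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_reward_model(pairs, dim, lr=0.1, epochs=200):
    """Fit a linear reward r(x) = w . x to preference pairs by gradient
    descent on the Bradley-Terry loss -log sigmoid(r(chosen) - r(rejected))."""
    w = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            # Margin between the preferred and dispreferred responses.
            margin = sum(wi * (a - b) for wi, a, b in zip(w, chosen, rejected))
            grad = sigmoid(margin) - 1.0   # derivative of the loss w.r.t. margin
            for i in range(dim):
                w[i] -= lr * grad * (chosen[i] - rejected[i])
    return w

def reward(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))
```

After training, the reward model scores human-preferred responses higher than rejected ones, giving the reinforcement-learning stage a signal to optimize against; in a real system the "features" are the language model's own representations and the reward model is itself a large network.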
On one hand, the dual hundred-billion structure greatly improves the final model's performance, and also strengthens its interpretability, learning ability, and task coverage.
On the other hand, it cuts training time and resource consumption: the common features learned by the pre-trained model serve as the RLHF model's initial parameters, so training, the most cost-intensive part of the project, can be completed quickly and economically.
As noted above, the model's robustness to anomalies and bad data is largely achieved through this "paired swords" combination of the two hundred-billion models.
However tall the building, the most important work is in the foundation. The dual hundred-billion model is one of the most important pieces of top-level design on Tiangong's path to becoming a productivity tool. When planning their technical route, Kunlun Wanwei and Singularity Intelligence already saw both the design limitations of current large-model products and the feasibility of the dual hundred-billion path, and they built the whole of Tiangong on top of it.
Like a tree, only with a healthy, strong root system can it grow a thick trunk and dense branches, and only then can it bear abundant fruit for people to harvest.
04
Conclusion
Over the past few years, the technology world has watched too many trends arrive with fanfare and vanish without a trace.
In the end, at each trend's peak, people's visions of the future never converted into real productivity that pushed the industry, or society at large, forward; once the craze passed, fading into silence was probably unavoidable.
So in this wave of generative AI, some are asking: will it end like the others, with the tide receding to leave batches of "naked swimmers" on the beach?
If the AI entrepreneurs of 2023 are unwilling to stop at empty talk, they should understand: a large model must not be a beautiful but hollow vase. AI should become the internal combustion engine and the alternating current of the next decade, driving the next industrial revolution.
Throughout this process, what Tiangong has aimed to be is a productivity tool: an "AI that can really help you work".
It is on this basis that Tiangong started from the supercomputing power of China's largest GPU clusters, built the dual hundred-billion model system, and, with the strength of the AI open-source community, pioneered combining the Monte Carlo tree search algorithm with NLP to ensure the AI delivers real productivity gains to users.
What kind of large model can become productive? Tiangong's approach arguably sets a template for the other contenders on the large-model track, starters and latecomers alike.
Reader benefit: Leifeng.com has obtained 5 invitation codes. Leave a comment with the question you would most like to put to "Tiangong"; the 5 readers with the most likes before 24:00 on April 27 will each receive one.