In 2024, how can Zhipu become faster?
Last updated: 2024-01-16
“Domestic ‘GPT-4’ is really here.”
Author | Zhang Jin
Editor | Chen Caixian
If you had to name the one thing the AI large model community most looks forward to, and the one thing every general-purpose large model maker is quietly racing toward, it would be catching up with GPT-4.
Looking back on 2023, China's AI industry spent a busy and passionate year. The first half brought a financing race and a scramble to recruit talent and form teams; the second half brought an explosion of large model releases, a turbulent period of competing models, and the early stages of commercial exploration.
According to public data, 238 large models had been released in China as of October last year, which works out to nearly one new model per day. Notably, when introducing their models, vendors almost invariably claimed capabilities "close to GPT-4"; bolder ones even claimed to have "caught up with GPT-4."
For a time, it seemed as if China's large models were already ahead of the international state of the art, giving investors and users who cared about China's AI development but did not understand the technology a good deal of unrealistic confidence.
But nothing could be further from the truth. In November last year, Yao Xing, founder of Yuanxiang XVERSE Technology and former vice president of Tencent, told Leifeng.com that claims of being "close to GPT-4" were plainly out of line with reality; many were manufactured by gaming leaderboards and meant little.
"Gaming the leaderboards is a bad habit of ours." The result is that no one has a clear picture of what China's large models can actually do. In truth, everyone is still far from GPT-4.
Although the release of OpenAI's papers and Meta's forceful entry into open source have lifted the mystery of large models layer by layer, and the gap with foreign models is gradually narrowing, we remain far from GPT-4, a ceiling no other model has reached.
This remains a high-threshold endeavor. Training such a model requires large sums of money, people who have actually written model-training code, a firm technical roadmap, and sustained investment at the company's strategic level. Shouting about it does not make China's large models able to share a stage with GPT-4.
Therefore, in an era when leaderboard-chasing has become a habit, we should focus our attention and resources on the teams and people genuinely working for China's large model cause, rather than on "blind boasting" under borrowed banners.
Catching up with GPT-4 has become the most urgent task for domestic large models. Among general-purpose model makers, whoever first trains a model truly comparable to GPT-4 will, like the first army to enter Xianyang, seize the lead in commercialization and ecosystem building.
Speculation, debate, and bets on who would be first to cross the GPT-4 threshold ran hot throughout the past year. Finally, today, Zhipu AI released its new-generation base model GLM-4, whose overall performance improves 60% over the previous generation, with metrics across the board approaching GPT-4. A "domestic GPT-4" has really arrived.
An expected result, but we did not expect them to be this fast.
01
GPT-4, the most powerful model, and no one has caught up with it yet
After the 2023 Spring Festival, a wave of investors following AI happened upon ChatGPT (GPT-3.5). Shocked, they spread the word from person to person, igniting a ChatGPT craze in investment circles that, as it fermented, grew into an internet-wide Chinese "worship" of ChatGPT.
Before people had recovered from the shock of ChatGPT, OpenAI launched GPT-4 a month later, an even more powerful model that once again ignited the public's imagination of large models.
How powerful is it? From a hand-drawn sketch of a website, GPT-4 can directly generate the finished web page code; it scores near-perfect on the GRE; and on a simulated bar exam it beat 90% of humans, placing in the top 10% of test takers, where GPT-3.5 placed in the bottom 10%.
GPT-4 performs at human level on a variety of professional tests and academic benchmarks. Its biggest breakthrough is image processing: it can accurately understand the meaning of an image and answer questions about it.
These astonishing performances made GPT-4 the most powerful large model from the moment it appeared, and a common target pursued by technology companies worldwide.
Turning back to China: in this wave of large model competition, the consensus is that our breakthrough and advantage lie in rich application scenarios and a vast market, where large models can be applied best.
Then why not simply use open-source large models? Why spend so much energy chasing GPT-4?
First, as Zhipu CEO Zhang Peng put it, whether a base model is useful ultimately depends on whether it is capable enough. For today's domestic models to land in real scenarios and deliver business value to enterprises, their general capabilities need to improve substantially.
Even GPT-4, the most advanced model today, keeps evolving new human-like abilities yet still has not fully overcome the most basic problem of model hallucination. In the short term, AGI will remain humanity's "intracranial carnival", a party taking place only in our heads.
"To truly land on the B side, chat products alone are not enough." Zhang Peng believes the challenges in commercializing large models today are, in essence, challenges of model capability.
If even the top student still has room to improve, what excuse do we have to stand still? All the more so because domestic models are not yet capable enough to support commercialization in many business scenarios. GPT-4 therefore remains a goal worth chasing.
Second, at the national level, independent and controllable technology is the general trend, and aiming at the most ambitious technological ideal remains a goal we must reach.
"Now it mainly depends on who can catch up with or surpass GPT-4, and most manufacturers very likely will not make it," said an industry insider with deep knowledge of the large model ecosystem. He also noted that after Meta released Llama 2, its capabilities briefly approached GPT-3.5, but Meta has published no new progress since. The technical threshold for large models evidently remains high, and it will test many domestic teams.
Many domestic manufacturers train their models on the open-source Llama.
02
GLM-4: performance approaching GPT-4
Today, January 16, Zhipu AI (hereinafter referred to as "Zhipu") held the 2024 Zhipu AI Technology Open Day in Beijing and released the new generation base model GLM-4.
According to Zhipu, GLM-4's basic capabilities have improved significantly, with overall performance up 60% over the previous generation GLM-3. By Zhipu's own evaluation data, GLM-4's performance approaches that of GPT-4.
First, in terms of basic capabilities: GLM-4 scores 81.5 on MMLU (94% of GPT-4's level), 87.6 on GSM8K (95%), 47.9 on MATH (91%), 82.25 on BBH (99%), 85.4 on HellaSwag (90%), and 72 on HumanEval (100% of GPT-4's level).
Image source: Zhipu AI Open Day
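The "reaches N% of GPT-4's level" figures are simple ratios of benchmark scores. A minimal sketch of the arithmetic; note that the GPT-4 reference scores below are back-calculated from the article's percentages, for illustration only, not official numbers:

```python
# Relative performance: GLM-4 score divided by a GPT-4 reference score.
# The GPT-4 references here are back-calculated from the article's
# percentages (score / percentage) -- illustrative, not official.

def relative_level(score: float, reference: float) -> float:
    """Return a score as a fraction of the reference score."""
    return score / reference

# benchmark -> (GLM-4 score, implied GPT-4 reference)
benchmarks = {
    "MMLU":      (81.5,  81.5 / 0.94),
    "GSM8K":     (87.6,  87.6 / 0.95),
    "MATH":      (47.9,  47.9 / 0.91),
    "BBH":       (82.25, 82.25 / 0.99),
    "HellaSwag": (85.4,  85.4 / 0.90),
    "HumanEval": (72.0,  72.0 / 1.00),
}

for name, (glm4, gpt4_ref) in benchmarks.items():
    pct = relative_level(glm4, gpt4_ref)
    print(f"{name}: GLM-4 {glm4} vs GPT-4 ~{gpt4_ref:.1f} -> {pct:.0%} level")
```

Whether such ratios are meaningful depends, of course, on the benchmark: a point of MATH is much harder to earn than a point of HellaSwag.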
In instruction-following ability, measured on IFEval against GPT-4, GLM-4 reaches 88% of GPT-4's level on prompt-level following (Chinese) and 90% on instruction-level following (Chinese), significantly exceeding GPT-3.5.
In alignment, on the AlignBench dataset GLM-4 surpasses the GPT-4 version released on June 13 and approaches the newest GPT-4 (the November 6 version) in professional skills, Chinese understanding, and role playing, even exceeding GPT-4's accuracy in Chinese understanding; its Chinese reasoning still needs further improvement.
What stands out about this release is that GLM-4 shows a year's worth of catching up with GPT-4: in multiple evaluations its basic capabilities reach 90% of GPT-4's level, a rare achievement. Yet rather than simply declaring it has "caught up with GPT-4," Zhipu stayed pragmatic and low-key, describing GLM-4's performance only as "approaching" GPT-4, acknowledging the remaining gap, and even pointing out specific shortcomings that "need further improvement."
In contrast to the prevailing tendency to exaggerate, Zhipu has always come across as the "low-key top student."
Beyond the performance gains, GLM-4 supports a 128K context window: a single prompt can carry the equivalent of 300 pages of text. In the needle-in-a-haystack test, GLM-4 achieves nearly 100% recall across the full 128K context length.
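The needle-in-a-haystack test hides one target sentence at varying depths inside a long context and asks the model to retrieve it. A minimal sketch of such a harness, with a trivial substring lookup standing in for the actual model call (all function and variable names here are illustrative, not Zhipu's test code):

```python
def build_haystack(needle: str, filler: str, total_sentences: int, depth: float) -> str:
    """Insert `needle` at a fractional `depth` (0.0 = start, 1.0 = end)
    inside a long run of filler sentences."""
    sentences = [filler] * total_sentences
    sentences.insert(int(depth * total_sentences), needle)
    return " ".join(sentences)

def stub_retrieve(context: str, needle: str) -> bool:
    """Stand-in for the model call: a real harness would prompt the model
    to quote the needle back and grade its answer. Here we just search."""
    return needle in context

needle = "The secret passphrase is 'glacier-42'."
filler = "The sky was a uniform shade of grey that afternoon."

# Sweep insertion depths across the context, as needle tests typically do.
results = []
for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
    context = build_haystack(needle, filler, total_sentences=2000, depth=depth)
    results.append(stub_retrieve(context, needle))

print(f"recall: {sum(results)}/{len(results)}")  # -> recall: 5/5
```

In a real run, the interesting signal is where recall drops: many long-context models lose needles placed in the middle of the window, which is why depth is swept rather than fixed.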
Building on the GLM model's strong agent capabilities, Zhipu launched GLM-4-All Tools, which automatically understands and plans complex instructions according to user intent, freely calling WebGLM search enhancement, the Code Interpreter, and multimodal generation to complete complex tasks.
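The All Tools idea, in which the model plans a task and dispatches to search, a code interpreter, or image generation, can be sketched as a tool registry plus a dispatch loop. Everything below (the tool stubs, the keyword-based planner) is an illustrative stand-in, not Zhipu's actual API:

```python
# Hypothetical sketch of an All-Tools-style dispatch loop. The planner is a
# trivial keyword matcher standing in for the model's own planning step.

def web_search(query: str) -> str:
    return f"[search results for: {query}]"   # stand-in for WebGLM search

def run_code(snippet: str) -> str:
    return f"[executed: {snippet}]"           # stand-in for a code interpreter

def generate_image(prompt: str) -> str:
    return f"[image for: {prompt}]"           # stand-in for image generation

TOOLS = {"search": web_search, "code": run_code, "image": generate_image}

def plan(user_request: str) -> str:
    """Toy planner: a real model decides this from the instruction itself."""
    if "calculate" in user_request or "compute" in user_request:
        return "code"
    if "draw" in user_request or "picture" in user_request:
        return "image"
    return "search"

def handle(user_request: str) -> str:
    tool_name = plan(user_request)
    return TOOLS[tool_name](user_request)

print(handle("compute 17 * 23"))
print(handle("draw a picture of a lighthouse"))
```

The hard part in a real system is the planning step: choosing tools, ordering calls, and feeding intermediate results back into the model, which is exactly what the base model's agent capability has to supply.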
Multimodality has become an important direction and path for AI development. Leading model makers are all moving toward it, with Meta's SAM, OpenAI's GPT-4V, and Google's Gemini; now, with CogView3, Zhipu has "aligned" itself with the world's advanced level.
Modality refers to the way something is expressed or perceived; every source or form of information can be called a modality. Vision is a primary modality obtained directly from the real world, with abundant, low-cost data sources, and it is more intuitive and easier to understand than language.
In real applications, text, images, and sound are often interwoven rather than purely textual. In complex scenarios, text-only interaction is limited by what text can express, making complex concepts or needs hard to convey; by contrast, image-based interaction in a multimodal model has a lower threshold and is more intuitive.
A securities analyst believes that a small step in multimodal technology brings a big step in industrial application, and that multimodality is an important milestone on large language models' path into thousands of industries and, ultimately, general artificial intelligence.
Therefore, if AI is to penetrate into all walks of life, it is an inevitable trend for large models to develop into multi-modal models.
By this point, Zhipu had been running in the large model race for more than ten months. GLM-4's multimodal capabilities have also improved markedly, in both text-to-image generation and multimodal understanding. CogView3 significantly outperforms the best open-source Stable Diffusion XL and approaches OpenAI's newly released DALL-E 3; across evaluation dimensions such as alignment, fidelity, safety, and compositional layout, CogView3 reaches more than 90% of DALL-E 3's level.
Zhang Peng, CEO of Zhipu AI, said at the Technology Open Day that the launch of GLM-4 marks domestic large models drawing level with the world's advanced standard, laying the groundwork for opening a new phase in the domestic large model industry.
The release of GLM-4 will become a watershed in the development of domestic large models, bringing more room for imagination to the commercialization and industrial implementation of large models.
03
GLM-4 brings large models into the era of accelerated commercialization
When ChatGPT first set the Chinese internet alight last year, Zhipu decided to begin commercializing. According to the company, since March 2023 it has met with more than 2,000 customers, reached cooperation with more than 1,000 of them, and engaged in deep co-creation with more than 200.
Looking at the industry's overall progress, Zhipu has focused on commercialization for the past year, whereas other leading large model startups only began pushing commercialization after October. Zhipu is roughly half a year ahead of the industry.
Commercialization also faced challenges.
CEO Zhang Peng told Leifeng.com frankly at the end of October last year that Zhipu's large models faced the challenge of "approval without adoption": many people recognized their value, but balked when it came time to pay.
On one hand, people still understand large models poorly. On the other, the reason is very practical: GPT-4 is right there. Even users who know little about large models know GPT-4, and they will ask how far your model is from it.
On commercialization, Zhang Peng believed at the time that once the model reaches GPT-4's level, many of today's problems will solve themselves; there would be no need even to think about the business model, as offering an API alone would suffice.
Unexpectedly, in just over two months, GLM-4 has drawn close to GPT-4, a major boost to Zhipu's overall development and commercialization.
At this open day, Zhipu also launched a series of important measures to accelerate the construction of the GLM model ecosystem, chief among them GLMs personalized agents.
Building on the capabilities of the GLM-4 model, any user can create a personalized GLM agent with simple prompt instructions. The GLM model agents and an agent center went live on the Technology Open Day.
In addition, Zhipu AI launched targeted measures for partners including commercial customers, the open-source community, and small and micro enterprises.
For example, after the GLM-4 upgrade the API price stays at 0.1 yuan per thousand tokens, already low for the industry. Zhipu AI will also establish a 10-million-yuan large model open source fund, upgrade its "Z Plan" for large model entrepreneurs worldwide, and, together with ecosystem partners, launch a 1-billion-yuan large model venture fund to support original innovation in large models.
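At 0.1 yuan per thousand tokens, API cost is a straightforward product. A quick back-of-the-envelope estimator; the token counts below are hypothetical, and real billing may count prompt and completion tokens differently:

```python
PRICE_PER_1K_TOKENS = 0.1  # yuan, per the announced GLM-4 API price

def api_cost(total_tokens: int, price_per_1k: float = PRICE_PER_1K_TOKENS) -> float:
    """Cost in yuan for a given number of billed tokens."""
    return total_tokens / 1000 * price_per_1k

# Example: a 128K-token prompt (the full context window) plus a 2K-token reply.
tokens = 128_000 + 2_000
print(f"{tokens} tokens -> {api_cost(tokens):.2f} yuan")
```

Even maxing out the 128K window, a single call stays in the range of about 13 yuan at this price, which is what makes "only provide an API" a plausible business model at scale.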
All of these measures to promote the GLM model ecosystem build out Zhipu's ecosystem, and in essence they too serve Zhipu's commercialization.
According to Zhang Fan, Chief Operating Officer of Zhipu AI, in the past nine months, he has led Zhipu from the initial "selling models" to the establishment of a complete commercialization system.
Zhipu's commercialization system is shaped like a pyramid. At the base is the open-source layer: tens of millions of downloads and a very large community; when Zhang Fan talks with customers, he finds many engineers first got started with ChatGLM. Above it sits the API layer, the core customers calling the API every day. Next comes cloud privatization, aimed at medium-sized enterprises, which not only need to use models but also hope to turn the data assets in their business into competitive moats. At the top is on-premise privatization, for companies with extremely high security requirements or that want to make the model's capabilities their own, under their own control; this segment is smaller in volume.
For Zhipu, each layer has its own niche, and the commercial goal is for users at the lower layers to keep moving upward, gradually enriching Zhipu's commercialization.
This aligns with Zhipu's development strategy: always walk on two legs, technology and commercialization.
The release of GLM-4 will shock the entire large model industry, prompting large models to enter an era of accelerated commercialization.
04
Postscript
On March 14, 2023, the very day GPT-4 was released, Zhipu AI followed with ChatGLM, a dialogue model built on its hundred-billion-parameter base model, and open-sourced the bilingual Chinese-English dialogue model ChatGLM-6B, which supports inference on consumer-grade graphics cards.
That move made plain Zhipu AI's ambition to benchmark itself against OpenAI. Today's release of GLM-4 reflects a year of quietly keeping pace with the world's most advanced level, and makes good on Zhipu's determination and confidence.
The goal of Zhipu benchmarking against OpenAI is being achieved step by step.
Today GLM-4's performance approaches GPT-4's, giving domestic large models the confidence and persistence to catch up with, or even surpass, GPT-5, GPT-6... on the road to AGI.
As Sam Altman said, "Always be faster." The large model era has accelerated everything. In the first month of 2024, Zhipu AI took the lead, arguably setting the tone for the fierce competition of 2024, and one cannot help but look forward to the surprises the artificial intelligence industry will bring.