Another Chinese multimodal large model goes open source! Unconditionally free for commercial use, and it outperforms Claude 3 Sonnet

Last updated: 2024-04-28

By Jian Tong, from Aofeisi
Qubit | WeChat official account: QbitAI

Another Chinese multimodal large model has been open-sourced!

It is XVERSE-V from Yuanxiang, and like the company's earlier releases it is unconditionally free for commercial use.

Yuanxiang was previously the first to release what was then the largest open-source large model in China, and its open-source family has now gained another member.

The new multimodal large model accepts image input of any aspect ratio and posts leading results in mainstream evaluations:

Across a number of authoritative multimodal evaluations, XVERSE-V surpassed open-source models such as Yi-VL-34B, ModelBest's OmniLMM-12B, and DeepSeek's DeepSeek-VL-7B.

On MMBench, a comprehensive capability evaluation, it surpassed well-known closed-source models such as Google's Gemini Pro Vision, Alibaba's Qwen-VL-Plus, and Claude-3V Sonnet.

Supports image input of any aspect ratio

Traditional multimodal models represent an image only as a whole. XVERSE-V instead fuses a view of the whole image with views of its parts, which lets it accept images of any aspect ratio.

By combining the global overview with local detail, it can identify and analyze subtle features in an image, seeing more clearly and understanding more accurately.
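The article does not say how XVERSE-V builds its whole-plus-part representation. As a rough illustration of the general idea only, the sketch below plans one downscaled global view plus a grid of local crops for an image of arbitrary size. The function name `plan_views` and the 448-pixel tile size are assumptions made for this example, not details taken from the model.

```python
import math

def plan_views(width, height, tile=448):
    """Plan one global view and a grid of local crops for an image.

    Hypothetical helper for illustration; not XVERSE-V's actual preprocessing.
    Returns (global_view_size, boxes), where each box is
    (left, top, right, bottom) in pixel coordinates of the original image.
    """
    if width <= 0 or height <= 0 or tile <= 0:
        raise ValueError("width, height and tile must be positive")

    # Global view: the whole image resized to a single tile (coarse context).
    global_view = (tile, tile)

    # Local views: cover the image with a grid of tiles so detail is preserved.
    cols = math.ceil(width / tile)
    rows = math.ceil(height / tile)
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left, top = c * tile, r * tile
            # Edge tiles are clipped to the image boundary.
            boxes.append((left, top, min(left + tile, width), min(top + tile, height)))
    return global_view, boxes

# A wide 1792x448 panorama yields one global view and four local crops.
global_view, boxes = plan_views(1792, 448)
print(global_view, len(boxes))
```

Because the grid follows the image's own width and height, a very wide or very tall input simply produces more tiles along its long side rather than being squashed into a fixed square.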

This approach lets the model be applied across a wide range of fields, including panoramic image recognition, satellite imagery, and the scanning and analysis of ancient cultural relics.

△ Example: HD panorama recognition

△ Example: recognition of fine text in an image

Beyond performing well on basic capabilities, the model also handles a variety of practical application scenarios with ease, such as charts, documents, code conversion, and real-life scenarios for visually impaired people.

Chart comprehension.

Whether the task is understanding an infographic that combines complex graphics and text, or analyzing and doing calculations on a single chart, the model handles it with ease.

Autonomous driving.

Code writing.

There are also real-life scenarios for the visually impaired.

On VizWiz, a test set built from real scenarios involving visually impaired users, XVERSE-V outperformed almost all mainstream open-source multimodal large models, including InternVL-Chat-V1.5 and DeepSeek-VL-7B. The test set contains more than 31,000 visual questions and answers from real visually impaired users, so it accurately reflects their real needs and small everyday problems, and helps visually impaired people overcome the visual challenges of daily life.

About Yuanxiang

Yuanxiang XVERSE was founded in Shenzhen in early 2021. It has raised more than US$200 million in total, and its investors include Tencent, Gaorong Capital, Wuyuan Capital, Hillhouse Ventures, Sequoia China, Temasek, and CPE Yuanfeng.

Yao Xing, the founder of Yuanxiang, is a former vice president of Tencent and the founder of Tencent AI Lab, and a member of the New Generation Artificial Intelligence Strategic Advisory Committee of the Ministry of Science and Technology.

Previously, Yuanxiang was the first in China to open-source a model with up to 65B parameters and the first in the world to open-source a model with a context length of up to 256K. It has also open-sourced an MoE model, and it led the country in the SuperCLUE evaluation.

On the commercial side, the Yuanxiang large model was among the first in Guangdong to complete national registration, which allows it to provide services to the general public.

Since last year, the Yuanxiang large model has been used in in-depth cooperation and application exploration with a number of Tencent products, including QQ Music, Huya Live, WeSing, and Tencent Cloud, to create innovative, leading user experiences in culture, entertainment, tourism, and finance.

Project links:
Hugging Face: https://huggingface.co/xverse/XVERSE-V-13B
ModelScope: https://modelscope.cn/models/xverse/XVERSE-V-13B
GitHub: https://github.com/xverse-ai/XVERSE-V-13B