Multimodal experts are all coming to Xiaohongshu
In the real world, humans use a combination of vision, hearing, touch, smell and other senses to access and understand the world. The information we obtain through different senses is naturally in a "multimodal" form.
In this sense, the development of artificial intelligence is a process of approaching human intelligence, and multimodal learning is an inevitable direction of that development.
Multimodal learning brings new application scenarios
In an era when Internet information is exploding and its forms are increasingly diverse, text, images and short videos account for an ever-growing share of online content. A single modality often cannot fully describe information that spans text, images and video, and in applications, content understanding runs through the entire search and recommendation pipeline.
On shopping apps we have grown accustomed to searching by image rather than by text; in smart homes, voice and gesture interaction are becoming mainstream; and communication with intelligent robots is no longer a mechanical text dialogue, but has moved into a deeper stage of speech and image understanding.
Content needs to be understood at multiple granularities, and how to fuse features from multiple modalities has become a new challenge across many fields. Advancing multimodal technology is therefore a consensus in both academia and industry today.
Today’s Challenges
Although multimodal research dates back to the 1970s and has developed over decades, it still faces many challenges in industrial deployment, leaving some scenarios in a "pseudo-multimodal" state that degrades the user experience:
- The "semantic gap" still exists;
- How to obtain large amounts of paired multimodal data;
- Uncertainty in multimodal information;
- Fine-grained alignment between different modalities (a minimal sketch follows this list);
- Effective architectures for multimodal pre-training, and more.
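To make the alignment challenge concrete: a common approach in multimodal pre-training is CLIP-style contrastive alignment, which pulls the embeddings of matching image-text pairs together and pushes mismatched pairs apart. The sketch below illustrates that idea only; it is not taken from any of the speakers' work or from Xiaohongshu's systems, and the embedding size and temperature are hypothetical placeholders.

```python
# Minimal sketch of CLIP-style contrastive alignment between image and text
# embeddings (illustrative only; dimensions and temperature are hypothetical,
# not Xiaohongshu's actual implementation).
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(image_emb: torch.Tensor,
                               text_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    # Normalize so the dot product becomes cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # Similarity matrix: entry (i, j) compares image i with text j.
    logits = image_emb @ text_emb.t() / temperature
    # Matching image-text pairs lie on the diagonal.
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)      # image -> text direction
    loss_t2i = F.cross_entropy(logits.t(), targets)  # text -> image direction
    return (loss_i2t + loss_t2i) / 2

# Toy usage: a batch of 8 paired 512-dimensional image and text embeddings.
if __name__ == "__main__":
    img = torch.randn(8, 512)
    txt = torch.randn(8, 512)
    print(contrastive_alignment_loss(img, txt).item())
```

Training such an objective well is exactly where the challenges above bite: it requires large amounts of paired multimodal data, and noisy or loosely matched pairs widen the semantic gap rather than closing it.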
So, now that academia and industry are once again focusing on multimodality, what are they discussing?
Expert interpretations await you
At 19:00 on April 20, the first episode of the live broadcast series "REDtech is here", produced by the Xiaohongshu technical team, will focus on multimodality.
For the first half of the episode, the Xiaohongshu technical team has invited Xie Weidi, associate professor and doctoral supervisor at the School of Electronic Information and Electrical Engineering of Shanghai Jiao Tong University, Liu Si, professor and doctoral supervisor at Beihang University, and Gao Shenghua, associate professor and doctoral supervisor at the School of Information Science and Technology of ShanghaiTech University, to share their research on multimodal content understanding.
The second half of the live broadcast, expected to take place on April 27, will focus on multimodal understanding and creation. He Ran, a researcher at the Institute of Automation, Chinese Academy of Sciences, Zhou Xiaowei, a "Hundred Talents Program" researcher and doctoral supervisor at Zhejiang University, and Zhu Chaolin, a lecturer at the ReLER Laboratory of the University of Technology Sydney, will present the academic community's latest multimodal research results.
Scholars from the above-mentioned universities will share topics such as "Cross-modal image content understanding and video generation", "Language-guided visual localization", "Multimodal visual content generation", "Multimodal retrieval, localization and generation methods", "Convenient three-dimensional digitization technology", and "Technology and application of self-supervised learning in multimodal content understanding". You are welcome to interact and ask questions in the live broadcast room!
Unlocking Xiaohongshu's multimodal secrets
In addition, Tang Shen, head of Xiaohongshu's multimodal algorithm group, will use Xiaohongshu's own practice as an example to introduce its exploration, R&D and application of multimodal technology in content quality evaluation, multimodal search and transaction content understanding.
Zhang Debing, head of Xiaohongshu's intelligent algorithm group, will present the applications and challenges of multimodal technology in intelligent creation, and discuss how to make understanding more refined and creation more personalized, diverse, expressive and convenient.
As a unique content community in China, Xiaohongshu had more than 200 million monthly active users as of October 2021. Processing and understanding such a large volume of UGC content and distributing it more accurately and efficiently is one of the most important application directions for multimodal technology.
Among China's current Internet applications, Xiaohongshu's content consists mainly of image-and-text notes and short videos, with a large number of shared notes and massive amounts of real-time user behavior generated every day. This leaves plenty of room for imagination in multimodal human-computer interaction.
This ecosystem has given rise to many valuable and challenging problems, involving the understanding and combined use of information from multiple modalities such as vision, language, audio and user behavior. Xiaohongshu is therefore also an excellent practical setting for discussing how to better define multimodality and realize its core value.
Multimodal content understanding runs through Xiaohongshu's entire search, recommendation and transaction system. The technical team has already developed and applied multimodal technology in short video understanding, content quality evaluation, multimodal retrieval, transaction content understanding, three-dimensional digitization and intelligent creation.
Its distinctive community ecology, broad, complex, highly real-time and authentic user scenarios, massive multimodal data, and complex, ever-changing real-time user behavior together give Xiaohongshu unique advantages in multimodal practice. Its innovation and exploration will also provide new directions and paradigms for the real-world deployment of multimodality.
Follow [Xiaohongshu Technical Team]; the live broadcast will start on time, and we will see you there.
After your reservation is confirmed, please scan the QR code below to join the live discussion group. If the group has reached its scan-to-join limit, you can add the assistant on WeChat and reply "multimodal".
We will post the live broadcast link, highlights from the guests' talks and lucky draw activities in the WeChat group. You can ask questions and interact, and your questions may be picked and answered by the guests.
Exclusive resume submission address:
REDtech@xiaohongshu.com
Live chat group
Little Helper
*This article is published by QbitAI (量子位) with authorization; the views expressed are solely those of the author.
- End -
Quantum bit QbitAI
Tracking new trends in AI technology and products