
From commercial complexes to bakery shelves, everything is “calculated” by large video models

Last updated: 2024-06-25

Visual AI is changing, and large video models are its future.


Author | Bao Yonggang

Editor | Wang Chuan

When you walk into a shopping mall that has just been digitally upgraded, you may find your favorite stores more easily than before.
You may not even realize that it was easier to find what you wanted at the bakery you just left than on your last visit, because a display case had been rotated 90 degrees.
At mealtimes, you may also be able to follow the crowd and discover new places to eat.
Behind these changes, AI cameras that once only counted crowds and watched for dangerous behavior have become smart enough to “calculate” an entire commercial complex.
The cameras have become smarter because visual AI has entered a new era: the era of large video models.
Large video models lift AI capabilities from the level of a primary school student to that of a professor. Retail, smart manufacturing, urban management, environmental monitoring, and other scenarios that already use visual AI will move into this new era of large video models.
The Intel Video AI Computing Box, which combines the widely known Intel Core CPU with an Intel GPU, is the most accessible key to the new era of video AI.

01

Store layout and shelves calculated by AI

The layout and management of traditional commercial complexes rely on experience: for example, the basement level for supermarkets and restaurants, the first floor for cosmetics and jewelry, the second floor for women's and children's clothing, and the third floor for men's clothing.
However, consumer habits are changing, and consumers in different regions have different preferences. Experience counts for less, and the value of AI is becoming increasingly obvious.
AI cameras are already widely deployed. They can count foot traffic and help shoppers find lost items more quickly, but they have done little so far to attract customers or improve mall operations.
Large video models in the era of generative AI will take the digitalization of retail to a higher level.
Chen Tiesheng, deputy general manager of Beijing Mapleland International Shopping Center, has seen this firsthand. The 17-year-old mall has undergone two transformations. The second introduced Kaiyu Group's digital system, which counts passenger flow at each elevator and on each floor, profiles the passenger-flow characteristics and consumer demand of each floor, and gives deep insight into shoppers' dining and retail preferences.
With richer data insights, it is easier to optimize the store layout and adjust product categories and marketing strategies.
By introducing Kaiyu Group's digital system, Mapleland International Shopping Center has shifted from experience-based management to fine-grained management, and performance has improved: store visits during mall promotions rose by 20%, and sales rose by nearly 30%. The mall has been “calculated” by generative AI.
Large video models that can “calculate” a large shopping mall can also “calculate” individual stores and shelves.
The Beijing Yingke store of the bakery chain Tous Les Jours also uses Kaiyu Group's digital system. The customer flow map generated by the new generation of video AI showed that about 60% of customers walked straight to the adjacent cash register after passing the bread cabinet, so relatively few customers reached the sandwich counter.
The operations team made a simple adjustment, rotating the sandwich display case 90 degrees to follow the customers' path. Data from the following days showed that more customers visited the sandwich counter.
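The bakery finding boils down to a simple question over customer trajectories: after the bread cabinet, where do people go next? A minimal sketch, assuming the video model has already turned each customer's movement into an ordered list of zone names; the zone labels and the five toy paths are illustrative and are not the store's data.

```python
from collections import Counter
from typing import Dict, List


def next_zone_shares(paths: List[List[str]], source: str) -> Dict[str, float]:
    """For every step out of `source`, record which zone the customer entered
    next, and return each destination's share of those transitions."""
    counts: Counter = Counter()
    for path in paths:
        for here, nxt in zip(path, path[1:]):
            if here == source:
                counts[nxt] += 1
    total = sum(counts.values())
    return {zone: n / total for zone, n in counts.items()} if total else {}


# Toy trajectories with hypothetical zone names (not real store data).
paths = [
    ["entrance", "bread", "cashier"],
    ["entrance", "bread", "cashier"],
    ["entrance", "bread", "cashier"],
    ["entrance", "bread", "sandwich", "cashier"],
    ["entrance", "bread", "sandwich", "cashier"],
]
shares = next_zone_shares(paths, "bread")
print(shares)  # {'cashier': 0.6, 'sandwich': 0.4}
```

Running the same calculation on data collected after the display case is rotated gives a simple before-and-after comparison of the sandwich counter's share.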
Both cases show that the large video model in Kaiyu Group's digital system is a revolutionary change from the AI used in new retail since 2018, and that it has greater commercial value.

02

From traditional visual AI to the new era of large video models

Traditional visual AI algorithms cannot give retail and other applications the valuable data and suggestions that large video models can, mainly because of technical limitations.
Traditional AI cameras integrate CNN and RNN algorithms. The CNN can represent video content, such as locations and people; the RNN can capture dynamics, such as the direction and trajectory of people's movement. It is hard for them to remember a person and that person's trajectory at the same time.
This makes it difficult for traditional visual AI algorithms to give shopping malls or bakeries the consumption characteristics of specific customers to inform operational decisions.
The Transformer architecture used in large video models balances content representation and video dynamics: it can remember both a specific person in the video and that person's trajectory.
This is an algorithmic innovation. Traditional CNN, RNN, and LSTM algorithms are like a primary school student who cannot apply knowledge to new situations. The teacher uses many pictures and texts to teach the student something, such as recognizing cats. But when the student later tries to identify a cat that differs significantly from the examples taught, the student may fail.
In addition, traditional AI algorithms must pass information along in sequence. If the chain is long, information becomes distorted or lost.
As a result, traditional AI algorithms generalize poorly and need a professional AI team to train and deploy them separately for each scenario. This consumes resources and time, and makes the construction cycle extremely long.
There is another problem. Traditional video AI solutions must be deployed centrally, with video streams sent over the network to a back end for processing, so massive data transmission and data security become serious challenges.
Poor generalization and the need for centralized deployment have limited the large-scale application of traditional visual AI and the exploration of its commercial value.
Generative AI goes a step further than traditional AI. It is like a college student who can learn through self-supervision and apply what he has learned to new situations.
The learning process is completely different from a primary school student's. The college student does not depend on the teacher's experience but learns independently, from high-quality material (representative videos with accurate natural-language descriptions, such as a video of a white cat lying on the sofa in a living room, described exactly that way) and from a large amount of lower-quality material (such as a video of a gray cat running whose description says only “a room”). After enough learning, the college student can tell that the video shows a gray cat running in a room.
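The article does not name the training objective. One widely used way to learn from paired videos and captions of mixed quality is a contrastive (CLIP-style) loss, which rewards each video for scoring its own caption higher than the other captions in the batch. A minimal one-directional sketch, assuming the video-caption similarity scores have already been computed; the temperature value and the toy matrices are illustrative.

```python
import math
from typing import List


def contrastive_loss(sim: List[List[float]], temperature: float = 0.07) -> float:
    """Mean video-to-text InfoNCE loss.

    sim[i][j] is the similarity between video i and caption j; the caption
    that actually belongs to video i is assumed to sit on the diagonal (j == i).
    """
    total = 0.0
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max before exp() for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax(logits)[i]
    return total / len(sim)


aligned = [[0.9, 0.1], [0.2, 0.8]]   # each video most similar to its own caption
shuffled = [[0.1, 0.9], [0.8, 0.2]]  # each video most similar to the wrong caption
print(contrastive_loss(aligned) < contrastive_loss(shuffled))  # True
```

Noisy captions like “a room” still contribute signal under this kind of objective, because the loss only asks that a video match its own caption better than the others in the batch.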
The Transformer is not only capable of self-supervised learning but also understands content from context, because information does not need to be passed along in sequence. This opens up a new world for visual AI, letting it complete more complex tasks in more scenarios.
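The point that a Transformer does not pass information along in sequence can be illustrated with scaled dot-product self-attention: each frame's output is a weighted mix of all frames, computed in one step, so the first and the hundredth frame are one hop apart instead of ninety-nine. A minimal sketch, assuming each frame is already embedded as a vector and omitting the learned query/key/value projections, multiple heads, and positional encodings a real video Transformer would use.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with queries = keys = values = frames.

    Every output is a weighted average over all frames, computed in one step,
    so no information has to be relayed frame by frame.
    """
    d = len(frames[0])
    outputs = []
    for q in frames:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in frames]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, frames)) for j in range(d)])
    return outputs


frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # toy 2-D frame embeddings
out = self_attention(frames)
# Frames 0 and 2 are identical, so they attend the same way and get the same output.
print(out[0] == out[2])  # True
```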

For example, traditional AI video search in a shopping mall is limited to specific keywords. With a solution based on a large video model, you can search directly for “find the little boy in white clothes” and quickly locate him.
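Natural-language video search of this kind is commonly built as embedding retrieval: clips and the text query are mapped into a shared vector space, and clips are ranked by similarity to the query. A minimal sketch, assuming those embeddings already exist; the clip IDs and vectors are hypothetical stand-ins for the output of the model's video and text encoders.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query_vec: List[float], clip_index: Dict[str, List[float]],
           top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank indexed video clips by cosine similarity to an embedded text query."""
    scored = [(clip_id, cosine(query_vec, vec)) for clip_id, vec in clip_index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings; a real system would get these from the model's
# video encoder (for the index) and text encoder (for the query).
clip_index = {
    "cam3_14:02": [0.9, 0.1, 0.0],
    "cam1_14:05": [0.1, 0.9, 0.2],
    "cam7_14:10": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]  # stands in for "find the little boy in white clothes"
print([clip for clip, _ in search(query, clip_index)])
# ['cam3_14:02', 'cam7_14:10', 'cam1_14:05']
```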

How quickly and accurately results come back depends largely on the underlying software and hardware.

03
The most accessible hardware and software foundation in the era of large video models

Compared with one-dimensional text and two-dimensional images, three-dimensional video (images plus time) places higher demands on the processor. And because large video models have appeared only recently, hardware must adapt to the new algorithms quickly.
The Intel Video AI Computing Box, built with an Intel Core CPU and an Intel Arc GPU and widely used worldwide, is an ideal choice for deploying large video models.
Intel Core processors meet the needs of large video model solutions for high-speed data processing, computer vision, and low-latency deterministic computing when reading video streams and analyzing data. For complex working environments, Intel has also optimized the processors' stability and reliability to support uninterrupted 24-hour operation.
Intel Arc graphics provide the computing power for the many inference tasks in large video models. Each Xe core in the microarchitecture integrates high-bandwidth XMX (Xe Matrix Extensions) matrix engines, which provide hardware acceleration for the matrix multiply-accumulate operations common in AI inference.
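The “matrix multiplication and accumulation” that XMX accelerates is the operation D = A × B + C, which dominates the cost of typical neural-network layers. The plain-Python version below only shows what is being computed; it says nothing about how XMX tiles or schedules the work in hardware.

```python
from typing import List

Matrix = List[List[float]]


def matmul_accumulate(a: Matrix, b: Matrix, c: Matrix) -> Matrix:
    """Return D = A @ B + C without modifying C."""
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of A and B must match")
    d = [row[:] for row in c]  # copy C as the starting accumulator
    for i in range(n):
        for j in range(m):
            acc = d[i][j]
            for p in range(k):
                acc += a[i][p] * b[p][j]  # one multiply-accumulate step
            d[i][j] = acc
    return d


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]
print(matmul_accumulate(a, b, c))  # [[20.0, 23.0], [44.0, 51.0]]
```

A matrix engine speeds this up by performing many of the inner multiply-accumulate steps in parallel on small blocks of A and B.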
Powerful hardware is not enough on its own. The OpenVINO toolkit lets the Intel Video AI Computing Box adapt quickly to large video model algorithms and deploy them.
The OpenVINO inference engine can use the x86 instruction set to accelerate AI inference in hardware. OpenVINO can also optimize the structure of the computational graph and raise the parallelism of operator computation, improving the inference efficiency of large video model solutions.
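A minimal sketch of deployment with the OpenVINO Python API, assuming OpenVINO 2023.1 or later (where `import openvino as ov` is the entry point) and a model already exported to OpenVINO IR or ONNX; the model file name is hypothetical. The helper prefers an Intel GPU when OpenVINO reports one and otherwise falls back to the CPU. OpenVINO's built-in `AUTO` device can also make this choice on its own.

```python
from typing import Sequence


def pick_device(available: Sequence[str]) -> str:
    """Prefer the first Intel GPU OpenVINO reports (e.g. "GPU" or "GPU.0"), else CPU."""
    for device in available:
        if device.startswith("GPU"):
            return device
    return "CPU"


def compile_for_edge(model_path: str):
    """Read an OpenVINO IR (.xml) or ONNX model and compile it for the best local device."""
    import openvino as ov  # imported lazily so pick_device() works without OpenVINO

    core = ov.Core()
    model = core.read_model(model_path)
    device = pick_device(core.available_devices)
    return core.compile_model(model, device)


# Hypothetical usage on the computing box (the file name is illustrative):
# compiled = compile_for_edge("video_model.xml")
# outputs = compiled(input_tensor)  # dict-like, keyed by output port
print(pick_device(["CPU", "GPU.0", "GPU.1"]))  # GPU.0
```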
Kaiyu Group is using the computing power of the Intel Video AI Computing Box and the AI acceleration of the OpenVINO toolkit to build a digital shopping mall solution based on large video models, extending large video model capabilities to a range of terminal products in malls, including visual terminals and digital work badges.
“We use Intel GPUs and tap their full potential for AI model inference through the OpenVINO and Intel oneAPI toolkits, making model migration and deployment simpler and faster while significantly improving the model's inference speed,” said Zhao Yudi, CTO of Kaiyu Group.
Beyond the Intel Video AI Computing Box, Kaiyu Group will also use more powerful Intel software and hardware to take full advantage of generative AI, provide advanced digital solutions for retail, real estate, industrial parks, and other fields, and help users unlock new paths to digital transformation.
The Intel Video AI Computing Box has another significant advantage: it is compatible with existing security monitoring systems.
Thanks to this compatible design, the new solution can be connected to most existing security monitoring systems and deployed and debugged quickly. For example, each camera needs only one network cable for both data transmission and power, which greatly simplifies installation and maintenance.
On this basis, the generalization ability of the Intel Video AI Computing Box enables richer AI functions and supports a wider range of scenarios.

04

The huge commercial value of large video models

Kaiyu Group's solution uses a “cloud-edge-device” architecture. Deploying the large video model at the edge avoids massive network data transmission and lets the AI respond faster.

Data is processed at the edge by the Intel Video AI Computing Box and does not need to be uploaded to the cloud, which also helps ensure data security and privacy.
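The privacy benefit of edge processing can be pictured as a boundary: raw frames stay on the box, and only aggregate numbers are uploaded. A minimal sketch, assuming a hypothetical detection record and upload format; the article does not describe Kaiyu Group's actual data schema.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Detection:
    """One person detection produced on the edge box (hypothetical schema)."""
    camera_id: str
    hour: int          # hour of day, 0-23
    track_id: int      # identity of the person within one camera's track
    frame_jpeg: bytes  # raw image data; stays on the edge device


def summarize_for_cloud(detections: List[Detection]) -> Dict[str, Dict[int, int]]:
    """Count distinct people per camera per hour; image data is dropped here."""
    seen = set()
    counts: Dict[str, Counter] = {}
    for det in detections:
        key = (det.camera_id, det.hour, det.track_id)
        if key in seen:
            continue  # the same person appears in many frames
        seen.add(key)
        counts.setdefault(det.camera_id, Counter())[det.hour] += 1
    return {cam: dict(per_hour) for cam, per_hour in counts.items()}


detections = [
    Detection("cam1", 10, 1, b"<jpeg>"),
    Detection("cam1", 10, 1, b"<jpeg>"),  # same person, next frame
    Detection("cam1", 10, 2, b"<jpeg>"),
    Detection("cam2", 11, 7, b"<jpeg>"),
]
payload = summarize_for_cloud(detections)
print(payload)  # {'cam1': {10: 2}, 'cam2': {11: 1}}
```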
Building on Kaiyu Group's technology and rich experience in retail digitalization, the combination of its self-developed algorithms and large models can not only help merchants optimize store layouts and innovate marketing strategies, but also significantly improve the efficiency of mall operations and personnel management.
For example, it can follow each person's complete trajectory across cameras within a space and count customers and staff accurately while protecting personal privacy and security. It can also cleanly separate non-customer activity, such as that of shopping guides and security guards, from customer flow data.
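The article does not explain how cross-camera recognition or staff exclusion is implemented. One common approach is appearance-based re-identification: each track gets an embedding vector, tracks whose embeddings are close enough are treated as the same person, and identities that match a registered staff gallery are excluded. A greedy sketch under those assumptions; the threshold and toy vectors are illustrative, and a production system would also use timing and camera layout.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def count_customers(track_embeddings: Dict[str, List[float]],
                    staff_gallery: List[List[float]],
                    threshold: float = 0.9) -> int:
    """Merge per-camera tracks that look like the same person, then drop staff."""
    identities: List[List[float]] = []  # one representative embedding per person
    for emb in track_embeddings.values():
        if not any(cosine(emb, rep) >= threshold for rep in identities):
            identities.append(emb)  # no match on any camera so far: a new person
    customers = [rep for rep in identities
                 if not any(cosine(rep, staff) >= threshold for staff in staff_gallery)]
    return len(customers)


tracks = {  # toy appearance embeddings keyed by "camera#track"
    "cam1#4": [1.0, 0.0, 0.0],
    "cam2#9": [0.98, 0.05, 0.0],  # same shopper seen again on another camera
    "cam3#2": [0.0, 1.0, 0.0],
    "cam3#5": [0.0, 0.0, 1.0],    # a shopping guide
}
staff = [[0.0, 0.0, 1.0]]
print(count_customers(tracks, staff))  # 2
```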
Beyond common functions such as traffic statistics and store navigation, it provides insight along more dimensions, such as store attractiveness, customer flow preferences, consumer analysis, trajectories and heat maps, length of stay, and floor-climbing rate, enabling more refined operations and management.
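Two of these metrics can be computed directly from matched trajectories. A minimal sketch, assuming each visit is a hypothetical list of timestamped floor sightings for one customer, and assuming “floor-climbing rate” means the share of visitors seen on any floor above the ground floor (the article does not define it).

```python
from statistics import fmean
from typing import List, Tuple

# One visit: time-ordered (timestamp_seconds, floor) observations for one
# customer after cross-camera matching (hypothetical format).
Visit = List[Tuple[float, int]]


def dwell_minutes(visit: Visit) -> float:
    """Time between the first and last sighting, in minutes."""
    return (visit[-1][0] - visit[0][0]) / 60.0


def floor_climbing_rate(visits: List[Visit], ground_floor: int = 1) -> float:
    """Share of visitors seen on at least one floor above the ground floor."""
    if not visits:
        return 0.0
    climbed = sum(1 for v in visits if any(floor > ground_floor for _, floor in v))
    return climbed / len(visits)


visits = [
    [(0, 1), (600, 2), (1800, 3)],
    [(0, 1), (300, 1)],
    [(0, 1), (900, 2)],
    [(0, -1), (1200, -1)],  # basement supermarket only
]
print(fmean(dwell_minutes(v) for v in visits))  # 17.5
print(floor_climbing_rate(visits))              # 0.5
```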
It can also help mall managers and merchants cut costs and raise efficiency through automated inspections: fire escape occupancy, fall detection, intrusions outside business hours, unattended posts, employee phone use, and traffic statistics.
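Inspection rules such as fire escape occupancy can sit on top of the model's detections as simple time-and-zone checks. A minimal sketch, assuming the model provides timestamped bounding boxes for one tracked object and that the fire-escape zone is drawn as a rectangle in the camera image; the 300-second limit and 50% overlap are illustrative parameters, not values from the article.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # x1, y1, x2, y2 in pixels


def overlap_ratio(obj: Box, zone: Box) -> float:
    """Fraction of the object's box that lies inside the zone."""
    ix = max(0.0, min(obj[2], zone[2]) - max(obj[0], zone[0]))
    iy = max(0.0, min(obj[3], zone[3]) - max(obj[1], zone[1]))
    area = (obj[2] - obj[0]) * (obj[3] - obj[1])
    return ix * iy / area if area > 0 else 0.0


def fire_escape_alert(observations: List[Tuple[float, Box]], zone: Box,
                      min_seconds: float = 300.0,
                      min_overlap: float = 0.5) -> Optional[float]:
    """Return the time at which the object has occupied the zone continuously
    for `min_seconds`, or None if that never happens."""
    start = None
    for t, box in observations:
        if overlap_ratio(box, zone) >= min_overlap:
            if start is None:
                start = t  # object just entered the zone
            if t - start >= min_seconds:
                return t
        else:
            start = None  # object left the zone; reset the timer
    return None


zone = (0.0, 0.0, 100.0, 100.0)
inside = (10.0, 10.0, 50.0, 50.0)
outside = (200.0, 200.0, 250.0, 250.0)
print(fire_escape_alert([(0, inside), (200, inside), (400, inside)], zone))   # 400
print(fire_escape_alert([(0, inside), (200, outside), (400, inside)], zone))  # None
```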
With strong generalization and automatic processing, large video models reduce the workload and cost of deploying AI in commercial complexes and improve users' ability to handle emergencies. They can also be applied in industries such as real estate and retail, production and logistics, park management, and urban management.
Warehousing and logistics parks can use cameras, sensors, and other equipment to monitor vehicle movements in real time, improving logistics efficiency and eliminating safety hazards.
Smart manufacturing production lines can use large video model solutions to automatically identify early signs of equipment failure and issue early warnings for maintenance.
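The article does not describe how early signs of failure are detected. As a generic illustration only, suppose the video model outputs one numeric reading per time step for a machine (for example, a hypothetical vibration or temperature score); a rolling z-score then flags readings that break sharply from recent behavior. This is a simple statistical baseline, not the large-model method itself.

```python
import statistics
from typing import List


def early_warnings(signal: List[float], window: int = 10, z_limit: float = 3.0) -> List[int]:
    """Indices where a reading deviates from the trailing window's mean by more
    than `z_limit` population standard deviations."""
    alerts = []
    for i in range(window, len(signal)):
        history = signal[i - window:i]
        mu = statistics.fmean(history)
        sigma = statistics.pstdev(history)
        if sigma > 0 and abs(signal[i] - mu) / sigma > z_limit:
            alerts.append(i)
    return alerts


readings = [1.0, 1.1] * 5 + [1.0, 5.0]  # steady readings, then a sudden spike
print(early_warnings(readings))  # [11]
```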
Large video models can also play a greater role in relieving traffic congestion in urban management: by learning from historical traffic video, they can capture how traffic flow changes and predict future congestion.
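As the simplest possible illustration of learning traffic patterns from history and predicting congestion, a historical-average baseline averages past vehicle counts by hour of day and flags the hours expected to exceed capacity. A sketch under assumptions: the video model is assumed to supply hourly vehicle counts, and the capacity figure is illustrative. A large-model approach would learn far richer patterns than this baseline.

```python
from statistics import fmean
from typing import Dict, List

# Each day maps hour of day -> vehicles counted by the video model (hypothetical data).
Day = Dict[int, float]


def hourly_profile(history: List[Day]) -> Dict[int, float]:
    """Average vehicle count for each hour of day across past days."""
    hours = sorted(set().union(*history))
    return {h: fmean(day[h] for day in history if h in day) for h in hours}


def predicted_congestion(history: List[Day], capacity: float) -> List[int]:
    """Hours whose historical average exceeds the road segment's capacity."""
    return [h for h, avg in hourly_profile(history).items() if avg > capacity]


history = [
    {8: 1200, 12: 600, 18: 1400},
    {8: 1000, 12: 700, 18: 1300},
]
print(hourly_profile(history))              # {8: 1100.0, 12: 650.0, 18: 1350.0}
print(predicted_congestion(history, 1050))  # [8, 18]
```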
A frozen pre-trained large model can already deliver such powerful AI functions, and large video models will keep evolving toward understanding longer videos and adapting to richer scenarios.
No matter how the algorithms evolve, or how fast, the Intel Video AI Computing Box built on powerful Intel Core processors and Intel Arc graphics, together with the OpenVINO and Intel oneAPI toolkit ecosystem, will remain a cornerstone for putting large video models into practice.
Copyright © 2005-2024 EEWORLD.com.cn, Inc. All rights reserved.