City brain "eye disease" and its upgrade: an analysis of the "digital retina" system proposed by Academician Gao Wen - EEWORLD


In recent years, the city brain concept has been in full swing.

Two years ago this concept was still unfamiliar to most people. By now, I believe most readers know its basic meaning: a city brain uses the large number of cameras along roads to recognize traffic flow, license plates and vehicles, and uses cameras installed at some roads and squares to perform facial recognition and crowd assessment.

The data collected by these "eyes of the city" can help make urban security more intelligent, for example by predicting crowd congestion and identifying traffic accidents and suspicious vehicles. The other main role of the city brain lies in its interaction with traffic lights and the access gates of elevated roads. It is widely believed that recognizing and assessing vehicle data, and then using the city brain for traffic control, is an effective way to manage urban congestion.

This technical logic has been discussed repeatedly over the past two years. Coupled with continuous investment and publicity from technology giants, it may lead the public to think that the city brain is already mature and can truly serve as the "AI traffic commander" of a city. In fact, this is not the case. On the way from industry concept to actual deployment, the city brain, or the smart city system, still faces many practical difficulties and obstacles in the physical world. In particular, there is a "generation gap" between a city's existing camera system and the AI brain added later, and it cannot be ignored.

There has been a lot of academic discussion over the past two years on how to solve this problem. For example, the "digital retina" concept proposed by Gao Wen, an academician of the Chinese Academy of Engineering, chairman of the China Computer Federation, and professor and doctoral supervisor at Peking University, has received wide attention from industry and academia. It is a representative solution to this problem.

Starting from the practical problems that the digital retina targets and the thinking behind it, we can discuss two questions: how far are we from truly intelligent cities? And if we want to close that gap, what work is most urgent?

Seen from this perspective, the digital retina is both an academic innovation and a new industrial opportunity.

The "eye disease" has not been cured, and the city's brain is still immature

The first problem the city brain concept faces in real deployments, and perhaps one of the most fundamental, is this: how can city cameras that were never designed for intelligent computing be connected with the recognition and intelligent analysis capabilities that AI brings?

This question involves a fundamental contradiction: where should the intelligence take place?

Today, typical city brain and smart city projects mainly store the video data collected by cameras so that it can be recognized and analyzed by algorithms in the cloud.

Many contradictions arise here. For example, the video data collected by traditional cameras is too voluminous and too low in clarity, making it difficult for AI algorithms to recognize.

Even if the cameras can provide high-definition data, no feature extraction is performed on the captured video, so the entire chain of computation, from feature extraction to recognition, retrieval and reasoning, must take place in the cloud. The resulting data volume is very large and puts the cloud under unbearable pressure, which affects both recognition accuracy and data processing accuracy. In addition, piling up raw video data in the cloud inevitably causes excessive delays, making it difficult to meet the hard requirement for real-time response in traffic scenarios.
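To get a feel for the scale of this pressure, here is a back-of-the-envelope calculation. The camera count and per-stream bitrate are purely illustrative assumptions, not figures from Academician Gao Wen or from any real deployment, and the function names are hypothetical.

```python
# All figures are illustrative assumptions, not measurements from the article.
CAMERAS = 10_000            # hypothetical number of cameras in one city
VIDEO_MBIT_PER_S = 4.0      # assumed bitrate of one compressed video stream
SECONDS_PER_DAY = 24 * 60 * 60


def aggregate_gbit_per_s(cameras: int, mbit_per_s: float) -> float:
    """Total upstream bandwidth if every camera sends full video to the cloud."""
    return cameras * mbit_per_s / 1000.0


def daily_storage_tb(cameras: int, mbit_per_s: float) -> float:
    """Terabytes (10^12 bytes) of video accumulated in the cloud per day."""
    bits_per_day = cameras * mbit_per_s * 1e6 * SECONDS_PER_DAY
    return bits_per_day / 8 / 1e12


if __name__ == "__main__":
    print(aggregate_gbit_per_s(CAMERAS, VIDEO_MBIT_PER_S), "Gbit/s")
    print(daily_storage_tb(CAMERAS, VIDEO_MBIT_PER_S), "TB/day")
```

Under these assumed numbers, the cloud would have to ingest about 40 Gbit/s continuously and would accumulate roughly 432 TB of video per day, before any recognition work even begins.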

Moreover, the video data a city generates every day is essentially a "data burden". Where is this data stored? How long is it kept? Who will view it? How is it retrieved? Faced with massive amounts of data, these questions weigh heavily on the urban data management system.

So what if the camera itself has recognition capabilities? This is the main approach in smart cities today. But because the original cameras are inconvenient to remove, adding a new recognition capability means installing a new camera. So we see snapshot cameras, face recognition cameras, license plate recognition cameras, vehicle recognition cameras and so on. Looking up at an intersection is enough to trigger trypophobia.

The data recognized by these "smart cameras" cannot be integrated or connected at the underlying layer; each camera can only operate independently. The back-end AI therefore hears only one side of the story and cannot analyze the complete traffic scene or give truly "smart suggestions".

So what if the full set of AI capabilities for recognition and reasoning is placed in the camera? The biggest problem is that on-device computing power is not sufficient to support complex computation. Loading large amounts of AI computing power and dedicated hardware modules onto every camera would be unaffordable. And if AI is to understand the city as a whole, the output of every camera must still be aggregated globally.

This dilemma is the gap between the ideal and the reality of urban intelligence today. Academician Gao Wen summarized it as four problems, "difficult storage, difficult retrieval, difficult identification and diversified functions", and figuratively compared them to the "autism" and "amblyopia" of the city.

So how do we solve the various "eye diseases" in cities? If we use bionics as an analogy, today's cities have cloud computing and AI as their brains, and cameras as their eyes, but there is one thing missing between the two: the retina.

Future cities need "digital retina" implant surgery

The mammalian retina can be called a masterpiece of nature.

One characteristic of the retina is that it sits hidden between the brain and the eyes, silently acting as a translator between human intelligence and the world. The retina does not actually transmit the raw picture and colors to the brain. Instead, it optimizes this "data" so that the brain can directly process visual information it can perceive and understand.

In the opinion of Academician Gao Wen, what needs to be installed between the city brain and thousands of cameras today is such a "digital retina".

Of course, the digital retina is not literally a piece of hardware that mimics a biological retina. Rather, it aims to move beyond today's cameras, which can only capture video or perform a single recognition task. The camera itself is given a certain amount of AI processing capability so that it can actively extract features from the cars, people and scenes it identifies.

As a result, what the camera uploads to the cloud travels along two paths: on one path, the video is efficiently encoded and stored as data; on the other, the extracted features are used directly as "readable material" by the intelligent brain.
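The two-path idea can be sketched in a few lines. This is only a minimal illustration under stated assumptions: `RetinaPacket`, `encode_frame`, `extract_features` and `capture` are hypothetical names, zlib compression stands in for a real video codec, and a brightness histogram stands in for an AI feature extractor.

```python
import zlib
from dataclasses import dataclass
from typing import List


@dataclass
class RetinaPacket:
    """What a hypothetical digital-retina camera sends upstream."""
    camera_id: str
    encoded_video: bytes    # path 1: compact stream for storage and playback
    features: List[float]   # path 2: compact descriptor for cloud analysis


def encode_frame(pixels: bytes) -> bytes:
    # Stand-in for a real video codec (assumption: lossless zlib for brevity).
    return zlib.compress(pixels)


def extract_features(pixels: bytes, bins: int = 4) -> List[float]:
    # Stand-in for a learned feature extractor: a normalized histogram
    # of 8-bit brightness values.
    counts = [0] * bins
    for value in pixels:
        counts[value * bins // 256] += 1
    total = len(pixels) or 1
    return [count / total for count in counts]


def capture(camera_id: str, pixels: bytes) -> RetinaPacket:
    # Both paths are derived from the same frame on the camera side.
    return RetinaPacket(camera_id, encode_frame(pixels), extract_features(pixels))


if __name__ == "__main__":
    frame = bytes([0] * 8 + [255] * 8)   # toy 16-pixel grayscale frame
    packet = capture("cam-1", frame)
    print(len(packet.encoded_video), packet.features)
```

The point of the design is that the cloud can work on `features` without ever decoding `encoded_video`, while the stored video remains available for later review.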

This keeps intelligent computing such as recognition and reasoning in the cloud while letting the camera side complete preliminary intelligent analysis, balancing cost and efficiency between the two. Intelligently combining the cloud brain with the cameras amounts to installing a new "digital retina" for the city.

The technical idea is to unite the city's "brain" and its countless "eyes" on the basis of what is feasible in computing and video coding. To make the idea a reality, end-side devices need both optimized video coding capabilities and strong AI-based video feature extraction capabilities. The result is a functionally integrated video and image perception system that combines video coding with feature coding.

In other words, innovation at the software layer brings better cost-effectiveness and higher efficiency to the city's hardware system. Compared with most current industry solutions, this system balances ideal goals against practical feasibility: the camera completes feature extraction, and the cloud is responsible only for recognition and reasoning. The cloud and the terminals each take on part of the computing work, so computing power is allocated sensibly. With better-optimized video coding technology and the video feature extraction capabilities brought by AI, the entire system can run under limited computing power and bandwidth.
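One way to picture the cloud's share of this division of labor is as a matcher over the compact feature vectors that cameras send up. The sketch below is an assumption for illustration, not the method described by Academician Gao Wen: the `recognize` function, the reference gallery and the cosine similarity threshold are all hypothetical.

```python
import math
from typing import Dict, List, Optional


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recognize(
    feature: List[float],
    gallery: Dict[str, List[float]],
    threshold: float = 0.9,
) -> Optional[str]:
    """Cloud side: return the best-matching label, or None if nothing is close.

    The camera has already done feature extraction; the cloud only compares
    the small vector against reference vectors it holds.
    """
    best_label, best_score = None, threshold
    for label, reference in gallery.items():
        score = cosine(feature, reference)
        if score >= best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Hypothetical reference vectors held in the cloud.
    gallery = {"truck": [1.0, 0.0], "bus": [0.0, 1.0]}
    print(recognize([0.9, 0.1], gallery))   # close to the "truck" reference
    print(recognize([1.0, 1.0], gallery))   # ambiguous, below the threshold
```

Because only short vectors cross the network for this step, the bandwidth and cloud compute needed per camera stay small compared with shipping and decoding full video.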

Even so, it is not easy to perform a "digital retina" surgery on a city.

Challenges, paths and industry opportunities: the future of digital retina

Although "intelligent integration" between emerging city brain technology and China's huge and diverse urban camera system is both necessary and technically feasible, many challenges still need to be addressed in practice.

For example, the innovative logic of the digital retina is to use leading video coding standards and coding technologies to lower the hardware threshold. This requires a series of new breakthroughs in video coding and machine vision technology, so that the digital retina can truly "win people over with its inner strength".

A more important challenge is that the digital retina system requires end-side cameras to have relatively general-purpose video processing capabilities, especially AI-related video feature extraction. According to the industry's current general understanding, this calls for dedicated chips that provide cameras with more targeted AI computing power, which in turn requires overall coordination from the basic hardware to the algorithm layer and on to the industrial layer.

Furthermore, we must face the fact that a digital retina system cannot be completed overnight. The country's vast urban camera infrastructure can only be replaced gradually. This requires first establishing more edge computing nodes in real-world scenarios and using edge computing to meet the demand for end-side AI computing power, transitioning step by step to a complete digital retina system. In addition, how to re-extract and identify features from already stored video data is also a problem, which may require more capable video encoding software.
