How did big data help the police catch the "arrogant Audi man"? | Interview with Minglue's Huang Yan
Text | Guo Jia
Report from Leiphone.com (leiphone-sz)
When you think of road rage, what image comes to your mind?
On May 30, 2017, a video posted by Weibo user @写字李华良 sparked public outrage among many netizens.
The video shows that at around 9:30 am that day, when Li Hualiang was turning right in his car, he saw pedestrians crossing the zebra crossing in front of him (including the elderly and people pushing baby strollers), so he stopped to give way. However, the driver of a red Audi with a license plate number of Ji A02*** behind him thought he was blocking his right turn. So the Audi driver forcibly stopped him in anger, insulted and beat him after getting out of the car, and then drove away.
Afterwards, Li Hualiang immediately called the police and posted on Weibo, attaching the dashcam video.
As soon as the video was released, netizens and media outlets forwarded and commented on it in large numbers. The attacker became instantly notorious on Weibo under the hashtag #Arrogant Audi Man#. Netizens hoped the police would quickly bring him to justice. After all, he had not only beaten someone but may also have been driving under the influence, endangering public safety.
One day after the incident, the attacker had still not been caught, and some netizens began to question the response speed of the public security organs.
The car owner lost contact and the case was at a standstill
In fact, the process of solving the case was far more complicated than we imagined, and the investigation of the case was once at a stalemate.
During the investigation, the police discovered that the registered owner of the Audi, identified through the license plate number, was not the attacker and could not be reached for the time being. Without the owner, the attacker could not be found quickly. The task force had only a license plate lead that had gone cold and a blurry video with which to identify the target among the more than 10 million people in Shijiazhuang.
Although the "arrogant Audi man" was eventually arrested on June 1, the investigation itself was rarely reported. Leiphone.com found the following passage in public information:
Amid the stalemate, an intelligence system called "Kunlun Mirror" joined the police effort. With analysts operating it, the police started from the license plate number and, through analysis paths other than tracing the car owner, quickly obtained key intelligence on the suspect, identified the attacker, and were pointed directly toward where to make the arrest.
What is the power of the mysterious "Kunlun Mirror" intelligence system? How does it help police officers solve cases?
The person who revealed the secrets of this big data system to me was Huang Yan, general manager of the Public Security Division of Minglue Data. Her company is one of the important builders of this system.
Huang Yan has worked with police data since graduation. She told Leiphone.com that this seemingly simple case actually required the concerted efforts of multiple departments, including traffic control, intelligence, criminal investigation and the local jurisdictional bureaus.
There are two key points in the process of solving a case: one is whether you have enough information, and the other is how to unravel the key points of the case. In traditional case solving, the former requires a long and solid investigation, and the latter requires you to have rich investigation experience.
The situation resembles the plot of the TV series "Criminal Minds". Faced with a wall of complicated information, you need to find the single most important connection and solve the case from there. For example, among the many pieces of information about the suspect and the victim, you need to find which key moment and which key piece of physical evidence will be the breakthrough for a conviction.
So in reality, how was the "arrogant Audi man" found out through the intelligence system?
Because the details of the case were confidential, Huang Yan did not answer directly, but she gave an example to explain to me what the intelligence system actually does.
First, the intelligence system needs to integrate various information systems.
Before integration, each system's picture of a person might look like this: System A knows that I am a woman, System B knows that I have a master's degree, System C knows that I often travel on business and stay in hotels in various places, System D knows which flights I have taken, and System F may know who my daily contacts are... Until these systems are integrated, each one's picture of a person is one-sided. Only by combining them can you get a fuller, more three-dimensional description.
The first thing an intelligence system needs to do is integrate the data held in these separate systems. It is like cooking: to make a delicious meal, you must first prepare enough ingredients, because "even a good cook cannot cook without rice".
In other words, in order to solve a case, the police need to have data not only from the public security system within the police station, but also from multiple systems of other commissions, bureaus or government departments.
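The integration step Huang Yan describes can be illustrated with a minimal sketch. It assumes that every system shares a common person identifier, which real systems often do not. The system names, field names, and values below are invented for illustration and say nothing about how "Kunlun Mirror" is actually built.

```python
from collections import defaultdict

# Hypothetical per-system records. Every system name, field, and ID here
# is invented for illustration only.
SYSTEM_A = [{"person_id": "P001", "gender": "female"}]
SYSTEM_B = [{"person_id": "P001", "education": "master's degree"}]
SYSTEM_C = [{"person_id": "P001", "hotel_stays": ["City X", "City Y"]}]
SYSTEM_D = [{"person_id": "P001", "flights": ["XX123"]}]


def integrate(sources):
    """Merge records from several systems into one profile per person_id."""
    profiles = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            person_id = record["person_id"]
            for field, value in record.items():
                if field == "person_id":
                    continue
                # Record which system contributed each field, so an analyst
                # can trace every fact back to its origin.
                profiles[person_id][field] = {"value": value, "source": source_name}
    return dict(profiles)


profiles = integrate({"A": SYSTEM_A, "B": SYSTEM_B, "C": SYSTEM_C, "D": SYSTEM_D})
print(sorted(profiles["P001"]))
```

Each system on its own knows one attribute. The merged profile holds all four, and each is tagged with the system it came from.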
Second, find out the key points to solve the case based on different scenarios.
The data is complex and diverse, and we cannot expect every police officer to have the sharp eyes of Sun Wukong, so the system must pick out the key information relevant to solving the case. Again, it is like cooking: besides having enough ingredients, you also have to decide how to combine them and how to cook them to get the best flavor.
The "arrogant Audi man" was caught by using the intelligence system to handle both of these tasks. Looking around the world, systems of this kind have long had precedents in the United States.
After 9/11, while the CIA (Central Intelligence Agency) and other departments were investigating various clues, several Stanford professors used computers to build a network of relationships based on massive amounts of public information. In the end, they identified a group of suspects and quickly released the results to the public.
This shocked the CIA and other departments because the results obtained by the professors had a lot of overlap with the results of the CIA's extensive investigation and interrogation. The authorities quickly flew to Stanford to question the professors, suspecting that they were involved in the terrorist attacks.
From then on, the possibility of using "human brain + computer" to analyze complex problems and assist in solving cases began to attract media attention.
Back to the question at the beginning of the article: how was the "arrogant Audi man" finally caught at the airport? Although Huang Yan did not reveal the details of the case, Leiphone.com speculates that the somewhat mysterious "Kunlun Mirror" intelligence system is a platform that integrates information from multiple departments. After finding the owner from the license plate number, it could follow the trail to family relationships, recent call records, related flight information and so on, and finally lead to the attacker. (This is purely speculation.)
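To make that speculation concrete, here is a minimal sketch of "following the clues" as a breadth-first search over a relationship graph. It is only an illustration of the guess above, not the real system's method. The graph, node names, and relation labels are all hypothetical, and the plate is a placeholder rather than the one in the case.

```python
from collections import deque

# Hypothetical relationship graph. All nodes and relation labels are invented.
EDGES = {
    "plate:EXAMPLE-001": [("registered_to", "person:owner")],
    "person:owner": [("relative_of", "person:relative"), ("called", "person:friend")],
    "person:relative": [("booked", "flight:XX123")],
    "person:friend": [],
    "flight:XX123": [],
}


def expand(start, max_hops):
    """Breadth-first search from one clue.

    Returns each reachable node mapped to the chain of relations that led to it.
    """
    paths = {start: []}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if len(paths[node]) >= max_hops:
            continue  # do not expand beyond the hop limit
        for relation, neighbor in EDGES.get(node, []):
            if neighbor not in paths:
                paths[neighbor] = paths[node] + [relation]
                queue.append(neighbor)
    return paths


leads = expand("plate:EXAMPLE-001", max_hops=3)
for node, path in leads.items():
    print(node, "via", " -> ".join(path) or "(starting clue)")
```

The hop limit matters in practice. Each extra hop multiplies the number of people pulled into the search, which is one reason an analyst still has to choose the path.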
The process of integrating information systems is actually full of difficulties.
Although integrating information systems is a must for every big data system, Huang Yan told Leiphone.com that the process was far from smooth. Police data systems in some places are relatively scattered. During integration, the engineers had to respect both the differing permissions of each system and the differing confidentiality requirements of each system. These top data engineers ran into many difficulties in the tedious development process.
Integrating data from different police forces is a challenging project. Some units handle drug control, some handle economic crimes, and some handle criminal cases. Their data is not fully shared with one another, and some of it involves citizens' privacy, so it cannot be shown to others casually. Fine-grained permissions must be established. This is a very real problem.
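The permission problem can be pictured with a minimal field-level filter. The unit names, field names, and policy table below are hypothetical and chosen only to mirror the examples in the paragraph above. A real deployment would involve far more detailed rules, plus audit logging.

```python
# Hypothetical access policy: which unit may read which field.
FIELD_POLICY = {
    "name": {"drug_control", "economic_crime", "criminal_investigation"},
    "bank_transfers": {"economic_crime"},
    "drug_case_notes": {"drug_control"},
}


def visible_fields(record, unit):
    """Return only the fields this unit may read.

    Fields missing from the policy are denied by default.
    """
    return {
        field: value
        for field, value in record.items()
        if unit in FIELD_POLICY.get(field, set())
    }


record = {
    "name": "Example Person",
    "bank_transfers": ["transfer-001"],
    "drug_case_notes": "note-001",
    "home_address": "unlisted",  # not in the policy, so nobody sees it
}

print(visible_fields(record, "economic_crime"))
```

Denying unknown fields by default is a deliberate choice in this sketch: a field added during integration stays hidden until someone explicitly grants access to it.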
During the actual integration, Huang Yan and her colleagues found that the more heavily a single system had been optimized locally, the more complex, and sometimes unworkable, the final integration became. It could even lead to a great deal of duplicated construction and wasted resources. Many detailed issues had to be resolved along the way.
At first, some engineers were not used to this rather basic and cumbersome way of working. Basic search tools, for example, are also part of the intelligence system. Some engineers felt that a high-tech company should not be doing such basic work, and that building to customized requirements did not fit their identity as developers. From the police's point of view, however, these are practical needs.
The eventual solution was a bit "simple and crude" but very effective: the engineers went through the actual case handling process themselves, solved the problems they met along the way, and so came to understand the police's case handling logic.
During information integration, the engineers also asked themselves a series of questions. Why do suspicious persons, suspicious relationships, and suspicious tags need to be queried in this way? In which business scenarios is this query needed? What is the next action after the query? Which type of police does the user need to work with? Does the local police station need to assist in the investigation? The answers to all of these questions should be reflected in the product.
In the process of going deep into the front line, these big data engineers finally achieved the transformation from engineer thinking to police thinking.
To truly solve cases with big data, these breakthroughs are still needed
For Huang Yan and her colleagues, one big data company with a "police mentality" has always been the role model: Palantir. This American company has served intelligence agencies such as the CIA and FBI for many years. Recently, in a ranking of Silicon Valley intern salaries, Palantir surpassed a number of big companies such as Google and Apple and ranked first. In other words, working in police intelligence is very lucrative!
According to Leiphone.com, after the Iraq War, counter-terrorism became one of the government's most urgent needs. Although intelligence agencies such as the CIA and FBI have thousands of databases, including financial data, DNA samples, voice data, video clips, and maps from all over the world, establishing connections among these data is very time-consuming. Quickly finding valuable clues in this vast sea of data and getting advance warning of possible terrorist attacks demands a very high level of technical skill from the intelligence agencies.
The founding team of "Palantir" believes that Silicon Valley's technology is more advanced than that of government contractors because the government does not have access to the best engineers. If they build a data analysis library and integrate separate databases for search and analysis to improve data analysis efficiency, they may be able to "sell" this technology to the government.
What made the low-key Palantir famous overnight was that it identified the Ponzi scheme of Wall Street tycoon Madoff through big data analysis and assisted the US authorities in finding Bin Laden's hiding place. At the time, many media outlets reported the news under headlines such as "Bin Laden was finally killed by big data".
Because big data analysis has played a huge role in many cases, integrating and analyzing police system data has become a major trend both at home and abroad. However, Huang Yan believes that the view that "managing data is big data" falls far short of real big data. To truly solve cases with big data, the following three points still need breakthroughs.
First, data integration requires policy support, which takes time.
The more data is available, the more clues can be obtained for solving cases. However, the decision on whether to integrate the data of various departments does not lie in Minglue's hands. As a third-party service company, it does not have the authority to require the traffic management departments of Beijing and Hainan Province to integrate each other's data. If it wants to push this matter forward, it must go through top-level design.
Huang Yan told me that China's public security system is currently still decentralized: each locality funds and builds its own systems. It is not yet highly centralized, but large-scale centralization is definitely the trend for the future.
In projects over the past two years, the team has clearly felt the effect of the top-down policy push initiated by the Ministry of Public Security. Data from various places is increasingly being collated and aggregated. Without these policies and the top-down emphasis on data, there would be no companies like Minglue, and there would not have been the rapid development of the past two years. It is the whole environment that has driven the development of the industry.
Second, we need to conduct in-depth exploration and understanding of the business.
For any industry to implement big data, it is far from enough to just build a data resource service platform. This is a process that requires repeated communication and co-creation with business personnel.
In specific projects, we need to do more than data processing and analysis. We also need to pay close attention to the industry's business model and analyze the details. This kind of industry insight can only improve substantially after a company has accumulated a certain number of customers.
Third, although analytical tools are important, human judgment is also very important, and we must maximize the effectiveness of the combination of man and machine.
Once we have the data and have built a specific business model, we need to consider how to bring out the strongest advantages of both humans and machines.
If every piece of data had to be processed by a person, the approach would fail because it would waste too much manpower. At the same time, data processing and analysis still depend on human judgment. How to coordinate the work of humans and machines is the core problem to be solved.
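One common way to divide this work, shown here only as a sketch and not as Minglue's approach, is to let the machine score each lead and send only the uncertain middle band to human analysts. The scores and both thresholds below are hypothetical. Real values would have to be tuned together with the police.

```python
# Hypothetical thresholds; real values would have to be tuned with analysts.
AUTO_ACCEPT = 0.9
AUTO_DISCARD = 0.1


def triage(scored_leads):
    """Split (lead_id, score) pairs into machine-handled and human-review queues."""
    queues = {"auto_accept": [], "human_review": [], "auto_discard": []}
    for lead_id, score in scored_leads:
        if score >= AUTO_ACCEPT:
            queues["auto_accept"].append(lead_id)
        elif score <= AUTO_DISCARD:
            queues["auto_discard"].append(lead_id)
        else:
            # The machine is unsure, so a person makes the call.
            queues["human_review"].append(lead_id)
    return queues


queues = triage([("lead-1", 0.97), ("lead-2", 0.50), ("lead-3", 0.03)])
print(queues)
```

Narrowing the gap between the two thresholds saves manpower but leaves more decisions to the machine. Widening it does the opposite.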
“It doesn’t matter whether the cat is black or white, as long as it catches mice, it is a good cat.” This old saying is also applicable to the construction of the police system. No matter how much high technology is used, the ultimate test is whether it can help the police catch the bad guys in the end.
Minglue Data founder Wu Minghui's father was a policeman. Since childhood, Wu has understood the hard work of policemen better than others. A major case requires hundreds of policemen to analyze intelligence and find clues, which is a very painful process. Therefore, when Wu founded Minglue, he came up with the idea of building a big data network platform to allow artificial intelligence to assist police in handling cases.
He thought that with the help of data and intelligence, perhaps his father and police officers like his father would not have to work so hard to solve cases.