SMP 2018 Day One: An Overview of the Four Keynote Reports from the Advanced Technology Workshop
Text | AI Technology Review
Report from Leiphone.com (leiphone-sz)
The 7th National Social Media Processing Conference (SMP 2018), hosted by the Social Media Processing Committee of the Chinese Information Processing Society and organized by Harbin Institute of Technology, is being held in Harbin from August 2 to 4, 2018, with Leifeng.com providing coverage as the exclusive strategic media partner. SMP focuses on research and engineering in social media processing, providing a broad platform for disseminating the latest academic and technological advances in the field. It aims to build an industry-university-research ecosystem for social media processing and to become a benchmark for the field in China and beyond.
The 10th Advanced Technology Workshop (ATT 10) was held on August 2. Four well-known scholars were invited to give lectures on network representation learning, causal inference, deep reinforcement learning, and data visualization. The workshop was chaired by Assistant Professor Yang Yang from Zhejiang University.
In the morning session, Associate Professor Song Guojie from the School of Information Science and Technology of Peking University delivered a presentation titled "Large-Scale Network Representation Learning", offering a detailed and systematic account of research on network representation learning.
Image source: Li Jiaqi, SCIR, Harbin Institute of Technology
A large amount of real-world data exists in the form of networks. Although computing power keeps increasing, factors such as high dimensionality, sparsity, and sheer volume make machine learning and data mining over large-scale network data an important problem that both industry and academia are watching closely.
He first reviewed the field's development from the perspectives of linear and nonlinear methods, and emphasized that representation learning research pursues two main goals: preserving the relationships among nodes in the original network, and preserving node properties in the embedding space. He also introduced four classic representation learning approaches: Word2Vec, adjacency-based similarity, LINE, and random-walk methods.
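As a small illustration of the random-walk approach mentioned above (the graph data and function names here are hypothetical, not from the talk), DeepWalk-style methods first sample truncated random walks over the graph and then feed those walks to Word2Vec's skip-gram model as if they were sentences. The walk-sampling step can be sketched as:

```python
import random

def random_walks(adj, num_walks=10, walk_len=6, seed=42):
    """Sample truncated random walks; DeepWalk-style methods treat
    these walks as sentences and feed them to a skip-gram model."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for start in adj:
            walk = [start]
            while len(walk) < walk_len:
                neighbors = adj[walk[-1]]
                if not neighbors:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

# Toy undirected graph as an adjacency list (illustrative only).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walks = random_walks(adj)
```

Each walk is a valid path in the graph; in a full pipeline these sequences would then be embedded so that frequently co-visited nodes end up close together in the embedding space.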
He summarized the characteristics of each line of work and extended network representation learning along several axes, introducing representative progress from static to dynamic data (e.g., DepthLGP, DynamicTriad), from nodes to communities (e.g., M-NMF), and from homogeneous to heterogeneous networks (e.g., meta-path-based methods).
Afterwards, he introduced the team's related work in depth from three perspectives: multi-level network representation learning, dynamic network representation learning, and entity standardization based on network representation learning. Finally, he suggested that more research could be carried out in the future around Graph Neural Network, large-scale Network Embedding, and expanding embedding space.
Next, Associate Professor Meng Tianguang from the Department of Political Science at Tsinghua University gave a keynote speech entitled "New Advances in Computational Social Science: From Exploratory Analysis to Causal Inference".
Image source: Li Jiaqi, SCIR, Harbin Institute of Technology
At the beginning of the report, he discussed the relationship between big data analysis and causal inference. Big data analysis, he said, is oriented toward knowledge discovery: data mining automatically extracts patterns from data, which are then converted, through interpretation and evaluation, into knowledge the end user can understand. From the perspective of causal inference, big data analysis encompasses descriptive inference, causal inference, and mechanistic inference.
He further argued that causal relationships matter in computational social science for five reasons: first, curiosity drives the inquiry; second, explanatory knowledge is more critical; third, the social sciences must be applied to real social settings; fourth, well-identified causal relationships help us predict more effectively; and fifth, data mining needs to be given social significance.
After surveying methodological progress in computational social science, he responded to common criticisms of big data methods, such as that they explore "correlation" rather than "causality", and that data collection raises personal privacy concerns. He also pointed out the opportunities big data methods bring: data modalities are more diverse; the data is "full data" rather than "sample data", and "real data" rather than "designed data"; and it carries rich spatiotemporal information useful for data fusion. Economically, it offers low cost, timeliness, and high efficiency, and it is also advantageous for academic influence.
He then elaborated on four directions for causal inference with big data. The first is big data + econometric analysis: using big data methods for dimensionality reduction and measurement, then applying regression, matching, and similar techniques. The second is big data + small data analysis: extracting small samples from big data to further test model assumptions. The third is big data + spatiotemporal models, for causal inference and visualization. The fourth is big data analysis + experimental design.
He noted that a range of tools supports big data analysis and causal inference: statistical methods such as principal component analysis, linear and nonlinear regression, and spatial econometrics, as well as experimental methods such as field experiments and natural experiments.
Finally, he elaborated on several methods and tools in these four directions and gave a series of examples, such as text matching, case registration system, etc.
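To make one of the statistical tools above concrete, here is a minimal sketch of simple linear regression using the closed-form least-squares solution; the data is synthetic and purely illustrative, not an example from the talk:

```python
def ols_fit(xs, ys):
    """Closed-form simple linear regression: fit y ~ a + b*x
    by minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of (x, y) over the variance of x.
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

# Synthetic data following y = 1 + 2x exactly (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b = ols_fit(xs, ys)
```

In practice, the big data + econometric analysis direction described above would first reduce and measure high-dimensional data, then hand the resulting variables to a regression like this one.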
In the afternoon, Associate Professor Huang Minlie from the Department of Computer Science at Tsinghua University spoke on "Deep Reinforcement Learning and Its Application in Natural Language Processing". He first introduced the basic concept: reinforcement learning is a paradigm that learns through interaction, assigning rewards to actions and converging toward an optimal policy through trial and error. Thanks to its sequential decision-making, trial-and-error learning, and delayed rewards, deep reinforcement learning has broad application in fields such as games, robotics, and autonomous driving.
Image source: Li Jiaqi, SCIR, Harbin Institute of Technology
He elaborated on the representative algorithms and core ideas of value-based methods (e.g., Q-learning), policy-based methods, and actor-critic methods. He also summarized the main characteristics of reinforcement learning: 1) current decisions affect future decisions; 2) training is essentially a trial-and-error process; 3) learning is guided by maximizing long-term reward.
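As a minimal sketch of the value-based idea (on a toy chain environment of my own construction, not an example from the talk), tabular Q-learning repeatedly nudges an action-value table toward the Bellman target until the greedy policy becomes optimal, illustrating all three characteristics above:

```python
import random

def q_learning(n_states=4, episodes=300, alpha=0.5, gamma=0.9,
               eps=0.2, seed=0):
    """Tabular Q-learning on a toy chain MDP: states 0..n_states-1,
    actions 0 (left) and 1 (right); reward 1 on reaching the last state."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy exploration: the "trial and error" in action.
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Move Q(s, a) toward the Bellman target r + gamma * max_a' Q(s', a').
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learning()
policy = [0 if q[0] > q[1] else 1 for q in Q[:-1]]  # greedy action per state
```

After training, the greedy policy chooses "right" in every non-terminal state: the reward is delayed until the final state, but it propagates backward through the discounted Bellman updates, which is exactly why current decisions come to reflect long-term reward.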
Applying reinforcement learning to NLP raises challenges at several levels, such as discrete feedback and the high dimensionality of the action space. However, in scenarios with no direct supervision or only weak signals, its trial-and-error and probabilistic exploration capabilities can be used to encode prior or domain knowledge and reach the learning goal. Accordingly, at the retrieval and reasoning level, reinforcement learning can support model and text extraction; at the sample selection level, it can denoise samples and correct labeling errors; and for policy optimization, it can be explored for search-strategy optimization and language generation.
Finally, he summarized the key points of applying reinforcement learning in NLP: 1) cast the task as a natural sequential decision problem; 2) be clear about the "trial and error" nature of reinforcement learning; 3) incorporate prior knowledge into the rewards; 4) it is most effective in unsupervised or weakly supervised settings. At the same time, warm starts are important, and the limited gains under full supervision and the large-action-space problem should be kept in mind, all of which places higher demands on researchers' training skills and hyperparameter tuning.
The last speaker was Cao Nan, professor at the College of Design and Innovation at Tongji University and director of its Intelligent Big Data Visualization Laboratory. He introduced data visualization and its application in anomaly detection.
Image source: Li Jiaqi, SCIR, Harbin Institute of Technology
At the beginning of his speech, he gave a brief introduction to Tongji University's Intelligent Big Data Visualization Laboratory. The laboratory spans multiple disciplines, and its research areas include data visualization, human-computer interaction, and machine learning. It is currently recruiting students.
Then he introduced the basic concept of data visualization. An important function of visualization is interpreting data: when data volumes are very large and results are complex, visualization plays a significant role in understanding them. Broadly speaking, he said, any technique that creates images, animations, and the like can be called visualization; data visualization is a branch of it, further divided into three sub-fields: scientific visualization, infographics, and information visualization. The focus of his talk was information visualization.
He cited the classic chart of Napoleon's march to Moscow to illustrate the role of visualization: a two-dimensional chart that clearly conveys five or six dimensions of information. He emphasized that information visualization is neither art, nor computer graphics, nor image processing; rather, it revolves around data and reveals its true meaning. Statistical summaries can conceal what the data really says, whereas visualization helps one observe data in context.
He mentioned three challenges of big data visualization: visual clutter, performance bottlenecks, and the limits of human cognition. He then outlined several key points in creating a visualization: understand the data, the users, and the tasks; the design must be authentic, expressive, and elegant; layout amounts to solving an optimization problem, though time constraints often rule out a globally optimal solution; and animation is necessary for people to observe changes in the data.
After that, he introduced some popular visualization tools, including the open-source library D3.js and the commercial product Tableau. For learning visualization, he recommended the book "Visualization Analysis & Design". He also introduced the major visualization conferences, including IEEE InfoVis/VAST/SciVis.
After covering the basics, he turned to using visualization to find abnormal users in social media. The behavior of anonymous users can threaten an entire community, so identifying such abnormal users matters. Two challenges arise: it is difficult to define what counts as normal versus abnormal, and labeled data for training models is hard to obtain. He then presented a series of his laboratory's work on anomaly detection, divided into two stages: first, analysis of abnormal group behavior, and second, analysis of individual anomalies. Earlier related work includes FluxFlow for rumor detection and TargetVue for user behavior profiling. He also introduced the anomaly-detection-related competition Bot Design/Detection.
With that, the workshop came to a close. Over the next two days, SMP 2018 will feature six keynote reports, eight sub-forums, technical evaluations, oral presentations, and many other sessions. Leifeng.com will continue to bring you special coverage, so stay tuned.