The KDD Best Paper Award was awarded to a Chinese mainland institution for the first time! DAMO Academy's open source work won an award for federated graph learning

Latest update time：2022-08-19 17:21


Mingmin, reporting from Aofeisi
QbitAI | WeChat official account QbitAI

Just now, all KDD 2022 awards were officially announced!

KDD is the top academic conference in data mining and knowledge discovery, and its award winners spark heated discussion in the academic community every year.

This year, Chinese teams again performed impressively.

Qiu Jiezhong of Tsinghua University was runner-up for the Doctoral Dissertation Award, becoming the first recipient from an Asian university.

The Intelligent Computing Laboratory of Alibaba DAMO Academy won the Best Paper Award in the Applied Data Science track. It is the first time that a Chinese industrial research team has independently won this award.

The paper proposes a library for federated graph learning, FederatedScope-GNN.

The organizer SIGKDD commented that it “promoted the development of federated graph learning”.

QbitAI spoke with the paper's first author, Wang Zhen, and its corresponding author, Li Yaliang, about the research and the story behind it~

Bringing federated learning to graph data

The core of this winning paper focuses on federated graph learning.

Simply put, it combines the advantages of graph learning and federated learning.

In recent years, as more and more application scenarios have increased the demand for privacy protection, federated learning has become increasingly popular.

It lets users train jointly in the cloud by exchanging model parameters or intermediate results while their data stays local at all times, so that multiple users complete model training together.

This is what is often referred to as making “data available but not visible”, thereby avoiding the “data island” problem.
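The exchange of model parameters described above can be sketched with federated averaging (FedAvg). This is a minimal illustration and not code from FederatedScope: the linear model, the learning rate, and the function names `local_update`, `fed_avg`, and `run_rounds` are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Params = Dict[str, float]
Sample = Tuple[float, float]  # (x, y)


def local_update(params: Params, data: List[Sample], lr: float = 0.1) -> Params:
    """One pass of SGD for y = w*x + b on a client's private data."""
    w, b = params["w"], params["b"]
    for x, y in data:
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err
    return {"w": w, "b": b}


def fed_avg(updates: List[Tuple[Params, int]]) -> Params:
    """Average client parameters, weighted by each client's sample count."""
    total = sum(n for _, n in updates)
    keys = updates[0][0].keys()
    return {k: sum(p[k] * n for p, n in updates) / total for k in keys}


def run_rounds(clients: List[List[Sample]], rounds: int = 200) -> Params:
    """Server loop: only parameters travel, raw samples never leave a client."""
    global_params: Params = {"w": 0.0, "b": 0.0}
    for _ in range(rounds):
        updates = [(local_update(global_params, data), len(data)) for data in clients]
        global_params = fed_avg(updates)
    return global_params
```

Each client sees only its own samples, and the server sees only parameter dictionaries, which is the "available but not visible" property in miniature.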

Google's TensorFlow Federated (TFF) and WeBank's FATE are currently popular open source federated learning frameworks.

However, existing federated learning work focuses more on vision and natural language, and offers relatively limited support for graphs.

Yet graphs have great advantages in representing complex relationships.

A graph is a data structure made up of two parts, nodes and edges, and is used to describe the relationships between objects.

In daily life, you can think of each social account as a node. Predicting whether two accounts have a friend relationship means predicting whether there is a connection between the two nodes, so as to recommend "people you may know" to you.
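The social-network example can be made concrete with a small sketch. It stores the graph as adjacency sets and scores candidate friendships by counting common neighbors, a simple classical heuristic swapped in here for illustration; it is not a graph neural network and not the method of the paper. The class and function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


class Graph:
    """Undirected graph: nodes are accounts, edges are friend relationships."""

    def __init__(self) -> None:
        self.adj: Dict[str, Set[str]] = {}

    def add_edge(self, u: str, v: str) -> None:
        self.adj.setdefault(u, set()).add(v)
        self.adj.setdefault(v, set()).add(u)

    def common_neighbors(self, u: str, v: str) -> int:
        return len(self.adj.get(u, set()) & self.adj.get(v, set()))


def recommend(graph: Graph, user: str, top_k: int = 3) -> List[Tuple[str, int]]:
    """Rank non-friends of `user` by the number of shared friends."""
    friends = graph.adj.get(user, set())
    scored = [
        (other, graph.common_neighbors(user, other))
        for other in graph.adj
        if other != user and other not in friends
    ]
    scored = [item for item in scored if item[1] > 0]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]
```

Predicting a missing edge between two nodes is exactly the "people you may know" task; learned models replace the hand-written score with one derived from node features and graph structure.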

However, traditional neural networks take data in Euclidean space as input and cannot directly process data structures such as graphs.

Graph neural networks (GNNs) were proposed in response. They use neural networks to perform deep feature extraction and other operations on graphs, and so achieve better inference and prediction results.
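A single message-passing step shows how a graph neural network extracts features from a graph. The sketch below is a generic mean-aggregation layer in plain Python, written for illustration under assumed names (`message_passing_layer`, `w_self`, `w_neigh`); it is not one of the models shipped with FS-G.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def mean_vectors(vectors: List[Vector], dim: int) -> Vector:
    """Element-wise mean; a node with no neighbors gets a zero vector."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def message_passing_layer(
    features: Dict[str, Vector],
    neighbors: Dict[str, List[str]],
    w_self: Matrix,
    w_neigh: Matrix,
) -> Dict[str, Vector]:
    """h_v' = ReLU(W_self h_v + W_neigh mean({h_u : u in N(v)}))."""
    dim = len(next(iter(features.values())))
    out: Dict[str, Vector] = {}
    for node, h in features.items():
        agg = mean_vectors([features[u] for u in neighbors.get(node, [])], dim)
        z = [a + b for a, b in zip(matvec(w_self, h), matvec(w_neigh, agg))]
        out[node] = [max(0.0, x) for x in z]
    return out
```

Stacking several such layers lets each node's representation absorb information from nodes several hops away, which is what makes the structure of the graph usable by a neural network.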

Commonly used scenarios include e-commerce, drug research and development, finance, Internet social networking, etc. In these scenarios, the demand for data protection is often great.

For example, in a bank anti-money laundering scenario, it is necessary to predict whether each account is a risk account, but the account information of each bank cannot be disclosed to each other.

△ Bank anti-money laundering scenario

In drug development, different manufacturers each hold only part of the molecular graph data. They need to share information to complete research and development, yet each party's data must stay confidential.

All of this creates strong demand for federated graph learning algorithms.

In this context, DAMO Academy applied graph learning to federated learning in this study.

FederatedScope-GNN (hereinafter FS-G) is built on FederatedScope (hereinafter FS), DAMO Academy's open source federated learning framework.

First, FS-G provides a unified view and flexibly supports the exchange of heterogeneous data.

Thanks to the event-driven programming paradigm of the underlying FS framework, the many kinds of message exchange and the rich behaviors of participants can be split into modules and implemented separately, so FS-G supports flexible and varied participant behaviors.
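The event-driven idea can be illustrated with a toy message loop in which each participant registers one handler per message type. This is a sketch of the general pattern only: the class names, message types, and handler functions below are hypothetical and do not reproduce FederatedScope's actual API.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Callable, Deque, Dict, List


@dataclass
class Message:
    msg_type: str
    sender: str
    receiver: str
    content: Any = None


@dataclass
class Participant:
    name: str
    outbox: Deque[Message]
    handlers: Dict[str, Callable[["Participant", Message], None]] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def register_handler(self, msg_type: str, handler: Callable[["Participant", Message], None]) -> None:
        self.handlers[msg_type] = handler

    def send(self, msg_type: str, receiver: str, content: Any = None) -> None:
        self.outbox.append(Message(msg_type, self.name, receiver, content))

    def handle(self, msg: Message) -> None:
        self.handlers[msg.msg_type](self, msg)


# Hypothetical behaviors: each message type maps to one small, swappable handler.
def on_join(server: Participant, msg: Message) -> None:
    server.log.append(f"{msg.sender} joined")
    server.send("model_para", msg.sender, {"w": 0.0})


def on_model_para(client: Participant, msg: Message) -> None:
    client.log.append(f"got model {msg.content}")
    client.send("node_embedding", msg.sender, [0.1, 0.2])


def on_node_embedding(server: Participant, msg: Message) -> None:
    server.log.append(f"embedding from {msg.sender}: {msg.content}")


def run(participants: Dict[str, Participant], queue: Deque[Message]) -> None:
    """Deliver messages until the queue is empty; handlers may enqueue more."""
    while queue:
        msg = queue.popleft()
        participants[msg.receiver].handle(msg)
```

Supporting a new kind of exchange, such as node embeddings in addition to model parameters, then means registering one more handler instead of rewriting a fixed training loop.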

Second, FS-G provides a DataZoo and a ModelZoo for graph learning.

The former provides users with rich and diverse federated graph data sets, while the latter provides corresponding models and algorithms.

DataZoo also implements a large number of splitters of different types. And through the registration mechanism FS-G provides, developers can easily move code written for a single-machine setting into a federated setting and reuse it.

Third, because federated graph learning is sensitive to hyperparameters, FS-G also implements an efficient model tuning component.
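To show what a splitter and a registration mechanism might look like together, here is a small sketch. The registry `SPLITTERS`, the decorator `register_splitter`, and the random node splitter are hypothetical illustrations of the pattern, not FS-G's real interfaces.

```python
import random
from typing import Callable, Dict, List, Set, Tuple

Edge = Tuple[int, int]
Subgraph = Tuple[Set[int], List[Edge]]

# Hypothetical global registry: name -> splitter function.
SPLITTERS: Dict[str, Callable[..., List[Subgraph]]] = {}


def register_splitter(name: str):
    """Decorator that makes a splitter discoverable by name."""
    def decorator(fn):
        SPLITTERS[name] = fn
        return fn
    return decorator


@register_splitter("random_node")
def random_node_splitter(
    nodes: List[int], edges: List[Edge], num_clients: int, seed: int = 0
) -> List[Subgraph]:
    """Assign nodes to clients at random; each client keeps only the edges
    whose two endpoints it owns, so cross-client edges are dropped."""
    rng = random.Random(seed)
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    parts = [set(shuffled[i::num_clients]) for i in range(num_clients)]
    return [
        (part, [(u, v) for u, v in edges if u in part and v in part])
        for part in parts
    ]
```

A splitter of this kind turns one standalone graph dataset into a simulated federated one, and looking functions up by name is one way existing single-machine code can be reused without modification.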

These include the multi-fidelity Successive Halving Algorithm and the recently proposed federated hyperparameter optimization algorithm FedEx, as well as personalization for federated heterogeneous tasks.
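Successive halving can be sketched in a few lines: evaluate every candidate configuration on a small budget, keep the best fraction, and repeat with a larger budget. The version below is a generic illustration, not FS-G's tuning component; `evaluate` is an assumed user-supplied function that returns a loss (lower is better) for a configuration at a given budget, for example a number of federated training rounds.

```python
from typing import Any, Callable, Dict, List

Config = Dict[str, Any]


def successive_halving(
    configs: List[Config],
    evaluate: Callable[[Config, int], float],
    min_budget: int = 1,
    eta: int = 2,
) -> Config:
    """Keep the best 1/eta of the configurations each round while
    multiplying the per-configuration budget by eta."""
    if not configs:
        raise ValueError("configs must not be empty")
    if eta < 2:
        raise ValueError("eta must be at least 2")
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda cfg: evaluate(cfg, budget))
        survivors = ranked[: max(1, len(survivors) // eta)]
        budget *= eta
    return survivors[0]
```

Most of the compute goes to the promising configurations, which matters in federated settings where every evaluation costs communication rounds across all participants.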

△ A personalized graph neural network example

Because each participant may use its own neural architecture and aggregate only the shared parts, developers can adopt different asynchronous training strategies as actual conditions require.
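Aggregating only the shared parts of otherwise different models can be sketched as follows. The parameter names (`gnn.w`, `head.w`, and so on) and the two helper functions are made up for illustration and assume every participant stores its weights in a flat dictionary.

```python
from typing import Dict, List, Set

Params = Dict[str, float]


def aggregate_shared(client_params: List[Params], shared_keys: Set[str]) -> Params:
    """Average only the parameters that all participants agreed to share."""
    count = len(client_params)
    return {
        key: sum(params[key] for params in client_params) / count
        for key in sorted(shared_keys)
    }


def apply_global(local: Params, global_shared: Params) -> Params:
    """Overwrite the shared entries; private, task-specific entries stay as they are."""
    merged = dict(local)
    merged.update(global_shared)
    return merged
```

In this sketch two participants with different prediction heads still benefit from a jointly trained graph encoder, while their private layers are never sent to the server.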

Finally, FS-G also provides a rich set of privacy assessment algorithms to test the algorithm's ability to protect privacy.

Submitting the paper over Spring Festival

Regarding winning the award, the first author of the paper, Wang Zhen, said, "I am definitely happy and feel that my work has been recognized."

Corresponding author Li Yaliang said that, having seen how much effort the team put in, the award felt more like a natural outcome.

In fact, preparations for this work began more than a year ago.

At that time, the team saw where the privacy-preserving computing industry was heading. As engineers, they naturally thought of starting with tools to push this wave of research forward faster.

So FederatedScope was put on the agenda, with FS-G as one of its most important parts.

As mentioned earlier, federated graph learning can meet a wider range of needs in application scenarios, but it is also more complex.

It just so happens that Dr. Wang Zhen is very good at research on graph learning.

He is the first author of TransH, a knowledge graph completion algorithm that has been cited more than 2,500 times.

At that time, he was pursuing a doctorate at the School of Data Science and Computer Science at Sun Yat-sen University, and completed the paper through a joint training program with Microsoft Research Asia.

After graduating with a Ph.D., Wang Zhen joined Alibaba and served as a senior algorithm engineer at Alibaba Cloud.

As the main developer, Wang Zhen participated in the research and development of the A3gent reinforcement learning component of Alibaba's machine learning platform PAI and open sourced it as the EasyRL project.

During the same period, he also contributed to UC Berkeley's Ray RLlib project and was recognized by the community as a project committer.

After that, Wang Zhen joined DAMO Academy and began to focus on federated graph learning. He has ranked highly in KDD Cup competitions many times and has published many papers at top international conferences such as ICLR and WWW.

But even with outstanding researchers on board, developing FS-G was far from easy. Federated graph learning is a very cutting-edge field, some of its basic groundwork had not yet been done, and federated graph learning algorithms are themselves harder than ordinary federated learning algorithms.

Wang Zhen mentioned that initially they didn't even have a usable data set.

In addition, compared with other data types, graph data carries more risk in heterogeneous message exchange, and each participant in federated learning needs richer behaviors to process these messages.

The research team therefore had to adopt a different programming paradigm for federated graph algorithms and design a solution that works as well as possible in the federated graph setting, which differs from conventional development.

All of this took more people and more time.

Li Yaliang, the corresponding author of the paper, recalled that this year’s KDD paper was submitted on the tenth day of the Lunar New Year.

At that time, the entire team was busy, and excited, finishing the submission and took almost no break over the New Year holiday.

And all this energy invested was finally reflected in the results of the paper.

As you can see, FS-G contains a rich set of federated graph data sets together with the corresponding models and algorithms, and it lets developers with no federated learning background use it freely.

This lays a lot of groundwork for subsequent research and can be said to have established a new benchmark for federated graph learning.

Li Yaliang also said during the conversation that completing this basic work can attract more researchers into federated graph learning.

"I think this is a major reason why our work was recognized by the organizing committee," he said.

It is worth mentioning that Li Yaliang, as the corresponding author of this achievement, was also responsible for the open source work of FederatedScope.

He is now a senior algorithm expert at the Intelligent Computing Laboratory of DAMO Academy.

He received his PhD from the State University of New York at Buffalo in 2017. His research covers data fusion, causal inference, automated machine learning, privacy-preserving computing and other areas.

He has served as an area chair for NeurIPS'21 and AAAI'22, organized workshops three times at IJCAI and NeurIPS, organized the AnalytiCup competition at CIKM'22, and given tutorials at KDD and AAAI many times.

According to him, the open source release of FederatedScope is now at version 0.2.0.

The new version can better support asynchronous federated learning on a large scale and is more user-friendly.

One More Thing

Finally, a bonus for readers~

After discussing the award-winning paper, we also asked the two researchers about their experience of learning AI. Take notes, everyone!

First, both said that mathematics is crucial if you want to learn AI well.

Li Yaliang said he has observed that the mathematical ability of many students and interns has declined in recent years, which deserves attention.

Now that many tools have become easier to use, people have begun to chase what is short, fast and simple, and neglect deeper, more fundamental learning. In fact, mathematics as a basic ability and coding as an engineering ability are both indispensable.

Second is a question everyone cares about: how to read papers.

Wang Zhen said that reading good papers is the key.

You must first learn to identify what is a good paper, and then spend your time on the cutting edge.

Li Yaliang, for his part, encourages everyone to read more books than papers, because books help build a better system of knowledge.

Even now, years after graduating, they still often organize reading groups at the Intelligent Computing Laboratory of DAMO Academy.

I recommend reading Foundations of Machine Learning! I believe both beginners and experts will gain new insights from this book.

Beyond learning experience, we also asked the two about their hobbies.

It turns out both said their research is driven by interest, so what they usually love to do is dig into it.

Did you get all that?
