GitHub offers free machine learning code vulnerability scanning, now supports JavaScript/TypeScript

Last updated: 2022-03-09

Xiaocha from Aofei Temple
Quantum Bit | Public Account QbitAI

Today, GitHub updated an experimental new feature.

With the help of machine learning, the new version of the CodeQL code scanning service can help developers discover more security vulnerabilities.

The feature is currently being developed and tested on JavaScript and TypeScript repositories, and support for other languages will be added gradually.

During testing, CodeQL discovered more than 20,000 security issues across 12,000 repositories, including remote code execution (RCE), SQL injection, and cross-site scripting (XSS) vulnerabilities.

How to use

GitHub's CodeQL code scanning is free for public repositories.

The new JavaScript/TypeScript analysis tools are now available to all users of the security-extended and security-and-quality analysis suites.

If you are already using these suites, your analysis will automatically be performed using new machine learning techniques.

If you have not used CodeQL before, you can enable it by following the steps below.

1. On your repository's home page, click Security.

2. To the right of Code scanning alerts, click Set up code scanning. If this item is missing, a repository administrator needs to enable GitHub Advanced Security.

3. Under "Get started with code scanning", click Set up this workflow under CodeQL Analysis.

4. Open the Start commit drop-down menu and enter a commit message.

5. Choose whether to commit directly to the default branch, or to create a new branch and start a pull request.

6. Click Commit new file or Propose new file.

Once the code scanning analysis completes successfully, security alerts appear in the repository's Security tab.

Why ML can produce better results

To detect vulnerabilities in a repository, the CodeQL engine first builds a database that encodes a special relational representation of the code and then executes a series of CodeQL queries on the database.
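To make the idea of "code as a relational database" more concrete, here is a toy sketch in TypeScript. It is not CodeQL's real schema or query language: the fact types (`CallFact`, `FlowFact`, `SourceFact`), the expression ids, and the sink list are all invented for illustration. The "query" simply looks for calls to a known SQL sink whose argument can be reached from user-controlled data.

```typescript
// Toy relational facts about a program. These shapes are hypothetical and
// far simpler than a real CodeQL database.
interface CallFact {
  id: number;      // identifier of the call expression
  callee: string;  // name of the function being called
  argExpr: number; // identifier of the expression passed as the argument
}

interface FlowFact {
  from: number; // expression id where a value originates
  to: number;   // expression id the value flows into
}

interface SourceFact {
  expr: number; // expression id that yields user-controlled data
  kind: string;
}

const callFacts: CallFact[] = [
  { id: 10, callee: "db.query", argExpr: 3 },
  { id: 11, callee: "logger.info", argExpr: 4 },
];
const flowFacts: FlowFact[] = [
  { from: 1, to: 2 },
  { from: 2, to: 3 },
];
const sourceFacts: SourceFact[] = [{ expr: 1, kind: "http-request-parameter" }];

// Hand-written model of SQL sinks (an assumption for this sketch).
const knownSqlSinks: string[] = ["db.query"];

// Breadth-first walk over the flow relation, starting at one expression.
function reachableFrom(start: number, edges: FlowFact[]): number[] {
  const seen: number[] = [start];
  const queue: number[] = [start];
  while (queue.length > 0) {
    const current = queue.shift() as number;
    for (const edge of edges) {
      if (edge.from === current && seen.indexOf(edge.to) === -1) {
        seen.push(edge.to);
        queue.push(edge.to);
      }
    }
  }
  return seen;
}

// The "query": calls to a known sink whose argument is reachable from a source.
function findSqlInjectionCandidates(
  calls: CallFact[],
  flows: FlowFact[],
  sources: SourceFact[],
  sinks: string[]
): CallFact[] {
  let tainted: number[] = [];
  for (const source of sources) {
    tainted = tainted.concat(reachableFrom(source.expr, flows));
  }
  return calls.filter(
    (call) => sinks.indexOf(call.callee) !== -1 && tainted.indexOf(call.argExpr) !== -1
  );
}

const sqlCandidates = findSqlInjectionCandidates(callFacts, flowFacts, sourceFacts, knownSqlSinks);
```

In this toy data only call 10 is reported, because its argument (expression 3) is reachable from the request parameter (expression 1), while the logger call is neither a known sink nor tainted.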

But as the open source ecosystem grows rapidly, the long tail of less common libraries keeps getting longer.

Security experts continue to extend and improve these queries to model more common libraries and known patterns. However, manual modeling is time-consuming, and there will always be less common libraries and private code that cannot be modeled by hand.
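As an illustration of the kind of code that falls into this long tail, consider the sketch below. The `InHouseDb` wrapper and its `runRaw` and `runPrepared` methods are hypothetical names made up for this example; the point is only that a query written around well-known database libraries has no way of knowing that `runRaw` executes SQL unless someone models it.

```typescript
// Hypothetical in-house database wrapper. No public model would know that
// `runRaw` executes SQL, so a hand-written sink list would likely miss it.
interface InHouseDb {
  runRaw(sql: string): void;
  runPrepared(sql: string, params: string[]): void;
}

// Vulnerable: user input is concatenated into the SQL text.
function findOrdersUnsafe(db: InHouseDb, customerName: string): void {
  db.runRaw("SELECT * FROM orders WHERE customer = '" + customerName + "'");
}

// Safer: the SQL text is constant and the input travels as a parameter.
function findOrdersSafe(db: InHouseDb, customerName: string): void {
  db.runPrepared("SELECT * FROM orders WHERE customer = ?", [customerName]);
}

// A recording fake, so the difference can be observed without a real database.
function makeRecordingDb(): { db: InHouseDb; statements: string[] } {
  const statements: string[] = [];
  const db: InHouseDb = {
    runRaw: (sql) => {
      statements.push(sql);
    },
    runPrepared: (sql) => {
      statements.push(sql);
    },
  };
  return { db, statements };
}

const recorder = makeRecordingDb();
const attackerInput = "x' OR '1'='1";
findOrdersUnsafe(recorder.db, attackerInput);
findOrdersSafe(recorder.db, attackerInput);
// recorder.statements[0] now contains the attacker's OR clause;
// recorder.statements[1] is the unchanged constant query text.
```

Both functions look harmless to a tool that does not recognize `InHouseDb`, which is the gap the article says machine learning is meant to narrow.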

This is where machine learning comes in handy.

Given a large number of training code snippets, each labeled as a positive or negative example for each query, features are extracted for every snippet and a deep learning model is trained to classify new examples.

Instead of simply treating each code snippet as a string of words or characters and directly applying standard NLP techniques to classify these strings, GitHub uses CodeQL to access a wealth of information about the underlying source code and generate a rich set of features for each snippet. The features are then tokenized and sub-tokenized, as is common in NLP.

A vocabulary is generated from the training data, and each snippet's tokens are converted into a list of indices into that vocabulary. These index lists are fed into the deep learning classifier, which outputs the probability that the current sample belongs to each vulnerability type.
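The preprocessing described here can be sketched roughly as follows. This is a minimal illustration, not GitHub's actual pipeline: the sub-tokenization rule (split on camelCase and punctuation), the reserved index 0 for unknown tokens, and the sample feature strings are all assumptions. The classifier itself is left out.

```typescript
// Index reserved for tokens that never appeared in the training data
// (an assumption of this sketch).
const UNKNOWN_INDEX = 0;

// Split an identifier-like feature into lower-cased sub-tokens,
// e.g. "db.runRaw" becomes ["db", "run", "raw"].
function subTokenize(feature: string): string[] {
  return feature
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2") // break camelCase boundaries
    .split(/[^A-Za-z0-9]+/) // break on dots, underscores, spaces
    .filter((part) => part.length > 0)
    .map((part) => part.toLowerCase());
}

function hasToken(vocabulary: Record<string, number>, token: string): boolean {
  return Object.prototype.hasOwnProperty.call(vocabulary, token);
}

// Assign each distinct sub-token seen in training the next free index.
function buildVocabulary(trainingFeatures: string[][]): Record<string, number> {
  const vocabulary: Record<string, number> = {};
  let nextIndex = UNKNOWN_INDEX + 1;
  for (const features of trainingFeatures) {
    for (const feature of features) {
      for (const token of subTokenize(feature)) {
        if (!hasToken(vocabulary, token)) {
          vocabulary[token] = nextIndex;
          nextIndex += 1;
        }
      }
    }
  }
  return vocabulary;
}

// Turn one snippet's features into the list of indices a classifier would consume.
function encodeFeatures(features: string[], vocabulary: Record<string, number>): number[] {
  const indices: number[] = [];
  for (const feature of features) {
    for (const token of subTokenize(feature)) {
      indices.push(hasToken(vocabulary, token) ? vocabulary[token] : UNKNOWN_INDEX);
    }
  }
  return indices;
}

// Made-up feature strings standing in for what would be extracted per snippet.
const trainingSnippets: string[][] = [
  ["db.runRaw", "customerName"],
  ["res.send", "userInput"],
];
const vocabulary = buildVocabulary(trainingSnippets);
const encodedSnippet = encodeFeatures(["db.sendRaw", "fileName"], vocabulary);
```

With this toy data, `encodedSnippet` is `[1, 7, 3, 0, 5]`: "db", "send", "raw" and "name" were seen during training, while "file" was not and maps to the unknown index. Sub-tokenizing is what lets an unseen name such as `sendRaw` still share indices with names the model has seen.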

Although ML-based vulnerability scanning is currently available only for JavaScript/TypeScript, GitHub promises to support more languages in the future. CodeQL itself already supports many popular languages, including Python, Go, and C/C++.

Finally, GitHub also stressed that although the new tool can find more vulnerabilities, it may also produce more false positives (recall is about 80% and precision is about 60%). GitHub expects the feature to improve over time.
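To read the two figures: precision is the share of raised alerts that are real vulnerabilities, and recall is the share of real vulnerabilities that get an alert. The short sketch below uses the standard definitions; the counts are made up purely so that the ratios come out at the quoted 60% and 80%, and they are not GitHub's data.

```typescript
interface AlertCounts {
  truePositives: number; // alerts that point at real vulnerabilities
  falsePositives: number; // alerts raised on code that is actually fine
  falseNegatives: number; // real vulnerabilities that produced no alert
}

// Precision: of everything flagged, how much was real?
function precisionOf(counts: AlertCounts): number {
  return counts.truePositives / (counts.truePositives + counts.falsePositives);
}

// Recall: of everything real, how much was flagged?
function recallOf(counts: AlertCounts): number {
  return counts.truePositives / (counts.truePositives + counts.falseNegatives);
}

// Illustrative counts only, chosen to reproduce the quoted ratios.
const illustrativeCounts: AlertCounts = {
  truePositives: 48,
  falsePositives: 32,
  falseNegatives: 12,
};
const illustrativePrecision = precisionOf(illustrativeCounts); // 48 / 80 = 0.6
const illustrativeRecall = recallOf(illustrativeCounts); // 48 / 60 = 0.8
```

Put differently, at these rates roughly four in ten ML-generated alerts would turn out to be false positives, while about two in ten real issues of the covered kinds would still go unflagged.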

Reference links:
[1] https://github.blog/2022-02-17-code-scanning-finds-vulnerabilities-using-machine-learning/
[2] https://github.blog/2022-02-17-leveraging-machine-learning-find-security-vulnerabilities/
[3] https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/setting-up-code-scanning-for-a-repository
