GitHub offers free machine-learning-powered code vulnerability scanning, now supporting JavaScript/TypeScript

Latest update: 2022-03-09
Xiaocha, reporting from Aofei Temple
QbitAI (Quantum Bit) | WeChat Official Account: QbitAI

Today, GitHub launched an experimental new feature.

With the help of machine learning, the new version of GitHub's CodeQL code scanning can help developers find more security vulnerabilities.

The feature is currently being developed and tested on JavaScript and TypeScript repositories, and support for other languages will be added gradually.

During testing, CodeQL discovered more than 20,000 security issues in 12,000 repositories, including remote code execution (RCE), SQL injection, and cross-site scripting (XSS) vulnerabilities.

How to use

GitHub's CodeQL code scanning is free for public repositories.

The new JavaScript/TypeScript analysis tools are now available to all users of the security-extended and security-and-quality analysis suites.

If you are already using these suites, your analysis will automatically be performed using new machine learning techniques.

If you haven’t used it before, you can enable CodeQL by following the steps below.

1. On your repository's main page, click Security.

2. To the right of Code scanning alerts, click Set up code scanning. If this option is missing, a repository administrator needs to enable GitHub Advanced Security.

3. Under “Get started with code scanning”, click Set up this workflow on the CodeQL Analysis workflow.

4. Use the Start commit drop-down menu to enter a commit message.

5. Choose whether to commit directly to the default branch or to create a new branch and start a pull request.

6. Click Commit new file or Propose new file.

Once the code scanning analysis completes successfully, security alerts appear in the repository's "Security" tab.

Why ML can produce better results

To detect vulnerabilities in a repository, the CodeQL engine first builds a database that encodes a special relational representation of the code and then executes a series of CodeQL queries on the database.
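To make the "database plus queries" idea concrete, here is a toy analogue in Python using the standard-library sqlite3 module. It is not CodeQL: the two-table schema (program nodes and data-flow edges), the node kinds, and the JavaScript snippet it encodes are all hypothetical simplifications. The query follows flow edges from untrusted sources and reports any sink they reach, which is roughly what a taint-tracking query does.

```python
import sqlite3

# Hypothetical schema: program elements and data-flow edges stored as
# relations. Real CodeQL databases use a far richer, language-specific schema.
SCHEMA = """
CREATE TABLE node (id INTEGER PRIMARY KEY, kind TEXT, code TEXT);
CREATE TABLE flow (src INTEGER, dst INTEGER);
"""

# Nodes for a snippet such as:
#   const name = req.query.name;
#   const sql = "SELECT * FROM users WHERE name = '" + name + "'";
#   db.query(sql);
NODES = [
    (1, "source", "req.query.name"),
    (2, "expr", "name"),
    (3, "expr", "\"SELECT ...\" + name"),
    (4, "sink", "db.query(sql)"),
    (5, "source", "req.query.page"),  # untrusted, but never reaches a sink
]
FLOWS = [(1, 2), (2, 3), (3, 4)]

# Recursive query: start at every source, follow flow edges transitively,
# and report (source, sink) pairs that are connected.
QUERY = """
WITH RECURSIVE reach(origin, cur) AS (
    SELECT id, id FROM node WHERE kind = 'source'
    UNION
    SELECT reach.origin, flow.dst
    FROM reach JOIN flow ON flow.src = reach.cur
)
SELECT source_node.code, sink_node.code
FROM reach
JOIN node AS source_node ON source_node.id = reach.origin
JOIN node AS sink_node ON sink_node.id = reach.cur
WHERE sink_node.kind = 'sink';
"""


def find_source_to_sink_paths():
    """Build the in-memory 'database', run the query, return the matches."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO node VALUES (?, ?, ?)", NODES)
    conn.executemany("INSERT INTO flow VALUES (?, ?)", FLOWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for source, sink in find_source_to_sink_paths():
        print(f"possible injection: {source} -> {sink}")
```

Keeping the code as relations means a new kind of check is just a new query over the same database, which is the design property the CodeQL approach relies on.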

But as the open-source ecosystem grows rapidly, the long-tail effect is becoming increasingly pronounced.

Security experts keep extending and refining these queries to model additional common libraries and known patterns. However, manual modeling is time-consuming, and there will always be less common libraries and private code that no one has modeled by hand.
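The limitation of manual modeling shows up in a minimal sketch: a hand-maintained list of known SQL sinks flags calls into modeled libraries but stays silent on an unmodeled one. The sink list, the call records, and the acme-db-utils library are invented for illustration and do not reflect CodeQL's actual models.

```python
# Hypothetical hand-written models: (module, function) pairs that experts
# have marked as SQL-injection sinks. Illustrative entries only.
KNOWN_SQL_SINKS = {
    ("mysql", "query"),
    ("pg", "query"),
    ("sequelize", "query"),
}


def flag_calls(calls):
    """Return only the calls that match a manually modeled sink."""
    return [c for c in calls if (c["module"], c["function"]) in KNOWN_SQL_SINKS]


# Calls found in some repository; both pass user input into a query.
calls_in_repo = [
    {"module": "mysql", "function": "query", "arg": "userInput"},
    # A long-tail or in-house library that nobody has modeled yet:
    {"module": "acme-db-utils", "function": "runRaw", "arg": "userInput"},
]

if __name__ == "__main__":
    for call in flag_calls(calls_in_repo):
        print("flagged:", call["module"], call["function"])
    # Only the mysql call is flagged; the acme-db-utils call is missed.
```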

This is where machine learning comes in handy.

Given a large set of training code snippets, each labeled as a positive or negative example for a particular query, features are extracted from each snippet and a deep learning model is trained to classify new examples.
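The training loop described above can be sketched with a deliberately simpler model. GitHub describes a deep learning classifier; the sketch below swaps in plain logistic regression over bag-of-token counts, trained by per-example gradient descent, so it fits in a few lines of dependency-free Python. The four labeled examples are made up: positives stand for snippets that are SQL-injection sinks, negatives for snippets that are not.

```python
import math
from collections import Counter


def featurize(tokens, vocab):
    """Bag-of-tokens count vector; tokens outside the vocabulary are ignored."""
    counts = Counter(tokens)
    return [counts.get(word, 0) for word in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(examples, vocab, epochs=200, lr=0.5):
    """Fit weights and bias with per-example gradient descent on log loss."""
    w = [0.0] * len(vocab)
    b = 0.0
    for _ in range(epochs):
        for tokens, label in examples:
            x = featurize(tokens, vocab)
            p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            g = p - label  # derivative of log loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def predict(tokens, vocab, w, b):
    """Probability that a snippet is a positive example."""
    x = featurize(tokens, vocab)
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


# Hypothetical labeled snippets, already reduced to tokens (1 = sink, 0 = not).
EXAMPLES = [
    (["db", "query", "sql"], 1),
    (["conn", "exec", "sql"], 1),
    (["console", "log", "message"], 0),
    (["logger", "info", "message"], 0),
]
VOCAB = sorted({t for tokens, _ in EXAMPLES for t in tokens})

if __name__ == "__main__":
    w, b = train_logistic(EXAMPLES, VOCAB)
    print(predict(["client", "query", "sql"], VOCAB, w, b))  # high probability
    print(predict(["console", "log", "text"], VOCAB, w, b))  # low probability
```

The shape of the workflow (label, featurize, fit, score unseen snippets) is the same one the article describes; only the model family differs.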

Rather than treating each code snippet as a string of words or characters and applying standard NLP techniques directly to those strings, GitHub uses CodeQL to access a wealth of information about the underlying source code. It generates a rich set of features for each snippet and then tokenizes and subtokenizes them, as is done in NLP.
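The tokenization step can be sketched as follows. CodeQL computes the real features; here a candidate sink is described by a small dictionary of hypothetical feature names (callee, argument access path, enclosing function name), and each feature value is split into lower-case sub-tokens at punctuation, underscores, and camelCase boundaries, so that executeQuery becomes execute and query. The regular expression is one common way to do this, not GitHub's actual tokenizer.

```python
import re

# Matches, in order of preference: an acronym followed by a capitalized word
# ("HTTP" in "HTTPResponse"), a capitalized or lower-case word, a run of
# capitals, or a run of digits.
SUBTOKEN_RE = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")


def subtokenize(text):
    """Split text into lower-case sub-tokens.

    Non-alphanumeric characters (dots, underscores, brackets, spaces) separate
    words; each word is then split at camelCase boundaries.
    """
    tokens = []
    for word in re.findall(r"[A-Za-z0-9]+", text):
        tokens.extend(part.lower() for part in SUBTOKEN_RE.findall(word))
    return tokens


def snippet_features(candidate):
    """Map each (hypothetical) feature name to its list of sub-tokens."""
    return {name: subtokenize(value) for name, value in candidate.items()}


# A hypothetical candidate sink with three illustrative text features.
CANDIDATE = {
    "callee": "db.executeQuery",
    "argument_access_path": "req.query.userName",
    "enclosing_function": "getUserByName",
}

if __name__ == "__main__":
    for name, tokens in snippet_features(CANDIDATE).items():
        print(name, tokens)
```

Sub-tokenizing lets identifiers the model has never seen whole (for example runUserQuery) still share pieces such as query with identifiers it has seen.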

A vocabulary is then built from the training data. Each snippet's tokens are mapped to a list of vocabulary indices, which is fed into the deep learning classifier; the classifier outputs, for each vulnerability type, the probability that the sample belongs to it.
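The vocabulary and index-list step, plus the final per-class probabilities, can be sketched like this. The reserved padding and unknown tokens, the fixed sequence length, and the class names are assumptions for illustration; the classifier itself is left out, and the logits passed to the softmax are placeholder numbers, not output from any real model.

```python
import math
from collections import Counter

PAD, UNK = "<pad>", "<unk>"


def build_vocab(token_lists, min_count=1):
    """Assign an integer index to every token seen at least min_count times."""
    counts = Counter(t for tokens in token_lists for t in tokens)
    vocab = {PAD: 0, UNK: 1}
    for token, n in sorted(counts.items()):
        if n >= min_count:
            vocab[token] = len(vocab)
    return vocab


def to_indices(tokens, vocab, max_len=8):
    """Map tokens to indices (unknown -> UNK), then truncate or pad to max_len."""
    ids = [vocab.get(t, vocab[UNK]) for t in tokens][:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


def softmax(logits):
    """Turn raw classifier scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical output classes for a candidate snippet.
CLASSES = ["not a sink", "sql injection", "xss", "path injection"]

if __name__ == "__main__":
    vocab = build_vocab([["db", "query", "sql"], ["console", "log"]])
    print(to_indices(["client", "query", "sql"], vocab, max_len=5))  # [1, 5, 6, 0, 0]
    # A real deep learning classifier would compute these scores from the
    # index list; here they are placeholders.
    placeholder_logits = [0.2, 2.5, 0.1, -1.0]
    for name, p in zip(CLASSES, softmax(placeholder_logits)):
        print(f"{name}: {p:.2f}")
```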

Although ML-based vulnerability scanning is currently available only for JavaScript/TypeScript, GitHub promises to support more languages in the future. CodeQL itself already supports many popular languages, including Python, Go, and C/C++.

Finally, GitHub emphasized that although the new tool finds more vulnerabilities, it may also produce more false positives (recall is about 80% and precision about 60%). GitHub expects the feature to improve over time.
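To see what those figures mean in practice, the arithmetic below assumes a hypothetical repository with 100 real vulnerabilities. Recall is the share of real vulnerabilities that get flagged, and precision is the share of alerts that are real; the 80% and 60% are GitHub's approximate figures as quoted above, while the count of 100 is made up.

```python
def alert_breakdown(true_vulns, recall, precision):
    """Split the alerts implied by a recall/precision pair into hits and misses."""
    true_positives = recall * true_vulns       # real issues that get flagged
    total_alerts = true_positives / precision  # precision = TP / all alerts
    return {
        "true_positives": true_positives,
        "total_alerts": total_alerts,
        "false_positives": total_alerts - true_positives,
        "missed": true_vulns - true_positives,
    }


if __name__ == "__main__":
    for key, value in alert_breakdown(100, recall=0.8, precision=0.6).items():
        print(f"{key}: {value:.1f}")
```

Under these assumptions, roughly 133 alerts would be raised, about 53 of them false positives, and about 20 real issues would go unreported.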
