GitHub offers free machine-learning-powered code vulnerability scanning, now supporting JavaScript/TypeScript
Xiaocha, reporting from Aofei Temple
QbitAI | WeChat Official Account QbitAI
Today, GitHub released a new experimental feature.
Using machine learning, the new version of the CodeQL code scanning service can help developers find more security vulnerabilities.
The feature is currently being developed and tested on JavaScript and TypeScript repositories, and support for more languages will be added gradually.
During testing, CodeQL found more than 20,000 security issues across 12,000 repositories, including remote code execution (RCE), SQL injection, and cross-site scripting (XSS) vulnerabilities.
How to use
GitHub's CodeQL code scanning is free for public repositories.
The new JavaScript/TypeScript analysis is now available to all users of the security-extended and security-and-quality query suites.
If you already use one of these suites, your analysis will automatically take advantage of the new machine learning techniques.
If you haven’t used it before, you can enable CodeQL by following the steps below.
1. On your repository's main page, click Security.
2. To the right of Code scanning alerts, click Set up code scanning. If this option is missing, the repository administrator needs to enable GitHub Advanced Security first.
3. Under “Get started with code scanning”, click Set up this workflow on the CodeQL Analysis workflow.
4. Use the Start commit drop-down menu and type a commit message.
5. Choose whether to commit directly to the default branch or to create a new branch and start a pull request.
6. Click Commit new file or Propose new file.
Once the code scanning analysis completes successfully, security alerts will appear on the repository's "Security" tab.
Why ML can produce better results
To detect vulnerabilities in a repository, the CodeQL engine first builds a database that encodes a special relational representation of the code and then executes a series of CodeQL queries on the database.
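To make the idea of a relational representation more concrete, here is a toy TypeScript sketch. It is not CodeQL's actual database schema or query language: it assumes a hypothetical database of three fact tables (sources of user input, dangerous sinks, and direct data-flow edges) and answers a "query" by joining sources to sinks through the transitive closure of the flow relation, roughly the kind of question a taint-tracking query asks.

```typescript
// Toy "database" of facts extracted from a program.
// Hypothetical schema for illustration only, not CodeQL's real one.
interface Source { node: string }              // nodes that carry user input
interface Sink { node: string; kind: string }  // nodes where tainted input is dangerous
interface Flow { from: string; to: string }    // direct data-flow edges

const sources: Source[] = [{ node: "req.query.id" }];
const sinks: Sink[] = [{ node: "db.query(arg0)", kind: "sql-injection" }];
const flows: Flow[] = [
  { from: "req.query.id", to: "id" },
  { from: "id", to: "sql" },
  { from: "sql", to: "db.query(arg0)" },
];

// Transitive closure of the flow relation: every node reachable from `start`.
function reachable(start: string): Set<string> {
  const seen = new Set<string>([start]);
  const work: string[] = [start];
  while (work.length > 0) {
    const cur = work.pop()!;
    for (const f of flows) {
      if (f.from === cur && !seen.has(f.to)) {
        seen.add(f.to);
        work.push(f.to);
      }
    }
  }
  return seen;
}

interface Alert { source: string; sink: string; kind: string }

// The "query": join sources with sinks through the reachability relation.
function findAlerts(): Alert[] {
  const alerts: Alert[] = [];
  for (const s of sources) {
    const reach = reachable(s.node);
    for (const k of sinks) {
      if (reach.has(k.node)) alerts.push({ source: s.node, sink: k.node, kind: k.kind });
    }
  }
  return alerts;
}

console.log(findAlerts());
```

Here user input read from `req.query.id` reaches the SQL sink through two intermediate variables, so the query reports one SQL injection alert.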
However, as the open-source ecosystem grows rapidly, its long tail of less widely used libraries becomes ever more pronounced.
Security experts keep extending and refining these queries to model additional popular libraries and known patterns. But manual modeling is time-consuming, and there will always be less common libraries and private code that no one has modeled by hand.
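A minimal sketch of why manual modeling leaves gaps, assuming a scanner that recognizes a sink only if someone has explicitly written a model for it. The library names `popularDb` and `tinyOrm` are hypothetical placeholders, not real packages.

```typescript
// Hand-written models: call targets that experts have marked as SQL sinks.
// Names are hypothetical placeholders, not real library APIs.
const modeledSqlSinks = new Set<string>(["popularDb.query", "popularDb.execute"]);

interface CallSite { callee: string; argIsTainted: boolean }

// Flag a call only if its callee has a manual model AND it receives tainted input.
function isFlagged(call: CallSite): boolean {
  return call.argIsTainted && modeledSqlSinks.has(call.callee);
}

const calls: CallSite[] = [
  { callee: "popularDb.query", argIsTainted: true }, // modeled: reported
  { callee: "tinyOrm.raw", argIsTainted: true },     // unmodeled: silently missed
];

for (const c of calls) {
  console.log(`${c.callee}: ${isFlagged(c) ? "alert" : "no alert"}`);
}
```

Both calls receive the same tainted input, but only the modeled one is reported; closing that gap without writing a model for every library is the motivation for the learned approach.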
This is where machine learning comes in handy.
The approach starts from a large set of training code snippets, each labeled as a positive or negative example for a given query. Features are extracted from each snippet, and a deep learning model is trained on them to classify new examples.
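The training setup can be illustrated with a deliberately small stand-in: plain logistic regression trained by batch gradient descent, swapped in here for GitHub's deep learning model, which this article does not describe in detail. The three binary features and the four labeled examples are invented for illustration; a real system would use far richer features and far more data.

```typescript
// A labeled training example: feature vector plus 1 (positive) or 0 (negative).
interface Example { features: number[]; label: 0 | 1 }
interface Model { w: number[]; b: number }

const sigmoid = (z: number): number => 1 / (1 + Math.exp(-z));

// Logistic regression trained with batch gradient descent on the log-loss.
function train(data: Example[], epochs = 2000, lr = 0.5): Model {
  const dim = data[0].features.length;
  const w = new Array<number>(dim).fill(0);
  let b = 0;
  for (let e = 0; e < epochs; e++) {
    const gw = new Array<number>(dim).fill(0);
    let gb = 0;
    for (const ex of data) {
      const z = ex.features.reduce((s, x, i) => s + x * w[i], b);
      const err = sigmoid(z) - ex.label; // gradient of log-loss w.r.t. z
      ex.features.forEach((x, i) => { gw[i] += err * x; });
      gb += err;
    }
    for (let i = 0; i < dim; i++) w[i] -= (lr * gw[i]) / data.length;
    b -= (lr * gb) / data.length;
  }
  return { w, b };
}

// Probability that a snippet with these features is a positive example.
function predict(model: Model, features: number[]): number {
  return sigmoid(features.reduce((s, x, i) => s + x * model.w[i], model.b));
}

// Hypothetical features: [callee name mentions "query", argument is a string
// concatenation, argument is a constant]. Label 1 = positive for the query.
const training: Example[] = [
  { features: [1, 1, 0], label: 1 },
  { features: [1, 0, 1], label: 0 },
  { features: [0, 1, 0], label: 0 },
  { features: [0, 0, 1], label: 0 },
];

const model = train(training);
console.log(predict(model, [1, 1, 0]).toFixed(3));
```

A linear model keeps the example short; the point is only the shape of the pipeline: labeled snippets in, features extracted, a classifier fitted, and a probability out for each new snippet.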
Rather than simply treating each snippet as a string of words or characters and applying standard NLP techniques to it directly, GitHub uses CodeQL to access a wealth of information about the underlying source code, generates a rich set of features for each snippet, and then tokenizes and sub-tokenizes those features, much as in NLP.
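Tokenizing and sub-tokenizing a feature can be sketched as follows. This assumes features are strings such as a call expression and that sub-tokens come from splitting camelCase and snake_case identifiers, a common convention for code-oriented NLP; GitHub's exact scheme may differ.

```typescript
// Split a feature string such as "db.executeQuery(sql_string)" into tokens
// on any character that cannot appear in an identifier.
function tokenize(feature: string): string[] {
  return feature.split(/[^A-Za-z0-9_]+/).filter((t) => t.length > 0);
}

// Split one token into lowercase sub-tokens on camelCase and snake_case boundaries.
function subtokenize(token: string): string[] {
  return token
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2") // executeQuery -> execute Query
    .split(/[\s_]+/)
    .filter((t) => t.length > 0)
    .map((t) => t.toLowerCase());
}

function featureToSubtokens(feature: string): string[] {
  const out: string[] = [];
  for (const tok of tokenize(feature)) out.push(...subtokenize(tok));
  return out;
}

console.log(featureToSubtokens("db.executeQuery(sql_string)"));
// returns ["db", "execute", "query", "sql", "string"]
```

Sub-tokens let the model relate identifiers it has never seen whole (say, `runQuery`) to ones it has (`executeQuery`) through shared pieces such as `query`.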
A vocabulary is then generated from the training data, each sample's tokens are mapped to a list of vocabulary indices, and that index list is fed into the deep learning classifier, which outputs the probability that the sample belongs to each vulnerability type.
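Building the vocabulary and turning a sample into an index list might look like this minimal sketch. Reserving index 0 for sub-tokens never seen during training is an assumption here (a common choice), and the classifier that consumes the index list is not shown.

```typescript
const UNK = 0; // index reserved for sub-tokens not seen during training

// Assign each distinct sub-token in the training data a stable integer index.
function buildVocabulary(trainingSamples: string[][]): Map<string, number> {
  const vocab = new Map<string, number>();
  for (const sample of trainingSamples) {
    for (const tok of sample) {
      if (!vocab.has(tok)) vocab.set(tok, vocab.size + 1); // start at 1; 0 is UNK
    }
  }
  return vocab;
}

// Convert a sample's sub-tokens into the index list a classifier would consume.
function encode(sample: string[], vocab: Map<string, number>): number[] {
  return sample.map((tok) => vocab.get(tok) ?? UNK);
}

const vocab = buildVocabulary([
  ["db", "execute", "query", "sql"],
  ["res", "send", "html"],
]);
console.log(encode(["db", "query", "user", "html"], vocab));
// returns [1, 3, 0, 7]; "user" was never seen in training, so it maps to UNK
```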
Although ML-based vulnerability scanning is currently available only for JavaScript/TypeScript, GitHub promises to support more languages in the future. CodeQL itself already supports many popular languages, including Python, Go, and C/C++.
Finally, GitHub emphasized that although the new analysis can find more vulnerabilities, it may also raise the false positive rate (recall is about 80% and precision about 60%). The feature is expected to improve over time.
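To see what these figures mean in practice, the hypothetical counts below are chosen to reproduce roughly 60% precision and 80% recall; they are not measured data.

```typescript
// Precision: share of reported alerts that are real vulnerabilities.
function precision(tp: number, fp: number): number {
  return tp / (tp + fp);
}

// Recall: share of real vulnerabilities that get reported.
function recall(tp: number, fn: number): number {
  return tp / (tp + fn);
}

// Hypothetical counts, chosen only to match the reported percentages.
const tp = 60; // alerts that are real vulnerabilities
const fp = 40; // false alarms
const fn = 15; // real vulnerabilities the scanner missed

console.log(precision(tp, fp)); // 0.6
console.log(recall(tp, fn));    // 0.8
```

With these numbers, about 4 in every 10 alerts would be false alarms, and about 1 in 5 real vulnerabilities would go unreported.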
Reference links:
[1] https://github.blog/2022-02-17-code-scanning-finds-vulnerabilities-using-machine-learning/
[2] https://github.blog/2022-02-17-leveraging-machine-learning-find-security-vulnerabilities/
[3] https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/setting-up-code-scanning-for-a-repository