About CodeQL

CodeQL is a language and toolchain for code analysis. It is designed to allow security researchers to scale their knowledge of a single vulnerability to identify variants of that vulnerability across a wide range of codebases. It is also designed to allow developers to automate security checks and integrate them into their development workflows.

Resources for learning CodeQL

CodeQL docs site: Contains information on the CodeQL language and libraries, with tutorials and guides to help you learn how to write your own queries.
- CodeQL queries: A general, language-neutral overview of the key components of a query.
- QL tutorials: Solve puzzles to learn the basics of QL before you analyze code with CodeQL. The tutorials teach you how to write queries and introduce you to key logic concepts along the way.
- CodeQL language guides: Guides to the CodeQL libraries for each language, including the classes and predicates that are available for use in queries, with worked examples.
GitHub Security Lab: GitHub’s own security research team. They’ve created a range of resources to help you learn how to use CodeQL to find security vulnerabilities in real-world codebases.
- Secure code game: A series of interactive sessions that guide you from finding insecure code patterns manually, through to using CodeQL to find insecure code patterns automatically.
- Security Lab CTF: A series of Capture the Flag (CTF) challenges that are designed to help you learn how to use CodeQL to find security vulnerabilities in real-world codebases.
- Security Lab blog: A series of blog posts that describe how CodeQL is used by security researchers to find security vulnerabilities in real-world codebases.

About variant analysis

Variant analysis is the process of using a known security vulnerability as a seed to find similar problems in your code. It’s a technique that security engineers use to identify potential vulnerabilities and ensure that these threats are properly fixed across multiple codebases.

Querying code using CodeQL is the most efficient way to perform variant analysis. You can use the standard CodeQL queries to identify seed vulnerabilities, or find new vulnerabilities by writing your own custom CodeQL queries. Then, develop or iterate over the query to automatically find logical variants of the same bug that could be missed using traditional manual techniques.

When you have a query that finds variants of a vulnerability, you can use multi-repository variant analysis to run that query across a large number of codebases and identify all of the places where that vulnerability exists. For more information, see “Running CodeQL queries at scale with multi-repository variant analysis” in the GitHub docs.

CodeQL analysis

CodeQL analysis consists of three steps:

1. Preparing the code, by creating a CodeQL database
2. Running CodeQL queries against the database
3. Interpreting the query results
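
As a rough illustration of how these steps map onto the CodeQL CLI, the Python sketch below only builds the command lines and does not run them. It assumes the CLI is installed as `codeql`. The database directory `my-db`, the source directory `src`, the query path `queries/my-query.ql`, and the output file `results.sarif` are hypothetical placeholders. In this sketch, `codeql database analyze` covers both running the queries and interpreting the results.

```python
from typing import List


def create_database_cmd(db_dir: str, language: str, source_root: str) -> List[str]:
    """Step 1: command line that extracts the code into a CodeQL database."""
    return [
        "codeql", "database", "create", db_dir,
        f"--language={language}",
        f"--source-root={source_root}",
    ]


def analyze_database_cmd(db_dir: str, queries: str, output: str) -> List[str]:
    """Steps 2 and 3: run queries against the database and write interpreted SARIF results."""
    return [
        "codeql", "database", "analyze", db_dir, queries,
        "--format=sarif-latest",
        f"--output={output}",
    ]


if __name__ == "__main__":
    # All paths below are hypothetical placeholders; nothing is executed.
    for cmd in (
        create_database_cmd("my-db", "java", "src"),
        analyze_database_cmd("my-db", "queries/my-query.ql", "results.sarif"),
    ):
        print(" ".join(cmd))
```

Building the argument lists separately from running them keeps the sketch safe to try on a machine without the CLI installed. To actually run a command, the list could be passed to `subprocess.run`.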

For information on the CodeQL toolchain and on running CodeQL to analyze a codebase, see the CodeQL CLI, CodeQL for Visual Studio Code, and “About code scanning with CodeQL” in the GitHub docs.

Database creation

To create a database, CodeQL first extracts a single relational representation of each source file in the codebase.

For compiled languages, extraction works by monitoring the normal build process. Each time a compiler is invoked to process a source file, a copy of that file is made, and all relevant information about the source code is collected. This includes syntactic data about the abstract syntax tree and semantic data about name binding and type information.

For interpreted languages, the extractor runs directly on the source code, resolving dependencies to give an accurate representation of the codebase.

There is one extractor for each language supported by CodeQL to ensure that the extraction process is as accurate as possible. For multi-language codebases, databases are generated one language at a time.

After extraction, all the data required for analysis (relational data, copied source files, and a language-specific database schema, which specifies the mutual relations in the data) is imported into a single directory, known as a CodeQL database.
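
To make the idea of a "relational representation" more concrete, here is a toy sketch in Python. It is not the CodeQL extractor: it uses Python's standard `ast` module to parse Python source and flattens the syntax tree into two hypothetical tables, `expressions` and `statements`, where each row is `(id, kind, line)`. The table names and row layout are assumptions made for illustration only.

```python
import ast
from typing import Dict, List, Tuple

Row = Tuple[int, str, int]  # (id, kind, line): a hypothetical row layout


def extract(source: str) -> Dict[str, List[Row]]:
    """Flatten a parsed source file into two toy relations."""
    tables: Dict[str, List[Row]] = {"expressions": [], "statements": []}
    next_id = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.stmt):
            table = "statements"
        elif isinstance(node, ast.expr):
            table = "expressions"
        else:
            continue  # skip the module node, operators, and other helper nodes
        tables[table].append((next_id, type(node).__name__, node.lineno))
        next_id += 1
    return tables


if __name__ == "__main__":
    for name, rows in extract("x = 1\nprint(x)\n").items():
        print(name, rows)
```

A real extractor records far more (name binding, types, locations with columns, and so on), but the basic shape is the same: tree-structured code becomes rows in tables that a query can join and filter.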

Query execution

After you’ve created a CodeQL database, one or more queries are executed against it. CodeQL queries are written in a specially designed object-oriented query language called QL. You can run the queries checked out from the CodeQL repo (or custom queries that you’ve written yourself) using the CodeQL for VS Code extension or the CodeQL CLI. For more information about queries, see “About CodeQL queries.”

Query results

The final step converts results produced during query execution into a form that is more meaningful in the context of the source code. That is, the results are interpreted in a way that highlights the potential issue that the queries are designed to find.

Queries contain metadata properties that indicate how the results should be interpreted. For instance, some queries display a simple message at a single location in the code. Others display a series of locations that represent steps along a data-flow or control-flow path, along with a message explaining the significance of the result. Queries that don’t have metadata are not interpreted; their results are output as a table and not displayed in the source code.
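
The three cases above can be sketched as a small dispatch on the query's metadata. This Python sketch is a simplified model, not how CodeQL implements interpretation. The `kind` values mirror the `@kind` query metadata property (`problem` and `path-problem`), while the row layout (`location`, `path`, `message`) is a hypothetical shape chosen for illustration.

```python
from typing import Dict, List


def interpret(metadata: Dict[str, str], rows: List[dict]) -> List[str]:
    """Render raw result rows according to the query's (toy) metadata."""
    kind = metadata.get("kind")
    rendered = []
    for row in rows:
        if kind == "problem":
            # A simple message at a single location.
            file, line = row["location"]
            rendered.append(f"{file}:{line}: {row['message']}")
        elif kind == "path-problem":
            # A series of locations forming a path, plus a message.
            steps = " -> ".join(f"{file}:{line}" for file, line in row["path"])
            rendered.append(f"{steps}: {row['message']}")
        else:
            # No metadata: the row is output as a plain table row.
            rendered.append(" | ".join(str(value) for value in row.values()))
    return rendered


if __name__ == "__main__":
    alert = {"location": ("A.java", 3), "message": "Unused variable"}
    print(interpret({"kind": "problem"}, [alert]))
```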

Following interpretation, results are output for code review and triaging. In CodeQL for Visual Studio Code, interpreted query results are automatically displayed in the source code. Results generated by the CodeQL CLI can be output into a number of different formats for use with different tools.

About CodeQL databases

CodeQL databases contain queryable data extracted from a codebase, for a single language at a particular point in time. The database contains a full, hierarchical representation of the code, including a representation of the abstract syntax tree, the data flow graph, and the control flow graph.

Each language has its own unique database schema that defines the relations used to create a database. The schema provides an interface between the initial lexical analysis during the extraction process, and the actual complex analysis using CodeQL. The schema specifies, for instance, that there is a table for every language construct.

For each language, the CodeQL libraries define classes to provide a layer of abstraction over the database tables. This provides an object-oriented view of the data, which makes it easier to write queries.

For example, in a CodeQL database for a Java program, two key tables are:

- The expressions table, containing a row for every single expression in the source code that was analyzed during the build process.
- The statements table, containing a row for every single statement in the source code that was analyzed during the build process.

The CodeQL library defines classes to provide a layer of abstraction over each of these tables (and the related auxiliary tables): Expr and Stmt.
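
The relationship between tables and classes can be modelled in a few lines of Python. This is only an analogy, written in Python rather than QL: the table contents, the `(id, kind, line)` row layout, the `all` method, and the `method_calls_on_line` helper are all hypothetical. Only the class names `Expr` and `Stmt` come from the CodeQL library for Java.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple

# Hypothetical toy tables with rows of (id, kind, line).
# Real CodeQL database schemas are far richer than this.
EXPRESSIONS: List[Tuple[int, str, int]] = [(1, "MethodCall", 4), (2, "StringLiteral", 4)]
STATEMENTS: List[Tuple[int, str, int]] = [(10, "ExprStmt", 4), (11, "ReturnStmt", 5)]


@dataclass(frozen=True)
class Expr:
    """Object-oriented view over one row of the expressions table."""
    id: int
    kind: str
    line: int

    @classmethod
    def all(cls) -> Iterator["Expr"]:
        return (cls(*row) for row in EXPRESSIONS)


@dataclass(frozen=True)
class Stmt:
    """Object-oriented view over one row of the statements table."""
    id: int
    kind: str
    line: int

    @classmethod
    def all(cls) -> Iterator["Stmt"]:
        return (cls(*row) for row in STATEMENTS)


def method_calls_on_line(line: int) -> List[Expr]:
    """A query phrased against the classes instead of the raw tuples."""
    return [e for e in Expr.all() if e.kind == "MethodCall" and e.line == line]


if __name__ == "__main__":
    print(method_calls_on_line(4))
```

The point of the class layer is that a query author works with named attributes such as `e.kind` rather than column positions in a tuple.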
