Creating path queries ¶

You can create path queries to visualize the flow of information through a codebase.

Note
The modular API for data flow described here is available from CodeQL 2.13.0. The legacy library is deprecated and will be removed in December 2024. For information about how the library has changed and how to migrate any existing queries to the modular API, see New dataflow API for CodeQL query writing.

Overview¶

Security researchers are particularly interested in the way that information flows in a program. Many vulnerabilities are caused by seemingly benign data flowing to unexpected locations, and being used in a malicious way. Path queries written with CodeQL are particularly useful for analyzing data flow as they can be used to track the path taken by a variable from its possible starting points (source) to its possible end points (sink). To model paths, your query must provide information about the source and the sink, as well as the data flow steps that link them.

This topic provides information on how to structure a path query file so you can explore the paths associated with the results of data flow analysis.

Note
The alerts generated by path queries are included in the results generated using the CodeQL CLI and in code scanning. You can also view the path explanations generated by your path query in the CodeQL extension for VS Code.

To learn more about modeling data flow with CodeQL, see “About data flow analysis.” For more language-specific information on analyzing data flow, see:

“Analyzing data flow in C/C++”
“Analyzing data flow in C#”
“Analyzing data flow in Go”
“Analyzing data flow in Java/Kotlin”
“Analyzing data flow in JavaScript/TypeScript”
“Analyzing data flow in Python”
“Analyzing data flow in Ruby”
“Analyzing data flow in Swift”

Path query examples¶

The easiest way to get started writing your own path query is to modify one of the existing queries. For more information, see the CodeQL query help.

The Security Lab researchers have used path queries to find security vulnerabilities in various open source projects. For articles describing how these queries were written, and for posts on other aspects of security research such as exploiting vulnerabilities, see the GitHub Security Lab website.

Constructing a path query¶

Path queries require certain metadata, query predicates, and select statement structures. Many of the built-in path queries included in CodeQL follow a simple structure, which depends on how the language you are analyzing is modeled with CodeQL.

You should use the following template:

    /**
     * ...
     * @kind path-problem
     * ...
     */

    import <language>
    // For some languages (Java/C++/Python/Swift) you need to explicitly import the data flow library, such as
    // import semmle.code.java.dataflow.DataFlow or import codeql.swift.dataflow.DataFlow
    ...

    module Flow = DataFlow::Global<MyConfiguration>;
    import Flow::PathGraph

    from Flow::PathNode source, Flow::PathNode sink
    where Flow::flowPath(source, sink)
    select sink.getNode(), source, sink, "<message>"

Where:

MyConfiguration is a module containing the predicates that define how data may flow between the source and the sink.
Flow is the result of the data flow computation based on MyConfiguration.
Flow::PathGraph is the resulting data flow graph module you need to import in order to include path explanations in the query.
source and sink are nodes in the graph as defined in the configuration, and Flow::PathNode is their type.
DataFlow::Global<..> is an invocation of data flow. TaintTracking::Global<..> can be used instead to include a default set of additional taint steps.
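
As an illustration, the template might be filled in as follows for Java. This is a minimal sketch under stated assumptions, not a query shipped with CodeQL: the query name, the `@id`, and the choice of `Runtime.exec` as the sink are made up for the example, and `MethodCall` is the class name used by recent versions of the Java library (older versions call it `MethodAccess`). It uses `TaintTracking::Global<..>` in place of `DataFlow::Global<..>` so that the default taint steps are included.

```ql
/**
 * @name Example: remote input reaching Runtime.exec (illustrative only)
 * @kind path-problem
 * @problem.severity warning
 * @id java/example/remote-input-to-exec
 */

import java
import semmle.code.java.dataflow.DataFlow
import semmle.code.java.dataflow.TaintTracking
import semmle.code.java.dataflow.FlowSources

module MyConfiguration implements DataFlow::ConfigSig {
  // Sources: any node the standard library models as remote user input.
  predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }

  // Sinks (assumption for this sketch): the first argument of Runtime.exec.
  predicate isSink(DataFlow::Node sink) {
    exists(MethodCall call |
      call.getMethod().hasQualifiedName("java.lang", "Runtime", "exec") and
      sink.asExpr() = call.getArgument(0)
    )
  }
}

module Flow = TaintTracking::Global<MyConfiguration>;

import Flow::PathGraph

from Flow::PathNode source, Flow::PathNode sink
where Flow::flowPath(source, sink)
select sink.getNode(), source, sink, "Remote input flows to this command."
```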

The following sections describe the main requirements for a valid path query.

Path query metadata¶

Path query metadata must contain the property @kind path-problem. This ensures that query results are interpreted and displayed correctly. The other metadata requirements depend on how you intend to run the query. For more information, see “Metadata for CodeQL queries.”
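
For instance, a metadata header for a path query could look like the sketch below. Only `@kind path-problem` is required by this topic; the name, description, identifier, and the values of the other properties are hypothetical and should be chosen to suit how you run the query.

```ql
/**
 * @name Command built from user input (hypothetical example)
 * @description Illustrative header only; replace every value with your own.
 * @kind path-problem
 * @problem.severity error
 * @precision medium
 * @id java/example/command-from-user-input
 * @tags security
 */
```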

Generating path explanations¶

In order to generate path explanations, your query needs to compute a graph. To do this you need to define a query predicate called edges in your query. This predicate defines the edge relations of the graph you are computing, and it is used to compute the paths related to each result that your query generates. You can import a predefined edges predicate from a path graph module in one of the standard data flow libraries. In addition to the path graph module, the data flow libraries contain the other classes, predicates, and modules that are commonly used in data flow analysis.

import MyFlow::PathGraph

This statement imports the PathGraph module from the data flow library (DataFlow.qll), in which edges is defined.

You can also import libraries specifically designed to implement data flow analysis in various common frameworks and environments, and many additional libraries are included with CodeQL. To see examples of the different libraries used in data flow analysis, see the links to the built-in queries above or browse the standard libraries.

For all languages, you can also optionally define a nodes query predicate, which specifies the nodes of the path graph that you are interested in. If nodes is defined, only edges with endpoints defined by these nodes are selected. If nodes is not defined, you select all possible endpoints of edges.

Defining your own `edges` predicate¶

You can also define your own edges predicate in the body of your query. It should take the following form:

    query predicate edges(PathNode a, PathNode b) {
      /* Logical conditions which hold if `(a,b)` is an edge in the data flow graph */
    }

For more examples of how to define an edges predicate, visit the standard CodeQL libraries and search for edges.
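
A hand-written edges predicate does not have to describe data flow. The sketch below, written for Java, treats the call graph as the path graph so that each result is explained by a chain of calls. The query name, the `@id`, and the method names `main` and `exit` are assumptions chosen for illustration, and the nodes are `Callable` values rather than data flow `PathNode`s.

```ql
/**
 * @name Example: call path from main to an exit method (illustrative only)
 * @kind path-problem
 * @problem.severity recommendation
 * @id java/example/call-path
 */

import java

// An edge (a, b) holds when callable `a` may call callable `b`,
// taking virtual dispatch into account.
query predicate edges(Callable a, Callable b) { a.polyCalls(b) }

from Callable source, Callable sink
where
  source.hasName("main") and
  sink.hasName("exit") and
  // `edges+` is the transitive closure: one or more call edges from source to sink.
  edges+(source, sink)
select source, source, sink, "This entry point can reach $@.", sink, sink.getName()
```

Because the types in the from clause match the types used by edges, the paths shown for each result are the call chains that connect the two methods.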

Declaring sources and sinks¶

You must provide information about the source and sink in your path query. These are objects that correspond to the nodes of the paths that you are exploring. The name and the type of the source and the sink must be declared in the from statement of the query, and the types must be compatible with the nodes of the graph computed by the edges predicate.

If you are querying C/C++, C#, Go, Java/Kotlin, JavaScript/TypeScript, Python, or Ruby code (and you have used import MyFlow::PathGraph in your query), the definitions of the source and sink are accessed via the module resulting from the application of the Global<..> module in the data flow library. You should declare both of these objects in the from statement. For example:

    module MyFlow = DataFlow::Global<MyConfiguration>;

    from MyFlow::PathNode source, MyFlow::PathNode sink

The configuration module must be defined to include definitions of sources and sinks. For example:

    module MyConfiguration implements DataFlow::ConfigSig {
      predicate isSource(DataFlow::Node source) { ... }
      predicate isSink(DataFlow::Node sink) { ... }
    }

isSource() defines where data may flow from.
isSink() defines where data may flow to.
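
A fuller configuration module for Java might look like the following sketch. Besides isSource and isSink, it fills in two optional predicates of `DataFlow::ConfigSig`: isBarrier, which stops flow at a node, and isAdditionalFlowStep, which adds a step the library does not model. The module name, the choice of `System.getenv` as a source and `Runtime.exec` as a sink, and in particular the method name `wrap` are assumptions made for illustration; `MethodCall` is named `MethodAccess` in older versions of the Java library.

```ql
import java
import semmle.code.java.dataflow.DataFlow

module EnvToExecConfig implements DataFlow::ConfigSig {
  // Where data may flow from: the result of a call to System.getenv.
  predicate isSource(DataFlow::Node source) {
    exists(MethodCall call |
      call.getMethod().hasQualifiedName("java.lang", "System", "getenv") and
      source.asExpr() = call
    )
  }

  // Where data may flow to: the first argument of Runtime.exec.
  predicate isSink(DataFlow::Node sink) {
    exists(MethodCall call |
      call.getMethod().hasQualifiedName("java.lang", "Runtime", "exec") and
      sink.asExpr() = call.getArgument(0)
    )
  }

  // Optional: do not track values of primitive or boxed types.
  predicate isBarrier(DataFlow::Node node) {
    node.getType() instanceof PrimitiveType or
    node.getType() instanceof BoxedType
  }

  // Optional: treat a call to a hypothetical method `wrap` as passing
  // its first argument through to its result.
  predicate isAdditionalFlowStep(DataFlow::Node node1, DataFlow::Node node2) {
    exists(MethodCall call |
      call.getMethod().hasName("wrap") and
      node1.asExpr() = call.getArgument(0) and
      node2.asExpr() = call
    )
  }
}
```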

For more information on using the configuration module in your analysis, see the sections on global data flow in “Analyzing data flow in C/C++,” “Analyzing data flow in C#,” and “Analyzing data flow in Python.”

You can also create a configuration for different frameworks and environments by extending the Configuration class. For more information, see “Types” in the QL language reference.

Defining flow conditions¶

The where clause defines the logical conditions to apply to the variables declared in the from clause to generate your results. This clause can use aggregations, predicates, and logical formulas to limit the variables of interest to a smaller set which meet the defined conditions.

When writing a path query, you would typically include a predicate that holds only if data flows from the source to the sink.

You can use the flowPath predicate to specify flow from the source to the sink for a given Configuration:

where MyFlow::flowPath(source, sink)

Select clause¶

Select clauses for path queries consist of four ‘columns’, with the following structure:

select element, source, sink, string

The element and string columns represent the location of the alert and the alert message respectively, as explained in “About CodeQL queries.” The second and third columns, source and sink, are nodes on the path graph selected by the query. Each result generated by your query is displayed at a single location in the same way as an alert query. Additionally, each result also has an associated path, which can be viewed in the CodeQL extension for VS Code.

The element that you select in the first column depends on the purpose of the query and the type of issue that it is designed to find. This is particularly important for security issues. For example, if you believe the source value to be globally invalid or malicious it may be best to display the alert at the source. In contrast, you should consider displaying the alert at the sink if you believe it is the element that requires sanitization.

The alert message defined in the final column in the select statement can be developed to give more detail about the alert or path found by the query using links and placeholders. For more information, see “Defining the results of a query.”
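
As a sketch of a message with a placeholder, the select clause below is a fragment that assumes the from and where clauses shown earlier in this topic, with source and sink declared as `MyFlow::PathNode`. Each `$@` in the message is followed by two extra columns: the element to link to and the text to display for the link. The message wording is hypothetical.

```ql
// Alert at the sink, with a link back to the source in the message.
select sink.getNode(), source, sink, "This command depends on a $@.", source.getNode(),
  "user-provided value"
```

To display the alert at the source instead, select `source.getNode()` in the first column and link to `sink.getNode()` in the message.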
