How GitHub uses CodeQL to secure GitHub

How GitHub’s Product Security Engineering team manages our CodeQL implementation at scale and how you can, too.

February 12, 2025


GitHub’s Product Security Engineering team writes code and implements tools that help secure the code that powers GitHub. We use GitHub Advanced Security (GHAS) to discover, track, and remediate vulnerabilities and to enforce secure coding standards at scale. One tool we rely on heavily to analyze our code is CodeQL.

CodeQL is GitHub’s static analysis engine that powers automated security analyses. You can use it to query code in much the same way you would query a database. It provides a much more robust way to analyze code and uncover problems than an old-fashioned text search through a codebase.

The following post will detail how we use CodeQL to keep GitHub secure and how you can apply these lessons to your own organization. You will learn why and how we use:

Custom query packs (and how we create and manage them).
Custom queries.
Variant analysis to uncover potentially insecure programming practices.

Enabling CodeQL at scale

We employ CodeQL in a variety of ways at GitHub.

Default setup with the default and security-extended query suites
Default setup with the default and security-extended query suites meets the needs of the vast majority of our over 10,000 repositories. With these settings, pull requests automatically get a security review from CodeQL.
Advanced setup with a custom query pack
A few repositories, like our large Ruby monolith, need extra special attention, so we use advanced setup with a query pack containing custom queries tailored to our needs.
Multi-repository variant analysis (MRVA)
To conduct variant analysis and quick auditing, we use MRVA. We also write custom CodeQL queries to detect code patterns that are either specific to GitHub’s codebases or patterns we want a security engineer to manually review.

The specific custom Actions workflow step we use on our monolith is pretty simple. It looks like this:

```yaml
- name: Initialize CodeQL
  uses: github/codeql-action/init@v3
  with:
    languages: ${{ matrix.language }}
    config-file: ./.github/codeql/${{ matrix.language }}/codeql-config.yml
```

Our Ruby configuration is pretty standard, but advanced setup offers a variety of configuration options through custom configuration files, such as the one referenced by `config-file` above. The interesting part is the `packs` option, which is how we enable our custom query pack as part of the CodeQL analysis. This pack contains a collection of CodeQL queries we have written for Ruby, specifically for the GitHub codebase.
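For illustration, the configuration file that the workflow step points to could carry the `packs` option as shown below. To keep every example in this post in one language, this sketch builds the file's contents as a Ruby hash and serializes it with the standard-library `yaml` module. The `config_path` and `codeql_config` helpers are hypothetical, and a real configuration may set other options as well.

```ruby
require "yaml"

# Hypothetical helper: mirrors the workflow's config-file input, with
# `language` standing in for ${{ matrix.language }}.
def config_path(language)
  "./.github/codeql/#{language}/codeql-config.yml"
end

# Hypothetical helper: produces YAML whose `packs` key lists the custom
# query pack(s) CodeQL should download and run alongside its own queries.
def codeql_config(packs)
  { "packs" => packs }.to_yaml
end

if __FILE__ == $PROGRAM_NAME
  puts config_path("ruby")
  puts codeql_config(["github/internal-ruby-codeql"])
end
```

With the pack name from the `qlpack.yml` shown later in this post, the output is a top-level `packs:` key followed by a one-item list naming `github/internal-ruby-codeql`.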

So, let’s dive deeper into why we did that—and how!

Publishing our CodeQL query pack

Initially, we published CodeQL query files directly to the GitHub monolith repository, but we moved away from this approach for several reasons:

It required going through the production deployment process for each new or updated query.
Queries not included in a query pack were not pre-compiled, which slowed down CodeQL analysis in CI.
Our test suite for CodeQL queries ran as part of the monolith’s CI jobs. When a new version of the CodeQL CLI was released, it sometimes caused the query tests to fail because of changes in the query output, even when there were no changes to the code in the pull request. This often led to confusion and frustration among engineers, as the failure wasn’t related to their pull request changes.

By switching to publishing a query pack to GitHub Container Registry (GCR), we’ve simplified our process and eliminated many of these pain points, making it easier to ship and maintain our CodeQL queries. So while it’s possible to deploy custom CodeQL query files directly to a repository, we recommend publishing CodeQL queries as a query pack to GCR for easier deployment and faster iteration.

Creating our query pack

When setting up our custom query pack, we faced several considerations, particularly around managing dependencies like the `ruby-all` package.

To keep our custom queries concise and maintainable, we extend classes from CodeQL’s standard Ruby library, the `codeql/ruby-all` pack. This lets us build on existing functionality rather than reinventing the wheel. However, changes to the CodeQL library API can introduce breaking changes, potentially deprecating APIs our queries depend on or causing compilation errors. Since CodeQL runs as part of our CI, we wanted to minimize the chance of this happening, as it can lead to frustration and loss of trust from developers.

We develop our queries against the latest version of the `ruby-all` package, ensuring we’re always working with the most up-to-date functionality. To mitigate the risk of breaking changes affecting CI, we pin the `ruby-all` version when we’re ready to release, locking it in the `codeql-pack.lock.yml` file. This ensures that when our queries are deployed, they run with the specific version of `ruby-all` we’ve tested, avoiding issues from unintentional updates.

Here’s how we manage this setup:

In our `qlpack.yml`, we set the dependency to use the latest version of `ruby-all`.

During development, this configuration pulls in the latest version of `ruby-all` when running `codeql pack init`, ensuring we’re always up to date.

```yaml
# Our custom query pack's qlpack.yml
library: false
name: github/internal-ruby-codeql
version: 0.2.3
extractor: 'ruby'
dependencies:
  codeql/ruby-all: "*"
tests: 'test'
description: "Ruby CodeQL queries used internally at GitHub"
```

Before releasing, we lock the version in the `codeql-pack.lock.yml` file, specifying the exact version to ensure stability and prevent issues in CI.
```yaml
# Our custom query pack's codeql-pack.lock.yml
lockVersion: 1.0.0
dependencies:
  ...
  codeql/ruby-all:
    version: 1.0.6
```

This approach lets us develop against the latest features of the `ruby-all` package while keeping our releases stable.
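The trade-off can be pictured with a toy resolver. This is not how the CodeQL CLI resolves packs internally; it only models the rule described above: a `*` constraint picks the newest release during development, while an entry in `codeql-pack.lock.yml` wins at release time. Apart from 1.0.6, the version list is hypothetical.

```ruby
require "rubygems" # provides Gem::Version for version comparison

# Hypothetical published versions of codeql/ruby-all (1.0.6 is the locked one).
AVAILABLE_RUBY_ALL = %w[1.0.4 1.0.6 1.0.7 2.0.0].freeze

# Toy model: a locked version always wins; otherwise "*" accepts every
# published version and the newest one is chosen.
def resolve(available, constraint:, locked: nil)
  if locked
    raise ArgumentError, "#{locked} is not published" unless available.include?(locked)
    return locked
  end

  matching = available.select do |v|
    constraint == "*" || Gem::Version.new(v) == Gem::Version.new(constraint)
  end
  matching.max_by { |v| Gem::Version.new(v) }
end

if __FILE__ == $PROGRAM_NAME
  puts resolve(AVAILABLE_RUBY_ALL, constraint: "*")                  # development
  puts resolve(AVAILABLE_RUBY_ALL, constraint: "*", locked: "1.0.6") # release
end
```

The second call shows the point of the lock: a newer (hypothetical) 2.0.0 with breaking API changes would not reach CI until someone deliberately refreshes the lock file and reruns the unit tests.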

We also have a set of CodeQL unit tests that exercise our queries against sample code snippets, which helps us quickly determine if any query will cause errors before we publish our pack. These tests are run as part of the CI process in our query pack repository, providing an early check for issues. We strongly recommend writing unit tests for your custom CodeQL queries to ensure stability and reliability.

Altogether, the basic flow for releasing new CodeQL queries via our pack is as follows:

Open a pull request with the new query.
Write unit tests for the new query.
Merge the pull request.
Increment the pack version in a new pull request.
Run `codeql pack init` to resolve dependencies.
Correct unit tests as needed.
Publish the query pack to the GitHub Container Registry (GCR).
Repositories with the query pack in their config will start using the updated queries.
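Steps 4 through 7 lend themselves to a small script. The sketch below is hypothetical and is not GitHub’s tooling: it bumps the patch version in `qlpack.yml` and lists the CLI commands for the remaining steps. For the dependency step it uses `codeql pack upgrade`, the CLI command that rewrites the lock file with the newest versions `qlpack.yml` allows; swap in whatever command your own process uses. Publishing assumes registry credentials are already configured.

```ruby
# Hypothetical release helper for a CodeQL query pack (not GitHub's tooling).

# Step 4: increment the patch component of the `version:` line, leaving the
# rest of qlpack.yml (comments, key order) untouched.
def bump_patch(qlpack_text)
  qlpack_text.sub(/^version:[ \t]*(\d+)\.(\d+)\.(\d+)/) do
    "version: #{$1}.#{$2}.#{$3.to_i + 1}"
  end
end

# Steps 5 to 7 as CLI invocations, run from the pack's root directory.
RELEASE_COMMANDS = [
  %w[codeql pack upgrade],  # re-resolve dependencies and rewrite the lock file
  %w[codeql test run test], # run the unit tests in the directory named by `tests:`
  %w[codeql pack publish]   # push the new version to the container registry
].freeze

# Dry run by default (prints the commands). With run: true, it bumps the
# version and executes each command, raising on the first failure.
def release(pack_dir, run: false)
  qlpack = File.join(pack_dir, "qlpack.yml")
  File.write(qlpack, bump_patch(File.read(qlpack))) if run
  RELEASE_COMMANDS.each do |cmd|
    if run
      system(*cmd, chdir: pack_dir, exception: true)
    else
      puts cmd.join(" ")
    end
  end
end

release(Dir.pwd) if __FILE__ == $PROGRAM_NAME
```

Editing the `version:` line with a regular expression instead of round-tripping the whole file through a YAML parser keeps the file’s comments and formatting intact.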

We have found that this flow balances a good development experience for our team with stability in our published query pack.

Configuring our repository to use our custom query pack

We won’t provide a general recommendation on configuration here, given that it ultimately depends on how your organization deploys code. We opted against locking our pack to a particular version in our CodeQL configuration file (see above). Instead, we manage versioning by publishing the CodeQL package to GCR, so the GitHub monolith always retrieves the latest published version of the query pack. To roll back changes, we simply publish a new version of the package. In one instance, we released a query that had a high number of false positives, and we were able to publish a new version of the pack that removed that query in less than 15 minutes. Merging a pull request on the monolith repository to roll back the version in the CodeQL configuration file would have taken longer.

One of the problems we encountered with publishing the query pack in GCR was how to easily make the package available to multiple repositories within our enterprise. There are several approaches we explored.

Grant access permissions for individual repositories. On the package management page, you can grant individual repositories permission to access your package. This was not a good solution for us: we have too many repositories to do this manually, and there is currently no API for configuring it programmatically.
Mint a personal access token for the CodeQL action runner. We could have minted a personal access token (PAT) that has access to read all packages for our organization and added that to the CodeQL action runner. However, this would have required managing a new token, and it seemed a bit more permissive than we wanted because it could read all of our private packages rather than only the ones we explicitly allow it to access.
Provide access permissions via a linked repository. We ended up implementing the third solution that we explored. We link a repository to the package and allow the package to inherit access permissions from the linked repository.

CodeQL query pack queries

We write a variety of custom queries to be used in our custom query packs. These cover GitHub-specific patterns that aren’t included in the default CodeQL query pack. This allows us to tailor the analysis to patterns and preferences that are specific to our company and codebase. Some of the types of things we alert on using our custom query pack include:

High-risk APIs specific to GitHub’s code that can be dangerous if they receive unsanitized user input.
Use of specific built-in Rails methods for which we have safer, custom methods or functions.
Required authorization methods not being used in our REST API endpoint definitions and GraphQL object/mutation definitions.
REST API endpoints and GraphQL mutations that require engineers to define access control methods to determine which actors can access them. (Specifically, the query detects the absence of this method definition to ensure that the actors’ permissions are being checked for these endpoints.)
Use of signed tokens so we can nudge engineers to include Product Security as a reviewer when using them.

Custom queries can also serve an educational purpose rather than blocking code from shipping. For example, we want to alert engineers when they use the `ActiveRecord::decrypt` method. This method should generally not be used in production code, as it will cause an encrypted column to become decrypted. We use the `recommendation` severity in the query metadata so these alerts are treated as informational. That means this may trigger an alert in a pull request, but it won’t cause the CodeQL CI job to fail. We use this lower severity level to allow engineers to assess the impact of new queries without immediate blocking. Additionally, this alert level isn’t tracked through our Fundamentals program, meaning it doesn’t require immediate action, reflecting the query’s maturity as we continue to refine its relevance and risk assessment.

```ql
/**
 * @id rb/github/use-of-activerecord-decrypt
 * @description Do not use the .decrypt method on AR models, this will decrypt all encrypted attributes and save
 * them unencrypted, effectively undoing encryption and possibly making the attributes inaccessible.
 * If you need to access the unencrypted value of any attribute, you can do so by calling my_model.attribute_name.
 * @kind problem
 * @problem.severity recommendation
 * @name Use of ActiveRecord decrypt method
 * @tags security
 *       github-internal
 */

import ruby
import DataFlow
import codeql.ruby.DataFlow
import codeql.ruby.frameworks.ActiveRecord

/** Match against .decrypt method calls where the receiver may be an ActiveRecord object */
class ActiveRecordDecryptMethodCall extends ActiveRecordInstanceMethodCall {
  ActiveRecordDecryptMethodCall() { this.getMethodName() = "decrypt" }
}

from ActiveRecordDecryptMethodCall call
select call,
  "Do not use the .decrypt method on AR models, this will decrypt all encrypted attributes and save them unencrypted."
```
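To see why `decrypt` is destructive while reading an attribute is not, consider a toy model. It is not Rails: `ToyRecord` is hypothetical, `String#reverse` stands in for encryption, and the `stored` hash plays the role of the database row.

```ruby
# Toy stand-in for a model with one encrypted attribute (not ActiveRecord).
class ToyRecord
  attr_reader :stored # what is "at rest" in the database row

  def initialize(email)
    @stored = { "email" => encrypt_value(email) }
  end

  # Like calling my_model.attribute_name: decrypts in memory only.
  def email
    decrypt_value(@stored["email"])
  end

  # Like the call the query flags: decrypt every encrypted attribute and
  # persist the plaintext, undoing encryption at rest.
  def decrypt
    @stored.transform_values! { |v| decrypt_value(v) }
    self
  end

  private

  # Placeholder "cipher" for this sketch only; never use it for real data.
  def encrypt_value(value)
    value.reverse
  end

  def decrypt_value(value)
    value.reverse
  end
end

record = ToyRecord.new("octocat@example.com")
puts record.email           # plaintext, while storage is still encrypted
record.decrypt
puts record.stored["email"] # plaintext is now what is stored
```

In this toy, reading `email` after `decrypt` returns a reversed, garbled string, which loosely echoes the query description’s warning that attributes may become inaccessible.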

Another educational query is the one mentioned above in which we detect the absence of the `control_access` method in a class that defines a REST API endpoint. If a pull request introduces a new endpoint without `control_access`, a comment will appear on the pull request saying that the `control_access` method wasn’t found and it’s a requirement for REST API endpoints. This will notify the reviewer of a potential issue and prompt the developer to fix it.

```ql
/**
 * @id rb/github/api-control-access
 * @name REST API Without 'control_access'
 * @description All REST API endpoints must call the 'control_access' method, to ensure that only specified actor types are able to access the given endpoint.
 * @kind problem
 * @tags security
 *       github-internal
 * @precision high
 * @problem.severity recommendation
 */

import codeql.ruby.AST
import codeql.ruby.DataFlow
import codeql.ruby.TaintTracking
import codeql.ruby.ApiGraphs

// Api::App REST API endpoints should generally call the control_access method
private DataFlow::ModuleNode appModule() {
  result = API::getTopLevelMember("Api").getMember("App").getADescendentModule() and
  not result = protectedApiModule() and
  not result = staffAppApiModule()
}

// Api::Admin, Api::Staff, Api::Internal, and Api::ThirdParty REST API endpoints do not need to call the control_access method
private DataFlow::ModuleNode protectedApiModule() {
  result =
    API::getTopLevelMember(["Api"])
        .getMember(["Admin", "Staff", "Internal", "ThirdParty"])
        .getADescendentModule()
}

// Api::Staff::App REST API endpoints do not need to call the control_access method
private DataFlow::ModuleNode staffAppApiModule() {
  result =
    API::getTopLevelMember(["Api"]).getMember("Staff").getMember("App").getADescendentModule()
}

private class ApiRouteWithoutControlAccess extends DataFlow::CallNode {
  ApiRouteWithoutControlAccess() {
    this = appModule().getAModuleLevelCall(["get", "post", "delete", "patch", "put"]) and
    not performsAccessControl(this.getBlock())
  }
}

predicate performsAccessControl(DataFlow::BlockNode blocknode) {
  accessControlCalled(blocknode.asExpr().getExpr())
}

predicate accessControlCalled(Block block) {
  // the method `control_access` is called somewhere inside `block`
  block.getAStmt().getAChild*().(MethodCall).getMethodName() = "control_access"
}

from ApiRouteWithoutControlAccess api
select api.getLocation(),
  "The control_access method was not detected in this REST API endpoint. All REST API endpoints must call this method to ensure that the endpoint is only accessible to the specified actor types."
```
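Outside CodeQL, the heart of this check (a route block that never calls `control_access`) can be approximated with Ruby’s standard-library `Ripper` parser. This is a rough sketch rather than a replacement for the query: it is purely syntactic, it ignores the `Api::Admin`, `Api::Staff`, `Api::Internal`, and `Api::ThirdParty` exclusions, and the sample endpoint definitions are hypothetical.

```ruby
require "ripper"

# HTTP verbs treated as route definitions, matching the query's list.
ROUTE_VERBS = %w[get post delete patch put].freeze

# Yields every S-expression node (an Array whose first element is a Symbol).
def each_node(node, &block)
  return unless node.is_a?(Array)
  yield node if node.first.is_a?(Symbol)
  node.each { |child| each_node(child, &block) }
end

# Returns the identifier node of a receiver-less verb call such as
# `get "/path"` or `get("/path")`, or nil for anything else.
def route_ident(call)
  ident = nil
  ident = call[1] if call.first == :command
  ident = call[1][1] if call.first == :method_add_arg && call[1].first == :fcall
  return nil unless ident.is_a?(Array) && ident.first == :@ident
  ROUTE_VERBS.include?(ident[1]) ? ident : nil
end

# True if an identifier named control_access appears anywhere in the block.
def calls_control_access?(block)
  each_node(block) do |node|
    return true if node.first == :@ident && node[1] == "control_access"
  end
  false
end

# Returns [verb, line] pairs for route blocks that never call control_access.
def routes_without_control_access(source)
  findings = []
  each_node(Ripper.sexp(source)) do |node|
    next unless node.first == :method_add_block
    ident = route_ident(node[1])
    next if ident.nil? || calls_control_access?(node[2])
    findings << [ident[1], ident[2][0]]
  end
  findings
end

# Hypothetical endpoint definitions in the style the query targets.
SAMPLE = <<~RUBY
  module Api
    module App
      class Repos
        get "/repos/:id" do
          control_access :read_repo
          "ok"
        end

        post "/repos" do
          "created"
        end
      end
    end
  end
RUBY

p routes_without_control_access(SAMPLE) if __FILE__ == $PROGRAM_NAME
```

Only the `post` route is reported, because its block never mentions `control_access`. The CodeQL version adds what this sketch lacks: module resolution through API graphs, so only `Api::App` descendants are checked.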

Variant analysis

Variant analysis (VA) refers to the process of searching for variants of security vulnerabilities. This is particularly useful when we’re responding to a bug bounty submission or a security incident. We use a combination of tools to do this, including GitHub’s code search functionality, custom scripts, and CodeQL. We will often start by using code search to find patterns similar to the one that caused a particular vulnerability across numerous repositories. This is sometimes not good enough, as code search is not semantically aware, meaning that it cannot determine whether a given variable is an Active Record object or whether it is being used in an `if` expression. To answer those types of questions, we turn to CodeQL.

When we write CodeQL queries for variant analysis, we are much less concerned about false positives, since the goal is to provide results for security engineers to analyze. Code quality also matters less, as these queries are only used for the duration of the VA effort. Some of the things we use CodeQL for during VAs are:

Where are we using SHA1 hashes?
One of our internal API endpoints was vulnerable to SQLi according to a recent bug bounty report. Where are we passing user input to that API endpoint?
There is a problem with how some HTTP request libraries in Ruby handle the proxy setting. Can we look at places we are instantiating our HTTP request libraries with a proxy setting?
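Even the first question shows why semantic awareness matters. The sketch below uses Ruby’s standard-library `Ripper` and a hypothetical sample file to compare a plain text search with a search over the parsed syntax tree: the text search also matches a comment and a string literal, while the syntax-based version reports only the real `Digest::SHA1` constant reference. It is still far from what CodeQL offers; for instance, it would miss `OpenSSL::Digest.new("SHA1")`.

```ruby
require "ripper"

# Hypothetical sample: only line 3 actually uses SHA1.
SAMPLE = <<~'RUBY'
  # TODO: stop using Digest::SHA1 here
  LABEL = "Digest::SHA1"
  token = Digest::SHA1.hexdigest(secret)
RUBY

# Text search: line numbers of every line containing the literal text.
def text_matches(source)
  source.each_line.with_index(1).select { |line, _| line.include?("Digest::SHA1") }.map(&:last)
end

# Yields every S-expression node (an Array whose first element is a Symbol).
def each_node(node, &block)
  return unless node.is_a?(Array)
  yield node if node.first.is_a?(Symbol)
  node.each { |child| each_node(child, &block) }
end

# Syntax-aware search: only constant paths of the form Digest::SHA1.
# Comments are dropped by the parser and string contents are not constants.
def sha1_references(source)
  lines = []
  each_node(Ripper.sexp(source)) do |node|
    next unless node.first == :const_path_ref
    scope, name = node[1], node[2]
    next unless scope.first == :var_ref && scope[1][0] == :@const && scope[1][1] == "Digest"
    lines << name[2][0] if name[1] == "SHA1"
  end
  lines
end

if __FILE__ == $PROGRAM_NAME
  p text_matches(SAMPLE)    # comment, string, and real use
  p sha1_references(SAMPLE) # real use only
end
```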

One recent example involved a subtle vulnerability in Rails. We wanted to detect when the following condition was present in our code:

A parameter was used to look up an Active Record object.
That parameter is later reused after the Active Record object is looked up.

The concern with this condition is that it could lead to an insecure direct object reference (IDOR) vulnerability, because Active Record finder methods can accept an array. If the code looks up an Active Record object in one call to determine whether a given entity has access to a resource, but later uses a different element from that array to find an object reference, the authorization check no longer covers the object actually being used. It would be difficult to write a query to detect all vulnerable instances of this pattern, but we were able to write one that surfaced potential vulnerabilities and gave us a list of code paths to analyze manually. We ran the query against a large number of our Ruby codebases using CodeQL’s MRVA.
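The shape of the bug can be reproduced without Rails. In the toy model below, `find_by_id` and `where_id` are hypothetical stand-ins for `find_by(id: ...)` and `where(id: ...)` that, like the real finders, accept either a single id or an array (a query string such as `id[]=1&id[]=2` arrives in `params` as an array). The hypothetical `archive_repos` action authorizes one record but then reuses the raw parameter.

```ruby
# Toy in-memory table; not ActiveRecord.
Repo = Struct.new(:id, :owner_id)
REPOS = [Repo.new(1, 100), Repo.new(2, 200)].freeze

# Stand-in for find_by(id: value): returns the first matching row.
# (Real SQL gives no ordering guarantee without an ORDER BY.)
def find_by_id(value)
  ids = Array(value)
  REPOS.find { |repo| ids.include?(repo.id) }
end

# Stand-in for where(id: value): returns every matching row.
def where_id(value)
  ids = Array(value)
  REPOS.select { |repo| ids.include?(repo.id) }
end

# Vulnerable: authorizes one record, then acts on whatever params[:id] holds.
# Returns the ids the action would touch.
def archive_repos(params, user_id)
  repo = find_by_id(params[:id])
  return :forbidden unless repo && repo.owner_id == user_id
  where_id(params[:id]).map(&:id) # params[:id] reused after the lookup
end

# One possible fix: act only on the record that passed the check.
def archive_repos_fixed(params, user_id)
  repo = find_by_id(params[:id])
  return :forbidden unless repo && repo.owner_id == user_id
  [repo.id]
end

p archive_repos({ id: [1, 2] }, 100)       # user 100 also touches repo 2
p archive_repos_fixed({ id: [1, 2] }, 100) # only repo 1
```

A scalar `id` behaves correctly in both versions; only the array case exposes the gap, which is why the query looks for a `params` value that reaches a finder and is then referenced again later in the same method.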

The query, which is a bit hacky and not quite production grade, is below:

```ql
/**
 * @name wip array query
 * @description an array is passed to an AR finder object
 */

import ruby
import codeql.ruby.AST
import codeql.ruby.ApiGraphs
import codeql.ruby.frameworks.Rails
import codeql.ruby.frameworks.ActiveRecord
import codeql.ruby.frameworks.ActionController
import codeql.ruby.DataFlow
import codeql.ruby.Frameworks
import codeql.ruby.TaintTracking

// Gets the "final" receiver in a chain of method calls.
// For example, in `Foo.bar`, this would give the `Foo` access, and in
// `foo.bar.baz("arg")` it would give the `foo` variable access
private Expr getUltimateReceiver(MethodCall call) {
  exists(Expr recv |
    recv = call.getReceiver() and
    (
      result = getUltimateReceiver(recv)
      or
      not recv instanceof MethodCall and result = recv
    )
  )
}

// Names of class methods on ActiveRecord models that may return one or more
// instances of that model. This also includes the `initialize` method.
// See https://api.rubyonrails.org/classes/ActiveRecord/FinderMethods.html
private string staticFinderMethodName() {
  exists(string baseName |
    baseName = ["find_by", "find_or_create_by", "find_or_initialize_by", "where"] and
    result = baseName + ["", "!"]
  )
  // or
  // result = ["new", "create"]
}

private class ActiveRecordModelFinderCall extends ActiveRecordModelInstantiation, DataFlow::CallNode
{
  private ActiveRecordModelClass cls;

  ActiveRecordModelFinderCall() {
    exists(MethodCall call, Expr recv |
      call = this.asExpr().getExpr() and
      recv = getUltimateReceiver(call) and
      (
        // The receiver refers to an `ActiveRecordModelClass` by name
        recv.(ConstantReadAccess).getAQualifiedName() = cls.getAQualifiedName()
        or
        // The receiver is self, and the call is within a singleton method of
        // the `ActiveRecordModelClass`
        recv instanceof SelfVariableAccess and
        exists(SingletonMethod callScope |
          callScope = call.getCfgScope() and
          callScope = cls.getAMethod()
        )
      ) and
      (
        call.getMethodName() = staticFinderMethodName()
        or
        // dynamically generated finder methods
        call.getMethodName().indexOf("find_by_") = 0
      )
    )
  }

  final override ActiveRecordModelClass getClass() { result = cls }
}

class FinderCallArgument extends DataFlow::Node {
  private ActiveRecordModelFinderCall finderCallNode;

  FinderCallArgument() { this = finderCallNode.getArgument(_) }
}

class ParamsHashReference extends DataFlow::CallNode {
  private Rails::ParamsCall params;

  // TODO: only direct element references against `params` calls are considered
  ParamsHashReference() { this.getReceiver().asExpr().getExpr() = params }

  string getArgString() {
    result = this.getArgument(0).asExpr().getConstantValue().getStringlikeValue()
  }
}

class ArrayPassedToActiveRecordFinder extends TaintTracking::Configuration {
  ArrayPassedToActiveRecordFinder() { this = "ArrayPassedToActiveRecordFinder" }

  override predicate isSource(DataFlow::Node source) { source instanceof ParamsHashReference }

  override predicate isSink(DataFlow::Node sink) {
    sink instanceof FinderCallArgument
  }

  string getParamsArg(DataFlow::CallNode paramsCall) {
    result = paramsCall.getArgument(0).asExpr().getConstantValue().getStringlikeValue()
  }

  // this doesn't check for anything fancy like whether it's reuse in a if/else
  // only intended for quick manual audit filtering of interesting candidates
  // so remains fairly broad to not induce false negatives
  predicate paramsUsedAfterLookups(DataFlow::Node source) {
    exists(DataFlow::CallNode y | y instanceof ParamsHashReference
    and source.getEnclosingMethod() = y.getEnclosingMethod()
    and source != y
    and getParamsArg(source) = getParamsArg(y)
    // we only care if it's used again AFTER an object lookup
    and y.getLocation().getStartLine() > source.getLocation().getStartLine())
  }
}

from ArrayPassedToActiveRecordFinder config, DataFlow::Node source, DataFlow::Node sink
where config.hasFlow(source, sink) and
  config.paramsUsedAfterLookups(source)
select source, sink.getLocation()
```
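For readers less familiar with QL, the recursive `getUltimateReceiver` helper in the query above can be approximated over Ruby’s own `Ripper` syntax tree. This is a rough analogue rather than a port: the node names are Ripper’s, not CodeQL’s, and unlike the QL version it also returns a receiver-less method call such as a bare, never-assigned `foo` (Ripper’s `:vcall`), where the QL helper would return nothing.

```ruby
require "ripper"

# Explicit receiver of a call node, or nil when the node has none.
def receiver(node)
  case node.first
  when :call, :command_call then node[1]
  when :method_add_arg, :method_add_block then receiver(node[1])
  end
end

# Follow receivers down a method chain to the innermost one.
def ultimate_receiver(node)
  recv = receiver(node)
  recv && (ultimate_receiver(recv) || recv)
end

# Parse source and return the node of its last top-level statement.
def expr(source)
  Ripper.sexp(source)[1].last
end

p ultimate_receiver(expr("Foo.bar"))                      # the constant Foo
p ultimate_receiver(expr('foo = 1; foo.bar.baz("arg")'))  # the local variable foo
p ultimate_receiver(expr('baz("arg")'))                   # nil: no receiver
```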

Conclusion

CodeQL can be very useful for product security engineering teams to detect and prevent vulnerabilities at scale. We use a combination of queries that run in CI using our query pack and one-off queries run through MRVA to find potential vulnerabilities and communicate them to engineers. CodeQL isn’t only useful for finding security vulnerabilities, though; it is also useful for detecting the presence or absence of security controls that are defined in code. This saves our security team time by surfacing certain security problems automatically, and saves our engineers time by detecting them earlier in the development process.

Writing custom CodeQL queries

Tips for getting started

We have a large number of articles and resources for writing custom CodeQL queries. If you haven’t written custom CodeQL queries before, here are some resources to help get you started:

Improve the security of your applications today by enabling CodeQL for free on your public repositories, or try GitHub Advanced Security for your organization.

Michael Recachinas, GitHub Staff Security Engineer, also contributed to this blog post.

Written by

Brandon Stewart

@boveus
