CodeQL zero to hero part 5: Debugging queries

Learn to debug and fix your CodeQL queries.

September 29, 2025 | Updated October 7, 2025 | 14 minutes

When you’re first getting started with CodeQL, you may find yourself in a situation where a query doesn’t return the results you expect. Debugging these queries can be tricky, because CodeQL is a Prolog-like language with an evaluation model that’s quite different from mainstream languages like Python. This means you can’t “step through” the code, and techniques such as attaching gdb or adding print statements don’t apply. Fortunately, CodeQL offers a variety of built-in features to help you diagnose and resolve issues in your queries.

Below, we’ll dig into these features, from the abstract syntax tree (AST) to partial path graphs, using questions from CodeQL users as examples. And if you ever have questions of your own, you can ask in GitHub Security Lab’s public Slack instance, which is monitored by CodeQL engineers.

This blog is written to be read standalone; however, if you are new to CodeQL or would like to dig deeper into static analysis and CodeQL, you may want to check out the other parts of my CodeQL zero to hero blog series. Each deals with a different topic: static analysis fundamentals, writing CodeQL, using CodeQL for security research, and modeling a new framework (Gradio) in CodeQL.

Each part, including this one, has accompanying CodeQL queries and exercises, which are in the blogs and in the CodeQL zero to hero repository.

Minimal code example

The issue we are going to use was raised by user NgocKhanhC311, and a similar issue was later raised by zhou noel. Both encountered difficulties writing a CodeQL query to detect a vulnerability in projects using the Gradio framework. Since I personally added Gradio support to CodeQL, and even wrote a blog about the process (CodeQL zero to hero part 4: Gradio framework case study), which includes an introduction to Gradio and its attack surface, I jumped in to answer.

zhou noel wanted to detect variants of an unsafe deserialization vulnerability that was found in browser-use/web-ui v1.6. See the simplified code below.

import pickle
import gradio as gr

def load_config_from_file(config_file):
    """Load settings from a UUID.pkl file."""
    try:
        with open(config_file.name, 'rb') as f:
            settings = pickle.load(f)
        return settings
    except Exception as e:
        return f"Error loading configuration: {str(e)}"

with gr.Blocks(title="Configuration Loader") as demo:
    config_file_input = gr.File(label="Load Config File")
    load_config_button = gr.Button("Load Existing Config From File", variant="primary")
    config_status = gr.Textbox(label="Status")
    load_config_button.click(
        fn=load_config_from_file,
        inputs=[config_file_input],
        outputs=[config_status]
    )

demo.launch()

Using the `load_config_button.click` event handler (from `gr.Button`), a user-supplied file `config_file_input` (of type `gr.File`) is passed to the `load_config_from_file` function, which opens the file with `open(config_file.name, 'rb')` and loads its contents using `pickle.load`.

The vulnerability here is more of a “second order” vulnerability. First, an attacker uploads a malicious file, then the application loads it using `pickle`. In this example, our source is `gr.File`. When using `gr.File`, the uploaded file is stored locally, and its path is available in the `name` attribute, `config_file.name`. Then the app opens the file with `with open(config_file.name, 'rb') as f:` and loads it using `pickle.load(f)`, leading to unsafe deserialization.

What a pickle! 🙂
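To make the danger concrete, here is a minimal and deliberately harmless sketch of why unpickling an untrusted file amounts to running code chosen by whoever wrote the file. It relies on Python’s documented `__reduce__` hook. The class name `NotReallySettings` and the message are made up for illustration, and an in-memory `io.BytesIO` stands in for the uploaded file.

```python
import io
import pickle

class NotReallySettings:
    """Illustrative class; the name is made up for this sketch."""

    def __reduce__(self):
        # Tells pickle how to "rebuild" the object: call print(...) at load time.
        return (print, ("this ran during pickle.load",))

payload = pickle.dumps(NotReallySettings())

# Stand-in for `with open(config_file.name, 'rb') as f:` in the example app.
with io.BytesIO(payload) as f:
    settings = pickle.load(f)  # prints: this ran during pickle.load

print(settings)  # None, because print() returns None
```

The loaded “settings” are simply whatever the callable named in the file returned. Here that callable is `print`, but the choice belongs to the author of the file, not to the application.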

If you’d like to test the vulnerability, create a new folder, save the code in it as `example.py`, and then run:

python -m venv venv
source venv/bin/activate
pip install gradio
python example.py

Then, follow these steps to create a malicious pickle file to exploit the vulnerability.

The user wrote a CodeQL taint tracking query, which at first glance should find the vulnerability.

/**
 * @name Gradio unsafe deserialization
 * @description This query tracks data flow from inputs passed to a Gradio's Button component to any sink.
 * @kind path-problem
 * @problem.severity warning
 * @id 5/1
 */

import python
import semmle.python.ApiGraphs
import semmle.python.Concepts
import semmle.python.dataflow.new.RemoteFlowSources
import semmle.python.dataflow.new.TaintTracking
import MyFlow::PathGraph

class GradioButton extends RemoteFlowSource::Range {
    GradioButton() {
        exists(API::CallNode n |
            n = API::moduleImport("gradio").getMember("Button").getReturn()
                .getMember("click").getACall() |
            this = n.getParameter(0, "fn").getParameter(_).asSource())
    }

    override string getSourceType() { result = "Gradio untrusted input" }
}

private module MyConfig implements DataFlow::ConfigSig {
    predicate isSource(DataFlow::Node source) { source instanceof GradioButton }

    predicate isSink(DataFlow::Node sink) { exists(Decoding d | sink = d) }
}

module MyFlow = TaintTracking::Global<MyConfig>;

from MyFlow::PathNode source, MyFlow::PathNode sink
where MyFlow::flowPath(source, sink)
select sink.getNode(), source, sink, "Data Flow from a Gradio source to decoding"

The source is set to any parameter passed to the function in a `gr.Button.click` event handler. The sink is set to any sink of type `Decoding`. In CodeQL for Python, the `Decoding` type includes unsafe deserialization sinks, such as the first argument to `pickle.load`.

If you run the query on the database, you won’t get any results.

To figure out most CodeQL query issues, I suggest trying out the following options, which we’ll go through in the next sections of the blog:

Make a minimal code example and create a CodeQL database of it to reduce the number of results.
Simplify the query into predicates and classes, making it easier to run the specific parts of the query, and check if they provide the expected results.
Use quick evaluation on the simplified predicates.
View the abstract syntax tree of your codebase to see the expected CodeQL type for a given code element, and how to query for it.
Call the `getAQlClass` predicate to identify the types of a given code element.
Use a partial path graph to see where taint stops propagating.
Write a taint step to help the taint propagate further.

Creating a CodeQL database

Using our minimal code example, we’ll create a CodeQL database, similarly to how we did it in CodeQL ZtH part 4, by running the following command in the directory that contains only the minimal code example.

codeql database create codeql-zth5 --language=python

This command will create a new directory, `codeql-zth5`, with the CodeQL database. Add it to your CodeQL workspace and then we can get started.

Simplifying the query and quick evaluation

The query is already simplified into predicates and classes, so we can quickly evaluate it using the `Quick Evaluation` button over the predicate name, or by right-clicking on the predicate name and choosing `CodeQL: Quick Evaluation`.

CodeQL taint tracking query, with `Quick Evaluation: isSource` button over the `isSource` predicate.

Clicking `Quick Evaluation` over the `isSource` and `isSink` predicates shows a result for each, which means that both source and sink were found correctly. Note, however, that the `isSink` result highlights the whole `pickle.load(f)` call, rather than just the first argument to the call. Typically, we prefer to set a sink as an argument to a call, not the call itself.

In this case, the `Decoding` abstract sinks have a `getAnInput` predicate, which specifies the argument to a sink call. To differentiate between normal `Decoding` sinks (for example, `json.loads`) and the ones that could execute code (such as `pickle.load`), we can use the `mayExecuteInput` predicate.

predicate isSink(DataFlow::Node sink) {
    exists(Decoding d | d.mayExecuteInput() | sink = d.getAnInput())
}

Quick evaluation of the `isSink` predicate gives us one result.

VS Code screenshot with one result from running the query

With this, we verified that the sources and sinks are correctly reported. That means there’s an issue between the source and sink, which CodeQL can’t propagate through.

Abstract Syntax Tree (AST) viewer

We haven’t had issues identifying the source or sink nodes, but if we had, it would be helpful to examine the abstract syntax tree (AST) of the code to determine the type of a particular code element.

After you run `Quick Evaluation` on `isSink`, you’ll see the file where CodeQL identified the sink. To see the abstract syntax tree for the file, right-click the code element you’re interested in and select `CodeQL: View AST`.

Highlighted `CodeQL: View AST` option in a dropdown menu after right-clicking

The option will display the AST of the file in the CodeQL tab in VS Code, under the AST Viewer section.

abstract syntax tree of the code with highlighted `[Call] pickle.load(f) line 8` node

Once you know the type of a given code element from the AST, it can be easier to write a query for the code element you’re interested in.

`getAQlClass` predicate

Another good strategy to figure out the type of a code element you’re interested in is to use the `getAQlClass` predicate. Usually, it’s best to create a separate query, so you don’t clutter your original query.

For example, we could write a query to check the types of a parameter to the function `fn` passed to `gradio.Button.click`:

/**
 * @name getAQlClass on Gradio Button input source
 * @description This query reports on a code element's types.
 * @id 5/2
 * @severity error
 * @kind problem
 */

import python
import semmle.python.ApiGraphs
import semmle.python.Concepts
import semmle.python.dataflow.new.RemoteFlowSources

from DataFlow::Node node
where node = API::moduleImport("gradio").getMember("Button").getReturn()
        .getMember("click").getACall().getParameter(0, "fn").getParameter(_).asSource()
select node, node.getAQlClass()

Running the query provides five results showing the types of the parameter: `FutureTypeTrackingNode`, `ExprNode`, `LocalSourceNodeNotModuleVariableNode`, `ParameterNode`, and `LocalSourceParameterNode`. Of these, the most interesting and useful types for writing queries are `ExprNode` and `ParameterNode`.

VS Code screenshot with five results from running the query

Partial path graph: forwards

Now that we’ve identified that there’s an issue with connecting the source to the sink, we should verify where the taint flow stops. We can do that using partial path graphs, which show all the sinks the source flows toward and where those flows stop. This is also why having a minimal code example is so vital; otherwise we’d get a lot of results.

If you do end up working on a large codebase, you should try to limit the source you’re starting with to, for example, a specific file with a condition akin to:

predicate isSource(DataFlow::Node source) {
    source instanceof GradioButton
    and source.getLocation().getFile().getBaseName() = "example.py"
}

See other ways of providing location information.

Partial graphs come in two forms: forward (`FlowExplorationFwd`), which traces flow from a given source to any sink, and backward/reverse (`FlowExplorationRev`), which traces flow from a given sink back to any source.

We have public templates for partial path graphs in most languages for your queries in CodeQL Community Packs; see the template for Python.

Here’s how we would write a forward partial path graph query for our current issue:

/**
 * @name Gradio Button partial path graph
 * @description This query tracks data flow from inputs passed to a Gradio's Button component to any sink.
 * @kind path-problem
 * @problem.severity warning
 * @id 5/3
 */

import python
import semmle.python.ApiGraphs
import semmle.python.Concepts
import semmle.python.dataflow.new.RemoteFlowSources
import semmle.python.dataflow.new.TaintTracking
// import MyFlow::PathGraph
import PartialFlow::PartialPathGraph

class GradioButton extends RemoteFlowSource::Range {
    GradioButton() {
        exists(API::CallNode n |
            n = API::moduleImport("gradio").getMember("Button").getReturn()
                .getMember("click").getACall() |
            this = n.getParameter(0, "fn").getParameter(_).asSource())
    }

    override string getSourceType() { result = "Gradio untrusted input" }
}

private module MyConfig implements DataFlow::ConfigSig {
    predicate isSource(DataFlow::Node source) { source instanceof GradioButton }

    predicate isSink(DataFlow::Node sink) { exists(Decoding d | d.mayExecuteInput() | sink = d.getAnInput()) }
}

module MyFlow = TaintTracking::Global<MyConfig>;

int explorationLimit() { result = 10 }

module PartialFlow = MyFlow::FlowExplorationFwd<explorationLimit/0>;

from PartialFlow::PartialPathNode source, PartialFlow::PartialPathNode sink
where PartialFlow::partialFlow(source, sink, _)
select sink.getNode(), source, sink, "Partial Graph $@.", source.getNode(), "user-provided value."

What changed:

We commented out `import MyFlow::PathGraph` and instead `import PartialFlow::PartialPathGraph`.
We set `explorationLimit()` to `10`, which controls how deep the analysis goes. This is especially useful in larger codebases with complex flows.
We create a `PartialFlow` module with `FlowExplorationFwd`, meaning we are tracing flows from a specified source to any sink. If we want to start from a sink and trace back to any source, we’d use `FlowExplorationRev` with small changes in the query itself. See the template for `FlowExplorationRev`.
Finally, we made changes to the from-where-select query to use `PartialFlow::PartialPathNode`s and the `PartialFlow::partialFlow` predicate.

Running the query gives us one result, which ends at `config_file` in the `with open(config_file.name, 'rb') as f:` line. This means CodeQL didn’t propagate to the `name` attribute in `config_file.name`.

VS Code screenshot of a code path from def load_config_from_file(config_file) to config_file in open(config_file.name, 'rb') call

`config_file` here is an instance of `gr.File`, whose `name` attribute stores the path to the uploaded file.

Quite often, if an object is tainted, we can’t tell if all of its attributes are tainted as well. By default, CodeQL does not propagate to an object’s attributes. As such, we need to help taint propagate from an object to its `name` attribute by writing a taint step.

Taint step

The quickest way, though not the prettiest, would be to write a taint step to propagate from any object to that object’s `name` attribute. This is naturally not something we’d like to include in production CodeQL queries, since it might lead to false positives. For our use case it’s fine, since we are writing the query for security research.

We add a taint step into a taint tracking configuration by using an `isAdditionalFlowStep` predicate. This taint step will allow CodeQL to propagate to any read of a `name` attribute. We specify the two nodes that we want to connect, `nodeFrom` and `nodeTo`, and how they should be connected. `nodeFrom` is the object whose `name` attribute is accessed, and `nodeTo` is the node that represents the attribute read.

predicate isAdditionalFlowStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
    exists(DataFlow::AttrRead attr |
        attr.accesses(nodeFrom, "name")
        and nodeTo = attr
    )
}

Let’s make it a separate predicate for easier testing, and plug it into our partial path graph query.

/**
 * @name Gradio Button partial path graph
 * @description This query tracks data flow from Gradio's Button component to any sink.
 * @kind path-problem
 * @problem.severity warning
 * @id 5/4
 */

import python
import semmle.python.ApiGraphs
import semmle.python.Concepts
import semmle.python.dataflow.new.RemoteFlowSources
import semmle.python.dataflow.new.TaintTracking
// import MyFlow::PathGraph
import PartialFlow::PartialPathGraph

class GradioButton extends RemoteFlowSource::Range {
    GradioButton() {
        exists(API::CallNode n |
            n = API::moduleImport("gradio").getMember("Button").getReturn()
                .getMember("click").getACall() |
            this = n.getParameter(0, "fn").getParameter(_).asSource())
    }

    override string getSourceType() { result = "Gradio untrusted input" }
}

predicate nameAttrRead(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
    // Connects an object to a read of that object's `name` attribute
    exists(DataFlow::AttrRead attr |
        attr.accesses(nodeFrom, "name")
        and nodeTo = attr
    )
}

private module MyConfig implements DataFlow::ConfigSig {
    predicate isSource(DataFlow::Node source) { source instanceof GradioButton }

    predicate isSink(DataFlow::Node sink) { exists(Decoding d | d.mayExecuteInput() | sink = d.getAnInput()) }

    predicate isAdditionalFlowStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
        nameAttrRead(nodeFrom, nodeTo)
    }
}

module MyFlow = TaintTracking::Global<MyConfig>;

int explorationLimit() { result = 10 }

module PartialFlow = MyFlow::FlowExplorationFwd<explorationLimit/0>;

from PartialFlow::PartialPathNode source, PartialFlow::PartialPathNode sink
where PartialFlow::partialFlow(source, sink, _)
select sink.getNode(), source, sink, "Partial Graph $@.", source.getNode(), "user-provided value."

Running the query gives us two results. In the second path, we see that the taint propagated to `config_file.name`, but not further. What happened?

VS Code screenshot of a code path from `def load_config_from_file(config_file)` to `config_file.name` in `open(config_file.name, 'rb')` call

Taint step… again?

The specific piece of code turned out to be a bit of a special case. I mentioned earlier that this vulnerability is essentially a “second order” vulnerability — we first upload a malicious file, then load that locally stored file. Generally in these cases it’s the path to the file that we consider as tainted, and not the contents of the file itself, so CodeQL wouldn’t normally propagate here. In our case, in Gradio, we do control the file that is being loaded.

That’s why we need another taint step to propagate from `config_file.name` to `open(config_file.name, 'rb')`.

We can write a predicate that propagates from the argument to `open()` to the result of `open()` (and, while we are at it, from the argument to `os.open()` to the result of the `os.open()` call).

predicate osOpenStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
    // Connects the argument to `open()` to the result of `open()`
    // And argument to `os.open()` to the result of `os.open()`
    exists(API::CallNode call |
        call = API::moduleImport("os").getMember("open").getACall() and
        nodeFrom = call.getArg(0) and
        nodeTo = call)
    or
    exists(API::CallNode call |
        call = API::builtin("open").getACall() and
        nodeFrom = call.getArg(0) and
        nodeTo = call)
}

Then we can add this second taint step to `isAdditionalFlowStep`.

predicate isAdditionalFlowStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
    nameAttrRead(nodeFrom, nodeTo)
    or
    osOpenStep(nodeFrom, nodeTo)
}
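One way to picture what the two steps do is as extra edges in a directed graph. The sketch below is a hand-written toy model in Python, not output from CodeQL and not how CodeQL computes flow. The node labels, the `default_edges` map, and the `reachable` helper are all made up for illustration. It only shows that the sink becomes reachable once both missing edges are present.

```python
from collections import deque

# Toy data-flow graph for the example handler; node names are illustrative labels.
default_edges = {
    "config_file (parameter)": ["config_file (in open call)"],
    "open(config_file.name, 'rb')": ["f"],
    "f": ["pickle.load(f) argument"],
}

# The two additional steps from the text, expressed as extra edges.
name_attr_read = {"config_file (in open call)": ["config_file.name"]}
open_step = {"config_file.name": ["open(config_file.name, 'rb')"]}

def reachable(source, *edge_sets):
    """Return every node reachable from `source` over the union of the edge sets."""
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        for edges in edge_sets:
            for nxt in edges.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen

source = "config_file (parameter)"
sink = "pickle.load(f) argument"

print(sink in reachable(source, default_edges))                             # False
print(sink in reachable(source, default_edges, name_attr_read))             # False
print(sink in reachable(source, default_edges, name_attr_read, open_step))  # True
```

With only the first extra edge, the walk stops at `config_file.name`, which mirrors what the partial path graph showed us.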

Let’s add the taint step to a final taint tracking query, and make it a normal taint tracking query again.

/**
 * @name Gradio File Input Flow
 * @description This query tracks data flow from Gradio's Button component to a Decoding sink.
 * @kind path-problem
 * @problem.severity warning
 * @id 5/5
 */

import python
import semmle.python.ApiGraphs
import semmle.python.Concepts
import semmle.python.dataflow.new.RemoteFlowSources
import semmle.python.dataflow.new.TaintTracking
import MyFlow::PathGraph

class GradioButton extends RemoteFlowSource::Range {
    GradioButton() {
        exists(API::CallNode n |
            n = API::moduleImport("gradio").getMember("Button").getReturn()
                .getMember("click").getACall() |
            this = n.getParameter(0, "fn").getParameter(_).asSource())
    }

    override string getSourceType() { result = "Gradio untrusted input" }
}

predicate nameAttrRead(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
    // Connects an object to a read of that object's `name` attribute
    exists(DataFlow::AttrRead attr |
        attr.accesses(nodeFrom, "name")
        and nodeTo = attr
    )
}

predicate osOpenStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
    // Connects the argument to `open()` to the result of `open()`
    // And argument to `os.open()` to the result of `os.open()`
    exists(API::CallNode call |
        call = API::moduleImport("os").getMember("open").getACall() and
        nodeFrom = call.getArg(0) and
        nodeTo = call)
    or
    exists(API::CallNode call |
        call = API::builtin("open").getACall() and
        nodeFrom = call.getArg(0) and
        nodeTo = call)
}

private module MyConfig implements DataFlow::ConfigSig {
    predicate isSource(DataFlow::Node source) { source instanceof GradioButton }

    predicate isSink(DataFlow::Node sink) {
        exists(Decoding d | d.mayExecuteInput() | sink = d.getAnInput())
    }

    predicate isAdditionalFlowStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
        nameAttrRead(nodeFrom, nodeTo)
        or
        osOpenStep(nodeFrom, nodeTo)
    }
}

module MyFlow = TaintTracking::Global<MyConfig>;

from MyFlow::PathNode source, MyFlow::PathNode sink
where MyFlow::flowPath(source, sink)
select sink.getNode(), source, sink, "Data Flow from a Gradio source to decoding"

Running the query provides one result — the vulnerability we’ve been looking for! 🎉

VS Code screenshot of a code path from `def load_config_from_file(config_file)` to `f` in `pickle.load(f)` sink

A prettier taint step

Note that the CodeQL written in this section is very specific to Gradio, and you’re unlikely to encounter similar modeling in other frameworks. What follows is a more advanced version of the previous taint step, which I added for those of you who want to dig deeper into writing a more maintainable solution to this taint step problem. You are unlikely to need to write this kind of granular CodeQL as a security researcher, but if you use CodeQL at work, this section might come in handy.

As we’ve mentioned, the taint step that propagates taint through a `name` attribute read on any object is a hacky solution. Not every object that propagates taint through a `name` read would cause a vulnerability. We’d like to limit the taint step to propagate only in cases like this one, that is, only for the `gr.File` type.

But we encounter a problem: Gradio sources are modeled as any parameters passed to the function in `gr.Button.click` event handlers, so CodeQL is not aware of what type a given argument passed to a function in `gr.Button.click` is. For that reason, we can’t easily write a straightforward taint step that would check if the source is of `gr.File` type before propagating to a `name` attribute.

We have to “look back” to where the source was instantiated, check its type, and later connect that object to a `name` attribute read.

Recall our minimal code example.

import pickle
import gradio as gr

def load_config_from_file(config_file):
    """Load settings from a UUID.pkl file."""
    try:
        with open(config_file.name, 'rb') as f:
            settings = pickle.load(f)
        return settings
    except Exception as e:
        return f"Error loading configuration: {str(e)}"

with gr.Blocks(title="Configuration Loader") as demo:
    config_file_input = gr.File(label="Load Config File")
    load_config_button = gr.Button("Load Existing Config From File", variant="primary")
    config_status = gr.Textbox(label="Status")
    load_config_button.click(
        fn=load_config_from_file,
        inputs=[config_file_input],
        outputs=[config_status]
    )

demo.launch()

Taint steps work by creating an edge (a connection) between two specified nodes. In our case, we are looking to connect two sets of nodes, which are on the same path.

First, we want CodeQL to connect the variables passed to `inputs` (here `config_file_input`) in, for example, `gr.Button.click` to the parameter `config_file` of the `load_config_from_file` function. This way it will be able to propagate back to the instantiation, `config_file_input = gr.File(label="Load Config File")`.

Second, we want CodeQL to propagate from the nodes that we checked are of `gr.File` type to the places where their `name` attribute is read.

Funnily enough, I’ve already written a taint step called `ListTaintStep` that can track back to the instantiations, and even wrote a section about it in the previous CodeQL zero to hero. We can reuse that logic and add it to our query by modifying the `nameAttrRead` predicate.

predicate nameAttrRead(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
    // Connects an object to a read of that object's `name` attribute
    exists(DataFlow::AttrRead attr |
        attr.accesses(nodeFrom, "name")
        and nodeTo = attr
    )
    and
    exists(API::CallNode node, int i, DataFlow::Node n1, DataFlow::Node n2 |
        node = API::moduleImport("gradio").getAMember().getReturn().getAMember().getACall()
        and n2 = node.getParameter(0, "fn").getParameter(i).asSource()
        and n1.asCfgNode() =
            node.getParameter(1, "inputs").asSink().asCfgNode().(ListNode).getElement(i)
        and n1.getALocalSource() = API::moduleImport("gradio").getMember("File").getReturn().asSource()
        and (DataFlow::localFlow(n2, nodeFrom) or DataFlow::localFlow(nodeTo, n1))
    )
}

The taint step connects any object to that object’s `name` read (like before). Then, it looks for the function passed to `fn` and the variables passed to `inputs` in, for example, `gr.Button.click`, and connects the variables in `inputs` to the parameters of the function `fn`, using an integer `i` to keep track of the position of each variable.
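The positional matching can be sketched in plain Python. This is only an analogy for what the integer `i` does in the predicate: the second parameter `verbose`, the `"gr.Checkbox"` entry, and the string stand-ins for components are hypothetical and are not part of the original example.

```python
import inspect

def load_config_from_file(config_file, verbose):
    """Hypothetical handler with two parameters."""

# Stand-ins for the component objects listed in `inputs`; the strings are illustrative.
inputs = ["gr.File", "gr.Checkbox"]

# Parameter names of `fn`, in declaration order.
params = list(inspect.signature(load_config_from_file).parameters)

# Pair the i-th entry of `inputs` with the i-th parameter of `fn`.
pairs = {params[i]: inputs[i] for i in range(len(inputs))}

# Keep only the parameters that receive a gr.File component.
file_params = [name for name, component in pairs.items() if component == "gr.File"]
print(file_params)  # ['config_file']
```

Only `config_file` is paired with a `gr.File` component, so only reads of `name` on that parameter should get the extra taint step.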

Then, by using:

n1.getALocalSource() = API::moduleImport("gradio").getMember("File").getReturn().asSource()

We check that the node we are tracking is of `gr.File` type.

and (DataFlow::localFlow(n2, nodeFrom) or DataFlow::localFlow(nodeTo, n1))

Finally, we check that there is a local flow (with any number of path steps) between the `fn` function parameter `n2` and the object `nodeFrom` on which the attribute is read, or that there is a local flow between the `name` attribute read `nodeTo` specifically and a variable `n1` passed to `gr.Button.click`’s `inputs`.

What we did is essentially two taint steps (we connect, that is, create edges between, two sets of nodes) joined by local flow, which combines them into one taint step. We make it a single taint step because one condition can’t exist without the other. We use `localFlow` because there can be several steps between the connection we made from variables passed to `inputs` to the parameters of the function given as `fn` in `gr.Button.click`, and the later read of the `name` attribute on an object. `localFlow` allows us to connect the two.

It looks complex, but it stems from how directed graphs work.

Full CodeQL query:

/**
 * @name Gradio File Input Flow
 * @description This query tracks data flow from Gradio's Button component to a Decoding sink.
 * @kind path-problem
 * @problem.severity warning
 * @id 5/6
 */

import python
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.TaintTracking
import semmle.python.Concepts
import semmle.python.dataflow.new.RemoteFlowSources
import semmle.python.ApiGraphs

class GradioButton extends RemoteFlowSource::Range {
    GradioButton() {
        exists(API::CallNode n |
            n = API::moduleImport("gradio").getMember("Button").getReturn()
                .getMember("click").getACall() |
            this = n.getParameter(0, "fn").getParameter(_).asSource())
    }

    override string getSourceType() { result = "Gradio untrusted input" }
}

predicate nameAttrRead(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
    // Connects an object to a read of that object's `name` attribute
    exists(DataFlow::AttrRead attr |
        attr.accesses(nodeFrom, "name")
        and nodeTo = attr
    )
    and
    exists(API::CallNode node, int i, DataFlow::Node n1, DataFlow::Node n2 |
        node = API::moduleImport("gradio").getAMember().getReturn().getAMember().getACall()
        and n2 = node.getParameter(0, "fn").getParameter(i).asSource()
        and n1.asCfgNode() =
            node.getParameter(1, "inputs").asSink().asCfgNode().(ListNode).getElement(i)
        and n1.getALocalSource() = API::moduleImport("gradio").getMember("File").getReturn().asSource()
        and (DataFlow::localFlow(n2, nodeFrom) or DataFlow::localFlow(nodeTo, n1))
    )
}

predicate osOpenStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
    exists(API::CallNode call |
        call = API::moduleImport("os").getMember("open").getACall() and
        nodeFrom = call.getArg(0) and
        nodeTo = call)
    or
    exists(API::CallNode call |
        call = API::builtin("open").getACall() and
        nodeFrom = call.getArg(0) and
        nodeTo = call)
}

module MyConfig implements DataFlow::ConfigSig {
    predicate isSource(DataFlow::Node source) { source instanceof GradioButton }

    predicate isSink(DataFlow::Node sink) {
        exists(Decoding d | d.mayExecuteInput() | sink = d.getAnInput())
    }

    predicate isAdditionalFlowStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
        nameAttrRead(nodeFrom, nodeTo)
        or
        osOpenStep(nodeFrom, nodeTo)
    }
}

import MyFlow::PathGraph

module MyFlow = TaintTracking::Global<MyConfig>;

from MyFlow::PathNode source, MyFlow::PathNode sink
where MyFlow::flowPath(source, sink)
select sink.getNode(), source, sink, "Data Flow from a Gradio source to decoding"

Running the query returns a full path from `gr.File` to `pickle.load(f)`.

A taint step in this form could be contributed to CodeQL upstream. However, this is a very specific taint step, which makes sense for some vulnerabilities and not others. For example, it works for an unsafe deserialization vulnerability like the one described in this article, but not for path injection. That’s because this is a “second order” vulnerability: we control the uploaded file, but not its path (stored in `name`). For path injection vulnerabilities with sinks like `open(file.name, 'r')`, this would be a false positive.
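The difference between controlling the contents and controlling the path can be shown with a small simulation. The `store_upload` function below is a hypothetical stand-in for an upload handler, written only for this sketch. It is not how Gradio stores uploads internally and only models the property the paragraph above relies on: the server picks the path, the uploader picks the bytes.

```python
import os
import tempfile

def store_upload(contents: bytes) -> str:
    """Hypothetical upload handler: the server chooses the path, the client chooses the bytes."""
    fd, path = tempfile.mkstemp(suffix=".pkl")
    with os.fdopen(fd, "wb") as f:
        f.write(contents)
    return path

attacker_bytes = b"bytes chosen by the uploader"
path = store_upload(attacker_bytes)

with open(path, "rb") as f:
    loaded = f.read()

print(loaded == attacker_bytes)  # True: the contents are under the uploader's control
print(path)                      # a server-generated temporary path
os.remove(path)
```

Under this model, a sink that interprets the file’s contents (such as `pickle.load`) is reachable by the attacker, while a sink that only cares about the path is not.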

Conclusion

Some of the issues we encounter on the GHSL Slack around tracking taint can be a challenge. Cases like these don’t happen often, but when they do, they make good candidates for sharing lessons learned and writing a blog post like this one.

I hope my story of chasing taint helps you with debugging your queries. If, after trying out the tips in this blog, there are still issues with your query, feel free to ask for help on our public GitHub Security Lab Slack instance or in github/codeql discussions.

Written by

Sylwia Budzynska

@sylwia-budzynska

Sylwia is a security researcher at GitHub Security Lab, where she works on finding vulnerabilities in open source software, helping secure the foundations on which all modern software is built.