github/codeql (Public)

[Python] Select existing path node (with flow state) in the isAdditionalFlowState predicate #18853

jackfromeast started this conversation in General

jackfromeast

Feb 24, 2025

3 comments · 1 reply

jackfromeast
Feb 24, 2025

I am trying to track how many times the get operation is performed on the returned object. However, my current CodeQL query cannot distinguish between different numbers of get operations.

The functions below demonstrate my expected flow states:

def test1(obj, key):        # Source: Key, SourceKeyFlowState
  for _ in range(2):
    obj = obj.get(key)      # if key has SourceKeyFlowState, obj.get(key) -> ObjectLayerOneFlowState
                            # if key has SourceKeyFlowState and obj has ObjectLayerOneFlowState, obj.get(key) -> ObjectLayerTwoFlowState
  return obj                # Sink: obj, ObjectLayerTwoFlowState

def test2(obj, key):        # Source: Key, SourceKeyFlowState
  for _ in range(1):
    obj = obj.get(key)      # if key has SourceKeyFlowState, obj.get(key) -> ObjectLayerOneFlowState
                            # if key has SourceKeyFlowState and obj has ObjectLayerOneFlowState, obj.get(key) -> ObjectLayerTwoFlowState
  return obj                # Sink: obj, ObjectLayerOneFlowState

Expected Taint Flows:

test1: SourceKeyFlowState → ObjectFlowState-MoreThanOne
test2: SourceKeyFlowState → ObjectFlowState-One

However, my current CodeQL query does not distinguish between these cases and instead selects all four possible flows.

import python
import semmle.python.ApiGraphs
import codeql.dataflow.DataFlow
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.TaintTracking
import semmle.python.dataflow.new.internal.DataFlowPublic
import semmle.python.dataflow.new.internal.TaintTrackingPublic

module TempTest {
  abstract class FlowState extends string {
    bindingset[this]
    FlowState() { any() }
  }

  class SourceKeyFlowState extends TempTest::FlowState {
    SourceKeyFlowState() { this = "SourceKeyFlowState" }
  }

  class Occurrence extends string {
    Occurrence() {
      this = "One" or
      this = "MoreThanOne"
    }
  }

  class ObjectFlowState extends TempTest::FlowState {
    Occurrence occur;

    ObjectFlowState() { this = "ObjectFlowState" + "-" + occur }
  }

  module TrackingGetOperationConfiguration implements DataFlow::StateConfigSig {
    class FlowState = TempTest::FlowState;

    predicate isSource(DataFlow::Node source, FlowState state) {
      exists(Function func, Parameter param |
        func.getArg(1) = param and
        source.asExpr() = param and
        func.getName().matches("test%")
      ) and
      (
        state instanceof SourceKeyFlowState
      )
    }

    predicate isSink(DataFlow::Node sink, FlowState state) {
      exists(Function func, Return ret |
        ret.getScope() = func.getEvaluatingScope() and
        ret.contains(sink.asExpr()) and
        func.getName().matches("test%")
      ) and
      (
        state instanceof ObjectFlowState
      )
    }

    predicate isAdditionalFlowStep(
      DataFlow::Node fromNode, FlowState fromState, DataFlow::Node toNode, FlowState toState
    ) {
      exists(MethodCallNode callNode, Call call |
        callNode.asExpr() = call and
        callNode.getMethodName() = "get" and
        callNode.asExpr() = toNode.asExpr() and
        call.getArg(0) = fromNode.asExpr()
      ) and
      fromState instanceof SourceKeyFlowState and
      toState instanceof ObjectFlowState
    }
  }

  module TrackingGetOperationFlow = DataFlow::GlobalWithState<TrackingGetOperationConfiguration>;

  module Flow = TrackingGetOperationFlow; // For shortening the name

  predicate run(Flow::PathNode source, Flow::PathNode sink, FlowState state) {
    Flow::flowPath(source, sink) and
    sink.getState() = state
  }
}

I think the key problem is that, in the isAdditionalFlowStep predicate, I cannot select any path node other than fromNode and toNode and use its flow state to determine the flow state of toNode. In my case, the taint propagation step should take into account both the flow state of the key (fromNode) and the flow state of the base object to correctly determine the flow state of toNode.

Could anyone provide suggestions on how to fix this issue? Any insights would be greatly appreciated!


superboy-zjc
Feb 24, 2025

I have the same question!


aschackmull
Feb 25, 2025
Maintainer

You cannot inspect flow to both obj and key at the same time in order to determine the output state at obj.get(key) in that way. This is because the calculated flow is a path that goes from one node to another, one step at a time. However, you can do something that's close and might actually get you what you want. You can use two additional steps: one from key to the output that updates the state from SourceKeyFlowState to ObjectFlowState-One, and one from the qualifier obj to the output that updates the state from ObjectFlowState-One to ObjectFlowState-MoreThanOne. Your current step definition mostly achieves the former, except that you've given the resulting state as toState instanceof ObjectFlowState, which includes both One and MoreThanOne. You probably want to change that to toState = "ObjectFlowState-One", and then supplement it with the other step. The two steps might look something like this:

  predicate isAdditionalFlowStep(
    DataFlow::Node fromNode, FlowState fromState, DataFlow::Node toNode, FlowState toState
  ) {
    exists(MethodCallNode callNode, Call call |
      callNode.asExpr() = call and
      callNode.getMethodName() = "get" and
      callNode.asExpr() = toNode.asExpr()
    |
      call.getArg(0) = fromNode.asExpr() and
      fromState instanceof SourceKeyFlowState and
      toState = "ObjectFlowState-One"
      or
      call.getQualifier() = fromNode.asExpr() and
      fromState = "ObjectFlowState-One" and
      toState = "ObjectFlowState-MoreThanOne"
    )
  }

Note that you cannot expect the out-of-the-box analysis to distinguish for _ in range(1): from for _ in range(2):. Determining the number of times a loop is executed is, in general, halting-problem territory.


jackfromeast
Feb 25, 2025
Author

Hi @aschackmull! Thank you for your answer!

However, I’m still trying to differentiate between the following two cases:

def test1(obj, key):        # Source: Key, SourceKeyFlowState
  for _ in range(2):
    obj = obj.get(key)      # if key has SourceKeyFlowState, obj.get(key) -> ObjectLayerOneFlowState
                            # if key has SourceKeyFlowState and obj has ObjectLayerOneFlowState, obj.get(key) -> ObjectLayerTwoFlowState
  return obj                # Sink: obj, ObjectLayerTwoFlowState

def test3(obj, key):        # Source: Key, SourceKeyFlowState
  key2 = "x"                # Not Source Key
  for _ in range(1):
    obj = obj.get(key)      # if key has SourceKeyFlowState, obj.get(key) -> ObjectLayerOneFlowState
                            # if key has SourceKeyFlowState and obj has ObjectLayerOneFlowState, obj.get(key) -> ObjectLayerTwoFlowState
    obj = obj.get(key2)     # Should not propagate the taint flow as key2 is not tainted with SourceKeyFlowState
  return obj                # Sink: obj, ObjectLayerOneFlowState

If I add two additional taint flow steps for the get operation (one propagating from key to val, and the other from obj to val), I cannot be sure that they refer to the same get operation. Also, please ignore the loop counts; they are just for demonstration.

If this (propagating taint flow based on the flow states of multiple nodes in isAdditionalFlowStep) is challenging for CodeQL to achieve, do you think it could be supported in the future?


aschackmull Feb 26, 2025
Maintainer

It's perfectly doable, you just need a slightly different setup. Let's consider your step here:

obj = obj.get(key)  # if key has SourceKeyFlowState and obj has ObjectLayerOneFlowState, obj.get(key) -> ObjectLayerTwoFlowState

If you're just tracking flow via a single path, then you're stepping either from obj or from key, so you can't rely on state propagation from both. However, you could do an initial flow calculation that deals only with the propagation of source keys. Once that's done, you can restrict the additional steps in your second flow calculation to the qualifiers of those obj.get(key) calls for which key has flow in the first flow calculation. You can use FlowState in your second flow definition to track One vs. MoreThanOne, but you probably don't need FlowState in the first; there you just need to find those get calls whose argument comes from a key source.
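
One possible shape for that two-stage setup is sketched below. It is an untested illustration, not code from this thread: the names KeyConfig, KeyFlow, isGetWithSourceKey, LayerState, LayerConfig, and LayerFlow are hypothetical, as is the extra "Zero" start state, and it assumes the object is the first parameter of a test* function. The source predicates reuse the pattern from the original query.

```ql
import python
import semmle.python.dataflow.new.DataFlow

// Stage 1 (no flow state): from the key parameter to the argument of a `get` call.
module KeyConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) {
    // Same source as in the original query: second parameter of a `test*` function.
    exists(Function func, Parameter param |
      func.getArg(1) = param and
      source.asExpr() = param and
      func.getName().matches("test%")
    )
  }

  predicate isSink(DataFlow::Node sink) {
    exists(DataFlow::MethodCallNode call |
      call.getMethodName() = "get" and
      sink = call.getArg(0)
    )
  }
}

module KeyFlow = DataFlow::Global<KeyConfig>;

/** Holds if `call` is a `get` call whose first argument is reached from a key source. */
predicate isGetWithSourceKey(DataFlow::MethodCallNode call) {
  call.getMethodName() = "get" and
  KeyFlow::flowTo(call.getArg(0))
}

// Stage 2 (with flow state): track the object; the state counts qualifying `get` calls.
// "Zero" is a hypothetical start state for an object that has not been indexed yet.
class LayerState extends string {
  LayerState() { this = ["Zero", "One", "MoreThanOne"] }
}

module LayerConfig implements DataFlow::StateConfigSig {
  class FlowState = LayerState;

  predicate isSource(DataFlow::Node source, FlowState state) {
    // Assumption: the object is the first parameter of a `test*` function.
    exists(Function func, Parameter param |
      func.getArg(0) = param and
      source.asExpr() = param and
      func.getName().matches("test%")
    ) and
    state = "Zero"
  }

  predicate isSink(DataFlow::Node sink, FlowState state) {
    exists(Return ret |
      ret.getValue() = sink.asExpr() and
      ret.getScope().(Function).getName().matches("test%")
    ) and
    state = ["One", "MoreThanOne"]
  }

  predicate isAdditionalFlowStep(
    DataFlow::Node fromNode, FlowState fromState, DataFlow::Node toNode, FlowState toState
  ) {
    // Step from the qualifier to the call result, but only at `get` calls
    // that stage 1 found to have a source key as argument.
    exists(DataFlow::MethodCallNode call |
      isGetWithSourceKey(call) and
      fromNode = call.getObject() and
      toNode = call
    |
      fromState = "Zero" and toState = "One"
      or
      fromState = ["One", "MoreThanOne"] and toState = "MoreThanOne"
    )
  }
}

module LayerFlow = DataFlow::GlobalWithState<LayerConfig>;

from LayerFlow::PathNode source, LayerFlow::PathNode sink
where LayerFlow::flowPath(source, sink)
select source.getNode(), sink.getNode(), sink.getState()
```

In this sketch, the key's flow is decided entirely in stage 1, so each additional step in stage 2 refers to exactly one get call and only needs the state of the qualifier. A call such as obj.get(key2), whose argument is not reached from a key source, contributes no additional step. As noted earlier in the thread, a loop body can still be traversed more than once, so both One and MoreThanOne may reach the sink for looped code.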
