[Python] Select existing path node (with flow state) in the isAdditionalFlowState predicate #18853

jackfromeast started this conversation in General
I am trying to track how many times the get operation is performed on the returned object. However, my current CodeQL query cannot distinguish between different numbers of get operations.

The functions below demonstrate my expected flow states:

    def test1(obj, key):        # Source: Key, SourceKeyFlowState
      for _ in range(2):
        obj = obj.get(key)      # if key has SourceKeyFlowState, obj.get(key) -> ObjectLayerOneFlowState
                                # if key has SourceKeyFlowState and obj has ObjectLayerOneFlowState, obj.get(key) -> ObjectLayerTwoFlowState
      return obj                # Sink: obj, ObjectLayerTwoFlowState

    def test2(obj, key):        # Source: Key, SourceKeyFlowState
      for _ in range(1):
        obj = obj.get(key)      # if key has SourceKeyFlowState, obj.get(key) -> ObjectLayerOneFlowState
                                # if key has SourceKeyFlowState and obj has ObjectLayerOneFlowState, obj.get(key) -> ObjectLayerTwoFlowState
      return obj                # Sink: obj, ObjectLayerOneFlowState

Expected Taint Flows:

  • test1: SourceKeyFlowState -> ObjectFlowState-MoreThanOne
  • test2: SourceKeyFlowState -> ObjectFlowState-One

However, my current CodeQL query does not distinguish between these cases and instead reports all four possible flows (both target states for both functions).

    import python
    import semmle.python.ApiGraphs
    import codeql.dataflow.DataFlow
    import semmle.python.dataflow.new.DataFlow
    import semmle.python.dataflow.new.TaintTracking
    import semmle.python.dataflow.new.internal.DataFlowPublic
    import semmle.python.dataflow.new.internal.TaintTrackingPublic

    module TempTest {
      abstract class FlowState extends string {
        bindingset[this]
        FlowState() { any() }
      }

      class SourceKeyFlowState extends TempTest::FlowState {
        SourceKeyFlowState() { this = "SourceKeyFlowState" }
      }

      class Occurrence extends string {
        Occurrence() {
          this = "One" or
          this = "MoreThanOne"
        }
      }

      class ObjectFlowState extends TempTest::FlowState {
        Occurrence occur;

        ObjectFlowState() { this = "ObjectFlowState" + "-" + occur }
      }

      module TrackingGetOperationConfiguration implements DataFlow::StateConfigSig {
        class FlowState = TempTest::FlowState;

        predicate isSource(DataFlow::Node source, FlowState state) {
          exists(Function func, Parameter param |
            func.getArg(1) = param and
            source.asExpr() = param and
            func.getName().matches("test%")
          ) and
          (
            state instanceof SourceKeyFlowState
          )
        }

        predicate isSink(DataFlow::Node sink, FlowState state) {
          exists(Function func, Return ret |
            ret.getScope() = func.getEvaluatingScope() and
            ret.contains(sink.asExpr()) and
            func.getName().matches("test%")
          ) and
          (
            state instanceof ObjectFlowState
          )
        }

        predicate isAdditionalFlowStep(DataFlow::Node fromNode, FlowState fromState, DataFlow::Node toNode, FlowState toState) {
          exists(MethodCallNode callNode, Call call |
            callNode.asExpr() = call and
            callNode.getMethodName() = "get" and
            callNode.asExpr() = toNode.asExpr() and
            call.getArg(0) = fromNode.asExpr()
          ) and
          fromState instanceof SourceKeyFlowState and
          toState instanceof ObjectFlowState
        }
      }

      module TrackingGetOperationFlow = DataFlow::GlobalWithState<TrackingGetOperationConfiguration>;

      module Flow = TrackingGetOperationFlow; // For shortening the name

      predicate run(Flow::PathNode source, Flow::PathNode sink, FlowState state) {
        Flow::flowPath(source, sink) and
        sink.getState() = state
      }
    }

I think the key problem is that inside the isAdditionalFlowStep predicate I cannot select any path node other than fromNode and toNode, so I cannot use a third node's flow state to determine the flow state of toNode. In my case, the taint propagation step should take into account both the flow state of the key (fromNode) and the flow state of the base object to correctly determine the flow state of toNode.

Could anyone provide suggestions on how to fix this issue? Any insights would be greatly appreciated!


Replies: 3 comments 1 reply


I have the same question!


You cannot inspect flow to both obj and key at the same time in order to determine the output state at obj.get(key) in that way. This is because the calculated flow is a path that goes from one node to another, one step at a time. However, you can do something close that might actually get you what you want by using two additional steps: one from key to the output that updates the state from SourceKeyFlowState to ObjectFlowState-One, and one from the qualifier obj to the output that updates the state from ObjectFlowState-One to ObjectFlowState-MoreThanOne. Your current step definition mostly achieves the former, except that you've given the resulting state as toState instanceof ObjectFlowState, which includes both One and MoreThanOne. You probably want to change that to toState = "ObjectFlowState-One", and then supplement it with the other step. The two steps might look something like this:

    predicate isAdditionalFlowStep(DataFlow::Node fromNode, FlowState fromState, DataFlow::Node toNode, FlowState toState) {
      exists(MethodCallNode callNode, Call call |
        callNode.asExpr() = call and
        callNode.getMethodName() = "get" and
        callNode.asExpr() = toNode.asExpr()
      |
        call.getArg(0) = fromNode.asExpr() and
        fromState instanceof SourceKeyFlowState and
        toState = "ObjectFlowState-One"
        or
        call.getQualifier() = fromNode.asExpr() and
        fromState = "ObjectFlowState-One" and
        toState = "ObjectFlowState-MoreThanOne"
      )
    }

Note that you cannot expect the out-of-the-box analysis to distinguish for _ in range(1): and for _ in range(2):. Determining the number of times a loop is executed in general is halting-problem territory.
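To make the one-step-at-a-time point concrete, here is a small Python toy model of the two state updates described above. It is not CodeQL and does not use any CodeQL API; the names step and run_path and the "arg" / "qualifier" role labels are hypothetical, chosen only for illustration. Each step sees a single incoming node and its state, which is why a step can never consult the key and the qualifier together.

```python
# Toy model (not CodeQL): a path carries exactly one state, and each
# obj.get(key) step is entered through exactly one input node.
SOURCE_KEY = "SourceKeyFlowState"
ONE = "ObjectFlowState-One"
MORE_THAN_ONE = "ObjectFlowState-MoreThanOne"


def step(from_role, from_state):
    """Return the state after one get() step, or None if no step applies.

    from_role is "arg" when the path enters through the key argument and
    "qualifier" when it enters through the receiver object (hypothetical labels).
    """
    if from_role == "arg" and from_state == SOURCE_KEY:
        return ONE
    if from_role == "qualifier" and from_state == ONE:
        return MORE_THAN_ONE
    return None


def run_path(roles, state=SOURCE_KEY):
    """Apply the steps in order; give up as soon as one step is rejected."""
    for role in roles:
        state = step(role, state)
        if state is None:
            return None
    return state


print(run_path(["arg"]))               # key -> first get()
print(run_path(["arg", "qualifier"]))  # key -> first get() -> second get()
print(run_path(["qualifier"]))         # no step: the qualifier has no One state yet
```

In this model, as in the two-step definition above, a path that is already in the MoreThanOne state has no further step through a qualifier, so a third get() would end the path unless a MoreThanOne to MoreThanOne transition were added.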


Hi @aschackmull! Thank you for your answer!

However, I’m still trying to differentiate between the following two cases:

    def test1(obj, key):        # Source: Key, SourceKeyFlowState
      for _ in range(2):
        obj = obj.get(key)      # if key has SourceKeyFlowState, obj.get(key) -> ObjectLayerOneFlowState
                                # if key has SourceKeyFlowState and obj has ObjectLayerOneFlowState, obj.get(key) -> ObjectLayerTwoFlowState
      return obj                # Sink: obj, ObjectLayerTwoFlowState

    def test3(obj, key):        # Source: Key, SourceKeyFlowState
      key2 = "x"                # Not Source Key
      for _ in range(1):
        obj = obj.get(key)      # if key has SourceKeyFlowState, obj.get(key) -> ObjectLayerOneFlowState
                                # if key has SourceKeyFlowState and obj has ObjectLayerOneFlowState, obj.get(key) -> ObjectLayerTwoFlowState
        obj = obj.get(key2)     # Should not propagate the taint flow as key2 is not tainted with SourceKeyFlowState
      return obj                # Sink: obj, ObjectLayerOneFlowState

If I add two additional taint flow steps for the get operation (one propagating from key to val, and the other from obj to val), I cannot be sure that both steps refer to the same get operation. Also, please ignore the loop iteration counts; they are only there for demonstration.

If this (propagating taint flow based on the flow states of multiple nodes in isAdditionalFlowStep) is challenging for CodeQL to achieve, do you think it could be supported in the future?

@aschackmull
It's perfectly doable, you just need a slightly different setup. Let's consider your step here:

    obj = obj.get(key)  # if key has SourceKeyFlowState and obj has ObjectLayerOneFlowState, obj.get(key) -> ObjectLayerTwoFlowState

If you're just tracking flow via a single path, then you're stepping either from obj or from key, so you can't rely on state propagation from both. However, you could do an initial flow calculation that only deals with the propagation of source keys. Once that's done, you can restrict the additional steps in your second flow calculation to those qualifiers in obj.get(key) for which key has flow in the first flow calculation. You can use FlowState in your second flow definition to track One vs MoreThanOne, but you probably don't need FlowState in the first: there you just need to find those get calls for which the argument comes from a key source.
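The two-stage idea can be sketched with a small Python toy model. This is not CodeQL, and it makes several simplifying assumptions that are not from the thread: a function body is reduced to the ordered list of key variables passed to obj.get(...), "the key comes from a source" is reduced to set membership rather than a real flow computation, and a get call whose key is not from a source is assumed to end the object flow. The names tainted_get_calls and object_state are hypothetical.

```python
def tainted_get_calls(calls, source_keys):
    """Pass 1: indices of get() calls whose key argument is a source key.

    Stands in for the first, stateless flow calculation (assumption:
    membership in source_keys replaces real data flow).
    """
    return {i for i, key in enumerate(calls) if key in source_keys}


def object_state(calls, source_keys):
    """Pass 2: follow obj through the calls, gating each step on pass 1."""
    allowed = tainted_get_calls(calls, source_keys)
    state = None  # obj carries no layer state before the first get()
    for i in range(len(calls)):
        if i not in allowed:
            return None  # assumption: an ungated get() ends the flow
        if state is None:
            state = "ObjectFlowState-One"
        else:
            state = "ObjectFlowState-MoreThanOne"
    return state


# test1: obj.get(key) twice; test2: once; test3: obj.get(key), then obj.get(key2)
print(object_state(["key", "key"], {"key"}))
print(object_state(["key"], {"key"}))
print(object_state(["key", "key2"], {"key"}))
```

Because the second pass only ever steps from the qualifier, each step refers to exactly one get call, and the first pass answers the separate question of whether that call's key is tainted.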

3 participants
@jackfromeast @aschackmull @superboy-zjc