Customizing library models for C#
You can model the methods and callables that control data flow in any framework or library. This is especially useful for custom frameworks or niche libraries that are not supported by the standard CodeQL libraries.
Beta Notice - Unstable API
Library customization using data extensions is currently in beta and subject to change.
Breaking changes to this format may occur while in beta.
About this article
This article contains reference material about how to define custom models for sources, sinks, and flow summaries for C# dependencies in data extension files.
About data extensions
You can customize analysis by defining models (summaries, sinks, and sources) of your code’s C#/.NET dependencies in data extension files. Each model defines the behavior of one or more elements of your library or framework, such as methods, properties, and callables. When you run dataflow analysis, these models expand the potential sources and sinks tracked by dataflow analysis and improve the precision of results.
Most of the security queries search for paths from a source of untrusted input to a sink that represents a vulnerability. This is known as taint tracking. Each source is a starting point for dataflow analysis to track tainted data and each sink is an end point.
Taint tracking queries also need to know how data can flow through elements that are not included in the source code. These are modeled as summaries. A summary model enables queries to synthesize the flow behavior through elements in dependency code that is not stored in your repository.
Syntax used to define an element in an extension file
Each model of an element is defined using a data extension where each tuple constitutes a model. A data extension file to extend the standard C# queries included with CodeQL is a YAML file with the form:

```yaml
extensions:
  - addsTo:
      pack: codeql/csharp-all
      extensible: <name of extensible predicate>
    data:
      - <tuple1>
      - <tuple2>
      - ...
```
Each YAML file may contain one or more top-level extensions.
- addsTo defines the CodeQL pack name and extensible predicate that the extension is injected into.
- data defines one or more rows of tuples that are injected as values into the extensible predicate. The number of columns and their types must match the definition of the extensible predicate.
Data extensions use union semantics, which means that the tuples of all extensions for a single extensible predicate are combined, duplicates are removed, and all of the remaining tuples are queryable by referencing the extensible predicate.
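The union behavior can be pictured with a small C# sketch. It is a hypothetical illustration, not CodeQL's implementation: it assumes each extension's rows are available as string arrays, reuses the String.Concat summary rows from the examples later in this article, and introduces a helper named Union purely for this purpose.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Two hypothetical data extensions that both add rows to summaryModel.
// Each row is one tuple, written as an array of its column values
// (the YAML boolean False is kept as the string "False" for simplicity).
string[][] extensionA =
{
    new[] { "System", "String", "False", "Concat", "(System.Object,System.Object)", "", "Argument[0]", "ReturnValue", "taint", "manual" },
};
string[][] extensionB =
{
    // Exact duplicate of the row in extensionA.
    new[] { "System", "String", "False", "Concat", "(System.Object,System.Object)", "", "Argument[0]", "ReturnValue", "taint", "manual" },
    new[] { "System", "String", "False", "Concat", "(System.Object,System.Object)", "", "Argument[1]", "ReturnValue", "taint", "manual" },
};

// Union semantics: combine the rows of all extensions and drop duplicates.
// Rows are compared through a joined key; the separator character is
// assumed not to occur inside any value.
List<string[]> Union(params string[][][] extensions) =>
    extensions
        .SelectMany(rows => rows)
        .GroupBy(row => string.Join("\u001f", row))
        .Select(group => group.First())
        .ToList();

List<string[]> combined = Union(extensionA, extensionB);
Console.WriteLine(combined.Count); // 2
foreach (string[] row in combined)
    Console.WriteLine(row[6]);     // Argument[0], then Argument[1]
```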
Publish data extension files in a CodeQL model pack to share
You can group one or more data extension files into a CodeQL model pack and publish it to the GitHub Container Registry. This makes it easy for anyone to download the model pack and use it to extend their analysis. For more information, see “Creating a CodeQL model pack” and “Publishing and using CodeQL packs” in the CodeQL CLI documentation.
Extensible predicates used to create custom models in C#
The CodeQL library for C# analysis exposes the following extensible predicates:
- sourceModel(namespace, type, subtypes, name, signature, ext, output, kind, provenance). This is used to model sources of potentially tainted data. The kind of the sources defined using this predicate determines which threat model they are associated with. Different threat models can be used to customize the sources used in an analysis. For more information, see “Threat models.”
- sinkModel(namespace, type, subtypes, name, signature, ext, input, kind, provenance). This is used to model sinks where tainted data may be used in a way that makes the code vulnerable.
- summaryModel(namespace, type, subtypes, name, signature, ext, input, output, kind, provenance). This is used to model flow through elements.
- neutralModel(namespace, type, name, signature, kind, provenance). This is similar to a summary model but used to model the flow of values that have only a minor impact on the dataflow analysis. Manual neutral models (those with a provenance such as manual or ai-manual) can be used to override generated summary models (those with a provenance such as df-generated), so that the summary model will be ignored. Other than that, neutral models have no effect.
The extensible predicates are populated using the models defined in data extension files.
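The override rule for neutral models described in the list above can be sketched in C# as a filter over summary rows. This is a hypothetical illustration, not CodeQL's implementation: the library MyLib and its methods are invented, rows are reduced to the columns that matter here, a neutral matches a summary on namespace, type, name, and signature only, and only the provenance values named above (manual, ai-manual, df-generated) are recognized.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical summary rows for an invented library, reduced to a few columns.
var summaries = new List<(string Ns, string Type, string Name, string Sig, string Provenance)>
{
    ("MyLib", "Util", "Normalize", "(System.String)", "df-generated"),
    ("MyLib", "Util", "Escape", "(System.String)", "df-generated"),
};

// Hypothetical manual neutral model for one of those callables.
var neutrals = new List<(string Ns, string Type, string Name, string Sig, string Kind, string Provenance)>
{
    ("MyLib", "Util", "Normalize", "(System.String)", "summary", "manual"),
};

// Only the provenance values named in the text are recognized here.
bool IsManual(string provenance) => provenance is "manual" or "ai-manual";
bool IsGenerated(string provenance) => provenance == "df-generated";

// A generated summary is ignored when a manual neutral exists for the same callable.
var effective = summaries
    .Where(s => !(IsGenerated(s.Provenance) && neutrals.Any(n =>
        IsManual(n.Provenance) && n.Ns == s.Ns && n.Type == s.Type && n.Name == s.Name && n.Sig == s.Sig)))
    .ToList();

Console.WriteLine(string.Join(", ", effective.Select(s => s.Name))); // Escape
```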
Examples of custom model definitions
The examples in this section are taken from the standard CodeQL C# query pack published by GitHub. They demonstrate how to add tuples to extend extensible predicates that are used by the standard queries.
Example: Taint sink in the System.Data.SqlClient namespace
This example shows how the C# query pack models the argument of the SqlCommand constructor as a SQL injection sink. This is the constructor of the SqlCommand class, which is located in the System.Data.SqlClient namespace.
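```csharp
using System.Data.SqlClient;

public static class Example
{
    public static void TaintSink(SqlConnection conn, string query)
    {
        // The first argument to this constructor is a SQL injection sink.
        SqlCommand command = new SqlCommand(query, conn);
        // ...
    }
}
```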
We need to add a tuple to the sinkModel(namespace, type, subtypes, name, signature, ext, input, kind, provenance) extensible predicate by updating a data extension file.
```yaml
extensions:
  - addsTo:
      pack: codeql/csharp-all
      extensible: sinkModel
    data:
      - ["System.Data.SqlClient", "SqlCommand", False, "SqlCommand", "(System.String,System.Data.SqlClient.SqlConnection)", "", "Argument[0]", "sql-injection", "manual"]
```
Since we want to add a new sink, we need to add a tuple to the sinkModel extensible predicate. The first five values identify the callable (in this case a constructor) to be modeled as a sink.

- The first value, System.Data.SqlClient, is the namespace name.
- The second value, SqlCommand, is the name of the class (type) that contains the constructor.
- The third value, False, is a flag that indicates whether or not the sink also applies to all overrides of the method.
- The fourth value, SqlCommand, is the method name. Constructors are named after the class.
- The fifth value, (System.String,System.Data.SqlClient.SqlConnection), is the method input type signature. The type names must be fully qualified.
The sixth value should be left empty and is out of scope for this documentation. The remaining values are used to define the access path, the kind, and the provenance (origin) of the sink.

- The seventh value, Argument[0], is the access path to the first argument passed to the method, which means that this is the location of the sink.
- The eighth value, sql-injection, is the kind of the sink. The sink kind is used to define the queries where the sink is in scope; in this case, the SQL injection queries.
- The ninth value, manual, is the provenance of the sink, which is used to identify the origin of the sink.
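Since the first five values identify the callable, a model row can be read as a lookup key plus the data attached to it. The C# sketch below is a hypothetical illustration of that reading, not CodeQL's matching logic: the helper SinkPaths compares namespace, type, name, and signature, ignores the ext column, and does not interpret the subtypes flag.

```csharp
using System;
using System.Linq;

// The sinkModel row from the example above, as a C# tuple.
var sinkRows = new[]
{
    (Ns: "System.Data.SqlClient", Type: "SqlCommand", Subtypes: false, Name: "SqlCommand",
     Sig: "(System.String,System.Data.SqlClient.SqlConnection)", Ext: "",
     Input: "Argument[0]", Kind: "sql-injection", Provenance: "manual"),
};

// Hypothetical helper: return the sink access paths of every row whose
// namespace, type, name, and signature match the called callable.
string[] SinkPaths(string ns, string type, string name, string sig) =>
    sinkRows
        .Where(r => r.Ns == ns && r.Type == type && r.Name == name && r.Sig == sig)
        .Select(r => r.Input)
        .ToArray();

// The modeled constructor matches; a constructor with a different signature does not.
string[] matched = SinkPaths("System.Data.SqlClient", "SqlCommand", "SqlCommand",
    "(System.String,System.Data.SqlClient.SqlConnection)");
string[] unmatched = SinkPaths("System.Data.SqlClient", "SqlCommand", "SqlCommand", "(System.String)");

Console.WriteLine(string.Join(", ", matched)); // Argument[0]
Console.WriteLine(unmatched.Length);           // 0
```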
Example: Taint source from the System.Net.Sockets namespace
This example shows how the C# query pack models the return value from the GetStream method as a remote source. This is the GetStream method in the TcpClient class, which is located in the System.Net.Sockets namespace.
```csharp
using System.Net.Sockets;

public static class Example
{
    public static void Tainted(TcpClient client)
    {
        // The return value of this method is a remote source of taint.
        NetworkStream stream = client.GetStream();
        // ...
    }
}
```
We need to add a tuple to the sourceModel(namespace, type, subtypes, name, signature, ext, output, kind, provenance) extensible predicate by updating a data extension file.
```yaml
extensions:
  - addsTo:
      pack: codeql/csharp-all
      extensible: sourceModel
    data:
      - ["System.Net.Sockets", "TcpClient", False, "GetStream", "()", "", "ReturnValue", "remote", "manual"]
```
Since we are adding a new source, we need to add a tuple to the sourceModel extensible predicate. The first five values identify the callable (in this case a method) to be modeled as a source.

- The first value, System.Net.Sockets, is the namespace name.
- The second value, TcpClient, is the name of the class (type) that contains the source.
- The third value, False, is a flag that indicates whether or not the source also applies to all overrides of the method.
- The fourth value, GetStream, is the method name.
- The fifth value, (), is the method input type signature.
The sixth value should be left empty and is out of scope for this documentation. The remaining values are used to define the access path, the kind, and the provenance (origin) of the source.

- The seventh value, ReturnValue, is the access path to the return value of the method, which means that it is the return value that should be considered a source of tainted input.
- The eighth value, remote, is the kind of the source. The source kind is used to define the threat model where the source is in scope. remote applies to many of the security-related queries, as it means a remote source of untrusted data. For example, the SQL injection query uses remote sources. For more information, see “Threat models.”
- The ninth value, manual, is the provenance of the source, which is used to identify the origin of the source.
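One way to double-check the name and signature values for a model like this is to read them from the compiled library with .NET reflection. The C# sketch below assumes that the signature column is the comma-separated list of fully qualified parameter type names in parentheses, as in the examples in this article; the helper Signature is hypothetical and handles non-generic parameter types only.

```csharp
using System;
using System.Linq;
using System.Net.Sockets;
using System.Reflection;

// Hypothetical helper: format a method's parameters as "(T1,T2,...)"
// using fully qualified type names. Non-generic parameter types only.
string Signature(MethodBase method) =>
    "(" + string.Join(",", method.GetParameters().Select(p => p.ParameterType.FullName)) + ")";

MethodInfo getStream = typeof(TcpClient).GetMethod("GetStream", Type.EmptyTypes)!;
ConstructorInfo ctor = typeof(TcpClient).GetConstructor(new[] { typeof(string), typeof(int) })!;

Console.WriteLine($"{getStream.Name} {Signature(getStream)}"); // GetStream ()
// Reflection names constructors ".ctor"; the model uses the class name instead.
Console.WriteLine($"{ctor.DeclaringType!.Name} {Signature(ctor)}"); // TcpClient (System.String,System.Int32)
```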
Example: Add flow through the Concat method
This example shows how the C# query pack models flow through a method for a simple case. This pattern covers many of the cases where we need to summarize flow through a method that is stored in a library or framework outside the repository.
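```csharp
using System;

public static class Example
{
    // s1 and s2 are typed as object so that the call binds to the
    // String.Concat(Object, Object) overload modeled below; with string
    // arguments, the compiler would pick String.Concat(String, String).
    public static void TaintFlow(object s1, object s2)
    {
        // There is taint flow from s1 and s2 to t.
        string t = String.Concat(s1, s2);
        // ...
    }
}
```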
We need to add tuples to the summaryModel(namespace, type, subtypes, name, signature, ext, input, output, kind, provenance) extensible predicate by updating a data extension file:
```yaml
extensions:
  - addsTo:
      pack: codeql/csharp-all
      extensible: summaryModel
    data:
      - ["System", "String", False, "Concat", "(System.Object,System.Object)", "", "Argument[0]", "ReturnValue", "taint", "manual"]
      - ["System", "String", False, "Concat", "(System.Object,System.Object)", "", "Argument[1]", "ReturnValue", "taint", "manual"]
```
Since we are adding flow through a method, we need to add tuples to the summaryModel extensible predicate. Each tuple defines flow from one argument to the return value. The first row defines flow from the first argument (s1 in the example) to the return value (t in the example) and the second row defines flow from the second argument (s2 in the example) to the return value (t in the example).
The first five values identify the callable (in this case a method) to be modeled as a summary. These are the same for both of the rows above as we are adding two summaries for the same method.

- The first value, System, is the namespace name.
- The second value, String, is the class (type) name.
- The third value, False, is a flag that indicates whether or not the summary also applies to all overrides of the method.
- The fourth value, Concat, is the method name.
- The fifth value, (System.Object,System.Object), is the method input type signature.
The sixth value should be left empty and is out of scope for this documentation. The remaining values are used to define the access path, the kind, and the provenance (origin) of the summary.

- The seventh value is the access path to the input (where data flows from). Argument[0] is the access path to the first argument (s1 in the example) and Argument[1] is the access path to the second argument (s2 in the example).
- The eighth value, ReturnValue, is the access path to the output (where data flows to), which means that the input flows to the return value.
- The ninth value, taint, is the kind of the flow. taint means that taint is propagated through the call.
- The tenth value, manual, is the provenance of the summary, which is used to identify the origin of the summary.
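To see what the two rows mean together, the C# sketch below applies them to the call String.Concat(s1, s2): if the value at an input access path is tainted, the value at the output access path becomes tainted. This is a simplified, hypothetical model of a single call site (the helper TaintedOutputs is invented for illustration), not how CodeQL's dataflow library evaluates summaries.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The two summaryModel rows for String.Concat, reduced to (input, output, kind).
var concatSummaries = new[]
{
    (Input: "Argument[0]", Output: "ReturnValue", Kind: "taint"),
    (Input: "Argument[1]", Output: "ReturnValue", Kind: "taint"),
};

// Hypothetical helper: given the access paths that are tainted before the call,
// return the output access paths that are tainted after the call.
HashSet<string> TaintedOutputs(IEnumerable<string> taintedInputs) =>
    concatSummaries
        .Where(s => taintedInputs.Contains(s.Input))
        .Select(s => s.Output)
        .ToHashSet();

// Only s1 (Argument[0]) is tainted, yet the return value t is tainted.
HashSet<string> outputs = TaintedOutputs(new[] { "Argument[0]" });
Console.WriteLine(string.Join(", ", outputs)); // ReturnValue

// With no tainted inputs, nothing flows out.
Console.WriteLine(TaintedOutputs(Array.Empty<string>()).Count); // 0
```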
It would also be possible to merge the two rows into one by using a comma-separated list in the seventh value. This would be useful if the method has many arguments and the flow is the same for all of them.
```yaml
extensions:
  - addsTo:
      pack: codeql/csharp-all
      extensible: summaryModel
    data:
      - ["System", "String", False, "Concat", "(System.Object,System.Object)", "", "Argument[0,1]", "ReturnValue", "taint", "manual"]
```
This row defines flow from both the first and the second argument to the return value. The seventh value, Argument[0,1], is shorthand for specifying an access path to both Argument[0] and Argument[1].
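The shorthand can be read as a simple expansion. The C# sketch below is a hypothetical helper, ExpandArguments, that only handles the comma-separated form used above (for example Argument[0,1]); any other access path, including ones with further components such as .Element, is returned unchanged.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: expand "Argument[0,1]" into "Argument[0]" and "Argument[1]".
// Anything that is not exactly of the form Argument[i,j,...] is returned as-is.
string[] ExpandArguments(string accessPath)
{
    Match m = Regex.Match(accessPath, @"^Argument\[(\d+(?:,\d+)+)\]$");
    if (!m.Success)
        return new[] { accessPath };
    return m.Groups[1].Value
        .Split(',')
        .Select(index => $"Argument[{index}]")
        .ToArray();
}

Console.WriteLine(string.Join(" ", ExpandArguments("Argument[0,1]")));  // Argument[0] Argument[1]
Console.WriteLine(string.Join(" ", ExpandArguments("Argument[this]"))); // Argument[this]
```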
Example: Add flow through the Trim method
This example shows how the C# query pack models flow through a method for a simple case.
```csharp
public static class Example
{
    public static void TaintFlow(string s)
    {
        // There is taint flow from s to t.
        string t = s.Trim();
        // ...
    }
}
```
We need to add a tuple to the summaryModel(namespace, type, subtypes, name, signature, ext, input, output, kind, provenance) extensible predicate by updating a data extension file:
```yaml
extensions:
  - addsTo:
      pack: codeql/csharp-all
      extensible: summaryModel
    data:
      - ["System", "String", False, "Trim", "()", "", "Argument[this]", "ReturnValue", "taint", "manual"]
```
Since we are adding flow through a method, we need to add a tuple to the summaryModel extensible predicate. The tuple defines flow from the qualifier of the method call (s in the example) to the return value (t in the example).
The first five values identify the callable (in this case a method) to be modeled as a summary.

- The first value, System, is the namespace name.
- The second value, String, is the class (type) name.
- The third value, False, is a flag that indicates whether or not the summary also applies to all overrides of the method.
- The fourth value, Trim, is the method name.
- The fifth value, (), is the method input type signature.
The sixth value should be left empty and is out of scope for this documentation. The remaining values are used to define the access path, the kind, and the provenance (origin) of the summary.

- The seventh value is the access path to the input (where data flows from). Argument[this] is the access path to the qualifier (s in the example).
- The eighth value, ReturnValue, is the access path to the output (where data flows to), which means that the input flows to the return value.
- The ninth value, taint, is the kind of the flow. taint means that taint is propagated through the call.
- The tenth value, manual, is the provenance of the summary, which is used to identify the origin of the summary.
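Whether the input is Argument[this] or a numbered argument depends on whether the method is an instance method or a static method. As a hypothetical illustration, the C# sketch below uses .NET reflection to list the qualifier and positional argument access paths that could appear as the input of a summary for a given method: Trim is an instance method, so its qualifier is Argument[this], whereas String.Concat is static and only has numbered arguments. The helper name InputPaths is an assumption, and the list does not cover nested paths such as .Element.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical helper: the qualifier (for instance methods only) followed by
// one Argument[i] access path per declared parameter.
List<string> InputPaths(MethodInfo method)
{
    var paths = new List<string>();
    if (!method.IsStatic)
        paths.Add("Argument[this]");
    paths.AddRange(method.GetParameters().Select(p => $"Argument[{p.Position}]"));
    return paths;
}

MethodInfo trim = typeof(string).GetMethod("Trim", Type.EmptyTypes)!;
MethodInfo concat = typeof(string).GetMethod("Concat", new[] { typeof(object), typeof(object) })!;

Console.WriteLine(string.Join(", ", InputPaths(trim)));   // Argument[this]
Console.WriteLine(string.Join(", ", InputPaths(concat))); // Argument[0], Argument[1]
```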
Example: Add flow through the Select method
This example shows how the C# query pack models a more complex flow through a method. Here we model flow through higher-order methods and collection types, as well as how to handle extension methods and generics.
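```csharp
using System.Collections.Generic;
using System.Linq;

public static class Example
{
    public static void TaintFlow(IEnumerable<string> stream)
    {
        // Select is an extension method, so stream is passed as its first argument.
        IEnumerable<string> lines = stream.Select(item => item + "\n");
        // ...
    }
}
```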
We need to add tuples to the summaryModel(namespace, type, subtypes, name, signature, ext, input, output, kind, provenance) extensible predicate by updating a data extension file:
```yaml
extensions:
  - addsTo:
      pack: codeql/csharp-all
      extensible: summaryModel
    data:
      - ["System.Linq", "Enumerable", False, "Select<TSource,TResult>", "(System.Collections.Generic.IEnumerable<TSource>,System.Func<TSource,TResult>)", "", "Argument[0].Element", "Argument[1].Parameter[0]", "value", "manual"]
      - ["System.Linq", "Enumerable", False, "Select<TSource,TResult>", "(System.Collections.Generic.IEnumerable<TSource>,System.Func<TSource,TResult>)", "", "Argument[1].ReturnValue", "ReturnValue.Element", "value", "manual"]
```
Since we are adding flow through a method, we need to add tuples to the summaryModel extensible predicate. Each tuple defines part of the flow that comprises the total flow through the Select method. The first five values identify the callable (in this case a method) to be modeled as a summary. These are the same for both of the rows above as we are adding two summaries for the same method.
- The first value, System.Linq, is the namespace name.
- The second value, Enumerable, is the class (type) name.
- The third value, False, is a flag that indicates whether or not the summary also applies to all overrides of the method.
- The fourth value, Select<TSource,TResult>, is the method name, along with the type parameters for the method. The names of the generic type parameters provided in the model must match the names of the generic type parameters in the method signature in the source code.
- The fifth value, (System.Collections.Generic.IEnumerable<TSource>,System.Func<TSource,TResult>), is the method input type signature. The generic type parameters in the signature must match those in the method signature in the source code.
The sixth value should be left empty and is out of scope for this documentation. The remaining values are used to define the access path, the kind, and the provenance (origin) of the summary definition.

The seventh value is the access path to the input (where data flows from). The eighth value is the access path to the output (where data flows to).
For the first row:

- The seventh value is Argument[0].Element, which is the access path to the elements of the first argument, that is, the receiver of the extension method call (the elements of the enumerable stream in the example).
- The eighth value is Argument[1].Parameter[0], which is the access path to the first parameter of the System.Func<TSource,TResult> argument of Select (the lambda parameter item in the example).
For the second row:

- The seventh value is Argument[1].ReturnValue, which is the access path to the return value of the System.Func<TSource,TResult> argument of Select (the return value of the lambda in the example).
- The eighth value is ReturnValue.Element, which is the access path to the elements of the return value of Select (the elements of the enumerable lines in the example).
For the remaining values for both rows:

- The ninth value, value, is the kind of the flow. value means that the value is preserved.
- The tenth value, manual, is the provenance of the summary, which is used to identify the origin of the summary.
That is, the first row specifies that values can flow from the elements of the enumerable passed as the first argument into the first parameter of the function provided to Select. The second row specifies that values can flow from the return value of the function to the elements of the enumerable returned from Select.
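Because the type parameter names in the name and signature values must match the library's own declaration, it can help to read them from the compiled method. The C# sketch below does this with .NET reflection for Enumerable.Select. The formatting rules (backtick arity suffix removed, type arguments in angle brackets, generic parameters written by name) are an assumption inferred from the values in this example, the helper FormatType is hypothetical, and nested generic types are not handled.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical helper: format a type the way the Select model writes it,
// e.g. System.Collections.Generic.IEnumerable<TSource>. Nested types are not handled.
string FormatType(Type t)
{
    if (t.IsGenericParameter)
        return t.Name; // e.g. TSource
    if (t.IsGenericType)
    {
        string definition = t.GetGenericTypeDefinition().FullName!; // e.g. System.Func`2
        string baseName = definition.Substring(0, definition.IndexOf('`'));
        return baseName + "<" + string.Join(",", t.GetGenericArguments().Select(a => FormatType(a))) + ">";
    }
    return t.FullName!;
}

// Pick the Select overload whose selector is Func<TSource,TResult>
// (the other overload takes Func<TSource,Int32,TResult>).
MethodInfo selectMethod = typeof(Enumerable).GetMethods()
    .Single(m => m.Name == "Select"
        && m.GetParameters()[1].ParameterType.GetGenericTypeDefinition() == typeof(Func<,>));

string name = selectMethod.Name + "<" + string.Join(",", selectMethod.GetGenericArguments().Select(a => a.Name)) + ">";
string signature = "(" + string.Join(",", selectMethod.GetParameters().Select(p => FormatType(p.ParameterType))) + ")";

Console.WriteLine(name);      // Select<TSource,TResult>
Console.WriteLine(signature); // (System.Collections.Generic.IEnumerable<TSource>,System.Func<TSource,TResult>)
```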
Example: Add a neutral method
This example shows how we can model a method as being neutral with respect to flow. We will also cover how to model a property by modeling the getter of the Now property of the DateTime class as neutral. A neutral model is used to define that there is no flow through a method.
```csharp
public static class Example
{
    public static void TaintFlow()
    {
        // There is no flow from Now to t.
        System.DateTime t = System.DateTime.Now;
        // ...
    }
}
```
We need to add a tuple to the neutralModel(namespace, type, name, signature, kind, provenance) extensible predicate by updating a data extension file.
```yaml
extensions:
  - addsTo:
      pack: codeql/csharp-all
      extensible: neutralModel
    data:
      - ["System", "DateTime", "get_Now", "()", "summary", "manual"]
```
Since we are adding a neutral model, we need to add a tuple to the neutralModel extensible predicate. The first four values identify the callable (in this case the getter of the Now property) to be modeled as a neutral, the fifth value is the kind, and the sixth value is the provenance (origin) of the neutral.
- The first value, System, is the namespace name.
- The second value, DateTime, is the class (type) name.
- The third value, get_Now, is the method name. Getter and setter methods are named get_<name> and set_<name> respectively.
- The fourth value, (), is the method input type signature.
- The fifth value, summary, is the kind of the neutral.
- The sixth value, manual, is the provenance of the neutral.
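The get_<name> and set_<name> naming convention can be confirmed with .NET reflection, because property accessors are compiled to ordinary methods with these names. This minimal C# sketch only checks the naming convention and the empty parameter list; it does not show how CodeQL itself resolves properties. Environment.CurrentDirectory is used only because it is a standard property that has a setter.

```csharp
using System;
using System.Reflection;

// Property accessors are compiled to methods named get_<name> and set_<name>.
MethodInfo getter = typeof(DateTime).GetProperty("Now")!.GetMethod!;
MethodInfo setter = typeof(Environment).GetProperty("CurrentDirectory")!.SetMethod!;

Console.WriteLine(getter.Name);                   // get_Now
Console.WriteLine(getter.GetParameters().Length); // 0, matching the signature value ()
Console.WriteLine(setter.Name);                   // set_CurrentDirectory
```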
Threat models
Note
Threat models are currently in beta and subject to change. During the beta, threat models are supported only by Java, C#, Python and JavaScript/TypeScript analysis.
A threat model is a named class of dataflow sources that can be enabled or disabled independently. Threat models allow you to control the set of dataflow sources that you want to consider unsafe. For example, one codebase may only consider remote HTTP requests to be tainted, whereas another may also consider data from local files to be unsafe. You can use threat models to ensure that the relevant taint sources are used in a CodeQL analysis.
The kind property of the sourceModel determines which threat model a source is associated with. There are two main categories:

- remote, which represents requests and responses from the network.
- local, which represents data from local files (file), command-line arguments (commandargs), database reads (database), environment variables (environment), standard input (stdin), and Windows registry values (windows-registry). Currently, Windows registry values are used by C# only.
Note that subcategories can be included or excluded separately, so you can specify local without database, or just commandargs and environment without the rest of local.
The less commonly used categories are:

- android, which represents reads from external files in Android (android-external-storage-dir) and parameters of entry-point methods declared in a ContentProvider class (contentprovider). Currently only used by Java/Kotlin.
- database-access-result, which represents a database access. Currently only used by JavaScript.
- file-write, which represents opening a file in write mode. Currently only used by C#.
- reverse-dns, which represents reverse DNS lookups. Currently only used by Java.
- view-component-input, which represents inputs to a React, Vue, or Angular component (also known as “props”). Currently only used by JavaScript/TypeScript.
When running a CodeQL analysis, the remote threat model is included by default. You can optionally include other threat models as appropriate when using the CodeQL CLI and in GitHub code scanning. For more information, see “Analyzing your code with CodeQL queries” and “Customizing your advanced setup for code scanning.”
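The inclusion rules in this section can be pictured as set operations. The C# sketch below is a hypothetical illustration, not the CodeQL CLI's configuration format or implementation: it starts from the default remote threat model, expands local into the subcategories listed above, and assumes that exclusions are applied after inclusions. The helpers Expand and EnabledKinds are invented for this purpose.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Subcategories of "local" as listed in this section.
var localKinds = new[] { "file", "commandargs", "database", "environment", "stdin", "windows-registry" };

// Hypothetical helper: expand a threat model name into source kinds.
IEnumerable<string> Expand(string model) => model == "local" ? localKinds : new[] { model };

// Hypothetical helper: start from the default ("remote"), add the included
// models, then remove the excluded ones.
HashSet<string> EnabledKinds(IEnumerable<string> include, IEnumerable<string> exclude)
{
    var enabled = new HashSet<string> { "remote" };
    enabled.UnionWith(include.SelectMany(m => Expand(m)));
    enabled.ExceptWith(exclude.SelectMany(m => Expand(m)));
    return enabled;
}

var defaults = EnabledKinds(Array.Empty<string>(), Array.Empty<string>());
var localWithoutDatabase = EnabledKinds(new[] { "local" }, new[] { "database" });

Console.WriteLine(string.Join(", ", defaults));                  // remote
Console.WriteLine(localWithoutDatabase.Contains("database"));    // False
Console.WriteLine(localWithoutDatabase.Contains("environment")); // True
```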