CodeQL library for Java and Kotlin
When you’re analyzing a Java/Kotlin program, you can make use of the large collection of classes in the CodeQL library for Java/Kotlin.
About the CodeQL library for Java and Kotlin
There is an extensive library for analyzing CodeQL databases extracted from Java/Kotlin projects. The classes in this library present the data from a database in an object-oriented form and provide abstractions and predicates to help you with common analysis tasks.
The library is implemented as a set of QL modules, that is, files with the extension .qll. The module java.qll imports all the core Java library modules, so you can include the complete library by beginning your query with:
    import java
The rest of this article briefly summarizes the most important classes and predicates provided by this library.
Note
The example queries in this article illustrate the types of results returned by different library classes. The results themselves are not interesting but can be used as the basis for developing a more complex query. The other articles in this section of the help show how you can take a simple query and fine-tune it to find precisely the results you’re interested in.
Summary of the library classes
The most important classes in the standard Java/Kotlin library can be grouped into five main categories:
Classes for representing program elements (such as classes and methods)
Classes for representing AST nodes (such as statements and expressions)
Classes for representing metadata (such as annotations and comments)
Classes for computing metrics (such as cyclomatic complexity and coupling)
Classes for navigating the program’s call graph
We will discuss each of these in turn, briefly describing the most important classes for each category.
Program elements
These classes represent named program elements: packages (Package), compilation units (CompilationUnit), types (Type), methods (Method), constructors (Constructor), and variables (Variable).
Their common superclass is Element, which provides general member predicates for determining the name of a program element and checking whether two elements are nested inside each other.
It’s often convenient to refer to an element that might either be a method or a constructor; the class Callable, which is a common superclass of Method and Constructor, can be used for this purpose.
Types
Class Type has a number of subclasses for representing different kinds of types:
PrimitiveType represents a primitive type, that is, one of boolean, byte, char, double, float, int, long, short; QL also classifies void and <nulltype> (the type of the null literal) as primitive types.
RefType represents a reference (that is, non-primitive) type; it in turn has several subclasses:
  Class represents a Java class.
  Interface represents a Java interface.
  EnumType represents a Java enum type.
  Array represents a Java array type.
For example, the following query finds all variables of type int in the program:
    import java

    from Variable v, PrimitiveType pt
    where pt = v.getType() and
        pt.hasName("int")
    select v
You’re likely to get many results when you run this query because most projects contain many variables of type int.
Reference types are also categorized according to their declaration scope:
TopLevelType represents a reference type declared at the top-level of a compilation unit.
NestedType is a type declared inside another type.
For instance, this query finds all top-level types whose name is not the same as that of their compilation unit:
    import java

    from TopLevelType tl
    where tl.getName() != tl.getCompilationUnit().getName()
    select tl
You will typically see this pattern in the source code of a repository, with many more instances in the files referenced by the source code.
Several more specialized classes are available as well:
TopLevelClass represents a class declared at the top-level of a compilation unit.
NestedClass represents a class declared inside another type, such as:
  A LocalClass, which is a class declared inside a method or constructor.
  An AnonymousClass, which is an anonymous class.
Finally, the library also has a number of singleton classes that wrap frequently used Java standard library classes: TypeObject, TypeCloneable, TypeRuntime, TypeSerializable, TypeString, TypeSystem and TypeClass. Each CodeQL class represents the standard Java class suggested by its name.
As an example, we can write a query that finds all nested classes that directly extend Object:
    import java

    from NestedClass nc
    where nc.getASupertype() instanceof TypeObject
    select nc
You’re likely to get many results when you run this query because many projects include nested classes that extend Object directly.
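To make the different kinds of nested class concrete, here is a small Java sketch; every name in it is hypothetical. Assuming the query above behaves as described, Member, the local class, and the anonymous class all have Object as their direct superclass and so would be candidates for the results, whereas SubMember extends Member and would not be.

```java
// Hypothetical example; all class names are illustrative only.
class Outer {
    // Member class: nested inside Outer, with Object as its implicit superclass.
    static class Member {
    }

    // Also nested, but its direct superclass is Member rather than Object.
    static class SubMember extends Member {
    }

    // Returns the simple names of the direct superclasses of a local class
    // and of an anonymous class, separated by a slash.
    static String describe() {
        // Local class: declared inside a method body.
        class Local {
        }

        // Anonymous class: declared and instantiated in a single expression.
        Object anonymous = new Object() {
        };

        return Local.class.getSuperclass().getSimpleName()
            + "/" + anonymous.getClass().getSuperclass().getSimpleName();
    }
}
```

The describe method is only there so that the superclass relationships can be checked at runtime with plain Java reflection; it plays no part in the CodeQL analysis.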
Generics
There are also several subclasses of Type for dealing with generic types.
A GenericType is either a GenericInterface or a GenericClass. It represents a generic type declaration such as interface java.util.Map from the Java standard library:
    package java.util;

    public interface Map<K, V> {
        int size();

        // ...
    }
Type parameters, such as K and V in this example, are represented by class TypeVariable.
A parameterized instance of a generic type provides a concrete type to instantiate the type parameter with, as in Map<String, File>. Such a type is represented by a ParameterizedType, which is distinct from the GenericType representing the generic type it was instantiated from. To go from a ParameterizedType to its corresponding GenericType, you can use predicate getSourceDeclaration.
For instance, we could use the following query to find all parameterized instances of java.util.Map:
    import java

    from GenericInterface map, ParameterizedType pt
    where map.hasQualifiedName("java.util", "Map") and
        pt.getSourceDeclaration() = map
    select pt
In general, generic types may restrict which types a type parameter can be bound to. For instance, a type of maps from strings to numbers could be declared as follows:
    class StringToNumMap<N extends Number> implements Map<String, N> {
        // ...
    }
This means that a parameterized instance of StringToNumMap can only instantiate type parameter N with type Number or one of its subtypes but not, for example, with File. We say that N is a bounded type parameter, with Number as its upper bound. In QL, a type variable can be queried for its type bound using predicate getATypeBound. The type bounds themselves are represented by class TypeBound, which has a member predicate getType to retrieve the type the variable is bounded by.
As an example, the following query finds all type variables with type bound Number:
    import java

    from TypeVariable tv, TypeBound tb
    where tb = tv.getATypeBound() and
        tb.getType().hasQualifiedName("java.lang", "Number")
    select tv
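As a rough Java-side analogy for what this query looks for, the sketch below declares a hypothetical generic class NumBox whose type parameter is bounded by Number, and then reads that bound back with the standard java.lang.reflect API. This is ordinary Java reflection, not part of the CodeQL library; it is only meant to show what "a type variable with upper bound Number" means in source code.

```java
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;

// Hypothetical generic class whose type parameter N is bounded by Number.
class NumBox<N extends Number> {
    private final N value;

    NumBox(N value) {
        this.value = value;
    }

    double asDouble() {
        // Allowed because every possible N is known to be a Number.
        return value.doubleValue();
    }
}

class BoundDemo {
    // Returns the first declared upper bound of NumBox's only type parameter.
    static Type firstBound() {
        TypeVariable<?> n = NumBox.class.getTypeParameters()[0];
        return n.getBounds()[0];
    }
}
```

A declaration such as NumBox<java.io.File> would be rejected by the compiler, which is the restriction that the bound expresses.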
For dealing with legacy code that is unaware of generics, every generic type has a “raw” version without any type parameters. In the CodeQL libraries, raw types are represented using class RawType, which has the expected subclasses RawClass and RawInterface. Again, there is a predicate getSourceDeclaration for obtaining the corresponding generic type. As an example, we can find variables of (raw) type Map:
    import java

    from Variable v, RawType rt
    where rt = v.getType() and
        rt.getSourceDeclaration().hasQualifiedName("java.util", "Map")
    select v
For example, in the following code snippet this query would find m1, but not m2:
    Map m1 = new HashMap();
    Map<String, String> m2 = new HashMap<String, String>();
Finally, variables can be declared to be of a wildcard type:
    Map<? extends Number, ? super Float> m;
The wildcards ? extends Number and ? super Float are represented by class WildcardTypeAccess. Like type parameters, wildcards may have type bounds. Unlike type parameters, wildcards can have upper bounds (as in ? extends Number), and also lower bounds (as in ? super Float). Class WildcardTypeAccess provides member predicates getUpperBound and getLowerBound to retrieve the upper and lower bounds, respectively.
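The difference between upper and lower bounds can also be observed from Java itself. The following sketch, with a hypothetical class name WildcardDemo, declares the same field as above and inspects its wildcards through the standard java.lang.reflect API. It is an analogy for getUpperBound and getLowerBound, not a use of the CodeQL library.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.WildcardType;
import java.util.Map;

class WildcardDemo {
    // Same declaration as in the text above.
    Map<? extends Number, ? super Float> m;

    // Returns the wildcard used as the type argument at the given index (0 or 1).
    static WildcardType wildcardAt(int index) {
        try {
            Field field = WildcardDemo.class.getDeclaredField("m");
            ParameterizedType type = (ParameterizedType) field.getGenericType();
            return (WildcardType) type.getActualTypeArguments()[index];
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In Java reflection, a wildcard with only a lower bound still reports Object as its implicit upper bound, while a wildcard with only an upper bound reports no lower bounds at all.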
For dealing with generic methods, there are classes GenericMethod, ParameterizedMethod and RawMethod, which are entirely analogous to the like-named classes for representing generic types.
For more information on working with types, see “Types in Java and Kotlin.”
Variables
Class Variable represents a variable in the Java sense, which is either a member field of a class (whether static or not), or a local variable, or a parameter. Consequently, there are three subclasses catering to these special cases:
Field represents a Java field.
LocalVariableDecl represents a local variable.
Parameter represents a parameter of a method or constructor.
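The three kinds of variable can be seen side by side in a few lines of Java. The class below is a hypothetical example; the comments indicate which of the subclasses above each declaration would be expected to correspond to.

```java
// Hypothetical example; names are illustrative only.
class Counter {
    // Static field: expected to be modelled by Field.
    static int instances = 0;

    // Instance field: also expected to be modelled by Field.
    private int total;

    // 'step' is a parameter of the method add: Parameter.
    int add(int step) {
        // 'next' is a local variable: LocalVariableDecl.
        int next = total + step;
        total = next;
        return total;
    }
}
```

All four declarations have type int, so each of them would also be a plausible result for the earlier query that finds variables of type int.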
Abstract syntax tree
Classes in this category represent abstract syntax tree (AST) nodes, that is, statements (class Stmt) and expressions (class Expr). For a full list of expression and statement types available in the standard QL library, see “Abstract syntax tree classes for working with Java and Kotlin programs.”
Both Expr and Stmt provide member predicates for exploring the abstract syntax tree of a program:
Expr.getAChildExpr returns a sub-expression of a given expression.
Stmt.getAChild returns a statement or expression that is nested directly inside a given statement.
Expr.getParent and Stmt.getParent return the parent node of an AST node.
For example, the following query finds all expressions whose parents are return statements:
    import java

    from Expr e
    where e.getParent() instanceof ReturnStmt
    select e
Many projects have examples of return statements with child expressions.
Therefore, if the program contains a return statement return x + y;, this query will return x + y.
As another example, the following query finds statements whose parent is an if statement:
    import java

    from Stmt s
    where s.getParent() instanceof IfStmt
    select s
Many projects have examples of if statements with child statements.
This query will find both then branches and else branches of all if statements in the program.
Finally, here is a query that finds method bodies:
    import java

    from Stmt s
    where s.getParent() instanceof Method
    select s
As these examples show, the parent node of an expression is not always an expression: it may also be a statement, for example, an IfStmt. Similarly, the parent node of a statement is not always a statement: it may also be a method or a constructor. To capture this, the QL Java library provides two abstract classes, ExprParent and StmtParent, the former representing any node that may be the parent node of an expression, and the latter any node that may be the parent node of a statement.
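These parent relationships can be pictured on a small Java method. The sketch below uses hypothetical names, and its comments state which parent each node would be expected to have, on the assumption that the AST mirrors the structure of the source code.

```java
// Hypothetical example; names are illustrative only.
class AstDemo {
    // The body block of clampSum is a statement whose parent is the method itself.
    static int clampSum(int x, int y, int max) {
        // The condition 'x + y > max' is an expression whose parent is the if statement.
        if (x + y > max) {
            // This block is the 'then' branch: a statement whose parent is the if statement.
            return max;
        } else {
            // This block is the 'else' branch, also a child of the if statement.
            // Inside it, 'x + y' is an expression whose parent is the return statement.
            return x + y;
        }
    }
}
```

Under that assumption, the three queries in this section would be expected to report, respectively, the returned expressions max and x + y, the two branch blocks, and the method body.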
For more information on working with AST classes, see the article on overflow-prone comparisons in Java and Kotlin.
Metadata
Java/Kotlin programs have several kinds of metadata, in addition to the program code proper. In particular, there are annotations and Javadoc comments. Since this metadata is interesting both for enhancing code analysis and as an analysis subject in its own right, the QL library defines classes for accessing it.
For annotations, class Annotatable is a superclass of all program elements that can be annotated. This includes packages, reference types, fields, methods, constructors, and local variable declarations. For every such element, its predicate getAnAnnotation allows you to retrieve any annotations the element may have. For example, the following query finds all annotations on constructors:
    import java

    from Constructor c
    select c.getAnAnnotation()
You may see examples where annotations are used to suppress warnings or to mark code as deprecated.
These annotations are represented by class Annotation. An annotation is simply an expression whose type is an AnnotationType. For example, you can amend this query so that it only reports deprecated constructors:
    import java

    from Constructor c, Annotation ann, AnnotationType anntp
    where ann = c.getAnAnnotation() and
        anntp = ann.getType() and
        anntp.hasQualifiedName("java.lang", "Deprecated")
    select ann
Only constructors with the @Deprecated annotation are reported this time.
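For reference, this is the kind of Java source the amended query is aimed at. The class LegacyClient and its helper method are hypothetical; the helper uses standard Java reflection, not CodeQL, to confirm which constructor carries the annotation.

```java
import java.lang.reflect.Constructor;

// Hypothetical example; names are illustrative only.
class LegacyClient {
    // Annotated constructor: the kind of declaration the amended query targets.
    @Deprecated
    LegacyClient() {
    }

    // Unannotated constructor: not expected to appear in the results.
    LegacyClient(String endpoint) {
    }

    // Reports whether the constructor with the given parameter types is marked @Deprecated.
    static boolean isDeprecated(Class<?>... parameterTypes) {
        try {
            Constructor<LegacyClient> c =
                LegacyClient.class.getDeclaredConstructor(parameterTypes);
            return c.isAnnotationPresent(Deprecated.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

Note that the query selects the annotation itself rather than the constructor, so its result for this class would be expected to point at the @Deprecated line.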
For more information on working with annotations, see the article on annotations.
For Javadoc, class Element has a member predicate getDoc that returns a delegate Documentable object, which can then be queried for its attached Javadoc comments. For example, the following query finds Javadoc comments on private fields:
    import java

    from Field f, Javadoc jdoc
    where f.isPrivate() and
        jdoc = f.getDoc().getJavadoc()
    select jdoc
You can see this pattern in many projects.
Class Javadoc represents an entire Javadoc comment as a tree of JavadocElement nodes, which can be traversed using member predicates getAChild and getParent. For instance, you could edit the query so that it finds all @author tags in Javadoc comments on private fields:
    import java

    from Field f, Javadoc jdoc, AuthorTag at
    where f.isPrivate() and
        jdoc = f.getDoc().getJavadoc() and
        at.getParent+() = jdoc
    select at
Note
In the last condition of the where clause we used getParent+ to capture tags that are nested at any depth within the Javadoc comment.
For more information on working with Javadoc, see the article on Javadoc.
Metrics
The standard QL Java library provides extensive support for computing metrics on Java program elements. To avoid overburdening the classes representing those elements with too many member predicates related to metric computations, these predicates are made available on delegate classes instead.
Altogether, there are six such classes: MetricElement, MetricPackage, MetricRefType, MetricField, MetricCallable, and MetricStmt. The corresponding element classes each provide a member predicate getMetrics that can be used to obtain an instance of the delegate class, on which metric computations can then be performed.
For example, the following query finds methods with a cyclomatic complexity greater than 40:
    import java

    from Method m, MetricCallable mc
    where mc = m.getMetrics() and
        mc.getCyclomaticComplexity() > 40
    select m
Most large projects include some methods with a very high cyclomatic complexity. These methods are likely to be difficult to understand and test.
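To give a feel for the scale of that threshold, here is a deliberately small, hypothetical Java method. By the textbook definition (number of decision points plus one) its cyclomatic complexity is 4; the exact counting rules used by getCyclomaticComplexity are not spelled out in this article, so treat the number as an approximation rather than a guaranteed query result.

```java
// Hypothetical example; names are illustrative only.
class ComplexityDemo {
    // Decision points: the for loop, the if, and the else-if.
    // Textbook cyclomatic complexity: 3 decision points + 1 = 4.
    static int classify(int[] values) {
        int score = 0;
        for (int v : values) {
            if (v < 0) {
                score -= 1;
            } else if (v > 100) {
                score += 2;
            }
        }
        return score;
    }
}
```

A method would need roughly ten times as many branches as this one before the query above reported it.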
Call graph
CodeQL databases generated from Java and Kotlin code bases include precomputed information about the program’s call graph, that is, which methods or constructors a given call may dispatch to at runtime.
The class Callable, introduced above, includes both methods and constructors. Call expressions are abstracted using class Call, which includes method calls, new expressions, and explicit constructor calls using this or super.
We can use predicate Call.getCallee to find out which method or constructor a specific call expression refers to. For example, the following query finds all calls to methods called println:
    import java

    from Call c, Method m
    where m = c.getCallee() and
        m.hasName("println")
    select c
Conversely, Callable.getAReference returns a Call that refers to it. So we can find methods and constructors that are never called using this query:
    import java

    from Callable c
    where not exists(c.getAReference())
    select c
Codebases often have many methods that are not called directly, but this is unlikely to be the whole story. To explore this area further, see “Navigating the call graph.”
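The two call graph queries can be related to a few lines of Java. In the hypothetical class below, the println call is the kind of call the first query looks for, and farewell is a method with no call referring to it anywhere in the example, so it would be a plausible result for the second query.

```java
// Hypothetical example; names are illustrative only.
class Greeter {
    // Called from run(), so a Call refers to this method.
    static String greeting(String name) {
        return "Hello, " + name;
    }

    // No call in this example refers to this method.
    static String farewell(String name) {
        return "Goodbye, " + name;
    }

    static void run() {
        // Two calls on this line: one to println and one to greeting.
        System.out.println(greeting("world"));
    }
}
```

In this example run is not called either, which hints at why such results need interpretation: entry points and methods invoked through reflection or by frameworks have no explicit call in the source.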
For more information about callables and calls, see the article on the call graph.