Static Code Analysis

Author: Ryan Dewhurst
Contributor(s): KirstenS, Nick Bloor, Sarah Baso, James Bowie, Ram ch, EvgeniyRyzhkov, Iberiam, Ann.campbell, Ejohn20, Jonathan Marcil, Christina Schelin, Jie Wang, Fabian, Achim, Dirk Wetter, kingthorin

Description

Static Code Analysis (also known as Source Code Analysis) is usually performed as part of a Code Review (also known as white-box testing) and is carried out at the Implementation phase of a Security Development Lifecycle (SDL). Static Code Analysis commonly refers to the running of Static Code Analysis tools that attempt to highlight possible vulnerabilities within ‘static’ (non-running) source code by using techniques such as Taint Analysis and Data Flow Analysis.

Ideally, such tools would automatically find security flaws with a high degree of confidence that what is found is indeed a flaw. However, this is beyond the state of the art for many types of application security flaws. Thus, such tools frequently serve as aids that help an analyst zero in on security-relevant portions of code so they can find flaws more efficiently, rather than as tools that simply find flaws automatically.

Some tools are starting to move into the Integrated Development Environment (IDE). For the types of problems that can be detected during the software development phase itself, this is a powerful place in the development lifecycle to employ such tools, because it gives developers immediate feedback on issues they might be introducing into the code while they write it. This immediate feedback is far more useful than finding the same vulnerabilities much later in the development cycle.

The UK Defence Standard 00-55 requires that Static Code Analysis be used on all ‘safety related software in defence equipment’.[0]

Techniques

There are various techniques for analyzing static source code for potential vulnerabilities, and they may be combined into one solution. These techniques are often derived from compiler technologies.

Data Flow Analysis

Data flow analysis is used to collect run-time (dynamic) information about data in software while it is in a static state (Wögerer, 2005).

There are three common terms used in data flow analysis: basic block (the code), Control Flow Graph (the flow of control between blocks) and Control Flow Path (the path the data takes):

Basic block: A sequence of consecutive instructions where control enters at the beginning of a block, control leaves at the end of a block and the block cannot halt or branch out except at its end (Wögerer, 2005).

Example PHP basic block:

$a = 0;
$b = 1;

if ($a == $b)
{ # start of block
  echo "a and b are the same";
} # end of block
else
{ # start of block
  echo "a and b are different";
} # end of block
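To make the basic-block definition concrete, the sketch below splits a flat instruction list into basic blocks using the classic ‘leader’ rule. Under this rule, the first instruction, every jump target, and every instruction that follows a jump each start a new block. It is a minimal Python illustration, not how any particular tool works. The Instr class, the labels ELSE and END, and the lowered form of the PHP example are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    text: str                    # human-readable instruction
    label: Optional[str] = None  # label other instructions may jump to
    jump: Optional[str] = None   # label this instruction may jump to


def basic_blocks(instrs):
    """Split a flat instruction list into basic blocks using leaders."""
    label_index = {ins.label: i for i, ins in enumerate(instrs) if ins.label}
    leaders = {0}  # the first instruction always starts a block
    for i, ins in enumerate(instrs):
        if ins.jump is not None:
            leaders.add(label_index[ins.jump])  # a jump target starts a block
            if i + 1 < len(instrs):
                leaders.add(i + 1)  # so does the instruction after a jump
    starts = sorted(leaders)
    ends = starts[1:] + [len(instrs)]
    return [instrs[s:e] for s, e in zip(starts, ends)]


# Hypothetical lowered form of the PHP if/else example.
program = [
    Instr("$a = 0"),
    Instr("$b = 1"),
    Instr("if $a != $b goto ELSE", jump="ELSE"),
    Instr('echo "a and b are the same"'),
    Instr("goto END", jump="END"),
    Instr('echo "a and b are different"', label="ELSE"),
    Instr("end of script", label="END"),
]

for n, block in enumerate(basic_blocks(program), start=1):
    print(f"block {n}:", [ins.text for ins in block])
```

This yields four blocks: the assignments together with the conditional jump, each branch of the if/else, and the code that follows. This matches the ‘start of block’ and ‘end of block’ comments in the PHP example.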

Control Flow Graph (CFG)

An abstract graph representation of software by use of nodes that represent basic blocks. A node in a graph represents a block; directed edges are used to represent jumps (paths) from one block to another. If a node only has an exit edge, this is known as an ‘entry’ block; if a node only has an entry edge, this is known as an ‘exit’ block (Wögerer, 2005).

Figure: Example Control Flow Graph; ‘node 1’ represents the entry block and ‘node 6’ represents the exit block.
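The entry/exit definitions and the notion of a control flow path can be sketched in a few lines of Python. The CFG used here is an assumption. It is the four-block graph one would get from the PHP if/else example: B1 branches to B2 or B3, and both fall through to B4. It is not the six-node example graph, whose edges are not described in the text. Paths are enumerated depth-first, and nodes already on the current path are skipped so that loops cannot cause infinite recursion.

```python
def entry_and_exit_blocks(cfg):
    """Return (entry, exit) node lists for a CFG given as {node: [successors]}."""
    nodes = set(cfg) | {s for succs in cfg.values() for s in succs}
    has_incoming = {s for succs in cfg.values() for s in succs}
    has_outgoing = {n for n, succs in cfg.items() if succs}
    # Entry block: only exit (outgoing) edges. Exit block: only entry (incoming) edges.
    entries = sorted(n for n in nodes if n in has_outgoing and n not in has_incoming)
    exits = sorted(n for n in nodes if n in has_incoming and n not in has_outgoing)
    return entries, exits


def control_flow_paths(cfg, start, end, path=None):
    """Yield every acyclic path from start to end, depth-first."""
    path = (path or []) + [start]
    if start == end:
        yield path
        return
    for succ in cfg.get(start, []):
        if succ not in path:  # skip back edges so loops don't recurse forever
            yield from control_flow_paths(cfg, succ, end, path)


# Assumed CFG for the if/else example: B1 = assignments + test,
# B2 = "same" branch, B3 = "different" branch, B4 = code after the if.
cfg = {"B1": ["B2", "B3"], "B2": ["B4"], "B3": ["B4"], "B4": []}

entries, exits = entry_and_exit_blocks(cfg)
print("entry:", entries, "exit:", exits)
for p in control_flow_paths(cfg, entries[0], exits[0]):
    print(" -> ".join(p))
```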

Taint Analysis

Taint Analysis attempts to identify variables that have been ‘tainted’ with user-controllable input and traces them to possibly vulnerable functions, also known as ‘sinks’. If a tainted variable gets passed to a sink without first being sanitized, it is flagged as a vulnerability.

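Some programming languages, such as Perl and Ruby, have taint checking built into them, enabled in certain situations such as accepting data via CGI (Perl's taint mode, for example, is switched on with the -T flag). Ruby deprecated its taint checking in version 2.7 and has since removed it.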
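A toy version of this source-to-sink check is sketched below in Python over a simplified statement list. The source, sanitizer and sink sets are illustrative assumptions; real tools ship much larger, configurable rule sets. The Assign and Call classes are hypothetical stand-ins for a parsed program. Unlike real tools, the sketch does not check that a sanitizer suits the sink it protects. For example, htmlspecialchars() does not make data safe for an SQL query.

```python
from dataclasses import dataclass

SOURCES = {"$_GET", "$_POST", "$_COOKIE"}   # user-controllable input (assumed list)
SANITIZERS = {"htmlspecialchars", "intval"}  # functions assumed to clean data
SINKS = {"echo", "mysqli_query"}            # functions assumed to be dangerous


@dataclass
class Assign:          # target = via(sources...)
    target: str
    sources: list      # variables read on the right-hand side
    via: str = ""      # function applied to them, if any


@dataclass
class Call:            # func(args...), result discarded
    func: str
    args: list


def find_tainted_sinks(statements):
    """Return (line, sink, tainted_args) for every tainted value reaching a sink."""
    tainted = set(SOURCES)
    findings = []
    for line, stmt in enumerate(statements, start=1):
        if isinstance(stmt, Assign):
            reads_taint = any(s in tainted for s in stmt.sources)
            if reads_taint and stmt.via not in SANITIZERS:
                tainted.add(stmt.target)      # taint propagates to the target
            else:
                tainted.discard(stmt.target)  # clean or sanitized value
        elif isinstance(stmt, Call) and stmt.func in SINKS:
            bad = [a for a in stmt.args if a in tainted]
            if bad:
                findings.append((line, stmt.func, bad))
    return findings


program = [
    Assign("$id", ["$_GET"]),                          # $id = $_GET['id'];
    Assign("$safe", ["$id"], via="htmlspecialchars"),  # $safe = htmlspecialchars($id);
    Call("echo", ["$safe"]),                           # echo $safe;  (sanitized)
    Assign("$sql", ["$id"]),                           # $sql = "SELECT ... " . $id;
    Call("mysqli_query", ["$link", "$sql"]),           # mysqli_query($link, $sql);
]

print(find_tainted_sinks(program))
```

Only the query on line 5 is reported, because $sql is derived from $id without passing through a sanitizer.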

Lexical Analysis

Lexical Analysis converts source code syntax into ‘tokens’ of information in an attempt to abstract the source code and make it easier to manipulate (Sotirov, 2005).

Pre-tokenised PHP source code:

<?php $name = "Ryan"; ?>

Post-tokenised PHP source code:

T_OPEN_TAG T_VARIABLE = T_CONSTANT_ENCAPSED_STRING ; T_CLOSE_TAG
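A lexer like the one described can be approximated with an ordered list of regular expressions. The Python sketch below handles only the constructs in the PHP example and borrows PHP's token names. It is not PHP's real lexer, which PHP exposes through token_get_all(). The real lexer also reports whitespace tokens, which this sketch discards. As in PHP, single-character tokens such as = and ; are returned as bare characters rather than named tokens.

```python
import re

# Ordered (token name, regex) pairs: a tiny, assumed subset of PHP's lexer.
TOKEN_SPEC = [
    ("T_OPEN_TAG", r"<\?php\b"),
    ("T_CLOSE_TAG", r"\?>"),
    ("T_VARIABLE", r"\$[A-Za-z_][A-Za-z0-9_]*"),
    ("T_CONSTANT_ENCAPSED_STRING", r'"[^"\\$]*"'),  # double-quoted, no escapes/variables
    ("T_WHITESPACE", r"\s+"),
    ("CHAR", r"[=;]"),                              # single-character tokens
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (token_name, text) tuples and bare single characters."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind = m.lastgroup
        if kind == "CHAR":
            tokens.append(m.group())          # PHP reports these as bare characters
        elif kind != "T_WHITESPACE":          # drop whitespace, as the listing does
            tokens.append((kind, m.group()))
        pos = m.end()
    return tokens


def token_names(tokens):
    """Render tokens the way the post-tokenised listing shows them."""
    return " ".join(t if isinstance(t, str) else t[0] for t in tokens)


print(token_names(tokenize('<?php $name = "Ryan"; ?>')))
```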

Strengths and Weaknesses

Strengths

  • Scales well: can be run on lots of software, and can be run repeatedly (as in nightly builds).
  • Great for the kinds of flaws such tools can find automatically with high confidence, such as buffer overflows and SQL injection flaws.

Weaknesses

  • Many types of security vulnerabilities are very difficult to find automatically, such as authentication problems, access control issues, insecure use of cryptography, etc. The current state of the art only allows such tools to automatically find a relatively small percentage of application security flaws. Tools of this type are getting better, however.
  • High numbers of false positives.
  • Frequently can’t find configuration issues, since they are not represented in the code.
  • Difficult to ‘prove’ that an identified security issue is an actual vulnerability.
  • Many of these tools have difficulty analyzing code that can’t be compiled. Analysts frequently can’t compile code because they don’t have the right libraries, all the compilation instructions, all the code, etc.

Limitations

False Positives

A static code analysis tool will often produce false positive results, where the tool reports a possible vulnerability that in fact is not one. This often occurs because the tool cannot be sure of the integrity and security of data as it flows through the application from input to output.

False positive results might be reported when analysing an application that interacts with closed-source components or external systems, because without the source code it is impossible to trace the flow of data in the external system and hence ensure the integrity and security of the data.

False Negatives

The use of static code analysis tools can also result in false negative results, where vulnerabilities exist but the tool does not report them. This might occur if a new vulnerability is discovered in an external component, or if the analysis tool has no knowledge of the runtime environment and whether it is configured securely.
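Both failure modes can be reproduced with a deliberately naive taint checker. This Python sketch illustrates the idea under assumed names and does not reflect any real tool's behaviour. Here acme_clean() stands for a hypothetical closed-source function that really does sanitize its input, and legacy_exec() for a hypothetical dangerous function missing from the tool's sink list. The checker cannot see inside acme_clean(), so it conservatively keeps the result tainted and reports a false positive on line 3. It does not know that legacy_exec() is a sink, so it stays silent on line 4, producing a false negative.

```python
KNOWN_SANITIZERS = {"htmlspecialchars"}  # functions the tool knows clean data
KNOWN_SINKS = {"echo", "mysqli_query"}   # functions the tool knows are dangerous


def analyze(statements):
    """Each statement is (target, func, args), meaning target = func(*args).

    target is None when the result is discarded (e.g. a call to a sink).
    Returns the line numbers the tool would report.
    """
    tainted = {"$_GET"}  # user-controllable input
    warnings = []
    for line, (target, func, args) in enumerate(statements, start=1):
        arg_tainted = any(a in tainted for a in args)
        if func in KNOWN_SINKS and arg_tainted:
            warnings.append(line)
        if target is not None:
            # Conservative rule: only a *known* sanitizer clears taint. The body
            # of an unknown (e.g. closed-source) function cannot be inspected,
            # so its result is assumed to stay tainted.
            if arg_tainted and func not in KNOWN_SANITIZERS:
                tainted.add(target)
            else:
                tainted.discard(target)
    return warnings


program = [
    ("$name", "trim", ["$_GET"]),        # 1: $name = trim($_GET['name']);
    ("$safe", "acme_clean", ["$name"]),  # 2: hypothetical closed-source sanitizer
    (None, "echo", ["$safe"]),           # 3: reported; false positive if acme_clean sanitizes
    (None, "legacy_exec", ["$name"]),    # 4: hypothetical unknown sink; not reported
]

print(analyze(program))
```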

Important Selection Criteria

  • Requirement: must support your programming language; beyond that, language support is not usually a key factor.
  • Types of vulnerabilities it can detect (the OWASP Top Ten? more?)
  • Does it require a fully buildable set of source?
  • Can it run against binaries instead of source?
  • Can it be integrated into the developer’s IDE?
  • License cost for the tool. (Some are sold per user, per org, per app, or per line of code analyzed. Consulting licenses are frequently different from end-user licenses.)
  • Does it support Object-oriented programming (OOP)?

Examples

RIPS PHP Static Code Analysis Tool


OWASP LAPSE+ Static Code Analysis Tool


Tool Lists

Further Reading

