Code Analysis Methods

Software tools for code analysis help developers write code that has fewer bugs and is more secure. They find problems that are hard for human readers to spot and that produce unpredictable run-time errors. Along with dynamic tests such as unit tests, they’re a valuable part of the development cycle.

Most analysis tools operate on source code. Some tools analyze compiled machine code or VM bytecode instead, but that is a more difficult task, and such a tool usually can’t point to the exact line of source code at fault.

Source code analysis has strong similarities to compilation. It performs lexical and syntactic analysis, converting the source code to tokens and then determining the structure of statements and blocks. Control flow analysis then identifies blocks of code and the possible sequences of execution. This lets the tool consider code paths systematically, including ones the programmer may not have thought of.
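As a rough illustration of these stages, the sketch below uses Python’s standard tokenize and ast modules. The sample function clamp, the SOURCE string, and the helper names are invented for this example, and the flow check is deliberately tiny: it only flags a statement that directly follows a return, raise, break, or continue in the same block. Real analyzers build a full graph of blocks and branches.

```python
import ast
import io
import tokenize

# Hypothetical code under analysis (line 4 can never execute).
SOURCE = '''\
def clamp(value, limit):
    if value > limit:
        return limit
        print("clamped")
    return value
'''

# Layout-only tokens that carry no program text.
LAYOUT = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
          tokenize.DEDENT, tokenize.COMMENT, tokenize.ENDMARKER}

def token_strings(source):
    """Lexical analysis: split the source text into token strings."""
    readline = io.StringIO(source).readline
    return [tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type not in LAYOUT]

# Statements after which control never continues to the next statement.
TERMINATORS = (ast.Return, ast.Raise, ast.Break, ast.Continue)

def unreachable_lines(source):
    """Syntactic analysis, then a minimal flow check on each block."""
    tree = ast.parse(source)
    found = []
    for node in ast.walk(tree):
        for field in ("body", "orelse", "finalbody"):
            block = getattr(node, field, None)
            if not isinstance(block, list):
                continue  # e.g. the body of a lambda is an expression
            for stmt, following in zip(block, block[1:]):
                if isinstance(stmt, TERMINATORS):
                    found.append(following.lineno)
                    break  # report only the first dead statement per block
    return sorted(found)

print(token_strings(SOURCE)[:8])
print(unreachable_lines(SOURCE))
```

The first call shows the token stream that the parser consumes. The second reports the line of the print call, which no execution path can reach.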

Data flow analysis provides a more global view of how the code processes units of information. It can trace a variable from its initial declaration and assignment through any subroutines that use or modify it. This can reveal unwarranted assumptions about its state through its life cycle.
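One simple form of this is checking whether a variable has been assigned on every path before it is read. The sketch below is a minimal illustration, not how any particular tool works. It assumes a single function containing only plain assignments and if/else statements, and the names describe, report_reads, check_block, and possibly_unassigned are invented for the example.

```python
import ast

# Hypothetical code under analysis: `label` is set on only one branch.
SOURCE = '''\
def describe(n):
    if n > 0:
        label = "positive"
    return label
'''

def report_reads(node, assigned, local_names, warnings):
    """Record reads of local names that are not assigned on every path so far."""
    for child in ast.walk(node):
        if isinstance(child, ast.Name) and isinstance(child.ctx, ast.Load):
            if child.id in local_names and child.id not in assigned:
                warnings.append((child.id, child.lineno))

def check_block(stmts, assigned, local_names, warnings):
    """Process statements in order; return the names definitely assigned afterwards."""
    assigned = set(assigned)
    for stmt in stmts:
        if isinstance(stmt, ast.If):
            report_reads(stmt.test, assigned, local_names, warnings)
            then_state = check_block(stmt.body, assigned, local_names, warnings)
            else_state = check_block(stmt.orelse, assigned, local_names, warnings)
            # Only names assigned on BOTH branches are safe to read later.
            assigned = then_state & else_state
        elif isinstance(stmt, ast.Assign):
            report_reads(stmt.value, assigned, local_names, warnings)
            assigned.update(t.id for t in stmt.targets if isinstance(t, ast.Name))
        else:
            report_reads(stmt, assigned, local_names, warnings)
    return assigned

def possibly_unassigned(source):
    """Return (name, line) pairs for reads that may happen before assignment."""
    func = ast.parse(source).body[0]
    params = {arg.arg for arg in func.args.args}
    local_names = params | {
        n.id for n in ast.walk(func)
        if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)
    }
    warnings = []
    check_block(func.body, params, local_names, warnings)
    return warnings

print(possibly_unassigned(SOURCE))
```

The author of describe assumed that label always has a value by the time of the return. The analysis shows that this holds only when n is positive.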

Enhancing security

Many security issues result from software bugs. The developer may not have considered unusual data values or the possibility of rare errors. This is a special concern when the code receives uncontrolled inputs from an untrusted source. Tracking how such untrusted data moves through the code is called taint analysis. It can verify, for example, that user form inputs are sanitized and malformed dates are caught before the data reaches a sensitive operation.
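A toy version of taint tracking can be sketched in a few lines. Everything named here is an assumption made for illustration: input stands in for an untrusted source, while sanitize and run_query are hypothetical functions representing a cleaning step and a sensitive operation. The sketch handles only straight-line, top-level assignments; a real tool follows data through branches, loops, and function calls.

```python
import ast

SOURCES = {"input"}        # calls whose result is untrusted
SANITIZERS = {"sanitize"}  # hypothetical cleaning function
SINKS = {"run_query"}      # hypothetical sensitive operation

# Hypothetical code under analysis.
EXAMPLE = '''\
name = input()
query = "SELECT * FROM users WHERE name = " + name
run_query(query)
safe = sanitize(name)
run_query("SELECT * FROM users WHERE name = " + safe)
'''

def call_name(node):
    """Return the called function's name for simple calls, else None."""
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        return node.func.id
    return None

def is_tainted(expr, tainted):
    """An expression is tainted if it uses a source or a tainted variable."""
    name = call_name(expr)
    if name in SANITIZERS:
        return False
    if name in SOURCES:
        return True
    if isinstance(expr, ast.Name):
        return expr.id in tainted
    return any(is_tainted(child, tainted) for child in ast.iter_child_nodes(expr))

def find_tainted_sinks(source):
    """Return the line numbers where tainted data reaches a sink."""
    tainted = set()
    findings = []
    for stmt in ast.parse(source).body:
        # Check sinks first, using the taint state before this statement.
        for node in ast.walk(stmt):
            if call_name(node) in SINKS and any(
                is_tainted(arg, tainted) for arg in node.args
            ):
                findings.append(node.lineno)
        # Then propagate taint through a simple assignment.
        if isinstance(stmt, ast.Assign) and isinstance(stmt.targets[0], ast.Name):
            target = stmt.targets[0].id
            if is_tainted(stmt.value, tainted):
                tainted.add(target)
            else:
                tainted.discard(target)
    return findings

print(find_tainted_sinks(EXAMPLE))
```

The first run_query call is reported because its argument was built from unsanitized input. The second is not, because the data passed through sanitize first.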

Other code flaws can affect security as well. Failure to handle error conditions properly can inadvertently expose information. Analysis tools will catch many of these cases.

C and C++ are often used for low-level code because they compile to very fast machine code. That speed, though, comes in part from the lack of built-in safety checks, so such code is prone to buffer overflows and other failures to handle bad inputs. Code analysis is especially important here.

Memory use

With programming languages that make developers do their own memory management, failure to release allocated memory is a common error. This results in a memory leak which might take hours or days to have a visible effect. When an out-of-memory condition occurs, the code will usually stop dead, with few clues about the problem’s source. Data flow analysis can identify cases where memory might not be released.
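The idea can be sketched on a hand-written control flow graph. The graph below is a hypothetical model of a C-like function that allocates a buffer, returns early on an error without freeing it, and otherwise processes and frees it. The block names, the "alloc"/"free" operation strings, and the leaking_paths helper are all invented for illustration. The sketch enumerates every path, which works only for graphs without loops; production tools generally use more scalable techniques.

```python
# Hypothetical control flow graph: block name -> (operations, successor blocks).
# It models code along these lines:
#     buf = malloc(n);
#     if (read_failed) return -1;      /* forgets free(buf) */
#     process(buf); free(buf); return 0;
CFG = {
    "entry":        (["alloc buf"], ["check"]),
    "check":        ([], ["early_return", "process"]),
    "early_return": ([], ["exit"]),
    "process":      (["free buf"], ["exit"]),
    "exit":         ([], []),
}

def leaking_paths(cfg, start="entry"):
    """Return (path, unreleased names) for each path that ends with live allocations."""
    leaks = []
    # Each stack entry: current block, path taken so far, allocations still live.
    stack = [(start, [start], frozenset())]
    while stack:
        block, path, live = stack.pop()
        operations, successors = cfg[block]
        for operation in operations:
            action, name = operation.split()
            if action == "alloc":
                live = live | {name}
            elif action == "free":
                live = live - {name}
        if not successors and live:
            leaks.append((path, sorted(live)))
        for successor in successors:
            stack.append((successor, path + [successor], live))
    return leaks

for path, names in leaking_paths(CFG):
    print(" -> ".join(path), "leaks", names)
```

Only the path through early_return is reported, which matches the kind of leak that is easy to miss when reading the code: the normal path is correct, and the error path is rarely exercised.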

Using code analysis in development

The best way to use code analysis is to make it an automated part of the testing and release process. If it doesn’t happen automatically, coders will start skipping over it.
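One way to automate this is a small gate script that the build or release pipeline runs before anything ships. The sketch below is only an outline under stated assumptions: run_gate is an invented name, and the command list is whatever analyzers a project actually uses. It relies on the common convention that a tool exits with a nonzero status when it has something to report.

```python
import subprocess
import sys

def run_gate(commands):
    """Run each analysis command; return the ones that exited with a nonzero status."""
    failed = []
    for command in commands:
        completed = subprocess.run(command)
        if completed.returncode != 0:
            failed.append(command)
    return failed

# Placeholder command list. The byte-compile check below is only a stand-in
# that catches syntax errors; substitute the project's real analyzers.
ANALYZERS = [
    [sys.executable, "-m", "compileall", "-q", "."],
]
```

A release script could then call sys.exit(1) whenever run_gate(ANALYZERS) returns a non-empty list, so that a failed analysis stops the release instead of depending on someone remembering to check.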

Most code analysis tools issue a lot of false positives: warnings about a possible problem where none actually exists. With the current state of the art, a tool can’t catch the majority of real problems without also reporting some that don’t exist. That’s why many developers would rather avoid code analysis.

What code analysis actually reports is a violation of one of its rules, which may or may not indicate a defect. The analyzer itself could be buggy, reporting a violation where there isn’t one. Or it might insist on a rule in a case where following it is impractical and the surrounding code already prevents anything bad from happening.

With most tools, it’s possible to annotate the code so that the analysis will suppress reporting of a particular violation. This is a dangerous course unless it’s used very conservatively. Coders may start annotating out all the messages they don’t like, leaving uncaught bugs in the code.

Code analysis should be considered not just a way of finding existing bugs, but of making sure that the code isn’t bug-prone. Questionable practices may not actually result in a defect, but they make it easier to introduce a bug when changing another part of the code. Fixing all reported problems results in better compliance with a common standard. When it’s possible and reasonable, developers should fix code that gets warnings.

Annotations that eliminate warning messages are most helpful when they make assumptions explicit. For example, if an annotation declares that a function call won’t change the contents of an array, it reminds the developer not to violate that assumption.
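Python’s type hints offer a comparable example. In the sketch below, the parameter is annotated as a Sequence, an interface that declares no mutating methods, so the signature itself states that the function only reads the data. The function largest and the variable readings are invented for illustration, and the checking is done by an external type checker such as mypy, not by Python at run time.

```python
from collections.abc import Sequence

def largest(values: Sequence[int]) -> int:
    """Return the largest value without modifying the caller's data."""
    best = values[0]
    for value in values[1:]:
        if value > best:
            best = value
    # A call such as values.sort() here would be reported by a type checker,
    # because Sequence declares no sort method.
    return best

readings = [3, 9, 4]
print(largest(readings))
```

The annotation works in both directions. It tells callers that their list is safe to pass in, and it warns a later maintainer who tries to sort or modify the values in place.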

What code analysis won’t do

Developers should never assume that simply because code analysis is giving them no warnings, the code is bug-free. It can’t detect all possible problems. Some areas where static analysis isn’t very useful include:

  • Compiler bugs. If the compiler introduces a defective optimization, analysis of the source code can’t discover that.
  • Problems with third-party libraries. If the source code isn’t available, then tools that only do source analysis can’t tell anything about them. If it is available, or if the tool can analyze binary code, the value of analysis is still limited. The library may be full of alleged problems, but developers seldom want to make changes to third-party code.
  • Configuration issues. The analysis tool can’t detect errors in the runtime configuration.
  • Anything for which there isn’t a rule. There are more ways to go wrong than writers of tools can think of, and some rules are difficult to reduce to algorithms. An analysis tool will catch only what it’s explicitly designed to catch.
  • Mind-reading. The analysis can’t tell what the developer actually wanted to do.

Unit testing and code review by humans complement static code analysis. Each one will find a different set of issues. Unit tests check whether the code works as intended. Code reviews will provide feedback on inadequate comments and cryptic variable names, which can lead to maintenance problems later on.
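A short example shows the gap that unit tests fill. Both functions below are well-formed and would give a rule-based analyzer little to complain about, but only one does what the developer meant. The function names and the prices are invented for illustration.

```python
import unittest

def apply_discount(price_cents: int, percent: int) -> int:
    """Return the price after taking `percent` off."""
    return price_cents * (100 - percent) // 100

def apply_discount_buggy(price_cents: int, percent: int) -> int:
    """Intended to do the same, but returns the discount amount instead."""
    return price_cents * percent // 100

class ApplyDiscountTest(unittest.TestCase):
    def test_ten_percent_off(self):
        self.assertEqual(apply_discount(2000, 10), 1800)

    def test_zero_percent_changes_nothing(self):
        self.assertEqual(apply_discount(2000, 0), 2000)

    def test_buggy_version_disagrees(self):
        # Same types, same structure, wrong answer: only a test notices.
        self.assertNotEqual(apply_discount_buggy(2000, 10), 1800)
```

Saved in a file, the tests can be run with Python’s standard runner, python -m unittest. Nothing in the source tells a tool that the buggy version’s result is not the intended one; the expected value in the test is where the developer’s intent is written down.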

Fixing bugs before releasing software is simpler than fixing them afterward, and it means fewer complaints from customers and users. It can also prevent expensive security breaches. Building code analysis into the development and release cycle means releases with fewer bugs.