
Essential Code Analysis Methods Developers Need to Know

January 4th, 2019
Asfand Khan

Software tools for code analysis let developers create code that is less bug-ridden and more secure. They find problems that are hard for human readers to spot and that can produce unpredictable run-time errors. Along with dynamic tests such as unit testing, they’re a valuable part of the development cycle.

Most analysis tools operate on source code. Some tools analyze compiled machine code or VM bytecode instead, but that is a harder task, and such tools usually can’t pinpoint the exact coding error in the source.

Source code analysis is similar to compilation. It begins with lexical and syntactic analysis, converting the source code to tokens and then determining the structure of statements and blocks. Control flow analysis then identifies code blocks and the possible execution sequences between them. This lets the tool consider code paths systematically, including ones the programmer may not have considered.
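To make these stages concrete, here is a minimal Python sketch using the standard `tokenize` and `ast` modules on a small hypothetical function. It lists the first few tokens (lexical analysis), parses the text into a syntax tree (syntactic analysis), and counts the paths through the function's `if` statements. The path counter is deliberately simplified: it handles only `if`/`else` in straight-line code and ignores loops, early returns, and exceptions.

```python
import ast
import io
import tokenize

# Hypothetical function to analyze: two independent branches.
SOURCE = """
def classify(x, y):
    label = "none"
    if x > 0:
        label = "positive"
    if y is None:
        label = label.upper()
    return label
"""


def tokens(src):
    """Lexical analysis: turn source text into (token type, text) pairs,
    skipping whitespace-only tokens such as NEWLINE, INDENT, and DEDENT."""
    stream = tokenize.generate_tokens(io.StringIO(src).readline)
    return [(tokenize.tok_name[t.type], t.string) for t in stream if t.string.strip()]


def count_paths(stmts):
    """Simplified control flow analysis: each `if` multiplies the path count
    by the paths through its body plus the paths through its else branch.
    Loops, early returns, and exceptions are ignored."""
    paths = 1
    for stmt in stmts:
        if isinstance(stmt, ast.If):
            paths *= count_paths(stmt.body) + count_paths(stmt.orelse)
    return paths


tree = ast.parse(SOURCE)       # syntactic analysis: source text -> syntax tree
func = tree.body[0]            # the FunctionDef node for classify()
print(tokens(SOURCE)[:4])      # [('NAME', 'def'), ('NAME', 'classify'), ('OP', '('), ('NAME', 'x')]
print(count_paths(func.body))  # 4
```

Two independent branches already give four paths, which is why many tools merge or summarize paths at join points rather than listing each one.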

Data flow analysis provides a more global view of how the code processes units of information. It can trace a variable from its initial declaration and assignment through any subroutines that use or modify it. This can reveal unwarranted assumptions about its state through its life cycle.
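A minimal sketch of the idea, again in Python with the standard `ast` module: `trace` lists, in source order, where a variable is assigned, read, or handed to another function. The analyzed program is hypothetical (`load`, `price`, and `report` are placeholder names), and the sketch stops at the call boundary. A real data flow analyzer would follow the value into the called function's body and reason about execution order, not just source order.

```python
import ast

# Hypothetical program to analyze.
SOURCE = """
total = 0
items = load()
for item in items:
    total = total + price(item)
report(total)
"""


def trace(src, name):
    """Return (line, event) pairs for every definition or use of `name`."""
    tree = ast.parse(src)
    # Remember which Name nodes are passed directly as call arguments, and to what.
    passed_to = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            for arg in node.args:
                passed_to[id(arg)] = node.func.id
    events = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id == name:
            if isinstance(node.ctx, ast.Store):
                kind = "assigned"
            elif id(node) in passed_to:
                kind = f"passed to {passed_to[id(node)]}()"
            else:
                kind = "read"
            events.append((node.lineno, node.col_offset, kind))
    # Sort by position so the events read top to bottom, left to right.
    return [(line, kind) for line, _, kind in sorted(events)]


print(trace(SOURCE, "total"))
# [(2, 'assigned'), (5, 'assigned'), (5, 'read'), (6, 'passed to report()')]
```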

Enhancing security

Many security issues result from software bugs. The developer may not have considered unusual data values or the possibility of rare errors. This is a special concern when the code receives uncontrolled inputs from an untrusted source. Tracking how such inputs flow through the code, and whether they reach sensitive operations without being checked, is called taint analysis. It can verify, for example, that user-form inputs are sanitized and that malformed dates are caught.
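The core of taint analysis can be sketched as propagation over a program's statements: values from untrusted sources are marked tainted, the mark spreads through any operation that uses them, a sanitizer clears it, and a tainted value reaching a sensitive sink is reported. The statement format and the function names `read_form_field`, `escape_sql`, `run_query`, and `concat` below are hypothetical, and the program is straight-line code; real tools apply the same idea over the full control flow graph.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Stmt:
    """One statement: target = func(*args), or a bare call when target is None."""
    target: str | None
    func: str
    args: tuple[str, ...]


# Hypothetical function names standing in for a real program's APIs.
SOURCES = {"read_form_field"}   # returns untrusted user input
SANITIZERS = {"escape_sql"}     # returns a value safe to use in SQL
SINKS = {"run_query"}           # dangerous if given unsanitized input


def find_taint_violations(program):
    """Return the 1-based positions of sink calls that receive tainted data."""
    tainted = set()
    violations = []
    for pos, s in enumerate(program, start=1):
        uses_taint = any(a in tainted for a in s.args)
        if s.func in SINKS and uses_taint:
            violations.append(pos)
        if s.target is None:
            continue
        if s.func in SOURCES or (uses_taint and s.func not in SANITIZERS):
            tainted.add(s.target)       # taint flows into the assigned variable
        else:
            tainted.discard(s.target)   # overwritten with clean data
    return violations


program = [
    Stmt("name", "read_form_field", ()),
    Stmt("safe", "escape_sql", ("name",)),
    Stmt(None, "run_query", ("safe",)),           # fine: sanitized first
    Stmt("query", "concat", ("prefix", "name")),  # taint spreads to query
    Stmt(None, "run_query", ("query",)),          # flagged
]
print(find_taint_violations(program))  # [5]
```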

Other code flaws can also affect security. Failure to handle error conditions properly can inadvertently expose information, such as internal details in an error message. Analysis tools can catch many of these cases.
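One such rule can be sketched as a small Python `ast` check: flag any `except ... as e:` block that returns a value built from the caught exception, since exception text can reveal file paths, SQL, or other internals to the caller. The rule and the two sample functions are illustrative assumptions, not the behavior of any particular tool. The check only looks at names, so it would miss a leak that first copies `e` into another variable.

```python
import ast

# Two hypothetical handlers: one returns the raw exception text, one does not.
SRC = '''\
def get_user(db, uid):
    try:
        return db.fetch(uid)
    except Exception as e:
        return {"error": str(e)}

def get_user_safely(db, uid, log):
    try:
        return db.fetch(uid)
    except Exception as e:
        log.exception("fetch failed")
        return {"error": "lookup failed"}
'''


def leaking_handlers(src):
    """Return line numbers of `return` statements inside `except ... as name:`
    blocks whose returned expression mentions the caught exception."""
    hits = []
    for handler in ast.walk(ast.parse(src)):
        if not isinstance(handler, ast.ExceptHandler) or handler.name is None:
            continue
        for node in ast.walk(handler):
            if isinstance(node, ast.Return) and node.value is not None:
                names = {n.id for n in ast.walk(node.value) if isinstance(n, ast.Name)}
                if handler.name in names:
                    hits.append(node.lineno)
    return sorted(hits)


print(leaking_handlers(SRC))  # [5]
```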

C and C++ are often used for low-level code because they compile to very fast machine code. Part of that speed comes from the lack of built-in safety checks such as array bounds checking, so code in these languages is prone to buffer overflow vulnerabilities and other failures to handle bad inputs. Code analysis is especially important here.
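The simplest form of this check is lexical: scan C source for calls to library functions that write into a buffer without being told its size. The Python sketch below shows the idea on a hypothetical C function. Because it only matches text, it would also flag a call inside a comment or string and cannot tell whether a particular `strcpy` is actually safe; deeper tools use parsing and data flow to compare buffer sizes with input lengths.

```python
import re

# C library calls that write into a buffer without being told its size.
UNBOUNDED = ("gets", "strcpy", "strcat", "sprintf")
CALL = re.compile(r"\b(" + "|".join(UNBOUNDED) + r")\s*\(")


def scan_c(source):
    """Return (line, function) for each call to an unbounded copy function."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in CALL.finditer(line):
            findings.append((lineno, match.group(1)))
    return findings


# Hypothetical C function under analysis.
C_SOURCE = """#include <stdio.h>
#include <string.h>
void greet(const char *name) {
    char buf[16];
    strcpy(buf, name);                      /* overflows if name is 16+ chars */
    snprintf(buf, sizeof buf, "%s", name);  /* bounded: truncates instead */
}
"""

print(scan_c(C_SOURCE))  # [(5, 'strcpy')]
```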

Memory use

Failure to release allocated memory is a common error in programming languages that require developers to manage their own memory. This results in a memory leak that might take hours or days to become visible. When an out-of-memory condition occurs, the code usually stops dead, with few clues about the problem’s source. Data flow analysis can identify cases where memory might not be released.
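A simplified, path-by-path version of this analysis can be sketched in Python, assuming the function has already been reduced to a small acyclic control flow graph whose nodes record allocations, frees, and returns. The graph models the hypothetical C function in the comment, which forgets to free `p` on its error path. Walking every path and tracking which allocations are still live at each `return` finds exactly that path. Real analyzers must also handle loops, aliasing, and ownership passed to other functions.

```python
# Hypothetical C function modeled by the graph below:
#   p = malloc(n);
#   if (!read_header(p)) return -1;   /* p is not freed on this path */
#   process(p);
#   free(p);
#   return 0;
CFG = {
    "entry":   (("alloc", "p"), ["check"]),
    "check":   (None, ["err_ret", "process"]),
    "err_ret": (("return", None), []),
    "process": (None, ["free"]),
    "free":    (("free", "p"), ["ok_ret"]),
    "ok_ret":  (("return", None), []),
}


def find_leaks(cfg, node="entry", live=frozenset(), path=()):
    """Depth-first walk over every path of an acyclic graph; report
    (path, unreleased pointers) for each return reached with live allocations."""
    action, successors = cfg[node]
    path = path + (node,)
    if action is not None:
        kind, var = action
        if kind == "alloc":
            live = live | {var}
        elif kind == "free":
            live = live - {var}
        elif kind == "return" and live:
            return [(path, sorted(live))]
    leaks = []
    for nxt in successors:
        leaks.extend(find_leaks(cfg, nxt, live, path))
    return leaks


for leak_path, pointers in find_leaks(CFG):
    print(" -> ".join(leak_path), "leaks", pointers)
# entry -> check -> err_ret leaks ['p']
```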

Using code analysis in development

The best way to use code analysis is to make it an automated part of the testing and release process. If it doesn’t happen automatically, coders will start skipping over it.
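One common way to automate it is a build step that fails when the analyzer reports anything new. The Python sketch below assumes findings have already been collected as `(file, rule)` pairs and compares them against a baseline of accepted findings, which lets a team adopt the check on existing code without first fixing every old warning. The file and rule names are hypothetical.

```python
def gate(current, baseline):
    """Return a process exit code: 0 if every finding is already in the
    accepted baseline, 1 if the analyzer reported anything new."""
    new = sorted(set(current) - set(baseline))
    for path, rule in new:
        print(f"NEW FINDING: {path}: {rule}")
    return 1 if new else 0


# Hypothetical findings from an analyzer run.
baseline = [("legacy/parser.c", "strcpy")]
current = [("legacy/parser.c", "strcpy"), ("src/net.c", "memory-leak")]

exit_code = gate(current, baseline)  # prints: NEW FINDING: src/net.c: memory-leak
print("exit code:", exit_code)       # exit code: 1
```

In a CI job, passing the result to `sys.exit` makes the build fail whenever the gate returns 1.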

Most code analysis tools issue many false positives: reports of a problem that isn’t actually there. In practice, a tool can’t catch most real problems without also reporting some that don’t exist. That noise is why many developers would rather avoid code analysis.
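A small Python sketch shows one common source of false positives. The hypothetical C fragment in the comment allocates and frees under the same condition, so it never leaks. A path-insensitive checker that treats the two `if` tests as independent explores four paths, one of which allocates without freeing. A checker that knows both tests read the same unchanged `flag` sees that this path is infeasible. Tracking such correlations everywhere is expensive, which is one reason tools often accept some false positives.

```python
from itertools import product

# Hypothetical C fragment being analyzed:
#   if (flag) p = malloc(n);
#   use_other_data();
#   if (flag) free(p);


def explore(correlated):
    """Return (branch choices, leaked?) for each path the checker considers."""
    results = []
    for allocated, freed in product([True, False], repeat=2):
        if correlated and allocated != freed:
            continue  # infeasible: both tests see the same value of flag
        results.append(((allocated, freed), allocated and not freed))
    return results


naive = explore(correlated=False)
precise = explore(correlated=True)
print(sum(leak for _, leak in naive))    # 1 (a false positive)
print(sum(leak for _, leak in precise))  # 0
```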

What code analysis reports is a violation of one of its rules, which may or may not indicate a defect. The analysis itself could be buggy, reporting a violation where none exists. Or it might insist on a rule that can’t reasonably be followed in a case where the surrounding code already prevents anything bad from happening.

With most tools, it’s possible to annotate code so that the analysis suppresses reporting of a particular violation. This is a dangerous course unless it’s used very conservatively. Coders may start annotating out all the messages they don’t like, leaving uncaught bugs in the code.
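One way to keep suppressions conservative is to honor them only when they name the specific rule and state a reason. The Python sketch below applies that policy to findings given as `(line, rule)` pairs. The `// analysis-ignore[rule]: reason` comment syntax is hypothetical; real tools each have their own, such as clang-tidy's `// NOLINT`.

```python
import re

# Hypothetical suppression syntax: // analysis-ignore[rule]: reason
SUPPRESS = re.compile(r"//\s*analysis-ignore\[(\w+)\](?::\s*(\S.*))?")


def apply_suppressions(findings, source_lines):
    """Drop a (line, rule) finding only if that line suppresses the same rule
    and gives a reason; everything else is still reported."""
    kept = []
    for lineno, rule in findings:
        m = SUPPRESS.search(source_lines[lineno - 1])
        if m and m.group(1) == rule and m.group(2):
            continue  # justified, rule-specific suppression
        kept.append((lineno, rule))
    return kept


C_LINES = [
    'strcpy(dst, "ok");  // analysis-ignore[strcpy]: constant fits in dst',
    "strcpy(dst, user);  // analysis-ignore[strcpy]",
    "gets(line);         // analysis-ignore[strcpy]: wrong rule named",
]
findings = [(1, "strcpy"), (2, "strcpy"), (3, "gets")]
print(apply_suppressions(findings, C_LINES))  # [(2, 'strcpy'), (3, 'gets')]
```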

Code analysis is a way both to find existing bugs and to keep code from becoming bug-prone. Questionable practices may not result in a defect today, but they make it easier to introduce a bug when another part of the code changes. Fixing all reported problems also brings the code into line with a common standard. When it’s possible and reasonable, developers should fix code that gets warnings.
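As an example of a questionable practice rather than a definite bug, consider Python's mutable default arguments: the default list is created once, when the function is defined, and shared by every call that omits the argument. The sketch below uses the standard `ast` module to flag list, dict, and set literals used as defaults. The sample function works as long as every caller passes `tags`, but a future caller relying on the default would see tags accumulate across calls.

```python
import ast

# Hypothetical function that is correct today but fragile.
SRC = "def add_tag(tag, tags=[]):\n    tags.append(tag)\n    return tags\n"


def mutable_defaults(src):
    """Return (function name, line) for each list, dict, or set literal default."""
    hits = []
    for node in ast.walk(ast.parse(src)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # kw_defaults holds None for keyword-only parameters without defaults.
            for default in node.args.defaults + node.args.kw_defaults:
                if isinstance(default, (ast.List, ast.Dict, ast.Set)):
                    hits.append((node.name, default.lineno))
    return hits


print(mutable_defaults(SRC))  # [('add_tag', 1)]
```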

Annotations that eliminate warning messages are most helpful when they make assumptions explicit. For example, if an annotation declares that a function call won’t change the contents of an array, it reminds the developer not to violate that assumption.
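In C, the `const` qualifier on a pointer parameter (`const int *values`) is this kind of annotation, and the compiler rejects direct writes through that pointer. The Python sketch below shows how an analyzer might check a similar promise. It assumes a hypothetical `@readonly("param")` decorator that exists only as a marker for the checker, and reports in-place mutation of the named parameters through common list methods or item assignment. It only recognizes direct uses of the parameter name, so mutation through an alias would go unnoticed.

```python
import ast

# List methods that modify the list in place.
MUTATORS = {"append", "extend", "insert", "pop", "remove", "clear", "sort", "reverse"}

# Hypothetical code using the marker decorator; it is parsed, never executed.
SRC = '''\
@readonly("prices")
def total(prices):
    return sum(prices)

@readonly("prices")
def median(prices):
    prices.sort()
    return prices[len(prices) // 2]
'''


def readonly_params(fn):
    """Names listed in a hypothetical @readonly("a", "b") decorator on fn."""
    names = set()
    for dec in fn.decorator_list:
        if isinstance(dec, ast.Call) and getattr(dec.func, "id", None) == "readonly":
            names |= {a.value for a in dec.args if isinstance(a, ast.Constant)}
    return names


def readonly_violations(src):
    """Report (function, line) wherever a @readonly parameter is mutated in place."""
    problems = []
    for fn in ast.walk(ast.parse(src)):
        if not isinstance(fn, ast.FunctionDef):
            continue
        protected = readonly_params(fn)
        for node in ast.walk(fn):
            # prices.sort(), prices.append(x), ...
            if (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)
                    and node.func.attr in MUTATORS
                    and isinstance(node.func.value, ast.Name)
                    and node.func.value.id in protected):
                problems.append((fn.name, node.lineno))
            # prices[i] = x or prices[i] += x
            elif isinstance(node, ast.Subscript) and isinstance(node.ctx, ast.Store):
                if isinstance(node.value, ast.Name) and node.value.id in protected:
                    problems.append((fn.name, node.lineno))
    return sorted(problems)


print(readonly_violations(SRC))  # [('median', 7)]
```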

What code analysis won’t do

Developers should never assume that the code is bug-free, even if code analysis shows no warnings, because analysis can’t detect all possible problems. Some areas where static analysis isn’t very useful include:

Compiler bugs. If the compiler introduces a defective optimization, source code analysis can’t detect it.
Problems with third-party libraries. If the source code isn’t available, tools that only do source analysis can’t tell anything about them. If it is available or if the tool can analyze binary code, the value of the analysis is still limited. The library may be full of alleged problems, but developers seldom want to change third-party code.
Configuration issues. The analysis tool can’t detect runtime configuration errors.
Anything for which there isn’t a rule. There are more ways to go wrong than writers of tools can think of, and some rules are difficult to reduce to algorithms. An analysis tool will catch only what it’s explicitly designed to catch.
Mind-reading. The analysis can’t tell what the developer wanted to do.

Unit testing and human code review complement static code analysis. Each one will find a different set of issues. Unit tests check whether the code works as intended. Code reviews will provide feedback on inadequate comments and cryptic variable names, which can lead to maintenance problems later on.
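A small Python illustration of the difference: the first function below is syntactically clean and would likely pass typical style and type checks, yet it is wrong, because no analysis rule knows the calendar's century exception. A table-driven unit test that encodes the intended behavior catches it immediately. The test is written as a plain function so it runs without a test framework.

```python
def is_leap_year_buggy(year: int) -> bool:
    # No analysis rule flags this, but it ignores the century exception.
    return year % 4 == 0


def is_leap_year(year: int) -> bool:
    # Gregorian rule: every 4th year, except centuries not divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Expected answers that encode what the developer intended.
CASES = [(2024, True), (2023, False), (1900, False), (2000, True)]


def failing_years(impl):
    """A table-driven unit test: return the years where impl gives the wrong answer."""
    return [year for year, expected in CASES if impl(year) != expected]


print(failing_years(is_leap_year_buggy))  # [1900]
print(failing_years(is_leap_year))        # []
```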

Fixing bugs before releasing software is simpler and results in fewer customer complaints. It can prevent expensive security breaches. Building code analysis into the development and release cycle means fewer bugs in releases.