Tainted Flow Analysis

The root cause of many security breaches is trusting unvalidated input:

Input from the user is considered tainted (possibly controlled by an adversary), i.e. the user is an untrusted source.
Data is used under the assumption that it is untainted (not controlled by an adversary), i.e. sensitive data sinks rely on trusted (untainted) data.

Source locations are the places in the code where data enters the program and may be controlled by the user (or the environment). Such data must be presumed tainted, since it may be used to build injection attacks.

Sink locations are the places in the code where the data consumed must not be tainted.

The goal of Tainted Flow Analysis is to detect tainted data flows:

For all possible sinks, prove that tainted data will never be used where untainted data is expected.
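
As an illustration of a source, a sink, and a tainted flow between them, here is a minimal Java sketch. The class, method, and parameter names are hypothetical and chosen only for this example; a plain `Map` stands in for an HTTP request.

```java
import java.util.Map;

// Illustrative only: all names here are hypothetical.
class TaintedFlowExample {

    // SOURCE: data read from a request may be controlled by the user.
    static String readUserName(Map<String, String> requestParams) {
        return requestParams.get("user");
    }

    // SINK: the query text is expected to be built from untainted data only.
    static String buildQuery(String userName) {
        return "SELECT * FROM accounts WHERE name = '" + userName + "'";
    }

    // Tainted flow: the source value reaches the sink with no neutralization.
    static String vulnerableFlow(Map<String, String> requestParams) {
        String userName = readUserName(requestParams); // tainted value
        return buildQuery(userName);                   // tainted data at the sink
    }
}
```

Calling `vulnerableFlow` with the parameter value `x' OR '1'='1` yields a query whose `WHERE` clause has been rewritten by the input, which is the kind of flow the analysis is meant to report.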

While Data Flow Analysis (DFA) is the general technique for extracting information about the values at each program site, Tainted Flow Analysis (TFA) is a special case of DFA that tracks flows between sources and sinks to check whether non-neutralized external input may reach vulnerable sinks.

Kiuwan implements Tainted Flow Analysis by inferring flows in the source code of your application:

Which sinks are reached by which sources
Whether any flow is illegal, i.e. whether a tainted source may flow to an untainted sink without passing through a “sanitizer”

Tainting Propagation Algorithm: for each sink detected (typically, an argument expression in a call to a sink function), follow the propagation of variable values backwards through the CFG (Control Flow Graph), considering every value that could affect the sink site, until a source site that “taints” any of the candidate variables is reached.

While inferring flows backwards from a sink to a tainted source, Kiuwan detects whether any well-known sanitizer is used along the way. Such flows are dropped, which avoids reporting false vulnerabilities.
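
The backward propagation idea can be sketched in a few lines of Java. This is not Kiuwan's implementation: it is a simplified model that replaces the real control flow graph with a flat list of assignments, and the `Assignment` type and `isTainted` method are hypothetical names introduced for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model: a flat list of assignments stands in for a real CFG.
class BackwardTaintSketch {

    // One propagation step: `target` receives its value from `origin`.
    // `sanitized` is true when the value passes through a known sanitizer.
    static final class Assignment {
        final String target;
        final String origin;
        final boolean sanitized;

        Assignment(String target, String origin, boolean sanitized) {
            this.target = target;
            this.origin = origin;
            this.sanitized = sanitized;
        }
    }

    // Walks backwards from the sink variable and reports whether a source
    // can reach it along a path that crosses no sanitizer.
    static boolean isTainted(String sinkVar, List<Assignment> assignments, Set<String> sources) {
        Deque<String> worklist = new ArrayDeque<>();
        Set<String> visited = new HashSet<>();
        worklist.push(sinkVar);
        while (!worklist.isEmpty()) {
            String current = worklist.pop();
            if (!visited.add(current)) {
                continue; // already explored
            }
            if (sources.contains(current)) {
                return true; // a tainting source was reached
            }
            for (Assignment a : assignments) {
                // Sanitized steps are not followed, so those flows are dropped.
                if (a.target.equals(current) && !a.sanitized) {
                    worklist.push(a.origin);
                }
            }
        }
        return false;
    }
}
```

With the assignments `name <- request` and `query <- name`, and `request` as the only source, `isTainted("query", ...)` returns `true`. Marking the first assignment as sanitized makes it return `false`.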

Data Neutralization Model

Complex subsystems that accept string data which may contain commands or instructions require the input sent to them to be neutralized.

If untrusted input entering the subsystem may result in unexpected execution of commands/actions, an injection security flaw exists. Examples of such subsystems that are candidates for injection attacks are:

Operating system command interpreter
Data repository with SQL engine
XML parser
XPath / XQuery evaluator
LDAP directory service API
Script engines
Regexp compilers (e.g. the preg_replace() PHP function with the /e pattern modifier)

The root cause of most web security flaws:

Too much trust in external input, although an attacker can change any part of an HTTP request message at will: headers (incl. cookies), request URL, body (incl. hidden fields).
No adequate input validation, output sanitization, or canonicalization/normalization.

The first line of defense against application attacks is adequate input validation.

Validation should be positive, “accept only what is known to be good” (whitelist), not negative, “reject what is known to be bad” (blacklist).
Sometimes output escaping is appropriate (e.g. against XSS, but less so against SQLi and other attacks).

Good practice says: “filter on input, escape on output”.
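
A minimal Java sketch of “filter on input, escape on output” follows. The class and method names are hypothetical, and the allowed alphabet and maximum length of a user name are assumptions made only for this example.

```java
import java.util.regex.Pattern;

// Illustrative only: names and the accepted format are assumptions.
class InputOutputDefense {

    // Positive (whitelist) validation: accept only what is known to be good.
    private static final Pattern USER_NAME = Pattern.compile("[A-Za-z0-9_]{1,32}");

    static boolean isValidUserName(String input) {
        return input != null && USER_NAME.matcher(input).matches();
    }

    // Output escaping for an HTML text context.
    static String escapeHtml(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

Validation rejects anything outside the expected domain before it is processed, while escaping is applied at the point where data is written to a specific output context.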

Canonicalization / Normalization
- Canonicalization is the lossless reduction of input to its simplest equivalent known form (for example, resolving .. and . in a pathname to produce a canonical pathname, or applying Unicode canonical equivalence).
- Normalization is the lossy conversion of input data to its simplest form (e.g. converting a text input into one value from a fixed set, removing accents, removing whitespace, stop words and punctuation characters, lower-/upper-casing…).
Sanitization
- Ensuring that data conforms to the requirements of the subsystem to which it is passed, including security requirements related to data leakage or sensitive data exposure across a trust boundary. This may include removing unwanted characters, escaping metacharacters, etc.
Validation
- Ensuring that input falls within the expected domain of valid program input: type/numeric range requirements, input invariants…
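
The difference between these terms can be illustrated with a short Java sketch. The class and method names are hypothetical. Note that `Path.normalize()` only resolves `.` and `..` lexically and does not follow symbolic links, so this is a simplification rather than a complete path traversal defense.

```java
import java.nio.file.Path;
import java.text.Normalizer;
import java.util.Locale;

// Illustrative only: all names here are hypothetical.
class CanonicalizeThenValidate {

    // Canonicalization: resolve "." and ".." segments of the pathname.
    static Path canonicalize(Path baseDir, String userSuppliedName) {
        return baseDir.resolve(userSuppliedName).normalize();
    }

    // Validation, applied to the canonical form: the result must stay
    // inside the base directory.
    static boolean isInside(Path baseDir, String userSuppliedName) {
        return canonicalize(baseDir, userSuppliedName).startsWith(baseDir.normalize());
    }

    // Normalization (lossy): remove accents, lower-case, trim whitespace.
    static String normalizeText(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT).trim();
    }
}
```

Validating before canonicalizing would be a mistake here: the raw string `./a/../report.txt` and the raw string `../../etc/passwd` both contain `..`, and only their canonical forms show which one leaves the base directory.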

Kiuwan contains a built-in library of sanitizers for every supported programming language and framework. These sanitizers are commonly used directly by programmers or by frameworks, and Kiuwan detects their usage.

But if you use your own sanitizers, Kiuwan may not recognize them as such and may report false “tainted data flows”. In this case, you should make Kiuwan aware of them.
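
As an example of the kind of in-house routine this refers to, the following Java sketch escapes a value before it is placed in an LDAP search filter, following the escaping rules of RFC 4515. The class and method names are hypothetical. Because such a routine is not part of any well-known library, a flow that passes through it could still be reported as tainted until the routine is declared as a neutralization.

```java
// Hypothetical in-house sanitizer, not a Kiuwan or library API.
class LdapFilterSanitizer {

    // Escapes the characters that are special inside an LDAP search filter
    // value (RFC 4515): backslash, asterisk, parentheses and NUL.
    static String escapeFilterValue(String value) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\': out.append("\\5c"); break;
                case '*':  out.append("\\2a"); break;
                case '(':  out.append("\\28"); break;
                case ')':  out.append("\\29"); break;
                case '\0': out.append("\\00"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```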

The goal of this section is to show you how to add custom sanitizers to the Kiuwan built-in library.

In the following sections, we use the terms “sanitizers” and “neutralization routines” as synonyms.

Neutralization Routines (a.k.a. Sanitizers)

A Neutralization Routine (or Sanitizer) is any piece of code that guarantees that tainted data received as input is returned as untainted output.

This documentation does not cover how to build custom neutralization routines, but how to add your own custom neutralization routines to Kiuwan.

Basically, the process consists of:

First, let Kiuwan know your routine
- Depending on the programming language you are analyzing, the so-called “routine” can be a function, a method of a class, etc.
Second, let Kiuwan know that it’s a neutralization
- Kiuwan provides several ways to define your routine (described later) but, regardless of the one you choose, you need to mark that routine as a “neutralization”.

Next, for illustration purposes, we will follow these steps using Java as the programming language. Differences with other programming languages are detailed later.

Specifying Java Neutralization Routines

Any custom neutralization routine must be defined in a custom neutralizations file (XML format).

The name of the file is irrelevant, but its location is important.

Locations and precedence

Neutralization routines can be configured at different scopes:

Single-analysis
Application-specific
System-wide

Depending on the location of the XML file, precedence and scope will change.

The precedence and scope of configurations are as follows:

Single-Analysis
- Neutralizations can apply only to a unique analysis.
- In this case, the xml file should be located at:
  - ```
  [analysis_base_dir]/libraries/[technology]
```
Application-specific
- Neutralizations can apply to all analyses of a specific application.
- In this case, the xml file should be located at:
  - ```
  [agent_home_dir]/conf/apps/[app_name]/libraries/[technology] 
```
System-wide
- Neutralizations can apply to all analyses of all applications.
- In this case, the xml file should be located at:
  - ```
  [agent_home_dir]/libraries/[technology] 
```
- Exceptions to this rule are:
  - cpp engine reads from …/libraries/c
  - objective engine reads from …/libraries/objetivec and …/libraries/c

Legend:

[agent_home_dir] : local installation directory of Kiuwan Local Analyzer (KLA)
[analysis_base_dir] : root directory of application source code to be analyzed, as specified by “-s” option of KLA CLI (Command Line Interface), or in “Folder to analyze” input box when using KLA GUI (Graphical User Interface)
[app_name] : name of the app to be analyzed, as specified by “-n” option of KLA CLI (Command Line Interface), or in “Application name” input box when using KLA GUI (Graphical User Interface)
[technology] : name of the Kiuwan technology, as specified in [agent_home_dir]/conf/LanguageInfo.properties

As a general recommendation, we suggest naming the XML file [technology]_custom_neutralizations.xml (this helps distinguish your custom files from Kiuwan's own files).

Therefore, the next sections use java_custom_neutralizations.xml as the name of our custom file.
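
The locations and the naming recommendation above can be summarized with a small Java helper. This helper is hypothetical and not part of Kiuwan. It only builds the three candidate file paths, in the order they are listed in this section, from the placeholders described in the legend, and it does not handle the cpp and objective exceptions.

```java
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of Kiuwan: it only assembles paths.
class NeutralizationLocations {

    // Recommended file name: [technology]_custom_neutralizations.xml
    static String fileName(String technology) {
        return technology + "_custom_neutralizations.xml";
    }

    // Candidate files, listed as: single-analysis, application-specific, system-wide.
    static List<Path> candidateFiles(Path analysisBaseDir, Path agentHomeDir,
                                     String appName, String technology) {
        String name = fileName(technology);
        return List.of(
            // [analysis_base_dir]/libraries/[technology]
            analysisBaseDir.resolve("libraries").resolve(technology).resolve(name),
            // [agent_home_dir]/conf/apps/[app_name]/libraries/[technology]
            agentHomeDir.resolve("conf").resolve("apps").resolve(appName)
                        .resolve("libraries").resolve(technology).resolve(name),
            // [agent_home_dir]/libraries/[technology]
            agentHomeDir.resolve("libraries").resolve(technology).resolve(name));
    }
}
```

For the `java` technology, every entry ends in `java_custom_neutralizations.xml`.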

Structure of Custom Neutralization File (CNF)

Any CNF must be an XML file with the following structure:

Reference to “master” DTD
Definition of the custom Library of Neutralization routines
List of custom Neutralization routines

The next sections describe this structure.