How to optimize the content and the scope of the analysis with Kiuwan Local Analyzer (KLA)


Select the right source code to analyze

Before you start an analysis with KLA, you have to provide a source code directory. KLA scans all the files available in this directory and its subdirectories, and all of them will be analyzed.

Analysis execution time and memory consumption are directly proportional to the size of the source code to be analyzed, so avoiding the analysis of unneeded code is the first step to reduce both.

 

KLA comes with some default “exclusion” patterns, i.e. typical directory names that contain test code, generated code, 3rd-party libraries, etc.

These exclusion patterns prevent KLA from analyzing source code that would increase the resources needed for the analysis while being meaningless for your purposes.

Pay attention to your source code structure and add to the exclude patterns any directories or files you do not need to analyze. Your analysis will then use fewer resources, both in execution time and in memory consumption.

See our guide on Setting Source Code Filters with KLA: https://www.kiuwan.com/docs/display/K5/Source+Code+Filters

Info

As a rule of thumb, big source files are good candidates to be excluded from the analysis. Very often, “big” files are:

  • Auto-generated code (for example, stub code generated for web services);
  • Library components;
  • Database exports.

    To identify these large files, find the discovery.diagnosis.txt file in the temp directory of your analysis. It will show:

    • The number of files to analyze for every technology;
    • Any files bigger than a preconfigured threshold (200Kb).
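Before running an analysis, the same kind of check can be approximated locally. The following is a minimal sketch, not part of KLA: the function name `find_large_files` is hypothetical, and it assumes that the "200Kb" threshold means 200 * 1024 bytes.

```python
import os

# Assumption: "200Kb" is interpreted as 200 * 1024 bytes.
THRESHOLD_BYTES = 200 * 1024


def find_large_files(root, threshold=THRESHOLD_BYTES):
    """Return (relative_path, size_in_bytes) for files above threshold, biggest first."""
    large = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # unreadable or vanished file: skip it
            if size > threshold:
                large.append((os.path.relpath(path, root), size))
    # Largest files first, since they are the best exclusion candidates.
    return sorted(large, key=lambda item: item[1], reverse=True)
```

Calling `find_large_files("/path/to/source")` lists candidates to review; whether a given file should actually be excluded remains a judgment call.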

    Execute a ruleset according to your needs

    Info

    By default, the rules analysis step executes all of the model’s active rules for every file.

    The default model (CQM) contains approximately 900 rules, of which about 700 are active. This means that for every file, 700 rules will be executed on its source code.
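To get a feeling for the scale, the figures above can be multiplied out. This is only a rough sketch that takes the "700 active rules per file" figure at face value; the function name is hypothetical and the result is a count of rule executions, not a timing estimate.

```python
def rule_executions(file_count, active_rules=700):
    """Rough count of rule executions: every active rule runs once per file.

    Assumption: all active rules apply to every file, as the text states.
    """
    return file_count * active_rules


# Example: an application with 5,000 source files.
full_model = rule_executions(5000)        # all 700 active rules
reduced_model = rule_executions(5000, 100)  # a trimmed model with 100 rules
```

With these example numbers, trimming the model from 700 to 100 active rules reduces the count from 3,500,000 to 500,000 rule executions.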

    Are all active rules needed?

    In a large analysis (for example, with thousands of files), you will most probably be interested only in “important” defects.

    Choose the model that best suits your needs, activating only the rules that are important to you.

    A large set of rules will generate defects for high-priority rules as well as for low-priority ones.

    Read more in our Guide to Model Management.

    Mute or deactivate a rule

    Rules can be either muted or deactivated.

    Muting a rule means that the rule is still executed in the background, but its results are hidden (e.g. when it produces many false positives).

    Deactivating a rule means that the rule is not executed at all (e.g. when the defects it finds are uninteresting or do not apply to your application). Deactivating rules speeds up the analysis process and makes the results more manageable.

    Read more in our Guide to Model Management.

    Low-priority rules will generate thousands of unimportant defects that increase the resources needed for your analyses.

    Info

    Try to focus on your analysis needs and avoid generating more defects than needed.

    Use the model that best suits your needs, activating only those rules that are really important for you.

    Activating only the important rules is the most efficient way to execute the analyses as well as to “consume” the produced results.

    Defects can be muted for different reasons, the most common being to hide defects that are considered false positives.

    However, muting a defect is supposed to be occasional. Bear in mind that muting a rule only “hides” its defects; the rule is still executed.

    1. If you are muting too many false positives from a rule, deactivate that rule and immediately contact Kiuwan Technical Support.
    2. If you are muting a rule because the discovered defects do not apply to your application or are not of interest to you, deactivate the rule instead.

    Please visit https://www.kiuwan.com/docs/display/K5/Models+Manager+User+Guide to learn how to deactivate rules and manage Kiuwan models.

     

    Process JSP in Java analyses

    If you are analyzing Java, there’s a configuration option that has a considerable impact on analysis performance and memory needs:

    • process JSP as Java servlets?

    If this option is set to true (the default value), for every JSP Kiuwan internally generates its Java servlet code and executes the Java rules on it.

    This servlet code generation consumes a considerable amount of time and memory.

    The advantage of generating it is higher precision in detecting Code Security vulnerabilities that spread between JSPs and Java files (mainly XSS).

    If this is not your concern, you can set this property to false, and the analysis will run faster and need less memory.

     

    Pay attention to ambiguous file extensions

     
     
    Kiuwan associates source files with technologies through file extensions, and KLA uses this association to execute the appropriate engine on each source file.

    See https://www.kiuwan.com/docs/display/K5/Kiuwan+Supported+Technologies for full details on extensions and technologies.

    However, some extensions are commonly associated with more than one technology. Some examples:

    • .sql matches PL_SQL, Transact and Informix
    • .c/.h matches C, C++ and Objective-C

    GUI mode: KLA detects such ambiguous situations and asks the user to select the correct technology. For example, the user might select plsql because they know the application being analyzed is an Oracle application.

    CLI mode: by default, KLA executes every available engine (in the .sql case, the three available SQL engines), wasting time and resources and producing confusing results, because defects are generated by all those engines and their corresponding rules. An easy way to avoid this unnecessary processing is to specify the supported.technologies parameter with only the proper technologies when invoking KLA in CLI mode, deleting the unneeded ones.

    Info
    Import/export SQL scripts

    Export/import SQL scripts are quite common in applications, and those files are usually very large. If you do not exclude them and do not change the default SQL configuration, Kiuwan will analyze those huge files with all the SQL engines, wasting time and resources.

    To speed up your analysis, make sure you exclude those scripts from the analysis.

    If you know that you are analyzing PL_SQL, be sure to delete Transact and Informix from the list of supported technologies.

    For further info, please visit Command Line Interface - SupportedTechnologies.

    As general rules:

    1. Be careful to specify only the appropriate SQL engine in the supported.technologies parameter.
    2. Be sure to exclude export/import script files from the analysis.
     

    Duplicated code analysis

    Duplicated code analysis (aka clone detection) is also a memory- and CPU-intensive task.

    However, it can be configured to modify its working mode, reducing time and memory requirements.

    Info

    Two aspects affect resource consumption (mainly memory and execution time):

    1. how literals and identifiers are managed

    2. the minimum number of tokens a clone must have

    This article (https://www.kiuwan.com/blog/avoid-duplicated-code-with-clone-detector/) explains how the clone detector works and the different ways of configuring it.

    If you are not interested at all in duplicated code analysis, you can make Kiuwan skip it:

    • In KLA CLI mode, specify ignore=clones

    Kiuwan’s clone detector searches for fragments of tokens that are very similar.

    The term ‘token’ refers to each of the atomic elements identified by the analyzer. There are three types of tokens:

    1. Operators and reserved words (specific for each language)
    2. Identifiers: variable names, function names, etc.
    3. Literals: numbers and string constants used in the code.
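The three token types can be illustrated with Python's standard `tokenize` module applied to Python source. This is only a sketch of the classification idea; it is not Kiuwan's tokenizer, and the category names and the function name are made up for the example.

```python
import io
import keyword
import tokenize


def classify_tokens(source):
    """Return a list of (category, text) pairs for a snippet of Python source."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            # Reserved words and identifiers share the NAME token type.
            if keyword.iskeyword(tok.string):
                kind = "operator_or_reserved"
            else:
                kind = "identifier"
        elif tok.type == tokenize.OP:
            kind = "operator_or_reserved"
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            kind = "literal"
        else:
            continue  # skip NEWLINE, INDENT, COMMENT, ENDMARKER, etc.
        result.append((kind, tok.string))
    return result
```

For instance, `classify_tokens("total = price * 3\n")` yields two identifiers (`total`, `price`), two operators (`=`, `*`) and one literal (`3`).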

    Kiuwan also generates ‘duplicated code’ defects according to the size of the fragments found:

    [Image]

    You can configure the minimum number of tokens that Kiuwan uses to detect a clone. This is done at two levels:

    a) In Kiuwan’s Local Analyzer, go to Advanced options, then configure the number of tokens to detect a clone. You can configure a different number of tokens for each language.

    [Image]

    b) In your model, configure the minimum tokens to generate a ‘Duplicated code’ defect.

    [Image]

    Depending on this configuration, Kiuwan gets different results in the clone detection. Let’s see these in detail.

    The most conservative way

    In this case, we configure Kiuwan to look for an exact match between the different fragments:

    {language}.min.tokens=20
    {language}.ignore.literals=false
    {language}.ignore.identifiers=false

    Taking this source code as an example:

    [Image]

    Kiuwan detects duplicate code:

    [Image]

    IMPORTANT: The ‘clone’ begins at the closing parenthesis of line 5, but Kiuwan prints the complete line, which can make the output harder to read.

    The detected tokens are:

    [Image]

    Set the clone detector to be smarter

    Now we are going to configure Kiuwan to ignore the numbers and string constants in our code:

    {language}.min.tokens=20
    {language}.ignore.literals=true
    {language}.ignore.identifiers=false

    [Image]

    As you can see in the picture above, now the fragment is bigger because the literals (‘3’ and ‘8’) are not taken into account.

    As explained in the article linked above, ignoring literals and identifiers is a “smart” way to find clones, but in many circumstances the results are not obvious to understand. Most of the time, we want duplicated code to mean “identical” code. To detect only identical code blocks, keep both {language}.ignore.literals and {language}.ignore.identifiers set to false, as in the most conservative configuration.

    The minimum number of tokens of a clone (200 by default) can also be changed.

    The third option

    As the third option, we can ignore both literals and identifiers:

    {language}.min.tokens=20
    {language}.ignore.literals=true
    {language}.ignore.identifiers=true

    [Image]

    With this last option, the most efficient one, it becomes clear that class DupCodeTestCopy is really a copy-paste in which only the class was renamed, so Kiuwan detects the whole class as a clone.

    But this configuration is also the one most prone to false positives. For example:

    [Image]

    Both files have a similar structure, but functionally they are very different. When literals and identifiers are ignored, Kiuwan considers both a clone:

    [Image]

    If the clone detector reports many duplicated blocks, increase the minimum number of tokens. There will be fewer clones, which reduces the amount of memory needed to execute the clone detection process.
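The effect of the three configurations can be sketched with a toy comparison. This is an illustration only, not Kiuwan's algorithm: it tokenizes Python snippets with the standard `tokenize` module and compares whole fragments for equality, whereas a real clone detector searches for matching token runs of at least the configured minimum length. The function names and the `<LIT>` / `<ID>` placeholders are made up for the example.

```python
import io
import keyword
import tokenize


def normalized_tokens(source, ignore_literals=False, ignore_identifiers=False):
    """Tokenize Python source, optionally replacing literals/identifiers by placeholders."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.NUMBER, tokenize.STRING):
            tokens.append("<LIT>" if ignore_literals else tok.string)
        elif tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            tokens.append("<ID>" if ignore_identifiers else tok.string)
        elif tok.type in (tokenize.NAME, tokenize.OP):
            tokens.append(tok.string)  # reserved words and operators always count
    return tokens


def same_fragment(a, b, **options):
    """True when both fragments produce the same normalized token sequence."""
    return normalized_tokens(a, **options) == normalized_tokens(b, **options)


original = "total = price * 3\n"
new_literal = "total = price * 8\n"   # only the literal differs
renamed = "amount = cost * 8\n"       # literal and identifiers differ
```

With exact matching, `original` and `new_literal` are different fragments; with `ignore_literals=True` they match; `original` and `renamed` only match when `ignore_identifiers=True` is set as well. The same mechanism explains the false positives: two structurally similar but functionally unrelated fragments collapse into the same token sequence once names and constants are ignored.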

     
