Avoid duplicated code with the clone detector

Reusing code by copying and pasting it is common in software development, but the practice makes the code harder to maintain and can introduce defects. That’s why Kiuwan includes a clone detector.
As an application is developed over a long period, very similar or identical code fragments begin to appear. These fragments are known as ‘clones’.

These clones make the application harder to evolve and maintain, because a single change has to be made in several places.

Kiuwan’s clone detector searches for sequences of tokens that are very similar.
The term ‘token’ refers to each of the atomic elements identified by the analyzer. There are three types of tokens:

  1. Operators and reserved words (specific for each language)
  2. Identifiers: variable names, function names, etc.
  3. Literals: numbers and string constants used in the code.
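To make the three token types concrete, here is a minimal Java sketch of a toy lexer that tags each token with one of them. It is not Kiuwan's analyzer: the class name `TokenKinds`, the small keyword set, and the regular expression are assumptions chosen only for illustration. Unlike the token listing later in this post, it also splits a qualified name such as `System.out.println` into several tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a toy lexer, not Kiuwan's analyzer.
final class TokenKinds {

    enum Kind { OPERATOR_OR_KEYWORD, IDENTIFIER, LITERAL }

    static final class Token {
        final Kind kind;
        final String text;

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    // A deliberately tiny keyword set (assumption); a real Java lexer knows many more.
    private static final Set<String> KEYWORDS =
            new HashSet<>(Arrays.asList("public", "void", "int", "for", "class", "return"));

    // Alternatives are tried left to right: string literal, number, word,
    // a few two-character operators, then any single non-space character.
    private static final Pattern TOKEN = Pattern.compile(
            "\"[^\"]*\"|\\d+|[A-Za-z_][A-Za-z_0-9]*|\\+\\+|--|<=|>=|==|!=|\\S");

    static List<Token> tokenize(String source) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            String text = m.group();
            char first = text.charAt(0);
            Kind kind;
            if (first == '"' || Character.isDigit(first)) {
                kind = Kind.LITERAL;
            } else if (Character.isLetter(first) || first == '_') {
                // Words are either reserved words or identifiers.
                kind = KEYWORDS.contains(text) ? Kind.OPERATOR_OR_KEYWORD : Kind.IDENTIFIER;
            } else {
                kind = Kind.OPERATOR_OR_KEYWORD;
            }
            tokens.add(new Token(kind, text));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints one "KIND text" pair per token.
        for (Token t : tokenize("for (int i = 0; i < j; i++)")) {
            System.out.println(t.kind + " " + t.text);
        }
    }
}
```

For the sample loop header this yields 14 tokens, with `0` tagged as a literal, `i` and `j` as identifiers, and everything else as operators or reserved words.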

 

Kiuwan also generates ‘Duplicated code’ defects according to the size of the fragments found:

dupcode-rules

 

You can configure the minimum number of tokens that Kiuwan uses to detect a clone. This is done at two levels:

a) In Kiuwan’s Local Analyzer, open the ‘advanced’ options and set the number of tokens needed to detect a clone. This is configured independently for each language, so you can use a different value for each one:

tokens-configuration

 

b) In your model, configure the minimum number of tokens required to generate a ‘Duplicated code’ defect.

dup-code-configuration-2

 

Depending on this configuration, Kiuwan produces different clone detection results. Let’s look at them in detail.

 

The most conservative way

In this case, we configure Kiuwan to look for an exact match between the different fragments:

{language}.min.tokens=20
{language}.ignore.literals=false
{language}.ignore.identifiers=false

Take this source code as an example:

[su_table]

DupCodeTest.java  DupCodeTestCopy.java

[/su_table]

 

Kiuwan detects duplicate code:

dupcode1a

 

IMPORTANT: The ‘clone’ begins at the closing parenthesis of line 5, but Kiuwan prints the complete line, which can make the report a little confusing.
The detected tokens are:

[su_table]

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
) ; } public dump ( int j ) { for ( int i = 0 ; i

[/su_table]

[su_table]

19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
< j ; i ++ ) { System.out.println ( “Hello world!…” ) ; } } }

[/su_table]
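The relation between the token listing above and the `min.tokens` setting can be sketched in a few lines of Java. This is a naive exact-match check written for illustration, not Kiuwan's algorithm: the class `CloneThreshold` and its methods are hypothetical, the differing first token (`3` versus `8`) is an assumption about what precedes the clone, and the string literal is shortened to a single word so the sample can be split on spaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a naive exact-match check, not Kiuwan's algorithm.
final class CloneThreshold {

    // The 33 tokens from the listing above (string literal shortened to one word).
    static final String SHARED =
            ") ; } public dump ( int j ) { for ( int i = 0 ; i "
            + "< j ; i ++ ) { System.out.println ( \"Hello\" ) ; } } }";

    // Length of the longest run of consecutive identical tokens shared by a and b
    // (longest-common-substring dynamic programming, applied to tokens).
    static int longestCommonRun(List<String> a, List<String> b) {
        int best = 0;
        int[] prev = new int[b.size() + 1];
        for (int i = 1; i <= a.size(); i++) {
            int[] cur = new int[b.size() + 1];
            for (int j = 1; j <= b.size(); j++) {
                if (a.get(i - 1).equals(b.get(j - 1))) {
                    cur[j] = prev[j - 1] + 1;
                    best = Math.max(best, cur[j]);
                }
            }
            prev = cur;
        }
        return best;
    }

    // A shared run is reported as a clone only if it reaches minTokens.
    static boolean isClone(List<String> a, List<String> b, int minTokens) {
        return longestCommonRun(a, b) >= minTokens;
    }

    // Builds a fragment: one preceding token followed by the shared tokens.
    static List<String> withFirstToken(String first) {
        List<String> tokens = new ArrayList<>();
        tokens.add(first);
        tokens.addAll(Arrays.asList(SHARED.split(" ")));
        return tokens;
    }

    public static void main(String[] args) {
        List<String> a = withFirstToken("3"); // assumed differing literal
        List<String> b = withFirstToken("8");
        System.out.println(longestCommonRun(a, b)); // 33
        System.out.println(isClone(a, b, 20));      // true
        System.out.println(isClone(a, b, 40));      // false
    }
}
```

With a threshold of 20 the 33-token run is reported. Raising the threshold above 33 would hide it, which is the trade-off the `min.tokens` option controls.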

 

Set the clone detector to be smarter

Now we are going to configure Kiuwan to ignore the numbers and string constants in our code:

{language}.min.tokens=20
{language}.ignore.literals=true
{language}.ignore.identifiers=false

dupcode2a

 

As you can see in the picture above, the fragment is now bigger because the literals (‘3’ and ‘8’) are not taken into account.

 

And a bit more

As a third option, we can ignore both literals and identifiers:

{language}.min.tokens=20
{language}.ignore.literals=true
{language}.ignore.identifiers=true

 

dupcode3a

With this last option, the most permissive one, Kiuwan detects the whole class as a clone: DupCodeTestCopy is really a copy-paste of DupCodeTest in which the class was only renamed.

 

But this configuration is also the one most prone to false positives. For example:

[su_table]

DupCodeTest.java DatabaseConn.java

[/su_table]

Both files have a similar structure, but functionally they are very different. With literals and identifiers ignored, Kiuwan reports them as clones of each other:

dupcode4a
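The effect of the two `ignore` options, including the false positive just described, can be illustrated with a small Java sketch. It replaces ignored tokens with a placeholder before comparing two fragments. Everything here is an assumption made for illustration: the class `TokenNormalizer`, the placeholders `<LIT>` and `<ID>`, the keyword set, and the three one-line fragments, which only stand in for the real files and are not their contents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: mimics the effect of the ignore.* options on a token list.
final class TokenNormalizer {

    // Tiny keyword set (assumption), enough for the samples below.
    private static final Set<String> KEYWORDS =
            new HashSet<>(Arrays.asList("class", "int", "void", "public", "return"));

    // Splits space-separated code into tokens and replaces ignored kinds
    // with a placeholder so that they no longer affect the comparison.
    static List<String> normalize(String code, boolean ignoreLiterals, boolean ignoreIdentifiers) {
        List<String> out = new ArrayList<>();
        for (String token : code.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            char first = token.charAt(0);
            boolean literal = Character.isDigit(first) || first == '"';
            boolean identifier =
                    (Character.isLetter(first) || first == '_') && !KEYWORDS.contains(token);
            if (literal && ignoreLiterals) {
                out.add("<LIT>");
            } else if (identifier && ignoreIdentifiers) {
                out.add("<ID>");
            } else {
                out.add(token);
            }
        }
        return out;
    }

    static boolean sameTokens(String a, String b, boolean ignoreLiterals, boolean ignoreIdentifiers) {
        return normalize(a, ignoreLiterals, ignoreIdentifiers)
                .equals(normalize(b, ignoreLiterals, ignoreIdentifiers));
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for the files discussed above.
        String original = "class DupCodeTest { int limit = 3 ; }";
        String copy = "class DupCodeTestCopy { int limit = 8 ; }";
        String other = "class DatabaseConn { int port = 5432 ; }";

        System.out.println(sameTokens(original, copy, false, false)); // false
        System.out.println(sameTokens(original, copy, true, false));  // false
        System.out.println(sameTokens(original, copy, true, true));   // true
        System.out.println(sameTokens(original, other, true, true));  // true
    }
}
```

The last line is the false positive: once names and values are masked, an unrelated class with the same shape is indistinguishable from a genuine copy.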

 

Conclusions

 

Kiuwan Code Analysis lets you configure duplicated code detection anywhere from strict to relaxed. Which setting to choose depends on the maturity level of your application.

In applications that initially show a high level of duplicated code, you can configure the analyzer to be stricter and so focus first on the clones that are easiest to detect.

Applications with many similar code structures, for example those with graphical interfaces and forms, will generate false positives under a relaxed configuration.