This repository was archived by the owner on Jul 19, 2025. It is now read-only.

Code Climate engine for code duplication analysis

codeclimate/codeclimate-duplication

codeclimate-duplication is an engine that wraps flay and supports Java, Ruby, Python, JavaScript, and PHP. You can run it on the command line using the Code Climate CLI or on our hosted analysis platform.

What is duplication?

The duplication engine's algorithm can be surprising, but it's actually very simple. We have a docs page explaining the algorithm.

Installation

  1. Install the Code Climate CLI, if you haven't already.
  2. You're ready to analyze! cd into your project's folder and run codeclimate analyze. Duplication analysis is enabled by default, so you don't need to do anything else.

Configuring

Mass Threshold

We set useful threshold defaults for the languages we support, but you may want to adjust these settings based on your project guidelines.

The mass threshold configuration represents the minimum "mass" a code block must have to be analyzed for duplication. If the engine is too easily reporting duplication, try raising the threshold. If you suspect that the engine isn't catching enough duplication, try lowering the threshold. The best setting tends to differ from language to language.

To adjust this setting, use the top-level checks key in your config file:

checks:
  identical-code:
    config:
      threshold: 25
  similar-code:
    config:
      threshold: 50

Note that you have to change the YAML structure under the languages key to the Hash type (a mapping rather than a list) to support extra configuration.
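To make the idea of a mass threshold concrete, here is a minimal Python sketch. It assumes that mass is roughly the number of nodes in a block's parse tree; this README does not define mass, so treat that as an assumption. The names mass and worth_analyzing, and the tuple-based trees, are hypothetical and are not part of the engine.

```python
def mass(node):
    """Count the nodes in a tree of nested tuples.

    Assumption: each nested tuple is one parse-tree node; plain values
    such as names and numbers add nothing.
    """
    return 1 + sum(mass(child) for child in node if isinstance(child, tuple))


def worth_analyzing(nodes, threshold):
    """Keep only the blocks whose mass reaches the threshold."""
    return [node for node in nodes if mass(node) >= threshold]


# Two hypothetical parse trees: a lone literal and a small method call.
small = ("lit", 1)
large = ("call", ("lvar", "a"), ("args", ("lit", 1), ("lit", 2)))

print(mass(small), mass(large))                       # 1 5
print(worth_analyzing([small, large], threshold=3))   # only the call tree
```

Raising the threshold in this sketch drops more small blocks from consideration, which mirrors the advice above: raise it when too much is reported, lower it when too little is.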

Count Threshold

By default, the duplication engine reports code that has been duplicated in just two locations. You can be less strict by raising a warning only if code is duplicated in three or more locations. To adjust this setting, add a count_threshold key to your config. For instance, to use the default mass_threshold for ruby, but to enforce the Rule of Three, you could use this configuration:

plugins:
  duplication:
    enabled: true
    config:
      languages:
        ruby:
          count_threshold: 3

You can also change the default count_threshold for all languages:

plugins:
  duplication:
    enabled: true
    config:
      count_threshold: 3
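The effect of count_threshold can be sketched in a few lines of Python. This is only an illustration of the rule described above, not the engine's code; the reportable function, the fingerprints, and the file locations are all made up for the example.

```python
def reportable(occurrences, count_threshold=2):
    """Return the duplicated blocks seen in at least count_threshold locations."""
    return {
        fingerprint: locations
        for fingerprint, locations in occurrences.items()
        if len(locations) >= count_threshold
    }


# Hypothetical data: block fingerprint -> places where that block appears.
occurrences = {
    "block-a": ["user.rb:10", "admin.rb:42"],
    "block-b": ["user.rb:80", "admin.rb:7", "guest.rb:15"],
}

print(sorted(reportable(occurrences)))                     # ['block-a', 'block-b']
print(sorted(reportable(occurrences, count_threshold=3)))  # ['block-b']
```

With the default of two locations both blocks are reported; under the Rule of Three only the block that appears three times is.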

Custom file name patterns

All engines check only appropriate files, but you can override the default set of patterns. Patterns are run against the project root directory, so you have to use ** to match files in nested directories. Also note that you have to specify all patterns, not only the one you want to add.

plugins:
  duplication:
    enabled: true
    config:
      languages:
        ruby:
          patterns:
            - "**/*.rb"
            - "**/*.rake"
            - "Rakefile"
            - "**/*.ruby"
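The difference between a root-level pattern and a ** pattern can be seen with a small Python sketch. It uses Python's pathlib globbing as a stand-in for the engine's own matching, which may differ in details; matching_files and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def matching_files(root, patterns):
    """Return files under root matched by any pattern, relative to root."""
    root = Path(root)
    found = set()
    for pattern in patterns:
        for path in root.glob(pattern):
            if path.is_file():
                found.add(path.relative_to(root).as_posix())
    return sorted(found)


# Build a throwaway project tree so the example is self-contained.
with tempfile.TemporaryDirectory() as project:
    for name in ("Rakefile", "app.rb", "lib/util.rb", "lib/tasks/db.rake"):
        target = Path(project, name)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()

    root_only = matching_files(project, ["*.rb"])
    nested = matching_files(project, ["**/*.rb", "**/*.rake", "Rakefile"])

print(root_only)  # ['app.rb']
print(nested)     # ['Rakefile', 'app.rb', 'lib/tasks/db.rake', 'lib/util.rb']
```

The pattern *.rb misses lib/util.rb because it is evaluated only at the project root, while **/*.rb picks up both files.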

Python 3

By default, the Duplication engine will use a Python 2 parser. To enable analysis for Python 3 code, specify the python_version as shown in the example below. This will enable a Python 3 parser and add the .py3 file extension to the list of included file patterns.

plugins:
  duplication:
    enabled: true
    config:
      languages:
        python:
          python_version: 3

Node Filtering

Sometimes structural similarities are reported that you just don't care about. For example, the contents of arrays or hashes might have similar structures and there's little you can do to refactor them. You can specify language-specific filters to ignore any issues that match the pattern. Here is an example that filters simple hashes and arrays:

plugins:
  duplication:
    enabled: true
    config:
      languages:
        ruby:
          filters:
            - "(hash (lit _) (str _) ___)"
            - "(array (str _) ___)"

The syntax for patterns is pretty simple. In the first pattern, "(hash (lit _) (str _) ___)" specifies "A hash with a literal key, a string value, followed by anything else (including nothing)". You could also specify "(hash ___)" to ignore all hashes altogether.
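To show how the _ and ___ matchers behave, here is a simplified Python sketch of such a matcher. It is not the implementation used by sexp_processor: parse_pattern and matches are hypothetical helpers, trees are plain nested lists, and the sketch only tests a pattern against the root of a tree.

```python
import re


def parse_pattern(text):
    """Parse a lisp-like pattern string into nested lists of tokens."""
    tokens = re.findall(r'\(|\)|"[^"]*"|[^\s()]+', text)
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token
        items = []
        while tokens[pos] != ")":
            items.append(read())
        pos += 1  # skip the closing parenthesis
        return items

    return read()


def matches(pattern, node):
    """Return True if the parsed pattern matches the tree rooted at node."""
    if pattern in ("_", "___"):
        return True  # wildcard: accept whatever is here
    if isinstance(pattern, list):
        if not isinstance(node, list):
            return False
        for i, sub in enumerate(pattern):
            if sub == "___":
                return True  # the remainder of this node is ignored
            if i >= len(node) or not matches(sub, node[i]):
                return False
        return len(node) == len(pattern)
    return pattern == node


simple_hash = ["hash", ["lit", "a"], ["str", "b"], ["lit", "c"], ["str", "d"]]
other_hash = ["hash", ["lit", "a"], ["lit", "1"]]

pattern = parse_pattern("(hash (lit _) (str _) ___)")
print(matches(pattern, simple_hash))                      # True
print(matches(pattern, other_hash))                       # False
print(matches(parse_pattern("(hash ___)"), other_hash))   # True
```

The first pattern accepts the hash whose first pair is a literal key with a string value and ignores everything after it, while "(hash ___)" accepts any hash at all.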

Visualizing the Parse Tree

Figuring out what to filter is tricky. codeclimate-duplication comes with a configuration option to help with the discovery. Instead of scanning your code and printing out issues for codeclimate, it prints out the parse trees instead! Just add dump_ast: true and debug: true to your .codeclimate.yml file:

---
plugins:
  duplication:
    enabled: true
    config:
      dump_ast: true
      debug: true
      ... rest of config ...

Then run codeclimate analyze while using the debug flag to output stderr:

% CODECLIMATE_DEBUG=1 codeclimate analyze

Running that command might output something like:

Sexps for issues:

# 1) ExpressionStatement#4261258897 mass=128:

# 1.1) bogus-examples.js:5
s(:ExpressionStatement,
 :expression,
 s(:AssignmentExpression,
  :"=",
  :left,
  s(:MemberExpression,
   :object,
   s(:Identifier, :EventBlock),
   :property,
   s(:Identifier, :propTypes)),
   ... LOTS more...)
   ... even more LOTS more...)

This is the internal representation of the actual code. Assuming you've looked at those issues and have determined that they are not something you want to address, you can filter them by writing a pattern string that would match that tree.

Looking at the tree output again, this time flattening it out:

s(:ExpressionStatement, :expression, s(:AssignmentExpression, :"=", :left, ...) ...)

The internal representation (which is ruby) is different from the pattern language (which is lisp-like), so first we need to convert s(: to ( and remove all commas and colons:

(ExpressionStatement expression (AssignmentExpression "=" left ...) ...)
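The conversion step above is mechanical enough to script. The following Python sketch applies the same two rewrites with regular expressions; sexp_dump_to_pattern is a hypothetical helper, not something shipped with the engine, and it would mangle string literals that themselves contain commas or colons.

```python
import re


def sexp_dump_to_pattern(dump):
    """Turn a dumped Ruby sexp into the lisp-like pattern notation."""
    # Rewrite the s( constructor as a bare opening parenthesis.
    without_constructor = re.sub(r"\bs\(", "(", dump)
    # Drop the commas between elements and the colons of Ruby symbols.
    return re.sub(r"[,:]", "", without_constructor)


dump = 's(:ExpressionStatement, :expression, s(:AssignmentExpression, :"=", :left))'
print(sexp_dump_to_pattern(dump))
# (ExpressionStatement expression (AssignmentExpression "=" left))
```

The result is only a starting point: it still matches exactly one tree until the parts you don't care about are replaced with _ and ___.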

Next, we don't care about expression, so let's get rid of that by replacing it with the matcher for any single element, _:

(ExpressionStatement _ (AssignmentExpression "=" left ...) ...)

The same goes for "=" and left, but we actually don't care about the rest of the AssignmentExpression node, so let's use the matcher that'll ignore the remainder of the tree, ___:

(ExpressionStatement _ (AssignmentExpression ___) ...)

And finally, we don't care about what follows in the ExpressionStatement, so let's ignore the rest too:

(ExpressionStatement _ (AssignmentExpression ___) ___)

This reads: "Any ExpressionStatement node, with any value and an AssignmentExpression node with anything in it, followed by anything else". There are other ways to write a pattern to match this tree, but this is pretty clear.

Then you can add that filter to your config:

---
plugins:
  duplication:
    enabled: true
    config:
      dump_ast: true
      languages:
        javascript:
          filters:
            - "(ExpressionStatement _ (AssignmentExpression ___) ___)"

Then rerun the analyzer and figure out what the next filter should be. When you are happy with the results, remove the dump_ast config (or set it to false) to go back to normal analysis.

For more information on pattern matching, see sexp_processor, especially sexp.rb.

