A tool for mining commits from Git repositories and diffs to automatically extract code change pattern instances and features with AST analysis.
Coming is a tool for commit analysis in git repositories.
If you use Coming, please cite:
- Coming: a Tool for Mining Change Pattern Instances from Git Commits. M. Martinez, M. Monperrus, Proceedings of ICSE, 2019 (doi:10.1109/ICSE-Companion.2019.00043).
Contact:
Matias Martinez, Martin Monperrus
Coming is deployed on Maven Central, see past versions.
To build yourself, the procedure is as follows.
Add a GitHub token in .m2/settings.xml.
<settings>
  <servers>
    <server>
      <id>brufulascam</id>
      <username>yourlogin</username>
      <!-- your github token with scope read:packages -->
      <password>FOOBAR</password>
    </server>
  </servers>
</settings>
Install a JDK 17 and configure Maven or your IDE to use it.
$ export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64/
$ mvn -version
Apache Maven 3.6.3
Maven home: /usr/share/maven
Java version: 17.0.9, vendor: Private Build, runtime: /usr/lib/jvm/java-17-openjdk-amd64
# now installing
$ mvn install -DskipTests
Tests:
git clone https://github.com/SpoonLabs/repogit4testv0
mvn test
repogit4testv0 is a Git repository included inside Coming which is used by the test cases.
The main class is: fr.inria.coming.main.ComingMain.
mvn exec:java -Dexec.mainClass=fr.inria.coming.main.ComingMain
 -action <INS | DEL | UPD | MOV | PER | ANY>  type of action to be mined
 -branch <branch name>  In case of -input='git', use this branch name. Default is master.
 -entitytype <arg>  entity type to be mined
 -entityvalue <arg>  the value of the entity mentioned in -entitytype
 -filter <arg>  name of the filter
 -filtervalue <arg>  values of the filter mentioned in -filter
 -hunkanalysis <arg>  include analysis of hunks
 -input <git(default) | files | filespair | repairability>  format of the content present in the given -path. git implies that the path is a git repository. files implies the path contains .patch files
 -location <path>  analyse the content in 'path'
 -message <arg>  commit message
 -mode <mineinstance | diff | features>  the mode of execution of the analysis
 -output <path>  dump the output of the analysis in the given path
 -outputprocessor <classname>  output processors for result
 -parameters <arg>  Parameters, divided by :
 -parentlevel <arg>  numbers of AST node where the parent is located. 1 implies immediate parent
 -parenttype <arg>  parent type of the nodes to be considered
 -pattern <path>  path of the pattern file to be used when the -mode is 'mineinstance'
 -patternparser <classname>  parser to be used for parsing the file specified -pattern. Default is XML
 -repairtool <ALL | JMutRepair | Nopol | JKali | NPEfix | JGenProg>  If -mode=repairability, this option specifies which repair tools should we consider in our analysis. Can be a list separated by :
 -showactions  show all actions
 -showentities  show all entities
Parameters
Most of the properties are configured in the file config-coming.properties.
One can change any of those properties from the command line by using -parameters.
The value of that argument has the following format: <name_property_1>:<value_property_1>:<name_property_2>:<value_property_2>
In the following command we change the value of two properties, max_nb_hunks and max_files_per_commit:
-parameters max_nb_hunks:2:max_files_per_commit:1
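As a minimal sketch of the format above, the following Python helper joins a dictionary of property names and values into the colon-separated string expected by -parameters. The function name build_parameters is hypothetical and not part of Coming; rejecting values that contain a colon is an assumption made here because such values would be ambiguous in this format.

```python
def build_parameters(properties):
    """Join names and values into 'name1:value1:name2:value2'."""
    parts = []
    for name, value in properties.items():
        name, value = str(name), str(value)
        # Assumption: a colon inside a name or value cannot be represented,
        # since ':' is the only separator in this format.
        if ":" in name or ":" in value:
            raise ValueError(f"':' not allowed in {name!r} or {value!r}")
        parts.extend([name, value])
    return ":".join(parts)


# Reproduces the value used in the command above.
params = build_parameters({"max_nb_hunks": 2, "max_files_per_commit": 1})
print("-parameters", params)
```

The resulting string can be appended to any Coming command line after the -parameters option.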
When running Coming with -mode mineinstance, the output is a file named instances_found.json, which shows the different instances of the pattern passed as parameter.
- Automatically Extracting Instances of Code Change Patterns with AST Analysis (Martinez, M.; Duchien, L.; Monperrus, M.) IEEE International Conference on Software Maintenance (ICSM), pp. 388-391, 2013, doi: 10.1109/ICSM.2013.54
Extract all commits of repogit4testv0 that insert a binary operator AST node:
java -classpath ./coming.jar fr.inria.coming.main.ComingMain -location ./repogit4testv0/ -mode mineinstance -action INS -entitytype BinaryOperator -output ./out
The argument -mode indicates the analyzer that Coming will use. The value -mode mineinstance means to detect instances of a change pattern (in the previous example, insert a binary operator AST node).
The argument -location indicates the location of the project to analyze. By default, Coming analyzes Git projects (as per -input), so the -location should be the path to the cloned project. Moreover, the argument -branch allows specifying the Git branch to analyze (by default, it analyzes the master branch).
The argument -output is used to indicate the folder where Coming will write the results.
To know the values accepted by the arguments -action and -entitytype, please call ComingMain with the arguments -showactions and -showentities, respectively. You can also find those values on this page.
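For scripted runs, the example command can be assembled as an argument list. The sketch below is only an illustration: the helper coming_command is hypothetical, and the jar path ./coming.jar is taken from the example rather than from any fixed convention. It uses only the options documented in this section.

```python
import shlex


def coming_command(location, output, action, entity_type, jar="./coming.jar"):
    """Return the argv list for a simple-pattern 'mineinstance' run."""
    return [
        "java", "-classpath", jar, "fr.inria.coming.main.ComingMain",
        "-location", location,
        "-mode", "mineinstance",
        "-action", action,
        "-entitytype", entity_type,
        "-output", output,
    ]


argv = coming_command("./repogit4testv0/", "./out", "INS", "BinaryOperator")
# shlex.join renders the list as a shell command line for inspection.
print(shlex.join(argv))
```

Passing the list to subprocess.run(argv, check=True) would execute it, provided Java and the jar are available at those paths.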
Instead of passing the action type and entity type on the command line (which defines a simple pattern), we can pass to Coming a complex change pattern specified in an XML file.
-mode mineinstance -pattern ./pattern_INS_IF_MOVE_ASSIG.xml
Here, -pattern must receive the location of an XML file with the pattern specification.
This pattern is specified as follows:
<pattern>
  <entity type="Assignment">
    <parent parentId="2" distance="10" />
  </entity>
  <entity type="If" />
  <action entityId="2" type="INS" />
  <action entityId="1" type="MOV" />
</pattern>
Coming accepts change patterns specified in XML files. As an example, the pattern Add If-Return:
<pattern>
  <entity id="1" type="Return">
    <parent parentId="2" distance="2" />
  </entity>
  <entity id="2" type="If" />
  <action entityId="1" type="INS" />
  <action entityId="2" type="INS" />
</pattern>
It specifies:
a) two entities (id 1 and 2), one representing a Return, the second one an If;
b) a parent relation between the If and the Return entities (with a max distance of 2 nodes); and
c) two actions of type INS (insert), one affecting the entity id 1 (i.e., the Return), the other one the entity id 2 (i.e., the If).
This pattern is able to match a change such as:
+ if ((n1 * n2) < MathUtils.SAFE_MIN) {
+     return ZERO;
+ }
That change is an instance of the pattern Add If-Return.
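The structure of the Add If-Return pattern (two entities, one parent relation, two actions) can also be produced programmatically. The sketch below builds the same XML with Python's standard xml.etree.ElementTree and checks that every id referenced by a parent or action element is declared by an entity. The helper names are hypothetical, and the check is a sanity test written for this example, not a validation performed by Coming.

```python
import xml.etree.ElementTree as ET


def add_if_return_pattern():
    """Build the Add If-Return pattern as an ElementTree element."""
    pattern = ET.Element("pattern")
    ret = ET.SubElement(pattern, "entity", {"id": "1", "type": "Return"})
    # The Return must have the If (id 2) as an ancestor within 2 nodes.
    ET.SubElement(ret, "parent", {"parentId": "2", "distance": "2"})
    ET.SubElement(pattern, "entity", {"id": "2", "type": "If"})
    ET.SubElement(pattern, "action", {"entityId": "1", "type": "INS"})
    ET.SubElement(pattern, "action", {"entityId": "2", "type": "INS"})
    return pattern


def undeclared_ids(pattern):
    """Return ids used by <parent>/<action> that no <entity> declares."""
    declared = {e.get("id") for e in pattern.iter("entity")}
    referenced = {p.get("parentId") for p in pattern.iter("parent")}
    referenced |= {a.get("entityId") for a in pattern.iter("action")}
    return sorted(referenced - declared)


pattern = add_if_return_pattern()
xml_text = ET.tostring(pattern, encoding="unicode")
print(xml_text)
print("undeclared ids:", undeclared_ids(pattern))
```

Writing xml_text to a file gives a pattern that can be passed through the -pattern option.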
The pattern specification also allows specifying the role of an entity in its parent entity. Given the code:
if (exception == null) {
-   l.connectionClosed(event);
+   l.connectionErrorOccurred(event);
...
+ if (realConnection != null)
- if (realConnection == null)
The following pattern, which matches any change inside an entity whose parent is an If, is able to detect two instances:
<pattern>
  <entity type="If" />
  <entity type="*">
    <parent parentId="1" distance="10" />
  </entity>
  <action entityId="2" type="*" />
</pattern>
One of the instances is over the method invocation (which was an updated parameter), and the second one is over the operator inside the If.
The role feature allows specifying a pattern that matches an element according to the role of the element in its parent.
For example, the following pattern matches an element (with ID 2) whose role in its parent is condition:
<pattern>
  <entity type="If" />
  <entity type="*" role="condition">
    <parent parentId="1" distance="10" />
  </entity>
  <action entityId="2" type="*" />
</pattern>
Thus, this pattern finds one instance: the change inside the If condition (update of the binary operator). It does not match the other change (update of the parameter).
However, the next pattern will match only the other change: changes on an entity whose parent has the role of Then block.
<pattern>
  <entity type="If" />
  <entity type="Block" role="Then">
    <parent parentId="1" distance="10" />
  </entity>
  <entity type="*">
    <parent parentId="3" distance="10" />
  </entity>
  <action entityId="2" type="*" />
</pattern>
This pattern matches the update of the method invocation's parameter (and not the binary operator update).
The list of available roles is presented on this page.
When running Coming with -mode diff, the output is a file named change_frequency.json, which shows the frequency and probability of each type of change (i.e., frequency of actions applied to each type of entity).
An example of the content of such file is:
{
  "frequency": [
    { "c": "BinaryOperator", "f": "6" },
    { "c": "Invocation", "f": "2" },
    { "c": "If", "f": "2" },
    ....
  ],
  "frequencyParent": [
    { "c": "INS_Invocation_Block", "f": "2" },
    { "c": "UPD_BinaryOperator_If", "f": "2" },
    { "c": "INS_If_Block", "f": "2" },
    ...
  ],
The file shows:
a) the frequency of affected entities within the JSON attribute frequency (see types available). For example, the previous JSON file shows
"c": "BinaryOperator", "f": "6"
which means that there are 6 actions (code changes) that affect binary operators.
b) the frequency of actions over affected entities and their entity parents. For example, the previous JSON file shows
{... "c": "UPD_BinaryOperator_If", "f": "2" },
which means that there are 2 changes that update binary operators inside an if condition (i.e., the parent).
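To illustrate how such a file might be consumed, the sketch below ranks the entries of a frequency list by count. It assumes only the two keys visible in the example ("c" for the label, "f" for the count stored as a string); the sample data keeps just the entries shown above with the elided ones dropped, and the function name top_changes is hypothetical.

```python
import json


def top_changes(report, key="frequency"):
    """Return (label, count) pairs sorted by decreasing count."""
    entries = report.get(key, [])
    # Counts appear as strings in the example, so convert them to int.
    pairs = [(entry["c"], int(entry["f"])) for entry in entries]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Only the entries visible in the example; the real file has more.
sample = json.loads("""
{
  "frequency": [
    {"c": "BinaryOperator", "f": "6"},
    {"c": "Invocation", "f": "2"},
    {"c": "If", "f": "2"}
  ],
  "frequencyParent": [
    {"c": "INS_Invocation_Block", "f": "2"},
    {"c": "UPD_BinaryOperator_If", "f": "2"},
    {"c": "INS_If_Block", "f": "2"}
  ]
}
""")

for label, count in top_changes(sample):
    print(label, count)
```

Calling top_changes(sample, key="frequencyParent") ranks the action/entity/parent combinations in the same way.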
The repairability mode finds commits which look like automated program repair commits; see the paper "Estimating the Potential of Program Repair Search Spaces with Commit Analysis" (Khashayar Etemadi, Niloofar Tarighat, Siddharth Yadav, Matias Martinez and Martin Monperrus, Journal of Systems and Software, 2022).
Note that the results are sensitive to the underlying diff algorithm. If you run the repairability analysis today, you will get results that are different from the paper. For exact reproduction, use commit 1cad74323bacad65f06ddf80ab53971d38957507 and Java 8.
When running Coming with -mode repairability, the output is a file named all_instances_found.json, which shows the tools that could have created the commits. You can choose tools of interest by including the option: -repairtool All,Jkali,..
An example of the content of such file is:
{
  {
    "instances": [
      "revision": "8c0e7110c9ebc3ba5158e8de0f73c80ec69e1001",
      "repairability": [
        {
          "tool-name": "JMutRepair",
          "pattern-name": "JMutRepair:binary_1",
          "instance_detail": [
            {
              "pattern_action": "UPD",
              "pattern_entity": {
                "entity_type": "BinaryOperator",
                "entity_new value": "*",
                "entity_role": "*",
                "entity_parent": "null"
              },
              "concrete_change": {
                "operator": "UPD",
                "src_type": "BinaryOperator",
                "dst_type": "BinaryOperator",
                "src": "sz - 1",
                "dst": "sz + 1",
                "src_parent_type": "Assignment",
                "dst_parent_type": "Assignment",
                "src_parent": "start \u003d sz - 1",
                "dst_parent": "start \u003d sz + 1"
              },
              "line": 127,
              "file": "/Users/macbook/Documents/university/internship/coming/coming/src/CharSequenceUtils.java"
            }
          ]
        }
      ]
  }
}
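The excerpt above is abbreviated, so the exact nesting of the real file is not certain. As a rough sketch under the assumption that the parsed file is an object with an "instances" list whose items carry "revision" and "repairability" keys, the following Python function counts how many commits are attributed to each tool. The function name commits_per_tool is hypothetical; this is an independent illustration and not the analysis script referenced in this section.

```python
from collections import Counter


def commits_per_tool(report):
    """Count, per tool name, the commits with at least one matching pattern."""
    counts = Counter()
    for instance in report.get("instances", []):
        # A set avoids counting a commit twice when several patterns
        # of the same tool match it.
        tools = {r["tool-name"] for r in instance.get("repairability", [])}
        counts.update(tools)
    return counts


# Assumed shape, reduced to the keys this function reads.
sample = {
    "instances": [
        {
            "revision": "8c0e7110c9ebc3ba5158e8de0f73c80ec69e1001",
            "repairability": [
                {"tool-name": "JMutRepair", "pattern-name": "JMutRepair:binary_1"}
            ],
        }
    ]
}

for tool, count in commits_per_tool(sample).items():
    print(tool, count)
```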
To analyze which repair tools may have generated commits, use the Python script at https://github.com/kth-tcs/defects4j-repair-reloaded/tree/comrepair-coming/.
Create the output JSON file by running Coming with the option -mode repairability and then:
python analyse_repairability_output.py <path to the json>
or
python analyse_repairability_output.py <path to the json> <path to patches>
This script produces an output showing how many commits correspond to each repair tool and also (in the second form) the number of commits it was unable to find.
By default, the last 100 commits of the repository are analyzed; you can change this default by setting the nb_commits property with -parameters.
Coming can be used to compute features associated with the code changed by a commit. This functionality can be used with the argument -mode features. Coming writes a JSON file for each commit in the folder specified by -output.
See Automated Classification of Overfitting Patches with Statically Extracted Code Features (He Ye, Jian Gu, Matias Martinez, Thomas Durieux and Martin Monperrus), in IEEE Transactions on Software Engineering, 2021.
Coming reads the input from the folder indicated by the argument -location. The kind of input depends on the argument -input.
If -input is not specified, it is git by default. In that case, or in the case of -input git, the path given by -location should be a Git repository.
The filespair input format is used to analyze a single revision, given as the diff between a specified source file and target file. If -input filespair is used, the location argument must be specified in the following format: -location <source_file_path>:<target_file_path>
If -input files is used, the location path should follow the hierarchy below, where the location is passed as -location <location_arg>.
<location_arg>
├── <diff_folder>
│   └── <modif_file>
│       ├── <diff_folder>_<modif_file>_s.java
│       └── <diff_folder>_<modif_file>_t.java
In the above case, the analysis is performed on the revision from <diff_folder>_<modif_file>_s.java to <diff_folder>_<modif_file>_t.java, where s stands for source and t stands for target.
Example Input Specification
java ... -location ./pairsD4j -input files ...
$ tree ./pairsD4j/
pairsD4j
├── Math_70
│   └── BisectionSolver
│       ├── Math_70_BisectionSolver_s.java
│       └── Math_70_BisectionSolver_t.java
└── Math_73
    └── BrentSolver
        ├── Math_73_BrentSolver_s.java
        └── Math_73_BrentSolver_t.java
4 directories, 4 files
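Before launching an analysis with -input files, it can help to check that a folder follows the naming scheme described above. The sketch below walks a location and returns the (source, target) pairs it finds. The function name find_pairs is hypothetical, and the way Coming itself discovers the files may differ; the sketch only encodes the naming convention stated in this section.

```python
from pathlib import Path
import tempfile


def find_pairs(location):
    """Return (source, target) paths named <diff>_<file>_s.java / _t.java."""
    pairs = []
    root = Path(location)
    for diff_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for file_dir in sorted(p for p in diff_dir.iterdir() if p.is_dir()):
            stem = f"{diff_dir.name}_{file_dir.name}"
            source = file_dir / f"{stem}_s.java"
            target = file_dir / f"{stem}_t.java"
            # Keep only complete pairs; incomplete folders are skipped.
            if source.is_file() and target.is_file():
                pairs.append((source, target))
    return pairs


# Build a throwaway copy of the example tree to exercise the function.
with tempfile.TemporaryDirectory() as tmp:
    for diff, name in [("Math_70", "BisectionSolver"), ("Math_73", "BrentSolver")]:
        folder = Path(tmp) / diff / name
        folder.mkdir(parents=True)
        for suffix in ("s", "t"):
            (folder / f"{diff}_{name}_{suffix}.java").write_text("class X {}\n")
    names = [(s.name, t.name) for s, t in find_pairs(tmp)]

for source_name, target_name in names:
    print(source_name, "->", target_name)
```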
Coming provides a filter to discard commits whose commit message does not include certain keywords.
For studying only commits whose messages include words related to bug fixing (e.g., bug, fix, issue), add the following option:
-filter bugfix
The bugfix keywords are predefined. If you want to use other keywords, use the Custom keywords filter.
For studying only commits whose messages include [MATH-, add the following two options:
-filter keywords -filtervalue [MATH-
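The idea behind a keyword filter can be sketched in a few lines of Python. This is an illustration only: the function name message_matches is hypothetical, and both the substring matching and the case-insensitive comparison are assumptions, since this section does not state how Coming compares keywords with commit messages.

```python
def message_matches(message, keywords):
    """Return True if the commit message contains any of the keywords."""
    lowered = message.lower()
    # Assumption: plain substring match, ignoring case. Note that a
    # substring match also accepts words such as "prefix" for "fix".
    return any(keyword.lower() in lowered for keyword in keywords)


messages = [
    "[MATH-123] Handle overflow in multiply",
    "Update documentation",
]
kept = [m for m in messages if message_matches(m, ["[MATH-"])]
print(kept)
```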
Coming applies a line-based diff between two files (for more information, see http://en.wikipedia.org/wiki/Diff).
To filter a Commit according to the number of hunks:
-filter numberhunks -parameters max_nb_hunks:2
Here, the argument -filter indicates that commits are filtered according to the maximum number of hunks (value numberhunks). Then, using the argument -parameters, we specify max_nb_hunks:2, which means the maximum number of hunks per modified file is 2.
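To make the notion of a hunk concrete, the sketch below counts the hunks of a line-based diff with Python's standard difflib module and compares the count with a limit of 2. This only illustrates what a hunk is: Coming computes its own diffs, so its hunk boundaries and counts may differ, and the function name count_hunks is hypothetical.

```python
import difflib


def count_hunks(old_lines, new_lines, context=3):
    """Count hunks in a unified diff; each hunk starts with an '@@' header."""
    diff = difflib.unified_diff(old_lines, new_lines, lineterm="", n=context)
    return sum(1 for line in diff if line.startswith("@@"))


old = [f"line {i}" for i in range(20)]
new = list(old)
new[1] = "changed near the top"      # first modified region
new[18] = "changed near the bottom"  # second region, far from the first

hunks = count_hunks(old, new)
max_nb_hunks = 2
print(hunks, "hunks; within limit:", hunks <= max_nb_hunks)
```

Two edits that are far apart produce two hunks, while edits close together are merged into one because their context lines overlap.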
The arguments:
-filter maxfiles -parameters max_files_per_commit:1
consider only commits with at most one file modified, added or deleted.
We can combine the two preceding filters:
-filter numberhunks:maxfiles -parameters max_nb_hunks:2:max_files_per_commit:1
The argument -filter withtest indicates that only commits with at least one modification on test cases are considered.
Coming filters a commit according to the number of AST changes involved in that commit. If a commit modified a file f by introducing more changes than MAX_AST_CHANGES_PER_FILE or fewer than MIN_AST_CHANGES_PER_FILE, then those changes are not further considered by Coming. This means that this filter has a direct impact on the analyzers based on AST changes, such as pattern mining or change frequency: Coming will not apply those analyzers over f.
To use this filter, add to the command line:
-parameters MIN_AST_CHANGES_PER_FILE:0:MAX_AST_CHANGES_PER_FILE:50
To extend Coming, please read the document Extension points of Coming. Moreover, you can also read code_walk-through.