Code Continuity Analysis Framework
codinuum/cca
This project has been divided into the following projects:
This repository will therefore be archived soon.
The framework is currently composed of the following:
- parsers for Python, Java, Verilog, Fortran, and C/C++,
- an AST differencing tool, Diff/AST, based on the parsers,
- helper scripts for factbase manipulation, and
- ontologies for the related entities.
The parsers and Diff/AST export resulting facts such as abstract syntax trees (ASTs), changes between them, and other syntactic/semantic information in XML or N-Triples. In particular, facts in N-Triples format are loaded into an RDF store such as Virtuoso to build a factbase, that is, a database of facts. Factbases are intended to be queried for software engineering tasks such as code comprehension, debugging, change pattern mining, and code homology analysis.
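To make the idea of a factbase concrete, here is a minimal sketch of loading a few N-Triples lines into an in-memory set and querying them by pattern. It is an illustration only: the `http://example.org/` namespace, the `label` and `parent` predicates, and the helper names `parse_ntriples` and `match` are assumptions made for this example, not the vocabulary defined by the framework's ontologies or the output of its tools. The parser handles only plain IRIs and plain string literals; a real factbase would use an RDF store such as Virtuoso and SPARQL queries.

```python
import re

# Simplified N-Triples shape: <subject> <predicate> (<object IRI> | "literal") .
# Language tags, datatypes, and blank nodes are deliberately not supported.
TRIPLE_RE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"((?:[^"\\]|\\.)*)")\s*\.\s*$'
)


def parse_ntriples(text):
    """Parse simplified N-Triples text into a set of (s, p, o) string tuples."""
    triples = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        m = TRIPLE_RE.match(line)
        if m is None:
            raise ValueError(f"unsupported line: {line!r}")
        s, p, o_iri, o_lit = m.groups()
        triples.add((s, p, o_iri if o_iri is not None else o_lit))
    return triples


def match(triples, s=None, p=None, o=None):
    """Return the sorted triples that agree with every non-None component."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


EX = "http://example.org/"  # hypothetical namespace, not the framework's ontology

# Hypothetical facts describing two AST nodes and a parent link between them.
SAMPLE = """
<http://example.org/node1> <http://example.org/label> "MethodDeclaration" .
<http://example.org/node2> <http://example.org/label> "Block" .
<http://example.org/node2> <http://example.org/parent> <http://example.org/node1> .
"""

facts = parse_ntriples(SAMPLE)
children = match(facts, p=EX + "parent", o=EX + "node1")
```

Here `children` holds every fact whose predicate is the hypothetical `parent` and whose object is `node1`, which is the kind of structural question a factbase query would answer at scale.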
Diff/AST is an experimental implementation of the AST differencing algorithm reported in the following paper:
Masatomo Hashimoto and Akira Mori, "Diff/TS: A Tool for Fine-Grained Structural Change Analysis," In Proc. 15th Working Conference on Reverse Engineering, 2008, pp. 279-288, DOI: 10.1109/WCRE.2008.44.
It compares ASTs node by node, while popular `diff` tools compare any (text) files line by line. The algorithm is based on an algorithm for computing the tree edit distance (TED) between two ordered labeled trees. The TED between two trees is the minimum (weighted) number of edit operations needed to transform one tree into the other. Unfortunately, applying TED algorithms directly to real-world ASTs is generally infeasible because their computational complexity is, at best, essentially quadratic in the number of AST nodes. Diff/TS therefore applies a TED algorithm sparingly, in a divide-and-conquer manner backed by elaborate heuristics, to approximate tree edit distances. Nevertheless, Diff/AST still requires considerable time for large, non-trivial inputs, so it always caches its results.
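As a rough illustration of what a tree edit distance measures, the following sketch computes the unit-cost TED between two small ordered labeled trees using the textbook forest recursion with memoization. This is not the algorithm used by Diff/TS or Diff/AST, which relies on heuristics and divide-and-conquer; the tree encoding as `(label, children)` tuples and the names `forest_dist` and `ted` are assumptions chosen for this example, and the naive recursion is only practical for very small trees.

```python
from functools import lru_cache

# A tree is (label, children), where children is a tuple of trees.
# A forest is a tuple of trees. Every edit operation costs 1.


def size(forest):
    """Count all nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_dist(f, g):
    """Unit-cost edit distance between two ordered forests."""
    if not f and not g:
        return 0
    if not f:
        return size(g)  # insert every node of g
    if not g:
        return size(f)  # delete every node of f
    (lv, cv), (lw, cw) = f[-1], g[-1]  # rightmost roots and their children
    # Delete the rightmost root of f: its children take its place.
    delete = forest_dist(f[:-1] + cv, g) + 1
    # Insert the rightmost root of g: its children take its place.
    insert = forest_dist(f, g[:-1] + cw) + 1
    # Map the two roots onto each other (relabel if the labels differ).
    relabel = (
        forest_dist(cv, cw)
        + forest_dist(f[:-1], g[:-1])
        + (0 if lv == lw else 1)
    )
    return min(delete, insert, relabel)


def ted(t1, t2):
    """Tree edit distance between two single trees."""
    return forest_dist((t1,), (t2,))


def leaf(label):
    return (label, ())


# Toy ASTs: the second one drops the Block wrapper around the Return node.
before = ("Method", (("Block", (leaf("Return"),)),))
after = ("Method", (leaf("Return"),))

distance = ted(before, after)
```

For the toy trees above, one deletion (the `Block` node) turns `before` into `after`, so `distance` is 1. A line-based diff of the corresponding source text could report the same change as several modified lines.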
You can see the results of comparing some pairs of source files taken from samples here.
You can instantly try Diff/AST by utilizing Docker and a ready-made container image.
$ docker pull codinuum/cca
The following command line executes Diff/AST within a container to compare sample Java programs and then saves the results in the `results` (host) directory.
$ ./cca.py diffast -c results samples/java/0/Test.java samples/java/1/Test.java
Once you have built DiffViewer, you can inspect the AST differences in a viewer window. See `diffviewer/README.md` for details.
$ diffviewer/run.py -c results samples/java/0/Test.java samples/java/1/Test.java
You can run both Diff/AST and DiffViewer with the following single command.
$ ./cca.py diffast -c results --view samples/java/0/Test.java samples/java/1/Test.java
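If you want to compare several file pairs, the command lines above can be generated from a script. The sketch below only builds the argument lists; it uses the `diffast` subcommand and the `-c` and `--view` options exactly as they appear in the commands above and assumes nothing else about the `cca.py` interface. The helper name `diffast_command` and its parameters are hypothetical.

```python
import shlex


def diffast_command(old, new, cache_dir="results", view=False, launcher="./cca.py"):
    """Build the argument list for one Diff/AST comparison (hypothetical helper)."""
    cmd = [launcher, "diffast", "-c", cache_dir]
    if view:
        cmd.append("--view")  # also open DiffViewer, as in the command above
    cmd += [old, new]
    return cmd


# File pairs to compare; this one is the sample pair used in this README.
pairs = [
    ("samples/java/0/Test.java", "samples/java/1/Test.java"),
]

commands = [diffast_command(old, new) for old, new in pairs]
for cmd in commands:
    print(shlex.join(cmd))  # show the shell-equivalent command line
```

To actually run a comparison, pass each list to `subprocess.run(cmd, check=True)` from the repository root.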
The following will install `parsesrc` and `diffast`.
$ opam install cca
You can also build the parsers and Diff/AST yourself. The following are required:
- GNU make
- OCaml (>=4.11.1)
- OPAM (for installing camlzip, cryptokit, csv, git-unix, menhir, ocamlnet, pxp, ulex, uuidm, and volt)
The following creates `ast/analyzing/bin/{parsesrc.opt,diffast.opt}`.
$ cd src
$ make
They should be invoked via the shell scripts `ast/analyzing/bin/{parsesrc,diffast}`, which set some environment variables.
If you have built Diff/AST, you can use it with Git. Add the following lines to your `.gitconfig`. Note that `PATH_TO_THIS_REPO` should be replaced by your local path to this repository.
[diff]
    tool = diffast
[difftool]
    prompt = false
[difftool "diffast"]
    cmd = PATH_TO_THIS_REPO/git_ext_diff "$LOCAL" "$REMOTE"
[alias]
    diffast = difftool
Then you should be able to use `git diffast` like `git diff`. You will be prompted to launch diffast for each source file comparison. Other file comparisons will be ignored.
The following command line creates a Docker image named `cca`. In the image, the framework is installed at `/opt/cca`.
$ docker build -t cca .
Apache License, Version 2.0