State-of-the-Art Source Code Plagiarism & Collusion Detection. Check for plagiarism in a set of programs.
jplag/JPlag
JPlag finds pairwise similarities among a set of multiple programs. It can reliably detect software plagiarism and collusion in software development, even when obfuscated. All similarities are calculated locally; no source code or plagiarism results are ever uploaded online. JPlag supports a large number of programming and modeling languages.
All supported languages and their supported versions are listed below.
Language | Version | CLI Argument Name | State | Parser |
---|---|---|---|---|
Java | 21 | java | mature | JavaC |
C | 11 | c | legacy | JavaCC |
C++ | 14 | cpp | beta | ANTLR 4 |
C# | 6 | csharp | mature | ANTLR 4 |
Python | 3.6 | python3 | beta | ANTLR 4 |
JavaScript | ES6 | javascript | beta | ANTLR 4 |
TypeScript | ~5 | typescript | beta | ANTLR 4 |
Go | 1.17 | golang | beta | ANTLR 4 |
Kotlin | 1.3 | kotlin | beta | ANTLR 4 |
R | 3.5.0 | rlang | beta | ANTLR 4 |
Rust | 1.60.0 | rust | beta | ANTLR 4 |
Swift | 5.4 | swift | beta | ANTLR 4 |
Scala | 2.13.8 | scala | beta | Scalameta |
LLVM IR | 15 | llvmir | beta | ANTLR 4 |
Scheme | ? | scheme | legacy | JavaCC |
EMF Metamodel | 2.25.0 | emf | beta | EMF |
EMF Model | 2.25.0 | emf-model | alpha | EMF |
SCXML | 1.0 | scxml | alpha | XML |
Text (naive, use with caution) | - | text | legacy | CoreNLP |
You need Java SE 21 to run or build JPlag.
- Download a released version.
- In case you depend on the legacy version of JPlag, we refer to the legacy release v2.12.1 and the legacy branch.
JPlag is released on Maven Central. It can be included as follows:

    <dependency>
        <groupId>de.jplag</groupId>
        <artifactId>jplag</artifactId>
        <version><!--desired version--></version>
    </dependency>

- Download or clone the code from this repository.
- Run `mvn clean package` from the root of the repository to compile and build all submodules. Run `mvn clean package assembly:single` instead if you need the full jar which includes all dependencies. Run `mvn -P with-report-viewer clean package assembly:single` to build the full jar with the report viewer. In this case, you'll need Node.js installed.
- You will find the generated JARs in the subdirectory `cli/target`.
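
When scripting around the build, it can help to locate the generated JARs programmatically. The following is a minimal sketch, assuming the build has already been run and the artifacts are in `cli/target` as described above. The names `JarLocator` and `findJars` are hypothetical and not part of JPlag.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper, not part of JPlag: lists the JAR files in a build output directory. */
public class JarLocator {

    /** Returns all regular files ending in ".jar" directly inside the given directory, sorted by path. */
    public static List<Path> findJars(Path targetDir) throws IOException {
        try (Stream<Path> entries = Files.list(targetDir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".jar"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // "cli/target" is the subdirectory named in the build instructions;
        // a different directory can be passed as the first argument.
        Path targetDir = Path.of(args.length > 0 ? args[0] : "cli/target");
        findJars(targetDir).forEach(System.out::println);
    }
}
```

Run from the repository root, this prints one path per JAR found. The exact file names depend on the JPlag version that was built.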
JPlag can either be used via the CLI or directly via its Java API. For more information, see the usage information in the wiki. If you are using the CLI, the report viewer UI will launch automatically. No data will leave your computer!
Note that the legacy CLI varies slightly. The language can either be set with the `-l` parameter or as a subcommand (`jplag [jplag options] -l <language name> [language options]`). A subcommand takes priority over the `-l` option. Language-specific arguments can be set when using the subcommand. A list of language-specific options can be obtained by requesting the help page of a subcommand (e.g., `jplag java -h`).
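
The precedence rule for language selection can be sketched as follows. This is an illustration only, not JPlag's actual argument parsing: `LanguageSelection` and `resolve` are hypothetical names, and the fallback to `java` is taken from the documented default of the `-l` option.

```java
import java.util.Optional;

/** Hypothetical illustration of the documented precedence; not JPlag's real CLI code. */
public class LanguageSelection {

    /** Default taken from the CLI help: "-l, --language ... (default: java)". */
    static final String DEFAULT_LANGUAGE = "java";

    /**
     * Picks the language name: a subcommand wins over the -l option,
     * and the default applies when neither is given.
     */
    public static String resolve(Optional<String> subcommand, Optional<String> languageOption) {
        return subcommand.or(() -> languageOption).orElse(DEFAULT_LANGUAGE);
    }

    public static void main(String[] args) {
        // Both given: the subcommand takes priority.
        System.out.println(resolve(Optional.of("cpp"), Optional.of("java")));
        // Only -l given: the option value is used.
        System.out.println(resolve(Optional.empty(), Optional.of("python3")));
        // Neither given: the documented default is used.
        System.out.println(resolve(Optional.empty(), Optional.empty()));
    }
}
```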
Parameter descriptions:

          [root-dirs[,root-dirs...]...]
                            Root-directory with submissions to check for plagiarism. If mode is set to VIEW,
                            this parameter can be used to specify a report file to open. In that case only a
                            single file may be specified.
          -bc, --bc, --base-code=<baseCode>
                            Path to the base code directory (common framework used in all submissions).
          -l, --language=<language>
                            Select the language of the submissions (default: java). See subcommands below.
          -M, --mode=<{RUN, VIEW, RUN_AND_VIEW, AUTO}>
                            The mode of JPlag. One of: RUN, VIEW, RUN_AND_VIEW, AUTO (default: null). If VIEW
                            is chosen, you can optionally specify a path to an existing report.
          -n, --shown-comparisons=<shownComparisons>
                            The maximum number of comparisons that will be shown in the generated report, if
                            set to -1 all comparisons will be shown (default: 2500)
          -new, --new=<newDirectories>[,<newDirectories>...]
                            Root-directories with submissions to check for plagiarism (same as root).
          --normalize       Activate the normalization of tokens. Supported for languages: Java, C++.
          -old, --old=<oldDirectories>[,<oldDirectories>...]
                            Root-directories with prior submissions to compare against.
          -r, --result-file=<resultFile>
                            Name of the file in which the comparison results will be stored (default: results).
                            Missing .zip endings will be automatically added.
          -t, --min-tokens=<minTokenMatch>
                            Tunes the comparison sensitivity by adjusting the minimum token required to be
                            counted as a matching section. A smaller value increases the sensitivity but might
                            lead to more false-positives.

    Advanced
          --csv-export      Export pairwise similarity values as a CSV file.
          -d, --debug       Store on-parsable files in error folder.
          --log-level=<{ERROR, WARN, INFO, DEBUG, TRACE}>
                            Set the log level for the cli.
          -m, --similarity-threshold=<similarityThreshold>
                            Comparison similarity threshold [0.0-1.0]: All comparisons above this threshold
                            will be saved (default: 0.0).
          --overwrite       Existing result files will be overwritten.
          -p, --suffixes=<suffixes>[,<suffixes>...]
                            comma-separated list of all filename suffixes that are included.
          -P, --port=<port> The port used for the internal report viewer (default: 1996).
          -s, --subdirectory=<subdirectory>
                            Look in directories <root-dir>/*/<dir> for programs.
          -x, --exclusion-file=<exclusionFileName>
                            All files named in this file will be ignored in the comparison (line-separated list).

    Clustering
          --cluster-alg, --cluster-algorithm=<{AGGLOMERATIVE, SPECTRAL}>
                            Specifies the clustering algorithm. Available algorithms: agglomerative, spectral
                            (default: spectral).
          --cluster-metric=<{AVG, MIN, MAX, INTERSECTION}>
                            The similarity metric used for clustering. Available metrics: average similarity,
                            minimum similarity, maximal similarity, matched tokens (default: average similarity).
          --cluster-skip    Skips the cluster calculation.

    Subsequence Match Merging
          --gap-size=<maximumGapSize>
                            Maximal gap between neighboring matches to be merged (between 1 and minTokenMatch,
                            default: 6).
          --match-merging   Enables merging of neighboring matches to counteract obfuscation attempts.
          --neighbor-length=<minimumNeighborLength>
                            Minimal length of neighboring matches to be merged (between 1 and minTokenMatch,
                            default: 2).
          --required-merges=<minimumRequiredMerges>
                            Minimal required merges for the merging to be applied (between 1 and 50, default: 6).

    Languages:
          c cpp csharp emf emf-model go java javascript kotlin llvmir multi python3 rlang rust scheme
          scxml swift text typescript

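
To illustrate how these options combine, the following sketch assembles a JPlag command line and prints it. It assumes the full jar is available as `jplag.jar` (a placeholder file name) and that the directories named in `main` exist. `JPlagCommand` and `build` are hypothetical names, and the values passed in `main` are arbitrary examples. Only flags from the parameter descriptions above are used.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that assembles a JPlag CLI invocation; not part of JPlag. */
public class JPlagCommand {

    /**
     * Builds the argument list for running the JPlag jar.
     * jarPath is a placeholder for wherever the downloaded or built jar lives;
     * baseCodeDir may be null when there is no common framework.
     */
    public static List<String> build(String jarPath, String language, String rootDir,
                                     String baseCodeDir, String resultFile, int minTokens) {
        List<String> command = new ArrayList<>();
        command.add("java");
        command.add("-jar");
        command.add(jarPath);
        command.add("-l");          // language of the submissions
        command.add(language);
        if (baseCodeDir != null) {
            command.add("-bc");     // base code shared by all submissions
            command.add(baseCodeDir);
        }
        command.add("-r");          // result file name
        command.add(resultFile);
        command.add("-t");          // minimum token match; smaller values are more sensitive
        command.add(Integer.toString(minTokens));
        command.add(rootDir);       // root directory with the submissions
        return command;
    }

    public static void main(String[] args) {
        List<String> command = build("jplag.jar", "java", "submissions", "template", "results", 9);
        System.out.println(String.join(" ", command));
        // To actually launch it, the list could be handed to
        // new ProcessBuilder(command).inheritIO().start()
    }
}
```

Keeping the arguments in a list rather than a single string avoids quoting problems when paths contain spaces.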
The new API makes it easy to integrate JPlag's plagiarism detection into external Java projects:

    Language language = new JavaLanguage();
    Set<File> submissionDirectories = Set.of(new File("/path/to/rootDir"));
    File baseCode = new File("/path/to/baseCode");
    JPlagOptions options = new JPlagOptions(language, submissionDirectories, Set.of())
            .withBaseCodeSubmissionDirectory(baseCode);

    try {
        JPlagResult result = JPlag.run(options);

        // Optional
        ReportObjectFactory reportObjectFactory = new ReportObjectFactory(new File("/path/to/output"));
        reportObjectFactory.createAndSaveReport(result);
    } catch (ExitException e) {
        // error handling here
    } catch (FileNotFoundException e) {
        // handle IO exception here
    }

We're happy to incorporate all improvements to JPlag into this codebase. Feel free to fork the project and send pull requests. Please consider our guidelines for contributions.
If you encounter bugs or other issues, please report them here. For other purposes, you can contact us at jplag@ipd.kit.edu. We would love to hear about your research related to JPlag. Feel free to contact us!
More information can be found in our Wiki!