How to add a new CPD language

How to add a new language module with CPD support.

Adding support for a CPD language

CPD works generically on the tokens produced by a CpdLexer. To add support for a new language, the crucial piece is writing a CpdLexer that splits the source file into the tokens specific to your language. Thankfully, you can use a stock Antlr grammar or JavaCC grammar to generate a lexer for you. If you cannot use a lexer generator, for instance because you are wrapping a lexer from another library, it is still relatively easy to implement the CpdLexer interface.
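To make "working on tokens" concrete, here is a minimal plain-Java sketch. It does not use PMD's API: the class ToyLexer and its regular expression are illustrative assumptions. It shows the one thing a lexer contributes, namely turning source text into a sequence of token images, so that snippets differing only in layout produce the same sequence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy lexer, for illustration only. It is not part of PMD.
final class ToyLexer {

    // An identifier/keyword, an integer literal, or any single non-space character.
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z_][A-Za-z_0-9]*|[0-9]+|\\S");

    /** Splits the source into token images; whitespace is skipped. */
    static List<String> tokenize(String source) {
        List<String> images = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(source);
        while (matcher.find()) {
            images.add(matcher.group());
        }
        return images;
    }

    public static void main(String[] args) {
        // Both lines print the same token list despite different spacing.
        System.out.println(tokenize("x = y+1;"));
        System.out.println(tokenize("x=y + 1 ;"));
    }
}
```

A real CpdLexer additionally records where each token starts and ends, which this sketch leaves out.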

Use the following guide to set up a new language module that supports CPD.

Create a new Maven module for your language. You can take the Golang module as an example.
- Make sure to add your new module to the parent pom as a <module> entry, so that it is built alongside the other languages.
- Also add your new module to the dependencies list in “pmd-languages-deps/pom.xml”, so that the new language is automatically available in the binary distribution (pmd-dist).
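As a rough illustration of these two pom.xml changes, the fragment below assumes a hypothetical module directory and artifact called pmd-mylang. The groupId and the version expression are also assumptions, modeled on a typical multi-module Maven build, so check them against an existing language module before copying.

```xml
<!-- parent pom.xml: inside the existing <modules> element -->
<module>pmd-mylang</module> <!-- hypothetical module name -->

<!-- pmd-languages-deps/pom.xml: inside the existing <dependencies> element -->
<dependency>
    <groupId>net.sourceforge.pmd</groupId>      <!-- assumed groupId -->
    <artifactId>pmd-mylang</artifactId>         <!-- hypothetical artifact -->
    <version>${project.version}</version>       <!-- assumed version expression -->
</dependency>
```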
Implement a CpdLexer.
- For Antlr grammars you can take the grammar from antlr/grammars-v4 and place it in src/main/antlr4 followed by the package name of the language. You then need to call the antlr4 plugin and the appropriate ant wrapper with target cpd-language to generate the lexer from the grammar. To do so, edit pom.xml (e.g. like the Golang module). Once that is done, mvn generate-sources should generate the lexer sources for you.
  You can now implement a CpdLexer, for instance by extending AntlrCpdLexer. The following reproduces the Go implementation:
```
// mind the package convention if you are going to make a PR
package net.sourceforge.pmd.lang.go.cpd;

public class GoCpdLexer extends AntlrCpdLexer {

    @Override
    protected Lexer getLexerForSource(CharStream charStream) {
        return new GolangLexer(charStream);
    }
}
```
- If your language is case-insensitive, then you might want to override getImage(AntlrToken). There you can change each token e.g. into uppercase, so that CPD sees the same strings and can find duplicates even when the casing differs. See TSqlCpdLexer for an example. You will also need a “CaseChangingCharStream”, so that antlr itself is case-insensitive.
- For JavaCC grammars, place your grammar in etc/grammar and edit the pom.xml like the Python implementation does. You can then subclass JavaccCpdLexer instead of AntlrCpdLexer.
- If your JavaCC based language is case-insensitive (option IGNORE_CASE=true), then you need to implement JavaccTokenDocument.TokenDocumentBehavior, which can change each token e.g. into uppercase. See PLSQLParser for an example.
- For any other scenario just implement the interface however you can. Look at the Scala or Apex module for existing implementations.
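The effect of the case normalization mentioned above can be sketched in plain Java. This is not PMD code: CaseFolding is a hypothetical helper, and plain strings stand in for token images. In a real lexer the same transformation would sit in the hook the guide names, such as getImage(AntlrToken).

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only. It is not part of PMD.
final class CaseFolding {

    /** Upper-cases every token image so that casing no longer distinguishes tokens. */
    static List<String> normalize(List<String> images) {
        return images.stream()
                // Locale.ROOT avoids locale-specific surprises (e.g. the Turkish dotless i).
                .map(image -> image.toUpperCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> first = List.of("select", "Name", "from", "T");
        List<String> second = List.of("SELECT", "name", "FROM", "t");
        // Before normalization the sequences differ; afterwards they are equal.
        System.out.println(first.equals(second));
        System.out.println(normalize(first).equals(normalize(second)));
    }
}
```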
Create a Language implementation, and make it implement CpdCapableLanguage. If your language only supports CPD, then you can subclass CpdOnlyLanguageModuleBase to get going:
```
// mind the package convention if you are going to make a PR
package net.sourceforge.pmd.lang.go;

public class GoLanguageModule extends CpdOnlyLanguageModuleBase {

    // A public noarg constructor is required.
    public GoLanguageModule() {
        super(LanguageMetadata.withId("go").name("Go").extensions("go"));
    }

    @Override
    public CpdLexer createCpdLexer(LanguagePropertyBundle bundle) {
        // This method should return an instance of the CpdLexer you created.
        return new GoCpdLexer();
    }
}
```
To make PMD find the language module at runtime, write the fully-qualified name of your language class into the file src/main/resources/META-INF/services/net.sourceforge.pmd.lang.Language.
At this point the new language module should be available in CPD and usable by CPD like any other language.
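For the Go module, assuming the package and class name used in the GoLanguageModule listing, the service file would contain a single line. The format follows the java.util.ServiceLoader provider-configuration convention of one fully-qualified class name per line.

```text
net.sourceforge.pmd.lang.go.GoLanguageModule
```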
Update the test that asserts the list of supported languages by updating the SUPPORTED_LANGUAGES constant in BinaryDistributionIT.
Add some tests for your CpdLexer by following the section below.
Add a page in the documentation. Create a new markdown file <langId>.md in docs/pages/pmd/languages/. This file should have the following frontmatter:
```
---
title: <Language Name>
permalink: pmd_languages_<langId>.html
last_updated: <Month> <Year> (<PMD Version>)
tags: [languages, CpdCapableLanguage]
---
```
On this page, language specifics can be documented, e.g. when the language was first supported by PMD. There is also the following Jekyll include, which creates a summary box for the language:
```
   {% include language_info.html name='<Language Name>' id='<langId>' implementation='<langId>::lang.<langId>.<langId>LanguageModule' supports_cpd=true %}
```
Finish up your new language module by adding a menu entry in PMD Sidebar.
```
- title:
  url: /pmd_languages_.html
  output: web, pdf
```

Declaring CpdLexer options

To make the CpdLexer configurable, first define some property descriptors using PropertyFactory. Look at CpdLanguageProperties for some predefined ones which you can reuse (prefer reusing property descriptors if you can). You need to override newPropertyBundle and call definePropertyDescriptor to register the descriptors. After that you can access the values of the properties from the parameter of createCpdLexer.

To implement simple token filtering, you can use BaseTokenFilter as a base class, or another base class in net.sourceforge.pmd.cpd.impl. Take a look at the Kotlin token filter implementation, or the Java one.
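The idea behind token filtering can be sketched without PMD. The class below is a hypothetical stand-in: it does not extend BaseTokenFilter, and the choice to discard import statements is only an illustrative assumption. It drops every token from an "import" keyword up to and including the terminating ";", so that such statements never take part in duplicate detection.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a token filter. It does not use PMD's BaseTokenFilter.
final class ImportSkippingFilter {

    /** Returns the tokens with every "import ... ;" statement removed. */
    static List<String> filter(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        boolean discarding = false;
        for (String token : tokens) {
            if (!discarding && token.equals("import")) {
                discarding = true;       // start of a statement to skip
            } else if (discarding) {
                if (token.equals(";")) {
                    discarding = false;  // the terminating ';' is dropped as well
                }
            } else {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(filter(List.of("import", "a", ".", "b", ";", "class", "X", "{", "}")));
    }
}
```

The state flag is the essential part: a filter of this kind decides per token, based on what it has already seen, whether the token is passed on or discarded.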

Testing your implementation

Add a Maven dependency on pmd-lang-test (scope test) in your pom.xml. This contains utilities to test your CpdLexer.

Create a test class extending from CpdTextComparisonTest. To add tests, you need to write regular JUnit @Test-annotated methods, and call the method doTest with the name of the test file.

For example, for the Dart language:

```
package net.sourceforge.pmd.lang.dart.cpd;

public class DartTokenizerTest extends CpdTextComparisonTest {

    /**********************************
      Implementation of the superclass
    ***********************************/

    public DartTokenizerTest() {
        super("dart", ".dart"); // the ID of the language, then the file extension used by test files
    }

    @Override
    protected String getResourcePrefix() {
        // "testdata" is the default value, you don't need to override.
        // This specifies that you should place the test files in
        // src/test/resources/net/sourceforge/pmd/lang/dart/cpd/testdata
        return "testdata";
    }

    /**************
      Test methods
    ***************/

    @Test // don't forget the JUnit annotation
    public void testLiterals() {
        // This will look for a file named literals.dart
        // in the directory identified by getResourcePrefix,
        // tokenize it, then compare the result against a baseline
        // literals.txt file in the same directory

        // If the baseline file does not exist, it is created automatically
        doTest("literals");
    }
}
```

