PMD 7.19.0

Release date: 28-November-2025


Adding PMD support for a new JavaCC grammar based language

How to add a new language to PMD using JavaCC grammar.

Do you really need a new language?

This document describes how to add a new full-fledged language, with its own grammar and parser. If what you are trying to support is “a specific type” of files for a grammar that already exists (e.g. a specific type of XML or HTML file), you may want to consider creating a dialect instead.

Before you start…

This is a big contribution and can’t be done as a drive-by contribution. It requires dedicated passion and a long-term commitment to implement support for a new language.

This step-by-step guide is just a small intro to get the basics started, and it is not necessarily up to date or complete. You have to be able to fill in the blanks.

After the basic support for a language is there, many features are still missing. Typical features that can greatly improve rule writing are: symbol table, type resolution, call/data flow analysis.

A symbol table keeps track of variables and their usages. Type resolution tries to find the actual class type of each used type, following along method calls (including overloaded and overridden methods), which allows querying subtypes and the type hierarchy. This requires additional configuration of an auxiliary classpath. Call and data flow analysis keep track of the data as it moves through the different execution paths of a program.

These features are out of scope of this guide. Type resolution and data flow are features that definitely don’t come for free. They take considerable effort and perseverance to implement.

Steps

1. Start with a new sub-module

See pmd-java or pmd-vm for examples.
Make sure to add your new module to PMD’s parent pom as a <module> entry, so that it is built alongside the other languages.
Also add your new module to the dependencies list in “pmd-languages-deps/pom.xml”, so that the new language is automatically available in the binary distribution (pmd-dist).

2. Implement an AST parser for your language

Ideally an AST parser should be implemented as a JJT file (see VmParser.jjt or Java.jjt for example). The grammar files are placed in the directory src/main/javacc.
There is nothing preventing any other parser implementation, as long as you have some way to convert an input stream into an AST. Doing it as a JJT simplifies maintenance down the road.
See this link for reference: https://javacc.java.net/doc/JJTree.html

3. Create AST node classes

For each AST node that your parser can generate, there should be a class.
The name of the AST class should be “AST” + “whatever is the name of the node in JJT file”.
- For example, if JJT contains a node called “IfStatement”, there should be a class called “ASTIfStatement”
Each AST class should have one package-private constructor that takes an int id.
It’s a good idea to create a parent AST class for all AST classes of the language. This simplifies rule creation later (see SimpleNode for Velocity and AbstractJavaNode for Java for example).
Note: These AST node classes are usually generated once by javacc/jjtree and can then be modified as needed.
You can add additional methods in your AST node classes that can be used in rules. Most getters are also available for XPath rules, see section XPath integration below.
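The naming convention and the shared parent class can be illustrated with a minimal standalone sketch. It does not use PMD's classes: AbstractMyLangNode, getNodeName and hasElse are hypothetical names chosen for illustration, and the assumption that an if statement with more than two children has an else branch is made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch: these are NOT PMD classes. In a real module the parent
// class would build on the node classes that JJTree generates.
abstract class AbstractMyLangNode {
    private final int id;
    private final List<AbstractMyLangNode> children = new ArrayList<>();

    // Package-private constructor that takes the node id.
    AbstractMyLangNode(int id) {
        this.id = id;
    }

    int getId() {
        return id;
    }

    void addChild(AbstractMyLangNode child) {
        children.add(child);
    }

    int getNumChildren() {
        return children.size();
    }

    // Derives the grammar node name by dropping the "AST" class name prefix.
    String getNodeName() {
        String simpleName = getClass().getSimpleName();
        return simpleName.startsWith("AST") ? simpleName.substring(3) : simpleName;
    }
}

// The grammar node "IfStatement" maps to the class "ASTIfStatement".
final class ASTIfStatement extends AbstractMyLangNode {
    ASTIfStatement(int id) {
        super(id);
    }

    // Hypothetical helper for rules; assumes the children are
    // condition, then-branch and (optionally) else-branch.
    boolean hasElse() {
        return getNumChildren() > 2;
    }
}
```

Keeping shared behavior in one parent class means that later steps (visitors, rules) only need to know about a single base type.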

4. Generate your parser (using JJT)

An ant script is used to compile jjt files into classes. It is the javacc-wrapper.xml file in the top-level pmd sources.
The ant script is executed via the maven-antrun-plugin. Add this plugin to your pom.xml file and configure it with the language name. You can use pmd-java/pom.xml as an example.
The ant script is called in the phase generate-sources whenever the whole project is built. But you can call ./mvnw generate-sources directly for your module if you want your parser to be generated.

5. Create a PMD parser “adapter”

Create a new class that extends JjtreeParserAdapter.
This is a generic class, and you need to declare the root AST node.
There are two important methods to implement:
- tokenBehavior method should return a new instance of TokenDocumentBehavior constructed with the list of tokens in your language. The compile step #4 will generate a class $langTokenKinds which has all the available tokens in the field TOKEN_NAMES.
- parseImpl method should return the root node of the AST obtained by parsing the CharStream source
- See VmParser class as an example
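The overall shape of such an adapter (a generic base class with two methods to implement) can be sketched without PMD on the classpath. Everything below is a stand-in: ParserAdapterSketch, TokenBehaviorSketch, RootNodeSketch and the MyLang classes are hypothetical, the signatures are simplified, and the "parser" only records the input length. Consult VmParser for the real signatures.

```java
import java.util.List;

// Stand-ins for illustration only; none of these are PMD types.
interface RootNodeSketch { }

final class TokenBehaviorSketch {
    final List<String> tokenNames;

    TokenBehaviorSketch(List<String> tokenNames) {
        this.tokenNames = tokenNames;
    }
}

// Generic over the root node type, like the adapter described in step 5.
abstract class ParserAdapterSketch<R extends RootNodeSketch> {
    protected abstract TokenBehaviorSketch tokenBehavior();

    protected abstract R parseImpl(CharSequence source);

    // Template method: shared checks first, then the language-specific parse.
    final R parse(CharSequence source) {
        if (tokenBehavior().tokenNames.isEmpty()) {
            throw new IllegalStateException("no token names configured");
        }
        return parseImpl(source);
    }
}

final class MyLangRoot implements RootNodeSketch {
    final int sourceLength;

    MyLangRoot(int sourceLength) {
        this.sourceLength = sourceLength;
    }
}

final class MyLangParserSketch extends ParserAdapterSketch<MyLangRoot> {
    // In a real module this list would come from the generated token kinds class.
    private static final List<String> TOKEN_NAMES = List.of("<EOF>", "IDENTIFIER", "IF");

    @Override
    protected TokenBehaviorSketch tokenBehavior() {
        return new TokenBehaviorSketch(TOKEN_NAMES);
    }

    @Override
    protected MyLangRoot parseImpl(CharSequence source) {
        // A generated parser would be invoked here; the sketch only records the input length.
        return new MyLangRoot(source.length());
    }
}
```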

6. Create a language version handler

Extend AbstractPmdLanguageVersionHandler (see VmHandler for example).
This class is sort of a gateway between PMD and all parsing logic specific to your language.
For a minimal implementation, it just needs to return a parser (see step #5).
It can be used to provide other features for your language like
- violation suppression logic
- ViolationDecorators, to add additional language specific information to the created violations. The Java language module uses this to provide the method name or class name where the violation occurred.
- metrics (see below “Optional features”)
- custom XPath functions
See VmHandler class as an example

7. Create a base visitor

A parser visitor adapter is not needed anymore with PMD 7. The visitor interface now provides a default implementation.
The visitor for a JavaCC based AST is generated along with the parser from the grammar file. The base interface for a visitor is AstVisitor.
The generated visitor class for VM is called VmVisitor.
In order to help use this visitor later on, a base visitor class should be created. See VmVisitorBase as an example.
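The idea of a visitor interface with default implementations plus a convenience base class can be sketched as follows. This is a standalone illustration, not PMD's AstVisitor: SketchNode, SketchVisitor, SketchVisitorBase, IfNode, TextNode and IfCounter are hypothetical names, and the signatures are simplified to return void.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch with hypothetical node and visitor types (not PMD's).
abstract class SketchNode {
    private final List<SketchNode> children = new ArrayList<>();

    SketchNode add(SketchNode child) {
        children.add(child);
        return this;
    }

    List<SketchNode> children() {
        return children;
    }

    abstract <P> void accept(SketchVisitor<P> visitor, P data);
}

final class IfNode extends SketchNode {
    @Override
    <P> void accept(SketchVisitor<P> visitor, P data) {
        visitor.visit(this, data);
    }
}

final class TextNode extends SketchNode {
    @Override
    <P> void accept(SketchVisitor<P> visitor, P data) {
        visitor.visit(this, data);
    }
}

// Plays the role of the generated visitor interface: every node-specific
// method has a default implementation that falls back to visitNode.
interface SketchVisitor<P> {
    void visitNode(SketchNode node, P data);

    default void visit(IfNode node, P data) {
        visitNode(node, data);
    }

    default void visit(TextNode node, P data) {
        visitNode(node, data);
    }
}

// Plays the role of the base visitor class: the fallback descends into children.
abstract class SketchVisitorBase<P> implements SketchVisitor<P> {
    @Override
    public void visitNode(SketchNode node, P data) {
        for (SketchNode child : node.children()) {
            child.accept(this, data);
        }
    }
}

// A concrete visitor only overrides the node types it cares about.
final class IfCounter extends SketchVisitorBase<int[]> {
    @Override
    public void visit(IfNode node, int[] count) {
        count[0]++;
        visitNode(node, count); // keep descending into nested nodes
    }
}
```

With this layout, adding a new node type to the grammar only requires one more default method in the interface, and existing visitors keep compiling.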

8. Make PMD recognize your language

Create your own subclass of net.sourceforge.pmd.lang.impl.SimpleLanguageModuleBase (see VmLanguageModule or JavaLanguageModule as an example).
For each version of your language, add a call to addVersion in your language module’s constructor. Use addDefaultVersion for defining the default version.
You’ll need to refer to the language version handler created in step #6.
Create the service registration via the text file src/main/resources/META-INF/services/net.sourceforge.pmd.lang.Language. Add your fully qualified class name as a single line into it.
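The META-INF/services file follows the provider-configuration convention of the standard java.util.ServiceLoader. Assuming the lookup works along those lines, the discovery side can be sketched as below. LanguageSketch and LanguageLookup are hypothetical stand-ins, not PMD types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical service interface standing in for the language interface.
interface LanguageSketch {
    String getId();
}

final class LanguageLookup {
    private LanguageLookup() { }

    // Returns the ids of all implementations that are registered in a
    // META-INF/services file named after the service interface.
    static List<String> registeredIds() {
        List<String> ids = new ArrayList<>();
        for (LanguageSketch language : ServiceLoader.load(LanguageSketch.class)) {
            ids.add(language.getId());
        }
        return ids;
    }
}
```

In this sketch no registration file exists, so the lookup finds nothing. The same symptom (the language is silently missing) is what to expect when the services file is absent or contains a misspelled class name.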

9. Add AST regression tests

For languages that use an external library for parsing, the AST can easily change when upgrading the library. Also for languages where we have the grammar under our control, it is useful to have such tests.

The tests parse one or more source files and generate a textual representation of the AST. This text is compared against a previously recorded version. If there are differences, the test fails.

This helps to detect anything in the AST structure that changed, maybe unexpectedly.

Create a test class in the package net.sourceforge.pmd.lang.$lang.ast with the name $langTreeDumpTest.
This test class must extend net.sourceforge.pmd.lang.test.ast.BaseTreeDumpTest. Note: This class is written in Kotlin and is available in the module “lang-test”.
Add a default constructor that calls the super constructor like so:
```
public $langTreeDumpTest() {
    super(NodePrintersKt.getSimpleNodePrinter(), ".$extension");
}
```
Replace “$lang” and “$extension” accordingly.
Implement the method getParser(). It must return a subclass of net.sourceforge.pmd.lang.test.ast.BaseParsingHelper. See net.sourceforge.pmd.lang.ecmascript.ast.JsParsingHelper for an example. With this parser helper you can also specify where the test files are searched, by using the method withResourceContext(Class<?>, String).
Add one or more test methods. Each test method parses one file and compares the result. The base class has a helper method doTest(String) that does all the work. This method just needs to be called:
```
@Test
public void myFirstAstTest() {
    doTest("filename-without-extension");
}
```
On the first test run the test fails. A text file (with the extension .txt) is created that records the current AST. On the next run, the text file is used as comparison and the test should pass. Don’t forget to commit the generated text file.
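The record-then-compare behavior described above can be illustrated with a small standalone sketch. It is not how BaseTreeDumpTest is implemented: DumpNode and TreeDumpSketch are hypothetical, the indentation format is invented, and an in-memory map stands in for the recorded .txt files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical minimal tree node, only a name and children.
final class DumpNode {
    final String name;
    final List<DumpNode> children = new ArrayList<>();

    DumpNode(String name, DumpNode... kids) {
        this.name = name;
        this.children.addAll(List.of(kids));
    }
}

final class TreeDumpSketch {
    private TreeDumpSketch() { }

    // Renders one node per line, indenting children by two spaces per level.
    static String dump(DumpNode root) {
        StringBuilder sb = new StringBuilder();
        dump(root, 0, sb);
        return sb.toString();
    }

    private static void dump(DumpNode node, int depth, StringBuilder sb) {
        sb.append("  ".repeat(depth)).append(node.name).append('\n');
        for (DumpNode child : node.children) {
            dump(child, depth + 1, sb);
        }
    }

    // The map stands in for the recorded .txt files. On the first run the dump
    // is recorded and the check fails; later runs compare against the record.
    static boolean check(String testName, DumpNode root, Map<String, String> baselines) {
        String actual = dump(root);
        String recorded = baselines.putIfAbsent(testName, actual);
        return recorded != null && recorded.equals(actual);
    }
}
```

Failing on the first run is deliberate in this scheme: it forces a human to look at the freshly recorded tree before it becomes the reference.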

A complete example can be seen in the JavaScript module: net.sourceforge.pmd.lang.ecmascript.ast.JsTreeDumpTest. The test resources are in the subpackage “testdata”: pmd-javascript/src/test/resources/net/sourceforge/pmd/lang/ecmascript/ast/testdata/.

The Scala module also has a test, written in Kotlin instead of Java: net.sourceforge.pmd.lang.scala.ast.ScalaParserTests.

10. Create an abstract rule class for the language

Extend AbstractRule and implement the parser visitor interface for your language (see AbstractVmRule for example).
All other rules for your language should extend this class. The purpose of this class is to implement visit methods for all AST types to simply delegate to default behavior. This is useful because most rules care only about specific AST nodes, but PMD needs to know what to do with each node, so this just lets you use default behavior for nodes you don’t care about.

11. Create rules

Rules are created by extending the abstract rule class created in step 10 (see EmptyForeachStmtRule for example).
Creating rules is already pretty well documented in PMD, and it’s no different for a new language, except you may have different AST nodes.
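How an abstract rule class and a concrete rule fit together can be sketched in isolation. The types below are hypothetical stand-ins (RNode, RBlock, RForeach, AbstractSketchRule, EmptyForeachSketchRule) and do not use PMD's AbstractRule or its reporting API. Violations are collected as plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical AST nodes with a line number and children.
abstract class RNode {
    final int line;
    final List<RNode> children = new ArrayList<>();

    RNode(int line, RNode... kids) {
        this.line = line;
        this.children.addAll(List.of(kids));
    }

    abstract void accept(AbstractSketchRule rule, List<String> violations);
}

final class RBlock extends RNode {
    RBlock(int line, RNode... kids) {
        super(line, kids);
    }

    @Override
    void accept(AbstractSketchRule rule, List<String> violations) {
        rule.visit(this, violations);
    }
}

final class RForeach extends RNode {
    RForeach(int line, RNode... kids) {
        super(line, kids);
    }

    @Override
    void accept(AbstractSketchRule rule, List<String> violations) {
        rule.visit(this, violations);
    }
}

// Plays the role of the abstract rule class: every visit method has a
// default behavior that simply descends into the children.
abstract class AbstractSketchRule {
    void visit(RBlock node, List<String> violations) {
        visitChildren(node, violations);
    }

    void visit(RForeach node, List<String> violations) {
        visitChildren(node, violations);
    }

    protected final void visitChildren(RNode node, List<String> violations) {
        for (RNode child : node.children) {
            child.accept(this, violations);
        }
    }

    final List<String> apply(RNode root) {
        List<String> violations = new ArrayList<>();
        root.accept(this, violations);
        return violations;
    }
}

// A concrete rule overrides only the node type it cares about.
final class EmptyForeachSketchRule extends AbstractSketchRule {
    @Override
    void visit(RForeach node, List<String> violations) {
        if (node.children.isEmpty()) {
            violations.add("Empty foreach at line " + node.line);
        }
        super.visit(node, violations); // continue with nested nodes
    }
}
```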

12. Test the rules

Testing rules is described in depth in Testing your rules.
- Each rule has its own test class: Create a test class for your rule extending PmdRuleTst (see AvoidReassigningParametersTest in pmd-vm for example)
- Create a category rule set for your language (see category/vm/bestpractices.xml for example)
- Place the test XML file with the test cases in the correct location
- When executing the test class
  - this triggers the unit test to read the corresponding XML file with the rule test data (see AvoidReassigningParameters.xml for example)
  - This test XML file contains sample pieces of code which should trigger a specified number of violations of this rule. The unit test will execute the rule on this piece of code, and verify that the number of violations matches.
To verify the validity of the created ruleset, create a subclass of AbstractRuleSetFactoryTest (see RuleSetFactoryTest in pmd-vm for example). This will load all rulesets and verify that all required attributes are provided.
Note: You’ll need to add your category ruleset to categories.properties, so that it can be found.

13. Create documentation page

Finish up your new language module by adding a page in the documentation. Create a new markdown file <langId>.md in docs/pages/pmd/languages/. This file should have the following frontmatter:

---
title: <Language Name>
permalink: pmd_languages_<langId>.html
last_updated: <Month> <Year> (<PMD Version>)
tags: [languages, PmdCapableLanguage, CpdCapableLanguage]
---

On this page, language specifics can be documented, e.g. when the language was first supported by PMD. There is also the following Jekyll Include, that creates a summary box for the language:

{% include language_info.html name='<Language Name>' id='<langId>' implementation='<langId>::lang.<langId>.<langId>LanguageModule' supports_cpd=true supports_pmd=true %}

XPath integration

PMD exposes the AST nodes for use by XPath based rules (see DOM representation of ASTs). Most Java getters in the AST classes are made available by default. These getters constitute the API of the language. If a getter method is renamed, then every XPath rule that uses this getter also needs to be adjusted. In order to have more control over this, there are two annotations that can be used for AST classes and their methods:

DeprecatedAttribute: Getters might be annotated with this annotation to indicate that the getter method should not be used in XPath rules. When an XPath rule uses such a method, a warning is issued. If the method additionally has the standard Java @Deprecated annotation, then the getter is also deprecated for Java usage. Otherwise, the getter is only deprecated for usage in XPath rules.
When a getter is deprecated and there is a different getter to be used instead, then the attribute replaceWith should be used.
NoAttribute: This annotation can be used on an AST node type or on individual methods in order to filter out which methods are available for XPath rules. When used on a type, either all methods can be filtered or only inherited methods (see attribute scope). When used directly on an individual method, then only this method will be filtered out. That way, methods can be added to AST nodes that should only be used in Java rules, e.g. as auxiliary methods.

Note:

Not all getters are available for XPath rules. It depends on the result type. In particular, Lists and Collections in general are not supported.

Only the following Java result types are supported:

String
any Enum-type
int
boolean
double
long
char
float
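The effect of this restriction can be checked with a small reflection-based sketch. It is not PMD's attribute lookup: XPathGetterSketch, ASTMethodSketch and Visibility are hypothetical, and treating only no-argument methods that start with "get" or "is" as getters is a simplifying assumption.

```java
import java.lang.reflect.Method;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class XPathGetterSketch {
    // The result types listed above, except enums, which are checked separately.
    private static final Set<Class<?>> SUPPORTED = Set.of(
            String.class, int.class, boolean.class, double.class,
            long.class, char.class, float.class);

    private XPathGetterSketch() { }

    static boolean isSupported(Class<?> type) {
        return type.isEnum() || SUPPORTED.contains(type);
    }

    // Collects the names of public no-argument getters with a supported result type.
    static Set<String> exposedGetters(Class<?> nodeClass) {
        Set<String> names = new TreeSet<>();
        for (Method method : nodeClass.getMethods()) {
            String name = method.getName();
            boolean looksLikeGetter = name.startsWith("get") || name.startsWith("is");
            if (looksLikeGetter
                    && method.getParameterCount() == 0
                    && isSupported(method.getReturnType())) {
                names.add(name);
            }
        }
        return names;
    }
}

enum Visibility { PUBLIC, PRIVATE }

// Hypothetical AST node class with a mix of supported and unsupported getters.
class ASTMethodSketch {
    public String getName() {
        return "foo";
    }

    public boolean isStatic() {
        return false;
    }

    public Visibility getVisibility() {
        return Visibility.PUBLIC;
    }

    // Returns a List, so it would not be usable from XPath.
    public List<String> getParameterNames() {
        return List.of();
    }
}
```

For ASTMethodSketch, the sketch keeps getName, getVisibility and isStatic, and drops getParameterNames as well as the inherited getClass, whose result types are not in the supported set.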

Debugging with Rule Designer

When implementing your grammar it may be very useful to see how PMD parses your example files. This can be achieved with Rule Designer:

Override the getXPathNodeName in your AST nodes for Designer to show node names.
Make sure to override both jjtOpen and jjtClose in your AST node base class so that they set both start and end line and column for proper node bound highlighting.
Not strictly required but trivial and useful: implement syntax highlighting for Rule Designer:
- Fork and clone the pmd/pmd-designer repository.
- Add a syntax highlighter implementation to net.sourceforge.pmd.util.fxdesigner.util.codearea.syntaxhighlighting (you could use Java as an example).
- Register it in the AvailableSyntaxHighlighters enumeration.
- Now build your implementation and place the target/pmd-designer-<version>-SNAPSHOT.jar in the lib directory inside your pmd-bin-... distribution (you have to delete the old pmd-designer-*.jar from there).

Optional features

Metrics

If you want to add support for computing metrics:

Create a package lang.<langname>.metrics
Create a utility class <langname>Metrics
Implement new metrics and add them as static constants. Be sure to document them.
Implement getLanguageMetricsProvider, to make the metrics available in the designer.

See JavaMetrics for an example.

Symbol table

A symbol table keeps track of variables and their usages. It is part of semantic analysis and would be executed in your parser adapter as an additional pass after you got the initial AST.

There is no general language independent API in PMD core. For now, each language will need to implement its own solution. The symbol information that has been resolved in the additional parser pass can be made available on the AST nodes via extra methods, e.g. getSymbolTable(), getSymbol(), or getUsages().
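Since each language has to bring its own solution, the following is only one possible minimal design: nested scopes that map names to symbols, with usages recorded on the symbol. SymbolSketch and ScopeSketch are hypothetical names. This is not the Java module's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A declared variable together with the lines where it is used.
final class SymbolSketch {
    final String name;
    final int declaredAtLine;
    final List<Integer> usageLines = new ArrayList<>();

    SymbolSketch(String name, int declaredAtLine) {
        this.name = name;
        this.declaredAtLine = declaredAtLine;
    }
}

// One lexical scope; a null parent marks the outermost scope.
final class ScopeSketch {
    private final ScopeSketch parent;
    private final Map<String, SymbolSketch> symbols = new HashMap<>();

    ScopeSketch(ScopeSketch parent) {
        this.parent = parent;
    }

    SymbolSketch declare(String name, int line) {
        SymbolSketch symbol = new SymbolSketch(name, line);
        symbols.put(name, symbol);
        return symbol;
    }

    // Looks the name up in this scope first, then in the enclosing scopes,
    // so inner declarations shadow outer ones. Returns null if undeclared.
    SymbolSketch resolve(String name) {
        for (ScopeSketch scope = this; scope != null; scope = scope.parent) {
            SymbolSketch found = scope.symbols.get(name);
            if (found != null) {
                return found;
            }
        }
        return null;
    }

    // Records a usage on the resolved symbol; returns false for undeclared names.
    boolean use(String name, int line) {
        SymbolSketch symbol = resolve(name);
        if (symbol == null) {
            return false;
        }
        symbol.usageLines.add(line);
        return true;
    }
}
```

An extra pass over the AST would call declare for each declaration node and use for each reference node. Node methods such as getSymbol() could then return the resolved SymbolSketch.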

Currently only Java provides an implementation for symbol table, see Java-specific features and guidance.

Note:

With PMD 7.0.0 the symbol table and type resolution implementation has been rewritten from scratch. There is still an old API for symbol table support that is used by PLSQL, see net.sourceforge.pmd.lang.symboltable. This has been deprecated and should not be used.

Type resolution

For typed languages like Java, type information can be useful for writing rules that trigger only on specific types. Resolving types of expressions and variables would be done in your parser adapter as yet another additional pass, potentially after resolving the symbol table.

Type resolution tries to find the actual class type of each used type, following along method calls (including overloaded and overridden methods), which allows querying subtypes and the type hierarchy. This might require additional configuration for the language, e.g. in Java you need to configure an auxiliary classpath.

There is no general language independent API in PMD core. For now, each language will need to implement its own solution. The type information can be made available on the AST nodes via extra methods, e.g. getType().
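As a rough illustration of such a pass, the sketch below assigns types to a tiny expression tree bottom-up and stores the result on each node, so that a getType() method can return it later. All names (TypeSketch, ExprSketch, IntLiteral, StringLiteral, VarRef, Plus, TypeResolverSketch) and the typing rules are invented for this example. A real implementation has to deal with classes, generics, overloads and much more.

```java
import java.util.Map;

enum TypeSketch { INT, STRING, UNKNOWN }

// Hypothetical expression nodes; the resolved type is stored on the node.
abstract class ExprSketch {
    private TypeSketch type = TypeSketch.UNKNOWN;

    TypeSketch getType() {
        return type;
    }

    void setType(TypeSketch type) {
        this.type = type;
    }
}

final class IntLiteral extends ExprSketch { }

final class StringLiteral extends ExprSketch { }

final class VarRef extends ExprSketch {
    final String name;

    VarRef(String name) {
        this.name = name;
    }
}

final class Plus extends ExprSketch {
    final ExprSketch left;
    final ExprSketch right;

    Plus(ExprSketch left, ExprSketch right) {
        this.left = left;
        this.right = right;
    }
}

final class TypeResolverSketch {
    // Declared variable types, e.g. as collected by a symbol table pass.
    private final Map<String, TypeSketch> declaredTypes;

    TypeResolverSketch(Map<String, TypeSketch> declaredTypes) {
        this.declaredTypes = declaredTypes;
    }

    // Post-order pass: operands are resolved before the expression that uses them.
    TypeSketch resolve(ExprSketch expr) {
        TypeSketch result = TypeSketch.UNKNOWN;
        if (expr instanceof IntLiteral) {
            result = TypeSketch.INT;
        } else if (expr instanceof StringLiteral) {
            result = TypeSketch.STRING;
        } else if (expr instanceof VarRef) {
            result = declaredTypes.getOrDefault(((VarRef) expr).name, TypeSketch.UNKNOWN);
        } else if (expr instanceof Plus) {
            Plus plus = (Plus) expr;
            TypeSketch left = resolve(plus.left);
            TypeSketch right = resolve(plus.right);
            if (left == TypeSketch.STRING || right == TypeSketch.STRING) {
                result = TypeSketch.STRING; // invented rule: concatenation wins
            } else if (left == TypeSketch.INT && right == TypeSketch.INT) {
                result = TypeSketch.INT;
            }
        }
        expr.setType(result);
        return result;
    }
}
```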

Currently only Java provides an implementation for type resolution, see Java-specific features and guidance.

Call and data flow analysis

Call and data flow analysis keep track of the data as it moves through the different execution paths of a program. This would be yet another analysis pass.

There is no general language independent API in PMD core. For now, each language will need to implement its own solution.

Currently Java has some limited support for data flow analysis, see Java-specific features and guidance.

