
Adding PMD support for a new ANTLR grammar based language

How to add a new language to PMD using ANTLR grammar.
Do you really need a new language?

This document describes how to add a new full-fledged language, with its own grammar and parser. If what you are trying to support is “a specific type” of files for a grammar that already exists (i.e. a specific type of XML or HTML file), you may want to consider creating a dialect instead.
Before you start…

This is a big contribution and can’t be done as a drive-by contribution. Implementing support for a new language requires dedication and a long-term commitment.

This step-by-step guide is just a small intro to get the basics started, and it is not necessarily up to date or complete. You have to be able to fill in the blanks.

Currently, the ANTLR integration has some basic limitations compared to JavaCC: The output of the ANTLR parser generator is not an abstract syntax tree (AST) but a parse tree (also known as a CST, concrete syntax tree). As such, a parse tree is much more fine-grained than what a typical JavaCC grammar will produce. This means that the parse tree is much deeper and contains nodes down to the different token types.

The ANTLR nodes are context objects and serve a different abstraction than nodes in an AST. These context objects don’t have any attributes themselves, because they represent the attributes (as nodes or leaves in the parse tree). As they don’t have attributes, there are no attributes that can be used in XPath based rules.

The current ANTLR-based language implementations use these context objects as nodes in PMD’s AST representation.

In order to overcome these limitations, one would need to implement a post-processing step that transforms the parse tree into an abstract syntax tree and introduces real nodes on a higher abstraction level. These real nodes can then have attributes which are available in XPath based rules. The transformation can happen with a visitor, but the implementation of the AST is a manual step. This step is not described in this guide.
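To make the difference between a parse tree and an AST more concrete, here is a minimal sketch in plain Java. It does not use any PMD or ANTLR classes: ParseNode, FunctionDeclaration, AstBuilder and the rule and token names are hypothetical and exist only for this illustration. It shows how a fine-grained parse tree for a function declaration could be folded into a single node whose attributes (name, arity) would then be available to XPath rules.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical parse tree (CST) node: a rule or token name, optional text, children. */
final class ParseNode {
    final String kind;
    final String text; // non-null only for tokens
    final List<ParseNode> children = new ArrayList<>();

    private ParseNode(String kind, String text) {
        this.kind = kind;
        this.text = text;
    }

    static ParseNode rule(String kind, ParseNode... kids) {
        ParseNode node = new ParseNode(kind, null);
        Collections.addAll(node.children, kids);
        return node;
    }

    static ParseNode token(String kind, String text) {
        return new ParseNode(kind, text);
    }
}

/** Hypothetical AST node: one node per declaration, carrying real attributes. */
final class FunctionDeclaration {
    final String name;
    final int arity;

    FunctionDeclaration(String name, int arity) {
        this.name = name;
        this.arity = arity;
    }
}

/** Folds token-level detail of the parse tree into attributes of an AST node. */
final class AstBuilder {
    static FunctionDeclaration build(ParseNode functionDecl) {
        String name = null;
        int arity = 0;
        for (ParseNode child : functionDecl.children) {
            if (child.kind.equals("Identifier")) {
                name = child.text;
            } else if (child.kind.equals("parameterList")) {
                for (ParseNode p : child.children) {
                    if (p.kind.equals("parameter")) {
                        arity++;
                    }
                }
            }
        }
        return new FunctionDeclaration(name, arity);
    }
}

final class ParseTreeToAstDemo {
    public static void main(String[] args) {
        ParseNode tree = ParseNode.rule("functionDeclaration",
            ParseNode.token("T-func", "func"),
            ParseNode.token("Identifier", "add"),
            ParseNode.rule("parameterList",
                ParseNode.rule("parameter", ParseNode.token("Identifier", "a")),
                ParseNode.token("T-comma", ","),
                ParseNode.rule("parameter", ParseNode.token("Identifier", "b"))));
        FunctionDeclaration fn = AstBuilder.build(tree);
        System.out.println(fn.name + "/" + fn.arity); // prints "add/2"
    }
}
```

In the parse tree, the function name and the parameter count are only reachable by walking child nodes. In the AST node they are plain attributes, which is what an XPath rule needs.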

Once the basic support for a language is in place, many features are still missing. Typical features that can greatly improve rule writing are: symbol table, type resolution, call/data flow analysis.

A symbol table keeps track of variables and their usages. Type resolution tries to find the actual class type of each used type, following along method calls (including overloaded and overridden methods), which makes it possible to query subtypes and the type hierarchy. This requires additional configuration of an auxiliary classpath. Call and data flow analysis keep track of the data as it moves through the different execution paths of a program.

These features are out of scope of this guide. Type resolution and data flow analysis definitely don’t come for free. Implementing them takes a lot of effort and perseverance.

Steps

1. Start with a new sub-module

  • See pmd-swift for examples.
  • Make sure to add your new module to PMD’s parent pom as a <module> entry, so that it is built alongside the other languages.
  • Also add your new module to the dependencies list in “pmd-languages-deps/pom.xml”, so that the new language is automatically available in the binary distribution (pmd-dist).

2. Implement an AST parser for your language

  • ANTLR will generate the parser for you based on the grammar file. The grammar file needs to be placed in the folder src/main/antlr4 in the appropriate sub package ast of the language. E.g. for Swift, the grammar file is Swift.g4 and is placed in the package net.sourceforge.pmd.lang.swift.ast.
  • Configure the options “superClass” and “contextSuperClass”. These are the base classes for the generated classes.

3. Create AST node classes

  • The individual AST nodes are generated, but you need to define the common interface for them.
  • You need to define the supertype interface for all nodes of the language. For that, we provide AntlrNode.
  • See SwiftNode as an example.
  • Additionally, you need several base classes:
    • a language specific inner node - these nodes represent the production rules from the grammar. In ANTLR, they are called “ParserRuleContext”. We call them “InnerNode”. Use the base class BaseAntlrInnerNode from pmd-core. An example is SwiftInnerNode. Note that this language specific inner node is package-private, as it is only the base class for the concrete nodes generated by ANTLR.
    • a language specific root node - this provides the root of the AST and our parser will return subtypes of this node. The root node itself is an “InnerNode”. See SwiftRootNode. Note that this language specific root node is package-private, as it is only the base class for the concrete node generated by ANTLR.
    • a language specific terminal node. See SwiftTerminalNode.
    • a language specific error node. See SwiftErrorNode.
    • a language name dictionary. This is used to convert ANTLR node names to useful XPath node names. See SwiftNameDictionary.
  • Once these base classes exist, you need to change the ANTLR grammar to add additional members via @parser::members
    • Define a package-private field DICO which creates a new instance of your language name dictionary using the vocabulary from the generated parser (VOCABULARY).
    • Define two additional methods to help convert the ANTLR context objects into PMD AST nodes. The methods are abstract in AntlrGeneratedParserBase and need to be implemented here for the concrete language: createPmdTerminal() and createPmdError().
  • In order for the generated code to match and use our custom classes, we have a common ant script that adjusts the generated code. The ant script is antlr4-wrapper.xml and does not need to be adjusted - it has plenty of parameters that can be configured. The ant script is added in the language module’s pom.xml, where the parameters are set (e.g. the name of the root node class). Have a look at Swift’s example: pmd-swift/pom.xml.
  • You can add additional methods in your “InnerNode” (e.g. SwiftInnerNode) that are available on all nodes. But in most cases you won’t need to do anything.
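The role of the name dictionary mentioned above can be illustrated with a small, self-contained sketch. This is not the pmd-core implementation: ToyNameDictionary, its method names and the naming scheme (capitalized rule names, a “T-” prefix for tokens, spelled-out punctuation) are assumptions made for this example. The point is only that raw grammar names need to be mapped to names that are usable as XPath node names.

```java
import java.util.Map;

/** Hypothetical, simplified name dictionary; the real one lives in pmd-core and may differ. */
final class ToyNameDictionary {
    // Punctuation literals are not valid XPath element names, so they get spelled-out names.
    private static final Map<String, String> PUNCTUATION = Map.of(
        "(", "lparen",
        ")", "rparen",
        "{", "lbrace",
        "}", "rbrace",
        ",", "comma");

    /** Maps a grammar rule name such as "functionDeclaration" to "FunctionDeclaration". */
    static String ruleXPathName(String ruleName) {
        return Character.toUpperCase(ruleName.charAt(0)) + ruleName.substring(1);
    }

    /** Maps a token literal to a prefixed name so that it cannot clash with a rule name. */
    static String tokenXPathName(String literal) {
        return "T-" + PUNCTUATION.getOrDefault(literal, literal);
    }
}

final class NameDictionaryDemo {
    public static void main(String[] args) {
        System.out.println(ToyNameDictionary.ruleXPathName("functionDeclaration")); // FunctionDeclaration
        System.out.println(ToyNameDictionary.tokenXPathName("("));                  // T-lparen
        System.out.println(ToyNameDictionary.tokenXPathName("func"));               // T-func
    }
}
```

For the actual mapping used by a PMD language module, refer to SwiftNameDictionary and its base class in pmd-core.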

4. Generate your parser (using ANTLR)

  • Make sure you have the property <antlr4.visitor>true</antlr4.visitor> in your pom.xml file.
  • Add the antlr and antrun plugins to the module’s pom.xml. The antlr plugin needs to be listed first, to ensure it runs first. The antrun plugin should execute the pmd-language target using the ${antlr4.ant.wrapper} antfile.
  • Generating the parser is just a matter of building the language module. ANTLR is called via ant, and this step is bound to the phase generate-sources. So you can just call e.g. ./mvnw generate-sources -pl pmd-swift to have the parser generated.
  • The generated code will be placed under target/generated-sources/antlr4 and will not be committed to source control.
  • You should review pmd-swift/pom.xml.

5. Create a TokenManager

  • This is needed to support CPD (copy-paste detection).
  • We provide a default implementation using AntlrTokenManager.
  • You must create your own “AntlrCpdLexer” such as we do with SwiftCpdLexer.
  • If you wish to filter specific tokens (e.g. comments to support CPD suppression via “CPD-OFF” and “CPD-ON”), you can create your own implementation of AntlrTokenFilter. You’ll then need to override the protected method getTokenFilter(AntlrTokenManager) and return your custom filter. See the CpdLexer for C# as an example: CsCpdLexer.

    If you don’t need a custom token filter, you don’t need to override the method. It returns the default AntlrTokenFilter which doesn’t filter anything.
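The idea behind such a token filter can be sketched without any PMD classes. In the following plain Java example, ToyToken and ToySuppressionFilter are hypothetical names, and the logic is a simplified assumption of what a “CPD-OFF” / “CPD-ON” filter does: comment tokens toggle suppression and are dropped, and regular tokens are only passed on while suppression is off. A real filter would extend AntlrTokenFilter instead.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical token with only the fields needed for this sketch. */
final class ToyToken {
    final boolean comment;
    final String image;

    ToyToken(boolean comment, String image) {
        this.comment = comment;
        this.image = image;
    }
}

/** Hypothetical filter; in PMD this role is played by an AntlrTokenFilter subclass. */
final class ToySuppressionFilter {
    static List<String> filter(List<ToyToken> tokens) {
        List<String> kept = new ArrayList<>();
        boolean suppressed = false;
        for (ToyToken token : tokens) {
            if (token.comment) {
                // Comments toggle suppression and are never passed on themselves.
                if (token.image.contains("CPD-OFF")) {
                    suppressed = true;
                } else if (token.image.contains("CPD-ON")) {
                    suppressed = false;
                }
                continue;
            }
            if (!suppressed) {
                kept.add(token.image);
            }
        }
        return kept;
    }
}

final class TokenFilterDemo {
    public static void main(String[] args) {
        List<ToyToken> tokens = List.of(
            new ToyToken(false, "let"),
            new ToyToken(true, "// CPD-OFF"),
            new ToyToken(false, "generated"),
            new ToyToken(true, "// CPD-ON"),
            new ToyToken(false, "x"));
        System.out.println(ToySuppressionFilter.filter(tokens)); // prints [let, x]
    }
}
```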

6. Create a PMD parser “adapter”

  • Create your own parser that adapts the ANTLR interface to PMD’s parser interface.
  • We provide an AntlrBaseParser implementation that you need to extend to create your own adapter, as we do with PmdSwiftParser.

7. Create a language version handler

  • Now you need to create your version handler, as we did with SwiftHandler.
  • This class is sort of a gateway between PMD and all parsing logic specific to your language.
  • For a minimal implementation, it just needs to return a parser (see step #6).
  • It can be used to provide other features for your language like:
    • violation suppression logic
    • ViolationDecorators, to add additional language specific information to the created violations. The Java language module uses this to provide the method name or class name where the violation occurred.
    • metrics
    • custom XPath functions

8. Create a base visitor

  • A parser visitor adapter is not needed anymore with PMD 7. The visitor interface now provides a default implementation.
  • The visitor for an ANTLR based AST is generated along with the parser from the ANTLR grammar file. The base interface for a visitor is AstVisitor.
  • The generated visitor class for Swift is called SwiftVisitor.
  • In order to help use this visitor later on, a base visitor class should be created. See SwiftVisitorBase as an example.

9. Make PMD recognize your language

  • Create your own subclass of net.sourceforge.pmd.lang.impl.SimpleLanguageModuleBase, see Swift as an example: SwiftLanguageModule.
  • Ensure the language name and language id used match those set as properties when running ant in step #4.
  • For each version of your language, add a call to addVersion in your language module’s constructor. Use addDefaultVersion for defining the default version.
  • You’ll need to refer to the language version handler created in step #7.
  • Create the service registration via the text file src/main/resources/META-INF/services/net.sourceforge.pmd.lang.Language. Add your fully qualified class name as a single line into it.

10. Create an abstract rule class for the language

  • You need to create your own abstract rule class in order to interface your language with PMD’s generic rule execution.
  • See AbstractSwiftRule as an example.
  • The rule basically just extends AbstractVisitorRule and only redefines the abstract buildVisitor() method to return our own type of visitor. In this case our SwiftVisitor is used. While there is no real functionality added, every language should have its own base class for rules. This helps to organize the code.
  • All other rules for your language should extend this class. The purpose of this class is to provide a visitor via the method buildVisitor() for analyzing the AST. The provided visitor only implements the visit methods for specific AST nodes. The other node types use the default behavior, and you don’t need to care about them.
  • Note: This is different from how it was in PMD 6: Each rule in PMD 6 was itself a visitor (implementing the visitor interface of the specific language). Now the rule just provides a visitor, which can be hidden and potentially shared between rules.
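The “rule provides a visitor” design can be shown in isolation with a toy model. The sketch below is plain Java and deliberately avoids PMD’s real classes: ToyNode, ToyVisitor, AbstractToyRule and NoPrintRule are hypothetical stand-ins, assuming a language with only two node kinds. The visitor interface has default methods, so a rule only overrides the visits it cares about, and the rule itself exposes nothing but buildVisitor().

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical AST node of a toy language with two node kinds. */
final class ToyNode {
    final String kind; // "FunctionCall" or "Identifier"
    final String text;

    ToyNode(String kind, String text) {
        this.kind = kind;
        this.text = text;
    }
}

/** Hypothetical visitor with default no-op visits, similar in spirit to a generated visitor. */
interface ToyVisitor {
    default void visitFunctionCall(ToyNode node, List<String> violations) { }

    default void visitIdentifier(ToyNode node, List<String> violations) { }
}

/** Hypothetical language specific base rule: concrete rules only supply a visitor. */
abstract class AbstractToyRule {
    abstract ToyVisitor buildVisitor();

    /** Generic rule execution: dispatch every node to the visitor supplied by the rule. */
    final List<String> apply(List<ToyNode> nodes) {
        ToyVisitor visitor = buildVisitor();
        List<String> violations = new ArrayList<>();
        for (ToyNode node : nodes) {
            if (node.kind.equals("FunctionCall")) {
                visitor.visitFunctionCall(node, violations);
            } else {
                visitor.visitIdentifier(node, violations);
            }
        }
        return violations;
    }
}

/** A concrete rule: the visitor is an implementation detail hidden inside the rule. */
final class NoPrintRule extends AbstractToyRule {
    @Override
    ToyVisitor buildVisitor() {
        return new ToyVisitor() {
            @Override
            public void visitFunctionCall(ToyNode node, List<String> violations) {
                if (node.text.equals("print")) {
                    violations.add("avoid print");
                }
            }
        };
    }
}

final class VisitorRuleDemo {
    public static void main(String[] args) {
        List<ToyNode> nodes = List.of(
            new ToyNode("Identifier", "x"),
            new ToyNode("FunctionCall", "print"),
            new ToyNode("FunctionCall", "max"));
        System.out.println(new NoPrintRule().apply(nodes)); // prints [avoid print]
    }
}
```

Because the rule class is not a visitor itself, its public surface stays small, and several rules could return the same visitor instance if that is useful.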

11. Create rules

  • Creating rules is already pretty well documented in PMD - and it’s no different for a new language, except you may have different AST nodes.
  • PMD supports two types of rules: visitor-based rules and XPath rules.
  • To add a visitor rule:
    • You need to extend the abstract rule you created in the previous step. You can use the Swift rule UnavailableFunctionRule as an example. Note that all rule classes should be suffixed with Rule and should be placed in a package that corresponds to their category.
  • To add an XPath rule you can follow our guide Writing XPath Rules.
  • When creating the category ruleset XML file, the XML can reference build properties that are replaced during the build. This is used for the externalInfoUrl attribute of a rule. E.g. we use ${pmd.website.baseurl} to point to the correct webpage (depending on the PMD version). In order for this to work, you need to add a resource filtering configuration in the language module’s pom.xml. Under <build> add the following lines:
    <resources>
        <resource>
            <directory>${project.basedir}/src/main/resources</directory>
            <filtering>true</filtering>
        </resource>
    </resources>

12. Test the rules

  • Testing rules is described in depth in Testing your rules.
    • Each rule has its own test class: Create a test class for your rule extending PmdRuleTst (see UnavailableFunctionTest for example)
    • Create a category rule set for your language (see pmd-swift/src/main/resources/bestpractices.xml for example)
    • Place the test XML file with the test cases in the correct location
    • When executing the test class
      • this triggers the unit test to read the corresponding XML file with the rule test data (see UnavailableFunction.xml for example)
      • This test XML file contains sample pieces of code which should trigger a specified number of violations of this rule. The unit test will execute the rule on each piece of code, and verify that the number of violations matches.
  • To verify the validity of all the created rulesets, create a subclass of AbstractRuleSetFactoryTest (see RuleSetFactoryTest in pmd-swift for example). This will load all rulesets and verify that all required attributes are provided.

    Note: You’ll need to add your ruleset to categories.properties, so that it can be found.

13. Create documentation page

Finish up your new language module by adding a page in the documentation. Create a new markdown file <langId>.md in docs/pages/pmd/languages/. This file should have the following frontmatter:

    ---
    title: <Language Name>
    permalink: pmd_languages_<langId>.html
    last_updated: <Month> <Year> (<PMD Version>)
    tags: [languages, PmdCapableLanguage, CpdCapableLanguage]
    ---

On this page, language specifics can be documented, e.g. when the language was first supported by PMD. There is also the following Jekyll include, which creates a summary box for the language:

{% include language_info.html name='<Language Name>' id='<langId>' implementation='<langId>::lang.<langId>.<langId>LanguageModule' supports_cpd=true supports_pmd=true %}

Optional features

See Optional features in JavaCC based languages.

In order to implement these, most likely an AST needs to be developed first. The parse tree (CST, concrete syntax tree) is not suitable to add methods such as getSymbol() to the node classes.



©2025 PMD Open Source Project. All rights reserved.
Page last updated: December 2023 (7.0.0)
Site last generated: Jun 27, 2025