License

BSD-2-Clause license

commonmark-java

Java library for parsing and rendering Markdown text according to the CommonMark specification (and some extensions).

Introduction

Provides classes for parsing input to an abstract syntax tree (AST), visiting and manipulating nodes, and rendering to HTML or back to Markdown. It started out as a port of commonmark.js, but has since evolved into an extensible library with the following features:

Small (core has no dependencies, extensions in separate artifacts)
Fast (10-20 times faster than pegdown, which used to be a popular Markdown library; see benchmarks in repo)
Flexible (manipulate the AST after parsing, customize HTML rendering)
Extensible (tables, strikethrough, autolinking and more, see below)

The library is supported on Java 11 and later. It works on Android too, but only on a best-effort basis; please report problems. For Android, the minimum API level is 19; see the commonmark-android-test directory.

Coordinates for the core library (see all on Maven Central):

    <dependency>
        <groupId>org.commonmark</groupId>
        <artifactId>commonmark</artifactId>
        <version>0.25.0</version>
    </dependency>

The module names to use in Java 9 and later are org.commonmark, org.commonmark.ext.autolink, etc., corresponding to package names.

Note that for 0.x releases of this library, the API is not considered stable yet and may break between minor releases. After 1.0, Semantic Versioning will be followed. A package containing beta means it's not subject to stable API guarantees yet, but normal usage should not need such packages.

See the spec.txt file if you're wondering which version of the spec is currently implemented. Also check out the CommonMark dingus for getting familiar with the syntax or trying out edge cases. If you clone the repository, you can also use the DingusApp class to try out things interactively.

Usage

Parse and render to HTML

    import org.commonmark.node.*;
    import org.commonmark.parser.Parser;
    import org.commonmark.renderer.html.HtmlRenderer;

    Parser parser = Parser.builder().build();
    Node document = parser.parse("This is *Markdown*");
    HtmlRenderer renderer = HtmlRenderer.builder().build();
    renderer.render(document);  // "<p>This is <em>Markdown</em></p>\n"

This uses the parser and renderer with default options. Both builders have methods for configuring their behavior:

escapeHtml(true) on HtmlRenderer will escape raw HTML tags and blocks.
sanitizeUrls(true) on HtmlRenderer will strip potentially unsafe URLs from <a> and <img> tags.
For all available options, see methods on the builders.
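As a minimal sketch of the two options listed above, the following combines escapeHtml(true) and sanitizeUrls(true) on one renderer. The class name SafeRenderingExample and the sample input are made up for illustration, and the sketch assumes the core commonmark artifact is on the classpath. The exact HTML is not reproduced here because the renderer may emit additional attributes; the point is only that the raw tag and the javascript: URL should not survive.

```java
import org.commonmark.node.Node;
import org.commonmark.parser.Parser;
import org.commonmark.renderer.html.HtmlRenderer;

// Hypothetical example class, not part of the library.
public class SafeRenderingExample {

    static String render(String markdown) {
        Parser parser = Parser.builder().build();
        HtmlRenderer renderer = HtmlRenderer.builder()
                .escapeHtml(true)    // raw HTML is emitted as escaped text
                .sanitizeUrls(true)  // unsafe link/image URLs are stripped
                .build();
        Node document = parser.parse(markdown);
        return renderer.render(document);
    }

    public static void main(String[] args) {
        System.out.println(render("<b>bold</b> and [click](javascript:alert(1))"));
    }
}
```

Even with both options enabled, the note below about running a dedicated HTML sanitizer still applies.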

Note that this library doesn't try to sanitize the resulting HTML with regard to which tags are allowed, etc. That is the responsibility of the caller, and if you expose the resulting HTML, you probably want to run a sanitizer on it after this.

Render to Markdown

    import org.commonmark.node.*;
    import org.commonmark.renderer.markdown.MarkdownRenderer;

    MarkdownRenderer renderer = MarkdownRenderer.builder().build();
    Node document = new Document();
    Heading heading = new Heading();
    heading.setLevel(2);
    heading.appendChild(new Text("My title"));
    document.appendChild(heading);

    renderer.render(document);  // "## My title\n"

For rendering to plain text with minimal markup, there's also TextContentRenderer.
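A short sketch of TextContentRenderer, assuming it is built the same way as the other renderers (via a builder) and that the core artifact is on the classpath. The class name PlainTextExample is hypothetical.

```java
import org.commonmark.node.Node;
import org.commonmark.parser.Parser;
import org.commonmark.renderer.text.TextContentRenderer;

// Hypothetical example class, not part of the library.
public class PlainTextExample {

    static String toPlainText(String markdown) {
        Parser parser = Parser.builder().build();
        Node document = parser.parse(markdown);
        TextContentRenderer renderer = TextContentRenderer.builder().build();
        // Emphasis markers and similar markup are dropped; the text content remains.
        return renderer.render(document);
    }

    public static void main(String[] args) {
        System.out.println(toPlainText("This is *Markdown*"));
    }
}
```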

Use a visitor to process parsed nodes

After the source text has been parsed, the result is a tree of nodes. That tree can be modified before rendering, or just inspected without rendering:

    Node node = parser.parse("Example\n=======\n\nSome more text");
    WordCountVisitor visitor = new WordCountVisitor();
    node.accept(visitor);
    visitor.wordCount;  // 4

    class WordCountVisitor extends AbstractVisitor {
        int wordCount = 0;

        @Override
        public void visit(Text text) {
            // This is called for all Text nodes. Override other visit methods for other node types.

            // Count words (this is just an example, don't actually do it this way for various reasons).
            wordCount += text.getLiteral().split("\\W+").length;

            // Descend into children (could be omitted in this case because Text nodes don't have children).
            visitChildren(text);
        }
    }
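The example above only inspects the tree. A visitor can also modify nodes before rendering; the following sketch rewrites site-relative link destinations to absolute ones. The names LinkRewriteExample, AbsoluteLinkVisitor and the base URL are hypothetical and chosen for illustration only.

```java
import org.commonmark.node.AbstractVisitor;
import org.commonmark.node.Link;
import org.commonmark.node.Node;
import org.commonmark.parser.Parser;
import org.commonmark.renderer.html.HtmlRenderer;

// Hypothetical example class, not part of the library.
public class LinkRewriteExample {

    // Prefixes destinations starting with "/" with a base URL.
    static class AbsoluteLinkVisitor extends AbstractVisitor {
        private final String baseUrl;

        AbsoluteLinkVisitor(String baseUrl) {
            this.baseUrl = baseUrl;
        }

        @Override
        public void visit(Link link) {
            if (link.getDestination().startsWith("/")) {
                link.setDestination(baseUrl + link.getDestination());
            }
            // Links have children (the link text), so keep descending.
            visitChildren(link);
        }
    }

    static String render(String markdown, String baseUrl) {
        Node document = Parser.builder().build().parse(markdown);
        document.accept(new AbsoluteLinkVisitor(baseUrl));
        return HtmlRenderer.builder().build().render(document);
    }

    public static void main(String[] args) {
        // prints: <p><a href="https://example.com/docs">docs</a></p>
        System.out.print(render("[docs](/docs)", "https://example.com"));
    }
}
```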

Source positions

If you want to know where a parsed Node appeared in the input source text, you can request the parser to return source positions like this:

    var parser = Parser.builder().includeSourceSpans(IncludeSourceSpans.BLOCKS_AND_INLINES).build();

Then parse nodes and inspect source positions:

    var source = "foo\n\nbar *baz*";
    var doc = parser.parse(source);
    var emphasis = doc.getLastChild().getLastChild();
    var s = emphasis.getSourceSpans().get(0);
    s.getLineIndex();    // 2 (third line)
    s.getColumnIndex();  // 4 (fifth column)
    s.getInputIndex();   // 9 (string index 9)
    s.getLength();       // 5
    source.substring(s.getInputIndex(), s.getInputIndex() + s.getLength());  // "*baz*"

If you're only interested in blocks and not inlines, use IncludeSourceSpans.BLOCKS.
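A minimal sketch of the BLOCKS mode, listing the source span of each top-level block. The class name BlockSpansExample and the description format are made up for illustration; only getLineIndex() and getLength() from the example above are used.

```java
import java.util.ArrayList;
import java.util.List;

import org.commonmark.node.Node;
import org.commonmark.node.SourceSpan;
import org.commonmark.parser.IncludeSourceSpans;
import org.commonmark.parser.Parser;

// Hypothetical example class, not part of the library.
public class BlockSpansExample {

    static List<String> describeBlocks(String source) {
        Parser parser = Parser.builder()
                .includeSourceSpans(IncludeSourceSpans.BLOCKS)
                .build();
        Node doc = parser.parse(source);

        List<String> result = new ArrayList<>();
        // Walk the top-level blocks; a block has one span per source line it covers.
        for (Node block = doc.getFirstChild(); block != null; block = block.getNext()) {
            for (SourceSpan span : block.getSourceSpans()) {
                result.add(block.getClass().getSimpleName()
                        + " line " + span.getLineIndex()
                        + " length " + span.getLength());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Expected: "Paragraph line 0 length 3" and "Paragraph line 2 length 9"
        describeBlocks("foo\n\nbar *baz*").forEach(System.out::println);
    }
}
```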

Add or change attributes of HTML elements

Sometimes you might want to customize how HTML is rendered. If all you want to do is add or change attributes on some elements, there's a simple way to do that.

In this example, we register a factory for an AttributeProvider on the renderer to set a class="border" attribute on img elements.

    Parser parser = Parser.builder().build();
    HtmlRenderer renderer = HtmlRenderer.builder()
            .attributeProviderFactory(new AttributeProviderFactory() {
                public AttributeProvider create(AttributeProviderContext context) {
                    return new ImageAttributeProvider();
                }
            })
            .build();

    Node document = parser.parse("![text](/url.png)");
    renderer.render(document);
    // "<p><img src=\"/url.png\" alt=\"text\" class=\"border\" /></p>\n"

    class ImageAttributeProvider implements AttributeProvider {
        @Override
        public void setAttributes(Node node, String tagName, Map<String, String> attributes) {
            if (node instanceof Image) {
                attributes.put("class", "border");
            }
        }
    }

Customize HTML rendering

If you want to do more than just change attributes, there's also a way to take complete control over how HTML is rendered.

In this example, we're changing the rendering of indented code blocks to only wrap them in pre instead of pre and code:

    Parser parser = Parser.builder().build();
    HtmlRenderer renderer = HtmlRenderer.builder()
            .nodeRendererFactory(new HtmlNodeRendererFactory() {
                public NodeRenderer create(HtmlNodeRendererContext context) {
                    return new IndentedCodeBlockNodeRenderer(context);
                }
            })
            .build();

    Node document = parser.parse("Example:\n\n    code");
    renderer.render(document);
    // "<p>Example:</p>\n<pre>code\n</pre>\n"

    class IndentedCodeBlockNodeRenderer implements NodeRenderer {

        private final HtmlWriter html;

        IndentedCodeBlockNodeRenderer(HtmlNodeRendererContext context) {
            this.html = context.getWriter();
        }

        @Override
        public Set<Class<? extends Node>> getNodeTypes() {
            // Return the node types we want to use this renderer for.
            return Set.of(IndentedCodeBlock.class);
        }

        @Override
        public void render(Node node) {
            // We only handle one type as per getNodeTypes, so we can just cast it here.
            IndentedCodeBlock codeBlock = (IndentedCodeBlock) node;
            html.line();
            html.tag("pre");
            html.text(codeBlock.getLiteral());
            html.tag("/pre");
            html.line();
        }
    }

Add your own node types

In case you want to store additional data in the document or have custom elements in the resulting HTML, you can create your own subclass of CustomNode and add instances as child nodes to existing nodes.

To define the HTML rendering for them, you can use a NodeRenderer as explained above.
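Putting the two paragraphs above together, here is a sketch of a custom node plus a NodeRenderer for it. Badge, BadgeRenderer and CustomNodeExample are hypothetical names, and rendering the node as a span is just one possible choice.

```java
import java.util.Set;

import org.commonmark.node.CustomNode;
import org.commonmark.node.Node;
import org.commonmark.parser.Parser;
import org.commonmark.renderer.NodeRenderer;
import org.commonmark.renderer.html.HtmlNodeRendererContext;
import org.commonmark.renderer.html.HtmlRenderer;
import org.commonmark.renderer.html.HtmlWriter;

// Hypothetical example class, not part of the library.
public class CustomNodeExample {

    // A custom inline node carrying a label.
    static class Badge extends CustomNode {
        final String label;

        Badge(String label) {
            this.label = label;
        }
    }

    // Renders Badge nodes as <span>label</span>.
    static class BadgeRenderer implements NodeRenderer {
        private final HtmlWriter html;

        BadgeRenderer(HtmlNodeRendererContext context) {
            this.html = context.getWriter();
        }

        @Override
        public Set<Class<? extends Node>> getNodeTypes() {
            return Set.of(Badge.class);
        }

        @Override
        public void render(Node node) {
            Badge badge = (Badge) node;
            html.tag("span");
            html.text(badge.label); // text() escapes the label
            html.tag("/span");
        }
    }

    static String render(String markdown, String label) {
        Node document = Parser.builder().build().parse(markdown);
        // Append the custom node to the first block (a paragraph for plain text input).
        document.getFirstChild().appendChild(new Badge(label));

        HtmlRenderer renderer = HtmlRenderer.builder()
                .nodeRendererFactory(BadgeRenderer::new)
                .build();
        return renderer.render(document);
    }

    public static void main(String[] args) {
        // prints: <p>Hello<span>new</span></p>
        System.out.print(render("Hello", "new"));
    }
}
```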

Customize parsing

There are a few ways to extend parsing or even override built-in parsing, all of them via methods on Parser.Builder (see Blocks and inlines in the spec for an overview of blocks/inlines):

Parsing of specific block types (e.g. headings, code blocks, etc.) can be enabled/disabled with enabledBlockTypes
Parsing of blocks can be extended/overridden with customBlockParserFactory
Parsing of inline content can be extended/overridden with customInlineContentParserFactory
Parsing of delimiters in inline content can be extended with customDelimiterProcessor
Processing of links can be customized with linkProcessor and linkMarker
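Of the options listed above, enabledBlockTypes is the simplest to illustrate. The sketch below enables only headings, so other block syntax such as block quotes should fall back to plain paragraph text. The class name HeadingsOnlyExample is hypothetical.

```java
import java.util.Set;

import org.commonmark.node.Block;
import org.commonmark.node.Heading;
import org.commonmark.node.Node;
import org.commonmark.parser.Parser;
import org.commonmark.renderer.html.HtmlRenderer;

// Hypothetical example class, not part of the library.
public class HeadingsOnlyExample {

    static String render(String markdown) {
        // Only headings are parsed as special blocks; everything else becomes paragraphs.
        Set<Class<? extends Block>> enabled = Set.of(Heading.class);
        Parser parser = Parser.builder()
                .enabledBlockTypes(enabled)
                .build();
        Node document = parser.parse(markdown);
        return HtmlRenderer.builder().build().render(document);
    }

    public static void main(String[] args) {
        // The "> quote" line is not treated as a block quote here.
        System.out.print(render("# Title\n\n> quote"));
    }
}
```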

Thread-safety

Both the Parser and HtmlRenderer are designed so that you can configure them once using the builders and then use them multiple times/from multiple threads. This is done by separating the state for parsing/rendering from the configuration.

Having said that, there might be bugs of course. If you find one, please report an issue.
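The configure-once, reuse-everywhere pattern described above could look like the following sketch, which shares one Parser and one HtmlRenderer across a small thread pool. The class name SharedRendererExample, the pool size and the error handling are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.commonmark.parser.Parser;
import org.commonmark.renderer.html.HtmlRenderer;

// Hypothetical example class, not part of the library.
public class SharedRendererExample {

    // Built once, then shared by all threads.
    private static final Parser PARSER = Parser.builder().build();
    private static final HtmlRenderer RENDERER = HtmlRenderer.builder().build();

    static List<String> renderAll(List<String> documents) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String markdown : documents) {
                futures.add(pool.submit(() -> RENDERER.render(PARSER.parse(markdown))));
            }
            // Collect results in submission order.
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Rendering failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        renderAll(List.of("*a*", "**b**")).forEach(System.out::print);
    }
}
```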

API documentation

Javadocs are available online on javadoc.io.

Extensions

Extensions need to extend the parser, or the HTML renderer, or both. To use an extension, the builder objects can be configured with a list of extensions. Because extensions are optional, they live in separate artifacts, so additional dependencies need to be added as well.

Let's look at how to enable tables from GitHub Flavored Markdown. First, add an additional dependency (see Maven Central for others):

    <dependency>
        <groupId>org.commonmark</groupId>
        <artifactId>commonmark-ext-gfm-tables</artifactId>
        <version>0.25.0</version>
    </dependency>

Then, configure the extension on the builders:

    import org.commonmark.ext.gfm.tables.TablesExtension;

    List<Extension> extensions = List.of(TablesExtension.create());
    Parser parser = Parser.builder()
            .extensions(extensions)
            .build();
    HtmlRenderer renderer = HtmlRenderer.builder()
            .extensions(extensions)
            .build();

To configure another extension in the above example, just add it to the list.
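For instance, a sketch with both tables and strikethrough enabled might look like this. It assumes the commonmark-ext-gfm-tables and commonmark-ext-gfm-strikethrough artifacts are on the classpath; the class name TwoExtensionsExample and the sample table are made up.

```java
import java.util.List;

import org.commonmark.Extension;
import org.commonmark.ext.gfm.strikethrough.StrikethroughExtension;
import org.commonmark.ext.gfm.tables.TablesExtension;
import org.commonmark.node.Node;
import org.commonmark.parser.Parser;
import org.commonmark.renderer.html.HtmlRenderer;

// Hypothetical example class, not part of the library.
public class TwoExtensionsExample {

    static String render(String markdown) {
        // The same list is passed to both builders.
        List<Extension> extensions = List.of(
                TablesExtension.create(),
                StrikethroughExtension.create());
        Parser parser = Parser.builder()
                .extensions(extensions)
                .build();
        HtmlRenderer renderer = HtmlRenderer.builder()
                .extensions(extensions)
                .build();
        Node document = parser.parse(markdown);
        return renderer.render(document);
    }

    public static void main(String[] args) {
        System.out.print(render("| a | b |\n|---|---|\n| ~~1~~ | 2 |"));
    }
}
```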

The following extensions are developed with this library, each in their own artifact.

Autolink

Turns plain links such as URLs and email addresses into links (based on autolink-java).

Use class AutolinkExtension from artifact commonmark-ext-autolink.

Strikethrough

Enables strikethrough of text by enclosing it in ~~. For example, in "hey ~~you~~", "you" will be rendered as strikethrough text.

Use class StrikethroughExtension in artifact commonmark-ext-gfm-strikethrough.

Tables

Enables tables using pipes as in GitHub Flavored Markdown.

Use class TablesExtension in artifact commonmark-ext-gfm-tables.

Footnotes

Enables footnotes like in GitHub or Pandoc:

    Main text[^1]

    [^1]: Additional text in a footnote

Inline footnotes like ^[inline footnote] are also supported when enabled via FootnotesExtension.Builder#inlineFootnotes.

Use class FootnotesExtension in artifact commonmark-ext-footnotes.

Heading anchor

Enables adding auto-generated "id" attributes to heading tags. The "id" is based on the text of the heading.

    # Heading

will be rendered as:

    <h1 id="heading">Heading</h1>

Use class HeadingAnchorExtension in artifact commonmark-ext-heading-anchor.

In case you want custom rendering of the heading instead, you can use the IdGenerator class directly together with a HtmlNodeRendererFactory (see example above).
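For the default behavior (without a custom IdGenerator), a minimal sketch of enabling the extension is shown below. It assumes the commonmark-ext-heading-anchor artifact is on the classpath and that registering the extension on the renderer is sufficient, since it only affects HTML output. The class name HeadingAnchorExample is hypothetical.

```java
import java.util.List;

import org.commonmark.Extension;
import org.commonmark.ext.heading.anchor.HeadingAnchorExtension;
import org.commonmark.node.Node;
import org.commonmark.parser.Parser;
import org.commonmark.renderer.html.HtmlRenderer;

// Hypothetical example class, not part of the library.
public class HeadingAnchorExample {

    static String render(String markdown) {
        List<Extension> extensions = List.of(HeadingAnchorExtension.create());
        Parser parser = Parser.builder().build();
        HtmlRenderer renderer = HtmlRenderer.builder()
                .extensions(extensions)
                .build();
        Node document = parser.parse(markdown);
        return renderer.render(document);
    }

    public static void main(String[] args) {
        // The heading should carry an id derived from its text.
        System.out.print(render("# Heading"));
    }
}
```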

Ins

Enables underlining of text by enclosing it in ++. For example, in "hey ++you++", "you" will be rendered as underlined text. Uses the <ins> tag.

Use class InsExtension in artifact commonmark-ext-ins.

YAML front matter

Adds support for metadata through a YAML front matter block. This extension only supports a subset of YAML syntax. Here's an example of what's supported:

    ---
    key: value
    list:
      - value 1
      - value 2
    literal: |
      this is literal value.
      literal values 2
    ---
    document start here

Use class YamlFrontMatterExtension in artifact commonmark-ext-yaml-front-matter. To fetch metadata, use YamlFrontMatterVisitor.
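Fetching the metadata with YamlFrontMatterVisitor could be sketched as follows. This assumes the commonmark-ext-yaml-front-matter artifact is on the classpath and that the visitor exposes the collected values as a map from key to list of strings; the class name FrontMatterExample and the sample keys are made up.

```java
import java.util.List;
import java.util.Map;

import org.commonmark.Extension;
import org.commonmark.ext.front.matter.YamlFrontMatterExtension;
import org.commonmark.ext.front.matter.YamlFrontMatterVisitor;
import org.commonmark.node.Node;
import org.commonmark.parser.Parser;

// Hypothetical example class, not part of the library.
public class FrontMatterExample {

    static Map<String, List<String>> readMetadata(String markdown) {
        List<Extension> extensions = List.of(YamlFrontMatterExtension.create());
        Parser parser = Parser.builder()
                .extensions(extensions)
                .build();
        Node document = parser.parse(markdown);

        // The visitor collects all front matter keys while walking the tree.
        YamlFrontMatterVisitor visitor = new YamlFrontMatterVisitor();
        document.accept(visitor);
        return visitor.getData();
    }

    public static void main(String[] args) {
        String markdown = "---\ntitle: Hello\ntags:\n  - a\n  - b\n---\n\nBody text";
        Map<String, List<String>> data = readMetadata(markdown);
        System.out.println(data.get("title"));
        System.out.println(data.get("tags"));
    }
}
```

Single values and lists both come back as lists of strings, so a scalar key is read with get(0).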

Image Attributes

Adds support for specifying attributes (specifically height and width) for images.

The attribute elements are given as key=value pairs inside curly braces { } after the image node to which they apply, for example:

    ![text](/url.png){width=640 height=480}

will be rendered as:

    <img src="/url.png" alt="text" width="640" height="480" />

Use class ImageAttributesExtension in artifact commonmark-ext-image-attributes.

Note: since this extension uses curly braces {} as its delimiters (in StylesDelimiterProcessor), this means that other delimiter processors cannot use curly braces for delimiting.

Task List Items

Adds support for tasks as list items.

A task can be represented as a list item where the first non-whitespace character is a left bracket [, then a single whitespace character or the letter x in lowercase or uppercase, then a right bracket ] followed by at least one whitespace character before any other content.

For example:

    - [ ] task #1
    - [x] task #2

will be rendered as:

    <ul>
    <li><input type="checkbox" disabled=""> task #1</li>
    <li><input type="checkbox" disabled="" checked=""> task #2</li>
    </ul>

Use class TaskListItemsExtension in artifact commonmark-ext-task-list-items.

Third-party extensions

You can also find other extensions in the wild:

commonmark-ext-notifications: this extension makes it easy to create notification/admonition paragraphs like INFO, SUCCESS, WARNING or ERROR

Used by

Some users of this library (feel free to raise a PR if you want to be added):

Atlassian (where the library was initially developed)
Java (OpenJDK) (link)
Gerrit code review/Gitiles (link)
Clerk: moldable live programming for Clojure
Znai
Open Note: a Markdown editor and note-taking app for Android
Quarkus Roq: the Roq Static Site Generator makes it easy to create a static website or blog using Quarkus super-powers.
Lucee