Java library for parsing and rendering CommonMark (Markdown)
commonmark/commonmark-java
Java library for parsing and rendering Markdown text according to the CommonMark specification (and some extensions).

Provides classes for parsing input to an abstract syntax tree (AST), visiting and manipulating nodes, and rendering to HTML or back to Markdown. It started out as a port of commonmark.js, but has since evolved into an extensible library with the following features:
- Small (core has no dependencies, extensions in separate artifacts)
- Fast (10-20 times faster than pegdown, which used to be a popular Markdown library; see benchmarks in repo)
- Flexible (manipulate the AST after parsing, customize HTML rendering)
- Extensible (tables, strikethrough, autolinking and more, see below)
The library is supported on Java 11 and later. It works on Android too, but that is on a best-effort basis; please report problems. For Android the minimum API level is 19, see the `commonmark-android-test` directory.

Coordinates for core library (see all on Maven Central):

```xml
<dependency>
    <groupId>org.commonmark</groupId>
    <artifactId>commonmark</artifactId>
    <version>0.24.0</version>
</dependency>
```
The module names to use in Java 9 are `org.commonmark`, `org.commonmark.ext.autolink`, etc., corresponding to package names.
Note that for 0.x releases of this library, the API is not considered stable yet and may break between minor releases. After 1.0, Semantic Versioning will be followed. A package containing `beta` in its name is not subject to stable API guarantees yet, but normal usage should not require it.
See the `spec.txt` file if you're wondering which version of the spec is currently implemented. Also check out the CommonMark dingus for getting familiar with the syntax or trying out edge cases. If you clone the repository, you can also use the `DingusApp` class to try out things interactively.
```java
import org.commonmark.node.*;
import org.commonmark.parser.Parser;
import org.commonmark.renderer.html.HtmlRenderer;

Parser parser = Parser.builder().build();
Node document = parser.parse("This is *Markdown*");
HtmlRenderer renderer = HtmlRenderer.builder().build();
renderer.render(document);  // "<p>This is <em>Markdown</em></p>\n"
```
This uses the parser and renderer with default options. Both builders have methods for configuring their behavior:

- `escapeHtml(true)` on `HtmlRenderer` will escape raw HTML tags and blocks.
- `sanitizeUrls(true)` on `HtmlRenderer` will strip potentially unsafe URLs from `<a>` and `<img>` tags.
- For all available options, see methods on the builders.
Note that this library doesn't try to sanitize the resulting HTML with regard to which tags are allowed, etc. That is the responsibility of the caller, and if you expose the resulting HTML, you probably want to run a sanitizer on it afterwards.
To render a node tree back to Markdown, use `MarkdownRenderer`. It also works on trees built by hand:

```java
import org.commonmark.node.*;
import org.commonmark.renderer.markdown.MarkdownRenderer;

MarkdownRenderer renderer = MarkdownRenderer.builder().build();
Node document = new Document();
Heading heading = new Heading();
heading.setLevel(2);
heading.appendChild(new Text("My title"));
document.appendChild(heading);

renderer.render(document);  // "## My title\n"
```
For rendering to plain text with minimal markup, there's also `TextContentRenderer`.
After the source text has been parsed, the result is a tree of nodes. That tree can be modified before rendering, or just inspected without rendering:
```java
Node node = parser.parse("Example\n=======\n\nSome more text");
WordCountVisitor visitor = new WordCountVisitor();
node.accept(visitor);
int wordCount = visitor.wordCount;  // 4

class WordCountVisitor extends AbstractVisitor {
    int wordCount = 0;

    @Override
    public void visit(Text text) {
        // This is called for all Text nodes. Override other visit methods for other node types.

        // Count words (this is just an example, don't actually do it this way for various reasons).
        wordCount += text.getLiteral().split("\\W+").length;

        // Descend into children (could be omitted in this case because Text nodes don't have children).
        visitChildren(text);
    }
}
```
If you want to know where a parsed `Node` appeared in the input source text, you can request the parser to return source positions like this:

```java
var parser = Parser.builder().includeSourceSpans(IncludeSourceSpans.BLOCKS_AND_INLINES).build();
```
Then parse nodes and inspect source positions:
```java
var source = "foo\n\nbar *baz*";
var doc = parser.parse(source);
var emphasis = doc.getLastChild().getLastChild();
var s = emphasis.getSourceSpans().get(0);
s.getLineIndex();    // 2 (third line)
s.getColumnIndex();  // 4 (fifth column)
s.getInputIndex();   // 9 (string index 9)
s.getLength();       // 5
source.substring(s.getInputIndex(), s.getInputIndex() + s.getLength());  // "*baz*"
```
If you're only interested in blocks and not inlines, use `IncludeSourceSpans.BLOCKS`.
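The three numbers a source span reports are related: the line and column can be recomputed from the input index. The JDK-only sketch below does that for the source-position example above. `SpanCoordinates` and its methods are hypothetical helpers, not part of commonmark-java, and the sketch assumes `\n` line endings only.

```java
/**
 * Hypothetical helper (not part of commonmark-java): recomputes the 0-based
 * line and column of an input index, assuming '\n' line endings only.
 */
final class SpanCoordinates {

    private SpanCoordinates() {
    }

    /** Counts the '\n' characters before inputIndex, which gives the 0-based line index. */
    static int lineIndex(String source, int inputIndex) {
        int line = 0;
        for (int i = 0; i < inputIndex; i++) {
            if (source.charAt(i) == '\n') {
                line++;
            }
        }
        return line;
    }

    /** Distance from the start of the containing line, which gives the 0-based column index. */
    static int columnIndex(String source, int inputIndex) {
        // lastIndexOf returns -1 when there is no earlier newline, so the line then starts at 0.
        int lineStart = source.lastIndexOf('\n', inputIndex - 1) + 1;
        return inputIndex - lineStart;
    }
}
```

For `"foo\n\nbar *baz*"` and index 9, `lineIndex` returns 2 and `columnIndex` returns 4. These are the same values the parser reports for the `*baz*` span.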
Sometimes you might want to customize how HTML is rendered. If all you want to do is add or change attributes on some elements, there's a simple way to do that.

In this example, we register a factory for an `AttributeProvider` on the renderer to set a `class="border"` attribute on `img` elements.
```java
Parser parser = Parser.builder().build();
HtmlRenderer renderer = HtmlRenderer.builder()
        .attributeProviderFactory(new AttributeProviderFactory() {
            public AttributeProvider create(AttributeProviderContext context) {
                return new ImageAttributeProvider();
            }
        })
        .build();

Node document = parser.parse("![text](/url.png)");
renderer.render(document);
// "<p><img src=\"/url.png\" alt=\"text\" class=\"border\" /></p>\n"

class ImageAttributeProvider implements AttributeProvider {
    @Override
    public void setAttributes(Node node, String tagName, Map<String, String> attributes) {
        if (node instanceof Image) {
            attributes.put("class", "border");
        }
    }
}
```
If you want to do more than just change attributes, there's also a way to take complete control over how HTML is rendered.

In this example, we're changing the rendering of indented code blocks to only wrap them in `pre` instead of `pre` and `code`:
```java
Parser parser = Parser.builder().build();
HtmlRenderer renderer = HtmlRenderer.builder()
        .nodeRendererFactory(new HtmlNodeRendererFactory() {
            public NodeRenderer create(HtmlNodeRendererContext context) {
                return new IndentedCodeBlockNodeRenderer(context);
            }
        })
        .build();

// The code line needs four spaces of indentation to be an indented code block.
Node document = parser.parse("Example:\n\n    code");
renderer.render(document);
// "<p>Example:</p>\n<pre>code\n</pre>\n"

class IndentedCodeBlockNodeRenderer implements NodeRenderer {

    private final HtmlWriter html;

    IndentedCodeBlockNodeRenderer(HtmlNodeRendererContext context) {
        this.html = context.getWriter();
    }

    @Override
    public Set<Class<? extends Node>> getNodeTypes() {
        // Return the node types we want to use this renderer for.
        return Set.of(IndentedCodeBlock.class);
    }

    @Override
    public void render(Node node) {
        // We only handle one type as per getNodeTypes, so we can just cast it here.
        IndentedCodeBlock codeBlock = (IndentedCodeBlock) node;
        html.line();
        html.tag("pre");
        html.text(codeBlock.getLiteral());
        html.tag("/pre");
        html.line();
    }
}
```
In case you want to store additional data in the document or have custom elements in the resulting HTML, you can create your own subclass of `CustomNode` and add instances as child nodes to existing nodes.

To define the HTML rendering for them, you can use a `NodeRenderer` as explained above.
There are a few ways to extend parsing or even override built-in parsing, all of them via methods on `Parser.Builder` (see Blocks and inlines in the spec for an overview of blocks/inlines):

- Parsing of specific block types (e.g. headings, code blocks, etc.) can be enabled/disabled with `enabledBlockTypes`
- Parsing of blocks can be extended/overridden with `customBlockParserFactory`
- Parsing of inline content can be extended/overridden with `customInlineContentParserFactory`
- Parsing of delimiters in inline content can be extended with `customDelimiterProcessor`
- Processing of links can be customized with `linkProcessor` and `linkMarker`
Both the `Parser` and `HtmlRenderer` are designed so that you can configure them once using the builders and then use them multiple times and from multiple threads. This is done by separating the state for parsing/rendering from the configuration.
Having said that, there might be bugs, of course. If you find one, please report an issue.

Javadocs are available online on javadoc.io.
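The separation of configuration from per-call state can be shown without the library. In the JDK-only sketch below, `WordCounter` is a hypothetical class and not commonmark-java code. Its builder produces an immutable configuration, and every `count` call keeps its mutable state in local variables. That lets one instance be shared across threads, while the builder itself is meant to be used from a single thread.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical example: immutable configuration plus per-call state, safe to share. */
final class WordCounter {

    // Configuration: copied once from the builder and never mutated afterwards.
    private final Set<String> ignoredWords;

    private WordCounter(Builder builder) {
        this.ignoredWords = Set.copyOf(builder.ignoredWords);
    }

    static Builder builder() {
        return new Builder();
    }

    /** Safe to call concurrently: the only mutable state is local to this call. */
    int count(String text) {
        int count = 0;
        for (String word : text.split("\\W+")) {
            // split can yield an empty first token, e.g. for "" or leading punctuation.
            if (!word.isEmpty() && !ignoredWords.contains(word.toLowerCase(Locale.ROOT))) {
                count++;
            }
        }
        return count;
    }

    /** Mutable builder; configure it on one thread, then call build(). */
    static final class Builder {
        private final Set<String> ignoredWords = new HashSet<>();

        Builder ignore(String word) {
            ignoredWords.add(word.toLowerCase(Locale.ROOT));
            return this;
        }

        WordCounter build() {
            return new WordCounter(this);
        }
    }
}
```

For example, `WordCounter.builder().ignore("the").build()` returns 3 for `"The quick brown fox"`. It returns the same result when called from a parallel stream, because no instance field is written after construction.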
Extensions need to extend the parser, or the HTML renderer, or both. To use an extension, the builder objects can be configured with a list of extensions. Because extensions are optional, they live in separate artifacts, so additional dependencies need to be added as well.

Let's look at how to enable tables from GitHub Flavored Markdown. First, add an additional dependency (see Maven Central for others):

```xml
<dependency>
    <groupId>org.commonmark</groupId>
    <artifactId>commonmark-ext-gfm-tables</artifactId>
    <version>0.24.0</version>
</dependency>
```
Then, configure the extension on the builders:
```java
import org.commonmark.ext.gfm.tables.TablesExtension;

List<Extension> extensions = List.of(TablesExtension.create());
Parser parser = Parser.builder()
        .extensions(extensions)
        .build();
HtmlRenderer renderer = HtmlRenderer.builder()
        .extensions(extensions)
        .build();
```
To configure another extension in the above example, just add it to the list.
The following extensions are developed with this library, each in their own artifact.

Turns plain links such as URLs and email addresses into links (based on autolink-java).

Use class `AutolinkExtension` from artifact `commonmark-ext-autolink`.
Enables strikethrough of text by enclosing it in `~~`. For example, in `hey ~~you~~`, `you` will be rendered as strikethrough text.

Use class `StrikethroughExtension` in artifact `commonmark-ext-gfm-strikethrough`.
Enables tables using pipes as in GitHub Flavored Markdown.

Use class `TablesExtension` in artifact `commonmark-ext-gfm-tables`.
Enables footnotes like in GitHub or Pandoc:

```
Main text[^1]

[^1]: Additional text in a footnote
```

Inline footnotes like `^[inline footnote]` are also supported when enabled via `FootnotesExtension.Builder#inlineFootnotes`.

Use class `FootnotesExtension` in artifact `commonmark-ext-footnotes`.
Enables adding auto-generated "id" attributes to heading tags. The "id" is based on the text of the heading.
```
# Heading
```

will be rendered as:

```html
<h1 id="heading">Heading</h1>
```
Use class `HeadingAnchorExtension` in artifact `commonmark-ext-heading-anchor`.

In case you want custom rendering of the heading instead, you can use the `IdGenerator` class directly together with a `HtmlNodeRendererFactory` (see example above).
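The simplified JDK-only sketch below gives a rough idea of how an id can be derived from heading text. It is not the algorithm `IdGenerator` uses. The class name `SimpleSlugger` and its rules are assumptions for illustration: lowercase the text, turn whitespace runs into `-`, drop other characters outside ASCII letters, digits, `_` and `-`, and add a numeric suffix for repeats.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical, simplified slug generator; not commonmark-java's IdGenerator. */
final class SimpleSlugger {

    // How many times each base slug has been handed out so far.
    private final Map<String, Integer> seen = new HashMap<>();

    String generate(String headingText) {
        String base = headingText.trim()
                .toLowerCase(Locale.ROOT)
                .replaceAll("\\s+", "-")         // whitespace runs become a single '-'
                .replaceAll("[^a-z0-9_-]", "");  // drop every other character
        if (base.isEmpty()) {
            base = "id";                         // assumed fallback for headings with no usable text
        }
        // merge returns the new count: 1 on first use, 2 on the second, and so on.
        int previous = seen.merge(base, 1, Integer::sum) - 1;
        return previous == 0 ? base : base + "-" + previous;
    }
}
```

With one `SimpleSlugger` instance, two `Heading` headings produce `heading` and then `heading-1`, and `Hello, World!` becomes `hello-world`. A real implementation would also need to handle non-ASCII text and collisions with headings whose text already ends in a suffix like `-1`.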
Enables underlining of text by enclosing it in `++`. For example, in `hey ++you++`, `you` will be rendered as underlined text. Uses the `<ins>` tag.

Use class `InsExtension` in artifact `commonmark-ext-ins`.
Adds support for metadata through a YAML front matter block. This extension only supports a subset of YAML syntax. Here's an example of what's supported:
```
---
key: value
list:
  - value 1
  - value 2
literal: |
  this is literal value.
  literal values 2
---

document start here
```
Use class `YamlFrontMatterExtension` in artifact `commonmark-ext-yaml-front-matter`. To fetch metadata, use `YamlFrontMatterVisitor`.
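The supported subset is small enough that a hand-written parser can handle the front matter example above. The JDK-only sketch below is hypothetical and is not how `YamlFrontMatterExtension` is implemented. It collects values into a `Map<String, List<String>>`, a shape chosen for the sketch. It understands only `key: value`, `- item` lists and `|` literal blocks, whose indented lines are joined with newlines; a blank line ends a literal block here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical parser for a tiny YAML subset; not the extension's implementation. */
final class TinyFrontMatter {

    private TinyFrontMatter() {
    }

    /** Returns the front matter values, or an empty map if there is no complete "---" block. */
    static Map<String, List<String>> parse(String document) {
        Map<String, List<String>> data = new LinkedHashMap<>();
        String[] lines = document.split("\n", -1);
        int end = -1;
        if (lines.length > 0 && lines[0].equals("---")) {
            for (int i = 1; i < lines.length; i++) {
                if (lines[i].equals("---")) {
                    end = i;
                    break;
                }
            }
        }
        if (end < 0) {
            return data; // no front matter, or it is never closed
        }
        String key = null;
        StringBuilder literal = null; // non-null while inside a "|" block
        for (int i = 1; i < end; i++) {
            String line = lines[i];
            boolean indented = line.startsWith(" ") || line.startsWith("\t");
            if (literal != null) {
                if (indented) {
                    if (literal.length() > 0) {
                        literal.append('\n');
                    }
                    literal.append(line.strip());
                    continue;
                }
                data.get(key).add(literal.toString()); // literal block ended
                literal = null;
            }
            String trimmed = line.strip();
            if (trimmed.startsWith("- ") && key != null) {
                data.get(key).add(trimmed.substring(2).strip()); // list item of the current key
            } else if (trimmed.contains(":")) {
                int colon = trimmed.indexOf(':');
                key = trimmed.substring(0, colon).strip();
                String value = trimmed.substring(colon + 1).strip();
                data.put(key, new ArrayList<>());
                if (value.equals("|")) {
                    literal = new StringBuilder();
                } else if (!value.isEmpty()) {
                    data.get(key).add(value);
                }
            }
        }
        if (literal != null) {
            data.get(key).add(literal.toString()); // literal block ran up to the closing "---"
        }
        return data;
    }
}
```

On the example document, `parse` yields the following:

- `key` maps to `[value]`.
- `list` maps to `[value 1, value 2]`.
- `literal` maps to a single entry with its two lines joined by a newline.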
Adds support for specifying attributes (specifically height and width) for images.
The attribute elements are given as `key=value` pairs inside curly braces `{ }` after the image node to which they apply, for example:
```
![text](/url.png){width=640 height=480}
```

will be rendered as:

```html
<img src="/url.png" alt="text" width="640" height="480" />
```
Use class `ImageAttributesExtension` in artifact `commonmark-ext-image-attributes`.
Note: since this extension uses curly braces `{` `}` as its delimiters (in `StylesDelimiterProcessor`), other delimiter processors cannot use curly braces for delimiting.
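The `key=value` syntax is easy to take apart. The JDK-only sketch below is a rough illustration with hypothetical names, not the code in `StylesDelimiterProcessor`. It turns the text between the braces into an ordered map and renders it as HTML attributes. It keeps only `width` and `height`, the attributes this extension is described as supporting; silently ignoring other keys is an assumption of the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of handling "{width=640 height=480}"; not the extension's code. */
final class ImageAttributeSketch {

    // Assumption for this sketch: only width and height are kept.
    private static final Set<String> SUPPORTED = Set.of("width", "height");

    private ImageAttributeSketch() {
    }

    /** Parses the content between the curly braces, e.g. "width=640 height=480". */
    static Map<String, String> parse(String content) {
        Map<String, String> attributes = new LinkedHashMap<>();
        for (String pair : content.trim().split("\\s+")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip tokens that are not key=value
            }
            String key = pair.substring(0, eq);
            if (SUPPORTED.contains(key)) {
                attributes.put(key, pair.substring(eq + 1));
            }
        }
        return attributes;
    }

    /** Renders the attributes in insertion order, escaping '&' and '"' in values. */
    static String toHtml(Map<String, String> attributes) {
        StringBuilder html = new StringBuilder();
        for (Map.Entry<String, String> entry : attributes.entrySet()) {
            if (html.length() > 0) {
                html.append(' ');
            }
            String value = entry.getValue().replace("&", "&amp;").replace("\"", "&quot;");
            html.append(entry.getKey()).append("=\"").append(value).append('"');
        }
        return html.toString();
    }
}
```

For `width=640 height=480`, `toHtml(parse(...))` produces `width="640" height="480"`, matching the attributes in the rendered `<img>` tag above.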
Adds support for tasks as list items.
A task can be represented as a list item where the first non-whitespace character is a left bracket `[`, then a single whitespace character or the letter `x` in lowercase or uppercase, then a right bracket `]` followed by at least one whitespace before any other content.
For example:
```
- [ ] task #1
- [x] task #2
```
will be rendered as:
```html
<ul>
<li><input type="checkbox" disabled=""> task #1</li>
<li><input type="checkbox" disabled="" checked=""> task #2</li>
</ul>
```

Use class `TaskListItemsExtension` in artifact `commonmark-ext-task-list-items`.
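The task rule above maps directly onto a regular expression. The JDK-only sketch below is a hypothetical illustration; the extension itself does not necessarily use a regex. It checks the content of a single list item, without the list marker, and classifies it as not a task, an unchecked task or a checked task.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical checker for the task list item rule; not the extension's implementation. */
final class TaskMarker {

    enum State { NOT_A_TASK, UNCHECKED, CHECKED }

    // Optional leading whitespace, '[', exactly one whitespace character or x/X, ']',
    // then at least one whitespace character before the rest of the content.
    private static final Pattern TASK =
            Pattern.compile("^\\s*\\[(\\s|[xX])\\]\\s+.*$", Pattern.DOTALL);

    private TaskMarker() {
    }

    static State classify(String itemContent) {
        Matcher m = TASK.matcher(itemContent);
        if (!m.matches()) {
            return State.NOT_A_TASK;
        }
        return m.group(1).equalsIgnoreCase("x") ? State.CHECKED : State.UNCHECKED;
    }
}
```

`[ ] task #1` classifies as `UNCHECKED` and `[x] task #2` as `CHECKED`. `[x]task` is not a task, because the rule requires whitespace after the closing bracket.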
You can also find other extensions in the wild:
- commonmark-ext-notifications: this extension lets you easily create notification/admonition paragraphs like `INFO`, `SUCCESS`, `WARNING` or `ERROR`
Some users of this library (feel free to raise a PR if you want to be added):
- Atlassian (where the library was initially developed)
- Java (OpenJDK) (link)
- Gerrit code review/Gitiles (link)
- Clerk moldable live programming for Clojure
- Znai
- Markwon: Android library for rendering markdown as system-native Spannables
- flexmark-java: Fork that added support for a lot more syntax and flexibility
See the `CONTRIBUTING.md` file.
Copyright (c) 2015, Robin Stocker
BSD (2-clause) licensed, see LICENSE.txt file.