An efficient, reliable parser for CommonMark, a standard dialect of Markdown

License

MIT license

pulldown-cmark

Documentation

This library is a pull parser for CommonMark, written in Rust. It comes with a simple command-line tool, useful for rendering to HTML, and is also designed to be easy to use as a library.

It is designed to be:

Fast; a bare minimum of allocation and copying
Safe; written in pure Rust with no unsafe blocks (except in the opt-in SIMD feature)
Versatile; in particular source-maps are supported
Correct; the goal is 100% compliance with the CommonMark spec

Further, it optionally supports parsing footnotes, GitHub flavored tables, GitHub flavored task lists and strikethrough.

Rustc 1.71.1 or newer is required to build the crate.

Example

Example usage:

// Create parser with example Markdown text.
let markdown_input = "hello world";
let parser = pulldown_cmark::Parser::new(markdown_input);

// Write to a new String buffer.
let mut html_output = String::new();
pulldown_cmark::html::push_html(&mut html_output, parser);
assert_eq!(&html_output, "<p>hello world</p>\n");

Why a pull parser?

There are many parsers for Markdown and its variants, but to my knowledge none use pull parsing. Pull parsing has become popular for XML, especially for memory-conscious applications, because it uses dramatically less memory than constructing a document tree, but is much easier to use than push parsers. Push parsers are notoriously difficult to use, and also often error-prone because the user has to delicately juggle state across a series of callbacks.

In a clean design, the parsing and rendering stages are neatly separated, but this is often sacrificed in the name of performance and expedience. Many Markdown implementations mix parsing and rendering together, and even designs that try to separate them (such as the popular hoedown) make the assumption that the rendering process can be fully represented as a serialized string.

Pull parsing is in some sense the most versatile architecture. It's possible to drive a push interface, also with minimal memory, and quite straightforward to construct an AST. Another advantage is that source-map information (the mapping between parsed blocks and offsets within the source text) is readily available; you can call into_offset_iter() to create an iterator that yields (Event, Range) pairs, where the second element is the event's corresponding range in the source document.
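As a rough illustration of how a flat event stream can be turned into a tree, here is a minimal sketch that uses only the standard library. `ToyEvent`, `Node` and `build_tree` are hypothetical names invented for this example; they are not part of pulldown-cmark, whose real `Event` type carries far more information.

```rust
// Toy event type for illustration only; not pulldown-cmark's `Event`.
#[derive(Debug, Clone, PartialEq)]
enum ToyEvent {
    Start(&'static str),
    End,
    Text(String),
}

// Toy tree node (hypothetical), either an element with children or a text leaf.
#[derive(Debug, PartialEq)]
enum Node {
    Element { tag: &'static str, children: Vec<Node> },
    Text(String),
}

/// Consumes a flat event stream and builds a tree from it.
/// Unbalanced `End` events are ignored; elements left open at the
/// end of the stream are dropped.
fn build_tree(events: impl IntoIterator<Item = ToyEvent>) -> Vec<Node> {
    // Each stack frame holds the tag of an open element and its children so far.
    let mut stack: Vec<(&'static str, Vec<Node>)> = Vec::new();
    let mut roots = Vec::new();
    for event in events {
        let finished = match event {
            ToyEvent::Start(tag) => {
                stack.push((tag, Vec::new()));
                continue;
            }
            ToyEvent::Text(text) => Node::Text(text),
            ToyEvent::End => match stack.pop() {
                Some((tag, children)) => Node::Element { tag, children },
                None => continue,
            },
        };
        // Attach the finished node to its parent, or to the root list.
        match stack.last_mut() {
            Some((_, children)) => children.push(finished),
            None => roots.push(finished),
        }
    }
    roots
}
```

The only state needed is the stack of currently open elements, which is why building an AST on top of a pull parser tends to be short.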

While manipulating ASTs is the most flexible way to transform documents, operating on iterators is surprisingly easy, and quite efficient. Here, for example, is the code to transform soft line breaks into hard breaks:

let parser = parser.map(|event| match event {
    Event::SoftBreak => Event::HardBreak,
    _ => event,
});

Or expanding an abbreviation in text:

let parser = parser.map(|event| match event {
    Event::Text(text) => Event::Text(text.replace("abbr", "abbreviation").into()),
    _ => event,
});

Another simple example is code to determine the max nesting level:

let mut max_nesting = 0;
let mut level = 0;
for event in parser {
    match event {
        Event::Start(_) => {
            level += 1;
            max_nesting = std::cmp::max(max_nesting, level);
        }
        Event::End(_) => level -= 1,
        _ => (),
    }
}

Note that consecutive text events can occur because of the way the parser evaluates the source. The TextMergeStream utility merges them, which makes iterating over the events more comfortable:

use pulldown_cmark::{utils::TextMergeStream, Event, Parser};

let markdown_input = "Hello world, this is a ~~complicated~~ *very simple* example.";

// Wrap the parser so that adjacent text events are merged into one.
let iterator = TextMergeStream::new(Parser::new(markdown_input));

for event in iterator {
    match event {
        Event::Text(text) => println!("{}", text),
        _ => {}
    }
}
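To give an idea of what such a merging adapter does, the following sketch implements the same technique over a toy event type using only the standard library. `ToyEvent` and `MergeText` are hypothetical names for this illustration; this is not how TextMergeStream is actually implemented in the crate.

```rust
use std::iter::Peekable;

// Toy event type for illustration only; not pulldown-cmark's `Event`.
#[derive(Debug, Clone, PartialEq)]
enum ToyEvent {
    Text(String),
    SoftBreak,
}

/// Iterator adapter (hypothetical) that merges runs of adjacent text events.
struct MergeText<I: Iterator<Item = ToyEvent>> {
    inner: Peekable<I>,
}

impl<I: Iterator<Item = ToyEvent>> MergeText<I> {
    fn new(iter: I) -> Self {
        MergeText { inner: iter.peekable() }
    }
}

impl<I: Iterator<Item = ToyEvent>> Iterator for MergeText<I> {
    type Item = ToyEvent;

    fn next(&mut self) -> Option<ToyEvent> {
        match self.inner.next()? {
            ToyEvent::Text(mut merged) => {
                // Absorb every directly following text event into `merged`.
                while let Some(ToyEvent::Text(more)) = self.inner.peek() {
                    merged.push_str(more);
                    self.inner.next();
                }
                Some(ToyEvent::Text(merged))
            }
            // Non-text events pass through unchanged.
            other => Some(other),
        }
    }
}
```

Peeking one event ahead is enough here, so the adapter stays lazy and never buffers more than the text run it is currently merging.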

There are some basic but fully functional examples of the usage of the crate in the examples directory of this repository.

Using Rust idiomatically

A lot of the internal scanning code is written at a pretty low level (it pretty much scans byte patterns for the bits of syntax), but the external interface is designed to be idiomatic Rust.

Pull parsers are at heart an iterator of events (start and end tags, text, and other bits and pieces). The parser data structure implements the Rust Iterator trait directly, and Event is an enum. Thus, you can use the full power and expressivity of Rust's iterator infrastructure, including for loops and map (as in the examples above), collecting the events into a vector (for recording, playback, and manipulation), and more.
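The sketch below shows, under simplifying assumptions, how recording and playback can look, and how a callback-style (push) consumer can be driven from an iterator. `ToyEvent`, `Handler`, `MarkupSink`, `drive` and `record_and_replay` are all hypothetical names made up for this example and use only the standard library; the markup it emits is naive and unescaped.

```rust
// Toy event type for illustration only; not pulldown-cmark's `Event`.
#[derive(Debug, Clone, PartialEq)]
enum ToyEvent {
    Start(&'static str),
    End(&'static str),
    Text(&'static str),
}

/// A callback-style (push) interface, as a SAX-like consumer might define it.
trait Handler {
    fn start(&mut self, tag: &str);
    fn end(&mut self, tag: &str);
    fn text(&mut self, text: &str);
}

/// Drives the push interface from any pull-style event source.
fn drive<H: Handler>(events: impl IntoIterator<Item = ToyEvent>, handler: &mut H) {
    for event in events {
        match event {
            ToyEvent::Start(tag) => handler.start(tag),
            ToyEvent::End(tag) => handler.end(tag),
            ToyEvent::Text(text) => handler.text(text),
        }
    }
}

/// Example handler that appends naive, unescaped markup to a string.
struct MarkupSink(String);

impl Handler for MarkupSink {
    fn start(&mut self, tag: &str) {
        self.0.push_str(&format!("<{tag}>"));
    }
    fn end(&mut self, tag: &str) {
        self.0.push_str(&format!("</{tag}>"));
    }
    fn text(&mut self, text: &str) {
        self.0.push_str(text);
    }
}

/// Records a stream into a vector, then plays it back into a fresh sink.
fn record_and_replay(events: impl Iterator<Item = ToyEvent>) -> (Vec<ToyEvent>, String) {
    let recorded: Vec<ToyEvent> = events.collect();
    let mut sink = MarkupSink(String::new());
    // The recording stays available for further manipulation after playback.
    drive(recorded.iter().cloned(), &mut sink);
    (recorded, sink.0)
}
```

Because the recording is an ordinary vector, it can be replayed any number of times, or edited between replays, without re-parsing the source.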

Further, the Text event (representing text) is a small copy-on-write string. The vast majority of text fragments are just slices of the source document. For these, copy-on-write gives a convenient representation that requires no allocation or copying, but allocated strings are available when they're needed. Thus, when rendering text to HTML, most text is copied just once, from the source document to the HTML buffer.
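The idea can be sketched with the standard library's `std::borrow::Cow`, used here as a stand-in for the crate's own copy-on-write string type. The functions `expand_tabs` and `render` are hypothetical and exist only for this illustration.

```rust
use std::borrow::Cow;

/// Replaces tabs with four spaces, allocating only when the input contains a tab.
fn expand_tabs(fragment: &str) -> Cow<'_, str> {
    if fragment.contains('\t') {
        // A modified copy is needed, so an owned string is allocated.
        Cow::Owned(fragment.replace('\t', "    "))
    } else {
        // The common case: hand back a slice of the source, no allocation.
        Cow::Borrowed(fragment)
    }
}

/// Appends each fragment to the output buffer.
/// Borrowed fragments are copied exactly once, here.
fn render(fragments: &[&str]) -> String {
    let mut out = String::new();
    for fragment in fragments {
        out.push_str(&expand_tabs(fragment));
    }
    out
}
```

Callers treat both variants the same way through `Deref<Target = str>`, so the allocation decision stays local to the code that produces the fragment.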

When using pulldown-cmark's own HTML renderer, make sure to write to a buffered target like a Vec<u8> or String. Since it performs many (very) small writes, writing directly to stdout, files, or sockets is detrimental to performance. Such writers can be wrapped in a BufWriter.
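A minimal sketch of the wrapping, using only the standard library. The functions `write_fragments` and `write_buffered` are hypothetical; `write_fragments` merely stands in for a renderer that emits many small writes.

```rust
use std::io::{self, BufWriter, Write};

/// Emits many small writes, similar in spirit to an incremental renderer.
fn write_fragments<W: Write>(mut out: W, fragments: &[&str]) -> io::Result<()> {
    for fragment in fragments {
        out.write_all(fragment.as_bytes())?;
    }
    Ok(())
}

/// Wraps any writer in a `BufWriter` so the small writes are batched,
/// then returns the underlying writer once everything is flushed.
fn write_buffered<W: Write>(target: W, fragments: &[&str]) -> io::Result<W> {
    let mut buffered = BufWriter::new(target);
    write_fragments(&mut buffered, fragments)?;
    // Flush explicitly so that write errors surface here rather than on drop.
    buffered.flush()?;
    buffered.into_inner().map_err(|e| e.into_error())
}
```

For example, `write_buffered(std::io::stdout().lock(), &["<p>", "hello", "</p>\n"])` would batch the three fragments into a single write to stdout.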

Build options

By default, the binary is built as well. If you don't want/need it, then build like this:

> cargo build --no-default-features

Or add this package as a dependency of your project using cargo add:

> cargo add pulldown-cmark --no-default-features

SIMD accelerated scanners are available for the x64 platform from version 0.5 onwards. To enable them, build with the simd feature:

> cargo build --release --features simd

Or add this package as a dependency of your project with the feature using cargo add:

> cargo add pulldown-cmark --no-default-features --features=simd

For higher release performance, you may want this configuration in your release profile:

lto = true
codegen-units = 1
panic = "abort"

`no_std` support

no_std support can be enabled by compiling with --no-default-features to disable std support and --features hashbrown to supply the hash collections that the crate uses internally, which are otherwise only defined in std. For example:

[dependencies]
pulldown-cmark = { version = "*", default-features = false, features = ["hashbrown", "other features"] }

To support both std and no_std builds in your project, you can use the following in your Cargo.toml:

[features]
default = ["std", "other_features"]
std = ["pulldown-cmark/std"]
hashbrown = ["pulldown-cmark/hashbrown"]
other_features = []

[dependencies]
pulldown-cmark = { version = "*", default-features = false }

Authors

The main author is Raph Levien. The implementation of the new design (v0.3+) was completed by Marcus Klaas de Vries. Since 2023, the development has been driven by Martín Pozo, Michael Howell, Roope Salmi and Martin Geisler.