A light markup language


jgm/djot


Djot is a light markup syntax. It derives most of its features from commonmark, but it fixes a few things that make commonmark's syntax complex and difficult to parse efficiently. It is also much fuller-featured than commonmark, with support for definition lists, footnotes, tables, several new kinds of inline formatting (insert, delete, highlight, superscript, subscript), math, smart punctuation, attributes that can be applied to any element, and generic containers for block-level, inline-level, and raw content.

The project began as an attempt to implement some of the ideas I suggested in my essay Beyond Markdown. (See Rationale, below.)

This repository contains a Syntax Description, a Cheatsheet, and a Quick Start for Markdown Users that outlines the main differences between djot and Markdown.

You can try djot on the djot playground without installing anything locally.

Rationale

Here are some design goals:

  1. It should be possible to parse djot markup in linear time, with no backtracking.

  2. Parsing of inline elements should be "local" and not depend on what references are defined later. This is not the case in commonmark: [foo][bar] might be "[foo]" followed by a link with text "bar", or "[foo][bar]", or a link with text "foo", or a link with text "foo" followed by "[bar]", depending on whether the references [foo] and [bar] are defined elsewhere (perhaps later) in the document. This non-locality makes accurate syntax highlighting nearly impossible.

  3. Rules for emphasis should be simpler. The fact that doubled characters are used for strong emphasis in commonmark leads to many potential ambiguities, which are resolved by a daunting list of 17 rules. It is hard to form a good mental model of these rules. Most of the time they interpret things the way a human would most naturally interpret them---but not always.

  4. Expressive blind spots should be avoided. In commonmark, you're out of luck if you want to produce the HTML a<em>?</em>b, because the flanking rules classify the first asterisk in a*?*b as right-flanking. There is a way around this, but it's ugly (using a numerical entity instead of a). In djot there should not be expressive blind spots of this kind.

  5. Rules for what content belongs to a list item should be simple. In commonmark, content under a list item must be indented as far as the first non-space content after the list marker (or five spaces after the marker, in case the list item begins with indented code). Many people get confused when their indented content is not indented far enough and does not get included in the list item.

  6. Parsers should not be forced to recognize unicode character classes, HTML tags, or entities, or perform unicode case folding. That adds a lot of complexity.

  7. The syntax should be friendly to hard-wrapping: hard-wrapping a paragraph should not lead to different interpretations, e.g. when a number followed by a period ends up at the beginning of a line. (I anticipate that many will ask, why hard-wrap at all? Answer: so that your document is readable just as it is, without conversion to HTML and without special editor modes that soft-wrap long lines. Remember that source readability was one of the prime goals of Markdown and Commonmark.)

  8. The syntax should compose uniformly, in the following sense: if a sequence of lines has a certain meaning outside a list item or block quote, it should have the same meaning inside it. This principle is articulated in the commonmark spec, but the spec doesn't completely abide by it (see commonmark/commonmark-spec#634).

  9. It should be possible to attach arbitrary attributes to any element.

  10. There should be generic containers for text, inline content, and block-level content, to which arbitrary attributes can be applied. This allows for extensibility using AST transformations.

  11. The syntax should be kept as simple as possible, consistent with these goals. Thus, for example, we don't need two different styles of headings or code blocks.
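Goal 7 can be made concrete with a small check. The sketch below is in Python and is only a rough heuristic, not commonmark's actual block-start rules; the names `BLOCK_START` and `risky_lines` are invented here for illustration. It flags continuation lines of a hard-wrapped paragraph that happen to begin with something resembling a list marker, a block quote marker, or a heading marker, which is the situation goal 7 wants to make harmless.

```python
import re

# Assumption: a deliberately loose pattern for "looks like a block start".
# It is broader than what commonmark really lets interrupt a paragraph.
BLOCK_START = re.compile(r"^(?:\d+[.)]|[-*+]|>|#{1,6})(?:\s|$)")


def risky_lines(paragraph):
    """Return continuation lines that begin like a block-level marker."""
    lines = paragraph.splitlines()
    # The first line starts the paragraph, so only later lines are at risk.
    return [line for line in lines[1:] if BLOCK_START.match(line)]


wrapped = (
    "The total came to\n"
    "42. That was more than\n"
    "> 40, our budget, so we cut item\n"
    "# 7 from the list."
)
for line in risky_lines(wrapped):
    print("could be misread as a block start:", line)
```

A syntax that satisfies goal 7 never has to care about this list, because re-wrapping the paragraph cannot change how it is parsed.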

These goals motivated the following decisions:

  • Block-level elements can't interrupt paragraphs (or headings), because of goal 7. So in djot the following is a single paragraph, not (as commonmark sees it) a paragraph followed by an ordered list followed by a block quote followed by a section heading:

    My favorite number is probably the number
    1. It's the smallest natural number that is
    > 0. With pencils, though, I prefer a
    # 2.

    Commonmark does make some concessions to goal 7, by forbidding lists beginning with markers other than 1. to interrupt paragraphs. But this is a compromise and a sacrifice of regularity and predictability in the syntax. Better just to have a general rule.

  • An implication of the last decision is that, although "tight" lists are still possible (without blank lines between items), a sublist must always be preceded by a blank line. Thus, instead of

    - Fruits
      - apple
      - orange

    you must write

    - Fruits

      - apple
      - orange

    (This blank line doesn't count against "tightness.") reStructuredText makes the same design decision.

  • Also to promote goal 7, we allow headings to "lazily" span multiple lines:

    ## My excessively long section heading is too
    long to fit on one line.

    While we're at it, we'll simplify by removing setext-style (underlined) headings. We don't really need two heading syntaxes (goal 11).

  • To meet goal 5, we have a very simple rule: anything that is indented beyond the start of the list marker belongs in the list item.

    1. list item

      > block quote inside item 1

    2. second item

    In commonmark, this would be parsed as two separate lists with a block quote between them, because the block quote is not indented far enough. What kept us from using this simple rule in commonmark was indented code blocks. If list items are going to contain an indented code block, we need to know at what column to start counting the indentation, so we fixed on the column that makes the list look best (the first column of non-space content after the marker):

    1.  A commonmark list item with an indented code block in it.

            code!

    In djot, we just get rid of indented code blocks. Most people prefer fenced code blocks anyway, and we don't need two different ways of writing code blocks (goal 11).

  • To meet goal 6 and to avoid the complex rules commonmark adopted for handling raw HTML, we simply do not allow raw HTML, except in explicitly marked contexts, e.g. `<a>`{=html} or

    ``` =html
    <table>
    <tr><td>foo</td></tr>
    </table>
    ```

    Unlike Markdown, djot is not HTML-centric. Djot documents might be rendered to a variety of different formats, so although we want to provide the flexibility to include raw content in any output format, there is no reason to privilege HTML. For similar reasons we do not interpret HTML entities, as commonmark does.

  • To meet goal 2, we make reference link parsing local. Anything that looks like [foo][bar] or [foo][] gets treated as a reference link, regardless of whether [foo] is defined later in the document. A corollary is that we must get rid of shortcut link syntax, with just a single bracket pair, [like this]. It must always be clear what is a link without needing to know the surrounding context.

  • In support of goal 6, reference links are no longer case-insensitive. Supporting this beyond an ASCII context would require building in unicode case folding to every implementation, and it doesn't seem necessary.

  • A space or newline is required after > in block quotes, to avoid the violations of the principle of uniformity noted in goal 8:

    >This is not a
    >block quote in djot.

  • To meet goal 3, we avoid using doubled characters for strong emphasis. Instead, we use _ for emphasis and * for strong emphasis. Emphasis can begin with one of these characters, as long as it is not followed by a space, and will end when a similar character is encountered, as long as it is not preceded by a space and some different characters have occurred in between. In the case of overlap, the first one to be closed takes precedence. (This simple rule also avoids the need we had in commonmark to determine unicode character classes---goal 6.)

  • Taken just by itself, this last change would introduce a number of expressive blind spots. For example, given the simple rule,

    _(_foo_)_

    parses as

    <em>(</em>foo<em>)</em>

    rather than

    <em>(<em>foo</em>)</em>

    If you want the latter interpretation, djot allows you to use the syntax

    _({_foo_})_

    The {_ is a _ that can only open emphasis, and the _} is a _ that can only close emphasis. The same can be done with * or any other inline formatting marker that is ambiguous between an opener and closer. These curly braces are required for certain inline markup, e.g. {=highlighting=}, {+insert+}, and {-delete-}, since the characters =, +, and - are found often in ordinary text.

  • In support of goal 1, code span parsing does not backtrack. So if you open a code span and don't close it, it extends to the end of the paragraph. That is similar to the way fenced code blocks work in commonmark.

    This is `inline code.

  • In support of goal 9, a generic attribute syntax is introduced. Attributes can be attached to any block-level element by putting them on the line before it, and to any inline-level element by putting them directly after it.

    {#introduction}
    This is the introductory paragraph, with
    an identifier `introduction`.

               {.important color="blue" #heading}
    ## heading

    The word *atelier*{weight="600"} is French.

  • Since we are going to have generic attributes, we no longer support quoted titles in links. One can add a title attribute if needed, but this isn't very common, so we don't need a special syntax for it:

    [Link text](url){title="Click me!"}

  • Fenced divs and bracketed spans are introduced in order to allow attributes to be attached to arbitrary sequences of block-level or inline-level elements. For example,

    {#warning .sidebar}
    ::: Warning
    This is a warning.
    Here is a word in [français]{lang=fr}.
    :::
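The emphasis rule adopted for goal 3, together with the {_ and _} forms, is small enough to sketch in a few lines of Python. This is a toy illustration and not the reference implementation: the function name `render_emphasis` is invented here, only the _ delimiter is handled (so overlap between different delimiters never arises), no HTML escaping is done, and the same space conditions are assumed to apply to the braced forms.

```python
def render_emphasis(text):
    """Render `_` emphasis as <em> tags in one left-to-right pass."""
    out = []      # output pieces, one per character or delimiter
    openers = []  # indexes in `out` of openers that are still unmatched
    i, n = 0, len(text)
    while i < n:
        forced_open = text.startswith("{_", i)   # "{_" may only open
        forced_close = text.startswith("_}", i)  # "_}" may only close
        if not (forced_open or forced_close or text[i] == "_"):
            out.append(text[i])  # ordinary character, copied through
            i += 1
            continue
        width = 2 if (forced_open or forced_close) else 1
        # A closer must not be preceded by a space; an opener must not be
        # followed by one.
        prev_ok = i > 0 and not text[i - 1].isspace()
        next_ok = i + width < n and not text[i + width].isspace()
        # Something must have been emitted since the innermost opener.
        has_content = bool(openers) and len(out) > openers[-1] + 1
        if not forced_open and prev_ok and has_content:
            # Closing is tried first: the placeholder left by the opener
            # becomes <em>, and the closing tag is appended.
            out[openers.pop()] = "<em>"
            out.append("</em>")
        elif not forced_close and next_ok:
            # Keep the literal text as a placeholder in case the opener is
            # never closed; no backtracking is needed later.
            openers.append(len(out))
            out.append(text[i:i + width])
        else:
            out.append(text[i:i + width])  # plain literal delimiter
        i += width
    return "".join(out)


print(render_emphasis("_(_foo_)_"))    # <em>(</em>foo<em>)</em>
print(render_emphasis("_({_foo_})_"))  # <em>(<em>foo</em>)</em>
```

Because an ambiguous _ is tried as a closer before it is tried as an opener, the second _ in _(_foo_)_ ends the first emphasis, which reproduces the blind spot described above. The braced forms remove the ambiguity without adding any lookahead.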
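The attribute specifiers used in the examples above can also be read with a short Python sketch. It assumes, following the djot syntax description, that #name sets an identifier, .name adds a class, and key=value or key="value" sets an arbitrary attribute. The names `TOKEN` and `parse_attributes` are invented for this example, and input it does not recognize is silently skipped, which a real parser would not do.

```python
import re

# One alternative per kind of item inside the braces.
TOKEN = re.compile(
    r'''
      \#(?P<id>[\w-]+)                       # #identifier
    | \.(?P<cls>[\w-]+)                      # .class
    | (?P<key>[\w-]+)=                       # key= followed by a value:
      (?:"(?P<quoted>[^"]*)"|(?P<bare>[\w-]+))
    ''',
    re.VERBOSE,
)


def parse_attributes(spec):
    """Turn a specifier such as '{#a .b k="v"}' into a plain dict."""
    if not (spec.startswith("{") and spec.endswith("}")):
        raise ValueError("attribute specifier must be wrapped in braces")
    attrs = {}
    classes = []
    for m in TOKEN.finditer(spec[1:-1]):
        if m.group("id"):
            attrs["id"] = m.group("id")
        elif m.group("cls"):
            classes.append(m.group("cls"))
        else:
            quoted = m.group("quoted")
            attrs[m.group("key")] = quoted if quoted is not None else m.group("bare")
    if classes:
        attrs["class"] = " ".join(classes)  # several classes are joined
    return attrs


print(parse_attributes('{.important color="blue" #heading}'))
print(parse_attributes("{lang=fr}"))
```

Since the same specifier is used for blocks, inline elements, fenced divs, and bracketed spans, one routine of this shape is all a renderer needs for goal 9.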

Syntax

For a full syntax reference, see the syntax description.

A vim syntax highlighting definition for djot is provided in editors/vim/.

Implementations

There are currently six djot implementations:

Here are some benchmarks of these implementations.

djot.lua was the original reference implementation, but current development is focused on djot.js, and it is possible that djot.lua will not be kept up to date with the latest syntax changes.

Tooling

File extension

The extension .dj may be used to indicate that the contents of a file are djot-formatted text.

License

The code and documentation are released under the MIT license.

