syntax-tree/mdast-util-to-hast

mdast utility to transform to hast.

What is this?

This package is a utility that takes an mdast (markdown) syntax tree as input and turns it into a hast (HTML) syntax tree.

When should I use this?

This project is useful when you want to deal with ASTs and turn markdown to HTML.

The hast utility hast-util-to-mdast does the inverse of this utility. It turns HTML into markdown.

The remark plugin remark-rehype wraps this utility to also turn markdown to HTML at a higher-level (easier) abstraction.

Install

This package is ESM only. In Node.js (version 16+), install with npm:

    npm install mdast-util-to-hast

In Deno with esm.sh:

    import {toHast} from 'https://esm.sh/mdast-util-to-hast@13'

In browsers with esm.sh:

    <script type="module">
      import {toHast} from 'https://esm.sh/mdast-util-to-hast@13?bundle'
    </script>

Use

Say we have the following example.md:

    ## Hello **World**!

…and next to it a module example.js:

    // `node:fs/promises` has no named `fs` export, so use the default export.
    import fs from 'node:fs/promises'
    import {toHtml} from 'hast-util-to-html'
    import {fromMarkdown} from 'mdast-util-from-markdown'
    import {toHast} from 'mdast-util-to-hast'

    const markdown = String(await fs.readFile('example.md'))
    const mdast = fromMarkdown(markdown)
    const hast = toHast(mdast)
    const html = toHtml(hast)

    console.log(html)

…now running node example.js yields:

    <h2>Hello <strong>World</strong>!</h2>

API

This package exports the identifiers defaultFootnoteBackContent, defaultFootnoteBackLabel, defaultHandlers, and toHast. There is no default export.

defaultFootnoteBackContent(referenceIndex, rereferenceIndex)

Generate the default content that GitHub uses on backreferences.

Parameters
  • referenceIndex (number) — index of the definition in the order that they are first referenced, 0-indexed
  • rereferenceIndex (number) — index of calls to the same definition, 0-indexed
Returns

Content (Array<ElementContent>).

defaultFootnoteBackLabel(referenceIndex, rereferenceIndex)

Generate the default label that GitHub uses on backreferences.

Parameters
  • referenceIndex (number) — index of the definition in the order that they are first referenced, 0-indexed
  • rereferenceIndex (number) — index of calls to the same definition, 0-indexed
Returns

Label (string).

defaultHandlers

Default handlers for nodes (Handlers).

toHast(tree[, options])

Transform mdast to hast.

Parameters
  • tree (MdastNode) — mdast tree
  • options (Options, optional) — configuration
Returns

hast tree (HastNode).

Notes

HTML

Raw HTML is available in mdast as html nodes and can be embedded in hast as semistandard raw nodes. Most utilities ignore raw nodes but two notable ones don’t:

  • hast-util-to-html also has an option allowDangerousHtml which will output the raw HTML. This is typically discouraged as noted by the option name but is useful if you completely trust authors
  • hast-util-raw can handle the raw embedded HTML strings by parsing them into standard hast nodes (element, text, etc). This is a heavy task as it needs a full HTML parser, but it is the only way to support untrusted content

Footnotes

Many options supported here relate to footnotes. Footnotes are not specified by CommonMark, which we follow by default. They are supported by GitHub, so footnotes can be enabled in markdown with mdast-util-gfm.

The options footnoteBackLabel and footnoteLabel define natural language that explains footnotes, which is hidden for sighted users but shown to assistive technology. When your page is not in English, you must define translated values.

Back references use ARIA attributes, but the section label itself uses a heading that is hidden with an sr-only class. To show it to sighted users, define different attributes in footnoteLabelProperties.

Clobbering

Footnotes introduce a problem, as they link footnote calls to footnote definitions on the page through id attributes generated from user content, which results in DOM clobbering.

DOM clobbering is this:

    <p id=x></p>
    <script>alert(x) // `x` now refers to the DOM `p#x` element</script>

Elements by their ID are made available by browsers on the window object, which is a security risk. Using a prefix solves this problem.

More information on how to handle clobbering and the prefix is explained in Example: headings (DOM clobbering) in rehype-sanitize.
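
To see why a prefix helps, the sketch below compares an unprefixed and a prefixed id. It is only an illustration: footnoteId and pageGlobals are hypothetical names, and the prefix + 'fn-' + identifier shape is an assumption made for the example rather than a statement about this package's internals.

```js
// Assumed for illustration: names that scripts on the host page read as globals.
const pageGlobals = new Set(['fn-settings'])

// Hypothetical helper, not part of mdast-util-to-hast.
// Assumes an `id` of the form: prefix + 'fn-' + footnote identifier.
function footnoteId(clobberPrefix, identifier) {
  return clobberPrefix + 'fn-' + identifier
}

const unprefixed = footnoteId('', 'settings')
const prefixed = footnoteId('user-content-', 'settings')

console.log(unprefixed, pageGlobals.has(unprefixed)) // fn-settings true
console.log(prefixed, pageGlobals.has(prefixed)) // user-content-fn-settings false
```

With the prefix in place, an id derived from user content no longer matches a name the page relies on.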

Unknown nodes

Unknown nodes are nodes with a type that isn’t in handlers or passThrough. The default behavior for unknown nodes is:

  • when the node has a value (and doesn’t have data.hName, data.hProperties, or data.hChildren, see later), create a hast text node
  • otherwise, create a <div> element (which could be changed with data.hName), with its children mapped from mdast to hast as well

This behavior can be changed by passing an unknownHandler.
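
As a rough sketch of what such a handler could look like, the function below turns every unknown node into a <span> that records the original mdast type. The data-mdast-type attribute is a made-up convention for this example; only the (state, node) signature and state.all come from this document.

```js
// Sketch of an `unknownHandler`: receives `state` and the unknown mdast node,
// returns a hast node. `dataMdastType` is a hypothetical property that would
// serialize as a `data-mdast-type` attribute.
function unknownHandler(state, node) {
  // Transform children (if any) with the regular machinery.
  const children = Array.isArray(node.children) ? state.all(node) : []

  // Keep a literal value as text instead of dropping it.
  if (typeof node.value === 'string') {
    children.push({type: 'text', value: node.value})
  }

  return {
    type: 'element',
    tagName: 'span',
    properties: {dataMdastType: node.type},
    children
  }
}
```

It would be passed as toHast(tree, {unknownHandler}).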

FootnoteBackContentTemplate

Generate content for the backreference dynamically.

For the following markdown:

    Alpha[^micromark], bravo[^micromark], and charlie[^remark].

    [^remark]: things about remark
    [^micromark]: things about micromark

This function will be called with:

  • 0 and 0 for the backreference from things about micromark to alpha, as it is the first used definition, and the first call to it
  • 0 and 1 for the backreference from things about micromark to bravo, as it is the first used definition, and the second call to it
  • 1 and 0 for the backreference from things about remark to charlie, as it is the second used definition
Parameters
  • referenceIndex (number) — index of the definition in the order that they are first referenced, 0-indexed
  • rereferenceIndex (number) — index of calls to the same definition, 0-indexed
Returns

Content for the backreference when linking back from definitions to their reference (Array<ElementContent>, ElementContent, or string).
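
A minimal sketch of such a template follows. It returns an arrow and, for repeated calls, a superscript counter. The `> 1` threshold mirrors the label example in the footnotes section of this README; the exact markup is an illustrative choice, not the package's default.

```js
// Sketch of a `FootnoteBackContentTemplate`.
// Returns hast content: an arrow, plus a `<sup>` counter for repeat calls.
function footnoteBackContent(referenceIndex, rereferenceIndex) {
  const content = [{type: 'text', value: '↩'}]

  // Same threshold as the label example in this README.
  if (rereferenceIndex > 1) {
    content.push({
      type: 'element',
      tagName: 'sup',
      properties: {},
      children: [{type: 'text', value: String(rereferenceIndex)}]
    })
  }

  return content
}
```

It would be passed as toHast(tree, {footnoteBackContent}).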

FootnoteBackLabelTemplate

Generate a back label dynamically.

For the following markdown:

    Alpha[^micromark], bravo[^micromark], and charlie[^remark].

    [^remark]: things about remark
    [^micromark]: things about micromark

This function will be called with:

  • 0 and 0 for the backreference from things about micromark to alpha, as it is the first used definition, and the first call to it
  • 0 and 1 for the backreference from things about micromark to bravo, as it is the first used definition, and the second call to it
  • 1 and 0 for the backreference from things about remark to charlie, as it is the second used definition
Parameters
  • referenceIndex (number) — index of the definition in the order that they are first referenced, 0-indexed
  • rereferenceIndex (number) — index of calls to the same definition, 0-indexed
Returns

Back label to use when linking back from definitions to their reference (string).
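
When several languages need to be supported, one possible approach is a small factory that picks the label text per language. The sketch below assumes that design; createFootnoteBackLabel and backLabels are hypothetical names, while the English and French strings are the ones that appear elsewhere in this README.

```js
// Label prefixes per language (strings taken from this README's examples).
const backLabels = {
  en: 'Back to reference ',
  fr: 'Retour à la référence '
}

// Hypothetical factory: returns a function usable as `footnoteBackLabel`.
function createFootnoteBackLabel(lang) {
  // Fall back to English for unknown languages.
  const prefix = backLabels[lang] || backLabels.en

  return function (referenceIndex, rereferenceIndex) {
    return (
      prefix +
      (referenceIndex + 1) +
      (rereferenceIndex > 1 ? '-' + rereferenceIndex : '')
    )
  }
}

const label = createFootnoteBackLabel('fr')
console.log(label(0, 1)) // Retour à la référence 1
console.log(label(1, 2)) // Retour à la référence 2-2
```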

Handler

Handle a node (TypeScript type).

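Parameters
  • state (State) — info passed around
  • node (MdastNode) — mdast node to handle
  • parent (MdastNode | undefined) — parent of node
Returns

Result (Array<HastNode> | HastNode | undefined).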

Handlers

Handle nodes (TypeScript type).

Type
    type Handlers = Partial<Record<Nodes['type'], Handler>>

Options

Configuration (TypeScript type).

Fields
  • allowDangerousHtml (boolean, default: false) — whether to persist raw HTML in markdown in the hast tree
  • clobberPrefix (string, default: 'user-content-') — prefix to use before the id property on footnotes to prevent them from clobbering
  • file (VFile, optional) — corresponding virtual file representing the input document
  • footnoteBackContent (FootnoteBackContentTemplate or string, default: defaultFootnoteBackContent) — content of the backreference back to references
  • footnoteBackLabel (FootnoteBackLabelTemplate or string, default: defaultFootnoteBackLabel) — label to describe the backreference back to references
  • footnoteLabel (string, default: 'Footnotes') — label to use for the footnotes section (affects screen readers)
  • footnoteLabelProperties (Properties, default: {className: ['sr-only']}) — properties to use on the footnote label (note that id: 'footnote-label' is always added as footnote calls use it with aria-describedby to provide an accessible label)
  • footnoteLabelTagName (string, default: h2) — tag name to use for the footnote label
  • handlers (Handlers, optional) — extra handlers for nodes
  • passThrough (Array<Nodes['type']>, optional) — list of custom mdast node types to pass through (keep) in hast (note that the node itself is passed, but eventual children are transformed)
  • unknownHandler (Handler, optional) — handle all unknown nodes
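
To illustrate how a few of these fields combine, here is a sketch of an options object that makes the footnote label visible and changes its heading level. The values 'doc-', 'Notes', and 'footnotes-title' are placeholders chosen for the example.

```js
// A sketch of an options object; all values are illustrative placeholders.
const options = {
  // Use a site-specific prefix instead of the default 'user-content-'.
  clobberPrefix: 'doc-',
  // Label for the footnotes section.
  footnoteLabel: 'Notes',
  // Render the label as `<h3>` instead of the default `<h2>`.
  footnoteLabelTagName: 'h3',
  // Replace the default `sr-only` class so the label is shown to sighted users.
  footnoteLabelProperties: {className: ['footnotes-title']}
}

console.log(Object.keys(options).length) // 4
```

The object would be passed as the second argument: toHast(tree, options).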

Raw

Raw string of HTML embedded into HTML AST (TypeScript type).

Type
    import type {Data, Literal} from 'hast'

    interface Raw extends Literal {
      type: 'raw'
      data?: RawData | undefined
    }

    interface RawData extends Data {}

State

Info passed around about the current state (TypeScript type).

Fields
  • all ((node: MdastNode) => Array<HastNode>) — transform the children of an mdast parent to hast
  • applyData (<Type extends HastNode>(from: MdastNode, to: Type) => Type | HastElement) — honor the data of from and maybe generate an element instead of to
  • definitionById (Map<string, Definition>) — definitions by their uppercased identifier
  • footnoteById (Map<string, FootnoteDefinition>) — footnote definitions by their uppercased identifier
  • footnoteCounts (Map<string, number>) — counts for how often the same footnote was called
  • footnoteOrder (Array<string>) — identifiers of order when footnote calls first appear in tree order
  • handlers (Handlers) — applied node handlers
  • one ((node: MdastNode, parent: MdastNode | undefined) => HastNode | Array<HastNode> | undefined) — transform an mdast node to hast
  • options (Options) — configuration
  • patch ((from: MdastNode, to: HastNode) => undefined) — copy a node’s positional info
  • wrap (<Type extends HastNode>(nodes: Array<Type>, loose?: boolean) => Array<Type | HastText>) — wrap nodes with line endings between each node, adds initial/final line endings when loose
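
As a sketch of how a handler might use these helpers together, the function below handles a hypothetical highlight node. It builds the element with state.all, copies positional info with state.patch, and lets state.applyData honor any data fields on the node. The node type and tag name are assumptions for the example.

```js
// Sketch of a handler for a hypothetical `highlight` mdast node.
function highlight(state, node) {
  const result = {
    type: 'element',
    tagName: 'mark',
    properties: {},
    // Transform the children of the mdast node to hast.
    children: state.all(node)
  }

  // Copy positional info from the mdast node onto the hast node.
  state.patch(node, result)

  // Honor `data.hName`, `data.hProperties`, and `data.hChildren`, if present.
  return state.applyData(node, result)
}
```

It would be registered as toHast(tree, {handlers: {highlight}}).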

Examples

Example: supporting HTML in markdown naïvely

If you completely trust authors (or plugins) and want to allow them to use HTML in markdown, and the last utility has an allowDangerousHtml option as well (such as hast-util-to-html), you can pass allowDangerousHtml to this utility (mdast-util-to-hast):

    import {fromMarkdown} from 'mdast-util-from-markdown'
    import {toHast} from 'mdast-util-to-hast'
    import {toHtml} from 'hast-util-to-html'

    const markdown = 'It <i>works</i>! <img onerror="alert(1)">'
    const mdast = fromMarkdown(markdown)
    const hast = toHast(mdast, {allowDangerousHtml: true})
    const html = toHtml(hast, {allowDangerousHtml: true})

    console.log(html)

…now running node example.js yields:

    <p>It <i>works</i>! <img onerror="alert(1)"></p>

⚠️ Danger: observe that the XSS attack through the onerror attribute is still present.

Example: supporting HTML in markdown properly

If you do not trust the authors of the input markdown, or if you want to make sure that further utilities can see HTML embedded in markdown, use hast-util-raw. The following example passes allowDangerousHtml to this utility (mdast-util-to-hast), then turns the raw embedded HTML into proper HTML nodes (hast-util-raw), and finally sanitizes the HTML by only allowing safe things (hast-util-sanitize):

    import {raw} from 'hast-util-raw'
    import {sanitize} from 'hast-util-sanitize'
    import {toHtml} from 'hast-util-to-html'
    import {fromMarkdown} from 'mdast-util-from-markdown'
    import {toHast} from 'mdast-util-to-hast'

    const markdown = 'It <i>works</i>! <img onerror="alert(1)">'
    const mdast = fromMarkdown(markdown)
    const hast = raw(toHast(mdast, {allowDangerousHtml: true}))
    const safeHast = sanitize(hast)
    const html = toHtml(safeHast)

    console.log(html)

…now running node example.js yields:

    <p>It <i>works</i>! <img></p>

👉 Note: observe that the XSS attack through the onerror attribute is no longer present.

Example: footnotes in languages other than English

If you know that the markdown is authored in a language other than English, and you’re using micromark-extension-gfm and mdast-util-gfm to match how GitHub renders markdown, and you know that footnotes are (or can be) used, you should translate the labels associated with them.

Let’s first set the stage:

    import {toHtml} from 'hast-util-to-html'
    import {gfm} from 'micromark-extension-gfm'
    import {fromMarkdown} from 'mdast-util-from-markdown'
    import {gfmFromMarkdown} from 'mdast-util-gfm'
    import {toHast} from 'mdast-util-to-hast'

    const markdown = 'Bonjour[^1]\n\n[^1]: Monde!'
    const mdast = fromMarkdown(markdown, {
      extensions: [gfm()],
      mdastExtensions: [gfmFromMarkdown()]
    })
    const hast = toHast(mdast)
    const html = toHtml(hast)

    console.log(html)

…now running node example.js yields:

    <p>Bonjour<sup><a href="#user-content-fn-1" id="user-content-fnref-1" data-footnote-ref aria-describedby="footnote-label">1</a></sup></p>
    <section data-footnotes class="footnotes"><h2 class="sr-only" id="footnote-label">Footnotes</h2>
    <ol>
    <li id="user-content-fn-1">
    <p>Monde! <a href="#user-content-fnref-1" data-footnote-backref="" aria-label="Back to reference 1" class="data-footnote-backref">↩</a></p>
    </li>
    </ol>
    </section>

This is a mix of English and French that screen readers can’t handle nicely. Let’s say our program does know that the markdown is in French. In that case, it’s important to translate and define the labels relating to footnotes so that screen readers can pronounce the page properly:

    @@ -9,7 +9,16 @@ const mdast = fromMarkdown(markdown, {
       extensions: [gfm()],
       mdastExtensions: [gfmFromMarkdown()]
     })
    -const hast = toHast(mdast)
    +const hast = toHast(mdast, {
    +  footnoteLabel: 'Notes de bas de page',
    +  footnoteBackLabel(referenceIndex, rereferenceIndex) {
    +    return (
    +      'Retour à la référence ' +
    +      (referenceIndex + 1) +
    +      (rereferenceIndex > 1 ? '-' + rereferenceIndex : '')
    +    )
    +  }
    +})
     const html = toHtml(hast)

     console.log(html)

…now running node example.js with the above patch applied yields:

    @@ -1,8 +1,8 @@
     <p>Bonjour<sup><a href="#user-content-fn-1" data-footnote-ref aria-describedby="footnote-label">1</a></sup></p>
    -<section data-footnotes><h2>Footnotes</h2>
    +<section data-footnotes><h2>Notes de bas de page</h2>
     <ol>
     <li>
    -<p>Monde! <a href="#user-content-fnref-1" data-footnote-backref="" aria-label="Back to reference 1">↩</a></p>
    +<p>Monde! <a href="#user-content-fnref-1" data-footnote-backref="" aria-label="Retour à la référence 1">↩</a></p>
     </li>
     </ol>
     </section>

Example: supporting custom nodes

This project supports CommonMark and the GFM constructs (footnotes, strikethrough, tables) and the frontmatter constructs YAML and TOML. Support can be extended to other constructs in two ways: a) with handlers, b) through fields on nodes.

For example, when we represent a mark element in markdown and want to turn it into a <mark> element in HTML, we can use a handler:

    import {toHtml} from 'hast-util-to-html'
    import {toHast} from 'mdast-util-to-hast'

    const mdast = {
      type: 'paragraph',
      children: [{type: 'mark', children: [{type: 'text', value: 'x'}]}]
    }

    const hast = toHast(mdast, {
      handlers: {
        mark(state, node) {
          return {
            type: 'element',
            tagName: 'mark',
            properties: {},
            children: state.all(node)
          }
        }
      }
    })

    console.log(toHtml(hast))

We can do the same through certain fields on nodes:

    import {toHtml} from 'hast-util-to-html'
    import {toHast} from 'mdast-util-to-hast'

    const mdast = {
      type: 'paragraph',
      children: [
        {
          type: 'mark',
          children: [{type: 'text', value: 'x'}],
          data: {hName: 'mark'}
        }
      ]
    }

    console.log(toHtml(toHast(mdast)))

Algorithm

This project by default handles CommonMark, GFM (footnotes, strikethrough, tables) and common frontmatter (YAML, TOML).

Existing handlers can be overwritten and handlers for more nodes can be added. It’s also possible to define how mdast is turned into hast through fields on nodes.

Default handling

The following overview gives insight into what input turns into what output. Each entry lists the mdast node, a markdown example, the resulting hast node, and an HTML example:

blockquote
  markdown: > A greater than…
  hast: element (blockquote)
  html: <blockquote><p>A greater than…</p></blockquote>

break
  markdown:
    A backslash\
    before a line break…
  hast: element (br)
  html: <p>A backslash<br>before a line break…</p>

code
  markdown:
    ```js
    backtick.fences('for blocks')
    ```
  hast: element (pre and code)
  html: <pre><code class="language-js">backtick.fences('for blocks')</code></pre>

delete (GFM)
  markdown: Two ~~tildes~~ for delete.
  hast: element (del)
  html: <p>Two <del>tildes</del> for delete.</p>

emphasis
  markdown: Some *asterisks* for emphasis.
  hast: element (em)
  html: <p>Some <em>asterisks</em> for emphasis.</p>

footnoteReference, footnoteDefinition (GFM)
  markdown:
    With a [^caret].

    [^caret]: Stuff
  hast: element (section, sup, a)
  html: <p>With a <sup><a href="#fn-caret">1</a></sup>.</p>

heading
  markdown:
    # One number sign…
    ###### Six number signs…
  hast: element (h1…h6)
  html:
    <h1>One number sign…</h1>
    <h6>Six number signs…</h6>

html
  markdown: <kbd>CMD+S</kbd>
  hast: nothing (default), raw (when allowDangerousHtml: true)
  html: n/a

image
  markdown: ![Alt text](/logo.png "title")
  hast: element (img)
  html: <p><img src="/logo.png" alt="Alt text" title="title"></p>

imageReference, definition
  markdown:
    ![Alt text][logo]

    [logo]: /logo.png "title"
  hast: element (img)
  html: <p><img src="/logo.png" alt="Alt text" title="title"></p>

inlineCode
  markdown: Some `backticks` for inline code.
  hast: element (code)
  html: <p>Some <code>backticks</code> for inline code.</p>

link
  markdown: [Example](https://example.com "title")
  hast: element (a)
  html: <p><a href="https://example.com" title="title">Example</a></p>

linkReference, definition
  markdown:
    [Example][]

    [example]: https://example.com "title"
  hast: element (a)
  html: <p><a href="https://example.com" title="title">Example</a></p>

list, listItem
  markdown:
    * asterisks for unordered items
    1. decimals and a dot for ordered items
  hast: element (li and ol or ul)
  html:
    <ul><li>asterisks for unordered items</li></ul>
    <ol><li>decimals and a dot for ordered items</li></ol>

paragraph
  markdown: Just some text…
  hast: element (p)
  html: <p>Just some text…</p>

root
  markdown: Anything!
  hast: root
  html: <p>Anything!</p>

strong
  markdown: Two **asterisks** for strong.
  hast: element (strong)
  html: <p>Two <strong>asterisks</strong> for strong.</p>

text
  markdown: Anything!
  hast: text
  html: <p>Anything!</p>

table, tableRow, tableCell
  markdown:
    | Pipes |
    | ----- |
  hast: element (table, thead, tbody, tr, td, th)
  html: <table><thead><tr><th>Pipes</th></tr></thead></table>

thematicBreak
  markdown:
    Three asterisks for a thematic break:

    ***
  hast: element (hr)
  html:
    <p>Three asterisks for a thematic break:</p>
    <hr>

toml (frontmatter)
  markdown:
    +++
    fenced = true
    +++
  hast: nothing
  html: n/a

yaml (frontmatter)
  markdown:
    ---
    fenced: yes
    ---
  hast: nothing
  html: n/a

👉 Note: GFM prescribes that the obsolete align attribute on td and th elements is used. To use style attributes instead of obsolete features, combine this utility with @mapbox/hast-util-table-cell-style.

🧑‍🏫 Info: this project is concerned with turning one syntax tree into another. It does not deal with markdown syntax or HTML syntax. The preceding examples are illustrative rather than authoritative or exhaustive.

Fields on nodes

A frequent problem arises when having to turn one syntax tree into another. As the original tree (in this case, mdast for markdown) is in some cases limited compared to the destination (in this case, hast for HTML) tree, is it possible to provide more info in the original to define what the result will be in the destination? This is possible by defining data on mdast nodes, which this utility will read as instructions on what hast nodes to create.

An example is math, which is a nonstandard markdown extension, that this utility doesn’t understand. To solve this, mdast-util-math defines instructions on mdast nodes that this plugin does understand because they define a certain hast structure.

The following fields can be used:

  • node.data.hName — define the element’s tag name
  • node.data.hProperties — define extra properties to use
  • node.data.hChildren — define hast children to use
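
As a sketch of how these fields might be set programmatically, the function below walks an mdast tree and tags every strong node so that it would be rendered as <b class="loud">. The helper name tagStrong and the class name are made up for the example; only the data.hName and data.hProperties fields come from this document.

```js
// Hypothetical helper: recursively annotate `strong` nodes with hast instructions.
function tagStrong(node) {
  if (node.type === 'strong') {
    const data = node.data || {}
    node.data = {
      ...data,
      // Render as `<b>` instead of `<strong>`.
      hName: 'b',
      // Keep existing properties and set a class.
      hProperties: {...data.hProperties, className: ['loud']}
    }
  }

  if (Array.isArray(node.children)) {
    for (const child of node.children) tagStrong(child)
  }

  return node
}

const tree = tagStrong({
  type: 'paragraph',
  children: [{type: 'strong', children: [{type: 'text', value: 'Alpha'}]}]
})

console.log(tree.children[0].data.hName) // b
```
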
hName

node.data.hName sets the tag name of an element. The following mdast:

    {
      type: 'strong',
      data: {hName: 'b'},
      children: [{type: 'text', value: 'Alpha'}]
    }

…yields (hast):

    {
      type: 'element',
      tagName: 'b',
      properties: {},
      children: [{type: 'text', value: 'Alpha'}]
    }

hProperties

node.data.hProperties sets the properties of an element. The following mdast (note that mdast image nodes carry their source in url):

    {
      type: 'image',
      url: 'circle.svg',
      alt: 'Big red circle on a black background',
      data: {hProperties: {className: ['responsive']}}
    }

…yields (hast):

    {
      type: 'element',
      tagName: 'img',
      properties: {
        src: 'circle.svg',
        alt: 'Big red circle on a black background',
        className: ['responsive']
      },
      children: []
    }

hChildren

node.data.hChildren sets the children of an element. The following mdast:

    {
      type: 'code',
      lang: 'js',
      data: {
        hChildren: [
          {
            type: 'element',
            tagName: 'span',
            properties: {className: ['hljs-meta']},
            children: [{type: 'text', value: '"use strict"'}]
          },
          {type: 'text', value: ';'}
        ]
      },
      value: '"use strict";'
    }

…yields (hast):

    {
      type: 'element',
      tagName: 'pre',
      properties: {},
      children: [
        {
          type: 'element',
          tagName: 'code',
          properties: {className: ['language-js']},
          children: [
            {
              type: 'element',
              tagName: 'span',
              properties: {className: ['hljs-meta']},
              children: [{type: 'text', value: '"use strict"'}]
            },
            {type: 'text', value: ';'}
          ]
        }
      ]
    }

👉 Note: the pre and language-js class are normal mdast-util-to-hast functionality.

CSS

Assuming you know how to use (semantic) HTML and CSS, then it should generally be straightforward to style the HTML produced by this plugin. With CSS, you can get creative and style the results as you please.

Some semistandard features, notably GFM’s task lists and footnotes, generate HTML that can be unintuitive, as it matches exactly what GitHub produces for their website. There is a project, sindresorhus/github-markdown-css, that exposes the stylesheet that GitHub uses for rendered markdown, which might either be inspirational for more complex features, or can be used as-is to exactly match how GitHub styles rendered markdown.

The following CSS is needed to make footnotes look a bit like GitHub:

    /* Style the footnotes section. */
    .footnotes {
      font-size: smaller;
      color: #8b949e;
      border-top: 1px solid #30363d;
    }

    /* Hide the section label for visual users. */
    .sr-only {
      position: absolute;
      width: 1px;
      height: 1px;
      padding: 0;
      overflow: hidden;
      clip: rect(0, 0, 0, 0);
      word-wrap: normal;
      border: 0;
    }

    /* Place `[` and `]` around footnote calls. */
    [data-footnote-ref]::before {
      content: '[';
    }

    [data-footnote-ref]::after {
      content: ']';
    }

Syntax tree

The following interfaces are added to hast by this utility.

Nodes

Raw

    interface Raw <: Literal {
      type: 'raw'
    }

Raw (Literal) represents a string of raw HTML inside hast. Raw nodes are typically ignored but are handled by hast-util-to-html and hast-util-raw.

Types

This package is fully typed with TypeScript. It exports the FootnoteBackContentTemplate, FootnoteBackLabelTemplate, Handler, Handlers, Options, Raw, and State types.

It also registers the Raw node type with @types/hast. If you’re working with the syntax tree (and you pass allowDangerousHtml: true), make sure to import this utility somewhere in your types, as that registers the new node type in the tree.

    /**
     * @typedef {import('mdast-util-to-hast')}
     */

    import {visit} from 'unist-util-visit'

    /** @type {import('hast').Root} */
    const tree = { /* … */ }

    visit(tree, function (node) {
      // `node` can now be `raw`.
    })

Finally, it also registers the hChildren, hName, and hProperties fields on Data of @types/mdast. If you’re working with the syntax tree, make sure to import this utility somewhere in your types, as that registers the data fields in the tree.

    /**
     * @typedef {import('mdast-util-to-hast')}
     */

    import {visit} from 'unist-util-visit'

    // The data fields are registered on mdast, so the tree is an mdast root.
    /** @type {import('mdast').Root} */
    const tree = { /* … */ }

    console.log(tree.data?.hName) // Types as `string | undefined`.

Compatibility

Projects maintained by the unified collective are compatible with maintained versions of Node.js.

When we cut a new major release, we drop support for unmaintained versions of Node. This means we try to keep the current release line, mdast-util-to-hast@^13, compatible with Node.js 16.

Security

Use of mdast-util-to-hast can open you up to a cross-site scripting (XSS) attack. Embedded hast properties (hName, hProperties, hChildren), custom handlers, and the allowDangerousHtml option all provide openings.

The following example shows how a script is injected where a benign code block is expected with embedded hast properties:

    const code = {type: 'code', value: 'alert(1)'}

    code.data = {hName: 'script'}

Yields:

    <script>alert(1)</script>

The following example shows how an image is changed to fail loading and therefore run code in a browser.

    const image = {type: 'image', url: 'existing.png'}

    image.data = {hProperties: {src: 'missing', onError: 'alert(2)'}}

Yields:

    <img src="missing" onerror="alert(2)">

The following example shows the default handling of embedded HTML:

    # Hello

    <script>alert(3)</script>

Yields:

    <h1>Hello</h1>

Passing allowDangerousHtml: true to mdast-util-to-hast is typically still not enough to run unsafe code:

    <h1>Hello</h1>
    &#x3C;script>alert(3)&#x3C;/script>

If allowDangerousHtml: true is also given to hast-util-to-html (or rehype-stringify), the unsafe code runs:

    <h1>Hello</h1>
    <script>alert(3)</script>

Use hast-util-sanitize to make the hast tree safe.
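
When an mdast tree comes from an untrusted source, one additional precaution could be to drop the embedded hast fields before transforming. The sketch below assumes that approach; stripHastData is a hypothetical helper, and it complements rather than replaces sanitizing the resulting hast tree.

```js
// The data fields that this utility reads as instructions.
const hastFields = ['hName', 'hProperties', 'hChildren']

// Hypothetical helper: remove embedded hast instructions, in place.
function stripHastData(node) {
  if (node.data) {
    for (const field of hastFields) delete node.data[field]
  }

  if (Array.isArray(node.children)) {
    for (const child of node.children) stripHastData(child)
  }

  return node
}

const code = {type: 'code', value: 'alert(1)', data: {hName: 'script'}}
stripHastData(code)
console.log(code.data) // {}
```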

Contribute

See contributing.md in syntax-tree/.github for ways to get started. See support.md for ways to get help.

This project has a code of conduct. By interacting with this repository, organization, or community you agree to abide by its terms.

License

MIT © Titus Wormer

