pragdave/earmark

Earmark—A Pure Elixir Markdown Processor

N.B.

This README contains the docstrings and doctests from the code by means of extractly, and the following code examples are therefore verified with ExUnit doctests.

Options

Earmark.Cli.Implementation

Functional interface to the CLI (with the exception of reading input files with Earmark.File), returning the device and the string to be output.

Earmark.Options

This is a superset of the options that need to be passed into Earmark.Parser.as_ast/2.

The following options are specific to Earmark and are therefore explained in detail:

compact_output: boolean indicating whether to avoid indentation and minimize whitespace
eex: allows usage of an EEx template to be expanded to markdown before conversion
file: name of the file passed in from the CLI
line: defaults to 1, but might be set to an offset for better error messages in some integration cases
smartypants: boolean, use Smarty Pants in the output
ignore_strings, postprocessor and registered_processors: processors that modify the AST returned from Earmark.Parser.as_ast/2 before rendering (post because preprocessing is done on the markdown, e.g. eex). Refer to the moduledoc of Earmark.Transform for details.

All other options are passed on to Earmark.Parser.as_ast/2.
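As a rough illustration of how these options are passed in practice, the sketch below hands a keyword list to Earmark.as_html! and validates the same option with make_options/1. The Mix.install call, the version requirement and the input string are assumptions made so that the snippet runs as a standalone script.

```elixir
# Assumption: Earmark is fetched on the fly so this runs as a plain .exs script.
Mix.install([{:earmark, "~> 1.4"}])

markdown = "Some *emphasized* text"

# Options can be given as a keyword list; compact_output asks for minimal whitespace.
html = Earmark.as_html!(markdown, compact_output: true)
IO.puts(html)

# The same option can be validated up front and turned into an Options struct.
{:ok, %Earmark.Options{} = options} = Earmark.Options.make_options(compact_output: true)
IO.inspect(options.compact_output)
```

Validating first with make_options/1 can be handy when options come from user input, since all problems are reported at once.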

Earmark.Options.make_options/1

Makes a legal and normalized Options struct from maps or keyword lists.

Without a parameter, or with an empty input, we just get a new Options struct

    iex(1)> { make_options(), make_options(%{}) }
    { {:ok, %Earmark.Options{}}, {:ok, %Earmark.Options{}} }

The same holds for the bang version of course

    iex(2)> { make_options!(), make_options!(%{}) }
    { %Earmark.Options{}, %Earmark.Options{} }

We check for unallowed keys

    iex(3)> make_options(no_such_option: true)
    {:error, [{:warning, 0, "Unrecognized option no_such_option: true"}]}

Of course we do not let our users discover one error after another

    iex(4)> make_options(no_such_option: true, gfm: false, still_not_an_option: 42)
    {:error, [{:warning, 0, "Unrecognized option no_such_option: true"}, {:warning, 0, "Unrecognized option still_not_an_option: 42"}]}

And the bang version will raise an Earmark.Error as excepted (sic)

    iex(5)> make_options!(no_such_option: true, gfm: false, still_not_an_option: 42)
    ** (Earmark.Error) [{:warning, 0, "Unrecognized option no_such_option: true"}, {:warning, 0, "Unrecognized option still_not_an_option: 42"}]

Some values need to be numeric

    iex(6)> make_options(line: "42")
    {:error, [{:error, 0, "line option must be numeric"}]}

    iex(7)> make_options(%Earmark.Options{footnote_offset: "42"})
    {:error, [{:error, 0, "footnote_offset option must be numeric"}]}

    iex(8)> make_options(%{line: "42", footnote_offset: nil})
    {:error, [{:error, 0, "footnote_offset option must be numeric"}, {:error, 0, "line option must be numeric"}]}

Earmark.Options.relative_filename/2

Computes the path of a relative file name (starting with "./") from the file in options and returns an updated options struct

    iex(9)> options = %Earmark.Options{file: "some/path/xxx.md"}
    ...(9)> options_ = relative_filename(options, "./local.md")
    ...(9)> options_.file
    "some/path/local.md"

For your convenience you can just use a keyword list

    iex(10)> options = relative_filename([file: "some/path/_.md", breaks: true], "./local.md")
    ...(10)> {options.file, options.breaks}
    {"some/path/local.md", true}

If the filename is not relative (that is, it does not start with "./") it just replaces the file in options

    iex(11)> options = %Earmark.Options{file: "some/path/xxx.md"}
    ...(11)> options_ = relative_filename(options, "local.md")
    ...(11)> options_.file
    "local.md"

And there is a special case when processing stdin, meaning that file is nil: in that case we replace file verbatim

    iex(12)> options = %Earmark.Options{}
    ...(12)> options_ = relative_filename(options, "./local.md")
    ...(12)> options_.file
    "./local.md"

Earmark.Options.with_postprocessor/2

A convenience constructor
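A minimal sketch of how this constructor might be used follows. It assumes that the first argument is the postprocessor and the second the list of registered processors (the argument order is not stated in this README), and that Earmark is fetched with Mix.install; the class name is made up for illustration.

```elixir
# Assumption: Earmark is fetched on the fly so this runs as a plain .exs script.
Mix.install([{:earmark, "~> 1.4"}])

# A postprocessor that adds a (hypothetical) class to every non-string node.
add_class = &Earmark.AstTools.merge_atts_in_node(&1, class: "lead")

# Assumed argument order: postprocessor first, registered processors second.
options = Earmark.Options.with_postprocessor(add_class, [])

html = Earmark.as_html!("Hello", options)
IO.puts(html)
```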

Earmark.Internal

All public functions that are internal to Earmark, so that only external API functions are public in Earmark

Earmark.Internal.as_ast!/2

A wrapper to extract the AST from a call to Earmark.Parser.as_ast if a tuple {:ok, result, []} is returned, raising an error otherwise

    iex(1)> as_ast!(["Hello %% annotated"], annotations: "%%")
    [{"p", [], ["Hello "], %{annotation: "%% annotated"}}]

    iex(2)> as_ast!("===")
    ** (Earmark.Error) [{:warning, 1, "Unexpected line ==="}]

Earmark.Internal.from_file!/2

This is a convenience method to read a file or pass it to EEx.eval_file if its name ends in .eex

The returned string is then passed to as_html. This is used in the escript now and allows for a simple inclusion mechanism; as a matter of fact, an include function is passed

Earmark.Internal.include/2

A utility function that will be passed as a partial capture to EEx.eval_file by providing a value for the options parameter

    EEx.eval(..., include: &include(&1, options))

thusly allowing

    <%= include.(somefile) %>

where somefile can be a relative path starting with "./"

Here is an example using these fixtures

    iex(3)> include("./include/basic.md.eex", file: "test/fixtures/does_not_matter")
    "# Headline Level 1\n"

And here is how it is used inside a template

    iex(4)> options = [file: "test/fixtures/does_not_matter"]
    ...(4)> EEx.eval_string(~s{<%= include.("./include/basic.md.eex") %>}, include: &include(&1, options))
    "# Headline Level 1\n"

Earmark.Transform

Structure Conserving Transformers

For the convenience of processing the output of Earmark.Parser.as_ast we expose two structure conserving mappers.

`map_ast`

Traverses an AST using a mapper function.

The mapper function will be called for each node including text elements unless map_ast is called with the third positional parameter ignore_strings, which is optional and defaults to false, set to true.

Depending on the return value of the mapper function, the traversal will do one of the following:

{new_tag, new_atts, ignored, new_meta}
just replaces the tag, attribute and meta values of the current node with the values of the returned quadruple (ignoring the ignored element, which makes it easy to return nodes without transformation) and then descends into the original content of the node.
{:replace, node}
replaces the current node with node and does not descend anymore, but continues traversal on siblings.
{new_function, {new_tag, new_atts, ignored, new_meta}}
just replaces the tag, attribute and meta values of the current node with the values of the returned quadruple (ignoring the ignored element, which makes it easy to return nodes without transformation) and then descends into the original content of the node, but with the mapper function new_function used for transformation of the AST.
N.B. The original mapper function will be used for transforming the sibling nodes though.

map_ast takes a function that will be called for each node of the AST, where a leaf node is either a quadruple like {"code", [{"class", "inline"}], ["some code"], %{}} or a text leaf like "some code"

The result of the function call must be

for nodes → as described above
for strings → strings or nodes

As an example let us transform an ast to have symbol keys

    iex(1)> input = [
    ...(1)> {"h1", [], ["Hello"], %{title: true}},
    ...(1)> {"ul", [], [{"li", [], ["alpha"], %{}}, {"li", [], ["beta"], %{}}], %{}}]
    ...(1)> map_ast(input, fn {t, a, _, m} -> {String.to_atom(t), a, nil, m} end, true)
    [ {:h1, [], ["Hello"], %{title: true}},
      {:ul, [], [{:li, [], ["alpha"], %{}}, {:li, [], ["beta"], %{}}], %{}} ]

N.B. If this returning convention is not respected map_ast might not complain, but the resulting transformation might not be suitable for Earmark.Transform.transform anymore. From this follows that any function passed in as value of the postprocessor: option must obey these conventions.
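The {:replace, node} convention described above has no doctest of its own, so here is a small sketch of it. The AST, the placeholder node and the use of Mix.install are assumptions made for illustration; the behaviour relied upon is only what the description above states.

```elixir
# Assumption: Earmark is fetched on the fly so this runs as a plain .exs script.
Mix.install([{:earmark, "~> 1.4"}])

# A hand-written AST with an image nested inside a paragraph.
ast = [
  {"p", [], ["before ", {"img", [{"src", "a.png"}], [], %{}}, " after"], %{}}
]

mapper = fn
  # Swap every img node for a placeholder; the traversal does not descend into it.
  {"img", _atts, _content, _meta} -> {:replace, {"span", [], ["[image]"], %{}}}
  # Returning the quadruple unchanged keeps the node and descends into its content.
  node -> node
end

# The third argument sets ignore_strings, so the mapper only sees quadruples.
result = Earmark.Transform.map_ast(ast, mapper, true)
IO.inspect(result)
```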

`map_ast_with`

This is like map_ast but, as with a reducer, an accumulator can also be passed through.

For that reason the function is called with two arguments, the first element being the same value as in map_ast and the second the accumulator. The return values need to be equally augmented tuples.

A simple example, annotating traversal order in the meta map's :count key. As we are not interested in text nodes we use the fourth parameter ignore_strings, which defaults to false

    iex(2)> input = [
    ...(2)>  {"ul", [], [{"li", [], ["one"], %{}}, {"li", [], ["two"], %{}}], %{}},
    ...(2)>  {"p", [], ["hello"], %{}}]
    ...(2)> counter = fn {t, a, _, m}, c -> {{t, a, nil, Map.put(m, :count, c)}, c+1} end
    ...(2)> map_ast_with(input, 0, counter, true)
    {[ {"ul", [], [{"li", [], ["one"], %{count: 1}}, {"li", [], ["two"], %{count: 2}}], %{count: 0}},
      {"p", [], ["hello"], %{count: 3}}], 4}

Let us describe an implementation of a real world use case taken from Elixir Forum

Simplifying the exact parsing of the text node in this example we only want to replace a text node of the form #elixir with a link to the Elixir home page but only when inside a {"p",....} node

We can achieve this as follows

    iex(3)> elixir_home = {"a", [{"href", "https://elixir-lang.org"}], ["Elixir"], %{}}
    ...(3)> transformer = fn {"p", atts, _, meta}, _ -> {{"p", atts, nil, meta}, true}
    ...(3)>                  "#elixir", true -> {elixir_home, false}
    ...(3)>                  text, _ when is_binary(text) -> {text, false}
    ...(3)>                  node, _ -> {node, false} end
    ...(3)> ast = [
    ...(3)>  {"p", [], ["#elixir"], %{}}, {"bold", [], ["#elixir"], %{}},
    ...(3)>  {"ol", [], [{"li", [], ["#elixir"], %{}}, {"p", [], ["elixir"], %{}}, {"p", [], ["#elixir"], %{}}], %{}}
    ...(3)> ]
    ...(3)> map_ast_with(ast, false, transformer)
    {[
     {"p", [], [{"a", [{"href", "https://elixir-lang.org"}], ["Elixir"], %{}}], %{}}, {"bold", [], ["#elixir"], %{}},
     {"ol", [], [{"li", [], ["#elixir"], %{}}, {"p", [], ["elixir"], %{}}, {"p", [], [{"a", [{"href", "https://elixir-lang.org"}], ["Elixir"], %{}}], %{}}], %{}}
    ], false}

An alternate, maybe more elegant solution would be to change the mapper function during AST traversal as demonstrated here

Postprocessors and Convenience Functions

These can be declared in the fields postprocessor and registered_processors in the Options struct. postprocessor is prepended to registered_processors, and they are all applied to non-string nodes (that is, the quadruples of the AST, which are of the form {tag, atts, content, meta}).

All postprocessors can just be functions on nodes or a TagSpecificProcessors struct which will group function applications depending on tags. As a convenience, tuples of the form {tag, function} will be transformed into a TagSpecificProcessors struct.

    iex(4)> add_class1 = &Earmark.AstTools.merge_atts_in_node(&1, class: "class1")
    ...(4)> m1 = Earmark.Options.make_options!(postprocessor: add_class1) |> make_postprocessor()
    ...(4)> m1.({"a", [], nil, nil})
    {"a", [{"class", "class1"}], nil, nil}

We can also use the registered_processors field:

    iex(5)> add_class1 = &Earmark.AstTools.merge_atts_in_node(&1, class: "class1")
    ...(5)> m2 = Earmark.Options.make_options!(registered_processors: add_class1) |> make_postprocessor()
    ...(5)> m2.({"a", [], nil, nil})
    {"a", [{"class", "class1"}], nil, nil}

Knowing that values on the same attributes are added onto the front, the following doctest demonstrates the order in which the processors are executed

    iex(6)> add_class1 = &Earmark.AstTools.merge_atts_in_node(&1, class: "class1")
    ...(6)> add_class2 = &Earmark.AstTools.merge_atts_in_node(&1, class: "class2")
    ...(6)> add_class3 = &Earmark.AstTools.merge_atts_in_node(&1, class: "class3")
    ...(6)> m = Earmark.Options.make_options!(postprocessor: add_class1, registered_processors: [add_class2, {"a", add_class3}])
    ...(6)> |> make_postprocessor()
    ...(6)> [{"a", [{"class", "link"}], nil, nil}, {"b", [], nil, nil}]
    ...(6)> |> Enum.map(m)
    [{"a", [{"class", "class3 class2 class1 link"}], nil, nil}, {"b", [{"class", "class2 class1"}], nil, nil}]

We can see that the tuple form has been transformed into a tag-specific transformation only. As a matter of fact, the explicit definition would be:

    iex(7)> m = make_postprocessor(
    ...(7)>   %Earmark.Options{
    ...(7)>     registered_processors:
    ...(7)>       [Earmark.TagSpecificProcessors.new({"a", &Earmark.AstTools.merge_atts_in_node(&1, target: "_blank")})]})
    ...(7)> [{"a", [{"href", "url"}], nil, nil}, {"b", [], nil, nil}]
    ...(7)> |> Enum.map(m)
    [{"a", [{"href", "url"}, {"target", "_blank"}], nil, nil}, {"b", [], nil, nil}]

We can also define a tag specific transformer in one step, which might (or might not) solve potential performance issues when running too many processors

    iex(8)> add_class4 = &Earmark.AstTools.merge_atts_in_node(&1, class: "class4")
    ...(8)> add_class5 = &Earmark.AstTools.merge_atts_in_node(&1, class: "class5")
    ...(8)> add_class6 = &Earmark.AstTools.merge_atts_in_node(&1, class: "class6")
    ...(8)> tsp = Earmark.TagSpecificProcessors.new([{"a", add_class5}, {"b", add_class5}])
    ...(8)> m = Earmark.Options.make_options!(
    ...(8)>       postprocessor: add_class4,
    ...(8)>       registered_processors: [tsp, add_class6])
    ...(8)> |> make_postprocessor()
    ...(8)> [{"a", [], nil, nil}, {"c", [], nil, nil}, {"b", [], nil, nil}]
    ...(8)> |> Enum.map(m)
    [{"a", [{"class", "class6 class5 class4"}], nil, nil}, {"c", [{"class", "class6 class4"}], nil, nil}, {"b", [{"class", "class6 class5 class4"}], nil, nil}]

Of course the mechanics shown above are hidden if all we want is to trigger the postprocessor chain in Earmark.as_html. Here is a typical example

    iex(9)> add_target = fn node -> # This will only be applied to nodes as it will become a TagSpecificProcessors
    ...(9)>   if Regex.match?(~r{\.x\.com\z}, Earmark.AstTools.find_att_in_node(node, "href", "")), do:
    ...(9)>     Earmark.AstTools.merge_atts_in_node(node, target: "_blank"), else: node end
    ...(9)> options = [
    ...(9)>   registered_processors: [{"a", add_target}, {"p", &Earmark.AstTools.merge_atts_in_node(&1, class: "example")}]]
    ...(9)> markdown = [
    ...(9)>   "http://hello.x.com",
    ...(9)>   "",
    ...(9)>   "[some](url)",
    ...(9)> ]
    ...(9)> Earmark.as_html!(markdown, options)
    "<p class=\"example\">\n<a href=\"http://hello.x.com\" target=\"_blank\">http://hello.x.com</a></p>\n<p class=\"example\">\n<a href=\"url\">some</a></p>\n"

Use case: Modification of Link Attributes depending on the URL

This would be done as follows

    Earmark.as_html!(markdown, registered_processors: {"a", my_function_that_is_invoked_only_with_a_nodes})

Use case: Modification of the AST according to Annotations

N.B. Annotations are an experimental feature in 1.4.16-pre and are documented here

By annotating our markdown source we can then influence the rendering. In this example we will just add some decoration

    iex(10)> markdown = [ "A joke %% smile", "", "Charming %% in_love" ]
    ...(10)> add_smiley = fn {_, _, _, meta} = quad, _acc ->
    ...(10)>                case Map.get(meta, :annotation) do
    ...(10)>                  "%% smile"   -> {quad, "\u1F601"}
    ...(10)>                  "%% in_love" -> {quad, "\u1F60d"}
    ...(10)>                  _            -> {quad, nil}
    ...(10)>                end
    ...(10)>                text, nil -> {text, nil}
    ...(10)>                text, ann -> {"#{text} #{ann}", nil}
    ...(10)>              end
    ...(10)> Earmark.as_ast!(markdown, annotations: "%%") |> Earmark.Transform.map_ast_with(nil, add_smiley) |> Earmark.transform
    "<p>\nA joke  ὠ1</p>\n<p>\nCharming  ὠd</p>\n"

Structure Modifying Transformers

For structure modifications a tree traversal is needed and no clear pattern of how to assist this task with tools has emerged yet.

Earmark.Restructure.walk_and_modify_ast/4

Walks an AST and allows you to process it (storing details in acc) and/or modify it as it is walked.

items is the AST you got from Earmark.Parser.as_ast()

acc is the initial value of an accumulator that is passed to both process_item_fn and process_list_fn and accumulated. If your functions do not need to use or store any state, you can pass nil.

The process_item_fn function is required. It takes two parameters, the single item to process (which will either be a string or a 4-tuple) and the accumulator, and returns a tuple {processed_item, updated_acc}. Returning the empty list for processed_item will remove the processed item from the AST.

The process_list_fn function is optional and defaults to no modification of items or accumulator. It takes two parameters, the list of items that are the sub-items of a given element in the AST (or the top-level list of items), and the accumulator, and returns a tuple {processed_items_list, updated_acc}.

This function ends up returning {ast, acc}.

Here is an example using a custom format to make <em> nodes and allowing commented text to be left out

    iex(1)> is_comment? = fn item -> is_binary(item) && Regex.match?(~r/\A\s*--/, item) end
    ...(1)> comment_remover =
    ...(1)>   fn items, acc -> {Enum.reject(items, is_comment?), acc} end
    ...(1)> italics_maker = fn
    ...(1)>   item, acc when is_binary(item) ->
    ...(1)>     new_item = Restructure.split_by_regex(
    ...(1)>       item,
    ...(1)>       ~r/\/([[:graph:]].*?[[:graph:]]|[[:graph:]])\//,
    ...(1)>       fn [_, content] ->
    ...(1)>         {"em", [], [content], %{}}
    ...(1)>       end
    ...(1)>     )
    ...(1)>     {new_item, acc}
    ...(1)>   item, "a" -> {item, nil}
    ...(1)>   {name, _, _, _}=item, _ -> {item, name}
    ...(1)> end
    ...(1)> markdown = """
    ...(1)> [no italics in links](http://example.io/some/path)
    ...(1)> but /here/
    ...(1)>
    ...(1)> -- ignore me
    ...(1)>
    ...(1)> text
    ...(1)> """
    ...(1)> {:ok, ast, []} = Earmark.Parser.as_ast(markdown)
    ...(1)> Restructure.walk_and_modify_ast(ast, nil, italics_maker, comment_remover)
    {[
      {"p", [],
        [
          {"a", [{"href", "http://example.io/some/path"}], ["no italics in links"],
          %{}},
          "\nbut ",
          {"em", [], ["here"], %{}},
          ""
        ], %{}},
        {"p", [], [], %{}},
        {"p", [], ["text"], %{}}
      ], "p"}
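The remark that returning the empty list from process_item_fn removes an item can be sketched in isolation. The hand-written AST, the choice of hr as the tag to drop and the Mix.install call are assumptions for illustration; the accumulator simply counts the removed nodes.

```elixir
# Assumption: Earmark is fetched on the fly so this runs as a plain .exs script.
Mix.install([{:earmark, "~> 1.4"}])

ast = [
  {"p", [], ["keep"], %{}},
  {"hr", [], [], %{}},
  {"p", [], ["also keep"], %{}}
]

drop_hr = fn
  # Returning [] for the processed item removes it from the AST.
  {"hr", _atts, _content, _meta}, count -> {[], count + 1}
  # Everything else (quadruples and strings) passes through unchanged.
  item, count -> {item, count}
end

# process_list_fn is optional, so only three arguments are given here.
{new_ast, removed} = Earmark.Restructure.walk_and_modify_ast(ast, 0, drop_hr)
IO.inspect({new_ast, removed})
```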

Earmark.Restructure.split_by_regex/3

Utility for creating a restructuring that parses text by splitting it into parts "of interest" vs. "other parts" using a regular expression. Returns a list of parts where the parts matching regex have been processed by invoking map_captures_fn on each part, and a list of remaining parts, preserving the order of parts from what it was in the plain text item.

    iex(2)> input = "This is ::all caps::, right?"
    ...(2)> split_by_regex(input, ~r/::(.*?)::/, fn [_, inner|_] -> String.upcase(inner) end)
    ["This is ", "ALL CAPS", ", right?"]

Contributing

Pull Requests are happily accepted.

Please be aware of one caveat when correcting/improving README.md.

The README.md is generated by Extractly as mentioned above and therefore contributors shall not modify it directly, but README.md.eex and the imported docs instead.

You need to run mix xtra after getting the dependencies to generate the README.md file.

Thank you all who have already helped with Earmark, your names are duly noted in RELEASE.md.

Author

LICENSE

Same as Elixir, which is Apache License v2.0. Please refer to LICENSE for details.
