

Getting Started

parsermd

The goal of parsermd is to extract the content of an R Markdown file to allow for programmatic interaction with the document's contents (i.e. code chunks and markdown text). Rather than parsing every detail of the Rmd, the package aims to capture the document's fundamental structure. Specifically, the YAML front matter, markdown text, and R code are read as text lines, allowing them to be processed using other tools.
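As a minimal sketch of this idea, the snippet below writes a small, made-up Rmd to a temporary file, parses it with `parse_rmd()`, and recovers the text with `as_document()`. The file contents are invented for illustration; `tempfile()` and `writeLines()` are base R.

```r
library(parsermd)

# A small, hypothetical Rmd: YAML front matter, a heading, markdown text, and one chunk
lines = c(
  "---",
  "title: Example",
  "---",
  "",
  "# Analysis",
  "",
  "Some markdown text.",
  "",
  "```{r summary}",
  "summary(cars)",
  "```"
)

path = tempfile(fileext = ".Rmd")
writeLines(lines, path)

ast = parse_rmd(path)
print(ast)

# The YAML, markdown, and code come back as plain text lines for use with other tools
doc = as_document(ast)
cat(doc, sep = "\n")
```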

The package supports both traditional chunk options (specified in the chunk header) and YAML-style chunk options (specified as special `#|` comments at the top of a chunk). When both formats are used for the same option, the YAML option takes precedence.
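To illustrate the precedence rule, here is a sketch with an invented chunk that sets `echo` both in its header and as a `#|` comment. Rendering the parsed AST back to text shows which value survives; given the rule above, we expect the YAML value (`false`).

```r
library(parsermd)

# Hypothetical chunk that sets `echo` twice: echo=TRUE in the header,
# and `#| echo: false` as a YAML-style option
lines = c(
  "```{r demo, echo=TRUE}",
  "#| echo: false",
  "1 + 1",
  "```"
)

path = tempfile(fileext = ".Rmd")
writeLines(lines, path)

ast = parse_rmd(path)

# as_document() emits the chunk's parsed options as `#|` lines
doc = as_document(ast)
cat(doc, sep = "\n")
```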

Installation

parsermd can be installed from CRAN with:

```r
install.packages("parsermd")
```

You can install the latest development version of parsermd from GitHub with:

```r
remotes::install_github("rundel/parsermd")
library(parsermd)
```

Parsing Rmds

The following basic example shows the abstract syntax tree (AST) that results from parsing a simple Rmd file:

```r
rmd = parsermd::parse_rmd(system.file("examples/minimal.Rmd", package = "parsermd"))
```

The R Markdown document is parsed and stored in a flat, ordered list object containing tagged elements. By default, the package presents a hierarchical view of the document in which chunks and markdown text are nested within headings; this is what the default print method for `rmd_ast` objects shows.

```r
print(rmd)
#> ├── YAML [4 fields]
#> ├── Heading [h1] - Setup
#> │   └── Chunk [r, 1 line] - setup
#> └── Heading [h1] - Content
#>     ├── Heading [h2] - R Markdown
#>     │   ├── Markdown [5 lines]
#>     │   ├── Chunk [r, 1 line] - cars
#>     │   └── Chunk [r, 1 line] - unnamed-chunk-1
#>     └── Heading [h2] - Including Plots
#>         ├── Markdown [1 line]
#>         ├── Chunk [r, 1 line] - pressure
#>         └── Markdown [2 lines]
```

If you would prefer to see the underlying flat structure, it can be printed by setting `flat = TRUE` in the call to `print()`.

```r
print(rmd, flat = TRUE)
#> ├── YAML [4 fields]
#> ├── Heading [h1] - Setup
#> ├── Chunk [r, 1 line] - setup
#> ├── Heading [h1] - Content
#> ├── Heading [h2] - R Markdown
#> ├── Markdown [5 lines]
#> ├── Chunk [r, 1 line] - cars
#> ├── Chunk [r, 1 line] - unnamed-chunk-1
#> ├── Heading [h2] - Including Plots
#> ├── Markdown [1 line]
#> ├── Chunk [r, 1 line] - pressure
#> └── Markdown [2 lines]
```

To ease manipulation of the AST, the package also supports converting the object into a tidy tibble with `as_tibble()` or `as.data.frame()` (both return a tibble).

```r
as_tibble(rmd)
#> # A tibble: 12 × 5
#>    sec_h1  sec_h2          type         label           ast
#>    <chr>   <chr>           <chr>        <chr>           <list>
#>  1 <NA>    <NA>            rmd_yaml     <NA>            <yaml>
#>  2 Setup   <NA>            rmd_heading  <NA>            <heading [h1]>
#>  3 Setup   <NA>            rmd_chunk    setup           <chunk [r]>
#>  4 Content <NA>            rmd_heading  <NA>            <heading [h1]>
#>  5 Content R Markdown      rmd_heading  <NA>            <heading [h2]>
#>  6 Content R Markdown      rmd_markdown <NA>            <markdown>
#>  7 Content R Markdown      rmd_chunk    cars            <chunk [r]>
#>  8 Content R Markdown      rmd_chunk    unnamed-chunk-1 <chunk [r]>
#>  9 Content Including Plots rmd_heading  <NA>            <heading [h2]>
#> 10 Content Including Plots rmd_markdown <NA>            <markdown>
#> 11 Content Including Plots rmd_chunk    pressure        <chunk [r]>
#> 12 Content Including Plots rmd_markdown <NA>            <markdown>
```
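Because each row of this tibble is one node, ordinary data-frame tools apply. As a small sketch using only base R on the columns shown above, we can count nodes by type and list the chunk labels in document order:

```r
library(parsermd)

rmd = parse_rmd(system.file("examples/minimal.Rmd", package = "parsermd"))
tbl = as_tibble(rmd)

# Number of nodes of each type (rmd_yaml, rmd_heading, rmd_chunk, rmd_markdown)
type_counts = table(tbl$type)
type_counts

# Labels of all code chunks, in document order
chunk_labels = tbl$label[tbl$type == "rmd_chunk"]
chunk_labels
```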

It is also possible to convert these data frames back into an `rmd_ast` with `as_ast()`.

```r
as_ast(as_tibble(rmd))
#> ├── YAML [4 fields]
#> ├── Heading [h1] - Setup
#> │   └── Chunk [r, 1 line] - setup
#> └── Heading [h1] - Content
#>     ├── Heading [h2] - R Markdown
#>     │   ├── Markdown [5 lines]
#>     │   ├── Chunk [r, 1 line] - cars
#>     │   └── Chunk [r, 1 line] - unnamed-chunk-1
#>     └── Heading [h2] - Including Plots
#>         ├── Markdown [1 line]
#>         ├── Chunk [r, 1 line] - pressure
#>         └── Markdown [2 lines]
```
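This round trip means the tibble can be filtered before converting back. A sketch, assuming `as_ast()` accepts any row subset of the tibble: drop every node in the "Including Plots" subsection and rebuild an AST from what remains.

```r
library(parsermd)

rmd = parse_rmd(system.file("examples/minimal.Rmd", package = "parsermd"))
tbl = as_tibble(rmd)

# sec_h2 is NA for nodes outside any h2 section, so treat NA as "keep"
keep = is.na(tbl$sec_h2) | tbl$sec_h2 != "Including Plots"
trimmed = as_ast(tbl[keep, ])
trimmed

doc = as_document(trimmed)
```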

Finally, we can also convert the `rmd_ast` back into an R Markdown document via `as_document()`.

```r
cat(as_document(rmd), sep = "\n")
#> ---
#> title: Minimal
#> author: Colin Rundel
#> date: 7/21/2020
#> output: html_document
#> ---
#> 
#> # Setup
#> 
#> ```{r setup}
#> #| include: false
#> knitr::opts_chunk$set(echo = TRUE)
#> ```
#> 
#> # Content
#> 
#> ## R Markdown
#> 
#> This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML,
#> PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.
#> 
#> When you click the **Knit** button a document will be generated that includes both content as well
#> as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
#> 
#> 
#> ```{r cars}
#> summary(cars)
#> ```
#> 
#> ```{r unnamed-chunk-1}
#> knitr::knit_patterns$get()
#> ```
#> 
#> ## Including Plots
#> 
#> You can also embed plots, for example:
#> 
#> 
#> ```{r pressure}
#> #| echo: false
#> plot(pressure)
#> ```
#> 
#> Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code
#> that generated the plot.
```
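Since `as_document()` returns a plain character vector of lines, base R's `writeLines()` is enough to save the result to disk. The sketch below writes the document to a temporary file and parses it again; we would expect the re-parsed AST to have the same nodes as the original.

```r
library(parsermd)

rmd = parse_rmd(system.file("examples/minimal.Rmd", package = "parsermd"))

# Write the regenerated Rmd to a temporary file
out = tempfile(fileext = ".Rmd")
writeLines(as_document(rmd), out)

# Parse the written file again and compare its structure with the original
reparsed = parse_rmd(out)
reparsed
```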

Working with the AST

Once we have parsed an R Markdown document, there are a variety of things we can do with the resulting AST. Below we demonstrate some of the basic functionality within parsermd for manipulating and editing these objects, as well as checking their properties.

```r
rmd = parse_rmd(system.file("examples/hw01-student.Rmd", package = "parsermd"))
rmd
#> ├── YAML [2 fields]
#> ├── Heading [h3] - Load packages
#> │   └── Chunk [r, 2 lines] - load-packages
#> ├── Heading [h3] - Exercise 1
#> │   ├── Markdown [1 line]
#> │   └── Heading [h4] - Solution
#> │       └── Markdown [4 lines]
#> ├── Heading [h3] - Exercise 2
#> │   ├── Markdown [1 line]
#> │   └── Heading [h4] - Solution
#> │       ├── Markdown [1 line]
#> │       ├── Chunk [r, 5 lines] - plot-dino
#> │       ├── Markdown [1 line]
#> │       └── Chunk [r, 2 lines] - cor-dino
#> └── Heading [h3] - Exercise 3
#>     ├── Markdown [1 line]
#>     └── Heading [h4] - Solution
#>         ├── Chunk [r, 5 lines] - plot-star
#>         └── Chunk [r, 2 lines] - cor-star
```

Say we are interested in examining the solution a student entered for Exercise 1. We can access it using the `rmd_select()` function and its selection helpers, specifically `by_section()`.

```r
rmd_select(rmd, by_section(c("Exercise 1", "Solution")))
#> ├── YAML [2 fields]
#> └── Heading [h3] - Exercise 1
#>     └── Heading [h4] - Solution
#>         └── Markdown [4 lines]
```
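The same selector can be applied to every exercise in turn. As a sketch, assuming the exercises are headed "Exercise 1" to "Exercise 3" as in the tree above, we collect each solution section as text in a named list:

```r
library(parsermd)

rmd = parse_rmd(system.file("examples/hw01-student.Rmd", package = "parsermd"))

# One element per exercise, each holding that exercise's Solution section as text lines
exercises = paste("Exercise", 1:3)
solutions = lapply(exercises, function(ex) {
  rmd_select(rmd, by_section(c(ex, "Solution"))) |> as_document()
})
names(solutions) = exercises

cat(solutions[["Exercise 2"]], sep = "\n")
```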

To view the content instead of the AST, we can use the `as_document()` function.

```r
rmd_select(rmd, by_section(c("Exercise 1", "Solution"))) |>
  as_document()
#>  [1] "---"
#>  [2] "title: Lab 01 - Hello R"
#>  [3] "output: html_document"
#>  [4] "---"
#>  [5] ""
#>  [6] "### Exercise 1"
#>  [7] ""
#>  [8] "#### Solution"
#>  [9] ""
#> [10] "2 columns, 13 rows, 3 variables: "
#> [11] "dataset: indicates which dataset the data are from "
#> [12] "x: x-values "
#> [13] "y: y-values "
#> [14] ""
#> [15] ""
```

Note that this gives us the `Exercise 1` and `Solution` headings as well as the contained markdown text. If we only want the markdown text, we can refine our selector to include only nodes of type `rmd_markdown` via the `has_type()` helper.

```r
rmd_select(rmd, by_section(c("Exercise 1", "Solution")) & has_type("rmd_markdown")) |>
  as_document()
#>  [1] "---"
#>  [2] "title: Lab 01 - Hello R"
#>  [3] "output: html_document"
#>  [4] "---"
#>  [5] ""
#>  [6] "2 columns, 13 rows, 3 variables: "
#>  [7] "dataset: indicates which dataset the data are from "
#>  [8] "x: x-values "
#>  [9] "y: y-values "
#> [10] ""
#> [11] ""
```

This approach uses the tidyselect `&` operator within the selection to find the intersection of the selectors `by_section(c("Exercise 1", "Solution"))` and `has_type("rmd_markdown")`. Alternatively, the same result can be achieved by chaining multiple `rmd_select()` calls together:

```r
rmd_select(rmd, by_section(c("Exercise 1", "Solution"))) |>
  rmd_select(has_type("rmd_markdown")) |>
  as_document()
#>  [1] "---"
#>  [2] "title: Lab 01 - Hello R"
#>  [3] "output: html_document"
#>  [4] "---"
#>  [5] ""
#>  [6] "2 columns, 13 rows, 3 variables: "
#>  [7] "dataset: indicates which dataset the data are from "
#>  [8] "x: x-values "
#>  [9] "y: y-values "
#> [10] ""
#> [11] ""
```
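Since the selection is evaluated with tidyselect, the `|` operator should give the union of two selectors in the same way `&` gives their intersection. A sketch, assuming `rmd_select()` honours `|` just as it honours `&`, that keeps two chunks by their exact labels:

```r
library(parsermd)

rmd = parse_rmd(system.file("examples/hw01-student.Rmd", package = "parsermd"))

# Union of two label selectors (assumes tidyselect's `|` is supported like `&`)
selected = rmd_select(rmd, has_label("load-packages") | has_label("cor-dino"))
selected

doc = as_document(selected)
```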

Wildcards

One useful feature of the `by_section()` and `has_label()` selection helpers is that they support glob-style pattern matching. For example, we can extract all of the solutions from our document with:

```r
rmd_select(rmd, by_section(c("Exercise *", "Solution")))
#> ├── YAML [2 fields]
#> ├── Heading [h3] - Exercise 1
#> │   └── Heading [h4] - Solution
#> │       └── Markdown [4 lines]
#> ├── Heading [h3] - Exercise 2
#> │   └── Heading [h4] - Solution
#> │       ├── Markdown [1 line]
#> │       ├── Chunk [r, 5 lines] - plot-dino
#> │       ├── Markdown [1 line]
#> │       └── Chunk [r, 2 lines] - cor-dino
#> └── Heading [h3] - Exercise 3
#>     └── Heading [h4] - Solution
#>         ├── Chunk [r, 5 lines] - plot-star
#>         └── Chunk [r, 2 lines] - cor-star
```
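In a glob, `*` matches any run of characters and `?` matches a single character. Base R's `utils::glob2rx()` translates a glob into the equivalent regular expression, which is a convenient way to check what a pattern will match. This only illustrates glob semantics; it is not a claim about how parsermd implements matching internally.

```r
# Translate the glob used above into a regular expression
pattern = utils::glob2rx("Exercise *")
pattern

# "Exercises" does not match: the glob requires a space after "Exercise"
headings = c("Exercise 1", "Exercise 12", "Solution", "Exercises")
matches = grepl(pattern, headings)
matches
```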

Similarly, if we want to extract just the chunks that involve plotting, we can match chunk labels with a "plot" prefix:

```r
rmd_select(rmd, has_label("plot*"))
#> ├── YAML [2 fields]
#> ├── Chunk [r, 5 lines] - plot-dino
#> └── Chunk [r, 5 lines] - plot-star
```
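Glob helpers combine with the other selectors in the usual way. As a sketch, intersecting a section selector with a label glob keeps only the plotting chunk from Exercise 3's solution:

```r
library(parsermd)

rmd = parse_rmd(system.file("examples/hw01-student.Rmd", package = "parsermd"))

# Plotting chunks inside Exercise 3's Solution only
star_plots = rmd_select(rmd, by_section(c("Exercise 3", "Solution")) & has_label("plot*"))
star_plots

# The selected chunk (plus YAML) as text lines
doc = as_document(star_plots)
cat(doc, sep = "\n")
```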
