rrtools: Tools for Writing Reproducible Research in R


benmarwick/rrtools


R-CMD-check | Launch RStudio Binder

Motivation

The goal of rrtools is to provide instructions, templates, and functions for making a basic compendium suitable for writing a reproducible journal article or report with R. This package documents the key steps and provides convenient functions for quickly creating a new research compendium. The approach is based on the work of Marwick (2017) and Marwick et al. (2018), which uses the R package structure as the basis for a research compendium.

rrtools provides a template for doing scholarly writing in a literate programming environment using Quarto, an open-source scientific and technical publishing system. It also allows for isolation of your computational environment using Docker, package versioning using renv, and continuous integration using GitHub Actions. It makes a convenient starting point for writing a journal article or report.

The functions in rrtools allow you to use R to easily follow the best practices outlined in several major scholarly publications on reproducible research. In addition to those cited above, Wilson et al. (2017), Piccolo & Frampton (2016), Stodden & Miguez (2014), and rOpenSci (2017) are important sources that have influenced our approach to this package.

Installation

To explore and test rrtools without installing anything, click the Binder badge above to start RStudio in a browser tab that includes the contents of this GitHub repository. In that environment you can browse the files, install rrtools, and make a test compendium without altering anything on your computer.

You can install rrtools from GitHub with these lines of R code (Windows users are recommended to install a separate program, Rtools, before proceeding with this step):

if (!require("devtools")) install.packages("devtools")
devtools::install_github("benmarwick/rrtools")

How to use

To create a reproducible research compendium step-by-step using the rrtools approach, follow these detailed instructions. We use RStudio and recommend it, but it is not required for these steps to work. We recommend copy-pasting these directly into your console, and editing the options before running. We don’t recommend saving these lines in a script in your project: they are meant to be once-off setup functions.

0. Create a Git-managed directory linked to an online repository

  • It is possible to use rrtools without Git, but usually we want our research compendium to be managed by the version control software Git. The free online book Happy Git With R has details on how to do this. In brief, there are two methods to get started:
    • New project on GitHub first, then download to RStudio: Start on GitHub, GitLab, or a similar web service, and create an empty repository called pkgname (you should use a different name; please follow the rules below) on that service. Then clone that repository to have a local empty directory on your computer, called pkgname, that is linked to this remote repository. Please see our wiki for a step-by-step walk-through of this method, illustrated with screenshots.
    • New project in RStudio first, then connect to GitHub/GitLab: An alternative approach is to create a local, empty directory called pkgname on your computer (e.g. in your Desktop or Downloads folder), and initialize it with Git (git init), then create a GitHub/GitLab repository and connect your local project to the remote repository.
  • Whichever of those two methods you choose, you continue by staging, committing, and pushing every future change in the repository with Git.
  • Your pkgname must follow some rules for everything to work. It must:
    • … contain only ASCII letters, numbers, and ‘.’
    • … have at least two characters
    • … start with a letter (not a number)
    • … not end with ‘.’
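The naming rules above can be checked before creating any directory or repository. The following is a minimal sketch in base R; `is_valid_pkgname()` is a hypothetical helper written for illustration and is not part of rrtools.

```r
# Hypothetical helper (not part of rrtools): checks candidate compendium
# names against the four rules listed above.
is_valid_pkgname <- function(name) {
  # ^[A-Za-z]        starts with an ASCII letter
  # [A-Za-z0-9.]*    then only ASCII letters, digits, or '.'
  # [A-Za-z0-9]$     ends with a letter or digit (so not with '.'),
  #                  which also forces a length of at least two
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", name, perl = TRUE)
}

is_valid_pkgname(c("mypaper2024", "my.paper", "2fast", "my_paper", "paper.", "a"))
```

Only the first two names pass: the others start with a digit, contain an underscore, end with a dot, or are too short.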

1. rrtools::use_compendium("pkgname")

  • if you started with a new project on GitHub first, run rrtools::use_compendium(); if you started with a new project in RStudio first, run rrtools::use_compendium("pkgname")
  • this uses usethis::create_package() to create a basic R package in the pkgname directory, and then, if you’re using RStudio, opens the project. If you’re not using RStudio, it sets the working directory to the pkgname directory.
  • we need to:
    • edit the DESCRIPTION file (located in your pkgname directory) to include accurate metadata, e.g. your ORCID and email address
    • periodically update the Imports: section of the DESCRIPTION file with the names of packages used in the code we write in the qmd document(s) by running rrtools::add_dependencies_to_description()
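To see what the Imports: field currently declares, the DESCRIPTION file can be read with base R. This is only an illustrative sketch: `declared_imports()` is a hypothetical helper, the temporary DESCRIPTION file and package names are made up for the example, and rrtools::add_dependencies_to_description() remains the intended way to update the field.

```r
# Hypothetical helper (not part of rrtools): returns the package names
# listed in the Imports: field of a DESCRIPTION file.
declared_imports <- function(description = "DESCRIPTION") {
  imports <- read.dcf(description, fields = "Imports")[1, 1]
  if (is.na(imports)) return(character(0))
  pkgs <- trimws(strsplit(imports, ",")[[1]])
  # Drop version constraints such as "dplyr (>= 1.0.0)"
  pkgs <- sub("\\s*\\(.*\\)$", "", pkgs)
  pkgs[nzchar(pkgs)]
}

# Made-up DESCRIPTION, written to a temporary file for demonstration
desc <- tempfile()
writeLines(c("Package: pkgname", "Imports: dplyr (>= 1.0.0), ggplot2"), desc)

# Packages used in the analysis but not yet declared
setdiff(c("dplyr", "ggplot2", "here"), declared_imports(desc))
```

In this made-up example the comparison returns "here", the one package that is used but not declared.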

2. usethis::use_mit_license(copyright_holder = "My Name")

  • this adds a reference to the MIT license in the DESCRIPTION file and generates a LICENSE file listing the name provided as the copyright holder
  • to use a different license, replace this line with any of the licenses mentioned here: ?usethis::use_mit_license()

3. rrtools::use_readme_qmd()

  • this generates README.qmd and renders it to README.md, ready to display on GitHub. It contains:
    • a template citation to show others how to cite your project. Edit this to include the correct title and DOI.
    • license information for the text, figures, code and data in your compendium
  • this also adds two other markdown files: a code of conduct for users (CONDUCT.md), and basic instructions for people who want to contribute to your project (CONTRIBUTING.md), including for first-timers to Git and GitHub.
  • this adds a .binder/Dockerfile that makes Binder work, if your compendium is hosted online. Currently configured for GitHub, but easily adapted for elsewhere (e.g. Zenodo, Figshare, Dataverse, etc.)
  • render this document after each change to refresh README.md, which is the file that GitHub displays on the repository home page

4. rrtools::use_analysis()

  • this function has three location = options: top_level to create a top-level analysis/ directory, inst to create an inst/ directory (so that all the sub-directories are available after the package is installed), and vignettes to create a vignettes/ directory (and automatically update the DESCRIPTION). The default is a top-level analysis/.
  • for each option, the contents of the sub-directories are the same, with the following (using the default analysis/ for example):
analysis/
|
├── paper/
│   ├── paper.qmd       # this is the main document to edit
│   └── references.bib  # this contains the reference list information
├── figures/            # location of the figures produced by the qmd
|
├── data/
│   ├── raw_data/       # data obtained from elsewhere
│   └── derived_data/   # data generated during the analysis
|
└── templates
    ├── journal-of-archaeological-science.csl
    |                   # this sets the style of citations & reference list
    ├── template.docx   # used to style the output of the paper.qmd
    └── template.Rmd
  • the paper.qmd is ready to write in and render with Quarto. It includes:
    • a YAML header that identifies the references.bib file and the supplied csl file (to style the reference list)
    • a colophon that adds some git commit details to the end of the document. This means that the output file (HTML/PDF/Word) is always traceable to a specific state of the code.
  • the references.bib file has just one item to demonstrate the format. It is ready to insert more reference details.
  • you can replace the supplied csl file with a different citation style from https://github.com/citation-style-language/
  • we recommend using RStudio 2022.07 or higher to efficiently insert citations from your Zotero library while writing in a qmd file (see here for detailed setup and use information to connect your RStudio to your Zotero)
  • remember that the Imports: field in the DESCRIPTION file must include the names of all packages used in analysis documents (e.g. paper.qmd). We have a helper function rrtools::add_dependencies_to_description() that will scan the qmd file, identify libraries used in there, and add them to the DESCRIPTION file.
  • this function has a data_in_git = argument, which is TRUE by default. If set to FALSE you will exclude files in the data/ directory from being tracked by git and prevent them from appearing on GitHub. You should set data_in_git = FALSE if your data files are large (>100 MB is the limit for GitHub) or you do not want to make the data files publicly accessible on GitHub.
    • To load your custom code in the paper.qmd, you have a few options. You can write all your R code in chunks in the qmd; that’s the simplest method. Or you can write R code in script files in /R, and include devtools::load_all(".") at the top of your paper.qmd. Or you can write functions in /R and use library(pkgname) at the top of your paper.qmd, or omit library and preface each function call with pkgname::. It is up to you to choose whichever seems most natural.
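When deciding on data_in_git, it can help to check whether any data file is near the size limit mentioned above. The sketch below uses only base R; `find_large_files()` is a hypothetical helper, and both the default path analysis/data and the choice to count 1 MB as 1024^2 bytes are assumptions made for illustration.

```r
# Hypothetical helper (not part of rrtools): lists files under a data
# directory that exceed a size limit, 100 MB by default.
find_large_files <- function(data_dir = "analysis/data", limit_mb = 100) {
  files <- list.files(data_dir, recursive = TRUE, full.names = TRUE)
  sizes_mb <- file.size(files) / 1024^2   # assumes 1 MB = 1024^2 bytes
  too_big <- !is.na(sizes_mb) & sizes_mb > limit_mb
  data.frame(
    file = files[too_big],
    size_mb = round(sizes_mb[too_big], 1),
    stringsAsFactors = FALSE
  )
}

# Returns a zero-row data frame if the directory is missing or no file is too big
large <- find_large_files()
if (nrow(large) > 0) print(large)
```

If this reports any files, rrtools::use_analysis(data_in_git = FALSE) is probably the safer choice.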

5. rrtools::use_dockerfile()

  • this creates a basic Dockerfile using rocker/verse as the base image
  • this also creates a minimal .yml configuration file to activate continuous integration using GitHub Actions. This will attempt to render your qmd document, in a Docker container specified by your Dockerfile, each time you push to GitHub. You can view the results of each attempt at the 'actions' page for your compendium on github.com, e.g. https://github.com/benmarwick/rrtools/actions
  • the version of R in your rocker container will match the version used when you run this function (e.g., rocker/verse:3.5.0)
  • rocker/verse includes R, the tidyverse, RStudio, pandoc and LaTeX, so compendium build times are very fast
  • we need to:
    • edit the Dockerfile to add Linux dependencies (for R packages that require additional libraries outside of R). You can find out what these are by browsing the DESCRIPTION files of the other packages you’re using, and looking in the SystemRequirements field for each package. If you are getting build errors on GitHub Actions, check the logs. Often, the error messages will include the names of missing libraries.
    • modify which qmd files are rendered when the container is made
    • have a public GitHub repo to use the Dockerfile that this function generates. It is possible to keep the repository private and run a local Docker container with minor modifications to the Dockerfile that this function generates.
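Rather than opening each DESCRIPTION file by hand, the SystemRequirements field of installed packages can be read from R. This is a rough sketch using utils::packageDescription(); `system_requirements()` is a hypothetical helper, not an rrtools function, and it only sees packages installed in your current library.

```r
# Hypothetical helper (not part of rrtools): reports the SystemRequirements
# field of installed packages, or NA where none is declared or the package
# is not installed.
system_requirements <- function(pkgs) {
  vapply(pkgs, function(pkg) {
    req <- suppressWarnings(
      utils::packageDescription(pkg, fields = "SystemRequirements")
    )
    if (is.null(req) || all(is.na(req))) NA_character_ else as.character(req)
  }, character(1))
}

# Example call; replace these with the packages your compendium imports
system_requirements(c("stats", "utils"))
```

The result is a named character vector. Any non-missing entry describes, in free text, external software that may need a matching install line in the Dockerfile.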

6. renv::init()

  • this initiates tracking of the packages you use in your project using renv. renv will discover the R packages used in your project, and install those packages into a private project library
  • We can use renv::snapshot() to save the state of our project library from time to time, or at the end when we are ready to share. The project state will be saved into a file called renv.lock.
  • Our collaborators can run renv::restore() to install exactly those packages into their own library.
  • Don't skip this step, because our Binder and Dockerfile use the renv.lock file to install the packages they need to run your code. So renv is an important component of making a compendium reproducible.
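Because Binder and the Dockerfile depend on renv.lock, a quick check that the file exists can catch a skipped step before pushing. A minimal sketch, assuming the lockfile sits at the top level of the compendium; `has_lockfile()` is a hypothetical helper, not part of rrtools or renv.

```r
# Hypothetical helper (not part of rrtools or renv): reports whether a
# compendium directory already contains an renv lockfile.
has_lockfile <- function(project = ".") {
  file.exists(file.path(project, "renv.lock"))
}

if (!has_lockfile()) {
  message("No renv.lock found: run renv::init(), then renv::snapshot().")
}
```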

You should be able to follow these steps to get a new research compendium repository ready to write in just a few minutes.

References and related reading

Kitzes, J., Turek, D., & Deniz, F. (Eds.). (2017). The Practice of Reproducible Research: Case Studies and Lessons from the Data-Intensive Sciences. Oakland, CA: University of California Press.

Marwick, B. (2017). Computational reproducibility in archaeological research: Basic principles and a case study of their implementation. Journal of Archaeological Method and Theory, 24(2), 424-450. https://doi.org/10.1007/s10816-015-9272-9

Marwick, B., Boettiger, C., & Mullen, L. (2018). Packaging data analytical work reproducibly using R (and friends). The American Statistician 72(1), 80-88. https://doi.org/10.1080/00031305.2017.1375986

Piccolo, S. R. and M. B. Frampton (2016). “Tools and techniques for computational reproducibility.” GigaScience 5(1): 30. https://gigascience.biomedcentral.com/articles/10.1186/s13742-016-0135-4

rOpenSci community (2017b). rrrpkg: Use of an R package to facilitate reproducible research. Online at https://github.com/ropensci/rrrpkg

Schmidt, S.C. and Marwick, B., 2020. Tool-Driven Revolutions in Archaeological Science. Journal of Computer Applications in Archaeology, 3(1), pp.18–32. DOI: http://doi.org/10.5334/jcaa.29

Stodden, V. & Miguez, S., (2014). Best Practices for Computational Science: Software Infrastructure and Environments for Reproducible and Extensible Research. Journal of Open Research Software. 2(1), p.e21. DOI: http://doi.org/10.5334/jors.ay

Wilson G, Bryan J, Cranston K, Kitzes J, Nederbragt L, et al. (2017). Good enough practices in scientific computing. PLOS Computational Biology 13(6): e1005510. https://doi.org/10.1371/journal.pcbi.1005510

Contributing

If you would like to contribute to this project, please start by reading our Guide to Contributing. Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

Acknowledgements

This project was developed during the 2017 Summer School on Reproducible Research in Landscape Archaeology at the Freie Universität Berlin (17-21 July), funded and jointly organized by Exc264 Topoi, CRC1266, and ISAAKiel. Special thanks to Sophie C. Schmidt for help. The convenience functions in this package are inspired by similar functions in the usethis package.
