Managing research projects and data analyses can be challenging when dealing with:
The `org` package solves these problems by providing a standardized framework for organizing R projects with clear separation of concerns and consistent structure across all your analyses.
Here’s how to get started with your first `org` project:
```r
library(org)

# 1. Initialize your project structure
org::initialize_project(
  env = .GlobalEnv,
  home = "my_analysis",
  results = "my_results"
)

# 2. Access project paths
org::project$home           # Your code location
org::project$results_today  # Today's results folder

# 3. Use org functions in your analysis
org::path("data", "file.csv")  # Cross-platform paths
org::ls_files("R")             # List R files
```

The concept behind `org` is straightforward. Most analyses have three main sections:
Each section has unique requirements:
`org::initialize_project`

This is the main function that sets up your project structure. It takes two or more arguments and saves folder locations in `org::project` for use throughout your analysis:
- `home`: Location of `Run.R` and the `R/` folder (accessible via `org::project$home`)
- `results`: Results folder that creates date-based subfolders (accessible via `org::project$results_today`)
- `...`: Additional folders as needed (e.g., `data_raw`, `data_clean`)

`Run.R`

This is your main analysis script that orchestrates the entire workflow:
All code sections should be encapsulated in functions in the `R/` folder. You should not have multiple main files, as this creates confusion when returning to your code later. However, you can have versioned files (e.g., `Run_v01.R`, `Run_v02.R`) where later versions supersede earlier ones.
`R/` directory

All analysis functions should be defined in `org::project$home/R`. The `initialize_project` function automatically sources all R scripts in this directory.
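What initialization does with `home` and `results` can be approximated in base R. The following is a minimal sketch, not the package's actual implementation: `sketch_initialize()` and the demo folder names are hypothetical, and it assumes that the dated subfolder is named with the ISO date and that every `.R` file under `R/` is sourced into the supplied environment.

```r
# Hypothetical stand-in for the behaviour described above; not org's real code.
sketch_initialize <- function(home, results, env = new.env()) {
  # Dated results folder (the YYYY-MM-DD naming is an assumption)
  results_today <- file.path(results, format(Sys.Date(), "%Y-%m-%d"))
  dir.create(results_today, recursive = TRUE, showWarnings = FALSE)

  # Source every .R script found in home/R into `env`
  scripts <- list.files(
    file.path(home, "R"),
    pattern = "\\.[Rr]$",
    full.names = TRUE
  )
  for (script in scripts) {
    sys.source(script, envir = env)
  }

  list(home = home, results = results, results_today = results_today, env = env)
}

# Usage with throwaway folders so nothing real is touched
demo_home <- file.path(tempdir(), "sketch_home")
demo_results <- file.path(tempdir(), "sketch_results")
dir.create(file.path(demo_home, "R"), recursive = TRUE, showWarnings = FALSE)
writeLines(
  "clean_data <- function() data.frame(x = 1:3)",
  file.path(demo_home, "R", "clean_data.R")
)

demo_project <- sketch_initialize(demo_home, demo_results)
d <- demo_project$env$clean_data()
```

The sketch sources into a fresh environment by default so that running it leaves the workspace untouched; passing `env = .GlobalEnv` would make the sourced functions callable by name.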
Here’s a complete example of how to structure your project:
```r
# Initialize the project
org::initialize_project(
  env = .GlobalEnv,
  home = "/git/analyses/2019/analysis3/",
  results = "/dropbox/analyses_results/2019/analysis3/",
  data_raw = "/data/analyses/2019/analysis3/"
)

# Document changes in archived results
txt <- glue::glue("
  2019-01-01:
    Included:
    - Table 1
    - Table 2

  2019-02-02:
    Changed Table 1 from mean -> median
", .trim = FALSE)

org::write_text(
  txt = txt,
  file = fs::path(org::project$results, "info.txt")
)

# Load required packages
library(data.table)
library(ggplot2)

# Run analysis
d <- clean_data()  # Accesses data from org::project$data_raw
table_1(d)         # Saves to org::project$results_today
figure_1(d)        # Saves to org::project$results_today
figure_2(d)        # Saves to org::project$results_today
```

When writing research articles, you often need multiple versions (initial submission, resubmissions). `org` helps manage this by using date-based versioning:
- `Run.R` to `Run_YYYY_MM_DD_submission_1.R`
- `R/` to `R_YYYY_MM_DD_submission_1/`

This preserves the code that produced results for each submission, ensuring all changes are deliberate and intentional.
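One way to carry out this step is sketched below in base R. It assumes the archive is made by copying, so that `Run.R` and `R/` stay in place for further work. `archive_submission()` and its `label` and `date` arguments are hypothetical and not part of `org`.

```r
# Hypothetical helper; not part of the org package.
archive_submission <- function(home, label = "submission_1", date = Sys.Date()) {
  stamp <- format(date, "%Y_%m_%d")

  run_copy <- file.path(home, sprintf("Run_%s_%s.R", stamp, label))
  r_copy <- file.path(home, sprintf("R_%s_%s", stamp, label))

  # Copy the main script under its dated name (never overwrite an archive)
  file.copy(file.path(home, "Run.R"), run_copy, overwrite = FALSE)

  # Copy each .R script in R/ into the dated folder
  dir.create(r_copy, showWarnings = FALSE)
  scripts <- list.files(
    file.path(home, "R"),
    pattern = "\\.[Rr]$",
    full.names = TRUE
  )
  file.copy(scripts, r_copy, overwrite = FALSE)

  c(run = run_copy, r_dir = r_copy)
}

# Usage on a throwaway project
demo_home <- file.path(tempdir(), "archive_demo")
dir.create(file.path(demo_home, "R"), recursive = TRUE, showWarnings = FALSE)
writeLines("# main script", file.path(demo_home, "Run.R"))
writeLines("table_1 <- function(d) d", file.path(demo_home, "R", "table_1.R"))

archived <- archive_submission(demo_home, date = as.Date("2019-02-02"))
```

Using `overwrite = FALSE` means an existing archive is left untouched if the helper is run twice on the same day, which fits the goal of keeping each submission's code fixed.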
When working with team members who have different folder structures, you can specify multiple possible paths. The `org` package will automatically select the first path that exists:
```r
# Team member setup - org will use the first existing path
org::initialize_project(
  env = .GlobalEnv,
  home = c(
    "/Users/teammate1/projects/analysis3/",  # Mac user
    "/home/teammate2/analysis3/",            # Linux user
    "C:/Users/teammate3/analysis3/"          # Windows user
  ),
  results = c(
    "/Users/teammate1/Dropbox/results/",
    "/home/teammate2/dropbox/results/",
    "C:/Users/teammate3/Dropbox/results/"
  ),
  data_raw = c(
    "/Users/teammate1/data/analysis3/",
    "/home/teammate2/data/analysis3/",
    "C:/shared_drive/data/analysis3/"
  )
)
```

This approach allows the same initialization code to work across different team members’ machines without modification.
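The selection rule itself is simple enough to sketch in base R. The helper below is hypothetical (`first_existing()` is not an `org` function), and raising an error when no candidate exists is an assumption of this sketch rather than documented package behaviour.

```r
# Hypothetical helper; illustrates the selection rule, not org's internals.
first_existing <- function(candidates) {
  found <- candidates[dir.exists(candidates)]
  if (length(found) == 0) {
    # Assumption: fail loudly rather than continue with a missing folder
    stop(
      "None of the candidate folders exist: ",
      paste(candidates, collapse = ", ")
    )
  }
  found[[1]]
}

# Usage: the first candidate is missing, so the second one is chosen
here <- file.path(tempdir(), "teammate_demo")
dir.create(here, showWarnings = FALSE)

chosen <- first_existing(c(
  file.path(tempdir(), "does_not_exist_here"),
  here,
  tempdir()
))
```

Because candidates are checked in order, the ordering only matters on a machine where more than one of the listed folders exists.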
Store your project components in appropriate locations:
```
# Code (GitHub)
git/
└── analyses/
    ├── 2018/
    │   ├── analysis_1/          # org::project$home
    │   │   ├── Run.R
    │   │   └── R/
    │   │       ├── clean_data.R
    │   │       ├── descriptives.R
    │   │       ├── analysis.R
    │   │       └── figure_1.R
    │   └── analysis_2/
    └── 2019/
        └── analysis_3/

# Results (Dropbox)
dropbox/
└── analyses_results/
    ├── 2018/
    │   ├── analysis_1/          # org::project$results
    │   │   ├── 2018-03-12/      # org::project$results_today
    │   │   │   ├── table_1.xlsx
    │   │   │   └── figure_1.png
    │   │   ├── 2018-03-15/
    │   │   └── 2018-03-18/
    │   └── analysis_2/
    └── 2019/
        └── analysis_3/

# Data (Local)
data/
└── analyses/
    ├── 2018/
    │   ├── analysis_1/          # org::project$data_raw
    │   │   └── data.xlsx
    │   └── analysis_2/
    └── 2019/
        └── analysis_3/
```

For projects on a shared network drive without GitHub/Dropbox:
```
project_name/            # org::project$home
├── Run.R
├── R/
│   ├── CleanData.R
│   ├── Descriptives.R
│   ├── Analysis1.R
│   └── Graphs1.R
├── paper/
│   └── paper.Rmd
├── results/             # org::project$results
│   └── 2018-03-12/      # org::project$results_today
│       ├── table1.xlsx
│       └── figure1.png
└── data_raw/            # org::project$data_raw
    └── data.xlsx
```

For projects with limited access:
```
project_name/            # org::project$home
├── Run.R
├── R/
│   ├── clean_data.R
│   ├── descriptives.R
│   ├── analysis.R
│   └── figure_1.R
├── results/             # org::project$results
│   └── 2018-03-12/      # org::project$results_today
│       ├── table_1.xlsx
│       └── figure_1.png
└── data_raw/            # org::project$data_raw
    └── data.xlsx
```

Understanding path components is important:
| Component | Name |
|---|---|
| /home/richard/test.src | Absolute (file) path |
| richard/test.src | Relative (file) path |
| /home/richard/ | Absolute (directory) path |
| ./richard/ | Relative (directory) path |
| richard | Directory |
| test.src | Filename |
A path specifies a location in a directory structure, while a filename includes only the name of the file itself. Likewise, a directory name includes only the name of the directory, not where it is located.
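These components can be pulled apart with base R, independently of `org`. The sketch below uses the example path from the table. `is_absolute()` is a hypothetical helper that only recognises Unix-style paths.

```r
p <- "/home/richard/test.src"

file_name <- basename(p)          # "test.src"
dir_path <- dirname(p)            # "/home/richard"
dir_name <- basename(dirname(p))  # "richard"

# Hypothetical helper: a path is treated as absolute if it starts at the
# root. Windows drive letters such as "C:/" are not handled here.
is_absolute <- function(path) startsWith(path, "/")

abs_check <- is_absolute(p)                   # TRUE
rel_check <- is_absolute("richard/test.src")  # FALSE

# Joining the directory path and the filename gives back the full path
rebuilt <- file.path(dir_path, file_name)
```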
The `org` package provides several key functions for project management:
- `org::initialize_project()`: Set up project structure and source R files
- `org::set_results()`: Modify results folder after project initialization
- `org::project`: Environment containing all project folder locations
- `org::path()`: Construct cross-platform file paths
- `org::ls_files()`: List files with optional pattern matching
- `org::move_directory()`: Move directories safely
- `org::write_text()`: Write text files with consistent formatting
- `org::package_installed()`: Check if packages are installed
- `org::create_project_quarto_internal_results()`: Create Quarto projects with internal results
- `org::create_project_quarto_external_results()`: Create Quarto projects with external results

Recommendation: Always use `.GlobalEnv` as `env`; it makes life so much easier! All your functions will be directly accessible without having to worry about environment scoping issues.
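The scoping issue behind this recommendation can be seen with base R alone. In this sketch the script contents and the name `demo_table` are made up for illustration. A function sourced into a separate environment is not visible by name from the global workspace and has to be reached through that environment.

```r
# Write a throwaway script defining one function
script <- tempfile(fileext = ".R")
writeLines("demo_table <- function(d) nrow(d)", script)

# Source it into a separate environment instead of the global one
private_env <- new.env()
sys.source(script, envir = private_env)

# The function exists in private_env but not in the global environment
in_private <- exists("demo_table", envir = private_env, inherits = FALSE)
in_global <- exists("demo_table", envir = globalenv(), inherits = FALSE)

# It must therefore be called through the environment it was sourced into
n_rows <- private_env$demo_table(data.frame(x = 1:3))
```

Had the script been sourced into `.GlobalEnv`, the same function could be called as plain `demo_table(...)`.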
The `org::path()` function ensures your code works across different operating systems:
```r
# Cross-platform path construction
data_file <- org::path(org::project$data_raw, "survey_data.csv")
output_file <- org::path(org::project$results_today, "analysis_results.xlsx")

# Handles multiple path components
nested_path <- org::path("folder1", "subfolder", "file.txt")

# Removes double slashes automatically
clean_path <- org::path("folder//", "//file.txt")  # Returns "folder/file.txt"
```

- `org::path()` for cross-platform compatibility

```r
help(package = "org")
?org::initialize_project
```