Welcome to leakR, an R package designed to help researchers, data scientists, and machine learning practitioners rigorously detect and diagnose data leakage in their workflows.

Data leakage is a pervasive yet often overlooked issue that undermines the integrity and reproducibility of predictive models by allowing unintended information to “leak” between training and testing phases. leakR provides a modular, extensible toolkit for detecting the most common and impactful forms of leakage, starting with tabular data contamination, target leakage, and temporal misalignments, while laying the foundation for a universal leakage detection framework across diverse data domains.
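
To make the three leakage types named above concrete, here is a minimal base-R sketch on synthetic data. It is an illustration only and does not reflect how leakR implements its detectors: the toy data frame, its column names (`id`, `day`, `x`, `y`, `leaky`), and the 0.95 correlation threshold are all assumptions chosen for the example.

```r
# Illustrative base-R checks on toy data; NOT leakR's implementation.
set.seed(42)

# Toy data: a binary target `y`, an honest feature `x`, and a feature
# `leaky` that is derived from the target (a deliberate target leak).
n <- 100
toy <- data.frame(
  id  = seq_len(n),
  day = as.Date("2024-01-01") + seq_len(n) - 1,
  x   = rnorm(n),
  y   = rbinom(n, 1, 0.5)
)
toy$leaky <- toy$y + rnorm(n, sd = 0.01)

# Chronological split, then copy 5 training rows into the test set
# to simulate contamination.
train <- toy[1:70, ]
test  <- rbind(toy[71:100, ], toy[1:5, ])

# 1. Tabular contamination: identical rows present in both sets.
n_overlap <- sum(duplicated(rbind(train, test)))

# 2. Target leakage: features almost perfectly correlated with the target.
#    The 0.95 cutoff is an arbitrary value chosen for this example.
features   <- c("x", "leaky")
target_cor <- sapply(train[features], function(col) abs(cor(col, train$y)))
suspects   <- names(target_cor)[target_cor > 0.95]

# 3. Temporal misalignment: test rows dated on or before the last
#    training day.
n_temporal <- sum(test$day <= max(train$day))

print(n_overlap)
print(suspects)
print(n_temporal)
```

Simple checks like these only catch the most blatant cases, such as exact duplicate rows or near-perfect linear correlation with the target.
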

    install.packages("leakr")

For the latest features and bug fixes:

    # Install devtools if you don't have it
    install.packages("devtools")

    # Install leakR from GitHub
    devtools::install_github("cherylisabella/leakR")

    library(leakr)

    # Basic audit of your dataset
    report <- leakr_audit(iris, target = "Species")

    # View summary of issues found
    leakr_summarise(report)

    # Generate diagnostic visualizations
    leakr_plot(report)

    # Access detailed results
    print(report)

| Function | Purpose |
|---|---|
| leakr_audit() | Main auditing function - detects leakage across your dataset |
| leakr_summarise() | Generate human-readable summaries of detected issues |
| leakr_plot() | Create diagnostic visualizations highlighting problems |
| leakr_from_caret() | Import and audit caret workflow objects |
| leakr_from_tidymodels() | Import and audit tidymodels workflow objects |
| leakr_from_mlr3() | Import and audit mlr3 workflow objects |

Get started with the comprehensive vignettes:

    # Getting started guide
    vignette("getting-started", package = "leakr")

    # Advanced detection techniques
    vignette("advanced-detection", package = "leakr")

    # Framework integration examples
    vignette("framework-integration", package = "leakr")

If you use leakR in your research, please cite:

    @Manual{leakr2025,
      title = {leakR: Data Leakage Detection Tools for Machine Learning},
      author = {Cheryl Isabella Lim},
      year = {2025},
      note = {R package version 0.1.0},
      url = {https://github.com/cherylisabella/leakR},
    }

This project is licensed under the MIT License; see the LICENSE file for details.

leakR is currently under development. Feedback and contributions are welcome from the community!