data-cleaning/validate
The validate R-package makes it super-easy to check whether data lives up to expectations you have based on domain knowledge. It works by allowing you to define data validation rules independent of the code or data set. Next you can confront a dataset, or various versions thereof, with the rules. Results can be summarized, plotted, and so on. Below is a simple example.
    > library(validate)
    > check_that(iris, Sepal.Width < 0.5*Sepal.Length) |> summary()
      rule items passes fails nNA error warning                     expression
    1   V1   150     79    71   0 FALSE   FALSE Sepal.Width < 0.5*Sepal.Length
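The separation between rules and data described above can be sketched as follows. This is a minimal illustration, not taken from the package documentation: the rule names `width_vs_length` and `positive_petal`, and the second rule itself, are made up for the example.

```r
library(validate)

# Define the rules on their own, without reference to any data set.
# The rule names and the second rule are illustrative assumptions.
rules <- validator(
  width_vs_length = Sepal.Width < 0.5 * Sepal.Length,
  positive_petal  = Petal.Length > 0
)

# Confront a data set with the rules. The same rule set can be reused
# on another version of the data by calling confront() again.
out <- confront(iris, rules)

# One row per rule, with counts of passes, fails and missing results.
res <- summary(out)
print(res)
```

Because the rules live in their own object, they can be stored and applied to later versions of a data set without touching the checking code.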
With validate, data validation rules are treated as first-class citizens. This means you can import, export, annotate, investigate and manipulate data validation rules in a meaningful way.
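As a rough sketch of what treating rules as objects can look like, the snippet below annotates a rule, inspects the variables it uses, and round-trips the rule set through a data frame. The rule name and label text are invented for the example, and the data-frame round trip is only one of several possible export routes.

```r
library(validate)

# A single named rule; the name is an illustrative assumption.
rules <- validator(
  width_vs_length = Sepal.Width < 0.5 * Sepal.Length
)

# Annotate: attach a short human-readable label to the rule.
label(rules) <- "Sepal width is less than half the sepal length"

# Investigate: which variables does the rule set refer to?
vars <- variables(rules)

# Export the rules and their metadata to a data frame ...
rule_df <- as.data.frame(rules)

# ... and import them again into a new rule set.
rules2 <- validator(.data = rule_df)
print(rules2)
```

Storing rules in a data frame makes it possible to keep them in a database or a CSV file next to the data they describe.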
To get started: see our data validation cookbook.
Please cite the JSS article
    @article{van2021data,
      title = {Data validation infrastructure for R},
      author = {van der Loo, Mark PJ and de Jonge, Edwin},
      journal = {Journal of Statistical Software},
      year = {2021},
      volume = {97},
      issue = {10},
      pages = {1-33},
      doi = {10.18637/jss.v097.i10},
      url = {https://www.jstatsoft.org/article/view/v097i10}
    }

To cite the theory, please cite our Wiley StatsRef chapter.
    @article{loo2020data,
      title = {Data Validation},
      year = {2020},
      journal = {Wiley StatsRef: Statistics Reference Online},
      author = {M.P.J. van der Loo and E. de Jonge},
      pages = {1--7},
      doi = {https://doi.org/10.1002/9781118445112.stat08255},
      url = {https://onlinelibrary.wiley.com/doi/10.1002/9781118445112.stat08255}
    }

- Tutorial material from the tutorial at uRos2024 (Greece)
- Tutorial material from our tutorial at _useR!_ 2021
- The Data Validation Cookbook
- Slides of the useR2016 talk (Stanford University, June 28 2016).
- Video of the satRdays talk (Hungarian Academy of Sciences, Sept 3 2016).
- Slides and exercises from the useR2018 tutorial.
- Materials for the uRos2018 workshop (The Hague, 2018)
- Materials for the ENBES|EESW workshop (Bilbao, 2019)
- Materials for the planned workshop at the Institute for Statistical Mathematics (Tokyo, 2020 - cancelled because of the COVID-19 situation)
The latest release can be installed from the R command-line
    install.packages("validate")

The development version can be installed as follows.
    git clone https://github.com/data-cleaning/validate
    cd validate
    make install

Note that the development version likely contains bugs (please report them!) and interfaces that may not be stable.
About
Professional data validation for the R environment