This repository contains the materials of the CDCS workshop on data wrangling and manipulation in R. We work step by step through how to take raw survey results and prepare them for analysis.
DCS-training/A-Pipeline-for-Data-Wrangling-and-Manipulation-with-R
This workshop will provide a guide for creating a data tidying workflow. It is aimed at researchers who are looking for ways of creating systematic and reproducible data tidying workflows. It will demonstrate how to create a script that systematically works through the multiple steps needed to prepare data for analysis.
The first half of the workshop will be dedicated to uploading data, strategies for working with repeated measures, and the merging of datasets. The second half of the session will cover the basics of working with missing data, creating new variables, implementing strategies for cleaning data (and dealing with bots…), and saving and exporting our processed data.
We will also demonstrate how R allows us to quickly update and re-run our data tidying/processing, saving us time and effort. Additionally, the R scripts generated will provide transparency to your work and make it simpler to retrace your analytical steps.
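To give a feel for the kind of script the workshop builds towards, here is a minimal sketch of such a workflow. It is not the workshop's own code: the data, the column names (`score_w1`, `score_w2`, `duration_sec`) and the 60-second cut-off for suspiciously fast responses are all invented for illustration, and it uses base R only so that it runs without extra packages.

```r
# Hypothetical survey data: two waves of the same questionnaire (repeated measures)
wave1 <- data.frame(
  id           = c(1, 2, 3, 4),
  score_w1     = c(10, NA, 7, 3),
  duration_sec = c(300, 280, 12, 310)
)
wave2 <- data.frame(
  id       = c(1, 2, 4),
  score_w2 = c(12, 9, 5)
)

# Merge the two waves, keeping every participant from wave 1
survey <- merge(wave1, wave2, by = "id", all.x = TRUE)

# Count missing values per column before deciding how to handle them
n_missing <- colSums(is.na(survey))

# Create a new variable: change in score between waves (NA if either score is missing)
survey$change <- survey$score_w2 - survey$score_w1

# Cleaning step: drop implausibly fast responses (assumed 60-second threshold)
clean <- survey[survey$duration_sec >= 60, ]

# Export the processed data (written to a temporary folder in this sketch)
out_file <- file.path(tempdir(), "survey_clean.csv")
write.csv(clean, out_file, row.names = FALSE)
```

Because every step lives in the script, changing the raw data or the threshold and re-running the file regenerates the cleaned output in one go.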
The classes will also dedicate a section of time to Q+A for any data tidying questions participants may have, as well as providing tips and tricks for data tidying problem solving.
This is an intermediate workshop, so some previous knowledge of R and the RStudio interface is required to follow the content. If you want to review your familiarity with the R interface, you can look at this video. If you want to refresh the basics of working with R and RStudio, you can sign up for our Introduction to Programming with R and RStudio course.
Follow the steps below to get set up on Noteable.
- Go to https://noteable.edina.ac.uk/login
- Login with your EASE credentials
- Select RStudio as a personal notebook server and press start
- Go to File > New Project > Version Control > Git
- Copy and paste this repository URL, https://github.com/DCS-training/A-Pipeline-for-Data-Wrangling-and-Manipulation-with-R, as the Repository URL
- The Project directory name will be filled in automatically, but you can change it if you want your folder in Noteable to have a different name
- Decide where to locate the folder. By default, it will locate it in your home directory
- Press Create Project
Congratulations, you have now pulled the content of the repository onto your Noteable server space. The last thing you need to do is install any packages that are not already installed in Noteable.
- Open the 'Install.R' file and run the code within it
- Now you can open the 'PCA.R' file and you can follow along
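The contents of 'Install.R' are not reproduced here, but a script that installs only the packages that are missing could look roughly like the sketch below. The helper names `missing_packages()` and `install_missing()` are hypothetical, and the package list is an assumption based on the two packages named in the local installation instructions.

```r
# Return the subset of `pkgs` that cannot be loaded in this R installation
missing_packages <- function(pkgs) {
  is_available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_available]
}

# Install whatever is missing; with dry_run = TRUE only report what would happen
install_missing <- function(pkgs, dry_run = FALSE) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) == 0) {
    message("All requested packages are already installed.")
  } else if (dry_run) {
    message("Would install: ", paste(to_install, collapse = ", "))
  } else {
    install.packages(to_install)
  }
  invisible(to_install)
}

# Assumed package list; dry run so nothing is downloaded in this sketch
needed <- install_missing(c("tidyverse", "RSQLite"), dry_run = TRUE)
```

Calling `install_missing(c("tidyverse", "RSQLite"))` without `dry_run` would perform the actual installation.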
R and RStudio are separate downloads and installations. R is the underlying statistical computing environment, but using R alone is no fun. RStudio is a graphical integrated development environment (IDE) that makes using R much easier and more interactive. You need to install R before you install RStudio. After installing both programs, you will need to install some specific R packages within RStudio. Follow the instructions below for your operating system, and then follow the instructions to install `tidyverse` and `RSQLite`.
- Open RStudio, and click on "Help" > "Check for updates". If a new version is available, quit RStudio, and download the latest version for RStudio.
- To check which version of R you are using, start RStudio and the first thing that appears in the console indicates the version of R you are running. Alternatively, you can type `sessionInfo()`, which will also display which version of R you are running. Go on the CRAN website and check whether a more recent version is available. If so, please download and install it. You can check here for more information on how to remove old versions from your system if you wish to do so.
- Download R from the CRAN website.
- Run the `.exe` file that was just downloaded
- Go to the RStudio download page
- Under Installers select RStudio x.yy.zzz - Windows Vista/7/8/10 (where x, y, and z represent version numbers)
- Double click the file to install it
- Once it's installed, open RStudio to make sure it works and you don't get any error messages.
- Open RStudio, and click on "Help" > "Check for updates". If a new version is available, quit RStudio, and download the latest version for RStudio.
- To check the version of R you are using, start RStudio and the first thing that appears on the terminal indicates the version of R you are running. Alternatively, you can type `sessionInfo()`, which will also display which version of R you are running. Go on the CRAN website and check whether a more recent version is available. If so, please download and install it.
- Download R from the CRAN website.
- Select the `.pkg` file for the latest R version
- Double click on the downloaded file to install R
- It is also a good idea to install XQuartz (needed by some packages)
- Go to the RStudio download page
- Under Installers select RStudio x.yy.zzz - Mac OS X 10.6+ (64-bit) (where x, y, and z represent version numbers)
- Double click the file to install RStudio
- Once it's installed, open RStudio to make sure it works and you don't get any error messages.
- Follow the instructions for your distribution from CRAN. They provide information to get the most recent version of R for common distributions. For most distributions, you could use your package manager (e.g., for Debian/Ubuntu run `sudo apt-get install r-base`, and for Fedora `sudo yum install R`), but we don't recommend this approach as the versions provided this way are usually out of date. In any case, make sure you have at least R 3.5.1.
- Go to the RStudio download page
- Under Installers select the version that matches your distribution, and install it with your preferred method (e.g., with Debian/Ubuntu `sudo dpkg -i rstudio-x.yy.zzz-amd64.deb` at the terminal).
- Once it's installed, open RStudio to make sure it works and you don't get any error messages.
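Whichever operating system you use, you can confirm from the R console that your installation meets the minimum version mentioned above. The following is a small sketch; the helper name `meets_minimum()` is hypothetical. It compares versions as version objects rather than as text, because a plain string comparison would wrongly rank "3.10.0" below "3.5.1".

```r
# TRUE if version `current` is at least version `minimum`
meets_minimum <- function(current, minimum = "3.5.1") {
  package_version(as.character(current)) >= package_version(minimum)
}

# Report the running version and check it against the minimum
message("Running ", R.version.string)
r_is_recent_enough <- meets_minimum(getRversion())

if (!r_is_recent_enough) {
  warning("Your R version is older than 3.5.1; please update R from CRAN.")
}
```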
Using a consistent folder structure across your projects will help keep things organized, and will help you to find/file things in the future. This can be especially helpful when you have multiple projects. In general, you may create directories (folders) for scripts, data, and documents.

If you want to learn more about how to get set up, have a look at [https://datacarpentry.org/R-ecology-lesson/00-before-we-start.html](https://datacarpentry.org/R-ecology-lesson/00-before-we-start.html).
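As a rough illustration of the folder structure described above, the directories can also be created from R rather than by hand. The function name `create_project_dirs()` and the project name `my_project` are made up for this sketch, and the demo writes into a temporary folder so that it does not touch your real files.

```r
# Create the standard sub-folders inside `root` and return their paths
create_project_dirs <- function(root, dirs = c("scripts", "data", "documents")) {
  paths <- file.path(root, dirs)
  for (p in paths) {
    # recursive = TRUE also creates `root` itself if it does not exist yet
    dir.create(p, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(paths)
}

# Demo in a temporary location (replace with your own project path)
demo_root <- file.path(tempdir(), "my_project")
created <- create_project_dirs(demo_root)
list.files(demo_root)
```

Running the function again on an existing project is harmless, since folders that already exist are left untouched.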
All material here collected is free to use but it is covered by a