🐍 Learn Python and Pandas from the ground up
dgerlanc/programming-with-data
This repository contains the slides, exercises, and answers for Programming with Data: Python and Pandas. The goal of this tutorial is to teach you, someone with experience programming in Python, most of the features available in Pandas. The material from this course has been presented at conferences including ODSC and Battlefin Discovery Data and online through the O'Reilly platform.
Whether in R, MATLAB, Stata, or Python, modern data analysis requires, for many researchers, some kind of programming. The preponderance of tools and specialized languages for data analysis suggests that general-purpose programming languages like C and Java do not readily address the needs of data scientists; something more is needed.
In this workshop, you will learn how to accelerate your data analyses using the Python language and Pandas, a library specifically designed for interactive data analysis. Pandas is a massive library, so we will focus on its core functionality: loading, filtering, grouping, and transforming data. Having completed this workshop, you will understand the fundamentals of Pandas, be aware of common pitfalls, and be ready to perform your own analyses.
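As a rough illustration of those four core operations, here is a minimal sketch. It is not taken from the course materials; the CSV data and the column names (`city`, `year`, `sales`) are made up for this example.

```python
import io

import pandas as pd

# Made-up CSV data, used only for illustration (not from the course).
CSV_TEXT = """city,year,sales
Boston,2020,10
Boston,2021,14
Austin,2020,7
Austin,2021,9
"""

# Loading: parse CSV text into a DataFrame.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Filtering: keep only the rows where a boolean condition holds.
recent = df[df["year"] == 2021]

# Grouping: total sales per city (split, apply, combine).
totals = df.groupby("city")["sales"].sum()

# Transforming: each row's share of its city's total sales.
# transform("sum") returns a result aligned with the original rows.
df["share"] = df["sales"] / df.groupby("city")["sales"].transform("sum")

print(recent)
print(totals)
print(df)
```

Reading from `io.StringIO` keeps the sketch self-contained; with a real file you would pass a path to `pd.read_csv` instead.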
This workshop assumes that participants have intermediate-level programming ability in Python. Participants should know the difference between a dict, a list, and a tuple. Familiarity with control flow (if/else/for/while) and error handling (try/except) is required.
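If you are unsure whether you meet these prerequisites, the following sketch shows roughly the level assumed. It is only an informal self-check, not part of the course; all names and values in it are hypothetical.

```python
# Hypothetical data, chosen only for illustration.
point = (3, 4)                    # tuple: fixed-size and immutable
scores = [70, 85, 90]             # list: ordered and mutable
ages = {"ada": 36, "grace": 45}   # dict: maps keys to values

scores.append(100)                # a list can grow in place


def lookup_age(name):
    """Return the age stored for name, or None if the name is unknown."""
    try:
        return ages[name]
    except KeyError:              # raised when the key is missing
        return None


# Control flow: loop over candidates and keep the names that are known.
known = []
for name in ("ada", "linus"):
    if lookup_age(name) is not None:
        known.append(name)

print(scores, known)
```

If every line here reads naturally to you, you likely have the background the workshop expects.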
No statistics background is required.
If you have a stable Internet connection and the free Binder service isn't under too much load, the easiest way to interactively run the slides and try the exercises is to click the Binder badge (make sure you open it in a new window). Keep in mind that Binder aggressively shuts down idle instances, so you'll need to refresh the link if you're idle for too long.
You may view the HTML versions of the slides and answers directly in your browser on GitHub, though you will not be able to run them interactively:
- Lesson 1 - Series
- Lesson 2 - DataFrames
- Lesson 3 - Split, Apply, Combine
- Lesson 4 - Time Series
- Lesson 5 - Merge and Concat
- Lesson 6 - Advanced Merge and Reshape
If you're taking the course, want to follow along with the slides and do the exercises, and may not have Internet access, download and install the Anaconda Python 3 distribution and conda package manager ahead of time:
https://www.anaconda.com/download/
Download the latest version of the course materials here.
Alternatively, you may clone the course repository using git:
$ git clone https://github.com/dgerlanc/programming-with-data.git
The remainder of the installation requires that you use the command line.
To complete the course exercises, you must use conda to install the dependencies specified in the environment.yml file in the repository:
$ conda env create -f environment.yml
This will create a conda environment called progwd, which may be "activated" with the following commands:
- Windows: activate progwd
- Linux and Mac: conda activate progwd
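One way to confirm from inside Python that the environment is active is to look at the `CONDA_DEFAULT_ENV` environment variable, which conda sets on activation. The sketch below is not part of the course materials, and the helper names `active_conda_env` and `is_course_env` are hypothetical.

```python
import os

EXPECTED_ENV = "progwd"  # environment name given in this README


def active_conda_env(environ=os.environ):
    """Return the active conda environment name, or None if none is set."""
    return environ.get("CONDA_DEFAULT_ENV")


def is_course_env(environ=os.environ):
    """Return True if the course environment appears to be active."""
    return active_conda_env(environ) == EXPECTED_ENV


if __name__ == "__main__":
    if is_course_env():
        print("The progwd environment is active.")
    else:
        print("Active environment:", active_conda_env())
```

Passing the environment as a parameter makes the check easy to try with a plain dict, without changing your real shell environment.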
Once you've activated the environment, your prompt will probably look something like this:
(progwd) $
The entire course is designed to use Jupyter notebooks. Start the notebook server to get started:
(progwd) $ jupyter lab
Your feedback on the course helps to improve it for future students. Please leave feedback here.