An introduction to data science using Python and Pandas with Jupyter notebooks.


LukaMrPython/python-for-data-analysis

 
 


Course in data science. Learn to analyze data of all types using the Python programming language. No programming experience is necessary.

Quick links: 📁 lessons, ⏬ Lesson Schedule

Software covered:

  • IPython environment and Jupyter notebooks
  • Conda for package management and virtual environments
  • Python 3

Course topics include:

  • Introduction to/review of the command line
  • Fundamentals of Python and its data types
  • Data analysis packages Numpy and Pandas
  • Plotting packages Matplotlib and Seaborn
  • Statistics
  • Regular expressions
  • Interactive visualization
  • Modules and classes
  • Git and GitHub

Instructor

Online Content

Textbooks

Note: O'Reilly Media titles are free to UCSD affiliates with Safari Books Online.

Additional Materials

Command Line Resources

Python Resources

IPython Resources from Cyrille Rossant

Data Analysis Resources

Course Philosophy

  1. Just like anything else, you learn Python by doing. With a few exceptions, you're not going to break your computer by trying new commands. So just try it and see what happens. Print output of commands. Print values of variables. Kick the thing until it works.
  2. When you don't know how to do something, google it. You'll be amazed by the solutions you'll find to do thing x if you google "python thing x".
  3. Learn keyboard shortcuts, as many as you can. Tab-complete in the shell and IPython/Jupyter!
  4. Remember Zed's sage wisdom:
    • Practice every day.
    • Don't over-do it. Slow and steady wins the race.
    • It's alright to be totally lost at first.
    • When you get stuck, get more information.
    • Try to solve it yourself first.

Assignments

Weekly assignments

Weekly take-home assignments will follow the course schedule, reinforcing skills with exercises to analyze and visualize scientific data. Assignments will be given out on Thursdays and will be due the following Thursday, using TritonEd.

Final Project

You will choose a data set, either your own or one provided in one of the texts, and write a Python program (or a set of Python programs, or a mixture of .ipynb and .py/.sh scripts) to carry out a revealing data analysis. Have a look at Shaw Ex43-52 and McKinney Ch10-12 for more ideas.

Requirements:

  • Submit your project as either: a Jupyter notebook (or collection of notebooks), a Python script (or collection of scripts), or a combination of the two.
  • Use pandas and at least three (3) additional libraries/packages, such as:
    • Plotting: matplotlib, seaborn
    • Statistics and modeling: statsmodels, scikit-learn
    • Bioinformatics: scikit-bio, biopython
    • Climate science: cdms, iris
    • Other domain-specific libraries/packages
  • Use at least three (3) user-defined functions.
  • Optional: Create user-defined modules and classes for use in your code.
  • Optional: Share your code on GitHub.
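As a rough illustration of the pandas and three-function requirements, a project script might be organized along the following lines. This is only a minimal sketch: the sample data, the column names, and the function names (`load_data`, `clean_data`, `summarize_by_site`) are invented for the example, and it does not cover the requirement to use three additional libraries.

```python
import io

import pandas as pd

# Made-up sample data so the sketch runs on its own; a real project
# would read its own data file instead.
SAMPLE_CSV = """site,year,temperature_c
A,2019,14.2
A,2020,14.8
B,2019,9.1
B,2020,
"""


def load_data(source):
    """Read a CSV file (path or file-like object) into a DataFrame."""
    return pd.read_csv(source)


def clean_data(df):
    """Return a copy with rows that lack a temperature value removed."""
    return df.dropna(subset=["temperature_c"]).reset_index(drop=True)


def summarize_by_site(df):
    """Return the mean temperature per site as a Series indexed by site."""
    return df.groupby("site")["temperature_c"].mean()


if __name__ == "__main__":
    raw = load_data(io.StringIO(SAMPLE_CSV))
    tidy = clean_data(raw)
    print(summarize_by_site(tidy))
```

Keeping loading, cleaning, and summarizing in separate functions makes each step easy to test on its own and to reuse from a notebook.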

Note: There are no midterm or final exams.

Schedule Overview

Schedule is subject to change.

The course consists of 20 lessons. It was originally taught as 2 lessons per week for 10 weeks, but the material can be covered at any pace.

Lessons 1-3 will be an introduction to the command line. By the end of this tutorial, everyone will be familiar with basic Unix commands.

Lessons 4-9 will be an introduction to programming using Python. The main text will be Shaw's Learn Python 3 the Hard Way. For those with experience in a programming language other than Python, Lutz's Learning Python will provide a more thorough introduction to programming Python. We will learn to use IPython and IPython Notebooks (also called Jupyter), a much richer Python experience than the Unix command line or Python interpreter.

Lessons 10-18 will focus on Python packages for data analysis. We will work through McKinney's Python for Data Analysis, which is all about analyzing data, doing statistics, and making pretty plots. You may find that Python can emulate or exceed much of the functionality of R and MATLAB.

Lessons 19-20 conclude the course with two skills useful in developing code: writing your own classes and modules, and sharing your code on GitHub.

Lesson Schedule

  • Course material is available as .md or .ipynb files by clicking on the lesson number below.
  • In addition to doing the readings, please follow along writing code (this is integral to the Shaw readings), and do any Study Drills (Shaw) and Chapter Quizzes (Lutz).
| Lesson | Title | Readings | Topics | Assignment |
|---|---|---|---|---|
| 1 | Overview | -- | Introductions and overview of course | Pre-course survey; Acquire texts |
| 2 | Command Line Part I | Shaw: Introduction, Ex0, Appendix A | Command line crash course; Text editors | Assignment 1: Basic Shell Commands |
| 3 | Command Line Part II | Yale: The 10 Most Important Linux Commands | Advanced commands in the bash shell | -- |
| 4 | Conda, IPython, and Jupyter Notebooks | Geohackweek: Introduction to Conda | Conda tutorial including Conda environments, Python packages, and PIP; Python and IPython in the command line; Jupyter notebook tutorial and Python crash course | Assignment 2: Bash, Conda, IPython, and Jupyter |
| 5 | Python Basics, Strings, Printing | Shaw: Ex1-10; Lutz: Ch1-7 | Python scripts, error messages, printing strings and variables, strings and string operations, numbers and mathematical expressions, getting help with commands and IPython | -- |
| 6 | Taking Input, Reading and Writing Files, Functions | Shaw: Ex11-26; Lutz: Ch9,14-17 | Taking input, reading files, writing files, functions | Assignment 3: Python Fundamentals I |
| 7 | Logic, Loops, Lists, Dictionaries, and Tuples | Shaw: Ex27-39; Lutz: Ch8-13 | Logic and loops, lists and list comprehension, tuples, dictionaries, other types | -- |
| 8 | Python and IPython Review | McKinney: Ch1, Ch2, Ch3 | Review of Python commands; IPython review: enhanced interactive Python shells with support for data visualization, distributed and parallel computation, and a browser-based notebook with support for code, text, mathematical expressions, inline plots and other rich media | Assignment 4: Python Fundamentals II |
| 9 | Regular Expressions | Kuchling: Regular Expression HOWTO | Regular expression syntax; Command-line tools: grep, sed, awk, perl -e; Python examples: built-in and re module | -- |
| 10 | Numpy, Pandas and Matplotlib Crash Course | Pratik: Introduction to Numpy and Pandas | Numpy, Pandas, and Matplotlib overview | Assignment 5: Regular Expressions |
| 11 | Pandas Part I | McKinney: Ch4, Ch5 | Introduction to NumPy and Pandas: ndarray, Series, DataFrame, index, columns, dtypes, info, describe, read_csv, head, tail, loc, iloc, ix, to_datetime | -- |
| 12 | Pandas Part II | McKinney: Ch6, Ch7, Ch8 | Data Analysis with Pandas: concat, append, merge, join, set_option, stack, unstack, transpose, dot-notation, values, apply, lambda, sort_index, sort_values, to_csv, read_csv, isnull | Assignment 6: Pandas Fundamentals |
| 13 | Plotting with Matplotlib | McKinney: Ch9; Johansson: Matplotlib 2D and 3D plotting in Python | Matplotlib tutorial from J.R. Johansson | -- |
| 14 | Plotting with Seaborn | Seaborn Tutorial | Seaborn tutorial from Michael Waskom | Assignment 7: Plotting |
| 15 | Pandas Time Series | McKinney: Ch11 | Time series data in Pandas | -- |
| 16 | Pandas Group Operations | McKinney: Ch10 | groupby, melt, pivot, inplace=True, reindex | Assignment 8: Time Series and Group Operations |
| 17 | Statistics Packages | Handbook of Biological Statistics | Statistics capabilities of Pandas, Numpy, Scipy, and Scikit-bio | -- |
| 18 | Interactive Visualization with Bokeh | Bokeh User Guide | Quickstart guide to making interactive HTML and notebook plots with Bokeh | Assignment 9: Statistics and Interactive Visualization |
| 19 | Modules and Classes | Shaw: Ex40-52 | Packaging your code so you and others can use it again | -- |
| 20 | Git and GitHub | GitHub Guides | Sharing your code in a public GitHub repository | Final Project |
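
To give a flavor of the Lesson 9 material, here is a small sketch of Python's built-in `re` module at work. The log lines and patterns are made up for illustration and are not taken from the course notebooks.

```python
import re

# Made-up log lines used only for illustration.
lines = [
    "sample_01 collected 2017-03-14 depth=5m",
    "sample_02 collected 2017-04-02 depth=12m",
    "no date on this line",
]

# Named groups capture an ISO-style date: four digits, two digits, two digits.
date_pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

dates = []
for line in lines:
    match = date_pattern.search(line)
    if match:  # search() returns None when nothing matches
        dates.append((int(match["year"]), int(match["month"]), int(match["day"])))

# re.sub rewrites every match; \g<name> refers back to a named group.
cleaned = [re.sub(r"depth=(?P<d>\d+)m", r"depth_m=\g<d>", line) for line in lines]

# re.findall returns all non-overlapping matches of the pattern as strings.
sample_ids = re.findall(r"sample_\d+", " ".join(lines))
```

The same patterns work with `grep -E` or `sed -E` on the command line with small syntax changes; for example, named groups are a Python feature and would need to become plain numbered groups.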
