
This repository provides everything you need to get started with Python for (social science) research.


TiesdeKok/LearnPythonforResearch


Get started with Python for Research

Want to learn how to use Python for (Social Science) Research?
This repository has everything that you need to get started!

Author: Ties de Kok (Personal Page)

Table of contents

Introduction

The goal of this GitHub page is to provide you with everything you need to get started with Python for actual research projects.

Who is this repository for?

The topics and techniques demonstrated in this repository are primarily oriented towards empirical research projects in fields such as Accounting, Finance, Marketing, Political Science, and other Social Sciences.

However, many of the basics are also perfectly applicable if you are looking to use Python for any other type of Data Science!

How to use this repository?

This repository is written to facilitate learning by doing.

If you are starting from scratch I recommend the following:

  1. Familiarize yourself with the Getting your Python setup ready and Using Python sections below
  2. Check the Code along! section to make sure that you can interactively use the Jupyter Notebooks
  3. Work through the 0_python_basics.ipynb notebook and try to get a basic grasp of the Python syntax
  4. Do the "Basic Python tasks" part of the exercises.ipynb notebook
  5. Work through the 1_opening_files.ipynb, 2_handling_data.ipynb, and 3_visualizing_data.ipynb notebooks.
    Note: the 2_handling_data.ipynb notebook is very comprehensive, so feel free to skip the more advanced parts at first.
  6. Do the "Data handling tasks (+ some plotting)" part of the exercises.ipynb notebook

If you are interested in web-scraping:

  1. Work through the 4_web_scraping.ipynb notebook
  2. Do the "Web scraping" part of the exercises.ipynb notebook

If you are interested in Natural Language Processing with Python:

  1. Take a look at my Python NLP tutorial repository + notebook

If you are already familiar with the Python basics:

Use the notebooks provided in this repository selectively depending on the types of problems that you try to solve with Python.

Everything in the notebooks is purposely sectioned by task description. So if you, for example, are looking to merge two Pandas dataframes together, you can use the Combining dataframes section of the 2_handling_data.ipynb notebook as a starting point.

Getting your Python setup ready

There are multiple ways to get your Python environment set up. To keep things simple I will only provide you with what I believe to be the best and easiest way to get started: the Anaconda distribution + a conda environment.

Anaconda Distribution

The Anaconda Distribution bundles Python with a large collection of Python packages from the (data) science Python eco-system.

By installing the Anaconda Distribution you essentially obtain everything you need to get started with Python for Research!

Step 1: Install Anaconda

  1. Go to anaconda.com/download/
  2. Download the Python 3.x version installer
  3. Install Anaconda.
    • It is worth taking note of the installation directory in case you ever need to find it again.
  4. Check if the installation works by launching a command prompt (terminal) and typing python; it should say Anaconda at the top.
    • On Windows I recommend using the Anaconda Prompt
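To double-check from within Python that you are running Anaconda's interpreter rather than some other Python installed on your computer, you could use a small check like the sketch below. It relies on the assumption that conda-managed installations keep a conda-meta folder inside sys.prefix, which is a practical heuristic rather than an official guarantee; the function name is_conda_python is just an illustrative choice.

```python
import os
import sys


def is_conda_python():
    """Heuristic check: conda-managed Pythons keep a 'conda-meta' folder in sys.prefix."""
    return os.path.isdir(os.path.join(sys.prefix, "conda-meta"))


if __name__ == "__main__":
    # sys.executable is the full path of the interpreter running this code
    print("Interpreter:", sys.executable)
    print("Looks like a conda/Anaconda Python:", is_conda_python())
```

If this prints False even though you installed Anaconda, the python command in your terminal most likely points to a different installation.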

Note: Anaconda also comes with the Anaconda Navigator, a graphical interface for managing packages and environments. I haven't personally used it yet, but it might be convenient.

Step 2: Set up the learnpythonforresearch environment

  1. Make sure you've cloned/downloaded this repository: Clone repository
  2. cd (i.e., change directory) to the folder where you cloned the repository or extracted the ZIP file
    for example: cd "C:\Files\Work\Project_1"
    Note: if that folder is on another drive, you might also have to switch drives first by typing, for example, E:
  3. Run the following command: conda env create -f environment.yml
  4. Activate the environment with: conda activate LearnPythonforResearch
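Once the environment is activated, you can confirm from Python that it is the one actually in use. The sketch below assumes that conda activate sets the CONDA_DEFAULT_ENV environment variable to the environment's name, as current conda versions do; active_conda_env is a hypothetical helper name.

```python
import os
import sys


def active_conda_env():
    """Return the name of the active conda environment, or None if none is active."""
    # 'conda activate <name>' sets CONDA_DEFAULT_ENV for the current shell session
    return os.environ.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    print("Active conda environment:", active_conda_env())
    print("Python interpreter in use:", sys.executable)
```

Run it from the same terminal in which you activated the environment; a new terminal window starts without it.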

A full list of all the packages used is provided in the environment.yml file.

Python 3 vs Python 2?

Python 3.x is the newer and superior version, so I strongly recommend using Python 3.x whenever possible. Python 2.7 reached its end of life in January 2020, so there is no reason to use it unless you are forced to work with old Python 2.7 code.
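If a script must never run under Python 2, a small guard at the top makes that explicit. This is a minimal sketch, not something the notebooks in this repository require; it deliberately avoids f-strings and other Python 3-only syntax, because Python 2 would fail to parse the file before the check could run.

```python
import sys

# Stop early with a clear message when the script is run under Python 2.
# No f-strings here: Python 2 cannot parse them, so the check would never run.
if sys.version_info[0] < 3:
    raise SystemExit("This code requires Python 3; you are running Python %d.%d"
                     % (sys.version_info[0], sys.version_info[1]))

print("Running Python %d.%d" % sys.version_info[:2])
```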

Using Python

Basic methods:

The native way to run Python code is by saving the code to a file with the ".py" extension and executing it from the console / terminal:

python code.py

Alternatively, you can run some quick code by starting a Python or IPython interactive console by typing either python or ipython in your console / terminal.
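As an illustration of the first approach, a hypothetical code.py file could look like the sketch below. Running python code.py executes the block under if __name__ == "__main__":, while importing the file from another script only defines the summarize function (a made-up example).

```python
# code.py: a hypothetical example script


def summarize(values):
    """Return the count, mean, minimum and maximum of a list of numbers."""
    n = len(values)
    return {"n": n, "mean": sum(values) / n, "min": min(values), "max": max(values)}


if __name__ == "__main__":
    # This block only runs when the file is executed directly (python code.py)
    stats = summarize([3, 1, 4, 1, 5])
    print(stats)  # {'n': 5, 'mean': 2.8, 'min': 1, 'max': 5}
```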

Jupyter Notebook/Lab

The above is, however, not very convenient for research purposes as we desire easy interactivity and good documentation options.
Fortunately, the awesome Jupyter Notebooks provide a great alternative way of using Python for research purposes.

Jupyter comes pre-installed with the Anaconda distribution so you should have everything already installed and ready to go.

Note on Jupyter Lab

JupyterLab 1.0: Jupyter’s Next-Generation Notebook Interface
JupyterLab is a web-based interactive development environment for Jupyter notebooks, code, and data. JupyterLab is flexible: configure and arrange the user interface to support a wide range of workflows in data science, scientific computing, and machine learning. JupyterLab is extensible and modular: write plugins that add new components and integrate with existing ones.

Jupyter Lab is an additional interface layer that extends the functionality of Jupyter Notebooks, which are the primary way you will interact with Python code.

What is the Jupyter Notebook?

From the Jupyter website:

The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text.

In other words, the Jupyter Notebook allows you to program Python code straight from your browser!
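Behind the browser interface, a notebook is saved as a plain JSON file (.ipynb) that lists its cells, each marked as code or markdown. The sketch below assumes the version 4 notebook format and builds a tiny notebook in memory, then counts its cells with the standard json module. Real notebooks contain more fields (outputs, execution counts, metadata), and the official nbformat package is the safer way to read and write them; count_cell_types is a hypothetical helper name.

```python
import json
from collections import Counter

# A stripped-down notebook in the nbformat 4 layout (real files carry more fields)
notebook_text = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# My analysis\n"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["x = 1 + 1\n", "print(x)"]},
    ],
})


def count_cell_types(ipynb_text):
    """Count how many cells of each type (code, markdown, ...) a notebook contains."""
    nb = json.loads(ipynb_text)
    return Counter(cell["cell_type"] for cell in nb["cells"])


print(count_cell_types(notebook_text))
```

To inspect one of the notebooks in this repository, you could pass in the text of the file instead, e.g. open("0_python_basics.ipynb", encoding="utf-8").read().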

How does the Jupyter Notebook/Lab work in the background?

The diagram below sums up the basic components of Jupyter:

At the heart of it are the Jupyter Server, which handles everything; the Jupyter Notebook, which is accessed and used through your browser; and the kernel, which executes the code. We will be focusing on the natively included Python kernel, but Jupyter is language agnostic, so you can also use it with other languages/software such as R.

It is worth noting that in most cases you will be running the Jupyter Server on your own computer and will connect to it locally in your browser (i.e. you don't need to be connected to the internet). However, it is also possible to run the Jupyter Server on a different computer, for example a high-performance computing server in the cloud, and connect to it over the internet.

How to start a Jupyter Notebook/Lab?

The primary method that I would recommend to start a Jupyter Notebook/Lab is to use the command line (terminal) directly:

  1. Open your command prompt / terminal (on Windows I recommend the Anaconda Prompt)
  2. Activate the right environment with conda activate LearnPythonforResearch
  3. cd (i.e., change directory) to the desired starting directory
    for example: cd "C:\Files\Work\Project_1"
    Note: if that folder is on another drive, you might also have to switch drives first by typing, for example, E:
  4. Start the Jupyter Notebook/Lab server by typing: jupyter notebook or jupyter lab

This should automatically open the corresponding Jupyter Notebook/Lab in your default browser. You can also go to the Jupyter Notebook/Lab manually by browsing to localhost:8888. (You might be asked for a password or token; the token is shown in the URL printed in the terminal window where the Jupyter server is running.)
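If the browser does not open automatically, a quick way to check whether the server is actually up is to test whether anything accepts connections on port 8888. The sketch below uses only Python's socket module. It only tells you that something is listening on that port (not necessarily Jupyter), and Jupyter may pick a different port, such as 8889, when 8888 is already in use; the terminal output always shows the actual address. server_is_listening is a hypothetical helper name.

```python
import socket


def server_is_listening(host="localhost", port=8888, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError) if nothing listens
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("Something is listening on localhost:8888:", server_is_listening())
```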

How to close a Jupyter Server?

If you want to shut down the Jupyter Server, open the command prompt window that runs the server and press CTRL + C twice.
Make sure that you have saved any open Jupyter Notebooks!

How to use the Jupyter Notebook?

Some shortcuts are worth mentioning for reference purposes:

command mode --> enable by pressing Esc
edit mode --> enable by pressing Enter

Command mode            | Edit mode                        | Both modes
Y : cell to code        | Tab : code completion or indent  | Shift-Enter : run cell, select below
M : cell to markdown    | Shift-Tab : tooltip              | Ctrl-Enter : run cell
A : insert cell above   | Ctrl-A : select all              |
B : insert cell below   | Ctrl-Z : undo                    |
X : cut selected cell   |                                  |

Installing Packages

The Python eco-system consists of many packages and modules that people have programmed and made available for everyone to use.
These packages/modules are one of the things that makes Python so useful.

Some packages are natively included with Python and Anaconda, but anything that is not included must be installed before you can import it.
I will discuss the three primary methods of installing packages:

Method 1: use pip

Many packages are available on the "Python Package Index" (i.e. "PyPI"): https://pypi.python.org/pypi

You can install packages that are on "PyPI" by using the pip command:

For example, to install the requests package, run pip install requests in your command line / terminal (not in the Jupyter Notebook!).

To uninstall you can use pip uninstall, and to upgrade an existing package you can add the -U flag (pip install -U requests)
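Because pip runs in the terminal, it can be handy to check from inside Python (for example, in a notebook cell) whether a package is installed and which version you have. Below is a minimal sketch using the standard library's importlib (Python 3.8 or newer); installed_version and can_import are hypothetical helper names. Keep in mind that the name you pip install and the name you import can differ (for example, pip install beautifulsoup4 versus import bs4), which is why the sketch has two separate checks.

```python
from importlib import metadata
from importlib.util import find_spec


def installed_version(dist_name):
    """Return the installed version of a distribution (e.g. 'requests'), or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def can_import(module_name):
    """Return True if 'import module_name' would find the module."""
    return find_spec(module_name) is not None


if __name__ == "__main__":
    print("requests version:", installed_version("requests"))
    print("requests importable:", can_import("requests"))
```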

Method 2: use conda

Sometimes when you try to install something with pip you get a compile error (especially on Windows). You can try to fix this by configuring the right compiler, but most of the time it is easier to install the package via conda instead, as conda packages are pre-compiled. For example:

conda install scipy

Full documentation is here: Conda documentation

Method 3: install directly using the setup.py file

Sometimes a package is on neither PyPI nor conda (you often find these packages on GitHub). Follow these steps to install those:

  1. Download the folder with all the files (if archived, make sure to unpack the folder)
  2. Open your command prompt (terminal) and cd to the folder you just downloaded
  3. Type: python setup.py install
    Note: with recent versions of setuptools, python setup.py install is deprecated; running pip install . from the same folder is the recommended alternative.

Tutorial Notebooks

This repository covers the following topics:

Additionally, if you are interested in Natural Language Processing I have a notebook for that as well:

Exercises

I have provided several tasks / exercises that you can try to solve in the exercises.ipynb notebook.

Note: To avoid the "oh, that looks easy!" trap, I have not uploaded the exercises notebook with example answers.
Feel free to email me for the answer keys once you are done!

Code along!

You can code along in two ways:

Option 1: use Binder

If you want to experiment with the code in a live environment, you can also use Binder.

Binder creates a live environment from a GitHub repository in which you can execute code just as if you were on your own computer. It is very awesome!

Click on the button below to launch Binder:

Note: you could use Binder to complete the exercises, but your work will not be saved!

Option 2: Set up a local Python environment

You can essentially "download" the contents of this repository by cloning the repository.

You can do this by clicking the "Clone or download" button and then "Download ZIP":

After you have downloaded and extracted the zip file into a folder, you can follow the steps to set up your environment:

  1. Installing Anaconda
  2. Setting up Conda Environment

Questions?

If you have questions or experience problems, please use the issues tab of this repository.

License

MIT - Ties de Kok - 2020

Special Thanks

https://github.com/teles/array-mixer for having an awesome readme that I used as a template.
