GitHub is a Git repository hosting service that adds many features of its own, such as a web-based graphical interface for managing repositories, access control, wikis, organizations, gists, and more.
As you may already know, there is a ton of data to be grabbed. In addition to using GitHub API v3 in Python, you might also be interested in learning how to use the Google Drive API in Python to automate tasks related to Google Drive. Or perhaps you need to use the Gmail API in Python to automate tasks related to your Gmail account.
In this tutorial, you will learn how to use GitHub API v3 in Python with both the requests and PyGithub libraries.
To get started, let's install the dependencies:
$ pip3 install PyGithub requests
Since GitHub API v3 is pretty straightforward to use, you can make a simple GET request to a specific URL and retrieve the results:
import requests
from pprint import pprint

# github username
username = "x4nth055"
# url to request
url = f"https://api.github.com/users/{username}"
# make the request and return the json
user_data = requests.get(url).json()
# pretty print JSON data
pprint(user_data)
Here I used my account; here is part of the returned JSON (you can see it in the browser as well):
{'avatar_url': 'https://avatars3.githubusercontent.com/u/37851086?v=4',
 'bio': None,
 'blog': 'https://www.thepythoncode.com',
 'company': None,
 'created_at': '2018-03-27T21:49:04Z',
 'email': None,
 'events_url': 'https://api.github.com/users/x4nth055/events{/privacy}',
 'followers': 93,
 'followers_url': 'https://api.github.com/users/x4nth055/followers',
 'following': 41,
 'following_url': 'https://api.github.com/users/x4nth055/following{/other_user}',
 'gists_url': 'https://api.github.com/users/x4nth055/gists{/gist_id}',
 'gravatar_id': '',
 'hireable': True,
 'html_url': 'https://github.com/x4nth055',
 'id': 37851086,
 'login': 'x4nth055',
 'name': 'Rockikz',
<..SNIPPED..>
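Since the response carries far more fields than most scripts need, one option is to keep only a handful of them. The sketch below is a hypothetical helper (summarize_user is not part of requests or the GitHub API); it only assumes the field names visible in the output above.

```python
def summarize_user(user_data, fields=("login", "name", "blog", "followers", "following")):
    """Return only the selected keys from a GitHub user JSON dict.

    Missing keys map to None, so a partial response does not raise.
    """
    return {field: user_data.get(field) for field in fields}


# a trimmed copy of the response shown above
sample = {
    "login": "x4nth055",
    "name": "Rockikz",
    "blog": "https://www.thepythoncode.com",
    "followers": 93,
    "following": 41,
    "hireable": True,
}
summary = summarize_user(sample)
print(summary)
```

With a live response, the same helper could be called as summarize_user(user_data).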
That's a lot of data, which is why the requests library alone isn't handy for extracting it all manually. This is where PyGithub comes to the rescue.
Let's get all the public repositories of that user using the PyGithub library we just installed:
import base64
from github import Github
from pprint import pprint

# Github username
username = "x4nth055"
# pygithub object
g = Github()
# get that user by username
user = g.get_user(username)

for repo in user.get_repos():
    print(repo)
Here is my output:
Repository(full_name="x4nth055/aind2-rnn")
Repository(full_name="x4nth055/awesome-algeria")
Repository(full_name="x4nth055/emotion-recognition-using-speech")
Repository(full_name="x4nth055/emotion-recognition-using-text")
Repository(full_name="x4nth055/food-reviews-sentiment-analysis")
Repository(full_name="x4nth055/hrk")
Repository(full_name="x4nth055/lp_simplex")
Repository(full_name="x4nth055/price-prediction")
Repository(full_name="x4nth055/product_recommendation")
Repository(full_name="x4nth055/pythoncode-tutorials")
Repository(full_name="x4nth055/sentiment_analysis_naive_bayes")
Alright, so I made a simple function to extract some useful information from this Repository object:
def print_repo(repo):
    # repository full name
    print("Full name:", repo.full_name)
    # repository description
    print("Description:", repo.description)
    # the date of when the repo was created
    print("Date created:", repo.created_at)
    # the date of the last git push
    print("Date of last push:", repo.pushed_at)
    # home website (if available)
    print("Home Page:", repo.homepage)
    # programming language
    print("Language:", repo.language)
    # number of forks
    print("Number of forks:", repo.forks)
    # number of stars
    print("Number of stars:", repo.stargazers_count)
    print("-"*50)
    # repository content (files & directories)
    print("Contents:")
    for content in repo.get_contents(""):
        print(content)
    try:
        # repo license
        print("License:", base64.b64decode(repo.get_license().content.encode()).decode())
    except Exception:
        # the repository has no license (or it could not be fetched), skip it
        pass
The Repository object has a lot of other fields. I suggest you use dir(repo) to get the fields you want to print. Let's iterate over the repositories again and use the function we just wrote:
# iterate over all public repositories
for repo in user.get_repos():
    print_repo(repo)
    print("="*100)
This will print some information about each public repository of this user:
====================================================================================================
Full name: x4nth055/pythoncode-tutorials
Description: The Python Code Tutorials
Date created: 2019-07-29 12:35:40
Date of last push: 2020-04-02 15:12:38
Home Page: https://www.thepythoncode.com
Language: Python
Number of forks: 154
Number of stars: 150
--------------------------------------------------
Contents:
ContentFile(path="LICENSE")
ContentFile(path="README.md")
ContentFile(path="ethical-hacking")
ContentFile(path="general")
ContentFile(path="images")
ContentFile(path="machine-learning")
ContentFile(path="python-standard-library")
ContentFile(path="scapy")
ContentFile(path="web-scraping")
License: MIT License
<..SNIPPED..>
I've truncated the output, as it returns all repositories and their information. You can see we used the repo.get_contents("") method to retrieve the files and folders at the root of that repository. PyGithub parses each of them into a ContentFile object; use dir(content) to see other useful fields.
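Because dir() also lists private and special names, it can help to filter them out. Here is a minimal sketch of that idea; public_fields is a hypothetical helper, and StandIn is only an offline placeholder for a real ContentFile or Repository object.

```python
def public_fields(obj):
    """Return the attribute names of obj that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


class StandIn:
    """Offline stand-in for a ContentFile, used only for this demo."""

    def __init__(self):
        self.path = "README.md"
        self.sha = "abc123"  # made-up value
        self._raw = {}


fields = public_fields(StandIn())
print(fields)
```

The helper only reads names and never touches the attribute values, so it should not trigger extra API requests by itself.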
Also, if you have private repositories, you can access them by authenticating with PyGithub. GitHub no longer accepts account passwords for API requests, so pass a personal access token instead:
# personal access token generated in your GitHub account settings
token = "your-access-token"
# authenticate to github (recent PyGithub releases prefer Github(auth=Auth.Token(token)))
g = Github(token)
# get the authenticated user
user = g.get_user()
for repo in user.get_repos():
    print_repo(repo)
GitHub also recommends making authenticated requests, because unauthenticated clients are limited to a small number of requests (60 per hour at the time of writing); once you exceed that limit, PyGithub raises a RateLimitExceededException.
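When calling the API with requests directly, the remaining quota can be read from the X-RateLimit-Remaining and X-RateLimit-Reset response headers that GitHub sends. The following is a rough sketch of how a script might decide how long to pause; seconds_until_reset is a hypothetical helper, and the header values in the demo are made up.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how many seconds to wait before retrying, or 0 if requests remain.

    `headers` is any dict-like object holding GitHub's rate-limit headers.
    `now` is the current UTC epoch time (defaults to time.time()).
    """
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0
    # the reset time is expressed in UTC epoch seconds
    reset_at = int(headers.get("X-RateLimit-Reset", 0))
    now = time.time() if now is None else now
    return max(0, reset_at - int(now))


# made-up header values: quota exhausted, reset one minute after `now`
exhausted = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1600000060"}
wait = seconds_until_reset(exhausted, now=1600000000)
print(wait)
```

With requests, this could be called as seconds_until_reset(response.headers) after each call, sleeping for the returned number of seconds when it is non-zero.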
You can also download any file from any repository you want. To do that, I'm editing the print_repo() function to search for Python files in a given repository. If one is found, we build an appropriate file name and write its content using the content.decoded_content attribute. Here's the edited version of the print_repo() function:
import os

# make a directory to save the Python files
if not os.path.exists("python-files"):
    os.mkdir("python-files")

def print_repo(repo):
    # repository full name
    print("Full name:", repo.full_name)
    # repository description
    print("Description:", repo.description)
    # the date of when the repo was created
    print("Date created:", repo.created_at)
    # the date of the last git push
    print("Date of last push:", repo.pushed_at)
    # home website (if available)
    print("Home Page:", repo.homepage)
    # programming language
    print("Language:", repo.language)
    # number of forks
    print("Number of forks:", repo.forks)
    # number of stars
    print("Number of stars:", repo.stargazers_count)
    print("-"*50)
    # repository content (files & directories)
    print("Contents:")
    try:
        for content in repo.get_contents(""):
            # check if it's a Python file
            if content.path.endswith(".py"):
                # save the file
                filename = os.path.join("python-files", f"{repo.full_name.replace('/', '-')}-{content.path}")
                with open(filename, "wb") as f:
                    f.write(content.decoded_content)
            print(content)
        # repo license
        print("License:", base64.b64decode(repo.get_license().content.encode()).decode())
    except Exception as e:
        print("Error:", e)
After you run the code again, you'll notice a folder named python-files that contains Python files from different repositories of that user.
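The function above only looks at the root of each repository. To also reach files in subfolders, one common approach is to keep a queue of entries and expand each directory as it is encountered. The sketch below assumes repo behaves like a PyGithub Repository (get_contents(path) returns entries with .type and .path); iter_files, local_name, and FakeRepo are hypothetical names, and the file paths in the demo are invented.

```python
from types import SimpleNamespace


def iter_files(repo, suffix=".py"):
    """Yield every file entry in repo whose path ends with suffix, at any depth."""
    pending = list(repo.get_contents(""))
    while pending:
        entry = pending.pop(0)
        if entry.type == "dir":
            # expand the directory; with the real API this is one request each
            pending.extend(repo.get_contents(entry.path))
        elif entry.path.endswith(suffix):
            yield entry


def local_name(full_name, path):
    """Flatten 'owner/repo' and a nested path into one file name."""
    return f"{full_name}-{path}".replace("/", "-")


class FakeRepo:
    """Offline stand-in for a Repository, used only to demonstrate the walk."""

    tree = {
        "": [("dir", "scapy"), ("file", "README.md"), ("file", "setup.py")],
        "scapy": [("file", "scapy/sniff.py")],
    }

    def get_contents(self, path):
        return [SimpleNamespace(type=t, path=p) for t, p in self.tree[path]]


found = [entry.path for entry in iter_files(FakeRepo())]
print(found)
print(local_name("x4nth055/pythoncode-tutorials", "scapy/sniff.py"))
```

Flattening the path with local_name avoids having to create nested folders under python-files, at the cost of losing the original directory layout.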
The GitHub API is quite rich; you can search for repositories by a specific query just like you do on the website:
# search repositories by name
for repo in g.search_repositories("pythoncode tutorials"):
    # print repository details
    print_repo(repo)
This will return 9 repositories and their information.
You can also search by programming language or topic:
# search by programming language
for i, repo in enumerate(g.search_repositories("language:python")):
    print_repo(repo)
    print("="*100)
    if i == 9:
        break
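The enumerate-and-break pattern above can also be written with itertools.islice from the standard library. This is a small sketch, assuming the search results are an ordinary iterable (which is how the loop above uses them); take is a hypothetical helper name.

```python
from itertools import islice


def take(results, n):
    """Return the first n items of any iterable without consuming the rest."""
    return list(islice(results, n))


# demo with a plain range standing in for search results
first_ten = take(range(100), 10)
print(first_ten)
```

With PyGithub, this could be used as take(g.search_repositories("language:python"), 10).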
To search for a particular topic, you simply pass something like "topic:machine-learning" to the search_repositories() method.
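Keywords and qualifiers such as language: and topic: can be combined in a single query string. The helper below is a hypothetical convenience (build_query is not part of PyGithub); it simply joins the pieces with spaces, in the format the examples above pass to search_repositories().

```python
def build_query(keywords="", **qualifiers):
    """Join free-text keywords and qualifier:value pairs into one search string."""
    parts = [keywords] if keywords else []
    parts += [f"{name}:{value}" for name, value in qualifiers.items()]
    return " ".join(parts)


query = build_query("tutorials", language="python", topic="machine-learning")
print(query)
```

The resulting string could then be passed along as g.search_repositories(query).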
If you're using the authenticated version, you can also create, update and delete files very easily using the API:
# searching for my repository
repo = g.search_repositories("pythoncode tutorials")[0]

# create a file and commit n push
repo.create_file("test.txt", "commit message", "content of the file")

# delete that created file
contents = repo.get_contents("test.txt")
repo.delete_file(contents.path, "remove test.txt", contents.sha)
The above code is a simple use case: I searched for a particular repository, added a new file called test.txt with some content in it, and made a commit. After that, I grabbed the content of that new file and deleted it (which counts as a git commit as well).
And sure enough, after executing the above lines of code, the commits were created and pushed.
We have just scratched the surface of the GitHub API. There are a lot of other functions and methods you can use, and obviously, we can't cover all of them. Use dir(g) to discover other methods, and check the PyGithub documentation or the GitHub API reference for detailed information.
Happy Coding ♥