A simple script to visualize the number of GitHub repositories created in a timeline for a given keyword and programming language. The primary idea is to get some statistics about the usage of libraries across different languages.
pavitrakumar78/Language-and-Library-usage-analysis-of-GitHub-repositories
I construct a GitHub search link from the input parameters given by the user and extract the necessary content from the page's HTML using BeautifulSoup. Multiple searches are made depending on the timeline parameters (monthly or yearly). My intention was to find the usage of the same libraries in different languages (demonstrated in example 1). But since the search behaves exactly like the search bar on GitHub, it can also be used to search for function names, repo names, etc.
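As a rough illustration of the "multiple searches per timeline" idea, here is a minimal sketch that splits a date range into monthly or yearly windows and builds one search URL per window. It is not the script's actual code: the helper names `date_windows` and `build_search_url` are hypothetical, the end date is assumed to be exclusive, and the use of the `language:` and `created:` search qualifiers together with `type=Repositories` is an assumption about how such a link could be put together.

```python
from datetime import date, datetime, timedelta
from urllib.parse import urlencode


def date_windows(created_from, created_till, interval="month"):
    """Split [created_from, created_till) into month or year windows.

    Returns (first_day, last_day) pairs as YYYY-MM-DD strings. Both days are
    inclusive, so adjacent windows never share a date.
    """
    if interval not in ("month", "year"):
        raise ValueError("interval must be 'month' or 'year'")
    start = datetime.strptime(created_from, "%Y-%m-%d").date()
    end = datetime.strptime(created_till, "%Y-%m-%d").date()
    windows = []
    while start < end:
        if interval == "month":
            # First day of the following month (December rolls over to January).
            nxt = date(start.year + start.month // 12, start.month % 12 + 1, 1)
        else:
            nxt = date(start.year + 1, 1, 1)
        last = min(nxt, end) - timedelta(days=1)
        windows.append((start.isoformat(), last.isoformat()))
        start = nxt
    return windows


def build_search_url(search_string, language, first_day, last_day):
    """Build a GitHub search-page URL for one window (hypothetical helper)."""
    query = "{} language:{} created:{}..{}".format(
        search_string, language, first_day, last_day
    )
    # A list of pairs keeps the parameter order stable on older Python versions.
    return "https://github.com/search?" + urlencode(
        [("q", query), ("type", "Repositories")]
    )


if __name__ == "__main__":
    for first_day, last_day in date_windows("2016-02-01", "2016-05-01", "month"):
        print(build_search_url("torch", "lua", first_day, last_day))
```

Each URL would then be fetched and its HTML parsed for the result count; that part depends on GitHub's page markup and is left out here.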
Note: I do not think a general search (one that does not name a repository or a user) is possible through GitHub's API, which is why I crawl the search pages directly. If you try searching through the API without naming anything (as we do in the search bar), you will get this. Another downside is that the script cannot make more than 10 requests per minute (a sleep has been added so that longer runs stay under this limit, but long timelines will take more time).
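One way such a sleep could be organized is a small throttle that spaces requests evenly. This is only a sketch under assumptions: the `Throttle` class is hypothetical and not taken from the script, and the limit of 10 per minute is simply the figure quoted above. The clock and sleep functions are injectable so the behaviour can be checked without actually waiting.

```python
import time


class Throttle:
    """Space calls at least 60 / max_per_minute seconds apart."""

    def __init__(self, max_per_minute=10, clock=time.monotonic, sleep=time.sleep):
        self.min_gap = 60.0 / max_per_minute
        self.clock = clock
        self.sleep = sleep
        self.last_call = None  # time of the previous call, if any

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_gap - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now


if __name__ == "__main__":
    throttle = Throttle(max_per_minute=10)
    for i in range(3):
        throttle.wait()  # call this right before each page request
        print("request", i, "at", round(time.monotonic(), 1))
```

With 10 requests per minute the gap is 6 seconds, so a monthly timeline covering 14 months would take a little over a minute of waiting in total.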
Example 1:

    search_language = "lua"
    created_from = "2016-02-01"  # YYYY-MM-DD
    created_till = "2017-04-01"
    search_string = "torch"
    intreval = "month"  # "month" or "year" - display/generate statistics yearly or monthly
    # plot only top x languages
    top_x = 5
    stats = get_statistics(search_string, search_language, created_from, created_till)
    plot_stats(stats, top_x)

Example 2:

    search_language = "python"
    created_from = "2013-02-01"  # YYYY-MM-DD
    created_till = "2017-04-01"
    search_string = "deep learning"
    intreval = "year"  # "month" or "year" - display/generate statistics yearly or monthly
    # plot only top x languages/technologies
    top_x = 7
    stats = get_statistics(search_string, search_language, created_from, created_till)
    plot_stats(stats, top_x)
Requirements:

- Python 3.5
- urllib
- BeautifulSoup
- pandas
- matplotlib
- tqdm