Rendering data as graphs
Learn how to visualize the programming languages from your repository using the D3.js library and Ruby Octokit.
In this guide, we're going to use the API to fetch information about repositories that we own, and the programming languages that make them up. Then, we'll visualize that information in a couple of different ways using the D3.js library. To interact with the GitHub API, we'll be using the excellent Ruby library, Octokit.
If you haven't already, you should read the Basics of Authentication guide before starting this example. You can find the complete source code for this project in the platform-samples repository.
Let's jump right in!
Setting up an OAuth app
First, register a new application on GitHub. Set the main and callback URLs to http://localhost:4567/. As before, we're going to handle authentication for the API by implementing a Rack middleware using sinatra-auth-github:
```ruby
require 'sinatra/base'
require 'sinatra/auth/github'

module Example
  class MyGraphApp < Sinatra::Base
    # !!! DO NOT EVER USE HARD-CODED VALUES IN A REAL APP !!!
    # Instead, set and test environment variables, like below
    # if ENV['GITHUB_CLIENT_ID'] && ENV['GITHUB_CLIENT_SECRET']
    #  CLIENT_ID        = ENV['GITHUB_CLIENT_ID']
    #  CLIENT_SECRET    = ENV['GITHUB_CLIENT_SECRET']
    # end

    CLIENT_ID = ENV['GH_GRAPH_CLIENT_ID']
    CLIENT_SECRET = ENV['GH_GRAPH_SECRET_ID']

    enable :sessions

    set :github_options, {
      :scopes       => "repo",
      :secret       => CLIENT_SECRET,
      :client_id    => CLIENT_ID,
      :callback_url => "/"
    }

    register Sinatra::Auth::Github

    get '/' do
      if !authenticated?
        authenticate!
      else
        # the OAuth token for the signed-in user
        access_token = github_user.token
      end
    end
  end
end
```

Set up a similar config.ru file as in the previous example:
```ruby
ENV['RACK_ENV'] ||= 'development'
require "rubygems"
require "bundler/setup"

require File.expand_path(File.join(File.dirname(__FILE__), 'server'))

run Example::MyGraphApp
```

Fetching repository information
This time, in order to talk to the GitHub API, we're going to use the Octokit Ruby library. This is much easier than directly making a bunch of REST calls. Plus, Octokit was developed by a GitHubber and is actively maintained.
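For a sense of what Octokit takes off your hands, here is a minimal sketch of the raw request behind its repositories call, built with Ruby's standard Net::HTTP library. The build_repos_request helper and the YOUR_TOKEN placeholder are hypothetical, and the sketch only constructs the request: sending it, parsing the JSON, following pagination links, and handling errors would all be up to you.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: builds (but does not send) the raw REST request
# for the authenticated user's repositories.
def build_repos_request(token, page: 1)
  uri = URI("https://api.github.com/user/repos?per_page=100&page=#{page}")
  request = Net::HTTP::Get.new(uri)
  request["Accept"] = "application/vnd.github+json"
  request["Authorization"] = "Bearer #{token}" # placeholder token, not a real credential
  request
end

request = build_repos_request("YOUR_TOKEN")
puts request.path # => /user/repos?per_page=100&page=1
```

To actually send it, you would wrap it in something like Net::HTTP.start("api.github.com", 443, use_ssl: true) { |http| http.request(request) } and then deal with the response yourself.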
Authentication with the API via Octokit is easy. Just pass your login and token to the Octokit::Client constructor:
```ruby
# (add require 'octokit' at the top of server.rb)
if !authenticated?
  authenticate!
else
  # Current Octokit releases take the token as :access_token;
  # older 1.x releases called this option :oauth_token
  octokit_client = Octokit::Client.new(:login => github_user.login, :access_token => github_user.token)
end
```

Let's do something interesting with the data about our repositories. We're going to see the different programming languages they use, and count which ones are used most often. To do that, we'll first need a list of our repositories from the API. With Octokit, that looks like this:
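```ruby
# returns the first page of results; set Octokit.auto_paginate = true to fetch every page
repos = octokit_client.repositories
```

Next, we'll iterate over each repository, and count the language that GitHub associates with it: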
```ruby
language_obj = {}
repos.each do |repo|
  # sometimes language can be nil
  if repo.language
    if !language_obj[repo.language]
      language_obj[repo.language] = 1
    else
      language_obj[repo.language] += 1
    end
  end
end

# the route's last expression becomes the response body
language_obj.to_s
```

When you restart your server, your web page should display something that looks like this:
```
{"JavaScript"=>13, "PHP"=>1, "Perl"=>1, "CoffeeScript"=>2, "Python"=>1, "Java"=>3, "Ruby"=>3, "Go"=>1, "C++"=>1}
```

So far, so good, but not very human-friendly. A visualization would help us understand how these language counts are distributed. Let's feed our counts into D3 to get a neat bar graph representing the popularity of the languages we use.
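On Ruby 2.7 or later, the counting loop above can be condensed with filter_map and tally. Repo is a hypothetical Struct standing in for Octokit's repository objects so the sketch runs offline; only its language field matters here.

```ruby
# Hypothetical stand-in for Octokit's repository resources
Repo = Struct.new(:name, :language)

repos = [
  Repo.new("site", "JavaScript"),
  Repo.new("api", "Ruby"),
  Repo.new("notes", nil), # language can be nil
  Repo.new("widgets", "JavaScript")
]

# filter_map drops the nil languages; tally counts how often each one appears
language_obj = repos.filter_map(&:language).tally
p language_obj
```

Because filter_map skips nil values, it replaces the explicit if repo.language check, and tally replaces the manual initialize-or-increment branches.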
Visualizing language counts
D3.js, or just D3, is a comprehensive library for creating many kinds of charts, graphs, and interactive visualizations. Using D3 in detail is beyond the scope of this guide, but for a good introductory article, check out D3 for Mortals.
D3 is a JavaScript library, and likes working with data as arrays. So, let's convert our Ruby hash into a JSON array for use by JavaScript in the browser.
```ruby
# to_json needs require 'json' at the top of the file
languages = []
language_obj.each do |lang, count|
  languages.push :language => lang, :count => count
end

erb :lang_freq, :locals => { :languages => languages.to_json }
```

We're simply iterating over each key-value pair in our object and pushing them into a new array. We didn't do this earlier because we didn't want to iterate over our language_obj object while we were creating it.
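To see exactly what the template receives, this sketch runs the same hash-to-array transformation on a small, made-up language_obj and prints the resulting JSON string. It uses Enumerable#map instead of each and push, which builds the new array in a single expression.

```ruby
require "json"

# Made-up counts for illustration
language_obj = { "JavaScript" => 13, "Ruby" => 3 }

# One hash per language, same shape as the each/push loop produces
languages = language_obj.map { |lang, count| { language: lang, count: count } }

json = languages.to_json
puts json # => [{"language":"JavaScript","count":13},{"language":"Ruby","count":3}]
```

Symbol keys such as :language become plain strings in the JSON output, which is what the JavaScript side reads as datum.language and datum.count.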
Now, lang_freq.erb (which Sinatra looks for in the views directory by default) is going to need some JavaScript to support rendering a bar graph. For now, you can just use the code provided here, and refer to the resources linked above if you want to learn more about how D3 works:
```html
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <script src="//cdnjs.cloudflare.com/ajax/libs/d3/3.0.1/d3.v3.min.js"></script>
  <style>
    svg { padding: 20px; }
    rect { fill: #2d578b }
    text { fill: white; }
    text.yAxis {
      font-size: 12px;
      font-family: Helvetica, sans-serif;
      fill: black;
    }
  </style>
</head>
<body>
  <p>Check this sweet data out:</p>
  <div id="lang_freq"></div>

  <script>
    var data = <%= languages %>;

    var barWidth = 40;
    var width = (barWidth + 10) * data.length;
    var height = 300;

    var x = d3.scale.linear().domain([0, data.length]).range([0, width]);
    var y = d3.scale.linear()
      .domain([0, d3.max(data, function(datum) { return datum.count; })])
      .rangeRound([0, height]);

    // add the canvas to the DOM
    var languageBars = d3.select("#lang_freq")
      .append("svg:svg")
      .attr("width", width)
      .attr("height", height);

    // one bar per language
    languageBars.selectAll("rect")
      .data(data)
      .enter()
      .append("svg:rect")
      .attr("x", function(datum, index) { return x(index); })
      .attr("y", function(datum) { return height - y(datum.count); })
      .attr("height", function(datum) { return y(datum.count); })
      .attr("width", barWidth);

    // the count, drawn inside the top of each bar
    languageBars.selectAll("text")
      .data(data)
      .enter()
      .append("svg:text")
      .attr("x", function(datum, index) { return x(index) + barWidth; })
      .attr("y", function(datum) { return height - y(datum.count); })
      .attr("dx", -barWidth/2)
      .attr("dy", "1.2em")
      .attr("text-anchor", "middle")
      .text(function(datum) { return datum.count; });

    // the language name, drawn under each bar
    languageBars.selectAll("text.yAxis")
      .data(data)
      .enter()
      .append("svg:text")
      .attr("x", function(datum, index) { return x(index) + barWidth; })
      .attr("y", height)
      .attr("dx", -barWidth/2)
      .attr("text-anchor", "middle")
      .text(function(datum) { return datum.language; })
      .attr("transform", "translate(0, 18)")
      .attr("class", "yAxis");
  </script>
</body>
</html>
```

Phew! Again, don't worry about what most of this code is doing. The relevant part is the first line inside the script tag, which assigns <%= languages %> to data. When the template renders, ERB replaces that tag with the JSON string we passed in as the languages local, so the browser receives our languages array as a plain JavaScript array literal.
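ERB does no JavaScript-specific processing here: it pastes the JSON string into the script verbatim. This sketch uses Ruby's standard-library ERB directly (not Sinatra) on a one-line stand-in for the template to show that substitution; result_with_hash, available since Ruby 2.5, exposes each key as a local variable, much as Sinatra's :locals option does.

```ruby
require "erb"
require "json"

# One-line stand-in for the relevant part of lang_freq.erb
template = "var data = <%= languages %>;"

languages = [{ language: "Ruby", count: 3 }].to_json

rendered = ERB.new(template).result_with_hash(languages: languages)
puts rendered # => var data = [{"language":"Ruby","count":3}];
```

Because the JSON is inserted without escaping, a value containing a closing script tag would break the page; language names are unlikely to contain one, but keep it in mind if you reuse this pattern with user-supplied data.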
As the "D3 for Mortals" guide suggests, this isn't necessarily the best use of D3. But it does illustrate how you can use the library, along with Octokit, to build visualizations from your own GitHub data.
Combining different API calls
Now it's time for a confession: the language attribute on a repository only identifies its "primary" language. That means that if you have a repository that combines several languages, the one with the most bytes of code is considered to be the primary language.
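Taking that rule at face value, picking the primary language from a set of byte counts is a one-liner. The byte counts below are invented for illustration, and this is a simplified model: GitHub's language detection (Linguist) also leaves out files such as vendored and generated code before the totals are compared.

```ruby
# Invented byte counts, shaped like a languages endpoint response
repo_langs = { "Ruby" => 48_210, "JavaScript" => 91_004, "CSS" => 3_120 }

# Under the rule above, the primary language is the one with the most bytes
primary, _bytes = repo_langs.max_by { |_lang, bytes| bytes }
puts primary # => JavaScript
```

This is exactly why counting primary languages undercounts Ruby here: the repository contributes 48,210 bytes of Ruby, yet it only ever adds one to the JavaScript tally.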
Let's combine a few API calls to get a true representation of which language has the greatest number of bytes written across all our code. A treemap should be a great way to visualize the sizes of our coding languages used, rather than simply the count. We'll need to construct an array of objects that looks something like this:
```
[ { "name": "language1", "count": 100 },
  { "name": "language2", "count": 23 }
  ...
]
```

Since we already have a list of repositories above, let's inspect each one, and call the GET /repos/{owner}/{repo}/languages endpoint:
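```ruby
# start a fresh hash so byte totals don't mix with the earlier repository counts
language_obj = {}

repos.each do |repo|
  # full_name is "owner/repo", which also works for repositories you don't own
  repo_langs = octokit_client.languages(repo.full_name)
end
```

From there, inside that loop, we'll cumulatively add each language's byte count to a running total: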
```ruby
# runs once per language found in the current repository
repo_langs.each do |lang, count|
  if !language_obj[lang]
    language_obj[lang] = count
  else
    language_obj[lang] += count
  end
end
```

After that, we'll format the contents into a structure that D3 understands:
```ruby
language_byte_count = []
language_obj.each do |lang, count|
  language_byte_count.push :name => "#{lang} (#{count})", :count => count
end

# some mandatory formatting for D3
language_bytes = [ { :name => "language_bytes", :elements => language_byte_count } ]
```

(For more information on D3 treemap magic, check out this simple tutorial.)
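The accumulation and formatting steps above can be condensed into one self-contained sketch. The per-repository byte counts are made up; in the app they would come from octokit_client.languages for each repository. Hash.new(0) gives every missing language a starting value of zero, which removes the need for the if/else branch.

```ruby
require "json"

# Made-up results from the languages endpoint, one hash per repository
per_repo = [
  { "Ruby" => 1200, "JavaScript" => 300 },
  { "Ruby" => 800 },
  { "Go" => 500, "JavaScript" => 200 }
]

# Sum bytes per language across all repositories
language_obj = Hash.new(0)
per_repo.each do |repo_langs|
  repo_langs.each { |lang, bytes| language_obj[lang] += bytes }
end

language_byte_count = language_obj.map do |lang, count|
  { name: "#{lang} (#{count})", count: count }
end

# Root node whose elements the treemap's children accessor (d.elements) reads
language_bytes = [{ name: "language_bytes", elements: language_byte_count }]
puts language_bytes.to_json
```

Note that Hash.new(0) only supplies a default on lookup; it doesn't store zeros for languages you never touch, so only languages that actually appear end up in the treemap.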
To wrap up, we pass this JSON information over to the same ERB template:
```ruby
erb :lang_freq, :locals => { :languages => languages.to_json, :language_byte_count => language_bytes.to_json }
```

Like before, here's a bunch of JavaScript that you can drop directly into your template:
```html
<div id="byte_freq"></div>
<script>
  var language_bytes = <%= language_byte_count %>;
  var childrenFunction = function(d) { return d.elements; };
  var sizeFunction = function(d) { return d.count; };
  var colorFunction = function(d) { return Math.floor(Math.random() * 20); };
  var nameFunction = function(d) { return d.name; };

  var color = d3.scale.linear()
    .domain([0, 10, 15, 20])
    .range(["grey", "green", "yellow", "red"]);

  drawTreemap(5000, 2000, '#byte_freq', language_bytes, childrenFunction, nameFunction, sizeFunction, colorFunction, color);

  function drawTreemap(height, width, elementSelector, language_bytes, childrenFunction, nameFunction, sizeFunction, colorFunction, colorScale) {
    var treemap = d3.layout.treemap()
      .children(childrenFunction)
      .size([width, height])
      .value(sizeFunction);

    var div = d3.select(elementSelector)
      .append("div")
      .style("position", "relative")
      .style("width", width + "px")
      .style("height", height + "px");

    div.data(language_bytes).selectAll("div")
      .data(function(d) { return treemap.nodes(d); })
      .enter()
      .append("div")
      .attr("class", "cell")
      .style("background", function(d) { return colorScale(colorFunction(d)); })
      .call(cell)
      .text(nameFunction);
  }

  function cell() {
    this
      // absolute positioning so left/top place each cell inside the relative container
      .style("position", "absolute")
      .style("left", function(d) { return d.x + "px"; })
      .style("top", function(d) { return d.y + "px"; })
      .style("width", function(d) { return d.dx - 1 + "px"; })
      .style("height", function(d) { return d.dy - 1 + "px"; });
  }
</script>
```

Et voilà! Beautiful rectangles containing your repo languages, with relative proportions that are easy to see at a glance. You might need to tweak the height and width of your treemap, passed as the first two arguments to drawTreemap above, to get all the information to show up properly.
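To see why the rectangle areas track byte counts, here is a minimal Ruby sketch of a slice layout, which divides a single row into widths proportional to each value. This is a deliberately simpler algorithm than D3's treemap, whose default mode is squarify, and slice_layout is a hypothetical helper; what the two share is that each rectangle's area is proportional to its value, which is also why a larger width and height leave more room for the smaller languages' labels.

```ruby
# Hypothetical helper: lays values out left to right in one row,
# giving each a width proportional to its share of the total.
def slice_layout(values, width, height)
  total = values.sum.to_f
  x = 0.0
  values.map do |value|
    dx = width * value / total
    rect = { x: x, y: 0.0, dx: dx, dy: height.to_f }
    x += dx
    rect
  end
end

# Byte totals for three languages, laid out in a 1000 x 400 area
rects = slice_layout([2000, 500, 500], 1000, 400)
rects.each { |r| puts format("x=%.1f width=%.1f", r[:x], r[:dx]) }
```

The first language has four times the bytes of the others, so its rectangle comes out four times as wide and, with equal heights, four times the area.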