Extract a wealth of information from Ruby arrays
hopsoft/goldmine
Extract a wealth of information from lists.
Goldmine is especially helpful when working with source data that is difficult to query, e.g. CSV files, API results, etc.
- Data mining
- Data transformation
- Data blending
- Data visualization prep
- CSV report generation
gem install goldmine
require "goldmine"

list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
Goldmine(list)
  .pivot("< 5") { |i| i < 5 }
  .to_h

{ [["< 5", true]] => [1, 2, 3, 4], [["< 5", false]] => [5, 6, 7, 8, 9] }
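As a rough mental model, a single pivot behaves much like Ruby's Enumerable#group_by, with each group keyed by a [[name, value]] pair. The plain-Ruby sketch below reproduces the result above; simple_pivot is a hypothetical helper for illustration, not Goldmine's implementation.

```ruby
# Hypothetical helper, not part of Goldmine: groups items by the block's
# result, then wraps each key as [[name, value]] to match the output above.
def simple_pivot(list, name)
  list.group_by { |item| yield(item) }
      .transform_keys { |value| [[name, value]] }
end

list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
result = simple_pivot(list, "< 5") { |i| i < 5 }
# result == { [["< 5", true]] => [1, 2, 3, 4], [["< 5", false]] => [5, 6, 7, 8, 9] }
```

group_by keeps groups in order of first appearance, which is why true comes before false here.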
When the pivot block returns an Array, each record is grouped under every value in that Array.

users = [
  { :name => "Sally",   :favorite_colors => [:blue] },
  { :name => "John",    :favorite_colors => [:blue, :green] },
  { :name => "Stephen", :favorite_colors => [:red, :pink, :purple] },
  { :name => "Emily",   :favorite_colors => [:orange, :green] },
  { :name => "Joe",     :favorite_colors => [:red] }
]
Goldmine(users)
  .pivot(:favorite_color) { |record| record[:favorite_colors] }
  .to_h

{
  [:favorite_color, :blue] => [
    { :name => "Sally", :favorite_colors => [:blue] },
    { :name => "John", :favorite_colors => [:blue, :green] }
  ],
  [:favorite_color, :green] => [
    { :name => "John", :favorite_colors => [:blue, :green] },
    { :name => "Emily", :favorite_colors => [:orange, :green] }
  ],
  [:favorite_color, :red] => [
    { :name => "Stephen", :favorite_colors => [:red, :pink, :purple] },
    { :name => "Joe", :favorite_colors => [:red] }
  ],
  [:favorite_color, :pink] => [
    { :name => "Stephen", :favorite_colors => [:red, :pink, :purple] }
  ],
  [:favorite_color, :purple] => [
    { :name => "Stephen", :favorite_colors => [:red, :pink, :purple] }
  ],
  [:favorite_color, :orange] => [
    { :name => "Emily", :favorite_colors => [:orange, :green] }
  ]
}
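In the favorite-color example, John and Emily each land in two groups. A minimal sketch of that fan-out, assuming only that every element of the returned Array becomes its own group; fan_out_pivot is a hypothetical helper, and for brevity it keys groups by the bare value rather than Goldmine's key format.

```ruby
# Hypothetical helper (not part of Goldmine): files each record under every
# value the block returns. Array() wraps a scalar in a one-element Array;
# as a simplification it also turns nil into [], which drops that record.
def fan_out_pivot(records)
  records.each_with_object({}) do |record, groups|
    Array(yield(record)).each do |value|
      (groups[value] ||= []) << record
    end
  end
end

users = [
  { name: "Sally", favorite_colors: [:blue] },
  { name: "John",  favorite_colors: [:blue, :green] },
  { name: "Joe",   favorite_colors: [:red] }
]

by_color = fan_out_pivot(users) { |user| user[:favorite_colors] }
names = by_color.transform_values { |group| group.map { |user| user[:name] } }
# names == { blue: ["Sally", "John"], green: ["John"], red: ["Joe"] }
```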
Pivots can be chained. Each key then holds one [name, value] pair per pivot.

users = [
  { :name => "Sally",   :age => 21 },
  { :name => "John",    :age => 28 },
  { :name => "Stephen", :age => 37 },
  { :name => "Emily",   :age => 32 },
  { :name => "Joe",     :age => 18 }
]
Goldmine(users)
  .pivot("'e' in name") { |user| !!user[:name].match(/e/i) }
  .pivot("21 or over") { |user| user[:age] >= 21 }
  .to_h

{
  [["'e' in name", false], ["21 or over", true]] => [
    { :name => "Sally", :age => 21 },
    { :name => "John", :age => 28 }
  ],
  [["'e' in name", true], ["21 or over", true]] => [
    { :name => "Stephen", :age => 37 },
    { :name => "Emily", :age => 32 }
  ],
  [["'e' in name", true], ["21 or over", false]] => [
    { :name => "Joe", :age => 18 }
  ]
}
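Chaining pivots can be pictured as grouping by a composite key: one [name, result] pair per pivot, so each distinct combination gets its own group. The sketch below assumes that model; chained_pivot and its [name, lambda] input format are hypothetical, not Goldmine's API.

```ruby
# Hypothetical helper (not Goldmine's API): each pivot is a [name, lambda]
# pair; an item's key lists the [name, result] pair for every pivot.
def chained_pivot(items, pivots)
  items.group_by { |item| pivots.map { |name, fn| [name, fn.call(item)] } }
end

users = [
  { name: "Sally", age: 21 }, { name: "John", age: 28 },
  { name: "Stephen", age: 37 }, { name: "Emily", age: 32 },
  { name: "Joe", age: 18 }
]

pivots = [
  ["'e' in name", ->(u) { u[:name].match?(/e/i) }],
  ["21 or over",  ->(u) { u[:age] >= 21 }]
]

result = chained_pivot(users, pivots)
# result has 3 keys; the [false, true] group holds Sally and John.
```

Only combinations that actually occur become keys, which is why no group exists for a name without an "e" under 21.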
Rollups provide an intuitive way to aggregate pivoted data into a report-friendly format. Think computed columns.
Rollups are blocks that get executed once for each pivot entry. They can also be chained.
list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
Goldmine(list)
  .pivot("< 5") { |i| i < 5 }
  .pivot("even") { |i| i % 2 == 0 }
  .rollup("count", &:count)
  .to_h

{
  [["< 5", true], ["even", false]] => [["count", 2]],
  [["< 5", true], ["even", true]] => [["count", 2]],
  [["< 5", false], ["even", false]] => [["count", 3]],
  [["< 5", false], ["even", true]] => [["count", 2]]
}
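Conceptually, a rollup maps each pivot group to a list of [name, value] pairs, running each rollup block once per group. A minimal plain-Ruby sketch under that assumption; simple_rollup is hypothetical, and :count.to_proc stands in for the &:count shorthand used above.

```ruby
# Hypothetical helper (not Goldmine's API): runs every named rollup once
# per pivot group and stores the results as [name, value] pairs.
def simple_rollup(groups, rollups)
  groups.transform_values do |members|
    rollups.map { |name, fn| [name, fn.call(members)] }
  end
end

list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
groups = list.group_by { |i| [["< 5", i < 5], ["even", i.even?]] }
result = simple_rollup(groups, [["count", :count.to_proc]])
# result[[["< 5", false], ["even", false]]] == [["count", 3]]
```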
Rollups can be computationally expensive. Optional caching reduces this overhead: with cache: true, a rollup block can read the results of earlier rollups via cache[:name] instead of recomputing them.
list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
Goldmine(list, cache: true)
  .pivot(:less_than_5) { |i| i < 5 }
  .rollup(:count, &:count)
  .rollup(:evens) { |list| list.select { |i| i % 2 == 0 }.count }
  .rollup(:even_percentage) { |list| cache[:evens] / cache[:count].to_f }
  .to_h

{
  [[:less_than_5, true]] => [[:count, 4], [:evens, 2], [:even_percentage, 0.5]],
  [[:less_than_5, false]] => [[:count, 5], [:evens, 2], [:even_percentage, 0.4]]
}
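One way to picture the cache is a Hash per pivot group that collects each rollup's result under its name, so later rollups can read earlier ones. This is a sketch under that assumption only; cached_rollup and its two-argument lambdas are hypothetical and do not reflect how Goldmine exposes cache internally.

```ruby
# Hypothetical helper (not Goldmine's implementation): each pivot group gets
# its own cache Hash; every rollup result is stored under its name so later
# rollups can reuse it instead of recomputing.
def cached_rollup(groups, rollups)
  groups.transform_values do |members|
    cache = {}
    rollups.map do |name, fn|
      cache[name] = fn.call(members, cache)
      [name, cache[name]]
    end
  end
end

list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
groups = list.group_by { |i| [[:less_than_5, i < 5]] }
rollups = [
  [:count, ->(members, _cache) { members.count }],
  [:evens, ->(members, _cache) { members.count(&:even?) }],
  # Reads the two cached values above rather than recounting the group.
  [:even_percentage, ->(_members, cache) { cache[:evens] / cache[:count].to_f }]
]
result = cached_rollup(groups, rollups)
```

Because the cache is rebuilt per group, :even_percentage always sees the counts for its own group.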
It's often helpful to flatten rollups into rows.
list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
result = Goldmine(list, cache: true)
  .pivot(:less_than_5) { |i| i < 5 }
  .rollup(:count, &:count)
  .rollup(:evens) { |list| list.select { |i| i % 2 == 0 }.count }
  .rollup(:even_percentage) { |list| cache[:evens] / cache[:count].to_f }
  .result

result.to_rows

[
  [[:less_than_5, true], [:count, 4], [:evens, 2], [:even_percentage, 0.5]],
  [[:less_than_5, false], [:count, 5], [:evens, 2], [:even_percentage, 0.4]]
]

result.to_hash_rows

[
  { :less_than_5 => true, :count => 4, :evens => 2, :even_percentage => 0.5 },
  { :less_than_5 => false, :count => 5, :evens => 2, :even_percentage => 0.4 }
]
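Both row formats can be derived from a rolled-up Hash shaped like the to_h output above: concatenate each key's pivot pairs with its rollup pairs, then optionally turn the pairs into a Hash. The helpers rows_from and hash_rows_from are hypothetical names for this sketch, not Goldmine methods.

```ruby
# Hypothetical helpers (not part of Goldmine) that flatten a rolled-up Hash
# of { pivot_pairs => rollup_pairs } into rows.
def rows_from(rolled_up)
  rolled_up.map { |pivot_pairs, rollup_pairs| pivot_pairs + rollup_pairs }
end

def hash_rows_from(rolled_up)
  # Array#to_h turns each row of [name, value] pairs into a Hash.
  rows_from(rolled_up).map(&:to_h)
end

rolled_up = {
  [[:less_than_5, true]]  => [[:count, 4], [:evens, 2], [:even_percentage, 0.5]],
  [[:less_than_5, false]] => [[:count, 5], [:evens, 2], [:even_percentage, 0.4]]
}

rows = rows_from(rolled_up)
hash_rows = hash_rows_from(rolled_up)
# hash_rows.first == { less_than_5: true, count: 4, evens: 2, even_percentage: 0.5 }
```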
Rollups can also be converted into tabular format.
list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
Goldmine(list)
  .pivot(:less_than_5) { |i| i < 5 }
  .pivot(:even) { |i| i % 2 == 0 }
  .rollup(:count, &:count)
  .to_tabular

[
  [:less_than_5, :even, :count],
  [true, false, 2],
  [true, true, 2],
  [false, false, 3],
  [false, true, 2]
]
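Tabular output is a header row followed by one row of values per group. A minimal sketch that builds it from flattened rows, assuming every row lists the same names in the same order; tabular_from is a hypothetical helper, not Goldmine's implementation.

```ruby
# Hypothetical helper (not part of Goldmine): takes the header from the names
# in the first row's [name, value] pairs, then keeps only values per row.
def tabular_from(rows)
  return [] if rows.empty?

  header = rows.first.map(&:first)
  [header] + rows.map { |row| row.map(&:last) }
end

rows = [
  [[:less_than_5, true],  [:even, false], [:count, 2]],
  [[:less_than_5, true],  [:even, true],  [:count, 2]],
  [[:less_than_5, false], [:even, false], [:count, 3]],
  [[:less_than_5, false], [:even, true],  [:count, 2]]
]
table = tabular_from(rows)
```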
Rollups can also be converted into CSV format.
list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
Goldmine(list)
  .pivot(:less_than_5) { |i| i < 5 }
  .pivot(:even) { |i| i % 2 == 0 }
  .rollup(:count, &:count)
  .to_csv

"less_than_5,even,count\ntrue,false,2\ntrue,true,2\nfalse,false,3\nfalse,true,2\n"
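Tabular data maps directly onto CSV. The sketch below produces the same string from the tabular output using Ruby's CSV library; it illustrates the format only and is not a claim about how Goldmine's to_csv is implemented.

```ruby
require "csv"

# Tabular data: a header row followed by value rows.
table = [
  [:less_than_5, :even, :count],
  [true, false, 2], [true, true, 2], [false, false, 3], [false, true, 2]
]

# CSV.generate collects rows written with << into a String; non-String
# fields such as symbols and booleans are converted to their string form.
csv = CSV.generate do |out|
  table.each { |row| out << row }
end
```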
All examples are small Sinatra apps. They are designed to help communicate Goldmine use cases.
git clone git@github.com:hopsoft/goldmine.git
cd /path/to/goldmine
bundle
Uses data from https://github.com/hopsoft/goldmine/blob/master/examples/new_york_wifi_hotspots/DOITT_WIFI_HOTSPOT_01_13SEPT2010.csv
In this example, we mine out the following information.
- Total hotspots by city, zip, & area code
- Free hotspots by city, zip, & area code
- Paid hotspots by city, zip, & area code
- Library hotspots by city, zip, & area code
- Starbucks hotspots by city, zip, & area code
- McDonalds hotspots by city, zip, & area code
ruby examples/new_york_wifi_hotspots/app.rb
curl http://localhost:3000/raw
curl http://localhost:3000/pivoted
curl http://localhost:3000/rolled_up
curl http://localhost:3000/rows
curl http://localhost:3000/tabular
curl http://localhost:3000/csv
Uses data from http://dev.socrata.com/foundry/#/data.medicare.gov/aeay-dfax
In this example, we mine out the following information.
- Total doctors by state & specialty
- Preferred doctors by state & specialty
- Female doctors by state & specialty
- Male doctors by state & specialty
- Preferred female doctors by state & specialty
- Preferred male doctors by state & specialty
ruby examples/medicare_physician_compare/app.rb
curl http://localhost:3000/raw
curl http://localhost:3000/pivoted
curl http://localhost:3000/rolled_up
curl http://localhost:3000/rows
curl http://localhost:3000/tabular
curl http://localhost:3000/csv
The Medicare dataset is large & works well for performance testing.
My MacBook Pro yields the following benchmarks.
- 3.1 GHz Intel Core i7
- 16 GB 1867 MHz DDR3
                user     system      total        real
pivoted     0.630000   0.030000   0.660000 (  0.670409)
rolled_up   0.570000   0.030000   0.600000 (  0.626413)
rows        0.010000   0.000000   0.010000 (  0.003258)
tabular     0.010000   0.000000   0.010000 (  0.010110)
csv         0.050000   0.000000   0.050000 (  0.057677)
                user     system      total        real
pivoted     7.270000   0.300000   7.570000 (  8.053166)
rolled_up   6.800000   0.830000   7.630000 (  8.051707)
rows        0.000000   0.000000   0.000000 (  0.003934)
tabular     0.010000   0.000000   0.010000 (  0.011825)
csv         0.210000   0.010000   0.220000 (  0.222752)
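The user/system/total/real columns above match the report printed by Ruby's standard Benchmark module. A sketch of timing a pivot step and a rollup step that way; the workload here is a stand-in (plain group_by over generated integers), not the Medicare dataset, Goldmine itself, or the script that produced the figures above.

```ruby
require "benchmark"

# Stand-in workload: 200,000 integers pivoted on evenness, then counted.
data = (1..200_000).to_a
groups = nil
rolled_up = nil

# Benchmark.bm prints one labelled line per report and returns the
# Benchmark::Tms measurements in order.
reports = Benchmark.bm(10) do |x|
  x.report("pivoted")   { groups = data.group_by { |i| [[:even, i.even?]] } }
  x.report("rolled_up") { rolled_up = groups.transform_values { |g| [[:count, g.size]] } }
end
```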
Goldmine makes data highly malleable. It allows you to combine the power of pivots, rollups, tabular data, & CSV to construct deep insights with minimal effort.
Real world use cases include:
- Build a better understanding of database data before canonizing reports in SQL
- Create source data for building user interfaces & data visualizations
- Transform CSV data from one format to another