Extract a wealth of information from Ruby arrays

hopsoft/goldmine


Goldmine

Extract a wealth of information from lists.

Goldmine is especially helpful when working with source data that is difficult to query, e.g. CSV files, API results, etc.

Uses

  • Data mining
  • Data transformation
  • Data blending
  • Data visualization prep
  • CSV report generation

Quick Start

```sh
gem install goldmine
```

```ruby
require "goldmine"

list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
Goldmine(list).pivot("< 5") { |i| i < 5 }.to_h
# => {[["< 5", true]]=>[1, 2, 3, 4], [["< 5", false]]=>[5, 6, 7, 8, 9]}
```
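To make the result shape concrete, here is a plain-Ruby approximation of a single pivot built on `Enumerable#group_by`. It is a sketch that assumes a pivot simply groups items under a key of `[[name, block_result]]`; `simple_pivot` is a hypothetical helper, not part of Goldmine, and the gem itself may handle more cases than this.

```ruby
# Hypothetical helper, not Goldmine's implementation: a single pivot modeled as
# group_by, where each key is a list holding one [pivot_name, block_result] pair.
def simple_pivot(list, name)
  list.group_by { |item| [[name, yield(item)]] }
end

list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
result = simple_pivot(list, "< 5") { |i| i < 5 }
# result == {[["< 5", true]] => [1, 2, 3, 4], [["< 5", false]] => [5, 6, 7, 8, 9]}
```

`group_by` keeps keys in first-seen order, which is why the `true` group comes first here.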

Array Value Pivots

```ruby
users = [
  { :name => "Sally",   :favorite_colors => [:blue] },
  { :name => "John",    :favorite_colors => [:blue, :green] },
  { :name => "Stephen", :favorite_colors => [:red, :pink, :purple] },
  { :name => "Emily",   :favorite_colors => [:orange, :green] },
  { :name => "Joe",     :favorite_colors => [:red] }
]

Goldmine(users).pivot(:favorite_color) { |record| record[:favorite_colors] }.to_h
# => {
#   [:favorite_color, :blue]   => [{:name=>"Sally", :favorite_colors=>[:blue]}, {:name=>"John", :favorite_colors=>[:blue, :green]}],
#   [:favorite_color, :green]  => [{:name=>"John", :favorite_colors=>[:blue, :green]}, {:name=>"Emily", :favorite_colors=>[:orange, :green]}],
#   [:favorite_color, :red]    => [{:name=>"Stephen", :favorite_colors=>[:red, :pink, :purple]}, {:name=>"Joe", :favorite_colors=>[:red]}],
#   [:favorite_color, :pink]   => [{:name=>"Stephen", :favorite_colors=>[:red, :pink, :purple]}],
#   [:favorite_color, :purple] => [{:name=>"Stephen", :favorite_colors=>[:red, :pink, :purple]}],
#   [:favorite_color, :orange] => [{:name=>"Emily", :favorite_colors=>[:orange, :green]}]
# }
```
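The output above shows that when the pivot block returns an array, a record is filed under each of its values. A minimal plain-Ruby sketch of that behavior, assuming non-array results are treated as a single value; `array_value_pivot` is hypothetical and only mirrors the output shown, not Goldmine's code.

```ruby
# Hypothetical helper, not part of Goldmine: files each record under every
# value its block returns, wrapping non-Array results in a one-element Array.
def array_value_pivot(list, name)
  list.each_with_object({}) do |item, groups|
    value = yield(item)
    values = value.is_a?(Array) ? value : [value]
    values.each { |v| (groups[[name, v]] ||= []) << item }
  end
end

users = [
  { name: "Sally",   favorite_colors: [:blue] },
  { name: "John",    favorite_colors: [:blue, :green] },
  { name: "Stephen", favorite_colors: [:red, :pink, :purple] },
  { name: "Emily",   favorite_colors: [:orange, :green] },
  { name: "Joe",     favorite_colors: [:red] }
]

by_color = array_value_pivot(users, :favorite_color) { |user| user[:favorite_colors] }
by_color.each { |key, group| puts "#{key.inspect}: #{group.map { |u| u[:name] }.join(', ')}" }
# [:favorite_color, :blue]: Sally, John
# [:favorite_color, :green]: John, Emily
# ...
```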

Chained pivots

```ruby
users = [
  { :name => "Sally",   :age => 21 },
  { :name => "John",    :age => 28 },
  { :name => "Stephen", :age => 37 },
  { :name => "Emily",   :age => 32 },
  { :name => "Joe",     :age => 18 }
]

Goldmine(users)
  .pivot("'e' in name") { |user| !!user[:name].match(/e/i) }
  .pivot("21 or over") { |user| user[:age] >= 21 }
  .to_h
# => {
#   [["'e' in name", false], ["21 or over", true]] => [{:name=>"Sally", :age=>21}, {:name=>"John", :age=>28}],
#   [["'e' in name", true], ["21 or over", true]]  => [{:name=>"Stephen", :age=>37}, {:name=>"Emily", :age=>32}],
#   [["'e' in name", true], ["21 or over", false]] => [{:name=>"Joe", :age=>18}]
# }
```
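Each chained pivot adds one `[name, value]` pair to the group key. A plain-Ruby sketch of that idea, assuming the pivots are given as an ordered hash of named lambdas; `chained_pivot` is a hypothetical helper, while Goldmine's own API is the `.pivot(...).pivot(...)` chain shown above.

```ruby
# Hypothetical helper, not Goldmine's implementation: apply several named
# pivot lambdas at once; the key holds one [name, result] pair per pivot, in order.
def chained_pivot(list, pivots)
  list.group_by do |item|
    pivots.map { |name, fn| [name, fn.call(item)] }
  end
end

users = [
  { name: "Sally",   age: 21 },
  { name: "John",    age: 28 },
  { name: "Stephen", age: 37 },
  { name: "Emily",   age: 32 },
  { name: "Joe",     age: 18 }
]

pivots = {
  "'e' in name" => ->(user) { !!user[:name].match(/e/i) },
  "21 or over"  => ->(user) { user[:age] >= 21 }
}

result = chained_pivot(users, pivots)
result.each { |key, group| puts "#{key.inspect} => #{group.map { |u| u[:name] }.inspect}" }
```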

Rollups

Rollups provide an intuitive way to aggregate pivoted data into a report-friendly format. Think computed columns.

Rollups are blocks that get executed once for each pivot entry. They can also be chained.

```ruby
list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
Goldmine(list)
  .pivot("< 5") { |i| i < 5 }
  .pivot("even") { |i| i % 2 == 0 }
  .rollup("count", &:count)
  .to_h
# => {
#   [["< 5", true], ["even", false]]  => [["count", 2]],
#   [["< 5", true], ["even", true]]   => [["count", 2]],
#   [["< 5", false], ["even", false]] => [["count", 3]],
#   [["< 5", false], ["even", true]]  => [["count", 2]]
# }
```
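Conceptually, a rollup replaces each pivot group with a list of `[rollup_name, value]` pairs. The sketch below reproduces the output above in plain Ruby, assuming a rollup is just a named callable run once per group; it is an illustration, not how Goldmine implements rollups.

```ruby
# Sketch only: pivot with group_by, then run each named rollup once per group.
list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
pivoted = list.group_by { |i| [["< 5", i < 5], ["even", i % 2 == 0]] }

# Each rollup is a [name, callable] pair; &:count in the README corresponds to :count.to_proc.
rollups = [["count", :count.to_proc]]

rolled_up = pivoted.transform_values do |items|
  rollups.map { |name, fn| [name, fn.call(items)] }
end
```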

Rollup Caching

Rollups can be computationally expensive. Optional caching lets later rollups reuse the results of earlier ones, reducing this overhead.

```ruby
list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
Goldmine(list, cache: true)
  .pivot(:less_than_5) { |i| i < 5 }
  .rollup(:count, &:count)
  .rollup(:evens) { |list| list.select { |i| i % 2 == 0 }.count }
  .rollup(:even_percentage) { |list| cache[:evens] / cache[:count].to_f }
  .to_h
# => {
#   [[:less_than_5, true]]  => [[:count, 4], [:evens, 2], [:even_percentage, 0.5]],
#   [[:less_than_5, false]] => [[:count, 5], [:evens, 2], [:even_percentage, 0.4]]
# }
```
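The `:even_percentage` rollup above reads `cache[:evens]` and `cache[:count]` instead of recounting. A minimal sketch of that pattern, assuming each pivot group gets its own fresh cache hash in which every rollup's result is stored under its name; `rollup_with_cache` is hypothetical and the gem's internals may differ.

```ruby
# Hypothetical helper, not Goldmine's implementation: each group gets a fresh
# cache Hash; every rollup result is stored under its name for later rollups.
def rollup_with_cache(groups, rollups)
  groups.transform_values do |items|
    cache = {}
    rollups.map do |name, fn|
      cache[name] = fn.call(items, cache)
      [name, cache[name]]
    end
  end
end

groups = [1, 2, 3, 4, 5, 6, 7, 8, 9].group_by { |i| [[:less_than_5, i < 5]] }

# Rollups run in insertion order, so :even_percentage can read the cached values.
rollups = {
  count:           ->(items, _cache) { items.count },
  evens:           ->(items, _cache) { items.count(&:even?) },
  even_percentage: ->(_items, cache) { cache[:evens] / cache[:count].to_f }
}

result = rollup_with_cache(groups, rollups)
```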

Rows

It's often helpful to flatten rollups into rows.

```ruby
list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
result = Goldmine(list, cache: true)
  .pivot(:less_than_5) { |i| i < 5 }
  .rollup(:count, &:count)
  .rollup(:evens) { |list| list.select { |i| i % 2 == 0 }.count }
  .rollup(:even_percentage) { |list| cache[:evens] / cache[:count].to_f }
  .result

result.to_rows
# => [
#   [[:less_than_5, true], [:count, 4], [:evens, 2], [:even_percentage, 0.5]],
#   [[:less_than_5, false], [:count, 5], [:evens, 2], [:even_percentage, 0.4]]
# ]

result.to_hash_rows
# => [
#   {:less_than_5=>true, :count=>4, :evens=>2, :even_percentage=>0.5},
#   {:less_than_5=>false, :count=>5, :evens=>2, :even_percentage=>0.4}
# ]
```
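Comparing the two outputs suggests what flattening does: a row is the pivot key pairs followed by the rollup pairs, and a hash row is that same list of pairs converted to a `Hash`. A plain-Ruby sketch under that assumption, not Goldmine's own code:

```ruby
# Rolled-up data in the shape shown above: pivot key => list of [rollup_name, value].
rolled_up = {
  [[:less_than_5, true]]  => [[:count, 4], [:evens, 2], [:even_percentage, 0.5]],
  [[:less_than_5, false]] => [[:count, 5], [:evens, 2], [:even_percentage, 0.4]]
}

# A row is the pivot key pairs followed by the rollup pairs.
rows = rolled_up.map { |key, rollups| key + rollups }

# A hash row turns each row's [name, value] pairs into a Hash.
hash_rows = rows.map(&:to_h)
```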

Tabular

Rollups can also be converted into tabular format.

```ruby
list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
Goldmine(list)
  .pivot(:less_than_5) { |i| i < 5 }
  .pivot(:even) { |i| i % 2 == 0 }
  .rollup(:count, &:count)
  .to_tabular
# => [[:less_than_5, :even, :count], [true, false, 2], [true, true, 2], [false, false, 3], [false, true, 2]]
```
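The tabular form is a header row of column names followed by one row of values per group. A plain-Ruby sketch that rebuilds the output above, assuming every row has the same columns in the same order; this illustrates the shape and is not the gem's implementation.

```ruby
# Build rows of [name, value] pairs (pivot keys plus a count rollup).
list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
pivoted = list.group_by { |i| [[:less_than_5, i < 5], [:even, i % 2 == 0]] }
rows = pivoted.map { |key, items| key + [[:count, items.count]] }

# Assumes every row shares the same columns in the same order.
header = rows.first.map(&:first)
body = rows.map { |row| row.map(&:last) }
tabular = [header] + body
```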

CSV

Rollups can also be converted into CSV format.

```ruby
list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
Goldmine(list)
  .pivot(:less_than_5) { |i| i < 5 }
  .pivot(:even) { |i| i % 2 == 0 }
  .rollup(:count, &:count)
  .to_csv
# => "less_than_5,even,count\ntrue,false,2\ntrue,true,2\nfalse,false,3\nfalse,true,2\n"
```

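If you already have tabular data, Ruby's standard `CSV` library can produce the same string shown above. This sketch assumes the tabular output from the previous section as input and stringifies each field explicitly; it is not necessarily how Goldmine builds its CSV.

```ruby
require "csv"

# Tabular data in the shape produced by to_tabular above.
tabular = [
  [:less_than_5, :even, :count],
  [true, false, 2],
  [true, true, 2],
  [false, false, 3],
  [false, true, 2]
]

csv = CSV.generate do |out|
  # Convert symbols, booleans, and integers to strings before writing each row.
  tabular.each { |row| out << row.map(&:to_s) }
end

print csv
```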
Example Apps

All examples are small Sinatra apps. They are designed to help communicate Goldmine use cases.

Setup

```sh
git clone git@github.com:hopsoft/goldmine.git
cd /path/to/goldmine
bundle
```

The New York wifi hotspots example uses data from https://github.com/hopsoft/goldmine/blob/master/examples/new_york_wifi_hotspots/DOITT_WIFI_HOTSPOT_01_13SEPT2010.csv

In this example, we mine out the following information.

  • Total hotspots by city, zip, & area code
  • Free hotspots by city, zip, & area code
  • Paid hotspots by city, zip, & area code
  • Library hotspots by city, zip, & area code
  • Starbucks hotspots by city, zip, & area code
  • McDonalds hotspots by city, zip, & area code
```sh
ruby examples/new_york_wifi_hotspots/app.rb
```

```sh
curl http://localhost:3000/raw
curl http://localhost:3000/pivoted
curl http://localhost:3000/rolled_up
curl http://localhost:3000/rows
curl http://localhost:3000/tabular
curl http://localhost:3000/csv
```

The Medicare physician compare example uses data from http://dev.socrata.com/foundry/#/data.medicare.gov/aeay-dfax

In this example, we mine out the following information.

  • Total doctors by state & specialty
  • Preferred doctors by state & specialty
  • Female doctors by state & specialty
  • Male doctors by state & specialty
  • Preferred female doctors by state & specialty
  • Preferred male doctors by state & specialty
```sh
ruby examples/medicare_physician_compare/app.rb
```

```sh
curl http://localhost:3000/raw
curl http://localhost:3000/pivoted
curl http://localhost:3000/rolled_up
curl http://localhost:3000/rows
curl http://localhost:3000/tabular
curl http://localhost:3000/csv
```

Performance

The Medicare dataset is large & works well for performance testing.

My MacBook Pro, with the specs below, yields the following benchmarks.

  • 3.1 GHz Intel Core i7
  • 16 GB 1867 MHz DDR3
100,000 Records

```
                      user     system      total        real
pivoted           0.630000   0.030000   0.660000 (  0.670409)
rolled_up         0.570000   0.030000   0.600000 (  0.626413)
rows              0.010000   0.000000   0.010000 (  0.003258)
tabular           0.010000   0.000000   0.010000 (  0.010110)
csv               0.050000   0.000000   0.050000 (  0.057677)
```

1,000,000 Records

```
                      user     system      total        real
pivoted           7.270000   0.300000   7.570000 (  8.053166)
rolled_up         6.800000   0.830000   7.630000 (  8.051707)
rows              0.000000   0.000000   0.000000 (  0.003934)
tabular           0.010000   0.000000   0.010000 (  0.011825)
csv               0.210000   0.010000   0.220000 (  0.222752)
```
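The tables above follow the column layout of Ruby's standard `Benchmark.bm` report (user, system, total, and real seconds). A sketch of how to time a pivot and a rollup the same way, using hypothetical synthetic records rather than the Medicare data and plain `group_by` in place of Goldmine, so absolute numbers will not match the ones above.

```ruby
require "benchmark"

# Hypothetical synthetic records standing in for the Medicare dataset.
records = Array.new(100_000) do |i|
  { state: %w[NY CA TX][i % 3], gender: i.even? ? "F" : "M" }
end

pivoted = nil
rolled_up = nil

# A label width of 16 gives the same column alignment as the tables above.
Benchmark.bm(16) do |x|
  x.report("pivoted") do
    pivoted = records.group_by { |r| [[:state, r[:state]], [:gender, r[:gender]]] }
  end
  x.report("rolled_up") do
    rolled_up = pivoted.transform_values { |group| [[:count, group.size]] }
  end
end
```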

Summary

Goldmine makes data highly malleable. It allows you to combine the power of pivots, rollups, tabular data, & CSV to construct deep insights with minimal effort.

Real world use cases include:

  • Build a better understanding of database data before canonizing reports in SQL
  • Create source data for building user interfaces & data visualizations
  • Transform CSV data from one format to another
