DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration
rocketlaunchr/dataframe-go
⭐ the project to show your appreciation.
Dataframes are used for statistics, machine-learning, and data manipulation/exploration. You can think of a Dataframe as an Excel spreadsheet. This package is designed to be light-weight and intuitive.
1.0.0 will be tagged.
It is recommended that your package manager lock to a commit ID instead of the master branch directly.
- Importing from CSV, JSONL, Parquet, MySQL & PostgreSQL
- Exporting to CSV, JSONL, Excel, Parquet, MySQL & PostgreSQL
- Developer Friendly
- Flexible - Create custom Series (custom data types)
- Performant
- Interoperability with gonum package.
- pandas sub-package
- Fake data generation
- Interpolation (ForwardFill, BackwardFill, Linear, Spline, Lagrange)
- Time-series Forecasting (SES, Holt-Winters)
- Math functions
- Plotting (cross-platform)
See Tutorial here.
go get -u github.com/rocketlaunchr/dataframe-go
import dataframe "github.com/rocketlaunchr/dataframe-go"
    s1 := dataframe.NewSeriesInt64("day", nil, 1, 2, 3, 4, 5, 6, 7, 8)
    s2 := dataframe.NewSeriesFloat64("sales", nil, 50.3, 23.4, 56.2, nil, nil, 84.2, 72, 89)
    df := dataframe.NewDataFrame(s1, s2)

    fmt.Print(df.Table())

    OUTPUT:
    +-----+-------+---------+
    |     |  DAY  |  SALES  |
    +-----+-------+---------+
    | 0:  |   1   |  50.3   |
    | 1:  |   2   |  23.4   |
    | 2:  |   3   |  56.2   |
    | 3:  |   4   |   NaN   |
    | 4:  |   5   |   NaN   |
    | 5:  |   6   |  84.2   |
    | 6:  |   7   |   72    |
    | 7:  |   8   |   89    |
    +-----+-------+---------+
    | 8X2 | INT64 | FLOAT64 |
    +-----+-------+---------+
    df.Append(nil, 9, 123.6)

    df.Append(nil, map[string]interface{}{
        "day":   10,
        "sales": nil,
    })

    df.Remove(0)

    OUTPUT:
    +-----+-------+---------+
    |     |  DAY  |  SALES  |
    +-----+-------+---------+
    | 0:  |   2   |  23.4   |
    | 1:  |   3   |  56.2   |
    | 2:  |   4   |   NaN   |
    | 3:  |   5   |   NaN   |
    | 4:  |   6   |  84.2   |
    | 5:  |   7   |   72    |
    | 6:  |   8   |   89    |
    | 7:  |   9   |  123.6  |
    | 8:  |  10   |   NaN   |
    +-----+-------+---------+
    | 9X2 | INT64 | FLOAT64 |
    +-----+-------+---------+
    df.UpdateRow(0, nil, map[string]interface{}{
        "day":   3,
        "sales": 45,
    })
    sks := []dataframe.SortKey{
        {Key: "sales", Desc: true},
        {Key: "day", Desc: true},
    }

    df.Sort(ctx, sks)

    OUTPUT:
    +-----+-------+---------+
    |     |  DAY  |  SALES  |
    +-----+-------+---------+
    | 0:  |   9   |  123.6  |
    | 1:  |   8   |   89    |
    | 2:  |   6   |  84.2   |
    | 3:  |   7   |   72    |
    | 4:  |   3   |  56.2   |
    | 5:  |   2   |  23.4   |
    | 6:  |  10   |   NaN   |
    | 7:  |   5   |   NaN   |
    | 8:  |   4   |   NaN   |
    +-----+-------+---------+
    | 9X2 | INT64 | FLOAT64 |
    +-----+-------+---------+
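The ordering shown in the sorted output (sales descending, ties broken by day descending, missing values last) can be reproduced with a standard-library sketch. This is not the package's implementation: the `row` type, the `f` helper and the `sortRows` function are hypothetical names used only for illustration, and a nil pointer stands in for a missing value.

```go
package main

import (
	"fmt"
	"sort"
)

// row is a hypothetical stand-in for one DataFrame row.
// A nil Sales pointer models a missing (NaN) value.
type row struct {
	Day   int64
	Sales *float64
}

// f is a small helper that returns a pointer to a float64 literal.
func f(v float64) *float64 { return &v }

// sortRows orders rows by sales descending, then by day descending.
// Rows with missing sales are placed after all rows that have a value.
func sortRows(rows []row) {
	sort.SliceStable(rows, func(i, j int) bool {
		a, b := rows[i], rows[j]
		switch {
		case a.Sales == nil && b.Sales == nil:
			return a.Day > b.Day
		case a.Sales == nil:
			return false
		case b.Sales == nil:
			return true
		case *a.Sales != *b.Sales:
			return *a.Sales > *b.Sales
		}
		return a.Day > b.Day
	})
}

func main() {
	rows := []row{
		{2, f(23.4)}, {3, f(56.2)}, {4, nil}, {5, nil}, {6, f(84.2)},
		{7, f(72)}, {8, f(89)}, {9, f(123.6)}, {10, nil},
	}
	sortRows(rows)
	for _, r := range rows {
		if r.Sales == nil {
			fmt.Println(r.Day, "NaN")
		} else {
			fmt.Println(r.Day, *r.Sales)
		}
	}
}
```

Each additional sort key simply becomes another tie-breaking case in the comparison function.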
You can change the step and starting row. It may be wise to lock the DataFrame before iterating.
The returned value is a map containing the name of the series (string) and the index of the series (int) as keys.
    iterator := df.ValuesIterator(dataframe.ValuesOptions{0, 1, true}) // Don't apply read lock because we are write locking from outside.

    df.Lock()
    for {
        row, vals, _ := iterator()
        if row == nil {
            break
        }
        fmt.Println(*row, vals)
    }
    df.Unlock()

    OUTPUT:
    0 map[day:1 0:1 sales:50.3 1:50.3]
    1 map[sales:23.4 1:23.4 day:2 0:2]
    2 map[day:3 0:3 sales:56.2 1:56.2]
    3 map[1:<nil> day:4 0:4 sales:<nil>]
    4 map[day:5 0:5 sales:<nil> 1:<nil>]
    5 map[sales:84.2 1:84.2 day:6 0:6]
    6 map[day:7 0:7 sales:72 1:72]
    7 map[day:8 0:8 sales:89 1:89]
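To make the iterator contract concrete, here is a minimal standard-library sketch of a closure-based iterator with a configurable starting row and step. It returns a nil row pointer when the rows are exhausted and a map keyed by both series name and series index. The `frame` type and `valuesIterator` method are hypothetical stand-ins, not the package's own types, and the sketch treats a step of zero as one.

```go
package main

import (
	"fmt"
	"sync"
)

// frame is a hypothetical, stripped-down stand-in for a DataFrame.
type frame struct {
	mu    sync.RWMutex
	names []string
	cols  [][]interface{} // cols[i][r] is the value of series i at row r
}

// valuesIterator returns a closure that yields one row per call, starting at
// row start and advancing by step. The row pointer is nil once rows run out.
func (f *frame) valuesIterator(start, step int) func() (*int, map[interface{}]interface{}) {
	if step == 0 {
		step = 1 // assumption: a zero step would never terminate
	}
	row := start
	return func() (*int, map[interface{}]interface{}) {
		if len(f.cols) == 0 || row < 0 || row >= len(f.cols[0]) {
			return nil, nil
		}
		vals := make(map[interface{}]interface{}, 2*len(f.cols))
		for i, name := range f.names {
			vals[name] = f.cols[i][row] // keyed by series name
			vals[i] = f.cols[i][row]    // and by series index
		}
		r := row
		row += step
		return &r, vals
	}
}

func main() {
	f := &frame{
		names: []string{"day", "sales"},
		cols: [][]interface{}{
			{int64(1), int64(2), int64(3), int64(4)},
			{50.3, 23.4, 56.2, nil},
		},
	}

	next := f.valuesIterator(0, 2) // every second row, starting at row 0

	// Hold the lock for the whole traversal so rows cannot change mid-loop.
	f.mu.Lock()
	defer f.mu.Unlock()
	for {
		row, vals := next()
		if row == nil {
			break
		}
		fmt.Println(*row, vals["day"], vals[1])
	}
}
```

Locking once around the loop, rather than inside each call, is what the advice about locking before iterating amounts to in this sketch.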
You can easily calculate statistics for a Series using the gonum or montanaflynn/stats package.
SeriesFloat64 and SeriesTime provide access to the exported Values field to seamlessly interoperate with external math-based packages.
Some series provide easy conversion using the ToSeriesFloat64 method.
    import "gonum.org/v1/gonum/stat"

    s := dataframe.NewSeriesInt64("random", nil, 1, 2, 3, 4, 5, 6, 7, 8)
    sf, _ := s.ToSeriesFloat64(ctx)

    mean := stat.Mean(sf.Values, nil)

    import "github.com/montanaflynn/stats"

    median, _ := stats.Median(sf.Values)

    std := stat.StdDev(sf.Values, nil)
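For reference, the three statistics can also be computed directly on a plain `[]float64` such as the `Values` field. The following standard-library sketch is not taken from gonum or montanaflynn/stats; the function names are illustrative. It uses the sample standard deviation (dividing by n-1), which is also what gonum's `stat.StdDev` computes.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// mean returns the arithmetic mean of xs (NaN for an empty slice).
func mean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// median returns the middle value of xs, or the average of the two middle
// values when len(xs) is even. It sorts a copy, leaving xs untouched.
func median(xs []float64) float64 {
	n := len(xs)
	if n == 0 {
		return math.NaN()
	}
	s := append([]float64(nil), xs...)
	sort.Float64s(s)
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

// stdDev returns the sample standard deviation (n-1 in the denominator).
// It needs at least two values to be meaningful.
func stdDev(xs []float64) float64 {
	m := mean(xs)
	ss := 0.0
	for _, x := range xs {
		d := x - m
		ss += d * d
	}
	return math.Sqrt(ss / float64(len(xs)-1))
}

func main() {
	values := []float64{1, 2, 3, 4, 5, 6, 7, 8}
	fmt.Println(mean(values))   // 4.5
	fmt.Println(median(values)) // 4.5
	fmt.Println(stdDev(values)) // square root of 6, roughly 2.449
}
```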
    import (
        chart "github.com/wcharczuk/go-chart"
        "github.com/rocketlaunchr/dataframe-go/plot"
        wc "github.com/rocketlaunchr/dataframe-go/plot/wcharczuk/go-chart"
    )

    sales := dataframe.NewSeriesFloat64("sales", nil, 50.3, nil, 23.4, 56.2, 89, 32, 84.2, 72, 89)
    cs, _ := wc.S(ctx, sales, nil, nil)

    graph := chart.Chart{Series: []chart.Series{cs}}

    plt, _ := plot.Open("Monthly sales", 450, 300)
    graph.Render(chart.SVG, plt)
    plt.Display(plot.None)
    <-plt.Closed
    import "github.com/rocketlaunchr/dataframe-go/math/funcs"

    res := 24
    sx := dataframe.NewSeriesFloat64("x", nil, utils.Float64Seq(1, float64(res), 1))
    sy := dataframe.NewSeriesFloat64("y", &dataframe.SeriesInit{Size: res})
    df := dataframe.NewDataFrame(sx, sy)

    fn := funcs.RegFunc("sin(2*𝜋*x/24)")
    funcs.Evaluate(ctx, df, fn, 1)
The imports sub-package has support for importing csv, jsonl, parquet, and directly from a SQL database. The DictateDataType option can be set to specify the true underlying data type. Alternatively, the InferDataTypes option can be set.
    csvStr := `Country,Date,Age,Amount,Id
    "United States",2012-02-01,50,112.1,01234
    "United States",2012-02-01,32,321.31,54320
    "United Kingdom",2012-02-01,17,18.2,12345
    "United States",2012-02-01,32,321.31,54320
    "United Kingdom",2012-05-07,NA,18.2,12345
    "United States",2012-02-01,32,321.31,54320
    "United States",2012-02-01,32,321.31,54320
    Spain,2012-02-01,66,555.42,00241`

    df, err := imports.LoadFromCSV(ctx, strings.NewReader(csvStr))

    OUTPUT:
    +-----+----------------+------------+-------+---------+-------+
    |     |    COUNTRY     |    DATE    |  AGE  | AMOUNT  |  ID   |
    +-----+----------------+------------+-------+---------+-------+
    | 0:  | United States  | 2012-02-01 |  50   |  112.1  | 1234  |
    | 1:  | United States  | 2012-02-01 |  32   | 321.31  | 54320 |
    | 2:  | United Kingdom | 2012-02-01 |  17   |  18.2   | 12345 |
    | 3:  | United States  | 2012-02-01 |  32   | 321.31  | 54320 |
    | 4:  | United Kingdom | 2012-05-07 |  NaN  |  18.2   | 12345 |
    | 5:  | United States  | 2012-02-01 |  32   | 321.31  | 54320 |
    | 6:  | United States  | 2012-02-01 |  32   | 321.31  | 54320 |
    | 7:  |     Spain      | 2012-02-01 |  66   | 555.42  |  241  |
    +-----+----------------+------------+-------+---------+-------+
    | 8X5 |     STRING     |    TIME    | INT64 | FLOAT64 | INT64 |
    +-----+----------------+------------+-------+---------+-------+
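To illustrate what inferring a column's data type involves, here is a standard-library sketch that reads a shortened version of the CSV above and picks the narrowest type that fits every value in a column. It is not the imports sub-package's implementation. The `inferType` function, the fixed `2006-01-02` date layout, and the rule that `NA` and empty cells count as missing are all assumptions made for the example.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
	"time"
)

// inferType returns the narrowest of "int64", "float64", "time" or "string"
// that fits every non-missing value in col. "NA" and "" count as missing.
func inferType(col []string) string {
	isInt, isFloat, isTime, seen := true, true, true, false
	for _, v := range col {
		if v == "" || v == "NA" {
			continue
		}
		seen = true
		if _, err := strconv.ParseInt(v, 10, 64); err != nil {
			isInt = false
		}
		if _, err := strconv.ParseFloat(v, 64); err != nil {
			isFloat = false
		}
		if _, err := time.Parse("2006-01-02", v); err != nil {
			isTime = false
		}
	}
	switch {
	case !seen:
		return "string"
	case isInt:
		return "int64"
	case isFloat:
		return "float64"
	case isTime:
		return "time"
	}
	return "string"
}

func main() {
	const data = `Country,Date,Age,Amount,Id
"United States",2012-02-01,50,112.1,01234
"United Kingdom",2012-05-07,NA,18.2,12345
Spain,2012-02-01,66,555.42,00241`

	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		panic(err)
	}
	header, rows := records[0], records[1:]

	// Transpose each column into a slice and report its inferred type.
	for c, name := range header {
		col := make([]string, len(rows))
		for r := range rows {
			col[r] = rows[r][c]
		}
		fmt.Printf("%s: %s\n", name, inferType(col))
	}
}
```

In this sketch the Id column is inferred as `int64`, so the leading zero of `01234` would be lost on conversion. That is one situation where dictating the type explicitly (for example, as a string) is the safer choice.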
The exports sub-package has support for exporting to csv, jsonl, parquet, Excel and directly to a SQL database.
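As a point of reference for the jsonl format, each row becomes one JSON object on its own line. The sketch below produces that layout with the standard library only. It does not use the exports sub-package, and the `toJSONL` function and the map-per-row representation are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// toJSONL encodes each row as a single JSON object followed by a newline.
func toJSONL(rows []map[string]interface{}) (string, error) {
	var sb strings.Builder
	enc := json.NewEncoder(&sb) // Encode writes a trailing newline after each value
	for _, row := range rows {
		if err := enc.Encode(row); err != nil {
			return "", err
		}
	}
	return sb.String(), nil
}

func main() {
	rows := []map[string]interface{}{
		{"day": 1, "sales": 50.3},
		{"day": 4, "sales": nil}, // a missing value is written as null
	}
	out, err := toJSONL(rows)
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
	// {"day":1,"sales":50.3}
	// {"day":4,"sales":null}
}
```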
- If you know the number of rows in advance, you can set the capacity of the underlying slice of a series using SeriesInit{}. This will preallocate memory and provide speed improvements.
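The benefit of a capacity hint comes from how Go slices grow. The following sketch uses plain slices rather than the package's series types, and counts how often `append` has to move the data to a larger backing array with and without preallocation. The `fill` function is a hypothetical helper written for this example.

```go
package main

import "fmt"

// fill appends n values to a slice created with the given capacity and
// reports how many times append had to allocate a larger backing array.
func fill(n, capacity int) (reallocs int) {
	s := make([]float64, 0, capacity)
	for i := 0; i < n; i++ {
		before := cap(s)
		s = append(s, float64(i))
		if cap(s) != before {
			reallocs++
		}
	}
	return reallocs
}

func main() {
	const n = 100000
	fmt.Println("no preallocation:", fill(n, 0), "reallocations")
	fmt.Println("preallocated:    ", fill(n, n), "reallocations")
}
```

With the capacity set to the final row count, no reallocation or copying happens while the rows are appended.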
Out of the box, there is support for string, time.Time, float64 and int64. Automatic support exists for float32 and all types of integers. There is a convenience function provided for dealing with bool. There is also support for complex128 inside the xseries subpackage.
There may be times that you want to use your own custom data types. You can either implement your own Series type (more performant) or use the Generic Series (more convenient).
    import "time"
    import "cloud.google.com/go/civil"

    sg := dataframe.NewSeriesGeneric("date", civil.Date{}, nil, civil.Date{2018, time.May, 01}, civil.Date{2018, time.May, 02}, civil.Date{2018, time.May, 03})
    s2 := dataframe.NewSeriesFloat64("sales", nil, 50.3, 23.4, 56.2)

    df := dataframe.NewDataFrame(sg, s2)

    OUTPUT:
    +-----+------------+---------+
    |     |    DATE    |  SALES  |
    +-----+------------+---------+
    | 0:  | 2018-05-01 |  50.3   |
    | 1:  | 2018-05-02 |  23.4   |
    | 2:  | 2018-05-03 |  56.2   |
    +-----+------------+---------+
    | 3X2 | CIVIL DATE | FLOAT64 |
    +-----+------------+---------+
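The trade-off between convenience and performance can be seen in a small sketch of how a generic container might work: values are stored as `interface{}` and checked against a sample of the concrete type at run time. This is only an illustration under that assumption, not the package's Generic Series. The `genericSeries` type, its `add` method and the local `date` struct are hypothetical.

```go
package main

import (
	"fmt"
	"reflect"
)

// genericSeries is a hypothetical container that accepts values of one
// concrete type (or nil for a missing value), checked at run time.
type genericSeries struct {
	name   string
	typ    reflect.Type
	values []interface{}
}

// newGenericSeries takes a sample value of the concrete type, much like
// passing an empty struct literal to describe what the series may hold.
func newGenericSeries(name string, concreteType interface{}, vals ...interface{}) (*genericSeries, error) {
	s := &genericSeries{name: name, typ: reflect.TypeOf(concreteType)}
	for _, v := range vals {
		if err := s.add(v); err != nil {
			return nil, err
		}
	}
	return s, nil
}

// add appends v, rejecting any non-nil value of a different type.
func (s *genericSeries) add(v interface{}) error {
	if v != nil && reflect.TypeOf(v) != s.typ {
		return fmt.Errorf("series %q: got %T, want %v", s.name, v, s.typ)
	}
	s.values = append(s.values, v)
	return nil
}

// date is a local stand-in for a custom data type.
type date struct{ Year, Month, Day int }

func main() {
	s, err := newGenericSeries("date", date{}, date{2018, 5, 1}, nil, date{2018, 5, 3})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(s.values), "values stored")

	// A value of the wrong type is rejected at run time, not compile time.
	_, err = newGenericSeries("date", date{}, "2018-05-01")
	fmt.Println(err)
}
```

The run-time type check and the boxing of every value into an interface are the kind of overhead that a dedicated series type with a typed slice avoids.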
Let's create a list of 8 "fake" employees with a name, title and base hourly wage rate.
    import "golang.org/x/exp/rand"
    import "github.com/rocketlaunchr/dataframe-go/utils/faker"

    src := rand.NewSource(uint64(time.Now().UTC().UnixNano()))
    df := faker.NewDataFrame(8, src, faker.S("name", 0, "Name"), faker.S("title", 0.5, "JobTitle"), faker.S("base rate", 0, "Number", 15, 50))
    +-----+----------------+----------------+-----------+
    |     |      NAME      |     TITLE      | BASE RATE |
    +-----+----------------+----------------+-----------+
    | 0:  | Cordia Jacobi  |   Consultant   |    42     |
    | 1:  | Nickolas Emard |      NaN       |    22     |
    | 2:  | Hollis Dickens | Representative |    22     |
    | 3:  | Stacy Dietrich |      NaN       |    43     |
    | 4:  |  Aleen Legros  |    Officer     |    21     |
    | 5:  |  Adelia Metz   |   Architect    |    18     |
    | 6:  | Sunny Gerlach  |      NaN       |    28     |
    | 7:  | Austin Hackett |      NaN       |    39     |
    +-----+----------------+----------------+-----------+
    | 8X3 |     STRING     |     STRING     |   INT64   |
    +-----+----------------+----------------+-----------+
Let's give a promotion to everyone by doubling their salary.
    s := df.Series[2]

    applyFn := dataframe.ApplySeriesFn(func(val interface{}, row, nRows int) interface{} {
        return 2 * val.(int64)
    })

    dataframe.Apply(ctx, s, applyFn, dataframe.FilterOptions{InPlace: true})
    +-----+----------------+----------------+-----------+
    |     |      NAME      |     TITLE      | BASE RATE |
    +-----+----------------+----------------+-----------+
    | 0:  | Cordia Jacobi  |   Consultant   |    84     |
    | 1:  | Nickolas Emard |      NaN       |    44     |
    | 2:  | Hollis Dickens | Representative |    44     |
    | 3:  | Stacy Dietrich |      NaN       |    86     |
    | 4:  |  Aleen Legros  |    Officer     |    42     |
    | 5:  |  Adelia Metz   |   Architect    |    36     |
    | 6:  | Sunny Gerlach  |      NaN       |    56     |
    | 7:  | Austin Hackett |      NaN       |    78     |
    +-----+----------------+----------------+-----------+
    | 8X3 |     STRING     |     STRING     |   INT64   |
    +-----+----------------+----------------+-----------+
Let's inform all employees separately on sequential days.
    import "github.com/rocketlaunchr/dataframe-go/utils/utime"

    mts, _ := utime.NewSeriesTime(ctx, "meeting time", "1D", time.Now().UTC(), false, utime.NewSeriesTimeOptions{Size: &[]int{8}[0]})
    df.AddSeries(mts, nil)
    +-----+----------------+----------------+-----------+--------------------------------+
    |     |      NAME      |     TITLE      | BASE RATE |          MEETING TIME          |
    +-----+----------------+----------------+-----------+--------------------------------+
    | 0:  | Cordia Jacobi  |   Consultant   |    84     |   2020-02-02 23:13:53.015324   |
    |     |                |                |           |           +0000 UTC            |
    | 1:  | Nickolas Emard |      NaN       |    44     |   2020-02-03 23:13:53.015324   |
    |     |                |                |           |           +0000 UTC            |
    | 2:  | Hollis Dickens | Representative |    44     |   2020-02-04 23:13:53.015324   |
    |     |                |                |           |           +0000 UTC            |
    | 3:  | Stacy Dietrich |      NaN       |    86     |   2020-02-05 23:13:53.015324   |
    |     |                |                |           |           +0000 UTC            |
    | 4:  |  Aleen Legros  |    Officer     |    42     |   2020-02-06 23:13:53.015324   |
    |     |                |                |           |           +0000 UTC            |
    | 5:  |  Adelia Metz   |   Architect    |    36     |   2020-02-07 23:13:53.015324   |
    |     |                |                |           |           +0000 UTC            |
    | 6:  | Sunny Gerlach  |      NaN       |    56     |   2020-02-08 23:13:53.015324   |
    |     |                |                |           |           +0000 UTC            |
    | 7:  | Austin Hackett |      NaN       |    78     |   2020-02-09 23:13:53.015324   |
    |     |                |                |           |           +0000 UTC            |
    +-----+----------------+----------------+-----------+--------------------------------+
    | 8X4 |     STRING     |     STRING     |   INT64   |              TIME              |
    +-----+----------------+----------------+-----------+--------------------------------+
Let's filter out our senior employees (they have titles) for no reason.
    filterFn := dataframe.FilterDataFrameFn(func(vals map[interface{}]interface{}, row, nRows int) (dataframe.FilterAction, error) {
        if vals["title"] == nil {
            return dataframe.DROP, nil
        }
        return dataframe.KEEP, nil
    })

    seniors, _ := dataframe.Filter(ctx, df, filterFn)
    +-----+----------------+----------------+-----------+--------------------------------+
    |     |      NAME      |     TITLE      | BASE RATE |          MEETING TIME          |
    +-----+----------------+----------------+-----------+--------------------------------+
    | 0:  | Cordia Jacobi  |   Consultant   |    84     |   2020-02-02 23:13:53.015324   |
    |     |                |                |           |           +0000 UTC            |
    | 1:  | Hollis Dickens | Representative |    44     |   2020-02-04 23:13:53.015324   |
    |     |                |                |           |           +0000 UTC            |
    | 2:  |  Aleen Legros  |    Officer     |    42     |   2020-02-06 23:13:53.015324   |
    |     |                |                |           |           +0000 UTC            |
    | 3:  |  Adelia Metz   |   Architect    |    36     |   2020-02-07 23:13:53.015324   |
    |     |                |                |           |           +0000 UTC            |
    +-----+----------------+----------------+-----------+--------------------------------+
    | 4X4 |     STRING     |     STRING     |   INT64   |              TIME              |
    +-----+----------------+----------------+-----------+--------------------------------+
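The keep-or-drop pattern used by the filter function can be sketched with plain slices. The example below is a standard-library illustration only: the `action` type, the `employee` struct and the `filter` function are hypothetical, and a nil `Title` stands in for a missing value. Like the call above, it returns a new collection and leaves the original untouched.

```go
package main

import "fmt"

// action is the keep/drop decision a filter function returns for each row.
type action int

const (
	drop action = iota
	keep
)

// employee is a hypothetical row type; a nil Title models a missing value.
type employee struct {
	Name  string
	Title *string
}

// filter returns a new slice holding only the rows fn decides to keep.
// The input slice is not modified.
func filter(rows []employee, fn func(e employee, row, nRows int) action) []employee {
	out := make([]employee, 0, len(rows))
	for i, e := range rows {
		if fn(e, i, len(rows)) == keep {
			out = append(out, e)
		}
	}
	return out
}

// str returns a pointer to a string literal.
func str(s string) *string { return &s }

func main() {
	staff := []employee{
		{"Cordia Jacobi", str("Consultant")},
		{"Nickolas Emard", nil},
		{"Hollis Dickens", str("Representative")},
	}

	seniors := filter(staff, func(e employee, row, nRows int) action {
		if e.Title == nil {
			return drop
		}
		return keep
	})

	for _, e := range seniors {
		fmt.Println(e.Name, *e.Title)
	}
}
```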
- awesome-svelte - Resources for killing react
- dbq - Zero boilerplate database operations for Go
- electron-alert - SweetAlert2 for Electron Applications
- google-search - Scrape google search results
- igo - A Go transpiler with cool new syntax such as fordefer (defer for for-loops)
- mysql-go - Properly cancel slow MySQL queries
- react - Build front end applications using Go
- remember-go - Cache slow database queries
- testing-go - Testing framework for unit testing
The license is a modified MIT license. Refer to the LICENSE file for more details.
© 2018-21 PJ Engineering and Business Solutions Pty. Ltd.