DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration


rocketlaunchr/dataframe-go


dataframe-go

Dataframes are used for statistics, machine-learning, and data manipulation/exploration. You can think of a Dataframe as an Excel spreadsheet. This package is designed to be light-weight and intuitive.

⚠️ The package is production ready but the API is not stable yet. Once Go 1.18 (Generics) is introduced, the ENTIRE package will be rewritten. For example, there will only be 1 generic Series type. After that, version 1.0.0 will be tagged.

It is recommended that your package manager lock to a commit id instead of tracking the master branch directly. ⚠️

Features

  1. Importing from CSV, JSONL, Parquet, MySQL & PostgreSQL
  2. Exporting to CSV, JSONL, Excel, Parquet, MySQL & PostgreSQL
  3. Developer Friendly
  4. Flexible - Create custom Series (custom data types)
  5. Performant
  6. Interoperability with the gonum package.
  7. pandas sub-package (Help Required)
  8. Fake data generation
  9. Interpolation (ForwardFill, BackwardFill, Linear, Spline, Lagrange)
  10. Time-series Forecasting (SES, Holt-Winters)
  11. Math functions
  12. Plotting (cross-platform)

See the Tutorial below.

Installation

go get -u github.com/rocketlaunchr/dataframe-go
import dataframe "github.com/rocketlaunchr/dataframe-go"

DataFrames

Creating a DataFrame

s1 := dataframe.NewSeriesInt64("day", nil, 1, 2, 3, 4, 5, 6, 7, 8)
s2 := dataframe.NewSeriesFloat64("sales", nil, 50.3, 23.4, 56.2, nil, nil, 84.2, 72, 89)
df := dataframe.NewDataFrame(s1, s2)

fmt.Print(df.Table())

OUTPUT:
+-----+-------+---------+
|     |  DAY  |  SALES  |
+-----+-------+---------+
| 0:  |   1   |  50.3   |
| 1:  |   2   |  23.4   |
| 2:  |   3   |  56.2   |
| 3:  |   4   |   NaN   |
| 4:  |   5   |   NaN   |
| 5:  |   6   |  84.2   |
| 6:  |   7   |   72    |
| 7:  |   8   |   89    |
+-----+-------+---------+
| 8X2 | INT64 | FLOAT64 |
+-----+-------+---------+

Go Playground

Insert and Remove Row

df.Append(nil, 9, 123.6)

df.Append(nil, map[string]interface{}{
	"day":   10,
	"sales": nil,
})

df.Remove(0)

OUTPUT:
+-----+-------+---------+
|     |  DAY  |  SALES  |
+-----+-------+---------+
| 0:  |   2   |  23.4   |
| 1:  |   3   |  56.2   |
| 2:  |   4   |   NaN   |
| 3:  |   5   |   NaN   |
| 4:  |   6   |  84.2   |
| 5:  |   7   |   72    |
| 6:  |   8   |   89    |
| 7:  |   9   |  123.6  |
| 8:  |  10   |   NaN   |
+-----+-------+---------+
| 9X2 | INT64 | FLOAT64 |
+-----+-------+---------+

Go Playground

Update Row

df.UpdateRow(0, nil, map[string]interface{}{
	"day":   3,
	"sales": 45,
})

Sorting

sks := []dataframe.SortKey{
	{Key: "sales", Desc: true},
	{Key: "day", Desc: true},
}

df.Sort(ctx, sks)

OUTPUT:
+-----+-------+---------+
|     |  DAY  |  SALES  |
+-----+-------+---------+
| 0:  |   9   |  123.6  |
| 1:  |   8   |   89    |
| 2:  |   6   |  84.2   |
| 3:  |   7   |   72    |
| 4:  |   3   |  56.2   |
| 5:  |   2   |  23.4   |
| 6:  |  10   |   NaN   |
| 7:  |   5   |   NaN   |
| 8:  |   4   |   NaN   |
+-----+-------+---------+
| 9X2 | INT64 | FLOAT64 |
+-----+-------+---------+

Go Playground
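The multi-key ordering above can be pictured with the standard library alone. The following is a minimal sketch, not dataframe-go's implementation: the `record` type and the `sortBySalesThenDay` helper are hypothetical, a nil pointer stands in for a missing (NaN) value, and missing values are assumed to sort last because that is what the output above shows.

```go
package main

import (
	"fmt"
	"sort"
)

// record is a hypothetical stand-in for one DataFrame row.
// A nil Sales pointer models a missing (NaN) value.
type record struct {
	Day   int64
	Sales *float64
}

// sortBySalesThenDay orders rows by sales (descending) and breaks ties by
// day (descending). Rows with missing sales are placed last, mirroring the
// output shown above.
func sortBySalesThenDay(rows []record) {
	sort.SliceStable(rows, func(i, j int) bool {
		a, b := rows[i], rows[j]
		switch {
		case a.Sales == nil && b.Sales == nil:
			return a.Day > b.Day
		case a.Sales == nil:
			return false
		case b.Sales == nil:
			return true
		case *a.Sales != *b.Sales:
			return *a.Sales > *b.Sales
		default:
			return a.Day > b.Day
		}
	})
}

// f returns a pointer to v so that literals can be used for Sales.
func f(v float64) *float64 { return &v }

func main() {
	rows := []record{
		{2, f(23.4)}, {3, f(56.2)}, {4, nil}, {5, nil}, {6, f(84.2)},
		{7, f(72)}, {8, f(89)}, {9, f(123.6)}, {10, nil},
	}
	sortBySalesThenDay(rows)
	for _, r := range rows {
		if r.Sales == nil {
			fmt.Println(r.Day, "NaN")
			continue
		}
		fmt.Println(r.Day, *r.Sales)
	}
}
```

Running it prints the days in the order 9, 8, 6, 7, 3, 2, 10, 5, 4, the same row order as the sorted table.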

Iterating

You can change the step and starting row. It may be wise to lock the DataFrame before iterating.

The returned value is a map containing the name of the series (string) and the index of the series (int) as keys.

iterator := df.ValuesIterator(dataframe.ValuesOptions{0, 1, true}) // Don't apply read lock because we are write locking from outside.

df.Lock()
for {
	row, vals, _ := iterator()
	if row == nil {
		break
	}
	fmt.Println(*row, vals)
}
df.Unlock()

OUTPUT:
0 map[day:1 0:1 sales:50.3 1:50.3]
1 map[sales:23.4 1:23.4 day:2 0:2]
2 map[day:3 0:3 sales:56.2 1:56.2]
3 map[1:<nil> day:4 0:4 sales:<nil>]
4 map[day:5 0:5 sales:<nil> 1:<nil>]
5 map[sales:84.2 1:84.2 day:6 0:6]
6 map[day:7 0:7 sales:72 1:72]
7 map[day:8 0:8 sales:89 1:89]

Go Playground
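To make the "starting row", "step" and "lock before iterating" ideas concrete, here is a small standard-library sketch. The `table` type and its `iterator` and `collect` helpers are hypothetical and only imitate the shape of the iterator above; they are not part of dataframe-go.

```go
package main

import (
	"fmt"
	"sync"
)

// table is a hypothetical, minimal stand-in for a DataFrame with one column.
type table struct {
	mu   sync.RWMutex
	vals []float64
}

// iterator returns a closure that yields (row index, value), starting at
// `start` and advancing by `step`. It returns a nil row once the index
// leaves the table. A step of 0 is treated as 1 to avoid looping forever.
// The caller is expected to hold the lock while iterating.
func (t *table) iterator(start, step int) func() (*int, float64) {
	if step == 0 {
		step = 1
	}
	next := start
	return func() (*int, float64) {
		if next < 0 || next >= len(t.vals) {
			return nil, 0
		}
		row := next
		next += step
		return &row, t.vals[row]
	}
}

// collect walks the table under a single write lock and returns the
// indexes of the rows that were visited.
func collect(t *table, start, step int) []int {
	t.mu.Lock()
	defer t.mu.Unlock()

	var rows []int
	it := t.iterator(start, step)
	for {
		row, _ := it()
		if row == nil {
			break
		}
		rows = append(rows, *row)
	}
	return rows
}

func main() {
	tbl := &table{vals: []float64{50.3, 23.4, 56.2, 84.2, 72, 89}}
	fmt.Println(collect(tbl, 1, 2)) // visits rows 1, 3 and 5
}
```

Holding one lock for the whole loop keeps other goroutines from changing the rows between two calls to the iterator.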

Statistics

You can easily calculate statistics for a Series using the gonum or montanaflynn/stats package.

SeriesFloat64 and SeriesTime provide access to the exported Values field to seamlessly interoperate with external math-based packages.

Example

Some series provide easy conversion using the ToSeriesFloat64 method.

import "gonum.org/v1/gonum/stat"

s := dataframe.NewSeriesInt64("random", nil, 1, 2, 3, 4, 5, 6, 7, 8)
sf, _ := s.ToSeriesFloat64(ctx)

Mean

mean := stat.Mean(sf.Values, nil)

Median

import "github.com/montanaflynn/stats"

median, _ := stats.Median(sf.Values)

Standard Deviation

std := stat.StdDev(sf.Values, nil)
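As a cross-check on what these three statistics mean for the series 1 to 8, here is a dependency-free sketch. The helpers `mean`, `median` and `sampleStdDev` are hypothetical names written for this example; the standard deviation uses the n-1 (sample) denominator, which is an assumption about what the library call reports.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// mean returns the arithmetic mean of xs.
func mean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// median returns the middle value of xs (or the average of the two middle
// values when len(xs) is even). The input slice is not modified.
func median(xs []float64) float64 {
	s := append([]float64(nil), xs...)
	sort.Float64s(s)
	n := len(s)
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

// sampleStdDev returns the standard deviation with an n-1 denominator.
// It assumes xs holds at least two values.
func sampleStdDev(xs []float64) float64 {
	m := mean(xs)
	ss := 0.0
	for _, x := range xs {
		d := x - m
		ss += d * d
	}
	return math.Sqrt(ss / float64(len(xs)-1))
}

func main() {
	values := []float64{1, 2, 3, 4, 5, 6, 7, 8}
	fmt.Printf("mean=%.2f median=%.2f std=%.4f\n",
		mean(values), median(values), sampleStdDev(values))
	// prints: mean=4.50 median=4.50 std=2.4495
}
```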

Plotting (cross-platform)

import (
	chart "github.com/wcharczuk/go-chart"
	"github.com/rocketlaunchr/dataframe-go/plot"
	wc "github.com/rocketlaunchr/dataframe-go/plot/wcharczuk/go-chart"
)

sales := dataframe.NewSeriesFloat64("sales", nil, 50.3, nil, 23.4, 56.2, 89, 32, 84.2, 72, 89)
cs, _ := wc.S(ctx, sales, nil, nil)

graph := chart.Chart{Series: []chart.Series{cs}}

plt, _ := plot.Open("Monthly sales", 450, 300)
graph.Render(chart.SVG, plt)
plt.Display(plot.None)
<-plt.Closed

Output:

[Image: plot]

Math Functions

import "github.com/rocketlaunchr/dataframe-go/math/funcs"

res := 24
sx := dataframe.NewSeriesFloat64("x", nil, utils.Float64Seq(1, float64(res), 1))
sy := dataframe.NewSeriesFloat64("y", &dataframe.SeriesInit{Size: res})
df := dataframe.NewDataFrame(sx, sy)

fn := funcs.RegFunc("sin(2*𝜋*x/24)")
funcs.Evaluate(ctx, df, fn, 1)

Go Playground

Output:

[Image: sine wave]

Importing Data

The imports sub-package has support for importing csv, jsonl, parquet, and directly from a SQL database. The DictateDataType option can be set to specify the true underlying data type. Alternatively, the InferDataTypes option can be set.

CSV

csvStr := `Country,Date,Age,Amount,Id
"United States",2012-02-01,50,112.1,01234
"United States",2012-02-01,32,321.31,54320
"United Kingdom",2012-02-01,17,18.2,12345
"United States",2012-02-01,32,321.31,54320
"United Kingdom",2012-05-07,NA,18.2,12345
"United States",2012-02-01,32,321.31,54320
"United States",2012-02-01,32,321.31,54320
Spain,2012-02-01,66,555.42,00241`

df, err := imports.LoadFromCSV(ctx, strings.NewReader(csvStr))

OUTPUT:
+-----+----------------+------------+-------+---------+-------+
|     |    COUNTRY     |    DATE    |  AGE  | AMOUNT  |  ID   |
+-----+----------------+------------+-------+---------+-------+
| 0:  | United States  | 2012-02-01 |  50   |  112.1  | 1234  |
| 1:  | United States  | 2012-02-01 |  32   | 321.31  | 54320 |
| 2:  | United Kingdom | 2012-02-01 |  17   |  18.2   | 12345 |
| 3:  | United States  | 2012-02-01 |  32   | 321.31  | 54320 |
| 4:  | United Kingdom | 2012-05-07 |  NaN  |  18.2   | 12345 |
| 5:  | United States  | 2012-02-01 |  32   | 321.31  | 54320 |
| 6:  | United States  | 2012-02-01 |  32   | 321.31  | 54320 |
| 7:  |     Spain      | 2012-02-01 |  66   | 555.42  |  241  |
+-----+----------------+------------+-------+---------+-------+
| 8X5 |     STRING     |    TIME    | INT64 | FLOAT64 | INT64 |
+-----+----------------+------------+-------+---------+-------+

Go Playground
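One way to picture type inference on a CSV like the one above is to try progressively looser parsers on every value of a column. The sketch below uses only the standard library and is an illustration of the idea, not the logic behind the InferDataTypes option; `inferKind` and `inferColumns` are hypothetical helpers, and treating "NA" as a missing value is an assumption taken from the example output.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
	"time"
)

// inferKind guesses a column type from its raw string values.
// Empty strings and "NA" are treated as missing and skipped.
func inferKind(values []string) string {
	isInt, isFloat, isDate, seen := true, true, true, false
	for _, v := range values {
		if v == "" || v == "NA" {
			continue
		}
		seen = true
		if _, err := strconv.ParseInt(v, 10, 64); err != nil {
			isInt = false
		}
		if _, err := strconv.ParseFloat(v, 64); err != nil {
			isFloat = false
		}
		if _, err := time.Parse("2006-01-02", v); err != nil {
			isDate = false
		}
	}
	switch {
	case !seen:
		return "string"
	case isInt:
		return "int64"
	case isFloat:
		return "float64"
	case isDate:
		return "time"
	default:
		return "string"
	}
}

// inferColumns reads CSV text with a header row and returns one guessed
// type per column.
func inferColumns(csvStr string) ([]string, error) {
	recs, err := csv.NewReader(strings.NewReader(csvStr)).ReadAll()
	if err != nil || len(recs) == 0 {
		return nil, err
	}
	kinds := make([]string, len(recs[0]))
	for col := range recs[0] {
		var vals []string
		for _, rec := range recs[1:] {
			vals = append(vals, rec[col])
		}
		kinds[col] = inferKind(vals)
	}
	return kinds, nil
}

func main() {
	csvStr := `Country,Date,Age,Amount,Id
"United States",2012-02-01,50,112.1,01234
"United Kingdom",2012-05-07,NA,18.2,12345
Spain,2012-02-01,66,555.42,00241`

	kinds, err := inferColumns(csvStr)
	if err != nil {
		panic(err)
	}
	fmt.Println(kinds) // [string time int64 float64 int64]
}
```

Parsing the Id column as an integer also explains why 01234 and 00241 lose their leading zeros in the output table. If such identifiers must be preserved, the column has to be treated as a string instead.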

Exporting Data

The exports sub-package has support for exporting to csv, jsonl, parquet, Excel and directly to a SQL database.

Optimizations

  • If you know the number of rows in advance, you can set the capacity of the underlying slice of a series using SeriesInit{}. This will preallocate memory and provide speed improvements.
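The benefit of preallocating comes from how Go slices grow. The sketch below is plain Go rather than dataframe-go code: the hypothetical `fill` helper appends n values to a slice and counts how often the backing array had to be reallocated, with and without an initial capacity.

```go
package main

import "fmt"

// fill appends n values to a slice created with the given initial capacity
// and counts how many times append had to allocate a larger backing array.
func fill(n, capacity int) (vals []float64, reallocs int) {
	vals = make([]float64, 0, capacity)
	for i := 0; i < n; i++ {
		before := cap(vals)
		vals = append(vals, float64(i))
		if cap(vals) != before {
			reallocs++
		}
	}
	return vals, reallocs
}

func main() {
	const rows = 100000

	_, grown := fill(rows, 0)
	_, preallocated := fill(rows, rows)

	fmt.Println("reallocations without preallocation:", grown)
	fmt.Println("reallocations with preallocation:", preallocated) // 0
}
```

Each reallocation copies every existing element, so avoiding them matters most for large series that are filled row by row.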

Generic Series

Out of the box, there is support for string, time.Time, float64 and int64. Automatic support exists for float32 and all types of integers. There is a convenience function provided for dealing with bool. There is also support for complex128 inside the xseries subpackage.

There may be times that you want to use your own custom data types. You can either implement your own Series type (more performant) or use the Generic Series (more convenient).

civil.Date

import "time"
import "cloud.google.com/go/civil"

sg := dataframe.NewSeriesGeneric("date", civil.Date{}, nil, civil.Date{2018, time.May, 01}, civil.Date{2018, time.May, 02}, civil.Date{2018, time.May, 03})
s2 := dataframe.NewSeriesFloat64("sales", nil, 50.3, 23.4, 56.2)

df := dataframe.NewDataFrame(sg, s2)

OUTPUT:
+-----+------------+---------+
|     |    DATE    |  SALES  |
+-----+------------+---------+
| 0:  | 2018-05-01 |  50.3   |
| 1:  | 2018-05-02 |  23.4   |
| 2:  | 2018-05-03 |  56.2   |
+-----+------------+---------+
| 3X2 | CIVIL DATE | FLOAT64 |
+-----+------------+---------+

Tutorial

Create some fake data

Let's create a list of 8 "fake" employees with a name, title and base hourly wage rate.

import "golang.org/x/exp/rand"
import "github.com/rocketlaunchr/dataframe-go/utils/faker"

src := rand.NewSource(uint64(time.Now().UTC().UnixNano()))
df := faker.NewDataFrame(8, src, faker.S("name", 0, "Name"), faker.S("title", 0.5, "JobTitle"), faker.S("base rate", 0, "Number", 15, 50))

+-----+----------------+----------------+-----------+
|     |      NAME      |     TITLE      | BASE RATE |
+-----+----------------+----------------+-----------+
| 0:  | Cordia Jacobi  |   Consultant   |    42     |
| 1:  | Nickolas Emard |      NaN       |    22     |
| 2:  | Hollis Dickens | Representative |    22     |
| 3:  | Stacy Dietrich |      NaN       |    43     |
| 4:  |  Aleen Legros  |    Officer     |    21     |
| 5:  |  Adelia Metz   |   Architect    |    18     |
| 6:  | Sunny Gerlach  |      NaN       |    28     |
| 7:  | Austin Hackett |      NaN       |    39     |
+-----+----------------+----------------+-----------+
| 8X3 |     STRING     |     STRING     |   INT64   |
+-----+----------------+----------------+-----------+

Apply Function

Let's give a promotion to everyone by doubling their salary.

s := df.Series[2]

applyFn := dataframe.ApplySeriesFn(func(val interface{}, row, nRows int) interface{} {
	return 2 * val.(int64)
})

dataframe.Apply(ctx, s, applyFn, dataframe.FilterOptions{InPlace: true})

+-----+----------------+----------------+-----------+
|     |      NAME      |     TITLE      | BASE RATE |
+-----+----------------+----------------+-----------+
| 0:  | Cordia Jacobi  |   Consultant   |    84     |
| 1:  | Nickolas Emard |      NaN       |    44     |
| 2:  | Hollis Dickens | Representative |    44     |
| 3:  | Stacy Dietrich |      NaN       |    86     |
| 4:  |  Aleen Legros  |    Officer     |    42     |
| 5:  |  Adelia Metz   |   Architect    |    36     |
| 6:  | Sunny Gerlach  |      NaN       |    56     |
| 7:  | Austin Hackett |      NaN       |    78     |
+-----+----------------+----------------+-----------+
| 8X3 |     STRING     |     STRING     |   INT64   |
+-----+----------------+----------------+-----------+

Create a Time series

Let's inform all employees separately on sequential days.

import "github.com/rocketlaunchr/dataframe-go/utils/utime"

mts, _ := utime.NewSeriesTime(ctx, "meeting time", "1D", time.Now().UTC(), false, utime.NewSeriesTimeOptions{Size: &[]int{8}[0]})
df.AddSeries(mts, nil)

+-----+----------------+----------------+-----------+--------------------------------+
|     |      NAME      |     TITLE      | BASE RATE |          MEETING TIME          |
+-----+----------------+----------------+-----------+--------------------------------+
| 0:  | Cordia Jacobi  |   Consultant   |    84     | 2020-02-02 23:13:53.015324     |
|     |                |                |           | +0000 UTC                      |
| 1:  | Nickolas Emard |      NaN       |    44     | 2020-02-03 23:13:53.015324     |
|     |                |                |           | +0000 UTC                      |
| 2:  | Hollis Dickens | Representative |    44     | 2020-02-04 23:13:53.015324     |
|     |                |                |           | +0000 UTC                      |
| 3:  | Stacy Dietrich |      NaN       |    86     | 2020-02-05 23:13:53.015324     |
|     |                |                |           | +0000 UTC                      |
| 4:  |  Aleen Legros  |    Officer     |    42     | 2020-02-06 23:13:53.015324     |
|     |                |                |           | +0000 UTC                      |
| 5:  |  Adelia Metz   |   Architect    |    36     | 2020-02-07 23:13:53.015324     |
|     |                |                |           | +0000 UTC                      |
| 6:  | Sunny Gerlach  |      NaN       |    56     | 2020-02-08 23:13:53.015324     |
|     |                |                |           | +0000 UTC                      |
| 7:  | Austin Hackett |      NaN       |    78     | 2020-02-09 23:13:53.015324     |
|     |                |                |           | +0000 UTC                      |
+-----+----------------+----------------+-----------+--------------------------------+
| 8X4 |     STRING     |     STRING     |   INT64   |              TIME              |
+-----+----------------+----------------+-----------+--------------------------------+
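The meeting times above are eight timestamps spaced one day apart. For readers who want to see that spacing without the library, here is a standard-library sketch. `dailySeries` and `exampleStart` are hypothetical helpers, and reading "1D" as a one-calendar-day step is an assumption based on the output above.

```go
package main

import (
	"fmt"
	"time"
)

// exampleStart returns a fixed start time matching the first row of the
// table above (to the second), so the output is reproducible.
func exampleStart() time.Time {
	return time.Date(2020, time.February, 2, 23, 13, 53, 0, time.UTC)
}

// dailySeries returns n timestamps beginning at start and spaced one
// calendar day apart.
func dailySeries(start time.Time, n int) []time.Time {
	out := make([]time.Time, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, start.AddDate(0, 0, i))
	}
	return out
}

func main() {
	for _, ts := range dailySeries(exampleStart(), 8) {
		fmt.Println(ts.Format("2006-01-02 15:04:05"))
	}
}
```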

Filtering

Let's filter out our senior employees (they have titles) for no reason.

filterFn := dataframe.FilterDataFrameFn(func(vals map[interface{}]interface{}, row, nRows int) (dataframe.FilterAction, error) {
	if vals["title"] == nil {
		return dataframe.DROP, nil
	}
	return dataframe.KEEP, nil
})

seniors, _ := dataframe.Filter(ctx, df, filterFn)

+-----+----------------+----------------+-----------+--------------------------------+
|     |      NAME      |     TITLE      | BASE RATE |          MEETING TIME          |
+-----+----------------+----------------+-----------+--------------------------------+
| 0:  | Cordia Jacobi  |   Consultant   |    84     | 2020-02-02 23:13:53.015324     |
|     |                |                |           | +0000 UTC                      |
| 1:  | Hollis Dickens | Representative |    44     | 2020-02-04 23:13:53.015324     |
|     |                |                |           | +0000 UTC                      |
| 2:  |  Aleen Legros  |    Officer     |    42     | 2020-02-06 23:13:53.015324     |
|     |                |                |           | +0000 UTC                      |
| 3:  |  Adelia Metz   |   Architect    |    36     | 2020-02-07 23:13:53.015324     |
|     |                |                |           | +0000 UTC                      |
+-----+----------------+----------------+-----------+--------------------------------+
| 4X4 |     STRING     |     STRING     |   INT64   |              TIME              |
+-----+----------------+----------------+-----------+--------------------------------+

Other useful packages

  • awesome-svelte - Resources for killing react
  • dbq - Zero boilerplate database operations for Go
  • electron-alert - SweetAlert2 for Electron Applications
  • google-search - Scrape google search results
  • igo - A Go transpiler with cool new syntax such as fordefer (defer for for-loops)
  • mysql-go - Properly cancel slow MySQL queries
  • react - Build front end applications using Go
  • remember-go - Cache slow database queries
  • testing-go - Testing framework for unit testing

Legal Information

The license is a modified MIT license. Refer to the LICENSE file for more details.

© 2018-21 PJ Engineering and Business Solutions Pty. Ltd.
