A Go library for working with Data Packages.

```sh
$ go get -u github.com/frictionlessdata/datapackage-go/...
```
A data package is a collection of resources. The `datapackage.Package` type provides capabilities such as loading local or remote data packages, saving a data package descriptor, and more.

Suppose we have a local CSV file and a JSON descriptor in a `data` directory:
`data/population.csv`

```
city,year,population
london,2017,8780000
paris,2017,2240000
rome,2017,2860000
```

`data/datapackage.json`

```json
{
  "name": "world",
  "resources": [
    {
      "name": "population",
      "path": "population.csv",
      "profile": "tabular-data-resource",
      "schema": {
        "fields": [
          { "name": "city", "type": "string" },
          { "name": "year", "type": "integer" },
          { "name": "population", "type": "integer" }
        ]
      }
    }
  ]
}
```

Let's create a data package based on this data using the `datapackage.Package` type:
```go
pkg, err := datapackage.Load("data/datapackage.json") // Check error.
```

Once the data package is loaded, we can use the `datapackage.Resource` type to read a resource's contents:
```go
resource := pkg.GetResource("population")
contents, _ := resource.ReadAll()
fmt.Println(contents)
// [[london 2017 8780000] [paris 2017 2240000] [rome 2017 2860000]]
```

Or you can cast rows to Go types, making further processing easier:
```go
type Population struct {
	City       string `tableheader:"city"`
	Year       string `tableheader:"year"`
	Population int    `tableheader:"population"`
}

var cities []Population
resource.Cast(&cities, csv.LoadHeaders())
fmt.Printf("%+v", cities)
// [{City:london Year:2017 Population:8780000} {City:paris Year:2017 Population:2240000} {City:rome Year:2017 Population:2860000}]
```

If the data is too big to load at once, or you would like to process it line by line, you can iterate over the resource contents:
```go
iter, _ := resource.Iter(csv.LoadHeaders())
sch, _ := resource.GetSchema()
for iter.Next() {
	var p Population
	sch.CastRow(iter.Row(), &p)
	fmt.Printf("%+v\n", p)
}
// {City:london Year:2017 Population:8780000}
// {City:paris Year:2017 Population:2240000}
// {City:rome Year:2017 Population:2860000}
```

Or you might want to process specific columns, for instance to perform a statistical analysis:
```go
var population []float64
resource.CastColumn("population", &population, csv.LoadHeaders())
fmt.Println(population)
// Output: [8780000 2240000 2860000]
```
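As a toy illustration of such an analysis (plain Go, not part of the library; the `mean` helper is hypothetical), the average of a cast column can be computed directly:

```go
package main

import "fmt"

// mean computes the arithmetic mean of a float64 slice,
// returning 0 for an empty slice.
func mean(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	var sum float64
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

func main() {
	// Values as produced by CastColumn in the snippet above.
	population := []float64{8780000, 2240000, 2860000}
	fmt.Println(mean(population)) // average city population
}
```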
It is very common to store data in zip bundles containing the descriptor and the data files. Those are natively supported by the `datapackage.Load` method. For example, let's say we have the following `package.zip` bundle:

```
package.zip
|- datapackage.json
|- data.csv
```

We can load this package with:

```go
pkg, err := datapackage.Load("package.zip") // Check error.
```
And the library will unzip the package contents to a temporary directory and wire everything up for us.
A complete example can be found here.

You can also easily create a zip file containing the descriptor and all the data resources. Given a `datapackage.Package` instance, creating the zip is a single call:

```go
err := pkg.Zip("package.zip") // Check error.
```

This call also downloads remote resources. A complete example can be found here.
Basic support for configuring the CSV dialect has been added. In particular, the `delimiter`, `skipInitialSpace` and `header` fields are supported. For instance, let's assume the population file uses a different field delimiter:

`data/population.csv`

```
city;year;population
london;2017;8780000
paris;2017;2240000
rome;2017;2860000
```

One can parse it by adding the following `dialect` property to the resource in the `world` package descriptor:

```json
"dialect": { "delimiter": ";" }
```
A complete example can be found here.
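Conceptually, the `delimiter` dialect field maps onto the field separator of Go's standard CSV reader. The sketch below shows the equivalent stdlib behavior (the `readWithDelimiter` helper is hypothetical, not the library's internals):

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// readWithDelimiter parses CSV data using a custom field delimiter,
// mirroring what the "delimiter" dialect field expresses.
func readWithDelimiter(data string, delim rune) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.Comma = delim // stdlib counterpart of "delimiter": ";"
	return r.ReadAll()
}

func main() {
	data := "city;year;population\nlondon;2017;8780000\n"
	rows, _ := readWithDelimiter(data, ';')
	fmt.Println(rows) // [[city year population] [london 2017 8780000]]
}
```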
Sometimes data is scattered across many local or remote files. Datapackage-go offers an easy way to treat all those files as one big file; we call these multipart resources. To use this feature, simply list your files in the `path` property of the resource. For example, let's say our population data is now split between the northern and southern hemispheres. To deal with this, we only need to change the package descriptor:

`data/datapackage.json`

```json
{
  "name": "world",
  "resources": [
    {
      "name": "population",
      "path": ["north.csv", "south.csv"],
      "profile": "tabular-data-resource",
      "schema": {
        "fields": [
          { "name": "city", "type": "string" },
          { "name": "year", "type": "integer" },
          { "name": "population", "type": "integer" }
        ]
      }
    }
  ]
}
```

All the rest of the code keeps working unchanged.
A complete example can be found here.
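Conceptually, a multipart resource behaves as if its parts were concatenated into a single stream before parsing. A stdlib sketch of that idea (the `concatRows` helper is hypothetical, not the library's code):

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// concatRows parses several CSV chunks as if they were one file,
// the way a multipart resource presents its parts.
func concatRows(parts ...string) ([][]string, error) {
	readers := make([]io.Reader, len(parts))
	for i, p := range parts {
		readers[i] = strings.NewReader(p)
	}
	return csv.NewReader(io.MultiReader(readers...)).ReadAll()
}

func main() {
	north := "london,2017,8780000\n"
	south := "rome,2017,2860000\n"
	rows, _ := concatRows(north, south)
	fmt.Println(rows) // [[london 2017 8780000] [rome 2017 2860000]]
}
```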
A data package is a container format used to describe and package a collection of data. Even though there is additional support for dealing with tabular resources, it can be used to package any kind of data.

For instance, let's say a user needs to load JSON-LD information along with some tabular data (for more on this use case, please take a look at this issue). That can be packed together in a data package descriptor:
```json
{
  "name": "carp-lake",
  "title": "Carp Lake Title",
  "description": "Tephra and Lithology from Carp Lake",
  "resources": [
    {
      "name": "data",
      "path": "data/carpLakeCoreStratigraphy.csv",
      "format": "csv",
      "schema": {
        "fields": [
          { "name": "depth", "type": "number" },
          { "name": "notes", "type": "text" },
          { "name": "core_segments", "type": "text" }
        ]
      }
    },
    {
      "name": "schemaorg",
      "path": "data/schemaorg-ld.json",
      "format": "application/ld+json"
    }
  ]
}
```

The package loading proceeds as usual:

```go
pkg, err := datapackage.Load("data/datapackage.json") // Check error.
```
Once the data package is loaded, we can use the `Resource.RawRead` method to access the `schemaorg` resource contents as a byte slice:

```go
so := pkg.GetResource("schemaorg")
rc, _ := so.RawRead()
defer rc.Close()
contents, _ := ioutil.ReadAll(rc)
// Use contents. For instance, one could validate the JSON-LD schema
// and unmarshal it into a data structure.

data := pkg.GetResource("data")
dataContents, err := data.ReadAll()
// As data is a tabular resource, its contents can be loaded as [][]string.
```
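As hinted in the comment above, the raw bytes can then be unmarshaled with `encoding/json`. The `SchemaOrg` struct and its fields below are hypothetical, chosen only to illustrate the pattern:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SchemaOrg captures a couple of hypothetical JSON-LD keywords.
type SchemaOrg struct {
	Context string `json:"@context"`
	Type    string `json:"@type"`
}

// decodeLD unmarshals raw JSON-LD bytes, such as those returned
// by reading a RawRead stream, into a Go struct.
func decodeLD(contents []byte) (SchemaOrg, error) {
	var so SchemaOrg
	err := json.Unmarshal(contents, &so)
	return so, err
}

func main() {
	contents := []byte(`{"@context":"https://schema.org","@type":"Dataset"}`)
	so, _ := decodeLD(contents)
	fmt.Println(so.Type) // Dataset
}
```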
The datapackage-go library also makes it easy to save packages. Let's say you're creating a program that produces data packages and would like to add or remove resources:

```go
descriptor := map[string]interface{}{
	"resources": []interface{}{
		map[string]interface{}{
			"name":    "books",
			"path":    "books.csv",
			"format":  "csv",
			"profile": "tabular-data-resource",
			"schema": map[string]interface{}{
				"fields": []interface{}{
					map[string]interface{}{"name": "author", "type": "string"},
					map[string]interface{}{"name": "title", "type": "string"},
					map[string]interface{}{"name": "year", "type": "integer"},
				},
			},
		},
	},
}
pkg, err := datapackage.New(descriptor, ".", validator.InMemoryLoader())
if err != nil {
	panic(err)
}
// Removing a resource.
pkg.RemoveResource("books")
// Adding a new resource.
pkg.AddResource(map[string]interface{}{
	"name":    "cities",
	"path":    "cities.csv",
	"format":  "csv",
	"profile": "tabular-data-resource",
	"schema": map[string]interface{}{
		"fields": []interface{}{
			map[string]interface{}{"name": "city", "type": "string"},
			map[string]interface{}{"name": "year", "type": "integer"},
			map[string]interface{}{"name": "population", "type": "integer"},
		},
	},
})
// Printing resource contents.
cities, _ := pkg.GetResource("cities").ReadAll()
fmt.Println(cities)
// [[london 2017 8780000] [paris 2017 2240000] [rome 2017 2860000]]
```