A JavaScript library for working with Table Schema.
frictionlessdata/tableschema-js
A library for working with Table Schema.

- `Table` class for working with data and schema
- `Schema` class for working with schemas
- `Field` class for working with schema fields
- `validate` function for validating schema descriptors
- `infer` function that creates a schema based on a data sample

To use the library with `webpack`, please replicate the `node` configuration from `webpack.config.js`: https://github.com/frictionlessdata/tableschema-js/blob/master/webpack.config.js

The package uses semantic versioning, which means that major versions could include breaking changes. It's highly recommended to specify a `tableschema` version range in your `package.json` file, e.g. `tableschema: ^1.0`, which will be added by default by `npm install --save`.
```bash
$ npm install tableschema
```

```html
<script src="//unpkg.com/tableschema/dist/tableschema.min.js"></script>
```
Let's start with a simple example for Node.js:
```javascript
const {Table} = require('tableschema')

const table = await Table.load('data.csv')
await table.infer() // infer a schema
await table.read({keyed: true}) // read the data
await table.schema.save() // save the schema
await table.save() // save the data
```
And for browser:
After the script registration the library will be available as a global variable `tableschema`:

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>tableschema-js</title>
  </head>
  <body>
    <script src="//unpkg.com/tableschema/dist/tableschema.min.js"></script>
    <script>
      const main = async () => {
        const table = await tableschema.Table.load('https://raw.githubusercontent.com/frictionlessdata/datapackage-js/master/data/data.csv')
        const rows = await table.read()
        document.body.innerHTML += `<div>${table.headers}</div>`
        for (const row of rows) {
          document.body.innerHTML += `<div>${row}</div>`
        }
      }
      main()
    </script>
  </body>
</html>
```
A table is a core concept in a tabular data world. It represents data with metadata (Table Schema). Let's see how we could use it in practice.
Suppose we have a local CSV file. The source could also be inline data or a remote link; all of these are supported by the `Table` class (except local files for in-browser usage, of course). But say it's `data.csv` for now:

```csv
city,location
london,"51.50,-0.11"
paris,"48.85,2.30"
rome,N/A
```

Let's create and read a table. We use the static `Table.load` method and the `table.read` method with the `keyed` option to get an array of keyed rows:

```javascript
const table = await Table.load('data.csv')
table.headers // ['city', 'location']
await table.read({keyed: true})
// [
//   {city: 'london', location: '51.50,-0.11'},
//   {city: 'paris', location: '48.85,2.30'},
//   {city: 'rome', location: 'N/A'},
// ]
```
As we can see, our locations are just strings, but they should be geopoints. Also, Rome's location is not available, yet it is the string `N/A` instead of JavaScript `null`. First we have to infer a Table Schema:

```javascript
await table.infer()
table.schema.descriptor
// { fields:
//   [ { name: 'city', type: 'string', format: 'default' },
//     { name: 'location', type: 'geopoint', format: 'default' } ],
//   missingValues: [ '' ] }
await table.read({keyed: true})
// Fails with a data validation error
```
Let's fix the unavailable location. There is a `missingValues` property in the Table Schema specification. As a first try, we set `missingValues` to `N/A` in `table.schema.descriptor`. The schema descriptor can be changed in place, but all changes should be committed with `table.schema.commit()`:

```javascript
table.schema.descriptor['missingValues'] = 'N/A'
table.schema.commit()
table.schema.valid // false
table.schema.errors
// Error: Descriptor validation error:
//   Invalid type: string (expected array)
//   at "/missingValues" in descriptor and
//   at "/properties/missingValues/type" in profile
```
As good citizens we've decided to check the validity of our schema descriptor. And it's not valid! We should use an array for the `missingValues` property. Also, don't forget to keep an empty string as a missing value:

```javascript
table.schema.descriptor['missingValues'] = ['', 'N/A']
table.schema.commit()
table.schema.valid // true
```
All good. It looks like we're ready to read our data again:
```javascript
await table.read({keyed: true})
// [
//   {city: 'london', location: [51.50,-0.11]},
//   {city: 'paris', location: [48.85,2.30]},
//   {city: 'rome', location: null},
// ]
```

Now we see that:

- locations are arrays with numeric latitude and longitude
- Rome's location is a native JavaScript `null`
And because there are no errors on data reading we could be sure that our data is valid against our schema. Let's save it:
```javascript
await table.schema.save('schema.json')
await table.save('data.csv')
```

Our `data.csv` looks the same because it has been stringified back to `csv` format. But now we have `schema.json`:

```json
{
  "fields": [
    {"name": "city", "type": "string", "format": "default"},
    {"name": "location", "type": "geopoint", "format": "default"}
  ],
  "missingValues": ["", "N/A"]
}
```

If we decide to improve it even more, we could update the schema file and then open it again, this time providing a schema path and iterating through the data using Node Streams:

```javascript
const table = await Table.load('data.csv', {schema: 'schema.json'})
const stream = await table.iter({stream: true})
stream.on('data', (row) => {
  // handle row ['london', [51.50,-0.11]] etc
  // keyed/extended/cast supported in a stream mode too
})
```

This was only a basic introduction to the `Table` class. To learn more, let's take a look at the `Table` class API reference.
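The way `missingValues` changed the result of reading the table above can be pictured with a small stand-alone sketch. It is plain JavaScript and does not call the library; `castGeopoint` is a hypothetical helper, and the real casting logic is likely more thorough.

```javascript
// Stand-alone sketch (no tableschema calls); castGeopoint is a hypothetical name.
function castGeopoint(value, missingValues = ['']) {
  // Values listed in missingValues become null before any type casting.
  if (missingValues.includes(value)) return null
  const parts = value.split(',').map((part) => part.trim())
  const numbers = parts.map(Number)
  if (parts.length !== 2 || parts.includes('') || numbers.some(Number.isNaN)) {
    throw new Error(`Cannot cast "${value}" to geopoint`)
  }
  return numbers
}

const missingValues = ['', 'N/A']
const rows = [
  ['london', '51.50,-0.11'],
  ['paris', '48.85,2.30'],
  ['rome', 'N/A'],
]

// Build keyed rows, casting the location column only.
const cast = rows.map(([city, location]) => ({
  city,
  location: castGeopoint(location, missingValues),
}))
console.log(cast)
```

With the default `['']` the string `N/A` is not recognised as missing, so `castGeopoint('N/A')` throws, which mirrors the data validation error seen before the schema was fixed.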
A model of a schema with helpful methods for working with the schema and supported data. Schema instances can be initialized with a schema source as a URL to a JSON file or a JSON object. The schema is initially validated (see `validate` below). By default validation errors will be stored in `schema.errors`, but in strict mode they will be raised immediately.

Let's create a blank schema. It's not valid because the `descriptor.fields` property is required by the Table Schema specification:

```javascript
const schema = await Schema.load({})
schema.valid // false
schema.errors
// Error: Descriptor validation error:
//   Missing required property: fields
//   at "" in descriptor and
//   at "/required/0" in profile
```
To avoid creating a schema descriptor by hand, we will use the `schema.infer` method to infer the descriptor from given data:

```javascript
schema.infer([
  ['id', 'age', 'name'],
  ['1', '39', 'Paul'],
  ['2', '23', 'Jimmy'],
  ['3', '36', 'Jane'],
  ['4', '28', 'Judy'],
])
schema.valid // true
schema.descriptor
// { fields:
//   [ { name: 'id', type: 'integer', format: 'default' },
//     { name: 'age', type: 'integer', format: 'default' },
//     { name: 'name', type: 'string', format: 'default' } ],
//   missingValues: [ '' ] }
```
Now we have an inferred schema and it's valid. We can cast a data row against our schema. We provide string input, and the output will be cast correspondingly:

```javascript
schema.castRow(['5', '66', 'Sam']) // [ 5, 66, 'Sam' ]
```

But if we try to provide a missing value for the `age` field, the cast will fail, because for now the only possible missing value is an empty string. Let's update our schema:

```javascript
schema.castRow(['6', 'N/A', 'Walt'])
// Cast error
schema.descriptor.missingValues = ['', 'N/A']
schema.commit()
schema.castRow(['6', 'N/A', 'Walt'])
// [ 6, null, 'Walt' ]
```
We can save the schema to a local file, and continue the work at any time by loading it from that file:

```javascript
await schema.save('schema.json')
// later, load the schema again from the local file
const loaded = await Schema.load('schema.json')
```

This was only a basic introduction to the `Schema` class. To learn more, let's take a look at the `Schema` class API reference.
The `Field` class represents a field in the schema.

Data values can be cast to native JavaScript types. Casting a value will check that the value is of the expected type, is in the correct format, and complies with any constraints imposed by a schema.

```javascript
{
  'name': 'birthday',
  'type': 'date',
  'format': 'default',
  'constraints': {
    'required': true,
    'minimum': '2015-05-30'
  }
}
```

The following code will not raise an exception, despite the fact that our date is less than the minimum constraint of the field, because we do not check the constraints of the field descriptor:

```javascript
var dateType = field.castValue('2014-05-29')
```

The following example will raise an exception, because we set the 'skip constraints' flag to `false` and our date is less than allowed by the `minimum` constraint of the field. An exception will also be raised when trying to cast non-date format values or empty values:

```javascript
try {
  var dateType = field.castValue('2014-05-29', false)
} catch (e) {
  // uh oh, something went wrong
}
```

Values that can't be cast will raise an `Error` exception. Casting a value that doesn't meet the constraints will raise an `Error` exception.
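The difference between casting with and without constraint checks can be illustrated with a stand-alone sketch. It does not use the library: `castDate` and its `checkConstraints` option are hypothetical names chosen for this example and are not the `castValue` signature.

```javascript
// Stand-alone sketch (no tableschema calls); names are hypothetical.
function castDate(value, {minimum, checkConstraints = true} = {}) {
  // Type check: the value has to parse as a calendar date.
  const date = new Date(`${value}T00:00:00Z`)
  if (Number.isNaN(date.getTime())) {
    throw new Error(`Cannot cast "${value}" to date`)
  }
  // Constraint check: applied only when requested.
  if (checkConstraints && minimum !== undefined) {
    const min = new Date(`${minimum}T00:00:00Z`)
    if (date < min) {
      throw new Error(`"${value}" is less than the minimum "${minimum}"`)
    }
  }
  return date
}

// Constraints skipped: the early date is cast without an error.
const unchecked = castDate('2014-05-29', {minimum: '2015-05-30', checkConstraints: false})

// Constraints checked: the same value is rejected.
let failure = null
try {
  castDate('2014-05-29', {minimum: '2015-05-30'})
} catch (e) {
  failure = e.message
}
console.log(unchecked.toISOString(), failure)
```

Keeping the type check and the constraint check as separate steps is what makes it possible to skip one without the other.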
Available types, formats and resultant value of the cast:
| Type | Formats | Casting result |
|---|---|---|
| any | default | Any |
| array | default | Array |
| boolean | default | Boolean |
| date | default, any, <PATTERN> | Date |
| datetime | default, any, <PATTERN> | Date |
| duration | default | moment.Duration |
| geojson | default, topojson | Object |
| geopoint | default, array, object | [Number, Number] |
| integer | default | Number |
| number | default | Number |
| object | default | Object |
| string | default, uri, email, binary | String |
| time | default, any, <PATTERN> | Date |
| year | default | Number |
| yearmonth | default | [Number, Number] |
`validate()` validates whether a schema is a valid Table Schema according to the specifications. It does not validate data against a schema.

Given a schema descriptor, `validate` returns a `Promise` with a validation object:

```javascript
const {validate} = require('tableschema')

const {valid, errors} = await validate('schema.json')
for (const error of errors) {
  // inspect Error objects
}
```
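To make the shape of the `{valid, errors}` result concrete, here is a deliberately tiny stand-alone check. It is not how the library validates (the error messages shown earlier refer to a profile); `checkDescriptor` is a hypothetical helper that only covers the two problems seen in the examples above.

```javascript
// Stand-alone sketch (no tableschema calls); checkDescriptor is a hypothetical name.
function checkDescriptor(descriptor) {
  const errors = []
  // `fields` is required by the Table Schema specification.
  if (!Array.isArray(descriptor.fields)) {
    errors.push(new Error('Missing required property: fields'))
  }
  // `missingValues`, when present, has to be an array.
  if ('missingValues' in descriptor && !Array.isArray(descriptor.missingValues)) {
    errors.push(new Error('Invalid type for missingValues (expected array)'))
  }
  return {valid: errors.length === 0, errors}
}

const blank = checkDescriptor({})
const wrongType = checkDescriptor({fields: [], missingValues: 'N/A'})
const good = checkDescriptor({
  fields: [{name: 'city', type: 'string'}],
  missingValues: ['', 'N/A'],
})
console.log(blank.valid, wrongType.valid, good.valid)
```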
Given a data source and headers, `infer` will return a Table Schema as a JSON object based on the data values.

Given the data file, `example.csv`:

```csv
id,age,name
1,39,Paul
2,23,Jimmy
3,36,Jane
4,28,Judy
```

Call `infer` with headers and values from the data file:

```javascript
const {infer} = require('tableschema')
const descriptor = await infer('example.csv')
```

The `descriptor` variable is now a JSON object:

```javascript
{
  fields: [
    {name: 'id', title: '', description: '', type: 'integer', format: 'default'},
    {name: 'age', title: '', description: '', type: 'integer', format: 'default'},
    {name: 'name', title: '', description: '', type: 'string', format: 'default'}
  ]
}
```
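The idea behind inference can be sketched in a few lines of plain JavaScript. This is an assumption-laden illustration, not the library's algorithm: `guessType` and `inferDescriptor` are hypothetical names, and only three types are considered.

```javascript
// Stand-alone sketch (no tableschema calls); names are hypothetical.
function guessType(values) {
  // Ignore empty strings, which are treated as missing values here.
  const present = values.filter((value) => value !== '')
  if (present.length === 0) return 'string'
  if (present.every((value) => /^-?\d+$/.test(value))) return 'integer'
  if (present.every((value) => /^-?\d+(\.\d+)?$/.test(value))) return 'number'
  return 'string'
}

function inferDescriptor(rows) {
  // The first row holds the headers, the rest is the data sample.
  const [headers, ...data] = rows
  return {
    fields: headers.map((name, index) => ({
      name,
      type: guessType(data.map((row) => row[index])),
      format: 'default',
    })),
  }
}

const sample = [
  ['id', 'age', 'name'],
  ['1', '39', 'Paul'],
  ['2', '23', 'Jimmy'],
  ['3', '36', 'Jane'],
  ['4', '28', 'Judy'],
]
const inferred = inferDescriptor(sample)
console.log(inferred.fields.map((field) => `${field.name}: ${field.type}`))
```

Each column is typed by the narrowest candidate that every sampled value satisfies, which is why a larger or smaller sample can change the result.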
Table representation

- Table
  - instance
    - .headers ⇒ `Array.<string>`
    - .schema ⇒ `Schema`
    - .iter(keyed, extended, cast, forceCast, relations, stream) ⇒ `AsyncIterator` | `Stream`
    - .read(limit) ⇒ `Array.<Array>` | `Array.<Object>`
    - .infer(limit) ⇒ `Object`
    - .save(target) ⇒ `Boolean`
  - static
    - .load(source, schema, strict, headers, parserOptions) ⇒ `Table`
table.headers ⇒ `Array.<string>`

Headers

Returns: `Array.<string>` - data source headers

table.schema ⇒ `Schema`

Schema

Returns: `Schema` - table schema instance

table.iter(keyed, extended, cast, forceCast, relations, stream) ⇒ `AsyncIterator` | `Stream`

Iterate through the table data and emit rows cast based on the table schema (async for loop). With the `stream` flag, a Node stream is returned instead of an async iterator. Data casting can be disabled.

Returns: `AsyncIterator` | `Stream` - async iterator/stream of rows:

- `[value1, value2]` - base
- `{header1: value1, header2: value2}` - keyed
- `[rowNumber, [header1, header2], [value1, value2]]` - extended

Throws:

- `TableSchemaError` raises any error that occurred in this process
| Param | Type | Description |
|---|---|---|
| keyed | boolean | iter keyed rows |
| extended | boolean | iter extended rows |
| cast | boolean | disable data casting if false |
| forceCast | boolean | instead of raising on the first row with a cast error, return an error object in place of the failed row. This allows iterating over the whole data file even if it is not compliant with the schema. Example of output stream: `[['val1', 'val2'], TableSchemaError, ['val3', 'val4'], ...]` |
| relations | Object | object of foreign key references in the form of `{resource1: [{field1: value1, field2: value2}, ...], ...}`. If provided, foreign key fields will be checked and resolved to their references |
| stream | boolean | return Node Readable Stream of table rows |
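The `forceCast` behaviour described in the table can be pictured with a stand-alone sketch. It does not call the library: `castAll` and `castIntegers` are hypothetical helpers, and a plain `Error` stands in for `TableSchemaError`.

```javascript
// Stand-alone sketch (no tableschema calls); all names are hypothetical.
function castIntegers(row) {
  return row.map((value) => {
    if (!/^-?\d+$/.test(value)) {
      throw new Error(`Cannot cast "${value}" to integer`)
    }
    return Number(value)
  })
}

function castAll(rows, castRow, {forceCast = false} = {}) {
  const result = []
  for (const row of rows) {
    try {
      result.push(castRow(row))
    } catch (error) {
      // Without forceCast the first failing row aborts the whole read.
      if (!forceCast) throw error
      // With forceCast the error object takes the place of the failed row.
      result.push(error)
    }
  }
  return result
}

const rows = [['1', '2'], ['x', '4'], ['5', '6']]
const forced = castAll(rows, castIntegers, {forceCast: true})
console.log(forced.map((item) => (item instanceof Error ? item.message : item)))
```

Callers of such an output have to check each item with `instanceof Error` before treating it as a row.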
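table.read(limit) ⇒ `Array.<Array>` | `Array.<Object>`

Read the table data into memory

The API is the same as that of `table.iter`, except for the `limit` parameter and the return value:

Returns: `Array.<Array>` | `Array.<Object>` - list of rows:

- `[value1, value2]` - base
- `{header1: value1, header2: value2}` - keyed
- `[rowNumber, [header1, header2], [value1, value2]]` - extended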
| Param | Type | Description |
|---|---|---|
| limit | integer | limit of rows to read |
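The three row forms returned by `table.iter` and `table.read` differ only in how the same values are packaged. A stand-alone sketch (the `shapeRow` helper is hypothetical and the row number `2` is an arbitrary illustrative value):

```javascript
// Stand-alone sketch (no tableschema calls); shapeRow is a hypothetical name.
function shapeRow(headers, values, rowNumber, {keyed = false, extended = false} = {}) {
  // keyed: an object mapping each header to its value
  if (keyed) {
    return Object.fromEntries(headers.map((header, index) => [header, values[index]]))
  }
  // extended: row number, headers and values together
  if (extended) return [rowNumber, headers, values]
  // base: just the values
  return values
}

const headers = ['city', 'location']
const values = ['london', [51.5, -0.11]]

const base = shapeRow(headers, values, 2)
const keyed = shapeRow(headers, values, 2, {keyed: true})
const extended = shapeRow(headers, values, 2, {extended: true})
console.log(base, keyed, extended)
```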
table.infer(limit) ⇒ `Object`

Infer a schema for the table.

It will infer a Table Schema from the table data and set it to `table.schema`.

Returns: `Object` - Table Schema descriptor
| Param | Type | Description |
|---|---|---|
| limit | number | limit rows sample size |
table.save(target) ⇒ `Boolean`

Save the data source to a local file in CSV format with the `,` (comma) delimiter

Returns: `Boolean` - true on success

Throws:

- `TableSchemaError` an error if there is a saving problem
| Param | Type | Description |
|---|---|---|
| target | string | path where to save a table data |
Table.load(source, schema, strict, headers, parserOptions) ⇒ `Table`

Factory method to instantiate the `Table` class.

This method is async and it should be used with the `await` keyword or as a `Promise`. If the `references` argument is provided, foreign keys will be checked on any reading operation.

Returns: `Table` - data table class instance

Throws:

- `TableSchemaError` raises any error that occurred in the table creation process
| Param | Type | Description |
|---|---|---|
| source | string \| Array.<Array> \| Stream \| function | data source (one of): local CSV file (path); remote CSV file (url); array of arrays representing the rows; readable stream with CSV file contents; function returning readable stream with CSV file contents |
| schema | string \| Object | data schema in all forms supported by the `Schema` class |
| strict | boolean | strictness option to pass to the `Schema` constructor |
| headers | number \| Array.<string> | data source headers (one of): row number containing headers (source should contain headers rows); array of headers (source should NOT contain headers rows) |
| parserOptions | Object | options to be used by the CSV parser. All options are listed at https://csv.js.org/parse/options/. By default `ltrim` is true according to the CSV Dialect spec. |
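The two forms of the `headers` option can be compared with a stand-alone sketch. `splitHeaders` is a hypothetical helper, not library code; it assumes row numbers are 1-based, that rows above the header row are dropped, and it borrows the default of `1` that is documented for `schema.infer`.

```javascript
// Stand-alone sketch (no tableschema calls); splitHeaders is a hypothetical name.
function splitHeaders(source, headers = 1) {
  // An array of headers means the source contains data rows only.
  if (Array.isArray(headers)) {
    return {headers, rows: source}
  }
  // A number is the 1-based row that holds the headers (assumption).
  return {headers: source[headers - 1], rows: source.slice(headers)}
}

const withHeaderRow = splitHeaders(
  [['id', 'name'], ['1', 'Paul'], ['2', 'Jane']],
  1
)
const withoutHeaderRow = splitHeaders(
  [['1', 'Paul'], ['2', 'Jane']],
  ['id', 'name']
)
console.log(withHeaderRow, withoutHeaderRow)
```

Both calls describe the same table, which is why the source should contain a header row in the first form and should not in the second.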
Schema representation

- Schema
  - instance
    - .valid ⇒ `Boolean`
    - .errors ⇒ `Array.<Error>`
    - .descriptor ⇒ `Object`
    - .primaryKey ⇒ `Array.<string>`
    - .foreignKeys ⇒ `Array.<Object>`
    - .fields ⇒ `Array.<Field>`
    - .fieldNames ⇒ `Array.<string>`
    - .getField(fieldName) ⇒ `Field` | `null`
    - .addField(descriptor) ⇒ `Field`
    - .removeField(name) ⇒ `Field` | `null`
    - .castRow(row, failFalst) ⇒ `Array.<Array>`
    - .infer(rows, headers) ⇒ `Object`
    - .commit(strict) ⇒ `Boolean`
    - .save(target) ⇒ `boolean`
  - static
    - .load(descriptor, strict) ⇒ `Schema`
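schema.valid ⇒ `Boolean`

Validation status

It is always `true` in strict mode.

Returns: `Boolean` - returns validation status

schema.errors ⇒ `Array.<Error>`

Validation errors

It is always empty in strict mode.

Returns: `Array.<Error>` - returns validation errors

schema.descriptor ⇒ `Object`

Descriptor

Returns: `Object` - schema descriptor

schema.primaryKey ⇒ `Array.<string>`

Primary Key

Returns: `Array.<string>` - schema primary key

schema.foreignKeys ⇒ `Array.<Object>`

Foreign Keys

Returns: `Array.<Object>` - schema foreign keys

schema.fields ⇒ `Array.<Field>`

Fields

Returns: `Array.<Field>` - schema fields

schema.fieldNames ⇒ `Array.<string>`

Field names

Returns: `Array.<string>` - schema field names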
schema.getField(fieldName) ⇒ `Field` | `null`

Return a field

Returns: `Field` | `null` - field instance if it exists

| Param | Type |
|---|---|
| fieldName | string |

schema.addField(descriptor) ⇒ `Field`

Add a field

Returns: `Field` - added field instance

| Param | Type |
|---|---|
| descriptor | Object |

schema.removeField(name) ⇒ `Field` | `null`

Remove a field

Returns: `Field` | `null` - removed field instance if it exists

| Param | Type |
|---|---|
| name | string |

schema.castRow(row, failFalst) ⇒ `Array.<Array>`

Cast row based on field types and formats.

Returns: `Array.<Array>` - cast data row

| Param | Type | Description |
|---|---|---|
| row | Array.<Array> | data row as an array of values |
| failFalst | boolean | |
schema.infer(rows, headers) ⇒ `Object`

Infer and set `schema.descriptor` based on a data sample.

Returns: `Object` - Table Schema descriptor

| Param | Type | Description |
|---|---|---|
| rows | Array.<Array> | array of arrays representing rows |
| headers | integer \| Array.<string> | data sample headers (one of): row number containing headers (rows should contain headers rows); array of headers (rows should NOT contain headers rows); defaults to 1 |
schema.commit(strict) ⇒ `Boolean`

Update the schema instance if there are in-place changes in the descriptor.

Returns: `Boolean` - returns true on success and false if not modified

Throws:

- `TableSchemaError` raises any error that occurred in the process

| Param | Type | Description |
|---|---|---|
| strict | boolean | alter `strict` mode for further work |

Example

```javascript
const descriptor = {fields: [{name: 'field', type: 'string'}]}
const schema = await Schema.load(descriptor)

schema.getField('field').type // string
schema.descriptor.fields[0].type = 'number'
schema.getField('field').type // string
schema.commit()
schema.getField('field').type // number
```
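Why in-place descriptor edits stay invisible until `commit()` can be modelled with a small stand-alone class. `MiniSchema` is hypothetical and only assumes that the instance works from a snapshot of the descriptor; the library's internals may differ.

```javascript
// Stand-alone sketch (no tableschema calls); MiniSchema is a hypothetical class.
class MiniSchema {
  constructor(descriptor) {
    // The public descriptor can be edited in place by the caller.
    this.descriptor = descriptor
    // Lookups are served from a private snapshot taken at commit time.
    this._committed = JSON.parse(JSON.stringify(descriptor))
  }

  getField(name) {
    return this._committed.fields.find((field) => field.name === name) || null
  }

  commit() {
    const next = JSON.stringify(this.descriptor)
    // Nothing changed since the last snapshot: report "not modified".
    if (next === JSON.stringify(this._committed)) return false
    this._committed = JSON.parse(next)
    return true
  }
}

const schema = new MiniSchema({fields: [{name: 'field', type: 'string'}]})
schema.descriptor.fields[0].type = 'number'
const beforeCommit = schema.getField('field').type
const modified = schema.commit()
const afterCommit = schema.getField('field').type
const modifiedAgain = schema.commit()
console.log(beforeCommit, modified, afterCommit, modifiedAgain)
```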
schema.save(target) ⇒ `boolean`

Save schema descriptor to target destination.

Returns: `boolean` - returns true on success

Throws:

- `TableSchemaError` raises any error that occurred in the process
| Param | Type | Description |
|---|---|---|
| target | string | path where to save a descriptor |
Schema.load(descriptor, strict) ⇒ `Schema`

Factory method to instantiate the `Schema` class.

This method is async and it should be used with the `await` keyword or as a `Promise`.

Returns: `Schema` - returns schema class instance

Throws:

- `TableSchemaError` raises any error that occurred in the process
| Param | Type | Description |
|---|---|---|
| descriptor | string \| Object | schema descriptor (one of): local path; remote url; object |
| strict | boolean | flag to alter validation behaviour: if false, errors will not be raised and all errors will be collected in `schema.errors`; if true, any validation error will be raised immediately |
Field representation

- Field
  - new Field(descriptor, missingValues)
  - .name ⇒ `string`
  - .type ⇒ `string`
  - .format ⇒ `string`
  - .required ⇒ `boolean`
  - .constraints ⇒ `Object`
  - .descriptor ⇒ `Object`
  - .castValue(value, constraints) ⇒ `any`
  - .testValue(value, constraints) ⇒ `boolean`
new Field(descriptor, missingValues)

Constructor to instantiate the `Field` class.

Returns: `Field` - returns field class instance

Throws:

- `TableSchemaError` raises any error that occurred in the process

| Param | Type | Description |
|---|---|---|
| descriptor | Object | schema field descriptor |
| missingValues | Array.<string> | an array of strings representing missing values |
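field.name ⇒ `string`

Field name

field.type ⇒ `string`

Field type

field.format ⇒ `string`

Field format

field.required ⇒ `boolean`

Return true if field is required

field.constraints ⇒ `Object`

Field constraints

field.descriptor ⇒ `Object`

Field descriptor

field.castValue(value, constraints) ⇒ `any`

Cast value

Returns: `any` - cast value

| Param | Type | Description |
|---|---|---|
| value | any | value to cast |
| constraints | Object \| false | |

field.testValue(value, constraints) ⇒ `boolean`

Check if value can be cast

| Param | Type | Description |
|---|---|---|
| value | any | value to test |
| constraints | Object \| false | |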
validate(descriptor) ⇒ `Object`

This function is async, so it has to be used with the `await` keyword or as a `Promise`.

Returns: `Object` - returns `{valid, errors}` object

| Param | Type | Description |
|---|---|---|
| descriptor | string \| Object | schema descriptor (one of): local path; remote url; object |
infer(source, headers, options) ⇒ `Object`

This function is async, so it has to be used with the `await` keyword or as a `Promise`.

Returns: `Object` - returns schema descriptor

Throws:

- `TableSchemaError` raises any error that occurred in the process

| Param | Type | Description |
|---|---|---|
| source | string \| Array.<Array> \| Stream \| function | data source (one of): local CSV file (path); remote CSV file (url); array of arrays representing the rows; readable stream with CSV file contents; function returning readable stream with CSV file contents |
| headers | Array.<string> | array of headers |
| options | Object | any `Table.load` options |
Base class for all DataPackage/TableSchema errors.

If there is more than one error, you can get additional information from the error object:

```javascript
try {
  // some lib action
} catch (error) {
  console.log(error) // you have N cast errors (see error.errors)
  if (error.multiple) {
    // use a distinct name so the outer `error` is not shadowed
    for (const nested of error.errors) {
      console.log(nested) // cast error M is ...
    }
  }
}
```
- DataPackageError
  - new DataPackageError(message, errors)
  - .multiple ⇒ `boolean`
  - .errors ⇒ `Array.<Error>`
new DataPackageError(message, errors)

Create an error

| Param | Type | Description |
|---|---|---|
| message | string | |
| errors | Array.<Error> | nested errors |

error.multiple ⇒ `boolean`

Whether it's nested

error.errors ⇒ `Array.<Error>`

List of errors
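The `message`, `errors` and `multiple` members described above can be mimicked with a stand-alone class to show how a caller might unpack nested errors. `NestedError` and `describe` are hypothetical names, and treating `multiple` as "has at least one nested error" is an assumption.

```javascript
// Stand-alone sketch (no tableschema calls); names are hypothetical.
class NestedError extends Error {
  constructor(message, errors = []) {
    super(message)
    this.name = 'NestedError'
    this.errors = errors
  }

  // Assumption: "multiple" means at least one nested error is attached.
  get multiple() {
    return this.errors.length > 0
  }
}

// Flatten an error and its nested errors into printable lines.
function describe(error) {
  const lines = [error.message]
  if (error.multiple) {
    for (const nested of error.errors) {
      lines.push(`  ${nested.message}`)
    }
  }
  return lines
}

const error = new NestedError('There are 2 cast errors (see error.errors)', [
  new Error('row 2: bad integer'),
  new Error('row 3: bad date'),
])
const report = describe(error)
console.log(report.join('\n'))
```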
TableSchemaError

Base class for all TableSchema errors.

The project follows the Open Knowledge International coding standards. There are common commands to work with the project:

```bash
$ npm install
$ npm run test
$ npm run build
```

Only breaking changes and the most important changes are described here. The full changelog and documentation for all released versions can be found in the nicely formatted commit history.
- Added support for infinite numbers: NaN, INF, -INF
- Improved data/time validation using a conversion table and moment.js (#170)
- Rebased on csv-parse@4
Fix bug:
- URI format must have the scheme protocol to be valid (#135)
Improved behaviour:
- Automatically detect the CSV delimiter if one isn't explicitly set

New API added:

- added `forceCast` flag to the `table.iter/read` methods

Improved behaviour:

- improved validation of `string` and `geojson` types
- added heuristics to the `infer` function

New API added:

- added `format` option to the `Table` constructor
- added `encoding` option to the `Table` constructor

Improved behaviour:

- Now the `infer` functions support formats inferring

New API added:

- `error.rowNumber` if available
- `error.columnNumber` if available

New API added:

- `Table.load` and `infer` now accept a Node Stream as a `source` argument

New API added:

- `Table.load` and `infer` now accept `parserOptions`
This version includes various big changes, including a move to asynchronous inference.
First stable version of the library.