Query processing and transformation of array-backed data tables.
uwdata/arquero
Arquero is a JavaScript library for query processing and transformation of array-backed data tables. Following the relational algebra and inspired by the design of dplyr, Arquero provides a fluent API for manipulating column-oriented data frames. Arquero supports a range of data transformation tasks, including filter, sample, aggregation, window, join, and reshaping operations.
- Fast: process data tables with million+ rows.
- Flexible: query over arrays, typed arrays, array-like objects, or Apache Arrow columns.
- Full-Featured: perform a variety of wrangling and analysis tasks.
- Extensible: add new column types or functions, including aggregate & window operations.
- Lightweight: small size, minimal dependencies.
To get up and running, start with the Introducing Arquero tutorial, part of the Arquero notebook collection.
Have a question or need help? Visit the Arquero GitHub repo or post to the Arquero GitHub Discussions board.
Arquero is Spanish for "archer": if datasets are arrows, Arquero helps their aim stay true. 🏹 Arquero also refers to a goalkeeper: safeguard your data from analytic "own goals"! 🥅 ✋ ⚽
- Top-Level API - All methods in the top-level Arquero namespace.
- Table - Table access and output methods.
- Verbs - Table transformation verbs.
- Op Functions - All functions, including aggregate and window functions.
- Expressions - Parsing and generation of table expressions.
- Extensibility - Extend Arquero with new expression functions or table verbs.
The core abstractions in Arquero are data tables, which model each column as an array of values, and verbs that transform data and return new tables. Verbs are table methods, allowing method chaining for multi-step transformations. Though each table is unique, many verbs reuse the underlying columns to limit duplication.
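The idea of column-oriented tables whose verbs return new tables while sharing columns can be sketched in plain JavaScript. This is a minimal illustration only: `MiniTable` and its `select`, `derive`, and `row` methods are hypothetical names written for this sketch and are not Arquero's implementation or API.

```javascript
// Hypothetical mini table, NOT Arquero's implementation. It only illustrates
// column-oriented storage and verbs that return new tables.
class MiniTable {
  constructor(columns) {
    // columns: { name: array of values }, all assumed to have equal length
    this.columns = columns;
    const names = Object.keys(columns);
    this.numRows = names.length ? columns[names[0]].length : 0;
  }

  // Materialize one row as a plain object.
  row(i) {
    const obj = {};
    for (const [name, values] of Object.entries(this.columns)) {
      obj[name] = values[i];
    }
    return obj;
  }

  // Verb: keep only the named columns. The new table shares the same
  // underlying arrays, so no values are copied.
  select(...names) {
    const picked = {};
    for (const name of names) picked[name] = this.columns[name];
    return new MiniTable(picked);
  }

  // Verb: add columns computed row by row. Existing columns are reused;
  // only the derived columns are newly allocated.
  derive(defs) {
    const next = { ...this.columns };
    for (const [name, fn] of Object.entries(defs)) {
      const values = new Array(this.numRows);
      for (let i = 0; i < this.numRows; ++i) values[i] = fn(this.row(i));
      next[name] = values;
    }
    return new MiniTable(next);
  }
}

const base = new MiniTable({
  Seattle: [69, 108, 178],
  Chicago: [135, 136, 187]
});

// Verbs chain because each one returns a new table.
const result = base
  .derive({ diff: d => d.Seattle - d.Chicago })
  .select('Seattle', 'diff');
```

Here `result` is a distinct table object, yet its `Seattle` column is the very same array held by `base`, and `base` itself is left without a `diff` column.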
    import { all, desc, op, table } from 'arquero';

    // Average hours of sunshine per month, from https://usclimatedata.com/.
    const dt = table({
      'Seattle': [69, 108, 178, 207, 253, 268, 312, 281, 221, 142, 72, 52],
      'Chicago': [135, 136, 187, 215, 281, 311, 318, 283, 226, 193, 113, 106],
      'San Francisco': [165, 182, 251, 281, 314, 330, 300, 272, 267, 243, 189, 156]
    });

    // Sorted differences between Seattle and Chicago.
    // Table expressions use arrow function syntax.
    dt.derive({
        month: d => op.row_number(),
        diff: d => d.Seattle - d.Chicago
      })
      .select('month', 'diff')
      .orderby(desc('diff'))
      .print();

    // Is Seattle more correlated with San Francisco or Chicago?
    // Operations accept column name strings outside a function context.
    dt.rollup({
        corr_sf: op.corr('Seattle', 'San Francisco'),
        corr_chi: op.corr('Seattle', 'Chicago')
      })
      .print();

    // Aggregate statistics per city, as output objects.
    // Reshape (fold) the data to a two column layout: city, sun.
    dt.fold(all(), { as: ['city', 'sun'] })
      .groupby('city')
      .rollup({
        min: d => op.min(d.sun), // functional form of op.min('sun')
        max: d => op.max(d.sun),
        avg: d => op.average(d.sun),
        med: d => op.median(d.sun),
        // functional forms permit flexible table expressions
        skew: ({ sun: s }) => (op.mean(s) - op.median(s)) / op.stdev(s) || 0
      })
      .objects()
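The `corr_sf` and `corr_chi` values in the example can be cross-checked without the library. The sketch below assumes that `op.corr` computes the Pearson product-moment correlation; the `pearson` helper is a hypothetical name written for this illustration and is not part of Arquero.

```javascript
// Hypothetical helper, not part of Arquero: Pearson product-moment
// correlation of two equal-length numeric arrays.
function pearson(x, y) {
  const n = x.length;
  const mean = a => a.reduce((sum, v) => sum + v, 0) / n;
  const mx = mean(x);
  const my = mean(y);
  let sxy = 0; // sum of products of deviations
  let sxx = 0; // sum of squared deviations of x
  let syy = 0; // sum of squared deviations of y
  for (let i = 0; i < n; ++i) {
    const dx = x[i] - mx;
    const dy = y[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// Same monthly sunshine data as in the example.
const seattle = [69, 108, 178, 207, 253, 268, 312, 281, 221, 142, 72, 52];
const chicago = [135, 136, 187, 215, 281, 311, 318, 283, 226, 193, 113, 106];
const sanFrancisco = [165, 182, 251, 281, 314, 330, 300, 272, 267, 243, 189, 156];

const corrSF = pearson(seattle, sanFrancisco);
const corrChi = pearson(seattle, chicago);
console.log({ corrSF, corrChi });
```

Both values fall between -1 and 1; whichever is larger indicates the city whose sunshine pattern tracks Seattle's more closely.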
To use in the browser, you can load Arquero from a content delivery network:
    <script src="https://cdn.jsdelivr.net/npm/arquero@latest"></script>

Arquero will be imported into the aq global object. The default browser bundle does not include the Apache Arrow library. To perform Arrow encoding using toArrow() or binary file loading using loadArrow(), import Apache Arrow first:

    <script src="https://cdn.jsdelivr.net/npm/apache-arrow@latest"></script>
    <script src="https://cdn.jsdelivr.net/npm/arquero@latest"></script>
Alternatively, you can build and import arquero.min.js from the dist directory, or build your own application bundle. When building custom application bundles for the browser, the module bundler should draw from the browser property of Arquero's package.json file. For example, if using rollup, pass the browser: true option to the node-resolve plugin.
Arquero uses modern JavaScript features, and so will not work with some outdated browsers. To use Arquero with older browsers including Internet Explorer, set up your project with a transpiler such as Babel.
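As an illustration of the rollup setup described above, a configuration file might look like the following. This is a sketch under stated assumptions: the entry path, output path, output format, and global name are placeholders, and the plugin is assumed to be installed as the `@rollup/plugin-node-resolve` package.

```javascript
// rollup.config.js (illustrative sketch; paths and names are placeholders)
import resolve from '@rollup/plugin-node-resolve';

export default {
  input: 'src/index.js',        // hypothetical application entry point
  output: {
    file: 'dist/bundle.js',     // hypothetical output file
    format: 'iife',             // assumed format for direct use in a <script> tag
    name: 'app'                 // hypothetical global name for the bundle
  },
  plugins: [
    // browser: true asks the plugin to honor the "browser" property
    // of each dependency's package.json, including Arquero's.
    resolve({ browser: true })
  ]
};
```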
First install arquero as a dependency, for example via npm install arquero --save. Arquero assumes Node version 18 or higher. As of Arquero version 6, the library uses type module and should be loaded using ES module syntax.
Import using ES module syntax, with all exports collected into a single object:

    import * as aq from 'arquero';
Import using ES module syntax, with targeted imports:
    import { op, table } from 'arquero';
Dynamic import (e.g., within a Node.js REPL):
    aq = await import('arquero');
To build and develop Arquero locally:
- Clone https://github.com/uwdata/arquero.
- Run npm i to install dependencies.
- Run npm test to run test cases, npm run perf to run performance benchmarks, and npm run build to build output files.