pyarrow.TableGroupBy

class pyarrow.TableGroupBy(table, keys, use_threads=True)

Bases: object

A grouping of columns in a table on which to perform aggregations.

Parameters:
table : pyarrow.Table

Input table to execute the aggregation on.

keys : str or list[str]

Name or names of the columns to group by.

use_threads : bool, default True

Whether to use multithreading or not. When set to True (the default), no stable ordering of the output is guaranteed.

Examples

>>> import pyarrow as pa
>>> t = pa.table([
...       pa.array(["a", "a", "b", "b", "c"]),
...       pa.array([1, 2, 3, 4, 5]),
... ], names=["keys", "values"])

Grouping of columns:

>>> pa.TableGroupBy(t, "keys")
<pyarrow.lib.TableGroupBy object at ...>

Perform aggregations:

>>> pa.TableGroupBy(t, "keys").aggregate([("values", "sum")])
pyarrow.Table
keys: string
values_sum: int64
----
keys: [["a","b","c"]]
values_sum: [[3,7,5]]

__init__(self, table, keys, use_threads=True)

Methods

__init__(self, table, keys[, use_threads])

aggregate(self, aggregations)

Perform an aggregation over the grouped columns of the table.

aggregate(self, aggregations)

Perform an aggregation over the grouped columns of the table.

Parameters:
aggregations : list[tuple(str, str)] or list[tuple(str, str, FunctionOptions)]

List of tuples, where each tuple is one aggregation specification consisting of the aggregation column name, followed by the function name and, optionally, the aggregation function options. Pass an empty list to get a single row for each group. The column name can be a string, an empty list, or a list of column names, for unary, nullary, and n-ary aggregation functions respectively.

For the list of function names and their respective aggregation function options, see Grouped Aggregations.

Returns:
Table

Results of the aggregation functions.

Examples

>>> import pyarrow as pa
>>> t = pa.table([
...       pa.array(["a", "a", "b", "b", "c"]),
...       pa.array([1, 2, 3, 4, 5]),
... ], names=["keys", "values"])

Sum the column “values” over the grouped column “keys”:

>>> t.group_by("keys").aggregate([("values", "sum")])
pyarrow.Table
keys: string
values_sum: int64
----
keys: [["a","b","c"]]
values_sum: [[3,7,5]]

Count the rows over the grouped column “keys”:

>>> t.group_by("keys").aggregate([([], "count_all")])
pyarrow.Table
keys: string
count_all: int64
----
keys: [["a","b","c"]]
count_all: [[2,2,1]]

Do multiple aggregations:

>>> t.group_by("keys").aggregate([
...    ("values", "sum"),
...    ("keys", "count")
... ])
pyarrow.Table
keys: string
values_sum: int64
keys_count: int64
----
keys: [["a","b","c"]]
values_sum: [[3,7,5]]
keys_count: [[2,2,1]]

Count the number of non-null values for column “values” over the grouped column “keys”:

>>> import pyarrow.compute as pc
>>> t.group_by(["keys"]).aggregate([
...    ("values", "count", pc.CountOptions(mode="only_valid"))
... ])
pyarrow.Table
keys: string
values_count: int64
----
keys: [["a","b","c"]]
values_count: [[2,2,1]]

Get a single row for each group in column “keys”:

>>> t.group_by("keys").aggregate([])
pyarrow.Table
keys: string
----
keys: [["a","b","c"]]