pyarrow.TableGroupBy
- class pyarrow.TableGroupBy(table, keys, use_threads=True)
Bases: object
A grouping of columns in a table on which to perform aggregations.
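- Parameters:
- table
pyarrow.Table
Input table to execute the aggregation on.
- keys
str or list[str]
Name of the grouped columns.
- use_threads
bool, default True
Whether to use multithreading or not. When set to True (the default), no stable ordering of the output is guaranteed.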
Examples
>>> import pyarrow as pa
>>> t = pa.table([
...     pa.array(["a", "a", "b", "b", "c"]),
...     pa.array([1, 2, 3, 4, 5]),
... ], names=["keys", "values"])
Grouping of columns:
>>> pa.TableGroupBy(t, "keys")
<pyarrow.lib.TableGroupBy object at ...>
Perform aggregations:
>>> pa.TableGroupBy(t, "keys").aggregate([("values", "sum")])
pyarrow.Table
keys: string
values_sum: int64
----
keys: [["a","b","c"]]
values_sum: [[3,7,5]]
- __init__(self, table, keys, use_threads=True)
Methods
- __init__(self, table, keys[, use_threads])
- aggregate(self, aggregations): Perform an aggregation over the grouped columns of the table.
- aggregate(self, aggregations)
Perform an aggregation over the grouped columns of the table.
- Parameters:
- aggregations
list[tuple(str, str)] or list[tuple(str, str, FunctionOptions)]
List of tuples, where each tuple is one aggregation specification consisting of the aggregation column name, followed by the function name and, optionally, the aggregation function options. Pass an empty list to get a single row for each group. The column name can be a string, an empty list, or a list of column names, for unary, nullary, and n-ary aggregation functions respectively.
For the list of function names and their respective aggregation function options, see Grouped Aggregations.
- Returns:
Table
Results of the aggregation functions.
Examples
>>> import pyarrow as pa
>>> t = pa.table([
...     pa.array(["a", "a", "b", "b", "c"]),
...     pa.array([1, 2, 3, 4, 5]),
... ], names=["keys", "values"])
Sum the column “values” over the grouped column “keys”:
>>> t.group_by("keys").aggregate([("values", "sum")])
pyarrow.Table
keys: string
values_sum: int64
----
keys: [["a","b","c"]]
values_sum: [[3,7,5]]
Count the rows over the grouped column “keys”:
>>> t.group_by("keys").aggregate([([], "count_all")])
pyarrow.Table
keys: string
count_all: int64
----
keys: [["a","b","c"]]
count_all: [[2,2,1]]
Do multiple aggregations:
>>> t.group_by("keys").aggregate([
...     ("values", "sum"),
...     ("keys", "count")
... ])
pyarrow.Table
keys: string
values_sum: int64
keys_count: int64
----
keys: [["a","b","c"]]
values_sum: [[3,7,5]]
keys_count: [[2,2,1]]
Count the number of non-null values for column “values” over the grouped column “keys”:
>>> import pyarrow.compute as pc
>>> t.group_by(["keys"]).aggregate([
...     ("values", "count", pc.CountOptions(mode="only_valid"))
... ])
pyarrow.Table
keys: string
values_count: int64
----
keys: [["a","b","c"]]
values_count: [[2,2,1]]
Get a single row for each group in column “keys”:
>>> t.group_by("keys").aggregate([])
pyarrow.Table
keys: string
----
keys: [["a","b","c"]]

