An aggregate groupby in-memory table is created by specifying the
result_table
option while using the
/aggregate/groupby endpoint. Aggregate groupby
in-memory tables are not persisted by default but can
be persisted (like a table) using the result_table_persist
option. Aggregate groupby in-memory tables can be created from any table or
view (even other aggregate groupby in-memory tables) and will create a
groupby_result
type schema specific to the columns used in the call to the
endpoint.
The aggregate groupby in-memory table will be sharded
if the entire shard key is included in the column_names
parameter. The
aggregate groupby in-memory table can be replicated if the
result_table_force_replicated
option is set to true
, regardless if the
source table or view is sharded or not.
Several limitations are discussed in further detail in the Limitations section.
To create an aggregate groupby in-memory table using the /aggregate/groupby endpoint requires six parameters:
Given source table trip_data
, an aggregate groupby in-memory table can be
created in Python like so:
gpudb.aggregate_group_by(
table_name = "trip_data",
column_names = ["vendor_id", "count(vendor_id) as num_trips", "avg(trip_distance) as avg_trip_dist"],
offset = 0,
limit = 10,
encoding = "json",
options = {
"result_table":"vendor_aggregates"
}
)
Creating a persisted aggregate groupby in-memory table in Python:
gpudb.aggregate_group_by(
table_name = "pay_data",
column_names = ["payment_type", "count(payment_type) as num_pay_type", "avg(total_amount) as avg_amt_paid"],
offset = 0,
limit = 10,
encoding = "json",
options = {
"result_table":"pay_aggregates",
"result_table_persist":"true"
}
)
There are a few limitations and cautions when creating and using aggregate groupby in-memory tables:
sum(store_only_integer)
)count(column_name) as count_col_records