Version:

Aggregate Groupby In-Memory Tables

An aggregate groupby in-memory table is created by specifying the result_table option while using the /aggregate/groupby endpoint. Aggregate groupby in-memory tables are not persisted by default but can be persisted (like a table) using the result_table_persist option. Aggregate groupby in-memory tables can be created from any table or view (even other aggregate groupby in-memory tables) and will create a groupby_result type schema specific to the columns used in the call to the endpoint.

The aggregate groupby in-memory table will be sharded if the entire shard key is included in the column_names parameter. The aggregate groupby in-memory table can be replicated if the result_table_force_replicated option is set to true, regardless if the source table or view is sharded or not.

Several limitations are discussed in further detail in the Limitations section.

Creating an Aggregate Groupby In-Memory Table

To create an aggregate groupby in-memory table using the /aggregate/groupby endpoint requires six parameters:

  1. the name of the data set to create the aggregate groupby in-memory table from
  2. the columns from the source data set to use in the new in-memory table
  3. the number of initial results to skip in the source data set
  4. the number of records to limit the in-memory table to
  5. the type of encoding to use
  6. the name of the in-memory table to store the results in

Examples

Given source table trip_data, an aggregate groupby in-memory table can be created in Python like so:

gpudb.aggregate_group_by(
  table_name = "trip_data",
  column_names = ["vendor_id", "count(vendor_id) as num_trips", "avg(trip_distance) as avg_trip_dist"],
  offset = 0,
  limit = 10,
  encoding = "json",
  options = {
    "result_table":"vendor_aggregates"
  }
)

Creating a persisted aggregate groupby in-memory table in Python:

gpudb.aggregate_group_by(
  table_name = "pay_data",
  column_names = ["payment_type", "count(payment_type) as num_pay_type", "avg(total_amount) as avg_amt_paid"],
  offset = 0,
  limit = 10,
  encoding = "json",
  options = {
    "result_table":"pay_aggregates",
    "result_table_persist":"true"
  }
)

Limitations and Cautions

There are a few limitations and cautions when creating and using aggregate groupby in-memory tables:

  • Creating an aggregate groupby in-memory table results in an entirely new data set, so be mindful of the memory usage implications
  • An aggregate groupby in-memory table cannot be created using a non-charN string column type
  • An aggregate groupby in-memory table cannot be created using a store-only column type or an aggregation of a store-only column type (e.g., sum(store_only_integer))
  • If an aggregate groupby in-memory table is created and its source data set's rows are updated (or removed), the in-memory table will not be updated to reflect the changes in the source data set
  • An aggregate groupby in-memory table cannot be created from a heterogenous collection
  • If using column or aggregate functions in the column_names parameter, an alias must be used for the column name, e.g., count(column_name) as count_col_records
  • An aggregate groupby in-memory table has no permanence by default because it is not protected and will expire after the default TTL setting