Version:

Aggregate Unique In-Memory Tables

An aggregate unique in-memory table is created by specifying the result_table option while using the /aggregate/unique endpoint. Aggregate unique in-memory tables are not persisted by default but can be persisted (like a table) using the result_table_persist option. Aggregate unique in-memory tables can be created from any table or view and will create a unique_result type schema specific to the column used in the call to the endpoint.

An aggregate unique in-memory table is replicated by default, but it can be sharded if the shard key is included in the column_names parameter. If the result_table_force_replicated option is set to true, the aggregate unique in-memory table will be replicated regardless if the source table or view is sharded or not.

Several limitations are discussed in further detail in the Limitations section.

Creating an Aggregate Unique In-Memory Table

To create an aggregate unique in-memory table using the /aggregate/unique endpoint requires six parameters:

  1. the name of the data set to create the aggregate unique in-memory table from
  2. the columns from the source data set to use in the new in-memory table
  3. the number of initial results to skip in the source data set
  4. the number of records to limit the in-memory table to
  5. the type of encoding to use
  6. the name of the in-memory table to store the results in

Examples

Given source table trip_data, an aggregate unique in-memory table can be created in Python like so:

gpudb.aggregate_unique(
  table_name = "trip_data",
  column_name = "vendor_id",
  offset = 0,
  limit = -9999,
  encoding = "json",
  options = {
    "result_table":"vendor_unique"
  }
)

Creating a persisted aggregate unique table in Python:

gpudb.aggregate_unique(
  table_name = "trip_data",
  column_name = "pickup_datetime",
  offset = 0,
  limit = -9999,
  encoding = "json",
  options = {
    "result_table":"unique_pickup_datetime",
    "result_table_persist":"true"
  }
)

Limitations and Cautions

There are a few limitations and cautions when creating and using aggregate unique in-memory tables:

  • Creating an aggregate unique in-memory table results in an entirely new data set, so be mindful of the memory usage implications
  • An aggregate unique in-memory table cannot be created using a non-charN string column type
  • An aggregate unique in-memory table cannot be created using a store-only column type
  • If an aggregate unique in-memory table is created and its source data set's rows are updated (or removed), the in-memory table will not be updated to reflect the changes in the source data set
  • An aggregate unique in-memory table cannot be created from a collection
  • If using column or aggregate functions in the column_name parameter, an alias must be used for the column name, e.g., (column_name / 2) as halved_column
  • An aggregate unique in-memory table has no permanence by default because it is not protected and will expire after the default TTL setting