Kinetica is able to aggregate data via the native APIs and SQL using various aggregate expressions. The native API endpoints related to aggregation:
- /aggregate/convexhull
- /aggregate/groupby
- /aggregate/histogram
- /aggregate/kmeans
- /aggregate/minmax
- /aggregate/minmax/geometry
- /aggregate/statistics
- /aggregate/statistics/byrange
- /aggregate/unique
- /aggregate/unpivot
To read more about aggregation in SQL, see Aggregation.
Execution
In SQL, to find the minimum, maximum, & average trip distances, as well as the average passenger count for each vendor per year from the taxi data set (weeding out data with errant trip distances):
|
|
In Python, create a histogram based on passenger counts from taxi data:
|
|
In Python, unpivot a course_sales table and then create a memory-only table of the results:
|
|
In Java, find aggregate statistics for the trip_distance column from taxi data:
|
|
In the REST API, find the unique Sector values from the stocks table using /aggregate/unique:
|
|
Pivot
A pivot operation does not have its own endpoint and is instead achieved through the /aggregate/groupby endpoint options. Details can be found on Pivot.
Memory-Only Tables and Persistence
The following aggregate endpoints have the ability to create and persist memory-only tables from query results:
To create a memory-only table from any of the above endpoint operations, specify the name of the table in the options map:
|
|
To persist the memory-only table, specify the result_table_persist option as well:
|
|
A memory-only table name must adhere to the standard naming criteria. Each memory-only table exists within a schema and follows the standard name resolution rules for tables.
Sharding and Replication
Aggregate memory-only tables can vary in how they are sharded or replicated depending on the endpoint they are created from.
- /aggregate/groupby: The aggregate groupby memory-only table will be sharded if the entire shard key is included in the column_names parameter and all result records are selected (offset is 0 and limit is -9999). The aggregate groupby memory-only table can be replicated if the result_table_force_replicated option is set to true, regardless if the source table or view is sharded or not.
- /aggregate/unique: An aggregate unique memory-only table is replicated by default, but it can be sharded if the shard key is included in the column_names parameter. If the result_table_force_replicated option is set to true, the aggregate unique memory-only table will be replicated regardless if the source table or view is sharded or not.
- /aggregate/unpivot: The aggregate unpivot memory-only table will be sharded if the shard key is not part of the pivoted_columns parameter. The aggregate unpivot memory-only table will be replicated if the source table or view is replicated.
Limitations
General
- Store-only columns cannot be grouped on, nor can aggregation functions be applied to them.
- Non-charN string column types can be grouped on, but can only have COUNT & COUNT_DISTINCT aggregation functions applied to them.
- In the native APIs, an alias must be used for the column name if using column or aggregate functions in the relevant selected columns parameter, e.g., count(column_name) as count_col_records.
Aggregate Memory-Only Tables
- Creating a memory-only table results in an entirely new data set, so be mindful of the memory usage implications.
- An aggregate memory-only table cannot be created with a non-charN string column type.
- If an aggregate memory-only table is created and its source data set's rows are updated (or removed), the memory-only table will not be updated to reflect the changes in the source data set.
- An aggregate memory-only table is transient, by default, and will expire after the default TTL setting.
- An aggregate memory-only table is not persisted, by default, and will not survive a database restart; specifying a result_table_persist option of true will make the table permanent and not expire.