Note
This documentation is for a prior release of Kinetica. For the latest documentation, click here.
Kinetica is able to aggregate data via the native APIs and SQL using various aggregate expressions. The native API endpoints related to aggregation:
- /aggregate/convexhull
- /aggregate/groupby
- /aggregate/histogram
- /aggregate/kmeans
- /aggregate/minmax
- /aggregate/minmax/geometry
- /aggregate/statistics
- /aggregate/statistics/byrange
- /aggregate/unique
- /aggregate/unpivot
To read more about aggregation in SQL, see Aggregation.
Execution
In SQL, to find the minimum, maximum, & average trip distances, as well as the average passenger count for each vendor per year from the taxi data set (weeding out data with errant trip distances):
|
|
In Python, create a histogram based on passenger counts from taxi data:
|
|
In Python, unpivot a course_sales table and then create a memory-only table of the results:
|
|
In Java, find aggregate statistics for the trip_distance column from taxi data:
|
|
In the REST API, find the unique Sector values from the stocks table using /aggregate/unique:
|
|
Pivot
A pivot operation does not have its own endpoint and is instead achieved through the /aggregate/groupby endpoint options. Details can be found on Pivot.
Memory-Only Tables and Persistence
The following aggregate endpoints have the ability to create and persist memory-only tables from query results:
To create a memory-only table from any of the above endpoint operations, specify the name of the table in the options map:
|
|
To persist the memory-only table, specify the result_table_persist option as well:
|
|
A memory-only table name must adhere to the standard naming criteria. Each memory-only table exists within a schema and follows the standard name resolution rules for tables.
Sharding and Replication
Aggregate memory-only tables can vary in how they are sharded or replicated depending on the endpoint they are created from.
- /aggregate/groupby: The aggregate groupby memory-only table will be sharded if the entire shard key is included in the column_names parameter and all result records are selected (offset is 0 and limit is -9999). The aggregate groupby memory-only table can be replicated if the result_table_force_replicated option is set to true, regardless if the source table or view is sharded or not.
- /aggregate/unique: An aggregate unique memory-only table is replicated by default, but it can be sharded if the shard key is included in the column_names parameter. If the result_table_force_replicated option is set to true, the aggregate unique memory-only table will be replicated regardless if the source table or view is sharded or not.
- /aggregate/unpivot: The aggregate unpivot memory-only table will be sharded if the shard key is not part of the pivoted_columns parameter. The aggregate unpivot memory-only table will be replicated if the source table or view is replicated.
Limitations
General
- Store-only columns cannot be grouped on, nor can aggregation functions be applied to them.
- Non-charN string column types can be grouped on, but can only have COUNT & COUNT_DISTINCT aggregation functions applied to them.
- In the native APIs, an alias must be used for the column name if using column or aggregate functions in the relevant selected columns parameter, e.g., count(column_name) as count_col_records.
Aggregate Memory-Only Tables
- Creating a memory-only table results in an entirely new data set, so be mindful of the memory usage implications.
- An aggregate memory-only table cannot be created with a non-charN string column type.
- If an aggregate memory-only table is created and its source data set's rows are updated (or removed), the memory-only table will not be updated to reflect the changes in the source data set.
- An aggregate memory-only table is transient, by default, and will expire after the default TTL setting.
- An aggregate memory-only table is not persisted, by default, and will not survive a database restart; specifying a result_table_persist option of true will make the table permanent and not expire.