Kinetica is able to aggregate data via the native APIs and SQL using various aggregate expressions. The native API endpoints related to aggregation:
To read more about aggregation in SQL, see Aggregation.
In SQL, to find the minimum, maximum, & average trip distances, as well as the average passenger count for each vendor per year from the taxi data set (weeding out data with errant trip distances):
SELECT
vendor_id,
YEAR(pickup_datetime) AS Year,
MAX(trip_distance) max_trip,
MIN(trip_distance) min_trip,
AVG(trip_distance) avg_trip,
AVG(passenger_count) avg_passenger_count
FROM nyctaxi
WHERE
trip_distance > 0 AND
trip_distance < 100
GROUP BY vendor_id, 2
ORDER BY vendor_id, Year
In Python, create a histogram based on passenger counts from taxi data:
resp = h_db.aggregate_histogram(table_name="nyctaxi", column_name="passenger_count", start=1, end=6, interval=1)
print "Passenger count groups by size:"
for histo_group in zip([1, 2, 3, 4, 5, 6], resp['counts']):
print "\t{}s: {:4.0f}".format(*histo_group)
In Python, unpivot a course_sales
table and then create an in-memory table
of the results:
gpudb.aggregate_unpivot(
table_name="course_sales_by_year",
column_names=["*"]
variable_column_name="total_amount_sold",
pivoted_columns=["Advanced Economics", "Programming 101", "American History: World War II"],
options={"result_table":"course_sales_by_course"}
)
In Java, find aggregate statistics for the trip_distance
column from taxi
data:
Map<String,Double> stats = gpudb.aggregateStatistics("nyctaxi", "trip_distance", "count,min,max,mean,percentile(50,200)", null).getStats();
System.out.println("Statistics of values in the trip_distance column:");
System.out.printf
(
"\tCount: %.0f%n\tMin: %4.2f%n\tMean: %4.2f%n\tMax: %4.2f%n%n",
stats.get("count"), stats.get("min"), stats.get("mean"),
stats.get("max")
);
In the REST API, find the unique Sector values from the stocks
table using
/aggregate/unique:
{
"table_name":"stocks",
"column_name":"Sector",
"offset":0,
"limit":-9999,
"encoding":"json",
"options":{}
}
A pivot operation does not have its own endpoint and is instead achieved through the /aggregate/groupby endpoint options. Details can be found on Pivot.
The following aggregate endpoints have the ability to create and persist in-memory tables from query results:
To create an in-memory table from any of the above endpoint operations, specify
the name of the table in the options
map:
"result_table":"my_table_name"
To persist the in-memory table, specify the result_table_persist
option as
well:
"result_table":"my_table_name",
"result_table_persist":"true"
Aggregate in-memory tables can vary in how they are sharded or replicated depending on the endpoint they are created from.
column_names
parameter and all result records are selected
(offset
is 0 and limit
is -9999). The aggregate groupby memory-only table
can be replicated if the result_table_force_replicated
option is set to true
, regardless if the source table or view is
sharded or not.column_names
parameter. If the result_table_force_replicated
option is
set to true
, the aggregate unique in-memory table will be replicated
regardless if the source table or view is sharded or not.pivoted_columns
parameter. The aggregate unpivot in-memory table
will be replicated if the source table or view is
replicated.COUNT
& COUNT_DISTINCT
aggregation functions applied to themcount(column_name) as count_col_records