Kinetica is able to aggregate data via the native APIs and SQL using various aggregate expressions. The native API endpoints related to aggregation:
To read more about aggregation in SQL, see Aggregation.
In SQL, to find the minimum, maximum, & average trip distances, as well as the average passenger count for each vendor per year from the taxi data set (weeding out data with errant trip distances):
SELECT
vendor_id AS Vendor_ID,
YEAR(pickup_datetime) AS Year,
MAX(trip_distance) AS Max_Trip,
MIN(trip_distance) AS Min_Trip,
ROUND(AVG(trip_distance),2) AS Avg_Trip,
INT(AVG(passenger_count)) AS Avg_Passenger_Count
FROM nyctaxi
WHERE
trip_distance > 0 AND
trip_distance < 100
GROUP BY vendor_id, 2
ORDER BY Vendor_ID, Year
In Python, create a histogram based on passenger counts from taxi data:
resp = h_db.aggregate_histogram(table_name="nyctaxi", column_name="passenger_count", start=1, end=6, interval=1)
print "Passenger count groups by size:"
for histo_group in zip([1, 2, 3, 4, 5, 6], resp['counts']):
print "\t{}s: {:4.0f}".format(*histo_group)
In Python, unpivot a course_sales
table and then create a memory-only
table of the results:
gpudb.aggregate_unpivot(
table_name="course_sales_by_year",
column_names=["*"]
variable_column_name="total_amount_sold",
pivoted_columns=["Advanced Economics", "Programming 101", "American History: World War II"],
options={"result_table":"course_sales_by_course"}
)
In Java, find aggregate statistics for the trip_distance
column from taxi
data:
Map<String,Double> stats = gpudb.aggregateStatistics("nyctaxi", "trip_distance", "count,min,max,mean,percentile(50,200)", null).getStats();
System.out.println("Statistics of values in the trip_distance column:");
System.out.printf
(
"\tCount: %.0f%n\tMin: %4.2f%n\tMean: %4.2f%n\tMax: %4.2f%n%n",
stats.get("count"), stats.get("min"), stats.get("mean"),
stats.get("max")
);
In the REST API, find the unique Sector values from the stocks
table using
/aggregate/unique:
{
"table_name":"stocks",
"column_name":"Sector",
"offset":0,
"limit":-9999,
"encoding":"json",
"options":{}
}
A pivot operation does not have its own endpoint and is instead achieved through the /aggregate/groupby endpoint options. Details can be found on Pivot.
The following aggregate endpoints have the ability to create and persist memory-only tables from query results:
To create a memory-only table from any of the above endpoint operations, specify
the name of the table in the options
map:
"result_table":"my_table_name"
To persist the memory-only table, specify the result_table_persist
option as
well:
"result_table":"my_table_name",
"result_table_persist":"true"
Aggregate memory-only tables can vary in how they are sharded or replicated depending on the endpoint they are created from.
column_names
parameter and all result records are selected
(offset
is 0 and limit
is -9999). The aggregate groupby memory-only table
can be replicated if the result_table_force_replicated
option is set to true
, regardless if the source table or view is
sharded or not.column_names
parameter. If the result_table_force_replicated
option is
set to true
, the aggregate unique memory-only table will be replicated
regardless if the source table or view is sharded or not.pivoted_columns
parameter. The aggregate unpivot memory-only table
will be replicated if the source table or view is
replicated.COUNT
& COUNT_DISTINCT
aggregation functions applied to them.count(column_name) as count_col_records
.result_table_persist
option of
true
will make the table permanent and not expire.