Version:

/aggregate/groupbyΒΆ

URL: http://GPUDB_IP_ADDRESS:GPUDB_PORT/aggregate/groupby

Calculates unique combinations (groups) of values for the given columns in a given table/view/collection and computes aggregates on each unique combination. This is somewhat analogous to an SQL-style SELECT...GROUP BY. Any column(s) can be grouped on, but only non-string (i.e. numeric) columns may be used for computing aggregates. The results can be paged via the input parameter offset and input parameter limit parameters. For example, to get 10 groups with the largest counts the inputs would be: limit=10, options={"sort_order":"descending", "sort_by":"value"}. Input parameter options can be used to customize behavior of this call e.g. filtering or sorting the results. To group by 'x' and 'y' and compute the number of objects within each group, use column_names=['x','y','count(*)']. To also compute the sum of 'z' over each group, use column_names=['x','y','count(*)','sum(z)']. Available aggregation functions are: 'count(*)', 'sum', 'min', 'max', 'avg', 'mean', 'stddev', 'stddev_pop', 'stddev_samp', 'var', 'var_pop', 'var_samp', 'arg_min', 'arg_max' and 'count_distinct'. The response is returned as a dynamic schema. For details see: dynamic schemas documentation. If the 'result_table' option is provided then the results are stored in a table with the name given in the option and the results are not returned in the response.

Input Parameter Description

Name Type Description
table_name string Name of the table on which the operation will be performed. Must be a valid table/view/collection in GPUdb.
column_names array of strings List of one or more column names, expressions, and aggregate expressions. Must include at least one 'grouping' column or expression. If no aggregate is included, count(*) will be computed as a default.
offset long A positive integer indicating the number of initial results to skip (this can be useful for paging through the results). The minimum allowed value is 0. The maximum allowed value is MAX_INT.
limit long A positive integer indicating the maximum number of results to be returned Or END_OF_SET (-9999) to indicate that the max number of results should be returned. Default value is 1000.
encoding string

Specifies the encoding for returned records. Default value is 'binary'.

Supported Values Description
binary Indicates that the returned records should be binary encoded.
json Indicates that the returned records should be json encoded.
options map of strings

Optional parameters. Default value is an empty map ( {} ).

Supported Parameters (keys) Parameter Description
expression Filter expression to apply to the table prior to computing the aggregate group by.
having Filter expression to apply to the aggregated results.
sort_order

String indicating how the returned values should be sorted - ascending or descending. Default value is 'ascending'.

Supported Values Description
ascending Indicates that the returned values should be sorted in ascending order.
descending Indicates that the returned values should be sorted in descending order.
sort_by

String determining how the results are sorted. Default value is 'key'.

Supported Values Description
key Indicates that the returned values should be sorted by key
value Indicates that the returned values should be sorted by value
result_table The name of the table used to store the results. If present no results are returned in the response.

Output Parameter Description

The GPUdb server embeds the endpoint response inside a standard response structure which contains status information and the actual response to the query. Here is a description of the various fields of the wrapper:

Name Type Description
status String 'OK' or 'ERROR'
message String Empty if success or an error message
data_type String 'aggregate_group_by_request' or 'none' in case of an error
data String Empty string
data_str JSON or String

This embedded JSON represents the result of the /aggregate/groupby endpoint:

Name Type Description
response_schema_str string Avro schema of output parameter binary_encoded_response or output parameter json_encoded_response.
binary_encoded_response bytes Avro binary encoded response.
json_encoded_response string Avro JSON encoded response.
total_number_of_records long Total/Filtered number of records.
has_more_records boolean Too many records. Returned a partial set.

Empty string in case of an error.