URL: http://<db.host>:<db.port>/aggregate/statistics
Calculates the requested statistics of the given column(s) in a given table.
- The available statistics are:
- count (number of total objects), mean, stdv (standard deviation), variance, skew, kurtosis, sum, min, max, weighted_average, cardinality (unique count), estimated_cardinality, percentile, and percentile_rank.
Estimated cardinality is calculated by using the hyperloglog approximation technique.
Percentiles and percentile ranks are approximate and are calculated using the t-digest algorithm. They must include the desired percentile/percentile_rank. To compute multiple percentiles each value must be specified separately (i.e. 'percentile(75.0),percentile(99.0),percentile_rank(1234.56),percentile_rank(-5)').
A second, comma-separated value can be added to the percentile statistic to calculate percentile resolution, e.g., a 50th percentile with 200 resolution would be 'percentile(50,200)'.
The weighted average statistic requires a weight column to be specified in weight_column_name. The weighted average is then defined as the sum of the products of input parameter column_name times the weight_column_name values divided by the sum of the weight_column_name values.
Additional columns can be used in the calculation of statistics via additional_column_names. Values in these columns will be included in the overall aggregate calculation--individual aggregates will not be calculated per additional column. For instance, requesting the count & mean of input parameter column_name x and additional_column_names y & z, where x holds the numbers 1-10, y holds 11-20, and z holds 21-30, would return the total number of x, y, & z values (30), and the single average value across all x, y, & z values (15.5).
The response includes a list of key/value pairs of each statistic requested and its corresponding value.
Input Parameter Description
Name | Type | Description | ||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
table_name | string | Name of the table on which the statistics operation will be performed, in [schema_name.]table_name format, using standard name resolution rules. | ||||||||||||||||||||||||||||||
column_name | string | Name of the primary column for which the statistics are to be calculated. | ||||||||||||||||||||||||||||||
stats | string | Comma separated list of the statistics to calculate, e.g. "sum,mean".
| ||||||||||||||||||||||||||||||
options | map of string to strings | Optional parameters. The default value is an empty map ( {} ).
|
Output Parameter Description
The GPUdb server embeds the endpoint response inside a standard response structure which contains status information and the actual response to the query. Here is a description of the various fields of the wrapper:
Name | Type | Description | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
status | String | 'OK' or 'ERROR' | |||||||||
message | String | Empty if success or an error message | |||||||||
data_type | String | 'aggregate_statistics_response' or 'none' in case of an error | |||||||||
data | String | Empty string | |||||||||
data_str | JSON or String | This embedded JSON represents the result of the /aggregate/statistics endpoint:
Empty string in case of an error. |