Aggregate Statistics

Calculates the requested statistics of a given column in a given table.

The available statistics are count (total number of objects), mean, stdv (standard deviation), variance, skew, kurtosis, sum, min, max, weighted_average, cardinality (unique count), estimated_cardinality, percentile, and percentile_rank.

Estimated cardinality is calculated using the HyperLogLog approximation technique.

Percentiles and percentile ranks are approximate and are calculated using the t-digest algorithm. Each must include its argument: the desired percentile for percentile, or the value to rank for percentile_rank. To compute multiple percentiles or percentile ranks, specify each one separately (e.g. 'percentile(75.0),percentile(99.0),percentile_rank(1234.56),percentile_rank(-5)').
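As an illustration of the format above, the following minimal sketch assembles a stats string in which every percentile and percentile_rank is listed separately with its own argument. The helper name build_stats and its parameters are hypothetical and not part of the API; only the resulting string format comes from the description above.

```python
def build_stats(simple=(), percentiles=(), percentile_ranks=()):
    """Join statistic names into one comma-separated stats string.

    Hypothetical helper: only the output format is taken from the docs.
    """
    parts = list(simple)
    # Each percentile / percentile_rank entry carries its own argument.
    parts += [f"percentile({p})" for p in percentiles]
    parts += [f"percentile_rank({v})" for v in percentile_ranks]
    return ",".join(parts)


stats = build_stats(
    simple=["mean", "stdv"],
    percentiles=[75.0, 99.0],
    percentile_ranks=[1234.56, -5],
)
print(stats)
# mean,stdv,percentile(75.0),percentile(99.0),percentile_rank(1234.56),percentile_rank(-5)
```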

The weighted_average statistic requires the weight_column_name option to be specified in the input parameter options. The weighted average is defined as the sum of the products of each column_name value and its corresponding weight, divided by the sum of the weights.
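The definition above can be checked locally with a few lines of Python. This is a sketch of the formula only, not the service's implementation; how the service treats null values or a zero weight sum is not stated here, so the error handling below is an assumption.

```python
def weighted_average(values, weights):
    """Sum of value * weight products divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        # Assumption: treat a zero weight sum as an error.
        raise ZeroDivisionError("sum of weights is zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Illustrative data: (10*1 + 20*1 + 30*2) / (1 + 1 + 2) = 90 / 4
prices = [10.0, 20.0, 30.0]
quantities = [1.0, 1.0, 2.0]
result = weighted_average(prices, quantities)
print(result)  # 22.5
```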

The response contains the requested statistics along with the count of items in the given set.

Input Parameter Description

Name Type Description
table_name string Name of the table on which the statistics operation will be performed.
column_name string Name of the column for which the statistics are to be calculated.
stats string Comma-separated list of the statistics to calculate, e.g. "sum,mean".

Supported Values Description
count Number of objects (independent of the given column).
mean Arithmetic mean (average), equivalent to sum/count.
stdv Sample standard deviation (denominator is count-1).
variance Unbiased sample variance (denominator is count-1).
skew Skewness (third standardized moment).
kurtosis Kurtosis (fourth standardized moment).
sum Sum of all values in the column.
min Minimum value of the column.
max Maximum value of the column.
weighted_average Weighted arithmetic mean (using the option 'weight_column_name' as the weighting column).
cardinality Number of unique values in the column.
estimated_cardinality Estimate (via the HyperLogLog technique) of the number of unique values in the column.
percentile Estimate (via t-digest) of the given percentile of the column (percentile(50.0) will be an approximation of the median).
percentile_rank Estimate (via t-digest) of the percentile rank of the given value in the column (if the given value is the median of the column, percentile_rank(<median>) will return approximately 50.0).
options map of strings Optional parameters. Default value is an empty map ({}).

Supported Parameters (keys) Parameter Description
additional_column_names A comma-separated list of column names over which statistics can be accumulated along with the primary column.
weight_column_name Name of column used as weighting attribute for the weighted average statistic.
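To show how the input parameters fit together, here is a minimal sketch that assembles them into a single request object. The table and column names (sales, price, quantity, discount, tax) are hypothetical, and serializing the request as JSON is an assumption; how the request is actually transported to the server is not covered here.

```python
import json

# Parameter names come from the table above; all values are hypothetical.
request = {
    "table_name": "sales",
    "column_name": "price",
    "stats": "count,mean,weighted_average,percentile(50.0)",
    # options is a map of strings, so every value is a plain string.
    "options": {
        "weight_column_name": "quantity",
        "additional_column_names": "discount,tax",
    },
}

# Assumption: the request is serialized as JSON before being sent.
payload = json.dumps(request, indent=2)
print(payload)
```

Because weighted_average appears in stats, the weight_column_name option is supplied as well.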

Output Parameter Description

Name Type Description
stats map of doubles (statistic name, double value) pairs of the requested statistics, including the total count by default.
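For cross-checking results on a small data set, the sketch below computes several of the exact statistics locally and returns them in the same shape as the output: a map from statistic name to double that always includes the count. The function name local_stats is hypothetical. It uses Python's statistics module, whose stdev and variance use a count-1 denominator, matching the definitions above. The approximate statistics (estimated_cardinality, percentile, percentile_rank) and the moment-based ones (skew, kurtosis) are left out.

```python
import statistics


def local_stats(values, requested):
    """Compute exact statistics locally, keyed by statistic name."""
    available = {
        "mean": statistics.mean,
        "stdv": statistics.stdev,          # sample form, count-1 denominator
        "variance": statistics.variance,   # sample form, count-1 denominator
        "sum": sum,
        "min": min,
        "max": max,
        "cardinality": lambda xs: len(set(xs)),
    }
    # The count is always included, mirroring the output description.
    result = {"count": float(len(values))}
    for name in (s.strip() for s in requested.split(",")):
        if name == "count":
            continue
        if name not in available:
            raise ValueError(f"unsupported statistic in this sketch: {name}")
        result[name] = float(available[name](values))
    return result


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
out = local_stats(data, "mean,variance,stdv,cardinality")
for name, value in out.items():
    print(name, value)
```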