Note

This documentation is for a prior release of Kinetica.

Aggregate Statistics by Range

Divides the given set into bins and calculates statistics of a value-column within each bin. The bins are based on the values of a given binning-column. The statistics that may be requested are mean, stdv (standard deviation), variance, skew, kurtosis, sum, min, max, first, last and weighted average. In addition to the requested statistics, the count of total samples in each bin is returned. This counts vector is the histogram of the column used to divide the set members into bins. The weighted average statistic requires a weight column to be specified in weight_column_name. The weighted average is then defined as the sum of the products of the value column and the weight column, divided by the sum of the weight column.
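The weighted average definition above can be sketched in a few lines of Python. This is only an illustration of the formula as stated, not the server's implementation. The function name and the zero-weight check are assumptions made for the example.

```python
def weighted_average(values, weights):
    """Sum of value*weight products divided by the sum of the weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        # Assumption: the source does not say how a zero weight sum is handled.
        raise ValueError("sum of weights is zero; weighted average is undefined")
    return sum(v * w for v, w in zip(values, weights)) / total_weight
```

For example, values 10 and 20 with weights 1 and 3 give (10*1 + 20*3) / (1 + 3) = 17.5.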

There are two methods for binning the set members. In the first, which can be used for numeric valued binning-columns, a min, max and interval are specified (the start, end and interval input parameters). The number of bins, nbins, is the ceiling of (max-min)/interval. Values that fall in the range [min+n*interval,min+(n+1)*interval) are placed in the nth bin, where n ranges from 0..nbins-2. The final bin is [min+(nbins-1)*interval,max]. In the second method, bin_values specifies a list of binning-column values. Records whose binning-column value matches the nth member of the bin_values list are placed in the nth bin. When a list is provided, the binning-column must be of type string or int.
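As a rough illustration of the two binning methods, the following Python sketch computes the bin index for a single value. The function names are hypothetical. Returning None for values outside [min,max], or for values not present in bin_values, is an assumption, since the text above does not say how such values are treated.

```python
import math


def bin_count(lo, hi, interval):
    """Number of bins: the ceiling of (max - min) / interval."""
    if interval <= 0 or hi <= lo:
        raise ValueError("require interval > 0 and max > min")
    return math.ceil((hi - lo) / interval)


def range_bin_index(x, lo, hi, interval):
    """Bin index for x under range-based binning.

    Bins 0..nbins-2 are half-open; the final bin is closed at max.
    Assumption: values outside [lo, hi] belong to no bin (None).
    """
    if x < lo or x > hi:
        return None
    nbins = bin_count(lo, hi, interval)
    # Clamp so that values up to and including max land in the final bin.
    return min(int((x - lo) // interval), nbins - 1)


def list_bin_index(x, bin_values):
    """Bin index for x under list-based binning.

    Assumption: a value that matches no entry belongs to no bin (None).
    """
    try:
        return bin_values.index(x)
    except ValueError:
        return None
```

With min 0, max 10 and interval 3, this gives four bins: [0,3), [3,6), [6,9) and [9,10].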

NOTE: The Kinetica instance being accessed must be running a CUDA (GPU-based) build to service this request.

Input Parameter Description

Name | Type | Description
table_name | string | Name of the table on which the ranged-statistics operation will be performed, in [schema_name.]table_name format, using standard name resolution rules.
select_expression | string | For a non-empty expression, statistics are calculated for those records for which the expression is true. The default value is ''.
column_name | string | Name of the binning-column used to divide the set samples into bins.
value_column_name | string | Name of the value-column for which statistics are to be computed.
stats | string | A comma-separated list of the statistics to calculate, e.g. 'sum,mean'. Available statistics: mean, stdv (standard deviation), variance, skew, kurtosis, sum.
start | double | The lower bound of the binning-column.
end | double | The upper bound of the binning-column.
interval | double | The interval of a bin. Set members fall into bin i if the binning-column falls in the range [start+interval*i, start+interval*(i+1)).
options | map of string to strings | Map of optional parameters. The default value is an empty map ( {} ).

Supported Parameters (keys) | Parameter Description
additional_column_names | A comma-separated list of value-column names over which statistics can be accumulated along with the primary value_column.
bin_values | A comma-separated list of binning-column values. Values that match the nth bin_values value are placed in the nth bin.
weight_column_name | Name of the column used as the weighting column for the weighted_average statistic.
order_column_name | Name of the column used for candlestick charting techniques.
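To show how these parameters fit together, here is a minimal Python sketch that assembles them into a plain dictionary and serializes it to JSON. It does not call any Kinetica client library. The table and column names (example.trades, trade_hour, price, fee, quantity) are hypothetical placeholders, and the way the request is actually sent depends on the client in use.

```python
import json

# Hypothetical table and column names, used only for illustration.
request = {
    "table_name": "example.trades",
    "select_expression": "",          # empty: use all records
    "column_name": "trade_hour",      # binning-column
    "value_column_name": "price",     # value-column
    "stats": "sum,mean",              # comma-separated statistic names
    "start": 0.0,
    "end": 24.0,
    "interval": 6.0,
    # options is a map of string to strings, so lists are passed
    # as comma-separated strings rather than as arrays.
    "options": {
        "additional_column_names": "fee,quantity",
    },
}

payload = json.dumps(request)
```

Note that every options value is a string, including lists such as additional_column_names and bin_values.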

Output Parameter Description

Name | Type | Description
stats | map of string to arrays of doubles | A map with a key for each statistic in the stats input parameter, whose value is a vector of the corresponding value-column bin statistics. In addition, the key count has a value that is a histogram of the binning-column.
info | map of string to strings | Additional information.
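One way to read the stats output is to pair each position in the returned vectors with the bounds of its bin. The sketch below assumes range-based binning and that position i in every vector corresponds to bin i. The helper name is hypothetical, and any sample values passed to it are the caller's own, not real server output.

```python
def rows_per_bin(stats, start, end, interval):
    """Turn the stats map (statistic name -> vector) into one row per bin.

    Assumes range-based binning, with the final bin closed at `end`.
    """
    nbins = len(stats["count"])
    rows = []
    for i in range(nbins):
        row = {
            "bin": i,
            "lower": start + i * interval,
            # The last bin may be narrower than interval; it stops at end.
            "upper": min(start + (i + 1) * interval, end),
        }
        for name, values in stats.items():
            row[name] = values[i]
        rows.append(row)
    return rows
```

Each resulting row holds the bin index, its lower and upper bounds, the count, and one entry per requested statistic.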