A set of parameters for GPUdb::aggregateStatistics.
#include <gpudb/protocol/aggregate_statistics.h>
Public Member Functions | |
| AggregateStatisticsRequest () | |
| Constructs an AggregateStatisticsRequest object with default parameters. | |
| AggregateStatisticsRequest (const std::string &tableName_, const std::string &columnName_, const std::string &stats_, const std::map< std::string, std::string > &options_) | |
| Constructs an AggregateStatisticsRequest object with the specified parameters. | |
Public Attributes | |
| std::string | tableName |
| Name of the table on which the statistics operation will be performed, in [schema_name.]table_name format, using standard name resolution rules. | |
| std::string | columnName |
| Name of the primary column for which the statistics are to be calculated. | |
| std::string | stats |
| Comma-separated list of the statistics to calculate, e.g. “sum,mean”. | |
| std::map< std::string, std::string > | options |
| Optional parameters. | |
Detailed Description
A set of parameters for GPUdb::aggregateStatistics.
Calculates the requested statistics of the given column(s) in a given table.
The available statistics are: count (number of total objects), mean, stdv (standard deviation), variance, skew, kurtosis, sum, min, max, weighted_average, cardinality (unique count), estimated_cardinality, percentile, and percentile_rank.
Estimated cardinality is calculated using the HyperLogLog approximation technique.
Percentiles and percentile ranks are approximate and are calculated using the t-digest algorithm. Each must be given its argument: the desired percentile for percentile, or the value whose rank is wanted for percentile_rank. To compute multiple percentiles or percentile ranks, each one must be specified separately (e.g., ‘percentile(75.0),percentile(99.0),percentile_rank(1234.56),percentile_rank(-5)’).
A second, comma-separated value can be added to the percentile statistic to calculate percentile resolution, e.g., a 50th percentile with 200 resolution would be ‘percentile(50,200)’.
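The stats string described above can be assembled programmatically. The following is a minimal sketch only: the helper names percentileStat, percentileRankStat, and joinStats are hypothetical and are not part of the gpudb API, and using a resolution of 0 to mean "no resolution argument" is a convention of this sketch.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of gpudb): formats one percentile entry,
// e.g. "percentile(75)" or, with a resolution, "percentile(50,200)".
// A resolution of 0 omits the second argument (sketch convention only).
std::string percentileStat(double percentile, int resolution = 0) {
    std::ostringstream out;
    out << "percentile(" << percentile;
    if (resolution > 0) {
        out << "," << resolution;
    }
    out << ")";
    return out.str();
}

// Hypothetical helper: formats one percentile_rank entry,
// e.g. "percentile_rank(1234.56)".
std::string percentileRankStat(double value) {
    std::ostringstream out;
    out << "percentile_rank(" << value << ")";
    return out.str();
}

// Hypothetical helper: joins individual statistic entries with commas,
// since each percentile or percentile rank must be listed separately.
std::string joinStats(const std::vector<std::string>& entries) {
    std::string joined;
    for (std::size_t i = 0; i < entries.size(); ++i) {
        if (i > 0) {
            joined += ",";
        }
        joined += entries[i];
    }
    return joined;
}
```

The joined string is what would be assigned to the stats member, alongside plain names such as “sum” or “mean”.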
The weighted average statistic requires a weight column to be specified in weight_column_name. The weighted average is then defined as the sum of the products of columnName times the weight_column_name values divided by the sum of the weight_column_name values.
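The definition above can be checked locally on small data. The sketch below is a plain client-side illustration of that formula, not the server's implementation; the function name weightedAverage and its error handling are assumptions of this example.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Illustrative only: sum(value * weight) / sum(weight), as in the
// definition of the weighted average statistic.
double weightedAverage(const std::vector<double>& values,
                       const std::vector<double>& weights) {
    if (values.size() != weights.size()) {
        throw std::invalid_argument("values and weights must have equal length");
    }
    double weightedSum = 0.0;
    double weightTotal = 0.0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        weightedSum += values[i] * weights[i];
        weightTotal += weights[i];
    }
    if (weightTotal == 0.0) {
        // How the server treats a zero weight total is not stated in this
        // document; this sketch simply refuses to divide by zero.
        throw std::domain_error("sum of weights is zero");
    }
    return weightedSum / weightTotal;
}
```

For values 1, 2, 3 with weights 1, 1, 2, the weighted sum is 9 and the weight total is 4, giving 2.25.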
Additional columns can be used in the calculation of statistics via additional_column_names. Values in these columns will be included in the overall aggregate calculation; individual aggregates will not be calculated per additional column. For instance, requesting the count and mean of columnName x and additional_column_names y and z, where x holds the numbers 1-10, y holds 11-20, and z holds 21-30, would return the total number of x, y, and z values (30), and the single average value across all x, y, and z values (15.5).
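The pooled behaviour in this example can be reproduced with a few lines of client-side arithmetic. This is a sketch under stated assumptions: PooledStats, pooledCountAndMean, and inclusiveRange are hypothetical names used only here, and the code illustrates the arithmetic rather than how the server computes it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch.
struct PooledStats {
    std::size_t count;
    double mean;
};

// Treats the values of all columns as one pool, so a single count and a
// single mean are produced rather than one aggregate per column.
PooledStats pooledCountAndMean(const std::vector<std::vector<double>>& columns) {
    std::size_t count = 0;
    double sum = 0.0;
    for (const auto& column : columns) {
        for (double value : column) {
            sum += value;
            ++count;
        }
    }
    const double mean = (count > 0) ? sum / static_cast<double>(count) : 0.0;
    return PooledStats{count, mean};
}

// Hypothetical helper: the integers first..last (inclusive) as doubles.
std::vector<double> inclusiveRange(int first, int last) {
    std::vector<double> values;
    for (int i = first; i <= last; ++i) {
        values.push_back(static_cast<double>(i));
    }
    return values;
}
```

Pooling inclusiveRange(1, 10), inclusiveRange(11, 20), and inclusiveRange(21, 30) gives 30 values summing to 465, hence the mean of 15.5 quoted above.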
The response includes a list of key/value pairs of each statistic requested and its corresponding value.
Definition at line 76 of file aggregate_statistics.h.
Constructor & Destructor Documentation
◆ AggregateStatisticsRequest() [1/2]
| inline |
Constructs an AggregateStatisticsRequest object with default parameters.
Definition at line 82 of file aggregate_statistics.h.
◆ AggregateStatisticsRequest() [2/2]
| inline |
Constructs an AggregateStatisticsRequest object with the specified parameters.
| [in] | tableName_ | Name of the table on which the statistics operation will be performed, in [schema_name.]table_name format, using standard name resolution rules. |
| [in] | columnName_ | Name of the primary column for which the statistics are to be calculated. |
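| [in] | stats_ | Comma-separated list of the statistics to calculate, e.g. “sum,mean”. The supported values are those listed for the stats member. |
| [in] | options_ | Optional parameters. The supported keys are those listed for the options member. |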
Definition at line 195 of file aggregate_statistics.h.
Member Data Documentation
◆ columnName
| std::string gpudb::AggregateStatisticsRequest::columnName |
Name of the primary column for which the statistics are to be calculated.
Definition at line 215 of file aggregate_statistics.h.
◆ options
| std::map<std::string, std::string> gpudb::AggregateStatisticsRequest::options |
Optional parameters.
- aggregate_statistics_additional_column_names: A list of comma separated column names over which statistics can be accumulated along with the primary column. All columns listed and columnName must be of the same type. Must not include the column specified in columnName and no column can be listed twice.
- aggregate_statistics_weight_column_name: Name of column used as weighting attribute for the weighted average statistic.
The default value is an empty map.
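As an illustration of how the four members fit together, the sketch below fills in a request for a count and a weighted average. To stay compilable without the gpudb headers it uses a hypothetical stand-in struct, AggregateStatisticsRequestSketch, that only mirrors the documented member names and types; with the real library the same values would go into gpudb::AggregateStatisticsRequest. The literal option key "weight_column_name" is an assumption based on the name used in the description above, and the function name is hypothetical.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in mirroring the documented public attributes of
// gpudb::AggregateStatisticsRequest; it is NOT the library type.
struct AggregateStatisticsRequestSketch {
    std::string tableName;
    std::string columnName;
    std::string stats;
    std::map<std::string, std::string> options;  // empty by default
};

// Hypothetical helper: builds a request asking for the count and the
// weighted average of `column`, weighted by `weightColumn`.
AggregateStatisticsRequestSketch makeWeightedAverageRequest(
        const std::string& table,
        const std::string& column,
        const std::string& weightColumn) {
    AggregateStatisticsRequestSketch request;
    request.tableName = table;    // [schema_name.]table_name format
    request.columnName = column;  // primary column
    request.stats = "count,weighted_average";
    // Assumption: the option key string is "weight_column_name".
    request.options["weight_column_name"] = weightColumn;
    return request;
}
```

The weighted_average statistic is paired with the weight column option here because the description states that the statistic requires it.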
Definition at line 292 of file aggregate_statistics.h.
◆ stats
| std::string gpudb::AggregateStatisticsRequest::stats |
Comma-separated list of the statistics to calculate, e.g. “sum,mean”. Supported values:
- aggregate_statistics_count: Number of objects (independent of the given column(s)).
- aggregate_statistics_mean: Arithmetic mean (average), equivalent to sum/count.
- aggregate_statistics_stdv: Sample standard deviation (denominator is count-1).
- aggregate_statistics_variance: Unbiased sample variance (denominator is count-1).
- aggregate_statistics_skew: Skewness (third standardized moment).
- aggregate_statistics_kurtosis: Kurtosis (fourth standardized moment).
- aggregate_statistics_sum: Sum of all values in the column(s).
- aggregate_statistics_min: Minimum value of the column(s).
- aggregate_statistics_max: Maximum value of the column(s).
- aggregate_statistics_weighted_average: Weighted arithmetic mean (using the option weight_column_name as the weighting column).
- aggregate_statistics_cardinality: Number of unique values in the column(s).
- aggregate_statistics_estimated_cardinality: Estimate (via the HyperLogLog technique) of the number of unique values in the column(s).
- aggregate_statistics_percentile: Estimate (via t-digest) of the given percentile of the column(s) (percentile(50.0) will be an approximation of the median). Add a second, comma-separated value to calculate percentile resolution, e.g., ‘percentile(75,150)’.
- aggregate_statistics_percentile_rank: Estimate (via t-digest) of the percentile rank of the given value in the column(s) (if the given value is the median of the column(s), percentile_rank(<median>) will return approximately 50.0).
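The count-1 denominator stated for the stdv and variance entries in the list above can be made concrete with a small client-side calculation. This is a sketch of the textbook sample formulas, not the server's code; the function names sampleVariance and sampleStdv are hypothetical.

```cpp
#include <cmath>
#include <stdexcept>
#include <vector>

// Unbiased sample variance: sum of squared deviations from the mean,
// divided by (count - 1).
double sampleVariance(const std::vector<double>& values) {
    if (values.size() < 2) {
        throw std::invalid_argument("at least two values are required");
    }
    double sum = 0.0;
    for (double value : values) {
        sum += value;
    }
    const double mean = sum / static_cast<double>(values.size());

    double squaredDeviations = 0.0;
    for (double value : values) {
        const double deviation = value - mean;
        squaredDeviations += deviation * deviation;
    }
    return squaredDeviations / static_cast<double>(values.size() - 1);
}

// Sample standard deviation: square root of the sample variance.
double sampleStdv(const std::vector<double>& values) {
    return std::sqrt(sampleVariance(values));
}
```

For the values 1 through 5 the mean is 3 and the squared deviations sum to 10, so the sample variance is 10 / 4 = 2.5, whereas a population variance (denominator count) would be 2.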
Definition at line 273 of file aggregate_statistics.h.
◆ tableName
| std::string gpudb::AggregateStatisticsRequest::tableName |
Name of the table on which the statistics operation will be performed, in [schema_name.]table_name format, using standard name resolution rules.
Definition at line 209 of file aggregate_statistics.h.
The documentation for this struct was generated from the following file:
- gpudb/protocol/aggregate_statistics.h