AggregateStatisticsRequest

java.lang.Object

com.gpudb.protocol.AggregateStatisticsRequest

All Implemented Interfaces:

org.apache.avro.generic.GenericContainer, org.apache.avro.generic.IndexedRecord

public class AggregateStatisticsRequest extends Object implements org.apache.avro.generic.IndexedRecord

A set of parameters for GPUdb.aggregateStatistics.

Calculates the requested statistics of the given column(s) in a given table.

The available statistics are: COUNT (number of total objects), MEAN, STDV (standard deviation), VARIANCE, SKEW, KURTOSIS, SUM, MIN, MAX, WEIGHTED_AVERAGE, CARDINALITY (unique count), ESTIMATED_CARDINALITY, PERCENTILE, and PERCENTILE_RANK.

Estimated cardinality is calculated using the HyperLogLog approximation technique.

Percentiles and percentile ranks are approximate and are calculated using the t-digest algorithm. Each must include its argument: the desired percentile for PERCENTILE, or the value to rank for PERCENTILE_RANK. To compute multiple percentiles or percentile ranks, each one must be specified separately (e.g., ‘percentile(75.0),percentile(99.0),percentile_rank(1234.56),percentile_rank(-5)’).

A second, comma-separated value can be added to the PERCENTILE statistic to specify the percentile resolution, e.g., a 50th percentile with a resolution of 200 would be ‘percentile(50,200)’.
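The stats string for several percentiles can be assembled programmatically. The following is a minimal sketch; PercentileStatsSketch and its methods are hypothetical helpers, not part of the GPUdb API, and only the output format follows the examples above.

```java
import java.util.List;

// Hypothetical helper (not part of the GPUdb API) that formats stats entries
// in the 'percentile(...)' / 'percentile_rank(...)' syntax described above.
final class PercentileStatsSketch {

    // Formats a single percentile entry, e.g. "percentile(75.0)".
    static String percentile(double p) {
        return "percentile(" + p + ")";
    }

    // Formats a percentile entry with an explicit resolution, e.g. "percentile(50.0,200)".
    static String percentile(double p, int resolution) {
        return "percentile(" + p + "," + resolution + ")";
    }

    // Formats a percentile-rank entry, e.g. "percentile_rank(1234.56)".
    static String percentileRank(double value) {
        return "percentile_rank(" + value + ")";
    }

    public static void main(String[] args) {
        // Each percentile is listed separately, joined by commas.
        String stats = String.join(",", List.of(
                percentile(75.0),
                percentile(99.0),
                percentile(50.0, 200),
                percentileRank(1234.56)));
        System.out.println(stats);
    }
}
```

The resulting string would then be supplied as the stats parameter of the request.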

The weighted average statistic requires a weight column to be specified in WEIGHT_COLUMN_NAME. The weighted average is then defined as the sum of the products of each columnName value and its corresponding WEIGHT_COLUMN_NAME value, divided by the sum of the WEIGHT_COLUMN_NAME values.
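The definition above can be checked locally with plain Java. This is an illustrative sketch of the formula only; it does not call the database, and WeightedAverageSketch is a hypothetical name.

```java
// Illustrative sketch of the weighted-average definition; not part of the GPUdb API.
final class WeightedAverageSketch {

    // Returns sum(values[i] * weights[i]) / sum(weights[i]).
    static double weightedAverage(double[] values, double[] weights) {
        if (values.length != weights.length) {
            throw new IllegalArgumentException("values and weights must have the same length");
        }
        double weightedSum = 0.0;
        double weightSum = 0.0;
        for (int i = 0; i < values.length; i++) {
            weightedSum += values[i] * weights[i];
            weightSum += weights[i];
        }
        return weightedSum / weightSum;
    }

    public static void main(String[] args) {
        double[] values = {10.0, 20.0, 30.0};  // stands in for the columnName values
        double[] weights = {1.0, 1.0, 2.0};    // stands in for the WEIGHT_COLUMN_NAME values
        // (10*1 + 20*1 + 30*2) / (1 + 1 + 2) = 90 / 4 = 22.5
        System.out.println(weightedAverage(values, weights));
    }
}
```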

Additional columns can be used in the calculation of statistics via ADDITIONAL_COLUMN_NAMES. Values in these columns will be included in the overall aggregate calculation; individual aggregates will not be calculated per additional column. For instance, requesting the COUNT and MEAN of columnName x and ADDITIONAL_COLUMN_NAMES y and z, where x holds the numbers 1-10, y holds 11-20, and z holds 21-30, would return the total number of x, y, and z values (30), and the single average value across all x, y, and z values (15.5).
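The arithmetic in this example can be reproduced locally. The sketch below pools the three columns and computes one COUNT and one MEAN over all values; it illustrates the described behavior only and does not call the database. CombinedAggregateSketch is a hypothetical name.

```java
import java.util.stream.IntStream;

// Illustrative sketch: one aggregate over the pooled values of several columns.
final class CombinedAggregateSketch {

    // Total number of values across all columns.
    static long count(int[]... columns) {
        long n = 0;
        for (int[] column : columns) {
            n += column.length;
        }
        return n;
    }

    // Single mean over the pooled values of all columns.
    static double mean(int[]... columns) {
        long sum = 0;
        for (int[] column : columns) {
            for (int v : column) {
                sum += v;
            }
        }
        return (double) sum / count(columns);
    }

    public static void main(String[] args) {
        int[] x = IntStream.rangeClosed(1, 10).toArray();   // primary column
        int[] y = IntStream.rangeClosed(11, 20).toArray();  // additional column
        int[] z = IntStream.rangeClosed(21, 30).toArray();  // additional column
        System.out.println(count(x, y, z)); // 30
        System.out.println(mean(x, y, z));  // 465 / 30 = 15.5
    }
}
```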

The response includes a list of key/value pairs of each statistic requested and its corresponding value.

Nested Class Summary
- static final class AggregateStatisticsRequest.Options
  A set of string constants for the AggregateStatisticsRequest parameter options.
- static final class AggregateStatisticsRequest.Stats
  A set of string constants for the AggregateStatisticsRequest parameter stats.
Constructor Summary
- AggregateStatisticsRequest()
  Constructs an AggregateStatisticsRequest object with default parameters.
- AggregateStatisticsRequest(String tableName, String columnName, String stats, Map<String,String> options)
  Constructs an AggregateStatisticsRequest object with the specified parameters.
Method Summary
- boolean equals(Object obj)
- Object get(int index)
  This method supports the Avro framework and is not intended to be called directly by the user.
- static org.apache.avro.Schema getClassSchema()
  This method supports the Avro framework and is not intended to be called directly by the user.
- String getColumnName()
  Name of the primary column for which the statistics are to be calculated.
- Map<String,String> getOptions()
  Optional parameters.
- org.apache.avro.Schema getSchema()
  This method supports the Avro framework and is not intended to be called directly by the user.
- String getStats()
  Comma-separated list of the statistics to calculate, e.g., “sum,mean”.
- String getTableName()
  Name of the table on which the statistics operation will be performed, in [schema_name.]table_name format, using standard name resolution rules.
- int hashCode()
- void put(int index, Object value)
  This method supports the Avro framework and is not intended to be called directly by the user.
- AggregateStatisticsRequest setColumnName(String columnName)
  Name of the primary column for which the statistics are to be calculated.
- AggregateStatisticsRequest setOptions(Map<String,String> options)
  Optional parameters.
- AggregateStatisticsRequest setStats(String stats)
  Comma-separated list of the statistics to calculate, e.g., “sum,mean”.
- AggregateStatisticsRequest setTableName(String tableName)
  Name of the table on which the statistics operation will be performed, in [schema_name.]table_name format, using standard name resolution rules.
- String toString()

Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait

Constructor Details
- AggregateStatisticsRequest
  public AggregateStatisticsRequest()
  Constructs an AggregateStatisticsRequest object with default parameters.
- AggregateStatisticsRequest
  public AggregateStatisticsRequest(String tableName, String columnName, String stats, Map<String,String> options)
  Constructs an AggregateStatisticsRequest object with the specified parameters.
  Parameters:
  tableName - Name of the table on which the statistics operation will be performed, in [schema_name.]table_name format, using standard name resolution rules.
  columnName - Name of the primary column for which the statistics are to be calculated.
  stats - Comma-separated list of the statistics to calculate, e.g., “sum,mean”. Supported values:
  COUNT: Number of objects (independent of the given column(s)).
  MEAN: Arithmetic mean (average), equivalent to sum/count.
  STDV: Sample standard deviation (denominator is count-1).
  VARIANCE: Unbiased sample variance (denominator is count-1).
  SKEW: Skewness (third standardized moment).
  KURTOSIS: Kurtosis (fourth standardized moment).
  SUM: Sum of all values in the column(s).
  MIN: Minimum value of the column(s).
  MAX: Maximum value of the column(s).
  WEIGHTED_AVERAGE: Weighted arithmetic mean (using the option WEIGHT_COLUMN_NAME as the weighting column).
  CARDINALITY: Number of unique values in the column(s).
  ESTIMATED_CARDINALITY: Estimate (via the HyperLogLog technique) of the number of unique values in the column(s).
  PERCENTILE: Estimate (via t-digest) of the given percentile of the column(s) (percentile(50.0) will be an approximation of the median). Add a second, comma-separated value to specify the percentile resolution, e.g., ‘percentile(75,150)’.
  PERCENTILE_RANK: Estimate (via t-digest) of the percentile rank of the given value in the column(s) (if the given value is the median of the column(s), percentile_rank(<median>) will return approximately 50.0).
  options - Optional parameters.
  ADDITIONAL_COLUMN_NAMES: A comma-separated list of column names over which statistics can be accumulated along with the primary column. All listed columns and columnName must be of the same type. The list must not include the column specified in columnName, and no column can be listed twice.
  WEIGHT_COLUMN_NAME: Name of column used as weighting attribute for the weighted average statistic.
  The default value is an empty Map.
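A minimal sketch of preparing the four constructor arguments follows. The table and column names are hypothetical, and the literal option keys ("additional_column_names", "weight_column_name") and the lowercase "weighted_average" stat name are assumptions about the string values behind the Options and Stats constants; in real code, prefer the constants from AggregateStatisticsRequest.Options and AggregateStatisticsRequest.Stats. The constructor call itself is shown only as a comment because it requires the GPUdb Java API on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for assembling request arguments; not part of the GPUdb API.
final class RequestArgumentsSketch {

    // Builds the options map. The literal keys are ASSUMED values of the
    // Options.WEIGHT_COLUMN_NAME and Options.ADDITIONAL_COLUMN_NAMES constants.
    static Map<String, String> buildOptions(String weightColumn, String... additionalColumns) {
        Map<String, String> options = new HashMap<>();
        if (additionalColumns.length > 0) {
            options.put("additional_column_names", String.join(",", additionalColumns));
        }
        if (weightColumn != null) {
            options.put("weight_column_name", weightColumn);
        }
        return options;
    }

    public static void main(String[] args) {
        String tableName = "example_schema.orders";    // hypothetical table
        String columnName = "price";                   // hypothetical primary column
        String stats = "count,mean,weighted_average";  // stat names assumed lowercase, as in "sum,mean"
        Map<String, String> options = buildOptions("quantity");

        // With the GPUdb Java API available, these would be passed as:
        // new AggregateStatisticsRequest(tableName, columnName, stats, options);
        System.out.println(tableName + " | " + columnName + " | " + stats + " | " + options);
    }
}
```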
Method Details
- getClassSchema
  public static org.apache.avro.Schema getClassSchema()
  This method supports the Avro framework and is not intended to be called directly by the user.
  Returns:
  The schema for the class.
- getTableName
  public String getTableName()
  Name of the table on which the statistics operation will be performed, in [schema_name.]table_name format, using standard name resolution rules.
  Returns:
  The current value of tableName.
- setTableName
  public AggregateStatisticsRequest setTableName(String tableName)
  Name of the table on which the statistics operation will be performed, in [schema_name.]table_name format, using standard name resolution rules.
  Parameters:
  tableName - The new value for tableName.
  Returns:
  this to mimic the builder pattern.
- getColumnName
  public String getColumnName()
  Name of the primary column for which the statistics are to be calculated.
  Returns:
  The current value of columnName.
- setColumnName
  public AggregateStatisticsRequest setColumnName(String columnName)
  Name of the primary column for which the statistics are to be calculated.
  Parameters:
  columnName - The new value for columnName.
  Returns:
  this to mimic the builder pattern.
- getStats
  public String getStats()
  Comma-separated list of the statistics to calculate, e.g., “sum,mean”. Supported values:
  COUNT: Number of objects (independent of the given column(s)).
  MEAN: Arithmetic mean (average), equivalent to sum/count.
  STDV: Sample standard deviation (denominator is count-1).
  VARIANCE: Unbiased sample variance (denominator is count-1).
  SKEW: Skewness (third standardized moment).
  KURTOSIS: Kurtosis (fourth standardized moment).
  SUM: Sum of all values in the column(s).
  MIN: Minimum value of the column(s).
  MAX: Maximum value of the column(s).
  WEIGHTED_AVERAGE: Weighted arithmetic mean (using the option WEIGHT_COLUMN_NAME as the weighting column).
  CARDINALITY: Number of unique values in the column(s).
  ESTIMATED_CARDINALITY: Estimate (via the HyperLogLog technique) of the number of unique values in the column(s).
  PERCENTILE: Estimate (via t-digest) of the given percentile of the column(s) (percentile(50.0) will be an approximation of the median). Add a second, comma-separated value to specify the percentile resolution, e.g., ‘percentile(75,150)’.
  PERCENTILE_RANK: Estimate (via t-digest) of the percentile rank of the given value in the column(s) (if the given value is the median of the column(s), percentile_rank(<median>) will return approximately 50.0).
  Returns:
  The current value of stats.
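The STDV and VARIANCE entries above use the sample form, with count-1 in the denominator. The sketch below reproduces that definition in plain Java for local verification; it is not the server's implementation, and SampleStatsSketch is a hypothetical name.

```java
// Illustrative sketch of sample variance and standard deviation (denominator count-1).
final class SampleStatsSketch {

    // Arithmetic mean: sum / count.
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Unbiased sample variance: sum of squared deviations / (count - 1).
    static double sampleVariance(double[] values) {
        if (values.length < 2) {
            throw new IllegalArgumentException("at least two values are required");
        }
        double m = mean(values);
        double squaredDeviations = 0.0;
        for (double v : values) {
            squaredDeviations += (v - m) * (v - m);
        }
        return squaredDeviations / (values.length - 1);
    }

    // Sample standard deviation: square root of the sample variance.
    static double sampleStdDev(double[] values) {
        return Math.sqrt(sampleVariance(values));
    }

    public static void main(String[] args) {
        double[] values = {1.0, 2.0, 3.0, 4.0, 5.0};
        System.out.println(mean(values));           // 15 / 5 = 3.0
        System.out.println(sampleVariance(values)); // 10 / 4 = 2.5
        System.out.println(sampleStdDev(values));   // square root of 2.5
    }
}
```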
- setStats
  public AggregateStatisticsRequest setStats(String stats)
  Comma-separated list of the statistics to calculate, e.g., “sum,mean”. Supported values:
  COUNT: Number of objects (independent of the given column(s)).
  MEAN: Arithmetic mean (average), equivalent to sum/count.
  STDV: Sample standard deviation (denominator is count-1).
  VARIANCE: Unbiased sample variance (denominator is count-1).
  SKEW: Skewness (third standardized moment).
  KURTOSIS: Kurtosis (fourth standardized moment).
  SUM: Sum of all values in the column(s).
  MIN: Minimum value of the column(s).
  MAX: Maximum value of the column(s).
  WEIGHTED_AVERAGE: Weighted arithmetic mean (using the option WEIGHT_COLUMN_NAME as the weighting column).
  CARDINALITY: Number of unique values in the column(s).
  ESTIMATED_CARDINALITY: Estimate (via the HyperLogLog technique) of the number of unique values in the column(s).
  PERCENTILE: Estimate (via t-digest) of the given percentile of the column(s) (percentile(50.0) will be an approximation of the median). Add a second, comma-separated value to specify the percentile resolution, e.g., ‘percentile(75,150)’.
  PERCENTILE_RANK: Estimate (via t-digest) of the percentile rank of the given value in the column(s) (if the given value is the median of the column(s), percentile_rank(<median>) will return approximately 50.0).
  Parameters:
  stats - The new value for stats.
  Returns:
  this to mimic the builder pattern.
- getOptions
  public Map<String,String> getOptions()
  Optional parameters.
  ADDITIONAL_COLUMN_NAMES: A comma-separated list of column names over which statistics can be accumulated along with the primary column. All listed columns and columnName must be of the same type. The list must not include the column specified in columnName, and no column can be listed twice.
  WEIGHT_COLUMN_NAME: Name of column used as weighting attribute for the weighted average statistic.
  The default value is an empty Map.
  Returns:
  The current value of options.
- setOptions
  public AggregateStatisticsRequest setOptions(Map<String,String> options)
  Optional parameters.
  ADDITIONAL_COLUMN_NAMES: A comma-separated list of column names over which statistics can be accumulated along with the primary column. All listed columns and columnName must be of the same type. The list must not include the column specified in columnName, and no column can be listed twice.
  WEIGHT_COLUMN_NAME: Name of column used as weighting attribute for the weighted average statistic.
  The default value is an empty Map.
  Parameters:
  options - The new value for options.
  Returns:
  this to mimic the builder pattern.
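Because each setter returns this, calls can be chained. The sketch below uses a hypothetical stand-in class, ChainedRequestSketch, that mirrors only the setter and getter signatures listed on this page; it is not the real com.gpudb.protocol.AggregateStatisticsRequest, and its field defaults are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in that mirrors the documented setter/getter signatures.
// It is NOT the real AggregateStatisticsRequest; field defaults are assumptions.
final class ChainedRequestSketch {
    private String tableName = "";
    private String columnName = "";
    private String stats = "";
    private Map<String, String> options = new HashMap<>();

    // Each setter stores its argument and returns this so calls can be chained.
    ChainedRequestSketch setTableName(String tableName) { this.tableName = tableName; return this; }
    ChainedRequestSketch setColumnName(String columnName) { this.columnName = columnName; return this; }
    ChainedRequestSketch setStats(String stats) { this.stats = stats; return this; }
    ChainedRequestSketch setOptions(Map<String, String> options) { this.options = options; return this; }

    String getTableName() { return tableName; }
    String getColumnName() { return columnName; }
    String getStats() { return stats; }
    Map<String, String> getOptions() { return options; }

    public static void main(String[] args) {
        // Table and column names here are hypothetical.
        ChainedRequestSketch request = new ChainedRequestSketch()
                .setTableName("example_schema.orders")
                .setColumnName("price")
                .setStats("sum,mean")
                .setOptions(new HashMap<>());
        System.out.println(request.getTableName() + " " + request.getColumnName() + " " + request.getStats());
    }
}
```

With the real class, the same chaining style would apply to a request created by the default constructor.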
- getSchema
  public org.apache.avro.Schema getSchema()
  This method supports the Avro framework and is not intended to be called directly by the user.
  Specified by:
  getSchema in interface org.apache.avro.generic.GenericContainer
  Returns:
  The schema object describing this class.
- get
  public Object get(int index)
  This method supports the Avro framework and is not intended to be called directly by the user.
  Specified by:
  get in interface org.apache.avro.generic.IndexedRecord
  Parameters:
  index - the position of the field to get
  Returns:
  value of the field with the given index.
  Throws:
  IndexOutOfBoundsException
- put
  public void put(int index, Object value)
  This method supports the Avro framework and is not intended to be called directly by the user.
  Specified by:
  put in interface org.apache.avro.generic.IndexedRecord
  Parameters:
  index - the position of the field to set
  value - the value to set
  Throws:
  IndexOutOfBoundsException
- equals
  public boolean equals(Object obj)
  Overrides:
  equals in class Object
- toString
  public String toString()
  Overrides:
  toString in class Object
- hashCode
  public int hashCode()
  Overrides:
  hashCode in class Object
