public class AggregateStatisticsRequest extends Object implements org.apache.avro.generic.IndexedRecord
A set of parameters for GPUdb.aggregateStatistics(AggregateStatisticsRequest).
Calculates the requested statistics of the given column(s) in a given table.
The available statistics are count (number of total objects), mean, stdv (standard deviation), variance, skew, kurtosis, sum, min, max, weighted_average, cardinality (unique count), estimated_cardinality, percentile, and percentile_rank.
Estimated cardinality is calculated using the HyperLogLog approximation technique.
Percentiles and percentile ranks are approximate and are calculated using the t-digest algorithm. Each must include the desired percentile or percentile_rank value. To compute multiple percentiles, specify each value separately (e.g., 'percentile(75.0),percentile(99.0),percentile_rank(1234.56),percentile_rank(-5)').
A second, comma-separated value can be added to the percentile statistic to specify the percentile resolution; e.g., a 50th percentile with a resolution of 200 would be 'percentile(50,200)'.
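The stats syntax described above can be assembled programmatically. The following is a minimal sketch only: StatsStringSketch and its methods are hypothetical helpers, not part of the GPUdb API, and the resulting string is what one would presumably pass as the stats parameter.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of the GPUdb API) that assembles a stats string. */
class StatsStringSketch {
    private final List<String> parts = new ArrayList<>();

    /** Adds a plain statistic name such as "count" or "mean". */
    StatsStringSketch add(String statistic) {
        parts.add(statistic);
        return this;
    }

    /** Adds percentile(p); each desired percentile needs its own entry. */
    StatsStringSketch percentile(Number p) {
        parts.add("percentile(" + p + ")");
        return this;
    }

    /** Adds percentile(p,resolution), using the optional second value. */
    StatsStringSketch percentile(Number p, int resolution) {
        parts.add("percentile(" + p + "," + resolution + ")");
        return this;
    }

    /** Adds percentile_rank(value) for the given column value. */
    StatsStringSketch percentileRank(Number value) {
        parts.add("percentile_rank(" + value + ")");
        return this;
    }

    /** Joins all entries with commas, as the stats parameter expects. */
    String build() {
        return String.join(",", parts);
    }

    public static void main(String[] args) {
        String stats = new StatsStringSketch()
                .percentile(75.0)
                .percentile(99.0)
                .percentileRank(1234.56)
                .percentileRank(-5)
                .build();
        System.out.println(stats);
    }
}
```

Keeping each percentile as its own entry mirrors the rule that multiple percentiles must be specified separately.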
The weighted average statistic requires a weight_column_name to be specified in options. The weighted average is then defined as the sum of the products of the columnName values and the weight_column_name values, divided by the sum of the weight_column_name values.
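To make the weighted average definition concrete, here is a small client-side sketch of the same arithmetic. It illustrates the formula only and is not the server's implementation; the class name, method name, and sample data are assumptions.

```java
/** Illustrative only: reproduces the weighted average formula on local arrays. */
class WeightedAverageSketch {

    /** Returns sum(value * weight) / sum(weight). */
    static double weightedAverage(double[] values, double[] weights) {
        if (values.length != weights.length) {
            throw new IllegalArgumentException("values and weights must have the same length");
        }
        double weightedSum = 0.0; // sum of the products of value and weight
        double weightSum = 0.0;   // sum of the weights
        for (int i = 0; i < values.length; i++) {
            weightedSum += values[i] * weights[i];
            weightSum += weights[i];
        }
        return weightedSum / weightSum;
    }

    public static void main(String[] args) {
        double[] values = {1.0, 2.0, 3.0};  // stands in for the columnName values
        double[] weights = {1.0, 1.0, 2.0}; // stands in for the weight_column_name values
        // (1*1 + 2*1 + 3*2) / (1 + 1 + 2) = 9 / 4
        System.out.println(weightedAverage(values, weights));
    }
}
```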
Additional columns can be used in the calculation of statistics via the additional_column_names option. Values in these columns will be included in the overall aggregate calculation; individual aggregates will not be calculated per additional column. For instance, requesting the count and mean of columnName x and additional_column_names y and z, where x holds the numbers 1-10, y holds 11-20, and z holds 21-30, would return the total number of x, y, and z values (30), and the single average value across all x, y, and z values (15.5).
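The pooled behaviour in the x, y, z example can be checked with a short local computation. This sketch only mirrors the arithmetic on in-memory arrays; the class and method names are hypothetical and nothing here calls the database.

```java
import java.util.stream.IntStream;

/** Illustrative only: pools several columns and computes one count and one mean. */
class PooledStatsSketch {

    /** Total number of values across all columns. */
    static long pooledCount(int[]... columns) {
        long count = 0;
        for (int[] column : columns) {
            count += column.length;
        }
        return count;
    }

    /** Single mean across all values of all columns. */
    static double pooledMean(int[]... columns) {
        long sum = 0;
        for (int[] column : columns) {
            for (int value : column) {
                sum += value;
            }
        }
        return (double) sum / pooledCount(columns);
    }

    public static void main(String[] args) {
        int[] x = IntStream.rangeClosed(1, 10).toArray();  // columnName
        int[] y = IntStream.rangeClosed(11, 20).toArray(); // additional column
        int[] z = IntStream.rangeClosed(21, 30).toArray(); // additional column
        System.out.println(pooledCount(x, y, z)); // 30
        System.out.println(pooledMean(x, y, z));  // 15.5
    }
}
```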
The response includes a list of key/value pairs of each statistic requested and its corresponding value.
Modifier and Type | Class and Description |
---|---|
static class | AggregateStatisticsRequest.Options: Optional parameters. |
static class | AggregateStatisticsRequest.Stats: Comma separated list of the statistics to calculate, e.g. "sum,mean". |
Constructor and Description |
---|
AggregateStatisticsRequest(): Constructs an AggregateStatisticsRequest object with default parameters. |
AggregateStatisticsRequest(String tableName, String columnName, String stats, Map<String,String> options): Constructs an AggregateStatisticsRequest object with the specified parameters. |
Modifier and Type | Method and Description |
---|---|
boolean | equals(Object obj) |
Object | get(int index): This method supports the Avro framework and is not intended to be called directly by the user. |
static org.apache.avro.Schema | getClassSchema(): This method supports the Avro framework and is not intended to be called directly by the user. |
String | getColumnName() |
Map<String,String> | getOptions() |
org.apache.avro.Schema | getSchema(): This method supports the Avro framework and is not intended to be called directly by the user. |
String | getStats() |
String | getTableName() |
int | hashCode() |
void | put(int index, Object value): This method supports the Avro framework and is not intended to be called directly by the user. |
AggregateStatisticsRequest | setColumnName(String columnName) |
AggregateStatisticsRequest | setOptions(Map<String,String> options) |
AggregateStatisticsRequest | setStats(String stats) |
AggregateStatisticsRequest | setTableName(String tableName) |
String | toString() |
public AggregateStatisticsRequest()
public AggregateStatisticsRequest(String tableName, String columnName, String stats, Map<String,String> options)
tableName - Name of the table on which the statistics operation will be performed.
columnName - Name of the primary column for which the statistics are to be calculated.
stats - Comma separated list of the statistics to calculate, e.g. "sum,mean". Supported values:
COUNT: Number of objects (independent of the given column(s)).
MEAN: Arithmetic mean (average), equivalent to sum/count.
STDV: Sample standard deviation (denominator is count-1).
VARIANCE: Unbiased sample variance (denominator is count-1).
SKEW: Skewness (third standardized moment).
KURTOSIS: Kurtosis (fourth standardized moment).
SUM: Sum of all values in the column(s).
MIN: Minimum value of the column(s).
MAX: Maximum value of the column(s).
WEIGHTED_AVERAGE: Weighted arithmetic mean (using the option weight_column_name as the weighting column).
CARDINALITY: Number of unique values in the column(s).
ESTIMATED_CARDINALITY: Estimate (via hyperloglog technique) of the number of unique values in the column(s).
PERCENTILE: Estimate (via t-digest) of the given percentile of the column(s) (percentile(50.0) will be an approximation of the median). Add a second, comma-separated value to calculate percentile resolution, e.g., 'percentile(75,150)'.
PERCENTILE_RANK: Estimate (via t-digest) of the percentile rank of the given value in the column(s) (if the given value is the median of the column(s), percentile_rank(
options - Optional parameters.
ADDITIONAL_COLUMN_NAMES: A list of comma separated column names over which statistics can be accumulated along with the primary column. All columns listed and columnName must be of the same type. Must not include the column specified in columnName and no column can be listed twice.
WEIGHT_COLUMN_NAME: Name of column used as weighting attribute for the weighted average statistic.
Map.
public static org.apache.avro.Schema getClassSchema()
public String getTableName()
public AggregateStatisticsRequest setTableName(String tableName)
tableName - Name of the table on which the statistics operation will be performed.
Returns: this to mimic the builder pattern.
public String getColumnName()
public AggregateStatisticsRequest setColumnName(String columnName)
columnName - Name of the primary column for which the statistics are to be calculated.
Returns: this to mimic the builder pattern.
public String getStats()
COUNT: Number of objects (independent of the given column(s)).
MEAN: Arithmetic mean (average), equivalent to sum/count.
STDV: Sample standard deviation (denominator is count-1).
VARIANCE: Unbiased sample variance (denominator is count-1).
SKEW: Skewness (third standardized moment).
KURTOSIS: Kurtosis (fourth standardized moment).
SUM: Sum of all values in the column(s).
MIN: Minimum value of the column(s).
MAX: Maximum value of the column(s).
WEIGHTED_AVERAGE: Weighted arithmetic mean (using the option weight_column_name as the weighting column).
CARDINALITY: Number of unique values in the column(s).
ESTIMATED_CARDINALITY: Estimate (via hyperloglog technique) of the number of unique values in the column(s).
PERCENTILE: Estimate (via t-digest) of the given percentile of the column(s) (percentile(50.0) will be an approximation of the median). Add a second, comma-separated value to calculate percentile resolution, e.g., 'percentile(75,150)'.
PERCENTILE_RANK: Estimate (via t-digest) of the percentile rank of the given value in the column(s) (if the given value is the median of the column(s), percentile_rank(
public AggregateStatisticsRequest setStats(String stats)
stats - Comma separated list of the statistics to calculate, e.g. "sum,mean". Supported values:
COUNT: Number of objects (independent of the given column(s)).
MEAN: Arithmetic mean (average), equivalent to sum/count.
STDV: Sample standard deviation (denominator is count-1).
VARIANCE: Unbiased sample variance (denominator is count-1).
SKEW: Skewness (third standardized moment).
KURTOSIS: Kurtosis (fourth standardized moment).
SUM: Sum of all values in the column(s).
MIN: Minimum value of the column(s).
MAX: Maximum value of the column(s).
WEIGHTED_AVERAGE: Weighted arithmetic mean (using the option weight_column_name as the weighting column).
CARDINALITY: Number of unique values in the column(s).
ESTIMATED_CARDINALITY: Estimate (via hyperloglog technique) of the number of unique values in the column(s).
PERCENTILE: Estimate (via t-digest) of the given percentile of the column(s) (percentile(50.0) will be an approximation of the median). Add a second, comma-separated value to calculate percentile resolution, e.g., 'percentile(75,150)'.
PERCENTILE_RANK: Estimate (via t-digest) of the percentile rank of the given value in the column(s) (if the given value is the median of the column(s), percentile_rank(
Returns: this to mimic the builder pattern.
public Map<String,String> getOptions()
ADDITIONAL_COLUMN_NAMES: A list of comma separated column names over which statistics can be accumulated along with the primary column. All columns listed and columnName must be of the same type. Must not include the column specified in columnName and no column can be listed twice.
WEIGHT_COLUMN_NAME: Name of column used as weighting attribute for the weighted average statistic.
Map.
public AggregateStatisticsRequest setOptions(Map<String,String> options)
options - Optional parameters.
ADDITIONAL_COLUMN_NAMES: A list of comma separated column names over which statistics can be accumulated along with the primary column. All columns listed and columnName must be of the same type. Must not include the column specified in columnName and no column can be listed twice.
WEIGHT_COLUMN_NAME: Name of column used as weighting attribute for the weighted average statistic.
Map.
Returns: this to mimic the builder pattern.
public org.apache.avro.Schema getSchema()
Specified by: getSchema in interface org.apache.avro.generic.GenericContainer
public Object get(int index)
Specified by: get in interface org.apache.avro.generic.IndexedRecord
index - the position of the field to get
Throws: IndexOutOfBoundsException
public void put(int index, Object value)
Specified by: put in interface org.apache.avro.generic.IndexedRecord
index - the position of the field to set
value - the value to set
Throws: IndexOutOfBoundsException
Copyright © 2019. All rights reserved.