public class AggregateGroupByRequest extends Object implements org.apache.avro.generic.IndexedRecord
A set of parameters for GPUdb.aggregateGroupByRaw(AggregateGroupByRequest).
Calculates unique combinations (groups) of values for the given columns in a given table or view and computes aggregates on each unique combination. This is somewhat analogous to an SQL-style SELECT...GROUP BY.
For aggregation details and examples, see Aggregation. For limitations, see Aggregation Limitations.
Any column(s) can be grouped on, and all column types except unrestricted-length strings may be used for computing applicable aggregates; columns marked as store-only are unable to be used in grouping or aggregation.
The results can be paged via the offset and limit parameters. For example, to get the 10 groups with the largest counts, the inputs would be: limit=10, options={"sort_order":"descending", "sort_by":"value"}.
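As a minimal sketch of the example above, the inputs could be assembled with plain Java collections before being handed to the request. The class and method names (TopGroupsInputs, topGroupsOptions) are hypothetical and exist only for illustration; the option keys and values are the ones quoted in the text.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; not part of the GPUdb API.
public class TopGroupsInputs {
    // limit=10: return at most 10 groups.
    public static final long LIMIT = 10;

    // Builds the options map from the example: sort by aggregate value, descending.
    public static Map<String, String> topGroupsOptions() {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("sort_order", "descending");
        options.put("sort_by", "value");
        return options;
    }

    public static void main(String[] args) {
        System.out.println("limit=" + LIMIT + ", options=" + topGroupsOptions());
    }
}
```

The resulting map and limit would then be supplied as the options and limit arguments of the constructors documented below.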
options can be used to customize the behavior of this call, e.g., filtering or sorting the results.
To group by columns 'x' and 'y' and compute the number of objects within each group, use: column_names=['x','y','count(*)'].
To also compute the sum of 'z' over each group, use: column_names=['x','y','count(*)','sum(z)'].
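The two column_names examples above can be sketched as a small helper that appends aggregate expressions to the grouping columns. The class and method names (GroupByColumns, columnNames) are hypothetical; only the column and aggregate strings come from the text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; not part of the GPUdb API.
public class GroupByColumns {
    // Returns the grouping columns followed by the aggregate expressions,
    // in the order expected for the columnNames parameter.
    public static List<String> columnNames(List<String> groupingColumns,
                                           List<String> aggregates) {
        List<String> names = new ArrayList<>(groupingColumns);
        names.addAll(aggregates);
        return names;
    }

    public static void main(String[] args) {
        // Group by x and y; compute the count and the sum of z for each group.
        List<String> names = columnNames(
                Arrays.asList("x", "y"),
                Arrays.asList("count(*)", "sum(z)"));
        System.out.println(names);
    }
}
```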
Available aggregation functions are: count(*), sum, min, max, avg, mean, stddev, stddev_pop, stddev_samp, var, var_pop, var_samp, arg_min, arg_max and count_distinct.
Available grouping functions are Rollup, Cube, and Grouping Sets.
This service also provides support for Pivot operations.
Filtering on aggregates is supported via expressions using aggregation functions supplied to having.
The response is returned as a dynamic schema. For details see: dynamic schemas documentation.
If a result_table name is specified in the options, the results are stored in a new table with that name and no results are returned in the response. Both the table name and resulting column names must adhere to standard naming conventions; column/aggregation expressions will need to be aliased. If the source table's shard key is used as the grouping column(s) and all result records are selected (offset is 0 and limit is -9999), the result table will be sharded; in all other cases it will be replicated. Sorting will function properly only if the result table is replicated or if there is only one processing node, and should not be relied upon in other cases. A result table is not available when any of the values of columnNames is an unrestricted-length string.
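A rough sketch of preparing inputs for a result table follows, assuming the option key is the literal string "result_table" as written above. The class and its methods (ResultTableInputs, alias, resultTableOptions, selectsAllRecords) are hypothetical names used only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; not part of the GPUdb API.
public class ResultTableInputs {
    // END_OF_SET value documented for the limit parameter.
    public static final long END_OF_SET = -9999;

    // Aliases a column or aggregate expression, e.g. "sum(FDouble) as sfd".
    public static String alias(String expression, String name) {
        return expression + " as " + name;
    }

    // Options map that stores the results in the named table.
    public static Map<String, String> resultTableOptions(String tableName) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("result_table", tableName);
        return options;
    }

    // True when all result records are selected (offset is 0 and limit is -9999),
    // one of the two conditions stated above for a sharded result table.
    public static boolean selectsAllRecords(long offset, long limit) {
        return offset == 0 && limit == END_OF_SET;
    }

    public static void main(String[] args) {
        System.out.println(alias("FChar256", "fchar256"));
        System.out.println(alias("sum(FDouble)", "sfd"));
        System.out.println(resultTableOptions("my_result_table"));
        System.out.println(selectsAllRecords(0, END_OF_SET));
    }
}
```

The table name my_result_table is a placeholder; any name that follows the standard naming conventions could be used.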
Modifier and Type | Class and Description |
---|---|
static class | AggregateGroupByRequest.Encoding: Specifies the encoding for returned records. |
static class | AggregateGroupByRequest.Options: Optional parameters. |
Constructor and Description |
---|
AggregateGroupByRequest(): Constructs an AggregateGroupByRequest object with default parameters. |
AggregateGroupByRequest(String tableName, List<String> columnNames, long offset, long limit, Map<String,String> options): Constructs an AggregateGroupByRequest object with the specified parameters. |
AggregateGroupByRequest(String tableName, List<String> columnNames, long offset, long limit, String encoding, Map<String,String> options): Constructs an AggregateGroupByRequest object with the specified parameters. |
Modifier and Type | Method and Description |
---|---|
boolean | equals(Object obj) |
Object | get(int index): This method supports the Avro framework and is not intended to be called directly by the user. |
static org.apache.avro.Schema | getClassSchema(): This method supports the Avro framework and is not intended to be called directly by the user. |
List<String> | getColumnNames() |
String | getEncoding() |
long | getLimit() |
long | getOffset() |
Map<String,String> | getOptions() |
org.apache.avro.Schema | getSchema(): This method supports the Avro framework and is not intended to be called directly by the user. |
String | getTableName() |
int | hashCode() |
void | put(int index, Object value): This method supports the Avro framework and is not intended to be called directly by the user. |
AggregateGroupByRequest | setColumnNames(List<String> columnNames) |
AggregateGroupByRequest | setEncoding(String encoding) |
AggregateGroupByRequest | setLimit(long limit) |
AggregateGroupByRequest | setOffset(long offset) |
AggregateGroupByRequest | setOptions(Map<String,String> options) |
AggregateGroupByRequest | setTableName(String tableName) |
String | toString() |
public AggregateGroupByRequest()
public AggregateGroupByRequest(String tableName, List<String> columnNames, long offset, long limit, Map<String,String> options)
Parameters:
tableName - Name of an existing table or view on which the operation will be performed.
columnNames - List of one or more column names, expressions, and aggregate expressions.
offset - A non-negative integer indicating the number of initial results to skip (this can be useful for paging through the results). The default value is 0. The minimum allowed value is 0. The maximum allowed value is MAX_INT.
limit - A positive integer indicating the maximum number of results to be returned, or END_OF_SET (-9999) to indicate that the maximum number of results should be returned. The number of records returned will never exceed the server's own limit, defined by the max_get_records_size parameter in the server configuration. Use hasMoreRecords to see if more records exist in the result to be fetched, and offset & limit to request subsequent pages of results. The default value is -9999.
options - Optional parameters. The default value is an empty Map.
- COLLECTION_NAME: Name of a collection which is to contain the table specified in result_table. If the collection provided is non-existent, the collection will be automatically created. If empty, then the table will be a top-level table.
- EXPRESSION: Filter expression to apply to the table prior to computing the aggregate group by.
- HAVING: Filter expression to apply to the aggregated results.
- SORT_ORDER: String indicating how the returned values should be sorted: ascending or descending. The default value is ASCENDING. Supported values:
  - ASCENDING: Indicates that the returned values should be sorted in ascending order.
  - DESCENDING: Indicates that the returned values should be sorted in descending order.
- SORT_BY: String determining how the results are sorted. The default value is VALUE. Supported values:
  - KEY: Indicates that the returned values should be sorted by key, which corresponds to the grouping columns. If there are multiple grouping columns, the results are sorted by the first grouping column, then the second grouping column, etc.
  - VALUE: Indicates that the returned values should be sorted by value, which corresponds to the aggregates. If there are multiple aggregates, the results are sorted by the first aggregate, then the second aggregate, etc.
- RESULT_TABLE: The name of the table used to store the results. Has the same naming restrictions as tables. Column names (group-by and aggregate fields) need to be given aliases, e.g., ["FChar256 as fchar256", "sum(FDouble) as sfd"]. If present, no results are returned in the response. This option is not available if one of the grouping attributes is an unrestricted string (i.e., not charN) type.
- RESULT_TABLE_PERSIST: If true, then the result table specified in result_table will be persisted and will not expire unless a ttl is specified. If false, then the result table will be an in-memory table and will expire unless a ttl is specified otherwise. Supported values: TRUE, FALSE. The default value is FALSE.
- RESULT_TABLE_FORCE_REPLICATED: Force the result table to be replicated (ignores any sharding). Must be used in combination with the result_table option. Supported values: TRUE, FALSE. The default value is FALSE.
- RESULT_TABLE_GENERATE_PK: If true, then set a primary key for the result table. Must be used in combination with the result_table option. Supported values: TRUE, FALSE. The default value is FALSE.
- TTL: Sets the TTL of the table specified in result_table.
- CHUNK_SIZE: Indicates the number of records per chunk to be used for the result table. Must be used in combination with the result_table option.
- CREATE_INDEXES: Comma-separated list of columns on which to create indexes on the result table. Must be used in combination with the result_table option.
- VIEW_ID: ID of the view of which the result table will be a member. The default value is ''.
- MATERIALIZE_ON_GPU: No longer used. See Resource Management Concepts for information about how resources are managed, Tier Strategy Concepts for how resources are targeted for VRAM, and Tier Strategy Usage for how to specify a table's priority in VRAM. Supported values: TRUE, FALSE. The default value is FALSE.
- PIVOT: The pivot column.
- PIVOT_VALUES: The value list provided will become the column headers in the output. Should be the values from the pivot_column.
- GROUPING_SETS: Customize the grouping attribute sets used to compute the aggregates. These sets can include ROLLUP or CUBE operators. The attribute sets should be enclosed in parentheses and can include composite attributes. All attributes specified in the grouping sets must be present in the group-by attributes.
- ROLLUP: This option is used to specify the multilevel aggregates.
- CUBE: This option is used to specify the multidimensional aggregates.
public AggregateGroupByRequest(String tableName, List<String> columnNames, long offset, long limit, String encoding, Map<String,String> options)
Parameters:
tableName - Name of an existing table or view on which the operation will be performed.
columnNames - List of one or more column names, expressions, and aggregate expressions.
offset - A non-negative integer indicating the number of initial results to skip (this can be useful for paging through the results). The default value is 0. The minimum allowed value is 0. The maximum allowed value is MAX_INT.
limit - A positive integer indicating the maximum number of results to be returned, or END_OF_SET (-9999) to indicate that the maximum number of results should be returned. The number of records returned will never exceed the server's own limit, defined by the max_get_records_size parameter in the server configuration. Use hasMoreRecords to see if more records exist in the result to be fetched, and offset & limit to request subsequent pages of results. The default value is -9999.
encoding - Specifies the encoding for returned records. The default value is BINARY. Supported values:
- BINARY: Indicates that the returned records should be binary encoded.
- JSON: Indicates that the returned records should be JSON encoded.
options - Optional parameters. The default value is an empty Map.
- COLLECTION_NAME: Name of a collection which is to contain the table specified in result_table. If the collection provided is non-existent, the collection will be automatically created. If empty, then the table will be a top-level table.
- EXPRESSION: Filter expression to apply to the table prior to computing the aggregate group by.
- HAVING: Filter expression to apply to the aggregated results.
- SORT_ORDER: String indicating how the returned values should be sorted: ascending or descending. The default value is ASCENDING. Supported values:
  - ASCENDING: Indicates that the returned values should be sorted in ascending order.
  - DESCENDING: Indicates that the returned values should be sorted in descending order.
- SORT_BY: String determining how the results are sorted. The default value is VALUE. Supported values:
  - KEY: Indicates that the returned values should be sorted by key, which corresponds to the grouping columns. If there are multiple grouping columns, the results are sorted by the first grouping column, then the second grouping column, etc.
  - VALUE: Indicates that the returned values should be sorted by value, which corresponds to the aggregates. If there are multiple aggregates, the results are sorted by the first aggregate, then the second aggregate, etc.
- RESULT_TABLE: The name of the table used to store the results. Has the same naming restrictions as tables. Column names (group-by and aggregate fields) need to be given aliases, e.g., ["FChar256 as fchar256", "sum(FDouble) as sfd"]. If present, no results are returned in the response. This option is not available if one of the grouping attributes is an unrestricted string (i.e., not charN) type.
- RESULT_TABLE_PERSIST: If true, then the result table specified in result_table will be persisted and will not expire unless a ttl is specified. If false, then the result table will be an in-memory table and will expire unless a ttl is specified otherwise. Supported values: TRUE, FALSE. The default value is FALSE.
- RESULT_TABLE_FORCE_REPLICATED: Force the result table to be replicated (ignores any sharding). Must be used in combination with the result_table option. Supported values: TRUE, FALSE. The default value is FALSE.
- RESULT_TABLE_GENERATE_PK: If true, then set a primary key for the result table. Must be used in combination with the result_table option. Supported values: TRUE, FALSE. The default value is FALSE.
- TTL: Sets the TTL of the table specified in result_table.
- CHUNK_SIZE: Indicates the number of records per chunk to be used for the result table. Must be used in combination with the result_table option.
- CREATE_INDEXES: Comma-separated list of columns on which to create indexes on the result table. Must be used in combination with the result_table option.
- VIEW_ID: ID of the view of which the result table will be a member. The default value is ''.
- MATERIALIZE_ON_GPU: No longer used. See Resource Management Concepts for information about how resources are managed, Tier Strategy Concepts for how resources are targeted for VRAM, and Tier Strategy Usage for how to specify a table's priority in VRAM. Supported values: TRUE, FALSE. The default value is FALSE.
- PIVOT: The pivot column.
- PIVOT_VALUES: The value list provided will become the column headers in the output. Should be the values from the pivot_column.
- GROUPING_SETS: Customize the grouping attribute sets used to compute the aggregates. These sets can include ROLLUP or CUBE operators. The attribute sets should be enclosed in parentheses and can include composite attributes. All attributes specified in the grouping sets must be present in the group-by attributes.
- ROLLUP: This option is used to specify the multilevel aggregates.
- CUBE: This option is used to specify the multidimensional aggregates.
public static org.apache.avro.Schema getClassSchema()
public String getTableName()
public AggregateGroupByRequest setTableName(String tableName)
Parameters:
tableName - Name of an existing table or view on which the operation will be performed.
Returns:
this to mimic the builder pattern.
public List<String> getColumnNames()
public AggregateGroupByRequest setColumnNames(List<String> columnNames)
Parameters:
columnNames - List of one or more column names, expressions, and aggregate expressions.
Returns:
this to mimic the builder pattern.
public long getOffset()
public AggregateGroupByRequest setOffset(long offset)
Parameters:
offset - A non-negative integer indicating the number of initial results to skip (this can be useful for paging through the results). The default value is 0. The minimum allowed value is 0. The maximum allowed value is MAX_INT.
Returns:
this to mimic the builder pattern.
public long getLimit()
Use hasMoreRecords to see if more records exist in the result to be fetched, and offset & limit to request subsequent pages of results. The default value is -9999.
public AggregateGroupByRequest setLimit(long limit)
Parameters:
limit - A positive integer indicating the maximum number of results to be returned, or END_OF_SET (-9999) to indicate that the maximum number of results should be returned. The number of records returned will never exceed the server's own limit, defined by the max_get_records_size parameter in the server configuration. Use hasMoreRecords to see if more records exist in the result to be fetched, and offset & limit to request subsequent pages of results. The default value is -9999.
Returns:
this to mimic the builder pattern.
public String getEncoding()
public AggregateGroupByRequest setEncoding(String encoding)
public Map<String,String> getOptions()
Returns:
Optional parameters. The default value is an empty Map.
- COLLECTION_NAME: Name of a collection which is to contain the table specified in result_table. If the collection provided is non-existent, the collection will be automatically created. If empty, then the table will be a top-level table.
- EXPRESSION: Filter expression to apply to the table prior to computing the aggregate group by.
- HAVING: Filter expression to apply to the aggregated results.
- SORT_ORDER: String indicating how the returned values should be sorted: ascending or descending. The default value is ASCENDING. Supported values:
  - ASCENDING: Indicates that the returned values should be sorted in ascending order.
  - DESCENDING: Indicates that the returned values should be sorted in descending order.
- SORT_BY: String determining how the results are sorted. The default value is VALUE. Supported values:
  - KEY: Indicates that the returned values should be sorted by key, which corresponds to the grouping columns. If there are multiple grouping columns, the results are sorted by the first grouping column, then the second grouping column, etc.
  - VALUE: Indicates that the returned values should be sorted by value, which corresponds to the aggregates. If there are multiple aggregates, the results are sorted by the first aggregate, then the second aggregate, etc.
- RESULT_TABLE: The name of the table used to store the results. Has the same naming restrictions as tables. Column names (group-by and aggregate fields) need to be given aliases, e.g., ["FChar256 as fchar256", "sum(FDouble) as sfd"]. If present, no results are returned in the response. This option is not available if one of the grouping attributes is an unrestricted string (i.e., not charN) type.
- RESULT_TABLE_PERSIST: If true, then the result table specified in result_table will be persisted and will not expire unless a ttl is specified. If false, then the result table will be an in-memory table and will expire unless a ttl is specified otherwise. Supported values: TRUE, FALSE. The default value is FALSE.
- RESULT_TABLE_FORCE_REPLICATED: Force the result table to be replicated (ignores any sharding). Must be used in combination with the result_table option. Supported values: TRUE, FALSE. The default value is FALSE.
- RESULT_TABLE_GENERATE_PK: If true, then set a primary key for the result table. Must be used in combination with the result_table option. Supported values: TRUE, FALSE. The default value is FALSE.
- TTL: Sets the TTL of the table specified in result_table.
- CHUNK_SIZE: Indicates the number of records per chunk to be used for the result table. Must be used in combination with the result_table option.
- CREATE_INDEXES: Comma-separated list of columns on which to create indexes on the result table. Must be used in combination with the result_table option.
- VIEW_ID: ID of the view of which the result table will be a member. The default value is ''.
- MATERIALIZE_ON_GPU: No longer used. See Resource Management Concepts for information about how resources are managed, Tier Strategy Concepts for how resources are targeted for VRAM, and Tier Strategy Usage for how to specify a table's priority in VRAM. Supported values: TRUE, FALSE. The default value is FALSE.
- PIVOT: The pivot column.
- PIVOT_VALUES: The value list provided will become the column headers in the output. Should be the values from the pivot_column.
- GROUPING_SETS: Customize the grouping attribute sets used to compute the aggregates. These sets can include ROLLUP or CUBE operators. The attribute sets should be enclosed in parentheses and can include composite attributes. All attributes specified in the grouping sets must be present in the group-by attributes.
- ROLLUP: This option is used to specify the multilevel aggregates.
- CUBE: This option is used to specify the multidimensional aggregates.
public AggregateGroupByRequest setOptions(Map<String,String> options)
Parameters:
options - Optional parameters. The default value is an empty Map.
- COLLECTION_NAME: Name of a collection which is to contain the table specified in result_table. If the collection provided is non-existent, the collection will be automatically created. If empty, then the table will be a top-level table.
- EXPRESSION: Filter expression to apply to the table prior to computing the aggregate group by.
- HAVING: Filter expression to apply to the aggregated results.
- SORT_ORDER: String indicating how the returned values should be sorted: ascending or descending. The default value is ASCENDING. Supported values:
  - ASCENDING: Indicates that the returned values should be sorted in ascending order.
  - DESCENDING: Indicates that the returned values should be sorted in descending order.
- SORT_BY: String determining how the results are sorted. The default value is VALUE. Supported values:
  - KEY: Indicates that the returned values should be sorted by key, which corresponds to the grouping columns. If there are multiple grouping columns, the results are sorted by the first grouping column, then the second grouping column, etc.
  - VALUE: Indicates that the returned values should be sorted by value, which corresponds to the aggregates. If there are multiple aggregates, the results are sorted by the first aggregate, then the second aggregate, etc.
- RESULT_TABLE: The name of the table used to store the results. Has the same naming restrictions as tables. Column names (group-by and aggregate fields) need to be given aliases, e.g., ["FChar256 as fchar256", "sum(FDouble) as sfd"]. If present, no results are returned in the response. This option is not available if one of the grouping attributes is an unrestricted string (i.e., not charN) type.
- RESULT_TABLE_PERSIST: If true, then the result table specified in result_table will be persisted and will not expire unless a ttl is specified. If false, then the result table will be an in-memory table and will expire unless a ttl is specified otherwise. Supported values: TRUE, FALSE. The default value is FALSE.
- RESULT_TABLE_FORCE_REPLICATED: Force the result table to be replicated (ignores any sharding). Must be used in combination with the result_table option. Supported values: TRUE, FALSE. The default value is FALSE.
- RESULT_TABLE_GENERATE_PK: If true, then set a primary key for the result table. Must be used in combination with the result_table option. Supported values: TRUE, FALSE. The default value is FALSE.
- TTL: Sets the TTL of the table specified in result_table.
- CHUNK_SIZE: Indicates the number of records per chunk to be used for the result table. Must be used in combination with the result_table option.
- CREATE_INDEXES: Comma-separated list of columns on which to create indexes on the result table. Must be used in combination with the result_table option.
- VIEW_ID: ID of the view of which the result table will be a member. The default value is ''.
- MATERIALIZE_ON_GPU: No longer used. See Resource Management Concepts for information about how resources are managed, Tier Strategy Concepts for how resources are targeted for VRAM, and Tier Strategy Usage for how to specify a table's priority in VRAM. Supported values: TRUE, FALSE. The default value is FALSE.
- PIVOT: The pivot column.
- PIVOT_VALUES: The value list provided will become the column headers in the output. Should be the values from the pivot_column.
- GROUPING_SETS: Customize the grouping attribute sets used to compute the aggregates. These sets can include ROLLUP or CUBE operators. The attribute sets should be enclosed in parentheses and can include composite attributes. All attributes specified in the grouping sets must be present in the group-by attributes.
- ROLLUP: This option is used to specify the multilevel aggregates.
- CUBE: This option is used to specify the multidimensional aggregates.
Returns:
this to mimic the builder pattern.
public org.apache.avro.Schema getSchema()
Specified by:
getSchema in interface org.apache.avro.generic.GenericContainer
public Object get(int index)
Specified by:
get in interface org.apache.avro.generic.IndexedRecord
Parameters:
index - the position of the field to get
Throws:
IndexOutOfBoundsException
public void put(int index, Object value)
Specified by:
put in interface org.apache.avro.generic.IndexedRecord
Parameters:
index - the position of the field to set
value - the value to set
Throws:
IndexOutOfBoundsException
Copyright © 2020. All rights reserved.