# /aggregate/groupby

```
URL: http://<db.host>:<db.port>/aggregate/groupby
```

Calculates unique combinations (groups) of values for the given columns in a
given table or view and computes aggregates on each unique combination. This is
somewhat analogous to an SQL-style SELECT...GROUP BY.

For aggregation details and examples, see
[Aggregation](../../concepts/aggregation/).  For limitations, see
[Aggregation Limitations](../../concepts/aggregation/#limitations).

Any column(s) can be grouped on, and all column types except
unrestricted-length strings may be used for computing applicable aggregates.

The results can be paged via input parameter *offset* and input parameter
*limit*. For example, to get the 10 groups with the largest counts, the
inputs would be: limit=10, options=\{"sort\_order":"descending",
"sort\_by":"value"}.
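
As a minimal sketch of the example above, the following Python builds a JSON request body for the top 10 groups. The table name `example.orders` and the helper `top_groups_request` are hypothetical; the field names and option values come from this page.

```python
import json


def top_groups_request(table_name, column_names, n=10):
    """Build a request body asking for the n groups with the largest aggregate values."""
    return {
        "table_name": table_name,
        "column_names": column_names,
        "offset": 0,
        "limit": n,
        "encoding": "json",
        # Option values are strings, since options is a map of string to strings.
        "options": {
            "sort_order": "descending",
            "sort_by": "value",
        },
    }


# "example.orders" is a made-up table name used only for illustration.
body = top_groups_request("example.orders", ["x", "count(*)"])
print(json.dumps(body, indent=2))
```

Note that `sort_order` and `sort_by` are marked deprecated in the options list in favor of `order_by`; they are used here only because the example above uses them.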

Input parameter *options* can be used to customize the behavior of this call,
e.g., filtering or sorting the results.

To group by columns 'x' and 'y' and compute the number of objects within each
group, use:  column\_names=\['x','y','count(\*)'].

To also compute the sum of 'z' over each group, use:
column\_names=\['x','y','count(\*)','sum(z)'].
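
The following sketch shows how such a request might be prepared with the Python standard library. The host, port, and table name are placeholders, and sending the body as JSON via POST with a `Content-Type: application/json` header is an assumption about the deployment; the request is constructed here but not sent.

```python
import json
import urllib.request

# Placeholder endpoint: substitute the real <db.host> and <db.port>.
URL = "http://localhost:9191/aggregate/groupby"

payload = {
    "table_name": "example.points",  # hypothetical table name
    # Group by x and y; count the records and sum z within each group.
    "column_names": ["x", "y", "count(*)", "sum(z)"],
    "offset": 0,
    "limit": -9999,  # END_OF_SET: return as many results as the server allows
    "encoding": "json",
    "options": {},
}

request = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Calling urllib.request.urlopen(request) would send it; that is omitted here.
print(request.get_method(), request.full_url)
```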

Available
[aggregation functions](../../concepts/expressions/#aggregate-expressions)
are: count(\*), sum, min, max, avg, mean, stddev, stddev\_pop, stddev\_samp, var,
var\_pop, var\_samp, arg\_min, arg\_max and count\_distinct.

Available grouping functions are [Rollup](../../concepts/rollup/),
[Cube](../../concepts/cube/), and
[Grouping Sets](../../concepts/grouping_sets/).

This service also provides support for [Pivot](../../concepts/pivot/)
operations.

Filtering on aggregates is supported via expressions using
[aggregation functions](../../concepts/expressions/#aggregate-expressions)
supplied to *having*.
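
A small sketch of combining a pre-aggregation filter (*expression*) with a post-aggregation filter (*having*) follows. The helper `build_groupby`, the table name, and the two filter strings are illustrative assumptions; the exact expression syntax accepted by the server is described in the linked expressions documentation.

```python
import json


def build_groupby(table_name, column_names, where=None, having=None):
    """Build a request body, adding filter options only when they are given."""
    options = {}
    if where is not None:
        # Applied to the source records before grouping.
        options["expression"] = where
    if having is not None:
        # Applied to the aggregated groups.
        options["having"] = having
    return {
        "table_name": table_name,
        "column_names": column_names,
        "offset": 0,
        "limit": -9999,
        "encoding": "json",
        "options": options,
    }


# Hypothetical table and filters, for illustration only.
body = build_groupby(
    "example.points",
    ["x", "count(*)", "sum(z)"],
    where="y > 0",
    having="sum(z) > 100",
)
print(json.dumps(body["options"]))
```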

The response is returned as a dynamic schema. For details see:
[dynamic schemas documentation](../../api/concepts/#dynamic-schemas).

If a *result\_table* name is specified in the input parameter *options*, the
results are stored in a new table with that name--no results are returned in
the response.  Both the table name and resulting column names must adhere to
[standard naming conventions](../../concepts/tables/#table);
column/aggregation expressions will need to be aliased.  If the source table's
[shard key](../../concepts/tables/#shard-keys) is used as the grouping
column(s) and all result records are selected (input parameter *offset* is 0
and input parameter *limit* is -9999), the result table will be sharded; in all
other cases, it will be replicated.  Sorting will function properly only if the
result table is replicated or if there is only one processing node, and should
not be relied upon in other cases.  The *result\_table* option is not available
when any of the values of input parameter *column\_names* is an
unrestricted-length string column.
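
As a sketch of the aliasing requirement, the following builds a request that stores its results in a table and checks on the client side that every entry has an alias. The helper `require_aliases`, its regular expression, and the table names are hypothetical; the server performs its own validation, which this check does not replace.

```python
import json
import re

# Matches a trailing "as <alias>" (case-insensitive) at the end of an entry.
_ALIAS = re.compile(r"\s+as\s+[A-Za-z_]\w*\s*$", re.IGNORECASE)


def require_aliases(column_names):
    """Raise ValueError if any column or aggregate expression lacks an alias."""
    missing = [c for c in column_names if not _ALIAS.search(c)]
    if missing:
        raise ValueError(f"entries without an alias: {missing}")
    return column_names


columns = require_aliases(["x as x_key", "y as y_key", "count(*) as n", "sum(z) as z_total"])

body = {
    "table_name": "example.points",  # hypothetical source table
    "column_names": columns,
    "offset": 0,
    "limit": -9999,
    "encoding": "json",
    "options": {
        "result_table": "example.xy_summary",  # hypothetical result table
        "result_table_persist": "true",
    },
}
print(json.dumps(body, indent=2))
```

With `result_table` set, the groups are written to that table and are not returned in the response.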

## Input Parameter Description

<ParamField body="table_name" type="string">
  Name of an existing table or view on which the operation will be performed, in \[schema\_name.]table\_name format, using standard [name resolution rules](../../concepts/tables/#table-name-resolution).
</ParamField>

<ParamField body="column_names" type="array of strings">
  List of one or more column names, expressions, and aggregate expressions.
</ParamField>

<ParamField body="offset" type="long">
  A non-negative integer indicating the number of initial results to skip (this can be useful for paging through the results).

  The default value is 0. The minimum allowed value is 0. The maximum allowed value is MAX\_INT.
</ParamField>

<ParamField body="limit" type="long">
  A positive integer indicating the maximum number of results to be returned, or END\_OF\_SET (-9999) to indicate that the maximum number of results allowed by the server should be returned.  The number of records returned will never exceed the server's own limit, defined by the [max\_get\_records\_size](../../config/#config-main-general) parameter in the server configuration. Use output parameter *has\_more\_records* to see if more records exist in the result to be fetched, and input parameter *offset* and input parameter *limit* to request subsequent pages of results.

  The default value is -9999.
</ParamField>

<ParamField body="encoding" type="string">
  Specifies the encoding for returned records.

  The default value is `binary`.

  * **binary**: Indicates that the returned records should be binary encoded.
  * **json**: Indicates that the returned records should be JSON-encoded.
</ParamField>

<ParamField body="options" type="map of string to strings">
  Optional parameters.

  The default value is an empty map ( \{} ).

  <Expandable title="options">
    <ParamField body="create_temp_table">
      If *true*, a unique temporary table name will be generated in the sys\_temp schema and used in place of *result\_table*. If *result\_table\_persist* is *false* (or unspecified), then this is always allowed even if the caller does not have permission to create tables. The generated name is returned in *qualified\_result\_table\_name*.

      The default value is `false`.

      The supported values are:

      * true
      * false
    </ParamField>

    <ParamField body="collection_name">
      \[DEPRECATED--please specify the containing schema as part of *result\_table* and use [/create/schema](/content/api/rest/create_schema_rest) to create the schema if non-existent]  Name of a schema which is to contain the table specified in *result\_table*. If the schema provided is non-existent, it will be automatically created.
    </ParamField>

    <ParamField body="expression">
      Filter expression to apply to the table prior to computing the aggregate group by.
    </ParamField>

    <ParamField body="pipelined_expression_evaluation">
      Evaluate the group-by during the last JoinedSet filter plan step.

      The default value is `false`.

      The supported values are:

      * true
      * false
    </ParamField>

    <ParamField body="having">
      Filter expression to apply to the aggregated results.
    </ParamField>

    <ParamField body="sort_order">
      \[DEPRECATED--use order\_by instead] String indicating how the returned values should be sorted - ascending or descending.

      The default value is `ascending`.

      * **ascending**: Indicates that the returned values should be sorted in ascending order.
      * **descending**: Indicates that the returned values should be sorted in descending order.
    </ParamField>

    <ParamField body="sort_by">
      \[DEPRECATED--use order\_by instead] String determining how the results are sorted.

      The default value is `value`.

      * **key**: Indicates that the returned values should be sorted by key, which corresponds to the grouping columns. If you have multiple grouping columns (and are sorting by key), it will first sort the first grouping column, then the second grouping column, etc.
      * **value**: Indicates that the returned values should be sorted by value, which corresponds to the aggregates. If you have multiple aggregates (and are sorting by value), it will first sort by the first aggregate, then the second aggregate, etc.
    </ParamField>

    <ParamField body="order_by">
      Comma-separated list of the columns to be sorted by as well as the sort direction, e.g., 'timestamp asc, x desc'.

      The default value is ''.
    </ParamField>

    <ParamField body="strategy_definition">
      The [tier strategy](../../rm/concepts/#tier-strategies) for the table and its columns.
    </ParamField>

    <ParamField body="compression_codec">
      The default [compression codec](../../concepts/column_compression/) for the result table's columns.
    </ParamField>

    <ParamField body="result_table">
      The name of a table used to store the results, in \[schema\_name.]table\_name format, using standard [name resolution rules](../../concepts/tables/#table-name-resolution) and meeting [table naming criteria](../../concepts/tables/#table-naming-criteria). Column names (group-by and aggregate fields) need to be given aliases, e.g., \["FChar256 as fchar256", "sum(FDouble) as sfd"].  If present, no results are returned in the response.  This option is not available if one of the grouping attributes is an unrestricted string (i.e., not charN) type.
    </ParamField>

    <ParamField body="result_table_persist">
      If *true*, then the result table specified in *result\_table* will be persisted and will not expire unless a *ttl* is specified.   If *false*, then the result table will be an in-memory table and will expire unless a *ttl* is specified otherwise.

      The default value is `false`.

      The supported values are:

      * true
      * false
    </ParamField>

    <ParamField body="result_table_force_replicated">
      Force the result table to be replicated (ignores any sharding). Must be used in combination with the *result\_table* option.

      The default value is `false`.

      The supported values are:

      * true
      * false
    </ParamField>

    <ParamField body="result_table_generate_pk">
      If *true* then set a primary key for the result table. Must be used in combination with the *result\_table* option.

      The default value is `false`.

      The supported values are:

      * true
      * false
    </ParamField>

    <ParamField body="result_table_generate_soft_pk">
      If *true* then set a soft primary key for the result table. Must be used in combination with the *result\_table* option.

      The default value is `false`.

      The supported values are:

      * true
      * false
    </ParamField>

    <ParamField body="ttl">
      Sets the [TTL](../../concepts/ttl/) of the table specified in *result\_table*.
    </ParamField>

    <ParamField body="chunk_size">
      Indicates the number of records per chunk to be used for the result table. Must be used in combination with the *result\_table* option.
    </ParamField>

    <ParamField body="chunk_column_max_memory">
      Indicates the target maximum data size for each column in a chunk to be used for the result table. Must be used in combination with the *result\_table* option.
    </ParamField>

    <ParamField body="chunk_max_memory">
      Indicates the target maximum data size for all columns in a chunk to be used for the result table. Must be used in combination with the *result\_table* option.
    </ParamField>

    <ParamField body="create_indexes">
      Comma-separated list of columns on which to create indexes on the result table. Must be used in combination with the *result\_table* option.
    </ParamField>

    <ParamField body="partition_type">
      [Partitioning](../../concepts/tables/#partitioning) scheme to use for the result table.

      * **RANGE**: Use [range partitioning](../../concepts/tables/#partitioning-by-range).
      * **INTERVAL**: Use [interval partitioning](../../concepts/tables/#partitioning-by-interval).
      * **LIST**: Use [list partitioning](../../concepts/tables/#partitioning-by-list).
      * **HASH**: Use [hash partitioning](../../concepts/tables/#partitioning-by-hash).
      * **SERIES**: Use [series partitioning](../../concepts/tables/#partitioning-by-series).
    </ParamField>

    <ParamField body="partition_keys">
      Comma-separated list of partition keys, which are the columns or column expressions by which records will be assigned to partitions defined by *partition\_definitions*.
    </ParamField>

    <ParamField body="partition_definitions">
      Comma-separated list of partition definitions, whose format depends on the choice of *partition\_type*.  See [range partitioning](../../concepts/tables/#partitioning-by-range), [interval partitioning](../../concepts/tables/#partitioning-by-interval), [list partitioning](../../concepts/tables/#partitioning-by-list), [hash partitioning](../../concepts/tables/#partitioning-by-hash), or [series partitioning](../../concepts/tables/#partitioning-by-series) for example formats.
    </ParamField>

    <ParamField body="is_automatic_partition">
      If *true*, a new partition will be created for values which don't fall into an existing partition.  Currently only supported for [list partitions](../../concepts/tables/#partitioning-by-list).

      The default value is `false`.

      The supported values are:

      * true
      * false
    </ParamField>

    <ParamField body="view_id">
      ID of view of which the result table will be a member.

      The default value is ''.
    </ParamField>

    <ParamField body="pivot">
      Pivot column.
    </ParamField>

    <ParamField body="pivot_values">
      Comma-separated list of the values in the *pivot* column.  The list provided will become the column header prefixes in the output.
    </ParamField>

    <ParamField body="grouping_sets">
      Customize the grouping attribute sets to compute the aggregates. These sets can include ROLLUP or CUBE operators. The attribute sets should be enclosed in parentheses and can include composite attributes. All attributes specified in the grouping sets must be present in the group-by attributes.
    </ParamField>

    <ParamField body="rollup">
      This option is used to specify the multilevel aggregates.
    </ParamField>

    <ParamField body="cube">
      This option is used to specify the multidimensional aggregates.
    </ParamField>

    <ParamField body="shard_key">
      Comma-separated list of the columns to be sharded on; e.g. 'column1, column2'.  The columns specified must be present in input parameter *column\_names*.  If any alias is given for any column name, the alias must be used, rather than the original column name.

      The default value is ''.
    </ParamField>
  </Expandable>
</ParamField>
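
The paging behavior described for *offset*, *limit*, and *has\_more\_records* can be sketched as a loop. The `send` callable is a hypothetical stand-in for whatever function posts a request body and returns the decoded endpoint response; the stub below, including its `rows` key, exists only to make the sketch runnable and does not reflect the real response layout.

```python
def iter_pages(send, request, page_size):
    """Yield successive responses until has_more_records is false.

    Assumes page_size does not exceed the server's max_get_records_size,
    so that every page except the last contains exactly page_size results.
    """
    offset = 0
    while True:
        page_request = dict(request, offset=offset, limit=page_size)
        response = send(page_request)
        yield response
        if not response.get("has_more_records"):
            break
        offset += page_size


# Stub data and transport, for illustration only.
ALL_GROUPS = list(range(25))


def fake_send(req):
    start, stop = req["offset"], req["offset"] + req["limit"]
    return {
        "rows": ALL_GROUPS[start:stop],  # stub-only field
        "has_more_records": stop < len(ALL_GROUPS),
    }


base = {
    "table_name": "example.points",  # hypothetical table name
    "column_names": ["x", "count(*)"],
    "encoding": "json",
    "options": {},
}
pages = list(iter_pages(fake_send, base, page_size=10))
print([len(p["rows"]) for p in pages])
```

Because the note on the result table states that sorting is reliable only in certain configurations, a stable ordering across pages is something to verify for a given deployment rather than assume.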

## Output Parameter Description

The Kinetica server embeds the endpoint response inside a standard response structure which contains status information and the actual response to the query.  Here is a description of the various fields of the wrapper:

<ResponseField name="status" type="String">
  'OK' or 'ERROR'
</ResponseField>

<ResponseField name="message" type="String">
  Empty on success; otherwise, an error message
</ResponseField>

<ResponseField name="data_type" type="String">
  'aggregate\_group\_by\_response' or 'none' in case of an error
</ResponseField>

<ResponseField name="data" type="String">
  Empty string
</ResponseField>

<ResponseField name="data_str" type="JSON or String">
  This embedded JSON represents the result of the /aggregate/groupby endpoint:

  <Expandable title="data_str">
    <ResponseField name="response_schema_str" type="string">
      Avro schema of output parameter *binary\_encoded\_response* or output parameter *json\_encoded\_response*.
    </ResponseField>

    <ResponseField name="binary_encoded_response" type="bytes">
      Avro binary encoded response.
    </ResponseField>

    <ResponseField name="json_encoded_response" type="string">
      Avro JSON encoded response.
    </ResponseField>

    <ResponseField name="total_number_of_records" type="long">
      Total/Filtered number of records.  This may be an over-estimate if a limit was applied and there are additional records (i.e., when output parameter *has\_more\_records* is true).
    </ResponseField>

    <ResponseField name="has_more_records" type="boolean">
      True if more records exist than were returned, i.e., only a partial set was returned.
    </ResponseField>

    <ResponseField name="info" type="map of string to strings">
      Additional information.

      The default value is an empty map ( \{} ).

      <Expandable title="info">
        <ResponseField name="qualified_result_table_name">
          The fully qualified name of the table (i.e. including the schema) used to store the results.
        </ResponseField>
      </Expandable>
    </ResponseField>
  </Expandable>

  Empty string in case of an error.
</ResponseField>
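
A minimal sketch of unwrapping this structure on the client follows. The helper `unwrap` is hypothetical, and the sample wrapper is hand-written for illustration rather than captured from a server; the contents of `json_encoded_response` are left opaque here because their layout is defined by the dynamic schemas documentation.

```python
import json


def unwrap(raw_text):
    """Return the embedded endpoint response, or raise if the status is not OK."""
    wrapper = json.loads(raw_text)
    if wrapper.get("status") != "OK":
        raise RuntimeError(wrapper.get("message") or "request failed")
    data = wrapper["data_str"]
    # data_str is documented as "JSON or String", so handle both forms.
    if isinstance(data, str):
        data = json.loads(data)
    return data


# Hand-written sample wrapper; field values are illustrative only.
sample = json.dumps({
    "status": "OK",
    "message": "",
    "data_type": "aggregate_group_by_response",
    "data": "",
    "data_str": json.dumps({
        "response_schema_str": "",
        "json_encoded_response": "{}",
        "total_number_of_records": 2,
        "has_more_records": False,
        "info": {},
    }),
})

result = unwrap(sample)
print(result["total_number_of_records"], result["has_more_records"])
```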
