The grouping sets operation calculates aggregates over any user-specified dimensions. It offers much more flexibility and customizability over the Cube and Rollup functions, while allowing both as possible dimensions over which aggregation can be performed.
For example, given grouping sets of B
, (D,E)
, & (ROLLUP(A,C))
,
it computes the requested aggregates for the following combinations of columns:
{ B }
- unique B
values{ DE}
- unique D E
pairs{A C }
- unique A C
pairs{A }
- unique A
values{ }
- all valuesThe grouping sets operation is an aggregate function that can be invoked
natively in the options
map of the /aggregate/groupby endpoint.
It is also available via the SQL GROUPING SETS function.
Be mindful that grouping sets is an expensive operation.
If using the grouping sets operation with a column that contains
null values, it may not be apparent which aggregations pertain to
the column's null values and which pertain to all the column's values, as both
will be represented by null
in the result set. To avoid this confusion,
it's recommended to use the GROUPING()
function to make a distinction between null values in the data and null
grouping values generated by the grouping sets operation. There will be an
example of this below.
The following example uses the REST /aggregate/groupby endpoint to perform a grouping sets operation. For SQL examples, see the GROUPING SETS section in SQL Support.
The following example uses the Python API to perform a grouping sets operation. For SQL examples, see the GROUPING SETS section in SQL Support.
The following request will aggregate the average opening stock price for these groups:
h_db.aggregate_group_by(
table_name = "stocks",
column_names = [
"Sector",
"Symbol",
"IF(Grouping(Sector) = 0, Sector, '<ALL SECTORS>') as SectorGroup",
"IF(Grouping(Symbol) = 0, Symbol, '<ALL SYMBOLS>') as SymbolGroup",
"AVG(Open) as AvgOpen"
],
options = {"grouping_sets": "((Sector), (Symbol), ())"}
)
column_names
parameter