column_name parameter.
For performing window functions in SQL, see Window.
- If the source data set is sharded and the shard key is included in the projected column list; columns in a multi-column shard key must appear in the projected column list in the same relative order they appear in the shard key, though they do not need to be listed consecutively
- If the projection call has a shard key specified
Function Details
A window in Kinetica is constructed like so:Function Type
A window function either be of aggregate or ranking type.Aggregate
AVG(expr)
AVG(expr)
Calculates the average of the given expression
expr over the specified window frameCOUNT(expr)
COUNT(expr)
Calculates the count of the given expression
expr over the specified window frameMAX(expr)
MAX(expr)
Calculates the maximum value of the given expression
expr over the specified window frameMEAN(expr)
MEAN(expr)
Alias for
AVG(). Calculates the average of the given expression expr over the specified
window frameMIN(expr)
MIN(expr)
Calculates the minimum value of the given expression
expr over the specified window framePRODUCT(expr)
PRODUCT(expr)
Calculates the product of the given expression
expr over the specified window frameRATIO_TO_REPORT(expr)
RATIO_TO_REPORT(expr)
Calculates the ratio of the value of
expr to the sum of expr over the specified window
frame. Note that ORDER BY is not supported for this function.STDDEV(expr)
STDDEV(expr)
Alias for
STDDEV_POP(). Calculates the population standard deviation of the given
expression expr over the specified window frameSTDDEV_POP(expr)
STDDEV_POP(expr)
Calculates the population standard deviation of the given expression
expr over the specified
window frameSTDDEV_SAMP(expr)
STDDEV_SAMP(expr)
Calculates the sample standard deviation of the given expression
expr over the specified
window frameSUM(expr)
SUM(expr)
Calculates the sum of the given expression
expr over the specified window frameVAR(expr)
VAR(expr)
Alias for
VAR_POP(). Calculates the population variance of the given expression expr over
the specified window frameVAR_POP(expr)
VAR_POP(expr)
Calculates the population variance of the given expression
expr over the specified window
frameVAR_SAMP(expr)
VAR_SAMP(expr)
Calculates the sample variance of the given expression
expr over the specified window frameRanking
CUME_DIST()
CUME_DIST()
The relative position of the current row within the cumulative distribution of the selected
partition, expressed as a percentage from This function is shorthand for using the
0 (exclusive) to 1 (inclusive). The formula for
this calculation is as follows:COUNT() function in separate partition statements to
arrive at the same result:DENSE_RANK()
DENSE_RANK()
Number of the current row within the selected partition except rows with identical values
evaluate to different ranks. Starts at 1
FIRST_VALUE(<column>) [<IGNORE | RESPECT> NULLS]
FIRST_VALUE(<column>) [<IGNORE | RESPECT> NULLS]
The value found in the first row within a frame of the given expression. Optionally, add
IGNORE NULLS or RESPECT NULLS to the function syntax to ignore or respect nulls,
respectively.LAG(<column>[, <num>]) [<IGNORE | RESPECT> NULLS]
LAG(<column>[, <num>]) [<IGNORE | RESPECT> NULLS]
The value of the row before the given expression’s value. Provide an additional comma-separated
value to specify which row to select, e.g.,
LAG(vendor_id, 3) would list the value in the
vendor_id column from three rows prior to the current row. Optionally, add IGNORE NULLS
or RESPECT NULLS to the function syntax to ignore or respect nulls respectively.LAST_VALUE(<column>) [<IGNORE | RESPECT> NULLS]
LAST_VALUE(<column>) [<IGNORE | RESPECT> NULLS]
The value found in the last row within a frame of the given expression. Optionally, add
IGNORE NULLS or RESPECT NULLS to the function syntax to ignore or respect
nulls respectively.LEAD(<column>[, <num>]) [<IGNORE | RESPECT> NULLS]
LEAD(<column>[, <num>]) [<IGNORE | RESPECT> NULLS]
The value of the row after the given expression’s value. Provide an additional comma-separated
value to specify which row to select, e.g.,
LEAD(vendor_id, 3) would list the value in the
vendor_id column from three rows after the current row. Optionally, add IGNORE NULLS
or RESPECT NULLS to the function syntax to ignore or respect nulls respectively.NTILE(<num of groups>)
NTILE(<num of groups>)
The group number of the row after partitioning the rows into
num of groups groups. For
example, NTILE(4) will partition data by quartiles and return the associated group number,
1 to 4.PERCENT_RANK()
PERCENT_RANK()
The rank of the current row within the selected partition, expressed as a percentage from This function is shorthand for using the
0
to 1, inclusive. The formula for this calculation is as follows:RANK() & COUNT() functions in separate partition
statements to arrive at the same result:RANK()
RANK()
Number of the current row within the selected partition. However, rows with identical values
evaluate to the same rank. Starts at 1
ROW_NUMBER()
ROW_NUMBER()
Number of the current row within the selected partition. Starts at 1
Window Specification
A window specification is constructed like so:-
PARTITION BYclauses can contain several comma-separated keys that define the partitions, e.g.,PARTITION BY vendor_id, payment_type. -
Omitting the
PARTITION BYclause will create a window spanning the entire data set, effectively windowing without a partition.This will force all windowed data to a single node for processing and could consume a great deal of memory on that node. -
The ordering expression list, while optional, is useful in sorting the
individual partitions. For
RANGEframes, theORDER BYcan have a single column, which the frame will use for bounding. ForROWSframes, any number of columns can be included in theORDER BY; e.g.,ORDER BY vendor_id, payment_type. Sort order can be specified usingASC(ascending, the default) andDESC(descending) keywords. The default null ordering for an ascending sort isNULLS FIRST; the default for a descending sort isNULLS LAST. -
All window functions except for
FIRST_VALUE()&LAST_VALUE()require anORDER BYclause.
Frame Clauses
A frame clause within a window specification is constructed like so:RANGE) or number of rows
(ROWS):
Range Frames
RANGE specifies that containment within a frame should be determined by the
value in the column specified in the ORDER BY clause. This has several
implications:
- Peer rows (rows containing the same value in the
ORDER BYcolumn) are always included in a range frame; for any given row, an aggregate function applied to its frame will have the same result for all of its peer rows. - The bounds defined in the frame start & end are based on the column’s values;
e.g.,
BETWEEN 100 PRECEDING AND CURRENT ROWmeans a frame will contain records withORDER BYcolumn values greater than or equal to the current record’s column value minus 100 and less than or equal to the current record’s column value. - The
ORDER BYclause can specify only one column, as this column will be used to bound the frame by its values.
Rows Frames
ROWS specifies that containment within a frame should be determined by the
ordering of the rows, as specified in the ORDER BY clause. This also has
several implications:
- Peer rows are not considered when bounding a rows frame; only the relative ordering of the rows is considered.
- The bounds defined in the frame start & end are based on the row’s relative
position within the ordered result set; e.g.,
BETWEEN 100 PRECEDING AND CURRENT ROWmeans a frame will contain 100 records before the current row as well as the current row. - The
ORDER BYclause can contain multiple comma-separated columns.
Frame Bounds
A frame has a start and end specification:-
Frame start keywords:
UNBOUNDED PRECEDING: the first row of the partition<number> PRECEDING: either n rows before the current row (for rows-based frames), or n values less than the current row’s value (for range-based frames)CURRENT ROW: either the current row (for rows-based frames), or the current row and its peer rows (for range-based frames)<number> FOLLOWING: either n rows after the current row (for rows-based frames), or n values greater than the current row’s value (for range-based frames)
-
Frame end keywords:
UNBOUNDED FOLLOWING: the last row of the partition<number> FOLLOWING: either n rows after the current row (for rows-based frames), or n values greater than the current row’s value (for range-based frames)CURRENT ROW: either the current row (for rows-based frames), or the current row and its peer rows (for range-based frames)<number> PRECEDING: either n rows before the current row (for rows-based frames), or n values less than the current row’s value (for range-based frames)
The frame start must begin before the end of the frame end.
10 PRECEDING will, for the 5th record in the data set, be narrowed to
consider just the first four records as the beginning of the frame in any
calculations.
If you use the
<number> FOLLOWING or
<number> PRECEDING keywords as both the frame start and end
keywords, the first <number> used must be less than or equal
to the second <number>Examples
The following would calculate a rolling sum of total amounts collected per vendor for the first two hours after midnight on January 1st, 2015, as well as the number of trips that occurred within 5 minutes of each trip:Limitations
- Empty window specifications are currently not supported.
-
Any ranking function other than
FIRST_VALUE()&LAST VALUEdoes not accept a frame clause. -
Any ranking function other than
FIRST_VALUE()&LAST VALUErequires anORDER BYclause. - Window functions executed via /create/projection will have the corresponding projection limitations.