Queries (SQL)

SELECT Statement Syntax
1
2
3
4
5
6
7
8
9
SELECT [DISTINCT | TOP <n>]
    <column expression> [EXCLUDE(<column exclusion list>)],...
FROM [<schema name>.]<table name> [<table alias>]
    [<join type> JOIN <join table name> [<join alias>] ON <join expr>],...
[WHERE <filtering expression list>]
[GROUP BY <grouping expression list>]
[HAVING <group filtering expression list>]
[ORDER BY <ordering expression list>]
[LIMIT [<offset>, ]<num rows>]

Note

In any column expression, a wildcard like * can be used to specify all columns, while <table name/alias>.* can be used to specify all columns from the given table.
The EXCLUDE function can be used to list columns to exclude from the wildcard expression it follows; for instance, the following can be used to select all columns from the employee table except for the ssn & date_of_birth columns:
EXCLUDE Example
1 2
SELECT e.* EXCLUDE(ssn, date_of_birth) FROM employee e
Table & column names can be double-quoted to use reserved words, e.g., "PERCENT"; or to use numbers or special characters in column names, e.g., "1234" or "key:value".
TOP <n> returns the first n records (up to 20000 records by default).
The grouping expression list may contain column names, aliases, expressions, or positions (e.g., GROUP BY 2 to aggregate on the 2nd column in the SELECT list).
The having expression list may contain grouping expressions or any grouping expression aliases defined in the SELECT list.
The ordering expression list may contain column names, expressions, or column positions (e.g., GROUP BY 2 to aggregate on the 2nd column in the SELECT list). The default ordering is ASC. The default null ordering is NULLS FIRST when using ascending order and NULLS LAST when using descending order. The general format for each comma-separated ordering expression in the list is:
ORDER BY Expression Syntax
1
<column name/alias/expression/position> [ASC | DESC] [NULLS FIRST | NULLS LAST]
LIMIT applies paging to the result set, starting at the 0-based offset (if specified) and returning num rows records.

SELECT Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
SELECT
    e.last_name || ', ' || e.first_name AS "Employee_Name",
    m.last_name || ', ' || m.first_name AS "Manager_Name"
FROM
    example.employee e
    LEFT JOIN example.employee m ON e.manager_id = m.id
WHERE
    e.dept_id IN (1, 2, 3)
ORDER BY
    m.id ASC NULLS FIRST,
    e.hire_date

Tableless Query

A query without a FROM clause can be used to return a single row of data containing a constant or expression.

For example, to select the current day of the week:

Tableless Query Example
1
SELECT DAYNAME(NOW()) AS "Today"

Note

A tableless query will create a result set backed by a replicated table, by default.

Join

The supported join types are:

INNER - matching rows between two tables
LEFT [OUTER] - matching rows between two tables, and rows in the left-hand table with no matching rows in the right-hand table
RIGHT [OUTER] - matching rows between two tables, and rows in the right-hand table with no matching rows in the left-hand table
FULL OUTER matching rows between two tables, and rows in both tables with no matching rows in the other
CROSS - all rows in one table matched against all rows in the other

There are two execution schemes that are used to process joins, depending on the distribution of the joined tables:

Local - highly performant, but native join criteria must be met
Distributed - highly flexible, as native join restrictions are lifted, but less performant due to interprocessor communication overhead and requires more memory & disk space to process

Important

Though the data distribution restrictions on native joins do not exist for joins made via SQL, following the join guidelines on sharding will result in much more performant queries.

Kinetica supports both JOIN ... ON and WHERE clause syntax for inner joins; all outer join types (LEFT, RIGHT, & FULL OUTER) require JOIN ... ON syntax.

For example, to list the name of each employee and the name of the employee's manager, using the WHERE clause to specify the join condition:

JOIN via WHERE Clause Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
SELECT
    e.last_name || ', ' || e.first_name AS "Employee_Name",
    m.last_name || ', ' || m.first_name AS "Manager_Name"
FROM
    example.employee e,
    example.employee m
WHERE
    e.manager_id = m.id
ORDER BY
    e.last_name,
    e.first_name

To list the name of each employee and the associated manager, even for employees that don't have a manager, using the JOIN ... ON syntax to specify the join condition:

JOIN via FROM Clause Example
1
2
3
4
5
6
7
8
9
SELECT
    e.last_name || ', ' || e.first_name AS "Employee_Name",
    m.last_name || ', ' || m.first_name AS "Manager_Name"
FROM
    example.employee e
    LEFT JOIN example.employee m ON e.manager_id = m.id
ORDER BY
    e.last_name,
    e.first_name

ASOF

Kinetica supports the notion of an inexact match join via the ASOF join function. This feature allows each left-side table record to be matched to a single right-side table record whose join column value is the smallest or largest value within a range relative to the left-side join column value. In the case where multiple right-side table records have the same smallest or largest value for a given left-side table record, only one of the right-side table records will be chosen (non-deterministically) and returned as part of the join.

ASOF Syntax
1
ASOF(<left_column>, <right_column>, <rel_range_begin>, <rel_range_end>, <MIN|MAX>)

The five parameters are:

left_column - name of the column to join on from the left-side table
right_column - name of the column to join on from the right-side table
rel_range_begin - constant value defining the position, relative to each left-side column value, of the beginning of the range in which to match right-side column values; use a negative constant to begin the range before the left-side column value, or a positive one to begin after it
rel_range_end - constant value defining the position, relative to each left-side column value, of the end of the range in which to match right-side column values; use a negative constant to end the range before the left-side column value, or a positive one to end after it
MIN|MAX - use MIN to return the right-side matched record with the smallest join column value; use MAX to return the right-side matched record with the greatest join column value

Effectively, each matched right-side column value must be:

>= <left-side column value> + rel_range_begin
<= <left-side column value> + rel_range_end

Important

ASOF joins are unsupported on some materialized views. In these situations, the materialized view can be recreated with the KI_HINT_PROJECT_MATERIALIZED_VIEW hint to allow it to be used in an ASOF join.

Examples

The following ASOF call might be used to list, for each flight arrival time, the soonest flight departure time that occurs between half an hour and an hour and a half after the arrival; effectively, the time-matching portion of a connecting flight query:

ASOF Time-Based Expression Example
1
ASOF(inbound.eta, outbound.etd, INTERVAL '30' MINUTE, INTERVAL '90' MINUTE, MIN)

This ASOF call returns right-side locations that are nearest eastward to each left-side location, for locations within 5 degrees of the left-side:

ASOF Distance-Based Expression Example
1
ASOF(b.x, n.x, .00001, 5, MIN)

For example, to match a set of stock trades to the opening prices for those stocks (if an opening price record exists within 24 hours prior to the trade), and to include trades for which there is no opening stock price record:

ASOF JOIN Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
SELECT
    t.id,
    t.dt AS execution_dt,
    q.open_dt AS quote_dt,
    t.price AS execution_price,
    q.open_price
FROM
    example.trades t
    LEFT JOIN example.quotes q ON
        t.ticker = q.symbol AND
        ASOF(t.dt, q.open_dt, INTERVAL '-1' DAY, INTERVAL '0' DAY, MAX)

While the ASOF join function can only be used as part of a join, it can effectively be made into a filter condition by sub-selecting the filter criteria in the FROM clause and joining on that criteria.

For instance, to look up the stock price for a given company as of a given date:

ASOF JOIN with Filter Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
SELECT
    t.ticker,
    t.asof_dt,
    q.open_dt,
    q.open_price
FROM
    (SELECT 'EBAY' AS ticker, DATETIME('2006-12-15 12:34:56') AS asof_dt) t
    LEFT JOIN example.quotes q ON
        t.ticker = q.symbol AND
        ASOF(t.asof_dt, q.open_dt, INTERVAL '-1' DAY, INTERVAL '0' DAY, MAX)

Aggregation

The GROUP BY clause can be used to segment data into groups and apply aggregate functions over the values within each group. Aggregation functions applied to data without a GROUP BY clause will be applied over the entire result set.

Note

GROUP BY can operate on columns, column expressions, column aliases, or the position of a member of the SELECT clause (where 1 is the first element).

For example, to find the average cab fare from the taxi data set:

Aggregation without GROUP BY Example
1
2
SELECT ROUND(AVG(total_amount),2) AS "Average_Fare"
FROM demo.nyctaxi

To find the minimum, maximum, & average trip distances, as well as the average passenger count for each vendor per year from the taxi data set (weeding out data with errant trip distances):

Aggregate with GROUP BY Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
SELECT
    vendor_id AS Vendor_ID,
    YEAR(pickup_datetime) AS Year,
    MAX(trip_distance) AS Max_Trip,
    MIN(trip_distance) AS Min_Trip,
    ROUND(AVG(trip_distance),2) AS Avg_Trip,
    INT(AVG(passenger_count)) AS Avg_Passenger_Count
FROM demo.nyctaxi
WHERE
    trip_distance > 0 AND
    trip_distance < 100
GROUP BY vendor_id, 2
ORDER BY Vendor_ID, Year

Grouping

The GROUP BY clause can also be used to apply the following grouping functions over the values within each group:

With each of these, the GROUPING() aggregate function can be used to distinguish aggregated null values in the data from null values generated by the ROLLUP, CUBE, or GROUPING SETS grouping function.

For instance, the following CASE will turn the aggregated null values in the Sector column into an <UNKNOWN SECTOR> group and the null value generated by the grouping function into an <ALL SECTORS> group:

GROUPING Expression Example
1
2
3
4
CASE
    WHEN (GROUPING(Sector) = 1) THEN '<ALL SECTORS>'
    ELSE NVL(Sector, '<UNKNOWN SECTOR>')
END AS Sector_Group,

ROLLUP

The ROLLUP(expr list) function calculates n + 1 aggregates for n number of columns in expr list.

For example, the following query will aggregate the average opening stock price for these groups:

Each market sector & stock symbol pair
Each market sector
All sectors and symbols

ROLLUP Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
SELECT
    CASE
        WHEN (GROUPING(Sector) = 1) THEN '<ALL SECTORS>'
        ELSE NVL(Sector, '<UNKNOWN SECTOR>')
    END AS Sector_Group,
    CASE
        WHEN (GROUPING(Symbol) = 1) THEN '<ALL SYMBOLS>'
        ELSE NVL(Symbol, '<UNKNOWN SYMBOL>')
    END AS Symbol_Group,
    AVG("Open") AS AvgOpen
FROM demo.Stocks
GROUP BY ROLLUP(Sector, Symbol)
ORDER BY Sector_Group, Symbol_Group

CUBE

The CUBE(expr list) function calculates 2ⁿ aggregates for n number of columns in expr list.

For example, the following query will aggregate the average opening stock price for these groups:

Each market sector & stock symbol pair
Each market sector
Each stock symbol
All sectors and symbols

CUBE Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
SELECT
    CASE
        WHEN (GROUPING(Sector) = 1) THEN '<ALL SECTORS>'
        ELSE NVL(Sector, '<UNKNOWN SECTOR>')
    END AS Sector_Group,
    CASE
        WHEN (GROUPING(Symbol) = 1) THEN '<ALL SYMBOLS>'
        ELSE NVL(Symbol, '<UNKNOWN SYMBOL>')
    END AS Symbol_Group,
    AVG("Open") AS AvgOpen
FROM demo.Stocks
GROUP BY CUBE(Sector, Symbol)
ORDER BY Sector_Group, Symbol_Group

GROUPING SETS

The GROUPING SETS(expr list) function calculates aggregates for each group of columns in expr list.

For example, the following query will aggregate the average opening stock price for these groups:

Each market sector
Each stock symbol
All sectors and symbols

GROUPING SETS Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
SELECT
    CASE
        WHEN (GROUPING(Sector) = 1) THEN '<ALL SECTORS>'
        ELSE NVL(Sector, '<UNKNOWN SECTOR>')
    END AS Sector_Group,
    CASE
        WHEN (GROUPING(Symbol) = 1) THEN '<ALL SYMBOLS>'
        ELSE NVL(Symbol, '<UNKNOWN SYMBOL>')
    END AS Symbol_Group,
    AVG("Open") AS AvgOpen
FROM demo.Stocks
GROUP BY GROUPING SETS((Sector), (Symbol), ())
ORDER BY Sector_Group, Symbol_Group

Window

Window functions are available through the use of the OVER clause, which can partition rows into frames. Different types of functions can be used to aggregate data over a sliding window.

Window Function Syntax
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
SELECT
    <window function> OVER (
        [PARTITION BY <column expression list>]
        [ORDER BY <ordering expression list>]
        [
            <RANGE | ROWS>
            <
                <UNBOUNDED PRECEDING | <number> PRECEDING | CURRENT ROW | <number> FOLLOWING>
                |
                BETWEEN <UNBOUNDED PRECEDING | <number> PRECEDING | CURRENT ROW | <number> FOLLOWING>
                    AND <UNBOUNDED FOLLOWING | <number> PRECEDING | CURRENT ROW | <number> FOLLOWING>
            >
        ]
    ) [AS <alias>]

The default frame type is:

RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

Parameter Description

<window function> One of the supported window functions listed below

PARTITION BY Optional list of columns and/or column expressions to use in partitioning the data. The window function will be applied separately to each partition, defined by the set of records containing the same value(s) for the specified partition-by column(s). If no PARTITION BY clause is given, the window function will be calculated over the entire data set.

ORDER BY

Clause defining the ordering of records within each partition before applying the window function; optional when using FIRST_VALUE() or LAST_VALUE().

The ordering expression syntax is:

<column name/alias/expression/position> [ASC | DESC] [NULLS FIRST | NULLS LAST]

The default sort order is ascending (ASC). The default null ordering is NULLS FIRST when using ascending order and NULLS LAST when using descending order.

Note

Only one column can be specified in the ordering expression list when using RANGE. When using ROWS, the frame is applied after any ordering; so, while several columns may appear in the order expression list, there will be only one ROWS clause following the list.

RANGE

Range-based frame used in applying the window function to each record. The frame is based on the values in the column or column expression used in the ORDER BY clause. All records with this value within the specified relation to the current record's value will be used in calculating its window function result.

A range-based frame can be specified for any aggregate function or the FIRST_VALUE() or LAST_VALUE() ranking function.

For a frame between a specified lower bound and the current row:

RANGE <lower frame bound>

For a frame between a specified lower and upper bound:

RANGE BETWEEN <lower frame bound> AND <upper frame bound>

The frame bound can be one of the following:

Frame Bound	Description
`UNBOUNDED PRECEDING`	All records in the partition before the one at the upper bound; cannot be used as an upper bound
`<number> PRECEDING`	All records in the partition with an `ORDER BY` value between the current record's value and `<number>` less than the current record's value, inclusive
`CURRENT ROW`	The current record being processed by the window function, as well as all peer rows (rows with the same ordering value)
`<number> FOLLOWING`	All records in the partition with an `ORDER BY` value between the current record's value and `<number>` more than the current record's value, inclusive
`UNBOUNDED FOLLOWING`	All records in the partition after the one at the lower bound; cannot be used as a lower bound

ROWS

Row-based frame used in applying the window function to each record. The frame is based on the position of records, as determined by the ORDER BY clause. All records within the specified ordered distance from a given record will be used in calculating its window function value.

A row-based frame can be specified for any aggregate function or the FIRST_VALUE() or LAST_VALUE() ranking function.

For a frame between a specified lower bound and the current row:

ROWS <lower frame bound>

For a frame between a specified lower and upper bound:

ROWS BETWEEN <lower frame bound> AND <upper frame bound>

The frame bound can be one of the following:

Frame Bound	Description
`UNBOUNDED PRECEDING`	All records in the partition before the one at the upper bound; cannot be used as an upper bound
`<number> PRECEDING`	All records in the partition from the one `<number>` records before the current record through the record at the upper bound
`CURRENT ROW`	The current record being processed by the window function
`<number> FOLLOWING`	All records in the partition from the record at the lower bound through the one `<number>` records after the current record
`UNBOUNDED FOLLOWING`	All records in the partition after the one at the lower bound; cannot be used as a lower bound

<alias> Column alias to apply to the window function result column

Aggregate Functions

Function	Description
`AVG(expr)`	Calculates the average of the given expression `expr` over the specified window frame
`COUNT(expr)`	Calculates the count of the given expression `expr` over the specified window frame
`MAX(expr)`	Calculates the maximum value of the given expression `expr` over the specified window frame
`MEAN(expr)`	Alias for `AVG()`. Calculates the average of the given expression `expr` over the specified window frame
`MIN(expr)`	Calculates the minimum value of the given expression `expr` over the specified window frame
`PRODUCT(expr)`	Calculates the product of the given expression `expr` over the specified window frame
`RATIO_TO_REPORT(expr)`	Calculates the ratio of the value of `expr` to the sum of `expr` over the specified window frame. Note that `ORDER BY` is not supported for this function.
`STDDEV(expr)`	Alias for `STDDEV_POP()`. Calculates the population standard deviation of the given expression `expr` over the specified window frame
`STDDEV_POP(expr)`	Calculates the population standard deviation of the given expression `expr` over the specified window frame
`STDDEV_SAMP(expr)`	Calculates the sample standard deviation of the given expression `expr` over the specified window frame
`SUM(expr)`	Calculates the sum of the given expression `expr` over the specified window frame
`VAR(expr)`	Alias for `VAR_POP()`. Calculates the population variance of the given expression `expr` over the specified window frame
`VAR_POP(expr)`	Calculates the population variance of the given expression `expr` over the specified window frame
`VAR_SAMP(expr)`	Calculates the sample variance of the given expression `expr` over the specified window frame

Ranking Functions

Function Description

CUME_DIST()

The relative position of the current row within the cumulative distribution of the selected partition, expressed as a percentage from 0 (exclusive) to 1 (inclusive). The formula for this calculation is as follows:

(count of <= records within partition) / (partition row count)

This function is shorthand for using the COUNT() function in separate partition statements to arrive at the same result:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
DOUBLE
(
    COUNT(*) OVER
    (
        PARTITION BY <partition_column>
        ORDER BY <sort_column>
    )
) /
(
    COUNT(*) OVER
    (
        PARTITION BY <partition_column>
        ORDER BY <sort_column>
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    )
)

DENSE_RANK() Number of the current row within the selected partition except rows with identical values evaluate to different ranks. Starts at 1

FIRST_VALUE(<column>) [<IGNORE | RESPECT> NULLS] The value found in the first row within a frame of the given expression. Optionally, add IGNORE NULLS or RESPECT NULLS to the function syntax to ignore or respect nulls, respectively.

LAG(<column>[, <num>]) [<IGNORE | RESPECT> NULLS] The value of the row before the given expression's value. Provide an additional comma-separated value to specify which row to select, e.g., LAG(vendor_id, 3) would list the value in the vendor_id column from three rows prior to the current row. Optionally, add IGNORE NULLS or RESPECT NULLS to the function syntax to ignore or respect nulls respectively.

LAST_VALUE(<column>) [<IGNORE | RESPECT> NULLS] The value found in the last row within a frame of the given expression. Optionally, add IGNORE NULLS or RESPECT NULLS to the function syntax to ignore or respect nulls respectively.

LEAD(<column>[, <num>]) [<IGNORE | RESPECT> NULLS] The value of the row after the given expression's value. Provide an additional comma-separated value to specify which row to select, e.g., LEAD(vendor_id, 3) would list the value in the vendor_id column from three rows after the current row. Optionally, add IGNORE NULLS or RESPECT NULLS to the function syntax to ignore or respect nulls respectively.

NTILE(<num of groups>) The group number of the row after partitioning the rows into num of groups groups. For example, NTILE(4) will partition data by quartiles and return the associated group number, 1 to 4.

PERCENT_RANK()

The rank of the current row within the selected partition, expressed as a percentage from 0 to 1, inclusive. The formula for this calculation is as follows:

(rank within partition - 1) / (partition row count - 1)

This function is shorthand for using the RANK() & COUNT() functions in separate partition statements to arrive at the same result:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
DOUBLE
(
    RANK() OVER
    (
        PARTITION BY <partition_column>
        ORDER BY <sort_column>
    ) - 1
) /
(
    COUNT(*) OVER
    (
        PARTITION BY <partition_column>
        ORDER BY <sort_column>
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) - 1
)

RANK() Number of the current row within the selected partition. However, rows with identical values evaluate to the same rank. Starts at 1

ROW_NUMBER() Number of the current row within the selected partition. Starts at 1

Examples

To calculate the rolling sum of total amounts collected by each taxi vendor over the course of a given day, as well as the number of other trips that occurred within 5 minutes of each trip:

Window Rolling Sum Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
SELECT
    vendor_id,
    pickup_datetime,
    total_amount,
    passenger_count,
    DECIMAL
    (
        SUM(total_amount) OVER
            (
                PARTITION BY vendor_id
                ORDER BY pickup_datetime
            )
    ) AS growing_sum,
    COUNT(*) OVER
        (
            PARTITION BY vendor_id
            ORDER BY LONG(pickup_datetime)
            RANGE BETWEEN 300000 PRECEDING AND 300000 FOLLOWING
        ) AS trip_demand
FROM demo.nyctaxi
WHERE pickup_datetime >= '2015-01-01' AND pickup_datetime < '2015-01-01 02:00:00'
ORDER BY
    vendor_id,
    pickup_datetime

To calculate a 5-before and 10-after moving average of 4-passenger trip distances per vendor over the course of a given day:

Window Moving Average Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
SELECT
    vendor_id,
    pickup_datetime,
    trip_distance,
    AVG(trip_distance) OVER
        (
            PARTITION BY vendor_id
            ORDER BY pickup_datetime
            ROWS BETWEEN 5 PRECEDING AND 10 FOLLOWING
        ) AS local_avg_dist
FROM demo.nyctaxi
WHERE
    passenger_count = 4 AND
    pickup_datetime >= '2015-01-01' AND pickup_datetime < '2015-01-02'
ORDER BY
    vendor_id,
    pickup_datetime

To rank, by vendor, the total amounts collected from 3-passenger trips on a given day:

Window Ranking Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
SELECT
    vendor_id,
    pickup_datetime,
    dropoff_datetime,
    total_amount AS fare,
    RANK() OVER (PARTITION BY vendor_id ORDER BY total_amount) AS ranked_fare,
    DECIMAL(PERCENT_RANK() OVER (PARTITION BY vendor_id ORDER BY total_amount)) * 100 AS percent_ranked_fare
FROM demo.nyctaxi
WHERE
    passenger_count = 3 AND
    pickup_datetime >= '2015-01-11' AND pickup_datetime < '2015-01-12'
ORDER BY
    vendor_id,
    pickup_datetime

To compare each trip's total amount to the lowest (ignoring nulls), highest (ignoring nulls), & average total amount for 5-passenger trips for each vendor over the course of a given day:

Window FIRST_VALUE Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
SELECT
    vendor_id,
    pickup_datetime,
    tip_amount,
    tip_amount - 
        FIRST_VALUE(tip_amount) IGNORE NULLS OVER
            (PARTITION BY vendor_id ORDER BY tip_amount) AS lowest_amount,
    tip_amount - 
        DECIMAL
        (
            AVG(tip_amount) OVER
                (
                    PARTITION BY vendor_id
                    ORDER BY tip_amount
                    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
                )
        ) AS average_amount,
    tip_amount - 
        FIRST_VALUE(tip_amount) IGNORE NULLS OVER
            (PARTITION BY vendor_id ORDER BY tip_amount DESC) AS highest_amount
FROM demo.nyctaxi
WHERE
    passenger_count = 5 AND
    pickup_datetime >= '2015-04-17' AND pickup_datetime < '2015-04-18' AND
    tip_amount > 0 AND
    trip_distance > 0
ORDER BY
    vendor_id,
    pickup_datetime

To compare each vendor's average total amount to their average total amount within the interquartile range:

Window N-Tile Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
SELECT
    vendor_id,
    DECIMAL(AVG(total_amount)) AS average_total_amount,
    DECIMAL(AVG(IF(quartile IN (2,3), total_amount, null))) AS average_interq_range_total_amount
FROM
(
    SELECT
        vendor_id,
        total_amount,
        NTILE(4) OVER (PARTITION BY vendor_id ORDER BY total_amount) quartile
    FROM
        demo.nyctaxi
)
GROUP BY vendor_id
ORDER BY vendor_id

PIVOT

The PIVOT clause can be used to pivot columns, "rotating" column values into multiple columns (one for each value), creating wider and shorter denormalized tables from longer, more normalized tables.

PIVOT Syntax
1
2
3
4
5
6
<select statement>
PIVOT
(
    <aggregate expression [AS <alias>]>[,...]
    FOR <column> IN (<column list>)
)

For example, given a source table phone_number, which lists each phone number for a customer as a separate record in the table, a pivot operation can be performed, creating a single record per customer with the home, work, & cell phone numbers as separate columns.

With this data:

PIVOT Input
+--------+--------------+----------------+
| name   | phone_type   | phone_number   |
+--------+--------------+----------------+
| Jane   | Home         | 123-456-7890   |
| Jane   | Work         | 111-222-3333   |
| John   | Home         | 123-456-7890   |
| John   | Cell         | 333-222-1111   |
+--------+--------------+----------------+

The following pivot operation can be applied:

PIVOT Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
SELECT
    name,
    Home_Phone,
    Work_Phone,
    Cell_Phone
FROM
    example.phone_list
PIVOT
(
    MAX(phone_number) AS Phone
    FOR phone_type IN ('Home', 'Work', 'Cell')
)

The data will be pivoted into a table like this:

PIVOT Output
+--------+----------------+----------------+----------------+
| name   | Home_Phone     | Work_Phone     | Cell_Phone     |
+--------+----------------+----------------+----------------+
| Jane   | 123-456-7890   | 111-222-3333   |                |
| John   | 123-456-7890   |                | 333-222-1111   |
+--------+----------------+----------------+----------------+

UNPIVOT

The UNPIVOT clause can be used to unpivot columns, "rotating" row values into column values, creating longer, more normalized tables from shorter, more denormalized tables.

UNPIVOT Syntax
1
2
3
4
5
<select statement>
UNPIVOT
(
    <value_column> FOR <var_column> IN (<column list>)
)

For example, given a source table customer_contact, which lists the home, work, & cell phone numbers for each customer in the table, an unpivot operation can be performed, creating separate home, work, & cell phone records for each customer.

With this data:

UNPIVOT Input
+--------+----------------+----------------+----------------+
| name   | home_phone     | work_phone     | cell_phone     |
+--------+----------------+----------------+----------------+
| Jane   | 123-456-7890   | 111-222-3333   |                |
| John   | 123-456-7890   |                | 333-222-1111   |
+--------+----------------+----------------+----------------+

The following unpivot operation can be applied:

UNPIVOT Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
SELECT *
FROM
(
    SELECT
        name,
        Home_Phone AS Home,
        Work_Phone AS Work,
        Cell_Phone AS Cell
    FROM
        example.customer_contact
)
UNPIVOT (
    phone_number FOR phone_type IN (Home, Work, Cell)
)

The data will be unpivoted into a table like this:

UNPIVOT Output
+--------+----------------+--------------+
| name   | phone_number   | phone_type   |
+--------+----------------+--------------+
| Jane   | 123-456-7890   | Home         |
| John   | 123-456-7890   | Home         |
| Jane   | 111-222-3333   | Work         |
| John   | 333-222-1111   | Cell         |
+--------+----------------+--------------+

Note

If the original column names can be used as the values of the unpivot key, as is, the pre-selection and renaming of those columns using a subquery in the FROM clause can be eliminated.

For example, unpivoting without aliasing the quarterly grade columns will result in those exact column names being used as the quarter values:

UNPIVOT without Aliases Example
1
2
3
4
5
6
SELECT *
FROM example.student_grade
UNPIVOT
(
    grade FOR quarter IN (q1_grade, q2_grade, q3_grade, q4_grade)
)

UNPIVOT without Aliases Output
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
+--------------+-------------+------------+
|   student_id |       grade | quarter    |
+--------------+-------------+------------+
|            1 |   80.0      | q1_grade   |
|            2 |   82.0      | q1_grade   |
|            3 |   73.0      | q1_grade   |
|            1 |   90.0      | q2_grade   |
|            3 |   77.0      | q2_grade   |
|            1 |   85.0      | q3_grade   |
|            2 |   87.0      | q3_grade   |
|            3 |   97.0      | q3_grade   |
|            1 |   95.0      | q4_grade   |
|            2 |   92.0      | q4_grade   |
+--------------+-------------+------------+

Set Operations

There are three types of supported set operations, each having the option of returning duplicate records in the result set by using the keyword ALL:

UNION [ALL] - return all records from both source data sets
INTERSECT [ALL] - return only records that exist in both source data sets
EXCEPT [ALL] - return all records that exist in the first data set, but not in the second

UNION

The UNION set operator creates a single list of records from the results of two SELECT statements. Use the ALL keyword to keep all records from both sets; omit it to remove duplicate records and form a single list of records unique between the two sets. See Limitations and Cautions for limitations.

UNION Syntax
1
2
3
<select statement>
UNION [ALL]
<select statement>

For example, given a table of lunch menu items and another table of dinner menu items, a UNION can be used to return all unique lunch & dinner menu items together, including items that are the same on both menus, but of a different price:

UNION Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
SELECT
    food_name,
    category,
    price
FROM
    example.lunch_menu
UNION
SELECT
    food_name,
    category,
    price
FROM
    example.dinner_menu

Note

Since the example includes price and all columns selected must match between the two sets for an item to be considered a duplicate, a lunch item that is priced differently as a dinner item would also appear in the result set.

A UNION ALL can be used to return all lunch & dinner menu items together, including duplicates:

UNION ALL Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
SELECT
    food_name,
    category,
    price
FROM
    example.lunch_menu
UNION ALL
SELECT
    food_name,
    category,
    price
FROM
    example.dinner_menu

INTERSECT

The INTERSECT set operator creates a single list of records that exist in both of the result sets from two SELECT statements. Use the ALL keyword to keep duplicate records that exist in both sets; omit it to remove duplicate records and form a single list of records that exist in both sets. See Limitations for limitations.

INTERSECT Syntax
1
2
3
<select statement>
INTERSECT [ALL]
<select statement>

For example, given a table of lunch menu items and another table of dinner menu items, an INTERSECT can be used to return all lunch menu items (excluding duplicates) that are also dinner items for the same price:

INTERSECT Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
SELECT
    food_name,
    category,
    price
FROM
    example.lunch_menu
INTERSECT
SELECT
    food_name,
    category,
    price
FROM
    example.dinner_menu

Note

Since the example includes price and all columns selected must match between the two sets for an item to be included, a lunch item that is priced differently as a dinner item would not appear in the result set.

EXCEPT

The EXCEPT set operator performs set subtraction, creating a single list of records that exist in the first SELECT statement's result set, but not in the second SELECT statement's result set. Use the ALL keyword to keep duplicate records that exist in the first set but not in the second; omit it to remove duplicate records and form a single list of records that exist in the first set but not the second. See Limitations for limitations:

EXPECT Syntax
1
2
3
<select statement>
EXCEPT [ALL]
<select statement>

For example, given a table of lunch menu items and another table of dinner menu items, an EXCEPT can be used to return all lunch menu items (excluding duplicates) that are not also dinner items for the same price:

EXPECT Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
SELECT
    food_name,
    category,
    price
FROM
    example.lunch_menu
EXCEPT
SELECT
    food_name,
    category,
    price
FROM
    example.dinner_menu

Note

Since the example includes price and all columns selected must match between the two sets for an item to be eliminated, a lunch item that is priced differently as a dinner item would still appear in the result set.

WITH (Common Table Expressions)

The WITH operation, also known as a Common Table Expression (CTE) creates a set of data that can be assigned table & column aliases and used one or more times in subsequent operations. The aliased set can be used within the SELECT, FROM, or WHERE clauses of a subsequent query or a subsequent CTE within the same WITH operation.

WITH Syntax
1
2
3
WITH [RECURSIVE]
<cte definition>[,...]
<select statement>

Parameter Description

RECURSIVE Allow one or more of the <cte definition> queries to be self-referential, providing for the possibility of recursive queries.

<cte definition>

The subquery or one of the subqueries defined within this WITH clause, structured as follows:

<cte name> [(<column alias list>)] AS (<subquery>)

Parameter	Description
`<cte name>`	Table alias given to the data set returned by the CTE.
`<column alias list>`	List of aliases assigned to the columns the CTE returns.
`<subquery>`	`SELECT` statement to associate with the corresponding `<cte name>`.

Each column alias in <column alias list> is matched to each column returned by the corresponding <subquery> in the order that each were defined; e.g., for column aliases (A, B, C) used with a subquery SELECT x, y, z ..., alias A will reference column x, B will reference y, and C will reference z. If no aliases are used, the names of the source columns themselves can be used to reference the data set returned by the subquery.

<select statement> The SELECT statement to run with the defined WITH subqueries.

The column aliases must be unique within the WITH statement--no other column or column alias can be similarly named, for example. Also, when used in a FROM clause and given a table alias, the table alias must be preceded with AS.

A CTE can be made available to a DML or DDL statement by having the WITH statement follow the CREATE TABLE ... AS, INSERT, UPDATE, or DELETE statement (not precede it).

For example, to define a simple CTE subquery using a WITH clause:

WITH Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
WITH
    dept2_emp_sal_by_mgr (manager_id, sal) AS
    (
        SELECT manager_id, salary
        FROM example.employee
        WHERE dept_id = 2
    )
SELECT
    manager_id dept2_mgr_id,
    MAX(sal) dept2_highest_emp_sal_per_mgr,
    COUNT(*) as dept2_total_emp_per_mgr
FROM dept2_emp_sal_by_mgr
GROUP BY manager_id

To apply the CTE to an INSERT statement while using multiple CTE definitions, follow the INSERT clause with the WITH clause:

WITH Defining Multiple Subqueries in INSERT Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
INSERT INTO example.dept2_emp_mgr_roster (emp_first_name, emp_last_name, mgr_first_name, mgr_last_name)
WITH
    dept2_emp AS
    (
        SELECT first_name, last_name, manager_id
        FROM example.employee
        WHERE dept_id = 2
    ),
    dept2_mgr AS
    (
        SELECT first_name, last_name, id
        FROM example.employee
        WHERE dept_id = 2
    )
SELECT d2emp.first_name, d2emp.last_name, d2mgr.first_name, d2mgr.last_name
FROM
    dept2_emp as d2emp
    JOIN dept2_mgr as d2mgr ON d2emp.manager_id = d2mgr.id

Recursive Queries

Common table expressions are used to create recursive queries, via the RECURSIVE option, using the following form:

Recursive WITH Syntax
1
2
3
4
5
6
7
8
WITH RECURSIVE
<cte name> AS
(
    <non-recursive query>
    UNION [ALL]
    <recursive query>
    )
<select statement>

Use UNION to remove duplicate records from the result set; use UNION ALL to return all records from the recursion.

Important

There is no check on queries that recurse infinitely. Ensure that recursive queries are constructed with an appropriate limit.

For instance, to query all of a given manager's subordinates:

Recursive Query Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
WITH RECURSIVE subordinates AS
(
    SELECT 
        id, 
        manager_id, 
        first_name,
        last_name
    FROM 
        example.employee
    WHERE 
        id = 6
    UNION ALL
    SELECT 
        e.id, 
        e.manager_id, 
        e.first_name,
        e.last_name
    FROM 
        example.employee e 
        INNER JOIN subordinates s ON s.id = e.manager_id
) 
SELECT *
FROM subordinates
ORDER BY manager_id, id

To show the management chain for each employee in a company:

Recursive Query with Recursion Trail Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
WITH RECURSIVE subordinates AS
(
    SELECT 
        id, 
        manager_id, 
        first_name,
        last_name,
        JSON_ARRAY(first_name || ',' || last_name) AS hierarchy
    FROM 
        example.employee
    WHERE 
        id = 1
    UNION ALL
    SELECT 
        e.id, 
        e.manager_id, 
        e.first_name,
        e.last_name,
        JSON_ARRAY_APPEND(s.hierarchy, e.first_name || ',' || e.last_name) AS hierarchy
    FROM 
        example.employee e 
        INNER JOIN subordinates s ON s.id = e.manager_id
) 
SELECT *
FROM subordinates
ORDER BY manager_id, id

Iteration

Kinetica supports iteration over each record within a data set for the purpose of creating a result set with 0 to N result records per record in the original set.

This iteration can be variable, based on some value within each record, or fixed, based on a given constant value.

The iteration is performed by joining against the virtual ITER table, as follows:

Iteration Syntax
1
2
3
SELECT *
FROM table, ITER
WHERE ITER.i < <column expression>

The <column expression> can be replaced by a constant for fixed iteration.

For example, to extract all of the individual letters from a column of words, with one record per letter extracted (using variable iteration):

Iteration Over Column Values Example
1
2
3
4
SELECT id, word, i, SUBSTR(word, i + 1, 1) AS letter
FROM example.dictionary
JOIN ITER ON i < LENGTH(word)
ORDER BY id, i;

To duplicate the set of words five times (using fixed iteration):

Iteration Over Table Example
1
2
3
4
SELECT *
FROM example.dictionary, ITER
WHERE i < 5
ORDER BY id, i;

For more detail, examples, and limitations, see Iteration.

Constants

Each data type has an associated literal constant syntax, which can be used, for instance, to insert constant data values into those columns.

Numeric Constants

Integer and floating point data types can be either single-quoted or not.

Numeric Constants Example
1
2
3
4
5
6
7
8
INSERT INTO example.numeric_types (int_type1, int_type2, float_type1, float_type2)
VALUES
(
    1,
    '2',
    3.4,
    '5.6'
)

String-Based Constants

String-based data types should be single-quoted.

String-Based Constants Example
1
2
3
4
5
6
7
8
INSERT INTO example.string_types (varchar_type, charn_type, ipv4_type, wkt_type)
VALUES
(
    'varchar value',
    'charN value',
    '12.34.56.78',
    'POINT(0 0)'
)

Binary Constants

Binary types can be represented in either of the following forms:

single-quoted or unquoted base-10
single-quoted hexadecimal

Binary Constants Example
1
2
3
4
5
INSERT INTO example.byte_types (bytes_type)
VALUES
    (12345678901234567890),
    ('12345678901234567890'),
    ('0x00AB54A98CEB1F0AD2')

Date/Time Constants

Kinetica accepts unqualified single-quoted date/time values, ANSI SQL, and ODBC escape sequences in the following formats:

Data Type	Native	ANSI	ODBC
Date	`'YYYY-MM-DD'`	`DATE 'YYYY-MM-DD'`	`{d 'YYYY-MM-DD'}`
Time	`'HH:MI:SS.mmm'`	`TIME 'HH:MI:SS.mmm'`	`{t 'HH:MI:SS.mmm'}`
DateTime	`'YYYY-MM-DD[T\| ]HH:MI:SS.mmm[Z]'`	`TIMESTAMP 'YYYY-MM-DD HH:MI:SS.mmm'`	`{ts 'YYYY-MM-DD HH:MI:SS.mmm'}`
Timestamp	`'YYYY-MM-DD[T\| ]HH:MI:SS.mmm[Z]'`	`TIMESTAMP 'YYYY-MM-DD HH:MI:SS.mmm'`	`{ts 'YYYY-MM-DD HH:MI:SS.mmm'}`

Native Date/Time Constants Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
INSERT INTO example.date_time_types (date_type, time_type, datetime_type, timestamp_type)
VALUES
(
    '2000-01-02',
    '12:34:56.789',
    '2000-01-02 12:34:56.789',
    '2000-01-02 12:34:56.789'
),
(
    '2000-1-02',
    '1:23:45.678',
    '2000-01-02T12:34:56.789Z',
    '2000-1-02 1:23:45.678'
)

ANSI Date/Time Constants Example
1
2
3
4
5
6
7
8
INSERT INTO example.date_time_types (date_type, time_type, datetime_type, timestamp_type)
VALUES
(
    DATE '2000-01-02',
    TIME '12:34:56.789',
    TIMESTAMP '2000-01-02 12:34:56.789',
    TIMESTAMP '2000-01-02 12:34:56.789'
)

ODBC Date/Time Constants Example
1
2
3
4
5
6
7
8
INSERT INTO example.date_time_types (date_type, time_type, datetime_type, timestamp_type)
VALUES
(
    {d '2000-01-02'},
    {t '12:34:56.789'},
    {ts '2000-01-02 12:34:56.789'},
    {ts '2000-01-02 12:34:56.789'}
)

Expressions

An expression can consist of a literal constant, a column name, or a function applied to a constant or column name. A compound expression is an operation or function applied to one or more expressions.

The following are the supported expression operators:

+ addition
- subtraction
* multiplication
/ division
() grouping
|| string concatenation

Conditional Functions

Conditional functions are subject to short-circuiting to aid in error-checking.

Function Description

DECODE(expr, match_a, value_a, ..., match_N, value_N[, unmatched_value]) Evaluates expr: returns the first value whose corresponding match is equal to expr; returns the optional unmatched_value (or null), if no match is found

IF(expr, value_if_true, value_if_false)

Evaluates expr: if true, returns value_if_true; otherwise, if false or null, returns value_if_false

Parameter	Description
`expr`	any true/false condition Note When an integer column is used directly, this function will interpret non-zero values as true and zero values as false.
`value_if_true`	any type; must be the same type as `value_if_false`
`value_if_false`	any type; must be the same type as `value_if_true`

CASE

The case statement acts as a scalar function, but has two more complex forms. Note that for each of these CASE statements, the value expressions must all be of the same or convertible data type.

In the first form, each WHEN is followed by a conditional expression whose corresponding THEN expression will have its value returned, if true. Control will continue through each WHEN until a match is found and the corresponding value returned; if no match is found, the value of the ELSE expression will be returned, or null, if no ELSE clause exists.

CASE without Expression Syntax
1
2
3
4
5
6
CASE
    WHEN <cond_expr_a> THEN <value_expr_a>
    ...
    WHEN <cond_expr_n> THEN <value_expr_n>
    ELSE <value_expr>
END

In the second form, the CASE expression is evaluated. A match of that result will be attempted against each WHEN expression until a match is found and the value of the corresponding THEN expression returned; if no match is found, the value of the ELSE expression will be returned, or null, if no ELSE clause exists.

CASE with Expression Syntax
1
2
3
4
5
6
CASE <expr>
    WHEN <match_expr_a> THEN <value_expr_a>
    ...
    WHEN <match_expr_n> THEN <value_expr_n>
    ELSE <value_expr>
END

Note

This second version below has greater optimization than the first.

CASE without Expression Example
1
2
3
4
5
CASE
    WHEN color = 1 THEN 'Red'
    WHEN color >= 2 THEN 'Green'
    ELSE 'Blue'
END

CASE with Expression Example
1
2
3
4
5
CASE MOD(LENGTH(text), 2)
    WHEN 0 THEN 'Even'
    WHEN 1 THEN 'Odd'
    ELSE null
END

Conversion Functions

Function Description

CAST(expr AS [SQL_]<conv_type>)

CONVERT(expr, [SQL_]<conv_type>)

Converts expr into conv_type data type

Conversion Types:

Numeric	String	Date/Time
`BIGINT`	`CHAR(N)`	`DATE`
`DECIMAL(P,S)`	`VARCHAR(N)`	`DATETIME`
`DOUBLE`		`TIME`
`FLOAT`		`TIMESTAMP`
`INTEGER`		`TYPE_DATE`
`NUMERIC(P,S)`		`TYPE_TIME`
`REAL`		`TYPE_TIMESTAMP`
`SMALLINT`
`TINYINT`
`UNSIGNED` `BIGINT`

Note

When using the SQL_ prefix, UNSIGNED BIGINT becomes SQL_UNSIGNED_BIGINT

CHAR(expr) Returns the character associated with the ASCII code given in expr

CHAR1(expr) Converts the given expr to VARCHAR(1) type

CHAR2(expr) Converts the given expr to VARCHAR(2) type

CHAR4(expr) Converts the given expr to VARCHAR(4) type

CHAR8(expr) Converts the given expr to VARCHAR(8) type

CHAR16(expr) Converts the given expr to VARCHAR(16) type

CHAR32(expr) Converts the given expr to VARCHAR(32) type

CHAR64(expr) Converts the given expr to VARCHAR(64) type

CHAR128(expr) Converts the given expr to VARCHAR(128) type

CHAR256(expr) Converts the given expr to VARCHAR(256) type

DATE(expr) Converts expr to date (YYYY-MM-DD) format

DATETIME(expr) Converts expr to datetime (YYYY-MM-DD HH24:MI:SS.mmm) format

DECIMAL(expr) Converts the given expr to DECIMAL type

DOUBLE(expr) Converts the given expr to DOUBLE type

FLOAT(expr) Converts the given expr to REAL type

INT(expr) Converts the given expr to INTEGER type

LONG(expr) Converts the given expr to BIGINT type; if expr is a TIMESTAMP or DATETIME type, the value will be the number of milliseconds since the epoch

STRING(expr) Converts expr to a string format appropriate for the expr type

TIME(expr) Converts expr to time (HH24:MI:SS) format

TIMESTAMP(expr) Converts expr to TIMESTAMP type

TO_CHAR(expr, format)

Converts the given date/time expr to a string matching the given format. The returned string will be truncated at 32 characters. See Date/Time Conversion Codes for the list of format codes.

Example:

Function Call	`TO_CHAR(DATETIME(971181296000), '"Last login: "YYYY-MM-DD')`
Return	`Last login: 2000-10-10`

TO_DATE(string, format)

Converts the given string to a date type with the given format code. See Date/Time Conversion Codes for the list of format codes.

Example:

TO_DATE('2017-06-15', 'YYYY-MM-DD')

TO_DATETIME(string, format)

Converts the given string to a datetime type with the given format code. See Date/Time Conversion Codes for the list of format codes.

Example:

TO_DATETIME('2017-06-15 10:37:30', 'YYYY-MM-DD HH:MI:SS')

TO_TIME(string, format)

Converts the given string to a time type with the given format code. See Date/Time Conversion Codes for the list of format codes.

Example:

TO_TIME('10:37:30', 'HH:MI:SS')

TO_TIMESTAMP(string, format)

Converts the given string to a timestamp type with the given format code. See Date/Time Conversion Codes for the list of format codes.

Example:

TO_TIMESTAMP('2017-06-15 10:37:30', 'YYYY-MM-DD HH:MI:SS')

ULONG(expr) Converts the given expr to UNSIGNED BIGINT type

VECTOR(expr, len)

Converts the given stringified array of len number of float values in expr to VECTOR type.

Example:

VECTOR('[1.1, 2.2, 3.3]', 3)

Date/Time Conversion Codes

The following formatting codes can be used to convert date/time strings to native date/time objects (with TO_DATE, TO_DATETIME, TO_TIME and TO_TIMESTAMP) and for formatting date/time objects as strings (with TO_CHAR).

These characters are interpreted literally: - / , . ; :

Other characters need to be quoted in order to be interpreted literally. For example, MM-DD-YYYY can be used to convert 01-02-2022 to a date, while "Today is "MM-DD-YYYY is needed to convert Today is 01-02-2022.

In the following table, Opt in the Input column indicates that the field may be empty when being read, and still be considered a valid input. This can be useful when reading data from a source that, for example, sometimes includes milliseconds in their time strings and sometimes does not (e.g., the MMM code is Opt). Yes in the Input column indicates that some data must be present in the input string to be considered valid, unless it is after a $ character, in which case all input after the $ is optional.

Format Code	Input	Output	Description
`_`	Yes		Any one character
`?`	Opt		Zero or more non-digit characters
`$`	Yes		Everything after this is optional; e.g., `HH:MI$:SS` would accept `12:34` and `12:34:56`
`AD`		Yes	Era indicator without periods
`A.D.`		Yes	Era indicator with periods
`AM`	Yes	Yes	Meridian indicator without periods [`AM`, `PM`]
`A.M.`	Yes	Yes	Meridian indicator with periods [`A.M.`, `P.M.`]
`BC`		Yes	Era indicator without periods
`B.C.`		Yes	Era indicator with periods
`C`	Yes	Yes	Number of day in calendar month, with optional leading zero [`1` - `31`]
`CC`		Yes	Century (if last 2 digits of the 4-digit year are `00`, this is the first 2 digits; otherwise, this is first 2 digits + 1)
`D`	Yes	Yes	Day of week [`1` - `7`] (Sunday - Saturday)
`DAY`	Yes	Yes	Day of week [`Sunday` - `Saturday`]
`DD`	Yes	Yes	Number of day in month, with leading zero [`01` - `31`]
`DDD`	Yes	Yes	Number of day of year [`1` - `366`]
`DL`	Yes	Yes	Date long format (`DAY, MONTH C, RRRR`)
`DS`	Yes	Yes	Date short format (`M/C/RRRR`)
`DY`	Yes	Yes	Abbreviated day name (e.g., "Mon")
`FF[1-9]`	Opt	Yes	Fractional seconds where the number of fractional second digits can be specified (default is 3 digits); when used for input, the string may contain an optional "." preceding the numbers
`H`	Yes	Yes	Alias for `H12`
`H12`	Yes	Yes	Hour of day in 12-hour format, with optional leading zero [`0` - `11`]
`H24`	Yes	Yes	Hour of day in 24-hour format, with optional leading zero [`0` - `23`]
`HH`	Yes	Yes	Alias for `HH12`
`HH12`	Yes	Yes	Hour of day in 12-hour format, with leading zero [`00` - `11`]
`HH24`	Yes	Yes	Hour of day in 24-hour format, with leading zero [`00` - `23`]
`I`	Yes	Yes	Last digit of ISO year
`ID`	Yes	Yes	ISO 8601 day of week [`1` - `7`] (Monday - Sunday)
`IDDD`	Yes	Yes	ISO 8601 day of year [`001` - `371`] (where `001` is the Monday of the 1st ISO week)
`IW`	Yes	Yes	ISO 8601 week of year [`01` - `53`] (where `01` contains the first Thursday of the year)
`IY`	Yes	Yes	Last 2 digits of ISO year
`IYY`	Yes	Yes	Last 3 digits of ISO year
`IYYY`	Yes	Yes	Last 4 digits of ISO year
`J`	Yes	Yes	Julian day; the number of days since January 1, 4712 BC. Number specified with `J` must be integers.
`LLL`	Opt	Yes	Whole milliseconds (often positioned after a colon); e.g., `12:34:56:78` => 78 milliseconds; when used for input, the string may contain an optional "." or ":" preceding the numbers; when "." is found, this code will act like the `MMM` code
`M`	Yes	Yes	Number of month in year, with optional leading zero [`1` - `12`]
`MI`	Yes	Yes	Minute of hour [`00` - `59`]
`MM`	Yes	Yes	Number of month in year, with leading zero [`01` - `12`]
`MMM`	Opt	Yes	Fractions of a second up to milliseconds; e.g., `12:34:56.78` => 780 milliseconds; when used for input, the string may contain an optional "." or ":" preceding the numbers; when ":" is found, this code will act like the `LLL` code
`MON`	Yes	Yes	3-character abbreviation of month in title case [`Jan` - `Dec`]
`MONTH`	Yes	Yes	Full name of month [`January` - `December`]
`MS`	Opt	Yes	Alias for `MMM`
`OF`	Opt	Yes	Time-zone offset from UTC (`TZHTZM`)
`PM`	Yes	Yes	Alias for `AM`
`P.M.`	Yes	Yes	Alias for `A.M.`
`Q`		Yes	Quarter of the year [`1` - `4`] (January-March - October-December)
`RM`	Yes	Yes	Roman numeral month [`I` - `XII`] (January - December)
`RR`	Yes	Yes	Rounded year. Assume the given 2-digit year occurs in the present century, then: If the assumed date is greater than 50 years into the future, declare the date to be in the previous century If the assumed date is greater than 50 years into the past, declare the date to be in the next century
`RRRR`	Yes	Yes	Either a 2-digit or 4-digit year; if 2-digit, equivalent to `RR`
`SCC`		Yes	Same as `CC`
`SS`	Yes	Yes	Second of minute [`00` - `59`]
`SSSS`	Yes	Yes	Alias for `SSSSS`
`SSSSS`	Yes	Yes	Seconds past midnight [`0` - `86399`]
`SYEAR`		Yes	Same as `YEAR`
`SYYYY`	Yes	Yes	Same as `YYYY`
`TS`	Yes	Yes	Time short format (`H24:MI:SS.MMM`)
`TZ`	Opt	Yes	Time zone (`TZH`)
`TZH`	Opt	Yes	Time zone hour (ISO); "Z" or +/- followed by one or two digits (will always output "Z")
`TZM`	Opt	Yes	Time zone minute (ISO); colon followed by one or two digits; for input, an optional colon is allowed at the front; if present, two digits must follow it
`US`	Opt	Yes	Microseconds; when used for input, the string may contain an optional "." preceding the numbers
`W`	Yes	Yes	Week of month [`1` - `5`], where week 1 starts on the first day of the month and ends on the seventh
`WW`	Yes	Yes	Week of year [`1` - `53`], where week 1 starts on the first day of the year and continues to the seventh day of the year
`X`	Yes	Yes	Local radix character (always `.`. E.g., `HH:MI:SSXFF`)
`Y`	Yes	Yes	Last 1 digit of year
`Y,YYY`	Yes	Yes	Year with a comma after the thousands place
`YEAR`		Yes	Year spelled out (e.g., "Two Thousand Twenty Five")
`YY`	Yes	Yes	Last 2 digits of year
`YYY`	Yes	Yes	Last 3 digits of year
`YYYY`	Yes	Yes	4-digit year

Date/Time Functions

This section comprises the following functions:

Date/Time Base Functions, which can extract parts of date/time expressions, convert back and forth between data types, and return the current date/time
Date/Time Complex Conversion Functions, which can perform more complex date/type conversions

Date/Time Base Functions

Function Description

CLOCK_TIMESTAMP()

Returns the date & time as YYYY-MM-DD HH24:MI:SS.mmm

Note

CLOCK_TIMESTAMP may return different values each time it is called in the same query or SQL Procedure, which may lead to data getting out of sync across HA clusters. Use CURRENT_DATETIME to avoid this issue.

CURRENT_DATE() Returns the date as YYYY-MM-DD

CURRENT_DATETIME()

Returns the date & time as YYYY-MM-DD HH24:MI:SS.mmm

Note

CURRENT_DATETIME will return same values each time it is called in the same query or SQL Procedure, and should keep data in-sync across HA clusters. See CLOCK_TIMESTAMP to always get the actual time each time it is called.

CURRENT_TIME() Returns the time as HH24:MI:SS.mmm

CURRENT_TIMESTAMP() Returns the date & time as YYYY-MM-DD HH24:MI:SS.mmm; to return the date & time as the number of milliseconds since the epoch, pass the result of this function to LONG()

DATEADD(unit, amount, expr)

Adds the positive or negative integral amount of unit date/time intervals to the date/time in expr

The following date/time intervals are supported for unit:

Constant	Description
`YEAR`	Year is modified by interval amount (not affected by leap year, etc.)
`QUARTER`	Month is modified by three times the interval amount, irrespective of the number of days in the months between; day adjusting performed the same as the `MONTH` description, but only on final month (e.g., Jan 31st + 1 quarter will be Apr 30th, not Apr 28th because of February)
`MONTH`	Month is modified by interval amount and date adjusted if overflow/underflow occurs; day adjusted to last day of calculated month if not a valid day for that month (e.g. Apr 31st -> Apr 30th)
`WEEK`	Day is modified by 7 times the interval amount (time not affected by daylight savings time, etc.); month & year are adjusted, if overflow/underflow occurs
`DAY`	Day is modified by interval amount (time not affected by daylight savings time, etc.); date is adjusted, if overflow/underflow occurs
`HOUR`	Hour is modified by interval amount (time not affected by daylight savings time, etc.); date is adjusted, if overflow/underflow occurs
`MINUTE`	Minute is modified by interval amount; date/time are adjusted, if overflow/underflow occurs
`SECOND`	Second is modified by interval amount; date/time are adjusted, if overflow/underflow occurs
`MILLISECOND`	Millisecond is modified by interval amount; date/time are adjusted, if overflow/underflow occurs
`FRAC_SECOND`	Nanosecond is modified by interval amount; date/time are adjusted, if overflow/underflow occurs Note Time is processed to millisecond precision, so any portion of an `amount` with finer granularity than 1,000,000 nanoseconds will be ignored (e.g., requesting the addition of 1,234,567 nanoseconds will result in 1 millisecond actually being added)

Note

Any of these unit types can have a SQL_TSI_ prefix prepended to them; e.g., both DAY and SQL_TSI_DAY are valid unit types for specifying a day interval.

Examples:

Function Call	Result
`DATEADD(YEAR, 1, '2000-10-10')`	`2001-10-10`
`DATEADD(QUARTER, 1, '2000-11-30')`	`2001-02-28`
`DATEADD(MONTH, 1, '2000-01-31')`	`2000-02-29`
`DATEADD(WEEK, 53, '2000-01-01')`	`2001-01-06`
`DATEADD(DAY, 1, '2000-12-31')`	`2001-01-01`
`DATEADD(HOUR, 12, '2000-10-10 12:34:56')`	`2000-10-11 00:34:56.000`
`DATEADD(MINUTE, 1, '2000-10-10 12:34:56')`	`2000-10-10 12:35:56.000`
`DATEADD(SECOND, 1, '2000-12-31 23:59:59')`	`2001-01-01 00:00:00.000`
`DATEADD(MILLISECOND, 1, '2000-10-10 12:34:56')`	`2000-10-10 12:34:56.001`
`DATEADD(FRAC_SECOND, 1000000, '2000-10-10 12:34:56')`	`2000-10-10 12:34:56.001`
`DATEADD(SECOND, 1, TIME '12:34:56')`	`12:34:57.000`

DATE_BUCKET

(width, ds[, offset[, base]])

Calculates the date range in which a given date ds falls, based on a set of fixed-width "buckets" with the given width, start-aligned base date, and offset from that base date.

The width is the number of days each bucket should span.

The offset is the number of days after (positive offset) or number of days before (negative offset) the base date to which the buckets should be aligned. The default is no offset.

The default base is 2000-01-03.

Typically, DATE_BUCKET is used in the following type of query:

SELECT
    DATE_BUCKET(7, ds, -3, '2023-02-21')
        + INTERVAL 3 DAYS AS week_midpoint,
    AVG(cpu) AS avg_cpu
FROM example.host_metrics_summary
GROUP BY week_midpoint
ORDER BY week_midpoint

The result will be as follows:

Dates in the ds column of the example.host_metrics_summary table will be grouped into buckets
Each bucket will span a range of 7 days
The baseline bucket will start at 2023-02-18 (2023-02-21 offset by -3 days) and continue through 2023-02-24 (7 days, including 2023-02-18)
Buckets will extend before & after the baseline bucket in contiguous, non-overlapping fashion
Each result record will show the date in the middle of the bucket's date range (+ INTERVAL 3 DAYS from the start of each 7 day span) and the average CPU usage across the records contained within that date range
Gaps in the data will not be filled in with empty buckets--only buckets containing the dates found in the ds column of example.host_metrics_summary will be returned in the result set

DATEDIFF([unit,] begin, end)

Calculates the difference between two date/time expressions, returning the result as an integral difference in the units specified; more precisely, how many whole date/time intervals of type unit need to be added to (or subtracted from) begin to equal end (or get as close as possible without going past it) using the unit types and and rules specified in DATEADD.

The default unit is DAY.

Note

This is not symmetric with DATEADD in all cases, as adding 1 MONTH to Mar 31st results in Apr 30th, but the DATEDIFF in MONTH units between those two dates is 0.

Examples:

Function Call	Result
`DATEDIFF(DATE('2000-10-10'), DATE('2000-12-31'))`	`82`
`DATEDIFF(DATE('2000-03-31'), DATE('2000-04-30'))`	`30`
`DATEDIFF(DATE('2000-12-31'), DATE('2000-10-10'))`	`-82`
`DATEDIFF(DATETIME('2000-10-10 12:34:56'), 978222896000)`	`81`
`DATEDIFF(MONTH, DATE('2000-10-10'), DATE('2000-12-31'))`	`2`
`DATEDIFF(MONTH, DATE('2000-03-31'), DATE('2000-04-30'))`	`0`
`DATEDIFF(MONTH, DATE('2000-12-31'), DATE('2000-10-10'))`	`-2`
`DATEDIFF(HOUR, DATETIME('2000-10-10 12:34:56'), 978222896000)`	`1956`

DATEPART(part, expr)

Returns the requested part of the date/time expr. The part may be any of those specified in DATE_TRUNC(part, expr), plus the following:

Constant	Description
`SECS_SINCE_EPOCH`	Return the number of seconds since the epoch (1970-01-01).
`MSECS_SINCE_EPOCH`	Return the number of milliseconds since the epoch.

Examples:

Function Call	Result
`DATEPART(YEAR, '2000-12-31')`	`2000`
`DATEPART(MONTH, '2000-03-31 12:34:56.000')`	`3`
`DATEPART(DAY, '2000-03-31 12:34:56.000')`	`31`
`DATEPART(SECOND, '2000-03-31 12:34:56.220')`	`56`
`DATEPART(MSECS_SINCE_EPOCH, '2000-03-31 12:34:56.220')`	`954506096220`

DATE_TRUNC(part, expr)

Returns the date/time expr after truncating it beyond the given date/time part. The following date/time constants are supported for part:

Constant	Description
`YEAR`	Return the first day of the year in which `expr` occurs
`QUARTER`	Return the first day of the quarter in which `expr` occurs
`MONTH`	Return the first day of the month in which `expr` occurs
`WEEK`	Return the first day of the week in which `expr` occurs
`DAY`	Return the date (at midnight) on which `expr` occurs
`HOUR`	Return the timestamp up to the hour in which `expr` occurs
`MINUTE`	Return the timestamp up to the minute in which `expr` occurs
`SECOND`	Return the timestamp up to the second in which `expr` occurs
`MILLISECOND`	Return the timestamp up to the millisecond in which `expr` occurs

Examples:

Function Call	Result
`DATE_TRUNC(YEAR, '2008-09-10 12:34:56.789')`	`2008-01-01 00:00:00.000`
`DATE_TRUNC(QUARTER, '2008-09-10 12:34:56.789')`	`2008-07-01 00:00:00.000`
`DATE_TRUNC(MONTH, '2008-09-10 12:34:56.789')`	`2008-09-01 00:00:00.000`
`DATE_TRUNC(WEEK, '2008-09-10 12:34:56.789')`	`2008-09-07 00:00:00.000`
`DATE_TRUNC(DAY, '2008-09-10 12:34:56.789')`	`2008-09-10 00:00:00.000`
`DATE_TRUNC(HOUR, '2008-09-10 12:34:56.789')`	`2008-09-10 12:00:00.000`
`DATE_TRUNC(MINUTE, '2008-09-10 12:34:56.789')`	`2008-09-10 12:34:00.000`
`DATE_TRUNC(SECOND, '2008-09-10 12:34:56.789')`	`2008-09-10 12:34:56.000`
`DATE_TRUNC(MILLISECOND, '2008-09-10 12:34:56.789')`	`2008-09-10 12:34:56.789`

DAY(expr) Alias for DAYOFMONTH(expr)

DAYNAME(expr) Extracts the day of the week from expr and converts it to the corresponding day name [Sunday - Saturday ]

DAYOFMONTH(expr) Extracts the day of the month from expr [1 - 31]

DAYOFWEEK(expr)

Extracts the day of the week from expr [1 - 7]

Expression Value	Result
Date on Sunday	`1`
Date on Monday	`2`
Date on Tuesday	`3`
Date on Wednesday	`4`
Date on Thursday	`5`
Date on Friday	`6`
Date on Saturday	`7`

DAY_OF_WEEK(expr) Alias for DAYOFWEEK(expr)

DAYOFYEAR(expr) Extracts the day of the year from expr [1 - 366]

DAY_OF_YEAR(expr) Alias for DAYOFYEAR(expr)

EPOCH_MSECS_TO_DATETIME(expr)

Converts expr milliseconds since the epoch to a date/time

Example:

Function Call	`EPOCH_MSECS_TO_DATETIME(971181296789)`
Return	`2000-10-10 12:34:56.789`

EPOCH_SECS_TO_DATETIME(expr)

Converts expr seconds since the epoch to a date/time

Example:

Function Call	`EPOCH_SECS_TO_DATETIME(971181296)`
Return	`2000-10-10 12:34:56.000`

EXTRACT(<part> FROM <expr>)

Extracts the date/time part from the date/time expr. This function is used to support database clients which require the call in this form; however, each of the supported date/time part constants results in a call for which there is a simpler, more direct corresponding call, which are listed below.

Constant	Corresponding Alternative Function Call
`YEAR`	`YEAR(expr)`
`QUARTER`	`QUARTER(expr)`
`MONTH`	`MONTH(expr)`
`WEEK`	`WEEK(expr)`
`DAY`	`DAY(expr)`
`HOUR`	`HOUR(expr)`
`MINUTE`	`MINUTE(expr)`
`SECOND`	`SECOND(expr)`
`MILLISECOND`	`MSEC(expr)`
`EPOCH`	`SECS_SINCE_EPOCH(expr)`

HOUR(expr) Extracts the hour of the day from expr [0 - 23]

<expr> + INTERVAL '<amount>' <part>

<expr> - INTERVAL '<amount>' <part>

Adds to or subtracts from the date/time expr the integral amount units of type part. This mirrors the behavior of the TIMESTAMPADD function, only with a different format and different date/time part constants. The following date/time constants are supported for part:

Constant	Description
`YEAR`	Year is modified by interval amount (not affected by leap year, etc.)
`QUARTER`	Month is modified by three times the interval amount, irrespective of the number of days in the months between; day adjusting performed the same as the `MONTH` description, but only on final month (e.g., Jan 31st + 1 quarter will be Apr 30th, not Apr 28th because of February)
`MONTH`	Month is modified by interval amount and date adjusted if overflow/underflow occurs; day adjusted to last day of calculated month if not a valid day for that month (e.g. Apr 31st -> Apr 30th)
`WEEK`	Day is modified by 7 times the interval amount (time not affected by daylight savings time, etc.); month & year are adjusted, if overflow/underflow occurs
`DAY`	Day is modified by interval amount (time not affected by daylight savings time, etc.); date is adjusted, if overflow/underflow occurs
`HOUR`	Hour is modified by interval amount (time not affected by daylight savings time, etc.); date is adjusted, if overflow/underflow occurs
`MINUTE`	Minute is modified by interval amount; date/time are adjusted, if overflow/underflow occurs
`SECOND`	Second is modified by interval amount; date/time are adjusted, if overflow/underflow occurs

LAST_DAY(date) Returns the date of the last day of the month in the given date

MINUTE(expr) Extracts the minute of the day from expr [0 - 59]

MONTH(expr) Extracts the month of the year from expr [1 - 12]

MONTHNAME(expr) Extracts the month of the year from expr and converts it to the corresponding month name [January - December]

MSEC(expr) Extracts the millisecond of the second from expr [0 - 999]

MSECS_SINCE_EPOCH(expr)

Converts expr date/time to the number of milliseconds since the epoch

Example:

Function Call	`MSECS_SINCE_EPOCH('2000-10-10 12:34:56.789')`
Return	`971181296789`

NEXT_DAY(date, day_of_week)

Returns the date of the next day of the week, provided as a day name in day_of_week, that occurs after the given date

Some examples, given that 2000-10-10 is a Tuesday:

Function Call	Result
`NEXT_DAY('2000-10-10', 'Wednesday')`	`2000-10-11`
`NEXT_DAY('2000-10-10', 'Friday')`	`2000-10-13`
`NEXT_DAY('2000-10-10', 'Tuesday')`	`2000-10-17`

NOW() Alias for CURRENT_DATETIME()

QUARTER(expr)

Extracts the quarter of the year from expr [1 - 4]

Expression Value	Result
Date in January, February, or March	`1`
Date in April, May, or June	`2`
Date in July, August, or September	`3`
Date in October, November, or December	`4`

SECOND(expr) Extracts the seconds of the minute from expr [0 - 59]

SEC(expr) Alias for SECOND(expr)

SECS_SINCE_EPOCH(expr)

Converts expr date/time to the number of seconds since the epoch

Example:

Function Call	`SECS_SINCE_EPOCH('2000-10-10 12:34:56')`
Return	`971181296`

SLEEP(expr)

Pause execution for at least expr seconds; though system load may delay the return from this call for longer than the specified amount. Use a decimal for expr to pause for less than a second; e.g., SLEEP(0.001) will pause for at least 1 millisecond. SLEEP should be invoked in a Tableless Query to avoid being called for every record in a result set; e.g.:

SELECT SLEEP(0.001)

TIME_BUCKET

(width, ts[, offset[, base]])

Calculates the date/time range in which a given timestamp ts falls, based on a set of fixed-width "buckets" with the given width, start-aligned base date/time, and offset from that base date/time

The width is the number of milliseconds each bucket should span. An INTERVAL can also be used to specify the width.

The offset is the number of milliseconds after (positive offset) or number of milliseconds before (negative offset) the base date/time to which the buckets should be aligned. An INTERVAL can also be used to specify the offset. The default is no offset.

The default base is 2000-01-03 00:00:00.

Typically, TIME_BUCKET is used in the following type of query:

SELECT
    TIME_BUCKET(
        INTERVAL 5 MINUTES,
        ts,
        INTERVAL -2.5 MINUTES,
        '2023-02-28'
    ) + INTERVAL 2.5 MINUTES AS five_minute_midpoint,
    AVG(cpu) AS avg_cpu
FROM example.host_metrics
GROUP BY five_minute_midpoint
ORDER BY five_minute_midpoint

The result will be as follows:

Timestamps in the ts column of the example.host_metrics table will be grouped into buckets
Each bucket will span a 5 minute interval
The baseline bucket will start at 2023-02-27 23:57:30 (2023-02-28 offset by -2.5 minutes) and continue through 2023-02-28 00:02:30 (5 minutes from 2023-02-27 23:57:30)
Buckets will extend before & after the baseline bucket in contiguous, non-overlapping fashion
Each result record will show the timestamp in the middle of the bucket's range (+ INTERVAL 2.5 MINUTES from the start of each 5 minute span) and the average CPU usage across the records contained within that date/time range
Gaps in the data will not be filled in with empty buckets--only buckets containing the timestamps found in the ts column of example.host_metrics will be returned in the result set

TIMESTAMPADD(unit, amount, expr)

Adds the positive or negative integral amount of unit date/time intervals to the date/time in expr

The following date/time intervals are supported for unit:

Constant	Description
`YEAR`	Year is modified by interval amount (not affected by leap year, etc.)
`QUARTER`	Month is modified by three times the interval amount, irrespective of the number of days in the months between; day adjusting performed the same as the `MONTH` description, but only on final month (e.g., Jan 31st + 1 quarter will be Apr 30th, not Apr 28th because of February)
`MONTH`	Month is modified by interval amount and date adjusted if overflow/underflow occurs; day adjusted to last day of calculated month if not a valid day for that month (e.g. Apr 31st -> Apr 30th)
`WEEK`	Day is modified by 7 times the interval amount (time not affected by daylight savings time, etc.); month & year are adjusted, if overflow/underflow occurs
`DAY`	Day is modified by interval amount (time not affected by daylight savings time, etc.); date is adjusted, if overflow/underflow occurs
`HOUR`	Hour is modified by interval amount (time not affected by daylight savings time, etc.); date is adjusted, if overflow/underflow occurs
`MINUTE`	Minute is modified by interval amount; date/time are adjusted, if overflow/underflow occurs
`SECOND`	Second is modified by interval amount; date/time are adjusted, if overflow/underflow occurs
`MILLISECOND`	Millisecond is modified by interval amount; date/time are adjusted, if overflow/underflow occurs
`FRAC_SECOND`	Nanosecond is modified by interval amount; date/time are adjusted, if overflow/underflow occurs Note Time is processed to millisecond precision, so any portion of an `amount` with finer granularity than 1,000,000 nanoseconds will be ignored (e.g., requesting the addition of 1,234,567 nanoseconds will result in 1 millisecond actually being added)

Note

Any of these unit types can have a SQL_TSI_ prefix prepended to them; e.g., both DAY and SQL_TSI_DAY are valid unit types for specifying a day interval.

Examples:

Function Call	Result
`TIMESTAMPADD(SQL_TSI_YEAR, 1, '2000-10-10')`	`2001-10-10`
`TIMESTAMPADD(SQL_TSI_QUARTER, 1, '2000-11-30')`	`2001-02-28`
`TIMESTAMPADD(SQL_TSI_MONTH, 1, '2000-01-31')`	`2000-02-29`
`TIMESTAMPADD(SQL_TSI_WEEK, 53, '2000-01-01')`	`2001-01-06`
`TIMESTAMPADD(SQL_TSI_DAY, 1, '2000-12-31')`	`2001-01-01`
`TIMESTAMPADD(SQL_TSI_HOUR, 12, '2000-10-10 12:34:56')`	`2000-10-11 00:34:56.000`
`TIMESTAMPADD(SQL_TSI_MINUTE, 1, '2000-10-10 12:34:56')`	`2000-10-10 12:35:56.000`
`TIMESTAMPADD(SQL_TSI_SECOND, 1, '2000-12-31 23:59:59')`	`2001-01-01 00:00:00.000`
`TIMESTAMPADD(SQL_TSI_MILLISECOND, 1, '2000-10-10 12:34:56')`	`2000-10-10 12:34:56.001`
`TIMESTAMPADD(SQL_TSI_FRAC_SECOND, 1000000, '2000-10-10 12:34:56')`	`2000-10-10 12:34:56.001`
`TIMESTAMPADD(SQL_TSI_SECOND, 1, TIME '12:34:56')`	`12:34:57.000`

TIMESTAMPDIFF(unit, begin, end)

Note

This is not symmetric with TIMESTAMPADD in all cases, as adding 1 MONTH to Mar 31st results in Apr 30th, but the TIMESTAMPDIFF in MONTH units between those two dates is 0.

Examples:

Function Call	Result
`TIMESTAMPDIFF(MONTH, DATETIME('2000-10-10 01:23:45.678'), DATETIME('2000-12-31 12:34:56.789'))`	`2`
`TIMESTAMPDIFF(MONTH, DATETIME('2000-03-31 01:23:45.678'), DATETIME('2000-04-30 12:34:56.789'))`	`0`
`TIMESTAMPDIFF(MONTH, DATETIME('2000-12-31 01:23:45.678'), DATETIME('2000-10-10 12:34:56.789'))`	`-2`
`TIMESTAMPDIFF(HOUR, DATETIME('2000-10-10 12:34:56.789'), TIMESTAMP(978222896678))`	`1955`

TIMESTAMP_TRUNC(part, expr) Alias for DATE_TRUNC(part, expr)

UNIX_TIMESTAMP(expr) Alias for SECS_SINCE_EPOCH(expr)

WEEK(expr) Extracts the week of the year from expr [1 - 54]; each full week starts on Sunday (a 1 is returned for the week containing Jan 1st)

YEAR(expr) Extracts the year from expr; 4-digit year, A.D.

Date/Time Complex Conversion Functions

Function Description

DATE_TO_EPOCH_MSECS(year, month, day, hour, min, sec, msec)

Converts the full date to milliseconds since the epoch

Example:

Function Call	`DATE_TO_EPOCH_MSECS(2017, 06, 15, 09, 22, 15, 42)`
Return	`1497518535042`
Resolves To	Thursday, June 15, 2017 9:22:15.042 AM

DATE_TO_EPOCH_SECS(year, month, day, hour, min, sec)

Converts the full date to seconds since the epoch

Example:

Function Call	`DATE_TO_EPOCH_SECS(2017, 06, 15, 09, 22, 15)`
Return	`1494926535`
Resolves To	Thursday, June 15, 2017 9:22:15.042 AM

TIMESTAMP_FROM_DATE_TIME(date, time)

Converts the given date and time to a composite date/time format

Example:

Function Call	`TIMESTAMP_FROM_DATE_TIME('2017-06-15', '10:37:30')`
Return	`2017-06-15 10:37:30.000`

WEEK_TO_EPOCH_MSECS(year, week_number)

Converts the year and week number to milliseconds since the epoch; negative values are accepted

Example:

Function Call	`WEEK_TO_EPOCH_MSECS(2017,-32)`
Return	`1463270400000`
Resolves To	Sunday, May 15, 2016 12:00:00 AM

WEEK_TO_EPOCH_SECS(year, week_number)

Converts the year and week number to seconds since the epoch. Negative values are accepted. Each new week begins Sunday at midnight.

Example:

Function Call	`WEEK_TO_EPOCH_SECS(2017,-32)`
Return	`1463270400`
Resolves To	Sunday, May 15, 2016 12:00:00 AM

Geospatial/Geometry Functions

Five types of geospatial functions are available in Kinetica:

Scalar Functions - apply a geospatial function at the record level to WKT or X/Y data
Enhanced Performance Scalar Functions - apply performance-optimized geospatial functions to X/Y data
Aggregate Functions - apply a geospatial function across groups of records to WKT or X/Y data
Track Functions - apply scalar & aggregate functions to track-based data
H3 Functions - apply H3 gridding functions to WKT or X/Y data

Tip

Use ST_ISVALID to determine if a geometry object is valid. The functions below work best with valid geometry objects.
Use the REMOVE_NULLABLE function to remove any nullable column types that could result from calculating a derived column (e.g., as in Projections) using one of the functions below.

Enhanced Performance Scalar Functions

The functions below all compare x and y coordinates to geometry objects (or vice versa), thus increasing their performance in queries. Each of these functions have a geometry-to-geometry version listed in the next section.

Function	Description
`STXY_CONTAINS(geom, x, y)`	Returns `1` (true) if `geom` contains the `x` and `y` coordinate, e.g. lies in the interior of `geom`. The coordinate cannot be on the boundary and also be contained because `geom` does not contain its boundary
`STXY_CONTAINSPROPERLY(geom, x, y)`	Returns `1` (true) if the `x` and `y` coordinate intersects the interior of `geom` but not the boundary (or exterior) because `geom` does not contain its boundary but does contain itself
`STXY_COVEREDBY(x, y, geom)`	Returns `1` (true) if the `x` and `y` coordinate is covered by `geom`
`STXY_COVERS(geom, x, y)`	Returns `1` (true) if `geom` covers the `x` and `y` coordinate
`STXY_DISJOINT(x, y, geom)`	Returns `1` (true) if the given `x` and `y` coordinate and the geometry `geom` do not spatially intersect.
`STXY_DISTANCE` `(x, y, geom[, solution])`	Calculates the minimum distance between the given `x` and `y` coordinate and `geom` using the specified solution type. Solution types available: `0` (default) - Euclidean; returns 2-D Euclidean distance `1` - Haversine; returns minimum sphere distance in meters `2` - Vincenty; returns minimum spheroid distance in meters, more accurate than Haversine but slower performance Note If the `x` and `y` coordinate and `geom` intersect (verify using `ST_INTERSECTS`), the distance will always be `0`.
`STXY_DWITHIN` `(x, y, geom, distance[, solution])`	Returns `1` (true) if the `x` and `y` coordinate is within the specified `distance` from `geom` using the specified solution type. Solution types available: `0` (default) - Euclidean; uses degrees to calculate distance `1` - Sphere; uses meters to calculate sphere distance `2` - Spheroid; uses meters to calculate spheroid distance
`STXY_ENVDWITHIN` `(x, y, geom, distance[, solution])`	Returns `1` (true) if the `x` and `y` coordinate is within the specified `distance` from the bounding box of `geom` using the specified solution type. Solution types available: `0` (default) - Euclidean; uses degrees to calculate distance `1` - Sphere; uses meters to calculate distance
`STXY_ENVINTERSECTS(x, y, geom)`	Returns `1` (true) if the bounding box of the given geometry `geom` intersects the `x` and `y` coordinate.
`STXY_GEOHASH(x, y[, precision])`	Returns a hash string representation of the given `x` and `y` coordinates with specified `precision` (the length of the resulting geohash string). The longer the `precision`, the more precise the hash is. By default, `precision` is set to `20`; the max for `precision` is `32`. See Geohashing for an example.
`STXY_H3(x, y, resolution)`	Alias for `H3_XYTOCELL`; see H3 Functions.
`STXY_INTERSECTION(x, y, geom)`	Returns the shared portion between the `x` and `y` coordinate and the given geometry `geom`, i.e. the point itself.
`STXY_INTERSECTS(x, y, geom)`	Returns `1` (true) if the `x` and `y` coordinate and `geom` intersect in 2-D.
`STXY_TOUCHES(x, y, geom)`	Returns `1` (true) if the `x` and `y` coordinate and geometry `geom` have at least one point in common but their interiors do not intersect. If `geom` is a GEOMETRYCOLLECTION, a `0` is returned regardless if the point and geometry touch
`STXY_WITHIN(x, y, geom)`	Returns `1` (true) if the `x` and `y` coordinate is completely inside the `geom` geometry i.e., not on the boundary

Scalar Functions

Function Description

DIST(x1, y1, x2, y2) Computes the Euclidean distance (in degrees), i.e. SQRT( (x1-x2)*(x1-x2) + (y1-y2)*(y1-y2) ).

GEODIST(lon1, lat1, lon2, lat2) Computes the geographic great-circle distance (in meters) between two lat/lon points.

GEOHASH_DECODE_LATITUDE(geohash) Decodes a given geohash and returns the latitude value for the given hash string. Supports a maximum geohash character length of 16.

GEOHASH_DECODE_LONGITUDE(geohash) Decodes a given geohash and returns the longitude value for the given hash string. Supports a maximum geohash character length of 16.

GEOHASH_ENCODE(lat, lon, precision) Encodes a given coordinate pair and returns a hash string with a given precision. The maximum precision is 15.

GEOMETRY(wkt) Alias for ST_GEOMFROMTEXT(wkt)

ST_ADDPOINT

(linestring, point[, position])

Adds a given point geometry to the given linestring geometry at the specified position, which is a 0-based index. If no position is specified, the point will be added to the end.

ST_ALMOSTEQUALS

(geom1, geom2[, decimal])

Returns 1 (true) if given geometries, geom1 and geom2, are almost spatially equal within the given amount of decimal scale. Note that geometries will still be considered equal if the decimal scale for the geometries is within a half order of magnitude of each other; e.g., if decimal is set to 2, then POINT(63.4 123.45) and POINT(63.4 123.454) are equal, but POINT(63.4 123.45) and POINT(63.4 123.459) are not equal. The geometry types must match to be considered equal.

If no decimal scale is specified, a default scale of 6 will be applied.

ST_AREA(geom[, solution])

Returns the area of the given geometry geom if it is a POLYGON or MULTIPOLYGON using the specified solution type. Returns 0 if the input geometry type is (MULTI)POINT or (MULTI)LINESTRING. Solution types available:

0 (default) - 2D Euclidean area
1 - curved surface area on a sphere in square meters
2 - curved surface area on a spheroid in square meters

ST_AZIMUTH(geom1, geom2) Returns the azimuth in radians defined by the segment between two POINTs, geom1 and geom2. Returns a null if the input geometry type is MULTIPOINT, (MULTI)LINESTRING, or (MULTI)POLYGON.

ST_BOUNDARY(geom) Returns the closure of the combinatorial boundary of a given geometry geom. Returns an empty geometry if geom is an empty geometry. Returns a null if geom is a GEOMETRYCOLLECTION

ST_BOUNDINGDIAGONAL(geom) Returns the diagonal of the given geometry's (geom) bounding box.

ST_BUFFER

(geom, radius[, style[, solution]])

Returns a geometry that represents all points whose distance from the given geometry geom is less than or equal to the given distance radius. The radius units can be specified by the solution type (default is in degrees) and the radius is created in the provided style. The style options are specified as a list of blank-separated key-value pairs, e.g., 'quad_segs=8 endcap=round'. If an empty style list ('') is provided, the default settings will be used. The style parameter must be specified to provide a solution type.

Available style options:

quad_segs - the number of segments used to approximate a quarter circle (default is 8)
endcap - the endcap style of the buffer (default is round); options are round, flat (or butt), and square
join - the join style of the buffer (default is round); options are round, mitre (or miter), and bevel
mitre_limit - the mitre ratio limit expressed as a floating-point number (miter_limit is also acceptable)

Available solution types:

0 (default) - 2D Euclidean radius distance in degrees
1 - curved surface radius distance on a sphere in meters
2 - curved surface radius distance on a spheroid in meters

Tip

To create a 5-meter buffer around geom using the default styles: ST_BUFFER(geom, 5, '', 1). To create a 5-foot (converting feet to meters) buffer around geom using the following styles: ST_BUFFER(geom, 5*0.3048,'quad_segs=4 endcap=flat', 1)

ST_BUFFERBYCOMPONENT

(geom, radius[, style[, solution]])

Returns a buffered geometry similar to the output of ST_BUFFER using the same parameters. The only difference is the buffered geometry is calculated by independently buffering each individual component and then the buffered components are dissolved (i.e. unioned) together to produce the final output. This can produce very similar (but not identical) results to ST_BUFFER but will often run much faster.

ST_CENTROID(geom) Calculates the center of the given geometry geom as a POINT. For (MULTI)POINTs, the center is calculated as the average of the input coordinates. For (MULTI)LINESTRINGs, the center is calculated as the weighted length of each given LINESTRING. For (MULTI)POLYGONs, the center is calculated as the weighted area of each given POLYGON. If geom is an empty geometry, an empty GEOMETRYCOLLECTION is returned

ST_CLIP(geom1, geom2) Returns the geometry shared between given geometries geom1 and geom2

ST_CLOSESTPOINT

(geom1, geom2[, solution])

Calculates the 2-D POINT in geom1 that is closest to geom2 using the specified solution type. If geom1 or geom2 is empty, a null is returned. Solution types available:

0 (default) - Euclidean; calculates the closest point using 2-D Euclidean distance
1 - Haversine; calculates the closest point using sphere distance in meters
2 - Vincenty; returns minimum spheroid distance in meters, more accurate than Haversine but slower performance

ST_COLLECT(geom1, geom2) Returns a MULTI* or GEOMETRYCOLLECTION comprising geom1 and geom2. If geom1 and geom2 are the same, singular geometry type, a MULTI* is returned, e.g., if geom1 and geom2 are both POINTs (empty or no), a MULTIPOINT is returned. If geom1 and geom2 are neither the same type nor singular geometries, a GEOMETRYCOLLECTION is returned.

ST_COLLECTIONEXTRACT

(collection, type)

Returns only the specified type from the given geometry collection. Type is a number that maps to the following:

1 = POINT
2 = LINESTRING
3 = POLYGON

ST_COLLECTIONHOMOGENIZE(collection) Returns the simplest form of the given collection, e.g., a collection with a single POINT will be returned as POINT(x y), and a collection with multiple individual points will be returned as a MULTIPOINT.

ST_CONCAVEHULL

(geom, target_percent[, allow_holes])

Returns a potentially concave geometry that encloses all geometries found in the given geom set. Use target_percent (values between 0 and 1) to determine the percent of area of a convex hull the concave hull will attempt to fill; 1 will return the same geometry as an ST_CONVEXHULL operation. Set allow_holes to 1 (true) to allow holes in the resulting geometry; default value is 0 (false). Note that allow_holes is independent of the area of target_percent.

ST_CONTAINS(geom1, geom2) Returns 1 (true) if no points of geom2 lie in the exterior of geom1 and at least one point of geom2 lies in the interior of geom1. Note that geom1 does not contain its boundary but does contain itself.

ST_CONTAINSPROPERLY(geom1, geom2) Returns 1 (true) if geom2 intersects the interior of geom1 but not the boundary (or exterior). Note that geom1 does not contain its boundary but does contain itself.

ST_CONVEXHULL(geom) Returns the minimum convex geometry that encloses all geometries in the given geom set.

ST_COORDDIM(geom) Returns the coordinate dimension of the given geom, e.g., a geometry with x, y, and z coordinates would return 3.

ST_COVEREDBY(geom1, geom2) Returns 1 (true) if no point in geom1 is outside geom2.

ST_COVERS(geom1, geom2) Returns 1 (true) if no point in geom2 is outside geom1.

ST_CROSSES(geom1, geom2) Returns 1 (true) if the given geometries, geom1 and geom2, spatially cross, meaning some but not all interior points in common. If geom1 and/or geom2 are a GEOMETRYCOLLECTION, a 0 is returned regardless of if the two geometries cross

ST_DIFFERENCE(geom1, geom2) Returns a geometry that represents the part of geom1 that does not intersect with geom2.

ST_DIMENSION(geom) Returns the dimension of the given geometry geom, which is less than or equal to the coordinate dimension. If geom is a single geometry, a 0 is for POINT, a 1 is for LINESTRING, and a 2 is for POLYGON. If geom is a collection, it will return the largest dimension from the collection. If geom is empty, 0 is returned.

ST_DISJOINT(geom1, geom2) Returns 1 (true) if the given geometries, geom1 and geom2, do not spatially intersect.

ST_DISTANCE(geom1, geom2[, solution])

Calculates the minimum distance between the given geometries, geom1 and geom2, using the specified solution type. Solution types available:

0 (default) - Euclidean; returns 2-D Euclidean distance
1 - Haversine; returns minimum sphere distance in meters
2 - Vincenty; returns minimum spheroid distance in meters, more accurate than Haversine but slower performance

Note

If geom1 and geom2 intersect (verify using ST_INTERSECTS), the distance will always be 0.

ST_DISTANCEPOINTS

(x1, y1, x2, y2[, solution])

Calculates the minimum distance between the given points, x1, y1 and x2, y2, using the specified solution type. Solution types available:

0 (default) - Euclidean; returns 2-D Euclidean distance
1 - Haversine; returns minimum sphere distance in meters
2 - Vincenty; returns minimum spheroid distance in meters, more accurate than Haversine but slower performance

ST_DFULLYWITHIN

(geom1, geom2, distance[, solution])

Returns 1 (true) if the maximum distance between geometries geom1 and geom2 is less than or equal to the specified distance of each other using the specified solution type. If geom1 or geom2 is null, 0 (false) is returned. Solution types available:

0 (default) - Euclidean; uses degrees to calculate distance
1 - Sphere; uses meters to calculate distance
2 - Spheroid; uses meters to calculate distance, more accurate than sphere but slower performance

ST_DWITHIN

(geom1, geom2, distance[, solution])

Returns 1 (true) if the minimum distance between geometries geom1 and geom2 is within the specified distance of each other using the specified solution type. Solution types available:

0 (default) - Euclidean; uses degrees to calculate distance
1 - Sphere; uses meters to calculate distance
2 - Spheroid; uses meters to calculate distance, more accurate than sphere but slower performance

ST_ELLIPSE(x, y, height, width)

Returns an ellipse using the following values:

x - the x coordinate or longitude used to center the ellipse
y - the y coordinate or latitude used to center the ellipse
height - the height of the ellipse (in degrees)
width - the width of the ellipse (in degrees)

ST_ENDPOINT(geom) Returns the last point of the given geom as a POINT if it's a LINESTRING. If geom is not a LINESTRING, null is returned.

ST_ENVDWITHIN

(geom1, geom2, distance[, solution])

Returns 1 (true) if geom1 is within the specified distance of the bounding box of geom2 using the specified solution type. Solution types available:

0 (default) - Euclidean; uses degrees to calculate distance
1 - Sphere; uses meters to calculate distance

ST_ENVELOPE(geom) Returns the bounding box of a given geometry geom.

ST_ENVINTERSECTS(geom1, geom2) Returns 1 (true) if the bounding box of the given geometries, geom1 and geom2, intersect.

ST_EQUALS(geom1, geom2) Returns 1 (true) if the given geometries, geom1 and geom2, are spatially equal. Note that order does not matter.

ST_EQUALSEXACT

(geom1, geom2[, tolerance])

Returns 1 (true) if the given geometries, geom1 and geom2, are almost spatially equal within some given tolerance. If the values within the given geometries are within the tolerance value of each other, they're considered equal, e.g., if tolerance is 2, POINT(1 1) and POINT(1 3) are considered equal, but POINT(1 1) and POINT(1 3.1) are not. Note that the geometry types have to match for them to be considered equal. The default tolerance is 0, which makes this function effectively equivalent to ST_EQUALS(geom1, geom2) in the default case.

ST_ERASE(geom1, geom2) Returns the result of erasing a portion of geom1 equal to the size of geom2.

ST_EXPAND(geom, units) Returns the bounding box expanded in all directions by the given units of the given geom. The expansion can also be defined for separate directions by providing separate parameters for each direction, e.g., ST_EXPAND(geom, unitsx, unitsy, unitsz, unitsm).

ST_EXPANDBYRATE(geom, rate) Returns the bounding box expanded by a given rate (a ratio of width and height) for the given geometry geom. The rate must be between 0 and 1.

ST_EXTERIORRING(geom) Returns a LINESTRING representing the exterior ring of the given POLYGON geom

ST_FORCE2D(geom) Returns the 2-dimensional version (e.g., X and Y coordinates) of geom, the provided geometry or set of geometries (e.g., via GEOMETRYCOLLECTION or WKT column name).

ST_FORCE3D(geom[, z])

Returns the 3-dimensional version (e.g., X, Y, and Z coordinates) of geom, a provided geometry or set of geometries (e.g., via GEOMETRYCOLLECTION or WKT column name), using z as the geometry's new z-value. The provided z-values can also be derived from a numeric column. If no z is provided, a 0 will be applied.

Note

If a WKT column is provided for geom and a numeric column is provided for z, the z values will be matched to the provided geometries by row in the source table. If a singular geometry is provided for geom and a column is provided for z, three-dimensional versions of the provided geometry will be returned for each z value found in the provided z column. If columns are provided for both geom and z and nulls are present in either column, the row containing null values will be skipped in the results.

ST_GENERATEPOINTS(geom, num) Creates a MULTIPOINT containing a number num of randomly generated points within the boundary of geom.

ST_GEOHASH(geom[, precision])

Returns a hash string representation of the given geometry geom with specified precision (the length of the resulting geohash string). The longer the precision, the more precise the hash is. By default, precision is set to 20; the max for precision is 32. Returns null if geom is an empty geometry.

See Geohashing for an example.

Note

The value returned will not be a geohash of the exact geometry but a geohash of the centroid of the given geometry

ST_GEOMETRYFROMTEXT(wkt) Alias for ST_GEOMFROMTEXT(wkt)

ST_GEOMETRYN(geom, index) Returns the index geometry back from the given geom geometry. The index starts from 1 and goes to the number of geometries in geom.

ST_GEOMETRYTYPE(geom) Returns the type of geometry from the given geom.

ST_GEOMETRYTYPEID(geom)

Returns the type ID of from geom. Type and ID mappings:

POINT = 0
LINESTRING = 1
POLYGON = 3
MULTIPOINT = 4
MULTILINESTRING = 5
MULTIPOLYGON = 6
GEOMETRYCOLLECTION = 7

ST_GEOMFROMGEOHASH

(geohash[, precision])

Returns a POLYGON boundary box using the given geohash with a precision set by the integer precision. If precision is specified, the function will use as many characters in the hash equal to precision to create the geometry. If no precision is specified, the full length of the geohash is used.

See Geohashing for an example.

ST_GEOMFROMH3(h3_index) Alias for H3_CELLTOBOUNDARY; see H3 Functions.

ST_GEOMFROMTEXT(wkt) Returns a geometry from the given Well-Known text representation wkt. Note that this function is only compatible with constants.

ST_H3(wkt, resolution) Alias for H3_GEOMTOCELL; see H3 Functions.

ST_HEXGRID

(xmin, ymin, xmax, ymax,

cell_side[, limit])

Creates a MULTIPOLYGON containing a grid of hexagons between given minimum and maximum points of a bounding box. The minimum point cannot be greater than or equal to the maximum point. The size (in meters) of the individual hexagons' sides is determined by cell_side. The cell_side cannot be greater than the width or height of the bounding box. The maximum number of cells that can be produced is determined by limit, a positive integer. Supported values for limit:

-1 - No limit to the number of cells generated (effectively limited by system memory)
0 (default) - 100 million cells
<n> - Custom limit of n cells

If the custom limit request specifies more cells (based on the bounding box and the cell_side) than the system limit, a null is returned.

ST_INTERIORRINGN(geom, n) Returns the n-th interior LINESTRING ring of the POLYGON geom. If geom is not a POLYGON or the given n is out of range, a null is returned. The index begins at 1

ST_INTERSECTION(geom1, geom2) Returns the shared portion between given geometries geom1 and geom2

ST_INTERSECTS(geom1, geom2) Returns 1 (true) if the given geometries, geom1 and geom2, intersect in 2-D

ST_ISCLOSED(geom) Returns 1 (true) if the given geometry's (geom) start and end points coincide

ST_ISCOLLECTION(geom) Returns 1 (true) if geom is a collection, e.g., GEOMETRYCOLLECTION, MULTIPOINT, MULTILINESTRING, etc.

ST_ISEMPTY(geom) Returns 1 (true) if geom is empty

ST_ISRING(geom) Returns 1 (true) if LINESTRING geom is both closed (per ST_ISCLOSED) and "simple" (per ST_ISSIMPLE). Returns 0 if geom is not a LINESTRING

ST_ISSIMPLE(geom) Returns 1 (true) if geom has no anomalous geometric points, e.g., self-intersection or self-tangency

ST_ISVALID(geom) Returns 1 (true) if geom (typically a [MULTI]POLYGON) is well formed. A POLYGON is valid if its rings do not cross, and its boundary intersects only at POINTs (not along a line). The POLYGON must also not have dangling LINESTRINGs. A MULTIPOLYGON is valid if all of its elements are also valid, and the interior rings of those elements do not intersect. Each element's boundaries may touch but only at POINTs (not along a line). ST_MAKEVALID(geom) can be used to help correct invalid geometries.

ST_ISVALIDREASON(geom)

Returns Valid Geometry if geom is well formed, according to ST_ISVALID(geom); otherwise, returns the reason geom is determined to be malformed. ST_MAKEVALID(geom) can be used to help correct invalid geometries.

Example:

Function Call	`ST_ISVALIDREASON('POLYGON((-1 0, 1 0, 1 1, -1 -1))')`
Return	`Self-intersection[0 0]`

ST_LENGTH(geom[, solution])

Returns the length of the geometry if it is a LINESTRING or MULTILINESTRING. Returns 0 if another type of geometry, e.g., POINT, MULTIPOINT, etc. GEOMETRYCOLLECTIONs are also supported but the aforementioned type limitation still applies; the collection will be recursively searched for LINESTRINGs and MULTILINESTRINGs and the summation of all supported geometry types is returned (unsupported types are ignored). Solution types available:

0 (default) - 2D Euclidean length
1 - length on a sphere in meters
2 - length on a spheroid in meters

ST_LINEFROMMULTIPOINT(geom) Creates a LINESTRING from geom if it is a MULTIPOINT. Returns null if geom is not a MULTIPOINT

ST_LINEINTERPOLATEPOINT(geom, frac) Returns a POINT on the LINESTRING geom that is the frac fraction of the distance along the line. If geom is either empty or not a LINESTRING, null is returned

ST_LINELOCATEPOINT(linestring, point) Returns the location of the closest point in the given linestring to the given point as a value between 0 and 1. The return value is a fraction of the total linestring length.

ST_LINEMERGE(geom) Returns a LINESTRING or MULTILINESTRING from a given geom. If geom is a MULTILINESTRING comprising LINESTRINGs with shared endpoints, a contiguous LINESTRING is returned. If geom is a LINESTRING or a MULTILINESTRING comprising LINESTRINGS without shared endpoints, geom is returned If geom is an empty (MULTI)LINESTRING or a (MULTI)POINT or (MULTI)POLYGON, an empty GEOMETRYCOLLECTION is returned.

ST_LINESUBSTRING

(geom, start_frac, end_frac)

Returns the fraction of a given geom LINESTRING from the point that is the start_frac fraction of the distance along the line to the point that is the end_frac fraction of the distance along the line.

For example, given LINESTRING(1 1, 2 2, 3 3) a start_fraction of 0 and an end_fraction of 0.25 would yield the first quarter of the given LINESTRING, or LINESTRING(1 1, 1.5 1.5).

Returns null in the following cases:

input geometry is (MULTI)POINT, MULTILINESTRING, or (MULTI)POLYGON
start_frac is greater than end_frac
start_frac or end_frac are not between 0 & 1, inclusive

ST_LONGESTLINE

(geom1, geom2[, solution])

Returns the LINESTRING that represents the longest line of points between the two geometries. If multiple longest lines are found, only the first line found is returned. If geom1 or geom2 is empty, null is returned. Solution types available:

0 (default) - Euclidean; uses degrees to calculate the longest line
1 - Sphere; uses meters to calculate the longest line
2 - Spheroid; uses meters to calculate the longest line, more accurate than sphere but slower performance

ST_MAKEENVELOPE

(xmin, ymin, xmax, ymax)

Creates a rectangular POLYGON from the given min and max parameters

ST_MAKELINE(geom[, geom2])

Creates a LINESTRING from geom if it is a MULTIPOINT. If geom is a POINT, there must be at least one other POINT to construct a LINESTRING. If geom is a LINESTRING, it must have at least two points. Returns null if geom is not a POINT, MULTIPOINT, or LINESTRING

Note

This function can be rather costly in terms of performance

ST_MAKEPOINT(x, y)

Creates a POINT at the given coordinate

Note

This function can be rather costly in terms of performance

ST_MAKEPOLYGON(geom)

Creates a POLYGON from geom. Inputs must be closed LINESTRINGs

Note

This function can be rather costly in terms of performance

ST_MAKETRIANGLE2D

(x1, y1, x2, y2, x3, y3)

Creates a closed 2-D POLYGON with three vertices

ST_MAKETRIANGLE3D

(x1, y1, z1, x2, y2, z2, x3, y3, z3)

Creates a closed 3-D POLYGON with three vertices

ST_MAKEVALID(geom[, options])

Attempts to convert geom into a valid geometry when it is malformed, as determined by ST_ISVALID(geom). Returns geom if it is a valid geometry already. The method used to convert invalid geometries into valid ones can be specified in options as a space-separated string of x=y key/value pairs. The keys and corresponding values are as follows:

method - the algorithm used to convert invalid geometries into valid ones; either:
- linework (default) - build geometry from lines extracted from geom
- structure - build geometry from interior & exterior rings extracted from geom
keepcollapsed - if using the method of structure, whether to drop portions of the converted geometry that collapse to lower dimensions:
- true (default) - keep portions of geometry that collapse to lower dimensions
- false - don't keep portions of geometry that collapse to lower dimensions

Example using default linework method:

Function Call	`ST_MAKEVALID('POLYGON((-1 0, 1 0, 1 1, -1 -1))')`
Return	`MULTIPOLYGON (((-1 -1, -1 0, 0 0, -1 -1)), ((1 0, 0 0, 1 1, 1 0)))`

Example using the structure method without dropping collapsible parts of the converted geometry:

Function Call	`ST_MAKEVALID('POLYGON((0 0, 0 0, 0 0, 0 0))', 'method=structure keepcollapsed=true')`
Return	`POINT (0 0)`

Example using the structure method with dropping collapsible parts of the converted geometry:

Function Call	`ST_MAKEVALID('POLYGON((0 0, 0 0, 0 0, 0 0))', 'method=structure keepcollapsed=false')`
Return	`POLYGON EMPTY`

ST_MAXDISTANCE

(geom1, geom2[, solution])

Returns the maximum distance between the given geom1 and geom2 geometries using the specified solution type. If geom1 or geom2 is empty, null is returned. Solution types available:

0 (default) - returns maximum 2-D Euclidean distance
1 - Sphere; returns maximum distance in meters
2 - Spheroid; returns maximum distance in meters, more accurate than sphere but slower performance

ST_MAXX(geom) Returns the maximum x coordinate of a bounding box for the given geom geometry. This function works for 2-D and 3-D geometries.

ST_MAXY(geom) Returns the maximum y coordinate of a bounding box for the given geom geometry. This function works for 2-D and 3-D geometries.

ST_MAXZ(geom) Returns the maximum z coordinate of a bounding box for the given geom geometry. This function works for 2-D and 3-D geometries.

ST_MINX(geom) Returns the minimum x coordinate of a bounding box for the given geom geometry. This function works for 2-D and 3-D geometries.

ST_MINY(geom) Returns the minimum y coordinate of a bounding box for the given geom geometry. This function works for 2-D and 3-D geometries.

ST_MINZ(geom) Returns the minimum z coordinate of a bounding box for the given geom geometry. This function works for 2-D and 3-D geometries.

ST_MULTI(geom) Returns geom as a MULTI- geometry, e.g., a POINT would return a MULTIPOINT.

ST_MULTIPLERINGBUFFERS

(geom, distance[, outside])

Creates multiple buffers at specified distance around the given geom geometry. Multiple distances are specified as comma-separated values in an array, e.g., [10,20,30]. Valid values for outside are:

FULL - indicates that buffers will overlap or cover the given geom geometry. This is the default.
OUTSIDE_ONLY - indicates that buffers will be rings around the given geom geometry.

ST_NDIMS(geom) Returns the number of dimensions in geom. For X,Y data, this will return 2; if a Z component is present, it will return 3.

ST_NEAR(geom1, geom2) Returns the portion of geom2 that is closest to geom1. If geom2 is a singular geometry object (e.g., POINT, LINESTRING, POLYGON), geom2 will be returned. If geom2 a multi-geometry, e.g., MULTIPOINT, MULTILINESTRING, etc., the nearest singular geometry in geom2 will be returned.

ST_NORMALIZE(geom) Returns geom in its normalized (canonical) form, which may rearrange the points in lexicographical order.

ST_NPOINTS(geom) Returns the number of points (vertices) in geom.

ST_NRINGS(geom) Returns the total number of rings (including interior rings) in geom. For non-polygonal geometries, it will return 0. For MULTIPOLYGONs, it will return the total number of rings across all components.

ST_NUMGEOMETRIES(geom) If geom is a collection or MULTI- geometry, returns the number of geometries. If geom is a single geometry, returns 1.

ST_NUMINTERIORRINGS(geom) Returns the number of interior rings if geom is a POLYGON. Returns null if geom is anything else.

ST_NUMPOINTS(geom) Returns the number of points in the geom LINESTRING. Returns null if geom is not a LINESTRING.

ST_OVERLAPS(geom1, geom2) Returns 1 (true) if given geometries geom1 and geom2 share space. If geom1 and/or geom2 are a GEOMETRYCOLLECTION, a 0 is returned regardless of if the two geometries overlap

ST_PARTITION(geom[, threshold]) Returns a MULTIPOLYGON representing the given geom partitioned into a number of POLYGONs with a maximum number of vertices equal to the given threshold. Minimum value for threshold is 10; default value is 10000. If geom is not a POLYGON or MULTIPOLYGON, geom is returned. If the number of vertices in geom is less than the threshold, geom is returned.

ST_PERIMETER(geom[, solution])

Returns the perimeter of the geometry if it is a POLYGON or MULTIPOLYGON. Returns 0 if another type of geometry, e.g., POINT, MULTIPOINT, LINESTRING, or MULTILINESTRING. GEOMETRYCOLLECTIONs are also supported but the aforementioned type limitation still applies; the collection will be recursively searched for POLYGONs and MULTIPOLYGONs and the summation of all supported geometry types is returned (unsupported types are ignored). Solution types available:

0 (default) - 2D Euclidean length
1 - length on a sphere in meters
2 - length on a spheroid in meters

ST_POINT(x, y) Returns a POINT with the given x and y coordinates.

ST_POINTFROMGEOHASH

(geohash[, precision])

Returns a POINT using the given geohash with a precision set by the integer precision. If precision is specified, the function will use as many characters in the hash equal to precision to create the geometry. If no precision is specified, the full length of the geohash is used.

Note

The POINT returned represents the center of the bounding box of the geohash

ST_POINTGRID

(xmin, ymin, xmax, ymax,

cell_side[, limit])

Creates a MULTIPOLYGON containing a square-shaped grid of points between given minimum and maximum points of a bounding box. The minimum point cannot be greater than or equal to the maximum point. The distance between the points (in meters) is determined by cell_side. The cell_side cannot be greater than the width or height of the bounding box. The maximum number of cells that can be produced is determined by limit, a positive integer. Supported values for limit:

-1 - No limit to the number of cells generated (effectively limited by system memory)
0 (default) - 100 million cells
<n> - Custom limit of n cells

If the custom limit request specifies more cells (based on the bounding box and the cell_side) than the system limit, a null is returned.

ST_POINTN(geom, n) Returns the n-th point in LINESTRING geom. Negative values are valid, but note that they are counted backwards from the end of geom. A null is returned if geom is not a LINESTRING.

ST_POINTS(geom) Returns a MULTIPOINT containing all of the coordinates of geom.

ST_PROJECT(geom, distance, azimuth) Returns a POINT projected from a start point geom along a geodesic calculated using distance and azimuth. If geom is not a POINT, null is returned.

ST_REMOVEPOINT(geom, offset) Remove a point from LINESTRING geom using offset to skip over POINTs in the LINESTRING. The offset is 0-based.

ST_REMOVEREPEATEDPOINTS

(geom, tolerance)

Removes points from geom if the point's vertices are greater than or equal to the tolerance of the previous point in the geometry's list. If geom is not a MULTIPOINT, MULTILINESTRING, or a MULTIPOLYGON, no points will be removed.

ST_REVERSE(geom) Return the geometry with its coordinate order reversed.

ST_ROTATE(geom, radians[, x, y])

ST_ROTATE(geom, radians[, wkt])

Rotates geom counter-clockwise by radians radians. Optionally, the rotation origin can be provided as either a coordinate pair (x & y) or WKT POINT (wkt). If not provided, geom will be rotated around (0, 0).

ST_SCALE(geom, x, y)

ST_SCALE(geom, wkt)

Scales geom by multiplying its respective vertices by either the given x & y values or the corresponding x, y values in the given WKT POINT.

Example using the 3-parameter (x, y) version:

Function Call	`ST_SCALE('POLYGON((1 2, -2 1, -1 -2, 2 -1, 1 2))', 3, 5)`
Return	`POLYGON ((3 10, -6 5, -3 -10, 6 -5, 3 10))`

Example using the 2-parameter (wkt) version:

Function Call	`ST_SCALE('POLYGON((1 2, -2 1, -1 -2, 2 -1, 1 2))', GEOMETRY('POINT(3 5)'))`
Return	`POLYGON ((3 10, -6 5, -3 -10, 6 -5, 3 10))`

ST_SEGMENTIZE

(geom, max_segment_size[, solution])

Returns the given geom, but segmentized n number of times depending on how the max_segment_size distance (in units based on the solution type) divides up the original geometry. The new geom is guaranteed to have segments that are smaller than the given max_segment_size. Note that POINTs are not able to be segmentized. Collection geometries (GEOMETRYCOLLECTION, MULTILINESTRING, MULTIPOINT, etc.) can be segmentized, but only the individual parts will be segmentized, not the collection as a whole. Solution types available:

0 - Euclidean; uses degrees to calculate distance
1 (default) - Sphere; uses meters to calculate distance

ST_SETPOINT(geom1, position, geom2) Replace a point of LINESTRING geom1 with POINT geom2 at position (base 0). Negative values are valid, but note that they are counted backwards from the end of geom.

ST_SHAREDPATH(geom1, geom2) Returns a collection containing paths shared by geom1 and geom2.

ST_SHORTESTLINE(geom1, geom2) Returns the 2-D LINESTRING that represents the shortest line of points between the two geometries. If multiple shortest lines are found, only the first line found is returned. If geom1 or geom2 is empty, null is returned

ST_SIMPLIFY(geom, tolerance)

Returns a simplified version of the given geom using an algorithm to reduce the number of points comprising a given geometry while attempting to best retain the original shape. The given tolerance determines how much to simplify the geometry. The higher the tolerance, the more simplified the returned geometry. Some holes might be removed and some invalid polygons (e.g., self-intersecting, etc.) might be present in the returned geometry. Only (MULTI)LINESTRINGs and (MULTI)POLYGONs can be simplified, including those found within GEOMETRYCOLLECTIONs; any other geometry objects will be returned unsimplified.

Note

The tolerance should be provided in the same units as the data. As a rule of thumb, a tolerance of 0.00001 would correspond to about one meter.

ST_SIMPLIFYPRESERVETOPOLOGY

(geom, tolerance)

Returns a simplified version of the given geom using an algorithm to reduce the number of points comprising a given geometry while attempting to best retain the original shape. The given tolerance determines how much to simplify the geometry. The higher the tolerance, the more simplified the returned geometry. No holes will be removed and no invalid polygons (e.g., self-intersecting, etc.) will be present in the returned geometry. Only (MULTI)LINESTRINGs and (MULTI)POLYGONs can be simplified, including those found within GEOMETRYCOLLECTIONs; any other geometry objects will be returned unsimplified.

Note

The tolerance should be provided in the same units as the data. As a rule of thumb, a tolerance of 0.00001 would correspond to about one meter.

ST_SNAP(geom1, geom2, tolerance) Snaps geom1 to geom2 within the given tolerance. If the tolerance causes geom1 to not snap, the geometries will be returned unchanged.

ST_SPLIT(geom1, geom2) Returns a collection of geometries resulting from the split between geom1 and geom2 geometries.

ST_SQUAREGRID

(xmin, ymin, xmax, ymax,

cell_side[, limit])

Creates a MULTIPOLYGON containing a grid of squares between given minimum and maximum points of a bounding box. The minimum point cannot be greater than or equal to the maximum point. The size (in meters) of the individual squares' sides is determined by cell_side. The cell_side cannot be greater than the width or height of the bounding box. The maximum number of cells that can be produced is determined by limit, a positive integer. Supported values for limit:

-1 - No limit to the number of cells generated (effectively limited by system memory)
0 (default) - 100 million cells
<n> - Custom limit of n cells

If the custom limit request specifies more cells (based on the bounding box and the cell_side) than the system limit, a null is returned.

ST_STARTPOINT(geom) Returns the first point of LINESTRING geom as a POINT. Returns null if geom is not a LINESTRING.

ST_SYMDIFFERENCE(geom1, geom2) Returns a geometry that represents the portions of geom1 and geom2 geometries that do not intersect.

ST_TOUCHES(geom1, geom2) Returns 1 (true) if the given geometries, geom1 and geom2, have at least one point in common but their interiors do not intersect. If geom1 and/or geom2 are a GEOMETRYCOLLECTION, a 0 is returned regardless of if the two geometries touch

ST_TRANSLATE

(geom, deltax, deltay[, deltaz])

Translate geom by given offsets deltax and deltay. A z-coordinate offset can be applied using deltaz. intersect.

ST_TRIANGLEGRID

(xmin, ymin, xmax, ymax,

cell_side[, limit])

Creates a MULTIPOLYGON containing a grid of triangles between given minimum and maximum points of a bounding box. The minimum point cannot be greater than or equal to the maximum point. The size (in meters) of the individual triangles' sides is determined by cell_side. The cell_side cannot be greater than the width or height of the bounding box. The maximum number of cells that can be produced is determined by limit, a positive integer. Supported values for limit:

-1 - No limit to the number of cells generated (effectively limited by system memory)
0 (default) - 100 million cells
<n> - Custom limit of n cells

If the custom limit request specifies more cells (based on the bounding box and the cell_side) than the system limit, a null is returned.

ST_UNION(geom1, geom2) Returns a geometry that represents the point set union of the two given geometries, geom1 and geom2.

ST_UNIONCOLLECTION(geom) Returns a geometry that represents the point set union of a single given geometry geom.

ST_UPDATE(geom1, geom2) Returns a geometry that is geom1 geometry updated by geom2 geometry

ST_VORONOIPOLYGONS(geom[, tolerance])

Returns a GEOMETRYCOLLECTION containing Voronoi polygons (regions consisting of points closer to a vertex in geom than any other vertices in geom) calculated from the vertices in geom and the given tolerance. The tolerance determines the distance at which points will be considered the same. An empty GEOMETRYCOLLECTION is returned if geom is an empty geometry, a single POINT, or a LINESTRING or POLYGON composed of equivalent vertices (e.g., POLYGON((0 0, 0 0, 0 0, 0 0)), LINESTRING(0 0, 0 0)).

If no tolerance is specified, no vertices will be considered the same; each will have its own polygon.

The bounding box for the result POLYGONs extends past the four edges of the input geom bounding box by an amount that is the greater of the input bounding box's height and width. For instance, an input geom with a 3 x 4 bounding box will result in Voronoi polygons filling a space that is 11 x 12.

ST_WITHIN(geom1, geom2) Returns 1 (true) if the geom1 geometry is inside the geom2 geometry. Note that as long as at least one point is inside of geom2, geom1 is considered within geom2 even if the rest of the geom1 lies along the boundary of geom2

ST_WKBTOWKT(geom) Returns the text form (WKT) of a geometry from the given byte form (WKB)

ST_WKTTOWKB(geom) Returns the byte form (WKB) of a geometry from the given text form (WKT)

ST_X(geom) Returns the X coordinate of the POINT geom; if the coordinate is not available, null is returned. geom must be a POINT.

ST_XMAX(geom) Alias for ST_MAXX()

ST_XMIN(geom) Alias for ST_MINX()

ST_Y(geom) Returns the Y coordinate of the POINT geom; if the coordinate is not available, null is returned. geom must be a POINT.

ST_YMAX(geom) Alias for ST_MAXY()

ST_YMIN(geom) Alias for ST_MINY()

ST_ZMAX(geom) Alias for ST_MAXZ()

ST_ZMIN(geom) Alias for ST_MINZ()

Aggregation Functions

Function	Description
`ST_AGGREGATE_COLLECT(geom)`	Alias for `ST_COLLECT_AGGREGATE()`
`ST_AGGREGATE_INTERSECTION(geom)`	Alias for `ST_INTERSECTION_AGGREGATE()`
`ST_COLLECT_AGGREGATE(geom)`	Returns a GEOMETRYCOLLECTION comprising all geometries found in the `geom` set. Any MULTI* geometries will be divided into separate singular geometries, e.g., MULTIPOINT((0 0), (1 1)) would be divided into POINT(0 0) and POINT(1 1) in the results; the same is true for elements of a GEOMETRYCOLLECTION found in `geom`, where a GEOMETRYCOLLECTION within the provided `geom` set will also be parsed, effectively flattening it and adding the individual geometries to the resulting GEOMETRYCOLLECTION. Any empty geometries in `geom` are ignored even if they are part of a GEOMETRYCOLLECTION. Any duplicate WKTs will be retained.
`ST_DISSOLVE(geom)`	Dissolves all geometries within a given set into a single geometry. Note that the resulting single geometry can still be a group of noncontiguous geometries but represented as a single group, e.g., a GEOMETRYCOLLECTION. Best performance when used in conjunction with adjacent geometries.
`ST_DISSOLVEOVERLAPPING(geom)`	Dissolves all geometries within a given set into a single geometry. Note that the resulting single geometry can still be a group of noncontiguous geometries but represented as a single group, e.g., a GEOMETRYCOLLECTION. Best performance when used in conjunction with overlapping geometries.
`ST_INTERSECTION_AGGREGATE(geom)`	Returns a POLYGON or MULTIPOLYGON comprising the shared portion between all geometries found in the `geom` set. Returns an empty GEOMETRYCOLLECTION if there is no shared portion between all geometries. Functionally equivalent to `ST_INTERSECTION(ST_INTERSECTION(geom1, geom2), ... geomN)`.
`ST_LINESTRINGFROMORDEREDPOINTS(x, y, t)`	Returns a LINESTRING that represents a "track" of the given points (`x`, `y`) ordered by the given sort column `t` (e.g., a timestamp or sequence number). If any of the values in the specified columns are `null`, the null "point" will be left out of the resulting LINESTRING. If there's only one non-null "point" in the source table, a POINT is returned. If there are no non-null "points" in the source table, a `null` is returned.
`ST_LINESTRINGFROMORDEREDPOINTS3D(x, y, z, t)`	Returns a LINESTRING that represents a "track" of the given 3D points (`x`, `y`, `z`) ordered by the given sort column `t` (e.g., a timestamp or sequence number). If any of the values in the specified columns are `null`, the null "point" will be left out of the resulting LINESTRING. If there's only one non-null "point" in the source table, a POINT is returned. If there are no non-null "points" in the source table, a `null` is returned.
`ST_POLYGONIZE(geom)`	Returns a GEOMETRYCOLLECTION containing POLYGONs comprising the provided (MULTI)LINESTRING(s). (MULTI)POINT and (MULTI)POLYGON geometries are ignored when calculating the resulting GEOMETRYCOLLECTION. If a valid POLYGON cannot be constructed from the provided (MULTI)LINESTRING(s), an empty GEOMETRYCOLLECTION will be returned.

Track Functions

The following functions are available in both SQL and the native API.

Function Description

ST_TRACKDURATION([unit,] t)

Returns the total time, in the given unit, spanned by timestamp values in column t. Grouping by track ID will return the duration per track.

The duration can be returned in any of the following date/time units:

YEAR
MONTH
DAY
HOUR
MINUTE
SECOND
MILLISECOND

The default unit is MILLISECOND.

ST_TRACKLENGTH(lat, lon, t[, solution])

Returns the total length of the track whose position values are specified by lat & lon and whose ordering is determined by an ascending sort on the timestamp t. Length can be returned with any of the following solution types:

Type	Description
`0`	2D Euclidean length in degrees
`1`	(default) Length on a sphere in meters
`2`	Length on a spheroid in meters

ST_TRACK_DWITHIN

The ST_TRACK_DWITHIN table function finds tracks that are related, within spatial or temporal bounds (or both) to the given track(s). The track(s) to use as the filter criteria will be specified by the SEARCH_* parameters. The TRACK_* parameters specify the set of tracks to search through for a match.

Note

This function is only available in SQL or in the native API via /execute/sql.

The basic form of the ST_TRACK_DWITHIN function follows.

ST_TRACK_DWITHIN Table Function Syntax
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
SELECT *
FROM TABLE
(
    ST_TRACK_DWITHIN
    (
        TRACK_TABLE => INPUT_TABLE(<table name | select statement>),
        TRACK_ID_COLUMN => < '<column name>' | <column position> >,
        TRACK_X_COLUMN => < '<column name>' | <column position> >,
        TRACK_Y_COLUMN => < '<column name>' | <column position> >,
        TRACK_ORDER_COLUMN => < '<column name>' | <column position> >,
        SEARCH_TABLE => INPUT_TABLE(<table name | select statement>),
        SEARCH_ID_COLUMN => < '<column name>' | <column position> >,
        SEARCH_X_COLUMN => < '<column name>' | <column position> >,
        SEARCH_Y_COLUMN => < '<column name>' | <column position> >,
        SEARCH_ORDER_COLUMN => < '<column name>' | <column position> >,
        [
            SEARCH_XY_DISTANCE => '<spatial distance with unit>',
            SPATIAL_SOLUTION_TYPE => <solution type>,
            SEARCH_TIME_DISTANCE => '<temporal distance with unit>'
        ]
    )
)

Parameters Description

TRACK_TABLE

Name of the table to search for tracks matching the track(s) specified in the SEARCH_* data set.

To perform a search on the flights table, pass the name of the table to INPUT_TABLE:

INPUT_TABLE(flights)

To perform a search on the result of a query, pass the query to INPUT_TABLE:

INPUT_TABLE
(
    SELECT * FROM flights_west
    UNION
    SELECT * FROM flights_east
)

TRACK_ID_COLUMN Table to search track column, containing the unique identifier for the track to which each track point belongs.

TRACK_X_COLUMN Table to search track column, containing the longitude value of each track point.

TRACK_Y_COLUMN Table to search track column, containing the latitude value of each track point.

TRACK_ORDER_COLUMN Table to search track column, by which the searched track points will be sorted in ascending order.

SEARCH_TABLE

Name of the search criteria track table, containing the track(s) to be used as the filter criteria when searching for matching tracks in the TRACK_* data set.

To match tracks from the flights_of_interest table, pass the name of the table to INPUT_TABLE:

INPUT_TABLE(flights_of_interest)

To match tracks from the result of a query, pass the query to INPUT_TABLE:

INPUT_TABLE
(
    SELECT * FROM flights_of_interest_west
    UNION
    SELECT * FROM flights_of_interest_east
)

SEARCH_ID_COLUMN Search criteria track column, containing the unique identifier for the track to which each track point belongs.

SEARCH_X_COLUMN Search criteria track column, containing the longitude value of each track point.

SEARCH_Y_COLUMN Search criteria track column, containing the latitude value of each track point.

SEARCH_ORDER_COLUMN Search criteria track column, by which the filter track points will be sorted in ascending order.

SEARCH_XY_DISTANCE

The radius around the given tracks to search for matching tracks.

Important

This parameter is not applicable when using a SPATIAL_SOLUTION_TYPE of 0.

Unit	Description
`f`	Feet
`ki`	Kilometers
`m`	(default) Meters
`mi`	Miles

SPATIAL_SOLUTION_TYPE

Spatial match solution type; any of the following:

Type	Description
`0`	(default) 2D Euclidean length in degrees
`1`	Length on a sphere, returned in units specified by `SEARCH_XY_DISTANCE`
`2`	Length on a spheroid, returned in units specified by `SEARCH_XY_DISTANCE`

SEARCH_TIME_DISTANCE

The maximum allowable time difference between a search criteria track's point and a matched track's points. The time can use any of the following suffices for units:

Unit	Description
`ms`	Milliseconds
`s`	(default) Seconds
`m`	Minutes
`h`	Hours
`d`	Days
`w`	Weeks
`months`	Months
`y`	Years

To see the matches between a set of flights and a given set of flights of interest:

ST_TRACK_DWITHIN Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
SELECT *
FROM TABLE
(
    ST_TRACK_DWITHIN
    (
        TRACK_TABLE => INPUT_TABLE(example_geospatial.flights),
        TRACK_ID_COLUMN => 'id',
        TRACK_X_COLUMN => 'lon',
        TRACK_Y_COLUMN => 'lat',
        TRACK_ORDER_COLUMN => 'flight_time',
        SEARCH_TABLE => INPUT_TABLE
                        (
                            SELECT id, lon, lat, flight_time
                            FROM example_geospatial.flights_northwest
                            UNION
                            SELECT id, lon, lat, flight_time
                            FROM example_geospatial.flights_northeast
                        ),
        SEARCH_ID_COLUMN => 1,
        SEARCH_X_COLUMN => 2,
        SEARCH_Y_COLUMN => 3,
        SEARCH_ORDER_COLUMN => 4,
        SEARCH_TIME_DISTANCE => '5m',
        SEARCH_XY_DISTANCE => '1km',
        SPATIAL_SOLUTION_TYPE => 1
    )
)

ST_TRACKINTERSECTS

The ST_TRACKINTERSECTS table function finds tracks pass through the given geofence(s). The geofence(s) use as the filter will be specified by the GEOFENCE_* parameters. The TRACK_* parameters specify the set of tracks to search through for any intersecting the geofence(s).

The result will include a record for each intersecting track & geofence pair, with:

a LINESTRING representing the full track intersecting a geofence
a LINESTRING representing the geofence it intersected

Note

This function is only available in SQL or in the native API via /execute/sql.

The basic form of the ST_TRACKINTERSECTS function follows.

ST_TRACKINTERSECTS Table Function Syntax
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
SELECT *
FROM TABLE
(
    ST_TRACKINTERSECTS
    (
        TRACK_TABLE => INPUT_TABLE(<table name | select statement>),
        TRACK_ID_COLUMN => < '<column name>' | <column position> >,
        TRACK_X_COLUMN => < '<column name>' | <column position> >,
        TRACK_Y_COLUMN => < '<column name>' | <column position> >,
        TRACK_ORDER_COLUMN => < '<column name>' | <column position> >,
        GEOFENCE_TABLE => INPUT_TABLE(<table name | select statement>),
        GEOFENCE_ID_COLUMN => < '<column name>' | <column position> >,
        GEOFENCE_WKT_COLUMN => < '<column name>' | <column position> >
    )
)

Parameters	Description
`TRACK_TABLE`	Name of the table to search for tracks intersecting the geofence(s) specified in the `SEARCH_` data set. To perform a search on the flights* table, pass the name of the table to `INPUT_TABLE`: INPUT_TABLE(flights) To perform a search on the result of a query, pass the query to `INPUT_TABLE`: INPUT_TABLE ( SELECT * FROM flights_west UNION SELECT * FROM flights_east )
`TRACK_ID_COLUMN`	Table to search track column, containing the unique identifier for the track to which each track point belongs.
`TRACK_X_COLUMN`	Table to search track column, containing the longitude value of each track point.
`TRACK_Y_COLUMN`	Table to search track column, containing the latitude value of each track point.
`TRACK_ORDER_COLUMN`	Table to search track column, by which the searched track points will be sorted in ascending order.
`GEOFENCE_TABLE`	Name of the geofence table, containing the WKT(s) to be used as the filter criteria when searching for intersecting tracks in the `TRACK_` data set. To search for tracks intersecting the geofence(s) from the flight_area_of_interest* table, pass the name of the geofence table to `INPUT_TABLE`: INPUT_TABLE(flight_area_of_interest) To search for tracks intersecting the geofence(s) from the result of a query, pass the query to `INPUT_TABLE`: INPUT_TABLE ( SELECT * FROM flight_area_of_interest_west UNION SELECT * FROM flight_area_of_interest_east )
`GEOFENCE_ID_COLUMN`	Geofence column, containing the unique identifier for the geofence.
`GEOFENCE_WKT_COLUMN`	Geofence column, containing the WKT bounds of the geofence.

To see the intersections between a set of flights and an area of interest:

ST_TRACKINTERSECTS Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
SELECT *
FROM TABLE
(
    ST_TRACKINTERSECTS
    (
        TRACK_TABLE =>         INPUT_TABLE(example_geospatial.flights),
        TRACK_ID_COLUMN =>     'id',
        TRACK_X_COLUMN =>      'lon',
        TRACK_Y_COLUMN =>      'lat',
        TRACK_ORDER_COLUMN =>  'flight_time',
        GEOFENCE_TABLE =>      INPUT_TABLE(example_geospatial.track_geofence),
        GEOFENCE_ID_COLUMN =>  'fence_name',
        GEOFENCE_WKT_COLUMN => 'fence_wkt'
    )
)
ORDER BY id, fence_name

H3 Functions

The functions below support various operations using the H3 geospatial indexing scheme.

Function	Description
`H3_CELLTOBOUNDARY(h3_index)`	Returns a POLYGON boundary box of the H3 index identified by the given `h3_index`. See H3 Geohashing for an example.
`H3_CELLTOCENTERCHILD(h3_index, res)`	Alias for `H3_CELLTOFIRSTCHILD`.
`H3_CELLTOCHILDN(h3_index, res, i)`	Returns the H3 index corresponding to the 0-based `i` th child at resolution `res` for the given `h3_index`. The value `i` should be less than the number of children returned from calling `H3_CELLTOCHILDRENSIZE(h3_index, res)`.
`H3_CELLTOCHILDPOS(h3_index, res)`	Returns the position of the given `h3_index` within an ordered list of the children of the cell's parent at resolution `res`. This is the inverse of `H3_CHILDPOSTOCELL` (`H3_CELLTOCHILDN`).
`H3_CELLTOCHILDRENSIZE(h3_index, res)`	Returns the number of child cells at resolution `res` for the given `h3_index`.
`H3_CELLTOFIRSTCHILD(h3_index, res)`	Returns the H3 index corresponding to the first child at resolution `res` for the given `h3_index`. This is equivalent to `H3_CELLTOCHILDN(h3_index,res,0)`.
`H3_CELLTOLASTCHILD(h3_index, res)`	Returns the H3 index corresponding to the last child at resolution `res` for the given `h3_index`. This is equivalent to `H3_CELLTOCHILDN(h3_index,res,H3_CELLTOCHILDRENSIZE(h3_index, res)-1)`.
`H3_CELLTOPARENT(h3_index, res)`	Returns the H3 index corresponding to the parent cell of the given `h3_index` at resolution `res`.
`H3_CELLTOXY(h3_index)`	Returns a WKT POINT corresponding to the centroid of the given `h3_index`.
`H3_CHILDPOSTOCELL(i, h3_index, res)`	Alias for `H3_CELLTOCHILDN(h3_index, res, i)`.
`H3_GEOMTOCELL(geom, res)`	Returns the H3 index, similar to a geohash, for the cell containing the centroid of the geometry `geom` with the given resolution `res`. The higher the resolution, the more precise the index is. The resolution `res` must be an integer between `0` and `15`. See H3 Geohashing for an example.
`H3_GETRESOLUTION(h3_index)`	Returns the resolution of the H3 index `h3_index`.
`H3_GRIDDISK(h3_index, k)`	Returns an array of H3 indexes within a given distance `k` from the provided H3 index `h3_index`.
`H3_GRIDDISKN(h3_index, k, i)`	Returns the `i` ^th H3 index within a given distance `k` from the provided H3 index `h3_index`. This function would typically be used in conjunction with `H3_NUMGRIDDISK` via iter-join. The value of `i` should be between `0` and the result of `H3_NUMGRIDDISK(h3_index, k)` - `1`.
`H3_GRIDRING(h3_index, k)`	Returns an array of H3 indexes comprising a hollow ring at a distance `k` from the provided H3 index `h3_index`.
`H3_H3TOSTRING(h3_index)`	Returns the string representation of the H3 index `h3_index`. Note This function is the inverse of `H3_STRINGTOH3`.
`H3_ISVALID(h3_index)`	Returns `1` (true) if the given H3 index `h3_index` is a valid H3 index value; otherwise returns `0` (false).
`H3_LATLNGTOCELL` `(latitude, longitude, res)`	Returns the H3 index, similar to a geohash, for the cell containing the `latitude` and `longitude` coordinate, with the given resolution `res`. The higher the resolution, the more precise the index is. The resolution `res` must be an integer between `0` and `15`. Equivalent to `H3_XYTOCELL(longitude, latitude, res)`.
`H3_NUMGRIDDISK(h3_index, k)`	Returns the number of cells at a distance of `k` from the provided H3 index `h3_index`. This function would typically be used in conjunction with `H3_GRIDDISKN` via iter-join.
`H3_NUMPOLYGONTOCELLS(geom, res)`	Returns the number of cells at the given resolution `res` that are within the given geometry `geom`. Only polygon geometries are supported. This function would typically be used in conjunction with `H3_POLYGONTOCELLSN` via iter-join.
`H3_POLYGONTOCELLS(geom, res)`	Returns an array of H3 indexes at the given resolution `res` that are within the given geometry `geom`. Only polygon geometries are supported.
`H3_POLYGONTOCELLSN(geom, res, i)`	Returns the `i` ^th H3 index at the given resolution `res` that is within the given geometry `geom`. Only polygon geometries are supported. This function would typically be used in conjunction with `H3_NUMPOLYGONTOCELLS` via iter-join. The value of `i` should be between `0` and the value returned from `H3_NUMPOLYGONTOCELLS(geom, res)` - `1`.
`H3_STRINGTOH3(h3_string)`	Returns the H3 index corresponding to the string representation `h3_string`. Note This function is the inverse of `H3_H3TOSTRING`.
`H3_XYTOCELL(x, y, res)`	Returns the H3 index, similar to a geohash, for the cell containing the `x` and `y` coordinate, with the given resolution `res`. The higher the resolution, the more precise the index is. The resolution `res` must be an integer between `0` and `15`. Equivalent to `H3_LATLNGTOCELL(y, x, res)`. See H3 Geohashing for an example.

Math Functions

Function Description

ABS(expr) Calculates the absolute value of expr

ACOS(expr) Returns the inverse cosine (arccosine) of expr as a double

ACOSF(expr) Returns the inverse cosine (arccosine) of expr as a float

ACOSH(expr) Returns the inverse hyperbolic cosine of expr as a double

ACOSHF(expr) Returns the inverse hyperbolic cosine of expr as a float

ASIN(expr) Returns the inverse sine (arcsine) of expr as a double

ASINF(expr) Returns the inverse sine (arcsine) of expr as a float

ASINH(expr) Returns the inverse hyperbolic sine of expr as a double

ASINHF(expr) Returns the inverse hyperbolic sine of expr as a float

ATAN(expr) Returns the inverse tangent (arctangent) of expr as a double

ATANF(expr) Returns the inverse tangent (arctangent) of expr as a float

ATANH(expr) Returns the inverse hyperbolic tangent of expr as a double

ATANHF(expr) Returns the inverse hyperbolic tangent of expr as a float

ATAN2(x, y) Returns the inverse tangent (arctangent) using two arguments as a double

ATAN2F(x, y) Returns the inverse tangent (arctangent) using two arguments as a float

ATN2(x, y) Alias for ATAN2

ATN2F(x, y) Alias for ATAN2F

CBRT(expr) Returns the cube root of expr as a double

CBRTF(expr) Returns the cube root of expr as a float

CEIL(expr) Alias for CEILING

CEILING(expr) Rounds expr up to the next highest integer

COS(expr) Returns the cosine of expr as a double

COSF(expr) Returns the cosine of expr as a float

COSH(expr) Returns the hyperbolic cosine of expr as a double

COSHF(expr) Returns the hyperbolic cosine of expr as a float

COT(expr) Returns the cotangent of expr as a double

COTF(expr) Returns the cotangent of expr as a float

DEGREES(expr) Returns the conversion of expr (in radians) to degrees as a double

DEGREESF(expr) Returns the conversion of expr (in radians) to degrees as a float

DIVZ(a, b, c) Returns the quotient a / b unless b == 0, in which case it returns c

EXP(expr) Returns e to the power of expr as a double

EXPF(expr) Returns e to the power of expr as a float

FLOOR(expr) Rounds expr down to the next lowest integer

GREATER(expr_a, expr_b) Returns whichever of expr_a and expr_b has the larger value, based on typed comparison

HYPOT(x, y) Returns the hypotenuse of x and y as a double

HYPOTF(x, y) Returns the hypotenuse of x and y as a float

IFERROR(expr, val) Alias for IF_ERROR(expr, val)

IFINF(expr, val) Alias for IF_INF(expr, val)

IFINFINITY(expr, val) Alias for IF_INF(expr, val)

IFNAN(expr, val) Alias for IF_NAN(expr, val)

IF_ERROR(expr, val)

Evaluates the given double or float expr, and if it resolves to infinity or NaN, return val

Tip

Conceptually, this function is the same as IF_INF(IF_NAN(expr, val), val)

Example:

Function Call	Result
`IF_ERROR((double(10)/0), 1)`	`1.0`
`IF_ERROR(log(-1), 1)`	`1.0`

IF_INF(expr, val)

Evaluates the given double or float expr, and if it resolves to infinity, return val

Example:

Function Call	Result
`IF_INF((double(10)/0), 999)`	`999.0`
`IF_INF(log(-1), 999)`	`NaN`

IF_INFINITY(expr, val) Alias for IF_INF(expr, val)

IF_NAN(expr, val)

Evaluates the given double or float expr, and if it resolves to NaN, return val

Example:

Function Call	Result
`IF_NAN((double(10)/0), -1)`	`Infinity`
`IF_NAN(log(-1), -1)`	`-1.0`

ISINFINITY(expr) Returns 1 (true) if expr is infinity by IEEE standard; otherwise, returns 0 (false)

IS_INFINITY(expr) Alias for ISINFINITY

ISNAN(expr) Returns 1 (true) if expr is not a number by IEEE standard; otherwise, returns 0 (false)

IS_NAN(expr) Alias for ISNAN

ISNUMERIC(expr) Returns 1 (true) if expr is a number by IEEE standard; otherwise, returns 0 (false)

IS_NUMERIC(expr) Alias for ISNUMERIC

LDEXP(x, exp) Returns the value of x * 2^exp as a double

LDEXPF(x, exp) Returns the value of x * 2^exp as a float

LESSER(expr_a, expr_b) Returns whichever of expr_a and expr_b has the smaller value, based on typed comparison

LN(expr) Returns the natural logarithm of expr as a double

LNF(expr) Returns the natural logarithm of expr as a float

LOG(expr) Alias for LN

LOG10(expr) Returns the base-10 logarithm of expr as a double

LOG10F(expr) Returns the base-10 logarithm of expr as a float

LOG1P(expr) Returns the natural logarithm of one plus expr as a double

LOG1PF(expr) Returns the natural logarithm of one plus expr as a float

LOG2(expr) Returns the binary (base-2) logarithm of expr as a double

LOG2F(expr) Returns the binary (base-2) logarithm of expr as a float

LOGF(expr) Alias for LNF

MAX_CONSECUTIVE_BITS(expr) Calculates the length of the longest series of consecutive 1 bits in the integer expr

MOD(dividend, divisor) Calculates the remainder after integer division of dividend by divisor

PI() Returns the value of pi

POW(base, exponent) Alias for POWER

POWF(base, exponent) Alias for POWERF

POWER(base, exponent) Returns base raised to the power of exponent as a double

POWERF(base, exponent) Returns base raised to the power of exponent as a float

RADIANS(expr) Returns the conversion of expr (in degrees) to radians as a double

RADIANSF(expr) Returns the conversion of expr (in degrees) to radians as a float

RAND([seed]) Returns a random floating-point value, with an optional seed

REGR_VALX(y, x) Returns NULL if y is NULL; otherwise, returns x

REGR_VALY(y, x) Returns NULL if x is NULL; otherwise, returns y

ROUND(expr[, scale])

Rounds expr to the nearest decimal number with scale decimal places when scale is a positive number; rounds to the nearest number such that the result has -(scale) zeros to the left of the decimal point when scale is negative; use scale of 0 to round to the nearest integer.

The default value of scale is 0.

Examples:

Function Call	Result
`ROUND(12345.678)`	`12346`
`ROUND(12345.678, 2)`	`12345.68`
`ROUND(12345.678, 0)`	`12346`
`ROUND(12345.678, -2)`	`12300`

SIGN(expr)

Determines whether a number is positive, negative, or zero; returns one of the following three values:

Expression Value	Result
positive	`1`
zero	`0`
negative	`-1`

SIN(expr) Returns the sine of expr as a double

SINF(expr) Returns the sine of expr as a float

SINH(expr) Returns the hyperbolic sine of expr as a double

SINHF(expr) Returns the hyperbolic sine of expr as a float

SQRT(expr) Returns the square root of expr as a double

SQRTF(expr) Returns the square root of expr as a float

TAN(expr) Returns the tangent of expr as a double

TANF(expr) Returns the tangent of expr as a float

TANH(expr) Returns the hyperbolic tangent of expr as a double

TANHF(expr) Returns the hyperbolic tangent of expr as a float

TRUNCATE(expr[, scale])

Rounds expr down to the nearest decimal number with scale decimal places when scale is a positive number; rounds down to the nearest number such that the result has -(scale) zeros to the left of the decimal point when scale is negative; use scale of 0 to round down to the nearest integer.

The default value of scale is 0.

Examples:

Function Call	Result
`TRUNCATE(12345.678)`	`12345`
`TRUNCATE(12345.678, 2)`	`12345.67`
`TRUNCATE(12345.678, 0)`	`12345`
`TRUNCATE(12345.678, -2)`	`12300`

WIDTH_BUCKET(expr, min, max, count)

Defines a set of count equal intervals (buckets) within the range of min & max, and puts the value of expr into one of those buckets, where the value is greater than or equal to the minimum value of the bucket and less than the maximum value of the bucket. Returns the 1-based number of the bucket into which the value of expr fell. For values smaller than min, 0 is returned; for values greater than or equal to max, count + 1 is returned. Examples:

In the following examples, a set of 5 equal buckets are defined between 0 and 10 (0-2, 2-4, 4-6, 6-8, & 8-10), and various values are bucketed using that set.

Function Call	Result
`WIDTH_BUCKET(-1, 0, 10, 5)`	`0`
`WIDTH_BUCKET(0, 0, 10, 5)`	`1`
`WIDTH_BUCKET(5, 0, 10, 5)`	`3`
`WIDTH_BUCKET(10, 0, 10, 5)`	`6`
`WIDTH_BUCKET(11, 0, 10, 5)`	`6`

Null Functions

Some of the following null functions require parameters to be of convertible data types. Note that limited-width (charN) & unlimited-width (non-charN) string types are not convertible.

Function	Description
`COALESCE(expr_a, ..., expr_N)`	Returns the value of the first expression that is not null starting with `expr_a` and ending with `expr_N`. If all are null, then null is returned. All expressions should be of the same or convertible data type.
`IFNULL(expr_a, expr_b)`	Returns `expr_a` if it is not null; otherwise, returns `expr_b`. Both should be of the same or convertible data type. See Short-Circuiting for error-checking details.
`IS_NULL(expr)`	Returns `1` if `expr` is null; otherwise, returns `0`
`ISNULL(expr)`	Alias for `IS_NULL(expr)`
`NULLIF(expr_a, expr_b)`	Returns null if `expr_a` equals `expr_b`; otherwise, returns the value of `expr_a`; both expressions should be of the same or convertible data type.
`NVL(expr_a, expr_b)`	Alias for `IFNULL`
`NVL2(expr, value_if_not_null, value_if_null)`	Evaluates `expr`: if not null, returns `value_if_not_null`; if null, returns `value_if_null`. Both `value_if_not_null` & `value_if_null` should be of the same data type as `expr` or implicitly convertible; see Short-Circuiting for error-checking details.
`REMOVE_NULLABLE(expr)`	Alias for `ZEROIFNULL`
`ZEROIFNULL(expr)`	Replaces null values with appropriate values based on the column type (e.g., `0` if numeric column, an empty string if charN column, etc.). Also removes the `nullable` column property if used to calculate a derived column.

Record Distribution Functions

Record distribution functions provide a means to locate any database record within the memory processing hierarchy; and through aggregating the results, determine the distribution of records across the cluster.

Function	Description
`KI_CHUNK()`	Returns the index of the chunk within the rank/TOM containing this record. This value may change as data is added or deleted from a table.
`KI_CHUNK_ID()`	Returns the ID of the chunk within the rank/TOM containing this record. This value should remain the same even after system restarts.
`KI_CHUNK_MAX(col)`	Returns the maximum value for the specified column in its chunk. If the data has changed, this value may be higher than any actual values.
`KI_CHUNK_MIN(col)`	Returns the minimum value for the specified column in its chunk. If the data has changed, this value may be lower than any actual values.
`KI_CHUNK_OFFSET()`	Returns the index of the record within the chunk.
`KI_PARTITION()`	Returns the index of the partition within the rank/TOM containing this record, for partitioned tables.
`KI_RANK()`	Returns the index of the rank containing this record.
`KI_TOM()`	Returns the index of the TOM within the rank containing this record.

For example, the following query can be used to determine the distribution of records of the example.stocks table across the nodes of the cluster.

Non-Partitioned Distribution
1
2
3
4
5
6
SELECT
    KI_RANK() AS "Rank",
    COUNT(*) AS "TotalRecords"
FROM example.stocks
GROUP BY 1
ORDER BY 1
Partitioned Distribution
1
2
3
4
5
6
7
SELECT
    KI_RANK() AS "Rank",
    KI_PARTITION() AS "Partition",
    COUNT(*) AS "TotalRecords"
FROM example.stocks
GROUP BY 1, 2
ORDER BY 1, 2
Multi-TOM Distribution
1
2
3
4
5
6
7
SELECT
    KI_RANK() AS "Rank",
    KI_TOM() AS "TOM",
    COUNT(*) AS "TotalRecords"
FROM example.stocks
GROUP BY 1, 2
ORDER BY 1, 2
Chunk-Level Distribution
1
2
3
4
5
6
7
SELECT
    KI_RANK() AS "Rank",
    KI_CHUNK() AS "Chunk",
    COUNT(*) AS "TotalRecords"
FROM example.stocks
GROUP BY 1, 2
ORDER BY 1, 2

For partitioned tables, KI_PARTITION() can be added to the query to also the distribution across partitions, and for clusters with more than 1 TOM per rank (1 TOM/rank is the default), KI_TOM() can be added. To see chunk-level distribution, use KI_CHUNK().

String Functions

There are three different types of string functions:

Scalar Functions

Important

String columns are stored as byte arrays, but allow multi-byte characters, as they are UTF-8 encoded. Some functions may behave in unexpected ways when given multi-byte input.

Function Description

ASCII(expr) Returns the ASCII code for the first byte in expr

BIN(expr[, minimum_digits])

Convert the value represented by expr into a binary string representation. Use optional minimum_digits to add leading 0s when needed; defaults to 1.

Examples:

Function Call	Result
`BIN(42)`	`'101010'`
`BIN(42, 8)`	`'00101010'`

CHAR(expr) The character represented by the standard ASCII code expr in the range [ 0 - 127 ]

CONCAT(expr_a, expr_b)

Performs a string concatenation of expr_a & expr_b; use nested CONCAT calls to concatenate more than two strings

Note

The resulting field size of any CONCAT will be a VARCHAR(N) field big enough to hold the concatenated fields, e.g., concatenating a VARCHAR(32) column and a VARCHAR(64) column will result in a VARCHAR(128) column. Columns of type VARCHAR(256) used with CONCAT will result in a VARCHAR(256) column, truncated at 256 characters.

Tip

The || operator can be used in place of CONCAT to perform string concatenation; e.g., the following two expressions are equivalent:

expr_a || expr_b || expr_c

CONCAT(CONCAT(expr_a, expr_b), expr_c)

Examples:

Function Call	Result
`CONCAT(CONCAT('ABC','123'),'!')`	`ABC123!`
`'ABC' \|\| '123' \|\| '!'`	`ABC123!`

CONCAT_TRUNCATE(expr_a, expr_b)

Returns the concatenation of expr_a and expr_b, truncated at the maximum size of the larger of expr_a and expr_b. For data columns, the size is explicit; for string constants, the size will be the smallest charN type that can hold the constant string.

Note

CONCAT_TRUNCATE will not work if any parameter is an unrestricted-width string. Use STRING() to cast non-string data types to strings for use in this function.

Examples:

Function Call	Result
`CONCAT_TRUNCATE('ABC123','!')`	`ABC123!`
`CONCAT_TRUNCATE('AB','CDE')`	`ABCD` (char4 is the minimum size required to hold the `CDE` value, so the result is truncated at 4 characters)
`CONCAT_TRUNCATE('ABCD1234','DEFG')`	`ABCD1234` ( `VARCHAR(8)` is the minimum size required to hold the `ABCD1234` value, so no characters can be appended)
`CONCAT_TRUNCATE('ABC', STRING(123))`	`ABC123`

CONTAINS(match, expr) Returns 1 if expr contains match by string-literal comparison; otherwise, returns 0

CONV(expr, from_base, to_base)

Convert the value represented by the expr string from one numeric base to another. Base parameters may be [2 - 36]. Binary, octal and hexadecimal strings may have an optional prefix.

Examples:

Function Call	Result
`CONV('42', 10, 16)`	`'2A'`
`CONV('2a', 16, 10)`	`'42'`
`CONV('11', 2, 10)`	`'3'`
`CONV('0o52', 8, 2)`	`'101010'`

DEC2HEX(expr[, minimum_digits]) Alias for HEX

DIFFERENCE(expr_a, expr_b) Returns a value between 0 and 4 that represents the difference between the sounds of expr_a and expr_b based on the SOUNDEX() value of the strings--a value of 4 is the best possible sound match

EDIT_DISTANCE(expr_a, expr_b) Returns the Levenshtein edit distance between expr_a and expr_b; the lower the value, the more similar the two strings are

ENDS_WITH(match, expr) Returns 1 if expr ends with match by string-literal comparison; otherwise, returns 0

FROM_HEX(expr) Alias for UNHEX

HEX(expr[, minimum_digits])

Convert the value represented by expr into a hexadecimal string representation. Use optional minimum_digits to add leading 0s when needed; defaults to 1.

Examples:

Function Call	Result
`HEX(42)`	`'2A'`
`HEX(42, 8)`	`'0000002A'`

HEX2DEC(expr) Alias for UNHEX

INITCAP(expr) Returns expr with the first letter of each word in uppercase

IPV4_PART(expr, part_num)

Returns the octet of the IP address given in expr at the position specified by part_num. Valid part_num values are constants from 1 to 4.

Examples:

Function Call	Result
`IPV4_PART('12.34.56.78', 1)`	`12`
`IPV4_PART('12.34.56.78', 4)`	`78`

IS_IPV4(expr) Returns 1 if expr is an IPV4 address; returns 0 otherwise

ISIPV4(expr) Alias for IS_IPV4

LCASE(expr) Converts expr to lowercase

LEFT(expr, num_bytes) Returns the leftmost num_bytes bytes from expr

LEN(expr) Alias for LENGTH

LENGTH(expr) Returns the number of characters in expr

LOCATE(match, expr[, start_pos]) Returns the starting position of the first match of match in expr, starting from position 1 or start_pos (if specified). If match can't be found or start_pos is outside the range of letters in expr, a 0 is returned.

LOWER(expr) Alias for LCASE

LPAD(expr, length, pad)

Left pads the given expr string with the pad string to the given length of bytes. If base_expr is longer than length, the return value is shortened to length bytes. If length is larger than 256, it will be truncated to 256 bytes.

Caution!

The use of multi-byte characters in this function may have unexpected results.

Examples:

Function Call	Result
`LPAD('test', 9, 'pad')`	`padpatest`
`LPAD('test', 3, 'pad')`	`tes`

LTRIM(expr) Removes whitespace from the left side of expr

OCT(expr[, minimum_digits])

Convert the value represented by expr into an octal (base 8) string representation. Use optional minimum_digits to add leading 0s when needed; defaults to 1.

Examples:

Function Call	Result
`OCT(42)`	`'52'`
`OCT(42, 8)`	`'00000052'`

POSITION(match IN expr) Returns the starting position of the first match of match in expr, starting from position 1. If match can't be found or start_pos is outside the range of letters in expr, a 0 is returned.

REGEXP_COUNT

(expr, regex[, position [, mode]])

Returns a count of the number of times the regex pattern is matched in expr. Matches do not overlap, so the start of a future match must start after the end of the previous match.

The regex parameter is the regular expression to try to match. It must be a string-literal with 256 characters or fewer. Generally, POSIX-compliant regular expressions are supported for regex. The escape character used to match wildcards in the expr literally is \. See REGEXP_LIKE for more regular expression examples.

The optional position parameter specifies where to start searching in expr for the first match. The first character in the string has a position of 1 (the default).

The optional mode parameter is a string which can be empty (the default) for the default behavior. See REGEXP_LIKE for the list of supported mode flags.

Examples of REGEXP_COUNT:

Function Call	Result
`REGEXP_COUNT('abababab', 'abab')`	2
`REGEXP_COUNT('abababab', 'abab', 2)`	1
`REGEXP_COUNT('don''t', 'DON''T', 1, 'i')`	1

REGEXP_INSTR

(expr, regex [, position [, occurrence

[, begin_end [, mode [,group]]]]])

Returns the starting position (1-based) in expr where a regex match is found. If no match is found, 0 is returned.

The optional position parameter specifies where to start searching in expr for the first match. The first character in the string has a position of 1 (the default).

The optional occurrence parameter specifies which occurrence of the regex match is desired. For example, 2 would return the second occurrence of regex in expr. Matches do not overlap, so the start of a future match must start after the end of the previous match.

The optional begin_end parameter specifies if the beginning or ending position is desired. Use 0 (the default) for the beginning of the match and use 1 for the position after the end of the match.

The optional mode parameter is a string which can be empty (the default) for the default behavior. See REGEXP_LIKE for the list of supported mode flags.

The optional group parameter specifies which regular expression group's (i.e., parentheses inside regex) beginning/ending position to return. The default of 0 uses the entire matched expression, while a group of 1 through 9 corresponds to the 1st group up through the 9th group of the match.

Examples of REGEXP_INSTR:

Function Call	Result
`REGEXP_INSTR('abcdefg', 'bc')`	2
`REGEXP_INSTR('abcdefg', 'ab', 2)`	0
`REGEXP_INSTR('abcabc', 'a', 1, 2)`	4
`REGEXP_INSTR('abcabc', 'abc', 1, 1, 1)`	4
`REGEXP_INSTR('abcabc', 'B C', 1, 1, 0, 'ix')`	2
`REGEXP_INSTR('abcdefg', '(C(.(.)))', 1, 1, 0, 'i', 3)`	5

REGEXP_LIKE(expr, regex[, mode])

Returns whether expr matches the given regex. Generally, POSIX-compliant regular expressions are supported.

The optional mode parameter is a string which can be empty (the default) for the default behavior. It can contain the following letters for the associated optional modified behaviors:

Letter	Meaning
`i`	Case-insensitive matches
`m`	Treat input as multiple lines so `^` and `$` match around newlines and not just the beginning and ending of the string
`n`	Allows `.` to also match a newline character
`x`	Ignore any whitespace (e.g., spaces) in `regex`

The escape character used to match wildcards in the expr literally is \.

Note

The regex can match the expr partially. To perform full matches, ^ and $ can be used to match the start and end of expr, respectively.

Examples of successful matches:

Function Call	Match Type
`REGEXP_LIKE('partial', 'part')`	Partial
`REGEXP_LIKE('Case', 'cAsE', 'i')`	Case-insensitive
`REGEXP_LIKE('dot', 'd.t')`	Any character
`REGEXP_LIKE('range', 'ra[a-z]ge')`	Character range
`REGEXP_LIKE('zeroorone', 'z?zer(oor)?one')`	0 or 1 token
`REGEXP_LIKE('zeroormore', 'zzer([om]or)e')`	0 or more tokens
`REGEXP_LIKE('oneormore', 'o+n([em]or)+e')`	1 or more tokens
`REGEXP_LIKE('A to Z', '^A.*Z$')`	Begin/End
`REGEXP_LIKE('41ph4Num', '^[[:alnum:]]+$')`	Character class
`REGEXP_LIKE('Escape?', 'Escape\?')`	Escape wildcard

REGEXP_MATCH(expr, regex[, options]) Alias for REGEXP_LIKE

REGEXP_REPLACE

(expr, regex [, replace [, position

[, occurrence [, mode]]]])

Returns the expr string after replacing regex matches with the replace string parameter.

The replace parameter is the optional text with which to replace each match of regex in expr. The default is an empty string, which will simply remove the specified occurrences of regex in expr. A replace of \0 will insert the entire matched expression, while a replace of \1 through \9 will use the corresponding matched grouping (parentheses inside regex) as the replacement text.

The optional position parameter specifies where to start searching in expr for the first match. The first character in the string has a position of 1 (the default).

The optional occurrence parameter specifies which occurrence of the regex match to replace. For example, 2 would only replace the second occurrence of regex in expr. Matches do not overlap, so the start of a future match must start after the end of the previous match. Use 0 (the default) to replace all occurrences.

The optional mode parameter is a string which can be empty (the default) for the default behavior. See REGEXP_LIKE for the list of supported mode flags.

Examples of REGEXP_REPLACE:

Function Call	Result
`REGEXP_REPLACE('abc', 'b')`	ac
`REGEXP_REPLACE('abc', 'b.', 'x')`	ax
`REGEXP_REPLACE('abcd', '.', 'x', 3, 1)`	abxd
`REGEXP_REPLACE('abcd', '(b(.))', '-\2-')`	a-c-d

REGEXP_SUBSTR

(expr, regex [, position [, occurrence

[, mode [,group]]]])

Returns the portion of the expr string that matched regex. An empty string is returned if no match is found.

The optional position parameter specifies where to start searching in expr for the first match. The first character in the string has a position of 1 (the default).

The optional mode parameter is a string which can be empty (the default) for the default behavior. See REGEXP_LIKE for the list of supported mode flags.

The optional group parameter specifies which regular expression grouping (i.e., parentheses inside regex) to use. The default of 0 uses the entire matched expression, while a group of 1 through 9 corresponds to the 1st group up through the 9th group of the match.

Examples of REGEXP_SUBSTR:

Function Call	Result
`REGEXP_SUBSTR('abcdefg', 'b.')`	bc
`REGEXP_SUBSTR('abcadeafghij', 'a...', 2, 1)`	adea
`REGEXP_SUBSTR('abcadeafghij', 'a...', 1, 2)`	afgh
`REGEXP_SUBSTR('abcdefg', '(C(.(.)))', 1, 1, 'i', 3)`	e

REPLACE(expr, match, repl) Replaces every occurrence of match in expr with repl

REPLACE_CHAR(expr, match, repl) Replaces every occurrence of the single-byte character match in expr with the single-byte character repl

REPLACE_TRUNCATE(expr, match, repl)

Replaces every occurrence of match in expr with repl, and then truncates the resulting string at 256 bytes if it is longer than that

Note

REPLACE_TRUNCATE will not work if any parameter is an unrestricted-width string.

Caution!

The use of multi-byte characters in this function may have unexpected results.

REVERSE(expr)

Returns expr with the order of bytes reversed.

Caution!

The use of multi-byte characters in this function may have unexpected results.

Examples:

Function Call	Result
`REVERSE('Reverse')`	`esreveR`
`REVERSE('Was it a bat I saw?')`	`?was I tab a ti saW`

RIGHT(expr, num_bytes) Returns the rightmost num_bytes bytes from expr

RPAD(expr, length, pad)

Right pads the given expr string with the pad string to the given length of bytes. If expr is longer than length, the return value is shortened to length bytes. If length is larger than 256, it will be truncated to 256 bytes.

Caution!

The use of multi-byte characters in this function may have unexpected results.

Examples:

Function Call	Result
`RPAD('test', 9, 'pad')`	`testpadpa`
`RPAD('test', 3, 'pad')`	`tes`

RTRIM(expr) Removes whitespace from the right side of expr

SOUNDEX(expr)

Returns a soundex value from expr. Only the first word in the string will be considered in the calculation.

Note

This is the algorithm used by most programming languages.

SPACE(n) Returns a string consisting of n space characters. The value of n can only be within the range of 0-256.

SPLIT(expr, delim, group_num)

Splits expr into groups delimited by the delim single-byte character and returns the group_num split group. If group_num is positive, groups will be counted from the beginning of expr; if negative, groups will be counted from the end of expr going backwards. Two consecutive delimiters will result in an empty string being added to the list of selectable groups. If no instances of delim exist in expr, the entire string is available at group 1 (and -1). Group 0 returns nothing.

Examples:

Function Call	Result
`SPLIT('apple', 'p', 1)`	`a`
`SPLIT('apple', 'p', 2)`	<empty string>
`SPLIT('apple', 'p', -1)`	`le`

STARTS_WITH(match, expr) Returns 1 if expr starts with match by string-literal comparison; otherwise, returns 0

STRCMP(expr_a, expr_b)

Compares expr_a to expr_b in a lexicographical sort

Situation	Result
`expr_a` and `expr_b` are the same	`0`
`expr_a` comes before `expr_b`, lexicographically	`-1`
`expr_a` comes after `expr_b`, lexicographically	`1`

SUBSTR(expr, start_pos[, num_chars]) Alias for SUBSTRING

SUBSTRING(expr, start_pos[, num_bytes])

Returns num_bytes bytes from the expr, starting at the 1-based start_pos byte. If num_bytes is not specified, all bytes after start_pos will be returned.

Caution!

The use of multi-byte characters in this function may have unexpected results.

Examples:

Function Call	Result
`SUBSTRING('banana', 3)`	`nana`
`SUBSTRING('banana', 3, 2)`	`na`

TO_HEX(expr[, minimum_digits]) Alias for HEX

TRIM(expr) Removes whitespace from both sides of expr

UCASE(expr) Converts expr to uppercase

UNHEX(expr)

Convert the hexadecimal string expr into a (decimal) number.

Examples:

Function Call	Result
`UNHEX('42')`	`66`
`UNHEX('1A')`	`26`
`UNHEX('1a')`	`26`

UPPER(expr) Alias for UCASE

LIKE

Simple pattern-matching capability is provided through the use of the LIKE clause, for fixed-width string columns (VARCHAR(1) - VARCHAR(256)).

LIKE Syntax
1
<expr> [NOT] LIKE <match> [ESCAPE '<esc_char>']

This clause matches records where reference expression expr does (or does NOT) match the string value of match. The match is a string literal one, with the following exceptions:

% matches any string of 0 or more characters
_ matches any single character
[<character_set>] matches any single character listed in the set. The character_set may contain ranges using a dash (-). A character_set starting with ^ will match any character not in the specified set.

By default, there is no escape character. Use the ESCAPE clause to specify an escape character. The escape character is used to match special characters in the expr literally. For example, ~% will match the % character if ~ is the escape character.

To search for employees whose last name starts with C, ends with a, and has exactly five letters:

LIKE Single Character Match Example
1
2
3
SELECT *
FROM example.employee
WHERE last_name LIKE 'C___a'

To search for employees whose first name does not start with Brook:

LIKE Multiple Character Match Example
1
2
3
SELECT *
FROM example.employee
WHERE first_name NOT LIKE 'Brook%'

To search for employees whose first name has a capital letter after the first letter:

LIKE Character Range Match Example
1
2
3
SELECT *
FROM example.employee
WHERE first_name LIKE '%_[A-Z]%'

FILTER_BY_STRING

The FILTER_BY_STRING table function allows for several types of pattern matching for fixed-width string columns (VARCHAR(1) - VARCHAR(256)), as well as full text search capability for all string columns with the TEXT_SEARCH column property. Its features are based on the /filter/bystring endpoint.

The basic form of the FILTER_BY_STRING function when called in a SELECT statement follows.

FILTER_BY_STRING Table Function Syntax
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
SELECT *
FROM TABLE
(
    FILTER_BY_STRING
    (
        TABLE_NAME => INPUT_TABLE(<table name | select statement>),
        [COLUMN_NAMES => '<column list>',]
        MODE => '<filter type>',
        EXPRESSION => '<filter expression>',
        [OPTIONS => KV_PAIRS('<option name>' = '<option value>'[,...])]
    )
)

The basic form of the FILTER_BY_STRING function when called in an EXECUTE FUNCTION statement follows. When called this way, the results are saved in the specified view.

FILTER_BY_STRING Execute Function Syntax
1
2
3
4
5
6
7
8
9
EXECUTE FUNCTION FILTER_BY_STRING
(
    TABLE_NAME => INPUT_TABLE(<table name | select statement>),
    VIEW_NAME => '[<schema name>.]<view name>',
    [COLUMN_NAMES => '<column list>',]
    MODE => '<filter type>',
    EXPRESSION => '<filter expression>',
    [OPTIONS => KV_PAIRS('<option name>' = '<option value>'[,...])]
)

Parameters Description

TABLE_NAME

The data set to filter, which should be passed via query to the INPUT_TABLE function.

To perform a string filter on a column in the customer table, pass the name of the table to INPUT_TABLE:

INPUT_TABLE(customer)

To perform a string filter on a column in the result of a query, pass the query to INPUT_TABLE:

INPUT_TABLE(SELECT * FROM customer_west UNION SELECT * FROM customer_east)

VIEW_NAME

Name of the view in which the results are to be stored, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria

Important

This parameter is only applicable when using EXECUTE FUNCTION syntax.

COLUMN_NAMES

Comma-separated list of columns to which the filter will be applied.

Important

Omit this parameter when using the search filter MODE.

MODE

Type of string filter to apply:

Mode	Description
`contains`	Partial substring match (not accelerated). If the column is a string type (non-charN) and the number of records is too large, it will return 0.
`equals`	Exact whole-string match (accelerated).
`regex`	Full regular expression search (not accelerated). If the column is a string type (non-charN) and the number of records is too large, it will return 0.
`search`	Full text search with wildcards and boolean operators, for string columns (`VARCHAR`) that have the `TEXT_SEARCH` column property applied. See Full Text Search for syntax & grammar detail. Important Omit the `COLUMN_NAMES` parameter when using this `MODE`. All text-searchable string columns will be searched. Also, only a table name can be used for `TABLE_NAME`--a query cannot be used.
`starts_with`	Match strings that start with the given expression (not accelerated). If the column is a string type (non-charN) and the number of records is too large, it will return 0.

EXPRESSION String literal or pattern, depending on MODE selected, to use in filtering string columns

OPTIONS

Optional list of string filtering options, specified as a set of key/value pairs passed as a comma-delimited list of <key> = '<value>' assignments to the KV_PAIRS function; e.g.:

KV_PAIRS(case_sensitive = 'false')

Option	Description
`case_sensitive`	If true, the filter will be case-sensitive; if false, case-insensitive. Note Not applicable when `MODE` is `search`.

To see the message text & time for events in the event_log table containing the word ERROR:

FILTER_BY_STRING Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
SELECT *
FROM TABLE
(
    FILTER_BY_STRING
    (
        TABLE_NAME => INPUT_TABLE(SELECT event_time, message FROM example.event_log),
        COLUMN_NAMES => 'message',
        MODE => 'contains',
        EXPRESSION => 'ERROR'
    )
)

User/Security Functions

Function Description

CURRENT_SCHEMA() Returns the default schema of the current user

CURRENT_USER() Alias for USER

HASH(column[, seed]) Returns a non-negative integer representing an obfuscated version of column, using the given seed; default seed is 0

IS_MEMBER(role[, user/role])

Returns whether the current user (or the given user/role, if specified) has been assigned the given role, either directly or indirectly:

Situation	Result
Current user (or given `user/role`) has been granted `role`	`true`
Current user (or given `user/role`) has not been granted `role`	`false`
Role `role` does not exist	null

IS_ROLEMEMBER(role[, user/role]) Alias for IS_MEMBER

MASK(expr, start, length[, char])

Masks length bytes of expr, beginning at the byte position identified by start, with * characters (or the single-byte character specified in char):

Caution!

The use of multi-byte characters in this function may have unexpected results.

Function Call	Result
`MASK('Characters', 4, 5)`	`Cha*****rs`
`MASK('Characters', 5, 2, '#')`	`Char##ters`

NEW_UUID() Returns a randomly-generated UUID

OBFUSCATE(column[, seed]) Alias for HASH

SHA256(expr) Returns the hex digits of the SHA-256 hash of the given value expr as a char64 string.

SYSTEM_USER() Alias for USER

USER() Returns the username of the current user

Aggregation Functions

Function	Description
`APPROX_COUNT_DISTINCT(expr)`	The approximate number of distinct values of `expr`; this is faster to calculate than `COUNT(DISTINCT expr)` but is only an approximation
`APPROX_MEDIAN(expr)`	The approximate median of `expr`; the result should be within about 2% of the true median value. This is equivalent to issuing `APPROX_PERCENTILE(expr, 50)`.
`APPROX_PERCENTILE(expr, p)`	The approximate pth percentile of `expr`; `p` should be a value between 0.0 and 100.0. `APPROX_PERCENTILE(expr, 50)` will return the approximate median of `expr`.
`ARG_MAX(agg_expr, ret_expr)`	The value of `ret_expr` where `agg_expr` is the maximum value (e.g. `ARG_MAX(cost, product_id)` returns the product ID of the highest cost product). `ARG_MAX(a, b)` is equivalent to `LAST(b, a)`.
`ARG_MIN(agg_expr, ret_expr)`	The value of `ret_expr` where `agg_expr` is the minimum value (e.g. `ARG_MIN(cost, product_id)` returns the product ID of the lowest cost product). `ARG_MIN(a, b)` is equivalent to `FIRST(b, a)`.
`ATTR(expr)`	If `MIN(expr)` = `MAX(expr)`, returns `expr`; otherwise `*` Note `expr` must resolve to a string type, and must be casted to one, if not already
`AVG(expr)`	Calculates the average value of `expr`
`CORR(expr1, expr2)`	Calculates the correlation coefficient of `expr1` and `expr2`
`CORRELATION(expr1, expr2)`	Alias for `CORR`
`CORRCOEF(expr1, expr2)`	Alias for `CORR`
`COUNT(*)`	Returns the number of records in a table
`COUNT(expr)`	Returns the number of non-null data values in `expr`
`COUNT(DISTINCT expr)`	Returns the number of distinct non-null data values in `expr`
`COV(expr1, expr2)`	Alias for `COVAR_POP`
`COVAR(expr1, expr2)`	Alias for `COVAR_POP`
`COVARIANCE(expr1, expr2)`	Alias for `COVAR_POP`
`COVAR_POP(expr1, expr2)`	Calculates the population covariance of `expr1` and `expr2`
`COVAR_SAMP(expr1, expr2)`	Calculates the sample covariance of `expr1` and `expr2`
`FIRST(ret_expr, agg_expr)`	The value of `ret_expr` where `agg_expr` is the minimum value (e.g. `FIRST(product_id, cost)` returns the product ID ID of the lowest cost product). `FIRST(a, b)` is equivalent to `ARG_MIN(b,a)`
`GROUPING(expr)`	Used primarily with ROLLUP, CUBE, and GROUPING SETS, to distinguish the source of null values in an aggregated result set, returns whether `expr` is part of the aggregation set used to calculate the values in a given result set row. Returns `0` if `expr` is part of the row's aggregation set, `1` if `expr` is not (meaning that aggregation took place across all `expr` values). For example, in a `ROLLUP(A)` operation, there will be two potential rows with null in the result set for column `A`. One row will contain null values of `A` aggregated together, and the other will contain null, but be an aggregation over the entire table, irrespective of `A` values. In this case, `GROUPING(A)` will return `0` for the null values of `A` aggregated together (as well as all other grouped `A` values) and `1` for the row resulting from aggregating across all `A` values.
`KURT(expr)`	Alias for `KURTOSIS_POP`
`KURTOSIS(expr)`	Alias for `KURTOSIS_POP`
`KURTOSIS_POP(expr)`	Calculate the population kurtosis of `expr`
`KURTOSIS_SAMP(expr)`	Calculate the sample kurtosis of `expr`
`KURT_POP(expr)`	Alias for `KURTOSIS_POP`
`KURT_SAMP(expr)`	Alias for `KURTOSIS_SAMP`
`LAST(ret_expr, agg_expr)`	The value of `ret_expr` where `agg_expr` is the maximum value (e.g. `LAST(product_id, cost)` returns the product ID of the highest cost product). `LAST(a, b)` is equivalent to `ARG_MAX(b, a)`.
`MAX(expr)`	Finds the maximum value of `expr`
`MEAN(expr)`	Alias for `AVG`
`MIN(expr)`	Finds the minimum value of `expr`
`PRODUCT(expr)`	Multiplies all the values of `expr`
`REGR_AVGX(y, x)`	Average of the independent variable (`SUM(x)/N`) of the line determined by computing a least-squares-fit linear regression over the given (X, Y) pairs
`REGR_AVGY(y, x)`	Average of the dependent variable (`SUM(y)/N`) of the line determined by computing a least-squares-fit linear regression over the given (X, Y) pairs
`REGR_COUNT(y, x)`	Number of input rows used in computing a linear regression, where both expressions are non-null
`REGR_INTERCEPT(y, x)`	Y-intercept of the line determined by computing a least-squares-fit linear regression over the given (X, Y) pairs
`REGR_R2(y, x)`	Square of the correlation coefficient, marking how well the least-squares-fit linear regression fit the data set
`REGR_SLOPE(y, x)`	Slope of the line determined by computing a least-squares-fit linear regression over the given (X, Y) pairs
`REGR_SXX(y, x)`	"Sum of squares" of the independent variable (`SUM(x^2) - SUM(x)^2/N`) of the line determined by computing a least-squares-fit linear regression over the given (X, Y) pairs
`REGR_SXY(y, x)`	"Sum of Products" of independent variable times dependent variable (`SUM(x * y) - SUM(x) * SUM(y)/N`) of the line determined by computing a least-squares-fit linear regression over the given (X, Y) pairs
`REGR_SYY(y, x)`	"Sum of squares" of the dependent variable (`SUM(y^2) - SUM(y)^2/N`) of the line determined by computing a least-squares-fit linear regression over the given (X, Y) pairs
`SKEW(expr)`	Alias for `SKEWNESS_POP`
`SKEWNESS(expr)`	Alias for `SKEWNESS_POP`
`SKEWNESS_POP(expr)`	Calculate the population skew of `expr`
`SKEWNESS_SAMP(expr)`	Calculate the sample skew of `expr`
`SKEW_POP(expr)`	Alias for `SKEWNESS_POP`
`SKEW_SAMP(expr)`	Alias for `SKEWNESS_SAMP`
`STDDEV(expr)`	Alias for `STDDEV_POP`
`STDDEV_POP(expr)`	Calculates the population standard deviation of the values of `expr`
`STDDEV_SAMP(expr)`	Calculates the sample standard deviation of the values of `expr`
`SUM(expr)`	Sums all the values of `expr`
`VAR(expr)`	Alias for `VAR_POP`
`VAR_POP(expr)`	Calculates the population variance of the values of `expr`
`VAR_SAMP(expr)`	Calculates the sample variance of the values of `expr`
`VARIANCE(expr)`	Alias for `VAR`
`VARIANCE_POP(expr)`	Alias for `VAR_POP`
`VARIANCE_SAMP(expr)`	Alias for `VAR_SAMP`

Grouping Functions

Function	Description
`ROLLUP(expr)`	Calculates n + 1 aggregates for n number of columns in `expr`
`CUBE(expr)`	Calculates 2ⁿ aggregates for n number of columns in `expr`
`GROUPING SETS(expr)`	Calculates aggregates for any given aggregates in `expr`, including `ROLLUP()` and `CUBE()`

ML Functions

There are two supported machine learning functions:

PREDICT

The PREDICT table function will predict the values of the dependent variables that correspond to a given column of independent variables, using a given base table containing "historical" values of each. This table will be used as the basis to calculate the prediction.

To make the prediction, the slope & y-intercept of the least-squares-fit linear equation of the base table data will be calculated. Then, that line will be used to calculate the dependent variable for each given independent variable, and the values of each will be returned in the result set as Y and X, respectively.

The basic form of the PREDICT function, called within a SELECT statement follows.

PREDICT Table Function Syntax
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
SELECT x, y FROM TABLE
(
    PREDICT
    (
        HISTORY_TABLE     => INPUT_TABLE(<table name | select statement>),
        X_COLUMN          => < '<column name>' | <column position> >,
        Y_COLUMN          => < '<column name>' | <column position> >,
        PREDICT_ON_TABLE  => INPUT_TABLE(<table name | select statement>),
        PREDICT_ON_COLUMN => < '<column name>' | <column position> >,
        [PREDICT_METHOD   => 'LINEAR']
    )
)

Parameters	Description
`HISTORY_TABLE`	The name of the table containing the "historical" data upon which the prediction will be made. To make a prediction from the data based on the ticket_prices table, pass the name of the table to `INPUT_TABLE`: INPUT_TABLE(ticket_prices) To make a prediction from data that is the result of a query, pass the query to `INPUT_TABLE`: INPUT_TABLE ( SELECT * FROM ticket_prices_local UNION SELECT * FROM ticket_prices_global )
`X_COLUMN`	The name or position of the column in `HISTORY_TABLE` containing the independent variable that will be used as the basis for the prediction.
`Y_COLUMN`	The name or position of the column in `HISTORY_TABLE` containing the dependent variable that will be used as the basis for the prediction.
`PREDICT_ON_TABLE`	The name of the table containing the independent variable against which the prediction will be made. To make a prediction against column data in the future_years table, pass the name of the table to `INPUT_TABLE`: INPUT_TABLE(future_years) To make a prediction against data that is the result of a query, pass the query to `INPUT_TABLE`: INPUT_TABLE ( SELECT * FROM future_years UNION SELECT * FROM future_decades )
`PREDICT_ON_COLUMN`	The name or position of the column in `PREDICT_ON_TABLE` containing the independent variable against which the prediction will be made.
`PREDICT_METHOD`	The optional calculation method to use to make the prediction. The only supported (and default) method is `LINEAR`.

To predict ticket prices for years found in the future_years table, based on historical ticket price data found in the ticket_prices table:

PREDICT Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
SELECT * FROM TABLE
(
    PREDICT
    (
        HISTORY_TABLE => INPUT_TABLE(example.ticket_prices),
        X_COLUMN => 'year',
        Y_COLUMN => 'cost',
        PREDICT_ON_TABLE => INPUT_TABLE(example.future_years),
        PREDICT_ON_COLUMN => 'year'
    )
)

OUTLIERS

The OUTLIERS table function will calculate the outliers in a given data set, based on a specified calculation type, threshold, and partition column. The partition column allows the data to be segmented into subsets, one per unique partition column value, and have the outliers for each subset calculated & determined independently from other subsets.

The basic form of the OUTLIERS function, called within a SELECT statement follows.

OUTLIERS Table Function Syntax
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
SELECT * FROM TABLE
(
    OUTLIERS
    (
        DATA_TABLE        => INPUT_TABLE(<table name | select statement>),
        DATA_COLUMN       => < '<column name>' | <column position> >,
        [PARTITION_COLUMN => < '<column name>' | <column position> >,]
        [OUTLIER_METHOD   => < 'ZSCORE' | 'PERCENTILE' >,]
        [THRESHOLD_LOW    => <threshold value>,]
        [THRESHOLD_HIGH   => <threshold value>,]
        [OUTPUT_DATA      => < 'OUTLIERS' | 'NON_OUTLIERS' | 'ALL' >,]
        [OUTPUT_SCORE     => < TRUE | FALSE >]
    )
)

Parameters Description

DATA_TABLE

The name of the data table within which outliers will be detected.

To detect outliers in the data in the employee table, pass the name of the table to INPUT_TABLE:

INPUT_TABLE(employee)

To detect outliers in the data that is the result of a query, pass the query to INPUT_TABLE:

INPUT_TABLE
(
    SELECT * FROM employee_east
    UNION
    SELECT * FROM employee_west
)

DATA_COLUMN The name or position of the column in DATA_TABLE containing the outliers to detect.

PARTITION_COLUMN The name or position of the column in DATA_TABLE containing the value to partition over, when detecting outliers. Each unique PARTITION_COLUMN value will denote a subset of the source data, within which outliers will be determined independently from other subsets.

OUTLIER_METHOD

The optional calculation method to use to detect outliers. The default method is ZSCORE.

Method Description

ZSCORE

Z-score calculation, indicating the number of standard deviations above the mean each value within the set is. If a PARTITION_COLUMN is given, the z-score will be calculated for each subset of data corresponding to a unique PARTITION_COLUMN value. The z-score uses the following formula:

(DATA_COLUMN - AVG(DATA_COLUMN)) / STDDEV(DATA_COLUMN)

Scores will be decimals centered around 0.

PERCENTILE The standard percentile calculation, performed within each PARTITION_COLUMN group (if specified). Scores will be decimals between 0, inclusive, and 100, exclusive.

THRESHOLD_LOW

The lower bound to use for the determination of outliers. Records with scores lower than this value will be considered outliers. If not given, no lower bound will be applied. The value should match the OUTLIER_METHOD used:

ZSCORE - Threshold is the negative number of standard deviations to the left of the mean, beyond which outliers are found; e.g., -3 would indicate a threshold of 3 standard deviations lower than the mean.
PERCENTILE - Threshold is the percentage lower than which outliers are found; e.g., 25 would indicate outliers have percentile scores below 25%.

THRESHOLD_HIGH

The upper bound to use for the determination of outliers. Records with scores higher than this value will be considered outliers. If not given, no upper bound will be applied. The value should match the OUTLIER_METHOD used:

ZSCORE - Threshold is the positive number of standard deviations to the right of the mean, beyond which outliers are found; e.g., 3 would indicate a threshold of 3 standard deviations higher than the mean.
PERCENTILE - Threshold is the percentage higher than which outliers are found; e.g., 75 would indicate outliers have percentile scores above 75%.

OUTLIER_DATA

The selection of source records to return in the result set. The default is OUTLIERS.

Method	Description
`OUTLIERS`	Return only the records that are outliers.
`NON_OUTLIERS`	Return only the records that are not outliers.
`ALL`	Return all records.

OUTPUT_SCORE If TRUE, the calculated score for each record will be added to the result set in a column named OUTPUT_SCORE.

To find the outlier employee salaries in the employee table that are 1 standard deviation away from the mean, using the z-score method:

Outliers Using Z-Score Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
SELECT * FROM TABLE
(
    OUTLIERS
    (
        DATA_TABLE     => INPUT_TABLE(example.employee),
        DATA_COLUMN    => 'salary',
        THRESHOLD_LOW  => -1,
        THRESHOLD_HIGH => 1
    )
)

To show the percentile for non-outlier employee salaries by department in the employee table, between the 25th & 75th percentiles, using the percentile method:

Non-Outliers Using Percentile Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
SELECT * FROM TABLE
(
    OUTLIERS
    (
        DATA_TABLE       => INPUT_TABLE(example.employee),
        DATA_COLUMN      => 'salary',
        PARTITION_COLUMN => 'dept_id',
        OUTLIER_METHOD   => 'PERCENTILE',
        THRESHOLD_LOW    => 25,
        THRESHOLD_HIGH   => 75,
        OUTPUT_DATA      => 'NON_OUTLIERS',
        OUTPUT_SCORE     => TRUE
    )
)

Distribution Functions

Distribution functions are column expressions that affect the sharded/replicated nature of the result set of a given query. It may be necessary to force a result set to be distributed in a certain way for a subsequent operation on that result set to be performant.

Important

Employing these functions will prevent any automatic resharding of data to allow the query to succeed. Use only when a better query plan (with respect to data distribution) is known than any the system can devise.

Function	Description
`KI_REPLICATE()`	Force a scalar result set to be replicated (query/subquery with no `GROUP BY`)
`KI_REPLICATE_GROUP_BY(0)`	Force an aggregated result set to be replicated (query/subquery with `GROUP BY`)
`KI_MATCH_COLUMN(0)`	Aligns the column count of queries that are part of a UNION, INTERSECT or EXCEPT with a query whose column list has been amended with either `KI_REPLICATE_GROUP_BY` or `KI_SHARD_KEY`
`KI_SHARD_KEY(<column list>)`	Force the result set to be sharded on the given columns. This will override any implicitly-derived or explicitly-defined replication status the table would have had. Note The column(s) listed in `column list` must also appear in the `SELECT` list; `KI_SHARD_KEY` merely identifies which of the selected columns should be used as the shard key.

Sharding Example

For example, a query for all employees and their total employees managed, including employees who don't manage anyone, could employ a UNION like this:

Sharding Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
SELECT manager_id, COUNT(*)
FROM example.employee
GROUP BY manager_id
UNION
SELECT id, 0
FROM example.employee
WHERE id NOT IN
    (
        SELECT manager_id
        FROM example.employee
        WHERE manager_id IS NOT NULL
    )

In this example, the employee table is sharded on id. Since the first part of the UNION aggregates on manager_id, the result will be replicated. The second part of the UNION does no aggregation and includes the shard key in the SELECT list; the result of this will be sharded.

Prior to Kinetica v7.0, the limitation of UNION operations requiring that both parts of a UNION have to be distributed the same way, would make the query fail, with the following message:

GPUdb Error: either all input tables must be replicated or all input tables
must be non-replicated

In order to work around this limitation, a distribution function could be used. This is what the SQL engine of Kinetica, versions 7.0 or later, do automatically.

One option is to shard the first part of the UNION to match the second part:

Re-shard Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
SELECT
    manager_id,
    COUNT(*),
    KI_SHARD_KEY(manager_id)
FROM example.employee
GROUP BY manager_id
UNION
SELECT
    id,
    0,
    KI_MATCH_COLUMN(0)
FROM example.employee
WHERE id NOT IN
    (
        SELECT manager_id
        FROM example.employee
        WHERE manager_id IS NOT NULL
    )

Here, the distribution function KI_SHARD_KEY is used to make the selected manager_id column the new shard key for the first part of the UNION. Now, the shard key for the first part of the UNION (manager_id) aligns with the shard key for the second part (id), and the query succeeds. Note the use of KI_MATCH_COLUMN, which aligns the selected column lists on each side of the UNION. Without this matching distribution function, the UNION would appear to be merging three columns from the first part of the query into two columns in the second part and would fail.

Note

The manager_id column must exist in the SELECT list in order for the KI_SHARD_KEY function to designate it as the shard key.

Predicates

Predicate are generally used within a SQL WHERE clause to query records. They compare the values of two or more expressions; whenever a record meets the criteria defined in a predicate clause it will be marked as eligible to be part of the query result set. If it meets all predicate clauses defined within a query, it will be returned in the result set.

A single predicate clause may use a simple predicate operator to compare the values of two expressions or a more complex predicate clause form. A compound predicate clause uses a compound predicate operator to link together multiple predicate clauses to further refine a result set.

Unlimited-width (non-charN) strings can only be used within equality-based predicates, e.g. =, IN, etc.

Predicate Operators

= equality
!= or <> inequality
< less than
<= less than or equal to
> greater than
>= greater than or equal to

Predicate Clauses

In the following list of predicate clauses, ref_expr is the reference expression to apply the predicate to; note that EXISTS has no reference expression.

Predicate Clause	Description
`<expr_a> <pred_op> <expr_b>`	Matches records where `expr_a` relates to `expr_b` according to predicate operator `pred_op`.
`<ref_expr> <pred_op> ALL (<select statement>)`	Matches records where the reference expression `ref_expr` relates to all of the results of `select statement` according to the predicate operator `pred_op`
`<ref_expr> <pred_op> ANY (<select statement>)`	Matches records where the reference expression `ref_expr` relates to any of the results of `select statement` according to the predicate operator `pred_op`
`<ref_expr> [NOT] BETWEEN <begin_expr> AND <end_expr>`	Matches records where the reference expression `ref_expr` is (or is `NOT`) between the values of `begin_expr` and `end_expr`, inclusive
`<ref_expr> [NOT] IN (<match_list>)`	Matches records where the reference expression `ref_expr` is (or is `NOT`) in the `match_list` list of match values. The list can either be a comma-separated list of terms/expressions or the result of a `SELECT` statement.
`<ref_expr> IS [NOT] NULL`	Matches records where the reference expression `ref_expr` is (or is `NOT`) null.
`[NOT] EXISTS (<select statement>)`	Matches records where `select statement` returns 1 or more records.

Compound Predicate Operators

Predicate Operator	Description
`<pred_a> AND <pred_b>`	Matches records where both `pred_a` & `pred_b` are true
`<pred_a> OR <pred_b>`	Matches records where either `pred_a` or `pred_b` is true
`NOT <pred_b>`	Matches records where `pred` is false

Subqueries

Non-Correlated Subqueries

These are subqueries that are self-contained, in that they can be executed independently of the surrounding query.

Non-Correlated Subquery in SELECT Example
1
2
3
4
5
6
7
8
9
SELECT
    last_name,
    first_name,
    sal,
    (
        SELECT AVG(sal)
        FROM example.employee
    ) AS avg_emp_sal
FROM example.employee

Non-Correlated Subquery in FROM Example
1
2
3
4
5
6
7
SELECT
    l || ', ' || f AS "Employee_Name"
FROM
    (
        SELECT last_name AS l, first_name AS f
        FROM example.employee
    )

Non-Correlated Subquery in WHERE Example
1
2
3
4
5
6
7
8
SELECT last_name, first_name
FROM example.employee
WHERE dept_id IN
    (
        SELECT id
        FROM example.department
        WHERE code = 'ENGR' OR code = 'MKTG'
    )

Correlated Subqueries

These are subqueries that depend on the values in the surrounding query, and cannot be executed independently of the surrounding query.

Correlated Subquery Example
1
2
3
4
5
6
7
8
SELECT *
FROM demo.nyctaxi o
WHERE fare_amount =
    (
        SELECT MAX(fare_amount)
        FROM demo.nyctaxi i
        WHERE o.passenger_count = i.passenger_count
    )

Important

Correlated subqueries have the following limitations:

They cannot reference grouping columns in the parent query
They cannot reference tables beyond the immediate outer query; i.e., a table cannot be referenced in a correlated subquery that is two or more levels of nesting deeper than it is
They cannot contain disjunctive conditions
They cannot be part of an OUTER JOIN ON clause condition

Remote Queries

A remote query makes use of a data source to run a given query on the remote system referenced by that data source.

Remote Query Syntax
1
2
3
4
5
6
7
8
SELECT * FROM TABLE
(
    EXECUTE_REMOTE_QUERY
    (
        QUERY => '<remote query>',
        OPTIONS => KV_PAIRS(datasource_name = '<data source name>')
    )
)

Parameter	Description
`<remote query>`	SQL query to execute on the remote system.
`<data source name>`	Name of the data source referencing the remote system on which the query should be executed.

For instance, to query employees' managers from a given remote system:

Remote Query Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
SELECT * FROM TABLE
(
    EXECUTE_REMOTE_QUERY
    (
        QUERY => '
            SELECT
                e.last_name || '', '' || e.first_name AS "Employee_Name",
                m.last_name || '', '' || m.first_name AS "Manager_Name"
            FROM
                example.employee e
                LEFT JOIN example.employee m ON e.manager_id = m.id
            WHERE
                e.dept_id IN (1, 2, 3)
            ORDER BY
                m.id ASC NULLS FIRST,
                e.hire_date
        ',
        OPTIONS => KV_PAIRS(datasource_name = 'example.jdbc_ds')
    )
)

To query employees' information from a given remote system and join it to data in the host system:

Remote/Local JOIN Query Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
SELECT
    e.last_name || ', ' || e.first_name AS "Employee_Name",
    d.code
FROM
    TABLE
    (
        EXECUTE_REMOTE_QUERY
        (
            QUERY => '
                SELECT last_name, first_name, dept_id
                FROM example.employee
            ',
            OPTIONS => KV_PAIRS(datasource_name = 'example.jdbc_ds')
        )
    ) e,
    example.department d
WHERE e.dept_id = d.id
ORDER BY 1

Hints

Hints can be added as comments within queries, and affect just the query in which they appear. They will override the corresponding client & server settings (when such settings exist).

Kinetica supports both global & scoped hints:

Scoped Hints - affect a portion of the query
Global Hints - affect the entire query

Scoped Hints

Scoped hints may appear anywhere in the query. Scoped hints must be in a comment block in the form:

/*+ <HINT>[,<HINT>...] */

Scoped Hint Example
1
2
3
4
CREATE MATERIALIZED VIEW example.weather_zone AS
SELECT w.name AS event_name, w.type AS event_type, gz.name AS zone
FROM example.weather /*+ KI_HINT_DELTA_TABLE */ w
JOIN example.geo_zone gz ON STXY_INTERSECTS(lon, lat, zone)

Hint	Description
`KI_HINT_DELTA_TABLE`	Designates the corresponding table as a delta table, within the context of a CREATE MATERIALIZED VIEW statement. This hint must be placed immediately after the `FROM <table>` clause to which it applies, but before any table alias specified for that table.
`KI_HINT_GROUP_BY_PK`	Create primary keys for this GROUP BY result set on the grouped-upon columns/expressions; often used to create a primary key on the result of a CREATE TABLE ... AS that ends in a `GROUP BY`, and can also make materialized views containing grouping operations more performant. Note If any of the grouped-on expressions are nullable, no primary key will be applied. This hint may be placed either immediately after the `SELECT` keyword or immediately after the `FROM <table>` clause.
`KI_HINT_MATERIALIZE`	This can make some operations more performant, but can use more memory. It would normally be used on a multi-dimensional join or union result set to speed up operations done on that result set. This can also be used on a query with a CTE in the `WITH` clause; if different `WHERE` clauses are applied to the CTE, the CTE query may only have to be evaluated once. This hint must be placed immediately after the `SELECT` keyword, like the `SELECT` producing a join. Note `KI_HINT_MATERIALIZE` includes the functionality of `KI_HINT_NO_VIRTUAL_UNION`.
`KI_HINT_NO_VIRTUAL_UNION`	This can make some operations more performant, but can use more memory. This would normally be used on a union query combining many tables or sub-queries. This scoped must be placed immediately after the `SELECT` keyword, like the first `SELECT` keyword in the union.

Global Hints

Global hints may appear anywhere in the query. Global hints must be in a comment block in the form:

/* <HINT>[,<HINT>...] */

Global Hint Example
1
2
3
4
5
CREATE TABLE example.taxi_trip_daily_totals AS
/* KI_HINT_GROUP_BY_PK, KI_HINT_INDEX(transaction_date) */
SELECT vendor_id, DATE(dropoff_datetime) AS transaction_date, COUNT(*) AS total_trips
FROM demo.nyctaxi
GROUP BY vendor_id, transaction_date

Hint Description

KI_HINT_BATCH_SIZE(n)

Use an ingest batch size of n records. Default: 10,000.

Only applicable when issuing INSERT statements.

KI_HINT_CHUNK_SIZE(n) Use chunk sizes of n records per chunk within result sets. Suffixes of K & M can be used to represent thousands or millions of records; e.g., 20K, 50M.

KI_HINT_COST_BASED_OPTIMIZATION Cost-based optimizations are usually turned on by default. If they have been turned off by default using a global configuration setting (sql.cost_based_optimization), they can be turned on for an individual query using this hint.

KI_HINT_DICT_PROJECTION Retain the dictionary encoding attributes of source columns in a projection result set.

KI_HINT_DISTRIBUTED_OPERATIONS Reshard data when doing so would be the only way to process one or more operations in this query.

KI_HINT_DONT_COMBINE Don't combine joins and unions for this query.

KI_HINT_GROUP_BY_FORCE_REPLICATED Make all result tables within a single query replicated; useful when meeting the input table requirements of JOIN, UNION, etc. in a query containing aggregated subqueries which generate differently-sharded result tables.

KI_HINT_GROUP_BY_PK

Create primary keys for all GROUP BY result sets on the grouped-upon columns/expressions within a given query; often used to create a primary key on the result of a CREATE TABLE ... AS that ends in a GROUP BY, and can also make materialized views containing grouping operations more performant.

Note

If any of the grouped-on expressions are nullable, no primary key will be applied.

KI_HINT_HAS_HEADER Assume the first line of the source CSV file has a header row. Only used with CSV ingestion via INSERT INTO ... SELECT ... FROM FILE.

KI_HINT_IGNORE_EXISTING_PK When inserting into or updating a table with a primary key, if the INSERT or UPDATE results in a primary key collision, reject the command with no error or warning. If the specified table does not have a primary key or the KI_HINT_UPDATE_ON_EXISTING_PK hint is used to switch to upsert mode for inserts or overwrite mode for updates, then this hint is ignored.

KI_HINT_INDEX(column_list) Create an index on each of the comma-separated columns in the given column_list; often used with CREATE TABLE ... AS to create an index on a persisted result set.

KI_HINT_JOIN_TABLE_CHUNK_SIZE(n) Use chunk sizes of n records per chunk within joins. Suffixes of K & M can be used to represent thousands or millions of records; e.g., 20K, 50M.

KI_HINT_KEEP_TEMP_TABLES Don't erase temp tables created by this query.

KI_HINT_KEY_LOOKUP

Use distributed key lookup to retrieve records from a single table using a fast, indexed lookup.

See SQL for details and examples.

KI_HINT_NO_COST_BASED_OPTIMIZATION Don't use the cost-based optimizer when calculating the query plan.

KI_HINT_NO_DICT_PROJECTION Don't retain dictionary encoding attributes of source columns in a projection result set.

KI_HINT_NO_DISTRIBUTED_OPERATIONS Don't reshard data when doing so would be the only way to process one or more operations in this query.

KI_HINT_NO_HEADER Assume the source CSV file has no header row. Only used with CSV ingestion via INSERT INTO ... SELECT ... FROM FILE.

KI_HINT_NO_JOIN_COUNT Optimize joins by not calculating intermediate set counts.

KI_HINT_NO_LATE_MATERIALIZATION Force the materialization of intermediary result sets.

KI_HINT_NO_NATIVE_SEMI_JOINS Don't use semi-joins.

KI_HINT_NO_PARALLEL_EXECUTION Execute all components of this query in series.

KI_HINT_NO_PLAN_CACHE Don't cache the query plan calculated for this query.

KI_HINT_NO_QUERY_RESULT_CACHING Don't use the query cache for processing this query.

KI_HINT_NO_RULE_BASED_OPTIMIZATION Don't use the rule-based optimizer when calculating the query plan.

KI_HINT_NO_SHADOW_CUBE Don't use the shadow cube for caching requests in the worker processes.

KI_HINT_NO_SSQ_OPTIMIZATION Don't optimize the query using Scalar Sub-Query (SSQ) substitution.

KI_HINT_NO_VALIDATE_CHANGE Don't fail an ALTER TABLE command when changing column types/sizes and the column data is too long/large. Truncate the data instead and allow the modification to succeed.

KI_HINT_NO_VIRTUAL_UNION Don't utilize Virtual Unions when performing this query.

KI_HINT_PROJECT_MATERIALIZED_VIEW Force the materialization of a materialized view. Some materialized views containing JOIN clauses will be backed by a native join view. This is done to improve the performance of materialized view refreshes and reduce memory usage at the cost of reduced query performance. This hint will induce the reverse of this trade-off -- increased query performance at the cost of reduced refresh performance and increased memory usage.

KI_HINT_REPL_ASYNC

Instruct the target database to treat this statement as one that should be run asynchronously and serially across all high availability clusters in its ring. See HA Asynchronous Operation Handling for details.

Note

The target database must be configured for high-availability for this to have an effect. See High Availability Configuration & Management for details.

KI_HINT_REPL_ASYNC_PARALLEL

Instruct the target database to treat this statement as one that should be run asynchronously and in parallel across all high availability clusters in its ring. See HA Asynchronous Operation Handling for details.

Note

The target database must be configured for high-availability for this to have an effect. See High Availability Configuration & Management for details.

KI_HINT_REPL_NONE

Instruct the target database to treat this statement as one that should not be run across all high availability clusters in its ring. See HA Query Operation Handling for details.

Note

The target database must be configured for high-availability for this to have an effect. See High Availability Configuration & Management for details.

KI_HINT_REPL_SYNC

Instruct the target database to treat this statement as one that should be run synchronously and serially across all high availability clusters in its ring. See HA Synchronous Operation Handling for details.

Note

The target database must be configured for high-availability for this to have an effect. See High Availability Configuration & Management for details.

KI_HINT_REPL_SYNC_PARALLEL

Instruct the target database to treat this statement as one that should be run synchronously and in parallel across all high availability clusters in its ring. See HA Synchronous Operation Handling for details.

Note

The target database must be configured for high-availability for this to have an effect. See High Availability Configuration & Management for details.

KI_HINT_REQUEST_TIMEOUT(m) Use a timeout of m minutes when processing this command.

KI_HINT_SAVE_UDF_STATS Save statistics of UDF runs for UDFs invoked via SQL.

KI_HINT_SERVER_SIDE_INSERT Issue the INSERT INTO ... VALUE statement as a server-side command, instead of incurring the overhead of the client-side distributed ingestion processor, which can give better performance when inserting many rows.

KI_HINT_TRUNCATE_STRINGS Truncate all strings being inserted into restricted-width (charN) columns to the max width for the column. Used with any INSERT INTO, LOAD INTO, or CREATE EXTERNAL TABLE statement.

KI_HINT_UPDATE_ON_EXISTING_PK

Change the record collision policy for inserting into or updating a table with a primary key:

For INSERT statements, invoke an upsert scheme; any existing table record with primary key values that match those of a record being inserted will be replaced by that new record.
For UPDATE statements, invoke an overwrite scheme; if a record update causes the record's primary key to match that of another record in the same table, that other record will be removed and the original record will remain, updated with its new primary key and any other values from the SET clause.

Without this hint, the INSERT or UPDATE will be rejected and the error handled based on the presence of the KI_HINT_IGNORE_EXISTING_PK hint. If the specified table does not have a primary key, then this hint will be ignored.

KI_HINT_WAL_SYNC_METHOD(method)

Specify the WAL sync method for a query.

Method	Description
`none`	Disables the write-ahead log feature.
`background`	WAL entries are periodically written instead of immediately after each operation.
`flush`	Protects entries in the event of a database crash.
`fsync`	Protects entries in the event of an OS crash.

Testing Hints

The following global hints are intended to be used for testing. They are usually not recommended for regular use and are subject to change.

Hint	Description
`KI_HINT_NO_CHUNK_SKIPPING`	Do not use any chunk-skipping optimizations when performing this query. Currently, this only affects chunk-skipping performed in join operations.
`KI_HINT_NO_CONSTANT_FOLDING`	Do not use constant-folding optimizations when performing this query.
`KI_HINT_NO_HAVING_PUSHDOWN`	Do not push down any `HAVING` clauses when planning this query.
`KI_HINT_NO_TRANSITIVE_SIMPLIFICATIONS`	Normally, if a query contains "x = y" and "x = 5" then the transitive simplification will add "y = 5" in some cases. This usually helps perform optimizations. This hint disables this behavior.

EXPLAIN

Outputs the execution plan of a given SQL statement.

Tip

For the visual explain plan utility in GAdmin, see the Explain feature under SQL Tool.

EXPLAIN Syntax
1
EXPLAIN [PHYSICAL|ANALYZE|VERBOSE|VERBOSE ANALYZE] [FORMAT <JSON|TABLE>] <statement>

Each supporting API endpoint call that is made in servicing the request is listed as an execution plan step in the output, along with any input or output tables associated with the call and the prior plan execution steps on which a given execution step depends.

The following options can be specified:

PHYSICAL - (default) outputs the physical execution plan with the following endpoint-level details per step:
- ID - execution step number
- ENDPOINT - name of native API endpoint called
- INPUT_TABLES - input tables used by the endpoint (if any)
- OUTPUT_TABLE - output table created by the endpoint (if any)
- DEPENDENCIES - list of prior execution steps upon which this step depends
ANALYZE - same as PHYSICAL, including additional run-time details:
- RUN_TIME - execution time of each endpoint call
- RESULT_ROWS - number of records produced in the endpoint call
VERBOSE - same as PHYSICAL, including endpoint parameter details:
- COLUMNS - columns passed to the endpoint call
- EXPRESSIONS - expressions passed to the endpoint call
- OPTIONS - option keys & values passed to the endpoint call
- LAST_USE_TABLES - list of tables that will not be used by any following execution step
- ADDITIONAL_INFO - other parameters passed to the endpoint call
VERBOSE ANALYZE - same as VERBOSE & ANALYZE together, including the execution plan for any joins contained within the query
FORMAT JSON - outputs the result in JSON format
FORMAT TABLE - (default) outputs the result in tabular format

Tip

For PHYSICAL & VERBOSE modes, the /* KI_HINT_RECURSIVE_EXPLAIN */ hint can be used to show the plan recursively through any materialized views involved in the query.

Important

Specifying ANALYZE will cause the statement to be executed in order to collect run-time statistics on the endpoint calls made.

For example, to output the execution plan for a query that aggregates the number of taxi rides between boroughs:

EXPLAIN Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
EXPLAIN
SELECT
    n_begin.ntaname AS boro_begin,
    boro_end,
    COUNT(*) AS total_trips
FROM
(
    SELECT pickup_latitude, pickup_longitude, n_end.ntaname AS boro_end
    FROM demo.nyctaxi t
    JOIN example_geospatial.nyc_neighborhood n_end ON STXY_INTERSECTS(dropoff_longitude, dropoff_latitude, geom)
)
JOIN example_geospatial.nyc_neighborhood n_begin ON STXY_INTERSECTS(pickup_longitude, pickup_latitude, geom)
GROUP BY 1, 2

The execution plan is listed in table format, as follows:

EXPLAIN Output
1
2
3
4
5
6
7
8
+------+-------------------------+----------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+----------------+
| ID   | ENDPOINT                | INPUT_TABLES                                                                                                                     | OUTPUT_TABLE                                                           | DEPENDENCIES   |
+------+-------------------------+----------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+----------------+
| 0    | /create/jointable       | demo.nyctaxi AS TableAlias_0_,example_geospatial.nyc_neighborhood AS TableAlias_1_                                               | sys_sql_temp.Join_3_01234567_89ab_cdef_0123_456789abcdef               | -1             |
| 1    | /create/jointable       | sys_sql_temp.Join_3_01234567_89ab_cdef_0123_456789abcdef AS TableAlias_0_,example_geospatial.nyc_neighborhood AS TableAlias_1_   | sys_sql_temp.Join_5_01234567_89ab_cdef_0123_456789abcdef               | 0              |
| 2    | /aggregate/groupby      | sys_sql_temp.Join_5_01234567_89ab_cdef_0123_456789abcdef                                                                         | sys_sql_temp.ShardedAggregate_7_01234567_89ab_cdef_0123_456789abcdef   | 1              |
| 3    | /get/records/bycolumn   | sys_sql_temp.ShardedAggregate_7_01234567_89ab_cdef_0123_456789abcdef                                                             |                                                                        | 2              |
+------+-------------------------+----------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+----------------+

If there is an error processing a query, the error can be returned in the JSON-formatted execution plan:

EXPLAIN with Detail Example
1
2
3
4
EXPLAIN VERBOSE ANALYZE FORMAT JSON
SELECT * /* KI_HINT_NO_DISTRIBUTED_OPERATIONS */
FROM example.explain_table t1
JOIN example.explain_table t2 ON t1.shard_column = t2.not_shard_column

The execution plan is listed in JSON format with the query error, as follows:

EXPLAIN with Detail Output
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
"PLAN": [
    {
        "ADDITIONAL_INFO": "Not all of the non-replicated tables are equated by shard keys. Number of sharded sets: 2, Number of components connected: 0 (REF)",
        "COLUMNS": "TableAlias_0_.shard_column AS shard_column,TableAlias_0_.not_shard_column AS not_shard_column,TableAlias_1_.shard_column AS shard_column0,TableAlias_1_.not_shard_column AS not_shard_column0",
        "DEPENDENCIES": "-1",
        "ENDPOINT": "/create/jointable",
        "EXPRESSIONS": "inner join TableAlias_0_, TableAlias_1_  on   (TableAlias_0_.shard_column = TableAlias_1_.not_shard_column )  ",
        "ID": "0",
        "INFO": "{"estimate_input_row_counts":"1.0, 1.0","estimate_row_count":"1.0"}",
        "JSON_REQUEST": "{
            "join_table_name": "sys_sql_temp.Join_1_01234567_89ab_cdef_0123_456789abcdef",
            "table_names": [
                "example.explain_table AS TableAlias_0_",
                "example.explain_table AS TableAlias_1_"
            ],
            "column_names": [
                "TableAlias_0_.shard_column AS shard_column",
                "TableAlias_0_.not_shard_column AS not_shard_column",
                "TableAlias_1_.shard_column AS shard_column0",
                "TableAlias_1_.not_shard_column AS not_shard_column0"
            ],
            "expressions": [
                "inner join TableAlias_0_, TableAlias_1_  on   (TableAlias_0_.shard_column = TableAlias_1_.not_shard_column )  "
            ],
            "options": {
                "create_explain": "true",
                "show_filters": "true",
                "ttl": "-1"
            }
        }",
        "LAST_USE_TABLES": "",
        "OPTIONS": "{create_explain,true} {show_filters,true} {ttl,-1}",
        "RESULT_DISTRIBUTION": "NA / ;",
        "RESULT_ROWS": "0",
        "RUN_TIME": "0",
        "TABLE_DEFINITIONS": "CREATE TABLE "example"."explain_table"\r
        (\r
            "shard_column" INTEGER (shard_key),\r
            "not_shard_column" INTEGER\r
        )\r
        TIER STRATEGY (\r
            ( ( VRAM 1, RAM 5, DISK0 5, PERSIST 5 ) )\r
        );
        CREATE TABLE "example"."explain_table"\r
        (\r
            "shard_column" INTEGER (shard_key),\r
            "not_shard_column" INTEGER\r
        )\r
        TIER STRATEGY (\r
            ( ( VRAM 1, RAM 5, DISK0 5, PERSIST 5 ) )\r
        );"
    },
    {
        "ADDITIONAL_INFO": "",
        "COLUMNS": "shard_column AS shard_column, not_shard_column AS not_shard_column, shard_column0 AS shard_column0, not_shard_column0 AS not_shard_column0",
        "DEPENDENCIES": "0",
        "ENDPOINT": "/get/records/bycolumn",
        "EXPRESSIONS": "",
        "ID": "1",
        "INFO": "",
        "JSON_REQUEST": "{
            "table_name": "sys_sql_temp.Join_1_01234567_89ab_cdef_0123_456789abcdef",
            "column_names": [
                "shard_column AS shard_column",
                "not_shard_column AS not_shard_column",
                "shard_column0 AS shard_column0",
                "not_shard_column0 AS not_shard_column0"
            ],
            "offset": 0,
            "limit": -9999,
            "encoding": "binary",
            "options": {}
        }",
        "LAST_USE_TABLES": "sys_sql_temp.Join_1_01234567_89ab_cdef_0123_456789abcdef",
        "OPTIONS": "",
        "RESULT_DISTRIBUTION": "",
        "RESULT_ROWS": "0",
        "RUN_TIME": "0",
        "TABLE_DEFINITIONS": ""
    }
]