SQL Support¶

Kinetica has broad support for the SQL-92 standard through its ODBC connector interface. For details on installation, configuration, logging, & use, see the The ODBC/JDBC Connector section.

Topics

Query
Data Definition (DDL)
Data Manipulation (DML)
- INSERT
- UPDATE
- DELETE
Loading Data from Files
- Loading Client-Side Files
- Loading Server-Side Files
SQL Procedures
Security
Resource Management
System Properties
- ALTER SYSTEM SET PROPERTIES
- SHOW SYSTEM PROPERTIES
Limitations
Not Yet Supported SQL
Reserved Words

Query¶

The basic form of the supported SELECT statement is:

SELECT [DISTINCT | TOP <n>] <column expression list>
FROM [<schema name>.]<table name>
    [<join type> JOIN <join table name> ON <join expression>],...
[WHERE <filtering expression list>]
[GROUP BY <grouping expression list>]
[HAVING <group filtering expression list>]
[ORDER BY <ordering expression list>]
[LIMIT [<offset>, ]<num rows>]

Note

The * can be used to specify all columns in the column expression list, while <table_name>.* can be used to specify all columns from the given table.
Columns can be double-quoted to induce case-sensitivity and to use reserved words as column names; e.g., "PERCENT".
TOP <n> returns the first n records (up to 20000 records by default), but is configurable.
The grouping expression list may contain column names, aliases, expressions, or positions (e.g., GROUP BY 2 to aggregate on the 2nd column in the SELECT list).
The having expression list may contain grouping expressions or any grouping expression aliases defined in the SELECT list.
The ordering expression list may contain column names, expressions, or column positions (e.g., GROUP BY 2 to aggregate on the 2nd column in the SELECT list). The default ordering is ASC. The default null ordering is NULLS FIRST when using ascending order and NULLS LAST when using descending order. The general format for each comma-separated ordering expression in the list is:
```
<column name/alias/expression/position> [ASC | DESC] [NULLS FIRST | NULLS LAST]
```
LIMIT applies paging to the result set, starting at the 0-based offset (if specified) and returning num rows records.

For example:

SELECT
    e.last_name || ', ' || e.first_name as "Employee_Name",
    m.last_name || ', ' || m.first_name as "Manager_Name"
FROM
    employee e
    LEFT JOIN employee m ON e.manager_id = m.id
WHERE
    e.dept_id IN (1, 2, 3)
ORDER BY
    m.id ASC NULLS FIRST,
    e.hire_date

Tableless Query¶

A query without a FROM clause can be used to return a single row of data containing a constant or constant expression.

For example, to select the current day of the week:

SELECT DAYNAME(NOW()) AS "Today"

Note

A tableless query will create a result set backed by a replicated table, by default.

Join¶

The supported join types are:

INNER - matching rows between two tables
LEFT - matching rows between two tables, and rows in the left-hand table with no matching rows in the right-hand table
RIGHT - matching rows between two tables, and rows in the right-hand table with no matching rows in the left-hand table
FULL OUTER matching rows between two tables, and rows in both tables with no matching rows in the other
CROSS - all rows in one table matched against all rows in the other

There are two execution schemes that are used to process joins, depending on the distribution of the joined tables:

Local - highly performant, but native join criteria must be met
Distributed - highly flexible, as native join restrictions are lifted, but less performant due to interprocessor communication overhead

Important

Though the data distribution restrictions on native joins do not exist for joins made via SQL, following the join guidelines on sharding will result in much more performant queries.

Kinetica supports both JOIN...ON and WHERE clause syntax for inner joins; all outer join types (LEFT, RIGHT, & FULL OUTER) require JOIN...ON syntax.

For example, to list the name of each employee and the name of the employee's manager, using the WHERE clause to specify the join condition:

SELECT
    e.last_name || ', ' || e.first_name as "Employee_Name",
    m.last_name || ', ' || m.first_name as "Manager_Name"
FROM
    employee e,
    employee m
WHERE
    e.manager_id = m.id
ORDER BY
    e.last_name,
    e.first_name

To list the name of each employee and the associated manager, even for employees that don't have a manager, using the JOIN...ON syntax to specify the join condition:

SELECT
    e.last_name || ', ' || e.first_name as "Employee_Name",
    m.last_name || ', ' || m.first_name as "Manager_Name"
FROM
    employee e
    LEFT JOIN employee m ON e.manager_id = m.id
ORDER BY
    e.last_name,
    e.first_name

ASOF¶

Kinetica supports the notion of an inexact match join via the ASOF join function. This feature allows each left-side table record to be matched to a single right-side table record whose join column value is the smallest or largest value within a range relative to the left-side join column value. In the case where multiple right-side table records have the same smallest or largest value for a given left-side table record, only one of the right-side table records will be chosen and returned as part of the join.

The format of the ASOF function is as follows:

ASOF(<left_column>, <right_column>, <rel_range_begin>, <rel_range_end>, <MIN|MAX>)

The five parameters are:

left_column - name of the column to join on from the left-side table
right_column - name of the column to join on from the right-side table
rel_range_begin - constant value defining the position, relative to each left-side column value, of the beginning of the range in which to match right-side column values; use a negative constant to begin the range before the left-side column value, or a positive one to begin after it
rel_range_end - constant value defining the position, relative to each left-side column value, of the end of the range in which to match right-side column values; use a negative constant to end the range before the left-side column value, or a positive one to end after it
MIN|MAX - use MIN to return the right-side matched record with the smallest join column value; use MAX to return the right-side matched record with the greatest join column value

Effectively, each matched right-side column value must be:

>= <left-side column value> + rel_range_begin
<= <left-side column value> + rel_range_end

Within the set of right-side matches for each left-side record, the one with the MIN or MAX column value will be returned in the join. In the case of a tie for the MIN or MAX column value, only one right-side record will be selected for return in the join for that left-side record.

Examples¶

The following ASOF call might be used to list, for each flight arrival time, the soonest flight departure time that occurs between half an hour and an hour and a half after the arrival; effectively, the time-matching portion of a connecting flight query:

ASOF(inbound.eta, outbound.etd, INTERVAL '30' MINUTE, INTERVAL '90' MINUTE, MIN)

This ASOF call returns right-side locations that are nearest eastward to each left-side location, for locations within 5 degrees of the left-side:

ASOF(b.x, n.x, .00001, 5, MIN)

For example, to match a set of stock trades to the opening prices for those stocks (if an opening price record exists within 24 hours prior to the trade), and to include trades for which there is no opening stock price record:

SELECT
    t.id,
    t.dt AS execution_dt,
    q.open_dt AS quote_dt,
    t.price AS execution_price,
    q.open_price
FROM
    trades t
    LEFT JOIN quotes q ON t.ticker = q.symbol AND ASOF(t.dt, q.open_dt, INTERVAL '-1' DAY, INTERVAL '0' DAY, MAX)

While the ASOF join function can only be used as part of a join, it can effectively be made into a filter condition by sub-selecting the filter criteria in the FROM clause and joining on that criteria.

For instance, to look up the stock price for a given company as of a given date:

SELECT
    /* KI_HINT_NO_LATE_MATERIALIZATION */
    t.ticker,
    t.asof_dt,
    q.open_dt,
    q.open_price
FROM
    (SELECT 'EBAY' AS ticker, DATETIME('2006-12-15 12:34:56') AS asof_dt) t
    LEFT JOIN quotes q ON t.ticker = q.symbol AND ASOF(t.asof_dt, q.open_dt, INTERVAL '-1' DAY, INTERVAL '0' DAY, MAX)

Important

The use of the KI_HINT_NO_LATE_MATERIALIZATION is key, here, as the join requires a materialized table, which this hint ensures, to succeed.

Aggregation¶

The GROUP BY clause can be used to segment data into groups and apply aggregate functions over the values within each group. Aggregation functions applied to data without a GROUP BY clause will be applied over the entire result set.

Note

GROUP BY can operate on columns, column expressions, or the position of a member of the SELECT clause (where 1 is the first element), but does not work on column aliases.

For example, to find the average cab fare from the taxi data set:

SELECT ROUND(AVG(total_amount),2) AS "Average_Fare"
FROM nyctaxi

To find the minimum, maximum, & average trip distances, as well as the average passenger count for each vendor per year from the taxi data set (weeding out data with errant trip distances):

SELECT
    vendor_id AS Vendor_ID,
    YEAR(pickup_datetime) AS Year,
    MAX(trip_distance) AS Max_Trip,
    MIN(trip_distance) AS Min_Trip,
    ROUND(AVG(trip_distance),2) AS Avg_Trip,
    INT(AVG(passenger_count)) AS Avg_Passenger_Count
FROM nyctaxi
WHERE
    trip_distance > 0 AND
    trip_distance < 100
GROUP BY vendor_id, 2
ORDER BY Vendor_ID, Year

Grouping¶

The GROUP BY clause can also be used to apply the following grouping functions over the values within each group:

ROLLUP
CUBE
GROUPING SETS

With each of these, the GROUPING() aggregate function can be used to distinguish aggregated null values in the data from null values generated by the ROLLUP, CUBE, or GROUPING SETS grouping function.

For instance, the following CASE will turn the aggregated null values in the Sector column into an <UNKNOWN SECTOR> group and the null value generated by the grouping function into an <ALL SECTORS> group:

CASE
    WHEN (GROUPING(Sector) = 1) THEN '<ALL SECTORS>'
    ELSE NVL(Sector, '<UNKNOWN SECTOR>')
END AS Sector_Group,

ROLLUP¶

The ROLLUP(expr list) function calculates n + 1 aggregates for n number of columns in expr list.

For example, the following query will aggregate the average opening stock price for these groups:

Each market sector & stock symbol pair
Each market sector
All sectors and symbols

SELECT
    CASE
        WHEN (GROUPING(Sector) = 1) THEN '<ALL SECTORS>'
        ELSE NVL(Sector, '<UNKNOWN SECTOR>')
    END AS Sector_Group,
    CASE
        WHEN (GROUPING(Symbol) = 1) THEN '<ALL SYMBOLS>'
        ELSE NVL(Symbol, '<UNKNOWN SYMBOL>')
    END AS Symbol_Group,
    AVG("Open") AS AvgOpen
FROM Stocks
GROUP BY ROLLUP(Sector, Symbol)
ORDER BY Sector_Group, Symbol_Group

CUBE¶

The CUBE(expr list) function calculates 2ⁿ aggregates for n number of columns in expr list.

For example, the following query will aggregate the average opening stock price for these groups:

Each market sector & stock symbol pair
Each market sector
Each stock symbol
All sectors and symbols

SELECT
    CASE
        WHEN (GROUPING(Sector) = 1) THEN '<ALL SECTORS>'
        ELSE NVL(Sector, '<UNKNOWN SECTOR>')
    END AS Sector_Group,
    CASE
        WHEN (GROUPING(Symbol) = 1) THEN '<ALL SYMBOLS>'
        ELSE NVL(Symbol, '<UNKNOWN SYMBOL>')
    END AS Symbol_Group,
    AVG("Open") AS AvgOpen
FROM Stocks
GROUP BY CUBE(Sector, Symbol)
ORDER BY Sector_Group, Symbol_Group

GROUPING SETS¶

The GROUPING SETS(expr list) function calculates aggregates for each group of columns in expr list.

For example, the following query will aggregate the average opening stock price for these groups:

Each market sector
Each stock symbol
All sectors and symbols

SELECT
    CASE
        WHEN (GROUPING(Sector) = 1) THEN '<ALL SECTORS>'
        ELSE NVL(Sector, '<UNKNOWN SECTOR>')
    END AS Sector_Group,
    CASE
        WHEN (GROUPING(Symbol) = 1) THEN '<ALL SYMBOLS>'
        ELSE NVL(Symbol, '<UNKNOWN SYMBOL>')
    END AS Symbol_Group,
    AVG("Open") AS AvgOpen
FROM Stocks
GROUP BY GROUPING SETS((Sector), (Symbol), ())
ORDER BY Sector_Group, Symbol_Group

Window¶

Window functions are available through the use of the OVER clause, which can partition rows into frames. Different types of functions can be used to aggregate data over a sliding window.

The basic form for a window is:

SELECT
    <window function> OVER (
        PARTITION BY <column expression list>
        [ORDER BY <ordering expression list>]
        [
            <RANGE | ROWS>
            <
                <UNBOUNDED PRECEDING | <number> PRECEDING | CURRENT ROW | <number> FOLLOWING> |
                BETWEEN <UNBOUNDED PRECEDING | <number> PRECEDING | CURRENT ROW | <number> FOLLOWING>
                    AND <UNBOUNDED FOLLOWING | <number> PRECEDING | CURRENT ROW | <number> FOLLOWING>
            >
        ]
    ) [AS <alias>]

Note

The ORDER BY clause is not required when the window function is either FIRST_VALUE() or LAST_VALUE(). These two functions are also the only ranking functions that can contain a RANGE or ROWS frame clause.

The default ordering of records within each partition is ASC. The default null ordering is NULLS FIRST when using ascending order and NULLS LAST when using descending order. The general format for each comma-separated ordering expression in the list is:

<column name/alias/expression/position> [ASC | DESC] [NULLS FIRST | NULLS LAST]

Note

Only one column can be specified in the ordering expression list when using RANGE. When using ROWS, the frame is applied after any ordering; so, while several columns may appear in the order expression list, there will be only one ROWS clause following the list.

When a RANGE frame is specified, CURRENT ROW includes all peer rows (rows with the same ordering values). Thus, when the first of a set of peer rows is encountered, all associated peer rows are included in the frame (not just the first one).

In contrast, when a ROWS frame is specified, CURRENT ROW will direct that only the peer rows up to and including the current row are contained within the frame--the following peer rows will not be included.

The default frame type is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

If a frame clause is specified without a BETWEEN, the clause is applied to the frame start; the frame end will still be the default of CURRENT ROW.

For example, to calculate the rolling sum of total amounts collected by each taxi vendor over the course of a given day, as well as the number of other trips that occurred within 5 minutes of each trip:

SELECT
    vendor_id,
    pickup_datetime,
    total_amount,
    passenger_count,
    DECIMAL
    (
        SUM(total_amount) OVER
            (
                PARTITION BY vendor_id
                ORDER BY pickup_datetime
            )
    ) AS growing_sum,
    COUNT(*) OVER
        (
            PARTITION BY vendor_id
            ORDER BY LONG(pickup_datetime)
            RANGE BETWEEN 300000 PRECEDING AND 300000 FOLLOWING
        ) AS trip_demand
FROM nyctaxi
WHERE pickup_datetime >= '2015-01-01' AND pickup_datetime < '2015-01-01 02:00:00'
ORDER BY
    vendor_id,
    pickup_datetime

To calculate a 5-before and 10-after moving average of 4-passenger trip distances per vendor over the course of a given day:

SELECT
    vendor_id,
    pickup_datetime,
    trip_distance,
    AVG(trip_distance) OVER
        (
            PARTITION BY vendor_id
            ORDER BY pickup_datetime
            ROWS BETWEEN 5 PRECEDING AND 10 FOLLOWING
        ) AS local_avg_dist
FROM nyctaxi
WHERE
    passenger_count = 4 AND
    pickup_datetime >= '2015-01-01' AND pickup_datetime < '2015-01-02'
ORDER BY
    vendor_id,
    pickup_datetime

To rank, by vendor, the total amounts collected from 3-passenger trips on a given day:

SELECT
    vendor_id,
    pickup_datetime,
    dropoff_datetime,
    total_amount AS fare,
    RANK() OVER (PARTITION BY vendor_id ORDER BY total_amount) AS ranked_fare,
    DECIMAL(PERCENT_RANK() OVER (PARTITION BY vendor_id ORDER BY total_amount)) * 100 AS percent_ranked_fare
FROM nyctaxi
WHERE
    passenger_count = 3 AND
    pickup_datetime >= '2015-01-11' AND pickup_datetime < '2015-01-12'
ORDER BY
    vendor_id,
    pickup_datetime

To compare each trip's total amount to the lowest, highest, & average total amount for 5-passenger trips for each vendor over the course of a given day:

SELECT
    vendor_id,
    pickup_datetime,
    tip_amount,
    tip_amount - 
        FIRST_VALUE(tip_amount) OVER
            (PARTITION BY vendor_id ORDER BY tip_amount NULLS LAST) AS lowest_amount,
    tip_amount - 
        DECIMAL
        (
            AVG(tip_amount) OVER
                (
                    PARTITION BY vendor_id
                    ORDER BY tip_amount
                    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
                )
        ) AS average_amount,
    tip_amount - 
        FIRST_VALUE(tip_amount) OVER
            (PARTITION BY vendor_id ORDER BY tip_amount DESC) AS highest_amount
FROM nyctaxi
WHERE
    passenger_count = 5 AND
    pickup_datetime >= '2015-04-17' AND pickup_datetime < '2015-04-18' AND
    tip_amount > 0 AND
    trip_distance > 0
ORDER BY
    vendor_id,
    pickup_datetime

To compare each vendor's average total amount to their average total amount within the interquartile range:

SELECT
    vendor_id,
    DECIMAL(AVG(total_amount)) AS average_total_amount,
    DECIMAL(AVG(IF(quartile IN (2,3), total_amount, null))) AS average_interq_range_total_amount
FROM
(
    SELECT
        vendor_id,
        total_amount,
        NTILE(4) OVER (PARTITION BY vendor_id ORDER BY total_amount) quartile
    FROM
        nyctaxi
)
GROUP BY vendor_id
ORDER BY vendor_id

PIVOT¶

The PIVOT clause can be used to pivot columns, "rotating" column values into row values, creating wider and shorter denormalized tables from longer, more normalized tables.

The basic form for a pivot is:

<SELECT statement>
PIVOT
(
    <aggregate expression [AS <alias>]>[,...]
    FOR <column> IN (<column list>)
)

For example, given a source table phone_number, which lists each phone number for a customer as a separate record in the table, a pivot operation can be performed, creating a single record per customer with the home, work, & cell phone numbers as separate columns.

With this data:

+--------+--------------+----------------+
| name   | phone_type   | phone_number   |
+--------+--------------+----------------+
| Jane   | Home         | 123-456-7890   |
| Jane   | Work         | 111-222-3333   |
| John   | Home         | 123-456-7890   |
| John   | Cell         | 333-222-1111   |
+--------+--------------+----------------+

The following pivot operation can be applied:

SELECT *
FROM example.phone_list
PIVOT
(
    MAX(phone_number) AS Phone
    FOR phone_type IN ('Home', 'Work', 'Cell')
)

The data will be pivoted into a table like this:

+--------+----------------+----------------+----------------+
| name   | Home_Phone     | Work_Phone     | Cell_Phone     |
+--------+----------------+----------------+----------------+
| Jane   | 123-456-7890   | 111-222-3333   |                |
| John   | 123-456-7890   |                | 333-222-1111   |
+--------+----------------+----------------+----------------+

UNPIVOT¶

The UNPIVOT clause can be used to unpivot columns, "rotating" row values into column values, creating longer, more normalized tables from shorter, more denormalized tables.

The basic form for an unpivot is:

<SELECT statement>
UNPIVOT
(
    <value_column> FOR <var_column> IN (<column list>)
)

For example, given a source table customer_contact, which lists the home, work, & cell phone numbers for each customer in the table, an unpivot operation can be performed, creating separate home, work, & cell phone records for each customer.

With this data:

+--------+----------------+----------------+----------------+
| name   | home_phone     | work_phone     | cell_phone     |
+--------+----------------+----------------+----------------+
| Jane   | 123-456-7890   | 111-222-3333   |                |
| John   | 123-456-7890   |                | 333-222-1111   |
+--------+----------------+----------------+----------------+

The following unpivot operation can be applied:

SELECT *
FROM
(
    SELECT
        name,
        Home_Phone AS Home,
        Work_Phone AS Work,
        Cell_Phone AS Cell
    FROM
        example.customer_contact
)
UNPIVOT (
    phone_number FOR phone_type in (Home, Work, Cell)
)

The data will be unpivoted into a table like this:

+--------+----------------+--------------+
| name   | phone_number   | phone_type   |
+--------+----------------+--------------+
| Jane   | 123-456-7890   | Home         |
| Jane   | 111-222-3333   | Work         |
| Jane   |                | Cell         |
| John   | 123-456-7890   | Home         |
| John   |                | Work         |
| John   | 333-222-1111   | Cell         |
+--------+----------------+--------------+

Note

If the original column names can be used as the values of the unpivot key, as is, the pre-selection and renaming of those columns using a subquery in the FROM clause can be eliminated.

For example, unpivoting without aliasing the quarterly grade columns will result in those exact column names being used as the quarter values:

SELECT *
FROM example.student_grade
UNPIVOT
(
    grade FOR quarter IN (q1_grade, q2_grade, q3_grade, q4_grade)
)

+--------------+-------------+------------+
|   student_id |       grade | quarter    |
+--------------+-------------+------------+
|            1 |   80.0      | q1_grade   |
|            1 |   90.0      | q2_grade   |
|            1 |   85.0      | q3_grade   |
|            1 |   95.0      | q4_grade   |
|            2 |   82.0      | q1_grade   |
|            2 |             | q2_grade   |
|            2 |   87.0      | q3_grade   |
|            2 |   92.0      | q4_grade   |
|            3 |   73.0      | q1_grade   |
|            3 |   77.0      | q2_grade   |
|            3 |   97.0      | q3_grade   |
|            3 |             | q4_grade   |
+--------------+-------------+------------+

Set Operations¶

There are three types of supported set operations, each having the option of returning duplicate records in the result set by using the keyword ALL:

UNION [ALL] - return all records from both source data sets
INTERSECT [ALL] - return only records that exist in both source data sets
EXCEPT [ALL] - return all records that exist in the first data set, but not in the second

UNION¶

The UNION set operator creates a single list of records from the results of two SELECT statements. Use the ALL keyword to keep all records from both sets; omit it to remove duplicate records and form a single list of records unique between the two sets. See Limitations and Cautions for limitations.

<SELECT statement>
UNION [ALL]
<SELECT statement>

For example, given a table of lunch menu items and another table of dinner menu items, a UNION can be used to return all unique lunch & dinner menu items together, including items that are the same on both menus, but of a different price:

SELECT
    food_name,
    category,
    price
FROM
    example.lunch_menu
UNION
SELECT
    food_name,
    category,
    price
FROM
    example.dinner_menu

Note

Since the example includes price and all columns selected must match between the two sets for an item to be considered a duplicate, a lunch item that is priced differently as a dinner item would also appear in the result set.

A UNION ALL can be used to return all lunch & dinner menu items together, including duplicates:

SELECT
    food_name,
    category,
    price
FROM
    example.lunch_menu
UNION ALL
SELECT
    food_name,
    category,
    price
FROM
    example.dinner_menu

INTERSECT¶

The INTERSECT set operator creates a single list of records that exist in both of the result sets from two SELECT statements. Use the ALL keyword to keep duplicate records that exist in both sets; omit it to remove duplicate records and form a single list of records that exist in both sets. See Limitations for limitations.

<SELECT statement>
INTERSECT [ALL]
<SELECT statement>

For example, given a table of lunch menu items and another table of dinner menu items, an INTERSECT can be used to return all lunch menu items (excluding duplicates) that are also dinner items for the same price:

SELECT
    food_name,
    category,
    price
FROM
    example.lunch_menu
INTERSECT
SELECT
    food_name,
    category,
    price
FROM
    example.dinner_menu

Note

Since the example includes price and all columns selected must match between the two sets for an item to be included, a lunch item that is priced differently as a dinner item would not appear in the result set.

EXCEPT¶

The EXCEPT set operator performs set subtraction, creating a single list of records that exist in the first SELECT statement's result set, but not in the second SELECT statement's result set. Use the ALL keyword to keep duplicate records that exist in the first set but not in the second; omit it to remove duplicate records and form a single list of records that exist in the first set but not the second. See Limitations for limitations:

<SELECT statement>
EXCEPT [ALL]
<SELECT statement>

For example, given a table of lunch menu items and another table of dinner menu items, an EXCEPT can be used to return all lunch menu items (excluding duplicates) that are not also dinner items for the same price:

SELECT
    food_name,
    category,
    price
FROM
    example.lunch_menu
EXCEPT
SELECT
    food_name,
    category,
    price
FROM
    example.dinner_menu

Note

Since the example includes price and all columns selected must match between the two sets for an item to be eliminated, a lunch item that is priced differently as a dinner item would still appear in the result set.

WITH (Common Table Expressions)¶

The WITH operation, also known as a Common Table Expression (CTE) creates a set of data that can be aliased and used one or more times in subsequent operations. The aliased set can be used within the SELECT, FROM, or WHERE clauses of a subsequent query or a subsequent CTE within the same WITH operation.

Recursive WITH operations are not supported--the aliased set cannot refer to itself. The alias must be unique within the WITH statement--no other column or column alias can be similarly named, for example. Also, when used in a FROM clause and given a table alias, the table alias must be preceded with AS.

A CTE can be made available to a DML or DDL statement by having the WITH statement follow the CREATE TABLE...AS, INSERT, UPDATE, or DELETE statement (not precede it).

Each CTE definition within a WITH statement is structured as follows:

<cte name> [(column list)] AS (<SELECT statement>)

Each WITH statement can contain one or more CTE definitions, followed by a SELECT statement, as shown here:

WITH <cte definition>,...
<SELECT statement>

For example:

WITH
    dept2_emp_sal_by_mgr (manager_id, sal) AS
    (
        SELECT manager_id, salary
        FROM employee
        WHERE dept_id = 2
    )
SELECT
    manager_id dept2_mgr_id,
    MAX(sal) dept2_highest_emp_sal_per_mgr,
    COUNT(*) as dept2_total_emp_per_mgr
FROM dept2_emp_sal_by_mgr
GROUP BY manager_id

To apply the CTE to an INSERT statement, follow the INSERT clause with the WITH clause:

INSERT INTO dept2_emp_mgr_roster (emp_first_name, emp_last_name, mgr_first_name, mgr_last_name)
WITH
    dept2_emp AS
    (
        SELECT first_name, last_name, manager_id
        FROM employee
        WHERE dept_id = 2
    ),
    dept2_mgr AS
    (
        SELECT first_name, last_name, id
        FROM employee
        WHERE dept_id = 2
    )
SELECT d2emp.first_name, d2emp.last_name, d2mgr.first_name, d2mgr.last_name
FROM
    dept2_emp as d2emp
    JOIN dept2_mgr as d2mgr ON d2emp.manager_id = d2mgr.id

Iteration¶

Kinetica supports iteration over each record within a data set for the purpose of creating a result set with 0 to N result records per record in the original set.

This iteration can be variable, based on some value within each record, or fixed, based on a given constant value.

The iteration is performed by joining against the virtual ITER table, as follows:

SELECT *
FROM table, ITER
WHERE ITER.i < <column expression>

The <column expression> can be replaced by a constant for fixed iteration.

For example, to extract all of the individual letters from a column of words, with one record per letter extracted (using variable iteration):

SELECT id, word, i, SUBSTR(word, i + 1, 1) AS letter
FROM dictionary
JOIN ITER ON i < LENGTH(word)
ORDER BY id, i;

To duplicate the set of words five times (using fixed iteration):

SELECT *
FROM dictionary, ITER
WHERE i < 5
ORDER BY id, i;

For more detail, examples, and limitations, see Iteration.

Constants¶

Each data type has an associated literal constant syntax, which can be used, for instance, to insert constant data values into those columns.

Numeric Constants¶

Integer and floating point data types can be either single-quoted or not.

For example:

INSERT INTO numeric_types (int_type1, int_type2, float_type1, float_type2)
VALUES
(
    1,
    '2',
    3.4,
    '5.6'
)

String-Based Constants¶

String-based data types should be single-quoted.

For example:

INSERT INTO string_types (varchar_type, charn_type, ipv4_type, wkt_type)
VALUES
(
    'varchar value',
    'charN value',
    '12.34.56.78',
    'POINT(0 0)'
)

Binary Constants¶

Binary types can be represented in either of the following forms:

single-quoted or unquoted base-10
single-quoted hexadecimal

For example:

INSERT INTO byte_types (bytes_type)
VALUES
    (12345678901234567890),
    ('12345678901234567890'),
    ('0x00AB54A98CEB1F0AD2')

Date/Time Constants¶

Kinetica accepts unqualified single-quoted date/time values, ANSI SQL, and ODBC escape sequences in the following formats:

Data Type	Native	ANSI	ODBC
Date	`'YYYY-[M]M-[D]D'`	`DATE 'YYYY-MM-DD'`	`{d 'YYYY-MM-DD'}`
Time	`'[H]H24:MI:SS.mmm'`	`TIME 'HH24:MI:SS.mmm'`	`{t 'HH24:MI:SS.mmm'}`
DateTime	`'YYYY-[M]M-[D]D[T\| ][H]H24:MI:SS.mmm[Z]'`	`TIMESTAMP 'YYYY-MM-DD HH24:MI:SS.mmm'`	`{ts 'YYYY-MM-DD HH24:MI:SS.mmm'}`
Timestamp	`'YYYY-[M]M-[D]D [H]H24:MI:SS.mmm'`	`TIMESTAMP 'YYYY-MM-DD HH24:MI:SS.mmm'`	`{ts 'YYYY-MM-DD HH24:MI:SS.mmm'}`

For example:

Native:

INSERT INTO date_time_types (date_type, time_type, datetime_type, timestamp_type)
VALUES
(
    '2000-01-02',
    '12:34:56.789',
    '2000-01-02 12:34:56.789',
    '2000-01-02 12:34:56.789'
),
(
    '2000-1-02',
    '1:23:45.678',
    '2000-01-02T12:34:56.789Z',
    '2000-1-02 1:23:45.678'
)

ANSI:

INSERT INTO date_time_types (date_type, time_type, datetime_type, timestamp_type)
VALUES
(
    DATE '2000-01-02',
    TIME '12:34:56.789',
    TIMESTAMP '2000-01-02 12:34:56.789',
    TIMESTAMP '2000-01-02 12:34:56.789'
)

ODBC:

INSERT INTO date_time_types (date_type, time_type, datetime_type, timestamp_type)
VALUES
(
    {d '2000-01-02'},
    {t '12:34:56.789'},
    {ts '2000-01-02 12:34:56.789'},
    {ts '2000-01-02 12:34:56.789'}
)

Expressions¶

An expression can consist of a literal constant, a column name, or a function applied to a constant or column name. A compound expression is an operation or function applied to one or more expressions.

The following are the supported expression operators:

+ addition
- subtraction
* multiplication
/ division
() grouping
|| string concatenation

Note

Use double quotes to specify column names in a case-sensitive manner.

Conditional Functions¶

Function Description

DECODE(expr, match_a, value_a, ..., match_N, value_N) Evaluates expr: returns the first value whose corresponding match is equal to expr

IF(expr, value_if_true, value_if_false)

Evaluates expr: if true, returns value_if_true; otherwise, value_if_false

Parameter	Description
`expr`	any true/false condition Note When an integer column is used directly, this function will will interpret non-zero values as true and zero values as false.
`value_if_true`	any numeric value
`value_if_false`	any numeric value

CASE¶

The case statement acts as a scalar function, but has two more complex forms. Note that for each of these CASE statements, the value expressions must all be of the same or convertible data type.

In the first form, each WHEN is followed by a conditional expression whose corresponding THEN expression will have its value returned, if true. Control will continue through each WHEN until a match is found and the corresponding value returned; if no match is found, the value of the ELSE expression will be returned, or null, if no ELSE clause exists.

CASE
    WHEN <cond_expr_a> THEN <value_expr_a>
    ...
    WHEN <cond_expr_N> THEN <value_expr_N>
    ELSE <value_expr>
END

In the second form, the CASE expression is evaluated. A match of that result will be attempted against each WHEN expression until a match is found and the value of the corresponding THEN expression returned; if no match is found, the value of the ELSE expression will be returned, or null, if no ELSE clause exists.

CASE <expr>
    WHEN <match_expr_a> THEN <value_expr_a>
    ...
    WHEN <match_expr_N> THEN <value_expr_N>
    ELSE <value_expr>
END

Note

This second version below has greater optimization than the first.

Examples:

CASE
    WHEN color = 1 THEN 'Red'
    WHEN color >= 2 THEN 'Green'
    ELSE 'Blue'
END

CASE mod(length(text), 2)
    WHEN 0 THEN 'Even'
    WHEN 1 THEN 'Odd'
    ELSE null
END

Conversion Functions¶

Function Description

CAST(expr AS [SQL_]<conv_type>)

or

CONVERT(expr, [SQL_]<conv_type>)

Converts expr into conv_type data type

Conversion Types:

Numeric	String	Date/Time
`BIGINT`	`CHAR(N)`	`DATE`
`DECIMAL(P,S)`	`VARCHAR(N)`	`DATETIME`
`DOUBLE`		`TIME`
`FLOAT`		`TIMESTAMP`
`INTEGER`		`TYPE_DATE`
`NUMERIC(P,S)`		`TYPE_TIME`
`REAL`		`TYPE_TIMESTAMP`
`SMALLINT`
`TINYINT`
`UNSIGNED` `BIGINT`

Note

When using the SQL_ prefix, UNSIGNED BIGINT becomes SQL_UNSIGNED_BIGINT

CHAR(expr) Returns the character associated with the ASCII code given in expr

CHAR1(expr) Converts the given expr to VARCHAR(1) type

CHAR2(expr) Converts the given expr to VARCHAR(2) type

CHAR4(expr) Converts the given expr to VARCHAR(4) type

CHAR8(expr) Converts the given expr to VARCHAR(8) type

CHAR16(expr) Converts the given expr to VARCHAR(16) type

CHAR32(expr) Converts the given expr to VARCHAR(32) type

CHAR64(expr) Converts the given expr to VARCHAR(64) type

CHAR128(expr) Converts the given expr to VARCHAR(128) type

CHAR256(expr) Converts the given expr to VARCHAR(256) type

DATE(expr) Converts expr to date (YYYY-MM-DD) format

DATETIME(expr) Converts expr to datetime (YYYY-MM-DD HH24:MI:SS.mmm) format

DECIMAL(expr) Converts the given expr to DECIMAL type

DOUBLE(expr) Converts the given expr to DOUBLE type

FLOAT(expr) Converts the given expr to REAL type

INT(expr) Converts the given expr to INTEGER type

LONG(expr) Converts the given expr to BIGINT type

TIME(expr) Converts expr to time (HH24:MI:SS) format

TIMESTAMP(expr) Converts expr to the number of milliseconds since the epoch

TO_CHAR(expr, format)

Converts the given date/time expr to a string matching the given format. The format must be a string literal--expressions are not supported at this time. Arbitrary text can be injected into the format string using double-quotes.

The returned string will be truncated at 32 characters.

Valid format codes include:

Format	Description
`YYYY`	4-digit year
`YY`	2-digit year
`M`	Number of month in year, without leading zero [`1` - `12`]
`MM`	Number of month in year, with leading zero [`01` - `12`]
`MON`	3-character abbreviation of month in title case; e.g., `Jan`
`MONTH`	Full name of month; e.g., `January`
`C`	Number of day in month, without leading zero [`1` - `31`]
`DD`	Number of day in month, with leading zero [`01` - `31`]
`DY`	3-character abbreviation of day of the week; e.g., `Fri`
`DAY`	Full name of day of the week; e.g., `Friday`
`H`	Alias for H12
`HH`	Alias for HH12
`H12`	Hour of day in 12-hour format, without leading zero [`0` - `11`]
`HH12`	Hour of day in 12-hour format, with leading zero [`00` - `11`]
`H24`	Hour of day in 24-hour format, without leading zero [`0` - `23`]
`HH24`	Hour of day in 24-hour format, with leading zero [`00` - `23`]
`MI`	Minute of hour [`00` - `59`]
`SS`	Second of minute [`00` - `59`]
`AM`	Before or after noon designator [`AM`, `PM`]
`PM`	Alias for AM

Example:

Function Call	`TO_CHAR(DATETIME(971181296000), '"Last login: "YYYY-MM-DD')`
Return	`Last login: 2000-10-10`

ULONG(expr) Converts the given expr to UNSIGNED BIGINT type

Date/Time Functions¶

This section comprises the following functions:

Date/Time Base Functions, which can extract parts of date/time expressions, convert back and forth between data types, and return the current date/time
Date/Time Complex Conversion Functions, which can perform more complex date/type conversions

Date/Time Base Functions¶

Function Description

CURRENT_DATE() Returns the date as YYYY-MM-DD

CURRENT_DATETIME() Returns the date & time as YYYY-MM-DD HH24:MI:SS.mmm

CURRENT_TIME() Returns the time as HH24:MI:SS.mmm

CURRENT_TIMESTAMP() Returns the date & time as YYYY-MM-DD HH24:MI:SS.mmm; to return the date & time as the number of milliseconds since the epoch, pass the result of this function to LONG()

DATEADD(unit, amount, expr)

Adds the positive or negative integral amount of unit date/time intervals to the date/time in expr

The following date/time intervals are supported for unit:

Constant	Description
`'YEAR'`	Year is modified by interval amount (not affected by leap year, etc.)
`'MONTH'`	Month is modified by interval amount and date adjusted if overflow/underflow occurs; day adjusted to last day of calculated month if not a valid day for that month (e.g. Apr 31st -> Apr 30th)
`'DAY'`	Day is modified by interval amount (time not affected by daylight savings time, etc.); date is adjusted, if overflow/underflow occurs
`'HOUR'`	Hour is modified by interval amount (time not affected by daylight savings time, etc.); date is adjusted, if overflow/underflow occurs
`'MINUTE'`	Minute is modified by interval amount; date/time are adjusted, if overflow/underflow occurs
`'SECOND'`	Second is modified by interval amount; date/time are adjusted, if overflow/underflow occurs
`'MILLISECOND'`	Millisecond is modified by interval amount; date/time are adjusted, if overflow/underflow occurs
`'QUARTER'`	Month is modified by three times the interval amount, irrespective of the number of days in the months between; day adjusting performed the same as the `MONTH` description, but only on final month (e.g., Jan 31st + 1 quarter will be Apr 30th, not Apr 28th because of February)
`'WEEK'`	Day is modified by 7 times the interval amount (time not affected by daylight savings time, etc.); month & year are adjusted, if overflow/underflow occurs

Note

Any of these unit types can have a SQL_TSI_ prefix prepended to them; e.g., both DAY and SQL_TSI_DAY are valid unit types for specifying a day interval

Examples:

Function Call	Result
`DATEADD('YEAR', 1, '2000-10-10')`	`2001-10-10`
`DATEADD('MONTH', 1, '2000-01-31')`	`2000-02-29`
`DATEADD('DAY', 1, '2000-12-31')`	`2001-01-01`
`DATEADD('HOUR', 12, '2000-10-10 12:34:56')`	`2000-10-11 00:34:56.000`
`DATEADD('MINUTE', 1, '2000-10-10 12:34:56')`	`2000-10-10 12:35:56.000`
`DATEADD('SECOND', 1, '2000-12-31 23:59:59')`	`2001-01-01 00:00:00.000`
`DATEADD('MILLISECOND', 1, '2000-10-10 12:34:56')`	`2000-10-10 12:34:56.001`
`DATEADD('QUARTER', 1, '2000-11-30')`	`2001-02-28`
`DATEADD('WEEK', 53, '2000-01-01')`	`2001-01-06`

DATEDIFF(expr_begin, expr_end) Determines the difference between two dates, irrespective of time component, as the number of days when expr_begin is subtracted from expr_end; returns a negative number of days if expr_begin occurs after expr_end

DAY(expr) Alias for DAYOFMONTH(expr)

DAYNAME(expr) Extracts the day of the week from expr and converts it to the corresponding day name [Sunday - Saturday ]

DAYOFMONTH(expr) Extracts the day of the month from expr [1 - 31]

DAYOFWEEK(expr)

Extracts the day of the week from expr [1 - 7]

Expression Value	Result
Date on Sunday	`1`
Date on Monday	`2`
Date on Tuesday	`3`
Date on Wednesday	`4`
Date on Thursday	`5`
Date on Friday	`6`
Date on Saturday	`7`

DAY_OF_WEEK(expr) Alias for DAYOFWEEK(expr)

DAYOFYEAR(expr) Extracts the day of the year from expr [1 - 366]

DAY_OF_YEAR(expr) Alias for DAYOFYEAR(expr)

HOUR(expr) Extracts the hour of the day from expr [0 - 23]

<expr> + INTERVAL '<amount>' <part>

<expr> - INTERVAL '<amount>' <part>

Adds to or subtracts from the date/time expr the integral amount units of type part. This mirrors the behavior of the TIMESTAMPADD function, only with a different format and different date/time part constants. The following date/time constants are supported for part:

Constant	Description
`YEAR`	Year is modified by interval amount (not affected by leap year, etc.)
`MONTH`	Month is modified by interval amount and date adjusted if overflow/underflow occurs; day adjusted to last day of calculated month if not a valid day for that month (e.g. Apr 31st -> Apr 30th)
`DAY`	Day is modified by interval amount (time not affected by daylight savings time, etc.); date is adjusted, if overflow/underflow occurs
`HOUR`	Hour is modified by interval amount (time not affected by daylight savings time, etc.); date is adjusted, if overflow/underflow occurs
`MINUTE`	Minute is modified by interval amount; date/time are adjusted, if overflow/underflow occurs
`SECOND`	Second is modified by interval amount; date/time are adjusted, if overflow/underflow occurs

LAST_DAY(date) Returns the date of the last day of the month in the given date

MINUTE(expr) Extracts the minute of the day from expr [0 - 59]

MONTH(expr) Extracts the month of the year from expr [1 - 12]

MONTHNAME(expr) Extracts the month of the year from expr and converts it to the corresponding month name [January - December]

MSEC(expr) Extracts the millisecond of the second from expr [0 - 999]

NEXT_DAY(date, day_of_week)

Returns the date of the next day of the week, provided as a day name in day_of_week, that occurs after the given date

Some examples, given that 2000-10-10 is a Tuesday:

Function Call	Result
`NEXT_DAY('2000-10-10', 'Wednesday')`	`2000-10-11`
`NEXT_DAY('2000-10-10', 'Friday')`	`2000-10-13`
`NEXT_DAY('2000-10-10', 'Tuesday')`	`2000-10-17`

NOW() Alias for CURRENT_DATETIME()

QUARTER(expr)

Extracts the quarter of the year from expr [1 - 4]

Expression Value	Result
Date in January, February, or March	`1`
Date in April, May, or June	`2`
Date in July, August, or September	`3`
Date in October, November, or December	`4`

SECOND(expr) Extracts the seconds of the minute from expr [0 - 59]

SEC(expr) Alias for SECOND(expr)

TIMESTAMPADD(unit, amount, expr)

Adds the positive or negative integral amount of unit date/time intervals to the date/time in expr

The following date/time intervals are supported for unit:

Constant	Description
`YEAR`	Year is modified by interval amount (not affected by leap year, etc.)
`MONTH`	Month is modified by interval amount and date adjusted if overflow/underflow occurs; day adjusted to last day of calculated month if not a valid day for that month (e.g. Apr 31st -> Apr 30th)
`DAY`	Day is modified by interval amount (time not affected by daylight savings time, etc.); date is adjusted, if overflow/underflow occurs
`HOUR`	Hour is modified by interval amount (time not affected by daylight savings time, etc.); date is adjusted, if overflow/underflow occurs
`MINUTE`	Minute is modified by interval amount; date/time are adjusted, if overflow/underflow occurs
`SECOND`	Second is modified by interval amount; date/time are adjusted, if overflow/underflow occurs
`FRAC_SECOND`	Nanosecond is modified by interval amount; date/time are adjusted, if overflow/underflow occurs Note Time is processed to millisecond precision, so any portion of an `amount` with finer granularity than 1,000,000 nanoseconds will be ignored (e.g., requesting the addition of 1,234,567 nanoseconds will result in 1 millisecond actually being added)
`QUARTER`	Month is modified by three times the interval amount, irrespective of the number of days in the months between; day adjusting performed the same as the `MONTH` description, but only on final month (e.g., Jan 31st + 1 quarter will be Apr 30th, not Apr 28th because of February)
`WEEK`	Day is modified by 7 times the interval amount (time not affected by daylight savings time, etc.); month & year are adjusted, if overflow/underflow occurs

Note

Any of these unit types can have a SQL_TSI_ prefix prepended to them; e.g., both DAY and SQL_TSI_DAY are valid unit types for specifying a day interval

Examples:

Function Call	Result
`TIMESTAMPADD(SQL_TSI_YEAR, 1, '2000-10-10')`	`2001-10-10`
`TIMESTAMPADD(SQL_TSI_MONTH, 1, '2000-01-31')`	`2000-02-29`
`TIMESTAMPADD(SQL_TSI_DAY, 1, '2000-12-31')`	`2001-01-01`
`TIMESTAMPADD(SQL_TSI_HOUR, 12, '2000-10-10 12:34:56')`	`2000-10-11 00:34:56.000`
`TIMESTAMPADD(SQL_TSI_MINUTE, 1, '2000-10-10 12:34:56')`	`2000-10-10 12:35:56.000`
`TIMESTAMPADD(SQL_TSI_SECOND, 1, '2000-12-31 23:59:59')`	`2001-01-01 00:00:00.000`
`TIMESTAMPADD(SQL_TSI_FRAC_SECOND, 1000000, '2000-10-10 12:34:56')`	`2000-10-10 12:34:56.001`
`TIMESTAMPADD(SQL_TSI_QUARTER, 1, '2000-11-30')`	`2001-02-28`
`TIMESTAMPADD(SQL_TSI_WEEK, 53, '2000-01-01')`	`2001-01-06`

TIMESTAMPDIFF(unit, begin, end)

Calculates the difference between two date/time expressions, returning the result as an integral difference in the units specified; more precisely, how many whole date/time intervals of type unit need to be added to (or subtracted from) begin to equal end (or get as close as possible without going past it) using the unit types and and rules specified in TIMESTAMPADD.

Note

This is not symmetric with TIMESTAMPADD in all cases, as adding 1 MONTH to Mar 31st results in Apr 30th, but the TIMESTAMPDIFF in MONTH units between those two dates is 0.

Examples:

Function Call	Result
`TIMESTAMPDIFF(MONTH, DATE('2000-10-10'), DATE('2000-12-31'))`	`2`
`TIMESTAMPDIFF(MONTH, DATE('2000-03-31'), DATE('2000-04-30'))`	`0`
`TIMESTAMPDIFF(MONTH, DATE('2000-12-31'), DATE('2000-10-10'))`	`-2`
`TIMESTAMPDIFF(HOUR, 978222896000, DATETIME('2000-10-10 12:34:56'))`	`-1956`

WEEK(expr) Extracts the week of the year from expr [1 - 54]; each full week starts on Sunday (a 1 is returned for the week containing Jan 1st)

YEAR(expr) Extracts the year from expr; 4-digit year, A.D.

Date/Time Complex Conversion Functions¶

Function Description

DATE_TO_EPOCH_MSECS(year, month, day, hour, min, sec, msec)

Converts the full date to milliseconds since the epoch; negative values are accepted

Example:

Function Call	`DATE_TO_EPOCH_MSECS(2017, 06, -15, 09, 22, 15, 42)`
Return	`1494926535042`
Resolves To	Tuesday, May 16, 2017 9:22:15.042 AM

DATE_TO_EPOCH_SECS(year, month, day, hour, min, sec)

Converts the full date to seconds since the epoch; negative values are accepted

Example:

Function Call	`DATE_TO_EPOCH_SECS(2017, 06, -15, 09, 22, 15)`
Return	`1494926535`
Resolves To	Tuesday, May 16, 2017 9:22:15 AM

MSECS_SINCE_EPOCH(timestamp)

Converts the timestamp to milliseconds since the epoch

Example:

Function Call	`MSECS_SINCE_EPOCH('2000-10-10 12:34:56.789')`
Return	`971181296789`

TIMESTAMP_FROM_DATE_TIME(date, time)

Converts the given date and time to a composite date/time format

Example:

Function Call	`TIMESTAMP_FROM_DATE_TIME('2017-06-15', '10:37:30')`
Return	`2017-06-15 10:37:30.000`

WEEK_TO_EPOCH_MSECS(year, week_number)

Converts the year and week number to milliseconds since the epoch; negative values are accepted

Example:

Function Call	`WEEK_TO_EPOCH_MSECS(2017,-32)`
Return	`1463270400000`
Resolves To	Sunday, May 15, 2016 12:00:00 AM

WEEK_TO_EPOCH_SECS(year, week_number)

Converts the year and week number to seconds since the epoch. Negative values are accepted. Each new week begins Sunday at midnight.

Example:

Function Call	`WEEK_TO_EPOCH_SECS(2017,-32)`
Return	`1463270400`
Resolves To	Sunday, May 15, 2016 12:00:00 AM

Geospatial/Geometry Functions¶

Tip

Use ST_ISVALID to determine if a geometry object is valid. The functions below work best with valid geometry objects.
Use the REMOVE_NULLABLE function to remove any nullable column types that could result from calculating a derived column (e.g., as in Projections) using one of the functions below.

Enhanced Performance Scalar Functions¶

The functions below all compare x and y coordinates to geometry objects (or vice versa), thus increasing their performance in queries. Each of these functions have a geometry-to-geometry version listed in the next section.

Function	Description
`STXY_CONTAINS(geom, x, y)`	Returns `1` (true) if `geom` contains the `x` and `y` coordinate, e.g. lies in the interior of `geom`. The coordinate cannot be on the boundary and also be contained because `geom` does not contain its boundary
`STXY_CONTAINSPROPERLY(geom, x, y)`	Returns `1` (true) if the `x` and `y` coordinate intersects the interior of `geom` but not the boundary (or exterior) because `geom` does not contain its boundary but does contain itself
`STXY_COVEREDBY(x, y, geom)`	Returns `1` (true) if the `x` and `y` coordinate is covered by `geom`
`STXY_COVERS(geom, x, y)`	Returns `1` (true) if `geom` covers the `x` and `y` coordinate
`STXY_DISJOINT(x, y, geom)`	Returns `1` (true) if the given `x` and `y` coordinate and the geometry `geom` do not spatially intersect.
`STXY_DISTANCE(x, y, geom[, solution])`	Calculates the minimum distance between the given `x` and `y` coordinate and `geom` using the specified solution type. Solution types available: `0` (default) - Euclidean; returns 2-D Euclidean distance `1` - Haversine; returns minimum sphere distance in meters `2` - Vincenty; returns minimum spheroid distance in meters, more accurate than Haversine but slower performance Note: If the `x` and `y` coordinate and `geom` intersect (verify using `ST_INTERSECTS`), the distance will always be `0`.
`STXY_DWITHIN(x, y, geom, distance[, solution])`	Returns `1` (true) if the `x` and `y` coordinate is within the specified `distance` from `geom` using the specified solution type. Solution types available: `0` (default) - Euclidean; uses degrees to calculate distance `1` - Sphere; uses meters to calculate sphere distance `2` - Spheroid; uses meters to calculate spheroid distance
`STXY_ENVDWITHIN(x, y, geom, distance[, solution])`	Returns `1` (true) if the `x` and `y` coordinate is within the specified `distance` from the bounding box of `geom` using the specified solution type. Solution types available: `0` (default) - Euclidean; uses degrees to calculate distance `1` - Sphere; uses meters to calculate distance
`STXY_ENVINTERSECTS(x, y, geom)`	Returns `1` (true) if the bounding box of the given geometry `geom` intersects the `x` and `y` coordinate.
`STXY_INTERSECTION(x, y, geom)`	Returns the shared portion between the `x` and `y` coordinate and the given geometry `geom`, i.e. the point itself.
`STXY_INTERSECTS(x, y, geom)`	Returns `1` (true) if the `x` and `y` coordinate and `geom` intersect in 2-D.
`STXY_TOUCHES(x, y, geom)`	Returns `1` (true) if the `x` and `y` coordinate and geometry `geom` have at least one point in common but their interiors do not intersect. If `geom` is a GEOMETRYCOLLECTION, a `0` is returned regardless if the point and geometry touch
`STXY_WITHIN(x, y, geom)`	Returns `1` (true) if the `x` and `y` coordinate is completely inside the `geom` geometry i.e., not on the boundary

Scalar Functions¶

Function	Description
`DIST(x1, y1, x2, y2)`	Computes the Euclidean distance (in degrees), i.e. `SQRT( (x1-x2)(x1-x2) + (y1-y2)(y1-y2) )`.
`GEODIST(lon1, lat1, lon2, lat2)`	Computes the geographic great-circle distance (in meters) between two lat/lon points.
`GEOHASH_DECODE_LATITUDE(geohash)`	Decodes a given `geohash` and returns the latitude value for the given hash string. Supports a maximum geohash character length of 16.
`GEOHASH_DECODE_LONGITUDE(geohash)`	Decodes a given `geohash` and returns the longitude value for the given hash string. Supports a maximum geohash character length of 16.
`GEOHASH_ENCODE(lat, lon, precision)`	Encodes a given coordinate pair and returns a hash string with a given `precision`.
`ST_ADDPOINT(linestring, point, position)`	Adds a the given `point` geometry to the given `linestring` geometry at the specified `position`, which is a 0-based index.
`ST_ALMOSTEQUALS(geom1, geom2, decimal)`	Returns `1` (true) if given geometries, `geom1` and `geom2`, are almost spatially equal within the given amount of `decimal` scale. Note that geometries will still be considered equal if the decimal scale for the geometries is within a half order of magnitude of each other, e.g, if `decimal` is set to 2, then POINT(63.4 123.45) and POINT(63.4 123.454) are equal, but POINT(63.4 123.45) and POINT(63.4 123.459) are not equal. The geometry types must match to be considered equal.
`ST_AREA(geom[, solution])`	Returns the area of the given geometry `geom` if it is a POLYGON or MULTIPOLYGON using the specified solution type. Returns `0` if the input geometry type is (MULTI)POINT or (MULTI)LINESTRING. Solution types available: `0` (default) - 2D Euclidean area `1` - curved surface area on a sphere in square meters `2` - curved surface area on a spheroid in square meters
`ST_AZIMUTH(geom1, geom2)`	Returns the azimuth in radians defined by the segment between two POINTs, `geom1` and `geom2`. Returns a `null` if the input geometry type is MULTIPOINT, (MULTI)LINESTRING, or (MULTI)POLYGON.
`ST_BOUNDARY(geom)`	Returns the closure of the combinatorial boundary of a given geometry `geom`. Returns an empty geometry if `geom` is an empty geometry. Returns a `null` if `geom` is a GEOMETRYCOLLECTION
`ST_BOUNDINGDIAGONAL(geom)`	Returns the diagonal of the given geometry's (`geom`) bounding box.
`ST_BUFFER(geom, radius[, style[, solution]])`	Returns a geometry that represents all points whose distance from the given geometry `geom` is less than or equal to the given distance `radius`. The `radius` units can be specified by the `solution` type (default is in degrees) and the `radius` is created in the provided `style`. The `style` options are specified as a list of blank-separated key-value pairs, e.g., `'quad_segs=8 endcap=round'`. If an empty `style` list (`''`) is provided, the default settings will be used. The `style` parameter must be specified to provide a `solution` type. Available `style` options: `quad_segs` -- the number of segments used to approximate a quarter circle (default is `8`) `endcap` -- the endcap style of the buffer (default is `round`); options are `round`, `flat` (or `butt`), and `square` `join` -- the join style of the buffer (default is `round`); options are `round`, `mitre` (or `miter`), and `bevel` `mitre_limit` -- the mitre ratio limit expressed as a floating point number (`miter_limit` is also acceptable) Available `solution` types: `0` (default) - 2D Euclidean radius distance in degrees `1` - curved surface radius distance on a sphere in meters `2` - curved surface radius distance on a spheroid in meters Tip To create a 5-meter buffer around `geom` using the default styles: `ST_BUFFER(geom, 5, '', 1)`. To create a 5-foot (converting feet to meters) buffer around `geom` using the following styles: `ST_BUFFER(geom, 5*0.3048,'quad_segs=4 endcap=flat', 1)`
`ST_CENTROID(geom)`	Calculates the center of the given geometry `geom` as a POINT. For (MULTI)POINTs, the center is calculated as the average of the input coordinates. For (MULTI)LINESTRINGs, the center is calculated as the weighted length of each given LINESTRING. For (MULTI)POLYGONs, the center is calculated as the weighted area of each given POLYGON. If `geom` is an empty geometry, an empty GEOMETRYCOLLECTION is returned
`ST_CLIP(geom1, geom2)`	Returns the geometry shared between given geometries `geom1` and `geom2`
`ST_CLOSESTPOINT(geom1, geom2[, solution])`	Calculates the 2-D `POINT` in `geom1` that is closest to `geom2` using the specified solution type. If `geom1` or `geom2` is empty, a `null` is returned. Solution types available: `0` (default) - Euclidean; calculates the closest point using 2-D Euclidean distance `1` - Haversine; calculates the closest point using sphere distance in meters `2` - Vincenty; returns minimum spheroid distance in meters, more accurate than Haversine but slower performance
`ST_COLLECT(geom1, geom2)`	Returns a MULTI* or GEOMETRYCOLLECTION comprising `geom1` and `geom2`. If `geom1` and `geom2` are the same, singular geometry type, a MULTI* is returned, e.g., if `geom1` and `geom2` are both POINTs (empty or no), a MULTIPOINT is returned. If `geom1` and `geom2` are neither the same type nor singular geometries, a GEOMETRYCOLLECTION is returned.
`ST_COLLECTIONEXTRACT(collection, type)`	Returns only the specified `type` from the given geometry `collection`. Type is a number that maps to the following: `1` = `POINT` `2` = `LINESTRING` `3` = `POLYGON`
`ST_COLLECTIONHOMOGENIZE(collection)`	Returns the simplest form of the given `collection`, e.g., a collection with a single POINT will be returned as `POINT(x y)`, and a collection with multiple individual points will be returned as a MULTIPOINT.
`ST_CONCAVEHULL(geom, target_percent[, allow_holes])`	Returns a potentially concave geometry that encloses all geometries found in the given `geom` set. Use `target_percent` (values between 0 and 1) to determine the percent of area of a convex hull the concave hull will attempt to fill; `1` will return the same geometry as an `ST_CONVEXHULL` operation. Set `allow_holes` to `1` (true) to allow holes in the resulting geometry; default value is `0` (false). Note that `allow_holes` is independent of the area of `target_percent`.
`ST_CONTAINS(geom1, geom2)`	Returns `1` (true) if no points of `geom2` lie in the exterior of `geom1` and at least one point of `geom2` lies in the interior of `geom1`. Note that `geom1` does not contain its boundary but does contain itself.
`ST_CONTAINSPROPERLY(geom1, geom2)`	Returns `1` (true) if `geom2` intersects the interior of `geom1` but not the boundary (or exterior). Note that `geom1` does not contain its boundary but does contain itself.
`ST_CONVEXHULL(geom)`	Returns the minimum convex geometry that encloses all geometries in the given `geom` set.
`ST_COORDDIM(geom)`	Returns the coordinate dimension of the given `geom`, e.g., a geometry with `x`, `y`, and `z` coordinates would return `3`.
`ST_COVEREDBY(geom1, geom2)`	Returns `1` (true) if no point in `geom1` is outside `geom2`.
`ST_COVERS(geom1, geom2)`	Returns `1` (true) if no point in `geom2` is outside `geom1`.
`ST_CROSSES(geom1, geom2)`	Returns `1` (true) if the given geometries, `geom1` and `geom2`, spatially cross, meaning some but not all interior points in common. If `geom1` and/or `geom2` are a GEOMETRYCOLLECTION, a `0` is returned regardless if the two geometries cross
`ST_DIFFERENCE(geom1, geom2)`	Returns a geometry that represents the part of `geom1` that does not intersect with `geom2`.
`ST_DIMENSION(geom)`	Returns the dimension of the given geometry `geom`, which is less than or equal to the coordinate dimension. If `geom` is a single geometry, a `0` is for `POINT`, a `1` is for `LINESTRING`, and a `2` is for `POLYGON`. If `geom` is a collection, it will return the largest dimension from the collection. If `geom` is empty, `0` is returned.
`ST_DISJOINT(geom1, geom2)`	Returns `1` (true) if the given geometries, `geom1` and `geom2`, do not spatially intersect.
`ST_DISTANCE(geom1, geom2[, solution])`	Calculates the minimum distance between the given geometries, `geom1` and `geom2`, using the specified solution type. Solution types available: `0` (default) - Euclidean; returns 2-D Euclidean distance `1` - Haversine; returns minimum sphere distance in meters `2` - Vincenty; returns minimum spheroid distance in meters, more accurate than Haversine but slower performance Note: If `geom1` and `geom2` intersect (verify using `ST_INTERSECTS`), the distance will always be `0`.
`ST_DISTANCEPOINTS(x1, y1, x2, y2[, solution])`	Calculates the minimum distance between the given points, `x1, y1` and `x2, y2`, using the specified solution type. Solution types available: `0` (default) - Euclidean; returns 2-D Euclidean distance `1` - Haversine; returns minimum sphere distance in meters `2` - Vincenty; returns minimum spheroid distance in meters, more accurate than Haversine but slower performance
`ST_DFULLYWITHIN(geom1, geom2, distance[, solution])`	Returns `1` (true) if the maximum distance between geometries `geom1` and `geom2` is less than or equal to the specified `distance` of each other using the specified solution type. If `geom1` or `geom2` is `null`, `0` (false) is returned. Solution types available: `0` (default) - Euclidean; uses degrees to calculate distance `1` - Sphere; uses meters to calculate distance `2` - Spheroid; uses meters to calculate distance, more accurate than sphere but slower performance
`ST_DWITHIN(geom1, geom2, distance[, solution])`	Returns `1` (true) if the minimum distance between geometries `geom1` and `geom2` is within the specified `distance` of each other using the specified solution type. Solution types available: `0` (default) - Euclidean; uses degrees to calculate distance `1` - Sphere; uses meters to calculate distance `2` - Spheroid; uses meters to calculate distance, more accurate than sphere but slower performance
`ST_ELLIPSE(centerx, centery, height, width)`	Returns an ellipse using the following values: `centerx` -- the x coordinate or longitude used to center the ellipse `centery` -- the y coordinate or latitude used to center the ellipse `height` -- the height of the ellipse (in degrees) `width` -- the width of the ellipse (in degrees)
`ST_ENDPOINT(geom)`	Returns the last point of the given `geom` as a POINT if it's a LINESTRING. If `geom` is not a a LINESTRING, `null` is returned.
`ST_ENVDWITHIN(geom1, geom2, distance[, solution])`	Returns `1` (true) if `geom1` is within the specified `distance` of the bounding box of `geom2` using the specified solution type. Solution types available: `0` (default) - Euclidean; uses degrees to calculate distance `1` - Sphere; uses meters to calculate distance
`ST_ENVELOPE(geom)`	Returns the bounding box of a given geometry `geom`.
`ST_ENVINTERSECTS(geom1, geom2)`	Returns `1` (true) if the bounding box of the given geometries, `geom1` and `geom2`, intersect.
`ST_EQUALS(geom1, geom2)`	Returns `1` (true) if the given geometries, `geom1` and `geom2`, are spatially equal. Note that order does not matter.
`ST_EQUALSEXACT(geom1, geom2, tolerance)`	Returns `1` (true) if the given geometries, `geom1` and `geom2`, are almost spatially equal within some given `tolerance`. If the values within the given geometries are within the `tolerance` value of each other, they're considered equal, e.g., if `tolerance` is 2, POINT(1 1) and POINT(1 3) are considered equal, but POINT(1 1) and POINT(1 3.1) are not. Note that the geometry types have to match for them to be considered equal.
`ST_ERASE(geom1, geom2)`	Returns the result of erasing a portion of `geom1` equal to the size of `geom2`.
`ST_EXPAND(geom, units)`	Returns the bounding box expanded in all directions by the given `units` of the given `geom`. The expansion can also be defined for separate directions by providing separate parameters for each direction, e.g., `ST_EXPAND(geom, unitsx, unitsy, unitsz, unitsm)`.
`ST_EXPANDBYRATE(geom, rate)`	Returns the bounding box expanded by a given `rate` (a ratio of width and height) for the given geometry `geom`. The `rate` must be between 0 and 1.
`ST_EXTERIORRING(geom)`	Returns a LINESTRING representing the exterior ring of the given POLYGON `geom`
`ST_FORCE2D(geom)`	Returns the 2-dimensional version (e.g., X and Y coordinates) of `geom`, the provided geometry or set of geometries (e.g., via GEOMETRYCOLLECTION or WKT column name).
`ST_FORCE3D(geom[, z])`	Returns the 3-dimensional version (e.g., X, Y, and Z coordinates) of `geom`, a provided geometry or set of geometries (e.g., via GEOMETRYCOLLECTION or WKT column name), using `z` as the geometry's new z-value. The provided z-values can also be derived from a numeric column. If no `z` is provided, a `0` will be applied. Note If a WKT column is provided for `geom` and a numeric column is provided for `z`, the z values will be matched to the provided geometries by row in the source table. If a singular geometry is provided for `geom` and a column is provided for `z`, three-dimensional versions of the provided geometry will be returned for each z value found in the provided `z` column. If columns are provided for both `geom` and `z` and nulls are present in either column, the row containing null values will be skipped in the results.
`ST_GENERATEPOINTS(geom, num)`	Creates a MULTIPOINT containing a number `num` of randomly generated points within the boundary of `geom`.
`ST_GEOHASH(geom, precision)`	Returns a hash string representation of the given geometry `geom` with specified `precision` (the length of the geohash string). The longer the `precision`, the more precise the hash is. By default, `precision` is set to `20`; the max for `precision` is `32`. Returns `null` if `geom` is an empty geometry. Note The value returned will not be a geohash of the exact geometry but a geohash of the centroid of the given geometry
`ST_GEOMETRYN(geom, index)`	Returns the `index` geometry back from the given `geom` geometry. The `index` starts from 1 to the number of geometry in `geom`.
`ST_GEOMETRYTYPE(geom)`	Returns the type of geometry from the given `geom`.
`ST_GEOMETRYTYPEID(geom)`	Returns the type ID of from `geom`. Type and ID mappings: POINT = 0 LINESTRING = 1 POLYGON = 3 MULTIPOINT = 4 MULTILINESTRING = 5 MULTIPOLYGON = 6 GEOMETRYCOLLECTION = 7
`ST_GEOMFROMGEOHASH(geohash, precision)`	Returns a POLYGON boundary box using the given `geohash` with a precision set by the integer `precision`. If `precision` is specified, the function will use as many characters in the hash equal to `precision` to create the geometry. If no `precision` is specified, the full length of the `geohash` is used.
`ST_GEOMFROMTEXT(wkt)`	Returns a geometry from the given Well-Known text representation `wkt`. Note that this function is only compatible with constants
`ST_HEXGRID(xmin, ymin, xmax, ymax, cell_side[, limit])`	Creates a MULTIPOLYGON containing a grid of hexagons between given minimum and maximum points of a bounding box. The minimum point cannot be greater than or equal to the maximum point. The size (in meters) of the individual hexagons' sides is determined by `cell_side`. The `cell_side` cannot be greater than the width or height of the bounding box. The maximum number of cells that can be produced is determined by `limit`, a positive integer. Supported values for `limit`: `-1` - No limit to the number of cells generated (effectively limited by system memory) `0` (default) - 100 million cells `<n>` - Custom limit of `n` cells If the custom limit request specifies more cells (based on the bounding box and the `cell_side`) than the system limit, a `null` is returned.
`ST_INTERIORRINGN(geom, n)`	Returns the `n`-th interior LINESTRING ring of the POLYGON `geom`. If `geom` is not a POLYGON or the given `n` is out of range, a `null` is returned. The index begins at 1
`ST_INTERSECTION(geom1, geom2)`	Returns the shared portion between given geometries `geom1` and `geom2`
`ST_INTERSECTS(geom1, geom2)`	Returns `1` (true) if the given geometries, `geom1` and `geom2`, intersect in 2-D
`ST_ISCLOSED(geom)`	Returns `1` (true) if the given geometry's (`geom`) start and end points coincide
`ST_ISCOLLECTION(geom)`	Returns `1` (true) if `geom` is a collection, e.g., GEOMETRYCOLLECTION, MULTIPOINT, MULTILINESTRING, etc.
`ST_ISEMPTY(geom)`	Returns `1` (true) if `geom` is empty
`ST_ISRING(geom)`	Returns `1` (true) if LINESTRING `geom` is both closed (per `ST_ISCLOSED`) and "simple" (per `ST_ISSIMPLE`). Returns `0` if `geom` is not a LINESTRING
`ST_ISSIMPLE(geom)`	Returns `1` (true) if `geom` has no anomalous geometric points, e.g., self-intersection or self-tangency
`ST_ISVALID(geom)`	Returns `1` (true) if `geom` (typically a [MULTI]POLYGON) is well formed. A POLYGON is valid if its rings do not cross and its boundary intersects only at POINTs (not along a line). The POLYGON must also not have dangling LINESTRINGs. A MULTIPOLYGON is valid if all of its elements are also valid and the interior rings of those elements do not intersect. Each element's boundaries may touch but only at POINTs (not along a line)
`ST_LENGTH(geom[, solution])`	Returns the length of the geometry if it is a LINESTRING or MULTILINESTRING. Returns `0` if another type of geometry, e.g., POINT, MULTIPOINT, etc. GEOMETRYCOLLECTIONs are also supported but the aforementioned type limitation still applies; the collection will be recursively searched for LINESTRINGs and MULTILINESTRINGs and the summation of all supported geometry types is returned (unsupported types are ignored). Solution types available: `0` (default) - 2D Euclidean length `1` - length on a sphere in meters `2` - length on a spheroid in meters
`ST_LINEFROMMULTIPOINT(geom)`	Creates a LINESTRING from `geom` if it is a MULTIPOINT. Returns `null` if `geom` is not a MULTIPOINT
`ST_LINEINTERPOLATEPOINT(geom, fraction)`	Returns a POINT that represents the specified `fraction` of the LINESTRING `geom`. If `geom` is either empty or not a LINESTRING, `null` is returned
`ST_LINELOCATEPOINT(linestring, point)`	Returns the location of the closest point in the given `linestring` to the given `point` as a value between `0` and `1`. The return value is a fraction of the total `linestring` length.
`ST_LINEMERGE(geom)`	Returns a LINESTRING or MULTILINESTRING from a given `geom`. If `geom` is a MULTILINESTRING comprising LINESTRINGs with shared endpoints, a contiguous LINESTRING is returned. If `geom` is a LINESTRING or a MULTILINESTRING comprising LINESTRINGS without shared endpoints, `geom` is returned If `geom` is an empty (MULTI)LINESTRING or a (MULTI)POINT or (MULTI)POLYGON, an empty GEOMETRYCOLLECTION is returned.
`ST_LINESUBSTRING(geom, start_fraction, end_fraction)`	Returns the fraction of a given `geom` LINESTRING where `start_fraction` and `end_fraction` are between `0` and `1`. For example, given `LINESTRING(1 1, 2 2, 3 3)` a `start_fraction` of `0` and an `end_fraction` of `0.25` would yield the first quarter of the given LINESTRING, or `LINESTRING(1 1, 1.5 1.5)`. Returns `null` if `start_fraction` is greater than `end_fraction`. Returns `null` if input geometry is (MULTI)POINT, MULTILINESTRING, or (MULTI)POLYGON. Returns `null` if `start_fraction` and/or `end_fraction` are less than `0` or more than `1`.
`ST_LONGESTLINE(geom1, geom2[, solution])`	Returns the LINESTRING that represents the longest line of points between the two geometries. If multiple longest lines are found, only the first line found is returned. If `geom1` or `geom2` is empty, `null` is returned. Solution types available: `0` (default) - Euclidean; uses degrees to calculate the longest line `1` - Sphere; uses meters to calculate the longest line `2` - Spheroid; uses meters to calculate the longest line, more accurate than sphere but slower performance
`ST_MAKEENVELOPE(xmin, ymin, xmax, ymax)`	Creates a rectangular POLYGON from the given min and max parameters
`ST_MAKELINE(geom[, geom2])`	Creates a LINESTRING from `geom` if it is a MULTIPOINT. If `geom` is a POINT, there must be at least one other POINT to construct a LINESTRING. If `geom` is a LINESTRING, it must have at least two points. Returns `null` if `geom` is not a POINT, MULTIPOINT, or LINESTRING Note This function can be rather costly in terms of performance
`ST_MAKEPOINT(x, y)`	Creates a POINT at the given coordinate Note This function can be rather costly in terms of performance
`ST_MAKEPOLYGON(geom)`	Creates a POLYGON from `geom`. Inputs must be closed LINESTRINGs Note This function can be rather costly in terms of performance
`ST_MAKETRIANGLE2D(x1, y1, x2, y2, x3, y3)`	Creates a closed 2-D POLYGON with three vertices
`ST_MAKETRIANGLE3D(x1, y1, z1, x2, y2, z2,` `x3, y3, z3)`	Creates a closed 3-D POLYGON with three vertices
`ST_MAXDISTANCE(geom1, geom2[, solution])`	Returns the maximum distance between the given `geom1` and `geom2` geometries using the specifed solution type. If `geom1` or `geom2` is empty, `null` is returned. Solution types available: `0` (default) - returns maximum 2-D Euclidean distance `1` - Sphere; returns maximum distance in meters `2` - Spheroid; returns maximum distance in meters, more accurate than sphere but slower performance
`ST_MAXX(geom)`	Returns the maximum x coordinate of a bounding box for the given `geom` geometry. This function works for 2-D and 3-D geometries.
`ST_MAXY(geom)`	Returns the maximum y coordinate of a bounding box for the given `geom` geometry. This function works for 2-D and 3-D geometries.
`ST_MAXZ(geom)`	Returns the maximum z coordinate of a bounding box for the given `geom` geometry. This function works for 2-D and 3-D geometries.
`ST_MINX(geom)`	Returns the minimum x coordinate of a bounding box for the given `geom` geometry. This function works for 2-D and 3-D geometries.
`ST_MINY(geom)`	Returns the minimum y coordinate of a bounding box for the given `geom` geometry. This function works for 2-D and 3-D geometries.
`ST_MINZ(geom)`	Returns the minimum z coordinate of a bounding box for the given `geom` geometry. This function works for 2-D and 3-D geometries.
`ST_MULTI(geom)`	Returns `geom` as a MULTI- geometry, e.g., a POINT would return a MULTIPOINT.
`ST_MULTIPLERINGBUFFERS(geom, distance, outside)`	Creates multiple buffers at specified `distance` around the given `geom` geometry. Multiple distances are specified as comma-separated values in an array, e.g., `[10,20,30]`. Valid values for `outside` are: `FULL` -- indicates that buffers will overlap or cover the given `geom` geometry. This is the default. `OUTSIDE_ONLY` -- indicates that buffers will be rings around the given `geom` geometry.
`ST_NEAR(geom1, geom2)`	Returns the portion of `geom2` that is closest to `geom1`. If `geom2` is a singular geometry object (e.g., POINT, LINESTRING, POLYGON), `geom2` will be returned. If `geom2` a multi-geometry, e.g., MULTIPOINT, MULTILINESTRING, etc., the nearest singular geometry in `geom2` will be returned.
`ST_NORMALIZE(geom)`	Returns `geom` in its normalized (canonical) form, which may rearrange the points in lexicographical order.
`ST_NPOINTS(geom)`	Returns the number of points (vertices) in `geom`.
`ST_NUMGEOMETRIES(geom)`	If `geom` is a collection or MULTI- geometry, returns the number of geometries. If `geom` is a single geometry, returns 1.
`ST_NUMINTERIORRINGS(geom)`	Returns the number of interior rings if `geom` is a POLYGON. Returns `null` if `geom` is anything else.
`ST_NUMPOINTS(geom)`	Returns the number of points in the `geom` LINESTRING. Returns `null` if `geom` is not a LINESTRING.
`ST_OVERLAPS(geom1, geom2)`	Returns `1` (true) if given geometries `geom1` and `geom2` share space. If `geom1` and/or `geom2` are a GEOMETRYCOLLECTION, a `0` is returned regardless if the two geometries overlap
`ST_PARTITION(geom, threshold)`	Returns a MULTIPOLYGON representing the given `geom` partitioned into a number of POLYGONs with a maximum number of vertices equal to the given `threshold`. Minimum value for `threshold` is `10`; default value is `10000`. If `geom` is not a POLYGON or MULTIPOLYGON, `geom` is returned. If the number of vertices in `geom` is less than the `threshold`, `geom` is returned.
`ST_PERIMETER(geom[, solution])`	Returns the perimeter of the geometry if it is a POLYGON or MULTIPOLYGON. Returns `0` if another type of geometry, e.g., POINT, MULTIPOINT, LINESTRING, or MULTILINESTRING. GEOMETRYCOLLECTIONs are also supported but the aforementioned type limitation still applies; the collection will be recursively searched for POLYGONs and MULTIPOLYGONs and the summation of all supported geometry types is returned (unsupported types are ignored). Solution types available: `0` (default) - 2D Euclidean length `1` - length on a sphere in meters `2` - length on a spheroid in meters
`ST_POINT(x, y)`	Returns a POINT with the given `x` and `y` coordinates.
`ST_POINTFROMGEOHASH(geohash, precision)`	Returns a POINT using the given `geohash` with a precision set by the integer `precision`. If `precision` is specified, the function will use as many characters in the hash equal to `precision` to create the geometry. If no `precision` is specified, the full length of the `geohash` is used. Note The POINT returned represents the center of the bounding box of the geohash
`ST_POINTGRID(xmin, ymin, xmax, ymax, cell_side[, limit])`	Creates a MULTIPOLYGON containing a square-shaped grid of points between given minimum and maximum points of a bounding box. The minimum point cannot be greater than or equal to the maximum point. The distance between the points (in meters) is determined by `cell_side`. The `cell_side` cannot be greater than the width or height of the bounding box. The maximum number of cells that can be produced is determined by `limit`, a positive integer. Supported values for `limit`: `-1` - No limit to the number of cells generated (effectively limited by system memory) `0` (default) - 100 million cells `<n>` - Custom limit of `n` cells If the custom limit request specifies more cells (based on the bounding box and the `cell_side`) than the system limit, a `null` is returned.
`ST_POINTN(geom, n)`	Returns the `n`-th point in LINESTRING `geom`. Negative values are valid, but note that they are counted backwards from the end of `geom`. A `null` is returned if `geom` is not a LINESTRING.
`ST_POINTS(geom)`	Returns a MULTIPOINT containing all of the coordinates of `geom`.
`ST_PROJECT(geom, distance, azimuth)`	Returns a POINT projected from a start point `geom` along a geodesic calculated using `distance` and `azimuth`. If `geom` is not a POINT, `null` is returned.
`ST_REMOVEPOINT(geom, offset)`	Remove a point from LINESTRING `geom` using `offset` to skip over POINTs in the LINESTRING. The `offset` is 0-based.
`ST_REMOVEREPEATEDPOINTS(geom, tolerance)`	Removes points from `geom` if the point's vertices are greater than or equal to the `tolerance` of the previous point in the geometry's list. If `geom` is not a MULTIPOINT, MULTILINESTRING, or a MULTIPOLYGON, no points will be removed.
`ST_REVERSE(geom)`	Return the geometry with its coordinate order reversed.
`ST_SCALE(geom, x, y)`	Scales `geom` by multiplying its respective vertices by the given `x` and `y` values. This function also supports scaling `geom` using another geometry object, e.g., `ST_SCALE('POINT(3 4)', 'POINT(5 6)')` would return `POINT(15 24)`. If specifying `x` and `y` for scale, note that the default value is `0`, e.g., `ST_SCALE('POINT(1 3)', 4)` would return `POINT(4 0)`.
`ST_SEGMENTIZE(geom, max_segment_length[, solution])`	Returns the given `geom` but segmentized n number of times depending on how the `max_segment_length` distance (in units based on the `solution` type) divides up the original geometry. The new `geom` is guaranteed to have segments that are smaller than the given `max_segment_length`. Note that POINTs are not able to be segmentized. Collection geometries (GEOMETRYCOLLECTION, MULTILINESTRING, MULTIPOINT, etc.) can be segmentized, but only the individual parts will be segmentized, not the collection as a whole. Solution types available: `0` - Euclidean; uses degrees to calculate distance `1` (default) - Sphere; uses meters to calculate distance
`ST_SETPOINT(geom1, position, geom2)`	Replace a point of LINESTRING `geom1` with POINT `geom2` at `position` (base 0). Negative values are valid, but note that they are counted backwards from the end of `geom`.
`ST_SHAREDPATH(geom1, geom2)`	Returns a collection containing paths shared by `geom1` and `geom2`.
`ST_SHORTESTLINE(geom1, geom2)`	Returns the 2-D LINESTRING that represents the shortest line of points between the two geometries. If multiple shortest lines are found, only the first line found is returned. If `geom1` or `geom2` is empty, `null` is returned
`ST_SIMPLIFY(geom, tolerance)`	Returns a simplified version of the given `geom` using an algorithm to reduce the number of points comprising a given geometry while attempting to best retain the original shape. The given `tolerance` determines how much to simplify the geometry. The higher the `tolerance`, the more simplified the returned geometry. Some holes might be removed and some invalid polygons (e.g., self-intersecting, etc.) might be present in the returned geometry. Only (MULTI)LINESTRINGs and (MULTI)POLYGONs can be simplified, including those found within GEOMETRYCOLLECTIONs; any other geometry objects will be returned unsimplified. Note The `tolerance` should be provided in the same units as the data. As a rule of thumb, a `tolerance` of `0.00001` would correspond to about one meter.
`ST_SIMPLIFYPRESERVETOPOLOGY(geom, tolerance)`	Returns a simplified version of the given `geom` using an algorithm to reduce the number of points comprising a given geometry while attempting to best retain the original shape. The given `tolerance` determines how much to simplify the geometry. The higher the `tolerance`, the more simplified the returned geometry. No holes will be removed and no invalid polygons (e.g., self-intersecting, etc.) will be present in the returned geometry. Only (MULTI)LINESTRINGs and (MULTI)POLYGONs can be simplified, including those found within GEOMETRYCOLLECTIONs; any other geometry objects will be returned unsimplified. Note The `tolerance` should be provided in the same units as the data. As a rule of thumb, a `tolerance` of `0.00001` would correspond to about one meter.
`ST_SNAP(geom1, geom2, tolerance)`	Snaps `geom1` to `geom2` within the given `tolerance`. If the `tolerance` causes `geom1` to not snap, the geometries will be returned unchanged.
`ST_SPLIT(geom1, geom2)`	Returns a collection of geometries resulting from the split between `geom1` and `geom2` geometries.
`ST_SQUAREGRID(xmin, ymin, xmax, ymax, cell_side[, limit])`	Creates a MULTIPOLYGON containing a grid of squares between given minimum and maximum points of a bounding box. The minimum point cannot be greater than or equal to the maximum point. The size (in meters) of the individual squares' sides is determined by `cell_side`. The `cell_side` cannot be greater than the width or height of the bounding box. The maximum number of cells that can be produced is determined by `limit`, a positive integer. Supported values for `limit`: `-1` - No limit to the number of cells generated (effectively limited by system memory) `0` (default) - 100 million cells `<n>` - Custom limit of `n` cells If the custom limit request specifies more cells (based on the bounding box and the `cell_side`) than the system limit, a `null` is returned.
`ST_STARTPOINT(geom)`	Returns the first point of LINESTRING `geom` as a POINT. Returns `null` if `geom` is not a LINESTRING.
`ST_SYMDIFFERENCE(geom1, geom2)`	Returns a geometry that represents the portions of `geom1` and `geom2` geometries that do not intersect.
`ST_TOUCHES(geom1, geom2)`	Returns `1` (true) if the given geometries, `geom1` and `geom2`, have at least one point in common but their interiors do not intersect. If `geom1` and/or `geom2` are a GEOMETRYCOLLECTION, a `0` is returned regardless if the two geometries touch
`ST_TRANSLATE(geom, deltax, deltay[, deltaz])`	Translate `geom` by given offsets `deltax` and `deltay`. A z-coordinate offset can be applied using `deltaz`.
`ST_TRIANGLEGRID(xmin, ymin, xmax, ymax, cell_side[, limit])`	Creates a MULTIPOLYGON containing a grid of triangles between given minimum and maximum points of a bounding box. The minimum point cannot be greater than or equal to the maximum point. The size (in meters) of the individual triangles' sides is determined by `cell_side`. The `cell_side` cannot be greater than the width or height of the bounding box. The maximum number of cells that can be produced is determined by `limit`, a positive integer. Supported values for `limit`: `-1` - No limit to the number of cells generated (effectively limited by system memory) `0` (default) - 100 million cells `<n>` - Custom limit of `n` cells If the custom limit request specifies more cells (based on the bounding box and the `cell_side`) than the system limit, a `null` is returned.
`ST_UNION(geom1, geom2)`	Returns a geometry that represents the point set union of the two given geometries, `geom1` and `geom2`.
`ST_UNIONCOLLECTION(geom)`	Returns a geometry that represents the point set union of a single given geometry `geom`.
`ST_UPDATE(geom1, geom2)`	Returns a geometry that is `geom1` geometry updated by `geom2` geometry
`ST_VORONOIPOLYGONS(geom, tolerance)`	Returns a GEOMETRYCOLLECTION containing Voronoi polygons (regions consisting of points closer to a vertex in `geom` than any other vertices in `geom`) calculated from the vertices in `geom` and the given `tolerance`. The `tolerance` determines the distance at which points will be considered the same. An empty GEOMETRYCOLLECTION is returned if `geom` is an empty geometry, a single POINT, or a LINESTRING or POLYGON composed of equivalent vertices (e.g., `POLYGON((0 0, 0 0, 0 0, 0 0))`, `LINESTRING(0 0, 0 0)`).
`ST_WITHIN(geom1, geom2)`	Returns `1` (true) if the `geom1` geometry is inside the `geom2` geometry. Note that as long as at least one point is inside of `geom2`, `geom1` is considered within `geom2` even if the rest of the `geom1` lies along the boundary of `geom2`
`ST_WKTTOWKB(geom)`	Returns the binary form (WKB) of a `geom` (WKT) Note This function can only be used in queries against a single table.
`ST_X(geom)`	Returns the X coordinate of the POINT `geom`; if the coordinate is not available, `null` is returned. `geom` must be a POINT.
`ST_XMAX(geom)`	Alias for `ST_MAXX()`
`ST_XMIN(geom)`	Alias for `ST_MINX()`
`ST_Y(geom)`	Returns the Y coordinate of the POINT `geom`; if the coordinate is not available, `null` is returned. `geom` must be a POINT.
`ST_YMAX(geom)`	Alias for `ST_MAXY()`
`ST_YMIN(geom)`	Alias for `ST_MINY()`
`ST_ZMAX(geom)`	Alias for `ST_MAXZ()`
`ST_ZMIN(geom)`	Alias for `ST_MINZ()`

Aggregation Functions¶

Function	Description
`ST_AGGREGATE_COLLECT(geom)`	Alias for `ST_COLLECT_AGGREGATE()`
`ST_AGGREGATE_INTERSECTION(geom)`	Alias for `ST_INTERSECTION_AGGREGATE()`
`ST_COLLECT_AGGREGATE(geom)`	Returns a GEOMETRYCOLLECTION comprising all geometries found in the `geom` set. Any MULTI* geometries will be divided into separate singular geometries, e.g., MULTIPOINT((0 0), (1 1)) would be divided into POINT(0 0) and POINT(1 1) in the results; the same is true for elements of a GEOMETRYCOLLECTION found in `geom`, where a GEOMETRYCOLLECTION within the provided `geom` set will also be parsed, effectively flattening it and adding the individual geometries to the resulting GEOMETRYCOLLECTION. Any empty geometries in `geom` are ignored even if they are part of a GEOMETRYCOLLECTION. Any duplicate WKTs will be retained.
`ST_DISSOLVE(geom)`	Dissolves all geometries within a given set into a single geometry. Note that the resulting single geometry can still be a group of noncontiguous geometries but represented as a single group, e.g., a GEOMETRYCOLLECTION. Best performance when used in conjunction with adjacent geometries
`ST_DISSOLVEOVERLAPPING(geom)`	Dissolves all geometries within a given set into a single geometry. Note that the resulting single geometry can still be a group of noncontiguous geometries but represented as a single group, e.g., a GEOMETRYCOLLECTION. Best performance when used in conjunction with overlapping geometries
`ST_INTERSECTION_AGGREGATE(geom)`	Returns a POLYGON or MULTIPOLYGON comprising the shared portion between all geometries found in the `geom` set. Returns an empty GEOMETRYCOLLECTION if there is no shared portion between all geometries. Functionally equivalent to `ST_INTERSECTION(ST_INTERSECTION(ST_INTERSECTION(geom1, geom2), geom3), ... geomN)`.
`ST_LINESTRINGFROMORDEREDPOINTS(x, y, t)`	Returns a LINESTRING that represents a "track" of the given points (`x`, `y`) ordered by the given sort column `t` (e.g., a timestamp or sequence number). If any of the values in the specified columns are `null`, the null "point" will be left out of the resulting LINESTRING. If there's only one non-null "point" in the source table, a POINT is returned. If there are no non-null "points" in the source table, a `null` is returned
`ST_LINESTRINGFROMORDEREDPOINTS3D(x, y, z, t)`	Returns a LINESTRING that represents a "track" of the given 3D points (`x`, `y`, `z`) ordered by the given sort column `t` (e.g., a timestamp or sequence number). If any of the values in the specified columns are `null`, the null "point" will be left out of the resulting LINESTRING. If there's only one non-null "point" in the source table, a POINT is returned. If there are no non-null "points" in the source table, a `null` is returned
`ST_POLYGONIZE(geom)`	Returns a GEOMETRYCOLLECTION containing POLYGONs comprising the provided (MULTI)LINESTRING(s). (MULTI)POINT and (MULTI)POLYGON geometries are ignored when calculating the resulting GEOMETRYCOLLECTION. If a valid POLYGON cannot be constructed from the provided (MULTI)LINESTRING(s), an empty GEOMETRYCOLLECTION will be returned.

Math Functions¶

Function Description

ABS(expr) Calculates the absolute value of expr

ACOS(expr) Returns the inverse cosine (arccosine) of expr as a double

ACOSF(expr) Returns the inverse cosine (arccosine) of expr as a float

ACOSH(expr) Returns the inverse hyperbolic cosine of expr as a double

ACOSHF(expr) Returns the inverse hyperbolic cosine of expr as a float

ASIN(expr) Returns the inverse sine (arcsine) of expr as a double

ASINF(expr) Returns the inverse sine (arcsine) of expr as a float

ASINH(expr) Returns the inverse hyperbolic sine of expr as a double

ASINHF(expr) Returns the inverse hyperbolic sine of expr as a float

ATAN(expr) Returns the inverse tangent (arctangent) of expr as a double

ATANF(expr) Returns the inverse tangent (arctangent) of expr as a float

ATANH(expr) Returns the inverse hyperbolic tangent of expr as a double

ATANHF(expr) Returns the inverse hyperbolic tangent of expr as a float

ATAN2(x, y) Returns the inverse tangent (arctangent) using two arguments as a double

ATAN2F(x, y) Returns the inverse tangent (arctangent) using two arguments as a float

ATN2(x, y) Alias for ATAN2

ATN2F(x, y) Alias for ATAN2F

CBRT(expr) Returns the cube root of expr as a double

CBRTF(expr) Returns the cube root of expr as a float

CEIL(expr) Alias for CEILING

CEILING(expr) Rounds expr up to the next highest integer

COS(expr) Returns the cosine of expr as a double

COSF(expr) Returns the cosine of expr as a float

COSH(expr) Returns the hyperbolic cosine of expr as a double

COSHF(expr) Returns the hyperbolic cosine of expr as a float

COT(expr) Returns the cotangent of expr as a double

COTF(expr) Returns the cotangent of expr as a float

DEGREES(expr) Returns the conversion of expr (in radians) to degrees as a double

DEGREESF(expr) Returns the conversion of expr (in radians) to degrees as a float

DIVZ(a, b, c) Returns the quotient a / b unless b == 0, in which case it returns c

EXP(expr) Returns e to the power of expr as a double

EXPF(expr) Returns e to the power of expr as a float

FLOOR(expr) Rounds expr down to the next lowest integer

GREATER(expr_a, expr_b) Returns whichever of expr_a and expr_b has the larger value, based on typed comparison

HYPOT(x, y) Returns the hypotenuse of x and y as a double

HYPOTF(x, y) Returns the hypotenuse of x and y as a float

ISNAN(expr) Returns 1 (true) if expr is not a number by IEEE standard; otherwise, returns 0 (false)

IS_NAN(expr) Alias for ISNAN

ISINFINITY(expr) Returns 1 (true) if expr is infinity by IEEE standard; otherwise, returns 0 (false)

IS_INFINITY(expr) Alias for ISINFINITY

LDEXP(x, exp) Returns the value of x * 2^exp as a double

LDEXPF(x, exp) Returns the value of x * 2^exp as a float

LESSER(expr_a, expr_b) Returns whichever of expr_a and expr_b has the smaller value, based on typed comparison

LN(expr) Returns the natural logarithm of expr as a double

LNF(expr) Returns the natural logarithm of expr as a float

LOG(expr) Alias for LN

LOGF(expr) Alias for LNF

LOG10(expr) Returns the base-10 logarithm of expr as a double

LOG10F(expr) Returns the base-10 logarithm of expr as a float

MAX_CONSECUTIVE_BITS(expr) Calculates the length of the longest series of consecutive 1 bits in the integer expr

MOD(dividend, divisor) Calculates the remainder after integer division of dividend by divisor

PI() Returns the value of pi

POW(base, exponent) Alias for POWER

POWF(base, exponent) Alias for POWERF

POWER(base, exponent) Returns base raised to the power of exponent as a double

POWERF(base, exponent) Returns base raised to the power of exponent as a float

RADIANS(expr) Returns the conversion of expr (in degrees) to radians as a double

RADIANSF(expr) Returns the conversion of expr (in degrees) to radians as a float

RAND() Returns a random floating-point value.

ROUND(expr, scale)

Rounds expr to the nearest decimal number with scale decimal places when scale is a positive number; rounds to the nearest number such that the result has -(scale) zeros to the left of the decimal point when scale is negative; use scale of 0 to round to the nearest integer. Examples:

Function Call	Result
`ROUND(12345.678, 2)`	`12345.68`
`ROUND(12345.678, 0)`	`12346`
`ROUND(12345.678, -2)`	`12300`

SIGN(expr)

Determines whether a number is positive, negative, or zero; returns one of the following three values:

Expression Value	Result
positive	`1`
zero	`0`
negative	`-1`

SIN(expr) Returns the sine of expr as a double

SINF(expr) Returns the sine of expr as a float

SINH(expr) Returns the hyperbolic sine of expr as a double

SINHF(expr) Returns the hyperbolic sine of expr as a float

SQRT(expr) Returns the square root of expr as a double

SQRTF(expr) Returns the square root of expr as a float

TAN(expr) Returns the tangent of expr as a double

TANF(expr) Returns the tangent of expr as a float

TANH(expr) Returns the hyperbolic tangent of expr as a double

TANHF(expr) Returns the hyperbolic tangent of expr as a float

TRUNCATE(expr, scale)

Rounds expr down to the nearest decimal number with scale decimal places, following the same rules as ROUND. Examples:

Function Call	Result
`TRUNCATE(12345.678, 2)`	`12345.67`
`TRUNCATE(12345.678, 0)`	`12345`
`TRUNCATE(12345.678, -2)`	`12300`

Null Functions¶

Some of the following null functions require parameters to be of convertible data types. Note that limited-width (charN) & unlimited-width (non-charN) string types are not convertible.

Function	Description
`COALESCE(expr_a, ..., expr_N)`	Returns the value of the first expression that is not null starting with `expr_a` and ending with `expr_N`. If all are null, then null is returned. All expressions should be of the same or convertible data type.
`IFNULL(expr_a, expr_b)`	Returns `expr_a` if it is not null; otherwise, returns `expr_b`. Both should be of the same or convertible data type.
`ISNULL(expr)`	Returns `1` if `expr` is null; otherwise, returns `0`
`IS_NULL(expr)`	Synonymous with `ISNULL(expr)`
`NULLIF(expr_a, expr_b)`	Returns null if `expr_a` equals `expr_b`; otherwise, returns the value of `expr_a`; both expressions should be of the same or convertible data type.
`NVL(expr_a, expr_b)`	Alias for `IFNULL`
`NVL2(expr, value_if_not_null, value_if_null)`	Evaluates `expr`: if not null, returns `value_if_not_null`; if null, returns `value_if_null`. Both `value_if_not_null` & `value_if_null` should be of the same data type as `expr` or implicitly convertible.
`REMOVE_NULLABLE(expr)`	Alias for `ZEROIFNULL`
`ZEROIFNULL(expr)`	Replaces null values with appropriate values based on the column type (e.g., `0` if numeric column, an empty string if charN column, etc.). Also removes the `nullable` column property if used to calculate a derived column.

String Functions¶

Important

These functions will only work with fixed-width string fields (char1 - char256).

Function Description

ASCII(expr) Returns the ASCII code for the first character in expr

CHAR(expr) The character represented by the standard ASCII code expr in the range [ 0 - 127 ]

CONCAT(expr_a, expr_b)

Performs a string concatenation of expr_a & expr_b; use nested CONCAT calls to concatenate more than two strings

Note

The resulting field size of any CONCAT will be a charN field big enough to hold the concatenated fields, e.g., concatenating a char32 column and a char64 column will result in a char128 column. Columns of type char256 cannot be used with CONCAT.

CONCAT_TRUNCATE(expr_a, expr_b)

Returns the concatenation of expr_a and expr_b, truncated at the maximum size of expr_a. For columns, this size is explicit; for string constants, this will be the smallest charN type that can hold the expr_a string.

Examples:

Function Call Result

CONCAT_TRUNCATE('ABC123','!') ABC123!

CONCAT_TRUNCATE('ABC123','DEFG')

ABC123DE

(char8 is the minimum size required to hold the ABC123 value, so the result is truncated at 8 characters)

CONCAT_TRUNCATE('ABCD1234','DEFG')

ABCD1234

(char8 is the minimum size required to hold the ABCD1234 value, so no additional characters can be concatenated)

CONTAINS(match_expr, ref_expr) Returns 1 if ref_expr contains match_expr by string-literal comparison; otherwise, returns 0

DIFFERENCE(expr_a, expr_b) Returns a value between 0 and 4 that represents the difference between the sounds of expr_a and expr_b based on the SOUNDEX() value of the strings--a value of 4 is the best possible sound match

EDIT_DISTANCE(expr_a, expr_b) Returns the Levenshtein edit distance between expr_a and expr_b; the lower the the value, the more similar the two strings are

ENDS_WITH(match_expr, ref_expr) Returns 1 if ref_expr ends with match_expr by string-literal comparison; otherwise, returns 0

INITCAP(expr) Returns expr with the first letter of each word in uppercase

IPV4_PART(expr, part_num)

Returns the octet of the IP address given in expr at the position specified by part_num. Valid part_num values are constants from 1 to 4.

Examples:

Function Call	Result
`IPV4_PART('12.34.56.78', 1)`	`12`
`IPV4_PART('12.34.56.78', 4)`	`78`

IS_IPV4(expr) Returns 1 if expr is an IPV4 address; returns 0 otherwise

LCASE(expr) Converts expr to lowercase

LEFT(expr, num_chars) Returns the leftmost num_chars characters from expr

LENGTH(expr) Returns the number of characters in expr

LOCATE(match_expr, ref_expr, [start_pos]) Returns the starting position of the first match of match_expr in ref_expr, starting from position 1 or start_pos (if specified)

LOWER(expr) Alias for LCASE

LPAD(base_expr, length, pad_expr)

Left pads the given base_expr string with the pad_expr string to the given length of characters. If base_expr is longer than length, the return value is shortened to length characters. If length is larger than 256, it will be truncated to 256.

Examples:

Function Call	Result
`LPAD('test', 9, 'pad')`	`padpatest`
`LPAD('test', 3, 'pad')`	`tes`

LTRIM(expr) Removes whitespace from the left side of expr

POSITION(match_expr, ref_expr, [start_pos]) Alias for LOCATE

REPLACE(ref_expr, match_expr, repl_expr) Replaces every occurrence of match_expr in ref_expr with repl_expr

REVERSE(expr)

Returns expr with the order of characters reversed.

Examples:

Function Call	Result
`REVERSE('Reverse')`	`esreveR`
`REVERSE('Was it a bat I saw?')`	`?was I tab a ti saW`

RIGHT(expr, num_chars) Returns the rightmost num_chars characters from expr

RPAD(base_expr, length, pad_expr)

Right pads the given base_expr string with the pad_expr string to the given length of characters. If base_expr is longer than length, the return value is shortened to length characters. If length is larger than 256, it will be truncated to 256.

Examples:

Function Call	Result
`RPAD('test', 9, 'pad')`	`testpadpa`
`RPAD('test', 3, 'pad')`	`tes`

RTRIM(expr) Removes whitespace from the right side of expr

SOUNDEX(expr)

Returns a soundex value from expr. Only the first word in the string will be considered in the calculation.

Note

This is the algorithm used by most programming languages.

SPACE(n) Returns a string consisting of n space characters. The value of n can only be within the range of 0-256.

SPLIT(expr, delim, group_num)

Splits expr into groups delimited by the delim character and returns the group_num split group. If group_num is positive, groups will be counted from the beginning of expr; if negative, groups will be counted from the end of expr going backwards. Two consecutive delimiters will result in an empty string being added to the list of selectable groups. If no instances of delim exist in expr, the entire string is available at group 1 (and -1). Group 0 returns nothing.

Examples:

Function Call	Result
`SPLIT('apple', 'p', 1)`	`a`
`SPLIT('apple', 'p', 2)`	<empty string>
`SPLIT('apple', 'p', -1)`	`le`

STARTS_WITH(match_expr, ref_expr) Returns 1 if ref_expr starts with match_expr by string-literal comparison; otherwise, returns 0

STRCMP(expr_a, expr_b)

Compares expr_a to expr_b in a lexicographical sort

Situation	Result
`expr_a` and `expr_b` are the same	`0`
`expr_a` comes before `expr_b`, lexicographically	`-1`
`expr_a` comes after `expr_b`, lexicographically	`1`

SUBSTR(expr, start_pos, num_chars) Alias for SUBSTRING

SUBSTRING(expr, start_pos, num_chars) Returns num_chars characters from the expr, starting at the 1-based start_pos

TRIM(expr) Removes whitespace from both sides of expr

UCASE(expr) Converts expr to uppercase

UPPER(expr) Alias for UCASE

User/Security Functions¶

Function Description

CURRENT_USER() Alias for USER

HASH(column[, seed]) Returns a non-negative integer representing an obfuscated version of column, using the given seed; default seed is 0

IS_MEMBER(role[, user])

or

IS_ROLEMEMBER(role[, user])

Returns whether the current user (or the given user, if specified) has been assigned the given role, either directly or indirectly:

Situation	Result
Current/given user has been granted `role`	`1`
Current/given user has not been granted `role`	`0`
Role `role` does not exist	null

MASK(expr, start, length[, char])

Masks length characters of expr, beginning at the position identified by start, with * characters (or the character specified in char):

Function Call	Result
`MASK('Characters', 5, 5)`	`Char*****s`
`MASK('Characters', 5, 2, '#')`	`Char##ters`

OBFUSCATE(column[, seed]) Alias for HASH

SHA256(expr) Returns the hex digits of the SHA-256 hash of the given value expr as a char64 string.

SYSTEM_USER() Alias for USER

USER() Returns the username of the current user

Aggregation Functions¶

Function	Description
`ATTR(expr)`	If `MIN(expr)` = `MAX(expr)`, returns `expr`; otherwise `*`
`ARG_MIN(agg_expr, ret_expr)`	The value of `ret_expr` where `agg_expr` is the minimum value (e.g. `ARG_MIN(cost, product_id)` returns the product ID of the lowest cost product)
`ARG_MAX(agg_expr, ret_expr)`	The value of `ret_expr` where `agg_expr` is the maximum value (e.g. `ARG_MAX(cost, product_id)` returns the product ID of the highest cost product)
`AVG(expr)`	Calculates the average value of `expr`
`CORR(expr1, expr2)`	Calculates the correlation coefficient of `expr1` and `expr2`
`CORRELATION(expr1, expr2)`	Alias for `CORR`
`CORRCOEF(expr1, expr2)`	Alias for `CORR`
`COUNT(*)`	Returns the number of records in a table
`COUNT(expr)`	Returns the number of non-null data values in `expr`
`COUNT(DISTINCT expr)`	Returns the number of distinct non-null data values in `expr`
`COV(expr1, expr2)`	Alias for `COVAR_POP`
`COVAR(expr1, expr2)`	Alias for `COVAR_POP`
`COVARIANCE(expr1, expr2)`	Alias for `COVAR_POP`
`COVAR_POP(expr1, expr2)`	Calculates the population covariance of `expr1` and `expr2`
`COVAR_SAMP(expr1, expr2)`	Calculates the sample covariance of `expr1` and `expr2`
`GROUPING(expr)`	Used primarily with ROLLUP, CUBE, and GROUPING SETS, to distinguish the source of null values in an aggregated result set, returns whether `expr` is part of the aggregation set used to calculate the values in a given result set row. Returns `0` if `expr` is part of the row's aggregation set, `1` if `expr` is not (meaning that aggregation took place across all `expr` values). For example, in a `ROLLUP(A)` operation, there will be two potential rows with null in the result set for column `A`. One row will contain null values of `A` aggregated together, and the other will contain null, but be an aggregation over the entire table, irrespective of `A` values. In this case, `GROUPING(A)` will return `0` for the null values of `A` aggregated together (as well as all other grouped `A` values) and `1` for the row resulting from aggregating across all `A` values.
`KURT(expr)`	Alias for `KURTOSIS_POP`
`KURTOSIS(expr)`	Alias for `KURTOSIS_POP`
`KURTOSIS_POP(expr)`	Calculate the population kurtosis of `expr`
`KURTOSIS_SAMP(expr)`	Calculate the sample kurtosis of `expr`
`KURT_POP(expr)`	Alias for `KURTOSIS_POP`
`KURT_SAMP(expr)`	Alias for `KURTOSIS_SAMP`
`MAX(expr)`	Finds the maximum value of `expr`
`MEAN(expr)`	Alias for `AVG`
`MIN(expr)`	Finds the minimum value of `expr`
`SKEW(expr)`	Alias for `SKEWNESS_POP`
`SKEWNESS(expr)`	Alias for `SKEWNESS_POP`
`SKEWNESS_POP(expr)`	Calculate the population skew of `expr`
`SKEWNESS_SAMP(expr)`	Calculate the sample skew of `expr`
`SKEW_POP(expr)`	Alias for `SKEWNESS_POP`
`SKEW_SAMP(expr)`	Alias for `SKEWNESS_SAMP`
`STDDEV(expr)`	Alias for `STDDEV_POP`
`STDDEV_POP(expr)`	Calculates the population standard deviation of the values of `expr`
`STDDEV_SAMP(expr)`	Calculates the sample standard deviation of the values of `expr`
`SUM(expr)`	Sums all the values of `expr`
`VAR(expr)`	Alias for `VAR_POP`
`VAR_POP(expr)`	Calculates the population variance of the values of `expr`
`VAR_SAMP(expr)`	Calculates the sample variance of the values of `expr`

Grouping Functions¶

Function	Description
`ROLLUP(expr)`	Calculates n + 1 aggregates for n number of columns in `expr`
`CUBE(expr)`	Calculates 2ⁿ aggregates for n number of columns in `expr`
`GROUPING SETS(expr)`	Calculates aggregates for any given aggregates in `expr`, including `ROLLUP()` and `CUBE()`

Distribution Functions¶

Distribution functions are column expressions that affect the sharded/replicated nature of the result set of a given query. It may be necessary to force a result set to be distributed in a certain way for a subsequent operation on that result set to be performant.

Important

Employing these functions will prevent any automatic resharding of data to allow the query to succeed. Use only when a better query plan (with respect to data distribution) is known than any the system can devise.

Function	Description
`KI_REPLICATE()`	Force a scalar result set to be replicated (query with no `GROUP BY`)
`KI_REPLICATE_GROUP_BY(0)`	Force an aggregated result set to be replicated (query with `GROUP BY`)
`KI_MATCH_COLUMN(0)`	Aligns the column count of queries that are part of a UNION, INTERSECT or EXCEPT with a query whose column list has been amended with either `KI_REPLICATE_GROUP_BY` or `KI_SHARD_KEY`
`KI_SHARD_KEY(<column list>)`	Force the result set to be sharded on the given columns. This will override any implicitly-derived or explicitly-defined replication status the table would have had. Note The column(s) listed in `column list` must also appear in the `SELECT` list; `KI_SHARD_KEY` merely identifies which of the selected columns should be used as the shard key.

Sharding Example¶

For example, a query for all employees and their total employees managed, including employees who don't manage anyone, could employ a UNION like this:

SELECT manager_id, COUNT(*)
FROM employee
GROUP BY manager_id
UNION
SELECT id, 0
FROM employee
WHERE id NOT IN
    (
        SELECT manager_id
        FROM employee
        WHERE manager_id IS NOT NULL
    );

In this example, the employee table is sharded on id. Since the first part of the UNION aggregates on manager_id, the result will be replicated. The second part of the UNION does no aggregation and includes the shard key in the SELECT list; the result of this will be sharded.

Given that a limitation of UNION operations is that both parts of a UNION have to be distributed the same way, this query will fail, with the following message:

GPUdb Error: either all input tables must be replicated or all input tables
must be non-replicated

In order to work around this limitation, a distribution function can be used.

One option is to shard the first part of the UNION to match the second part:

SELECT
     manager_id,
     COUNT(*),
     KI_SHARD_KEY(manager_id)
FROM employee
GROUP BY manager_id
UNION
SELECT
    id,
    0,
    KI_MATCH_COLUMN(0)
FROM employee
WHERE id NOT IN
    (
        SELECT manager_id
        FROM employee
        WHERE manager_id IS NOT NULL
    );

Here, the distribution function KI_SHARD_KEY is used to make the selected manager_id column the new shard key for the first part of the UNION. Now, the shard key for the first part of the UNION (manager_id) aligns with the shard key for the second part (id), and the query succeeds. Note the use of KI_MATCH_COLUMN, which aligns the selected column lists on each side of the UNION. Without this matching distribution function, the UNION would appear to be merging three columns from the first part of the query into two columns in the second part and would fail.

Note

The manager_id column must exist in the SELECT list in order for the KI_SHARD_KEY function to designate it as the shard key.

Replication Example¶

SQL support does not currently extend to creating replicated tables from the results of queries using CREATE TABLE...AS. For instance, the following will fail:

CREATE REPLICATED TABLE employee_replicated AS
SELECT *
FROM employee

...returning this error:

GPUdb Error: REPLICATED option not allowed with AS option in SQL

However, a replicated table can be created using CREATE TABLE...AS by forcing the result set to already be replicated using the KI_REPLICATE() distribution function. This will succeed in creating a replicated version of the employee table:

CREATE TABLE employee_replicated AS
SELECT *, KI_REPLICATE()
FROM employee

Predicates¶

Predicate are generally used within a SQL WHERE clause to query records. They compare the values of two or more expressions; whenever a record meets the criteria defined in a predicate clause it will be marked as eligible to be part of the query result set. If it meets all predicate clauses defined within a query, it will be returned in the result set.

A single predicate clause may use a simple predicate operator to compare the values of two expressions or a more complex predicate clause form. A compound predicate clause uses a compound predicate operator to link together multiple predicate clauses to further refine a result set.

Unlimited-width (non-charN) strings can only be used within equality-based predicates, e.g. =, IN, etc.

Predicate Operators¶

= equality
!= or <> inequality
< less than
<= less than or equal to
> greater than
>= greater than or equal to

Predicate Clauses¶

In the following list of predicate clauses, ref_expr is the reference expression to apply the predicate to; note that EXISTS has no reference expression.

Predicate Clause	Description
`<expr_a> <pred_op> <expr_b>`	Matches records where `expr_a` relates to `expr_b` according to predicate operator `pred_op`.
`<ref_expr> <pred_op> ALL (<SELECT statement>)`	Matches records where the reference expression `ref_expr` relates to all of the results of `SELECT statement` according to the predicate operator `pred_op`
`<ref_expr> <pred_op> ANY (<SELECT statement>)`	Matches records where the reference expression `ref_expr` relates to any of the results of `SELECT statement` according to the predicate operator `pred_op`
`<ref_expr> [NOT] BETWEEN <begin_expr> AND <end_expr>`	Matches records where the reference expression `ref_expr` is (or is `NOT`) between the values of `begin_expr` and `end_expr`
`<ref_expr> [NOT] IN (<match_list>)`	Matches records where the reference expression `ref_expr` is (or is `NOT`) in the `match_list` list of match values. The list can either be a comma-separated list of terms/expressions or the result of a `SELECT` statement.
`<ref_expr> IS [NOT] NULL`	Matches records where the reference expression `ref_expr` is (or is `NOT`) null.
`<ref_expr> [NOT] LIKE <match_expr>`	Matches records where reference expression `ref_expr` does (or does `NOT`) match the string value of `match_expr`. The match is a string literal one, with the following exceptions: `%` matches any string of 0 or more characters `_` matches any single character
`[NOT] EXISTS (<SELECT statement>)`	Matches records where `SELECT statement` returns 1 or more records. Note This clause has limited utility, as correlated subqueries, upon whose real value it relies, are unsupported at this time.

Compound Predicate Operators¶

Predicate Operator	Description
`<pred_a> AND <pred_b>`	Matches records where both `pred_a` & `pred_b` are true
`<pred_a> OR <pred_b>`	Matches records where either `pred_a` or `pred_b` is true
`NOT <pred_b>`	Matches records where `pred` is false

Subqueries¶

Non-Correlated Subqueries¶

These are subqueries that are self-contained, in that they can be executed independently of the surrounding query.

For example:

SELECT
    last_name,
    first_name,
    sal,
    (
        SELECT AVG(sal)
        FROM employee
    ) AS avg_emp_sal
FROM employee

SELECT
    l || ', ' || f as "Employee_Name"
FROM
    (
        SELECT last_name AS l, first_name AS f
        FROM employee
    )

SELECT last_name, first_name
FROM employee
WHERE dept_id IN
    (
        SELECT id
        FROM department
        WHERE code = 'ENGR' OR code = 'MKTG'
    )

Correlated Subqueries¶

These are subqueries that depend on the values in the surrounding query, and cannot be executed independently of the surrounding query.

For example:

SELECT *
FROM nyctaxi o
WHERE fare_amount =
    (
        SELECT MAX(fare_amount)
        FROM nyctaxi i
        WHERE o.passenger_count = i.passenger_count
    )

Important

Correlated subqueries have the following limitations:

They cannot reference grouping columns in the parent query
They cannot reference tables beyond the immediate outer query; i.e., a table cannot be referenced in a correlated subquery that is two or more levels of nesting deeper than it is
They cannot contain disjunctive conditions
They cannot be part of an OUTER JOIN ON clause condition

Hints¶

Hint strings (KI_HINT) can be added as comments within queries, and affect just the query in which they appear. They will override the corresponding client & server settings (when such settings exist). For example:

CREATE TABLE taxi_trip_daily_totals AS
/* KI_HINT_GROUP_BY_PK, KI_HINT_INDEX(transaction_date) */
SELECT vendor_id, DATE(dropoff_datetime) AS transaction_date, COUNT(*) AS total_trips
FROM nyctaxi
GROUP BY vendor_id, transaction_date

Hint	Description
`KI_HINT_BATCH_SIZE(n)`	Use an ingest batch size of `n` records. Default: 10,000. Only applicable when issuing `INSERT` statements.
`KI_HINT_CHUNK_SIZE(n)`	Use chunk sizes of `n` records per chunk within result sets. Suffixes of `K` & `M` can be used to represent thousands or millions of records; e.g., `20K`, `50M`.
`KI_HINT_COMPARABLE_EXPLAIN`	Simplify `EXPLAIN` output removing unique identifiers from table names. Note Potentially unsafe for use in a multiuser environment, as the table names produced will not be unique and may collide with the table names of other users who execute `EXPLAIN` with this hint.
`KI_HINT_DICT_PROJECTION`	Retain the dictionary encoding attributes of source columns in a projection result set.
`KI_HINT_DISTRIBUTED_OPERATIONS`	Reshard data when doing so would be the only way to process one or more operations in this query.
`KI_HINT_DONT_COMBINE`	Don’t combine joins and unions for this query.
`KI_HINT_GROUP_BY_FORCE_REPLICATED`	Make all result tables within a single query replicated; useful when meeting the input table requirements of `JOIN`, `UNION`, etc. in a query containing aggregated subqueries which generate differently-sharded result tables.
`KI_HINT_GROUP_BY_PK`	Create primary keys for all `GROUP BY` result sets on the grouped-upon columns/expressions within a given query; often used to create a primary key on the result of a `CREATE TABLE...AS` that ends in a `GROUP BY`, and can also make materialized views containing grouping operations more performant. Note If any of the grouped-on expressions are nullable, no primary key will be applied.
`KI_HINT_HAS_HEADER`	Assume the first line of the source CSV file has a Kinetica header row. Only used with CSV ingestion.
`KI_HINT_INDEX(<column list>)`	Create an index on each of the comma-separated columns in the given list; often used with `CREATE TABLE...AS` to create an index on a persisted result set.
`KI_HINT_JOBID_PREFIX(x)`	Tag corresponding database job names(s) with `x`; e.g., `KI_HINT_JOBID_PREFIX(tag)` will result in job names like `ODBC_tag_01234567-89ab-cdef-0123-456789abcdef`.
`KI_HINT_KEEP_TEMP_TABLES`	Don’t erase temp tables created by this query.
`KI_HINT_NO_COST_BASED_OPTIMIZATION`	Don't use the cost-based optimizer when calculating the query plan.
`KI_HINT_NO_DICT_PROJECTION`	Don't retain the dictionary encoding attributes of source columns in a projection result set.
`KI_HINT_NO_DISTRIBUTED_OPERATIONS`	Don't reshard data when doing so would be the only way to process one or more operations in this query.
`KI_HINT_NO_HEADER`	Assume the source CSV file has no Kinetica header row. Only used with CSV ingestion.
`KI_HINT_NO_JOIN_COUNT`	Optimize joins by not calculating intermediate set counts.
`KI_HINT_NO_LATE_MATERIALIZATION`	Force the materialization of intermediary result sets.
`KI_HINT_NO_PARALLEL_EXECUTION`	Execute all components of this query in series.
`KI_HINT_NO_PLAN_CACHE`	Don't cache the query plan calculated for this query.
`KI_HINT_NO_RULE_BASED_OPTIMIZATION`	Don't use the rule-based optimizer when calculating the query plan.
`KI_HINT_NO_VALIDATE_CHANGE`	Don't fail an `ALTER TABLE` command when changing column types/sizes and the column data is too long/large. Truncate the data instead and allow the modification to succeed.
`KI_HINT_PROJECT_MATERIALIZED_VIEW`	Force the materialization of a materialized view. Some materialized views containing `JOIN` clauses will be backed by a native join view. This is done to improve the performance of materialized view refreshes and reduce memory usage at the cost of reduced query performance. This hint will induce the reverse of this trade-off -- increased query performance at the cost of reduced refresh performance and increased memory usage.
`KI_HINT_REPL_SYNC`	Instruct the target database to treat this statement as one that should be run synchronously across all HA clusters in its ring. See High Availability Operation Handling for details. Note The target database must be configured for high-availability for this to have an effect. See High Availability Configuration & Management for details.
`KI_HINT_REPL_ASYNC`	Instruct the target database to treat this statement as one that should be run asynchronously across all HA clusters in its ring. See High Availability Operation Handling for details. Note The target database must be configured for high-availability for this to have an effect. See High Availability Configuration & Management for details.
`KI_HINT_TRUNCATE_STRINGS`	Truncate all strings being inserted into restricted-width (charN) columns to their max width. Used with any `INSERT INTO` statement, including CSV ingestion.
`KI_HINT_UPDATE_ON_EXISTING_PK`	Change the record collision policy for inserting into a table with a primary key to an upsert scheme; any existing table record with a primary key that matches a record being inserted will be replaced by that new record. Without this hint, the record being inserted will be discarded. If the specified table does not have a primary key, then this hint is ignored.

EXPLAIN¶

Outputs the execution plan of a given SQL statement.

Tip

For the visual explain plan utility in GAdmin, see the Explain feature under SQL Tool.

The general form of the command is:

EXPLAIN [LOGICAL|PHYSICAL|ANALYZE|VERBOSE|VERBOSE ANALYZE] [FORMAT <JSON|TABLE>] <statement>

If LOGICAL is specified, the algebraic execution tree of the statement is output. Otherwise, the physical execution plan will be output.

Each supporting API endpoint call that is made in servicing the request is listed as an execution plan step in the output, along with any input or output tables associated with the call and the prior plan execution steps on which a given execution step depends.

The following options can be specified:

LOGICAL - outputs the algebraic execution tree of the statement
PHYSICAL - (default) outputs the physical execution plan with the following endpoint-level details per step:
- ID - execution step number
- ENDPOINT - name of native API endpoint called
- INPUT_TABLES - input tables used by the endpoint (if any)
- OUTPUT_TABLE - output table created by the endpoint (if any)
- DEPENDENCIES - list of prior execution steps upon which this step depends
ANALYZE - same as PHYSICAL, including additional run-time details:
- RUN_TIME - execution time of each endpoint call
- RESULT_ROWS - number of records produced in the endpoint call
VERBOSE - same as PHYSICAL, including endpoint parameter details:
- COLUMNS - columns passed to the endpoint call
- EXPRESSIONS - expressions passed to the endpoint call
- OPTIONS - option keys & values passed to the endpoint call
- LAST_USE_TABLES - list of tables that will not be used by any following execution step
- ADDITIONAL_INFO - other parameters passed to the endpoint call
VERBOSE ANALYZE - same as VERBOSE & ANALYZE together, including the execution plan for any joins contained within the query
FORMAT JSON - outputs the result in JSON format
FORMAT TABLE - (default) outputs the result in tabular format

Important

Specifying ANALYZE will cause the statement to be executed in order to collect run-time statistics on the endpoint calls made.

For example, to output the execution plan for a query that aggregates the number of taxi rides between boroughs (using KI_HINT_COMPARABLE_EXPLAIN hint to simplify output):

EXPLAIN /* KI_HINT_COMPARABLE_EXPLAIN */
SELECT
    n_begin.ntaname AS boro_begin,
    boro_end,
    COUNT(*) AS total_trips
FROM
(
    SELECT pickup_latitude, pickup_longitude, n_end.ntaname AS boro_end
    FROM nyctaxi t
    JOIN nyct2010 n_end ON STXY_INTERSECTS(dropoff_longitude, dropoff_latitude, geom)
)
JOIN nyct2010 n_begin ON STXY_INTERSECTS(pickup_longitude, pickup_latitude, geom)
GROUP BY 1, 2

The execution plan is listed in table format, as follows:

+------+-------------------------+------------------------------------------------------+----------------+----------------+
| ID   | ENDPOINT                | INPUT_TABLES                                         | OUTPUT_TABLE   | DEPENDENCIES   |
+------+-------------------------+------------------------------------------------------+----------------+----------------+
| 0    | /create/jointable       | nyctaxi AS TableAlias_0_,nyct2010 AS TableAlias_1_   | Join_3         | -1             |
| 1    | /create/jointable       | Join_3 AS TableAlias_0_,nyct2010 AS TableAlias_1_    | Join_6         | 0              |
| 2    | /aggregate/groupby      | Join_6                                               | Aggregate_8    | 1              |
| 3    | /get/records/bycolumn   | Aggregate_8                                          |                | 2              |
+------+-------------------------+------------------------------------------------------+----------------+----------------+

If there is an error processing a query, the error can be returned in the JSON-formatted execution plan:

EXPLAIN VERBOSE ANALYZE FORMAT JSON
SELECT * /* KI_HINT_NO_DISTRIBUTED_OPERATIONS */
FROM explain_table t1
JOIN explain_table t2 ON t1.shard_column = t2.not_shard_column

The execution plan is listed in JSON format with the query error, as follows:

"PLAN": [
    {
        "ADDITIONAL_INFO": "Not all of the non-replicated tables are equated by shard keys (TM/FPc:1975); code:1 'Error' in Job pre-process; code:1 'Error'",
        "COLUMNS": "TableAlias_0_.shard_column AS shard_column,TableAlias_0_.not_shard_column AS not_shard_column,TableAlias_1_.shard_column AS shard_column0,TableAlias_1_.not_shard_column AS not_shard_column0",
        "DEPENDENCIES": "-1",
        "ENDPOINT": "/create/jointable",
        "EXPRESSIONS": "inner join TableAlias_0_, TableAlias_1_  on   (TableAlias_0_.shard_column = TableAlias_1_.not_shard_column )  ",
        "ID": "0",
        "JSON_REQUEST": "{
            "join_table_name": "31031_Join_1_2249791e-63cb-11ea-b3aa-f4ce46bdd59a",
            "table_names": [
                "explain_table AS TableAlias_0_",
                "explain_table AS TableAlias_1_"
            ],
            "column_names": [
                "TableAlias_0_.shard_column AS shard_column",
                "TableAlias_0_.not_shard_column AS not_shard_column",
                "TableAlias_1_.shard_column AS shard_column0",
                "TableAlias_1_.not_shard_column AS not_shard_column0"
            ],
            "expressions": [
                "inner join TableAlias_0_, TableAlias_1_  on   (TableAlias_0_.shard_column = TableAlias_1_.not_shard_column )  "
            ],
            "options": {
                "collection_name": "__SQL_TEMP",
                "create_explain": "true",
                "show_filters": "true",
                "ttl": "-1"
            }
        }",
        "LAST_USE_TABLES": "",
        "OPTIONS": "{collection_name,__SQL_TEMP} {create_explain,true} {show_filters,true} {ttl,-1}",
        "RESULT_DISTRIBUTION": "shard_column;shard_column0; / ;",
        "RESULT_ROWS": "0",
        "RUN_TIME": "0"
    },
    {
        "ADDITIONAL_INFO": "",
        "COLUMNS": "shard_column, not_shard_column, shard_column0, not_shard_column0",
        "DEPENDENCIES": "0",
        "ENDPOINT": "/get/records/bycolumn",
        "EXPRESSIONS": "",
        "ID": "1",
        "JSON_REQUEST": "{
            "table_name": "Join_1_04ef7d35-ec70-4431-ae7d-687802a4cece",
            "column_names": [
                "shard_column",
                "not_shard_column",
                "shard_column0",
                "not_shard_column0"
            ],
            "offset": 0,
            "limit": -9999,
            "encoding": "binary",
            "options": {
                "create_explain": "true",
                "show_filters": "true"
            }
        }",
        "LAST_USE_TABLES": "Join_1_04ef7d35-ec70-4431-ae7d-687802a4cece",
        "OPTIONS": "{create_explain,true} {show_filters,true}",
        "RESULT_DISTRIBUTION": "",
        "RESULT_ROWS": "0",
        "RUN_TIME": "0"
    }
]

Data Definition (DDL)¶

Kinetica supports the basic notion of SQL tables as containers of one or more columns of data. Tables can be created, altered, and dropped.

Table & column names must adhere to the supported naming criteria.

A column definition consists of a column type and optional column size, column properties, and nullability. Column properties are used to optimize data storage & speed.

The format of a defined column is column name, followed by column definition. A column definition is column type optionally followed by any column size limit or column properties all enclosed in parentheses, followed by an optional nullability statement:

<column name> <column type> [(<column size / property list>)] [[NOT] NULL]

This format applies to any DDL statement requiring the definition of columns, like CREATE TABLE and ALTER TABLE (when adding/modifying a column).

For example, the following are valid defined columns:

id INT(SHARD_KEY)                -- id is an integer and the table's shard key
name VARCHAR(64, TEXT_SEARCH)    -- name is a 64-char limited string and text-searchable
ip IPV4                          -- ip is a string in IPv4 format
cost DECIMAL(10, 2, STORE_ONLY)  -- cost is able to hold an 8.2 decimal and not held in memory

Column Types¶

Category Data Type Description

Number TINYINT Effective type: int8

SMALLINT Effective type: int16

INTEGER Effective type: integer

INT Alias for INTEGER

BIGINT Effective type: long

LONG Alias for BIGINT

UNSIGNED BIGINT Effective type: ulong

UNSIGNED LONG Alias for UNSIGNED BIGINT

REAL Effective type: float

DOUBLE Effective type: double

FLOAT Alias for REAL

DECIMAL Alias for BIGINT

DECIMAL(P,S)

Effective type: varies by P & S

P	S	Effective Type
`1`, `2`	`0`	int8
`3`, `4`	`0`	int16
`5`, `6`	`0`	integer
`7`- n	`0`	long
`1`- n	`1`-`4`	decimal
`5`, `6`	`5`, `6`	float
`7`- n	`5`- n	double

NUMERIC Alias for DECIMAL

String VARCHAR Effective type: string; character limit based on configured system property

VARCHAR(N) Effective type: char1 - char256 or string, whichever is large enough to hold N characters

STRING Alias for VARCHAR

TEXT Alias for VARCHAR

IPV4 Shorthand for VARCHAR(IPV4), which applies the IPV4 column property

Date/Time DATE Effective type: date

DATETIME Effective type: datetime

TIME Effective type: time

TIMESTAMP Effective type: timestamp

TYPE_DATE Alias for DATE

TYPE_TIME Alias for TIME

TYPE_TIMESTAMP Alias for TIMESTAMP

Binary BLOB Effective type: bytes

BYTES Alias for BLOB

BINARY Alias for BLOB

Geospatial GEOMETRY Effective type: wkt

ST_GEOMETRY Alias for GEOMETRY

WKT Alias for GEOMETRY

Column Properties¶

Property	Description
`DICT`	Applies dict data handling to a column, enabling dictionary-encoding of its values; see Dictionary Encoding for details
`DISK_OPTIMIZED`	Applies disk-optimized data handling to a column
`IPV4`	Treats the associated string-based column as an IPv4 address
`LZ4`	Applies LZ4 compression to a column
`LZ4HC`	Applies LZ4HC compression to a column
`PRIMARY_KEY`	Treats the associated column as a primary key, or part of a composite primary key if other columns also have this property
`SHARD_KEY`	Treats the associated column as a shard key, or part of a composite shard key if other columns also have this property
`SNAPPY`	Applies Snappy compression to a column
`STORE_ONLY`	Applies store-only data handling to a column
`TEXT_SEARCH`	Applies text-searchability to a column
`INIT_WITH_NOW`	For `DATE`, `TIME`, `DATETIME`, and `TIMESTAMP` column types, enables the database to replace empty strings and invalid timestamp values with `NOW()`

CREATE SCHEMA¶

Schemas are logical containers for tables, referred to as collections, natively.

To create one:

CREATE < SCHEMA | COLLECTION > <schema name>

For example, to create a schema:

CREATE SCHEMA container

ALTER SCHEMA¶

Any of the following facets of a schema can be altered:

Name
Protection Mode
TTL

Rename Schema¶

A schema can be renamed.

ALTER SCHEMA <schema name>
RENAME TO <new schema name>

For example, to rename a schema:

ALTER SCHEMA container
RENAME TO container_renamed

Set Protection Mode¶

A schema can be put into a protected mode, which will prevent tables & view within it from expiring. This will override the individual protection modes of this contained entities, though it will not overwrite their modes. When a schema is taken out of protected status, the contained tables & views return to their previous protection mode. A protected entity cannot have its TTL updated.

Setting PROTECTED to TRUE enables protection for a schema's tables & views, and prevents ALTER...SET TTL from being executed against it or its contained entities.

Setting PROTECTED to FALSE disables protection for a schema's tables & views, and allows subsequent ALTER...SET TTL requests to succeed.

ALTER SCHEMA <schema name>
SET PROTECTED < TRUE | FALSE >

For example, to protect a schema:

ALTER SCHEMA container
SET PROTECTED TRUE

Set TTL¶

A schema can have a time-to-live (TTL) set, which is a convenience for setting all of the tables & views within that schema to a given TTL.

ALTER SCHEMA <schema name>
SET TTL <new ttl>

For example, to set a schema to expire in 10 minutes:

ALTER SCHEMA container
SET TTL 10

DROP SCHEMA¶

When removing a schema from the database, there are two options available, which control how the removal takes place. Normally, an error will be reported if the schema to drop doesn't exist; if IF EXISTS is specified, no error will be reported. Also, an error will be reported if the schema to drop contains any database objects (tables, etc.); if CASCADE is specified, the schema and all objects within it will be removed.

DROP < SCHEMA | COLLECTION > [IF EXISTS] <schema name> [CASCADE]

For example, to drop a schema, including its contained tables and views:

DROP SCHEMA container CASCADE

CREATE TABLE¶

Creates a new table in the configured ParentSet collection, unless a schema is specified.

The basic form of the supported CREATE TABLE statement follows. See here for column definition format.

CREATE [OR REPLACE] [REPLICATED] [TEMP] TABLE [<schema name>.]<table name>
(
    <column name> <column definition>,
    ...
    <column name> <column definition>,
    [PRIMARY KEY (<column list>)],
    [SHARD KEY (<column list>)],
    [FOREIGN KEY (<column list>) REFERENCES <foreign table name>(<foreign column list>) [AS <foreign key name>],...]
)
[<partition clause>]
[<tier strategy clause>]
[<index clause>]
[<option clause>]

Keyword(s)	Description
`OR REPLACE`	Any existing table/view with the same name will be dropped before creating this one
`REPLICATED`	The table will be distributed within the database as a replicated table
`TEMP`	If the database is restarted, the table will be removed

The table & column names used must adhere to the supported naming criteria.

Table clause definitions:

Partition Clause
Tier Strategy Clause
Index Clause
Option Clause

For example, to create a table with various column types and properties:

CREATE TABLE various_types
(
    i   INTEGER NOT NULL,                         /* non-nullable integer, part of primary key (defined at end)                         */
    bi  BIGINT NOT NULL,                          /* long, part of primary key, shard key, foreign key source (defined at end)          */
    ub  UNSIGNED BIGINT,                          /* native unsigned long                                                               */
    r   REAL,                                     /* native float                                                                       */
    f   FLOAT,                                    /* native double                                                                      */
    d   DOUBLE(STORE_ONLY),                       /* native double, not in-memory                                                       */
    s   VARCHAR(STORE_ONLY, TEXT_SEARCH),         /* string, searchable, not in-memory, only limited in size by system-configured value */
    c   VARCHAR(30, DICT),                        /* char32 using dictionary-encoding of values                                         */
    p   VARCHAR(256, SNAPPY, TEXT_SEARCH),        /* char256, searchable, using Snappy compression of values                            */
    ip  IPV4,                                     /* IP address                                                                         */
    ts  TIMESTAMP,                                /* timestamp                                                                          */
    td  DATE,                                     /* simple date                                                                        */
    tt  TIME,                                     /* simple time                                                                        */
    dt  DATETIME(INIT_WITH_NOW),                  /* date/time                                                                          */
    dc1 DECIMAL,                                  /* native long                                                                        */
    dc2 DECIMAL(18,4),                            /* native decimal                                                                     */
    dc3 DECIMAL(6,5),                             /* native float                                                                       */
    dc4 DECIMAL(7, 5, STORE_ONLY),                /* double, not in-memory                                                              */
    n   NUMERIC(5, 3),                            /* native decimal, the next largest native numeric type to hold the number type       */
    wkt WKT,                                      /* geospatial column for WKT string data                                              */
    PRIMARY KEY (i, bi),                          /* primary key columns must be NOT NULL                                               */
    SHARD KEY (bi),                               /* shard key columns must be part of the primary key                                  */
    FOREIGN KEY (bi) REFERENCES lookup(id) AS fk  /* foreign key is often on the shard key                                              */
)
INDEX (ip)                                        /* index on IP column                                                                 */
INDEX (ts)                                        /* index on timestamp column                                                          */

Partition Clause¶

A table can be further segmented into partitions. The supported partition types are:

Range
Interval
List
Hash

See Partitioning for details.

Range Partitioning¶

The general format for the range partition clause is:

CREATE TABLE <table name> ( <column list> )
PARTITION BY RANGE ( <column expression> )
[
    PARTITIONS
    (
        <partition name> [ MIN ( <least value> ) ] [ MAX ( <greatest value> ) ],
        ...
        <partition name> [ MIN ( <least value> ) ] [ MAX ( <greatest value> ) ]
    )
]

The partition definition clause, PARTITIONS, is optional, though it is recommended to define partitions at table creation time, when feasible.

Warning

Defining (adding) partitions after data has been loaded will result in a performance penalty as the database moves existing records targeted for the new partition from the default partition into it.

For example, to create a range-partitioned table with the following criteria:

partitioned on the date/time of the order
partitions for years:
- 2014 - 2016
- 2017
- 2018
- 2019
records not in that range go to the default partition

CREATE TABLE customer_order_range_partition_by_year
(
    id           INT NOT NULL,
    customer_id  INT NOT NULL,
    total_price  DECIMAL(10,2),
    purchase_ts  TIMESTAMP NOT NULL
)
PARTITION BY RANGE (YEAR(purchase_ts))
PARTITIONS
(
    order_2014_2016 MIN(2014) MAX(2017),
    order_2017                MAX(2018),
    order_2018                MAX(2019),
    order_2019                MAX(2020)
)

Interval Partitioning¶

The general format for the interval partition clause is:

CREATE TABLE <table name> ( <column list> )
PARTITION BY INTERVAL ( <column expression> )
PARTITIONS
(
    STARTING [AT] (<least value>) INTERVAL (<interval size>)
)

For example, to create an interval-partitioned table with the following criteria:

partitioned on the date/time of the order
one partition for each year from 2014 on
later year partitions are added as necessary
records prior to 2014 go to the default partition

CREATE TABLE customer_order_interval_partition_by_year
(
    id           INT NOT NULL,
    customer_id  INT NOT NULL,
    total_price  DECIMAL(10,2),
    purchase_ts  TIMESTAMP
)
PARTITION BY INTERVAL (YEAR(purchase_ts))
PARTITIONS
(
    STARTING AT (2014) INTERVAL (1)
)

To create an interval-partitioned table with the following criteria:

partitioned on the date/time of the order
one partition for each day from January 1st, 2014 on
later day partitions are added as necessary
records prior to 2014 go to the default partition

CREATE TABLE customer_order_interval_partition_by_day_timestampdiff
(
    id           INT NOT NULL,
    customer_id  INT NOT NULL,
    total_price  DECIMAL(10,2),
    purchase_ts  TIMESTAMP
)
PARTITION BY INTERVAL (TIMESTAMPDIFF(DAY, '2014-01-01', purchase_ts))
PARTITIONS
(
    STARTING AT (0) INTERVAL (1)
)

The same interval-partitioned scheme above can be created using the timestamp column directly, with the help of the INTERVAL function (described in the Date/Time Functions section):

CREATE TABLE customer_order_interval_partition_by_day_interval
(
    id           INT NOT NULL,
    customer_id  INT NOT NULL,
    total_price  DECIMAL(10,2),
    purchase_ts  TIMESTAMP
)
PARTITION BY INTERVAL (purchase_ts)
PARTITIONS
(
    STARTING AT ('2014-01-01') INTERVAL (INTERVAL '1' DAY)
)

This scheme can be easily modified to create an hourly partition instead:

CREATE TABLE customer_order_interval_partition_by_hour_interval
(
    id           INT NOT NULL,
    customer_id  INT NOT NULL,
    total_price  DECIMAL(10,2),
    purchase_ts  TIMESTAMP
)
PARTITION BY INTERVAL (purchase_ts)
PARTITIONS
(
    STARTING AT ('2014-01-01') INTERVAL (INTERVAL '1' HOUR)
)

List Partitioning¶

The list partition clause has two forms:

manual
automatic

Manual¶

The general format for the manual list partition clause is:

CREATE TABLE <table name> ( <column list> )
PARTITION BY LIST ( <column expression list> )
[
    PARTITIONS
    (
        <partition name> VALUES ( <value lists> ),
        ...
        <partition name> VALUES ( <value lists> )
    )
]

The partition definition clause, PARTITIONS, is optional, though it is recommended to define partitions at table creation time, when feasible.

Warning

Defining (adding) partitions after data has been loaded will result in a performance penalty as the database moves existing records targeted for the new partition from the default partition into it.

For example, to create a manual list-partitioned table with the following criteria:

partitioned on the date/time of the order
partitions for years:
- 2014 - 2016
- 2017
- 2018
- 2019
records not in that list go to the default partition

CREATE TABLE customer_order_manual_list_partition_by_year
(
    id           INT NOT NULL,
    customer_id  INT NOT NULL,
    total_price  DECIMAL(10,2),
    purchase_ts  TIMESTAMP NOT NULL
)
PARTITION BY LIST (YEAR(purchase_ts))
PARTITIONS
(
    order_2014_2016 VALUES (2014, 2015, 2016),
    order_2017      VALUES (2017),
    order_2018      VALUES (2018),
    order_2019      VALUES (2019)
)

To create a manual list-partitioned table with a multi-column key and the following criteria:

partitioned on the date/time of the order
each partition corresponds to a unique year & month pair
partitions for years/months:
- February 2016 & March 2016
- March 2020
records not in that list go to the default partition

CREATE TABLE customer_order_manual_list_partition_by_year_and_month
(
    id           INT NOT NULL,
    customer_id  INT NOT NULL,
    total_price  DECIMAL(10,2),
    purchase_ts  TIMESTAMP NOT NULL
)
PARTITION BY LIST (YEAR(purchase_ts), MONTH(purchase_ts))
PARTITIONS
(
    order_2016_0203 VALUES ((2016, 2), (2016, 3)),
    order_2020_03   VALUES ((2020, 3))
)

Automatic¶

The general format for the automatic list partition clause is:

CREATE TABLE <table name> ( <column list> )
PARTITION BY LIST ( <column expression list> )
AUTOMATIC

To create an automatic list-partitioned table with the following criteria:

partitioned on the date/time of the order
one partition for each unique year & month across all orders
partitions are added as necessary

CREATE TABLE customer_order_automatic_list_partition_by_year_and_month
(
    id           INT NOT NULL,
    customer_id  INT NOT NULL,
    total_price  DECIMAL(10,2),
    purchase_ts  TIMESTAMP NOT NULL
)
PARTITION BY LIST (YEAR(purchase_ts), MONTH(purchase_ts))
AUTOMATIC

Hash Partitioning¶

The general format for the hash partition clause is:

CREATE TABLE <table name> ( <column list> )
PARTITION BY HASH ( <column expressions> )
PARTITIONS <total partitions>

To create a hash-partitioned table with the following criteria:

partitioned on the date/time of the order
distributed among the fixed set of partitions, based on the hash of the year & month of the order
10 partitions

CREATE TABLE customer_order_hash_partition_by_year_and_month
(
    id           INT NOT NULL,
    customer_id  INT NOT NULL,
    total_price  DECIMAL(10,2),
    purchase_ts  TIMESTAMP NOT NULL
)
PARTITION BY HASH (YEAR(purchase_ts), MONTH(purchase_ts))
PARTITIONS 10

Tier Strategy Clause¶

A table can have a tier strategy specified at creation time. If not assigned a tier strategy upon creation, a table will be assigned a default tier strategy.

The general format for the tier strategy clause is:

TIER STRATEGY
(
    <tier strategy>,
    ...
    <tier strategy>
)

For example, to create a customer_order table with an above-average eviction priority in the RAM Tier:

CREATE OR REPLACE TABLE customer_order
(
    id          INT NOT NULL,
    customer_id INT NOT NULL,
    total_price DECIMAL(10,2),
    purchase_ts TIMESTAMP,
    SHARD KEY (customer_id)
)
TIER STRATEGY
(
    ( ( VRAM 1, RAM 7, PERSIST 5 ) )
)

If not specified, the default tier strategy will be assigned:

CREATE OR REPLACE TABLE customer_order
(
    id          INT NOT NULL,
    customer_id INT NOT NULL,
    total_price DECIMAL(10,2),
    purchase_ts TIMESTAMP,
    SHARD KEY (customer_id)
)

SHOW CREATE TABLE customer_order

| CREATE TABLE "customer_order"
(
   "id" INTEGER NOT NULL,
   "customer_id" INTEGER (shard_key) NOT NULL,
   "total_price" DECIMAL(18,4),
   "purchase_ts" TIMESTAMP
)
TIER STRATEGY (
( ( VRAM 1, RAM 7, PERSIST 5 ) )
) |

Note

The response to SHOW CREATE TABLE is a single-record result set with the DDL statement as the value in the DDL column, shown here with the column separators returned by kisql.

Index Clause¶

A table can have any number of indexes applied to any of its columns at creation time.

The two types of indexes supported are:

The general format for the table indexes clause is:

<CHUNK [SKIP] | [ATTRIBUTE]> INDEX (<column>)
...
<CHUNK [SKIP] | [ATTRIBUTE]> INDEX (<column>)

For example, to create a table with a column index on dept_id and a chunk skip index on id:

CREATE TABLE employee
(
    id INT NOT NULL,
    dept_id INT NOT NULL,
    manager_id INT,
    first_name VARCHAR(30),
    last_name VARCHAR(30),
    sal DECIMAL,
    hire_date DATE,
    PRIMARY KEY (id, dept_id),
    SHARD KEY (dept_id)
)
INDEX (dept_id)
CHUNK SKIP INDEX (id)

Option Clause¶

A subset of table options can be specified at creation time.

Option	Description
`CHUNK SIZE`	Size of the blocks of memory holding the data, when loaded; specified as the maximum number of records each block of memory should hold
`TTL`	The time-to-live (TTL) for the table; if not set, the table will not expire

For example, to create a table with up to 1,000,000 records per chunk and that will expire in 15 minutes:

CREATE OR REPLACE TABLE customer_order
(
    id          INT NOT NULL,
    customer_id INT NOT NULL,
    total_price DECIMAL(10,2),
    purchase_ts TIMESTAMP,
    SHARD KEY (customer_id)
)
USING TABLE PROPERTIES (CHUNK SIZE = 1000000, TTL = 15)

CREATE TABLE...AS¶

Creates a new table from the given query in the configured ParentSet collection, unless a schema is specified.

The general format is:

CREATE [OR REPLACE] [REPLICATED] [TEMP] TABLE [<schema name>.]<table name> AS
(
    <SELECT statement>
)

Keyword(s)	Description
`OR REPLACE`	Any existing table/view with the same name will be dropped before creating this one
`REPLICATED`	The table will be distributed within the database as a replicated table
`TEMP`	If the database is restarted, the table will be removed

The table & column names used must adhere to the supported naming criteria.

While primary keys & foreign keys are not transferred to the new table, shard keys will be, if the column(s) composing them are part of the SELECT list.

The following can be applied to the SELECT statement to affect the resulting table:

Keyword	Type	Description
KI_HINT_GROUP_BY_PK	hint	Creates a primary keys on the columns in the `GROUP BY` clause if the outermost `SELECT` statement contains a `GROUP BY`
KI_HINT_INDEX(column list)	hint	Indexes each of the columns specified in the column list
KI_SHARD_KEY(column list)	pseudo-function	Shards the result table with a shard key composed of all columns in the specified column list

For example, to create a replicated temporary table that is a copy of an existing table, failing if a table with the same name as the target table already exists:

CREATE REPLICATED TEMP TABLE new_temporary_table AS
(
    SELECT *
    FROM old_table
)

To create a permanent table with columns a, b, c, & d a new shard key on columns a & b, and an index on column d, replacing a table with the same name as the target table, if it exists:

CREATE OR REPLACE TABLE new_sharded_table AS
(
    SELECT a, b, c, d, KI_SHARD_KEY(a, b) /* KI_HINT_INDEX(d) */
    FROM old_table
)

To copy a table with columns a, b, c, & d, preserving the primary key on a, b, & c, and the foreign key from d; a new table must be created to match the schema of the old one and then records can be copied from the old one to the new one:

CREATE TABLE new_pk_copy_table
(
    a INT NOT NULL,
    b INT NOT NULL,
    c VARCHAR(32) NOT NULL,
    d TIMESTAMP,
    PRIMARY KEY (a, b, c),
    FOREIGN KEY (d) REFERENCES old_table_lookup(d)
)

INSERT INTO new_pk_copy_table
SELECT *
FROM old_table

Note

This create/insert process is necessary, as neither primary keys nor foreign keys can be preserved through hints.

See Limitations for other restrictions.

ALTER TABLE¶

Any of the following facets of a table can be altered:

Name
Schema
Global Access Mode
Protection Mode
TTL
Columns
Column Indexes
Chunk Skip Indexes
Foreign Keys
Partitions
Tier Strategy

Rename Table¶

A table can be renamed, following the supported naming criteria.

ALTER TABLE [<schema name>.]<table name>
RENAME TO <new table name>

Move Table¶

A table can be moved from one schema to another, or to the top-level schema, ROOT.

The general form of the command is:

ALTER TABLE [<schema name>.]<table name>
< MOVE TO | SET SCHEMA > < ROOT | other schema name >

For example, to move the sales_2017 table from the olap schema to the archive schema:

ALTER TABLE olap.sales_2017
MOVE TO archive

To move the sales_2017 table from the archive schema to the top-level schema:

ALTER TABLE archive.sales_2017
MOVE TO ROOT

Set Access Mode¶

A table can have its global accessibility modified for all users in the system, independently from and further restricting any role-based access controls in place. Note that changing the access mode cannot widen access for users not already granted access; it can only narrow access for those who already have access. This setting will also trump administrative access to a table.

ALTER TABLE [<schema name>.]<table name>
SET ACCESS MODE < NO_ACCESS | READ_ONLY | WRITE_ONLY | READ_WRITE >

Set Protection Mode¶

A table can be altered to not expire, by altering its protection mode. Note that tables don't normally expire, but can be set to expire.

Setting PROTECTED to TRUE enables protection for a table and prevents ALTER TABLE...SET TTL from being executed against it.

Setting PROTECTED to FALSE disables protection for a table and allows subsequent ALTER TABLE...SET TTL requests to succeed.

ALTER TABLE <table name>
SET PROTECTED < TRUE | FALSE >

Set TTL¶

A table's time-to-live (TTL) can be altered.

ALTER TABLE <table name>
SET TTL <new ttl>

For example, to set a TTL of 7 minutes on a table:

ALTER TABLE employee
SET TTL 7

To set a table to never expire by TTL timeout:

ALTER TABLE employee
SET TTL -1

Add Column¶

A column can be added, specifying a column definition.

A new column can have its values initially populated through the use of the DEFAULT keyword. These values can either be a string/numeric constant or the name of an existing column in the table from which values can be copied into the new column. This default value is only in effect for the column creation; the new column will have no default value after that.

ALTER TABLE [<schema name>.]<table name>
ADD <column name> <column definition> [DEFAULT <string/numeric constant | column name>]

Note

Column compression must be applied after a new column is added; see Compress Column for syntax.

Examples

To add, to the employee table, a salary column that is a non-nullable, store-only, 10-digit number field containing 2 decimal places with a default value of 0:

ALTER TABLE employee
ADD salary NUMERIC(10, 2, STORE_ONLY) NOT NULL DEFAULT 0

To add, to the employee table, a category column that is a nullable, dictionary-encoded, 32-character text field:

ALTER TABLE employee
ADD category VARCHAR(32, DICT)

To add, to the employee table, a bio column that is a nullable, text-searchable, disk-optimized, unrestricted-width text field:

ALTER TABLE employee
ADD bio VARCHAR(TEXT_SEARCH, DISK_OPTIMIZED)

Rename Column¶

An existing column can be renamed:

ALTER TABLE [<schema name>.]<table name>
RENAME COLUMN <column current name> TO <column new name>

For example, to rename a column:

ALTER TABLE employee
RENAME COLUMN bio TO biography

Modify Column¶

A column can have its column definition modified, affecting column type, column size, column properties, and nullability.

If a column is modified to be non-nullable, it will be populated with default values--empty string for string fields and 0 for numeric fields.

Either of the following can be used to modify a column:

ALTER TABLE [<schema name>.]<table name>
MODIFY [COLUMN] <column name> <column definition>

ALTER TABLE [<schema name>.]<table name>
ALTER COLUMN <column name> <column definition>

Note

Column compression must be applied after an existing column is modified; see Compress Column for syntax.

Examples

To change, in the employee table, the first_name column to one that is a non-nullable, dictionary-encoded, 50-character text field:

ALTER TABLE employee
ALTER COLUMN first_name VARCHAR(50, DICT) NOT NULL

Compress Column¶

A column can have its data compressed in memory.

The general form to alter a column's compression setting is:

ALTER TABLE [<schema name>.]<table name>
SET COLUMN <column name> COMPRESSION [TO] <compression type>

For example, to use LZ4 compression on a column:

ALTER TABLE employee
SET COLUMN last_name COMPRESSION lz4

To use no compression on a column:

ALTER TABLE employee
SET COLUMN last_name COMPRESSION none

Drop Column¶

An existing column can be removed from a table:

ALTER TABLE [<schema name>.]<table name>
DROP COLUMN <column name>

Add Column Index¶

A column (attribute) index can be added to a table column in order to improve the performance of operations whose expressions contain relational operators against the column. See Limitations for restrictions.

ALTER TABLE [<schema name>.]<table name>
ADD INDEX (<column name>)

For example, to index the employee table's department ID column:

ALTER TABLE employee
ADD INDEX (dept_id)

Drop Column Index¶

An existing column (attribute) index can be removed from a table:

ALTER TABLE [<schema name>.]<table name>
DROP INDEX (<column name>)

For example, to drop the index on the employee table's department ID column:

ALTER TABLE employee
DROP INDEX (dept_id)

Add Chunk Skip Index¶

A chunk skip index can be added to a table column in order to improve the performance of operations containing equality-based filters against the column. See Limitations for restrictions.

ALTER TABLE [<schema name>.]<table name>
ADD CHUNK [SKIP] INDEX (<column name>)

For example, to index the employee table's department ID column:

ALTER TABLE employee
ADD CHUNK SKIP INDEX (dept_id)

Drop Chunk Skip Index¶

An existing chunk skip index can be removed from a table:

ALTER TABLE [<schema name>.]<table name>
DROP CHUNK [SKIP] INDEX (<column name>)

For example, to drop the chunk skip index on the employee table's department ID column:

ALTER TABLE employee
DROP CHUNK SKIP INDEX (dept_id)

Add Foreign Key¶

A foreign key can be added to any column or set of columns not marked as store-only in order to improve the performance of join operations between the table being altered and the table referenced in the foreign key.

ALTER TABLE [<schema name>.]<table name>
ADD FOREIGN KEY (<column name>,...) REFERENCES <foreign table name>(<foreign column name>,...) [AS <foreign key name>]

For example, to add a foreign key on the employee table's department ID column, linking it to the department table's department ID column:

ALTER TABLE employee
ADD FOREIGN KEY (dept_id) REFERENCES department(id) AS fk_emp_dept

Drop Foreign Key¶

An existing foreign key can be removed from a table, either by the name (alias) given to it during creation or by its definition:

By name:

ALTER TABLE [<schema name>.]<table name>
DROP FOREIGN KEY <foreign key name>

By definition:

ALTER TABLE [<schema name>.]<table name>
DROP FOREIGN KEY (<column name>,...) REFERENCES <foreign table name>(<foreign column name>,...)

For example, to drop the foreign key on the employee table's department ID column:

By name:

ALTER TABLE employee
DROP FOREIGN KEY fk_emp_dept

By definition:

ALTER TABLE employee
DROP FOREIGN KEY (dept_id) REFERENCES department(id)

Add Partition¶

A partition can be added to a range-partitioned or list-partitioned table.

Warning

Defining (adding) partitions after data has been loaded will result in a performance penalty as the database moves existing records targeted for the new partition from the default partition into it.

Range Partition¶

The new partition can be given a minimum bound (inclusive) and a maximum bound (exclusive). If the new partition would come before an existing partition, omitting the maximum bound would cause the new partition to take on the nearest following existing partition's minimum bound as its maximum bound. If the new partition would come after an existing partition, omitting the minimum bound would cause the new partition to take on the nearest preceding partition's maximum bound as its minimum bound. If no partitions are present in the table, the new partition will have to declare both a minimum and maximum bound.

ALTER TABLE [<schema name>.]<table name>
ADD PARTITION <partition name> [ MIN ( <least value> ) ] [ MAX ( <greatest value> ) ]

For example, to add a partition to the customer_order_range_by_year table, containing all records with a partition key less than 2020 and greater than or equal to the maximum bound of the nearest preceding partition:

ALTER TABLE customer_order_range_partition_by_year
ADD PARTITION order_2020 MAX(2021)

List Partition¶

The new partition can be given a list of values to match against the partition key values of incoming records.

ALTER TABLE [<schema name>.]<table name>
ADD PARTITION <partition name> VALUES ( <value lists> )

For example, to add a partition to the customer_order_manual_list_partition_by_year table, containing all records from 2020:

ALTER TABLE customer_order_manual_list_partition_by_year
ADD PARTITION order_2020 VALUES (2020)

For example, to add a partition to the customer_order_manual_list_partition_by_year_and_month table, containing all records from February 2020 & April 2020:

ALTER TABLE customer_order_manual_list_partition_by_year_and_month
ADD PARTITION order_2020_0204 VALUES ((2020, 2), (2020, 4))

Remove Partition¶

An existing partition can be removed from a range-partitioned or list-partitioned table, sending all data contained within that partition back to the default partition.

ALTER TABLE [<schema name>.]<table name>
REMOVE PARTITION <partition name>

For example, to remove a partition named order_2017 from the customer_order_range_by_year table:

ALTER TABLE customer_order_range_partition_by_year
REMOVE PARTITION order_2017

Delete Partition¶

An existing partition can be dropped from a range-partitioned or list-partitioned table, deleting all data contained within it.

ALTER TABLE [<schema name>.]<table name>
DELETE PARTITION <partition name>

For example, to drop a partition named order_2014_2016 from the customer_order_range_by_year table, deleting all data within that partition:

ALTER TABLE customer_order_range_partition_by_year
DELETE PARTITION order_2014_2016

Set Tier Strategy¶

A table's eviction priorities can be adjusted by setting its tier strategy.

ALTER TABLE [<schema name>.]<table name>
SET TIER STRATEGY (<tier strategy>)

For example, to set the customer_order table's tier strategy, to one with a below-average eviction priority in the RAM Tier:

ALTER TABLE customer_order
SET TIER STRATEGY
(
    ( ( VRAM 1, RAM 3, PERSIST 5 ) )
)

The tier strategy can also be reset to the system default strategy.

ALTER TABLE [<schema name>.]<table name>
RESET TIER STRATEGY

For example, to reset the customer_order table's tier strategy:

ALTER TABLE customer_order
RESET TIER STRATEGY

TRUNCATE TABLE¶

Deletes all the records from a table:

TRUNCATE TABLE [<schema name>.]<table name>

DROP TABLE¶

When removing a table from the database, there are two options available, which control how the removal takes place. Normally, an error will be reported if the table to drop doesn't exist; if IF EXISTS is specified, no error will be reported.

DROP TABLE [IF EXISTS] [<schema name>.]<table name>

CREATE VIEW¶

Creates a new virtual table from the given query in the specified schema.

A view must adhere to the supported naming criteria for tables & columns.

Note

SQL views are only able to be queried via ODBC/JDBC connection or the /execute/sql endpoint. They are, however, still visible in the GAdmin table listing and manageable through other DDL endpoints.

The general format is:

CREATE [OR REPLACE] VIEW [<schema name>.]<view name> AS
<SELECT statement>

Keyword(s)	Description
`OR REPLACE`	Any existing table/view with the same name will be dropped before creating this view

When any of a view's source tables is dropped, the view will also be dropped.

Caution

A CREATE OR REPLACE issues an implicit drop, so replacing an input table will have the same effect of dropping the view.

For example, to create a view that is a copy of an existing table, failing if a table, view, or collection with the same name as the target view already exists:

CREATE VIEW view_of_table AS
(
    SELECT *
    FROM table_to_view
)

CREATE MATERIALIZED VIEW¶

Specifying MATERIALIZED in a CREATE VIEW statement will make the view a materialized view.

A view must adhere to the supported naming criteria for tables & columns.

The general format is:

CREATE [OR REPLACE] [TEMP] MATERIALIZED VIEW [<schema name>.]<view name>
[
    REFRESH
    <
        OFF |
        ON CHANGE |
        EVERY <number> <SECOND[S] | MINUTE[S] | HOUR[S] | DAY[S]> [STARTING AT <YYYY-MM-DD [HH:MM[:SS]]>]
    >
]
AS
<SELECT statement>

Keyword(s) Description

OR REPLACE Any existing table/view with the same name will be dropped before creating this one

TEMP If the database is restarted, the table will be removed

REFRESH

Specifies the data refresh scheme for the view. The following schemes are available:

Constant	Description
`OFF`	(the default) Will prevent the view from being automatically refreshed, but will still allow manual refreshes of the data to be requested
`ON CHANGE`	Will cause the view to be updated any time a record is added, modified, or deleted from the subtending tables in the view's query
`EVERY`	Allows specification of an interval in seconds, minutes, hours, or days, with the optional specification of a starting time at which the first refresh interval will run; if no start time is specified, the default will be an interval's worth of time from the point at which the view creation was requested

The intermediary results of materialized views are cached to improve the performance of queries against them. This means that, unlike typical views, materialized views are not lightweight database entities, but rather consume memory and processing time proportional to the size of the source data and complexity of the query.

When any of the source tables of a materialized view is dropped, the view will also be dropped.

Caution

A CREATE OR REPLACE issues an implicit drop, so replacing an input table will have the same effect of dropping the view.

While primary keys & foreign keys are not transferred to the new view, shard keys will be, if the column(s) composing them are part of the SELECT list. A new shard key can be specified for the created view by using the KI_SHARD_KEY(<column list>) pseudo-function in the SELECT list.

To create a materialized view with columns a, b, c, & d and a new shard key on columns a & b, that refreshes once per half hour, replacing a view with the same name as the target view, if it exists:

CREATE OR REPLACE MATERIALIZED VIEW materialized_view_of_table
REFRESH EVERY .5 HOURS AS
(
    SELECT a, b, c, d, KI_SHARD_KEY(a, b)
    FROM table_to_view
)

ALTER VIEW¶

Any of the following facets of a view can be altered:

Name
Schema
Global Access Mode
Protection Mode
Refresh Mode
TTL

Rename View¶

A view can be renamed, following the supported naming criteria.

ALTER VIEW [<schema name>.]<view name>
RENAME TO <new view name>

Move View¶

A view can be moved from one schema to another.

The general form of the command is:

ALTER VIEW [<schema name>.]<view name>
< MOVE TO | SET SCHEMA > <new schema name>

For example, to move the sales_2017 view from the olap schema to the archive schema:

ALTER VIEW olap.sales_2017
MOVE TO archive

To move the sales_2017 view from the archive schema to the root schema:

ALTER VIEW archive.sales_2017
MOVE TO ROOT

Set Access Mode¶

A view can have its global accessibility modified for all users in the system, independently from and further restricting any role-based access controls in place. Note that changing the access mode cannot widen access for users not already granted access; it can only narrow access for those who already have access. This setting will also trump administrative access to a view.

ALTER VIEW [<schema name>.]<view name>
SET ACCESS MODE < NO_ACCESS | READ_ONLY | WRITE_ONLY | READ_WRITE >

Set Protection Mode¶

A view can be altered to not expire, by altering its protection mode. Note that views don't normally expire, but can be set to expire.

Setting PROTECTED to TRUE enables protection for a view and prevents ALTER VIEW...SET TTL from being executed against it.

Setting PROTECTED to FALSE disables protection for a view and allows subsequent ALTER VIEW...SET TTL requests to succeed.

ALTER view <view name>
SET PROTECTED < TRUE | FALSE >

Set Refresh Mode¶

The refresh mode of a materialized view can be modified.

The general form of the command is:

ALTER MATERIALIZED VIEW [<schema name>.]<view name>
SET REFRESH
<
    OFF |
    ON CHANGE |
    EVERY <number> <SECOND[S] | MINUTE[S] | HOUR[S] | DAY[S]> [STARTING AT <YYYY-MM-DD [HH:MM[:SS]]>]
>

The available refresh modes are:

Constant	Description
`OFF`	Will prevent the view from being automatically refreshed, but will still allow manual refreshes of the data to be requested
`ON CHANGE`	Will cause the view to be updated any time a record is added, modified, or deleted from the subtending tables in the view's query
`EVERY`	Allows specification of an interval in seconds, minutes, hours, or days, with the optional specification of a starting time at which the first refresh interval will run; if no start time is specified, the default will be an interval's worth of time from the point at which the view alteration was requested

For example, to alter the current sales table to refresh every 6 hours:

ALTER MATERIALIZED VIEW sales_current
SET REFRESH EVERY 6 HOURS

This would alter the view in the same way:

ALTER MATERIALIZED VIEW sales_current
SET REFRESH EVERY .25 DAYS

Set TTL¶

A view's time-to-live (TTL) can be altered.

ALTER VIEW <view name>
SET TTL <new ttl>

REFRESH VIEW¶

Refreshes the data within a materialized view:

REFRESH MATERIALIZED VIEW [<schema name>.]<view name>

DROP VIEW¶

When removing a view from the database, there are two options available, which control how the removal takes place. Normally, an error will be reported if the view to drop doesn't exist; if IF EXISTS is specified, no error will be reported.

DROP [MATERIALIZED] VIEW [IF EXISTS] [<schema name>.]<view name>

DESCRIBE¶

Lists the columns and column types & properties for a given table or view; or lists the contained tables and views of a given schema:

DESC[RIBE] <schema name>

DESC[RIBE] [<schema name>.]<table/view name>

For example, to describe the tables contained in the MASTER schema, into which demo data is usually downloaded:

DESC MASTER

+------------+
| Set_name   |
+------------+
| shipping   |
| nyctaxi    |
| stocks     |
| movies     |
| flights    |
+------------+

To describe the example table created in the CREATE TABLE section:

DESC various_types

+-----------+--------+------------+--------------------------------------+
| Col_num   | Name   | Null?      | Type                                 |
+-----------+--------+------------+--------------------------------------+
| 0         | i      | NOT NULL   | INTEGER (primary_key)                |
| 1         | bi     | NOT NULL   | BIGINT (primary_key, shard_key)      |
| 2         | ub     |            | UNSIGNED BIGINT                      |
| 3         | r      |            | REAL                                 |
| 4         | f      |            | REAL                                 |
| 5         | d      |            | DOUBLE (store_only)                  |
| 6         | s      |            | VARCHAR (store_only, text_search)    |
| 7         | c      |            | VARCHAR (32, dict)                   |
| 8         | p      |            | VARCHAR (text_search, 256, snappy)   |
| 9         | ip     |            | IPV4                                 |
| 10        | ts     |            | TIMESTAMP                            |
| 11        | td     |            | DATE                                 |
| 12        | tt     |            | TIME                                 |
| 13        | dt     |            | DATETIME (init_with_now)             |
| 14        | dc1    |            | BIGINT                               |
| 15        | dc2    |            | DECIMAL(18,4)                        |
| 16        | dc3    |            | REAL                                 |
| 17        | dc4    |            | DOUBLE (store_only)                  |
| 18        | n      |            | DECIMAL(18,4)                        |
| 19        | wkt    |            | GEOMETRY                             |
+-----------+--------+------------+--------------------------------------+

SHOW CREATE SCHEMA¶

Outputs the DDL statement required to reconstruct the given schema:

SHOW [CREATE < SCHEMA | COLLECTION >] <schema name>

Note

The response to SHOW CREATE SCHEMA is a single-record result set with the DDL statement as the value in the DDL column, shown below with the column separators returned by kisql.

For example, to output the DDL for the example schema created in the CREATE SCHEMA section:

SHOW CREATE SCHEMA container

| CREATE COLLECTION container   |

SHOW CREATE TABLE¶

Outputs the DDL statement required to reconstruct the given table:

SHOW [CREATE TABLE] [<schema name>.]<table name>

Note

The response to SHOW CREATE TABLE is a single-record result set with the DDL statement as the value in the DDL column, shown below with the column separators returned by kisql.

For example, to output the DDL for the example table created in the CREATE TABLE section:

SHOW CREATE TABLE various_types

| CREATE TABLE "various_types"
(
   "i" INTEGER (primary_key) NOT NULL,
   "bi" BIGINT (primary_key, shard_key) NOT NULL,
   "ub" UNSIGNED BIGINT,
   "r" REAL,
   "f" REAL,
   "d" DOUBLE (store_only),
   "s" VARCHAR (store_only, text_search),
   "c" VARCHAR (32, dict),
   "p" VARCHAR (text_search, 256, snappy),
   "ip" IPV4,
   "ts" TIMESTAMP,
   "td" DATE,
   "tt" TIME,
   "dt" DATETIME DEFAULT NOW(),
   "dc1" BIGINT,
   "dc2" DECIMAL(18,4),
   "dc3" REAL,
   "dc4" DOUBLE (store_only),
   "n" DECIMAL(18,4),
   "wkt" GEOMETRY,
    FOREIGN KEY (bi) references lookup(id) as fk
)
TIER STRATEGY (
( ( VRAM 1, RAM 5, PERSIST 5 ) )
)
ATTRIBUTE INDEX (ip)
ATTRIBUTE INDEX (ts) |

SHOW CREATE VIEW¶

Outputs the DDL statement required to reconstruct the given view:

SHOW [CREATE VIEW] [<schema name>.]<view name>

Note

The response to SHOW CREATE VIEW is a single-record result set with the DDL statement as the value in the DDL column, shown below with the column separators returned by kisql.

For example, to output the DDL for the example view created in the CREATE VIEW section:

SHOW CREATE VIEW view_of_table

| CREATE VIEW "view_of_table"
 AS (
    SELECT *
    FROM table_to_view
) |

To output the DDL for the example materialized view created in the CREATE MATERIALIZED VIEW section:

SHOW CREATE VIEW materialized_view_of_table

| CREATE MATERIALIZED VIEW "materialized_view_of_table"

REFRESH EVERY 30 MINUTES AS (
    SELECT a, b, c, d, KI_SHARD_KEY(a, b)
    FROM table_to_view
) |

Data Manipulation (DML)¶

INSERT¶

To insert records with literal values, use this format:

INSERT INTO [<schema name>.]<table name> [(<column list>)]
VALUES (<column value list>)[,...]

For example:

INSERT INTO employee (id, dept_id, manager_id, first_name, last_name, sal, hire_date)
VALUES
    (1, 1, null, 'Anne', 'Arbor', 200000, '2000-01-01'),
    (2, 2, 1, 'Brooklyn', 'Bridges', 100000, '2000-02-01'),
    (3, 3, 1, 'Cal', 'Cutta', 100000, '2000-03-01'),
    (4, 2, 2, 'Dover', 'Della', 150000, '2000-04-01'),
    (5, 2, 2, 'Elba', 'Eisle', 50000, '2000-05-01'),
    (6, 4, 1, 'Frank', 'Furt', 12345.6789, '2000-06-01')

To insert records, using another table as the source, use this format:

INSERT INTO [<schema name>.]<table name> [(<column list>)]
<SELECT statement>

For example:

INSERT INTO employee_backup (id, dept_id, manager_id, first_name, last_name, sal)
SELECT id, dept_id, manager_id, first_name, last_name, sal
FROM employee
WHERE hire_date >= '2000-01-01'

Note

When specifying a column list, any non-nullable fields not included in the list will be given default values--empty string for strings, and 0 for numerics. The fields in the column list and the fields selected must align.

To upsert records, inserting new records and updating existing ones (as denoted by primary key), use the KI_HINT_UPDATE_ON_EXISTING_PK hint. If the target table has no primary key, this hint will be ignored.

For example:

INSERT INTO employee_backup /* KI_HINT_UPDATE_ON_EXISTING_PK */
SELECT *
FROM employee
WHERE hire_date >= '2000-01-01'

Important

By default, any record being inserted that matches the primary key of an existing record in the target table will be discarded, and the existing record will remain unchanged. The KI_HINT_UPDATE_ON_EXISTING_PK hint overrides this behavior, favoring the source records over the target ones.

Tip

If Multi-Head Ingest has been enabled on the database server, INSERT operations will automatically leverage it, when applicable.

UPDATE¶

Updates can set columns to specified constant values or expressions. The general format is:

UPDATE [<schema name>.]<table name>
SET
    <column 1> = <expression 1>,
    ...
    <column n> = <expression n>
[WHERE <expression list>]

For example, to update employee with ID 5 to have a new manager, with ID 3, and to have a 10% salary increase:

UPDATE employee
SET
    sal = sal * 1.10,
    manager_id = 3
WHERE id = 5

Subqueries can also be used in the expression list. To update all the bottom earners in each department with a 5% salary increase:

UPDATE employee b
SET sal = sal * 1.05
WHERE sal =
    (
        SELECT MIN(sal)
        FROM employee l
        WHERE b.dept_id = l.dept_id
    )

DELETE¶

Deletes records from a table; the general format is:

DELETE
FROM [<schema name>.]<table name>
[WHERE <expression list>]

For example, to delete employee with ID 6:

DELETE
FROM employee
WHERE id = 6

Subqueries can also be used in the expression list. To delete all the most recent hires in each department:

DELETE
FROM employee b
WHERE hire_date =
    (
        SELECT MAX(hire_date)
        FROM employee l
        WHERE b.dept_id = l.dept_id
    )

Loading Data from Files¶

Kinetica supports loading of text-based delimited data files via SQL. There are two supported paths:

Loading client-side files via ODBC/JDBC
Loading server-side files via /execute/sql endpoint call

Loading Client-Side Files¶

Kinetica can load data from text-based data files into existing tables using a standard INSERT INTO...SELECT statement.

When reading data from a file, Kinetica assumes the following format, able to be overridden using command options:

If there is a header row, it is either:

a simple list of comma-delimited column names:
```
id,category,name,description,stock
```

the Kinetica standard format:

id|int|data,category|string|data|char16,name|string|data|char32,description|string|data|char128|nullable,stock|int|data|nullable

The data fields are comma-delimited
Strings are optionally enclosed in double-quotes; double-quotes must be used when the data contains commas; two consecutive double-quotes are used as an escape code for double-quoted string data containing double-quotes; e.g.:

"This string contains a "" quote mark and a "","" double-quoted comma."

This would be loaded into a single string column in Kinetica as:

This string contains a " quote mark and a "," double-quoted comma.

Data can be inserted into a table from a file with the following syntax:

INSERT INTO [<schema name>.]<table name> [( <column list> )]
SELECT <field list>
FROM FILE."<file name | file name expression>"
[ <option list> ]

Parameters	Description
`<schema name>`	Name of the schema containing the table into which data will be inserted
`<table name>`	Name of the table into which data will be inserted
`<column list>`	Optional list of target table columns into which data will be inserted; if not specified, all target table columns will be assumed. Whether specified or implied, this list needs to match the fields specified in `<field list>` in number, order, & type.
`<field list>`	List of the source data file field names from which data will be extracted; use `` to extract all columns (this wildcard must be used when a source file has no header which* defines field names). This list needs to match all the target table columns (or those specified in the `<column list>` option) in number, order, & type.
`<file name \|` `file name expression>`	Name of the source data file or set of source data files from which records will be extracted; wildcards (`*`) can be used to specify a group of files
`<option list>`	Optional list of whitespace-separated processing options

Options¶

The following options can be specified when loading data from files. When reading from multiple files (using wildcards when specifying the file names), options specific to the source file will be applied to each file being read.

Option Description

BATCH_SIZE=<n>

Use an ingest batch size of n records.

The default batch size is 10,000.

COMMENT='<string>'

Treat lines in the source file that begin with string as comments and skip.

The default comment marker is #.

DELIMITER='<char>'

Use char as the source file field delimiter.

The default delimiter is ,, unless the source file has one of these extensions:

.psv - will cause | to be the delimiter
.tsv - will cause the tab character to be the delimiter

See Option Characters for allowed characters.

ESCAPE='<char>'

Use char as the source file data escape character. The escape character preceding any other character, in the source data, will be converted into that other character, except in the following special cases:

Source Data String	Representation when Loaded into the Database
`<char>a`	ASCII bell
`<char>b`	ASCII backspace
`<char>f`	ASCII form feed
`<char>n`	ASCII line feed
`<char>r`	ASCII carriage return
`<char>t`	ASCII horizontal tab
`<char>v`	ASCII vertical tab

For instance, if the escape character is \, a \t encountered in the data will be converted to a tab character when stored in the database.

The escape character can be used to escape the quoting character, and will be treated as an escape character whether it is within a quoted field value or not.

There is no default escape character. See Option Characters for allowed characters.

HEADER=<Y|N> Declare that the source file has or does not have a header. If not given, that determination will be intuited.

INITIAL_CLEAR

Truncate the target table before loading data.

The default is to not truncate the target table.

LIMIT=<n>

Limit the total number of records inserted into the target table to n. If reading from multiple source files, this count is assessed across all files being read. For example, if n is 15 and the three files being read have 10 records each, all of the records from the 1st file and the first 5 records from the 2nd file will be loaded. The remaining records in the 2nd file and all of the records from the 3rd file will be skipped.

The default is no limit.

NULL='<string>'

Treat string as the indicator of a null source field value.

The default is \N.

ON_ERROR=< Permissive | Skip | Abort >

When an error is encountered loading a record, handle it using one of the following modes:

Mode	Description
`Permissive`	If an error is encountered parsing a source record, attempt to insert as much of the valid fields from the record as possible.
`Skip`	If an error is encountered parsing a source record, skip the record.
`Abort`	If an error is encountered parsing a source record, stop the data load process

The default mode is Skip.

QUOTE='<char>'

Use char as the source file data quoting character, for enclosing field values. Usually used to wrap field values that contain embedded delimiter characters, though any field may be enclosed in quote characters (for clarity, for instance). The quote character must appear as the first and last character of a field value in order to be interpreted as quoting the value. Within a quoted value, embedded quote characters may be escaped by preceding them with another quote character or the escape character specified by ESCAPE, if given.

The default is the " (double-quote) character. See Option Characters for allowed characters.

SKIP=<n>

Skip the first n source file data lines read, not including header and comment lines. If reading from multiple source files, this count is assessed across all files being read. For example, if n is 15 and the first two files read have 10 records each, the first record to be loaded (not skipped) will be the 6th record of the 2nd file.

The default is to skip no records.

Option Characters¶

For DELIMITER, ESCAPE, & QUOTE, any single character can be used, or any one of the following escaped characters:

Char	Description
`''`	Single quote
`\\`	Backslash
`\a`	ASCII bell
`\b`	ASCII backspace
`\f`	ASCII form feed
`\t`	ASCII horizontal tab
`\v`	ASCII vertical tab

Examples¶

Single File¶

To load a CSV file of product data located at /tmp/data/products.csv into a table named product:

INSERT INTO product
SELECT *
FROM FILE."/tmp/data/products.csv"

Multiple Files¶

To load all files whose names begin with products and end in a csv extension, located under /tmp/data, into a table named product:

INSERT INTO product
SELECT *
FROM FILE."/tmp/data/products*csv"

Column/Field Names¶

To load specific fields from a product data file located at /tmp/data/products.csv into specific target table columns:

INSERT INTO product (id, stock, category, name)
SELECT ID, Stock, Category, Name
FROM FILE."/tmp/data/products.csv"

Note

When specifying source data file field names, it is not required that target table column names also be specified, or vice versa. The only requirement is that the set of source data file fields selected align with the target table columns into which data will be inserted.

Options¶

To load data from a product data file with the following options:

file located at /tmp/data/products.csv
only records 4 through 10 will be loaded
any error will cause the insert to fail
the target table will be truncated before loading
records will be inserted into the table in groups of 2

INSERT INTO product
SELECT *
FROM FILE."/tmp/data/products.csv"
BATCH_SIZE=2
INITIAL_CLEAR
LIMIT=7
ON_ERROR=Abort
SKIP=3

To load data from a product data file with the following options:

file located at /tmp/data/products.ssv
the file has a header
the field delimiter is a semicolon
data requiring quotes will have single quotes as the quoting character
the escape character is the backtick
the data file represents null values as <null>
data file comments are on lines starting with --

INSERT INTO product
SELECT *
FROM FILE."/tmp/data/products.ssv"
COMMENT='--'
DELIMITER=';'
ESCAPE='`'
HEADER=Y
NULL='<null>'
QUOTE='\''

Considerations¶

Multi-Head

If Multi-Head Ingest has been enabled on the database server, file loading through ODBC/JDBC will automatically leverage it, when applicable.

Blank Lines

A blank line in a source data file will be treated as a valid record and will be inserted into the target table as a record with all null fields. If blank lines should not be considered as data, they should be prefixed with the comment marker in order to be skipped by the loading process.

GAdmin

While GAdmin does support SQL file ingestion, it is recommended to use server-side loading instead. Upon receiving this client-side load command, GAdmin will spawn a JDBC client on its host that will need to reference files accessible to the gpudb user located on that host (usually, the head node of the Kinetica cluster).

/execute/sql endpoint

While SQL file ingestion is available to ODBC/JDBC clients and via GAdmin; it is not supported within the /execute/sql endpoint, either as a REST call directly or as an API call (like the Python API's execute_sql() function).

Loading Server-Side Files¶

Kinetica can load data from text-based data files, located on the Kinetica cluster itself, into existing tables using a LOAD INTO... statement.

Data can be loaded with the following syntax:

<COPY | LOAD> [DATA] INTO [<schema name>.]<table name>
FROM FILE PATHS <file paths>
[<format clause>]
[<load options clause>]
[<table properties clause>]

Parameters Description

DATA Optional keyword for compatibility

<schema name> Name of the schema containing the table into which data will be loaded

<table name> Name of the table into which data will be loaded

<file paths>

Comma-separated list of single-quoted server-side file paths from which data will be loaded; wildcards (*) can be used to specify a group of files.

The files must all be accessible to the gpudb user and reside on the path (or relative to the path) specified by the external files directory in the Kinetica configuration file.

For example, if external_files_directory is set to /opt/gpudb/data, the following <file paths> specifications will have the corresponding outcomes:

File Path	Outcome
`/opt/gpudb/data/products.csv`	File `/opt/gpudb/data/products.csv` loaded into database
`archive/products.csv`	File `/opt/gpudb/data/archive/products.csv` loaded into database
`archive/*.csv`	All files under `/opt/gpudb/data/archive` with a `csv` extension loaded into database
`/opt/gpudb/products.csv`	Error, as absolute path `/opt/gpudb` does not contain external files directory `/opt/gpudb/data` as part of the path

<format clause> Optional clause specifying the source file format and options

<load options clause> Optional clause specifying load options, which modify the way in which the data is processed. This can be used to initiate a dry run load.

<table properties clause> Optional comma-separated list of table options relating to loading data into the target

Format Clause¶

The type of the source data file(s) can be specified, as well as a set of format options, depending on the format selected. Supported file types include delimited text files (CSV, TSV, PSV, etc.)

The general format for the format clause is:

FORMAT <[DELIMITED] TEXT [(<delimited text options>)]>

Parameters	Description
`DELIMITED`	Optional keyword for clarity in specifying delimited text type
`USING...`	Optional clause specifying a comma-separated list of delimited text options to apply to the source file(s). For example: DELIMITER AS ';', QUOTE = '"'

Delimited Text Options¶

The following options can be specified when loading data from delimited text files. When reading from multiple files (using wildcards when specifying the file names), options specific to the source file will be applied to each file being read.

Option Description

COMMENT = '<string>'

Treat lines in the source file(s) that begin with string as comments and skip.

The default comment marker is #.

COLUMN FORMATS = '<string>'

Use the given type-specific formatting for the given column when parsing source data being loaded into that column. This should be a map of column names to format specifications, where each format specification is map of column type to data format, all formatted as a JSON string.

Supported column types include:

Column Type Source Data Format Codes

date

Apply the given date format to the given column.

Important

Valid formats must contain one of each code

Format codes include:

Code	Description
`%Y`	4-digit year
`%m`	2-digit month, where January is `01`
`%d`	2-digit day of the month, where the 1st of each month is `01`

time

Apply the given time format to the given column.

Important

Valid formats must contain hours, minutes, & one of the seconds codes.

Format codes include:

Code	Description
`%H`	24-based hour, where 12:00 AM is `00` and 7:00 PM is `19`
`%M`	2-digit minute of the hour
`%S`	2-digit second of the minute
`%s`	2-digit second of the minute plus an N-digit fractional component, separated by a dot Note fractional seconds will be truncated after the milliseconds place

datetime

Apply the given date/time format to the given column.

Important

Valid formats must contain a date & time format, as detailed above.

For example, to make the YYYY.MM.DD format for loading source data into date column d and HH:MM:SS format for loading source data into time column t:

{
    "d": {"date": "%Y.%m.%d"},
    "t": {"time": "%H:%M:%S"}
}

DEFAULT COLUMN FORMATS = '<string>'

Use the given formats for source data being loaded into target table columns with the corresponding column types. This should be a map of target column type to source format for data being loaded into columns of that type, formatted as a JSON string.

Supported column properties and source data formats are the same as those listed in the description of the COLUMN FORMATS option.

For example, to make the default format for loading source data dates in the form YYYY.MM.DD and times in the form HH:MM:SS:

{
    "date": "%Y.%m.%d",
    "time": "%H:%M:%S",
    "datetime": "%Y.%m.%d %H:%M:%S"
}

DELIMITER = '<char>'

Use char as the source file field delimiter.

The default delimiter is a comma, unless a source file has one of these extensions:

.psv - will cause | to be the delimiter
.tsv - will cause the tab character to be the delimiter

See Delimited Text Option Characters for allowed characters.

ESCAPE = '<char>'

Use char as the source file data escape character. The escape character preceding any other character, in the source data, will be converted into that other character, except in the following special cases:

Source Data String	Representation when Loaded into the Database
`<char>a`	ASCII bell
`<char>b`	ASCII backspace
`<char>f`	ASCII form feed
`<char>n`	ASCII line feed
`<char>r`	ASCII carriage return
`<char>t`	ASCII horizontal tab
`<char>v`	ASCII vertical tab

For instance, if the escape character is \, a \t encountered in the data will be converted to a tab character when stored in the database.

The escape character can be used to escape the quoting character, and will be treated as an escape character whether it is within a quoted field value or not.

There is no default escape character. See Delimited Text Option Characters for allowed characters.

FIELDS MAPPED BY < POSITION(<col#s>) | NAME(<field_names>) >

Choose a comma-separated list of fields from the source file(s) to load, specifying fields by either position or name. If loading by name, the source file field names must match the target table column names exactly.

Note

When specifying source data file fields, the set of source data file fields must align, in type & number, with the target table columns into which data will be loaded.

HEADER DELIMITER = '<char>'

Use char as the source file header field name/property delimiter, when the source file header contains both names and properties. This is largely specific to the Kinetica export to delimited text feature, which will, within each field's header, contain the field name and any associated properties, delimited by the pipe | character.

An example Kinetica header in a CSV file:

id|int|data,category|string|data|char16,name|string|data|char32

The default is the | (pipe) character. See Delimited Text Option Characters for allowed characters.

Note

The DELIMITER character will still be used to separate field name/property sets from each other in the header row

INCLUDES HEADER = <TRUE|FALSE> Declare that the source file(s) will or will not have a header. If not given, that determination will be intuited.

NULL = '<string>'

Treat string as the indicator of a null source field value.

The default is the empty string.

QUOTE = '<char>'

Use char as the source file data quoting character, for enclosing field values. Usually used to wrap field values that contain embedded delimiter characters, though any field may be enclosed in quote characters (for clarity, for instance). The quote character must appear as the first and last character of a field value in order to be interpreted as quoting the value. Within a quoted value, embedded quote characters may be escaped by preceding them with another quote character or the escape character specified by ESCAPE, if given.

The default is the " (double-quote) character. See Delimited Text Option Characters for allowed characters.

Delimited Text Option Characters¶

For DELIMITER, HEADER DELIMITER, ESCAPE, & QUOTE, any single character can be used, or any one of the following escaped characters:

Char	Description
`''`	Single quote
`\a`	ASCII bell
`\b`	ASCII backspace
`\f`	ASCII form feed
`\t`	ASCII horizontal tab
`\v`	ASCII vertical tab

Load Options Clause¶

A comma-separated list of options for modifying the way data is loaded (or not loaded) into the target table can be specified.

The general format for the load options clause is:

WITH OPTIONS (<load options>)

Available load options include:

Option Description

BATCH SIZE = <n>

Use an ingest batch size of n records.

The default batch size is 10,000.

LOADING MODE = < HEAD | DISTRIBUTED LOCAL | DISTRIBUTED SHARED >

Use one of the following distribution schemes to load server files. The default mode is HEAD.

Mode	Description
`HEAD`	The head node loads all data. All files must be available to the head node.
`DISTRIBUTED LOCAL`	Each worker node loads all files that are available to it. This option works best when each worker loads files from its own file system.
`DISTRIBUTED SHARED`	The worker nodes coordinate loading a set of files available to all of them.

ON ERROR = < PERMISSIVE | SKIP | ABORT >

When an error is encountered loading a record, handle it using one of the following modes. The default mode is PERMISSIVE.

Mode	Description
`PERMISSIVE`	If an error is encountered parsing a source record, attempt to insert as many of the valid fields from the record as possible.
`SKIP`	If an error is encountered parsing a source record, skip the record.
`ABORT`	If an error is encountered parsing a source record, stop the data load process.

DRY RUN = <TRUE|FALSE> Run the load in dry run mode. If true, no data will be inserted, but the file will be read with the applied ON ERROR mode and the number of valid records that would normally be inserted is returned.

Table Properties Clause¶

A comma-separated list of options for creating the target table, if it doesn't exist and if the source data file has Kinetica headers, can be specified.

Kinetica headers contain column type information, which are used in creating the target table if it doesn't exist. An example Kinetica header:

id|int|data,category|string|data|char16,name|string|data|char32,description|string|data|char128|nullable,stock|int|data|nullable

The general format for the table properties clause is:

USING TABLE PROPERTIES (<table properties>)

Available table properties include:

Option	Description
`CHUNK SIZE`	Size of the blocks of memory holding the data, when loaded; specified as the maximum number of records each block of memory should hold
`TTL`	The time-to-live (TTL) for the table; if not set, the table will not expire

Examples¶

Single File¶

To load a CSV file of product data located at data/products.csv, relative to the configured external files directory, into a table named product:

LOAD DATA INTO product
FROM FILE PATHS 'data/products.csv'

Single File - No Header¶

To load a headerless CSV file of product data located at data/products.nh.csv, relative to the configured external files directory, into a table named product:

LOAD DATA INTO product
FROM FILE PATHS 'data/products.nh.csv'
FORMAT TEXT (INCLUDES HEADER = FALSE)

Multiple Files¶

To load all files whose names begin with products and end in a csv extension, located at data/products*csv, relative to the configured external files directory, into a table named product:

LOAD DATA INTO product
FROM FILE PATHS 'data/products*csv'
FORMAT TEXT

When loading files from multiple paths, specify a comma-delimited list of those file paths. To load one file located at data/products.csv and one file located at data/products.kh.csv, relative to the configured external files directory, into a table named product:

LOAD DATA INTO product
FROM FILE PATHS 'data/products.csv', 'data/products.kh.csv'
FORMAT TEXT

Note

Multiple paths and wildcards specifying multiple files on those paths can be used together.

Column/Field Names¶

To load specific fields from a product data file, with the following conditions:

data file located at data/products.title-case.csv, relative to the configured external files directory
data file contains a header with fields named ID, Category, Name, Description, & Stock
target table named product_name_stock
target columns named id, name, & stock
only load fields ID, Name, & Stock into columns id, name, & stock
skip records that fail to load successfully

LOAD DATA INTO product_name_stock
FROM FILE PATHS 'data/products.title-case.csv'
FORMAT TEXT (FIELDS MAPPED BY NAME(ID, Name, Stock))
WITH OPTIONS (ON ERROR = SKIP)

Note

When specifying source data file field names, the set of source data file fields selected must align, in type & number, with the target table columns into which data will be loaded.

Delimited Text Options¶

To load data from a product data file with the following options:

file located at data/products.ssv, relative to the configured external files directory
the file has a header
the field delimiter is a semicolon
data requiring quotes will have single quotes as the quoting character
the escape character is the backtick
the data file represents null values as <null>
data file comments are on lines starting with --
when parse errors are encountered, as much of the record's data as possible will be loaded into the target table

LOAD DATA INTO product
FROM FILE PATHS 'data/products.ssv'
FORMAT TEXT
(
	COMMENT = '--',
	DELIMITER = ';',
	ESCAPE = '`',
	INCLUDES HEADER = TRUE,
	NULL = '<null>',
	QUOTE = ''''
)
WITH OPTIONS (ON ERROR = PERMISSIVE)

Create Table w/ Kinetica Headers¶

A CSV file with Kinetica headers can be used to create the target table if it doesn't exist.

To load such a file of product data with the following conditions:

data file located at data/products.kh.csv, relative to the configured external files directory
data file contains a Kinetica header
target table named product_create_on_load
target table will be created if it doesn't exist
target table will have a chunk size of 1,000,000 records, if it is created by this load command

LOAD DATA INTO examples.product_create_on_load
FROM FILE PATHS 'data/products.kh.csv'
USING TABLE PROPERTIES (CHUNK SIZE = 1000000)

Dry Run¶

To perform a dry run of a load of a CSV file of product data located at data/products.csv, relative to the configured external files directory, into a table named product:

LOAD DATA INTO product
FROM FILE PATHS 'data/products.csv'
WITH OPTIONS (DRY RUN = TRUE)

Note

The dry run will return the number of records that would have been loaded, but not actually insert any records into the target table.

SQL Procedures¶

Kinetica provides support for basic SQL procedures, as an executable batch of SQL statements. A procedure can be executed by two means:

on-demand - procedure is called directly by a user
scheduled execution - procedure is configured, upon creation, to execute at a user-specified interval

Note

Even if a procedure is configured for scheduled execution, it can still be executed directly by a user in on-demand fashion.

After the first run, the execution plan for all statements in the procedure will be created and cached to improve performance on future executions.

If there is an error executing any statement in the procedure, the procedure will stop immediately and report the error. If the procedure is invoked via scheduled execution, an alert will be sent to the alert monitor, as there is no interactive session through which the error could be reported to a user. Any database modifications prior to the error will not be rolled back.

Note

If any of the database objects referenced in the procedure are dropped or modified, the procedure will be dropped as well. This does not include any objects created by the procedure that are later referenced by it.

The ability to manage & execute procedures is available through SQL, using the following commands:

CREATE PROCEDURE
EXECUTE PROCEDURE
DROP PROCEDURE
SHOW PROCEDURE

For procedure execute permission management, see:

GRANT Procedure Permission
REVOKE Procedure Permission

Supported Statements¶

The following statement types are allowed within a SQL procedure:

SELECT
INSERT (inserting from data file not supported)
UPDATE
DELETE
<CREATE | DROP | SHOW CREATE> SCHEMA
<CREATE | TRUNCATE | DROP | SHOW CREATE> TABLE
CREATE TABLE...AS
<CREATE [MATERIALIZED] | REFRESH | DROP | SHOW CREATE> VIEW
DESCRIBE
SHOW PROCEDURE
SHOW SECURITY [FOR <USER | ROLE>]
SHOW RESOURCE GROUP

Security¶

Permissions for managing procedures follow those for creating tables; e.g., if a user has the ability to create a table in a given schema, that user will also be able to create & drop procedures there.

Executing a procedure requires either the implicit execute permission that is granted to the creator of a procedure, or explicit execute permission, which can be granted to or revoked from any user or role, irrespective of whether the target user or role has the appropriate access to the database objects referenced within the SQL procedure.

Execute permission on a procedure also allows the grantee to see the contents of the procedure.

CREATE PROCEDURE¶

SQL procedures can be created with the following syntax:

CREATE [OR REPLACE] PROCEDURE [<schema name>.]<procedure name> [()]
[LANGUAGE SQL]
BEGIN
    <SQL statements>
END
[
    EXECUTE FOR EVERY <number> <SECOND[S] | MINUTE[S] | HOUR[S] | DAY[S]>
    [STARTING AT '<YYYY-MM-DD [HH:MM[:SS]]>']
]

Parameters	Description
`<schema name>`	Name of the schema in which this procedure will be created
`<procedure name>`	Name to give to the created procedure; must adhere to the supported naming criteria for tables, and cannot be named the same as any existing table or view
`<SQL statements>`	Semicolon-separated list of supported SQL statements. If the final statement in the procedure is an output-generating statement (SQL query, SHOW command, etc.), the output of that statement alone will be returned to the user; all other statements that generate output will be ignored.
`OR REPLACE`	Drop any existing procedure with the same name before creating this one
`LANGUAGE`	Optional language specification for the procedure. Only `SQL` is supported at this time.
`<number>`	Length of time, in the given number of units, between scheduled executions of the procedure. Fractional values are accepted.

For example, to create a sqlp procedure:

CREATE PROCEDURE EXAMPLES.sqlp
BEGIN
	CREATE OR REPLACE TABLE EXAMPLES.sqlp_table_tmp AS
	SELECT *
	FROM sqlp_table_ref;

	SELECT *
	FROM sqlp_table_tmp;
END

To create a sqlp_weekly procedure that executes once per week:

CREATE PROCEDURE sqlp_weekly
BEGIN
	CREATE OR REPLACE TABLE EXAMPLES.sqlp_table_tmp AS
	SELECT *
	FROM sqlp_table_ref;

	SELECT *
	FROM sqlp_table_tmp;
END 
EXECUTE FOR EVERY 7 DAYS
STARTING AT '2019-01-01 00:00:00'

EXECUTE PROCEDURE¶

SQL procedures can be executed on-demand with the following syntax:

EXEC[UTE] [PROCEDURE] [<schema_name>.]<procedure name> [()]

If the final statement in the procedure is an output-generating statement (SQL query, SHOW command, etc.), the output of that statement alone will be returned to the user; all other statements that generate output will be ignored.

If there is an error executing any statement in the procedure, the procedure will stop immediately and report the error. Any database modifications prior to the error will not be rolled back.

For example, to execute the sqlp procedure:

EXECUTE sqlp

DROP PROCEDURE¶

When removing a SQL procedure from the database, there are two options available, which control how the removal takes place. Normally, an error will be reported if the table to drop doesn't exist; if IF EXISTS is specified, no error will be reported.

DROP PROCEDURE [IF EXISTS] [<schema_name>.]<procedure name>

For example, to drop the sqlp procedure:

DROP PROCEDURE sqlp

SHOW PROCEDURE¶

The content of a SQL procedure can be displayed with the following syntax:

SHOW PROCEDURE [<schema_name>.]<procedure name>

For example, to show the contents of the sqlp_weekly procedure:

SHOW PROCEDURE sqlp_weekly

Security¶

Kinetica provides basic table-level role-based access control for users. It also allows global read/write and administrative access to be granted. For details about Kinetica security, see Security Concepts.

Tip

A limited set of user/role information can be extracted via function calls. See User/Security Functions for details.

Security features accessible via SQL include:

User Management
Role Management
Privilege Management
Impersonation (Execute As...) - enables execution of commands as another user

User Management¶

The ability to manage user accounts is available through SQL, using the following commands:

CREATE USER
ALTER USER
DROP USER
SHOW SECURITY FOR User

See Users for details about user accounts within Kinetica.

CREATE USER¶

Users can be added to the system and assigned permissions either directly or via roles.

Optionally, the user can be associated with a resource group.

To add a user to the system, use this format:

CREATE USER <user name>
[ < [WITH] PASSWORD [=] | IDENTIFIED BY [PASSWORD] > '<user password>' ]
[ [WITH] RESOURCE GROUP [=] <group name> ]

Note

The password needs to be single-quoted and must not contain single quotes.

For example, two of the ways to create a new internal user with the user ID of jsmith and a password of secret are:

CREATE USER jsmith IDENTIFIED BY 'secret'
CREATE USER jsmith WITH PASSWORD 'secret'

To create an internal user with the user ID of jsmith, a password of secret, and assigning the memory_over_execution resource group:

CREATE USER jsmith
IDENTIFIED BY 'secret'
WITH RESOURCE GROUP memory_over_execution

To create a user with an existing external LDAP user, the user name should match the LDAP user name and be prepended with the @ symbol; no password is supplied, as the user will be externally authenticated:

CREATE USER "@jsmith"

ALTER USER¶

Any of the following facets of a user can be altered, either individually or as a group:

password
resource group

To alter an existing user, use this format:

ALTER USER <user name>
SET
    [ PASSWORD [=] '<user password>' ]
    [ RESOURCE GROUP [=] <group name | DEFAULT> ]

Note

The password needs to be single-quoted and must not contain single quotes.

For example, to alter a user with the user ID of jsmith, assigning a password of new_secret, and the memory_over_execution resource group:

ALTER USER jsmith
SET
    PASSWORD = 'new_secret'
    RESOURCE GROUP memory_over_execution

To unassign the memory_over_execution resource_group from the jsmith user:

ALTER USER jsmith
SET
    RESOURCE GROUP DEFAULT

DROP USER¶

Any user, other than the default users, can be removed from the system. Note that any database objects created by a user will remain when the user is removed.

To remove a user from the system, use this format:

DROP USER <user name>

For example, to drop an internal user jsmith:

DROP USER jsmith

To drop an external LDAP user jsmith:

DROP USER "@jsmith"

SHOW SECURITY FOR User¶

To list the permissions, roles, and assigned resource groups for one or more users:

SHOW SECURITY FOR <user name>,...

For example, to show the permissions, roles, and resource group for user jsmith:

SHOW SECURITY FOR jsmith

Role Management¶

The ability to manage roles is available through SQL, using the following commands:

CREATE ROLE
ALTER ROLE
DROP ROLE
SHOW SECURITY FOR Role

See Roles for details about roles within Kinetica.

CREATE ROLE¶

A new role can be created as a container for permissions or other roles, though both of those must be granted to the role after its creation.

Optionally, the role can be associated with a resource group.

To create a new role, use this format:

CREATE ROLE <role name>
[ [WITH] RESOURCE GROUP [=] <group name> ]

For example, to create an analyst role:

CREATE ROLE analyst

To create an executive role, assigning it the execution_over_memory resource group:

CREATE ROLE executive
WITH RESOURCE GROUP execution_over_memory

ALTER ROLE¶

A role can have its associated resource group modified.

To modify a role's resource group assignment, use this format:

ALTER ROLE <role name>
SET RESOURCE GROUP [=] <group name | DEFAULT>

For example, to assign the memory_over_execution resource group to the analyst role:

ALTER ROLE analyst
SET RESOURCE GROUP memory_over_execution

To unassign the resource group from the analyst role:

ALTER ROLE analyst
SET RESOURCE GROUP DEFAULT

DROP ROLE¶

Dropping a role will remove the associated permissions & roles granted through the role to all users with the role. Users & roles granted the same permissions either directly or via other roles will retain those permissions.

Any role, other than the default roles, can be removed from the system.

To drop an existing role, use this format:

DROP ROLE <role name>

For example, to drop the analyst role:

DROP ROLE analyst

SHOW SECURITY FOR Role¶

To list the permissions, roles, and assigned resource groups for one or more roles:

SHOW SECURITY FOR <role name>,...

To show the permissions, roles, and resource group for the analyst role:

SHOW SECURITY FOR analyst

Privilege Management¶

The ability to manage user & role privileges is available through SQL, using the following commands:

GRANT Role
GRANT System Permission
GRANT Table Permission
GRANT Procedure Permission
REVOKE Role
REVOKE System Permission
REVOKE Table Permission
REVOKE Procedure Permission
SHOW SECURITY

See Users for details about user accounts within Kinetica.

GRANT Role¶

Roles can be granted directly to users or other roles.

To grant a role:

GRANT <role name> TO <user name | role name>

For example, to grant a role allowing access to analyst tables to the analyst role, and then grant that analyst role to user jsmith:

GRANT analyst_table_access TO analyst
GRANT analyst TO jsmith

GRANT SYSTEM Permission¶

System-level permissions can be granted directly to users or roles.

To grant a system-level permission or a user-administration permission:

GRANT SYSTEM < ADMIN | READ | WRITE > TO <user name | role name>

GRANT USER ADMIN TO <user name | role name>

For example, to grant system administrator permission to jsmith:

GRANT SYSTEM ADMIN TO jsmith

To grant read access to all tables to the auditor role:

GRANT SYSTEM READ TO auditor

For example, to grant user administrator permission to jsmith:

GRANT USER ADMIN TO jsmith

GRANT Table Permission¶

Table-level permissions, which can be applied to schemas, tables, and views, can be granted directly to users or roles.

Row-level security can be invoked for SELECT privilege by specifying a WHERE clause expression.

Column-level security can be invoked for SELECT privilege by specifying a list of accessible columns and/or column security functions.

To grant a table-level permission:

GRANT < SELECT | INSERT | UPDATE | DELETE | ALL > [PRIVILEGES]
    ON [TABLE] <schema name | [<schema name>.]<table/view name>> [<column list>]
    TO <user name | role name>
    [WHERE <expression>]

Parameters	Description
`PRIVILEGES`	Optional keyword for SQL-92 compatibility
`TABLE`	Optional keyword for SQL-92 compatibility
`<schema name>`	Either the name of the schema to which access is being granted or the name of the schema containing the table to which access is being granted
`<table/view name>`	The name of the table to which access is being granted
`<column list>`	Optional comma-separated list of specific table columns and/or column security functions applied to columns on which to grant access; see Column-Level Security for details
`WHERE <expression>`	Optional filter expression on the table to which access is being granted, determining to which rows of the target table the grantee will be given access; see Row-Level Security for details

Wildcards (*) can replace either the schema name, table/view name, or both, to specify all database objects within a given domain:

Wildcard Form	Selected Objects
`*`	All schemas
`SCHEMA_NAME.*`	All tables/views under the schema named `SCHEMA_NAME`
`*.TABLE_NAME`	All tables/views named `TABLE_NAME`, irrespective of schema
`.`	All tables/views under all schemas

Note

The ALL permission corresponds to the native table_admin permission, which gives full read/write access as well as the additional permission to ALTER and DROP the table.

Examples¶

For example, to grant full access on the network_config table to jsmith:

GRANT ALL PRIVILEGES ON TABLE network_config TO jsmith

To grant SELECT access on the network_config_history table to the analyst role:

GRANT SELECT ON archive.network_config_history TO analyst

To grant SELECT access on all schemas to the auditor role:

GRANT SELECT ON * TO auditor

To grant SELECT access on all tables within the archive schema to the analyst role:

GRANT SELECT ON archive.* TO analyst

To grant full access on all schemas and tables to the dbadmin role:

GRANT ALL ON *.* TO dbadmin

Row-Level Security Examples¶

To grant SELECT access on an rx_order table for all orders placed since the year 2002:

GRANT SELECT ON rx_order
	TO zanalyst
	WHERE order_ts >= '2002-01-01'

To grant SELECT access on an rx_order table for only orders belonging to the current user:

GRANT SELECT ON rx_order
	TO rx_user
	WHERE name = USER()

Column-Level Security Examples¶

To grant SELECT access on the following columns of an rx_order table:

obfuscated version of the orderer's social security number
name of the perscription ordered
order date/time

GRANT SELECT ON rx_order(HASH(ssn), rx_name, order_ts) TO zanalyst

To grant SELECT access on the following columns of an rx_order table:

masked version of the orderer's social security number (last 4 digits)
name of the orderer
name of the perscription ordered
order date/time

GRANT SELECT ON rx_order(MASK(ssn, 1, 6), name, rx_name, order_ts) TO rx_user

GRANT Procedure Permission¶

Procedure-level permissions, which allow execution of SQL procedures, can be granted directly to users or roles.

To grant the permission to execute a SQL procedure:

GRANT EXECUTE PROCEDURE ON <procedure name> TO <user name | role name>

For example, to grant execute on the sqlp_weekly procedure to jsmith:

GRANT EXECUTE PROCEDURE ON sqlp_weekly TO jsmith

REVOKE Role¶

Roles can be revoked from users or other roles.

To revoke a role:

REVOKE <role name> FROM <user name | role name>

For example, to revoke a role allowing access to analyst tables from the analyst role:

REVOKE analyst_table_access FROM analyst

To revoke the analyst role from user jsmith:

REVOKE analyst FROM jsmith

REVOKE SYSTEM Permission¶

System-level permissions can be revoked from users or roles.

To revoke a system-level permission or a user-administration permission:

REVOKE SYSTEM < ADMIN | READ | WRITE > FROM <user name | role name>

REVOKE USER ADMIN FROM <user name | role name>

For example, to revoke system administrator permission from jsmith:

REVOKE SYSTEM ADMIN FROM jsmith

To revoke write access to all tables from the auditor role:

REVOKE SYSTEM WRITE FROM auditor

For example, to revoke user administrator permission from jsmith:

REVOKE USER ADMIN FROM jsmith

REVOKE Table Permission¶

Table-level permissions, which can be applied to schemas, tables, and views, can be revoked from users or roles.

Row-level security can be revoked for SELECT privilege only by revoking SELECT privilege from the entire table.

Note

Access to individual rows cannot be revoked; instead, full SELECT access should be revoked, followed by a grant of access to the appropriate rows.

Column-level security can be revoked for SELECT privilege by specifying a list of columns for which access will be revoked, regardless of any security functions that may have been applied to the column when granting access (HASH, MASK).

To revoke a table-level permission:

REVOKE < SELECT | INSERT | UPDATE | DELETE | ALL > [PRIVILEGES]
    ON [TABLE] <schema name | [<schema name>.]<table/view name>> [<column list>]
    FROM <user name | role name>

Parameters	Description
`PRIVILEGES`	Optional keyword for SQL-92 compatibility
`TABLE`	Optional keyword for SQL-92 compatibility
`<schema name>`	Either the name of the schema to which access is being revoked or the name of the schema containing the table to which access is being revoked
`<table/view name>`	The name of the table to which access is being revoked
`<column list>`	Optional comma-separated list of specific table columns to which access is being revoked; see Column-Level Security for details

Wildcards (*) can replace either the schema name, table/view name, or both, to specify all database objects within a given domain:

Wildcard Form	Selected Objects
`*`	All schemas
`SCHEMA_NAME.*`	All tables/views under the schema named `SCHEMA_NAME`
`*.TABLE_NAME`	All tables/views named `TABLE_NAME`, irrespective of schema
`.`	All tables/views under all schemas

Note

The ALL permission corresponds to the native table_admin permission, which gives full read/write access as well as the additional permission to ALTER and DROP the table.

Examples¶

For example, to revoke full access on the network_config table from jsmith:

REVOKE ALL PRIVILEGES ON TABLE network_config FROM jsmith

To revoke DELETE access on the network_config_history table from the analyst role:

REVOKE DELETE ON archive.network_config_history FROM analyst

To revoke DELETE access on all schemas from the auditor role:

REVOKE DELETE ON * FROM auditor

To revoke UPDATE access on all tables within the archive schema from the analyst role:

REVOKE UPDATE ON archive.* FROM analyst

To revoke full access on all schemas and tables from the dbadmin role:

REVOKE ALL ON *.* FROM dbadmin

Row-Level Security Examples¶

To revoke SELECT access on an rx_order table for all orders placed since the year 2002 (as was granted in the row-level grant example), revoke all SELECT access:

REVOKE SELECT ON rx_order FROM zanalyst

Column-Level Security Examples¶

To revoke SELECT access on the following columns of an rx_order table:

orderer's social security number

REVOKE SELECT ON rx_order(ssn) FROM zanalyst

To revoke SELECT access on the following columns of an rx_order table:

masked version of the orderer's social security number (last 4 digits)
name of the perscription ordered

REVOKE SELECT ON rx_order(ssn, rx_name) FROM rx_user

REVOKE Procedure Permission¶

Permissions relating to SQL procedures can be revoked from users or roles.

To grant the permission to execute a SQL procedure:

REVOKE EXECUTE PROCEDURE ON <procedure name> FROM <user name | role name>

For example, to revoke execute on the sqlp_weekly procedure from jsmith:

REVOKE EXECUTE PROCEDURE ON sqlp_weekly FROM jsmith

SHOW SECURITY¶

To list the permissions, roles, and assigned resource groups for one or more users and/or roles (or all users and roles in the system):

SHOW SECURITY [FOR <user name | role name>,...]

For example, to show the permissions, roles, and resource group for user jsmith:

SHOW SECURITY FOR jsmith

To show the permissions, roles, and resource group for the analyst role:

SHOW SECURITY FOR analyst

To show the permissions, roles, and resource groups for the anonymous user and the public role:

SHOW SECURITY FOR anonymous, public

To show all users & roles:

SHOW SECURITY

Impersonation (Execute As...)¶

When logged into the database as one user, SQL commands can be run as a second user (that user can be impersonated). To do this, first log in to the database as an administrator, then switch to the second user, and finally, run the command. Afterwards, the original user can be reverted back to in order to execute commands as that administrator again; alternatively, an option can be specified to prevent switching back to the original user.

The ability to impersonate other users is available through SQL, using the following commands:

EXECUTE AS
REVERT
SET USER

Important

Only a user with administrator privilege can impersonate another user, and only an administrator with a Kinetica database account. Any users who are mapped to Kinetica as administrators via external authentication will need to ensure that a corresponding Kinetica administrator account has been created for impersonation to succeed.

Additionally, impersonation is only supported via ODBC/JDBC.

EXECUTE AS¶

Switching to another user can be done using the following syntax:

EXEC[UTE] AS USER = '<user name>' [WITH NO REVERT]

The WITH NO REVERT option prevents the impersonated user from switching back to the original user. This is useful in several cases, including:

An administrator needs to run a script as a user with specific permissions, but doesn't want the script to be able to revert back to administrator privileges during its execution.
An application server connects to the database with its own credentials, but issues commands to the database on behalf of an application user. Again, there is a need to prevent the application user from reverting back to the application server's database account and using those privileges to execute commands.

Example¶

To demonstrate the uses of EXECUTE AS USER, the following commands can be issued upon logging into the database as an administrator.

First, verify the current user is admin:

SELECT CURRENT_USER() AS whoami

+----------+
| whoami   |
+----------+
| admin    |
+----------+

Then, switch to a user auser and verify the session is now executing commands as that user:

EXECUTE AS USER = 'auser'
SELECT CURRENT_USER() AS whoami

+----------+
| whoami   |
+----------+
| auser    |
+----------+

Accessing a database object that auser has permission for results in a successful outcome:

SELECT *
FROM auser_can_access

+------+
|   id |
+------+
|    1 |
+------+

Accessing an object that auser doesn't have permission for results in an error:

SELECT *
FROM auser_cant_access

[GPUdb]executeSql: Error: 'SqlPlanExecutor: Access denied for query object(s):Access denied (S/SDc:246); code:1 'Error' in Job process'

To switch back to the original user and verify the current identity, use the REVERT command:

REVERT
SELECT CURRENT_USER() AS whoami

+----------+
| whoami   |
+----------+
| admin    |
+----------+

Now, switch to auser using the WITH NO REVERT option:

EXECUTE AS USER = 'auser' WITH NO REVERT
SELECT CURRENT_USER() AS whoami

+----------+
| whoami   |
+----------+
| auser    |
+----------+

Attempts to switch back to the administrator user will fail:

REVERT

GeneralException: May not revert user after WITH NO REVERT

Verify the session is still executing commands as the auser user:

SELECT CURRENT_USER() AS whoami

+----------+
| whoami   |
+----------+
| auser    |
+----------+

REVERT¶

After using EXECUTE AS or SET USER to switch to another user, the original user can be switched back to using the following command:

REVERT

See the EXECUTE AS example for an example usage of REVERT.

SET USER¶

The SET USER (or SETUSER) command is simply an alias for the EXECUTE AS and REVERT commands.

The following commands are equivalent for switching to another user:

SET USER <user name>
EXECUTE AS USER = '<user name>'

Important

There is no ability to prevent reversion to the original user when using the SET USER command. To prevent reversion, see EXECUTE AS.

The following commands are equivalent for switching back to the original user:

SET USER
REVERT

Resource Management¶

Kinetica provides a SQL interface for managing resource groups, which can be used to limit memory and thread usage, and execution priority for a user or group of users.

The ability to manage resource groups is available through SQL, using the following commands:

CREATE RESOURCE GROUP
ALTER RESOURCE GROUP
DROP RESOURCE GROUP
SHOW RESOURCE GROUP

See Security for details on assigning resource groups to users and roles.

CREATE RESOURCE GROUP¶

Resource groups require only a name to be created, all limits are optional. Resource group names are case-sensitive and must contain only letters, digits, and underscores, and cannot begin with a digit. A resource group must also not be named the same as an existing resource group, including the default resource groups: kinetica_system_resource_group & kinetica_default_resource_group.

To create a new resource group, use this format:

CREATE RESOURCE GROUP <resource group name>
[ RANK <FIRST | LAST | BEFORE <group name> | AFTER <group name>> ]
[
    TIER LIMITS
    (
        <tier name> USING (<tier property name>=<tier property value>[,...]),
        ...
        <tier name> USING (<tier property name>=<tier property value>[,...])
    )
]
[ WITH OPTIONS (<resource property name>=<resource property value>[,...]) ]

For example, to create a simple unlimited resource group with default settings:

CREATE RESOURCE GROUP unlimited

To create a memory_over_execution resource group, with high tier capacity limits, but low execution options:

CREATE RESOURCE GROUP memory_over_execution
TIER LIMITS
(
    VRAM USING (max_memory=10737418240),
    RAM USING (max_memory=107374182400)
)
WITH OPTIONS
(
    max_cpu_concurrency=5,
    max_scheduling_priority=30,
    max_tier_priority=4
)

To create a execution_over_memory resource group, with lower tier capacity limits, but higher execution options, as compared to the memory_over_execution group:

CREATE RESOURCE GROUP execution_over_memory
RANK BEFORE memory_over_execution
TIER LIMITS
(
    VRAM USING (max_memory=1073741824),
    RAM USING (max_memory=10737418240)
)
WITH OPTIONS
(
    max_cpu_concurrency=7,
    max_scheduling_priority=70,
    max_tier_priority=10
)

Note

The execution_over_memory group is given a higher rank than the memory_over_execution group, so a user with no direct resource group assignment who is assigned these two resource groups (via separate roles) will effectively be assigned to the execution_over_memory group.

ALTER RESOURCE GROUP¶

Any of the following facets of a resource group can be altered, either individually or as a group:

Rank
Tier limits
Execution options

To alter an existing resource group, use this format:

ALTER RESOURCE GROUP <resource group name>
[ RANK <FIRST | LAST | BEFORE <group name> | AFTER <group name>> ]
[
    TIER LIMITS
    (
        <tier name> USING (<tier property name>=<tier property value>[,...]),
        ...
        <tier name> USING (<tier property name>=<tier property value>[,...])
    )
]
[ WITH OPTIONS (<resource property name>=<resource property value>[,...]) ]

For example, to alter an existing memory_and_execution resource group, moving it to the highest ranking, while assigning new limits for the RAM tier and max_cpu_concurrency & max_scheduling_priority options, but leaving other existing settings untouched:

ALTER RESOURCE GROUP memory_and_execution
RANK FIRST
TIER LIMITS
(
    RAM USING (max_memory=214748364800)
)
WITH OPTIONS
(
    max_cpu_concurrency=9,
    max_scheduling_priority=90
)

DROP RESOURCE GROUP¶

An existing user-defined resource group can be removed from the system, which will also dissociate the group from any users or roles to which it was assigned.

To remove a resource group, use the following format:

DROP RESOURCE GROUP <resource group name>

For example, to remove the unlimited resource group:

DROP RESOURCE GROUP unlimited

SHOW RESOURCE GROUP¶

The configuration of any resource group can be shown, in tabular form. The configuration for all resource groups in the system can also be shown with a single command.

To show the configurations of resource groups, use the following format:

SHOW RESOURCE GROUP <resource group name | ALL>

For example, to show the configuration of the memory_and_execution resource group:

SHOW RESOURCE GROUP memory_and_execution

To show the configuration of all resource groups:

SHOW RESOURCE GROUP ALL

System Properties¶

Kinetica provides a SQL interface for managing system properties, which control everything from GPUs, hostnames, storage, graph server, and more.

The ability to manage system properties is available through SQL, using the following commands:

ALTER SYSTEM SET PROPERTIES
SHOW SYSTEM PROPERTIES

ALTER SYSTEM SET PROPERTIES¶

Only a small subset of the system properties can be altered at runtime. See the /alter/system/properties endpoint documentation for the full list. Only users with the system_admin permission can issue this command.

To alter a system property, use this format:

ALTER SYSTEM SET PROPERTIES ('<property key>' = '<property value>'[,...])

For example, to increase the request timeout and the maximum /get/records size:

ALTER SYSTEM SET PROPERTIES (
    'request_timeout' = '25', 
    'max_get_records_size' = '25000'
)

SHOW SYSTEM PROPERTIES¶

All system properties can be shown by any user.

SHOW SYSTEM PROPERTIES

Limitations¶

Kinetica has some limitations for any columns marked as store-only and string columns lacking a charN attribute.

Unlimited-Width Strings¶

String columns with no charN attribute are stored in whole on disk and in hashed form in memory. Because the strings are only available for processing as hashes, only equality-based operations can be applied to them.

Partially Applicable Operations¶

Scalar Column Functions
- Null Functions (except ZEROIFNULL )
- Conditional Functions (equality-based)
Aggregate Column Functions
- COUNT
- COUNT DISTINCT
WHERE (predicate condition must be equality-based)
JOIN (join condition must be equality-based)

Inapplicable Operations¶

Subqueries requiring a temporary view to be created
GROUP BY
UNION
INTERSECT
EXCEPT
CREATE TABLE...AS

Store-Only Columns¶

Columns marked store-only are only stored on disk, not in memory. Because they are not available for processing, only data extraction operations can be applied to them.

Inapplicable Operations¶

Scalar or Aggregate Column Functions
Subqueries requiring a temporary view to be created
WHERE
GROUP BY
JOIN
UNION
INTERSECT
EXCEPT
CREATE TABLE...AS

Correlated Subqueries¶

Correlated subqueries have the following limitations:

They cannot reference grouping columns in the parent query
They cannot reference tables beyond the immediate outer query; i.e., a table cannot be referenced in a correlated subquery that is two or more levels of nesting deeper than it is
They cannot contain disjunctive conditions
They cannot be part of an OUTER JOIN ON clause condition

Not Yet Supported SQL¶

Windowing Features
- Frame Exclusion (EXCLUDE)
Ranking Functions
- PERCENTILE_DISC
- PERCENTILE_CONT
Writing CSV files

Reserved Words¶

For a list of the reserved words within Kinetica see SQL Reserved Word List.

Table Of Contents

SQL Support¶

Query¶

Tableless Query¶

Join¶

ASOF¶

Examples¶

Aggregation¶

Grouping¶

ROLLUP¶

CUBE¶

GROUPING SETS¶

Window¶

PIVOT¶

UNPIVOT¶

Set Operations¶

UNION¶

INTERSECT¶

EXCEPT¶

WITH (Common Table Expressions)¶

Iteration¶

Constants¶

Numeric Constants¶

String-Based Constants¶

Binary Constants¶

Date/Time Constants¶

Expressions¶

Conditional Functions¶

CASE¶

Conversion Functions¶

Date/Time Functions¶

Date/Time Base Functions¶

Date/Time Complex Conversion Functions¶

Geospatial/Geometry Functions¶

Enhanced Performance Scalar Functions¶

Scalar Functions¶

Aggregation Functions¶

Math Functions¶

Null Functions¶

String Functions¶

User/Security Functions¶

Aggregation Functions¶

Grouping Functions¶

Distribution Functions¶

Sharding Example¶

Replication Example¶

Predicates¶

Predicate Operators¶

Predicate Clauses¶

Compound Predicate Operators¶

Subqueries¶

Non-Correlated Subqueries¶

Correlated Subqueries¶

Hints¶

EXPLAIN¶

Data Definition (DDL)¶

Column Types¶

Column Properties¶

CREATE SCHEMA¶

ALTER SCHEMA¶

Rename Schema¶

Set Protection Mode¶

Set TTL¶

DROP SCHEMA¶

CREATE TABLE¶

Partition Clause¶

Range Partitioning¶

Interval Partitioning¶

List Partitioning¶

Manual¶

Automatic¶

Hash Partitioning¶

Tier Strategy Clause¶

Index Clause¶

Option Clause¶

CREATE TABLE...AS¶

ALTER TABLE¶

Rename Table¶

Move Table¶

Set Access Mode¶