Kinetica has broad support for the SQL-92 standard through its ODBC connector interface. For details on installation, configuration, logging, & use, see the ODBC/JDBC Connector Guide section.
Name Resolution
Any table, view, or SQL procedure can be addressed using a qualified name, by prefixing the name of the object with the name of its containing schema, separated by a dot; e.g.:
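```sql
-- Hypothetical schema & table names
SELECT * FROM example_schema.example_table
```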
If these objects are referenced without a schema, they will be looked up in the user's default schema, if one has been assigned.
The default schema can be overridden within an ODBC/JDBC session by issuing the SET CURRENT SCHEMA command. Once set, the new schema will be used to resolve unqualified names; the user's default schema will be ignored for the purposes of name resolution.
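For instance, to resolve unqualified names against a hypothetical schema, example_schema, for the remainder of the session:

```sql
SET CURRENT SCHEMA example_schema
```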
Naming Criteria
Each table and view is identified by a name, which must meet the standard naming criteria.
However, in SQL, a pound sign used at the beginning of a table/view name will automatically be replaced by the current user's username, followed by an underscore. This provides a shorthand way for a user to create & reference namespaced tables within a common schema.
For instance, if a user, auser, issues the following query:
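```sql
-- Hypothetical table name; the leading # is replaced with the user's name
SELECT * FROM example_schema.#preferences
```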
The query will be translated into this and executed:
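```sql
SELECT * FROM example_schema.auser_preferences
```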
Query
The basic form of the supported SELECT statement is:
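```
SELECT [TOP <n>] <column expression list>
FROM <table name>[, <table name>...]
[WHERE <filter expression list>]
[GROUP BY <grouping expression list>]
[HAVING <having expression list>]
[ORDER BY <ordering expression list>]
[LIMIT [<offset>,] <num rows>]
```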
Note
The * can be used to specify all columns in the column expression list, while <table name>.* can be used to specify all columns from the given table.
Columns can be double-quoted to use reserved words as column names, e.g., "PERCENT"; or to use special characters in column names, e.g., "key:value".
TOP <n> returns the first n records; n may be at most 20000 records by default, though this limit is configurable.
The grouping expression list may contain column names, aliases, expressions, or positions (e.g., GROUP BY 2 to aggregate on the 2nd column in the SELECT list).
The having expression list may contain grouping expressions or any grouping expression aliases defined in the SELECT list.
The ordering expression list may contain column names, aliases, expressions, or column positions (e.g., ORDER BY 2 to order by the 2nd column in the SELECT list). The default ordering is ASC. The default null ordering is NULLS FIRST when using ascending order and NULLS LAST when using descending order. The general format for each comma-separated ordering expression in the list is:
<column name/alias/expression/position> [ASC | DESC] [NULLS FIRST | NULLS LAST]
LIMIT applies paging to the result set, starting at the 0-based offset (if specified) and returning num rows records.
For example:
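```sql
-- Skip the first 10 records and return the next 20 (hypothetical table)
SELECT id, name
FROM example_table
ORDER BY id
LIMIT 10, 20
```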
Tableless Query
A query without a FROM clause can be used to return a single row of data containing a constant or expression.
For example, to select the current day of the week:
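```sql
SELECT DAYNAME(NOW()) AS day_of_week
```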
Note
A tableless query will create a result set backed by a replicated table, by default.
Join
The supported join types are:
- INNER - matching rows between two tables
- LEFT - matching rows between two tables, and rows in the left-hand table with no matching rows in the right-hand table
- RIGHT - matching rows between two tables, and rows in the right-hand table with no matching rows in the left-hand table
- FULL OUTER - matching rows between two tables, and rows in both tables with no matching rows in the other
- CROSS - all rows in one table matched against all rows in the other
There are two execution schemes that are used to process joins, depending on the distribution of the joined tables:
- Local - highly performant, but native join criteria must be met
- Distributed - highly flexible, as native join restrictions are lifted, but less performant due to interprocessor communication overhead
Important
Though the data distribution restrictions on native joins do not exist for joins made via SQL, following the join guidelines on sharding will result in much more performant queries.
Kinetica supports both JOIN...ON and WHERE clause syntax for inner joins; all outer join types (LEFT, RIGHT, & FULL OUTER) require JOIN...ON syntax.
For example, to list the name of each employee and the name of the employee's manager, using the WHERE clause to specify the join condition:
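```sql
-- Hypothetical self-referencing employee table (id, name, manager_id)
SELECT
    e.name AS employee_name,
    m.name AS manager_name
FROM
    example_schema.employee e,
    example_schema.employee m
WHERE
    e.manager_id = m.id
```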
To list the name of each employee and the associated manager, even for employees that don't have a manager, using the JOIN...ON syntax to specify the join condition:
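```sql
SELECT
    e.name AS employee_name,
    m.name AS manager_name
FROM example_schema.employee e
LEFT JOIN example_schema.employee m ON e.manager_id = m.id
```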
ASOF
Kinetica supports the notion of an inexact match join via the ASOF join function. This feature allows each left-side table record to be matched to a single right-side table record whose join column value is the smallest or largest value within a range relative to the left-side join column value. In the case where multiple right-side table records have the same smallest or largest value for a given left-side table record, only one of the right-side table records will be chosen and returned as part of the join.
The format of the ASOF function is as follows:
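```
ASOF(<left_column>, <right_column>, <rel_range_begin>, <rel_range_end>, MIN|MAX)
```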
The five parameters are:
- left_column - name of the column to join on from the left-side table
- right_column - name of the column to join on from the right-side table
- rel_range_begin - constant value defining the position, relative to each left-side column value, of the beginning of the range in which to match right-side column values; use a negative constant to begin the range before the left-side column value, or a positive one to begin after it
- rel_range_end - constant value defining the position, relative to each left-side column value, of the end of the range in which to match right-side column values; use a negative constant to end the range before the left-side column value, or a positive one to end after it
- MIN|MAX - use MIN to return the right-side matched record with the smallest join column value; use MAX to return the right-side matched record with the greatest join column value
Effectively, each matched right-side column value must be:
- >= <left-side column value> + rel_range_begin
- <= <left-side column value> + rel_range_end
Within the set of right-side matches for each left-side record, the one with the MIN or MAX column value will be returned in the join. In the case of a tie for the MIN or MAX column value, only one right-side record will be selected for return in the join for that left-side record.
Examples
The following ASOF call might be used to list, for each flight arrival time, the soonest flight departure time that occurs between half an hour and an hour and a half after the arrival; effectively, the time-matching portion of a connecting flight query:
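```sql
-- Hypothetical flight tables & columns; matches each arrival to the soonest
-- departure between 30 & 90 minutes later
SELECT
    a.arrival_time,
    d.departure_time
FROM flight_arrival a
JOIN flight_departure d
    ON ASOF(a.arrival_time, d.departure_time, INTERVAL '30' MINUTE, INTERVAL '90' MINUTE, MIN)
```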
This ASOF call returns right-side locations that are nearest eastward to each left-side location, for locations within 5 degrees of the left-side:
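```sql
-- Hypothetical location tables; nearest right-side longitude within
-- 5 degrees east of each left-side longitude
SELECT
    l.id,
    l.lon AS left_lon,
    r.lon AS right_lon
FROM left_location l
JOIN right_location r
    ON ASOF(l.lon, r.lon, 0, 5, MIN)
```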
For example, to match a set of stock trades to the opening prices for those stocks (if an opening price record exists within 24 hours prior to the trade), and to include trades for which there is no opening stock price record:
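```sql
-- Hypothetical trade & quote tables; each trade matched to the latest
-- opening price within the prior 24 hours, keeping unmatched trades
SELECT
    t.ticker,
    t.trade_time,
    t.price,
    q.open_price
FROM trade t
LEFT JOIN quote q
    ON t.ticker = q.ticker
    AND ASOF(t.trade_time, q.open_time, INTERVAL '-24' HOUR, INTERVAL '0' HOUR, MAX)
```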
While the ASOF join function can only be used as part of a join, it can effectively be made into a filter condition by sub-selecting the filter criteria in the FROM clause and joining on that criteria.
For instance, to look up the stock price for a given company as of a given date:
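```sql
-- Hypothetical stock price tables; the subquery supplies the filter criteria,
-- and the hint is given as a SQL comment
/* KI_HINT_NO_LATE_MATERIALIZATION */
SELECT s.ticker, s.quote_date, s.price
FROM
(
    SELECT 'EXAMPLE' AS ticker, DATE('2020-01-06') AS as_of_date
) f
JOIN stock_price s
    ON f.ticker = s.ticker
    AND ASOF(f.as_of_date, s.quote_date, INTERVAL '-7' DAY, INTERVAL '0' DAY, MAX)
```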
Important
The use of the KI_HINT_NO_LATE_MATERIALIZATION hint is key here, as the join requires a materialized table to succeed, which this hint ensures.
Aggregation
The GROUP BY clause can be used to segment data into groups and apply aggregate functions over the values within each group. Aggregation functions applied to data without a GROUP BY clause will be applied over the entire result set.
Note
GROUP BY can operate on columns, column expressions, column aliases, or the position of a member of the SELECT clause (where 1 is the first element).
For example, to find the average cab fare from the taxi data set:
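```sql
-- Hypothetical taxi trip table & fare column
SELECT AVG(fare_amount) AS avg_fare
FROM taxi_trip
```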
To find the minimum, maximum, & average trip distances, as well as the average passenger count for each vendor per year from the taxi data set (weeding out data with errant trip distances):
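```sql
-- Hypothetical column names for the taxi data set
SELECT
    vendor_id,
    YEAR(pickup_datetime) AS trip_year,
    MIN(trip_distance) AS min_distance,
    MAX(trip_distance) AS max_distance,
    AVG(trip_distance) AS avg_distance,
    AVG(passenger_count) AS avg_passengers
FROM taxi_trip
WHERE trip_distance > 0 AND trip_distance < 100
GROUP BY vendor_id, trip_year
```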
Grouping
The GROUP BY clause can also be used to apply the following grouping functions over the values within each group: ROLLUP, CUBE, and GROUPING SETS.
With each of these, the GROUPING() aggregate function can be used to distinguish aggregated null values in the data from null values generated by the ROLLUP, CUBE, or GROUPING SETS grouping function.
For instance, the following CASE will turn the aggregated null values in the Sector column into an <UNKNOWN SECTOR> group and the null value generated by the grouping function into an <ALL SECTORS> group:
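```sql
-- Sketch, assuming a Sector grouping column
CASE
    WHEN (GROUPING(Sector) = 1) THEN '<ALL SECTORS>'
    WHEN (Sector IS NULL) THEN '<UNKNOWN SECTOR>'
    ELSE Sector
END AS sector_group
```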
ROLLUP
The ROLLUP(expr list) function calculates n + 1 aggregates for the n columns in expr list.
For example, the following query will aggregate the average opening stock price for these groups:
- Each market sector & stock symbol pair
- Each market sector
- All sectors and symbols
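```sql
-- Hypothetical stock table (Sector, Symbol, open_price)
SELECT
    Sector,
    Symbol,
    AVG(open_price) AS avg_open
FROM stock
GROUP BY ROLLUP(Sector, Symbol)
```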
CUBE
The CUBE(expr list) function calculates 2^n aggregates for the n columns in expr list.
For example, the following query will aggregate the average opening stock price for these groups:
- Each market sector & stock symbol pair
- Each market sector
- Each stock symbol
- All sectors and symbols
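```sql
SELECT
    Sector,
    Symbol,
    AVG(open_price) AS avg_open
FROM stock
GROUP BY CUBE(Sector, Symbol)
```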
GROUPING SETS
The GROUPING SETS(expr list) function calculates aggregates for each group of columns in expr list.
For example, the following query will aggregate the average opening stock price for these groups:
- Each market sector
- Each stock symbol
- All sectors and symbols
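```sql
SELECT
    Sector,
    Symbol,
    AVG(open_price) AS avg_open
FROM stock
GROUP BY GROUPING SETS((Sector), (Symbol), ())
```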
Window
Window functions are available through the use of the OVER clause, which can partition rows into frames. Different types of functions can be used to aggregate data over a sliding window.
The basic form for a window is:
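```
<window function> OVER (
    [PARTITION BY <column expression list>]
    [ORDER BY <ordering expression list>]
    [RANGE | ROWS <frame clause>]
)
```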
Note
- If the PARTITION BY clause is not specified, the window function will be computed over the entire data set.
- The ORDER BY clause is not required when the window function is either FIRST_VALUE() or LAST_VALUE(). These two functions are also the only ranking functions that can contain a RANGE or ROWS frame clause.
The FIRST_VALUE(), LAST_VALUE(), LEAD(), and LAG() window functions support ignoring or respecting null values in the window calculation using the IGNORE NULLS or RESPECT NULLS syntax respectively. The general format for applying this additional syntax is:
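```
-- Placement of the NULLS qualifier within the call is a sketch
<window function>(<expression> [IGNORE NULLS | RESPECT NULLS]) OVER (<window specification>)
```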
The default ordering of records within each partition is ASC. The default null ordering is NULLS FIRST when using ascending order and NULLS LAST when using descending order. The general format for each comma-separated ordering expression in the list is:
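```
<column name/alias/expression/position> [ASC | DESC] [NULLS FIRST | NULLS LAST]
```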
Note
Only one column can be specified in the ordering expression list when using RANGE. When using ROWS, the frame is applied after any ordering; so, while several columns may appear in the order expression list, there will be only one ROWS clause following the list.
When a RANGE frame is specified, CURRENT ROW includes all peer rows (rows with the same ordering values). Thus, when the first of a set of peer rows is encountered, all associated peer rows are included in the frame (not just the first one).
In contrast, when a ROWS frame is specified, CURRENT ROW will direct that only the peer rows up to and including the current row are contained within the frame--the following peer rows will not be included.
The default frame type is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
If a frame clause is specified without a BETWEEN, the clause is applied to the frame start; the frame end will still be the default of CURRENT ROW.
For example, to calculate the rolling sum of total amounts collected by each taxi vendor over the course of a given day, as well as the number of other trips that occurred within 5 minutes of each trip:
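```sql
-- Hypothetical taxi trip columns; the RANGE frame is sized in milliseconds
-- (300000 ms = 5 minutes), ordering on the trip's epoch timestamp
SELECT
    vendor_id,
    pickup_datetime,
    total_amount,
    SUM(total_amount) OVER (
        PARTITION BY vendor_id
        ORDER BY pickup_datetime
    ) AS rolling_total,
    COUNT(*) OVER (
        PARTITION BY vendor_id
        ORDER BY TIMESTAMP(pickup_datetime)
        RANGE BETWEEN 300000 PRECEDING AND 300000 FOLLOWING
    ) - 1 AS nearby_trips
FROM taxi_trip
WHERE pickup_datetime >= '2015-01-01' AND pickup_datetime < '2015-01-02'
```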
To calculate a 5-before and 10-after moving average of 4-passenger trip distances per vendor over the course of a given day:
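```sql
SELECT
    vendor_id,
    pickup_datetime,
    trip_distance,
    AVG(trip_distance) OVER (
        PARTITION BY vendor_id
        ORDER BY pickup_datetime
        ROWS BETWEEN 5 PRECEDING AND 10 FOLLOWING
    ) AS moving_avg_distance
FROM taxi_trip
WHERE passenger_count = 4
    AND pickup_datetime >= '2015-01-01' AND pickup_datetime < '2015-01-02'
```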
To rank, by vendor, the total amounts collected from 3-passenger trips on a given day:
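```sql
SELECT
    vendor_id,
    total_amount,
    RANK() OVER (
        PARTITION BY vendor_id
        ORDER BY total_amount DESC
    ) AS total_amount_rank
FROM taxi_trip
WHERE passenger_count = 3
    AND pickup_datetime >= '2015-01-01' AND pickup_datetime < '2015-01-02'
```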
To compare each trip's total amount to the lowest (ignoring nulls), highest (ignoring nulls), & average total amount for 5-passenger trips for each vendor over the course of a given day:
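```sql
-- NULLS-qualifier placement and column names are illustrative; the unbounded
-- frame makes LAST_VALUE consider the whole partition
SELECT
    vendor_id,
    total_amount,
    FIRST_VALUE(total_amount IGNORE NULLS) OVER (
        PARTITION BY vendor_id
        ORDER BY total_amount
        RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS lowest_amount,
    LAST_VALUE(total_amount IGNORE NULLS) OVER (
        PARTITION BY vendor_id
        ORDER BY total_amount
        RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS highest_amount,
    AVG(total_amount) OVER (PARTITION BY vendor_id) AS average_amount
FROM taxi_trip
WHERE passenger_count = 5
    AND pickup_datetime >= '2015-01-01' AND pickup_datetime < '2015-01-02'
```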
To compare each vendor's average total amount to their average total amount within the interquartile range:
|
|
PIVOT
The PIVOT clause can be used to pivot columns, "rotating" column values into multiple columns (one for each value), creating wider and shorter denormalized tables from longer, more normalized tables.
The basic form for a pivot is:
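```
SELECT <column list>
FROM <source table or subquery>
PIVOT
(
    <aggregate expression> [AS <alias>][, ...]
    FOR <pivot column> IN (<pivot value list>)
)
```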
For example, given a source table phone_number, which lists each phone number for a customer as a separate record in the table, a pivot operation can be performed, creating a single record per customer with the home, work, & cell phone numbers as separate columns.
With this data:
|
|
The following pivot operation can be applied:
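```sql
-- Hypothetical column names for the phone_number table
SELECT *
FROM phone_number
PIVOT
(
    MAX(phone) AS phone
    FOR phone_type IN ('Home', 'Work', 'Cell')
)
```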
The data will be pivoted into a table like this:
|
|
UNPIVOT
The UNPIVOT clause can be used to unpivot columns, "rotating" row values into column values, creating longer, more normalized tables from shorter, more denormalized tables.
The basic form for an unpivot is:
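```
SELECT <column list>
FROM <source table or subquery>
UNPIVOT
(
    <value column> FOR <key column> IN (<source column list>)
)
```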
For example, given a source table customer_contact, which lists the home, work, & cell phone numbers for each customer in the table, an unpivot operation can be performed, creating separate home, work, & cell phone records for each customer.
With this data:
|
|
The following unpivot operation can be applied:
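```sql
-- Hypothetical column names for the customer_contact table; the source
-- columns are aliased to the desired unpivot key values
SELECT name, phone_type, phone_number
FROM
(
    SELECT
        name,
        home_phone AS "Home",
        work_phone AS "Work",
        cell_phone AS "Cell"
    FROM customer_contact
)
UNPIVOT
(
    phone_number FOR phone_type IN ("Home", "Work", "Cell")
)
```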
The data will be unpivoted into a table like this:
|
|
Note
If the original column names can be used as the values of the unpivot key, as is, the pre-selection and renaming of those columns using a subquery in the FROM clause can be eliminated.
For example, unpivoting without aliasing the quarterly grade columns will result in those exact column names being used as the quarter values:
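```sql
-- Hypothetical student grade table; the q1...q4 column names themselves
-- become the quarter values
SELECT student, quarter, grade
FROM student_grade
UNPIVOT
(
    grade FOR quarter IN (q1, q2, q3, q4)
)
```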
Set Operations
There are three types of supported set operations, each having the option of returning duplicate records in the result set by using the keyword ALL:
- UNION [ALL] - return all records from both source data sets
- INTERSECT [ALL] - return only records that exist in both source data sets
- EXCEPT [ALL] - return all records that exist in the first data set, but not in the second
UNION
The UNION set operator creates a single list of records from the results of two SELECT statements. Use the ALL keyword to keep all records from both sets; omit it to remove duplicate records and form a single list of records unique between the two sets. See Limitations and Cautions for limitations.
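```
<SELECT statement>
UNION [ALL]
<SELECT statement>
```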
For example, given a table of lunch menu items and another table of dinner menu items, a UNION can be used to return all unique lunch & dinner menu items together, including items that are the same on both menus, but of a different price:
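```sql
-- Hypothetical menu tables
SELECT item_name, price FROM lunch_menu
UNION
SELECT item_name, price FROM dinner_menu
```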
Note
Since the example includes price and all columns selected must match between the two sets for an item to be considered a duplicate, a lunch item that is priced differently as a dinner item would also appear in the result set.
A UNION ALL can be used to return all lunch & dinner menu items together, including duplicates:
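```sql
SELECT item_name, price FROM lunch_menu
UNION ALL
SELECT item_name, price FROM dinner_menu
```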
INTERSECT
The INTERSECT set operator creates a single list of records that exist in both of the result sets from two SELECT statements. Use the ALL keyword to keep duplicate records that exist in both sets; omit it to remove duplicate records and form a single list of records that exist in both sets. See Limitations for limitations.
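```
<SELECT statement>
INTERSECT [ALL]
<SELECT statement>
```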
For example, given a table of lunch menu items and another table of dinner menu items, an INTERSECT can be used to return all lunch menu items (excluding duplicates) that are also dinner items for the same price:
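```sql
SELECT item_name, price FROM lunch_menu
INTERSECT
SELECT item_name, price FROM dinner_menu
```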
Note
Since the example includes price and all columns selected must match between the two sets for an item to be included, a lunch item that is priced differently as a dinner item would not appear in the result set.
EXCEPT
The EXCEPT set operator performs set subtraction, creating a single list of records that exist in the first SELECT statement's result set, but not in the second SELECT statement's result set. Use the ALL keyword to keep duplicate records that exist in the first set but not in the second; omit it to remove duplicate records and form a single list of records that exist in the first set but not the second. See Limitations for limitations:
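```
<SELECT statement>
EXCEPT [ALL]
<SELECT statement>
```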
For example, given a table of lunch menu items and another table of dinner menu items, an EXCEPT can be used to return all lunch menu items (excluding duplicates) that are not also dinner items for the same price:
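```sql
SELECT item_name, price FROM lunch_menu
EXCEPT
SELECT item_name, price FROM dinner_menu
```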
Note
Since the example includes price and all columns selected must match between the two sets for an item to be eliminated, a lunch item that is priced differently as a dinner item would still appear in the result set.
WITH (Common Table Expressions)
The WITH operation, also known as a Common Table Expression (CTE), creates a set of data that can be assigned table & column aliases and used one or more times in subsequent operations. The aliased set can be used within the SELECT, FROM, or WHERE clauses of a subsequent query or a subsequent CTE within the same WITH operation.
Recursive WITH operations are not supported--the aliased set cannot refer to itself. The column aliases must be unique within the WITH statement--no other column or column alias can share the same name. Also, when used in a FROM clause and given a table alias, the table alias must be preceded with AS.
A CTE can be made available to a DML or DDL statement by having the WITH statement follow the CREATE TABLE...AS, INSERT, UPDATE, or DELETE statement (not precede it).
Each CTE definition within a WITH statement is structured as follows:
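```
<cte name> [(<column alias list>)] AS (<SELECT statement>)
```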
Here, <cte name> is the table alias given to the data set returned by the CTE, and <column alias list> is the list of aliases assigned to the columns that the CTE returns. Each column alias is matched to each column returned in the order that each were defined; e.g., for column aliases (A, B, C) used with a CTE which performs a SELECT x, y, z ..., alias A will reference column x, B will reference y, and C will reference z. If no aliases are used, the names of the source columns themselves can be used to reference the data set returned by the CTE.
Each WITH statement can contain one or more CTE definitions, followed by a SELECT statement, as shown here:
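```
WITH
    <cte name> [(<column alias list>)] AS (<SELECT statement>),
    ...
    <cte name> [(<column alias list>)] AS (<SELECT statement>)
<SELECT statement>
```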
For example:
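```sql
-- Hypothetical employee table; note the AS preceding the CTE's table alias
WITH
    dept_avg (dept_name, avg_salary) AS
    (
        SELECT department, AVG(salary)
        FROM employee
        GROUP BY department
    )
SELECT
    e.name,
    e.salary,
    d.avg_salary
FROM employee e
JOIN dept_avg AS d ON e.department = d.dept_name
```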
To apply the CTE to an INSERT statement, follow the INSERT clause with the WITH clause:
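```sql
-- Hypothetical summary table
INSERT INTO dept_salary_summary
WITH
    dept_avg (dept_name, avg_salary) AS
    (
        SELECT department, AVG(salary)
        FROM employee
        GROUP BY department
    )
SELECT dept_name, avg_salary
FROM dept_avg
```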
Iteration
Kinetica supports iteration over each record within a data set for the purpose of creating a result set with 0 to N result records per record in the original set.
This iteration can be variable, based on some value within each record, or fixed, based on a given constant value.
The iteration is performed by joining against the virtual ITER table, as follows:
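```
-- The i column of the virtual ITER table is assumed to be the 0-based
-- iteration index
SELECT <column list>
FROM <table> [AS] <table alias>
JOIN ITER [AS] <iter alias> ON <iter alias>.i < <column expression>
```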
The <column expression> can be replaced by a constant for fixed iteration.
For example, to extract all of the individual letters from a column of words, with one record per letter extracted (using variable iteration):
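```sql
-- Hypothetical word table; one output record per letter of each word
SELECT
    w.word,
    SUBSTR(w.word, it.i + 1, 1) AS letter
FROM word_list w
JOIN ITER it ON it.i < LENGTH(w.word)
```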
To duplicate the set of words five times (using fixed iteration):
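```sql
-- Five copies of each word
SELECT w.word
FROM word_list w
JOIN ITER it ON it.i < 5
```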
For more detail, examples, and limitations, see Iteration.
Constants
Each data type has an associated literal constant syntax, which can be used, for instance, to insert constant data values into those columns.
Numeric Constants
Integer and floating-point constants can be either single-quoted or unquoted.
For example:
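```sql
-- Both forms are accepted (hypothetical table & columns)
INSERT INTO example_table (int_col, double_col)
VALUES (1234, 56.78)

INSERT INTO example_table (int_col, double_col)
VALUES ('1234', '56.78')
```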
String-Based Constants
String-based data types should be single-quoted.
For example:
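```sql
INSERT INTO example_table (string_col)
VALUES ('a string constant')
```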
Binary Constants
Binary types can be represented in either of the following forms:
- single-quoted or unquoted base-10
- single-quoted hexadecimal
For example:
|
|
Date/Time Constants
Kinetica accepts native single-quoted date/time values, ANSI SQL date/time literals, and ODBC escape sequences in the following formats:
Data Type | Native | ANSI | ODBC |
---|---|---|---|
Date | 'YYYY-[M]M-[D]D' | DATE 'YYYY-MM-DD' | {d 'YYYY-MM-DD'} |
Time | '[H]H24:MI:SS.mmm' | TIME 'HH24:MI:SS.mmm' | {t 'HH24:MI:SS.mmm'} |
DateTime | 'YYYY-[M]M-[D]D[T or space][H]H24:MI:SS.mmm[Z]' | TIMESTAMP 'YYYY-MM-DD HH24:MI:SS.mmm' | {ts 'YYYY-MM-DD HH24:MI:SS.mmm'} |
Timestamp | 'YYYY-[M]M-[D]D [H]H24:MI:SS.mmm' | TIMESTAMP 'YYYY-MM-DD HH24:MI:SS.mmm' | {ts 'YYYY-MM-DD HH24:MI:SS.mmm'} |
For example:
- Native:
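```sql
INSERT INTO example_table (date_col, time_col, datetime_col)
VALUES ('2020-1-5', '10:30:00.123', '2020-1-5 10:30:00.123')
```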
- ANSI:
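```sql
INSERT INTO example_table (date_col, time_col, datetime_col)
VALUES (DATE '2020-01-05', TIME '10:30:00.123', TIMESTAMP '2020-01-05 10:30:00.123')
```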
- ODBC:
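```sql
INSERT INTO example_table (date_col, time_col, datetime_col)
VALUES ({d '2020-01-05'}, {t '10:30:00.123'}, {ts '2020-01-05 10:30:00.123'})
```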
Expressions
An expression can consist of a literal constant, a column name, or a function applied to a constant or column name. A compound expression is an operation or function applied to one or more expressions.
The following are the supported expression operators:
- + addition
- - subtraction
- * multiplication
- / division
- () grouping
- || string concatenation
Note
Use double quotes to specify column names in a case-sensitive manner.
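For instance, combining arithmetic operators with string concatenation (hypothetical columns):

```sql
SELECT
    (quantity * unit_price) + shipping_fee AS order_total,
    first_name || ' ' || last_name AS full_name
FROM example_order
```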
Conditional Functions
Conditional functions are subject to short-circuiting to aid in error-checking.
Function | Description |
---|---|
DECODE(expr, match_a, value_a, ..., match_N, value_N[, unmatched_value]) | Evaluates expr: returns the first value whose corresponding match is equal to expr; returns the optional unmatched_value (or null), if no match is found |
IF(expr, value_if_true, value_if_false) | Evaluates expr: if true, returns value_if_true; otherwise, if false or null, returns value_if_false |
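For example, using hypothetical columns:

```sql
SELECT
    DECODE(status_code, 1, 'Open', 2, 'Closed', 'Unknown') AS status,
    IF(price > 100, 'premium', 'standard') AS price_band
FROM example_table
```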
CASE
The CASE statement acts as a scalar function, but has two more complex forms. Note that for each of these CASE statements, the value expressions must all be of the same or convertible data type.
In the first form, each WHEN is followed by a conditional expression whose corresponding THEN expression will have its value returned, if true. Control will continue through each WHEN until a match is found and the corresponding value returned; if no match is found, the value of the ELSE expression will be returned, or null, if no ELSE clause exists.
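```
CASE
    WHEN <condition> THEN <value expression>
    ...
    [ELSE <value expression>]
END
```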
In the second form, the CASE expression is evaluated. A match of that result will be attempted against each WHEN expression until a match is found and the value of the corresponding THEN expression returned; if no match is found, the value of the ELSE expression will be returned, or null, if no ELSE clause exists.
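```
CASE <expression>
    WHEN <match expression> THEN <value expression>
    ...
    [ELSE <value expression>]
END
```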
Note
The second form is better optimized than the first.
Examples:
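```sql
-- Searched form (hypothetical columns)
SELECT
    CASE
        WHEN price < 10 THEN 'inexpensive'
        WHEN price < 50 THEN 'moderate'
        ELSE 'expensive'
    END AS price_category
FROM menu_item

-- Simple form
SELECT
    CASE type_code
        WHEN 1 THEN 'lunch'
        WHEN 2 THEN 'dinner'
        ELSE 'other'
    END AS menu_type
FROM menu_item
```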
Conversion Functions
Function | Description |
---|---|
CAST(expr AS [SQL_]<conv_type>) or CONVERT(expr, [SQL_]<conv_type>) | Converts expr into the conv_type data type. Note: when using the SQL_ prefix, UNSIGNED BIGINT becomes SQL_UNSIGNED_BIGINT |
CHAR(expr) | Returns the character associated with the ASCII code given in expr |
CHAR1(expr) | Converts the given expr to VARCHAR(1) type |
CHAR2(expr) | Converts the given expr to VARCHAR(2) type |
CHAR4(expr) | Converts the given expr to VARCHAR(4) type |
CHAR8(expr) | Converts the given expr to VARCHAR(8) type |
CHAR16(expr) | Converts the given expr to VARCHAR(16) type |
CHAR32(expr) | Converts the given expr to VARCHAR(32) type |
CHAR64(expr) | Converts the given expr to VARCHAR(64) type |
CHAR128(expr) | Converts the given expr to VARCHAR(128) type |
CHAR256(expr) | Converts the given expr to VARCHAR(256) type |
DATE(expr) | Converts expr to date (YYYY-MM-DD) format |
DATETIME(expr) | Converts expr to datetime (YYYY-MM-DD HH24:MI:SS.mmm) format |
DECIMAL(expr) | Converts the given expr to DECIMAL type |
DOUBLE(expr) | Converts the given expr to DOUBLE type |
FLOAT(expr) | Converts the given expr to REAL type |
INT(expr) | Converts the given expr to INTEGER type |
LONG(expr) | Converts the given expr to BIGINT type |
TIME(expr) | Converts expr to time (HH24:MI:SS) format |
TIMESTAMP(expr) | Converts expr to the number of milliseconds since the epoch |
TO_CHAR(expr, format) | Converts the given date/time expr to a string matching the given format; the format can be a string literal or expression; arbitrary text can be injected into the format string using double-quotes; the returned string will be truncated at 32 characters |
ULONG(expr) | Converts the given expr to UNSIGNED BIGINT type |
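For example, a few conversions using literal values:

```sql
SELECT
    CAST('34.5' AS DECIMAL) AS decimal_value,
    INT(34.5) AS int_value,
    CHAR8('example text') AS char8_value,
    TIMESTAMP('2020-01-05 10:30:00') AS msecs_value
```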
Date/Time Functions
This section comprises the following functions:
- Date/Time Base Functions, which can extract parts of date/time expressions, convert back and forth between data types, and return the current date/time
- Date/Time Complex Conversion Functions, which can perform more complex date/type conversions
Date/Time Base Functions
Function | Description |
---|---|
CURRENT_DATE() | Returns the date as YYYY-MM-DD |
CURRENT_DATETIME() | Returns the date & time as YYYY-MM-DD HH24:MI:SS.mmm |
CURRENT_TIME() | Returns the time as HH24:MI:SS.mmm |
CURRENT_TIMESTAMP() | Returns the date & time as YYYY-MM-DD HH24:MI:SS.mmm; to return the date & time as the number of milliseconds since the epoch, pass the result of this function to LONG() |
DATEADD(unit, amount, expr) | Adds the positive or negative integral amount of unit date/time intervals to the date/time in expr. Note: any of the supported unit types can have a SQL_TSI_ prefix prepended to them; e.g., both DAY and SQL_TSI_DAY are valid unit types for specifying a day interval |
DATEDIFF(expr_begin, expr_end) | Determines the difference between two dates, irrespective of time component, as the number of days when expr_begin is subtracted from expr_end; returns a negative number of days if expr_begin occurs after expr_end |
DAY(expr) | Alias for DAYOFMONTH(expr) |
DAYNAME(expr) | Extracts the day of the week from expr and converts it to the corresponding day name [Sunday - Saturday] |
DAYOFMONTH(expr) | Extracts the day of the month from expr [1 - 31] |
DAYOFWEEK(expr) | Extracts the day of the week from expr [1 - 7] |
DAY_OF_WEEK(expr) | Alias for DAYOFWEEK(expr) |
DAYOFYEAR(expr) | Extracts the day of the year from expr [1 - 366] |
DAY_OF_YEAR(expr) | Alias for DAYOFYEAR(expr) |
HOUR(expr) | Extracts the hour of the day from expr [0 - 23] |
<expr> + INTERVAL '<amount>' <part> or <expr> - INTERVAL '<amount>' <part> | Adds to or subtracts from the date/time expr the integral amount units of type part; this mirrors the behavior of the TIMESTAMPADD function, only with a different format and different date/time part constants |
LAST_DAY(date) | Returns the date of the last day of the month in the given date |
MINUTE(expr) | Extracts the minute of the hour from expr [0 - 59] |
MONTH(expr) | Extracts the month of the year from expr [1 - 12] |
MONTHNAME(expr) | Extracts the month of the year from expr and converts it to the corresponding month name [January - December] |
MSEC(expr) | Extracts the millisecond of the second from expr [0 - 999] |
NEXT_DAY(date, day_of_week) | Returns the date of the next day of the week, provided as a day name in day_of_week, that occurs after the given date; e.g., given that 2000-10-10 is a Tuesday, NEXT_DAY('2000-10-10', 'Friday') returns 2000-10-13 |
NOW() | Alias for CURRENT_DATETIME() |
QUARTER(expr) | Extracts the quarter of the year from expr [1 - 4] |
SECOND(expr) | Extracts the second of the minute from expr [0 - 59] |
SEC(expr) | Alias for SECOND(expr) |
TIMESTAMPADD(unit, amount, expr) | Adds the positive or negative integral amount of unit date/time intervals to the date/time in expr. Note: any of the supported unit types can have a SQL_TSI_ prefix prepended to them; e.g., both DAY and SQL_TSI_DAY are valid unit types for specifying a day interval |
TIMESTAMPDIFF(unit, begin, end) | Calculates the difference between two date/time expressions, returning the result as an integral difference in the units specified; more precisely, how many whole date/time intervals of type unit need to be added to (or subtracted from) begin to equal end (or get as close as possible without going past it), using the unit types and rules specified in TIMESTAMPADD. Note: this is not symmetric with TIMESTAMPADD in all cases, as adding 1 MONTH to Mar 31st results in Apr 30th, but the TIMESTAMPDIFF in MONTH units between those two dates is 0 |
WEEK(expr) | Extracts the week of the year from expr [1 - 54]; each full week starts on Sunday (a 1 is returned for the week containing Jan 1st) |
YEAR(expr) | Extracts the year from expr; 4-digit year, A.D. |
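For example, a few of these functions applied to literal dates (results shown in comments; 2000-10-10 is a Tuesday):

```sql
SELECT
    DAYNAME(DATE('2000-10-10')) AS day_name,                    -- Tuesday
    DATEDIFF(DATE('2000-02-01'), DATE('2000-03-01')) AS days,   -- 29
    TIMESTAMPADD(DAY, 30, DATE('2000-02-01')) AS plus_30_days,  -- 2000-03-02
    DATE('2000-02-01') + INTERVAL '1' MONTH AS plus_1_month     -- 2000-03-01
```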
Date/Time Complex Conversion Functions
Function | Description |
---|---|
DATE_TO_EPOCH_MSECS(year, month, day, hour, min, sec, msec) | Converts the full date to milliseconds since the epoch; negative values are accepted |
DATE_TO_EPOCH_SECS(year, month, day, hour, min, sec) | Converts the full date to seconds since the epoch; negative values are accepted |
MSECS_SINCE_EPOCH(timestamp) | Converts the timestamp to milliseconds since the epoch |
TIMESTAMP_FROM_DATE_TIME(date, time) | Converts the given date and time to a composite date/time format |
WEEK_TO_EPOCH_MSECS(year, week_number) | Converts the year and week number to milliseconds since the epoch; negative values are accepted |
WEEK_TO_EPOCH_SECS(year, week_number) | Converts the year and week number to seconds since the epoch; negative values are accepted; each new week begins Sunday at midnight |
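For example (result shown in a comment):

```sql
SELECT DATE_TO_EPOCH_SECS(2000, 1, 1, 0, 0, 0) AS epoch_secs  -- 946684800
```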
Error-Checking Functions
Errors will only be successfully caught if the expressions are floating-point types (i.e., double and float).
Function | Description |
---|---|
IFERROR(expr, val) | Alias for IF_ERROR(expr, val) |
IFINF(expr, val) | Alias for IF_INF(expr, val) |
IFINFINITY(expr, val) | Alias for IF_INF(expr, val) |
IFNAN(expr, val) | Alias for IF_NAN(expr, val) |
IF_ERROR(expr, val) | Evaluates the given expr and, if it resolves to infinity or NaN, returns val; conceptually, this function is the same as IF_INF(IF_NAN(expr, val), val) |
IF_INF(expr, val) | Evaluates the given expr and, if it resolves to infinity, returns val |
IF_INFINITY(expr, val) | Alias for IF_INF(expr, val) |
IF_NAN(expr, val) | Evaluates the given expr and, if it resolves to NaN, returns val |
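For example, to guard a floating-point division that can produce infinity or NaN (hypothetical columns):

```sql
SELECT IF_ERROR(total_amount / DOUBLE(passenger_count), 0.0) AS amount_per_passenger
FROM taxi_trip
```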
Geospatial/Geometry Functions
Tip
- Use ST_ISVALID to determine if a geometry object is valid. The functions below work best with valid geometry objects.
- Use the REMOVE_NULLABLE function to remove any nullable column types that could result from calculating a derived column (e.g., as in Projections) using one of the functions below.
Enhanced Performance Scalar Functions
The functions below all compare x and y coordinates to geometry objects (or vice versa), which increases their performance in queries relative to comparing two geometry objects. Each of these functions has a geometry-to-geometry version listed in the next section.
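For example, to count the point events falling within a hypothetical WKT zone boundary:

```sql
SELECT COUNT(*) AS events_in_zone
FROM event e
JOIN zone z ON STXY_CONTAINS(z.boundary_wkt, e.x, e.y)
```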
Function | Description |
---|---|
STXY_CONTAINS(geom, x, y) | Returns 1 (true) if geom contains the x and y coordinate, i.e., the coordinate lies in the interior of geom. The coordinate cannot be on the boundary and also be contained, because geom does not contain its boundary |
STXY_CONTAINSPROPERLY(geom, x, y) | Returns 1 (true) if the x and y coordinate intersects the interior of geom but not the boundary (or exterior) because geom does not contain its boundary but does contain itself |
STXY_COVEREDBY(x, y, geom) | Returns 1 (true) if the x and y coordinate is covered by geom |
STXY_COVERS(geom, x, y) | Returns 1 (true) if geom covers the x and y coordinate |
STXY_DISJOINT(x, y, geom) | Returns 1 (true) if the given x and y coordinate and the geometry geom do not spatially intersect. |
STXY_DISTANCE(x, y, geom[, solution]) | Calculates the minimum distance between the given x and y coordinate and geom using the specified solution type. Solution types available:
Note: If the x and y coordinate and geom intersect (verify using ST_INTERSECTS), the distance will always be 0. |
STXY_DWITHIN(x, y, geom, distance[, solution]) | Returns 1 (true) if the x and y coordinate is within the specified distance from geom using the specified solution type. Solution types available:
|
STXY_ENVDWITHIN(x, y, geom, distance[, solution]) | Returns 1 (true) if the x and y coordinate is within the specified distance from the bounding box of geom using the specified solution type. Solution types available:
|
STXY_ENVINTERSECTS(x, y, geom) | Returns 1 (true) if the bounding box of the given geometry geom intersects the x and y coordinate. |
STXY_INTERSECTION(x, y, geom) | Returns the shared portion between the x and y coordinate and the given geometry geom, i.e. the point itself. |
STXY_INTERSECTS(x, y, geom) | Returns 1 (true) if the x and y coordinate and geom intersect in 2-D. |
STXY_TOUCHES(x, y, geom) | Returns 1 (true) if the x and y coordinate and geometry geom have at least one point in common but their interiors do not intersect. If geom is a GEOMETRYCOLLECTION, a 0 is returned regardless if the point and geometry touch |
STXY_WITHIN(x, y, geom) | Returns 1 (true) if the x and y coordinate is completely inside the geom geometry i.e., not on the boundary |
Scalar Functions
Function | Description |
---|---|
DIST(x1, y1, x2, y2) | Computes the Euclidean distance (in degrees), i.e. SQRT( (x1-x2)*(x1-x2) + (y1-y2)*(y1-y2) ). |
GEODIST(lon1, lat1, lon2, lat2) | Computes the geographic great-circle distance (in meters) between two lat/lon points. |
GEOHASH_DECODE_LATITUDE(geohash) | Decodes a given geohash and returns the latitude value for the given hash string. Supports a maximum geohash character length of 16. |
GEOHASH_DECODE_LONGITUDE(geohash) | Decodes a given geohash and returns the longitude value for the given hash string. Supports a maximum geohash character length of 16. |
GEOHASH_ENCODE(lat, lon, precision) | Encodes a given coordinate pair and returns a hash string with a given precision. |
ST_ADDPOINT(linestring, point, position) | Adds a given point geometry to the given linestring geometry at the specified position, which is a 0-based index. |
ST_ALMOSTEQUALS(geom1, geom2, decimal) | Returns 1 (true) if given geometries, geom1 and geom2, are almost spatially equal within the given amount of decimal scale. Note that geometries will still be considered equal if the decimal scale for the geometries is within a half order of magnitude of each other, e.g., if decimal is set to 2, then POINT(63.4 123.45) and POINT(63.4 123.454) are equal, but POINT(63.4 123.45) and POINT(63.4 123.459) are not equal. The geometry types must match to be considered equal. |
ST_AREA(geom[, solution]) | Returns the area of the given geometry geom if it is a POLYGON or MULTIPOLYGON using the specified solution type. Returns 0 if the input geometry type is (MULTI)POINT or (MULTI)LINESTRING. Solution types available:
|
ST_AZIMUTH(geom1, geom2) | Returns the azimuth in radians defined by the segment between two POINTs, geom1 and geom2. Returns a null if the input geometry type is MULTIPOINT, (MULTI)LINESTRING, or (MULTI)POLYGON. |
ST_BOUNDARY(geom) | Returns the closure of the combinatorial boundary of a given geometry geom. Returns an empty geometry if geom is an empty geometry. Returns a null if geom is a GEOMETRYCOLLECTION |
ST_BOUNDINGDIAGONAL(geom) | Returns the diagonal of the given geometry's (geom) bounding box. |
ST_BUFFER(geom, radius[, style[, solution]]) | Returns a geometry that represents all points whose distance from the given geometry geom is less than or equal to the given distance radius. The radius units can be specified by the solution type (default is in degrees) and the radius is created in the provided style. The style options are specified as a list of blank-separated key-value pairs, e.g., 'quad_segs=8 endcap=round'. If an empty style list ('') is provided, the default settings will be used. The style parameter must be specified to provide a solution type. Available style options:
Available solution types:
Tip To create a 5-meter buffer around geom using the default styles: ST_BUFFER(geom, 5, '', 1). To create a 5-foot (converting feet to meters) buffer around geom using the following styles: ST_BUFFER(geom, 5*0.3048,'quad_segs=4 endcap=flat', 1) |
ST_CENTROID(geom) | Calculates the center of the given geometry geom as a POINT. For (MULTI)POINTs, the center is calculated as the average of the input coordinates. For (MULTI)LINESTRINGs, the center is calculated as the weighted length of each given LINESTRING. For (MULTI)POLYGONs, the center is calculated as the weighted area of each given POLYGON. If geom is an empty geometry, an empty GEOMETRYCOLLECTION is returned |
ST_CLIP(geom1, geom2) | Returns the geometry shared between given geometries geom1 and geom2 |
ST_CLOSESTPOINT(geom1, geom2[, solution]) | Calculates the 2-D POINT in geom1 that is closest to geom2 using the specified solution type. If geom1 or geom2 is empty, a null is returned. Solution types available:
|
ST_COLLECT(geom1, geom2) | Returns a MULTI* or GEOMETRYCOLLECTION comprising geom1 and geom2. If geom1 and geom2 are the same, singular geometry type, a MULTI* is returned, e.g., if geom1 and geom2 are both POINTs (empty or no), a MULTIPOINT is returned. If geom1 and geom2 are neither the same type nor singular geometries, a GEOMETRYCOLLECTION is returned. |
ST_COLLECTIONEXTRACT(collection, type) | Returns only the specified type from the given geometry collection. Type is a number that maps to the following:
|
ST_COLLECTIONHOMOGENIZE(collection) | Returns the simplest form of the given collection, e.g., a collection with a single POINT will be returned as POINT(x y), and a collection with multiple individual points will be returned as a MULTIPOINT. |
ST_CONCAVEHULL(geom, target_percent[, allow_holes]) | Returns a potentially concave geometry that encloses all geometries found in the given geom set. Use target_percent (values between 0 and 1) to determine the percent of area of a convex hull the concave hull will attempt to fill; 1 will return the same geometry as an ST_CONVEXHULL operation. Set allow_holes to 1 (true) to allow holes in the resulting geometry; default value is 0 (false). Note that allow_holes is independent of the area of target_percent. |
ST_CONTAINS(geom1, geom2) | Returns 1 (true) if no points of geom2 lie in the exterior of geom1 and at least one point of geom2 lies in the interior of geom1. Note that geom1 does not contain its boundary but does contain itself. |
ST_CONTAINSPROPERLY(geom1, geom2) | Returns 1 (true) if geom2 intersects the interior of geom1 but not the boundary (or exterior). Note that geom1 does not contain its boundary but does contain itself. |
ST_CONVEXHULL(geom) | Returns the minimum convex geometry that encloses all geometries in the given geom set. |
ST_COORDDIM(geom) | Returns the coordinate dimension of the given geom, e.g., a geometry with x, y, and z coordinates would return 3. |
ST_COVEREDBY(geom1, geom2) | Returns 1 (true) if no point in geom1 is outside geom2. |
ST_COVERS(geom1, geom2) | Returns 1 (true) if no point in geom2 is outside geom1. |
ST_CROSSES(geom1, geom2) | Returns 1 (true) if the given geometries, geom1 and geom2, spatially cross, meaning some but not all interior points in common. If geom1 and/or geom2 are a GEOMETRYCOLLECTION, a 0 is returned regardless if the two geometries cross |
ST_DIFFERENCE(geom1, geom2) | Returns a geometry that represents the part of geom1 that does not intersect with geom2. |
ST_DIMENSION(geom) | Returns the dimension of the given geometry geom, which is less than or equal to the coordinate dimension. If geom is a single geometry, a 0 is for POINT, a 1 is for LINESTRING, and a 2 is for POLYGON. If geom is a collection, it will return the largest dimension from the collection. If geom is empty, 0 is returned. |
ST_DISJOINT(geom1, geom2) | Returns 1 (true) if the given geometries, geom1 and geom2, do not spatially intersect. |
ST_DISTANCE(geom1, geom2[, solution]) | Calculates the minimum distance between the given geometries, geom1 and geom2, using the specified solution type. Solution types available:
Note: If geom1 and geom2 intersect (verify using ST_INTERSECTS), the distance will always be 0. |
ST_DISTANCEPOINTS(x1, y1, x2, y2[, solution]) | Calculates the minimum distance between the given points, x1, y1 and x2, y2, using the specified solution type. Solution types available:
|
ST_DFULLYWITHIN(geom1, geom2, distance[, solution]) | Returns 1 (true) if the maximum distance between geometries geom1 and geom2 is less than or equal to the specified distance of each other using the specified solution type. If geom1 or geom2 is null, 0 (false) is returned. Solution types available:
|
ST_DWITHIN(geom1, geom2, distance[, solution]) | Returns 1 (true) if the minimum distance between geometries geom1 and geom2 is within the specified distance of each other using the specified solution type. Solution types available:
|
ST_ELLIPSE(centerx, centery, height, width) | Returns an ellipse using the following values:
|
ST_ENDPOINT(geom) | Returns the last point of the given geom as a POINT if it's a LINESTRING. If geom is not a LINESTRING, null is returned. |
ST_ENVDWITHIN(geom1, geom2, distance[, solution]) | Returns 1 (true) if geom1 is within the specified distance of the bounding box of geom2 using the specified solution type. Solution types available:
|
ST_ENVELOPE(geom) | Returns the bounding box of a given geometry geom. |
ST_ENVINTERSECTS(geom1, geom2) | Returns 1 (true) if the bounding box of the given geometries, geom1 and geom2, intersect. |
ST_EQUALS(geom1, geom2) | Returns 1 (true) if the given geometries, geom1 and geom2, are spatially equal. Note that order does not matter. |
ST_EQUALSEXACT(geom1, geom2, tolerance) | Returns 1 (true) if the given geometries, geom1 and geom2, are almost spatially equal within some given tolerance. If the values within the given geometries are within the tolerance value of each other, they're considered equal, e.g., if tolerance is 2, POINT(1 1) and POINT(1 3) are considered equal, but POINT(1 1) and POINT(1 3.1) are not. Note that the geometry types have to match for them to be considered equal. |
ST_ERASE(geom1, geom2) | Returns the result of erasing a portion of geom1 equal to the size of geom2. |
ST_EXPAND(geom, units) | Returns the bounding box expanded in all directions by the given units of the given geom. The expansion can also be defined for separate directions by providing separate parameters for each direction, e.g., ST_EXPAND(geom, unitsx, unitsy, unitsz, unitsm). |
ST_EXPANDBYRATE(geom, rate) | Returns the bounding box expanded by a given rate (a ratio of width and height) for the given geometry geom. The rate must be between 0 and 1. |
ST_EXTERIORRING(geom) | Returns a LINESTRING representing the exterior ring of the given POLYGON geom |
ST_FORCE2D(geom) | Returns the 2-dimensional version (e.g., X and Y coordinates) of geom, the provided geometry or set of geometries (e.g., via GEOMETRYCOLLECTION or WKT column name). |
ST_FORCE3D(geom[, z]) | Returns the 3-dimensional version (e.g., X, Y, and Z coordinates) of geom, a provided geometry or set of geometries (e.g., via GEOMETRYCOLLECTION or WKT column name), using z as the geometry's new z-value. The provided z-values can also be derived from a numeric column. If no z is provided, a 0 will be applied. Note If a WKT column is provided for geom and a numeric column is provided for z, the z values will be matched to the provided geometries by row in the source table. If a singular geometry is provided for geom and a column is provided for z, three-dimensional versions of the provided geometry will be returned for each z value found in the provided z column. If columns are provided for both geom and z and nulls are present in either column, the row containing null values will be skipped in the results. |
ST_GENERATEPOINTS(geom, num) | Creates a MULTIPOINT containing a number num of randomly generated points within the boundary of geom. |
ST_GEOHASH(geom, precision) | Returns a hash string representation of the given geometry geom with specified precision (the length of the geohash string). The longer the precision, the more precise the hash is. By default, precision is set to 20; the max for precision is 32. Returns null if geom is an empty geometry. Note The value returned will not be a geohash of the exact geometry but a geohash of the centroid of the given geometry |
ST_GEOMETRYN(geom, index) | Returns the index geometry back from the given geom geometry. The index ranges from 1 to the number of geometries in geom. |
ST_GEOMETRYTYPE(geom) | Returns the type of geometry from the given geom. |
ST_GEOMETRYTYPEID(geom) | Returns the type ID of geom. Type and ID mappings:
|
ST_GEOMFROMGEOHASH(geohash, precision) | Returns a POLYGON boundary box using the given geohash with a precision set by the integer precision. If precision is specified, the function will use as many characters in the hash equal to precision to create the geometry. If no precision is specified, the full length of the geohash is used. |
ST_GEOMFROMTEXT(wkt) | Returns a geometry from the given Well-Known text representation wkt. Note that this function is only compatible with constants |
ST_HEXGRID(xmin, ymin, xmax, ymax, cell_side[, limit]) | Creates a MULTIPOLYGON containing a grid of hexagons between given minimum and maximum points of a bounding box. The minimum point cannot be greater than or equal to the maximum point. The size (in meters) of the individual hexagons' sides is determined by cell_side. The cell_side cannot be greater than the width or height of the bounding box. The maximum number of cells that can be produced is determined by limit, a positive integer. Supported values for limit:
If the custom limit request specifies more cells (based on the bounding box and the cell_side) than the system limit, a null is returned. |
ST_INTERIORRINGN(geom, n) | Returns the n-th interior LINESTRING ring of the POLYGON geom. If geom is not a POLYGON or the given n is out of range, a null is returned. The index begins at 1 |
ST_INTERSECTION(geom1, geom2) | Returns the shared portion between given geometries geom1 and geom2 |
ST_INTERSECTS(geom1, geom2) | Returns 1 (true) if the given geometries, geom1 and geom2, intersect in 2-D |
ST_ISCLOSED(geom) | Returns 1 (true) if the given geometry's (geom) start and end points coincide |
ST_ISCOLLECTION(geom) | Returns 1 (true) if geom is a collection, e.g., GEOMETRYCOLLECTION, MULTIPOINT, MULTILINESTRING, etc. |
ST_ISEMPTY(geom) | Returns 1 (true) if geom is empty |
ST_ISRING(geom) | Returns 1 (true) if LINESTRING geom is both closed (per ST_ISCLOSED) and "simple" (per ST_ISSIMPLE). Returns 0 if geom is not a LINESTRING |
ST_ISSIMPLE(geom) | Returns 1 (true) if geom has no anomalous geometric points, e.g., self-intersection or self-tangency |
ST_ISVALID(geom) | Returns 1 (true) if geom (typically a [MULTI]POLYGON) is well formed. A POLYGON is valid if its rings do not cross and its boundary intersects only at POINTs (not along a line). The POLYGON must also not have dangling LINESTRINGs. A MULTIPOLYGON is valid if all of its elements are also valid and the interior rings of those elements do not intersect. Each element's boundaries may touch but only at POINTs (not along a line) |
ST_LENGTH(geom[, solution]) | Returns the length of the geometry if it is a LINESTRING or MULTILINESTRING. Returns 0 if another type of geometry, e.g., POINT, MULTIPOINT, etc. GEOMETRYCOLLECTIONs are also supported but the aforementioned type limitation still applies; the collection will be recursively searched for LINESTRINGs and MULTILINESTRINGs and the summation of all supported geometry types is returned (unsupported types are ignored). Solution types available:
|
ST_LINEFROMMULTIPOINT(geom) | Creates a LINESTRING from geom if it is a MULTIPOINT. Returns null if geom is not a MULTIPOINT |
ST_LINEINTERPOLATEPOINT(geom, fraction) | Returns a POINT that represents the specified fraction of the LINESTRING geom. If geom is either empty or not a LINESTRING, null is returned |
ST_LINELOCATEPOINT(linestring, point) | Returns the location of the closest point in the given linestring to the given point as a value between 0 and 1. The return value is a fraction of the total linestring length. |
ST_LINEMERGE(geom) | Returns a LINESTRING or MULTILINESTRING from a given geom. If geom is a MULTILINESTRING comprising LINESTRINGs with shared endpoints, a contiguous LINESTRING is returned. If geom is a LINESTRING or a MULTILINESTRING comprising LINESTRINGs without shared endpoints, geom is returned. If geom is an empty (MULTI)LINESTRING or a (MULTI)POINT or (MULTI)POLYGON, an empty GEOMETRYCOLLECTION is returned. |
ST_LINESUBSTRING(geom, start_fraction, end_fraction) | Returns the fraction of a given geom LINESTRING where start_fraction and end_fraction are between 0 and 1. For example, given LINESTRING(1 1, 2 2, 3 3) a start_fraction of 0 and an end_fraction of 0.25 would yield the first quarter of the given LINESTRING, or LINESTRING(1 1, 1.5 1.5). Returns null if start_fraction is greater than end_fraction. Returns null if input geometry is (MULTI)POINT, MULTILINESTRING, or (MULTI)POLYGON. Returns null if start_fraction and/or end_fraction are less than 0 or more than 1. |
ST_LONGESTLINE(geom1, geom2[, solution]) | Returns the LINESTRING that represents the longest line of points between the two geometries. If multiple longest lines are found, only the first line found is returned. If geom1 or geom2 is empty, null is returned. Solution types available:
|
ST_MAKEENVELOPE(xmin, ymin, xmax, ymax) | Creates a rectangular POLYGON from the given min and max parameters |
ST_MAKELINE(geom[, geom2]) | Creates a LINESTRING from geom if it is a MULTIPOINT. If geom is a POINT, there must be at least one other POINT to construct a LINESTRING. If geom is a LINESTRING, it must have at least two points. Returns null if geom is not a POINT, MULTIPOINT, or LINESTRING Note This function can be rather costly in terms of performance |
ST_MAKEPOINT(x, y) | Creates a POINT at the given coordinate Note This function can be rather costly in terms of performance |
ST_MAKEPOLYGON(geom) | Creates a POLYGON from geom. Inputs must be closed LINESTRINGs Note This function can be rather costly in terms of performance |
ST_MAKETRIANGLE2D(x1, y1, x2, y2, x3, y3) | Creates a closed 2-D POLYGON with three vertices |
ST_MAKETRIANGLE3D(x1, y1, z1, x2, y2, z2, x3, y3, z3) | Creates a closed 3-D POLYGON with three vertices |
ST_MAXDISTANCE(geom1, geom2[, solution]) | Returns the maximum distance between the given geom1 and geom2 geometries using the specified solution type. If geom1 or geom2 is empty, null is returned. Solution types available:
|
ST_MAXX(geom) | Returns the maximum x coordinate of a bounding box for the given geom geometry. This function works for 2-D and 3-D geometries. |
ST_MAXY(geom) | Returns the maximum y coordinate of a bounding box for the given geom geometry. This function works for 2-D and 3-D geometries. |
ST_MAXZ(geom) | Returns the maximum z coordinate of a bounding box for the given geom geometry. This function works for 2-D and 3-D geometries. |
ST_MINX(geom) | Returns the minimum x coordinate of a bounding box for the given geom geometry. This function works for 2-D and 3-D geometries. |
ST_MINY(geom) | Returns the minimum y coordinate of a bounding box for the given geom geometry. This function works for 2-D and 3-D geometries. |
ST_MINZ(geom) | Returns the minimum z coordinate of a bounding box for the given geom geometry. This function works for 2-D and 3-D geometries. |
ST_MULTI(geom) | Returns geom as a MULTI- geometry, e.g., a POINT would return a MULTIPOINT. |
ST_MULTIPLERINGBUFFERS(geom, distance, outside) | Creates multiple buffers at the specified distances around the given geom geometry. Multiple distances are specified as comma-separated values in an array, e.g., [10,20,30]. Valid values for outside are:
|
ST_NEAR(geom1, geom2) | Returns the portion of geom2 that is closest to geom1. If geom2 is a singular geometry object (e.g., POINT, LINESTRING, POLYGON), geom2 will be returned. If geom2 is a multi-geometry, e.g., MULTIPOINT, MULTILINESTRING, etc., the nearest singular geometry in geom2 will be returned. |
ST_NORMALIZE(geom) | Returns geom in its normalized (canonical) form, which may rearrange the points in lexicographical order. |
ST_NPOINTS(geom) | Returns the number of points (vertices) in geom. |
ST_NUMGEOMETRIES(geom) | If geom is a collection or MULTI- geometry, returns the number of geometries. If geom is a single geometry, returns 1. |
ST_NUMINTERIORRINGS(geom) | Returns the number of interior rings if geom is a POLYGON. Returns null if geom is anything else. |
ST_NUMPOINTS(geom) | Returns the number of points in the geom LINESTRING. Returns null if geom is not a LINESTRING. |
ST_OVERLAPS(geom1, geom2) | Returns 1 (true) if given geometries geom1 and geom2 share space. If geom1 and/or geom2 are a GEOMETRYCOLLECTION, a 0 is returned regardless if the two geometries overlap |
ST_PARTITION(geom, threshold) | Returns a MULTIPOLYGON representing the given geom partitioned into a number of POLYGONs with a maximum number of vertices equal to the given threshold. Minimum value for threshold is 10; default value is 10000. If geom is not a POLYGON or MULTIPOLYGON, geom is returned. If the number of vertices in geom is less than the threshold, geom is returned. |
ST_PERIMETER(geom[, solution]) | Returns the perimeter of the geometry if it is a POLYGON or MULTIPOLYGON. Returns 0 if another type of geometry, e.g., POINT, MULTIPOINT, LINESTRING, or MULTILINESTRING. GEOMETRYCOLLECTIONs are also supported but the aforementioned type limitation still applies; the collection will be recursively searched for POLYGONs and MULTIPOLYGONs and the summation of all supported geometry types is returned (unsupported types are ignored). Solution types available:
|
ST_POINT(x, y) | Returns a POINT with the given x and y coordinates. |
ST_POINTFROMGEOHASH(geohash, precision) | Returns a POINT using the given geohash with a precision set by the integer precision. If precision is specified, the function will use as many characters in the hash equal to precision to create the geometry. If no precision is specified, the full length of the geohash is used. Note The POINT returned represents the center of the bounding box of the geohash |
ST_POINTGRID(xmin, ymin, xmax, ymax, cell_side[, limit]) | Creates a MULTIPOLYGON containing a square-shaped grid of points between given minimum and maximum points of a bounding box. The minimum point cannot be greater than or equal to the maximum point. The distance between the points (in meters) is determined by cell_side. The cell_side cannot be greater than the width or height of the bounding box. The maximum number of cells that can be produced is determined by limit, a positive integer. Supported values for limit:
If the custom limit request specifies more cells (based on the bounding box and the cell_side) than the system limit, a null is returned. |
ST_POINTN(geom, n) | Returns the n-th point in LINESTRING geom. Negative values are valid, but note that they are counted backwards from the end of geom. A null is returned if geom is not a LINESTRING. |
ST_POINTS(geom) | Returns a MULTIPOINT containing all of the coordinates of geom. |
ST_PROJECT(geom, distance, azimuth) | Returns a POINT projected from a start point geom along a geodesic calculated using distance and azimuth. If geom is not a POINT, null is returned. |
ST_REMOVEPOINT(geom, offset) | Removes the point at the given 0-based offset from LINESTRING geom. |
ST_REMOVEREPEATEDPOINTS(geom, tolerance) | Removes points from geom if the point's vertices are greater than or equal to the tolerance of the previous point in the geometry's list. If geom is not a MULTIPOINT, MULTILINESTRING, or a MULTIPOLYGON, no points will be removed. |
ST_REVERSE(geom) | Returns the geometry with its coordinate order reversed. |
ST_SCALE(geom, x, y) | Scales geom by multiplying its respective vertices by the given x and y values. This function also supports scaling geom using another geometry object, e.g., ST_SCALE('POINT(3 4)', 'POINT(5 6)') would return POINT(15 24). If specifying x and y for scale, note that the default value is 0, e.g., ST_SCALE('POINT(1 3)', 4) would return POINT(4 0). |
ST_SEGMENTIZE(geom, max_segment_length[, solution]) | Returns the given geom but segmentized n number of times depending on how the max_segment_length distance (in units based on the solution type) divides up the original geometry. The new geom is guaranteed to have segments that are smaller than the given max_segment_length. Note that POINTs are not able to be segmentized. Collection geometries (GEOMETRYCOLLECTION, MULTILINESTRING, MULTIPOINT, etc.) can be segmentized, but only the individual parts will be segmentized, not the collection as a whole. Solution types available:
|
ST_SETPOINT(geom1, position, geom2) | Replaces a point of LINESTRING geom1 with POINT geom2 at position (base 0). Negative values are valid, but note that they are counted backwards from the end of geom1. |
ST_SHAREDPATH(geom1, geom2) | Returns a collection containing paths shared by geom1 and geom2. |
ST_SHORTESTLINE(geom1, geom2) | Returns the 2-D LINESTRING that represents the shortest line of points between the two geometries. If multiple shortest lines are found, only the first line found is returned. If geom1 or geom2 is empty, null is returned. |
ST_SIMPLIFY(geom, tolerance) | Returns a simplified version of the given geom using an algorithm to reduce the number of points comprising a given geometry while attempting to best retain the original shape. The given tolerance determines how much to simplify the geometry. The higher the tolerance, the more simplified the returned geometry. Some holes might be removed and some invalid polygons (e.g., self-intersecting, etc.) might be present in the returned geometry. Only (MULTI)LINESTRINGs and (MULTI)POLYGONs can be simplified, including those found within GEOMETRYCOLLECTIONs; any other geometry objects will be returned unsimplified. Note The tolerance should be provided in the same units as the data. As a rule of thumb, a tolerance of 0.00001 would correspond to about one meter. |
ST_SIMPLIFYPRESERVETOPOLOGY(geom, tolerance) | Returns a simplified version of the given geom using an algorithm to reduce the number of points comprising a given geometry while attempting to best retain the original shape. The given tolerance determines how much to simplify the geometry. The higher the tolerance, the more simplified the returned geometry. No holes will be removed and no invalid polygons (e.g., self-intersecting, etc.) will be present in the returned geometry. Only (MULTI)LINESTRINGs and (MULTI)POLYGONs can be simplified, including those found within GEOMETRYCOLLECTIONs; any other geometry objects will be returned unsimplified. Note The tolerance should be provided in the same units as the data. As a rule of thumb, a tolerance of 0.00001 would correspond to about one meter. |
ST_SNAP(geom1, geom2, tolerance) | Snaps geom1 to geom2 within the given tolerance. If the tolerance causes geom1 to not snap, the geometries will be returned unchanged. |
ST_SPLIT(geom1, geom2) | Returns a collection of geometries resulting from the split between geom1 and geom2 geometries. |
ST_SQUAREGRID(xmin, ymin, xmax, ymax, cell_side[, limit]) | Creates a MULTIPOLYGON containing a grid of squares between given minimum and maximum points of a bounding box. The minimum point cannot be greater than or equal to the maximum point. The size (in meters) of the individual squares' sides is determined by cell_side. The cell_side cannot be greater than the width or height of the bounding box. The maximum number of cells that can be produced is determined by limit, a positive integer. Supported values for limit:
If the custom limit request specifies more cells (based on the bounding box and the cell_side) than the system limit, a null is returned. |
ST_STARTPOINT(geom) | Returns the first point of LINESTRING geom as a POINT. Returns null if geom is not a LINESTRING. |
ST_SYMDIFFERENCE(geom1, geom2) | Returns a geometry that represents the portions of geom1 and geom2 geometries that do not intersect. |
ST_TOUCHES(geom1, geom2) | Returns 1 (true) if the given geometries, geom1 and geom2, have at least one point in common but their interiors do not intersect. If geom1 and/or geom2 are a GEOMETRYCOLLECTION, a 0 is returned regardless of whether the two geometries touch. |
ST_TRANSLATE(geom, deltax, deltay[, deltaz]) | Translate geom by given offsets deltax and deltay. A z-coordinate offset can be applied using deltaz. |
ST_TRIANGLEGRID(xmin, ymin, xmax, ymax, cell_side[, limit]) | Creates a MULTIPOLYGON containing a grid of triangles between given minimum and maximum points of a bounding box. The minimum point cannot be greater than or equal to the maximum point. The size (in meters) of the individual triangles' sides is determined by cell_side. The cell_side cannot be greater than the width or height of the bounding box. The maximum number of cells that can be produced is determined by limit, a positive integer. Supported values for limit:
If the custom limit request specifies more cells (based on the bounding box and the cell_side) than the system limit, a null is returned. |
ST_UNION(geom1, geom2) | Returns a geometry that represents the point set union of the two given geometries, geom1 and geom2. |
ST_UNIONCOLLECTION(geom) | Returns a geometry that represents the point set union of a single given geometry geom. |
ST_UPDATE(geom1, geom2) | Returns a geometry that is geom1 updated by geom2. |
ST_VORONOIPOLYGONS(geom, tolerance) | Returns a GEOMETRYCOLLECTION containing Voronoi polygons (regions consisting of points closer to a vertex in geom than any other vertices in geom) calculated from the vertices in geom and the given tolerance. The tolerance determines the distance at which points will be considered the same. An empty GEOMETRYCOLLECTION is returned if geom is an empty geometry, a single POINT, or a LINESTRING or POLYGON composed of equivalent vertices (e.g., POLYGON((0 0, 0 0, 0 0, 0 0)), LINESTRING(0 0, 0 0)). |
ST_WITHIN(geom1, geom2) | Returns 1 (true) if the geom1 geometry is inside the geom2 geometry. Note that as long as at least one point is inside of geom2, geom1 is considered within geom2 even if the rest of geom1 lies along the boundary of geom2. |
ST_WKTTOWKB(geom) | Returns the binary form (WKB) of a geom (WKT). Note This function can only be used in queries against a single table. |
ST_X(geom) | Returns the X coordinate of the POINT geom; if the coordinate is not available, null is returned. geom must be a POINT. |
ST_XMAX(geom) | Alias for ST_MAXX() |
ST_XMIN(geom) | Alias for ST_MINX() |
ST_Y(geom) | Returns the Y coordinate of the POINT geom; if the coordinate is not available, null is returned. geom must be a POINT. |
ST_YMAX(geom) | Alias for ST_MAXY() |
ST_YMIN(geom) | Alias for ST_MINY() |
ST_ZMAX(geom) | Alias for ST_MAXZ() |
ST_ZMIN(geom) | Alias for ST_MINZ() |
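For illustration, a hedged example of calling these scalar functions within a query; the taxi_trip table, its pickup_point geometry column, and the polygon coordinates are invented for the sketch:

```sql
-- Select coordinates of pickups that fall within a (made-up) service area;
-- ST_WITHIN returns 1 for matches, so it can be used directly as a predicate
SELECT
    ST_X(pickup_point) AS lon,
    ST_Y(pickup_point) AS lat
FROM taxi_trip
WHERE ST_WITHIN(
          pickup_point,
          'POLYGON((-74.05 40.68, -73.90 40.68, -73.90 40.88, -74.05 40.88, -74.05 40.68))'
      ) = 1;
```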
Aggregation Functions
Function | Description |
---|---|
ST_AGGREGATE_COLLECT(geom) | Alias for ST_COLLECT_AGGREGATE() |
ST_AGGREGATE_INTERSECTION(geom) | Alias for ST_INTERSECTION_AGGREGATE() |
ST_COLLECT_AGGREGATE(geom) | Returns a GEOMETRYCOLLECTION comprising all geometries found in the geom set. Any MULTI* geometries will be divided into separate singular geometries, e.g., MULTIPOINT((0 0), (1 1)) would be divided into POINT(0 0) and POINT(1 1) in the results; the same is true for elements of a GEOMETRYCOLLECTION found in geom, where a GEOMETRYCOLLECTION within the provided geom set will also be parsed, effectively flattening it and adding the individual geometries to the resulting GEOMETRYCOLLECTION. Any empty geometries in geom are ignored even if they are part of a GEOMETRYCOLLECTION. Any duplicate WKTs will be retained. |
ST_DISSOLVE(geom) | Dissolves all geometries within a given set into a single geometry. Note that the resulting single geometry can still be a group of noncontiguous geometries but represented as a single group, e.g., a GEOMETRYCOLLECTION. Best performance when used in conjunction with adjacent geometries. |
ST_DISSOLVEOVERLAPPING(geom) | Dissolves all geometries within a given set into a single geometry. Note that the resulting single geometry can still be a group of noncontiguous geometries but represented as a single group, e.g., a GEOMETRYCOLLECTION. Best performance when used in conjunction with overlapping geometries. |
ST_INTERSECTION_AGGREGATE(geom) | Returns a POLYGON or MULTIPOLYGON comprising the shared portion between all geometries found in the geom set. Returns an empty GEOMETRYCOLLECTION if there is no shared portion between all geometries. Functionally equivalent to ST_INTERSECTION(ST_INTERSECTION(ST_INTERSECTION(geom1, geom2), geom3), ... geomN). |
ST_LINESTRINGFROMORDEREDPOINTS(x, y, t) | Returns a LINESTRING that represents a "track" of the given points (x, y) ordered by the given sort column t (e.g., a timestamp or sequence number). If any of the values in the specified columns are null, the null "point" will be left out of the resulting LINESTRING. If there's only one non-null "point" in the source table, a POINT is returned. If there are no non-null "points" in the source table, a null is returned. |
ST_LINESTRINGFROMORDEREDPOINTS3D(x, y, z, t) | Returns a LINESTRING that represents a "track" of the given 3D points (x, y, z) ordered by the given sort column t (e.g., a timestamp or sequence number). If any of the values in the specified columns are null, the null "point" will be left out of the resulting LINESTRING. If there's only one non-null "point" in the source table, a POINT is returned. If there are no non-null "points" in the source table, a null is returned. |
ST_POLYGONIZE(geom) | Returns a GEOMETRYCOLLECTION containing POLYGONs comprising the provided (MULTI)LINESTRING(s). (MULTI)POINT and (MULTI)POLYGON geometries are ignored when calculating the resulting GEOMETRYCOLLECTION. If a valid POLYGON cannot be constructed from the provided (MULTI)LINESTRING(s), an empty GEOMETRYCOLLECTION will be returned. |
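As a hedged sketch of how these aggregates are typically invoked, the following assembles one track per vehicle; the vehicle_position table and its lon, lat, and ts columns are invented for the example:

```sql
-- Build a LINESTRING "track" per vehicle, ordering each vehicle's
-- points by the ts sort column
SELECT
    vehicle_id,
    ST_LINESTRINGFROMORDEREDPOINTS(lon, lat, ts) AS track
FROM vehicle_position
GROUP BY vehicle_id;
```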
Math Functions
Function | Description |
---|---|
ABS(expr) | Calculates the absolute value of expr |
ACOS(expr) | Returns the inverse cosine (arccosine) of expr as a double |
ACOSF(expr) | Returns the inverse cosine (arccosine) of expr as a float |
ACOSH(expr) | Returns the inverse hyperbolic cosine of expr as a double |
ACOSHF(expr) | Returns the inverse hyperbolic cosine of expr as a float |
ASIN(expr) | Returns the inverse sine (arcsine) of expr as a double |
ASINF(expr) | Returns the inverse sine (arcsine) of expr as a float |
ASINH(expr) | Returns the inverse hyperbolic sine of expr as a double |
ASINHF(expr) | Returns the inverse hyperbolic sine of expr as a float |
ATAN(expr) | Returns the inverse tangent (arctangent) of expr as a double |
ATANF(expr) | Returns the inverse tangent (arctangent) of expr as a float |
ATANH(expr) | Returns the inverse hyperbolic tangent of expr as a double |
ATANHF(expr) | Returns the inverse hyperbolic tangent of expr as a float |
ATAN2(x, y) | Returns the inverse tangent (arctangent) using two arguments as a double |
ATAN2F(x, y) | Returns the inverse tangent (arctangent) using two arguments as a float |
ATN2(x, y) | Alias for ATAN2 |
ATN2F(x, y) | Alias for ATAN2F |
CBRT(expr) | Returns the cube root of expr as a double |
CBRTF(expr) | Returns the cube root of expr as a float |
CEIL(expr) | Alias for CEILING |
CEILING(expr) | Rounds expr up to the next highest integer |
COS(expr) | Returns the cosine of expr as a double |
COSF(expr) | Returns the cosine of expr as a float |
COSH(expr) | Returns the hyperbolic cosine of expr as a double |
COSHF(expr) | Returns the hyperbolic cosine of expr as a float |
COT(expr) | Returns the cotangent of expr as a double |
COTF(expr) | Returns the cotangent of expr as a float |
DEGREES(expr) | Returns the conversion of expr (in radians) to degrees as a double |
DEGREESF(expr) | Returns the conversion of expr (in radians) to degrees as a float |
DIVZ(a, b, c) | Returns the quotient a / b unless b == 0, in which case it returns c |
EXP(expr) | Returns e to the power of expr as a double |
EXPF(expr) | Returns e to the power of expr as a float |
FLOOR(expr) | Rounds expr down to the next lowest integer |
GREATER(expr_a, expr_b) | Returns whichever of expr_a and expr_b has the larger value, based on typed comparison |
HYPOT(x, y) | Returns the hypotenuse of x and y as a double |
HYPOTF(x, y) | Returns the hypotenuse of x and y as a float |
ISNAN(expr) | Returns 1 (true) if expr is not a number by IEEE standard; otherwise, returns 0 (false) |
IS_NAN(expr) | Alias for ISNAN |
ISINFINITY(expr) | Returns 1 (true) if expr is infinity by IEEE standard; otherwise, returns 0 (false) |
IS_INFINITY(expr) | Alias for ISINFINITY |
LDEXP(x, exp) | Returns the value of x * 2^exp as a double |
LDEXPF(x, exp) | Returns the value of x * 2^exp as a float |
LESSER(expr_a, expr_b) | Returns whichever of expr_a and expr_b has the smaller value, based on typed comparison |
LN(expr) | Returns the natural logarithm of expr as a double |
LNF(expr) | Returns the natural logarithm of expr as a float |
LOG(expr) | Alias for LN |
LOG10(expr) | Returns the base-10 logarithm of expr as a double |
LOG10F(expr) | Returns the base-10 logarithm of expr as a float |
LOG1P(expr) | Returns the natural logarithm of one plus expr as a double |
LOG1PF(expr) | Returns the natural logarithm of one plus expr as a float |
LOG2(expr) | Returns the binary (base-2) logarithm of expr as a double |
LOG2F(expr) | Returns the binary (base-2) logarithm of expr as a float |
LOGF(expr) | Alias for LNF |
MAX_CONSECUTIVE_BITS(expr) | Calculates the length of the longest series of consecutive 1 bits in the integer expr |
MOD(dividend, divisor) | Calculates the remainder after integer division of dividend by divisor |
PI() | Returns the value of pi |
POW(base, exponent) | Alias for POWER |
POWF(base, exponent) | Alias for POWERF |
POWER(base, exponent) | Returns base raised to the power of exponent as a double |
POWERF(base, exponent) | Returns base raised to the power of exponent as a float |
RADIANS(expr) | Returns the conversion of expr (in degrees) to radians as a double |
RADIANSF(expr) | Returns the conversion of expr (in degrees) to radians as a float |
RAND() | Returns a random floating-point value. |
ROUND(expr, scale) | Rounds expr to the nearest decimal number with scale decimal places when scale is a positive number; rounds to the nearest number such that the result has -(scale) zeros to the left of the decimal point when scale is negative; use scale of 0 to round to the nearest integer. Examples:
|
SIGN(expr) | Determines whether a number is positive, negative, or zero; returns one of the following three values:
|
SIN(expr) | Returns the sine of expr as a double |
SINF(expr) | Returns the sine of expr as a float |
SINH(expr) | Returns the hyperbolic sine of expr as a double |
SINHF(expr) | Returns the hyperbolic sine of expr as a float |
SQRT(expr) | Returns the square root of expr as a double |
SQRTF(expr) | Returns the square root of expr as a float |
TAN(expr) | Returns the tangent of expr as a double |
TANF(expr) | Returns the tangent of expr as a float |
TANH(expr) | Returns the hyperbolic tangent of expr as a double |
TANHF(expr) | Returns the hyperbolic tangent of expr as a float |
TRUNCATE(expr, scale) | Rounds expr down to the nearest decimal number with scale decimal places, following the same rules as ROUND. Examples:
|
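A brief, hypothetical illustration of several of the functions above used together; the customer_order table and its columns are invented for the sketch:

```sql
-- Round a price, guard a division against zero, and classify the sign of a margin
SELECT
    ROUND(total_price, 2)          AS price_rounded,
    DIVZ(total_price, quantity, 0) AS unit_price,   -- returns 0 when quantity = 0
    SIGN(total_price - 100)        AS over_100      -- -1, 0, or 1
FROM customer_order;
```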
Null Functions
Some of the following null functions require parameters to be of convertible data types. Note that limited-width (charN) & unlimited-width (non-charN) string types are not convertible.
Function | Description |
---|---|
COALESCE(expr_a, ..., expr_N) | Returns the value of the first expression that is not null starting with expr_a and ending with expr_N. If all are null, then null is returned. All expressions should be of the same or convertible data type. |
IFNULL(expr_a, expr_b) | Returns expr_a if it is not null; otherwise, returns expr_b. Both should be of the same or convertible data type. See Short-Circuiting for error-checking details. |
ISNULL(expr) | Returns 1 if expr is null; otherwise, returns 0 |
IS_NULL(expr) | Synonymous with ISNULL(expr) |
NULLIF(expr_a, expr_b) | Returns null if expr_a equals expr_b; otherwise, returns the value of expr_a; both expressions should be of the same or convertible data type. |
NVL(expr_a, expr_b) | Alias for IFNULL |
NVL2(expr, value_if_not_null, value_if_null) | Evaluates expr: if not null, returns value_if_not_null; if null, returns value_if_null. Both value_if_not_null & value_if_null should be of the same data type as expr or implicitly convertible; see Short-Circuiting for error-checking details. |
REMOVE_NULLABLE(expr) | Alias for ZEROIFNULL |
ZEROIFNULL(expr) | Replaces null values with appropriate values based on the column type (e.g., 0 if numeric column, an empty string if charN column, etc.). Also removes the nullable column property if used to calculate a derived column. |
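A short, hypothetical illustration; the customer table and its columns are invented for the sketch:

```sql
-- Prefer the mobile phone, fall back to the home phone, then a placeholder;
-- convert the sentinel balance of -1 back to null; flag presence of an email
SELECT
    COALESCE(mobile_phone, home_phone, 'unknown') AS contact_phone,
    NULLIF(balance, -1)                           AS balance,
    NVL2(email, 'has email', 'no email')          AS email_status
FROM customer;
```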
String Functions
There are three different types of string functions, detailed in the sections below: CharN functions, which operate on fixed-width string columns; LIKE, for simple pattern matching; and FILTER_BY_STRING, for richer pattern matching and full text search.
CharN Functions
The following functions will only work with fixed-width string columns (VARCHAR(1) - VARCHAR(256)).
Function | Description |
---|---|
ASCII(expr) | Returns the ASCII code for the first character in expr |
CHAR(expr) | Returns the character represented by the standard ASCII code expr in the range [ 0 - 127 ] |
CONCAT(expr_a, expr_b) | Performs a string concatenation of expr_a & expr_b; use nested CONCAT calls to concatenate more than two strings Note The resulting field size of any CONCAT will be a charN field big enough to hold the concatenated fields, e.g., concatenating a char32 column and a char64 column will result in a char128 column. Columns of type char256 cannot be used with CONCAT. |
CONCAT_TRUNCATE(expr_a, expr_b) | Returns the concatenation of expr_a and expr_b, truncated at the maximum size of expr_a. For columns, this size is explicit; for string constants, this will be the smallest charN type that can hold the expr_a string. Examples:
|
CONTAINS(match_expr, ref_expr) | Returns 1 if ref_expr contains match_expr by string-literal comparison; otherwise, returns 0 |
DIFFERENCE(expr_a, expr_b) | Returns a value between 0 and 4 that represents the difference between the sounds of expr_a and expr_b based on the SOUNDEX() value of the strings--a value of 4 is the best possible sound match |
EDIT_DISTANCE(expr_a, expr_b) | Returns the Levenshtein edit distance between expr_a and expr_b; the lower the value, the more similar the two strings are |
ENDS_WITH(match_expr, ref_expr) | Returns 1 if ref_expr ends with match_expr by string-literal comparison; otherwise, returns 0 |
INITCAP(expr) | Returns expr with the first letter of each word in uppercase |
IPV4_PART(expr, part_num) | Returns the octet of the IP address given in expr at the position specified by part_num. Valid part_num values are constants from 1 to 4. Examples:
|
IS_IPV4(expr) | Returns 1 if expr is an IPV4 address; returns 0 otherwise |
LCASE(expr) | Converts expr to lowercase |
LEFT(expr, num_chars) | Returns the leftmost num_chars characters from expr |
LENGTH(expr) | Returns the number of characters in expr |
LOCATE(match_expr, ref_expr[, start_pos]) | Returns the starting position of the first match of match_expr in ref_expr, starting from position 1 or start_pos (if specified) |
LOWER(expr) | Alias for LCASE |
LPAD(base_expr, length, pad_expr) | Left pads the given base_expr string with the pad_expr string to the given length of characters. If base_expr is longer than length, the return value is shortened to length characters. If length is larger than 256, it will be truncated to 256. Examples:
|
LTRIM(expr) | Removes whitespace from the left side of expr |
POSITION(match_expr, ref_expr[, start_pos]) | Alias for LOCATE |
REPLACE(ref_expr, match_expr, repl_expr) | Replaces every occurrence of match_expr in ref_expr with repl_expr |
REVERSE(expr) | Returns expr with the order of characters reversed. Examples:
|
RIGHT(expr, num_chars) | Returns the rightmost num_chars characters from expr |
RPAD(base_expr, length, pad_expr) | Right pads the given base_expr string with the pad_expr string to the given length of characters. If base_expr is longer than length, the return value is shortened to length characters. If length is larger than 256, it will be truncated to 256. Examples:
|
RTRIM(expr) | Removes whitespace from the right side of expr |
SOUNDEX(expr) | Returns a soundex value from expr. Only the first word in the string will be considered in the calculation. Note This is the algorithm used by most programming languages. |
SPACE(n) | Returns a string consisting of n space characters. The value of n can only be within the range of 0-256. |
SPLIT(expr, delim, group_num) | Splits expr into groups delimited by the delim character and returns the group_num split group. If group_num is positive, groups will be counted from the beginning of expr; if negative, groups will be counted from the end of expr going backwards. Two consecutive delimiters will result in an empty string being added to the list of selectable groups. If no instances of delim exist in expr, the entire string is available at group 1 (and -1). Group 0 returns nothing. Examples:
|
STARTS_WITH(match_expr, ref_expr) | Returns 1 if ref_expr starts with match_expr by string-literal comparison; otherwise, returns 0 |
STRCMP(expr_a, expr_b) | Compares expr_a to expr_b in a lexicographical sort:
|
SUBSTR(expr, start_pos, num_chars) | Alias for SUBSTRING |
SUBSTRING(expr, start_pos, num_chars) | Returns num_chars characters from the expr, starting at the 1-based start_pos |
TRIM(expr) | Removes whitespace from both sides of expr |
UCASE(expr) | Converts expr to uppercase |
UPPER(expr) | Alias for UCASE |
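A hypothetical illustration of a few of these functions together; the employee table and its charN columns are invented for the sketch:

```sql
-- Normalize a name, extract the domain portion of an email address,
-- and zero-pad a department code
SELECT
    INITCAP(TRIM(full_name)) AS display_name,
    SPLIT(email, '@', 2)     AS email_domain,  -- second @-delimited group
    LPAD(dept_code, 6, '0')  AS padded_dept
FROM employee;
```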
LIKE
Simple pattern-matching capability is provided through the use of the LIKE clause, for fixed-width string columns (VARCHAR(1) - VARCHAR(256)).
The format of the LIKE clause is as follows:
|
|
This clause matches records where reference expression ref_expr does (or does NOT) match the string value of match_expr. The match is a string literal one, with the following exceptions:
- % matches any string of 0 or more characters
- _ matches any single character
For example, to search for employees whose last name starts with C, ends with a, and has exactly five letters:
|
|
To search for employees whose first name does not start with Brook:
|
|
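The two searches above might look like the following; this is a sketch assuming an employee table with last_name and first_name charN columns:

```sql
-- Exactly five letters: C, any three characters, then a
SELECT * FROM employee WHERE last_name LIKE 'C___a';

-- Exclude first names beginning with Brook
SELECT * FROM employee WHERE first_name NOT LIKE 'Brook%';
```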
FILTER_BY_STRING
The FILTER_BY_STRING table function allows for several types of pattern matching for fixed-width string columns (VARCHAR(1) - VARCHAR(256)), as well as full text search capability for unrestricted-width string columns (VARCHAR). Its features are based on the /filter/bystring endpoint.
The basic form of the FILTER_BY_STRING function when called in a SELECT statement follows.
|
|
The basic form of the FILTER_BY_STRING function when called in an EXECUTE FUNCTION statement follows. When called this way, the results are saved in the specified view.
|
|
Parameters | Description |
---|---|
TABLE_NAME | The data set to filter, which should be passed via query to the INPUT_TABLE function; e.g., to perform a string filter on a column in the customer table, pass a query on the table to the INPUT_TABLE: INPUT_TABLE(SELECT * FROM customer) |
VIEW_NAME | Name of the view in which the results are to be stored, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria Important This parameter is only applicable when using EXECUTE FUNCTION syntax. |
COLUMN_NAMES | Comma-separated list of columns to which the filter will be applied. Important Omit this parameter when using the search filter MODE. |
MODE | Type of string filter to apply:
|
EXPRESSION | String literal or pattern, depending on MODE selected, to use in filtering string columns |
OPTIONS | Optional list of string filtering options, specified as a set of key/value pairs passed as a comma-delimited list of <key> = '<value>' assignments to the KV_PAIRS function; e.g.: KV_PAIRS(case_sensitive = 'false')
|
To see the message text & time for events in the event_log table containing the word ERROR:
|
|
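A hedged sketch of such a call, assuming named-parameter (=>) syntax for the parameters described above; the event_log column names are illustrative:

```sql
-- Full text search for ERROR; COLUMN_NAMES is omitted because search mode is used
SELECT message, event_time
FROM TABLE
    (
        FILTER_BY_STRING
        (
            TABLE_NAME => INPUT_TABLE(SELECT * FROM event_log),
            MODE       => 'search',
            EXPRESSION => 'ERROR'
        )
    );
```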
User/Security Functions
Function | Description |
---|---|
CURRENT_USER() | Alias for USER |
HASH(column[, seed]) | Returns a non-negative integer representing an obfuscated version of column, using the given seed; default seed is 0 |
IS_MEMBER(role[, user]) or IS_ROLEMEMBER(role[, user]) | Returns whether the current user (or the given user, if specified) has been assigned the given role, either directly or indirectly:
|
MASK(expr, start, length[, char]) | Masks length characters of expr, beginning at the position identified by start, with * characters (or the character specified in char):
|
OBFUSCATE(column[, seed]) | Alias for HASH |
SHA256(expr) | Returns the hex digits of the SHA-256 hash of the given value expr as a char64 string. |
SYSTEM_USER() | Alias for USER |
USER() | Returns the username of the current user |
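A hypothetical illustration; the payment table is invented, and card_number is assumed to be a 16-character string:

```sql
-- Mask the first twelve characters of the card number and record who ran the query
SELECT
    MASK(card_number, 1, 12, 'x') AS card_masked,  -- e.g., xxxxxxxxxxxx1234
    USER()                        AS queried_by
FROM payment;
```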
Aggregation Functions
Function | Description |
---|---|
ATTR(expr) | If MIN(expr) = MAX(expr), returns expr; otherwise * |
ARG_MIN(agg_expr, ret_expr) | The value of ret_expr where agg_expr is the minimum value (e.g. ARG_MIN(cost, product_id) returns the product ID of the lowest cost product) |
ARG_MAX(agg_expr, ret_expr) | The value of ret_expr where agg_expr is the maximum value (e.g. ARG_MAX(cost, product_id) returns the product ID of the highest cost product) |
AVG(expr) | Calculates the average value of expr |
CORR(expr1, expr2) | Calculates the correlation coefficient of expr1 and expr2 |
CORRELATION(expr1, expr2) | Alias for CORR |
CORRCOEF(expr1, expr2) | Alias for CORR |
COUNT(*) | Returns the number of records in a table |
COUNT(expr) | Returns the number of non-null data values in expr |
COUNT(DISTINCT expr) | Returns the number of distinct non-null data values in expr |
COV(expr1, expr2) | Alias for COVAR_POP |
COVAR(expr1, expr2) | Alias for COVAR_POP |
COVARIANCE(expr1, expr2) | Alias for COVAR_POP |
COVAR_POP(expr1, expr2) | Calculates the population covariance of expr1 and expr2 |
COVAR_SAMP(expr1, expr2) | Calculates the sample covariance of expr1 and expr2 |
GROUPING(expr) | Used primarily with ROLLUP, CUBE, and GROUPING SETS, to distinguish the source of null values in an aggregated result set, returns whether expr is part of the aggregation set used to calculate the values in a given result set row. Returns 0 if expr is part of the row's aggregation set, 1 if expr is not (meaning that aggregation took place across all expr values). For example, in a ROLLUP(A) operation, there will be two potential rows with null in the result set for column A. One row will contain null values of A aggregated together, and the other will contain null, but be an aggregation over the entire table, irrespective of A values. In this case, GROUPING(A) will return 0 for the null values of A aggregated together (as well as all other grouped A values) and 1 for the row resulting from aggregating across all A values. |
KURT(expr) | Alias for KURTOSIS_POP |
KURTOSIS(expr) | Alias for KURTOSIS_POP |
KURTOSIS_POP(expr) | Calculates the population kurtosis of expr |
KURTOSIS_SAMP(expr) | Calculates the sample kurtosis of expr |
KURT_POP(expr) | Alias for KURTOSIS_POP |
KURT_SAMP(expr) | Alias for KURTOSIS_SAMP |
MAX(expr) | Finds the maximum value of expr |
MEAN(expr) | Alias for AVG |
MIN(expr) | Finds the minimum value of expr |
PRODUCT(expr) | Multiplies all the values of expr |
SKEW(expr) | Alias for SKEWNESS_POP |
SKEWNESS(expr) | Alias for SKEWNESS_POP |
SKEWNESS_POP(expr) | Calculates the population skew of expr |
SKEWNESS_SAMP(expr) | Calculates the sample skew of expr |
SKEW_POP(expr) | Alias for SKEWNESS_POP |
SKEW_SAMP(expr) | Alias for SKEWNESS_SAMP |
STDDEV(expr) | Alias for STDDEV_POP |
STDDEV_POP(expr) | Calculates the population standard deviation of the values of expr |
STDDEV_SAMP(expr) | Calculates the sample standard deviation of the values of expr |
SUM(expr) | Sums all the values of expr |
VAR(expr) | Alias for VAR_POP |
VAR_POP(expr) | Calculates the population variance of the values of expr |
VAR_SAMP(expr) | Calculates the sample variance of the values of expr |
Grouping Functions
Function | Description |
---|---|
ROLLUP(expr) | Calculates n + 1 aggregates for n number of columns in expr |
CUBE(expr) | Calculates 2^n aggregates for n number of columns in expr |
GROUPING SETS(expr) | Calculates aggregates for each grouping set given in expr, including any ROLLUP() and CUBE() sets |
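As a brief sketch pairing these with GROUPING() from the aggregation functions above (the sales table and its columns are invented):

```sql
-- One row per (region, product), plus a subtotal row per region and a grand
-- total; GROUPING distinguishes rollup-generated nulls from real null products
SELECT
    region,
    product,
    SUM(amount)       AS total_amount,
    GROUPING(product) AS is_subtotal_row
FROM sales
GROUP BY ROLLUP(region, product);
```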
Distribution Functions
Distribution functions are column expressions that affect the sharded/replicated nature of the result set of a given query. It may be necessary to force a result set to be distributed in a certain way for a subsequent operation on that result set to be performant.
Important
Employing these functions will prevent any automatic resharding of data to allow the query to succeed. Use only when a better query plan (with respect to data distribution) is known than any the system can devise.
Function | Description |
---|---|
KI_REPLICATE() | Force a scalar result set to be replicated (query with no GROUP BY) |
KI_REPLICATE_GROUP_BY(0) | Force an aggregated result set to be replicated (query with GROUP BY) |
KI_MATCH_COLUMN(0) | Aligns the column count of queries that are part of a UNION, INTERSECT, or EXCEPT with a query whose column list has been amended with either KI_REPLICATE_GROUP_BY or KI_SHARD_KEY |
KI_SHARD_KEY(<column list>) | Force the result set to be sharded on the given columns. This will override any implicitly-derived or explicitly-defined replication status the table would have had. Note The column(s) listed in column list must also appear in the SELECT list; KI_SHARD_KEY merely identifies which of the selected columns should be used as the shard key. |
Sharding Example
For example, a query for all employees and their total employees managed, including employees who don't manage anyone, could employ a UNION like this:
|
|
In this example, the employee table is sharded on id. Since the first part of the UNION aggregates on manager_id, the result will be replicated. The second part of the UNION does no aggregation and includes the shard key in the SELECT list; the result of this will be sharded.
Prior to Kinetica v7.0, UNION operations required both parts of the UNION to be distributed the same way; this limitation would make the query fail with the following message:
GPUdb Error: either all input tables must be replicated or all input tables must be non-replicated
To work around this limitation, a distribution function could be used. This is what the Kinetica SQL engine, version 7.0 and later, does automatically.
One option is to shard the first part of the UNION to match the second part:
|
|
Here, the distribution function KI_SHARD_KEY is used to make the selected manager_id column the new shard key for the first part of the UNION. Now, the shard key for the first part of the UNION (manager_id) aligns with the shard key for the second part (id), and the query succeeds. Note the use of KI_MATCH_COLUMN, which aligns the selected column lists on each side of the UNION. Without this matching distribution function, the UNION would appear to be merging three columns from the first part of the query into two columns in the second part and would fail.
Note
The manager_id column must exist in the SELECT list in order for the KI_SHARD_KEY function to designate it as the shard key.
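A hedged reconstruction of the reworked UNION described above; the exact original statement is not shown here, so the column aliases and the subquery predicate are best-effort guesses consistent with the prose:

```sql
-- First part: aggregate per manager, resharding the result on manager_id;
-- second part: employees managing no one, with KI_MATCH_COLUMN keeping the
-- two column lists aligned
SELECT manager_id AS id, KI_SHARD_KEY(manager_id), COUNT(*) AS num_managed
FROM employee
GROUP BY manager_id
UNION
SELECT id, KI_MATCH_COLUMN(0), 0
FROM employee
WHERE id NOT IN (SELECT manager_id FROM employee WHERE manager_id IS NOT NULL);
```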
Predicates
Predicates are generally used within a SQL WHERE clause to query records. They compare the values of two or more expressions; whenever a record meets the criteria defined in a predicate clause, it will be marked as eligible to be part of the query result set. If it meets all predicate clauses defined within a query, it will be returned in the result set.
A single predicate clause may use a simple predicate operator to compare the values of two expressions or a more complex predicate clause form. A compound predicate clause uses a compound predicate operator to link together multiple predicate clauses to further refine a result set.
Unlimited-width (non-charN) strings can only be used within equality-based predicates, e.g. =, IN, etc.
Predicate Operators
- = equality
- != or <> inequality
- < less than
- <= less than or equal to
- > greater than
- >= greater than or equal to
Predicate Clauses
In the following list of predicate clauses, ref_expr is the reference expression to apply the predicate to; note that EXISTS has no reference expression.
Predicate Clause | Description |
---|---|
<expr_a> <pred_op> <expr_b> | Matches records where expr_a relates to expr_b according to predicate operator pred_op. |
<ref_expr> <pred_op> ALL (<select statement>) | Matches records where the reference expression ref_expr relates to all of the results of select statement according to the predicate operator pred_op |
<ref_expr> <pred_op> ANY (<select statement>) | Matches records where the reference expression ref_expr relates to any of the results of select statement according to the predicate operator pred_op |
<ref_expr> [NOT] BETWEEN <begin_expr> AND <end_expr> | Matches records where the reference expression ref_expr is (or is NOT) between the values of begin_expr and end_expr |
<ref_expr> [NOT] IN (<match_list>) | Matches records where the reference expression ref_expr is (or is NOT) in the match_list list of match values. The list can either be a comma-separated list of terms/expressions or the result of a SELECT statement. |
<ref_expr> IS [NOT] NULL | Matches records where the reference expression ref_expr is (or is NOT) null. |
[NOT] EXISTS (<select statement>) | Matches records where select statement returns 1 or more records. |
Compound Predicate Operators
Predicate Operator | Description |
---|---|
<pred_a> AND <pred_b> | Matches records where both pred_a & pred_b are true |
<pred_a> OR <pred_b> | Matches records where either pred_a or pred_b is true |
NOT <pred> | Matches records where pred is false |
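For instance, several predicate clauses might be combined within a single WHERE clause; the customer_order table and its columns are invented for the sketch:

```sql
SELECT *
FROM customer_order
WHERE total_price BETWEEN 50 AND 100
  AND status IN ('SHIPPED', 'DELIVERED')
  AND ship_date IS NOT NULL;
```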
Subqueries
Hints
Hint strings (KI_HINT) can be added as comments within queries, and affect just the query in which they appear. They will override the corresponding client & server settings (when such settings exist). For example:
|
|
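To illustrate the mechanism (the query itself is hypothetical), a hint rides along inside an ordinary SQL comment:

```sql
-- Apply a primary key to the GROUP BY result via the hint in the comment
SELECT /* KI_HINT_GROUP_BY_PK */
    dept_id,
    COUNT(*) AS emp_count
FROM employee
GROUP BY dept_id;
```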
Hint | Description |
---|---|
KI_HINT_BATCH_SIZE(n) | Use an ingest batch size of n records. Default: 10,000. Only applicable when issuing INSERT statements. |
KI_HINT_CHUNK_SIZE(n) | Use chunk sizes of n records per chunk within result sets. Suffixes of K & M can be used to represent thousands or millions of records; e.g., 20K, 50M. |
KI_HINT_COMPARABLE_EXPLAIN | Simplify EXPLAIN output by removing unique identifiers from table names. Note Potentially unsafe for use in a multiuser environment, as the table names produced will not be unique and may collide with the table names of other users who execute EXPLAIN with this hint. |
KI_HINT_DICT_PROJECTION | Retain the dictionary encoding attributes of source columns in a projection result set. |
KI_HINT_DISTRIBUTED_OPERATIONS | Reshard data when doing so would be the only way to process one or more operations in this query. |
KI_HINT_DONT_COMBINE | Don’t combine joins and unions for this query. |
KI_HINT_GROUP_BY_FORCE_REPLICATED | Make all result tables within a single query replicated; useful when meeting the input table requirements of JOIN, UNION, etc. in a query containing aggregated subqueries which generate differently-sharded result tables. |
KI_HINT_GROUP_BY_PK | Create primary keys for all GROUP BY result sets on the grouped-upon columns/expressions within a given query; often used to create a primary key on the result of a CREATE TABLE...AS that ends in a GROUP BY, and can also make materialized views containing grouping operations more performant. Note If any of the grouped-on expressions are nullable, no primary key will be applied. |
KI_HINT_HAS_HEADER | Assume the first line of the source CSV file has a Kinetica header row. Only used with CSV ingestion. |
KI_HINT_INDEX(<column list>) | Create an index on each of the comma-separated columns in the given list; often used with CREATE TABLE...AS to create an index on a persisted result set. |
KI_HINT_JOBID_PREFIX(x) | Tag corresponding database job name(s) with x; e.g., KI_HINT_JOBID_PREFIX(tag) will result in job names like ODBC_tag_01234567-89ab-cdef-0123-456789abcdef. |
KI_HINT_KEEP_TEMP_TABLES | Don’t erase temp tables created by this query. |
KI_HINT_NO_COST_BASED_OPTIMIZATION | Don't use the cost-based optimizer when calculating the query plan. |
KI_HINT_NO_DICT_PROJECTION | Don't retain the dictionary encoding attributes of source columns in a projection result set. |
KI_HINT_NO_DISTRIBUTED_OPERATIONS | Don't reshard data when doing so would be the only way to process one or more operations in this query. |
KI_HINT_NO_HEADER | Assume the source CSV file has no Kinetica header row. Only used with CSV ingestion. |
KI_HINT_NO_JOIN_COUNT | Optimize joins by not calculating intermediate set counts. |
KI_HINT_NO_LATE_MATERIALIZATION | Force the materialization of intermediary result sets. |
KI_HINT_NO_PARALLEL_EXECUTION | Execute all components of this query in series. |
KI_HINT_NO_PLAN_CACHE | Don't cache the query plan calculated for this query. |
KI_HINT_NO_RULE_BASED_OPTIMIZATION | Don't use the rule-based optimizer when calculating the query plan. |
KI_HINT_NO_VALIDATE_CHANGE | Don't fail an ALTER TABLE command when changing column types/sizes and the column data is too long/large. Truncate the data instead and allow the modification to succeed. |
KI_HINT_PROJECT_MATERIALIZED_VIEW | Force the materialization of a materialized view. Some materialized views containing JOIN clauses will be backed by a native join view. This is done to improve the performance of materialized view refreshes and reduce memory usage at the cost of reduced query performance. This hint will induce the reverse of this trade-off -- increased query performance at the cost of reduced refresh performance and increased memory usage. |
KI_HINT_REPL_SYNC | Instruct the target database to treat this statement as one that should be run synchronously across all HA clusters in its ring. See High Availability Operation Handling for details. Note The target database must be configured for high-availability for this to have an effect. See High Availability Configuration & Management for details. |
KI_HINT_REPL_ASYNC | Instruct the target database to treat this statement as one that should be run asynchronously across all HA clusters in its ring. See High Availability Operation Handling for details. Note The target database must be configured for high-availability for this to have an effect. See High Availability Configuration & Management for details. |
KI_HINT_TRUNCATE_STRINGS | Truncate all strings being inserted into restricted-width (charN) columns to their max width. Used with any INSERT INTO statement, including CSV ingestion. |
KI_HINT_UPDATE_ON_EXISTING_PK | Change the record collision policy for inserting into a table with a primary key to an upsert scheme; any existing table record with a primary key that matches a record being inserted will be replaced by that new record. Without this hint, the record being inserted will be discarded. If the specified table does not have a primary key, then this hint is ignored. |
EXPLAIN
Outputs the execution plan of a given SQL statement.
Tip
For the visual explain plan utility in GAdmin, see the Explain feature under Query.
The general form of the command is:
|
|
If LOGICAL is specified, the algebraic execution tree of the statement is output. Otherwise, the physical execution plan will be output.
Each supporting API endpoint call that is made in servicing the request is listed as an execution plan step in the output, along with any input or output tables associated with the call and the prior plan execution steps on which a given execution step depends.
The following options can be specified:
- LOGICAL - outputs the algebraic execution tree of the statement
- PHYSICAL - (default) outputs the physical execution plan with the following endpoint-level details per step:
- ID - execution step number
- ENDPOINT - name of native API endpoint called
- INPUT_TABLES - input tables used by the endpoint (if any)
- OUTPUT_TABLE - output table created by the endpoint (if any)
- DEPENDENCIES - list of prior execution steps upon which this step depends
- ANALYZE - same as PHYSICAL, including additional run-time details:
- RUN_TIME - execution time of each endpoint call
- RESULT_ROWS - number of records produced in the endpoint call
- VERBOSE - same as PHYSICAL, including endpoint parameter details:
- COLUMNS - columns passed to the endpoint call
- EXPRESSIONS - expressions passed to the endpoint call
- OPTIONS - option keys & values passed to the endpoint call
- LAST_USE_TABLES - list of tables that will not be used by any following execution step
- ADDITIONAL_INFO - other parameters passed to the endpoint call
- VERBOSE ANALYZE - same as VERBOSE & ANALYZE together, including the execution plan for any joins contained within the query
- FORMAT JSON - outputs the result in JSON format
- FORMAT TABLE - (default) outputs the result in tabular format
Important
Specifying ANALYZE will cause the statement to be executed in order to collect run-time statistics on the endpoint calls made.
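As a minimal sketch of an invocation, assuming the option keywords precede the statement as listed above (the queried table is invented):

```sql
-- Executes the statement (because of ANALYZE) and reports per-endpoint
-- run times and row counts in JSON format
EXPLAIN ANALYZE FORMAT JSON
SELECT dept_id, COUNT(*)
FROM employee
GROUP BY dept_id;
```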
For example, to output the execution plan for a query that aggregates the number of taxi rides between boroughs (using KI_HINT_COMPARABLE_EXPLAIN hint to simplify output):
|
|
The execution plan is listed in table format, as follows:
|
|
If there is an error processing a query, the error can be returned in the JSON-formatted execution plan:
|
|
The execution plan is listed in JSON format with the query error, as follows:
|
|
Data Definition (DDL)
Kinetica supports the basic notion of SQL tables as containers of one or more columns of data. Tables can be created, altered, and dropped.
Table & column names must adhere to the supported naming criteria.
A column definition consists of a column type and optional column size, column properties, and nullability. Column properties are used to optimize data storage & speed.
The format of a defined column is the column name followed by its column definition. A column definition is the column type, optionally followed by a column size limit and/or column properties (all enclosed in parentheses), followed by an optional nullability statement:
|
|
This format applies to any DDL statement requiring the definition of columns, like CREATE TABLE and ALTER TABLE (when adding/modifying a column).
For example, the following are valid defined columns:
|
|
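Hedged illustrations of this format follow; the column names are invented, and column properties are assumed to share the parentheses with any size limit:

```sql
id          INTEGER NOT NULL,
name        VARCHAR(64, TEXT_SEARCH),
ip_address  IPV4,
cost        DECIMAL(10,2),
order_ts    TIMESTAMP
```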
Column Types
Category | Data Type | Description |
---|---|---|
Number | TINYINT | Effective type: int8 |
 | SMALLINT | Effective type: int16 |
 | INTEGER | Effective type: integer |
 | INT | Alias for INTEGER |
 | BIGINT | Effective type: long |
 | LONG | Alias for BIGINT |
 | UNSIGNED BIGINT | Effective type: ulong |
 | UNSIGNED LONG | Alias for UNSIGNED BIGINT |
 | REAL | Effective type: float |
 | DOUBLE | Effective type: double |
 | FLOAT | Alias for REAL |
 | DECIMAL | Alias for BIGINT |
 | DECIMAL(P,S) | Effective type: varies by P & S
|
 | NUMERIC | Alias for DECIMAL |
String | VARCHAR | Effective type: string; character limit based on configured system property |
 | VARCHAR(N) | Effective type: char1 - char256 or string, whichever is large enough to hold N characters |
 | STRING | Alias for VARCHAR |
 | TEXT | Alias for VARCHAR |
 | IPV4 | Shorthand for VARCHAR(IPV4), which applies the IPV4 column property |
 | UUID | Effective type: uuid |
Date/Time | DATE | Effective type: date |
 | DATETIME | Effective type: datetime |
 | TIME | Effective type: time |
 | TIMESTAMP | Effective type: timestamp |
 | TYPE_DATE | Alias for DATE |
 | TYPE_TIME | Alias for TIME |
 | TYPE_TIMESTAMP | Alias for TIMESTAMP |
Binary | BLOB | Effective type: bytes |
 | BYTES | Alias for BLOB |
 | BINARY | Alias for BLOB |
Geospatial | GEOMETRY | Effective type: wkt |
 | ST_GEOMETRY | Alias for GEOMETRY |
 | WKT | Alias for GEOMETRY |
Column Properties
Property | Description |
---|---|
DICT | Applies dict data handling to a column, enabling dictionary-encoding of its values; see Dictionary Encoding for details |
DISK_OPTIMIZED | Applies disk-optimized data handling to a column |
IPV4 | Treats the associated string-based column as an IPv4 address |
LZ4 | Applies LZ4 compression to a column |
LZ4HC | Applies LZ4HC compression to a column |
PRIMARY_KEY | Treats the associated column as a primary key, or part of a composite primary key if other columns also have this property |
SHARD_KEY | Treats the associated column as a shard key, or part of a composite shard key if other columns also have this property |
SNAPPY | Applies Snappy compression to a column |
STORE_ONLY | Applies store-only data handling to a column |
TEXT_SEARCH | Applies text-searchability to a column |
INIT_WITH_NOW | For DATE, TIME, DATETIME, and TIMESTAMP column types, enables the database to use NOW() as the value when the column is not specified in an INSERT statement; also directs the database to replace empty strings (for DATE, TIME, & DATETIME columns) and invalid timestamps (for TIMESTAMP columns) with NOW() |
INIT_WITH_UUID | For UUID column type, enables the database to use a universally unique identifier (UUID) as the value when the column is not specified in an INSERT statement; also directs the database to replace empty strings (for UUID columns) with UUIDs |
CREATE SCHEMA
Schemas are logical containers for tables and views. In order to place a table or view in a schema, the schema must be created first--schemas will not be automatically created when specified in CREATE TABLE or similar calls.
To create one:
|
|
For example, to create a schema:
|
|
ALTER SCHEMA
The name of a schema is the only property that can be altered.
Rename Schema
A schema can be renamed.
|
|
For example, to rename a schema:
|
|
DROP SCHEMA
When removing a schema from the database, there are two options available, which control how the removal takes place. Normally, an error will be reported if the schema to drop doesn't exist; if IF EXISTS is specified, no error will be reported. Also, an error will be reported if the schema to drop contains any tables or views; if CASCADE is specified, the schema and all tables and views within it will be removed.
|
|
For example, to drop a schema, including its contained tables and views:
|
|
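Taken together, a hedged sketch of the schema statements above; the schema names are illustrative:

```sql
CREATE SCHEMA sales;                          -- create the container
ALTER SCHEMA sales RENAME TO sales_archive;   -- rename it
DROP SCHEMA IF EXISTS sales_archive CASCADE;  -- remove it and any tables/views inside
```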
SHOW CREATE SCHEMA
Outputs the DDL statement required to reconstruct the given schema:
|
|
Note
The response to SHOW CREATE SCHEMA is a single-record result set with the DDL statement as the value in the DDL column, shown below with the column separators returned by kisql.
For example, to output the DDL for the example schema created in the CREATE SCHEMA section:
|
|
|
|
SET CURRENT SCHEMA
To override the default schema with a different schema, the following syntax can be used:
|
|
Omitting the schema name will switch back to the original default schema.
CREATE TABLE
Creates a new table in the specified schema. If no schema is specified, the table will be created in the calling user's default schema.
The table & column names used must adhere to the supported naming criteria.
The basic form of the supported CREATE TABLE statement follows.
|
|
Parameters | Description |
---|---|
OR REPLACE | Any existing table or view with the same name will be dropped before creating this one |
REPLICATED | The table will be distributed within the database as a replicated table |
TEMP | If the database is restarted, the table will be removed |
<schema name> | Name of the schema containing the table to create |
<table name> | Name of the table to create |
<column name> | Name of a column to create within the table |
<column definition> | Definition of the column associated with <column name>; see Data Definition (DDL) for column format |
PRIMARY KEY (<column list>) | Optional primary key specification clause, where <column list> is a comma-separated list of columns to use as the primary key for the table |
SHARD KEY (<column list>) | Optional shard key specification clause, where <column list> is a comma-separated list of columns to use as the shard key for the table |
FOREIGN KEY... | Optional comma-separated set of foreign key specification clauses, with the following parameters:
|
<partition clause> | Defines a partitioning scheme for the table to create |
<tier strategy clause> | Defines the tier strategy for the table to create |
<index clause> | Applies any number of column indexes or chunk skip indexes to the table to create |
<table property clause> | Assigns properties, from a subset of those available, to the table to create |
For example, to create a table with various column types and properties:
|
|
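A hedged example of the basic form; the schema, table, and columns are invented for the sketch:

```sql
CREATE OR REPLACE TABLE example_schema.customer_order
(
    id           INTEGER NOT NULL,
    customer_id  INTEGER NOT NULL,
    total_price  DECIMAL(10,2),
    purchase_ts  TIMESTAMP NOT NULL,
    PRIMARY KEY (id)
)
```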
Partition Clause
A table can be further segmented into partitions. The supported partition types are range, interval, list, hash, and series.
See Partitioning for details.
Range Partitioning
The general format for the range partition clause is:
|
|
The partition definition clause, PARTITIONS, is optional, though it is recommended to define partitions at table creation time, when feasible.
Warning
Defining (adding) partitions after data has been loaded will result in a performance penalty as the database moves existing records targeted for the new partition from the default partition into it.
For example, to create a range-partitioned table with the following criteria:
- partitioned on the date/time of the order
- partitions for years:
- 2014 - 2016
- 2017
- 2018
- 2019
- records not in that range go to the default partition
|
|
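A rough sketch of such a table, assuming a YEAR() extraction on the order timestamp and MIN/MAX bounds per named partition; names are illustrative:

```sql
CREATE TABLE example_schema.customer_order
(
    id           INTEGER NOT NULL,
    purchase_ts  TIMESTAMP NOT NULL
)
PARTITION BY RANGE (YEAR(purchase_ts))
PARTITIONS
(
    order_2014_2016 MIN(2014) MAX(2017),  -- 2014 through 2016
    order_2017                MAX(2018),
    order_2018                MAX(2019),
    order_2019                MAX(2020)   -- anything else lands in the default partition
)
```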
Interval Partitioning
The general format for the interval partition clause is:
|
|
For example, to create an interval-partitioned table with the following criteria:
- partitioned on the date/time of the order
- one partition for each year from 2014 on
- later year partitions are added as necessary
- records prior to 2014 go to the default partition
|
|
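A rough sketch of this yearly scheme, assuming a STARTING/INTERVAL partition definition; the exact keyword form is an assumption:

```sql
CREATE TABLE example_schema.customer_order
(
    id           INTEGER NOT NULL,
    purchase_ts  TIMESTAMP NOT NULL
)
PARTITION BY INTERVAL (YEAR(purchase_ts))
PARTITIONS
(
    STARTING (2014) INTERVAL (1)  -- one partition per year, 2014 onward
)
```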
To create an interval-partitioned table with the following criteria:
- partitioned on the date/time of the order
- one partition for each day from January 1st, 2014 on
- later day partitions are added as necessary
- records prior to 2014 go to the default partition
|
|
The same interval-partitioned scheme above can be created using the timestamp column directly, with the help of the INTERVAL function (described in the Date/Time Functions section):
|
|
This scheme can be easily modified to create an hourly partition instead:
|
|
List Partitioning
The list partition clause has two forms:
Manual
The general format for the manual list partition clause is:
|
|
The partition definition clause, PARTITIONS, is optional, though it is recommended to define partitions at table creation time, when feasible.
Warning
Defining (adding) partitions after data has been loaded will result in a performance penalty as the database moves existing records targeted for the new partition from the default partition into it.
For example, to create a manual list-partitioned table with the following criteria:
- partitioned on the date/time of the order
- partitions for years:
- 2014 - 2016
- 2017
- 2018
- 2019
- records not in that list go to the default partition
|
|
To create a manual list-partitioned table with a multi-column key and the following criteria:
- partitioned on the date/time of the order
- each partition corresponds to a unique year & month pair
- partitions for years/months:
- February 2016 & March 2016
- March 2020
- records not in that list go to the default partition
|
|
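A rough sketch of this multi-column scheme; the tuple syntax for multi-column list values is an assumption, and names are illustrative:

```sql
CREATE TABLE example_schema.customer_order
(
    id           INTEGER NOT NULL,
    purchase_ts  TIMESTAMP NOT NULL
)
PARTITION BY LIST (YEAR(purchase_ts), MONTH(purchase_ts))
PARTITIONS
(
    order_2016_0203 VALUES ((2016, 2), (2016, 3)),  -- Feb & Mar 2016
    order_2020_03   VALUES ((2020, 3))              -- Mar 2020
)
```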
Automatic
The general format for the automatic list partition clause is:
|
|
To create an automatic list-partitioned table with the following criteria:
- partitioned on the date/time of the order
- one partition for each unique year & month across all orders
- partitions are added as necessary
|
|
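A rough sketch, assuming an AUTOMATIC keyword in place of a PARTITIONS definition:

```sql
CREATE TABLE example_schema.customer_order
(
    id           INTEGER NOT NULL,
    purchase_ts  TIMESTAMP NOT NULL
)
PARTITION BY LIST (YEAR(purchase_ts), MONTH(purchase_ts))
AUTOMATIC  -- one partition per unique year/month pair, added as needed
```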
Hash Partitioning
The general format for the hash partition clause is:
|
|
To create a hash-partitioned table with the following criteria:
- partitioned on the date/time of the order
- distributed among the fixed set of partitions, based on the hash of the year & month of the order
- 10 partitions
|
|
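A rough sketch, assuming the partition count follows a PARTITIONS keyword:

```sql
CREATE TABLE example_schema.customer_order
(
    id           INTEGER NOT NULL,
    purchase_ts  TIMESTAMP NOT NULL
)
PARTITION BY HASH (YEAR(purchase_ts), MONTH(purchase_ts))
PARTITIONS 10  -- rows spread over 10 partitions by hash of year & month
```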
Series Partitioning
The general format for the series partition clause is:
|
|
The PERCENT_FULL should be an integer between 1 and 100; the default is 50%.
To create a series-partitioned table with the following criteria:
- partitioned on the customer of each order
- partitions with closed key sets will contain all orders from a set of unique customers
- 50% fill threshold
|
|
To create a series-partitioned track table with the following criteria:
- partitioned on the track ID
- partitions with closed key sets will contain all points from a unique set of tracks
- 25% fill threshold
|
|
Tier Strategy Clause
A table can have a tier strategy specified at creation time. If not assigned a tier strategy upon creation, a default tier strategy will be assigned.
The general format for the tier strategy clause is:
|
|
For example, to create a customer_order table with an above-average eviction priority in the RAM Tier:
|
|
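A sketch, assuming a nested ( ( <tier> <priority> ) ) clause form and a default RAM priority of 5:
CREATE TABLE example.customer_order
(
    id          INT NOT NULL,
    purchase_ts TIMESTAMP NOT NULL
)
TIER STRATEGY
(
    ( ( RAM 7 ) )  -- above the assumed default RAM priority of 5
)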
If not specified, the default tier strategy will be assigned:
|
|
|
|
|
|
Note
The response to SHOW CREATE TABLE is a single-record result set with the DDL statement as the value in the DDL column, shown here with the column separators returned by kisql.
Index Clause
A table can have any number of indexes applied to any of its columns at creation time.
The two types of indexes supported are column (attribute) indexes and chunk skip indexes.
The general format for the index clause is:
|
|
For example, to create a table with a column index on dept_id and a chunk skip index on id:
|
|
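A sketch, assuming a hypothetical example.employee table and trailing index clauses:
CREATE TABLE example.employee
(
    id      INT NOT NULL,
    dept_id INT NOT NULL
)
INDEX (dept_id)        -- column (attribute) index
CHUNK SKIP INDEX (id)  -- chunk skip index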
Table Property Clause
A subset of table properties can be applied to the table at creation time.
The general format for the table property clause is:
|
|
Available table properties include:
Property | Description |
---|---|
CHUNK SIZE | Size of the blocks of memory holding the data, when loaded; specified as the maximum number of records each block of memory should hold |
TTL | The time-to-live (TTL) for the table; if not set, the table will not expire |
For example, to create a table with up to 1,000,000 records per chunk and that will expire in 15 minutes:
|
|
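A sketch, assuming the clause is introduced by a USING TABLE PROPERTIES keyword:
CREATE TABLE example.employee
(
    id INT NOT NULL
)
USING TABLE PROPERTIES (CHUNK SIZE = 1000000, TTL = 15)  -- 1M records per chunk; expire in 15 minutes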
CREATE EXTERNAL TABLE
Creates a new external table, which is a database object whose source data is located in one or more files, external to the database. The source data can be located either on the cluster, accessible to the database; or remotely, accessible via a data source. Its use with ring resiliency has additional considerations.
The external table & column names used must adhere to the supported naming criteria.
The basic form of the supported CREATE EXTERNAL TABLE statement follows.
|
|
Parameters | Description | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
OR REPLACE | Any existing table or view with the same name will be dropped before creating this one | ||||||||||
REPLICATED | The external table will be distributed within the database as a replicated table | ||||||||||
TEMP | If the database is restarted, the external table will be removed | ||||||||||
LOGICAL | External data will not be loaded into the database; the data will be retrieved from the source upon servicing each query against the external table. This mode ensures queries on the external table will always return the most current source data, though there will be a performance penalty for reparsing & reloading the data from source files upon each query. | ||||||||||
MATERIALIZED | Loads a copy of the external data into the database, refreshed on demand; this is the default external table type | ||||||||||
<schema name> | Name of the schema to contain the external table associated with the source data | ||||||||||
<table name> | Name of the external table to associate with the source data; must meet the required naming criteria | ||||||||||
<table definition clause> | Optional clause, defining the structure for the external table associated with the source data | ||||||||||
FILE PATHS <file paths> | External data file specification clause, where <file paths> is a comma-separated list of single-quoted file paths from which data will be loaded; wildcards (*) can be used to specify a group of files. If a data source is specified in the external table load options, these file paths must resolve to accessible files at that data source location. Also, wildcards will only work when used within the file name, not the path. If no data source is specified, the files are assumed to be local to the database and must all be accessible to the gpudb user, with relative paths resolved against the configured external files directory (external_files_directory). | ||||||||||
FORMAT | Optional indicator of source file type; will be inferred from file extension if not given. Supported formats include:
| ||||||||||
WITH OPTIONS | Optional indicator that a comma-delimited list of connection & global option/value assignments will follow. See Load Options for the complete list of options | ||||||||||
<partition clause> | Optional clause, defining a partitioning scheme for the external table associated with the source data | ||||||||||
<tier strategy clause> | Optional clause, defining the tier strategy for the external table associated with the source data | ||||||||||
<index clause> | Optional clause, applying any number of column indexes or chunk skip indexes to the external table associated with the source data | ||||||||||
<table property clause> | Optional clause, assigning properties, from a subset of those available, to the external table associated with the source data |
To create an external table with the following features, using a local file as the source of data:
- External table named ext_employee in the example schema
- External source is a Parquet file located on the head node at data/employee.parquet, relative to the configured external files directory
- External table has a primary key on the id column
- Data is not refreshed on database startup
|
|
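A sketch of such a statement; the column set shown is an assumption for illustration:
CREATE EXTERNAL TABLE example.ext_employee
(
    id         INT NOT NULL PRIMARY KEY,
    dept_id    INT NOT NULL,
    first_name VARCHAR(30),
    last_name  VARCHAR(30)
)
FILE PATHS 'data/employee.parquet'
FORMAT PARQUET
WITH OPTIONS (REFRESH ON START = FALSE)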
To create an external table with the following features, using a data source as the source of data:
- External table named ext_product in the example schema
- External source is a data source named product_ds
- Source is a file named products.csv
- Data is refreshed on database startup
|
|
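A sketch; the table definition clause is omitted here, as it is optional, and the format is inferred from the .csv extension:
CREATE EXTERNAL TABLE example.ext_product
FILE PATHS 'products.csv'
WITH OPTIONS
(
    DATA SOURCE = 'product_ds',
    REFRESH ON START = TRUE
)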
Delimited Text Options
The following options can be specified when loading data from delimited text files. When reading from multiple files (using wildcards when specifying the file names), options specific to the source file will be applied to each file being read.
Option | Description | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
COMMENT = '<string>' | Treat lines in the source file(s) that begin with string as comments and skip. The default comment marker is #. | ||||||||||||||||
DELIMITER = '<char>' | Use char as the source file field delimiter. The default delimiter is a comma, unless a source file has one of these extensions:
See Delimited Text Option Characters for allowed characters. | ||||||||||||||||
ESCAPE = '<char>' | Use char as the source file data escape character. The escape character preceding any other character, in the source data, will be converted into that other character, except in the following special cases:
For instance, if the escape character is \, a \t encountered in the data will be converted to a tab character when stored in the database. The escape character can be used to escape the quoting character, and will be treated as an escape character whether it is within a quoted field value or not. There is no default escape character. | ||||||||||||||||
HEADER DELIMITER = '<char>' | Use char as the source file header field name/property delimiter, when the source file header contains both names and properties. This is largely specific to the Kinetica export to delimited text feature, which will, within each field's header, contain the field name and any associated properties, delimited by the pipe | character. An example Kinetica header in a CSV file: The default is the | (pipe) character. See Delimited Text Option Characters for allowed characters. Note The DELIMITER character will still be used to separate field name/property sets from each other in the header row | ||||||||||||||||
INCLUDES HEADER = <TRUE|FALSE> | Declare that the source file(s) will or will not have a header. If not given, that determination will be intuited. | ||||||||||||||||
NULL = '<string>' | Treat string as the indicator of a null source field value. The default is the empty string. | ||||||||||||||||
QUOTE = '<char>' | Use char as the source file data quoting character, for enclosing field values. Usually used to wrap field values that contain embedded delimiter characters, though any field may be enclosed in quote characters (for clarity, for instance). The quote character must appear as the first and last character of a field value in order to be interpreted as quoting the value. Within a quoted value, embedded quote characters may be escaped by preceding them with another quote character or the escape character specified by ESCAPE, if given. The default is the " (double-quote) character. See Delimited Text Option Characters for allowed characters. |
Delimited Text Option Characters
For DELIMITER, HEADER DELIMITER, ESCAPE, & QUOTE, any single character can be used, or any one of the following escaped characters:
Escaped Char | Corresponding Source File Character |
---|---|
'' | Single quote |
\a | ASCII bell |
\b | ASCII backspace |
\f | ASCII form feed |
\t | ASCII horizontal tab |
\v | ASCII vertical tab |
For instance, if two single quotes ('') are specified for a QUOTE character, the parser will interpret single quotes in the source file as quoting characters; specifying \t for DELIMITER will cause the parser to interpret ASCII horizontal tab characters in the source file as delimiter characters.
Load Options
The following options can be specified to modify the way data is loaded (or not loaded) into the target table.
Option | Description | ||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
BATCH SIZE | Use an ingest batch size of the given number of records. The default batch size is 10,000. | ||||||||||||||||||||||||||
COLUMN FORMATS = '<string>' | Use the given type-specific formatting for the given column when parsing source data being loaded into that column. This should be a map of column names to format specifications, where each format specification is map of column type to data format, all formatted as a JSON string. Supported column types include:
For example, to make the YYYY.MM.DD format for loading source data into date column d and HH:MM:SS format for loading source data into time column t: { "d": {"date": "%Y.%m.%d"}, "t": {"time": "%H:%M:%S"} } | ||||||||||||||||||||||||||
DATA SOURCE | Load data from the given data source. | ||||||||||||||||||||||||||
DEFAULT COLUMN FORMATS = '<string>' | Use the given formats for source data being loaded into target table columns with the corresponding column types. This should be a map of target column type to source format for data being loaded into columns of that type, formatted as a JSON string. Supported column properties and source data formats are the same as those listed in the description of the COLUMN FORMATS option. For example, to make the default format for loading source data dates in the form YYYY.MM.DD and times in the form HH:MM:SS: { "date": "%Y.%m.%d", "time": "%H:%M:%S", "datetime": "%Y.%m.%d %H:%M:%S" } | ||||||||||||||||||||||||||
FIELDS MAPPED BY < POSITION(<col#s>) | NAME(<field names>) > | Choose a comma-separated list of fields from the source file(s) to load, specifying fields by either position or name. If loading by name, the source file field names must match the target table column names exactly. Note When specifying source data file fields, the set of source data file fields must align, in type & number, with the target table columns into which data will be loaded. Important Field mapping by position is not supported for Parquet files. | ||||||||||||||||||||||||||
INGESTION MODE | Whether to do a full ingest of the data or perform a dry run or type inference instead. The default mode is FULL.
| ||||||||||||||||||||||||||
LOADING MODE | Use one of the following distribution schemes to load data files. The default mode is HEAD.
| ||||||||||||||||||||||||||
ON ERROR | When an error is encountered loading a record, handle it using one of the following modes. The default mode is PERMISSIVE.
| ||||||||||||||||||||||||||
REFRESH ON START | Whether to refresh the external table data upon restart of the database. Only relevant for materialized external tables. The default value is FALSE.
|
Table Definition Clause
The table definition clause allows for an explicit local table structure to be defined, irrespective of the source data type. This specification mirrors that of CREATE TABLE.
The general format for the table definition clause is:
|
|
See Data Definition (DDL) for column format.
Partition Clause
An external table can be further segmented into partitions.
The supported Partition Clause syntax & features are the same as those in the CREATE TABLE Partition Clause.
Tier Strategy Clause
An external table can have a tier strategy specified at creation time. If not assigned a tier strategy upon creation, a default tier strategy will be assigned.
The supported Tier Strategy Clause syntax & features are the same as those in the CREATE TABLE Tier Strategy Clause.
Index Clause
An external table can have any number of indexes applied to any of its columns at creation time.
The supported Index Clause syntax & features are the same as those in the CREATE TABLE Index Clause.
Table Property Clause
A subset of table properties can be applied to the external table associated with the external data at creation time.
The supported Table Property Clause syntax & features are the same as those in the CREATE TABLE Table Property Clause.
CREATE TABLE...AS
Creates a new table from the given query in the specified schema. If no schema is specified, the table will be created in the user's default schema.
The table and any column aliases used must adhere to the supported naming criteria.
The general format is:
|
|
Parameters/Keys | Description |
---|---|
OR REPLACE | Any existing table or view with the same name will be dropped before creating this one |
REPLICATED | The table will be distributed within the database as a replicated table |
TEMP | If the database is restarted, the table will be removed |
<schema name> | Name of the schema containing the table to create |
<table name> | Name of the table to create |
<select statement> | The query that will define both the initial structure and content of the created table |
While primary keys & foreign keys are not transferred to the new table, shard keys will be, if the column(s) composing them are part of the SELECT list.
The following can be applied to <select statement> to affect the resulting table:
Keyword | Type | Description |
---|---|---|
KI_HINT_GROUP_BY_PK | hint | Creates a primary key on the columns in the GROUP BY clause if the outermost SELECT statement contains a GROUP BY |
KI_HINT_INDEX(column list) | hint | Indexes each of the columns specified in the column list |
KI_SHARD_KEY(column list) | pseudo-function | Shards the result table with a shard key composed of all columns in the specified column list |
For example, to create a replicated temporary table that is a copy of an existing table, failing if a table with the same name as the target table already exists:
|
|
To create a permanent table with columns a, b, c, & d, a new shard key on columns a & b, and an index on column d, replacing a table with the same name as the target table, if it exists:
|
|
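A sketch, assuming the hint is embedded in a SQL comment (per the hint convention) and that the pseudo-function appears in the SELECT list; example.source_table is an assumed name:
CREATE OR REPLACE TABLE example.new_table AS
(
    SELECT /* KI_HINT_INDEX(d) */ a, b, c, d, KI_SHARD_KEY(a, b)
    FROM example.source_table
)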
To copy a table with columns a, b, c, & d, preserving the primary key on a, b, & c, and the foreign key from d; a new table must be created to match the schema of the old one and then records can be copied from the old one to the new one:
|
|
|
|
Note
This create/insert process is necessary, as neither primary keys nor foreign keys can be preserved through hints.
See Limitations for other restrictions.
ALTER TABLE
Any of the following facets of a table can be altered:
- Name
- Schema
- Global Access Mode
- TTL
- Columns
- Column Indexes
- Chunk Skip Indexes
- Foreign Keys
- Partitions
- Tier Strategy
Rename Table
A table can be renamed, following the supported naming criteria.
|
|
Move Table
A table can be moved from one schema to another.
The general form of the command is:
|
|
For example, to move the sales_2017 table from the example_olap schema to the example_archive schema:
|
|
Set Access Mode
A table can have its global accessibility modified for all users in the system, independently from and further restricting any role-based access controls in place. Note that changing the access mode cannot widen access for users not already granted access; it can only narrow access for those who already have access. This setting will also trump administrative access to a table.
|
|
Set TTL
A table's time-to-live (TTL) can be altered.
|
|
For example, to set a TTL of 7 minutes on a table:
|
|
To set a table to never expire by TTL timeout:
|
|
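Sketches of both alterations, assuming TTL is given in minutes (per the example above) and that a value of -1 disables expiration:
ALTER TABLE example.employee SET TTL 7;   -- expire in 7 minutes

ALTER TABLE example.employee SET TTL -1;  -- -1 is assumed here to mean "never expire"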
Add Column
A column can be added, specifying a column definition.
A new column can have its values initially populated through the use of the DEFAULT keyword. These values can either be a string/numeric constant or the name of an existing column in the table from which values can be copied into the new column. This default value is only in effect for the column creation; the new column will have no default value after that.
|
|
Note
Column compression must be applied after a new column is added; see Compress Column for syntax.
Examples
To add, to the employee table, a salary column that is a non-nullable, store-only, 10-digit number field containing 2 decimal places with a default value of 0:
|
|
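A sketch, assuming a parenthesized column-property syntax for the store-only property:
ALTER TABLE example.employee
ADD salary DECIMAL(10,2, STORE_ONLY) NOT NULL DEFAULT 0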
To add, to the employee table, a category column that is a nullable, dictionary-encoded, 32-character text field:
|
|
To add, to the employee table, a bio column that is a nullable, text-searchable, disk-optimized, unrestricted-width text field:
|
|
Rename Column
An existing column can be renamed:
|
|
For example, to rename a column:
|
|
Modify Column
A column can have its column definition modified, affecting column type, column size, column properties, and nullability.
If a column is modified to be non-nullable, it will be populated with default values--empty string for string fields and 0 for numeric fields.
Either of the following can be used to modify a column:
|
|
|
|
Note
Column compression must be applied after an existing column is modified; see Compress Column for syntax.
Examples
To change, in the employee table, the first_name column to one that is a non-nullable, dictionary-encoded, 50-character text field:
|
|
Compress Column
A column can have its data compressed in memory.
The general form to alter a column's compression setting is:
|
|
For example, to use LZ4 compression on a column:
|
|
To use no compression on a column:
|
|
Drop Column
An existing column can be removed from a table:
|
|
Add Column Index
A column (attribute) index can be added to a table column in order to improve the performance of operations whose expressions contain relational operators against the column. See Limitations for restrictions.
|
|
For example, to index the employee table's department ID column:
|
|
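A sketch, assuming an ADD INDEX form and a dept_id column:
ALTER TABLE example.employee
ADD INDEX (dept_id)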
Drop Column Index
An existing column (attribute) index can be removed from a table:
|
|
For example, to drop the index on the employee table's department ID column:
|
|
Add Chunk Skip Index
A chunk skip index can be added to a table column in order to improve the performance of operations containing equality-based filters against the column. See Limitations for restrictions.
|
|
For example, to index the employee table's department ID column:
|
|
Drop Chunk Skip Index
An existing chunk skip index can be removed from a table:
|
|
For example, to drop the chunk skip index on the employee table's department ID column:
|
|
Add Foreign Key
A foreign key can be added to any column or set of columns not marked as store-only in order to improve the performance of join operations between the table being altered and the table referenced in the foreign key.
|
|
For example, to add a foreign key on the employee table's department ID column, linking it to the department table's department ID column:
|
|
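A sketch, assuming hypothetical employee & department tables and that the optional foreign key name is given with AS:
ALTER TABLE example.employee
ADD FOREIGN KEY (dept_id) REFERENCES example.department(id) AS fk_employee_dept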
Drop Foreign Key
An existing foreign key can be removed from a table, either by the name (alias) given to it during creation or by its definition:
By name:
ALTER TABLE [<schema name>.]<table name>
DROP FOREIGN KEY <foreign key name>
By definition:
ALTER TABLE [<schema name>.]<table name>
DROP FOREIGN KEY (<column name>,...) REFERENCES <foreign table name>(<foreign column name>,...)
For example, to drop the foreign key on the employee table's department ID column:
- By name:
|
|
- By definition:
|
|
Add Partition
A partition can be added to a range-partitioned or list-partitioned table.
Warning
Defining (adding) partitions after data has been loaded will result in a performance penalty as the database moves existing records targeted for the new partition from the default partition into it.
Range Partition
The new partition can be given a minimum bound (inclusive) and a maximum bound (exclusive). If the new partition would come before an existing partition, omitting the maximum bound would cause the new partition to take on the nearest following existing partition's minimum bound as its maximum bound. If the new partition would come after an existing partition, omitting the minimum bound would cause the new partition to take on the nearest preceding partition's maximum bound as its minimum bound. If no partitions are present in the table, the new partition will have to declare both a minimum and maximum bound.
|
|
For example, to add a partition to the customer_order_range_by_year table, containing all records with a partition key less than 2020 and greater than or equal to the maximum bound of the nearest preceding partition:
|
|
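A sketch, assuming a partition name of order_2019; the MIN bound is omitted, so it is inherited from the preceding partition as described above:
ALTER TABLE example.customer_order_range_by_year
ADD PARTITION order_2019 MAX(2020)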
List Partition
The new partition can be given a list of values to match against the partition key values of incoming records.
|
|
For example, to add a partition to the customer_order_manual_list_partition_by_year table, containing all records from 2020:
|
|
For example, to add a partition to the customer_order_manual_list_partition_by_year_and_month table, containing all records from February 2020 & April 2020:
|
|
Remove Partition
An existing partition can be removed from a range-partitioned or list-partitioned table, sending all data contained within that partition back to the default partition.
|
|
For example, to remove a partition named order_2017 from the customer_order_range_by_year table:
|
|
Delete Partition
An existing partition can be dropped from a range-partitioned or list-partitioned table, deleting all data contained within it.
|
|
For example, to drop a partition named order_2014_2016 from the customer_order_range_by_year table, deleting all data within that partition:
|
|
Set Tier Strategy
A table's eviction priorities can be adjusted by setting its tier strategy.
|
|
For example, to set the customer_order table's tier strategy, to one with a below-average eviction priority in the RAM Tier:
|
|
The tier strategy can also be reset to the system default strategy.
|
|
For example, to reset the customer_order table's tier strategy:
|
|
REFRESH EXTERNAL TABLE
Refreshes the data within an external table:
|
|
Important
Data source connect privilege is required to refresh an external table that uses a data source.
TRUNCATE TABLE
Deletes all the records from a table:
|
|
DROP TABLE
When removing a table from the database, there are two options available, which control how the removal takes place. Normally, an error will be reported if the table to drop doesn't exist; if IF EXISTS is specified, no error will be reported.
|
|
SHOW CREATE TABLE
Outputs the DDL statement required to reconstruct the given table:
|
|
Note
The response to SHOW CREATE TABLE is a single-record result set with the DDL statement as the value in the DDL column, shown below with the column separators returned by kisql.
For example, to output the DDL for the example table created in the CREATE TABLE section:
|
|
|
|
CREATE VIEW
Creates a new virtual table from the given query in the specified schema.
A view must adhere to the supported naming criteria for tables & columns.
Note
SQL views are only able to be queried via ODBC/JDBC connection or the /execute/sql endpoint. They are, however, still visible in the GAdmin table listing and manageable through other DDL endpoints.
The general format is:
|
|
Parameters | Description |
---|---|
OR REPLACE | Any existing table/view with the same name will be dropped before creating this view |
<schema name> | Name of the schema containing the view to create |
<table name> | Name of the view to create |
<select statement> | The query that will define both the structure and content of the created view |
When any of a view's source tables is dropped, the view will also be dropped.
Caution!
A CREATE OR REPLACE issues an implicit drop, so replacing an input table will likewise cause the view to be dropped.
For example, to create a view that is a copy of an existing table, failing if a table or view with the same name as the target view already exists:
|
|
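A sketch, with example.view_of_table and example.source_table as assumed names; OR REPLACE is omitted so the statement fails if the name is taken:
CREATE VIEW example.view_of_table AS
SELECT * FROM example.source_table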
CREATE MATERIALIZED VIEW
Specifying MATERIALIZED in a CREATE VIEW statement will make the view a materialized view.
A view must adhere to the supported naming criteria for tables & columns.
The general format is:
|
|
Parameters | Description | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
OR REPLACE | Any existing table/view with the same name will be dropped before creating this materialized view | ||||||||||
TEMP | If the database is restarted, the materialized view will be removed | ||||||||||
<schema name> | Name of the schema containing the materialized view to create | ||||||||||
<table name> | Name of the materialized view to create | ||||||||||
REFRESH | Specifies the data refresh scheme for the materialized view. The following schemes are available:
| ||||||||||
<select statement> | The query that will define both the structure and content of the created materialized view |
The intermediary results of materialized views are cached to improve the performance of queries against them. This means that, unlike typical views, materialized views are not lightweight database entities, but rather consume memory and processing time proportional to the size of the source data and complexity of the query.
When any of the source tables of a materialized view is dropped, the view will also be dropped.
Caution!
A CREATE OR REPLACE issues an implicit drop, so replacing an input table will likewise cause the materialized view to be dropped.
While primary keys & foreign keys are not transferred to the new view, shard keys will be, if the column(s) composing them are part of the SELECT list. A new shard key can be specified for the created view by using the KI_SHARD_KEY(<column list>) pseudo-function in the SELECT list.
To create a materialized view with columns a, b, c, & d and a new shard key on columns a & b, that refreshes once per half hour, replacing a view with the same name as the target view, if it exists:
|
|
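A sketch, assuming the refresh scheme is given as REFRESH EVERY 30 MINUTES and that example.source_table exists:
CREATE OR REPLACE MATERIALIZED VIEW example.mv_of_table
REFRESH EVERY 30 MINUTES AS
SELECT a, b, c, d, KI_SHARD_KEY(a, b)
FROM example.source_table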
ALTER VIEW
The following facet of a view can be altered:
Move View
A view can be moved from one schema to another.
The general form of the command is:
|
|
For example, to move the sales_2017 view from the example_olap schema to the example_archive schema:
|
|
ALTER MATERIALIZED VIEW
Any of the following facets of a materialized view can be altered:
Set Access Mode
A materialized view can have its global accessibility modified for all users in the system, independently from and further restricting any role-based access controls in place. Note that changing the access mode cannot widen access for users not already granted access; it can only narrow access for those who already have access. This setting will also trump administrative access to a view.
|
|
Set Refresh Mode
The refresh mode of a materialized view can be modified.
The general form of the command is:
|
|
The available refresh modes are:
Constant | Description |
---|---|
OFF | Will prevent the view from being automatically refreshed, but will still allow manual refreshes of the data to be requested |
ON CHANGE | Will cause the view to be updated any time a record is added, modified, or deleted from the subtending tables in the view's query. Important: On change mode cannot be applied if the enable_worker_http_servers configuration setting (i.e. multi-head) is set to true. This is because multi-head inserts bypass the on change mode's notification system and are written directly to the tables on the nodes where they reside, instead of communicating to the head node that a change has occurred. |
ON QUERY | Will cause the view to be updated any time the view is queried |
EVERY | Allows specification of an interval in seconds, minutes, hours, or days, with the optional specification of a starting time at which the first refresh interval will run; if no start time is specified, the default will be an interval's worth of time from the point at which the view alteration was requested |
For example, to alter the current sales table to refresh every 6 hours:
|
|
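A sketch of the first form, assuming the view is named example.current_sales:
ALTER MATERIALIZED VIEW example.current_sales
SET REFRESH EVERY 6 HOURS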
This would alter the view in the same way:
|
|
Set TTL
A materialized view's time-to-live (TTL) can be altered.
|
|
REFRESH VIEW
Refreshes the data within a materialized view:
|
|
DROP VIEW
When removing a view from the database, there are two options available, which control how the removal takes place. Normally, an error will be reported if the view to drop doesn't exist; if IF EXISTS is specified, no error will be reported.
|
|
SHOW CREATE VIEW
Outputs the DDL statement required to reconstruct the given view:
|
|
Note
The response to SHOW CREATE VIEW is a single-record result set with the DDL statement as the value in the DDL column, shown below with the column separators returned by kisql.
For example, to output the DDL for the example view created in the CREATE VIEW section:
|
|
|
|
To output the DDL for the example materialized view created in the CREATE MATERIALIZED VIEW section:
|
|
|
|
CREATE DATA SOURCE
Creates a new data source, which contains the location and connection information for a data store that is external to Kinetica. The data source does not reference specific data files within the source; file references can be made by using the data source in a CREATE EXTERNAL TABLE call (for creating a persistent view of the file data) or a LOAD INTO call (for a one-time load of data from the external source to a locally persisted table).
Note
The data source will be validated upon creation, by default, and will fail to be created if an authorized connection cannot be established.
|
|
Parameters | Description | |||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
EXTERNAL | Optional keyword for clarity | |||||||||||||||||||||||||||||||||||||||
<data source name> | Name of the data source, which can be referenced in subsequent commands | |||||||||||||||||||||||||||||||||||||||
<provider> | Provider of the data source Supported providers include:
| |||||||||||||||||||||||||||||||||||||||
<host> | Host, only for HDFS, to use to connect to the data source. The host used for Azure is <storage_account_name>.blob.core.windows.net. The host used for S3 is <region>.amazonaws.com. | |||||||||||||||||||||||||||||||||||||||
<port> | Port, only for HDFS, to use to connect to the data source | |||||||||||||||||||||||||||||||||||||||
USER | Optional user name, given in <username>, to use for authenticating to the data source | |||||||||||||||||||||||||||||||||||||||
PASSWORD | Optional password, given in <password>, to use for authenticating to the data source | |||||||||||||||||||||||||||||||||||||||
WITH OPTIONS | Optional indicator that a comma-delimited list of connection option/value assignments will follow.
Supported provider-specific options include:
|
To create a data source, kin_ds, that connects to an Amazon S3 bucket, kinetica_ds, in the US East (N. Virginia) region:
|
|
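A sketch, using the property names from the Set Properties tables below; the exact LOCATION value and credential placeholders are assumptions:
CREATE DATA SOURCE kin_ds
LOCATION = 'S3'
USER = '<aws access key id>'
PASSWORD = '<aws secret access key>'
WITH OPTIONS
(
    BUCKET NAME = 'kinetica_ds',
    REGION = 'us-east-1'  -- US East (N. Virginia)
)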
Provider-Specific Syntax
Several authentication schemes across multiple providers are supported.
- Azure Using Password
- Azure Using SAS Token
- Azure Using OAuth Token
- Azure Using Active Directory
- HDFS Using Password
- HDFS Using Kerberos Token
- HDFS Using Kerberos Keytab
- SQL Support
Azure Using Password
|
|
Azure Using SAS Token
|
|
Azure Using OAuth Token
|
|
Azure Using Active Directory
|
|
HDFS Using Password
|
|
HDFS Using Kerberos Token
|
|
HDFS Using Kerberos Keytab
|
|
S3
|
|
ALTER DATA SOURCE
Alters the connection parameters of an existing data source.
Note
The data source will be validated upon alteration, by default, and will fail to be altered if an authorized connection cannot be established.
|
|
Parameters | Description |
---|---|
EXTERNAL | Optional keyword for clarity |
<data source name> | Name of the data source to alter |
SET PROPERTY | Indicator that a comma-delimited list of alterations to make will follow. See Set Properties for the complete list of properties. |
To alter a data source, kin_ds, updating the username & password:
|
|
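A sketch of such an alteration, with the property list following the Set Properties tables below:
ALTER DATA SOURCE kin_ds
SET PROPERTY
    USER = 'auser',
    PASSWORD = '<new password>'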
Set Properties
All data source properties can be altered via ALTER DATA SOURCE. The following are the property names and descriptions to use when performing an alteration.
Provider-Agnostic Properties
Parameters | Description | ||||||||
---|---|---|---|---|---|---|---|---|---|
LOCATION | Location of the data source, including the optional (HDFS only) <host> & <port> to use to connect to it The data source <provider> can be one of the following:
| ||||||||
USER | User name, given in <username>, to use for authenticating to the data source | ||||||||
PASSWORD | Password, given in <password>, to use for authenticating to the data source | ||||||||
VALIDATE | Whether to test the connection to the data source upon creation; if true, the creation of a data source that cannot be connected to will fail; if false, the data source will be created regardless of connectivity. Default is true. | ||||||||
WAIT TIMEOUT | Timeout in seconds for reading from a given storage provider |
Provider-Specific Properties
Option | Provider | Description |
---|---|---|
OAUTH TOKEN | AZURE | Azure storage account OAuth token |
SAS TOKEN | AZURE | Azure storage account shared access signature token |
STORAGE ACCOUNT NAME | AZURE | Azure storage account name (only used if TENANT ID is specified) |
TENANT ID | AZURE | Azure Active Directory tenant identifier |
DELEGATION TOKEN | HDFS | Optional Kerberos delegation token for worker nodes; if not specified, the token will be acquired from HDFS |
KERBEROS KEYTAB | HDFS | Location of the Kerberos keytab file on the head node |
USE KERBEROS | HDFS | Whether to attempt Kerberos authentication to HDFS |
BUCKET NAME | S3 | Amazon S3 bucket name |
REGION | S3 | Amazon S3 region identifier |
CONNECTION TIMEOUT | HDFS, S3 | Timeout in seconds for connecting to a given storage provider |
DROP DATA SOURCE
Removes an existing data source. Any external tables that depend on the data source must be dropped before it can be dropped.
|
|
Parameters | Description |
---|---|
EXTERNAL | Optional keyword for clarity |
<data source name> | Name of the data source to remove |
To drop a data source, kin_ds:
|
|
SHOW DATA SOURCE
Outputs the configuration of an existing data source. Users with system_admin permission will see all data source configuration (except any password defined for the connection), while users with connect permission will only see the name & provider.
|
|
Parameters | Description |
---|---|
EXTERNAL | Optional keyword for clarity |
<data source name> | Name of the data source whose configuration will be output. Use * instead to output the configuration of all data sources. |
To show information about a data source, kin_ds:
|
|
CREATE CREDENTIAL
Creates a new credential, which is a record that contains authentication information required to connect to a resource outside the database. Any user may create a credential for their own use.
|
|
Parameters | Description | ||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
<credential name> | Name of the credential to create. The credential name cannot start with a number and can only contain letters, numbers, and underscores. | ||||||||||||||||||||
TYPE | The type of credential to create. Supported types include:
| ||||||||||||||||||||
IDENTITY | Username to use for authenticating with the credential. | ||||||||||||||||||||
SECRET | Password to use for authenticating with the credential. |
For example, to create a Microsoft Azure Active Directory credential, auser_azure_active_dir_creds:
|
|
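A sketch; the TYPE value shown is hypothetical, as the supported type list is not reproduced here:
CREATE CREDENTIAL auser_azure_active_dir_creds
TYPE = 'azure_ad',   -- hypothetical type identifier
IDENTITY = 'auser',
SECRET = '<password>'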
ALTER CREDENTIAL
Alters the properties of an existing credential. Only users with system_admin or system_user_admin, or users with credential_admin on the credential to alter, may alter a credential.
|
|
Parameters | Description |
---|---|
<credential name> | Name of the existing credential to alter. |
SET PROPERTY | Indicator that a comma-delimited list of alterations to make will follow. See Set Properties for the complete list of properties. |
To alter a credential, auser_azure_active_dir_creds, updating the secret:
|
|
Set Properties
All credential properties can be altered via ALTER CREDENTIAL. The following are the property names and descriptions to use when performing an alteration.
Parameters | Description | ||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
TYPE | The type of credential. Supported types include:
| ||||||||||||||||||||
IDENTITY | Username to use for authenticating with the credential. | ||||||||||||||||||||
SECRET | Password to use for authenticating with the credential. |
DROP CREDENTIAL
Removes an existing credential. Only users with system_admin or system_user_admin, or users with credential_admin on the credential to drop, may drop a credential.
|
|
Parameters | Description |
---|---|
<credential name> | Name of the existing credential to remove. Use * instead to drop all credentials. |
To drop a credential, auser_azure_active_dir_creds:
|
|
SHOW CREDENTIAL
Outputs the DDL statement required to reconstruct the given credential. Note that the secret value will be masked and would need to be replaced with the actual secret value if attempting to reconstruct the credential. Only users with system_admin or system_user_admin, or users with credential_admin or credential_read on the credential to show, may show a credential.
|
|
Parameters | Description |
---|---|
<credential name> | Name of the existing credential for which the DDL will be output. Use * instead to output the DDL of all credentials. |
Note
The response to SHOW CREDENTIAL is a single-column result set with the DDL statement as the value in the DDL column.
For example, to output the DDL for a credential, auser_azure_active_dir_creds:
|
|
To output the DDL for all credentials:
|
|
DESCRIBE CREDENTIAL
Outputs the configuration of an existing credential. Only users with system_admin or system_user_admin, or users with credential_admin or credential_read on the credential to describe, may describe a credential.
|
|
Parameters | Description |
---|---|
<credential name> | Name of the existing credential for which the configuration will be output. Use * instead to output the configuration of all credentials. |
For example, to show the configuration for a credential, auser_azure_active_dir_creds:
|
|
To show the configuration for all credentials:
|
|
DESCRIBE
Lists the columns and column types & properties for a given table or view; or lists the contained tables and views of a given schema:
|
|
|
|
For example, to describe the tables contained in the demo schema, into which demo data is usually downloaded:
|
|
|
|
To describe the example table created in the CREATE TABLE section:
|
|
|
|
Data Manipulation (DML)
INSERT
There are two methods for inserting data into a table from within the database:
When specifying a column list using either method, any non-nullable fields not included in the list will be given default values--empty string for strings, and 0 for numerics. The fields in the column list and the values or fields selected must align.
Tip
If Multi-Head Ingest has been enabled on the database server, INSERT operations will automatically leverage it, when applicable.
See Loading Data from Files for inserting data into a table from sources external to the database.
INSERT INTO...VALUES
To insert records with literal values, use this format:
|
|
For example:
|
|
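A sketch, with assumed employee columns:
INSERT INTO example.employee (id, dept_id, manager_id, salary)
VALUES
    (1, 1, NULL, 200000),
    (2, 1, 1,    150000)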
INSERT INTO...SELECT
To insert records, using another table as the source, use this format:
|
|
For example:
|
|
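A sketch, with assumed table and column names:
INSERT INTO example.employee_backup
SELECT * FROM example.employee
WHERE dept_id = 1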
Upserting
To upsert records, inserting new records and updating existing ones (as denoted by primary key), use the KI_HINT_UPDATE_ON_EXISTING_PK hint. If the target table has no primary key, this hint will be ignored.
For example:
|
|
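A sketch, with the hint embedded in a SQL comment (assumed placement, per the hint convention) and assumed table names:
INSERT INTO example.employee_backup /* KI_HINT_UPDATE_ON_EXISTING_PK */
SELECT * FROM example.employee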
Important
By default, any record being inserted that matches the primary key of an existing record in the target table will be discarded, and the existing record will remain unchanged. The KI_HINT_UPDATE_ON_EXISTING_PK hint overrides this behavior, favoring the source records over the target ones.
UPDATE
Updates can set columns to specified constant values or expressions. The general format is:
|
|
For example, to update employee with ID 5 to have a new manager, with ID 3, and to have a 10% salary increase:
|
|
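A sketch matching that description, with manager_id and salary as assumed column names:
UPDATE example.employee
SET manager_id = 3,
    salary = salary * 1.10  -- 10% increase
WHERE id = 5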
Subqueries can be used in the expression list and within the SET clause. To update all the bottom earners in each department with a 5% salary increase using a subquery in the expression list:
|
|
To update a backup table to match the source table that was just updated for new salary values using a subquery within the SET clause:
|
|
Joins can be used within an UPDATE statement to update records based on the results of a JOIN.
Important
The JOIN statement cannot be part of the SET clause. Also, primary key column updates are not supported using JOIN operations. The table to be updated must be in the FROM clause.
For example, to update a backup table to match the source table that was just updated for new manager and salary values using an INNER JOIN:
|
|
Tip
Simplified JOIN syntax is also supported:
|
|
DELETE
Deletes records from a table; the general format is:
|
|
For example, to delete employee with ID 6:
|
|
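A sketch matching that description, with assumed names:
DELETE FROM example.employee
WHERE id = 6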
Subqueries can also be used in the expression list. To delete all the most recent hires in each department:
|
|
Loading Data from Files
Kinetica supports loading of text-based delimited data files via SQL. There are two supported paths:
- Loading client-side files via ODBC/JDBC
- Loading server-side files via /execute/sql endpoint call
INSERT INTO...SELECT...FROM FILE
Kinetica can load data from text-based data files that are local to the client into existing tables using a modified INSERT INTO...SELECT statement.
When reading data from a file, Kinetica assumes the following format, which can be overridden using command options:
- If there is a header row, it is either a simple list of comma-delimited column names:
id,category,name,description,stock
or the Kinetica standard format:
id|int|data,category|string|data|char16,name|string|data|char32,description|string|data|char128|nullable,stock|int|data|nullable
- The data fields are comma-delimited.
- Strings are optionally enclosed in double-quotes; double-quotes must be used when the data contains commas; two consecutive double-quotes are used as an escape code for double-quoted string data containing double-quotes; e.g.:
"This string contains a "" quote mark and a "","" double-quoted comma."
This would be loaded into a single string column in Kinetica as:
This string contains a " quote mark and a "," double-quoted comma.
Data can be inserted into a table from a file with the following syntax:
|
|
Parameters | Description |
---|---|
<schema name> | Name of the schema containing the table into which data will be inserted |
<table name> | Name of the table into which data will be inserted |
<column list> | Optional list of target table columns into which data will be inserted; if not specified, all target table columns will be assumed. Whether specified or implied, this list needs to match the fields specified in <field list> in number, order, & type. |
<field list> | List of the source data file field names from which data will be extracted; use * to extract all columns (this wildcard must be used when a source file has no header which defines field names). This list needs to match all the target table columns (or those specified in the <column list> option) in number, order, & type. |
<file name | file name expression> | Name of the source data file or set of source data files from which records will be extracted; wildcards (*) can be used to specify a group of files |
<option list> | Optional list of whitespace-separated processing options |
Options
The following options can be specified when loading data from files. When reading from multiple files (using wildcards when specifying the file names), options specific to the source file will be applied to each file being read.
Option | Description | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
BATCH_SIZE=<n> | Use an ingest batch size of n records. The default batch size is 10,000. | ||||||||||||||||
COMMENT='<string>' | Treat lines in the source file that begin with string as comments and skip. The default comment marker is #. | ||||||||||||||||
DELIMITER='<char>' | Use char as the source file field delimiter. The default delimiter is ,, unless the source file has one of these extensions:
See Option Characters for allowed characters. | ||||||||||||||||
ESCAPE='<char>' | Use char as the source file data escape character. The escape character preceding any other character, in the source data, will be converted into that other character, except in the following special cases:
For instance, if the escape character is \, a \t encountered in the data will be converted to a tab character when stored in the database. The escape character can be used to escape the quoting character, and will be treated as an escape character whether it is within a quoted field value or not. There is no default escape character. See Option Characters for allowed characters. | ||||||||||||||||
HEADER=<Y|N> | Declare that the source file has or does not have a header. If not given, that determination will be intuited. | ||||||||||||||||
INITIAL_CLEAR | Truncate the target table before loading data. The default is to not truncate the target table. | ||||||||||||||||
LIMIT=<n> | Limit the total number of records inserted into the target table to n. If reading from multiple source files, this count is assessed across all files being read. For example, if n is 15 and the three files being read have 10 records each, all of the records from the 1st file and the first 5 records from the 2nd file will be loaded. The remaining records in the 2nd file and all of the records from the 3rd file will be skipped. The default is no limit. | ||||||||||||||||
NULL='<string>' | Treat string as the indicator of a null source field value. The default is \N. | ||||||||||||||||
ON_ERROR=<mode> | When an error is encountered loading a record, handle it using the given <mode>; one of the following:
The default mode is SKIP. | ||||||||||||||||
QUOTE='<char>' | Use char as the source file data quoting character, for enclosing field values. Usually used to wrap field values that contain embedded delimiter characters, though any field may be enclosed in quote characters (for clarity, for instance). The quote character must appear as the first and last character of a field value in order to be interpreted as quoting the value. Within a quoted value, embedded quote characters may be escaped by preceding them with another quote character or the escape character specified by ESCAPE, if given. The default is the " (double-quote) character. See Option Characters for allowed characters. | ||||||||||||||||
SKIP=<n> | Skip the first n source file data lines read, not including header and comment lines. If reading from multiple source files, this count is assessed across all files being read. For example, if n is 15 and the first two files read have 10 records each, the first record to be loaded (not skipped) will be the 6th record of the 2nd file. The default is to skip no records. |
Option Characters
For DELIMITER, ESCAPE, & QUOTE, any single character can be used, or any one of the following escaped characters:
Escaped Char | Corresponding Source File Character |
---|---|
'' | Single quote |
\\ | Backslash |
\a | ASCII bell |
\b | ASCII backspace |
\f | ASCII form feed |
\t | ASCII horizontal tab |
\v | ASCII vertical tab |
For instance, if two single quotes ('') are specified for a QUOTE character, the parser will interpret single quotes in the source file as quoting characters; specifying \t for DELIMITER will cause the parser to interpret ASCII horizontal tab characters in the source file as delimiter characters.
Examples
Single File
To load a CSV file of product data located at /tmp/data/products.csv into a table named product:
|
|
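A sketch, assuming the FROM FILE."<path>" source form for client-side loads:
INSERT INTO example.product
SELECT * FROM FILE."/tmp/data/products.csv"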
Multiple Files
To load all files whose names begin with products and end in a csv extension, located under /tmp/data, into a table named product:
|
|
Column/Field Names
To load specific fields from a product data file located at /tmp/data/products.csv into specific target table columns:
|
|
Note
When specifying source data file field names, it is not required that target table column names also be specified, or vice versa. The only requirement is that the set of source data file fields selected align with the target table columns into which data will be inserted.
Options
To load data from a product data file with the following options:
- file located at /tmp/data/products.csv
- only records 4 through 10 will be loaded
- any error will cause the insert to fail
- the target table will be truncated before loading
- records will be inserted into the table in groups of 2
|
|
To load data from a product data file with the following options:
- file located at /tmp/data/products.ssv
- the file has a header
- the field delimiter is a semicolon
- data requiring quotes will have single quotes as the quoting character
- the escape character is the backtick
- the data file represents null values as <null>
- data file comments are on lines starting with --
|
|
Considerations
Multi-Head
- If Multi-Head Ingest has been enabled on the database server, file loading through ODBC/JDBC will automatically leverage it, when applicable.
Blank Lines
- A blank line in a source data file will be treated as a valid record and will be inserted into the target table as a record with all null fields. If blank lines should not be considered as data, they should be prefixed with the comment marker in order to be skipped by the loading process.
GAdmin
- While GAdmin does support SQL file ingestion, it is recommended to use server-side loading instead. Upon receiving this client-side load command, GAdmin will spawn a JDBC client on its host that will need to reference files accessible to the gpudb user located on that host (usually, the head node of the Kinetica cluster).
/execute/sql endpoint
- While SQL file ingestion is available to ODBC/JDBC clients and via GAdmin, it is not supported within the /execute/sql endpoint, either as a REST call directly or as an API call (like the Python API's execute_sql() function).
LOAD INTO
Kinetica can load data from text-based data files, located on the Kinetica cluster itself, or from a data source, into existing tables using a LOAD INTO... statement.
Its use with ring resiliency has additional considerations.
Data can be loaded with the following syntax:
|
|
Parameters | Description | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
DATA | Optional keyword for compatibility | ||||||||||
<schema name> | Name of the schema containing the table into which data will be loaded | ||||||||||
<table name> | Name of the table into which data will be loaded | ||||||||||
FROM FILE PATHS <file paths> | Source file specification clause, where <file paths> is a comma-separated list of single-quoted server-side file paths from which data will be loaded; wildcards (*) can be used to specify a group of files. If a data source is specified in the load options, these file paths must resolve to accessible files at that data source location. Also, wildcards will only work when used within the file name, not the path. If no data source is specified, the files are assumed to be local to the database and must all be accessible to the gpudb user, with relative paths resolved against the configured external files directory (external_files_directory). | ||||||||||
FORMAT | Optional indicator of source file type; will be inferred from file extension if not given. Supported formats include:
| ||||||||||
WITH OPTIONS | Optional indicator that a comma-delimited list of connection & global option/value assignments will follow. See Load Options for the complete list of options | ||||||||||
<table property clause> | Optional comma-separated list of table properties, from a subset of those available, to the table to load into, if it is created by this call. |
Delimited Text Options
The following options can be specified when loading data from delimited text files. When reading from multiple files (using wildcards when specifying the file names), options specific to the source file will be applied to each file being read.
Option | Description | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
COMMENT = '<string>' | Treat lines in the source file(s) that begin with string as comments and skip. The default comment marker is #. | ||||||||||||||||
DELIMITER = '<char>' | Use char as the source file field delimiter. The default delimiter is a comma, unless a source file has one of these extensions:
See Delimited Text Option Characters for allowed characters. | ||||||||||||||||
ESCAPE = '<char>' | Use char as the source file data escape character. The escape character preceding any other character, in the source data, will be converted into that other character, except in the following special cases:
For instance, if the escape character is \, a \t encountered in the data will be converted to a tab character when stored in the database. The escape character can be used to escape the quoting character, and will be treated as an escape character whether it is within a quoted field value or not. There is no default escape character. | ||||||||||||||||
HEADER DELIMITER = '<char>' | Use char as the source file header field name/property delimiter, when the source file header contains both names and properties. This is largely specific to the Kinetica export to delimited text feature, which will, within each field's header, contain the field name and any associated properties, delimited by the pipe | character. An example Kinetica header in a CSV file: The default is the | (pipe) character. See Delimited Text Option Characters for allowed characters. Note The DELIMITER character will still be used to separate field name/property sets from each other in the header row | ||||||||||||||||
INCLUDES HEADER = <TRUE|FALSE> | Declare that the source file(s) will or will not have a header. If not given, that determination will be intuited. | ||||||||||||||||
NULL = '<string>' | Treat string as the indicator of a null source field value. The default is the empty string. | ||||||||||||||||
QUOTE = '<char>' | Use char as the source file data quoting character, for enclosing field values. Usually used to wrap field values that contain embedded delimiter characters, though any field may be enclosed in quote characters (for clarity, for instance). The quote character must appear as the first and last character of a field value in order to be interpreted as quoting the value. Within a quoted value, embedded quote characters may be escaped by preceding them with another quote character or the escape character specified by ESCAPE, if given. The default is the " (double-quote) character. See Delimited Text Option Characters for allowed characters. |
Delimited Text Option Characters
For DELIMITER, HEADER DELIMITER, ESCAPE, & QUOTE, any single character can be used, or any one of the following escaped characters:
Escaped Char | Corresponding Source File Character |
---|---|
'' | Single quote |
\a | ASCII bell |
\b | ASCII backspace |
\f | ASCII form feed |
\t | ASCII horizontal tab |
\v | ASCII vertical tab |
For instance, if two single quotes ('') are specified for a QUOTE character, the parser will interpret single quotes in the source file as quoting characters; specifying \t for DELIMITER will cause the parser to interpret ASCII horizontal tab characters in the source file as delimiter characters.
Load Options
The following options can be specified to modify the way data is loaded (or not loaded) into the target table.
Option | Description
---|---
BATCH SIZE | Use an ingest batch size of the given number of records. The default batch size is 10,000.
COLUMN FORMATS = '<string>' | Use the given type-specific formatting for the given column when parsing source data being loaded into that column. This should be a map of column names to format specifications, where each format specification is a map of column type to data format, all formatted as a JSON string. Supported column types include date, time, & datetime. For example, to use the YYYY.MM.DD format for loading source data into date column d and the HH:MM:SS format for loading source data into time column t: { "d": {"date": "%Y.%m.%d"}, "t": {"time": "%H:%M:%S"} }
DATA SOURCE | Load data from the given data source. Data source connect privilege is required when loading from a data source.
DEFAULT COLUMN FORMATS = '<string>' | Use the given formats for source data being loaded into target table columns with the corresponding column types. This should be a map of target column type to source format for data being loaded into columns of that type, formatted as a JSON string. Supported column properties and source data formats are the same as those listed in the description of the COLUMN FORMATS option. For example, to make the default format for loading source data dates in the form YYYY.MM.DD and times in the form HH:MM:SS: { "date": "%Y.%m.%d", "time": "%H:%M:%S", "datetime": "%Y.%m.%d %H:%M:%S" }
FIELDS MAPPED BY < POSITION(<col#s>) | NAME(<field names>) > | Choose a comma-separated list of fields from the source file(s) to load, specifying fields by either position or name. If loading by name, the source file field names must match the target table column names exactly. Note: when specifying source data file fields, the set of source data file fields must align, in type & number, with the target table columns into which data will be loaded. Important: field mapping by position is not supported for Parquet files.
INGESTION MODE | Whether to do a full ingest of the data or perform a dry run or type inference instead. The default mode is FULL.
LOADING MODE | Use one of the following distribution schemes to load data files. The default mode is HEAD.
ON ERROR | When an error is encountered loading a record, handle it using one of the following modes. The default mode is PERMISSIVE.
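For instance, a load combining several of these options might look like the following sketch (assuming the LOAD DATA INTO form used in the Examples section; the names are illustrative):

```sql
-- Hypothetical load using a larger batch size and skipping records that fail to load
LOAD DATA INTO product
FROM FILE PATHS 'data/products.csv'
WITH OPTIONS (BATCH SIZE = 50000, ON ERROR = SKIP)
```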
Table Property Clause
If the target table doesn't exist and the source data file has Kinetica headers, a comma-separated list of options for creating the target table can be specified.
Kinetica headers contain column type information, which is used in creating the target table if it doesn't exist. An example Kinetica header:
|
|
The general format for the table properties clause is:
|
|
Available table properties include:
Property | Description |
---|---|
CHUNK SIZE | Size of the blocks of memory holding the data, when loaded; specified as the maximum number of records each block of memory should hold |
TTL | The time-to-live (TTL) for the table; if not set, the table will not expire |
Examples
Single File
To load a CSV file of product data located at data/products.csv, relative to the configured external files directory, into a table named product:
|
|
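A minimal sketch of such a load, assuming a LOAD DATA INTO ... FROM FILE PATHS form:

```sql
LOAD DATA INTO product
FROM FILE PATHS 'data/products.csv'
```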
Single File - No Header
To load a headerless CSV file of product data located at data/products.nh.csv, relative to the configured external files directory, into a table named product:
|
|
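Following the same assumed form, with the header declared absent via the INCLUDES HEADER option:

```sql
LOAD DATA INTO product
FROM FILE PATHS 'data/products.nh.csv'
FORMAT TEXT (INCLUDES HEADER = FALSE)
```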
Multiple Files
To load all files whose names begin with products and end in a csv extension, located at data/products*csv, relative to the configured external files directory, into a table named product:
|
|
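A sketch, with the wildcard passed directly in the file path:

```sql
LOAD DATA INTO product
FROM FILE PATHS 'data/products*csv'
```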
When loading files from multiple paths, specify a comma-delimited list of those file paths. To load one file located at data/products.csv and one file located at data/products.kh.csv, relative to the configured external files directory, into a table named product:
|
|
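A sketch, with the two paths passed as a comma-delimited list:

```sql
LOAD DATA INTO product
FROM FILE PATHS 'data/products.csv', 'data/products.kh.csv'
```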
Note
Multiple paths and wildcards specifying multiple files on those paths can be used together.
Column/Field Names
To load specific fields from a product data file, with the following conditions:
- data file located at data/products.title-case.csv, relative to the configured external files directory
- data file contains a header with fields named ID, Category, Name, Description, & Stock
- target table named product_name_stock
- target columns named id, name, & stock
- only load fields ID, Name, & Stock into columns id, name, & stock
- skip records that fail to load successfully
|
|
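A sketch of this load, using the FIELDS MAPPED BY and ON ERROR options described under Load Options:

```sql
LOAD DATA INTO product_name_stock
FROM FILE PATHS 'data/products.title-case.csv'
WITH OPTIONS (
    FIELDS MAPPED BY NAME(ID, Name, Stock),
    ON ERROR = SKIP
)
```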
Note
When specifying source data file field names, the set of source data file fields selected must align, in type & number, with the target table columns into which data will be loaded.
Delimited Text Options
To load data from a product data file with the following options:
- file located at data/products.ssv, relative to the configured external files directory
- the file has a header
- the field delimiter is a semicolon
- data requiring quotes will have single quotes as the quoting character
- the escape character is the backtick
- the data file represents null values as <null>
- data file comments are on lines starting with --
- when parse errors are encountered, as much of the record's data as possible will be loaded into the target table
|
|
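A sketch combining the delimited text options above (assuming PERMISSIVE is the error mode that loads as much of each failing record as possible, and that two single quotes denote a single-quote character, per the escaped characters table):

```sql
LOAD DATA INTO product
FROM FILE PATHS 'data/products.ssv'
FORMAT TEXT
(
    INCLUDES HEADER = TRUE,
    DELIMITER = ';',
    QUOTE = '''',
    ESCAPE = '`',
    NULL = '<null>',
    COMMENT = '--'
)
WITH OPTIONS (ON ERROR = PERMISSIVE)
```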
Create Table with Kinetica Headers
A CSV file with Kinetica headers can be used to create the target table if it doesn't exist.
To load such a file of product data with the following conditions:
- data file located at data/products.kh.csv, relative to the configured external files directory
- data file contains a Kinetica header
- target table named product_create_on_load
- target table will be created if it doesn't exist
- target table will have a chunk size of 1,000,000 records, if it is created by this load command
|
|
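A sketch, assuming a USING TABLE PROPERTIES clause carries the table creation options described in the Table Property Clause section:

```sql
LOAD DATA INTO product_create_on_load
FROM FILE PATHS 'data/products.kh.csv'
USING TABLE PROPERTIES (CHUNK SIZE = 1000000)
```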
Parquet File
To load a Parquet file of employee data located at data/employee.parquet, relative to the configured external files directory, into a table named employee_2000:
|
|
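A sketch, assuming a FORMAT clause that selects Parquet:

```sql
LOAD DATA INTO employee_2000
FROM FILE PATHS 'data/employee.parquet'
FORMAT PARQUET
```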
Dry Run
To perform a dry run of a load of a CSV file of product data located at data/products.csv, relative to the configured external files directory, into a table named product:
|
|
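A sketch, using the INGESTION MODE load option:

```sql
LOAD DATA INTO product
FROM FILE PATHS 'data/products.csv'
WITH OPTIONS (INGESTION MODE = DRY RUN)
```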
Note
The dry run will return the number of records that would have been loaded, but not actually insert any records into the target table.
SQL Procedures
Kinetica provides support for basic SQL procedures, as an executable batch of SQL statements. A procedure can be executed by two means:
- on-demand - procedure is called directly by a user
- scheduled execution - procedure is configured, upon creation, to execute at a user-specified interval
Note
Even if a procedure is configured for scheduled execution, it can still be executed directly by a user in on-demand fashion.
After the first run, the execution plan for all statements in the procedure will be created and cached to improve performance on future executions.
If there is an error executing any statement in the procedure, the procedure will stop immediately and report the error. If the procedure is invoked via scheduled execution, an alert will be sent to the alert monitor, as there is no interactive session through which the error could be reported to a user. Any database modifications prior to the error will not be rolled back.
Note
If any of the database objects referenced in the procedure are dropped or modified, the procedure will be dropped as well. This does not include any objects created by the procedure that are later referenced by it.
The ability to manage & execute procedures is available through SQL, using the following commands:
For procedure execute permission management, see:
Supported Statements
The following statement types are allowed within a SQL procedure:
- SELECT
- INSERT (inserting from data file not supported)
- UPDATE
- DELETE
- <CREATE | DROP | SHOW CREATE> SCHEMA
- <CREATE | TRUNCATE | DROP | SHOW CREATE> TABLE
- CREATE TABLE...AS
- <CREATE [MATERIALIZED] | REFRESH | DROP | SHOW CREATE> VIEW
- DESCRIBE
- SHOW PROCEDURE
- SHOW SECURITY [FOR <USER | ROLE>]
- SHOW RESOURCE GROUP
Security
Permissions for managing procedures follow those for creating tables; e.g., if a user has the ability to create a table in a given schema, that user will also be able to create & drop procedures there.
Executing a procedure requires either the implicit execute permission that is granted to the creator of a procedure, or explicit execute permission, which can be granted to or revoked from any user or role, irrespective of whether the target user or role has the appropriate access to the database objects referenced within the SQL procedure.
Execute permission on a procedure also allows the grantee to see the contents of the procedure.
CREATE PROCEDURE
SQL procedures can be created with the following syntax:
|
|
Parameters | Description |
---|---|
<schema name> | Name of the schema in which this procedure will be created |
<procedure name> | Name to give to the created procedure; must adhere to the supported naming criteria for tables, and cannot be named the same as any existing table or view |
<sql statements> | Semicolon-separated list of supported SQL statements. If the final statement in the procedure is an output-generating statement (SQL query, SHOW command, etc.), the output of that statement alone will be returned to the user; all other statements that generate output will be ignored. |
OR REPLACE | Drop any existing procedure with the same name before creating this one |
LANGUAGE | Optional language specification for the procedure. Only SQL is supported at this time. |
<number> | Length of time, in the given number of units, between scheduled executions of the procedure. Fractional values are accepted. |
For example, to create a sqlp procedure:
|
|
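A sketch of such a procedure (assuming a BEGIN ... END body of semicolon-separated statements; the table & column names are illustrative):

```sql
CREATE PROCEDURE sqlp
BEGIN
    -- Rebuild a summary table on each run
    CREATE OR REPLACE TABLE sqlp_totals AS
    SELECT category, SUM(stock) AS total_stock
    FROM product
    GROUP BY category;
END
```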
To create a sqlp_weekly procedure that executes once per week:
|
|
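A sketch, assuming an EXECUTE FOR EVERY <number> <unit> scheduling clause matching the <number> parameter above:

```sql
CREATE PROCEDURE sqlp_weekly
BEGIN
    -- Illustrative body: refresh a view once per week
    REFRESH VIEW product_summary;
END
EXECUTE FOR EVERY 7 DAYS
```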
EXECUTE PROCEDURE
SQL procedures can be executed on-demand with the following syntax:
|
|
If the final statement in the procedure is an output-generating statement (SQL query, SHOW command, etc.), the output of that statement alone will be returned to the user; all other statements that generate output will be ignored.
If there is an error executing any statement in the procedure, the procedure will stop immediately and report the error. Any database modifications prior to the error will not be rolled back.
For example, to execute the sqlp procedure:
|
|
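A sketch:

```sql
EXECUTE PROCEDURE sqlp
```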
DROP PROCEDURE
When removing a SQL procedure from the database, there are two options available, which control how the removal takes place. Normally, an error will be reported if the procedure to drop doesn't exist; if IF EXISTS is specified, no error will be reported.
|
|
For example, to drop the sqlp procedure:
|
|
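A sketch, using IF EXISTS to suppress the error if the procedure is absent:

```sql
DROP PROCEDURE IF EXISTS sqlp
```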
SHOW PROCEDURE
The content of a SQL procedure can be displayed with the following syntax:
|
|
For example, to show the contents of the sqlp_weekly procedure:
|
|
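A sketch:

```sql
SHOW PROCEDURE sqlp_weekly
```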
User Defined Functions (UDFs)
Kinetica provides support for User-Defined Function (UDF) creation and management in SQL. Unlike conventional UDFs, Kinetica UDFs are external programs that can be managed via SQL and may run in distributed fashion across the cluster.
UDF features accessible via SQL include:
For UDF execute permission management, see:
CREATE FUNCTION
Creates a new User-Defined Function (UDF) with the given options. Only users with the SYSTEM ADMIN permission can create UDFs.
The basic form of the supported CREATE FUNCTION statement follows:
|
|
Parameters | Description
---|---
OR REPLACE | Any existing UDF with the same name will be dropped before creating this one
<function name> | Name of the UDF, which can be referenced in subsequent commands
RETURNS TABLE | Specifies that the UDF returns a table after execution and defines the column names and types to which the function's output data is expected to conform. Important: this defines the UDF as a User-Defined Table Function (UDTF), which allows the function to be called within a SQL query; see SELECT...FROM TABLE for details.
MODE | Name of the execution mode for the UDF
RUN_COMMAND | The command used to execute the files associated with the UDF; e.g., python, java
RUN_COMMAND_ARGS | The arguments supplied alongside the given RUN_COMMAND; e.g., -jar udf.jar, --username jdoe --password 'Password123!'
FILE PATHS | A comma-separated list of single-quoted file paths from which UDFs and any supplementary files will be loaded; the files specified must be present in a sub-directory under the external files directory prior to UDF creation
WITH OPTIONS | Optional indicator that a comma-delimited list of UDF option/value assignments will follow
For example, to create a distributed Python UDF_SOS_PY_PROC UDF that returns a table featuring two columns (noting that the udf_sos_py_proc.py file was uploaded prior to running the example):
|
|
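A sketch of such a definition (the mode value, output column names/types, and file path are assumptions for illustration):

```sql
CREATE OR REPLACE FUNCTION UDF_SOS_PY_PROC
RETURNS TABLE (id INT, sos DOUBLE)
MODE = 'distributed'
RUN_COMMAND = 'python'
RUN_COMMAND_ARGS = 'udf_sos_py_proc.py'
FILE PATHS 'udf/udf_sos_py_proc.py'
```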
EXECUTE FUNCTION
Executes an existing User-Defined Function (UDF). Any user with the SYSTEM ADMIN permission or the EXECUTE FUNCTION permission on the specific UDF (or across all UDFs) is allowed to execute UDFs.
The basic form of the supported EXECUTE FUNCTION statement follows:
|
|
The <function name> should be the name of an existing UDF.
Function Parameters
The following are the available function parameters that may be passed to a UDF in an EXECUTE FUNCTION call, along with their corresponding function values.
Parameters | Description
---|---
INPUT_TABLE_NAMES | Optional set of "tables" that will be used as input to the UDF, specified as a set of queries defining the source data for each "table". The value of this parameter is either the INPUT_TABLE or INPUT_TABLES function, depending on the number of input "tables".
OUTPUT_TABLE_NAMES | Optional set of tables that will be used as output for the UDF, specified as a set of names passed as a comma-delimited list of strings to the OUTPUT_TABLES function; e.g.: OUTPUT_TABLES('sales_summary', 'customer_profile')
PARAMS | Optional list of parameters to pass to the UDF, specified as a set of key/value pairs passed as a comma-delimited list of <key> = '<value>' assignments to the KV_PAIRS function; e.g.: KV_PAIRS(customer_id = '1', report_type = 'yearly')
OPTIONS | Optional list of function execution options, specified as a set of key/value pairs passed as a comma-delimited list of <key> = '<value>' assignments to the KV_PAIRS function; e.g.: KV_PAIRS(max_concurrency_per_node = '2')
For example, to execute the UDF_SOS_PY_PROC UDF, specifying an input and output table:
|
|
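A sketch, assuming named parameters are passed with =>; the input query and output table name are illustrative:

```sql
EXECUTE FUNCTION UDF_SOS_PY_PROC
(
    INPUT_TABLE_NAMES => INPUT_TABLE(SELECT id, x FROM udf_sos_in),
    OUTPUT_TABLE_NAMES => OUTPUT_TABLES('udf_sos_out')
)
```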
SELECT...FROM TABLE
Any User-Defined Table Function (UDTF), a UDF that has been created with a RETURNS TABLE clause, can be used as a source table in a SELECT statement. Any user with execute permission on the UDTF can call it from within a query.
The supported SELECT...FROM TABLE statement follows.
|
|
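A sketch, assuming the UDTF call is wrapped in a TABLE(...) operator in the FROM clause; the input query is illustrative:

```sql
SELECT *
FROM TABLE
(
    UDF_SOS_PY_PROC(INPUT_TABLE_NAMES => INPUT_TABLE(SELECT id, x FROM udf_sos_in))
)
```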
The <function name> must refer to an existing UDTF, and the <function parameter list> follows the same form as in the EXECUTE FUNCTION syntax.
DROP FUNCTION
Removes an existing User-Defined Function (UDF). Only users with the SYSTEM ADMIN permission can remove UDFs.
The supported DROP FUNCTION statement follows.
|
|
Parameters | Description |
---|---|
<function name> | Name of the UDF to remove |
To drop the UDF_SOS_PY_PROC UDF:
|
|
SHOW FUNCTION
Outputs the configuration of an existing User-Defined Function (UDF). Users with SYSTEM ADMIN permission will see the entire UDF configuration, while users with proc-level permissions will only see the name of the UDF & its execution mode.
The supported SHOW FUNCTION statement follows.
|
|
Parameters | Description |
---|---|
<function name> | Name of the UDF whose configuration will be output Tip Leave the function name blank to output the configuration of all UDFs. |
To show the UDF_SOS_PY_PROC UDF:
|
|
DESCRIBE FUNCTION
Displays the attributes of the given User-Defined Function (UDF) in tabular format. Users with SYSTEM ADMIN permission or proc-level permissions on the function (or across all functions) will be allowed to see the function attributes.
The supported DESCRIBE FUNCTION statement follows.
|
|
Parameters | Description |
---|---|
<function name> | Name of the UDF whose attributes will be output |
To describe the UDF_SOS_PY_PROC UDF:
|
|
Graph
Kinetica provides support for graph creation and management in SQL. Graphs represent topological relationships (both geospatial and non-geospatial) via nodes that are connected by edges.
Graph features accessible via SQL include:
CREATE GRAPH
Creates a new graph. The nodes and edges of a graph are optionally weighted and/or restricted to aid in calculating various ways to traverse the graph. Graphs comprise a maximum of four components that each have unique specifications and ways they can be combined to define the graph. Review Components and Identifiers and Identifier Combinations for more information.
The basic form of the supported CREATE GRAPH statement follows:
|
|
Parameters | Description
---|---
DIRECTED | Optional keyword used to create the graph as directed
<graph name> | Name of the graph, which can be referenced in subsequent commands
EDGES | Graph component, provided as table column names, expressions, or constants using the SQL INPUT TABLE function, to use for identifying the edges of the graph; review Components and Identifiers and Identifier Combinations for more information
NODES | Optional graph component, provided as column names, expressions, or constants using the SQL INPUT TABLE function, to use for identifying the nodes of the graph; review Components and Identifiers and Identifier Combinations for more information
WEIGHTS | Optional graph component, provided as column names, expressions, or constants using the SQL INPUT TABLE function, to use for identifying the weights of the graph; review Components and Identifiers and Identifier Combinations for more information
RESTRICTIONS | Optional graph component, provided as column names, expressions, or constants using the SQL INPUT TABLE function, to use for identifying the restrictions of the graph; review Components and Identifiers and Identifier Combinations for more information
OPTIONS | Optional indicator that a list of connection option/value assignments will follow, passed as a comma-delimited list of key/value pairs to the KV_PAIRS function
To create a graph featuring edges and weights:
|
|
To create a graph featuring nodes, edges, and weights:
|
|
SOLVE GRAPH
Solves an existing graph using one of the supported solver types.
The basic form of the supported SOLVE_GRAPH function when called in a SELECT follows:
|
|
The basic form of the supported SOLVE_GRAPH function when called in an EXECUTE FUNCTION statement follows:
|
|
Parameters | Description
---|---
GRAPH | Name of the graph to solve
SOLVER_TYPE | The type of solver to use for the operation; review Network Graphs & Solvers Concepts for more information
SOURCE_NODES | The node(s) used as the origin point(s) for the solution, specified using the SQL INPUT TABLE function
SOLUTION_TABLE | Only applicable when using EXECUTE FUNCTION syntax. Name of the table to store the solution, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria
DESTINATION_NODES | The node(s) used as the destination point(s) for the solution, specified using the SQL INPUT TABLE function
WEIGHTS_ON_EDGES | Additional weights to apply to the edges of an existing graph, specified using the SQL INPUT TABLE function; review Components and Identifiers and Identifier Combinations for more information
RESTRICTIONS | Additional restrictions to apply to the nodes/edges of an existing graph, specified using the SQL INPUT TABLE function; review Components and Identifiers and Identifier Combinations for more information
OPTIONS | Optional indicator that a comma-delimited list of connection option/value assignments will follow
To solve a graph using the SHORTEST_PATH solver and the table function syntax:
|
|
To solve a graph using the SHORTEST_PATH solver and the EXECUTE FUNCTION syntax (note the additional SOLUTION_TABLE parameter):
|
|
QUERY GRAPH
Queries an existing graph using unique query identifiers and combinations.
The basic form of the supported QUERY_GRAPH function when called in a SELECT follows:
|
|
The basic form of the supported QUERY_GRAPH function when called in an EXECUTE FUNCTION statement follows:
|
|
Parameters | Description
---|---
GRAPH | Name of the graph to query
QUERIES | Nodes or edges to be queried, specified using the SQL INPUT TABLE function; review Query Identifiers and Query Identifier Combinations for more information
ADJACENCY_TABLE | Only applicable when using EXECUTE FUNCTION syntax. Name of the table to store the resulting adjacencies, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria
RINGS | Sets the number of rings around the node to query for adjacency
RESTRICTIONS | Additional restrictions to apply to the nodes/edges of an existing graph, specified using the SQL INPUT TABLE function; review Components and Identifiers and Identifier Combinations for more information
OPTIONS | Optional indicator that a comma-delimited list of connection option/value assignments will follow
To query a graph node within one ring using the table function syntax:
|
|
To query a different graph node within two rings using the execute function syntax (note the additional required ADJACENCY_TABLE parameter):
|
|
MATCH GRAPH
Matches an existing graph to a dataset using one of the supported solver types and sample points which are defined using unique match identifiers and combinations.
The basic form of the supported MATCH_GRAPH function when called in a SELECT follows:
|
|
The basic form of the supported MATCH_GRAPH function when called in an EXECUTE FUNCTION statement follows:
|
|
Parameters | Description
---|---
GRAPH | Name of the graph to match
SAMPLE_POINTS | Sample points used to match an underlying geospatial graph, specified using the SQL INPUT TABLE function. Sample points are specified as identifiers, which are grouped into combinations
SOLVE_METHOD | The type of solve method to use for the operation; review Network Graphs & Solvers Concepts for more information
SOLUTION_TABLE | Only applicable when using EXECUTE FUNCTION syntax. Name of the table to store the solution, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria
OPTIONS | Optional indicator that a comma-delimited list of connection option/value assignments will follow
To match a graph to its parent dataset using the table function syntax:
|
|
To match the same graph to its parent set instead using the EXECUTE FUNCTION syntax (note the additional required SOLUTION_TABLE parameter):
|
|
Security
Kinetica provides basic table-level role-based access control for users. It also allows global read/write and administrative access to be granted. For details about Kinetica security, see Security Concepts.
Tip
A limited set of user/role information can be extracted via function calls. See SQL Support for details.
Security features accessible via SQL include:
- User Management
- Role Management
- Privilege Management
- Impersonation (Execute As...) - enables execution of commands as another user
User Management
The ability to manage user accounts is available through SQL, using the following commands:
See Users for details about user accounts within Kinetica.
CREATE USER
Users can be added to the system and assigned permissions either directly or via roles.
Optionally, the user can be associated with a resource group and/or be assigned a default schema.
To add a user to the system, use this format:
|
|
Note
The password needs to be single-quoted and must not contain single quotes.
For example, two of the ways to create a new internal user with the user ID of jsmith and a password of secret are:
|
|
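Two plausible forms, sketched under the assumption that both an IDENTIFIED BY and a WITH PASSWORD spelling are accepted:

```sql
CREATE USER jsmith IDENTIFIED BY 'secret'
CREATE USER jsmith WITH PASSWORD 'secret'
```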
To create an internal user with the user ID of jsmith, a password of secret, and assigning the memory_over_execution resource group:
|
|
To create a user account for an existing external LDAP user, the user name should match the LDAP user name, prepended with the @ symbol; no password is supplied, as the user will be externally authenticated:
|
|
ALTER USER
Any of the following facets of a user can be altered, either individually or as a group:
- password
- resource group
- default schema
To alter an existing user, use this format:
|
|
Note
The password needs to be single-quoted and must not contain single quotes.
For example, to alter a user with the user ID of jsmith, assigning a password of new_secret, and the memory_over_execution resource group:
|
|
To unassign the memory_over_execution resource group from the jsmith user:
|
|
DROP USER
Any user, other than the default users, can be removed from the system. Note that any database objects created by a user will remain when the user is removed.
To remove a user from the system, use this format:
|
|
For example, to drop an internal user jsmith:
|
|
To drop an external LDAP user jsmith:
|
|
SHOW SECURITY FOR User
For any one or more (or all) users in the system, the following can be listed:
To list security assignments:
|
|
For example, to show the permissions, roles, resource group, and default schema for user jsmith:
|
|
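A sketch:

```sql
SHOW SECURITY FOR jsmith
```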
Role Management
The ability to manage roles is available through SQL, using the following commands:
See Roles for details about roles within Kinetica.
CREATE ROLE
A new role can be created as a container for permissions or other roles, though both of those must be granted to the role after its creation.
Optionally, the role can be associated with a resource group.
To create a new role, use this format:
|
|
For example, to create an analyst role:
|
|
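A sketch:

```sql
CREATE ROLE analyst
```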
To create an executive role, assigning it the execution_over_memory resource group:
|
|
ALTER ROLE
A role can have its associated resource group modified.
To modify a role's resource group assignment, use this format:
|
|
For example, to assign the memory_over_execution resource group to the analyst role:
|
|
To unassign the resource group from the analyst role:
|
|
DROP ROLE
Dropping a role will remove, from all users with that role, the permissions & roles granted through it. Users & roles granted the same permissions either directly or via other roles will retain those permissions.
Any role, other than the default roles, can be removed from the system.
To drop an existing role, use this format:
|
|
For example, to drop the analyst role:
|
|
SHOW SECURITY FOR Role
For any one or more (or all) roles in the system, the following can be listed:
To list security assignments:
|
|
To show the permissions, roles, and resource group for the analyst role:
|
|
Privilege Management
The ability to manage user & role privileges is available through SQL, using the following commands:
- GRANT Role
- GRANT System Permission
- GRANT Table Permission
- GRANT Data Source Permission
- GRANT Credential Permission
- GRANT Procedure Permission
- GRANT Function Permission
- REVOKE Role
- REVOKE System Permission
- REVOKE Table Permission
- REVOKE Data Source Permission
- REVOKE Credential Permission
- REVOKE Procedure Permission
- REVOKE Function Permission
- SHOW SECURITY
See Users for details about user accounts within Kinetica.
GRANT Role
Roles can be granted directly to users or other roles.
To grant a role:
|
|
For example, to grant a role allowing access to analyst tables to the analyst role, and then grant that analyst role to user jsmith:
|
|
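A sketch (the analyst_tables role name is illustrative):

```sql
GRANT analyst_tables TO analyst;
GRANT analyst TO jsmith;
```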
GRANT SYSTEM Permission
System-level permissions can be granted directly to users or roles.
To grant a system-level permission or a user-administration permission:
|
|
For example, to grant system administrator permission to jsmith:
|
|
To grant read access to all tables to the auditor role:
|
|
To grant user administrator permission to jsmith:
|
|
GRANT Table Permission
Table-level permissions, which can be applied to schemas, tables, and views, can be granted directly to users or roles.
Row-level security can be invoked for SELECT privilege by specifying a WHERE clause expression.
Column-level security can be invoked for SELECT privilege by specifying a list of accessible columns and/or column security functions.
To grant a table-level permission:
|
|
Parameters | Description |
---|---|
PRIVILEGES | Optional keyword for SQL-92 compatibility |
TABLE | Optional keyword for SQL-92 compatibility |
<schema name> | Either the name of the schema to which access is being granted or the name of the schema containing the table to which access is being granted |
<table/view name> | The name of the table to which access is being granted |
<column list> | Optional comma-separated list of specific table columns and/or column security functions applied to columns on which to grant access; see Column-Level Security for details |
WHERE <expression> | Optional filter expression on the table to which access is being granted, determining to which rows of the target table the grantee will be given access; see Row-Level Security for details |
Wildcards (*) can replace either the schema name, table/view name, or both, to specify all database objects within a given domain:
Wildcard Form | Selected Objects |
---|---|
* | All schemas |
SCHEMA_NAME.* | All tables/views under the schema named SCHEMA_NAME |
*.* | All tables/views under all schemas |
Note
The ALL permission corresponds to the native table_admin permission, which gives full read/write access as well as the additional permission either to ALTER and DROP the specified table or to CREATE, ALTER, & DROP all tables in the specified schema.
Examples
For example, to grant full access on the network_config table to jsmith:
|
|
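A sketch, using the optional TABLE keyword:

```sql
GRANT ALL ON TABLE network_config TO jsmith
```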
To grant SELECT access on the network_config_history table to the analyst role:
|
|
To grant SELECT access on all schemas to the auditor role:
|
|
To grant SELECT access on all tables within the archive schema to the analyst role:
|
|
To grant full access on all schemas and tables to the dbadmin role:
|
|
Row-Level Security Examples
To grant SELECT access on an rx_order table for all orders placed since the year 2002:
|
|
To grant SELECT access on an rx_order table for only orders belonging to the current user:
|
|
Column-Level Security Examples
To grant SELECT access on the following columns of an rx_order table:
- obfuscated version of the orderer's social security number
- name of the prescription ordered
- order date/time
|
|
To grant SELECT access on the following columns of an rx_order table:
- masked version of the orderer's social security number (last 4 digits)
- name of the orderer
- name of the prescription ordered
- order date/time
|
|
GRANT Data Source Permission
The permission to connect to and load data from a data source can be granted directly to users or roles.
To grant permission to connect to and load data from a data source:
|
|
For example, to grant access on the data source named kin_ds to the auser user:
|
|
GRANT Credential Permission
The permission to use and manage a credential can be granted directly to users or roles.
To grant read access, including use & display, on a credential to a user:
|
|
To grant administrative access, including use, modification, removal, & display, on a credential to a user:
|
|
Tip
Provide * instead of a specific credential name to grant the permission on all credentials.
For example, to grant read access on the credential named auser_azure_active_dir_creds to the auser user:
|
|
To grant administrative access on all credentials to the auser user:
|
|
GRANT Procedure Permission
Procedure-level permissions, which allow execution of SQL procedures, can be granted directly to users or roles.
To grant the permission to execute a SQL procedure:
|
|
For example, to grant execute on the sqlp_weekly procedure to jsmith:
|
|
GRANT Function Permission
Function-level permissions, which allow execution of UDFs/UDTFs, can be granted directly to users or roles.
To grant the permission to execute a UDF or UDTF:
|
|
For example, to grant execute on the udf_sos_py_proc UDF to jsmith:
|
|
REVOKE Role
Roles can be revoked from users or other roles.
To revoke a role:
|
|
For example, to revoke a role allowing access to analyst tables from the analyst role:
|
|
To revoke the analyst role from user jsmith:
|
|
REVOKE SYSTEM Permission
System-level permissions can be revoked from users or roles.
To revoke a system-level permission or a user-administration permission:
|
|
For example, to revoke system administrator permission from jsmith:
|
|
To revoke write access to all tables from the auditor role:
|
|
To revoke user administrator permission from jsmith:
|
|
REVOKE Table Permission
Table-level permissions, which can be applied to schemas, tables, and views, can be revoked from users or roles.
Row-level security can be revoked for SELECT privilege only by revoking SELECT privilege from the entire table.
Note
Access to individual rows cannot be revoked; instead, full SELECT access should be revoked, followed by a grant of access to the appropriate rows.
Column-level security can be revoked for SELECT privilege by specifying a list of columns for which access will be revoked, regardless of any security functions that may have been applied to the column when granting access (HASH, MASK).
To revoke a table-level permission:
|
|
Parameters | Description |
---|---|
PRIVILEGES | Optional keyword for SQL-92 compatibility |
TABLE | Optional keyword for SQL-92 compatibility |
<schema name> | Either the name of the schema to which access is being revoked or the name of the schema containing the table to which access is being revoked |
<table/view name> | The name of the table to which access is being revoked |
<column list> | Optional comma-separated list of specific table columns to which access is being revoked; see Column-Level Security for details |
Wildcards (*) can replace either the schema name, table/view name, or both, to specify all database objects within a given domain:
Wildcard Form | Selected Objects |
---|---|
* | All schemas |
SCHEMA_NAME.* | All tables/views under the schema named SCHEMA_NAME |
*.* | All tables/views under all schemas |
Note
The ALL permission corresponds to the native table_admin permission, which gives full read/write access as well as the additional permission either to ALTER and DROP the specified table or to CREATE, ALTER, & DROP all tables in the specified schema.
Examples
For example, to revoke full access on the network_config table from jsmith:
|
|
To revoke DELETE access on the network_config_history table from the analyst role:
|
|
To revoke DELETE access on all schemas from the auditor role:
|
|
To revoke UPDATE access on all tables within the archive schema from the analyst role:
|
|
To revoke full access on all schemas and tables from the dbadmin role:
|
|
Row-Level Security Examples
To revoke SELECT access on an rx_order table for all orders placed since the year 2002 (as was granted in the row-level grant example), revoke all SELECT access:
|
|
Column-Level Security Examples
To revoke SELECT access on the following columns of an rx_order table:
- orderer's social security number
|
|
To revoke SELECT access on the following columns of an rx_order table:
- masked version of the orderer's social security number (last 4 digits)
- name of the prescription ordered
|
|
REVOKE Data Source Permission
The permission to connect to and load data from a data source can be revoked directly from users or roles.
To revoke permission to connect to and load data from a data source:
|
|
For example, to revoke access on the data source named kin_ds from the auser user:
|
|
REVOKE Credential Permission
The permission to use and manage a credential can be revoked directly from users or roles.
To revoke read or administrative access on a credential from a user:
|
|
Tip
Provide * instead of a specific credential name to revoke the permission on all credentials.
For example, to revoke read access on the credential named auser_azure_active_dir_creds from the auser user:
|
|
To revoke administrative access on all credentials from the auser user:
|
|
REVOKE Procedure Permission
Permissions relating to SQL procedures can be revoked from users or roles.
To revoke the permission to execute a SQL procedure:
|
|
For example, to revoke execute on the sqlp_weekly procedure from jsmith:
|
|
REVOKE Function Permission
Permissions relating to UDFs/UDTFs can be revoked from users or roles.
To revoke the permission to execute a UDF or UDTF:
|
|
For example, to revoke execute on the udf_sos_py_proc UDF from jsmith:
|
|
SHOW SECURITY
For any one or more (or all) users and roles in the system, the following can be listed:
- permissions
- roles
- resource groups
- default schema (users only)
To list security assignments:
|
|
For example, to show the permissions, roles, resource groups, and default schema for user jsmith:
|
|
To show the permissions, roles, and resource groups for the analyst role:
|
|
To show the permissions, roles, resource groups, and default schema for the anonymous user and the public role:
|
|
To show all users & roles:
|
|
Impersonation (Execute As...)
When logged into the database as one user, SQL commands can be run as a second user; that is, the second user can be impersonated. To do this, first log into the database as an administrator, then switch to the second user, and finally run the command. Afterwards, the session can revert to the original user in order to execute commands as that administrator again; alternatively, an option can be specified to prevent switching back to the original user.
The ability to impersonate other users is available through SQL, using the following commands:
Important
Only a user with administrator privilege and a Kinetica database account can impersonate another user. Any users who are mapped to Kinetica as administrators via external authentication will need to ensure that a corresponding Kinetica administrator account has been created for impersonation to succeed.
Additionally, impersonation is only supported via ODBC/JDBC.
EXECUTE AS
Switching to another user can be done using the following syntax:
|
|
The WITH NO REVERT option prevents the impersonated user from switching back to the original user. This is useful in several cases, including:
- An administrator needs to run a script as a user with specific permissions, but doesn't want the script to be able to revert back to administrator privileges during its execution.
- An application server connects to the database with its own credentials, but issues commands to the database on behalf of an application user. Again, there is a need to prevent the application user from reverting back to the application server's database account and using those privileges to execute commands.
Example
To demonstrate the uses of EXECUTE AS USER, the following commands can be issued upon logging into the database as an administrator.
First, verify the current user is admin:
|
|
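A sketch of such a check, assuming a CURRENT_USER() function is among the user/role functions mentioned under SQL Support:

```sql
SELECT CURRENT_USER() AS current_db_user
```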