Data Definition (DDL)

Kinetica supports the basic notion of SQL tables as containers of one or more columns of data.

Table & column names must adhere to the supported naming criteria.

A column definition consists of a column type and optional column size, column properties, and nullability. Column properties are used to optimize data storage & speed.

The format of a defined column is column name, followed by column definition. A column definition is column type optionally followed by any column size limit or column properties all enclosed in parentheses, followed by an optional nullability statement:

Column Definition Syntax
1
<column name> <column type> [(<column size / property list>)] [[NOT] NULL]

This format applies to any DDL statement requiring the definition of columns, like CREATE TABLE and ALTER TABLE (when adding/modifying a column).

For example, the following are valid defined columns:

Column Definition Examples
1
2
3
4
id INT(SHARD_KEY)                -- id is an integer and the table's shard key
name VARCHAR(64, TEXT_SEARCH)    -- name is a 64-char limited string and text-searchable
ip IPV4                          -- ip is a string in IPv4 format
cost DECIMAL(10, 2, STORE_ONLY)  -- cost is able to hold an 8.2 decimal and not held in memory

Column Types

CategoryData TypeDescription
NumberTINYINTEffective type: int8
BYTEAlias for TINYINT
SMALLINTEffective type: int16
INTEGEREffective type: integer
INTAlias for INTEGER
BIGINTEffective type: long
LONGAlias for BIGINT
UNSIGNED BIGINTEffective type: ulong
UNSIGNED LONGAlias for UNSIGNED BIGINT
REALEffective type: float
DOUBLEEffective type: double
FLOATAlias for REAL
DECIMALAlias for BIGINT
DECIMAL(P,S)

Effective type: varies by P & S

PSEffective Type
1, 20int8
3, 40int16
5, 60integer
7- n0long
1- n1-4decimal
5, 65, 6float
7- n5- ndouble
NUMERICAlias for DECIMAL
StringVARCHAREffective type: string; character limit based on configured system property
CHARAlias for VARCHAR
STRINGAlias for VARCHAR
TEXTAlias for VARCHAR
VARCHAR(N)Effective type: char1 - char256 or string, whichever is large enough to hold N characters
BITAlias for VARCHAR(1)
CHAR(N)Alias for VARCHAR(N)
IPV4Shorthand for VARCHAR(IPV4), which applies the IPV4 column property
UUIDEffective type uuid
Date/TimeDATEEffective type: date
DATETIMEEffective type: datetime
TIMEEffective type: time
TIMESTAMPEffective type: timestamp
TYPE_DATEAlias for DATE
TYPE_TIMEAlias for TIME
TYPE_TIMESTAMPAlias for TIMESTAMP
BinaryBLOBEffective type: bytes
BINARYAlias for BLOB
BYTESAlias for BLOB
VARBINARYAlias for BLOB
GeospatialGEOMETRYEffective type: wkt
ST_GEOMETRYAlias for GEOMETRY
WKTAlias for GEOMETRY

Column Properties

PropertyDescription
DICTApplies dict data handling to a column, enabling dictionary-encoding of its values; see Dictionary Encoding for details
DISK_OPTIMIZEDApplies disk-optimized data handling to a column
IPV4Treats the associated string-based column as an IPv4 address
LZ4Applies LZ4 compression to a column
LZ4HCApplies LZ4HC compression to a column
PRIMARY_KEYTreats the associated column as a primary key, or part of a composite primary key if other columns also have this property
SHARD_KEYTreats the associated column as a shard key, or part of a composite shard key if other columns also have this property
SNAPPYApplies Snappy compression to a column
STORE_ONLYApplies store-only data handling to a column
TEXT_SEARCHApplies text-searchability to a column
INIT_WITH_NOWFor DATE, TIME, DATETIME, and TIMESTAMP column types, enables the database to use NOW() as the value when the column is not specified in an INSERT statement; also directs the database to replace empty strings (for DATE, TIME, & DATETIME columns) and invalid timestamps (for TIMESTAMP columns) with NOW()
INIT_WITH_UUIDFor UUID column type, enables the database to use a universally unique identifier (UUID) as the value when the column is not specified in an INSERT statement; also directs the database to replace empty strings (for UUID columns) with UUIDs

CREATE SCHEMA

Schemas are logical containers for tables and views. In order to place a table or view in a schema, the schema must be created first--schemas will not be automatically created when specified in CREATE TABLE or similar calls.

CREATE SCHEMA Syntax
1
CREATE < SCHEMA | COLLECTION > <schema name>
CREATE SCHEMA Example
1
CREATE SCHEMA example_container

ALTER SCHEMA

The name of a schema is the only property that can be altered.

Rename Schema

A schema can be renamed.

Rename Schema Syntax
1
2
ALTER SCHEMA <schema name>
RENAME TO <new schema name>
Rename Schema Example
1
2
ALTER SCHEMA example_container
RENAME TO example_container_renamed

DROP SCHEMA

When removing a schema from the database, there are two options available, which control how the removal takes place. Normally, an error will be reported if the schema to drop doesn't exist; if IF EXISTS is specified, no error will be reported. Also, an error will be reported if the schema to drop contains any tables or views; if CASCADE is specified, the schema and all tables and views within it will be removed.

DROP SCHEMA Syntax
1
DROP < SCHEMA | COLLECTION > [IF EXISTS] <schema name> [CASCADE]

For example, to drop a schema, including its contained tables and views:

DROP SCHEMA Example
1
DROP SCHEMA example_container CASCADE

SHOW CREATE SCHEMA

Outputs the DDL statement required to reconstruct the given schema. If * is specified instead of a schema name, the DDL statements for all schemas, tables, & views in the database will be output.

SHOW CREATE SCHEMA Syntax
1
SHOW CREATE < SCHEMA | COLLECTION > < <schema name> | * >

Note

The response to SHOW CREATE SCHEMA is a single-column result set with the DDL statement as the value in the DDL column, shown below with the column separators returned by kisql.

For example, to output the DDL for the example schema created in the CREATE SCHEMA section:

SHOW CREATE SCHEMA Example
1
SHOW CREATE SCHEMA example_container
SHOW CREATE SCHEMA Output
1
| CREATE SCHEMA example_container   |

SET CURRENT SCHEMA

To override the default schema with a different schema, the following syntax can be used:

SET CURRENT SCHEMA Syntax
1
SET CURRENT SCHEMA [<schema name>]

Omitting schema name will switch back to the original default schema.


CREATE TABLE

Creates a new table in the specified schema. If no schema is specified, the table will be created in the calling user's default schema.

The table & column names used must adhere to the supported naming criteria.

CREATE TABLE Syntax
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
CREATE [OR REPLACE] [REPLICATED] [TEMP] TABLE [<schema name>.]<table name>
(
    <column name> <column definition>,
    ...
    <column name> <column definition>,
    [PRIMARY KEY (<column list>)],
    [SHARD KEY (<column list>)],
    [FOREIGN KEY (<column list>) REFERENCES <foreign table name>(<foreign column list>) [AS <foreign key name>],...]
)
[<partition clause>]
[<tier strategy clause>]
[<index clause>]
[<table property clause>]
ParametersDescription
OR REPLACEAny existing table or view with the same name will be dropped before creating this one
REPLICATEDThe table will be distributed within the database as a replicated table
TEMPIf the database is restarted, the table will be removed
<schema name>Name of the schema containing the table to create
<table name>Name of the table to create
<column name>Name of a column to create within the table
<column definition>Definition of the column associated with <column name>; see Data Definition (DDL) for column format
PRIMARY KEY (<column list>)Optional primary key specification clause, where <column list> is a comma-separated list of columns to use as the primary key for the table
SHARD KEY (<column list>)Optional shard key specification clause, where <column list> is a comma-separated list of columns to use as the shard key for the table
FOREIGN KEY...

Optional comma-separated set of foreign key specification clauses, with the following parameters:

ParametersDescription
<column list>Comma-separated list of columns in the table to create that will reference a matching set of primary key columns in another table
<foreign table name>Name of target table referred to in this foreign key
<foreign column list>The primary key columns in the target table referred to in this foreign key, matching the list of columns specified in <column list> in the table to create
AS <foreign key name>Optional alias for the foreign key
<partition clause>Defines a partitioning scheme for the table to create
<tier strategy clause>Defines the tier strategy for the table to create
<index clause>Applies any number of column indexes or chunk skip indexes to the table to create
<table property clause>Assigns properties, from a subset of those available, to the table to create

For example, to create a table with various column types and properties:

CREATE TABLE Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
CREATE TABLE example.various_types
(
    i   INTEGER NOT NULL,                                 /* non-nullable integer, part of primary key (defined at end)                         */
    bi  BIGINT NOT NULL,                                  /* long, part of primary key, shard key, foreign key source (defined at end)          */
    ub  UNSIGNED BIGINT,                                  /* native unsigned long                                                               */
    r   REAL,                                             /* native float                                                                       */
    f   FLOAT,                                            /* native float                                                                       */
    d   DOUBLE(STORE_ONLY),                               /* native double, not in-memory                                                       */
    s   VARCHAR(STORE_ONLY, TEXT_SEARCH),                 /* string, searchable, not in-memory, only limited in size by system-configured value */
    c   VARCHAR(30, DICT),                                /* char32 using dictionary-encoding of values                                         */
    p   VARCHAR(256, SNAPPY, TEXT_SEARCH),                /* char256, searchable, using Snappy compression of values                            */
    ip  IPV4,                                             /* IP address                                                                         */
    ui  UUID(INIT_WITH_UUID),                             /* UUID                                                                               */
    ts  TIMESTAMP,                                        /* timestamp                                                                          */
    td  DATE,                                             /* simple date                                                                        */
    tt  TIME,                                             /* simple time                                                                        */
    dt  DATETIME(INIT_WITH_NOW),                          /* date/time                                                                          */
    dc1 DECIMAL,                                          /* native long                                                                        */
    dc2 DECIMAL(18,4),                                    /* native decimal                                                                     */
    dc3 DECIMAL(6,5),                                     /* native float                                                                       */
    dc4 DECIMAL(7, 5, STORE_ONLY),                        /* double, not in-memory                                                              */
    n   NUMERIC(5, 3),                                    /* native decimal, the next largest native numeric type to hold the number type       */
    wkt WKT,                                              /* geospatial column for WKT string data                                              */
    PRIMARY KEY (i, bi),                                  /* primary key columns must be NOT NULL                                               */
    SHARD KEY (bi),                                       /* shard key columns must be part of the primary key                                  */
    FOREIGN KEY (bi) REFERENCES example.lookup(id) AS fk  /* foreign key is often on the shard key                                              */
)
INDEX (ip)                                                /* index on IP column                                                                 */
INDEX (ts)                                                /* index on timestamp column                                                          */

Partition Clause

A table can be further segmented into partitions. The supported partition types are:

See Partitioning for details.

Range Partitioning

The general format for the range partition clause is:

PARTITION BY RANGE Syntax
1
2
3
4
5
6
7
8
9
PARTITION BY RANGE ( <column expression> )
[
    PARTITIONS
    (
        <partition name> [ MIN ( <least value> ) ] [ MAX ( <greatest value> ) ],
        ...
        <partition name> [ MIN ( <least value> ) ] [ MAX ( <greatest value> ) ]
    )
]

The partition definition clause, PARTITIONS, is optional, though it is recommended to define partitions at table creation time, when feasible.

Warning

Defining (adding) partitions after data has been loaded will result in a performance penalty as the database moves existing records targeted for the new partition from the default partition into it.

For example, to create a range-partitioned table with the following criteria:

  • partitioned on the date/time of the order
  • partitions for years:
    • 2014 - 2016
    • 2017
    • 2018
    • 2019
  • records not in that range go to the default partition
PARTITION BY RANGE Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
CREATE TABLE example.customer_order_range_partition_by_year
(
    id           INT NOT NULL,
    customer_id  INT NOT NULL,
    total_price  DECIMAL(10,2),
    purchase_ts  TIMESTAMP NOT NULL
)
PARTITION BY RANGE (YEAR(purchase_ts))
PARTITIONS
(
    order_2014_2016 MIN(2014) MAX(2017),
    order_2017                MAX(2018),
    order_2018                MAX(2019),
    order_2019                MAX(2020)
)

Interval Partitioning

The general format for the interval partition clause is:

PARTITION BY INTERVAL Syntax
1
2
3
4
5
PARTITION BY INTERVAL ( <column expression> )
PARTITIONS
(
    STARTING [AT] (<least value>) INTERVAL (<interval size>)
)

For example, to create an interval-partitioned table with the following criteria:

  • partitioned on the date/time of the order
  • one partition for each year from 2014 on
  • later year partitions are added as necessary
  • records prior to 2014 go to the default partition
PARTITION BY INTERVAL (Year) Syntax
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
CREATE TABLE example.customer_order_interval_partition_by_year
(
    id           INT NOT NULL,
    customer_id  INT NOT NULL,
    total_price  DECIMAL(10,2),
    purchase_ts  TIMESTAMP
)
PARTITION BY INTERVAL (YEAR(purchase_ts))
PARTITIONS
(
    STARTING AT (2014) INTERVAL (1)
)

To create an interval-partitioned table with the following criteria:

  • partitioned on the date/time of the order
  • one partition for each day from January 1st, 2014 on
  • later day partitions are added as necessary
  • records prior to 2014 go to the default partition
PARTITION BY INTERVAL (Day) Syntax
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
CREATE TABLE example.customer_order_interval_partition_by_day_timestampdiff
(
    id           INT NOT NULL,
    customer_id  INT NOT NULL,
    total_price  DECIMAL(10,2),
    purchase_ts  TIMESTAMP
)
PARTITION BY INTERVAL (TIMESTAMPDIFF(DAY, '2014-01-01', purchase_ts))
PARTITIONS
(
    STARTING AT (0) INTERVAL (1)
)

The same interval-partitioned scheme above can be created using the timestamp column directly, with the help of the INTERVAL function (described in the Date/Time Functions section):

PARTITION BY INTERVAL (Day) Alternate Syntax
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
CREATE TABLE example.customer_order_interval_partition_by_day_interval
(
    id           INT NOT NULL,
    customer_id  INT NOT NULL,
    total_price  DECIMAL(10,2),
    purchase_ts  TIMESTAMP
)
PARTITION BY INTERVAL (purchase_ts)
PARTITIONS
(
    STARTING AT ('2014-01-01') INTERVAL (INTERVAL '1' DAY)
)

This scheme can be easily modified to create an hourly partition instead:

PARTITION BY INTERVAL (Hour) Syntax
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
CREATE TABLE example.customer_order_interval_partition_by_hour_interval
(
    id           INT NOT NULL,
    customer_id  INT NOT NULL,
    total_price  DECIMAL(10,2),
    purchase_ts  TIMESTAMP
)
PARTITION BY INTERVAL (purchase_ts)
PARTITIONS
(
    STARTING AT ('2014-01-01') INTERVAL (INTERVAL '1' HOUR)
)

List Partitioning

The list partition clause has two forms:

Manual

The general format for the manual list partition clause is:

PARTITION BY LIST Syntax
1
2
3
4
5
6
7
8
9
PARTITION BY LIST ( <column expression list> )
[
    PARTITIONS
    (
        <partition name> VALUES ( <value lists> ),
        ...
        <partition name> VALUES ( <value lists> )
    )
]

The partition definition clause, PARTITIONS, is optional, though it is recommended to define partitions at table creation time, when feasible.

Warning

Defining (adding) partitions after data has been loaded will result in a performance penalty as the database moves existing records targeted for the new partition from the default partition into it.

For example, to create a manual list-partitioned table with the following criteria:

  • partitioned on the date/time of the order
  • partitions for years:
    • 2014 - 2016
    • 2017
    • 2018
    • 2019
  • records not in that list go to the default partition
PARTITION BY LIST (Year) Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
CREATE TABLE example.customer_order_manual_list_partition_by_year
(
    id           INT NOT NULL,
    customer_id  INT NOT NULL,
    total_price  DECIMAL(10,2),
    purchase_ts  TIMESTAMP NOT NULL
)
PARTITION BY LIST (YEAR(purchase_ts))
PARTITIONS
(
    order_2014_2016 VALUES (2014, 2015, 2016),
    order_2017      VALUES (2017),
    order_2018      VALUES (2018),
    order_2019      VALUES (2019)
)

To create a manual list-partitioned table with a multi-column key and the following criteria:

  • partitioned on the date/time of the order
  • each partition corresponds to a unique year & month pair
  • partitions for years/months:
    • February 2016 & March 2016
    • March 2020
  • records not in that list go to the default partition
PARTITION BY LIST (Month) Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
CREATE TABLE example.customer_order_manual_list_partition_by_year_and_month
(
    id           INT NOT NULL,
    customer_id  INT NOT NULL,
    total_price  DECIMAL(10,2),
    purchase_ts  TIMESTAMP NOT NULL
)
PARTITION BY LIST (YEAR(purchase_ts), MONTH(purchase_ts))
PARTITIONS
(
    order_2016_0203 VALUES ((2016, 2), (2016, 3)),
    order_2020_03   VALUES ((2020, 3))
)
Automatic

The general format for the automatic list partition clause is:

Automatic PARTITION BY LIST Syntax
1
2
PARTITION BY LIST ( <column expression list> )
AUTOMATIC

To create an automatic list-partitioned table with the following criteria:

  • partitioned on the date/time of the order
  • one partition for each unique year & month across all orders
  • partitions are added as necessary
Automatic PARTITION BY LIST (Month) Example
1
2
3
4
5
6
7
8
9
CREATE TABLE example.customer_order_automatic_list_partition_by_year_and_month
(
    id           INT NOT NULL,
    customer_id  INT NOT NULL,
    total_price  DECIMAL(10,2),
    purchase_ts  TIMESTAMP NOT NULL
)
PARTITION BY LIST (YEAR(purchase_ts), MONTH(purchase_ts))
AUTOMATIC

Hash Partitioning

The general format for the hash partition clause is:

PARTITION BY HASH Syntax
1
2
PARTITION BY HASH ( <column expressions> )
PARTITIONS <total partitions>

To create a hash-partitioned table with the following criteria:

  • partitioned on the date/time of the order
  • distributed among the fixed set of partitions, based on the hash of the year & month of the order
  • 10 partitions
PARTITION BY HASH Example
1
2
3
4
5
6
7
8
9
CREATE TABLE example.customer_order_hash_partition_by_year_and_month
(
    id           INT NOT NULL,
    customer_id  INT NOT NULL,
    total_price  DECIMAL(10,2),
    purchase_ts  TIMESTAMP NOT NULL
)
PARTITION BY HASH (YEAR(purchase_ts), MONTH(purchase_ts))
PARTITIONS 10

Series Partitioning

The general format for the series partition clause is:

PARTITION BY SERIES
1
2
PARTITION BY SERIES ( <column list> )
[PERCENT_FULL <percentage>]

The PERCENT_FULL should be an integer between 1 and 100; the default is 50%.

To create a series-partitioned table with the following criteria:

  • partitioned on the customer of each order
  • partitions with closed key sets will contain all orders from a set of unique customers
  • 50% fill threshold
PARTITION BY SERIES Example
1
2
3
4
5
6
7
8
9
CREATE TABLE example.customer_order_series_partition_by_customer
(
    id           INT NOT NULL,
    customer_id  INT NOT NULL,
    total_price  DECIMAL(10,2),
    purchase_ts  TIMESTAMP NOT NULL
)
PARTITION BY SERIES (customer_id)
PERCENT_FULL 50

To create a series-partitioned track table with the following criteria:

  • partitioned on the track ID
  • partitions with closed key sets will contain all points from a unique set of tracks
  • 25% fill threshold
PARTITION BY SERIES (Tracks) Example
1
2
3
4
5
6
7
8
9
CREATE TABLE example.route_series_partition_by_track
(
    TRACKID    VARCHAR NOT NULL,
    x          DOUBLE NOT NULL,
    y          DOUBLE NOT NULL,
    TIMESTAMP  TIMESTAMP NOT NULL
)
PARTITION BY SERIES (TRACKID)
PERCENT_FULL 25

Tier Strategy Clause

A table can have a tier strategy specified at creation time. If not assigned a tier strategy upon creation, a default tier strategy will be assigned.

TIER STRATEGY Clause Syntax
1
2
3
4
5
6
TIER STRATEGY
(
    <tier strategy>,
    ...
    <tier strategy>
)

For example, to create a customer_order table with an above-average eviction priority in the RAM Tier:

TIER STRATEGY Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
CREATE OR REPLACE TABLE example.customer_order
(
    id          INT NOT NULL,
    customer_id INT NOT NULL,
    total_price DECIMAL(10,2),
    purchase_ts TIMESTAMP,
    SHARD KEY (customer_id)
)
TIER STRATEGY
(
    ( ( VRAM 1, RAM 7, PERSIST 5 ) )
)

If not specified, the default tier strategy will be assigned:

Default TIER STRATEGY Example
1
2
3
4
5
6
7
8
CREATE OR REPLACE TABLE example.customer_order
(
    id          INT NOT NULL,
    customer_id INT NOT NULL,
    total_price DECIMAL(10,2),
    purchase_ts TIMESTAMP,
    SHARD KEY (customer_id)
)
1
SHOW CREATE TABLE example.customer_order
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
| CREATE TABLE "example"."customer_order"
(
   "id" INTEGER NOT NULL,
   "customer_id" INTEGER (shard_key) NOT NULL,
   "total_price" DECIMAL(18,4),
   "purchase_ts" TIMESTAMP
)
TIER STRATEGY (
( ( VRAM 1, RAM 7, PERSIST 5 ) )
) |

Note

The response to SHOW CREATE TABLE is a single-record result set with the DDL statement as the value in the DDL column, shown here with the column separators returned by kisql.

Index Clause

A table can have any number of indexes applied to any of its columns at creation time.

The two types of indexes supported are:

Index Clause Syntax
1
2
3
<CHUNK [SKIP] | [ATTRIBUTE]> INDEX (<column>)
...
<CHUNK [SKIP] | [ATTRIBUTE]> INDEX (<column>)

For example, to create a table with a column index on dept_id and a chunk skip index on id:

Index Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
CREATE TABLE example.employee
(
    id INT NOT NULL,
    dept_id INT NOT NULL,
    manager_id INT,
    first_name VARCHAR(30),
    last_name VARCHAR(30),
    sal DECIMAL,
    hire_date DATE,
    PRIMARY KEY (id, dept_id),
    SHARD KEY (dept_id)
)
INDEX (dept_id)
CHUNK SKIP INDEX (id)

Table Property Clause

A subset of table properties can be applied to the table at creation time.

Table Property Clause
1
2
3
4
5
6
USING TABLE PROPERTIES
(
    <table property> = <value>,
    ...
    <table property> = <value>
)

Available table properties include:

PropertyDescription
CHUNK SIZESize of the blocks of memory holding the data, when loaded; specified as the maximum number of records each block of memory should hold
TTLThe time-to-live (TTL) for the table; if not set, the table will not expire

For example, to create a table with up to 1,000,000 records per chunk and that will expire in 15 minutes:

Table Property Example
1
2
3
4
5
6
7
8
9
CREATE OR REPLACE TABLE example.customer_order
(
    id          INT NOT NULL,
    customer_id INT NOT NULL,
    total_price DECIMAL(10,2),
    purchase_ts TIMESTAMP,
    SHARD KEY (customer_id)
)
USING TABLE PROPERTIES (CHUNK SIZE = 1000000, TTL = 15)

CREATE EXTERNAL TABLE

Creates a new external table, which is a database object whose source data is located in one or more files, external to the database. The source data can be located in any of the following locations:

  • on shared storage, mounted on each database node in the cluster
  • in KiFS
  • on a remote system, accessible via a data source

Its use with ring resiliency has additional considerations.

The external table & column names used must adhere to the supported naming criteria.

An external table can subscribe to a data source that references an Azure or S3 source of data configured for streaming.

Note

An external table cannot use a data source configured for Kafka; however, a standard table can have Kafka data streamed into it via a LOAD INTO command that references such a data source.

CREATE EXTERNAL TABLE Syntax
1
2
3
4
5
6
7
8
9
CREATE [OR REPLACE] [REPLICATED] [TEMP] [LOGICAL | MATERIALIZED] EXTERNAL TABLE [<schema name>.]<table name>
[<table definition clause>]
FILE PATHS <file paths>
[FORMAT <[DELIMITED] TEXT [(<delimited text options>)] | PARQUET | JSON | SHAPEFILE>]
[WITH OPTIONS (<load option name> = '<load option value>'[,...])]
[<partition clause>]
[<tier strategy clause>]
[<index clause>]
[<table property clause>]
ParametersDescription
OR REPLACEAny existing table or view with the same name will be dropped before creating this one
REPLICATEDThe external table will be distributed within the database as a replicated table
TEMPIf the database is restarted, the external table will be removed
LOGICALExternal data will not be loaded into the database; the data will be retrieved from the source upon servicing each query against the external table. This mode ensures queries on the external table will always return the most current source data, though there will be a performance penalty for reparsing & reloading the data from source files upon each query.
MATERIALIZEDLoads a copy of the external data into the database, refreshed on demand; this is the default external table type
<schema name>Name of the schema to contain the external table associated with the source data
<table name>Name of the external table to associate with the source data; must meet the required naming criteria
<table definition clause>Optional clause, defining the structure for the external table associated with the source data
FILE PATHS <file paths>

External data file specification clause, where <file paths> is a comma-separated list of single-quoted file paths from which data will be loaded; all files specified are presumed to have the same format and data types.

The form of a file path is dependent on the source referenced:

  • Data Source: If a data source is specified in the external table load options these file paths must resolve to accessible files at that data source location. A "path prefix" can be specified, which will cause all files whose path begins with the given prefix to be included.

    For example, a "path prefix" of /data/ge for <file paths> would match all of the following:

    • /data/geo.csv
    • /data/geo/flights.csv
    • /data/geo/2021/airline.csv

    If using an HDFS data source, the "path prefix" must be the name of an HDFS directory.

  • KiFS: The path must resolve to an accessible file path within KiFS. A "path prefix" can be specified, which will cause all files whose path begins with the given prefix to be included.

    For example, a "path prefix" of kifs://data/ge would match all of the following files under the KiFS data directory:

    • kifs://data/geo.csv
    • kifs://data/geo/flights.csv
    • kifs://data/geo/2021/airline.csv
  • Kinetica Cluster: The paths are OS-level paths under the Kinetica configuration file's external files directory that are accessible to the gpudb user and either:

    • accessible to all nodes in the Kinetica cluster
    • mirrored across each node in the Kinetica cluster

    A "path prefix" that references a directory on the cluster file system can be specified, which will cause all files contained within that directory and its subdirectories to be included. Wildcards can also be used to specify a group of files.

    For example, if external_files_directory is set to /opt/gpudb/data, the following <file paths> specifications will refer to these corresponding files:

    File PathReference
    /opt/gpudb/data/products.csvFile /opt/gpudb/data/products.csv
    archive/products.csvFile /opt/gpudb/data/archive/products.csv
    archive/*.csvAll files under /opt/gpudb/data/archive with a csv extension
    archiveAll files under /opt/gpudb/data/archive and its subdirectories
    /opt/gpudb/products.csvError, as absolute path /opt/gpudb does not contain external files directory /opt/gpudb/data as part of the path
FORMAT

Optional indicator of source file type; will be inferred from file extension if not given.

Supported formats include:

KeywordDescription
[DELIMITED] TEXT

Any text-based, delimited field data file (CSV, PSV, TSV, etc.); a comma-delimited list of options can be given to specify the way in which the data file(s) should be parsed, including the delimiter used, whether headers are present, etc.

See Delimited Text Options for the complete list of <delimited text options>.

PARQUETApache Parquet data file
JSONEither a JSON or GeoJSON data file
SHAPEFILEArcGIS shapefile
WITH OPTIONS

Optional indicator that a comma-delimited list of connection & global option/value assignments will follow.

See Load Options for the complete list of options

<partition clause>Optional clause, defining a partitioning scheme for the external table associated with the source data
<tier strategy clause>Optional clause, defining the tier strategy for the external table associated with the source data
<index clause>Optional clause, applying any number of column indexes or chunk skip indexes to the external table associated with the source data
<table property clause>Optional clause, assigning properties, from a subset of those available, to the external table associated with the source data

To create an external table with the following features, using a local file as the source of data:

  • External table named ext_employee in the example schema
  • External source is a Parquet file located on the head node at data/employee.parquet, relative to the configured external files directory
  • External table has a primary key on the id column
  • Data is not refreshed on database startup
CREATE EXTERNAL TABLE with Local File Example
1
2
3
4
CREATE EXTERNAL TABLE example.ext_employee
FILE PATHS 'data/employee.parquet'
FORMAT PARQUET
WITH OPTIONS (PRIMARY KEY = 'id')

To create an external table with the following features, using KiFS as the source of data:

  • External table named ext_product in the example schema
  • External source is a KiFS file named product.csv located in the data directory
  • Data is not refreshed on database startup
1
2
CREATE EXTERNAL TABLE example.ext_product
FILE PATHS 'kifs://data/products.csv'

To create an external table with the following features, using a data source as the source of data:

  • External table named ext_product in the example schema
  • External source is a data source named product_ds
  • Source is a file named products.csv
  • Data is refreshed on database startup
CREATE EXTERNAL TABLE with Data Source Example
1
2
3
4
5
6
7
CREATE EXTERNAL TABLE example.ext_product
FILE PATHS 'products.csv'
WITH OPTIONS
(
    DATA SOURCE = 'product_ds',
    REFRESH ON START = true
)

To create an external table with the following features, subscribing to a data source:

  • External table named ext_product in the example schema
  • External source is a data source named product_ds
  • Source is a file named products.csv
  • Data updates are streamed continuously
CREATE EXTERNAL TABLE with Data Source Subscription Example
1
2
3
4
5
6
7
CREATE EXTERNAL TABLE example.ext_product
FILE PATHS 'products.csv'
WITH OPTIONS
(
   DATA SOURCE = 'product_ds',
   SUBSCRIBE = true
)

Delimited Text Options

The following options can be specified when loading data from delimited text files. When reading from multiple files (using wildcards when specifying the file names), options specific to the source file will be applied to each file being read.

OptionDescription
COMMENT = '<string>'

Treat lines in the source file(s) that begin with string as comments and skip.

The default comment marker is #.

DELIMITER = '<char>'

Use char as the source file field delimiter.

The default delimiter is a comma, unless a source file has one of these extensions:

  • .psv - will cause | to be the delimiter
  • .tsv - will cause the tab character to be the delimiter

See Delimited Text Option Characters for allowed characters.

ESCAPE = '<char>'

Use char as the source file data escape character. The escape character preceding any other character, in the source data, will be converted into that other character, except in the following special cases:

Source Data StringRepresentation when Loaded into the Database
<char>aASCII bell
<char>bASCII backspace
<char>fASCII form feed
<char>nASCII line feed
<char>rASCII carriage return
<char>tASCII horizontal tab
<char>vASCII vertical tab

For instance, if the escape character is \, a \t encountered in the data will be converted to a tab character when stored in the database.

The escape character can be used to escape the quoting character, and will be treated as an escape character whether it is within a quoted field value or not.

There is no default escape character.

HEADER DELIMITER = '<char>'

Use char as the source file header field name/property delimiter, when the source file header contains both names and properties. This is largely specific to the Kinetica export to delimited text feature, which will, within each field's header, contain the field name and any associated properties, delimited by the pipe | character.

An example Kinetica header in a CSV file:

The default is the | (pipe) character. See Delimited Text Option Characters for allowed characters.

Note

The DELIMITER character will still be used to separate field name/property sets from each other in the header row

INCLUDES HEADER = <TRUE|FALSE>Declare that the source file(s) will or will not have a header. If not given, that determination will be intuited.
NULL = '<string>'

Treat string as the indicator of a null source field value.

The default is the empty string.

QUOTE = '<char>'

Use char as the source file data quoting character, for enclosing field values. Usually used to wrap field values that contain embedded delimiter characters, though any field may be enclosed in quote characters (for clarity, for instance). The quote character must appear as the first and last character of a field value in order to be interpreted as quoting the value. Within a quoted value, embedded quote characters may be escaped by preceding them with another quote character or the escape character specified by ESCAPE, if given.

The default is the " (double-quote) character. See Delimited Text Option Characters for allowed characters.

Delimited Text Option Characters

For DELIMITER, HEADER DELIMITER, ESCAPE, & QUOTE, any single character can be used, or any one of the following escaped characters:

Escaped CharCorresponding Source File Character
''Single quote
\aASCII bell
\bASCII backspace
\fASCII form feed
\tASCII horizontal tab
\vASCII vertical tab

For instance, if two single quotes ('') are specified for a QUOTE character, the parser will interpret single quotes in the source file as quoting characters; specifying \t for DELIMITER will cause the parser to interpret ASCII horizontal tab characters in the source file as delimiter characters.

Load Options

The following options can be specified to modify the way data is loaded (or not loaded) into the target table.

OptionDescription
BATCH SIZE

Use an ingest batch size of the given number of records.

The default batch size is 10,000.

COLUMN FORMATS

Use the given type-specific formatting for the given column when parsing source data being loaded into that column. This should be a map of column names to format specifications, where each format specification is map of column type to data format, all formatted as a JSON string.

Supported column types include:

Column TypeSource Data Format Codes
date

Apply the given date format to the given column.

Important

Valid formats must contain one year, month, & day code

Format codes include:

CodeDescription
%Y4-digit year
%m2-digit month, where January is 01
%d2-digit day of the month, where the 1st of each month is 01
time

Apply the given time format to the given column.

Important

Valid formats must contain hours, minutes, & one of the seconds codes.

Format codes include:

CodeDescription
%H24-based hour, where 12:00 AM is 00 and 7:00 PM is 19
%M2-digit minute of the hour
%S2-digit second of the minute
%s

2-digit second of the minute plus an N-digit fractional component, separated by a dot

Note

fractional seconds will be truncated after the milliseconds place

datetime

Apply the given date/time format to the given column.

Important

Valid formats must contain a date & time format, as detailed above.

For example, to make the YYYY.MM.DD format for loading source data into date column d and HH:MM:SS format for loading source data into time column t:

{
    "d": {"date": "%Y.%m.%d"},
    "t": {"time": "%H:%M:%S"}
}
DATA SOURCELoad data from the given data source.
DEFAULT COLUMN FORMATS

Use the given formats for source data being loaded into target table columns with the corresponding column types. This should be a map of target column type to source format for data being loaded into columns of that type, formatted as a JSON string.

Supported column properties and source data formats are the same as those listed in the description of the COLUMN FORMATS option.

For example, to make the default format for loading source data dates in the form YYYY.MM.DD and times in the form HH:MM:SS:

{
    "date": "%Y.%m.%d",
    "time": "%H:%M:%S",
    "datetime": "%Y.%m.%d %H:%M:%S"
}
FIELDS MAPPED BY < POSITION(<col#s>) | NAME(<field names>) >

Choose a comma-separated list of fields from the source file(s) to load, specifying fields by either position or name. If loading by name, the source file field names must match the target table column names exactly.

Note

When specifying source data file fields, the set of source data file fields must align, in type & number, with the target table columns into which data will be loaded.

Important

Field mapping by position is not supported for Parquet files.

INGESTION MODE

Whether to do a full ingest of the data or perform a dry run or type inference instead. The default mode is FULL.

ValueDescription
DRY RUNNo data will be inserted, but the file will be read with the applied ON ERROR mode and the number of valid records that would normally be inserted is returned.
FULLData is fully ingested according to the active ON ERROR mode.
TYPE INFERENCEInfer the type of the source data and return, without ingesting any data. The inferred type is returned in the response, as the output of a SHOW CREATE TABLE command.
LOADING MODE

Use one of the following distribution schemes to load data files. The default mode is HEAD.

ModeDescription
HEADThe head node loads all data. All files must be available to the head node.
DISTRIBUTED LOCAL

A single worker process on each node loads all files that are available to it. This option works best when each worker loads files from its own file system, to maximize performance. In order to avoid data duplication, either each worker performing the load needs to have visibility to a set of files unique to it (no file is visible to more than one node) or the target table needs to have a primary key (which will allow the worker to automatically deduplicate data).

Note

  • If the table's columns aren't defined, table structure will be determined by the head node. If the head node has no files local to it, it will be unable to determine the structure and the request will fail.
  • This mode should not be used in conjunction with a data source, as data sources are seen by all worker processes as shared resources with no "local" component.
  • If the head node is configured to have no worker processes, no data strictly accessible to the head node will be loaded.
DISTRIBUTED SHARED

The head node coordinates loading data by worker processes across all nodes from shared files available to all workers.

Note

Instead of existing on a shared source, the files can be duplicated on a source local to each host to improve performance, though the files must appear as the same data set from the perspective of all hosts performing the load.

ON ERROR

When an error is encountered loading a record, handle it using one of the following modes. The default mode is PERMISSIVE.

ModeDescription
PERMISSIVEIf an error is encountered parsing a source record, attempt to insert as many of the valid fields from the record as possible; insert a null for each errant value found.
SKIPIf an error is encountered parsing a source record, skip the record.
ABORTIf an error is encountered parsing a source record, stop the data load process. Primary key collisions are considered abortable errors in this mode.
POLL_INTERVAL

Interval, in seconds, at which a data source is polled for updates. Only valid when SUBSCRIBE is true. Default is 60 seconds.

Note

The number of seconds must be passed as a single-quoted string.

REFRESH ON START

Whether to refresh the external table data upon restart of the database. Only relevant for materialized external tables. Ignored, if SUBSCRIBE is TRUE. The default value is FALSE.

ValueDescription
TRUERefresh the external table's data when the database is restarted.
FALSEDo not refresh the external table's data when the database is restarted.
SUBSCRIBE

Whether to subscribe to the data source specified in the DATA SOURCE option. Only relevant for data sources configured to allow streaming. If TRUE, the REFRESH ON START option is ignored. The default value is FALSE.

Important

This option is not yet available for Kafka data sources.

ValueDescription
TRUESubscribe to the specified streaming data source.
FALSEDo not subscribe to the specified data source.

Table Definition Clause

The table definition clause allows for an explicit local table structure to be defined, irrespective of the source data type. This specification mirrors that of CREATE TABLE.

Table Definition Clause Syntax
1
2
3
4
5
6
7
8
(
    <column name> <column definition>,
    ...
    <column name> <column definition>,
    [PRIMARY KEY (<column list>)],
    [SHARD KEY (<column list>)],
    [FOREIGN KEY (<column list>) REFERENCES <foreign table name>(<foreign column list>) [AS <foreign key name>],...]
)

See Data Definition (DDL) for column format.

Partition Clause

An external table can be further segmented into partitions.

The supported Partition Clause syntax & features are the same as those in the CREATE TABLE Partition Clause.

Tier Strategy Clause

An external table can have a tier strategy specified at creation time. If not assigned a tier strategy upon creation, a default tier strategy will be assigned.

The supported Tier Strategy Clause syntax & features are the same as those in the CREATE TABLE Tier Strategy Clause.

Index Clause

An external table can have any number of indexes applied to any of its columns at creation time.

The supported Index Clause syntax & features are the same as those in the CREATE TABLE Index Clause.

Table Property Clause

A subset of table properties can be applied to the external table associated with the external data at creation time.

The supported Table Property Clause syntax & features are the same as those in the CREATE TABLE Table Property Clause.


CREATE TABLE...AS

Creates a new table from the given query in the specified schema. If no schema is specified, the table will be created in the user's default schema.

The table and any column aliases used must adhere to the supported naming criteria.

CREATE TABLE...AS Syntax
1
2
3
4
CREATE [OR REPLACE] [REPLICATED] [TEMP] TABLE [<schema name>.]<table name> AS
(
    <select statement>
)
Parameters/KeysDescription
OR REPLACEAny existing table or view with the same name will be dropped before creating this one
REPLICATEDThe table will be distributed within the database as a replicated table
TEMPIf the database is restarted, the table will be removed
<schema name>Name of the schema containing the table to create
<table name>Name of the table to create
<select statement>The query that will define both the initial structure and content of the created table

While primary keys & foreign keys are not transferred to the new table, shard keys will be, if the column(s) composing them are part of the SELECT list.

The following can be applied to <select statement> to affect the resulting table:

KeywordTypeDescription
KI_HINT_GROUP_BY_PKhintCreates a primary keys on the columns in the GROUP BY clause if the outermost SELECT statement contains a GROUP BY
KI_HINT_INDEX(column list)hintIndexes each of the columns specified in the column list
KI_SHARD_KEY(column list)pseudo-functionShards the result table with a shard key composed of all columns in the specified column list

For example, to create a replicated temporary table that is a copy of an existing table, failing if a table with the same name as the target table already exists:

CREATE TABLE...AS (Replicated/Temporary) Example
1
2
3
4
5
CREATE REPLICATED TEMP TABLE example.new_temporary_table AS
(
    SELECT *
    FROM example.old_table
)

To create a permanent table with columns a, b, c, & d a new shard key on columns a & b, and an index on column d, replacing a table with the same name as the target table, if it exists:

CREATE TABLE...AS (Reshard) Example
1
2
3
4
5
CREATE OR REPLACE TABLE example.new_sharded_table AS
(
    SELECT a, b, c, d, KI_SHARD_KEY(a, b) /* KI_HINT_INDEX(d) */
    FROM example.old_table
)

To copy a table with columns a, b, c, & d, preserving the primary key on a, b, & c, and the foreign key from d; a new table must be created to match the schema of the old one and then records can be copied from the old one to the new one:

CREATE TABLE (Preserve Primary Key) Example, DDL Step
1
2
3
4
5
6
7
8
9
CREATE TABLE example.new_pk_copy_table
(
    a INT NOT NULL,
    b INT NOT NULL,
    c VARCHAR(32) NOT NULL,
    d TIMESTAMP,
    PRIMARY KEY (a, b, c),
    FOREIGN KEY (d) REFERENCES example.old_table_lookup(d)
)
CREATE TABLE (Preserve Primary Key) Example, DML Step
1
2
3
INSERT INTO example.new_pk_copy_table
SELECT *
FROM example.old_table

Note

This create/insert process is necessary, as neither primary keys nor foreign keys can be preserved through hints.

See Limitations for other restrictions.

ALTER TABLE

Any of the following facets of a table can be altered:

Rename Table

A table can be renamed, following the supported naming criteria.

Rename Table Syntax
1
2
ALTER TABLE [<schema name>.]<table name>
RENAME TO <new table name>

Move Table

A table can be moved from one schema to another.

Move Table Syntax
1
2
ALTER TABLE [<schema name>.]<table name>
< MOVE TO | SET SCHEMA > <other schema name>

For example, to move the sales_2017 table from the example_olap schema to the example_archive schema:

Move Table Example
1
2
ALTER TABLE example_olap.sales_2017
MOVE TO example_archive

Set Access Mode

A table can have its global accessibility modified for all users in the system, independently from and further restricting any role-based access controls in place. Note that changing the access mode cannot widen access for users not already granted access; it can only narrow access for those who already have access. This setting will also trump administrative access to a table.

Set Table Access Mode Syntax
1
2
ALTER TABLE [<schema name>.]<table name>
SET ACCESS MODE < NO_ACCESS | READ_ONLY | WRITE_ONLY | READ_WRITE >

Set TTL

A table's time-to-live (TTL) can be altered.

Set Table TTL Syntax
1
2
ALTER TABLE <table name>
SET TTL <new ttl>

For example, to set a TTL of 7 minutes on a table:

Set Table TTL Example
1
2
ALTER TABLE example.employee
SET TTL 7

To set a table to never expire by TTL timeout:

Set Table No Expiration Example
1
2
ALTER TABLE example.employee
SET TTL -1

Add Column

A column can be added, specifying a column definition.

A new column can have its values initially populated through the use of the DEFAULT keyword. These values can either be a string/numeric constant or the name of an existing column in the table from which values can be copied into the new column. This default value is only in effect for the column creation; the new column will have no default value after that.

Add Table Column Syntax
1
2
ALTER TABLE [<schema name>.]<table name>
ADD <column name> <column definition> [DEFAULT <string/numeric constant | column name>]

Note

Column compression must be applied after a new column is added; see Compress Column for syntax.

Examples

To add, to the employee table, a salary column that is a non-nullable, store-only, 10-digit number field containing 2 decimal places with a default value of 0:

Add Table Column (Numeric/Store-Only) Example
1
2
ALTER TABLE example.employee
ADD salary NUMERIC(10, 2, STORE_ONLY) NOT NULL DEFAULT 0

To add, to the employee table, a category column that is a nullable, dictionary-encoded, 32-character text field:

Add Table Column (CharN/Dictionary-Encoded) Example
1
2
ALTER TABLE example.employee
ADD category VARCHAR(32, DICT)

To add, to the employee table, a bio column that is a nullable, text-searchable, disk-optimized, unrestricted-width text field:

Add Table Column (String/Text-Searchable/Disk-Optimized) Example
1
2
ALTER TABLE example.employee
ADD bio VARCHAR(TEXT_SEARCH, DISK_OPTIMIZED)

Rename Column

An existing column can be renamed.

Rename Table Column Syntax
1
2
ALTER TABLE [<schema name>.]<table name>
RENAME COLUMN <column current name> TO <column new name>
Rename Table Column Example
1
2
ALTER TABLE example.employee
RENAME COLUMN bio TO biography

Modify Column

A column can have its column definition modified, affecting column type, column size, column properties, and nullability.

If a column is modified to be non-nullable, it will be populated with default values--empty string for string fields and 0 for numeric fields.

Modify Table Column Syntax
1
2
ALTER TABLE [<schema name>.]<table name>
MODIFY [COLUMN] <column name> <column definition>
Modify Table Column Alternate Syntax
1
2
ALTER TABLE [<schema name>.]<table name>
ALTER COLUMN <column name> <column definition>

Note

Column compression must be applied after an existing column is modified; see Compress Column for syntax.

Examples

To change, in the employee table, the first_name column to one that is a non-nullable, dictionary-encoded, 50-character text field:

Modify Table Column Example
1
2
ALTER TABLE example.employee
ALTER COLUMN first_name VARCHAR(50, DICT) NOT NULL

Compress Column

A column can have its data compressed in memory.

Compress Table Column Syntax
1
2
ALTER TABLE [<schema name>.]<table name>
SET COLUMN <column name> COMPRESSION [TO] <compression type>

For example, to use LZ4 compression on a column:

Compress Table Column Example
1
2
ALTER TABLE example.employee
SET COLUMN last_name COMPRESSION LZ4

To use no compression on a column:

Uncompress Table Column Example
1
2
ALTER TABLE example.employee
SET COLUMN last_name COMPRESSION NONE

Drop Column

An existing column can be removed from a table.

Drop Table Column Syntax
1
2
ALTER TABLE [<schema name>.]<table name>
DROP COLUMN <column name>

Add Column Index

A column (attribute) index can be added to a table column in order to improve the performance of operations whose expressions contain relational operators against the column. See Limitations for restrictions.

Add Table Column Index Syntax
1
2
ALTER TABLE [<schema name>.]<table name>
ADD INDEX (<column name>)

For example, to index the employee table's department ID column:

Add Table Column Index Example
1
2
ALTER TABLE example.employee
ADD INDEX (dept_id)

Drop Column Index

An existing column (attribute) index can be removed from a table.

Drop Table Column Index Syntax
1
2
ALTER TABLE [<schema name>.]<table name>
DROP INDEX (<column name>)

For example, to drop the index on the employee table's department ID column:

Drop Table Column Index Example
1
2
ALTER TABLE example.employee
DROP INDEX (dept_id)

Add Chunk Skip Index

A chunk skip index can be added to a table column in order to improve the performance of operations containing equality-based filters against the column. See Limitations for restrictions.

Add Table Chunk Skip Index Syntax
1
2
ALTER TABLE [<schema name>.]<table name>
ADD CHUNK [SKIP] INDEX (<column name>)

For example, to index the employee table's department ID column:

Add Table Chunk Skip Index Example
1
2
ALTER TABLE example.employee
ADD CHUNK SKIP INDEX (dept_id)

Drop Chunk Skip Index

An existing chunk skip index can be removed from a table.

Drop Table Chunk Skip Index Syntax
1
2
ALTER TABLE [<schema name>.]<table name>
DROP CHUNK [SKIP] INDEX (<column name>)

For example, to drop the chunk skip index on the employee table's department ID column:

Drop Table Chunk Skip Index Example
1
2
ALTER TABLE example.employee
DROP CHUNK SKIP INDEX (dept_id)

Add Foreign Key

A foreign key can be added to any column or set of columns not marked as store-only in order to improve the performance of join operations between the table being altered and the table referenced in the foreign key.

Add Table Foreign Key Syntax
1
2
ALTER TABLE [<schema name>.]<table name>
ADD FOREIGN KEY (<column name>,...) REFERENCES <foreign table name>(<foreign column name>,...) [AS <foreign key name>]

For example, to add a foreign key on the employee table's department ID column, linking it to the department table's department ID column:

Add Table Foreign Key Example
1
2
ALTER TABLE example.employee
ADD FOREIGN KEY (dept_id) REFERENCES example.department(id) AS fk_emp_dept

Drop Foreign Key

An existing foreign key can be removed from a table, either by the name (alias) given to it during creation or by its definition:

Drop Table Foreign Key by Name Syntax
1
2
ALTER TABLE [<schema name>.]<table name>
DROP FOREIGN KEY <foreign key name>
Drop Table Foreign Key by Definition Syntax
1
2
ALTER TABLE [<schema name>.]<table name>
DROP FOREIGN KEY (<column name>,...) REFERENCES <foreign table name>(<foreign column name>,...)

For example, to drop the foreign key on the employee table's department ID column:

Drop Table Foreign Key by Name Example
1
2
ALTER TABLE example.employee
DROP FOREIGN KEY fk_emp_dept
Drop Table Foreign Key by Definition Example
1
2
ALTER TABLE example.employee
DROP FOREIGN KEY (dept_id) REFERENCES example.department(id)

Add Partition

A partition can be added to a range-partitioned or list-partitioned table.

Warning

Defining (adding) partitions after data has been loaded will result in a performance penalty as the database moves existing records targeted for the new partition from the default partition into it.

Range Partition

The new partition can be given a minimum bound (inclusive) and a maximum bound (exclusive). If the new partition would come before an existing partition, omitting the maximum bound would cause the new partition to take on the nearest following existing partition's minimum bound as its maximum bound. If the new partition would come after an existing partition, omitting the minimum bound would cause the new partition to take on the nearest preceding partition's maximum bound as its minimum bound. If no partitions are present in the table, the new partition will have to declare both a minimum and maximum bound.

Add Table Range Partition Syntax
1
2
ALTER TABLE [<schema name>.]<table name>
ADD PARTITION <partition name> [ MIN ( <least value> ) ] [ MAX ( <greatest value> ) ]

For example, to add a partition to the customer_order_range_by_year table, containing all records with a partition key less than 2020 and greater than or equal to the maximum bound of the nearest preceding partition:

Add Table Range Partition Example
1
2
ALTER TABLE example.customer_order_range_partition_by_year
ADD PARTITION order_2020 MAX(2021)

List Partition

The new partition can be given a list of values to match against the partition key values of incoming records.

Add Table List Partition Syntax
1
2
ALTER TABLE [<schema name>.]<table name>
ADD PARTITION <partition name> VALUES ( <value lists> )

For example, to add a partition to the customer_order_manual_list_partition_by_year table, containing all records from 2020:

Add Table List Partition (Year) Example
1
2
ALTER TABLE example.customer_order_manual_list_partition_by_year
ADD PARTITION order_2020 VALUES (2020)

For example, to add a partition to the customer_order_manual_list_partition_by_year_and_month table, containing all records from February 2020 & April 2020:

Add Table List Partition (Month) Example
1
2
ALTER TABLE example.customer_order_manual_list_partition_by_year_and_month
ADD PARTITION order_2020_0204 VALUES ((2020, 2), (2020, 4))

Remove Partition

An existing partition can be removed from a range-partitioned or list-partitioned table, sending all data contained within that partition back to the default partition.

Remove Table Partition Syntax
1
2
ALTER TABLE [<schema name>.]<table name>
REMOVE PARTITION <partition name>

For example, to remove a partition named order_2017 from the customer_order_range_by_year table:

Remove Table Partition Example
1
2
ALTER TABLE example.customer_order_range_partition_by_year
REMOVE PARTITION order_2017

Delete Partition

An existing partition can be dropped from a range-partitioned or list-partitioned table, deleting all data contained within it.

Delete Table Partition Syntax
1
2
ALTER TABLE [<schema name>.]<table name>
DELETE PARTITION <partition name>

For example, to drop a partition named order_2014_2016 from the customer_order_range_by_year table, deleting all data within that partition:

Delete Table Partition Example
1
2
ALTER TABLE example.customer_order_range_partition_by_year
DELETE PARTITION order_2014_2016

Set Tier Strategy

A table's eviction priorities can be adjusted by setting its tier strategy.

Set Table Tier Strategy Syntax
1
2
ALTER TABLE [<schema name>.]<table name>
SET TIER STRATEGY (<tier strategy>)

For example, to set the customer_order table's tier strategy, to one with a below-average eviction priority in the RAM Tier:

Set Table Tier Strategy Example
1
2
3
4
5
ALTER TABLE example.customer_order
SET TIER STRATEGY
(
    ( ( VRAM 1, RAM 3, PERSIST 5 ) )
)

The tier strategy can also be reset to the system default strategy.

Reset Table Tier Strategy Syntax
1
2
ALTER TABLE [<schema name>.]<table name>
RESET TIER STRATEGY

For example, to reset the customer_order table's tier strategy:

Reset Table Tier Strategy Example
1
2
ALTER TABLE example.customer_order
RESET TIER STRATEGY

Manage Subscription

Any table that is subscribed to a streaming external data source can have that subscription paused, resumed, or cancelled.

Manage External Table Subscription Syntax
1
2
ALTER TABLE [<schema name>.]<table name>
<PAUSE | RESUME | CANCEL> SUBSCRIPTION <data source name>

Note

Once unsubscribed, there is no way to re-subscribe the table to the data source via ALTER TABLE. An external table will need to be re-created, while a table whose subscription was initiated through the LOAD INTO command can have that command re-executed.

For example, to pause the order table's subscription on the order_source table:

Pause Subscription Example
1
2
ALTER TABLE order
PAUSE SUBSCRIPTION order_source

To resume the order table's subscription on the order_source table:

Resume Subscription Example
1
2
ALTER TABLE order
RESUME SUBSCRIPTION order_source

To remove the order table's subscription on the order_source table:

Unsubscribe Example
1
2
ALTER TABLE order
CANCEL SUBSCRIPTION order_source

REFRESH EXTERNAL TABLE

Refreshes the data within an external table.

REFRESH EXTERNAL TABLE Syntax
1
REFRESH [EXTERNAL] TABLE [<schema name>.]<table name>

Important

Data source connect privilege is required to refresh an external table that uses a data source.

TRUNCATE TABLE

Deletes all the records from a table.

TRUNCATE TABLE Syntax
1
TRUNCATE TABLE [<schema name>.]<table name>

DROP TABLE

When removing a table from the database, there are two options available, which control how the removal takes place. Normally, an error will be reported if the table to drop doesn't exist; if IF EXISTS is specified, no error will be reported.

DROP TABLE Syntax
1
DROP TABLE [IF EXISTS] [<schema name>.]<table name>

SHOW CREATE TABLE

Outputs the DDL statement required to reconstruct the given table.

SHOW CREATE TABLE Syntax
1
SHOW [CREATE TABLE] [<schema name>.]<table name>

Note

The response to SHOW CREATE TABLE is a single-record result set with the DDL statement as the value in the DDL column, shown below with the column separators returned by kisql.

For example, to output the DDL for the example table created in the CREATE TABLE section:

SHOW CREATE TABLE Example
1
SHOW CREATE TABLE example.various_types
SHOW CREATE TABLE Output
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
| CREATE TABLE "example"."various_types"
(
   "i" INTEGER (primary_key) NOT NULL,
   "bi" BIGINT (primary_key, shard_key) NOT NULL,
   "ub" UNSIGNED BIGINT,
   "r" REAL,
   "f" REAL,
   "d" DOUBLE (store_only),
   "s" VARCHAR (store_only, text_search),
   "c" VARCHAR (32, dict),
   "p" VARCHAR (text_search, 256, snappy),
   "ip" IPV4,
   "ui" UUID,
   "ts" TIMESTAMP,
   "td" DATE,
   "tt" TIME,
   "dt" DATETIME DEFAULT NOW(),
   "dc1" BIGINT,
   "dc2" DECIMAL(18,4),
   "dc3" REAL,
   "dc4" DOUBLE (store_only),
   "n" DECIMAL(18,4),
   "wkt" GEOMETRY,
    FOREIGN KEY (bi) references example.lookup(id) as fk
)
TIER STRATEGY (
( ( VRAM 1, RAM 5, PERSIST 5 ) )
)
ATTRIBUTE INDEX (ip)
ATTRIBUTE INDEX (ts); |

CREATE VIEW

Creates a new virtual table from the given query in the specified schema.

A view must adhere to the supported naming criteria for tables & columns.

Note

SQL views are only able to be queried via ODBC/JDBC connection or the /execute/sql endpoint. They are, however, still visible in the GAdmin table listing and manageable through other DDL endpoints.

CREATE VIEW Syntax
1
2
CREATE [OR REPLACE] VIEW [<schema name>.]<view name> AS
<select statement>
ParametersDescription
OR REPLACEAny existing table/view with the same name will be dropped before creating this view
<schema name>Name of the schema containing the view to create
<table name>Name of the view to create
<select statement>The query that will define both the structure and content of the created view

When any of a view's source tables is dropped, the view will also be dropped.

Caution!

A CREATE OR REPLACE issues an implicit drop, so replacing an input table will have the same effect of dropping the view.

For example, to create a view that is a copy of an existing table, failing if a table or view with the same name as the target view already exists:

CREATE VIEW Example
1
2
3
4
5
CREATE VIEW example.view_of_table AS
(
    SELECT *
    FROM example.table_to_view
)

CREATE MATERIALIZED VIEW

Specifying MATERIALIZED in a CREATE VIEW statement will make the view a materialized view.

A view must adhere to the supported naming criteria for tables & columns.

CREATE MATERIALIZED VIEW Syntax
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
CREATE [OR REPLACE] [TEMP] MATERIALIZED VIEW [<schema name>.]<view name>
[
    REFRESH
    <
        OFF |
        ON CHANGE |
        ON QUERY |
        EVERY <number> <SECOND[S] | MINUTE[S] | HOUR[S] | DAY[S]> [STARTING AT <YYYY-MM-DD [HH:MM[:SS]]>]
    >
]
AS
<select statement>
[WITH OPTIONS (<materialized view property name> = '<materialized view property value>'[,...])]
ParametersDescription
OR REPLACEAny existing table/view with the same name will be dropped before creating this materialized view
TEMPIf the database is restarted, the materialized view will be removed
<schema name>Name of the schema containing the materialized view to create
<table name>Name of the materialized view to create
REFRESH

Specifies the data refresh scheme for the materialized view. The following schemes are available:

ConstantDescription
OFF(the default) Will prevent the view from being automatically refreshed, but will still allow manual refreshes of the data to be requested
ON CHANGE

Will cause the view to be updated any time a record is added, modified, or deleted from the subtending tables in the view's query

Important

On change mode cannot be applied if the enable_worker_http_servers configuration setting (i.e. multi-head) is set to true. This is because multi-head inserts inserts bypass the on change mode's notification system and are written directly to the tables on the nodes they're located on instead of communicating to the head node that a change has occurred.

ON QUERYWill cause the view to be updated any time the view is queried
EVERYAllows specification of an interval in seconds, minutes, hours, or days, with the optional specification of a starting time at which the first refresh interval will run; if no start time is specified, the default will be an interval's worth of time from the point at which the view creation was requested
<select statement>The query that will define both the structure and content of the created materialized view
WITH OPTIONS

Optional indicator that a comma-delimited list of connection option/value assignments will follow. The follow options are available:

OptionDescription
EXECUTE AS

Executes materialized view refreshes as the given user with that user's privileges, when EVERY... is specified as the REFRESH method.

Note

If this user doesn't exist at the time of a refresh, the refresh will be executed as the creating user, and, failing that, the system administration user.

The intermediary results of materialized views are cached to improve the performance of queries against them. This means that, unlike typical views, materialized views are not lightweight database entities, but rather consume memory and processing time proportional to the size of the source data and complexity of the query.

When any of the source tables of a materialized view is dropped, the view will also be dropped.

Caution!

A CREATE OR REPLACE issues an implicit drop, so replacing an input table will have the same effect of dropping the view.

While primary keys & foreign keys are not transferred to the new view, shard keys will be, if the column(s) composing them are part of the SELECT list. A new shard key can be specified for the created view by using the KI_SHARD_KEY(<column list>) pseudo-function in the SELECT list.

To create a materialized view with columns a, b, c, & d and a new shard key on columns a & b, that refreshes once per half hour, replacing a view with the same name as the target view, if it exists:

CREATE MATERIALIZED VIEW Example
1
2
3
4
5
6
CREATE OR REPLACE MATERIALIZED VIEW example.materialized_view_of_table
REFRESH EVERY .5 HOURS AS
(
    SELECT a, b, c, d, KI_SHARD_KEY(a, b)
    FROM example.table_to_view
)

ALTER VIEW

The following facet of a view can be altered:

Move View

A view can be moved from one schema to another.

Move View Syntax
1
2
ALTER VIEW [<schema name>.]<view name>
< MOVE TO | SET SCHEMA > <new schema name>

For example, to move the sales_2017 view from the example_olap schema to the example_archive schema:

Move View Example
1
2
ALTER VIEW example_olap.sales_2017
MOVE TO example_archive

ALTER MATERIALIZED VIEW

Any of the following facets of a materialized view can be altered:

Set Access Mode

A materialized view can have its global accessibility modified for all users in the system, independently from and further restricting any role-based access controls in place. Note that changing the access mode cannot widen access for users not already granted access; it can only narrow access for those who already have access. This setting will also trump administrative access to a view.

Set Materialized View Access Mode Syntax
1
2
ALTER MATERIALIZED VIEW [<schema name>.]<view name>
SET ACCESS MODE < NO_ACCESS | READ_ONLY | WRITE_ONLY | READ_WRITE >

Set Execution User

A materialized view can have its execution user for periodic refreshes (REFRESH is set to EVERY...) changed to the given user.

Note

If this user doesn't exist at the time of a refresh, the refresh will be executed as the creating user, and, failing that, the system administration user.

Set Materialized View Execution User Syntax
1
2
ALTER MATERIALIZED VIEW [<schema name>.]<view name>
SET EXECUTE AS '<user name>'

Set Refresh Mode

The refresh mode of a materialized view can be modified.

Set Materialized View Refresh Mode Syntax
1
2
3
4
5
6
7
8
ALTER MATERIALIZED VIEW [<schema name>.]<view name>
SET REFRESH
<
    OFF |
    ON CHANGE |
    ON QUERY |
    EVERY <number> <SECOND[S] | MINUTE[S] | HOUR[S] | DAY[S]> [STARTING AT <YYYY-MM-DD [HH:MM[:SS]]>]
>

The available refresh modes are:

ConstantDescription
OFFWill prevent the view from being automatically refreshed, but will still allow manual refreshes of the data to be requested
ON CHANGE

Will cause the view to be updated any time a record is added, modified, or deleted from the subtending tables in the view's query

Important

On change mode cannot be applied if the enable_worker_http_servers configuration setting (i.e. multi-head) is set to true. This is because multi-head inserts inserts bypass the on change mode's notification system and are written directly to the tables on the nodes they're located on instead of communicating to the head node that a change has occurred.

ON QUERYWill cause the view to be updated any time the view is queried
EVERYAllows specification of an interval in seconds, minutes, hours, or days, with the optional specification of a starting time at which the first refresh interval will run; if no start time is specified, the default will be an interval's worth of time from the point at which the view alteration was requested

For example, to alter the current sales table to refresh every 6 hours:

Set Materialized View Refresh Mode Example
1
2
ALTER MATERIALIZED VIEW example_olap.sales_current
SET REFRESH EVERY 6 HOURS

This would alter the view in the same way:

Set Materialized View Refresh Mode Example
1
2
ALTER MATERIALIZED VIEW example_olap.sales_current
SET REFRESH EVERY .25 DAYS

Set TTL

A materialized view's time-to-live (TTL) can be altered.

Set Materialized View TTL Syntax
1
2
ALTER MATERIALIZED VIEW <view name>
SET TTL <new ttl>

REFRESH VIEW

Refreshes the data within a materialized view.

Refresh Materialized View Syntax
1
REFRESH MATERIALIZED VIEW [<schema name>.]<view name>

DROP VIEW

When removing a view from the database, there are two options available, which control how the removal takes place. Normally, an error will be reported if the view to drop doesn't exist; if IF EXISTS is specified, no error will be reported.

DROP VIEW Syntax
1
DROP [MATERIALIZED] VIEW [IF EXISTS] [<schema name>.]<view name>

SHOW CREATE VIEW

Outputs the DDL statement required to reconstruct the given view.

SHOW CREATE VIEW Syntax
1
SHOW [CREATE VIEW] [<schema name>.]<view name>

Note

The response to SHOW CREATE VIEW is a single-record result set with the DDL statement as the value in the DDL column, shown below with the column separators returned by kisql.

For example, to output the DDL for the example view created in the CREATE VIEW section:

SHOW CREATE VIEW Example
1
SHOW CREATE VIEW example.view_of_table
SHOW CREATE VIEW Output
1
2
3
4
5
6
| CREATE VIEW "example"."view_of_table"
 AS 
(
    SELECT *
    FROM example.table_to_view
) |

To output the DDL for the example materialized view created in the CREATE MATERIALIZED VIEW section:

SHOW CREATE VIEW (Materialized) Example
1
SHOW CREATE VIEW example.materialized_view_of_table
SHOW CREATE VIEW (Materialized) Output
1
2
3
4
5
6
7
| CREATE MATERIALIZED VIEW "example"."materialized_view_of_table"

REFRESH EVERY 30 MINUTES AS 
(
    SELECT a, b, c, d, KI_SHARD_KEY(a, b)
    FROM example.table_to_view
) |

CREATE CREDENTIAL

Creates a new credential, which is a record that contains authentication information required to connect to a resource outside the database. Any user may create a credential for their own use.

The following can make use of credentials:

CREATE CREDENTIAL Syntax
1
2
3
4
5
CREATE CREDENTIAL <credential name>
TYPE = '<type>',
IDENTITY = '<username>',
SECRET = '<password>'
[WITH OPTIONS ('<option name>' = '<option value>'[,...])]

The following services are supported:

  • Amazon S3
  • Azure
  • Docker Repository
  • HDFS
  • Kafka Cluster

For provider-specific syntax, see Provider-Specific Syntax. For provider-specific examples, see Create Credentials.

ParametersDescription
<credential name>Name of the credential to create. The credential name cannot start with a number and can only contain letters, numbers, and underscores.
TYPE

The type of credential to create. Supported types include:

TypeDescription
aws_access_keyAuthenticate to Amazon Web Services (AWS) via Access Key
aws_iam_roleAuthenticate to Amazon Web Services (AWS) via IAM Role
azure_adAuthenticate to Microsoft Azure via Active Directory
azure_oauthAuthenticate to Microsoft Azure via OAuth
azure_sasAuthenticate to Microsoft Azure via Shared Access Signature (SAS)
azure_storage_keyAuthenticate to Microsoft Azure via Storage Key
dockerAuthenticate to a Docker repository
hdfsAuthenticate to HDFS
kafkaAuthenticate to a Kafka cluster
IDENTITYUsername to use for authenticating with the credential.
SECRETPassword to use for authenticating with the credential.
WITH OPTIONSOptional indicator that a comma-delimited list of option/value assignments will follow. See Credential Options for the full list of options.

For example, to create a credential, auser_azure_active_dir_creds, for connecting to Microsoft Azure Active Directory:

CREATE CREDENTIAL (Azure AD) Example
1
2
3
4
CREATE CREDENTIAL auser_azure_active_dir_creds
TYPE = 'azure_ad',
IDENTITY = 'atypicaluser',
SECRET = 'Passw0rd!'

To create a credential, kafka_cred, for connecting to Kafka SASL:

CREATE CREDENTIAL (Kafka SASL) Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
CREATE CREDENTIAL kafka_cred
TYPE = 'kafka',
IDENTITY = '',
SECRET = ''
WITH OPTIONS
(
    'security.protocol' = 'SASL_SSL',
    'sasl.mechanism' = 'PLAIN',
    'sasl.username' = 'KAFKASASL1234',
    'sasl.password' = 'Passw0rd!'
)

Credential Options

The following is a list of possible credential options and their associated providers. For valid combinations of credential options per provider and authentication mechanism, see Provider-Specific Syntax.

OptionProviderDescription
azure_oath_tokenAZUREAzure storage account OAuth token
azure_sas_tokenAZUREAzure storage account shared access signature token
azure_storage_account_nameAZUREAzure storage account name (only used if azure_tenant_id is specified)
azure_tenant_idAZUREAzure Active Directory tenant identifier
hdfs_kerberos_keytabHDFSLocation of the Kerberos keytab file on the head node
hdfs_use_kerberosHDFSWhether to attempt Kerberos authentication to HDFS
s3_aws_role_arnS3AWS S3 IAM role
sasl.kerberos.keytabKAFKAKerberos keytab file
sasl.kerberos.principalKAFKAKerberos principal ID
sasl.kerberos.service.nameKAFKAKerberos service name
sasl.mechanismKAFKA

SASL scheme to use; one of:

  • PLAIN
  • GSSAPI
sasl.passwordKAFKASASL user password
sasl.usernameKAFKASASL user ID
security.protocolKAFKA

Security protocol to use for authentication; one of:

  • SSL
  • SASL_SSL
  • SASL_PLAINTEXT
ssl.ca.locationKAFKAPath to trust store
ssl.certificate.locationKAFKAPath to client certificate
ssl.key.locationKAFKAPath to client key
ssl.key.passwordKAFKAPassword to client key or trust store

Provider-Specific Syntax

Several authentication schemes across multiple providers are supported.

Amazon S3

Syntax below, examples here.

S3 Access Key
1
2
3
4
CREATE CREDENTIAL <credential name>
TYPE = 'aws_access_key',
IDENTITY = '<aws access key id>',
SECRET = '<aws secret access key>'
IAM Role
1
2
3
4
5
6
7
8
CREATE CREDENTIAL <credential name>
TYPE = 'aws_iam_role',
IDENTITY = '<aws access key id>',
SECRET = '<aws secret access key>'
WITH OPTIONS
(
    's3_aws_role_arn' = '<amazon resource name>'
)

Azure

Syntax below, examples here.

Password
1
2
3
4
CREATE CREDENTIAL <credential name>
TYPE = 'azure_storage_key',
IDENTITY = '<azure storage account name>',
SECRET = '<azure storage account key>'
SAS Token
1
2
3
4
5
6
7
8
CREATE CREDENTIAL <credential name>
TYPE = 'azure_sas',
IDENTITY = '<azure storage account name>',
SECRET = ''
WITH OPTIONS
(
    'azure_sas_token' = '<azure sas token>'
)
OAuth Token
1
2
3
4
5
6
7
8
CREATE CREDENTIAL <credential name>
TYPE = 'azure_oauth',
IDENTITY = '<azure storage account name>',
SECRET = ''
WITH OPTIONS
(
    'azure_oauth_token' = '<azure oauth token>'
)
Active Directory
1
2
3
4
5
6
7
8
9
CREATE CREDENTIAL <credential name>
TYPE = 'azure_ad',
IDENTITY = '<ad client id>',
SECRET = '<ad client secret key>'
WITH OPTIONS
(
    'azure_storage_account_name' = '<azure storage account name>',
    'azure_tenant_id' = '<azure tenant id>'
)

HDFS

Syntax below, examples here.

Password
1
2
3
4
CREATE CREDENTIAL <credential name>
TYPE = 'hdfs',
IDENTITY = '<hdfs username>',
SECRET = '<hdfs password>'
Kerberos Token
1
2
3
4
5
6
7
8
CREATE CREDENTIAL <credential name>
TYPE = 'hdfs',
IDENTITY = '<hdfs username>',
SECRET = ''
WITH OPTIONS
(
    'hdfs_use_kerberos' = 'true'
)
Kerberos Keytab
1
2
3
4
5
6
7
8
CREATE CREDENTIAL <credential name>
TYPE = 'hdfs',
IDENTITY = '<hdfs username>',
SECRET = ''
WITH OPTIONS
(
    'hdfs_kerberos_keytab' = '<kerberos keytab file>'
)

Apache Kafka

Syntax below, examples here.

Password
1
2
3
4
CREATE CREDENTIAL <credential name>
TYPE = 'kafka',
IDENTITY = '<username>',
SECRET = '<password>'
SSL
1
2
3
4
5
6
7
8
9
CREATE CREDENTIAL <credential name>
TYPE = 'kafka',
IDENTITY = '',
SECRET = ''
WITH OPTIONS
(
    'security.protocol' = 'SSL',
    'ssl.ca.location' = '<truststore path>'
)
SSL with Keystore
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
CREATE CREDENTIAL <credential name>
TYPE = 'kafka',
IDENTITY = '',
SECRET = ''
WITH OPTIONS
(
    'security.protocol' = 'SSL',
    'ssl.ca.location' = '<truststore path>',
    'ssl.key.password' = '<truststore password>'
)
SSL with Encryption
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
CREATE CREDENTIAL <credential name>
TYPE = 'kafka',
IDENTITY = '',
SECRET = ''
WITH OPTIONS
(
    'security.protocol' = 'SSL',
    'ssl.ca.location' = '<truststore path>',
    'ssl.certificate.location' = '<client certificate path>',
    'ssl.key.location' = '<client key path>'
)
SSL with Keystore and Encryption
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
CREATE CREDENTIAL <credential name>
TYPE = 'kafka',
IDENTITY = '',
SECRET = ''
WITH OPTIONS
(
    'security.protocol' = 'SSL',
    'ssl.ca.location' = '<truststore path>',
    'ssl.certificate.location' = '<client certificate path>',
    'ssl.key.location' = '<client key path>',
    'ssl.key.password' = '<client key password>'
)
SASL
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
CREATE CREDENTIAL <credential name>
TYPE = 'kafka',
IDENTITY = '',
SECRET = ''
WITH OPTIONS
(
    'security.protocol' = 'SASL_SSL',
    'sasl.mechanism' = 'PLAIN',
    'sasl.username' = '<sasl username>',
    'sasl.password' = '<sasl password>'
)
Kerberos
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
CREATE CREDENTIAL <credential name>
TYPE = 'kafka',
IDENTITY = '',
SECRET = ''
WITH OPTIONS
(
    'security.protocol' = 'SASL_PLAINTEXT',
    'sasl.mechanism' = 'GSSAPI',
    'sasl.kerberos.service.name' = '<kerberos service name>',
    'sasl.kerberos.keytab' = '<kerberos keytab file>',
    'sasl.kerberos.principal' = '<kerberos principal>'
)
Kerberos SSL
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
CREATE CREDENTIAL <credential name>
TYPE = 'kafka',
IDENTITY = '',
SECRET = ''
WITH OPTIONS
(
    'security.protocol' = 'SASL_SSL',
    'sasl.mechanism' = 'GSSAPI',
    'sasl.kerberos.service.name' = '<kerberos service name>',
    'sasl.kerberos.keytab' = '<kerberos keytab file>',
    'sasl.kerberos.principal' = '<kerberos principal>',
    'ssl.ca.location' = '<truststore path>',
    'ssl.certificate.location' = '<client certification path>',
    'ssl.key.location' = '<client key path>',
    'ssl.key.password' = '<client key password>'
)

ALTER CREDENTIAL

Alters the properties of an existing credential. Only users with system_admin or system_user_admin, or users with credential_admin on the credential to alter, may alter a credential.

ALTER CREDENTIAL Syntax
1
2
3
4
ALTER CREDENTIAL <credential name>
SET PROPERTY
    <property name> = '<property value>'[,...]
[WITH OPTIONS ('<option name>' = '<option value>'[,...])]
ParametersDescription
<credential name>Name of the existing credential to alter.
SET PROPERTYIndicator that a comma-delimited list of alterations to make will follow. See Set Properties for the complete list of properties.
WITH OPTIONSOptional indicator that a comma-delimited list of option/value assignments will follow. See Credential Options for the full list of options.

To alter a credential, auser_azure_active_dir_creds, updating the secret:

ALTER CREDENTIAL Example
1
2
3
ALTER CREDENTIAL auser_azure_active_dir_creds
SET PROPERTY
    SECRET = 'atypicaluserNewPassw0rd!'

To alter a credential, kafka_cred, updating the SASL password:

ALTER CREDENTIAL WITH OPTIONS Example
1
2
3
4
5
6
7
ALTER CREDENTIAL kafka_cred
SET PROPERTY
    IDENTITY = ''
WITH OPTIONS
(
    'sasl.password' = 'Passw0rd!?'
)

Set Properties

All credential properties can be altered via ALTER CREDENTIAL. The following are the property names and descriptions to use when performing an alteration.

ParametersDescription
TYPE

The type of credential. Supported types include:

TypeDescription
aws_access_keyAuthenticate to Amazon Web Services (AWS) via Access Key
aws_iam_roleAuthenticate to Amazon Web Services (AWS) via IAM Role
azure_adAuthenticate to Microsoft Azure via Active Directory
azure_oauthAuthenticate to Microsoft Azure via OAuth
azure_sasAuthenticate to Microsoft Azure via Shared Access Signature (SAS)
azure_storage_keyAuthenticate to Microsoft Azure via Storage Key
dockerAuthenticate to a Docker repository
hdfsAuthenticate to HDFS
kafkaAuthenticate to a Kafka cluster
IDENTITYUsername to use for authenticating with the credential.
SECRETPassword to use for authenticating with the credential.

DROP CREDENTIAL

Removes an existing credential. Only users with system_admin or system_user_admin, or users with credential_admin on the credential to drop, may drop a credential.

DROP CREDENTIAL Syntax
1
DROP CREDENTIAL < <credential name> | * >
ParametersDescription
<credential name>Name of the existing credential to remove. Use * instead to drop all credentials.

To drop a credential, auser_azure_active_dir_creds:

DROP CREDENTIAL Example
1
DROP CREDENTIAL auser_azure_active_dir_creds

SHOW CREDENTIAL

Outputs the DDL statement required to reconstruct the given credential. Note that the secret value will be masked and would need to be replaced with the actual secret value if attempting to reconstruct the credential. Only users with system_admin or system_user_admin, or users with credential_admin or credential_read on the credential to show, may show a credential.

SHOW CREDENTIAL Syntax
1
SHOW CREDENTIAL < <credential name> | * >
ParametersDescription
<credential name>Name of the existing credential for which the DDL will be output. Use * instead to output the DDL of all credentials.

Note

The response to SHOW CREDENTIAL is a single-column result set with the DDL statement as the value in the DDL column.

For example, to output the DDL for a credential, auser_azure_active_dir_creds:

SHOW CREDENTIAL Example
1
SHOW CREDENTIAL auser_azure_active_dir_creds

To output the DDL for all credentials:

SHOW CREDENTIAL (All Credentials) Example
1
SHOW CREDENTIAL *

DESCRIBE CREDENTIAL

Outputs the configuration of an existing credential. Only users with system_admin or system_user_admin, or users with credential_admin or credential_read on the credential to describe, may describe a credential.

DESCRIBE CREDENTIAL Syntax
1
DESC[RIBE] CREDENTIAL < <credential name> | * >
ParametersDescription
<credential name>Name of the existing credential for which the configuration will be output. Use * instead to output the configuration of all credentials.

Note

The response to DESCRIBE CREDENTIAL is a four-column result set:

  • CREDENTIAL_NAME - name of the credential
  • CREDENTIAL_TYPE - type of the credential; see CREATE CREDENTIAL for values
  • CREDENTIAL_IDENTITY - Username associated with the credential
  • CREDENTIAL_OPTIONS - Options associated with the credential; see Credential Options for values

For example, to show the configuration for a credential, auser_azure_active_dir_creds:

DESCRIBE CREDENTIAL Example
1
DESCRIBE CREDENTIAL auser_azure_active_dir_creds

To show the configuration for all credentials:

DESCRIBE CREDENTIAL (All Credentials) Example
1
DESCRIBE CREDENTIAL *

CREATE DATA SOURCE

Creates a new data source, which contains the location and connection information for a data store that is external to Kinetica. A data source serves as an authentication & access mechanism to a remote resource and can optionally be used as a streaming source as well.

A data source does not reference specific data files within the source; file references (if applicable) can be made by using the data source in a CREATE EXTERNAL TABLE call (for creating a persistent view of the file data) or a LOAD INTO call (for either a one-time load or a subscribed loading of data from the external source to a locally persisted table).

Note

The data source will be validated upon creation, by default, and will fail to be created if an authorized connection cannot be established.

CREATE DATA SOURCE Syntax
1
2
3
4
5
CREATE [EXTERNAL] DATA SOURCE <data source name>
LOCATION = '<provider>[://[<host>[:<port>]]]'
[USER = '<username>']
[PASSWORD = '<password>']
[WITH OPTIONS (<option name> = '<option value>'[,...])]

The following data source providers are supported:

  • Azure (Microsoft blob storage)
  • HDFS (Apache Hadoop Distributed File System)
  • Kafka (Apache Kafka streaming feed)
  • S3 (Amazon S3 Bucket)

Note

  • Azure anonymous data sources are only supported when both the container and the contained objects allow anonymous access.
  • HDFS systems with wire encryption are not supported.
  • Kafka data sources require an associated credential object for authentication.

For provider-specific syntax, see Provider-Specific Syntax. For provider-specific examples, see Create Data Sources.

ParametersDescription
EXTERNALOptional keyword for clarity
<data source name>Name of the data source, which can be referenced in subsequent commands
<provider>

Provider of the data source

Supported providers include:

ProviderDescription
AZUREMicrosoft Azure blob storage
HDFSApache Hadoop Distributed File System
KAFKAApache Kafka streaming feed
S3Amazon S3 bucket
<host>

Host, for HDFS or Kafka, to use to connect to the data source

The host used for Azure is <storage_account_name>.blob.core.windows.net.

The host used for S3 is <region>.amazonaws.com.

<port>Port, for HDFS or Kafka, to use to connect to the data source
USEROptional user name, given in <username>, to use for authenticating to the data source
PASSWORDOptional password, given in <password>, to use for authenticating to the data source
WITH OPTIONSOptional indicator that a comma-delimited list of connection option/value assignments will follow. See Data Source Options for the full list of options.

To create a data source, kin_ds, that connects to an Amazon S3 bucket, kinetica_ds, in the US East (N. Virginia) region:

CREATE DATA SOURCE Example
1
2
3
4
5
6
7
8
9
CREATE DATA SOURCE kin_ds
LOCATION = 'S3'
USER = '<aws access id>'
PASSWORD = '<aws access key>'
WITH OPTIONS
(
    BUCKET NAME = 'kinetica-ds',
    REGION = 'us-east-1'
)

Data Source Options

OptionProviderDescription
CREDENTIALAnyCredential object to use to authenticate to the remote system
VALIDATEAnyWhether to test the connection to the data source upon creation; if true (default), the creation of a data source that cannot be connected to will fail; if false, the data source will be created regardless of connectivity
WAIT TIMEOUTAZURE, HDFS, S3Timeout in seconds for reading from the storage provider
ANONYMOUSAZURE, S3Whether to connect to the storage provider with anonymous access
CONNECTION TIMEOUTHDFS, S3Timeout in seconds for connecting to a given storage provider
CONTAINER NAMEAZUREAzure storage container name
OAUTH TOKENAZUREAzure storage account OAuth token
SAS TOKENAZUREAzure storage account shared access signature token
STORAGE ACCOUNT NAMEAZUREAzure storage account name (only used if TENANT ID is specified)
TENANT IDAZUREAzure Active Directory tenant identifier
DELEGATION TOKENHDFSOptional Kerberos delegation token for worker nodes; if not specified, the token will be acquired from HDFS
KERBEROS KEYTABHDFSLocation of the Kerberos keytab file on the head node
USE KERBEROSHDFSWhether to attempt Kerberos authentication to HDFS
KAFKA_TOPIC_NAMEKAFKAKafka topic to access
BUCKET NAMES3Amazon S3 bucket name
REGIONS3Amazon S3 region identifier
S3_AWS_ROLE_ARNS3Amazon Resource Name (ARN) specifying the role

Provider-Specific Syntax

Several authentication schemes across multiple providers are supported.

Amazon S3

Syntax below, examples here.

Public (No Auth)
1
2
3
4
5
6
7
8
CREATE DATA SOURCE <data source name>
LOCATION = 'S3'
WITH OPTIONS
(
    ANONYMOUS = 'true',
    BUCKET NAME = '<aws bucket name>',
    REGION = '<aws region>'
)
Access Key
1
2
3
4
5
6
7
8
9
CREATE DATA SOURCE <data source name>
LOCATION = 'S3'
USER = '<aws access key id>'
PASSWORD = '<aws secret access key>'
WITH OPTIONS
(
    BUCKET NAME = '<aws bucket name>',
    REGION = '<aws region>'
)
IAM Role
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
CREATE DATA SOURCE <data source name>
LOCATION = 'S3'
USER = '<aws access key id>'
PASSWORD = '<aws secret access key>'
WITH OPTIONS
(
    BUCKET NAME = '<aws bucket name>',
    REGION = '<aws region>',
    S3_AWS_ROLE_ARN = '<aws iam role arn>'
)

Azure BLOB

Syntax below, examples here.

Public (No Auth)
1
2
3
4
5
6
7
8
CREATE DATA SOURCE <data source name>
LOCATION = 'AZURE'
USER = '<azure storage account name>'
WITH OPTIONS
(
    ANONYMOUS = 'true',
    CONTAINER NAME = '<azure container name>'
)
Password
1
2
3
4
5
CREATE DATA SOURCE <data source name>
LOCATION = 'AZURE'
USER = '<azure storage account name>'
PASSWORD = '<azure storage account key>'
WITH OPTIONS (CONTAINER NAME = '<azure container name>')
SAS Token
1
2
3
4
5
6
7
8
CREATE DATA SOURCE <data source name>
LOCATION = 'AZURE'
USER = '<azure storage account name>'
WITH OPTIONS
(
    CONTAINER NAME = '<azure container name>',
    SAS TOKEN = '<sas token>'
)
OAuth Token
1
2
3
4
5
6
7
8
CREATE DATA SOURCE <data source name>
LOCATION = 'AZURE'
USER = '<azure storage account name>'
WITH OPTIONS
(
    CONTAINER NAME = '<azure container name>',
    OAUTH TOKEN = '<oauth token>'
)
Active Directory
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
CREATE DATA SOURCE <data source name>
LOCATION = 'AZURE'
USER = '<ad client id>'
PASSWORD = '<ad client secret key>'
WITH OPTIONS
(
    STORAGE ACCOUNT NAME = '<azure storage account name>',
    CONTAINER NAME = '<azure container name>',
    TENANT ID = '<ad tenant id>'
)

HDFS

Syntax below, examples here.

Password
1
2
3
4
CREATE DATA SOURCE <data source name>
LOCATION = 'HDFS://<host>:<port>'
USER = '<hdfs username>'
PASSWORD = '<hdfs password>'
Kerberos Token
1
2
3
4
CREATE DATA SOURCE <data source name>
LOCATION = 'HDFS://<host>:<port>'
USER = '<hdfs username>'
WITH OPTIONS (USE KERBEROS = true)
Kerberos Keytab
1
2
3
4
CREATE DATA SOURCE <data source name>
LOCATION = 'HDFS://<host>:<port>'
USER = '<hdfs username>'
WITH OPTIONS (KERBEROS KEYTAB = '<keytab file/path>')

Apache Kafka

Syntax below, examples here.

Anonymous
1
2
3
CREATE DATA SOURCE <data source name>
LOCATION = 'KAFKA://<host>:<port>'
WITH OPTIONS (KAFKA_TOPIC_NAME = '<kafka topic name>')
Authenticated
1
2
3
4
5
6
7
CREATE DATA SOURCE <data source name>
LOCATION = 'KAFKA://<host>:<port>'
WITH OPTIONS
(
    KAFKA_TOPIC_NAME = '<kafka topic name>',
    CREDENTIAL = '<credential name>'
)

ALTER DATA SOURCE

Alters the connection parameters of an existing data source.

Note

The data source will be validated upon alteration, by default, and will fail to be altered if an authorized connection cannot be established.

ALTER DATA SOURCE Syntax
1
2
3
ALTER [EXTERNAL] DATA SOURCE <data source name>
SET PROPERTY
    <property name> = '<property value>'[,...]
ParametersDescription
EXTERNALOptional keyword for clarity
<data source name>Name of the data source to alter
SET PROPERTY

Indicator that a comma-delimited list of alterations to make will follow.

See Set Properties for the complete list of properties.

To alter a data source, kin_ds, updating the username & password:

ALTER DATA SOURCE Example
1
2
3
4
ALTER DATA SOURCE kin_ds
SET PROPERTY
    USER = '<new aws access id>',
    PASSWORD = '<new aws access key>'

Set Properties

All data source properties can be altered via ALTER DATA SOURCE. The following are the property names and descriptions to use when performing an alteration.

Provider-Agnostic Properties

ParametersDescription
LOCATION

Location of the data source, including the optional (HDFS only) <host> & <port> to use to connect to it

The data source <provider> can be one of the following:

ProviderDescription
AZUREMicrosoft Azure blob storage
HDFSApache Hadoop Distributed File System
KAFKAApache Kafka streaming feed
S3Amazon S3 bucket
USERUser name, given in <username>, to use for authenticating to the data source
PASSWORDPassword, given in <password>, to use for authenticating to the data source
CREDENTIALCredential object to use to authenticate to the remote system
VALIDATEWhether to test the connection to the data source upon creation; if true, the creation of a data source that cannot be connected to will fail; if false, the data source will be created regardless of connectivity. Default is true.
WAIT TIMEOUTTimeout in seconds for reading from a given storage provider

Provider-Specific Properties

OptionProviderDescription
CONTAINER NAMEAZUREAzure storage container name
OAUTH TOKENAZUREAzure storage account OAuth token
SAS TOKENAZUREAzure storage account shared access signature token
STORAGE ACCOUNT NAMEAZUREAzure storage account name (only used if TENANT ID is specified)
TENANT IDAZUREAzure Active Directory tenant identifier
DELEGATION TOKENHDFSOptional Kerberos delegation token for worker nodes; if not specified, the token will be acquired from HDFS
KERBEROS KEYTABHDFSLocation of the Kerberos keytab file on the head node
USE KERBEROSHDFSWhether to attempt Kerberos authentication to HDFS
KAFKA_TOPIC_NAMEKAFKAKafka topic to access
BUCKET NAMES3Amazon S3 bucket name
REGIONS3Amazon S3 region identifier
CONNECTION TIMEOUTHDFS, S3Timeout in seconds for connecting to a given storage provider

DROP DATA SOURCE

Removes an existing data source. Any external tables that depend on the data source must be dropped before it can be dropped.

DROP DATA SOURCE Syntax
1
DROP [EXTERNAL] DATA SOURCE <data source name>
ParametersDescription
EXTERNALOptional keyword for clarity
<data source name>Name of the data source to remove

To drop a data source, kin_ds:

DROP DATA SOURCE Example
1
DROP DATA SOURCE kin_ds

SHOW DATA SOURCE

Outputs the DDL statement required to reconstruct the given data source. Note that the PASSWORD value will be masked and would need to be replaced with the actual password if attempting to reconstruct the data source.

SHOW DATA SOURCE Syntax
1
SHOW [EXTERNAL] DATA SOURCE < <data source name> | * >
ParametersDescription
EXTERNALOptional keyword for clarity
<data source name>Name of the data source for which the DDL will be output. Use * instead to output the DDL of all data sources.

Note

The response to SHOW DATA SOURCE is a single-column result set with the DDL statement as the value in the DDL column.

For example, to output the DDL for a data source, kin_ds:

SHOW DATA SOURCE Example
1
SHOW DATA SOURCE kin_ds

To output the DDL for all data sources:

SHOW DATA SOURCE (All Data Sources) Example
1
SHOW DATA SOURCE *

DESCRIBE DATA SOURCE

Outputs the configuration of an existing data source. Passwords will be redacted.

SHOW DATA SOURCE Syntax
1
DESC[RIBE] [EXTERNAL] DATA SOURCE < <data source name> | * >
ParametersDescription
EXTERNALOptional keyword for clarity
<data source name>Name of the data source whose configuration will be output. Use * instead to output the configuration of all data sources.

Note

The response to DESCRIBE DATA SOURCE is a three-column result set:

  • DATA_SOURCE - name of the data source
  • STORAGE_PROVIDER_TYPE - keyword associated with storage provider
  • ADDITIONAL_INFO - data source configuration

To show the configuration for a data source, kin_ds:

DESCRIBE DATA SOURCE Example
1
DESCRIBE DATA SOURCE kin_ds

To show the configuration for all data sources:

DESCRIBE DATA SOURCE (All Data Sources) Example
1
DESCRIBE DATA SOURCE *

CREATE DATA SINK

Creates a new data sink, which contains the location and connection information for a data consumer that is external to Kinetica. A data sink serves as an authentication & access mechanism to the remote resource.

A data sink can be referenced in a stream, as the target of the streamed data.

Note

Kafka data sinks will be validated upon creation, by default, and will fail to be created if an authorized connection cannot be established.

CREATE DATA SINK Syntax
1
2
3
CREATE [EXTERNAL] DATA SINK <data sink name>
LOCATION = '<consumer>://<host>[:<port>]'
[WITH OPTIONS (<option name> = '<option value>'[,...])]

For provider-specific syntax, see Consumer-Specific Syntax.

ParametersDescription
EXTERNALOptional keyword for clarity
<data sink name>Name of the data sink, which can be referenced in subsequent commands
<consumer>

Consumer of the data sink

Supported consumers include:

ConsumerDescription
HTTPUnsecured webhook
HTTPSSecured webhook
KAFKAApache Kafka broker
<host>Host to use to connect to the data sink
<port>Port to use to connect to the data sink
WITH OPTIONSOptional indicator that a comma-delimited list of connection option/value assignments will follow. See Data Sink Options for the full list of options.

To create a data sink, kin_dsink, that connects to an Apache Kafka broker:

CREATE DATA SINK Example
1
2
3
4
5
6
7
CREATE DATA SINK kin_dsink
LOCATION = 'kafka://kafka.abc.com:9092'
WITH OPTIONS
(
    KAFKA_TOPIC_NAME = 'kafka_topic',
    CREDENTIAL = 'kafka_credential'
)

Data Sink Options

OptionProviderDescription
CREDENTIALAnyCredential object to use to authenticate to the remote consumer
VALIDATEAnyWhether to test the connection to the data sink upon creation; if true (default), the creation of a data sink that cannot be connected to will fail; if false, the data sink will be created regardless of connectivity
WAIT TIMEOUTAnyTimeout in seconds for reading from the consumer
CONNECTION TIMEOUTKAFKATimeout in seconds for connecting to a consumer
KAFKA_TOPIC_NAMEKAFKAKafka topic to write to

Consumer-Specific Syntax

Several authentication schemes across multiple consumers are supported.

Webhooks

HTTP
1
2
CREATE DATA SINK <data sink name>
LOCATION = 'http://<host>[:<port>]'
HTTPS with Credential
1
2
3
CREATE DATA SINK <data sink name>
LOCATION = 'https://<host>[:<port>]'
WITH OPTIONS (CREDENTIAL = '<credential name>')

Kafka

No Auth
1
2
3
4
5
6
CREATE DATA SINK <data sink name>
LOCATION = 'kafka://<host>[:<port>]'
WITH OPTIONS
(
      KAFKA_TOPIC_NAME = '<kafka topic name>'
)
Credential Auth
1
2
3
4
5
6
7
CREATE DATA SINK <data sink name>
LOCATION = 'kafka://<host>[:<port>]'
WITH OPTIONS
(
      KAFKA_TOPIC_NAME = '<kafka topic name>',
      CREDENTIAL = '<credential name>'
)

ALTER DATA SINK

Alters the connection parameters of an existing data sink.

Note

Kafka data sinks will be validated upon creation, by default, and will fail to be created if an authorized connection cannot be established.

ALTER DATA SINK Syntax
1
2
3
ALTER [EXTERNAL] DATA SINK <data sink name>
SET PROPERTY
    <property name> = '<property value>'[,...]
ParametersDescription
EXTERNALOptional keyword for clarity
<data sink name>Name of the data sink to alter
SET PROPERTY

Indicator that a comma-delimited list of alterations to make will follow.

See Set Properties for the complete list of properties.

To alter a data sink, kin_dsink, updating the timeouts:

ALTER DATA SINK Example
1
2
3
4
ALTER DATA SINK kin_dsink
SET PROPERTY
    CONNECTION TIMEOUT = '10',
    WAIT TIMEOUT = '10'

Set Properties

All data sink properties can be altered via ALTER DATA SINK. The following are the property names and descriptions to use when performing an alteration.

OptionProviderDescription
CREDENTIALAnyCredential object to use to authenticate to the remote consumer
LOCATIONAny

Location of the data sink; the URL of the consumer to connect to in this format:

<consumer>://<host>[:<port>]
VALIDATEAnyWhether to test the connection to the data sink upon creation; if true (default), the creation of a data sink that cannot be connected to will fail; if false, the data sink will be created regardless of connectivity
WAIT TIMEOUTAnyTimeout in seconds for reading from the consumer
CONNECTION TIMEOUTKAFKATimeout in seconds for connecting to a consumer
KAFKA_TOPIC_NAMEKAFKAKafka topic to write to

DROP DATA SINK

Removes an existing data sink. Any table monitors that depend on the data sink must be dropped before it can be dropped.

DROP DATA SINK Syntax
1
DROP [EXTERNAL] DATA SINK <data sink name>
ParametersDescription
EXTERNALOptional keyword for clarity
<data sink name>Name of the data sink to remove

To drop a data sink, kin_dsink:

DROP DATA SINK Example
1
DROP DATA SINK kin_dsink

SHOW DATA SINK

Outputs the DDL statement required to reconstruct the given data sink.

SHOW DATA SINK Syntax
1
SHOW [EXTERNAL] DATA SINK < <data sink name> | * >
ParametersDescription
EXTERNALOptional keyword for clarity
<data sink name>Name of the data sink for which the DDL will be output. Use * instead to output the DDL of all data sink.

Note

The response to SHOW DATA SINK is a single-column result set with the DDL statement as the value in the DDL column.

For example, to output the DDL for a data sink, kin_dsink:

SHOW DATA SINK Example
1
SHOW DATA SINK kin_dsink

To output the DDL for all data sinks:

SHOW DATA SINK (All Data Sinks) Example
1
SHOW DATA SINK *

DESCRIBE DATA SINK

Outputs the configuration of an existing data sink.

SHOW DATA SINK Syntax
1
DESC[RIBE] [EXTERNAL] DATA SINK < <data sink name> | * >
ParametersDescription
EXTERNALOptional keyword for clarity
<data sink name>Name of the data sink whose configuration will be output. Use * instead to output the configuration of all data sinks.

Note

The response to DESCRIBE DATA SINK is a three-column result set:

  • DATA_SINK - name of the data sink
  • STORAGE_PROVIDER_TYPE - keyword identifying data sink consumer
  • ADDITIONAL_INFO - data sink configuration

To show the configuration for a data sink, kin_dsink:

DESCRIBE DATA SINK Example
1
DESC DATA SINK kin_dsink

To show the configuration for all data sinks:

DESCRIBE DATA SINK (All Data Sinks) Example
1
DESC DATA SINK *

CREATE STREAM

Creates a new data stream (natively, a table monitor), which publishes changes in a given table to a target.

The monitored table can be one of:

The target can be one of:

  • an external Apache Kafka broker
  • an external webhook

By default, a stream will publish inserted records to the target. A stream can alternatively be configured to monitor for updates or deletes and publish the corresponding record counts. Streams that monitor for inserts can have an expression applied to only have specific inserts of interest published.

An existing data sink can be referenced in creating a stream, as the external target of the streamed data. Only unauthenticated external targets may be used if not using a data sink.

CREATE STREAM Syntax
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
CREATE STREAM <stream name> ON [TABLE] [<schema name>.]<table name>
[
    REFRESH
    <
        ON CHANGE |
        EVERY <number> <SECOND[S] | MINUTE[S] | HOUR[S] | DAY[S]> [STARTING AT <YYYY-MM-DD [HH:MM[:SS]]>]
    >
]
[WHERE <filter expression>]
WITH OPTIONS (<option name> = '<option value>'[,...])
ParametersDescription
<stream name>Name of the stream, which can be referenced in subsequent commands.
TABLEOptional keyword for clarity
<schema name>Name of the schema containing the table or materialized view to monitor.
<table name>Name of the table or materialized view to monitor for changes.
REFRESH

Specifies the reporting scheme for monitored changes. The following schemes are available:

ConstantDescription
ON CHANGEWill cause notifications to be streamed any time a record is added, modified, or deleted from the monitored table
EVERYAllows specification of an interval in seconds, minutes, hours, or days, with the optional specification of a starting time at which the first monitor interval will run; if no start time is specified, the default will be an interval's worth of time from the point at which the stream was created
<filter expression>Boolean expression that can be used to monitor for only a specific set of inserts.
WITH OPTIONSIndicator that a comma-delimited list of configuration option/value assignments will follow. See Stream Options for the full list of options.

To create a stream, kin_stream, that publishes inserts into the order_stream table via the kin_dsink data sink:

CREATE STREAM Example
1
2
CREATE STREAM kin_stream ON example.order_stream
WITH OPTIONS (DATASINK_NAME = 'kin_dsink')

Stream Options

OptionConsumerDescription
DATASINK_NAMEAnyData sink object to use to identify the consumer of this stream; mutually exclusive with destination
DESTINATIONAny

Location of the external target when not using a data sink; the URL of the consumer to connect to in this format:

<consumer>://<host>[:<port>]

Supported consumers include:

ConsumerDescription
httpUnsecured webhook
httpsSecured webhook
kafkaApache Kafka broker
EVENTAny

Specifies the type of event to monitor

Event TypeDescription
insert(default) Monitor inserts and publish them to the target
updateMonitor updates and publish their counts to the target
deleteMonitor deletes and publish their counts to the target
INCREASING_COLUMNAnyColumn of the monitored table that contains ever-increasing values for new data; e.g., a timestamp or sequence ID column
KAFKA_TOPIC_NAMEkafkaKafka topic to write to, if the target is a Kafka broker

Consumer-Specific Syntax

Several configurations across multiple consumers are supported.

HTTP
1
2
CREATE STREAM <stream name> ON [TABLE] [<schema name>.]<table name>
WITH OPTIONS (DESTINATION = 'http://<host>[:<port>]')
HTTPS
1
2
CREATE STREAM <stream name> ON [TABLE] [<schema name>.]<table name>
WITH OPTIONS (DESTINATION = 'https://<host>[:<port>]')
Kafka (No auth)
1
2
3
4
5
6
CREATE STREAM <stream name> ON [TABLE] [<schema name>.]<table name>
WITH OPTIONS
(
      DESTINATION = 'kafka://<host>:<port>',
      KAFKA_TOPIC_NAME = '<kafka topic name>'
)
Data Sink
1
2
CREATE STREAM <stream name> ON [TABLE] [<schema name>.]<table name>
WITH OPTIONS (DATASINK_NAME = '<data sink name>')

DROP STREAM

Removes an existing stream.

DROP STREAM Syntax
1
DROP STREAM <stream name>

To drop a stream, kin_stream:

DROP STREAM Example
1
DROP STREAM kin_stream

SHOW STREAM

Outputs the DDL statement required to reconstruct the given stream.

SHOW STREAM Syntax
1
SHOW STREAM <stream name>

Note

The response to SHOW STREAM is a single-column result set with the DDL statement as the value in the DDL column.

For example, to output the DDL for a stream, kin_stream:

SHOW STREAM Example
1
SHOW STREAM kin_stream

DESCRIBE STREAM

Outputs the configuration of an existing stream.

SHOW STREAM Syntax
1
DESC[RIBE] STREAM <stream name>

Note

The response to DESCRIBE STREAM is a six-column result set:

  • MONITOR_ID - unique name or topid ID for the stream
  • TABLE_NAME - name of the table being monitored
  • EVENT - table data event that triggers streamed content
  • INCREASING_COLUMN - column in table being monitored that contains ever-increasing data values
  • FILTER_EXPRESSION - filter used to monitor only data of interest
  • DATASINK_NAME - name of the data sink associated with the consumer

To show the configuration for a stream, kin_stream:

DESCRIBE STREAM Example
1
DESC STREAM kin_stream

DESCRIBE

Lists the contained tables and views of a given schema, or lists the columns and column types & properties for a given table or view.

DESCRIBE Schema Syntax
1
DESC[RIBE] <schema name>
DESCRIBE Table Syntax
1
DESC[RIBE] [<schema name>.]<table/view name>

For example, to describe the tables contained in the demo schema, into which demo data is usually downloaded:

DESCRIBE Schema Example
1
DESC demo
DESCRIBE Schema Output
1
2
3
4
5
6
7
8
9
+------------+
| Set_name   |
+------------+
| flights    |
| movies     |
| nyctaxi    |
| shipping   |
| stocks     |
+------------+

To describe the example table created in the CREATE TABLE section:

DESCRIBE Table Example
1
DESC example.various_types
DESCRIBE Table Output
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
+-----------+--------+------------+--------------------------------------+
| Col_num   | Name   | Null?      | Type                                 |
+-----------+--------+------------+--------------------------------------+
| 0         | i      | NOT NULL   | INTEGER (primary_key)                |
| 1         | bi     | NOT NULL   | BIGINT (primary_key, shard_key)      |
| 2         | ub     |            | UNSIGNED BIGINT                      |
| 3         | r      |            | REAL                                 |
| 4         | f      |            | REAL                                 |
| 5         | d      |            | DOUBLE (store_only)                  |
| 6         | s      |            | VARCHAR (store_only, text_search)    |
| 7         | c      |            | VARCHAR (32, dict)                   |
| 8         | p      |            | VARCHAR (text_search, 256, snappy)   |
| 9         | ip     |            | IPV4                                 |
| 10        | ui     |            | UUID                                 |
| 11        | ts     |            | TIMESTAMP                            |
| 12        | td     |            | DATE                                 |
| 13        | tt     |            | TIME                                 |
| 14        | dt     |            | DATETIME                             |
| 15        | dc1    |            | BIGINT                               |
| 16        | dc2    |            | DECIMAL(18,4)                        |
| 17        | dc3    |            | REAL                                 |
| 18        | dc4    |            | DOUBLE (store_only)                  |
| 19        | n      |            | DECIMAL(18,4)                        |
| 20        | wkt    |            | GEOMETRY                             |
+-----------+--------+------------+--------------------------------------+