Exporting Data

EXPORT...INTO

Kinetica can export data from one or more tables using an EXPORT...INTO statement. The data can be exported to either of the following:

  • KiFS
  • a data sink configured to allow write access to:
    • remote files on Azure, GCS, HDFS, or S3
    • a writable remote database table via JDBC (or CData JDBC)

Note

For contextualized examples, see Examples. For copy/paste examples, see Exporting Data.

EXPORT...INTO Syntax
EXPORT
<
   TABLE [<schema name>.]<table name>
   |
   QUERY (<data query>)
>
INTO
<
   FILE PATH '<file path>'
      [FORMAT <[DELIMITED] TEXT [(<delimited text options>)] | PARQUET>]
   |
   REMOTE TABLE '[<schema name>.]<table name>'
   |
   REMOTE QUERY '<insert statement>'
>
[WITH OPTIONS (<export option name> = '<export option value>'[,...])]

Note

No automatic data type transformations are done between the local data and remote column types. If any transformations are necessary, the QUERY clause should be used and the transformation done within it.
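For instance, a cast can be applied inside the QUERY clause to align a local column with the remote column type. A sketch, assuming the employee table from the Examples section and a hypothetical remote table example.remote_salaries reachable through the jdbc_dsink data sink:

```sql
-- Cast salary locally before export, since no automatic type
-- transformation is applied between local and remote column types
EXPORT QUERY
(
    SELECT id, CAST(salary AS DOUBLE) AS salary
    FROM employee
)
INTO REMOTE TABLE 'example.remote_salaries'
WITH OPTIONS (DATASINK_NAME = 'example.jdbc_dsink')
```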

Parameters Description

TABLE

[<schema name>.]<table name>

Source data specification clause, where [<schema name>.]<table name> is the name of the table (and, optionally, its schema) whose data will be exported.

Note

This clause is mutually exclusive with the QUERY clause.

QUERY

(<data query>)

Source data specification clause, where <data query> is a SQL SELECT statement to run locally, generating a result set to export.

Note

This clause is mutually exclusive with the TABLE clause.

FILE PATH

<file path>

Target file specification clause, where <file path> is a single-quoted file path to which data will be written.

Note

This clause is mutually exclusive with the REMOTE TABLE & REMOTE QUERY clauses, and is only applicable when exporting to an Azure, GCS, HDFS, or S3 data sink, or to KiFS.

The form of a file path is dependent on the target referenced:

  • Data Sink: If a data sink is specified in the export options this file path must resolve to a writable file/directory at that data sink location. A "path prefix" can be specified, which will cause all files written to begin with that prefix, and an appropriate extension will be appended, based on file format.

    For example, a "path prefix" of /data/emp for <file path> might result in the following files being written when exporting the employee table:

    • /data/emp/employee_12340000.csv
    • /data/emp/employee_12340001.csv
    • /data/emp/employee_12340002.csv

    If using an HDFS data sink, the "path prefix" must be the name of an HDFS directory.

  • KiFS: The path must resolve to a writable file path within KiFS. A "path prefix" can be specified, which will cause all files written to begin with that prefix, and an appropriate extension will be appended, based on file format.

    For example, a "path prefix" of /data/emp for <file path> might result in the following files being written:

    • kifs://data/emp/employee_12340000.csv
    • kifs://data/emp/employee_12340001.csv
    • kifs://data/emp/employee_12340002.csv
FORMAT

Optional indicator of target file type; default is DELIMITED TEXT.

Supported formats include:

Keyword Description
[DELIMITED] TEXT

A text-based, delimited field data file (CSV, PSV, TSV, etc.); a comma-delimited list of options can be given to specify the way in which the data file(s) should be written, including the delimiter used, whether headers are present, etc. Records spanning multiple lines are not supported.

Note

Non-numeric fields will be double-quoted, and double quotes in the data will be escaped with two double quotes.

See Delimited Text Options for the complete list of <delimited text options>.

PARQUET Apache Parquet data file

REMOTE TABLE

[<schema name>.]<table name>

Target table specification clause, where [<schema name>.]<table name> is the name of the table (and, optionally, its schema) in the remote database into which data will be exported.

Data will be exported using an INSERT statement of the form:

INSERT INTO [<schema name>.]<table name> VALUES ($1, ..., $N)

This default form can be modified to use the ? parameter form via the USE_INDEXED_PARAMETERS option.

Note

This clause is mutually exclusive with the FILE PATH & REMOTE QUERY clauses.

REMOTE QUERY

'<insert statement>'

Target insert specification clause, where <insert statement> is a SQL INSERT statement defining the way in which data will be loaded into the remote database. The target table named in the INSERT statement must already exist in the remote database.

Note

This clause is mutually exclusive with the FILE PATH & REMOTE TABLE clauses.

WITH OPTIONS

Optional indicator that a comma-delimited list of connection & global option/value assignments will follow.

See Export Options for the complete list of options.

Delimited Text Options

The following options can be specified when exporting data into delimited text files.

Option Description
DELIMITER = '<char>'

Use char as the target file field delimiter.

The default delimiter is a comma, unless the target file has one of these extensions:

  • .psv - will cause | to be the delimiter
  • .tsv - will cause the tab character to be the delimiter

Note

Since non-numeric fields are automatically double-quoted when written out, it is not recommended to use a double quote character as the delimiter.

See Delimited Text Option Characters for allowed characters.

HEADER DELIMITER = '<char>'

Use char as the target file header field name/property delimiter, when writing the header in Kinetica custom format (when KINETICA_HEADER is TRUE).

The default is the | (pipe) character. See Delimited Text Option Characters for allowed characters.

Note

The DELIMITER character will still be used to separate field name/property sets from each other in the header row.

INCLUDES HEADER = <TRUE|FALSE>

Declare that the target file(s) will or will not have a header.

The default is TRUE.

KINETICA_HEADER = <TRUE|FALSE>

Declare that the target file(s) will or will not have a custom Kinetica header.

The default is FALSE.

In the custom Kinetica format, each field's header contains the field name and any associated column properties delimited by a different character than the one used to separate field values.

An example Kinetica header in a CSV file:

id|int|data,category|string|data|char16,name|string|data|char32

NULL = '<string>'

Treat string as the indicator of a null field value.

The default is the empty string.
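Several of these options can be combined in a single FORMAT clause. A sketch, assuming the employee table from the Examples section, writing a tab-delimited file with a header row and N/A as the null indicator:

```sql
EXPORT TABLE employee
INTO FILE PATH 'kifs://data/employee_export.tsv'
FORMAT DELIMITED TEXT
(
    DELIMITER = '\t',
    INCLUDES HEADER = TRUE,
    NULL = 'N/A'
)
```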

Delimited Text Option Characters

For DELIMITER & HEADER DELIMITER, any single character can be used, or any one of the following escaped characters:

Escaped Char Corresponding Target File Character
'' Single quote
\a ASCII bell
\b ASCII backspace
\f ASCII form feed
\t ASCII horizontal tab
\v ASCII vertical tab

For instance, specifying \t for DELIMITER will cause the target files to be written with ASCII horizontal tab characters as delimiter characters.

Export Options

The following options can be specified to modify the way data is written to the target file(s).

Option Description
BATCH SIZE

Use an export batch size of the given number of records.

The default batch size is 1,000,000.

COLUMN FORMATS

Use the given type-specific formatting for the given column when writing target data. This should be a map of column names to format specifications, where each format specification is a map of column type to data format, all formatted as a JSON string.

Supported column types include:

Column Type Target Data Format Codes
date

Apply the given date format to the given column.

Common date format codes follow. For the complete list, see Date/Time Conversion Codes.

Code Description
YYYY 4-digit year
MM 2-digit month, where January is 01
DD 2-digit day of the month, where the 1st of each month is 01
time

Apply the given time format to the given column.

Common time format codes follow. For the complete list, see Date/Time Conversion Codes.

Code Description
HH24 Hour of the 24-hour clock, where 12:00 AM is 00 and 7:00 PM is 19
MI 2-digit minute of the hour
SS 2-digit second of the minute
MS milliseconds
datetime Apply the given date/time format to the given column.

For example, to write dates from column d of the format 2010.10.30 and times from column t of the 24-hour format 18:36:54.789:

{
    "d": {"date": "YYYY.MM.DD"},
    "t": {"time": "HH24:MI:SS.MS"}
}
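The format map is passed as a JSON string in the WITH OPTIONS clause. A sketch, assuming the employee table from the Examples section, writing its hire_date column as dates like 2010.10.30:

```sql
EXPORT TABLE employee
INTO FILE PATH 'kifs://data/employee_export.csv'
WITH OPTIONS (COLUMN FORMATS = '{"hire_date": {"date": "YYYY.MM.DD"}}')
```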
DATA SINK Export data through the given data sink. Data sink connect privilege is required when exporting through a data sink.
DEFAULT COLUMN FORMATS

Use the given formats for data being written to target file fields with the corresponding column types. This should be a map of source column type to target format for that type, formatted as a JSON string.

Supported column properties and source data formats are the same as those listed in the description of the COLUMN FORMATS option.

For example, to make the default format for writing target data dates like 2010.10.30 and 24-hour times like 18:36:54.789:

{
    "date": "YYYY.MM.DD",
    "time": "HH24:MI:SS.MS",
    "datetime": "YYYY.MM.DD HH24:MI:SS.MS"
}
EXPORT_DDL

Export the source table/query DDL as a separate file, in addition to the data file.

The DDL file will be co-located with the data file(s) and have a .ddl extension.

Default is FALSE.

FIELDS IGNORED BY < POSITION(<col#s>) | NAME(<field names>) >

Choose a comma-separated list of columns from the source table/query to ignore, exporting only those columns that are not in the identified list, in the order they appear in the table/query. Columns can be identified by either POSITION or NAME. If ignoring by NAME, the specified names must match the source column names exactly.

FIELDS MAPPED BY < POSITION(<col#s>) | NAME(<field names>) >

Choose a comma-separated list of columns from the source table/query to export, in the specified order, identifying fields by either POSITION or NAME. If mapping by NAME, the specified names must match the source column names exactly.
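For instance, to export only a subset of columns, identified by name, a sketch, assuming the employee table from the Examples section:

```sql
-- Export only the named columns, in the given order
EXPORT TABLE employee
INTO FILE PATH 'kifs://data/employee_names.csv'
WITH OPTIONS (FIELDS MAPPED BY NAME(id, first_name, last_name))
```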
FILE_EXTENSION Extension to give the exported file. Usually used when FILE PATH specifies a directory name or a "path prefix". Default is .csv.
SINGLE_FILE

Export records to a single file. This option may be ignored if the file size exceeds the configured file size limits.

Default is TRUE.

USE_INDEXED_PARAMETERS

Use $N parameter styling when generating insert statements instead of using ?. Most databases support the former; for those that only support the latter, set this to FALSE.

Note

This option is only used with the REMOTE TABLE clause.

Default is TRUE.
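For a remote database that only supports the ? parameter form, the option can be switched off. A sketch, reusing the data sink and remote table from the Examples section:

```sql
EXPORT TABLE employee
INTO REMOTE TABLE 'example.remote_employee'
WITH OPTIONS
(
    DATASINK_NAME = 'example.jdbc_dsink',
    USE_INDEXED_PARAMETERS = 'false'
)
```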

USE_ST_GEOMFROM_CASTS

Cast geometry data stored in string format (WKTs) or byte format (WKBs) to a geometry type during export. Some databases take neither string nor byte formatted geometry objects; the cast will align outgoing geometries to an acceptable type for those databases.

Note

This option is only used with the REMOTE TABLE clause.

Default is FALSE.

Examples

Export Table to File (KiFS)

To export a table of employees to a file in KiFS:

EXPORT...INTO (File from Table) Example
EXPORT TABLE employee
INTO FILE PATH 'kifs://data/employee_export.csv'

Export Query to File (KiFS)

To export a query of employees in department 2 to a Parquet file in KiFS:

EXPORT...INTO (File from Query) Example
EXPORT QUERY
(
    SELECT id, manager_id, first_name, last_name, salary, hire_date
    FROM employee
    WHERE dept_id = 2
)
INTO FILE PATH '/data/employee_export.parquet'
FORMAT PARQUET

Export Table via Data Sink

To export a table of employees through the JDBC data sink jdbc_dsink, into a remote database table named example.remote_employee:

EXPORT...INTO (Data Sink Table from Table) Example
EXPORT TABLE employee
INTO REMOTE TABLE 'example.remote_employee'
WITH OPTIONS (DATASINK_NAME = 'example.jdbc_dsink')

Export Table via Data Sink DML

To export a table of employees through the JDBC data sink jdbc_dsink, into a remote database table named example.remote_employee, using a custom INSERT statement:

EXPORT...INTO (Data Sink DML from Table) Example
EXPORT TABLE employee
INTO REMOTE QUERY 'INSERT INTO example.remote_employee VALUES (?, ?, ?, ?, ?, ?, ?)'
WITH OPTIONS (DATASINK_NAME = 'example.jdbc_dsink')

Export Query via Data Sink DML

To export a query of employees in department 2 through the JDBC data sink jdbc_dsink, into a remote database table named example.remote_employee_dept2, using a custom INSERT statement:

EXPORT...INTO (Data Sink DML from Query) Example
EXPORT QUERY
(
    SELECT id, manager_id, first_name, last_name, salary, hire_date
    FROM employee
    WHERE dept_id = 2
)
INTO REMOTE QUERY 'INSERT INTO example.remote_employee_dept2 VALUES (?, ?, ?, ?, ?, ?)'
WITH OPTIONS (DATASINK_NAME = 'example.jdbc_dsink')