Exporting Data

EXPORT ... INTO

Kinetica can export data from one or more tables, using an EXPORT ... INTO statement. The data can be exported to either of the following:

  • KiFS
  • a data sink configured to allow write access to:
    • remote files on Azure, GCS, HDFS, or S3
    • a writable remote database table via JDBC (or CData JDBC)

Note

For contextualized examples, see Examples. For copy/paste examples, see Exporting Data.

EXPORT ... INTO Syntax
EXPORT
<
   TABLE [<schema name>.]<table name>
   |
   QUERY (<data query>)
>
INTO
<
   FILE PATH '<file path>'
      [FORMAT <[DELIMITED] TEXT [(<delimited text options>)] | PARQUET>]
   |
   REMOTE TABLE '[<schema name>.]<table name>'
   |
   REMOTE QUERY '<insert statement>'
>
[WITH OPTIONS (<export option name> = '<export option value>'[,...])]

Note

No automatic data type transformations are done between the local data and remote column types. If any transformations are necessary, the QUERY clause should be used and the transformation done within it.
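
For example, a minimal sketch of performing a type transformation within the QUERY clause; the employee table, its salary column, and the target path are assumptions for illustration:

-- illustrative only: table name, column name, and path are assumptions
-- the column is cast locally so the exported value matches the target type
EXPORT QUERY
(
    SELECT id, CAST(salary AS VARCHAR(32)) AS salary
    FROM employee
)
INTO FILE PATH '/export/employee_salaries.csv'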

Parameters Description
TABLE

Source data specification clause, where [<schema name>.]<table name> is the name of the table (and, optionally, its schema) to export into the remote database.

Note

This clause is mutually exclusive with the QUERY clause.

QUERY

Source data specification clause, where <data query> is a SQL SELECT statement to run locally to generate a result set to export into the remote database.

Note

This clause is mutually exclusive with the TABLE clause.

FILE PATH

Target file specification clause, where <file path> is a single-quoted file path to which data will be written.

Note

This clause is mutually exclusive with the REMOTE TABLE & REMOTE QUERY clauses, and is only applicable when exporting to an Azure, GCS, HDFS, or S3 data sink, or to KiFS.

The form of a file path is dependent on the target referenced:

  • Data Sink: If a data sink is specified in the export options, this file path must resolve to a writable file/directory at that data sink location. A "path prefix" can be specified, which will cause all files written to begin with that prefix, and an appropriate extension will be appended, based on file format.

    For example, a "path prefix" of /data/emp for <file path> might result in the following files being written when exporting the employee table multiple times:

    • /data/emp/employee_123450000.csv
    • /data/emp/employee_123460000.csv
    • /data/emp/employee_123480000.csv

    If using an HDFS data sink, the "path prefix" must be the name of an HDFS directory.

  • KiFS: The path must resolve to a writable file path within KiFS. A "path prefix" can be specified, which will cause all files written to begin with that prefix, and an appropriate extension will be appended, based on file format.

    For example, a "path prefix" of /data/emp for <file path> might result in the following files being written when exporting the employee table multiple times:

    • kifs://data/emp/employee_123450000.csv
    • kifs://data/emp/employee_123460000.csv
    • kifs://data/emp/employee_123480000.csv

The SINGLE_FILE option can be used to create one file with no suffix appended to the base file name, or to create multiple smaller files to increase export performance.
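
For example, a minimal sketch of the "path prefix" behavior described above; the employee table and the /data/emp prefix are assumptions for illustration:

-- illustrative only: table name and prefix are assumptions
-- /data/emp is treated as a path prefix, so files named like
-- /data/emp/employee_<suffix>.csv are written, as shown above
EXPORT TABLE employee
INTO FILE PATH '/data/emp'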

FORMAT

Optional indicator of target file type; default is DELIMITED TEXT.

Supported formats include:

Keyword Description
[DELIMITED] TEXT

A text-based, delimited field data file (CSV, PSV, TSV, etc.); a comma-delimited list of options can be given to specify the way in which the data file(s) should be written, including the delimiter used, whether headers are present, etc. Records spanning multiple lines are not supported.

Note

Non-numeric fields will be double-quoted, and double quotes in the data will be escaped with two double quotes.

See Delimited Text Options for the complete list of <delimited text options>.

PARQUET Apache Parquet data file
REMOTE TABLE

Target table specification clause, where [<schema name>.]<table name> is the name of the table (and, optionally, its schema) in the remote database into which data will be exported.

Data will be exported using an INSERT statement of the form:

INSERT INTO [<schema name>.]<table name> VALUES ($1, ..., $N)

This default form can be modified to use the ? parameter form via the USE_INDEXED_PARAMETERS option.

Note

This clause is mutually exclusive with the FILE PATH & REMOTE QUERY clauses.

REMOTE QUERY

Target insert specification clause, where <insert statement> is a SQL INSERT statement defining the way in which data will be loaded into the remote database. The target table named in the INSERT statement must already exist in the remote database.

Note

This clause is mutually exclusive with the FILE PATH & REMOTE TABLE clauses.

WITH OPTIONS

Optional indicator that a comma-delimited list of connection & global option/value assignments will follow.

See Export Options for the complete list of options.

Delimited Text Options

The following options can be specified when exporting data into delimited text files.

Option Description
DELIMITER = '<char>'

Use char as the target file field delimiter.

The default delimiter is a comma, unless the target file has one of these extensions:

  • .psv - will cause | to be the delimiter
  • .tsv - will cause the tab character to be the delimiter

Note

Since non-numeric fields are automatically double-quoted when written out, it is not recommended to use a double quote character as the delimiter.

See Delimited Text Option Characters for allowed characters.

HEADER DELIMITER = '<char>'

Use char as the target file header field name/property delimiter when writing the header in the Kinetica custom format (when KINETICA_HEADER is TRUE).

The default is the | (pipe) character. See Delimited Text Option Characters for allowed characters.

Note

The DELIMITER character will still be used to separate field name/property sets from each other in the header row.

INCLUDES HEADER = <TRUE|FALSE>

Declare that the target file(s) will or will not have a header.

The default is TRUE.

KINETICA_HEADER = <TRUE|FALSE>

Declare that the target file(s) will or will not have a custom Kinetica header.

The default is FALSE.

In the custom Kinetica format, each field's header contains the field name and any associated column properties delimited by a different character than the one used to separate field values.

An example Kinetica header in a CSV file:

id|int|data,category|string|data|char16,name|string|data|char32

NULL = '<string>'

Treat string as the indicator of a null field value.

The default is the empty string.
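
A minimal sketch combining several of these options; the employee table, the KiFS target path, and the option values shown are assumptions for illustration:

-- illustrative only: table name, path, and option values are assumptions
EXPORT TABLE employee
INTO FILE PATH '/export/employee.csv'
FORMAT DELIMITED TEXT (INCLUDES HEADER = TRUE, NULL = 'N/A')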

Delimited Text Option Characters

For DELIMITER & HEADER DELIMITER, any single character can be used, or any one of the following escaped characters:

Escaped Char Corresponding Target File Character
'' Single quote
\a ASCII bell
\b ASCII backspace
\f ASCII form feed
\t ASCII horizontal tab
\v ASCII vertical tab

For instance, specifying \t for DELIMITER will cause the target files to be written with ASCII horizontal tab characters as delimiter characters.
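
A minimal sketch of that usage; the employee table and the KiFS target path are assumptions for illustration:

-- illustrative only: table name and path are assumptions
EXPORT TABLE employee
INTO FILE PATH '/export/employee.tsv'
FORMAT DELIMITED TEXT (DELIMITER = '\t')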

Export Options

The following options can be specified to modify the way data is written to the target. Which options apply to which target types is noted below.

Option Target Description
BATCH SIZE Both

Use an export batch size of the given number of records.

The default batch size is export target dependent:

  • Local file (KiFS): 1,000,000
  • Remote table: 200,000
COLUMN FORMATS File

Use the given type-specific formatting for the given column when writing target data. This should be a map of column names to format specifications, where each format specification is a map of column type to data format, all formatted as a JSON string.

Supported column types include:

Column Type Target Data Format Codes
date

Apply the given date format to the given column.

Common date format codes follow. For the complete list, see Date/Time Conversion Codes.

Code Description
YYYY 4-digit year
MM 2-digit month, where January is 01
DD 2-digit day of the month, where the 1st of each month is 01
time

Apply the given time format to the given column.

Common time format codes follow. For the complete list, see Date/Time Conversion Codes.

Code Description
HH24 24-based hour, where 12:00 AM is 00 and 7:00 PM is 19
MI 2-digit minute of the hour
SS 2-digit second of the minute
MS milliseconds
datetime Apply the given date/time format to the given column.

For example, to write dates from column d of the format 2010.10.30 and times from column t of the 24-hour format 18:36:54.789:

{
    "d": {"date": "YYYY.MM.DD"},
    "t": {"time": "HH24:MI:SS.MS"}
}
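
A minimal sketch of passing such a map through WITH OPTIONS; the employee table and its hire_date column are assumptions for illustration:

-- illustrative only: table name and column name are assumptions
EXPORT TABLE employee
INTO FILE PATH '/export/employee.csv'
WITH OPTIONS (COLUMN FORMATS = '{"hire_date": {"date": "YYYY.MM.DD"}}')
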
COMPRESSION_TYPE File

Compress the data with the given type.

Type Description
uncompressed Don't compress the data. Default for text files.
gzip Use GZip compression on the data.
snappy Use Snappy compression on the data. Default for Parquet files, and only applicable to Parquet files.
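
A minimal sketch of exporting gzip-compressed text; the employee table and the KiFS target path are assumptions for illustration:

-- illustrative only: table name and path are assumptions
EXPORT TABLE employee
INTO FILE PATH '/export/employee.csv'
WITH OPTIONS (COMPRESSION_TYPE = 'gzip')
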
DATA SINK Table Export data through the given data sink. Data sink connect privilege is required when exporting through a data sink.
DEFAULT COLUMN FORMATS File

Use the given formats for data being written to target file fields with the corresponding column types. This should be a map of source column type to target format for that type, formatted as a JSON string.

Supported column properties and source data formats are the same as those listed in the description of the COLUMN FORMATS option.

For example, to make the default format for writing target data dates like 2010.10.30 and 24-hour times like 18:36:54.789:

{
    "date": "YYYY.MM.DD",
    "time": "HH24:MI:SS.MS",
    "datetime": "YYYY.MM.DD HH24:MI:SS.MS"
}
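
A minimal sketch of passing those defaults through WITH OPTIONS; the employee table and the KiFS target path are assumptions for illustration:

-- illustrative only: table name and path are assumptions
EXPORT TABLE employee
INTO FILE PATH '/export/employee.csv'
WITH OPTIONS (DEFAULT COLUMN FORMATS = '{"date": "YYYY.MM.DD", "time": "HH24:MI:SS.MS"}')
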
EXPORT_DDL File

Export the source table/query DDL as a separate file, in addition to the data file.

The DDL file will be co-located with the data file(s) and have a .ddl extension.

Default is FALSE.

FIELDS IGNORED BY File

Specify a comma-separated list of columns from the source table/query to ignore; only the columns not in that list are exported, in the order they appear in the table/query. Columns can be identified by either POSITION or NAME. If ignoring by NAME, the specified names must match the source column names exactly.

  • Identifying by Name:

    FIELDS IGNORED BY NAME(Category, Description)
    
  • Identifying by Position:

    FIELDS IGNORED BY POSITION(3, 4)
    
FIELDS MAPPED BY File

Specify a comma-separated list of columns from the source table/query to export, in the given order, identifying columns by either POSITION or NAME. If mapping by NAME, the specified names must match the source column names exactly.

  • Identifying by Name:

    FIELDS MAPPED BY NAME(ID, Name, Stock)
    
  • Identifying by Position:

    FIELDS MAPPED BY POSITION(1, 2, 5)
    
FILE_EXTENSION File Extension to give the exported file. Usually used when FILE PATH specifies a directory name or a "path prefix". Default is .csv.
JDBC_CONNECTION_INIT_STATEMENT Table

Run the single given statement before the export command is performed. This runs after the statement in JDBC_SESSION_INIT_STATEMENT, if specified, and runs before any distributed data transfer sessions are initiated.

For example, to create a target table in the remote database before exporting data:

JDBC_CONNECTION_INIT_STATEMENT = 'CREATE TABLE g (wkt WKT)'

Note

This option is only available for data sources configured for JDBC.

JDBC_SESSION_INIT_STATEMENT Table

Run the single given statement before transferring data in each of the distributed JDBC sessions supporting the export. This statement will also be run before any JDBC_CONNECTION_INIT_STATEMENT, if specified.

For example, to set the time zone to UTC before transferring the data, use:

JDBC_SESSION_INIT_STATEMENT = 'SET TIME ZONE ''UTC'''

Note

This option is only available for data sources configured for JDBC.

SINGLE_FILE File

Controls the file naming scheme and the number of files into which data will be exported.

Default is true. The option is ignored if the file size exceeds the configured limits.

Mode Description
true Write records to a single file, appending a numeric code to the end of the file name base, in order to avoid collisions with subsequent exports to the same file name.
false Write records to multiple files, one for each worker node in the cluster, appending a numeric code to the end of the file names generated, in order to avoid collisions with subsequent exports to the same file name.
overwrite Write records to a single file, with the exact file name used in the command. Subsequent exports to the same file name will result in the file being overwritten each time.

For example, exporting to /export/data.csv might result in the following file names:

Mode File Name(s)
true /export/data_123450000.csv
false /export/data_123451000.csv, /export/data_123452000.csv
overwrite /export/data.csv
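
A minimal sketch of the overwrite mode, under which each run replaces /export/data.csv; the employee table is an assumption for illustration:

-- illustrative only: table name and path are assumptions
EXPORT TABLE employee
INTO FILE PATH '/export/data.csv'
WITH OPTIONS (SINGLE_FILE = 'overwrite')
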
USE_INDEXED_PARAMETERS Table

Use $N parameter styling when generating insert statements instead of using ?. Most databases support the former; for those that only support the latter, set this to FALSE.

Default is TRUE.
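
A minimal sketch of switching to ? parameters for a remote database that does not accept the $N form; the data sink and remote table names are assumptions for illustration:

-- illustrative only: data sink and remote table names are assumptions
EXPORT TABLE employee
INTO REMOTE TABLE 'example.remote_employee'
WITH OPTIONS (DATASINK_NAME = 'example.jdbc_dsink', USE_INDEXED_PARAMETERS = 'false')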

USE_ST_GEOMFROM_CASTS Table

Cast geometry data stored in string format (WKT) or byte format (WKB) to a geometry type during export. Some databases accept neither string-formatted nor byte-formatted geometry objects; the cast converts outgoing geometries to a type those databases accept.

Default is FALSE.
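
A minimal sketch of enabling the cast for a table containing a WKT column; the region table, data sink, and remote table names are assumptions for illustration:

-- illustrative only: table, data sink, and remote table names are assumptions
EXPORT TABLE region
INTO REMOTE TABLE 'example.remote_region'
WITH OPTIONS (DATASINK_NAME = 'example.jdbc_dsink', USE_ST_GEOMFROM_CASTS = 'true')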

Examples

Export Table to File (KiFS)

To export a table of employees to a file in KiFS:

EXPORT ... INTO (File from Table) Example
EXPORT TABLE employee
INTO FILE PATH '/export/employee.csv'

Export Query to File (KiFS)

To export a query of employees in department 2 to a Parquet file in KiFS:

EXPORT ... INTO (File from Query) Example
EXPORT QUERY
(
    SELECT id, manager_id, first_name, last_name, salary, hire_date
    FROM employee
    WHERE dept_id = 2
)
INTO FILE PATH '/export/employee.parquet'
FORMAT PARQUET

Export Table via Data Sink

To export a table of employees through the JDBC data sink jdbc_dsink, into a remote database table named example.remote_employee:

EXPORT ... INTO (Data Sink Table from Table) Example
EXPORT TABLE employee
INTO REMOTE TABLE 'example.remote_employee'
WITH OPTIONS (DATASINK_NAME = 'example.jdbc_dsink')

Export Table via Data Sink DML

To export a table of employees through the JDBC data sink jdbc_dsink, into a remote database table named example.remote_employee, using a custom INSERT statement:

EXPORT ... INTO (Data Sink DML from Table) Example
EXPORT TABLE employee
INTO REMOTE QUERY 'INSERT INTO example.remote_employee VALUES (?, ?, ?, ?, ?, ?, ?)'
WITH OPTIONS (DATASINK_NAME = 'example.jdbc_dsink')

Export Query via Data Sink DML

To export a query of employees in department 2 through the JDBC data sink jdbc_dsink, into a remote database table named example.remote_employee_dept2, using a custom INSERT statement:

EXPORT ... INTO (Data Sink DML from Query) Example
EXPORT QUERY
(
    SELECT id, manager_id, first_name, last_name, salary, hire_date
    FROM employee
    WHERE dept_id = 2
)
INTO REMOTE QUERY 'INSERT INTO example.remote_employee_dept2 VALUES (?, ?, ?, ?, ?, ?)'
WITH OPTIONS (DATASINK_NAME = 'example.jdbc_dsink')