EXPORT … INTO
Kinetica can export data from one or more tables, using anEXPORT ... INTO
statement. The data can be exported to either of the following:
- KiFS
-
a data sink configured to allow write
access to:
- remote files on Azure, GCS, HDFS, or S3
- a writable remote database table via JDBC
For contextualized examples, see Examples.
For copy/paste examples, see Exporting Data.
EXPORT ... INTO Syntax
No automatic data type transformations are done between the local data
and remote column types. If any transformations are necessary, the
QUERY clause should be used and the transformation done within it.Parameters
TABLE
TABLE
Source data specification clause, where
[<schema name>.]<table name> is the name of the table (and, optionally,
its schema) to export into the remote database.This clause is mutually exclusive with the
QUERY clause.QUERY
QUERY
Source data specification clause, where
<data query> is a SQL SELECT statement to
run locally to generate a result set to export into the remote database.This clause is mutually exclusive with the
TABLE clause.FILE PATH
FILE PATH
Target file specification clause, where The form of a file path is dependent on the target referenced:
<file path> is a single-quoted file path to
which data will be written.This clause is mutually exclusive with the
REMOTE TABLE & REMOTE QUERY
clauses, and is only applicable when exporting to an Azure, GCS, HDFS, or
S3 data sink, or to KiFS.-
Data Sink: If a data sink is specified in the
export options this file path must resolve to a writable
file/directory at that data sink location. A “path prefix” can be specified, which
will cause all files written to begin with that prefix, and an appropriate extension will
be appended, based on file format.
For example, a “path prefix” of
/data/empfor<file path>might result in the following files being written when exporting theemployeetable multiple times:/data/emp/employee_123450000.csv/data/emp/employee_123460000.csv/data/emp/employee_123480000.csv
-
KiFS: The path must resolve to a writable file path within
KiFS. A “path prefix” can be specified, which will cause all files written to begin
with that prefix, and an appropriate extension will be appended, based on file format.
For example, a “path prefix” of
/data/empfor<file path>might result in the following files being written when exporting theemployeetable multiple times:kifs://data/emp/employee_123450000.csvkifs://data/emp/employee_123460000.csvkifs://data/emp/employee_123480000.csv
SINGLE_FILE option can be used to create one file with no suffix appended to the
base file name, or to create multiple smaller files to increase the export performance.FORMAT
FORMAT
Optional indicator of target file type; default is
DELIMITED TEXT.Supported formats include:| Keyword | Description |
|---|---|
[DELIMITED] TEXT | A text-based, delimited field data file (CSV, PSV, TSV, etc.); a comma-delimited list of options can be given to specify the way in which the data file(s) should be written, including the delimiter used, whether headers are present, etc. Records spanning multiple lines are not supported. Non-numeric fields will be double-quoted, and double quotes in the data will be escaped with two double quotes. See Delimited Text Options for the complete list of <delimited text options>. |
PARQUET | Apache Parquet data file |
REMOTE TABLE
REMOTE TABLE
Target table specification clause, where This default form can be modified to use the
[<schema name>.]<table name> is the name of the table (and, optionally,
its schema) in the remote database into which data will be exported.Data will be exported using an INSERT statement of the form:? parameter form via the
USE_INDEXED_PARAMETERS option.This clause is mutually exclusive with the
FILE PATH & REMOTE QUERY
clauses.REMOTE QUERY
REMOTE QUERY
Target insert specification clause, where
<insert statement> is a SQL INSERT
statement defining the way in which data will be loaded into the remote database. The
target table named in the INSERT statement must already exist in the remote database.This clause is mutually exclusive with the
FILE PATH & REMOTE TABLE
clauses.WITH OPTIONS
WITH OPTIONS
Optional indicator that a comma-delimited list of connection & global option/value
assignments will follow.See Export Options for the complete list of options.
Delimited Text Options
The following options can be specified when exporting data into delimited text files.DELIMITER = '<char>'
DELIMITER = '<char>'
Use See Delimited Text Option Characters for allowed characters.
char as the target file field delimiter.The default delimiter is a comma, unless the target file has one of these extensions:.psv- will cause|to be the delimiter.tsv- will cause the tab character to be the delimiter
Since non-numeric fields are automatically double-quoted when written out, it is
not recommended to use a double quote character as the delimiter.
HEADER DELIMITER = '<char>'
HEADER DELIMITER = '<char>'
Use
char as the target file header field name/property delimiter, when writing the
header in Kinetica custom format (when KINETICA_HEADER is TRUE).The default is the | (pipe) character. See Delimited Text Option Characters for
allowed characters.The
DELIMITER character will still be used to separate field
name/property sets from each other in the header rowINCLUDES HEADER = <TRUE|FALSE>
INCLUDES HEADER = <TRUE|FALSE>
Declare that the target file(s) will or will not have a header.The default is
TRUE.KINETICA_HEADER = <TRUE|FALSE>
KINETICA_HEADER = <TRUE|FALSE>
Declare that the target file(s) will or will not have a custom Kinetica header.The default is
FALSE.In the custom Kinetica format, each field’s header contains the field name and any
associated column properties delimited by a different character than the one used to
separate field values.An example Kinetica header in a CSV file:NULL = '<string>'
NULL = '<string>'
Treat
string as the indicator of a null field value.The default is the empty string.Delimited Text Option Characters
ForDELIMITER & HEADER DELIMITER, any single character can be used, or
any one of the following escaped characters:
| Escaped Char | Corresponding Target File Character |
|---|---|
'' | Single quote |
\a | ASCII bell |
\b | ASCII backspace |
\f | ASCII form feed |
\t | ASCII horizontal tab |
\v | ASCII vertical tab |
\t for DELIMITER will cause the target files to
be written with ASCII horizontal tab characters as delimiter characters.
Export Options
The following options can be specified to modify the way data is written to the target. Which options apply to which target types is noted below.| Option | Target | Description |
|---|---|---|
BATCH SIZE | Both | Use an export batch size of the given number of records. The default batch size is export target dependent: * Local file (KiFS): 1,000,000 * Remote table: 200,000 |
COLUMN FORMATS | File | Use the given type-specific formatting for the given column when writing target data. This should be a map of column names to format specifications, where each format specification is map of column type to data format, all formatted as a JSON string. Supported column types include: date: Apply the given date format to the given column. Common date format codes follow. For the complete list, see Date/Time Conversion Codes . +----------+----------------------------------------------------------+ | Code | Description | +==========+==========================================================+ | YYYY | 4-digit year | +----------+----------------------------------------------------------+ | MM | 2-digit month, where January is 01 | +----------+----------------------------------------------------------+ | DD | 2-digit day of the month, where the 1st | | | of each month is 01 | +----------+----------------------------------------------------------+ time: Apply the given time format to the given column. Common time format codes follow. For the complete list, see Date/Time Conversion Codes . +----------+----------------------------------------------------------+ | Code | Description | +==========+==========================================================+ | HH24 | 24-based hour, where 12:00 AM is 00 and 7:00 PM | | | is 19 | +----------+----------------------------------------------------------+ | MI | 2-digit minute of the hour | +----------+----------------------------------------------------------+ | SS | 2-digit second of the minute | +----------+----------------------------------------------------------+ | MS | milliseconds | +----------+----------------------------------------------------------+ datetime: Apply the given date/time format to the given column. For example, to write dates from column d of the format 2010.10.30 and times from column t of the 24-hour format 18:36:54.789: { "d": {"date": "YYYY.MM.DD"}, "t": {"time": "HH24:MI:SS.MS"} } |
COMPRESSION_TYPE | File | Compress the data with the given type. uncompressed: Don’t compress the data. Default for text files. gzip: Use GZip compression on the data. snappy: Use Snappy compression on the data. Default for Parquet files, and only applicable to Parquet files. |
DATA SINK | Table | Export data through the given data sink. Data sink connect privilege is required when exporting through a data sink. Local table data sinks are not supported. |
DEFAULT COLUMN FORMATS | File | Use the given formats for data being written to target file fields with the corresponding column types. This should be a map of source column type to target format for that type, formatted as a JSON string. Supported column properties and source data formats are the same as those listed in the description of the COLUMN FORMATS option. For example, to make the default format for writing target data dates like 2010.10.30 and 24-hour times like 18:36:54.789: { "date": "YYYY.MM.DD", "time": "HH24:MI:SS.MS", "datetime": "YYYY.MM.DD HH24:MI:SS.MS" } |
EXPORT_DDL | File | Export the source table/query DDL as a separate file, in addition to the data file. The DDL file will be co-located with the data file(s) and have a .ddl extension. Default is FALSE. |
FIELDS IGNORED BY | File | Choose a comma-separated list of columns from the source table/query to ignore, exporting only those columns that are not in the identified list in the order they appear in the table/query. Columns can be identified by either POSITION or NAME. If ignoring by NAME, the specified names must match the source column names exactly. * Identifying by Name: FIELDS IGNORED BY NAME(Category, Description) * Identifying by Position: FIELDS IGNORED BY POSITION(3, 4) |
FIELDS MAPPED BY | File | Choose a comma-separated list of columns from the source table/query to export, in the specified order, identifying fields by either POSITION or NAME. If mapping by NAME, the specified names must match the source column names exactly. * Identifying by Name: FIELDS MAPPED BY NAME(ID, Name, Stock) * Identifying by Position: FIELDS MAPPED BY POSITION(1, 2, 5) |
FILE_EXTENSION | File | Extension to give the exported file. Usually used when FILE PATH specifies a directory name or a “path prefix”. Default is .csv. |
JDBC_CONNECTION_INIT_STATEMENT | Table | Run the single given statement before the export command is performed. This runs after the statement in JDBC_SESSION_INIT_STATEMENT, if specified, and runs before any distributed data transfer sessions are initiated. For example, to create a target table in the remote database before exporting data: JDBC_CONNECTION_INIT_STATEMENT = 'CREATE TABLE g (wkt WKT)' This option is only available for data sources configured for JDBC. |
JDBC_SESSION_INIT_STATEMENT | Table | Run the single given statement before transferring data in each of the distributed JDBC sessions supporting the export. This statement will also be run before any JDBC_CONNECTION_INIT_STATEMENT, if specified. For example, to set the time zone to UTC before transferring the data, use: JDBC_SESSION_INIT_STATEMENT = 'SET TIME ZONE ''UTC''' This option is only available for data sources configured for JDBC. |
SINGLE_FILE | File | Directs the file name format and number of files into which data will be exported. Default is true. An error will be returned if the option is true and the file size exceeds the configured limit. true: Write records to a single file, appending a numeric code to the end of the file name base, in order to avoid collisions with subsequent exports to the same file name. false: Write records to multiple files, one for each worker node in the cluster, appending a numeric code to the end of the file names generated, in order to avoid collisions with subsequent exports to the same file name. overwrite: Write records to a single file, with the exact file name used in the command. Subsequent exports to the same file name will result in the file being overwritten each time. For example, exporting to /export/data.csv might result in the following file names: true: /export/data_123450000.csv false: /export/data_123451000.csv /export/data_123452000.csv overwrite: /export/data.csv |
SINGLE_FILE_MAX_SIZE | File | Max file size (in MB) to allow when saving a single file. May be overridden by target limitations. Default is configured by the system egress_single_file_max_size parameter. See External Files. |
USE_INDEXED_PARAMETERS | Table | Use $N parameter styling when generating insert statements instead of using ?. Most databases support the former; for those that only support the latter, set this to FALSE. Default is TRUE. |
USE_ST_GEOMFROM_CASTS | Table | Cast geometry data stored in string format (WKTs) or byte format (WKBs) to a geometry type during export. Some databases take neither string nor byte formatted geometry objects; the cast will align outgoing geometries to an acceptable type for those databases. Default is FALSE. |
Examples
Export Table to File (KiFS)
To export a table of employees to a file in KiFS:EXPORT ... INTO (File from Table) Example
Export Query to File (KiFS)
To export a query of employees in department 2 to a Parquet file in KiFS:EXPORT ... INTO (File from Query) Example
Export Table via Data Sink
To export a table of employees through the JDBC data sinkjdbc_dsink, into a remote database table named
example.remote_employee:
EXPORT ... INTO (Data Sink Table from Table) Example
Export Table via Data Sink DML
To export a table of employees through the JDBC data sinkjdbc_dsink, into a remote database table named
example.remote_employee, using a custom INSERT statement:
EXPORT ... INTO (Data Sink DML from Table) Example
Export Query via Data Sink DML
To export a query of employees in department 2 through the JDBC data sinkjdbc_dsink, into a remote database table named
example.remote_employee_dept2, using a custom INSERT statement:
EXPORT ... INTO (Data Sink DML from Query) Example