Kinetica Input/Output (KIO)

KIO provides the ability to import data into and export data from Kinetica; it comes pre-installed and ready to use. KIO can transfer data between external sources directly, bypassing Kinetica if necessary. A simpler, GUI-based version of KIO is available in the KIO Tool section of GAdmin.

Command Line Usage

KIO can be accessed via the command line at /opt/gpudb/bin/kio.

Available Options:

Parameter Description
-h, --help Displays the help menu
-v, --version Displays the version
-s <uri>, --source <uri> The source URI. Use --show-usage to see more information about the URI types for supported data formats.
-d <uri>, --destination <uri> The destination URI. Use --show-usage to see more information about the URI types for supported data formats.
--show-usage <format> Displays detailed usage information for the given format. See the list of formats for more information about the format and format-specific options.
--failure-mode <mode>

Change the behavior of KIO upon encountering data parsing errors. There are two modes:

  • graceful (default) -- Continue ingest when parsing errors occur (only available for csv, kinetica, parquetdataset, and shapefile formats)
  • fail-fast -- Cease ingest when parsing errors occur
--batch-size <num>

Set the number of records to be fetched at a time from the given source. Maximum of 100000.

When using odo for --engine type (see below), the --batch-size option is only supported for the following formats:

  • kinetica (as a source or destination)
  • csv (source only)

When using spark for --engine type, the --batch-size option is supported for all formats.

--engine <type>

Change the underlying engine used to ingest records from a given source into Kinetica. The following engines are available:

  • odo -- Ingest/egress with Odo, a Python-based Extract, Transform, and Load (ETL) engine.
  • spark (default) -- Ingest/egress with Apache Spark and the Kinetica Spark Connector. The Spark engine is generally faster for both read and write operations.
--spark-network-timeout <seconds> Set a network timeout in seconds. This is useful for regulating Spark executor heartbeat failures. Default is 800 seconds.
--spark-driver-memory <memory> Set the Spark driver memory usage limit. Default is 2 GB.
--spark-executor-memory <memory> Set the Spark executor memory usage limit. Default is 2 GB.
--spark-offheap-memory <memory> Set the Spark offheap memory usage limit. Default is 4 GB.
--log-level <level>

Set the log level for the KIO transaction. Valid log levels:

  • CRITICAL
  • ERROR
  • WARNING
  • INFO (default)
  • DEBUG

Tip

The KIO_LOGLEVEL environment variable can be set to avoid specifying the log level for each KIO transaction.

--log-file <file-path>

Set the file to write logs to for the KIO transaction. If the file does not exist, it will be created. If the file does exist, logs will be appended to the file. By default, all logs are directed to standard output.

Tip

The KIO_LOGFILE environment variable can be set to avoid specifying the log file for each KIO transaction.

--d-column-properties <list>

Enables overriding destination column properties for Kinetica to PostgreSQL transfers. The column properties list is passed in as a stringified JSON list of dictionaries, e.g.:

[{"column_name1": "<name>", "column_base_type": "<type>"}, {"column_name2" ...}, ...]

Note

Columns of string|wkt and bytes|wkt in Kinetica can convert to geometry, geography, and bytea in PostgreSQL to support PostGIS.
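
The sketch below combines several of the general options above in a single transfer; the CSV file and table name are hypothetical placeholders, and every flag shown is documented in the table above:

./kio -s /tmp/data/sample.csv -d kinetica://::sample_table --engine spark --batch-size 50000 --failure-mode fail-fast --log-level DEBUG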

Format Types

The following list details the available formats for each type.

Database formats:

  • Kinetica
  • Microsoft SQL Server
  • Oracle
  • PostgreSQL / PostGIS
  • Teradata
  • Sybase IQ

File formats:

  • Apache ORC
  • Apache Parquet
  • Apache Parquet Dataset
  • CSV
  • Shapefile

Engine Compatibility

The following sections detail which engines can support transferring data to (import) or from (export) Kinetica.

Import (into Kinetica)

Important

Both the Spark and Odo engines can import from other Kinetica tables.

The Spark engine can transfer data from the following data format types to Kinetica:

  • Apache Parquet (AWS S3 or local)
  • CSV (AWS S3 or local)
  • Microsoft SQL Server
  • Oracle
  • PostgreSQL
  • Sybase IQ
  • Teradata

The Odo engine can transfer data from the following data format types to Kinetica:

  • Apache ORC
  • Apache Parquet Dataset (AWS S3 or local)
  • CSV (AWS S3 or local)
  • Shapefile
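
For instance, because Apache ORC appears only in the Odo list above, an ORC import must explicitly select that engine (the file and table names below are hypothetical):

./kio -s /tmp/data/sample.orc -d kinetica://::orc_data --engine odo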

Export (out of Kinetica)

Important

Both the Spark and Odo engines can export to a Kinetica table.

The Spark engine can transfer data from Kinetica to the following data format types:

  • PostgreSQL

The Odo engine can transfer data from Kinetica to the following data format types:

  • Apache ORC

  • Apache Parquet (local)

    Important

    The source data cannot be transferred to a local Parquet file if it contains bytes or timestamp data.

  • CSV (AWS S3, local)

  • Microsoft SQL Server

  • Oracle

  • PostgreSQL

  • Shapefile (AWS S3, local)

Inferred Types

Because the supported formats can differ drastically from one another, KIO has to normalize the types of data being input so it can be output to the destination format appropriately. The table below describes the type inferred by KIO from each source format.

Source Source Type KIO Inferred Type
CSV Numerical value without decimal places, e.g., 123456 int64
CSV Numerical value with decimal places, e.g., 123.456 float64
CSV Value containing string characters or symbols (A-Z, a-z, -, etc.) string
CSV Time value, e.g., 23:45:15.848 datetime
CSV Date (2008-04-15) or datetime (2009-04-16 21:00:15.325) value string
CSV Empty or null (or null string) value string
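
As a worked example, given a hypothetical CSV row like the one below, KIO would infer int64 for the first field, float64 for the second, string for the third, and datetime for the fourth:

123456,123.456,EXAMPLE-A1,23:45:15.848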

Database Formats

Kinetica

Imports from or exports to a table in Kinetica. Copying from one table to another is possible. For help with command line usage, provide the --show-usage parameter with the kinetica option.

Important

If graceful failure mode is enabled, KIO will attempt to sanitize column names that would otherwise violate Kinetica column name restrictions and prevent proper ingest.

Usage

Base Kinetica URI:

kinetica://[<username>:<password>@][<host>:<port>]<::table name>

Note

The URI must be wrapped in quotes ('' or "") should a special character be present that could confuse the command prompt

Important

If attempting to connect to Kinetica securely, the trusted root certificate must be in the OpenSSL trust store. Also, the port portion of the URI will need to reference the HTTPD proxy port/rank for Kinetica, e.g., localhost:8082/gpudb-0 (the default rank is gpudb-0). See Security Configuration for more details on configuring Kinetica securely.

URI Parameters:

Parameter Description
username The username used for authentication. Default is blank
password The password used for authentication. Default is blank
host The host to connect to. Default is localhost
port

The port to connect to. Default is 9191

Note

KIO supports connecting to Kinetica via HTTPD proxy (if enabled).

table name The table in the database to import into or export from.
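
For illustration, the hypothetical URIs below show the shorthand form that relies on the defaults (blank credentials, localhost, port 9191) and a fully qualified form with placeholder credentials:

kinetica://::taxi_trip_data
kinetica://jsmith:passw0rd@10.1.2.3:9191::taxi_trip_data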

Command Line Options:

Note

For more information on the types of engines, see Command Line Usage.

Options Engine Availability Description
--d-base-type-overrides '<type_definition>' Both

Override default base types found in the source when the destination is Kinetica. Only applicable when importing data into a non-existing table. Values should be stringified JSON objects, with double quotes "" used for keys/values and single quotes '' used for the parameter value itself, e.g.,

--d-base-type-overrides '{"column_name": "int"}'

Values should be a string for a given key, as there is a one-to-one mapping between a column and type. To override the base types for multiple columns, add another key/value pair

--d-base-type-overrides '{"column_name": "string", "column_name2": "float"}'

Note

The Kinetica Spark Connector infers charN sub-types for string type columns by default. Apply the string base type override to avoid charN inferencing.

--d-property-additions '<prop_definition>' Both

Properties added to the properties found in the source when the destination is Kinetica. Only applicable when importing data into a non-existing table. Values should be stringified JSON objects, with double quotes "" used for keys/values being overridden and single quotes '' used for the parameter value itself, e.g.,

--d-property-additions '{"column_name": ["data", "int16"]}'

Values should be a list of strings for a given key, as there can be multiple property additions for a column. To override the properties for multiple columns, add another key/value pair

--d-property-additions '{"column_name": ["char4"], "column_name2": ["data", "int16"]}'

All the properties listed under Column Properties are supported when using odo for --engine type.

When using spark for --engine type, only a subset of those properties is supported.

Note

The Kinetica Spark Connector infers charN sub-types for string type columns by default. Apply the varchar property addition (Spark engine only) to avoid charN inferencing.

--d-property-overrides '<prop_definition>' Both

Override default properties found in the source when the destination is Kinetica. Only applicable when importing data into a non-existing table. Values should be stringified JSON objects, with double quotes "" used for keys/values being overridden and single quotes '' used for the parameter value itself, e.g.,

--d-property-overrides '{"column_name": ["data", "int16"]}'

Values should be a list of strings for a given key, as there can be multiple property overrides for a column. To override the properties for multiple columns, add another key/value pair

--d-property-overrides '{"column_name": ["char4"], "column_name2": ["data", "int16"]}'

Note

The Kinetica Spark Connector assumes all columns are nullable by default, so using the Spark engine will automatically apply the nullable property to all columns. The Odo engine assumes all columns are not nullable by default.

--d-dry-run odo

Generates a stringified JSON schema of the specified Kinetica destination table based on the input source. The schema includes each column's name, type, properties, and nullability. Both a source and destination must be specified, but no data is transferred or appended and non-existing tables will not be created when this flag is present (see the sketch after this table). The returned schema is a list of Python dictionaries, e.g.,

[{"is_nullable": false, "column_properties":["data", "int16"], "column_type":"int", "column_name":"a"}]
--s-https spark

Generates a connection using HTTPS instead of the default HTTP to the specified Kinetica source table.

Note

An HTTPS connection requires that the appropriate truststore and password be passed as well. Review the --s-ssl-httpd-truststore[-password] parameters below for more information

--d-https spark

Generates a connection using HTTPS instead of the default HTTP to the specified Kinetica destination table.

Note

An HTTPS connection requires that the appropriate truststore and password be passed as well. Review the --d-ssl-httpd-truststore[-password] parameters below for more information

--d-collection-name Both Specifies the collection in which the specified Kinetica destination table should be placed. If the collection does not exist, it will be created. If no collection is specified, the destination table will be placed in the root collection. If the destination table already exists, this option will NOT place the existing table into the specified collection or move the table from its existing collection to the specified collection.
--d-is-replicated spark Specifies whether the destination table should be replicated.
--d-update-on-existing-pk spark Specifies the record collision policy for inserting into a table with a primary key. If this flag is present, any existing table record with primary key values that match those of a record being inserted will be replaced by that new record. If this flag is not present, any existing table record with primary key values that match those of a record being inserted will remain unchanged and the new record discarded. If the specified table does not have a primary key, then this flag is ignored.
--d-jdbc-url '<jdbc-url>' spark Specifies a particular JDBC URL to connect to Kinetica. The default JDBC URL is 'jdbc:kinetica://<hostname>:9191' where <hostname> is the hostname referenced in Kinetica URI.
--d-spark-timeout-milliseconds spark Set a connection timeout in milliseconds. The default value is 60000 (1 minute).
--d-spark-enable-multihead spark

Enables multi-head ingest.

Important

Replicated tables are not supported with multi-head ingest.

--s-ssl-httpd-truststore /file/path/truststore.jks spark

When Kinetica is the source, filepath to the Java truststore that is then passed to the HTTPD service to verify the SSL certificate that the Kinetica HTTPD server provides.

Note

This option must be passed in conjunction with the --s-ssl-httpd-truststore-password parameter

--d-ssl-httpd-truststore /file/path/truststore.jks spark

When Kinetica is the destination, filepath to the Java truststore that is then passed to the HTTPD service to verify the SSL certificate that the Kinetica HTTPD server provides.

Note

This option must be passed in conjunction with the --d-ssl-httpd-truststore-password parameter

--s-ssl-httpd-truststore-password '<password>' spark When Kinetica is the source, the password for the Java truststore passed with the --s-ssl-httpd-truststore parameter.
--d-ssl-httpd-truststore-password '<password>' spark When Kinetica is the destination, the password for the Java truststore passed with the --d-ssl-httpd-truststore parameter.
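
As a sketch of the --d-dry-run workflow (the CSV file and table name are hypothetical), the command below prints the inferred destination schema without creating the table or transferring any data:

./kio -s /tmp/data/trips.csv -d kinetica://::trips --engine odo --d-dry-run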

Limitations and Cautions

  • If a table does not exist when importing into Kinetica, the table will be created automatically and:

    • it will be created at the root level
    • foreign keys are not preserved
    • distribution scheme is not preserved (unless using the --d-is-replicated option with spark as the --engine type); the table will be randomly sharded by default
  • To remove properties, specify the --d-property-overrides option and list all properties that should be kept, leaving out any properties that should be removed (see the sketch after this list).

  • Overriding a base type will remove its existing properties.

  • If a table is being imported into Kinetica using the Spark engine and its name contains non-alphanumeric special characters, the special characters will be replaced with underscores

  • If importing a GIS-type column(s) with the --gis-ext-enabled flag, the --d-property-additions parameter must be present containing a map of each GIS-type column to the wkt type, e.g.,

    --d-property-additions '{"gis_col1": ["wkt"], "gis_col2": ["wkt"]}'
    

Microsoft SQL Server / Spatial Data (GIS) SQL Server

Imports from a table in Microsoft SQL Server. For help with command line usage, provide the --show-usage parameter with the sqlserver option.

Important

Microsoft SQL Server / GIS SQL Server is only supported when importing from Microsoft SQL Server to Kinetica with the Spark engine.

Usage

Base Microsoft SQL Server URI:

sqlserver://<username>:<password>@<address><:port>/<database_name><::table_name>

Important

All fields are required. The URI must be wrapped in quotes ('' or "") should a special character be present that could confuse the command prompt

URI Parameters:

Parameter Description
username The username used for authentication.
password The password used for authentication.
address The IPV4 address to connect to.
port The port to connect to.
database_name The database name.
table_name The table in the database to import from.

Command Line Options:

Option Description
--gis-ext-enabled Allows KIO to recognize that a GIS-specific extension is enabled so SQL-native GIS types can be read into memory as wkt strings for proper conversion into Kinetica. This option must be passed in conjunction with the kinetica destination format option --d-property-additions to map the incoming GIS-type column(s) to wkt type in Kinetica (see Examples for more detail).

Limitations and Cautions

  • KIO does not support Java virtual machines (JVMs) with Federal Information Processing Standards (FIPS) enabled.
  • KIO does not support extra properties for Microsoft SQL Server availability groups, including cluster or subnet failover.
  • KIO does not support connecting to Microsoft SQL Server with SSL.

Oracle

Imports from a table in an Oracle database. For help with command line usage, provide the --show-usage parameter with the oracle option.

Important

Oracle is only supported when importing from Oracle to Kinetica with the Spark engine.

Usage

Base Oracle URI:

oracle://<username>:<password>@<address><:port>/<system_id><::schema_name>

Important

All fields are required. The URI must be wrapped in quotes ('' or "") should a special character be present that could confuse the command prompt

URI Parameters:

Parameter Description
username The username used for authentication.
password The password used for authentication.
address The IPV4 address to connect to.
port The port to connect to.
system_id The system ID to identify a particular Oracle service.
schema_name The schema in the system to import from.

Limitations and Cautions

  • KIO uses the Oracle XE Service to establish a database connection. The listener.ora and tnsnames.ora files must be correctly configured to expose the XE Service.

  • KIO only supports Oracle 12c.

  • When exporting from Oracle to a given destination, column names will be capitalized. When applying properties to a Kinetica destination table's column names, ensure the column names referenced are also capitalized, e.g.,:

    --d-property-additions '{"VENDOR_ID": ["text_search"]}'
    

PostgreSQL / PostGIS

Imports from or exports to a table in PostgreSQL. Copying from one table to another is possible. For help with command line usage, provide the --show-usage parameter with the postgresql option.

Important

PostGIS is only supported when importing from PostgreSQL to Kinetica with the Spark engine.

Usage

Base PostgreSQL URI:

postgresql://<username>:<password>@<host><:port>/<database_name>[::schema_name]<::table_name>

Important

The URI must be wrapped in quotes ('' or "") should a special character be present that could confuse the command prompt

URI Parameters:

Parameter Description
username The username used for authentication. Default is the user currently running KIO
password The password used for authentication.
host The host to connect to. Default is localhost
port The port to connect to. Default is 5432
database_name The database name. Default is the value the username parameter is set to
schema_name The schema name. Optional
table_name The table in the database to import into or export from.

Command Line Options:

Option Description
--s-case-sensitive When the source is PostgreSQL and this flag is passed, case sensitivity will be enabled for table, column, and schema names.
--d-case-sensitive When the destination is PostgreSQL and this flag is passed, case sensitivity will be enabled for table, column, and schema names.
--s-postgis-schema <schema-name> The PostGIS schema name for when the source is PostgreSQL. Default value is public.
--d-postgis-schema <schema-name> The PostGIS schema name for when the destination is PostgreSQL. Default value is public.
--gis-ext-enabled Allows KIO to recognize that a GIS-specific extension is enabled so SQL-native GIS types can be read into memory as wkt strings for proper conversion into Kinetica. This option must be passed in conjunction with the kinetica destination format option --d-property-additions to map the incoming GIS-type column(s) to wkt type in Kinetica (see Examples for more detail).
--s-batch-iterator-column-name <column-name>

When the source is PostgreSQL, the name of a column that will be used to iterate over data when ingesting into Kinetica. This parameter is required for tables that have more than 10 million records, but tables of any size will see improved ingest performance when it is provided. The provided column must be an integer (BIGINT, SMALLINT, or INTEGER) column, must not be nullable, and must be either SERIAL or UNIQUE. Providing this option enables KIO to use a more performant query to fetch records.

Important

This option is only available for PostgreSQL to Kinetica transfers using the Spark engine.
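
A hedged sketch of a PostgreSQL-to-Kinetica transfer using a batch iterator column (the credentials, addresses, table, and column names are placeholders; the default Spark engine is assumed, as this option requires it):

./kio -s 'postgresql://jsmith:password123@172.123.45.67:5432/mydb::nyctaxi' -d kinetica://::nyctaxi --s-batch-iterator-column-name trip_id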


Limitations and Cautions

  • If a table does not exist when importing into PostgreSQL, the table will be created automatically. If attempting to copy from one table to another in PostgreSQL, the output table must already exist before KIO can copy the source table into it.
  • Case sensitivity is disabled by default. When case sensitivity is disabled, PostgreSQL will query and reference table, column, and schema names by lowercase values.

Sybase IQ

Imports from a table in a Sybase IQ database. For help with command line usage, provide the --show-usage parameter with the sybaseiq option.

Important

Sybase IQ is only supported when importing from Sybase IQ to Kinetica with the Spark engine.

Usage

Base Sybase IQ URI:

sybaseiq://<username>:<password>@<address><:port>/<database_name><::table_name>

Important

All fields are required. The URI must be wrapped in quotes ('' or "") should a special character be present that could confuse the command prompt

URI Parameters:

Parameter Description
username The username used for authentication.
password The password used for authentication.
address The host to connect to.
port The port to connect to.
database_name The database name.
table_name The table in the database to import from.

Limitations and Cautions

  • KIO will use the default character set of the connecting database; no conversion between character sets is available.
  • KIO does not define Sybase IQ connection failover behavior.
  • KIO does not define a timeout for client sessions or SQL queries.
  • KIO does not support connecting with Sybase IQ using SSL.

Teradata

Imports from a table in a Teradata database. For help with command line usage, provide the --show-usage parameter with the teradata option.

Some considerations when connecting to Teradata:

  • KIO uses the JDBC 4.0 API specification.
  • KIO connects to the standard DBC/SQL partition in Teradata.
  • KIO requires that JDK/JRE version 1.3.2, 5, 6, 7, 8, or 9 be installed and reachable on your machine instance in order to connect to Teradata.

Important

Teradata is only supported when importing from Teradata to Kinetica with the Spark engine.

Usage

Base Teradata URI:

teradata://<username>:<password>@<address><:port>/<database_name><::table_name>

Important

All fields are required. The URI must be wrapped in quotes ('' or "") should a special character be present that could confuse the command prompt

URI Parameters:

Parameter Description
username The username used for authentication.
password The password used for authentication.
host The host to connect to.
port The port to connect to.
database_name The database name.
table_name The table in the database to import from. The name must exactly match; predicate matching is not supported.

Limitations and Cautions

  • KIO only supports the UTF-8 character set (CHARSET) when connecting with Teradata.
  • KIO does not support diagnostics or troubleshooting support via DDSTATS.
  • KIO does not support data encryption between the Teradata JDBC driver and the Teradata database using ENCRYPTDATA.
  • KIO does not support additional fractional digits.
  • KIO does not support modifying users and authentication credentials using NEW_PASSWORD.
  • KIO only supports Teradata versions 14.00, 14.10, 15.00, 15.10, 16.10, and 16.20.

File Formats

Apache ORC

Imports from an ORC file; exporting to ORC via KIO is currently not supported. For help with command line usage, provide the --show-usage parameter with the orc option.

Usage

URI Parameters:

Parameter Description
/file/path/filename.orc Absolute filepath to an ORC file

Important

Do not specify single or double quotes around the file path, as KIO will interpret that as the name of the file, rather than a file path.

Limitations and Cautions

  • The following Apache ORC types are not currently supported by KIO when Kinetica is the destination:
    • struct
    • list
    • map
    • union
  • KIO currently does not support referencing ORC files using relative file paths.

Apache Parquet

Imports from or exports to a Parquet file; however, exporting to Parquet files in AWS S3 via KIO is currently not supported. Compressed or partitioned Parquet files are also supported. For help with command line usage, provide the --show-usage parameter with the parquet option.

Important

If graceful failure mode is enabled, KIO will not validate the schema of the dataset until ingest time for a particular partition, at which point the partition will be skipped if its schema is invalid.

Tip

Apache Parquet files can also be transferred to/from an S3 bucket. See Transferring files via S3 for details.

Usage

URI Parameters:

Parameter Description
/file/path/filename.parquet Absolute filepath to a Parquet file

Important

Do not specify single or double quotes around the file path, as KIO will interpret that as the name of the file, rather than a file path.

Command Line Options:

Options Description
--s-recursive

Recursively search a directory for parquet files instead of only searching the top level of the directory specified by the file path parameter. For example, given the directory below:

/path/to/directory/
├── example1.parquet
├── example2.parquet
├── subdirectory/
│   ├── example4.parquet
│   └── example5.parquet
└── example3.parquet

Specifying --source of /path/to/directory/example*.parquet without specifying --s-recursive will register example1.parquet, example2.parquet, and example3.parquet. Specifying --s-recursive will register all files, including example4.parquet and example5.parquet
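
Continuing the hypothetical directory layout above, a recursive ingest of those files into Kinetica (the destination table name is a placeholder) might look like:

./kio -s /path/to/directory/example*.parquet -d kinetica://::parquet_data --s-recursive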

Limitations and Cautions

  • KIO currently does not support referencing Parquet files using relative file paths.
  • If the file does not exist when exporting to Parquet, it will be created. If the file does exist, it will be overwritten.
  • KIO only supports hidden Parquet file ingestion (/file/path/.<file-name>.parquet) using the Odo engine

Apache Parquet Dataset

Imports from a Parquet Dataset; exporting to Parquet Datasets via KIO is currently not supported. For help with command line usage, provide the --show-usage parameter with the parquetdataset option.

Tip

Apache Parquet Dataset files can also be transferred to/from an S3 bucket. See Transferring files via S3 for details.

Usage

URI Parameters:

Parameter Description
/file/path/directory.parquetdataset Absolute path to a Parquet Dataset directory

Limitations and Cautions

General:

  • KIO currently does not support referencing Parquet Datasets using relative directory paths.
  • KIO will always attempt to ingest the Parquet dataset as a nested directory of partitioned Parquet files. If ingesting a list of absolute file paths to Parquet files, move all the files into a single directory and use /file/path/*.parquet as the source URI instead to ingest the files as chunked Parquet files.
  • KIO currently does not support inferring the types of partition keys, and as such, the keys are cast as strings. If ingesting the dataset into Kinetica, utilize --d-property-additions, --d-property-overrides, and --d-base-type-overrides to establish the data's types as desired. Refer to Kinetica for more information.
  • KIO currently does not support reading in specific columns/partition keys from the Parquet Dataset.

AWS S3 Apache Parquet Dataset:

  • KIO assumes that Parquet Datasets are not S3 buckets but rather a subdirectory (or subdirectories) within an S3 bucket
  • Interacting with Apache Parquet Datasets in an S3 bucket is a Python 3-specific feature. Note that Python 3 is packaged with KIO by default

CSV

Imports from or exports to a CSV file. For help with command line usage, provide the --show-usage parameter with the csv option.

Tip

CSV files can also be transferred to/from an S3 bucket. See Transferring files via S3 for details.

Usage

URI Parameters:

Parameter Description
/file/path/filename.csv Absolute or relative filepath to a CSV file

Important

Do not specify single or double quotes around the file path, as KIO will interpret that as the name of the file, rather than a file path.

Command Line Options:

Options Description
--d-delimiter <delimiter> Set the delimiter used in the file when csv is the destination format. Any alphanumeric character or symbol is a valid delimiter; default is ,.
--s-delimiter <delimiter>

Set the delimiter used in the file when csv is the source format.

Important

The option must match the delimiter in the file or the import will fail.

--d-escape-character <character> Set the character used to escape error-prone characters, e.g., ,, "", etc., when csv is the destination format.
--s-escape-character <character>

Set the character used to escape error-prone characters, e.g., ,, "", etc., used in the source csv file.

Important

The option must match the escape character used in the file or the import will fail.

--s-custom-null-value <value>

Set the null value literal when csv is the source format. Any string value is supported.

Important

This option is only available when using the Spark engine.

--s-custom-quote-value <value>

Set the quote symbol that will replace default double quotes (") when enclosing text fields for data transfers where csv is the source format. Any string value is supported.

Important

This option is only available when using the Spark engine.

--s-custom-comment-value <value>

Set the comment symbol when csv is the source format. Lines starting with the provided comment symbol are not parsed or ingested. Only a single character is supported, e.g., #, -.

Important

This option is only available when using the Spark engine.

--s-headerless Specifies whether the source CSV file is headerless, i.e., the first row in the CSV file is data instead of column headers. By default, Kinetica assumes all source CSV files have headers as the first row.
--dynamically-infer-wkt-subtypes

Dynamically infer whether columns with base type string should have the wkt property applied during ingestion.

Important

This option is only available for CSV to Kinetica transfers using the Odo engine.

--s-drop-malformed-rows

Indicates rows of a source CSV file will not be inserted if either of the following is true:

  • The number of fields in the row does not match the number of fields expected by the type schema
  • The type of one or more fields does not match the existing table's type schema

Important

This option is only available when using the Spark engine and when --failure-mode is set to graceful.
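
A hedged sketch of importing a pipe-delimited, headerless CSV file (the file and table names are hypothetical placeholders):

./kio -s /tmp/data/pipe_delimited.csv -d kinetica://::pipe_data --s-delimiter '|' --s-headerless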


Limitations and Cautions

  • If CSV is the destination and the target file(s) does not exist, column headers will be written to the file first and then data will be appended. If the target file(s) does exist, only data will be appended.

  • If importing from a CSV into a non-existing table in Kinetica, the table will be created. The database will infer column base types and null values depending on the data being imported.

  • KIO only supports wildcard ingestion of multiple CSVs (/file/path/*.csv) using the Spark engine

  • KIO only supports hidden CSV file ingestion (/file/path/.<file-name>.csv) using the Odo engine

  • If importing from a CSV into an existing table in Kinetica, the first row of the CSV file must contain a delimiter-separated (based on the user-specified delimiter) list of the column names in the order they appear in the existing table.

    Warning

    If the first row of the CSV file is a list of column names that don't match with the existing table, KIO will attempt to insert that row as data instead, which could result in import errors.


Shapefiles

Imports from or exports to a shapefile. For help with command line usage, provide the --show-usage parameter with the shapefile option.

Tip

Shapefiles can also be transferred to/from an S3 bucket. See Transferring files via S3 for details.

Usage

URI Parameters:

Parameter Description
/file/path/filename.shapefile Absolute filepath to a shapefile

Note

The URI must be wrapped in quotes ('' or "") should a special character be present that could confuse the command prompt

Limitations and Cautions

  • The URI must reference the .shapefile file, which maps to a main file (.shp) and sometimes an index file (.shx) and a dBASE file (.dbf). KIO requires that at least the .shapefile and .shp files be available. These files (and any other files) must be in the same directory and have the same prefix to be recognized by KIO, e.g., usa-coastline.shapefile, usa-coastline.shp, etc.
  • KIO currently does not support referencing shapefiles using relative file paths.
  • Kinetica decimal fields are fixed precision and scale, so importing a shapefile that references numeric fields into Kinetica will be lossy
  • KIO may truncate dBASE record values that are more than 256 characters long
  • KIO supports wildcard ingestion of shapefiles (/file/path/*.shapefile) into the same destination data format provided that the schema for all files is identical, i.e. column names don't have to match but the rows of data therein must be of the same type. If a file's schema does not match the other files being ingested, the file will be ignored and the ingestion will continue. If the column names do not match, the first file ingested will determine the column names.
  • Shapefiles can only be ingested using the Odo engine

Transferring files via S3

You can also import from or export to supported file formats in an S3 bucket. For help with command line usage, provide the --show-usage parameter with the s3 option. AWS S3 in KIO isn't necessarily a separate format but rather a way to access an S3 bucket for file transfers; all associated parameters, options, and limitations apply when using a supported file format within an S3 URI. The following file formats are supported with S3 KIO operations:

  • CSV
  • Apache Parquet
  • Apache Parquet Dataset
  • Shapefile

Usage

Base S3 URI:

s3://<bucket name>/<file path>

Note

The URI must be wrapped in quotes ('' or "") should a special character be present that could confuse the command prompt

URI Parameters:

Parameter Description
<bucket name> The unique name given to the S3 bucket in your AWS account.
<file path> The file path to the desired file(s) contained in the specified S3 bucket.

Command Line Options:

Options Description
--aws-access-key-id <key> Sets the AWS access key ID for an AWS user with access permissions to both the source and destination (if both are AWS resources).
--aws-secret-access-key <key> Sets the AWS secret access key for an AWS user with access permissions to both the source and destination (if both are AWS resources).

Important

If the S3 bucket has public access, the access key and secret access key are not required.
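
As a sketch of passing credentials explicitly on the command line (the bucket, table, and key values are placeholders):

./kio -s 's3://my-s3-bucket/csv/*.csv' -d kinetica://::my_test_data --aws-access-key-id <access-key-id> --aws-secret-access-key <secret-access-key>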

Configuration

If not using the access key ID and secret access key command line options, your machine can be configured for direct AWS S3 access. Once you've created your AWS access key ID and AWS secret access key:

  1. Install the official AWS CLI on your machine using pip:

    pip install awscli
    
  2. Add your AWS account credentials to the machine. You can find the default region name in the URL for your AWS console; the default output format can be left as the default (None):

    $ aws configure
    AWS Access Key ID [None]: <access-key-id>
    AWS Secret Access Key [None]: <secret-access-key>
    Default region name [None]: <region-name>
    Default output format [None]:
    
  3. Verify the .aws directory is available in your root directory:

    $ ls -al ~/
    total 500
    dr-xr-x---.  10 root root   4096 Jun 23 12:25 .
    dr-xr-xr-x.  17 root root   4096 Jun 23 11:49 ..
    drwxr-xr-x    2 root root     39 Jun 23 12:25 .aws
    ...
    
  4. Verify you can read the contents of your AWS S3 bucket:

    aws s3 ls <your-bucket-name>
    

Limitations and Cautions

  • If the file does not exist when exporting to S3, it will be created; if the file does exist, it will be overwritten.

  • KIO supports wildcard ingestion of files on AWS S3 (e.g., s3://my-s3-bucket/*.csv, s3://my-s3-bucket/*.shapefile) into the same destination data format provided that the schema for all files is identical, i.e. column names don't have to match but the rows of data therein must be of the same type. If a file's schema does not match the other files being ingested, the file will be ignored and the ingestion will continue. If the column names do not match, the first file ingested will determine the column names

    Important

    If a prefix directory is provided, e.g., s3://my-s3-bucket/test/dir/*.csv, it will be recursively searched, and any .csv files found will be copied to the destination.

  • The AWS credentials passed in via command line must have permissions to both the source and destination resources. KIO does not support separate AWS credentials for the source and destination resources.

Examples

Importing from a CSV file into the taxi_trip_data table in Kinetica, adding a property, and overriding one base type:

./kio -s /home/john_smith/files/taxi_trip_data.csv -d kinetica://::taxi_trip_data --d-property-additions '{"vendor_id": ["text_search"]}' --d-base-type-overrides '{"trip_distance": "double"}'

Exporting from the flights table in Kinetica to a CSV file on the local machine:

./kio -s kinetica://localhost:9191::flights -d /home/jane_smith/flights_data.csv

Exporting from PostgreSQL into Kinetica and overriding a property:

./kio -s postgresql://jsmith1:password123@172.123.45.67:5432/MASTER::nyctaxi -d 'kinetica://jsmith1:password234$@172.123.45.67:9191::taxi_data' --d-property-overrides '{"passenger_count": ["int16"]}'

Importing from CSV files in the specified S3 directory into Kinetica:

./kio -s 's3://my-s3-bucket/csv/*.csv' -d kinetica://::my_test_data

Exporting from the taxi_trip_data table in Kinetica to a CSV file in an S3 bucket:

./kio -s kinetica://::taxi_trip_data -d s3://my-s3-bucket/taxi_trip_data.csv

Importing from a shapefile into Kinetica:

./kio -s /tmp/data/usa-coastline.shapefile -d kinetica://::usa_coastline_data

Importing from taxi_trip_data, a Parquet Dataset directory, in an AWS S3 bucket to a table in a collection in Kinetica:

./kio -s s3://data-bucket/taxi_trip_data.parquetdataset -d kinetica://::taxi_trip_data --d-collection-name my_collection

Importing from a local CSV file into Kinetica securely:

./kio -s /home/my-data.csv -d kinetica://jsmith:password234@172.123.45.67:8082/gpudb-0::my_data_table --d-https

Importing from a PostGIS-enabled PostgreSQL database into Kinetica:

./kio -s 'postgresql://postgres:password123@172.123.45.67:5432/postgres::roads' -d 'kinetica://::roads' --d-property-additions '{"roads_geom": ["wkt"]}' --gis-ext-enabled --engine spark

Importing from Microsoft SQL Server into Kinetica:

./kio -s sqlserver://jsmith:temp1234@172.123.45.67:1433/gpudb::nyctaxi -d kinetica://::nyctaxi

Importing from Oracle into Kinetica:

./kio -s oracle://jsmith:temp1234@172.123.45.67:1521/orcl::nyctaxi -d kinetica://::nyctaxi

Importing from Sybase IQ into Kinetica:

./kio -s sybaseiq://jsmith:sql123@172.123.45.67:2638/kindata::nyctaxi -d kinetica://::nyctaxi

Importing from Teradata into Kinetica:

./kio -s teradata://dbcadmin:password123@172.123.45.67:1025/gpudb::nyctaxi -d kinetica://::nyctaxi

Database Compatibility

KIO is distributed with each Kinetica database installation and will always be compatible with its local database instance.

Additionally, KIO v6.2 is forward-compatible with Kinetica v7.0 and can be used to transfer data between the two.

Note

KIO v7.0 is not backward-compatible with Kinetica versions prior to v7.0. If migrating data from a Kinetica v6.2 database to a Kinetica v7.0 database, the KIO on the Kinetica v6.2 host should be used to initiate the transfer.