Kinetica Input/Output (KIO)

KIO provides the ability to import data into and export data from Kinetica; it comes pre-installed and ready to use. KIO can transfer data between external sources directly, bypassing Kinetica if necessary. A simpler, GUI-based version of KIO is available in the KIO Tool section of GAdmin.

Command Line Usage

KIO can be accessed via the command line at /opt/gpudb/bin/kio.

Available Options:

Parameter Description
-h, --help Displays the help menu
-v, --version Displays the version
-s <uri>, --source <uri> The source URI. Use --show-usage to see more information about the URI types for supported data formats.
-d <uri>, --destination <uri> The destination URI. Use --show-usage to see more information about the URI types for supported data formats.
--show-usage <format> Displays detailed usage information for the given format. See the list of formats for more information about the format and format-specific options.
--failure-mode <mode>

Change the behavior of KIO upon encountering data parsing errors. There are two modes:

  • graceful (default) -- Continue ingest when parsing errors occur (only available for csv, kinetica, parquetdataset, and shapefile formats)
  • fail-fast -- Cease ingest when parsing errors occur
--batch-size <num>

Set the number of records to be fetched at a time from the given source. Maximum of 100000.

When using odo for --engine type (see below), the --batch-size option is only supported for the following formats:

  • kinetica (as a source or destination)
  • csv (source only)

When using spark for --engine type, the --batch-size option is supported for all formats.

--engine <type>

Change the underlying engine used to ingest records from a given source into Kinetica. The following engines are available:

  • odo -- Ingest/egress with Odo, a Python-based Extract, Transform, and Load (ETL) engine.
  • spark (default) -- Ingest/egress with Apache Spark and the Kinetica Spark Connector. The Spark engine is generally faster for both read and write operations.
--spark-network-timeout <seconds> Set a network timeout in seconds. This is useful for regulating Spark executor heartbeat failures. Default is 800 seconds.
--spark-driver-memory <memory> Set the Spark driver memory usage limit. Default is 2 GB.
--spark-executor-memory <memory> Set the Spark executor memory usage limit. Default is 2 GB.
--spark-offheap-memory <memory> Set the Spark offheap memory usage limit. Default is 4 GB.
--log-level <level>

Set the log level for the KIO transaction. Valid log levels:

  • CRITICAL
  • ERROR
  • WARNING
  • INFO (default)
  • DEBUG

Tip

The KIO_LOGLEVEL environment variable can be set to avoid specifying the log level for each KIO transaction.

--log-file <file-path>

Set the file to write logs to for the KIO transaction. If the file does not exist, it will be created. If the file does exist, logs will be appended to the file. By default, all logs are directed to standard output.

Tip

The KIO_LOGFILE environment variable can be set to avoid specifying the log file for each KIO transaction.

--d-column-properties <list>

Enables overriding destination column properties for Kinetica to PostgreSQL transfers. The column properties list is passed in as a stringified JSON list of dictionaries, e.g.:

[{"column_name1": "<name>", "column_base_type": "<type>"}, {"column_name2" ...}, ...]

Note

Columns of string|wkt and bytes|wkt in Kinetica can convert to geometry, geography, and bytea in PostgreSQL to support PostGIS.
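
The sketch below combines several of the general options above in a single transfer; the CSV file and table name are hypothetical placeholders, and every flag shown is documented in the table above:

./kio -s /tmp/data/sample.csv -d kinetica://::sample_table --engine spark --batch-size 50000 --failure-mode fail-fast --log-level DEBUG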

Format Types

The following list details the available formats for each type.

Database formats:

  • Kinetica
  • Microsoft SQL Server
  • Oracle
  • PostgreSQL / PostGIS
  • Teradata
  • Sybase IQ

File formats:

  • Apache ORC
  • Apache Parquet
  • Apache Parquet Dataset
  • CSV
  • Shapefile

Engine Compatibility

The following sections detail which engines can support transferring data to (import) or from (export) Kinetica.

Import (into Kinetica)

Important

Both the Spark and Odo engines can import from other Kinetica tables.

The Spark engine can transfer data from the following data format types to Kinetica:

  • Apache Parquet (AWS S3 or local)
  • CSV (AWS S3 or local)
  • Microsoft SQL Server
  • Oracle
  • PostgreSQL
  • Sybase IQ
  • Teradata

The Odo engine can transfer data from the following data format types to Kinetica:

  • Apache ORC
  • Apache Parquet Dataset (AWS S3 or local)
  • CSV (AWS S3 or local)
  • Shapefile
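
For instance, because Apache ORC appears only in the Odo list above, an ORC import must explicitly select that engine (the file and table names below are hypothetical):

./kio -s /tmp/data/sample.orc -d kinetica://::orc_data --engine odo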

Export (out of Kinetica)

Important

Both the Spark and Odo engines can export to a Kinetica table.

The Spark engine can transfer data from Kinetica to the following data format types:

  • PostgreSQL

The Odo engine can transfer data from Kinetica to the following data format types:

  • Apache ORC

  • Apache Parquet (local)

    Important

    The source data cannot be transferred to a local Parquet file if it contains bytes or timestamp data.

  • CSV (AWS S3, local)

  • Microsoft SQL Server

  • Oracle

  • PostgreSQL

  • Shapefile (AWS S3, local)

Inferred Types

Because the supported formats can differ drastically from one another, KIO has to normalize the types of data being input so it can be output to the destination format appropriately. The table below describes the type inferred by KIO from each source format.

Source Source Type KIO Inferred Type
CSV Numerical value without decimal places, e.g., 123456 int64
CSV Numerical value with decimal places, e.g., 123.456 float64
CSV Value containing string characters or symbols (A-Z, a-z, -, etc.) string
CSV Time value, e.g., 23:45:15.848 datetime
CSV Date (2008-04-15) or datetime (2009-04-16 21:00:15.325) value string
CSV Empty or null (or null string) value string
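
As a worked example, given a hypothetical CSV row like the one below, KIO would infer int64 for the first field, float64 for the second, string for the third, and datetime for the fourth:

123456,123.456,EXAMPLE-A1,23:45:15.848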

Database Formats

Kinetica

Imports from or exports to a table in Kinetica. Copying from one table to another is possible. For help with command line usage, provide the --show-usage parameter with the kinetica option.

Important

If graceful failure mode is enabled, KIO will attempt to sanitize column names that would otherwise violate Kinetica column name restrictions and prevent proper ingest.

Usage

Base Kinetica URI:

kinetica://[<username>:<password>@][<host>:<port>]<::table name>

Note

The URI must be wrapped in quotes ('' or "") should a special character be present that could confuse the command prompt

Important

If attempting to connect to Kinetica securely, the trusted root certificate must be in the OpenSSL trust store. Also, the port portion of the URI will need to reference the HTTPD proxy port/rank for Kinetica, e.g., localhost:8082/gpudb-0 (the default rank is gpudb-0). See Security Configuration for more details on configuring Kinetica securely.

URI Parameters:

Parameter Description
username The username used for authentication. Default is blank
password The password used for authentication. Default is blank
host The host to connect to. Default is localhost
port

The port to connect to. Default is 9191

Note

KIO supports connecting to Kinetica via HTTPD proxy (if enabled).

table name The table in the database to import into or export from.
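
For illustration, the hypothetical URIs below show the shorthand form that relies on the defaults (blank credentials, localhost, port 9191) and a fully qualified form with placeholder credentials:

kinetica://::taxi_trip_data
kinetica://jsmith:passw0rd@10.1.2.3:9191::taxi_trip_data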

Command Line Options:

Note

For more information on the types of engines, see Command Line Usage.

Options Engine Availability Description
--d-base-type-overrides '<type_definition>' Both

Override default base types found in the source when the destination is Kinetica. Only applicable when importing data into a non-existing table. Values should be stringified JSON objects, with double quotes "" used for keys/values and single quotes '' used for the parameter value itself, e.g.,

--d-base-type-overrides '{"column_name": "int"}'

Values should be a string for a given key, as there is a one-to-one mapping between a column and type. To override the base types for multiple columns, add another key/value pair

--d-base-type-overrides '{"column_name": "string", "column_name2": "float"}'

Note

The Kinetica Spark Connector infers charN sub-types for string type columns by default. Apply the string base type override to avoid charN inferencing.

--d-property-additions '<prop_definition>' Both

Properties added to the properties found in the source when the destination is Kinetica. Only applicable when importing data into a non-existing table. Values should be stringified JSON objects, with double quotes "" used for keys/values being overridden and single quotes '' used for the parameter value itself, e.g.,

--d-property-additions '{"column_name": ["data", "int16"]}'

Values should be a list of strings for a given key, as there can be multiple property additions for a column. To override the properties for multiple columns, add another key/value pair

--d-property-additions '{"column_name": ["char4"], "column_name2": ["data", "int16"]}'

All the properties listed under Column Properties are supported when using odo for --engine type.

When using spark for --engine type, only a subset of those properties is supported.

Note

The Kinetica Spark Connector infers charN sub-types for string type columns by default. Apply the varchar property addition (Spark engine only) to avoid charN inferencing.

--d-property-overrides '<prop_definition>' Both

Override default properties found in the source when the destination is Kinetica. Only applicable when importing data into a non-existing table. Values should be stringified JSON objects, with double quotes "" used for keys/values being overridden and single quotes '' used for the parameter value itself, e.g.,

--d-property-overrides '{"column_name": ["data", "int16"]}'

Values should be a list of strings for a given key, as there can be multiple property overrides for a column. To override the properties for multiple columns, add another key/value pair

--d-property-overrides '{"column_name": ["char4"], "column_name2": ["data", "int16"]}'

Note

The Kinetica Spark Connector assumes all columns are nullable by default, so using the Spark engine will automatically apply the nullable property to all columns. The Odo engine assumes all columns are not nullable by default.

--d-dry-run odo

Generates a stringified JSON schema of the specified Kinetica destination table based on the input source. The schema includes each column's name, type, properties, and nullability. Both a source and destination must be specified, but no data is transferred or appended and non-existing tables will not be created when this flag is present (see the sketch after this table). The returned schema is a list of Python dictionaries, e.g.,

[{"is_nullable": false, "column_properties":["data", "int16"], "column_type":"int", "column_name":"a"}]
--s-https spark

Generates a connection using HTTPS instead of the default HTTP to the specified Kinetica source table.

Note

An HTTPS connection requires that the appropriate truststore and password be passed as well. Review the --s-ssl-httpd-truststore[-password] parameters below for more information

--d-https spark

Generates a connection using HTTPS instead of the default HTTP to the specified Kinetica destination table.

Note

An HTTPS connection requires that the appropriate truststore and password be passed as well. Review the --d-ssl-httpd-truststore[-password] parameters below for more information

--d-collection-name Both Specifies the collection in which the specified Kinetica destination table should be placed. If the collection does not exist, it will be created. If no collection is specified, the destination table will be placed in the root collection. If the destination table already exists, this option will NOT place the existing table into the specified collection or move the table from its existing collection to the specified collection.
--d-is-replicated spark Specifies whether the destination table should be replicated.
--d-update-on-existing-pk spark Specifies the record collision policy for inserting into a table with a primary key. If this flag is present, any existing table record with primary key values that match those of a record being inserted will be replaced by that new record. If this flag is not present, any existing table record with primary key values that match those of a record being inserted will remain unchanged and the new record discarded. If the specified table does not have a primary key, then this flag is ignored.
--d-jdbc-url '<jdbc-url>' spark Specifies a particular JDBC URL to connect to Kinetica. The default JDBC URL is 'jdbc:kinetica://<hostname>:9191' where <hostname> is the hostname referenced in Kinetica URI.
--d-spark-timeout-milliseconds spark Set a connection timeout in milliseconds. The default value is 60000 (1 minute).
--d-spark-enable-multihead spark

Enables multi-head ingest.

Important

Replicated tables are not supported with multi-head ingest.

--s-ssl-httpd-truststore /file/path/truststore.jks spark

When Kinetica is the source, filepath to the Java truststore that is then passed to the HTTPD service to verify the SSL certificate that the Kinetica HTTPD server provides.

Note

This option must be passed in conjunction with the --s-ssl-httpd-truststore-password parameter

--d-ssl-httpd-truststore /file/path/truststore.jks spark

When Kinetica is the destination, filepath to the Java truststore that is then passed to the HTTPD service to verify the SSL certificate that the Kinetica HTTPD server provides.

Note

This option must be passed in conjunction with the --d-ssl-httpd-truststore-password parameter

--s-ssl-httpd-truststore-password '<password>' spark When Kinetica is the source, the password for the Java truststore passed with the --s-ssl-httpd-truststore parameter.
--d-ssl-httpd-truststore-password '<password>' spark When Kinetica is the destination, the password for the Java truststore passed with the --d-ssl-httpd-truststore parameter.
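
As a sketch of the --d-dry-run workflow (the CSV file and table name are hypothetical), the command below prints the inferred destination schema without creating the table or transferring any data:

./kio -s /tmp/data/trips.csv -d kinetica://::trips --engine odo --d-dry-run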

Limitations and Cautions

  • If a table does not exist when importing into Kinetica, the table will be created automatically and:

    • it will be created at the root level
    • foreign keys are not preserved
    • distribution scheme is not preserved (unless using the --d-is-replicated option with spark as the --engine type); the table will be randomly sharded by default
  • To remove properties, specify the --d-property-overrides option and list all properties that should be kept, leaving out any properties that should be removed (see the sketch after this list).

  • Overriding a base type will remove its existing properties.

  • If a table is being imported into Kinetica using the Spark engine and its name contains non-alphanumeric special characters, the special characters will be replaced with underscores

  • If importing a GIS-type column(s) with the --gis-ext-enabled flag, the --d-property-additions parameter must be present containing a map of each GIS-type column to the wkt type, e.g.,

    --d-property-additions '{"gis_col1": ["wkt"], "gis_col2": ["wkt"]}'
    

Microsoft SQL Server / Spatial Data (GIS) SQL Server

Imports from a table in Microsoft SQL Server. For help with command line usage, provide the --show-usage parameter with the sqlserver option.

Important

Microsoft SQL Server / GIS SQL Server is only supported when importing from Microsoft SQL Server to Kinetica with the Spark engine.

Usage

Base Microsoft SQL Server URI:

sqlserver://<username>:<password>@<address><:port>/<database_name><::table_name>

Important

All fields are required. The URI must be wrapped in quotes ('' or "") should a special character be present that could confuse the command prompt

URI Parameters:

Parameter Description
username The username used for authentication.
password The password used for authentication.
address The IPV4 address to connect to.
port The port to connect to.
database_name The database name.
table_name The table in the database to import from.

Command Line Options:

Option Description
--gis-ext-enabled Allows KIO to recognize that a GIS-specific extension is enabled so SQL-native GIS types can be read into memory as wkt strings for proper conversion into Kinetica. This option must be passed in conjunction with the kinetica destination format option --d-property-additions to map the incoming GIS-type column(s) to wkt type in Kinetica (see Examples for more detail).

Limitations and Cautions

  • KIO does not support Java virtual machines (JVMs) with Federal Information Processing Standards (FIPS) enabled.
  • KIO does not support extra properties for Microsoft SQL Server availability groups, including cluster or subnet failover.
  • KIO does not support connecting to Microsoft SQL Server with SSL.

Oracle

Imports from a table in an Oracle database. For help with command line usage, provide the --show-usage parameter with the oracle option.

Important

Oracle is only supported when importing from Oracle to Kinetica with the Spark engine.

Usage

Base Oracle URI:

oracle://<username>:<password>@<address><:port>/<system_id><::schema_name>

Important

All fields are required. The URI must be wrapped in quotes ('' or "") should a special character be present that could confuse the command prompt

URI Parameters:

Parameter Description
username The username used for authentication.
password The password used for authentication.
address The IPV4 address to connect to.
port The port to connect to.
system_id The system ID to identify a particular Oracle service.
schema_name The schema in the system to import from.

Limitations and Cautions

  • KIO uses the Oracle XE Service to establish a database connection. The listener.ora and tnsnames.ora files must be correctly configured to expose the XE Service.

  • KIO only supports Oracle 12c.

  • When exporting from Oracle to a given destination, column names will be capitalized. When applying properties to a Kinetica destination table's column names, ensure the column names referenced are also capitalized, e.g.,:

    --d-property-additions '{"VENDOR_ID": ["text_search"]}'
    

PostgreSQL / PostGIS

Imports from or exports to a table in PostgreSQL. Copying from one table to another is possible. For help with command line usage, provide the --show-usage parameter with the postgresql option.

Important

PostGIS is only supported when importing from PostgreSQL to Kinetica with the Spark engine.

Usage

Base PostgreSQL URI:

postgresql://<username>:<password>@<host><:port>/<database_name>[::schema_name]<::table_name>

Important

The URI must be wrapped in quotes ('' or "") should a special character be present that could confuse the command prompt

URI Parameters:

Parameter Description
username The username used for authentication. Default is the user currently running KIO
password The password used for authentication.
host The host to connect to. Default is localhost
port The port to connect to. Default is 5432
database_name The database name. Default is the value the username parameter is set to
schema_name The schema name. Optional
table_name The table in the database to import into or export from.

Command Line Options:

Option Description
--s-case-sensitive When the source is PostgreSQL and this flag is passed, case sensitivity will be enabled for table, column, and schema names.
--d-case-sensitive When the destination is PostgreSQL and this flag is passed, case sensitivity will be enabled for table, column, and schema names.
--s-postgis-schema <schema-name> The PostGIS schema name for when the source is PostgreSQL. Default value is public.
--d-postgis-schema <schema-name> The PostGIS schema name for when the destination is PostgreSQL. Default value is public.
--gis-ext-enabled Allows KIO to recognize that a GIS-specific extension is enabled so SQL-native GIS types can be read into memory as wkt strings for proper conversion into Kinetica. This option must be passed in conjunction with the kinetica destination format option --d-property-additions to map the incoming GIS-type column(s) to wkt type in Kinetica (see Examples for more detail).
--s-batch-iterator-column-name <column-name>

When the source is PostgreSQL, the name of a column that will be used to iterate over data when ingesting into Kinetica. This parameter is required for tables that have more than 10 million records, but tables of any size will see improved ingest performance when it is provided. The provided column must be an integer (BIGINT, SMALLINT, or INTEGER) column, must not be nullable, and must be either SERIAL or UNIQUE. Providing this option enables KIO to use a more performant query to fetch records.

Important

This option is only available for PostgreSQL to Kinetica transfers using the Spark engine.
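
A hedged sketch of a PostgreSQL-to-Kinetica transfer using a batch iterator column (the credentials, addresses, table, and column names are placeholders; the default Spark engine is assumed, as this option requires it):

./kio -s 'postgresql://jsmith:password123@172.123.45.67:5432/mydb::nyctaxi' -d kinetica://::nyctaxi --s-batch-iterator-column-name trip_id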


Limitations and Cautions

  • If a table does not exist when importing into PostgreSQL, the table will be created automatically. If attempting to copy from one table to another in PostgreSQL, the output table must already exist before KIO can copy the source table into it.
  • Case sensitivity is disabled by default. When case sensitivity is disabled, PostgreSQL will query and reference table, column, and schema names by lowercase values.

Sybase IQ

Imports from a table in a Sybase IQ database. For help with command line usage, provide the --show-usage parameter with the sybaseiq option.

Important

Sybase IQ is only supported when importing from Sybase IQ to Kinetica with the Spark engine.

Usage

Base Sybase IQ URI:

sybaseiq://<username>:<password>@<address><:port>/<database_name><::table_name>

Important

All fields are required. The URI must be wrapped in quotes ('' or "") should a special character be present that could confuse the command prompt

URI Parameters:

Parameter Description
username The username used for authentication.
password The password used for authentication.
address The host to connect to.
port The port to connect to.
database_name The database name.
table_name The table in the database to import from.

Limitations and Cautions

  • KIO will use the default character set of the connecting database; no conversion between character sets is available.
  • KIO does not define Sybase IQ connection failover behavior.
  • KIO does not define a timeout for client sessions or SQL queries.
  • KIO does not support connecting with Sybase IQ using SSL.

Teradata

Imports from a table in a Teradata database. For help with command line usage, provide the --show-usage parameter with the teradata option.

Some considerations when connecting to Teradata:

  • KIO uses the JDBC 4.0 API specification.
  • KIO connects to the standard DBC/SQL partition in Teradata.
  • KIO requires that JDK/JRE version 1.3.2, 5, 6, 7, 8, or 9 be installed and reachable on your machine instance in order to connect to Teradata.

Important

Teradata is only supported when importing from Teradata to Kinetica with the Spark engine.

Usage

Base Teradata URI:

teradata://<username>:<password>@<address><:port>/<database_name><::table_name>

Important

All fields are required. The URI must be wrapped in quotes ('' or "") should a special character be present that could confuse the command prompt

URI Parameters:

Parameter Description
username The username used for authentication.
password The password used for authentication.
host The host to connect to.
port The port to connect to.
database_name The database name.
table_name The table in the database to import from. The name must exactly match; predicate matching is not supported.

Limitations and Cautions

  • KIO only supports the UTF-8 character set (CHARSET) when connecting with Teradata.
  • KIO does not support diagnostics or troubleshooting support via DDSTATS.
  • KIO does not support data encryption between the Teradata JDBC driver and the Teradata database using ENCRYPTDATA.
  • KIO does not support additional fractional digits.
  • KIO does not support modifying users and authentication credentials using NEW_PASSWORD.
  • KIO only supports Teradata versions 14.00, 14.10, 15.00, 15.10, 16.10, and 16.20.

File Formats

Apache ORC

Imports from an ORC file; exporting to ORC via KIO is currently not supported. For help with command line usage, provide the --show-usage parameter with the orc option.

Usage

URI Parameters:

Parameter Description
/file/path/filename.orc Absolute filepath to an ORC file

Important

Do not specify single or double quotes around the file path, as KIO will interpret that as the name of the file, rather than a file path.

Limitations and Cautions

  • The following Apache ORC types are not currently supported by KIO when Kinetica is the destination:
    • struct
    • list
    • map
    • union
  • KIO currently does not support referencing ORC files using relative file paths.

Apache Parquet

Imports from or exports to a Parquet file; however, exporting to Parquet files in AWS S3 via KIO is currently not supported. Compressed or partitioned Parquet files are also supported. For help with command line usage, provide the --show-usage parameter with the parquet option.

Important

If graceful failure mode is enabled, KIO will not validate the schema of the dataset until ingest time for a particular partition, at which point the partition will be skipped if its schema is invalid.

Tip

Apache Parquet files can also be transferred to/from an S3 bucket. See Transferring files via S3 for details.

Usage

URI Parameters:

Parameter Description
/file/path/filename.parquet Absolute filepath to a Parquet file

Important

Do not specify single or double quotes around the file path, as KIO will interpret that as the name of the file, rather than a file path.

Command Line Options:

Options Description
--s-recursive

Recursively search a directory for parquet files instead of only searching the top level of the directory specified by the file path parameter. For example, given the directory below:

/path/to/directory/
├── example1.parquet
├── example2.parquet
├── subdirectory/
│   ├── example4.parquet
│   └── example5.parquet
└── example3.parquet

Specifying --source of /path/to/directory/example*.parquet without specifying --s-recursive will register example1.parquet, example2.parquet, and example3.parquet. Specifying --s-recursive will register all files, including example4.parquet and example5.parquet
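
Continuing the hypothetical directory layout above, a recursive ingest of those files into Kinetica (the destination table name is a placeholder) might look like:

./kio -s /path/to/directory/example*.parquet -d kinetica://::parquet_data --s-recursive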

Limitations and Cautions

  • KIO currently does not support referencing Parquet files using relative file paths.
  • If the file does not exist when exporting to Parquet, it will be created. If the file does exist, it will be overwritten.
  • KIO only supports hidden Parquet file ingestion (/file/path/.<file-name>.parquet) using the Odo engine

Apache Parquet Dataset

Imports from a Parquet Dataset; exporting to Parquet Datasets via KIO is currently not supported. For help with command line usage, provide the --show-usage parameter with the parquetdataset option.

Tip

Apache Parquet Dataset files can also be transferred to/from an S3 bucket. See Transferring files via S3 for details.

Usage

URI Parameters:

Parameter Description
/file/path/directory.parquetdataset Absolute path to a Parquet Dataset directory

Limitations and Cautions

General:

  • KIO currently does not support referencing Parquet Datasets using relative directory paths.
  • KIO will always attempt to ingest the Parquet dataset as a nested directory of partitioned Parquet files. If ingesting a list of absolute file paths to Parquet files, move all the files into a single directory and use /file/path/*.parquet as the source URI instead to ingest the files as chunked Parquet files.
  • KIO currently does not support inferring the types of partition keys, and as such, the keys are cast as strings. If ingesting the dataset into Kinetica, utilize --d-property-additions, --d-property-overrides, and --d-base-type-overrides to establish the data's types as desired. Refer to Kinetica for more information.
  • KIO currently does not support reading in specific columns/partition keys from the Parquet Dataset.

AWS S3 Apache Parquet Dataset:

  • KIO assumes that Parquet Datasets are not S3 buckets but rather a subdirectory (or subdirectories) within an S3 bucket
  • Interacting with Apache Parquet Datasets in an S3 bucket is a Python 3-specific feature. Note that Python 3 is packaged with KIO by default

CSV

Imports from or exports to a CSV file. For help with command line usage, provide the --show-usage parameter with the csv option.

Tip

CSV files can also be transferred to/from an S3 bucket. See Transferring files via S3 for details.

Usage

URI Parameters:

Parameter Description
/file/path/filename.csv Absolute or relative filepath to a CSV file

Important

Do not specify single or double quotes around the file path, as KIO will interpret that as the name of the file, rather than a file path.

Command Line Options:

Options Description
--d-delimiter <delimiter> Set the delimiter used in the file when csv is the destination format. Any alphanumeric character or symbol is a valid delimiter; default is ,.
--s-delimiter <delimiter>

Set the delimiter used in the file when csv is the source format.

Important

The option must match the delimiter in the file or the import will fail.

--d-escape-character <character> Set the character used to escape error-prone characters, e.g., ,, "", etc., when csv is the destination format.
--s-escape-character <character>

Set the character used to escape error-prone characters, e.g., ,, "", etc., used in the source csv file.

Important

The option must match the escape character used in the file or the import will fail.

--s-custom-null-value <value>

Set the null value literal when csv is the source format. Any string value is supported.

Important

This option is only available when using the Spark engine.

--s-custom-quote-value <value>

Set the quote symbol that will replace default double quotes (") when enclosing text fields for data transfers where csv is the source format. Any string value is supported.

Important

This option is only available when using the Spark engine.

--s-custom-comment-value <value>

Set the comment symbol when csv is the source format. Lines starting with the provided comment symbol are not parsed or ingested. Only a single character is supported, e.g., #, -.

Important

This option is only available when using the Spark engine.

--s-headerless Specifies whether the source CSV file is headerless, i.e., the first row in the CSV file is data instead of column headers. By default, Kinetica assumes all source CSV files have headers as the first row.
--dynamically-infer-wkt-subtypes

Dynamically infer whether columns with base type string should have the wkt property applied during ingestion.

Important

This option is only available for CSV to Kinetica transfers using the Odo engine.

--s-drop-malformed-rows

Indicates rows of a source CSV file will not be inserted if either of the following is true:

  • The number of fields in the row does not match the number of fields expected by the type schema
  • The type of one or more fields does not match the existing table's type schema

Important

This option is only available when using the Spark engine and when --failure-mode is set to graceful.
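
A hedged sketch of importing a pipe-delimited, headerless CSV file (the file and table names are hypothetical placeholders):

./kio -s /tmp/data/pipe_delimited.csv -d kinetica://::pipe_data --s-delimiter '|' --s-headerless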


Limitations and Cautions

  • If CSV is the destination and the target file(s) does not exist, column headers will be written to the file first and then data will be appended. If the target file(s) does exist, only data will be appended.

  • If importing from a CSV into a non-existing table in Kinetica, the table will be created. The database will infer column base types and null values depending on the data being imported.

  • KIO only supports wildcard ingestion of multiple CSVs (/file/path/*.csv) using the Spark engine

  • KIO only supports hidden CSV file ingestion (/file/path/.<file-name>.csv) using the Odo engine

  • If importing from a CSV into an existing table in Kinetica, the first row of the CSV file must contain a delimiter-separated (based on the user-specified delimiter) list of the column names in the order they appear in the existing table.

    Warning

    If the first row of the CSV file is a list of column names that don't match with the existing table, KIO will attempt to insert that row as data instead, which could result in import errors.


Shapefiles

Imports from or exports to a shapefile. For help with command line usage, provide the --show-usage parameter with the shapefile option.

Tip

Shapefiles can also be transferred to/from an S3 bucket. See Transferring files via S3 for details.

Usage

URI Parameters:

Parameter Description
/file/path/filename.shapefile Absolute filepath to a shapefile

Note

The URI must be wrapped in quotes ('' or "") should a special character be present that could confuse the command prompt

Limitations and Cautions

  • The URI must reference the .shapefile file, which maps to a main file (.shp) and sometimes an index file (.shx) and a dBASE file (.dbf). KIO requires that at least the .shapefile and .shp files be available. These files (and any other files) must be in the same directory and have the same prefix to be recognized by KIO, e.g., usa-coastline.shapefile, usa-coastline.shp, etc.
  • KIO currently does not support referencing shapefiles using relative file paths.
  • Kinetica decimal fields are fixed precision and scale, so importing a shapefile that references numeric fields into Kinetica will be lossy
  • KIO may truncate dBASE record values that are more than 256 characters long
  • KIO supports wildcard ingestion of shapefiles (/file/path/*.shapefile) into the same destination data format provided that the schema for all files is identical, i.e. column names don't have to match but the rows of data therein must be of the same type. If a file's schema does not match the other files being ingested, the file will be ignored and the ingestion will continue. If the column names do not match, the first file ingested will determine the column names.
  • Shapefiles can only be ingested using the Odo engine

Transferring files via S3

You can also import from or export to supported file formats in an S3 bucket. For help with command line usage, provide the --show-usage parameter with the s3 option. AWS S3 in KIO isn't necessarily a separate format but rather a way to access an S3 bucket for file transfers; all associated parameters, options, and limitations apply when using a supported file format within an S3 URI. The following file formats are supported with S3 KIO operations:

  • CSV
  • Apache Parquet
  • Apache Parquet Dataset
  • Shapefile

Usage

Base S3 URI:

s3://<bucket name>/<file path>

Note

The URI must be wrapped in quotes ('' or "") should a special character be present that could confuse the command prompt

URI Parameters:

Parameter Description
<bucket name> The unique name given to the S3 bucket in your AWS account.
<file path> The file path to the desired file(s) contained in the specified S3 bucket.

Command Line Options:

Options Description
--aws-access-key-id <key> Sets the AWS access key ID for an AWS user with access permissions to both the source and destination (if both are AWS resources).
--aws-secret-access-key <key> Sets the AWS secret access key for an AWS user with access permissions to both the source and destination (if both are AWS resources).

Important

If the S3 bucket has public access, the access key and secret access key are not required.
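
As a sketch of passing credentials explicitly on the command line (the bucket, table, and key values are placeholders):

./kio -s 's3://my-s3-bucket/csv/*.csv' -d kinetica://::my_test_data --aws-access-key-id <access-key-id> --aws-secret-access-key <secret-access-key>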

Configuration

If not using the access key ID and secret access key command line options, your machine can be configured for direct AWS S3 access. Once you've created your AWS access key ID and AWS secret access key:

  1. Install the official AWS CLI on your machine using pip:

    pip install awscli
    
  2. Add your AWS account credentials to the machine. You can find the default region name in the URL for your AWS console; the default output format can be left as the default (None):

    $ aws configure
    AWS Access Key ID [None]: <access-key-id>
    AWS Secret Access Key [None]: <secret-access-key>
    Default region name [None]: <region-name>
    Default output format [None]:
    
  3. Verify the .aws directory is available in your root directory:

    $ ls -al ~/
    total 500
    dr-xr-x---.  10 root root   4096 Jun 23 12:25 .
    dr-xr-xr-x.  17 root root   4096 Jun 23 11:49 ..
    drwxr-xr-x    2 root root     39 Jun 23 12:25 .aws
    ...
    
  4. Verify you can read the contents of your AWS S3 bucket:

    aws s3 ls <your-bucket-name>
    

Limitations and Cautions

  • If the file does not exist when exporting to S3, it will be created; if the file does exist, it will be overwritten.

  • KIO supports wildcard ingestion of files on AWS S3 (e.g., s3://my-s3-bucket/*.csv, s3://my-s3-bucket/*.shapefile) into the same destination data format provided that the schema for all files is identical, i.e. column names don't have to match but the rows of data therein must be of the same type. If a file's schema does not match the other files being ingested, the file will be ignored and the ingestion will continue. If the column names do not match, the first file ingested will determine the column names

    Important

    If a prefix directory is provided, e.g., s3://my-s3-bucket/test/dir/*.csv, it will be recursively searched, and any .csv files found will be copied to the destination.

  • The AWS credentials passed in via command line must have permissions to both the source and destination resources. KIO does not support separate AWS credentials for the source and destination resources.

Examples

Importing from a CSV file into the taxi_trip_data table in Kinetica, adding a property, and overriding one base type:

./kio -s /home/john_smith/files/taxi_trip_data.csv -d kinetica://::taxi_trip_data --d-property-additions '{"vendor_id": ["text_search"]}' --d-base-type-overrides '{"trip_distance": "double"}'

Exporting from the flights table in Kinetica to a CSV file on the local machine:

./kio -s kinetica://localhost:9191::flights -d /home/jane_smith/flights_data.csv

Exporting from PostgreSQL into Kinetica and overriding a property:

./kio -s postgresql://jsmith1:password123@172.123.45.67:5432/MASTER::nyctaxi -d 'kinetica://jsmith1:password234$@172.123.45.67:9191::taxi_data' --d-property-overrides '{"passenger_count": ["int16"]}'

Importing from CSV files in the specified S3 directory into Kinetica:

./kio -s 's3://my-s3-bucket/csv/*.csv' -d kinetica://::my_test_data

Exporting from the taxi_trip_data table in Kinetica to a CSV file in an S3 bucket:

./kio -s kinetica://::taxi_trip_data -d s3://my-s3-bucket/taxi_trip_data.csv

Importing from a shapefile into Kinetica:

./kio -s /tmp/data/usa-coastline.shapefile -d kinetica://::usa_coastline_data

Importing from taxi_trip_data, a Parquet Dataset directory, in an AWS S3 bucket to a table in a collection in Kinetica:

./kio -s s3://data-bucket/taxi_trip_data.parquetdataset -d kinetica://::taxi_trip_data --d-collection-name my_collection

Importing from a local CSV file into Kinetica securely:

./kio -s /home/my-data.csv -d kinetica://jsmith:password234@172.123.45.67:8082/gpudb-0::my_data_table --d-https

Importing from a PostGIS-enabled PostgreSQL database into Kinetica:

./kio -s 'postgresql://postgres:password123@172.123.45.67:5432/postgres::roads' -d 'kinetica://::roads' --d-property-additions '{"roads_geom": ["wkt"]}' --gis-ext-enabled --engine spark

Importing from Microsoft SQL Server into Kinetica:

./kio -s sqlserver://jsmith:temp1234@172.123.45.67:1433/gpudb::nyctaxi -d kinetica://::nyctaxi

Importing from Oracle into Kinetica:

./kio -s oracle://jsmith:temp1234@172.123.45.67:1521/orcl::nyctaxi -d kinetica://::nyctaxi

Importing from Sybase IQ into Kinetica:

./kio -s sybaseiq://jsmith:sql123@172.123.45.67:2638/kindata::nyctaxi -d kinetica://::nyctaxi

Importing from Teradata into Kinetica:

./kio -s teradata://dbcadmin:password123@172.123.45.67:1025/gpudb::nyctaxi -d kinetica://::nyctaxi

Database Compatibility

KIO is distributed with each Kinetica database installation and will always be compatible with its local database instance.

Additionally, KIO v6.2 is forward-compatible with Kinetica v7.0 and can be used to transfer data between the two.

Note

KIO v7.0 is not backward-compatible with Kinetica versions prior to v7.0. If migrating data from a Kinetica v6.2 database to a Kinetica v7.0 database, the KIO on the Kinetica v6.2 host should be used to initiate the transfer.