KIO provides the ability to import data to and export data from Kinetica; it comes pre-installed and ready to use. KIO can transfer data between external sources directly, bypassing Kinetica if necessary. A simpler, GUI-based version of KIO is available in the KIO Tool section of GAdmin.
Database Compatibility
KIO is distributed with each Kinetica database installation and will always be compatible with its local database instance.
Additionally, KIO v7.0 is forward-compatible with Kinetica v7.1 and can be used to transfer data between the two.
Note
KIO v7.1 is not backward-compatible with Kinetica versions prior to v7.1. If migrating data from a Kinetica v7.0 database to a Kinetica v7.1 database, the KIO on the Kinetica v7.0 host should be used to initiate the transfer.
Command Line Usage
KIO can be accessed via the command line at /opt/gpudb/bin/kio.
Available Options:
Parameter | Description |
---|---|
-h, --help | Displays the help menu |
-v, --version | Displays the version |
-s <uri>, --source <uri> | The source URI. Use --show-usage to see more information about the URI types for supported data formats. |
-d <uri>, --destination <uri> | The destination URI. Use --show-usage to see more information about the URI types for supported data formats. |
--show-usage <format> | Displays detailed usage information for the given format. See the list of formats for more information about the format and format-specific options. |
--failure-mode <mode> | Change the behavior of KIO upon encountering data parsing errors. There are two modes; graceful mode, referenced throughout this page, allows KIO to skip problem records or partitions and continue the transfer instead of failing. |
--batch-size <num> | Set the number of records to be fetched at a time from the given source. Maximum of 100000. When using odo for --engine type (see below), the --batch-size option is only supported for a subset of formats; when using spark for --engine type, the --batch-size option is supported for all formats. |
--engine <type> | Change the underlying engine used to ingest records from a given source into Kinetica. Two engines are available: spark and odo (see Engine Compatibility below for the transfers each engine supports). |
--spark-network-timeout <seconds> | Set a network timeout in seconds. This is useful for regulating Spark executor heartbeat failures. Default is 800 seconds. |
--spark-driver-memory <memory> | Set the Spark driver memory usage limit. Default is 2 GB. |
--spark-executor-memory <memory> | Set the Spark executor memory usage limit. Default is 2 GB. |
--spark-offheap-memory <memory> | Set the Spark offheap memory usage limit. Default is 4 GB. |
--log-level <level> | Set the log level for the KIO transaction. Tip The KIO_LOGLEVEL environment variable can be set to avoid specifying the log level for each KIO transaction. |
--log-file <file-path> | Set the file to write logs to for the KIO transaction. If the file does not exist, it will be created. If the file does exist, logs will be appended to the file. By default, all logs are directed to standard output. Tip The KIO_LOGFILE environment variable can be set to avoid specifying the log file for each KIO transaction. |
--d-column-properties <list> | Enables overriding destination column properties for Kinetica to PostgreSQL transfers. The column properties list is passed in as a stringified JSON list of dictionaries, e.g.: [{"column_name1": "<name>", "column_base_type": "<type>"}, {"column_name2" ...}, ...] Note Columns of string|wkt and bytes|wkt in Kinetica can convert to geometry, geography, and bytea in PostgreSQL to support PostGIS. |
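For illustration, a single transfer can combine several of these options; the file path and table name below are placeholders, and graceful failure mode is described above:
./kio -s /tmp/data/source_data.csv -d kinetica://::target_table --engine spark --batch-size 50000 --failure-mode graceful --log-file /tmp/kio.log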
Format Types
The following list details the available formats for each type.
Database formats:
- Kinetica
- Microsoft SQL Server
- Oracle
- PostgreSQL / PostGIS
- Teradata
- Sybase IQ
File formats:
- Apache ORC
- Apache Parquet
- Apache Parquet Dataset
- CSV
- Shapefile
Engine Compatibility
The following sections detail which engines support transferring data into (import) or out of (export) Kinetica.
Import (into Kinetica)
Important
Both the Spark and Odo engines can import from other Kinetica tables.
The Spark engine can transfer data from the following data format types to Kinetica:
- Apache Parquet (AWS S3 or local)
- CSV (AWS S3 or local)
- Microsoft SQL Server
- Oracle
- PostgreSQL
- Sybase IQ
- Teradata
The Odo engine can transfer data from the following data format types to Kinetica:
- Apache ORC
- Apache Parquet Dataset (AWS S3 or local)
- CSV (AWS S3 or local)
- Shapefile
Export (out of Kinetica)
Important
Both the Spark and Odo engines can export to a Kinetica table.
The Spark engine can transfer data from Kinetica to the following data format types:
- PostgreSQL
The Odo engine can transfer data from Kinetica to the following data format types:
- Apache Parquet (local)
Important
The source data cannot be transferred to a local Parquet file if the data contains bytes or timestamp data.
- CSV (AWS S3 or local)
- Microsoft SQL Server
- Oracle
- PostgreSQL
- Shapefile (AWS S3 or local)
Inferred Types
Because the supported formats can differ drastically from one another, KIO has to normalize the types of data being input so it can be output to the destination format appropriately. The table below describes the type inferred by KIO from each source format.
Source | Source Type | KIO Inferred Type |
---|---|---|
CSV | Numerical value without decimal places, e.g., 123456 | int64 |
CSV | Numerical value with decimal places, e.g., 123.456 | float64 |
CSV | Value containing string characters or symbols (A-Z, a-z, -, etc.) | string |
CSV | Time value, e.g., 23:45:15.848 | datetime |
CSV | Date (2008-04-15) or datetime (2009-04-16 21:00:15.325) value | string |
CSV | Empty or null (or null string) value | string |
Database Formats
Kinetica
Imports from or exports to a table in Kinetica. Copying from one table to another is possible. For help with command line usage, provide the --show-usage parameter with the kinetica option.
Important
If graceful failure mode is enabled, KIO will attempt to sanitize column names that violate the Kinetica column name restrictions and could otherwise prevent proper ingest.
Usage
Base Kinetica URI:
kinetica://[<username>:<password>@][<host>:<port>]::[<schema name>.]<table name>
Note
The URI must be wrapped in quotes ('' or "") should a special character be present that could confuse the command prompt
Important
If attempting to connect to Kinetica securely, the trusted root certificate must be in the OpenSSL trust store. Also, the port portion of the URI will need to reference the HTTPD proxy port/rank for Kinetica, e.g., localhost:8082/gpudb-0 (the default rank is gpudb-0). See Security Configuration for more details on configuring Kinetica securely.
URI Parameters
Parameter | Description |
---|---|
username | The username used for authentication. Default is blank |
password | The password used for authentication. Default is blank |
host | The host to connect to. Default is localhost |
port | The port to connect to. Default is 9191 Note KIO supports connecting to Kinetica via HTTPD proxy (if enabled). |
schema name | The schema containing the table to import into or export from. Default is the user's default schema Note KIO will create the specified schema if it doesn't exist |
table name | The table in the database to import into or export from. |
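To illustrate, both of the following are valid forms of the URI: the first relies on all defaults (local host/port and the user's default schema), while the second uses placeholder credentials, host, and schema:
kinetica://::my_table
kinetica://jsmith:password123@172.123.45.67:9191::my_schema.my_table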
Command Line Options
Note
For more information on the types of engines, see Command Line Usage.
Options | Engine Availability | Description |
---|---|---|
--d-base-type-overrides '<type_definition>' | Both | Override default base types found in the source when the destination is Kinetica. Only applicable when importing data into a non-existing table. Values should be stringified JSON objects, with double quotes "" used for keys/values and single quotes '' used for the parameter value itself, e.g., --d-base-type-overrides '{"column_name": "int"}' Values should be a string for a given key, as there is a one-to-one mapping between a column and type. To override the base types for multiple columns, add another key/value pair --d-base-type-overrides '{"column_name": "string", "column_name2": "float"}' Note The Kinetica Spark Connector infers charN sub-types for string type columns by default. Apply the string base type override to avoid charN inferencing. |
--d-property-additions '<prop_definition>' | Both | Properties added to the properties found in the source when the destination is Kinetica. Only applicable when importing data into a non-existing table. Values should be stringified JSON objects, with double quotes "" used for keys/values being overridden and single quotes '' used for the parameter value itself, e.g., --d-property-additions '{"column_name": ["data", "int16"]}' Values should be a list of strings for a given key, as there can be multiple property additions for a column. To override the properties for multiple columns, add another key/value pair --d-property-additions '{"column_name": ["char4"], "column_name2": ["data", "int16"]}' All the properties listed under Column Properties are supported when using odo for --engine type; only a subset of those properties is supported when using spark for --engine type. Note The Kinetica Spark Connector infers charN sub-types for string type columns by default. Apply the varchar property addition (Spark engine only) to avoid charN inferencing. |
--d-property-overrides '<prop_definition>' | Both | Override default properties found in the source when the destination is Kinetica. Only applicable when importing data into a non-existing table. Values should be stringified JSON objects, with double quotes "" used for keys/values being overridden and single quotes '' used for the parameter value itself, e.g., --d-property-overrides '{"column_name": ["data", "int16"]}' Values should be a list of strings for a given key, as there can be multiple property additions for a column. To override the properties for multiple columns, add another key/value pair --d-property-overrides '{"column_name": ["char4"], "column_name2": ["data", "int16"]}' Note The Kinetica Spark Connector assumes all columns are nullable by default, so using the Spark engine will automatically apply the nullable property to all columns. The Odo engine assumes all columns are not nullable by default. |
--d-dry-run | odo | Generates a stringified JSON schema of the specified Kinetica destination table based on the input source. The schema includes the column's name, column's type, any properties, and whether the column is nullable. Both a source and destination must be specified, but note that no data is transferred/appended and non-existing tables will not be created when this flag is present. Returned schema is a list of Python dictionaries, e.g., [{"is_nullable": false, "column_properties":["data", "int16"], "column_type":"int", "column_name":"a"}] |
--s-https | spark | Generates a connection using HTTPS instead of the default HTTP to the specified Kinetica source table. Note An HTTPS connection requires that the appropriate truststore and password be passed as well. Review the --s-ssl-httpd-truststore[-password] parameters below for more information |
--d-https | spark | Generates a connection using HTTPS instead of the default HTTP to the specified Kinetica destination table. Note An HTTPS connection requires that the appropriate truststore and password be passed as well. Review the --d-ssl-httpd-truststore[-password] parameters below for more information |
--d-is-replicated | spark | Specifies whether the destination table should be replicated. |
--d-update-on-existing-pk | spark | Specifies the record collision policy for inserting into a table with a primary key. If this flag is present, any existing table record with primary key values that match those of a record being inserted will be replaced by that new record. If this flag is not present, any existing table record with primary key values that match those of a record being inserted will remain unchanged and the new record discarded. If the specified table does not have a primary key, then this flag is ignored. |
--d-jdbc-url '<jdbc-url>' | spark | Specifies a particular JDBC URL to connect to Kinetica. The default JDBC URL is 'jdbc:kinetica://<hostname>:9191' where <hostname> is the hostname referenced in Kinetica URI. |
--d-spark-timeout-milliseconds | spark | Set a connection timeout in milliseconds. The default value is 60000 (1 minute). |
--d-spark-enable-multihead | spark | Enables multi-head ingest. Important Replicated tables are not supported with multi-head ingest. |
--s-ssl-httpd-truststore /file/path/truststore.jks | spark | When Kinetica is the source, filepath to the Java truststore that is then passed to the HTTPD service to verify the SSL certificate that the Kinetica HTTPD server provides. Note This option must be passed in conjunction with the --s-ssl-httpd-truststore-password parameter |
--d-ssl-httpd-truststore /file/path/truststore.jks | spark | When Kinetica is the destination, filepath to the Java truststore that is then passed to the HTTPD service to verify the SSL certificate that the Kinetica HTTPD server provides. Note This option must be passed in conjunction with the --d-ssl-httpd-truststore-password parameter |
--s-ssl-httpd-truststore-password '<password>' | spark | When Kinetica is the source, the password for the Java truststore passed with the --s-ssl-httpd-truststore parameter. |
--d-ssl-httpd-truststore-password '<password>' | spark | When Kinetica is the destination, the password for the Java truststore passed with the --d-ssl-httpd-truststore parameter. |
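For example, a dry run with the Odo engine can be used to preview the schema KIO would create for a CSV source without transferring any data; the file path and table name here are placeholders:
./kio -s /tmp/data/products.csv -d kinetica://::products --engine odo --d-dry-run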
Limitations and Cautions
If a table does not exist when importing into Kinetica, the table will be created automatically and:
- it will be created in the user's default schema
- foreign keys are not preserved
- distribution scheme is not preserved (unless using the --d-is-replicated option with spark as the --engine type); the table will be randomly sharded by default
- To remove properties, specify the --d-property-overrides option and list all properties that should be kept while leaving out any property(ies) that should be removed.
- Overriding a base type will remove its existing properties.
- If a table is being imported into Kinetica using the Spark engine and its name contains non-alphanumeric special characters, the special characters will be replaced with underscores.
- If importing a GIS-type column(s) with the --gis-ext-enabled flag, the --d-property-additions parameter must be present containing a map of each GIS-type column to the wkt type, e.g.,
--d-property-additions '{"gis_col1": ["wkt"], "gis_col2": ["wkt"]}'
Microsoft SQL Server / Spatial Data (GIS) SQL Server
Imports from a table in Microsoft SQL Server. For help with command line usage, provide the --show-usage parameter with the sqlserver option.
Important
Microsoft SQL Server / GIS SQL Server is only supported when importing from Microsoft SQL Server to Kinetica with the Spark engine.
Usage
Base Microsoft SQL Server URI:
sqlserver://<username>:<password>@<address><:port>/<database_name><::table_name>
Important
All fields are required. The URI must be wrapped in quotes ('' or "") should a special character be present that could confuse the command prompt
URI Parameters
Parameter | Description |
---|---|
username | The username used for authentication. |
password | The password used for authentication. |
address | The IPV4 address to connect to. |
port | The port to connect to. |
database_name | The database name. |
table_name | The table in the database to import from. |
Command Line Options
Option | Description |
---|---|
--gis-ext-enabled | Allows KIO to recognize that a GIS-specific extension is enabled so SQL-native GIS types can be read into memory as wkt strings for proper conversion into Kinetica. This option must be passed in conjunction with the kinetica destination format option --d-property-additions to map the incoming GIS-type column(s) to wkt type in Kinetica (see Examples for more detail). |
Limitations and Cautions
- KIO does not support Java virtual machines (JVMs) with Federal Information Processing Standards (FIPS) enabled.
- KIO does not support extra properties for Microsoft SQL Server availability groups, including cluster or subnet failover.
- KIO does not support connecting to Microsoft SQL Server with SSL.
Oracle
Imports from a table in an Oracle database. For help with command line usage, provide the --show-usage parameter with the oracle option.
Important
Oracle is only supported when importing from Oracle to Kinetica with the Spark engine.
Usage
Base Oracle URI:
oracle://<username>:<password>@<address><:port>/<system_id><::schema_name>
Important
All fields are required. The URI must be wrapped in quotes ('' or "") should a special character be present that could confuse the command prompt
URI Parameters
Parameter | Description |
---|---|
username | The username used for authentication. |
password | The password used for authentication. |
address | The IPV4 address to connect to. |
port | The port to connect to. |
system_id | The system ID to identify a particular Oracle service. |
schema_name | The schema in the system to import from. |
Limitations and Cautions
KIO uses the Oracle XE Service to establish a database connection. The listener.ora and tnsnames.ora files must be correctly configured to expose the XE Service.
KIO only supports Oracle 12c.
When exporting from Oracle to a given destination, column names will be capitalized. When applying properties to a Kinetica destination table's column names, ensure the column names referenced are also capitalized, e.g.:
--d-property-additions '{"VENDOR_ID": ["text_search"]}'
PostgreSQL / PostGIS
Imports from or exports to a table in PostgreSQL. Copying from one table to another is possible. For help with command line usage, provide the --show-usage parameter with the postgresql option.
Important
PostGIS is only supported when importing from PostgreSQL to Kinetica with the Spark engine.
Usage
Base PostgreSQL URI:
postgresql://<username>:<password>@<host><:port>/<database_name>[::schema_name]<::table_name>
Important
The URI must be wrapped in quotes ('' or "") should a special character be present that could confuse the command prompt
URI Parameters
Parameter | Description |
---|---|
username | The username used for authentication. Default is the user currently running KIO |
password | The password used for authentication. |
host | The host to connect to. Default is localhost |
port | The port to connect to. Default is 5432 |
database_name | The database name. Default is the value the username parameter is set to |
schema_name | The schema name. Optional |
table_name | The table in the database to import into or export from. |
Command Line Options
Option | Description |
---|---|
--s-case-sensitive | When the source is PostgreSQL and this flag is passed, case sensitivity will be enabled for table, column, and schema names. |
--d-case-sensitive | When the destination is PostgreSQL and this flag is passed, case sensitivity will be enabled for table, column, and schema names. |
--s-postgis-schema <schema-name> | The PostGIS schema name for when the source is PostgreSQL. Default value is public. |
--d-postgis-schema <schema-name> | The PostGIS schema name for when the destination is PostgreSQL. Default value is public. |
--gis-ext-enabled | Allows KIO to recognize that a GIS-specific extension is enabled so SQL-native GIS types can be read into memory as wkt strings for proper conversion into Kinetica. This option must be passed in conjunction with the kinetica destination format option --d-property-additions to map the incoming GIS-type column(s) to wkt type in Kinetica (see Examples for more detail). |
--s-batch-iterator-column-name <column-name> | When the source is PostgreSQL, the name of a column that will be used to iterate over data when ingesting into Kinetica. This parameter is required for tables that have more than 10 million records, though tables of any size will see improved ingest performance. The provided column must be an integer (BIGINT, SMALLINT, or INTEGER) column, must not be nullable, and must be either SERIAL or UNIQUE. Providing this option enables KIO to use a more performant query to fetch records. Important This option is only available for PostgreSQL to Kinetica transfers using the Spark engine. |
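As a sketch, a large PostgreSQL-to-Kinetica transfer using the Spark engine might pass a batch iterator column; the credentials, table, and trip_id column below are placeholders (the column must satisfy the integer, non-nullable, SERIAL-or-UNIQUE requirements above):
./kio -s 'postgresql://jsmith:password123@172.123.45.67:5432/demo::nyctaxi' -d kinetica://::nyctaxi --engine spark --s-batch-iterator-column-name trip_id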
Limitations and Cautions
- If a table does not exist when importing into PostgreSQL, the table will be created automatically. If attempting to copy from one table to another in PostgreSQL, the output table must already exist before KIO can copy the source table into it.
- Case sensitivity is disabled by default. When case sensitivity is disabled, PostgreSQL will query and reference table, column, and schema names by lowercase values.
Sybase IQ
Imports from a table in a Sybase IQ database. For help with command line usage, provide the --show-usage parameter with the sybaseiq option.
Important
Sybase IQ is only supported when importing from Sybase IQ to Kinetica with the Spark engine.
Usage
Base Sybase IQ URI:
sybaseiq://<username>:<password>@<address><:port>/<database_name><::table_name>
Important
All fields are required. The URI must be wrapped in quotes ('' or "") should a special character be present that could confuse the command prompt
URI Parameters
Parameter | Description |
---|---|
username | The username used for authentication. |
password | The password used for authentication. |
address | The host to connect to. |
port | The port to connect to. |
database_name | The database name. |
table_name | The table in the database to import from. |
Limitations and Cautions
- KIO will use the default character set of the connecting database; no conversion between character sets is available.
- KIO does not define Sybase IQ connection failover behavior.
- KIO does not define a timeout for client sessions or SQL queries.
- KIO does not support connecting with Sybase IQ using SSL.
Teradata
Imports from a table in a Teradata database. For help with command line usage, provide the --show-usage parameter with the teradata option.
Some considerations when connecting to Teradata:
- KIO uses the JDBC 4.0 API specification.
- KIO connects to the standard DBC/SQL partition in Teradata.
- KIO requires that a JDK/JRE (version 1.3.2, 5, 6, 7, 8, or 9) is installed and reachable on your machine instance in order to connect to Teradata.
Important
Teradata is only supported when importing from Teradata to Kinetica with the Spark engine.
Usage
Base Teradata URI:
teradata://<username>:<password>@<address><:port>/<database_name><::table_name>
Important
All fields are required. The URI must be wrapped in quotes ('' or "") should a special character be present that could confuse the command prompt
URI Parameters
Parameter | Description |
---|---|
username | The username used for authentication. |
password | The password used for authentication. |
address | The host to connect to. |
port | The port to connect to. |
database_name | The database name. |
table_name | The table in the database to import from. The name must exactly match; predicate matching is not supported. |
Limitations and Cautions
- KIO only supports the UTF-8 character set (CHARSET) when connecting with Teradata.
- KIO does not support diagnostics or troubleshooting support via DDSTATS.
- KIO does not support data encryption between the Teradata JDBC driver and the Teradata database using ENCRYPTDATA.
- KIO does not support additional fractional digits.
- KIO does not support modifying users and authentication credentials using NEW_PASSWORD.
- KIO only supports Teradata versions 14.00, 14.10, 15.00, 15.10, 16.10, and 16.20.
File Formats
Apache ORC
Imports from an ORC file; exporting to ORC via KIO is currently not supported. For help with command line usage, provide the --show-usage parameter with the orc option.
Usage
URI Parameters
Parameter | Description |
---|---|
/file/path/filename.orc | Absolute filepath to an ORC file |
Important
Do not specify single or double quotes around the file path, as KIO will interpret that as the name of the file, rather than a file path.
Limitations and Cautions
- The following Apache ORC types are not currently supported by KIO when Kinetica is the destination:
- struct
- list
- map
- union
- KIO currently does not support referencing ORC files using relative file paths.
Apache Parquet
Imports from or exports to a Parquet file; however, exporting to Parquet files in AWS S3 via KIO is currently not supported. Compressed or partitioned Parquet files are also supported. For help with command line usage, provide the --show-usage parameter with the parquet option.
Important
If graceful failure mode is enabled, KIO will not validate the schema of the dataset until time of ingest for a particular partition at which point the partition will be skipped if its schema is invalid.
Tip
Apache Parquet files can also be transferred to/from an S3 bucket. See Transferring files via S3 for details.
Usage
URI Parameters
Parameter | Description |
---|---|
/file/path/filename.parquet | Absolute filepath to a Parquet file |
Important
Do not specify single or double quotes around the file path, as KIO will interpret that as the name of the file, rather than a file path.
Command Line Options
Options | Description |
---|---|
--s-recursive | Recursively search a directory for parquet files instead of only searching the top level of the directory specified by the file path parameter. For example, given the directory below: /path/to/directory/ ├── example1.parquet ├── example2.parquet ├── subdirectory/ │ ├── example4.parquet │ └── example5.parquet └── example3.parquet Specifying --source of /path/to/directory/example*.parquet without specifying --s-recursive will register example1.parquet, example2.parquet, and example3.parquet. Specifying --s-recursive will register all files, including example4.parquet and example5.parquet |
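For instance, using the directory layout above, the following command (with a placeholder destination table) would register all five example files:
./kio -s '/path/to/directory/example*.parquet' --s-recursive -d kinetica://::parquet_data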
Limitations and Cautions
- KIO currently does not support referencing Parquet files using relative file paths.
- If the file does not exist when exporting to Parquet, it will be created. If the file does exist, it will be overwritten.
- KIO only supports hidden Parquet file ingestion (/file/path/.<file-name>.parquet) using the Odo engine.
Apache Parquet Dataset
Imports from a Parquet Dataset; exporting to Parquet Datasets via KIO is currently not supported. For help with command line usage, provide the --show-usage parameter with the parquetdataset option.
Tip
Apache Parquet Dataset files can also be transferred to/from an S3 bucket. See Transferring files via S3 for details.
Usage
URI Parameters
Parameter | Description |
---|---|
/file/path/directory.parquetdataset | Absolute path to a Parquet Dataset directory |
Limitations and Cautions
General:
- KIO currently does not support referencing Parquet Datasets using relative directory paths.
- KIO will always attempt to ingest the Parquet dataset as a nested directory of partitioned Parquet files. If ingesting a list of absolute file paths to Parquet files, move all the files into a single directory and use /file/path/*.parquet as the source URI instead to ingest the files as chunked Parquet files.
- KIO currently does not support inferring the types of partition keys, and as such, the keys are cast as strings. If ingesting the dataset into Kinetica, utilize --d-property-additions, --d-property-overrides, and --d-base-type-overrides to establish the data's types as desired. Refer to Kinetica for more information.
- KIO currently does not support reading in specific columns/partition keys from the Parquet Dataset.
AWS S3 Apache Parquet Dataset:
- KIO assumes that Parquet Datasets are not S3 buckets but rather a subdirectory (or subdirectories) within an S3 bucket.
- Interacting with Apache Parquet Datasets in an S3 bucket is a Python 3-specific feature. Note that Python 3 is packaged with KIO by default.
CSV
Imports from or exports to a CSV file. For help with command line usage, provide the --show-usage parameter with the csv option.
Tip
CSV files can also be transferred to/from an S3 bucket. See Transferring files via S3 for details.
Usage
URI Parameters
Parameter | Description |
---|---|
/file/path/filename.csv | Absolute or relative filepath to a CSV file |
Important
Do not specify single or double quotes around the file path, as KIO will interpret that as the name of the file, rather than a file path.
Command Line Options
Options | Description |
---|---|
--d-delimiter <delimiter> | Set the delimiter used in the file when csv is the destination format. Any alphanumeric character or symbol is a valid delimiter; default is ,. |
--s-delimiter <delimiter> | Set the delimiter used in the file when csv is the source format. Important The option must match the delimiter in the file or the import will fail. |
--d-escape-character <character> | Set the character used to escape error-prone characters, e.g., ,, "", etc., when csv is the destination format. |
--s-escape-character <character> | Set the character used to escape error-prone characters, e.g., ,, "", etc., used in the source csv file. Important The option must match the escape character used in the file or the import will fail. |
--s-custom-null-value <value> | Set the null value literal when csv is the source format. Any string value is supported. Important This option is only available when using the Spark engine. |
--s-custom-quote-value <value> | Set the quote symbol that will replace default double quotes (") when enclosing text fields for data transfers where csv is the source format. Any string value is supported. Important This option is only available when using the Spark engine. |
--s-custom-comment-value <value> | Set the comment symbol when csv is the source format. Lines starting with the provided comment symbol are not parsed or ingested. Any string value is supported, but only one character is supported, e.g., #, -. Important This option is only available when using the Spark engine. |
--s-headerless | Specifies that the source CSV file is headerless, i.e., the first row in the CSV file is data instead of column headers. By default, Kinetica assumes all source CSV files have headers as the first row. |
--dynamically-infer-wkt-subtypes | Dynamically infer whether columns with base type string should have the wkt property applied during ingestion. Important This option is only available for CSV to Kinetica transfers using the Odo engine. |
--s-drop-malformed-rows | Indicates that malformed rows of a source CSV file will not be inserted. Important This option is only available when using the Spark engine and when --failure-mode is set to graceful. |
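For example, importing a headerless, pipe-delimited CSV file with the Spark engine might look like the following; the file path and table name are placeholders:
./kio -s /tmp/data/pipe_delimited.csv -d kinetica://::csv_data --engine spark --s-delimiter '|' --s-headerless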
Limitations and Cautions
- If CSV is the destination and the target file(s) does not exist, column headers will be written to the file first and then data will be appended. If the target file(s) does exist, only data will be appended.
- If importing from a CSV into a non-existing table in Kinetica, the table will be created. The database will infer column base types and null values depending on the data being imported.
- KIO only supports wildcard ingestion of multiple CSVs (/file/path/*.csv) using the Spark engine.
- KIO only supports hidden CSV file ingestion (/file/path/.<file-name>.csv) using the Odo engine.
- If importing from a CSV into an existing table in Kinetica, the first row of the CSV file must contain a delimiter-separated (based on the user-specified delimiter) list of the column names in the order they appear in the existing table.
Warning
If the first row of the CSV file is a list of column names that don't match with the existing table, KIO will attempt to insert that row as data instead, which could result in import errors.
Shapefile
Imports from or exports to a shapefile. For help with command line usage, provide the --show-usage parameter with the shapefile option.
Tip
Shapefiles can also be transferred to/from an S3 bucket. See Transferring files via S3 for details.
Usage
URI Parameters
Parameter | Description |
---|---|
/file/path/filename.shapefile | Absolute filepath to a shapefile |
Note
The URI must be wrapped in quotes ('' or "") should a special character be present that could confuse the command prompt
Limitations and Cautions
- The URI must reference the .shapefile file, which maps to a main file (.shp) and sometimes an index file (.shx) and a dBASE file (.dbf). KIO requires that at least the .shapefile and .shp files be available. These files (and any other files) must be in the same directory and have the same prefix to be recognized by KIO, e.g., usa-coastline.shapefile, usa-coastline.shp, etc.
- KIO currently does not support referencing shapefiles using relative file paths.
- Kinetica decimal fields are fixed precision and scale, so importing a shapefile that references numeric fields into Kinetica will be lossy.
- KIO may truncate dBASE record values that are more than 256 characters long.
- KIO supports wildcard ingestion of shapefiles (/file/path/*.shapefile) into the same destination data format provided that the schema for all files is identical, i.e., column names don't have to match but the rows of data therein must be of the same type. If a file's schema does not match the other files being ingested, the file will be ignored and the ingestion will continue. If the column names do not match, the first file ingested will determine the column names (see the example after this list).
- Shapefiles can only be ingested using the Odo engine.
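For instance, wildcard shapefile ingestion into Kinetica (the file path and table name are placeholders) might look like:
./kio -s '/tmp/data/*.shapefile' -d kinetica://::coastline_data --engine odo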
Transferring files via S3
You can also import from or export to supported file formats in an S3 bucket. For help with command line usage, provide the --show-usage parameter with the s3 option. AWS S3 in KIO isn't necessarily a separate format but rather a way to access an S3 bucket for file transfers; all associated parameters, options, and limitations apply when using a supported file format within an S3 URI. The following file formats are supported with S3 KIO operations:
- CSV
- Apache Parquet
- Apache Parquet Dataset
- Shapefile
Usage
Base S3 URI:
s3://<bucket name>/<file path>
Note
The URI must be wrapped in quotes ('' or "") should a special character be present that could confuse the command prompt
URI Parameters
Parameter | Description |
---|---|
<bucket name> | The unique name given to the S3 bucket in your AWS account. |
<file path> | The file path to the desired file(s) contained in the specified S3 bucket. |
Command Line Options
Options | Description |
---|---|
--aws-access-key-id <key> | Sets the AWS access key ID for an AWS user with access permissions to both the source and destination (if both are AWS resources). |
--aws-secret-access-key <key> | Sets the AWS secret access key for an AWS user with access permissions to both the source and destination (if both are AWS resources). |
Important
If the S3 bucket has public access, the access key and secret access key are not required.
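For illustration, passing credentials directly on the command line looks like the following, with placeholder key values:
./kio -s 's3://my-s3-bucket/taxi_trip_data.csv' -d kinetica://::taxi_trip_data --aws-access-key-id <access-key-id> --aws-secret-access-key <secret-access-key>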
Configuration
If not using the access key ID and secret access key command line options, your machine can be configured for direct AWS S3 access. Once you've created your AWS access key ID and AWS secret access key:
Install the official AWS CLI on your machine using pip:
pip install awscli
Add your AWS account credentials to the machine. You can find the default region name in the URL for your AWS console; the default output format can be left as the default (None):
$ aws configure
AWS Access Key ID [None]: <access-key-id>
AWS Secret Access Key [None]: <secret-access-key>
Default region name [None]: <region-name>
Default output format [None]:
Verify the .aws directory is available in your home directory:
$ ls -al ~/
total 500
dr-xr-x---. 10 root root 4096 Jun 23 12:25 .
dr-xr-xr-x. 17 root root 4096 Jun 23 11:49 ..
drwxr-xr-x   2 root root   39 Jun 23 12:25 .aws
...
Verify you can read the contents of your AWS S3 bucket:
aws s3 ls <your-bucket-name>
Limitations and Cautions
If the file does not exist when exporting to S3, it will be created; if the file does exist, it will be overwritten.
KIO supports wildcard ingestion of files on AWS S3 (e.g., s3://my-s3-bucket/*.csv, s3://my-s3-bucket/*.shapefile) into the same destination data format provided that the schema for all files is identical, i.e., column names don't have to match but the rows of data therein must be of the same type. If a file's schema does not match the other files being ingested, the file will be ignored and the ingestion will continue. If the column names do not match, the first file ingested will determine the column names.
Important
If a prefix directory is provided, e.g., s3://my-s3-bucket/test/dir/*.csv, it will be recursively searched, and any .csv files found will be copied to the destination.
The AWS credentials passed in via command line must have permissions to both the source and destination resources. KIO does not support separate AWS credentials for the source and destination resources.
Examples
Importing from a CSV file into the taxi_trip_data table in Kinetica, adding a property, and overriding one base type:
./kio -s /home/john_smith/files/taxi_trip_data.csv -d kinetica://::taxi_trip_data --d-property-additions '{"vendor_id": ["text_search"]}' --d-base-type-overrides '{"trip_distance": "double"}'
Exporting from the flights table in Kinetica to a CSV file on the local machine:
./kio -s kinetica://localhost:9191::flights -d /home/jane_smith/flights_data.csv
Exporting from PostgreSQL into Kinetica and overriding a property:
./kio -s postgresql://jsmith1:password123@172.123.45.67:5432/demo::nyctaxi -d 'kinetica://jsmith1:password234$@172.123.45.67:9191::taxi_data' --d-property-overrides '{"passenger_count": ["int16"]}'
Importing from CSV files in the specified S3 directory into Kinetica:
./kio -s 's3://my-s3-bucket/csv/*.csv' -d kinetica://::my_test_data
Exporting from the taxi_trip_data table in Kinetica to a CSV file in an S3 bucket:
./kio -s kinetica://::taxi_trip_data -d s3://my-s3-bucket/taxi_trip_data.csv
Importing from a shapefile into Kinetica:
./kio -s /tmp/data/usa-coastline.shapefile -d kinetica://::usa_coastline_data
Importing from taxi_trip_data, a Parquet Dataset directory, in an AWS S3 bucket to a table in a collection in Kinetica:
./kio -s s3://data-bucket/taxi_trip_data.parquetdataset -d kinetica://::taxi_trip_data --d-collection-name my_collection
Importing from a local CSV file into Kinetica securely:
./kio -s /home/my-data.csv -d kinetica://jsmith:password234@172.123.45.67:8082/gpudb-0::my_data_table --d-https
Importing from a PostGIS-enabled PostgreSQL database into Kinetica:
./kio -s 'postgresql://postgres:password123@172.123.45.67:5432/postgres::roads' -d 'kinetica://::roads' --d-property-additions '{"roads_geom": ["wkt"]}' --gis-ext-enabled --engine spark
Importing from Microsoft SQL Server into Kinetica:
./kio -s sqlserver://jsmith:temp1234@172.123.45.67:1433/gpudb::nyctaxi -d kinetica://::nyctaxi
Importing from Oracle into Kinetica:
./kio -s oracle://jsmith:temp1234@172.123.45.67:1521/orcl::nyctaxi -d kinetica://::nyctaxi
Importing from Sybase IQ into Kinetica:
./kio -s sybaseiq://jsmith:sql123@172.123.45.67:2638/kindata::nyctaxi -d kinetica://::nyctaxi
Importing from Teradata into Kinetica:
./kio -s teradata://dbcadmin:password123@172.123.45.67:1025/gpudb::nyctaxi -d kinetica://::nyctaxi