BAD RECORD TABLE

Name of the table containing records that failed to be loaded into the target table. This bad record table will include the following columns:

| Column Name | Description |
|---|---|
| line_number | Number of the line in the input file containing the failed record |
| char_number | Position of the character within a failed record that is assessed as the beginning of the portion of the record that failed to process |
| filename | Name of the file that contained the failed record |
| line_rejected | Text of the record that failed to process |
| error_msg | Error message associated with the record processing failure |

Note: This option is not applicable for an ON ERROR mode of ABORT. In that mode, processing stops at the first error and that error is returned to the user.
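For instance, a load that skips bad records and routes them to a side table might be written as in the following sketch; it assumes a LOAD INTO ... FORMAT ... WITH OPTIONS (...) statement form, and the table and file names are illustrative only.

LOAD DATA INTO example.orders
FROM FILE PATHS 'data/orders.csv'
FORMAT TEXT
WITH OPTIONS (
    ON ERROR = SKIP,                                   -- skip unparseable records instead of aborting
    BAD RECORD TABLE = 'example.orders_bad_records'    -- hypothetical table collecting the skipped records
)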
BATCH SIZE

Use an ingest batch size of the given number of records. The default batch size is 50,000.

Note: This option is not available for data sources configured for Kafka.
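For example, to use a larger batch size than the 50,000-record default (same assumed statement form and illustrative names as the sketch above):

LOAD DATA INTO example.orders
FROM FILE PATHS 'data/orders_*.parquet'
FORMAT PARQUET
WITH OPTIONS (BATCH SIZE = 200000)   -- ingest in batches of 200,000 records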
COLUMN FORMATS

Use the given type-specific formatting for the given column when parsing source data being loaded into that column. This should be a map of column names to format specifications, where each format specification is a map of column type to data format, all formatted as a JSON string. The supported column types and their source data format codes are described below.

date: Apply the given date format to the given column. Common date format codes follow; for the complete list, see Date/Time Conversion Codes.

| Code | Description |
|---|---|
| YYYY | 4-digit year |
| MM | 2-digit month, where January is 01 |
| DD | 2-digit day of the month, where the 1st of each month is 01 |

time: Apply the given time format to the given column. Common time format codes follow; for the complete list, see Date/Time Conversion Codes.

| Code | Description |
|---|---|
| HH24 | 24-based hour, where 12:00 AM is 00 and 7:00 PM is 19 |
| MI | 2-digit minute of the hour |
| SS | 2-digit second of the minute |
| MS | milliseconds |

datetime: Apply the given date/time format to the given column.

For example, to load dates of the format 2010.10.30 into date column d and times of the 24-hour format 18:36:54.789 into time column t:

{
    "d": {"date": "YYYY.MM.DD"},
    "t": {"time": "HH24:MI:SS.MS"}
}

Note: This option is not available for data sources configured for JDBC.
DATA SOURCE

Load data from the given data source. Data source connect privilege is required when loading from a data source.
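For example, a one-time load through a previously created data source could look like the following sketch; the data source name is hypothetical.

LOAD DATA INTO example.orders
FROM FILE PATHS 'orders/2024/*.csv'
FORMAT TEXT
WITH OPTIONS (DATA SOURCE = 'orders_s3_ds')   -- requires connect privilege on orders_s3_ds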
DEFAULT COLUMN FORMATS

Use the given formats for source data being loaded into target table columns with the corresponding column types. This should be a map of target column type to source format for data being loaded into columns of that type, formatted as a JSON string. Supported column types and source data formats are the same as those listed in the description of the COLUMN FORMATS option.

For example, to make the default format for loading source data dates like 2010.10.30 and 24-hour times like 18:36:54.789:

{
    "date": "YYYY.MM.DD",
    "time": "HH24:MI:SS.MS",
    "datetime": "YYYY.MM.DD HH24:MI:SS.MS"
}

Note: This option is not available for data sources configured for JDBC.
FIELDS IGNORED BY

Choose a comma-separated list of fields from the source file(s) to ignore, loading only those fields that are not in the identified list in the order they appear in the file. Fields can be identified by either POSITION or NAME. If ignoring by NAME, the specified names must match the source file field names exactly.

Identifying by Name: FIELDS IGNORED BY NAME(Category, Description)
Identifying by Position: FIELDS IGNORED BY POSITION(3, 4)

Note:
- When ignoring source data file fields, the set of fields that are not ignored must align, in type and number in their order in the source file, with the target table columns into which the data will be loaded.
- Ignoring fields by POSITION is only supported for delimited text files.
FIELDS MAPPED BY

Choose a comma-separated list of fields from the source file(s) to load, in the specified order, identifying fields by either POSITION or NAME. If mapping by NAME, the specified names must match the source file field names exactly.

Identifying by Name: FIELDS MAPPED BY NAME(ID, Name, Stock)
Identifying by Position: FIELDS MAPPED BY POSITION(1, 2, 5)

Note:
- When mapping source data file fields and the target table exists, the set of fields that are identified must align, in type and number in the specified order, with the target table columns into which data will be loaded.
- Mapping fields by POSITION is only supported for delimited text files.
FLATTEN_COLUMNS

Specify the policy for handling nested columns within JSON data. The default is FALSE.

| Value | Description |
|---|---|
| TRUE | Break up nested columns into multiple columns. |
| FALSE | Treat nested columns as JSON columns instead of flattening. |
IGNORE_EXISTING_PK

Specify the error suppression policy for inserting duplicate primary key values into a table with a primary key. If the specified table does not have a primary key or the UPDATE_ON_EXISTING_PK option is used, then this option has no effect. The default is FALSE.

| Value | Description |
|---|---|
| TRUE | Suppress errors when inserted records and existing records' PKs match. |
| FALSE | Return errors when inserted records and existing records' PKs match. |
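As a sketch, a load that quietly drops incoming records whose primary keys already exist (illustrative names, same assumed statement form) might be:

LOAD DATA INTO example.customers
FROM FILE PATHS 'data/customers.csv'
FORMAT TEXT
WITH OPTIONS (IGNORE_EXISTING_PK = TRUE)   -- duplicate-PK inserts are dropped without error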
INGESTION MODE

Whether to do a full ingest of the data or perform a dry run or type inference instead. The default mode is FULL.

| Value | Description |
|---|---|
| DRY RUN | No data will be inserted, but the file will be read with the applied ON ERROR mode, and the number of valid records that would normally be inserted is returned. |
| FULL | Data is fully ingested according to the active ON ERROR mode. |
| TYPE INFERENCE | Infer the type of the source data and return it, without ingesting any data. The inferred type is returned in the response, as the output of a SHOW TABLE command. |
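For example, to validate a file without inserting anything, a dry run along these lines could be used (names illustrative):

LOAD DATA INTO example.orders
FROM FILE PATHS 'data/orders.csv'
FORMAT TEXT
WITH OPTIONS (
    INGESTION MODE = DRY RUN,   -- read and validate only; returns the count of valid records
    ON ERROR = SKIP
)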
JDBC_FETCH_SIZE

Retrieve this many records at a time from the remote database. Lowering this number will help tables with large record sizes fit into available memory during ingest. The default is 50,000.

Note: This option is only available for data sources configured for JDBC.
JDBC_SESSION_INIT_STATEMENT

Run the single given statement before the initial load is performed. Also run it before each subsequent reload, if SUBSCRIBE is TRUE.

For example, to set the time zone to UTC before running each load, use:

JDBC_SESSION_INIT_STATEMENT = 'SET TIME ZONE ''UTC'''

Note: This option is only available for data sources configured for JDBC.
KAFKA_CONSUMERS_PER_RANK

Number of Kafka consumer threads per rank; the valid range is 1 through 6. The default is 1.

KAFKA_GROUP_ID

Consumer group ID to use for data sources configured for Kafka. If not specified, one will be randomly generated.

KAFKA_OPTIMISTIC_INGEST

Enable optimistic ingestion, where Kafka topic offsets and table data are committed independently to achieve parallelism. The default is FALSE.

KAFKA_OFFSET_RESET_POLICY

Whether to start a Kafka subscription at the earliest or latest offset, for data sources configured for Kafka. The default is earliest.

| Value | Description |
|---|---|
| earliest | Start streaming from the oldest message in the queue. |
| latest | Start streaming from the newest message in the queue. |

KAFKA_SUBSCRIPTION_CANCEL_AFTER

Optional number of minutes after which a Kafka subscription will be cancelled, for data sources configured for Kafka.

KAFKA_TYPE_INFERENCE_FETCH_TIMEOUT

Maximum time to collect Kafka messages before performing type inference on the collected set. The default is generally 30 seconds.
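The sketch below shows how several of the Kafka options above might be combined for a streaming load; the data source name and group ID are hypothetical, and the exact quoting of option values may differ from what is shown.

LOAD DATA INTO example.clickstream
FORMAT JSON
WITH OPTIONS (
    DATA SOURCE = 'kafka_clicks_ds',          -- Kafka-configured data source
    SUBSCRIBE = TRUE,                         -- stream continuously
    KAFKA_GROUP_ID = 'clickstream_loader',    -- consumer group ID
    KAFKA_OFFSET_RESET_POLICY = 'latest'      -- start from the newest message
)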
MAX_RECORDS_TO_LOAD

Limit the number of records to load in this request. If the number is larger than BATCH SIZE, the number of records loaded will be limited to the next whole multiple of BATCH SIZE (per working thread).
NUM_SPLITS_PER_RANK

The number of remote query partitions to assign each Kinetica worker process. The queries assigned to a worker process will be executed by the tasks allotted to the process. To decrease memory pressure, increase the number of splits per rank. The default is 8 splits per rank.

Note: This option is only available for data sources configured for JDBC.

NUM_TASKS_PER_RANK

The number of tasks to use on each Kinetica worker process to process remote queries. The tasks assigned to a worker process will execute any remote query partitions assigned to it. To decrease memory pressure, decrease the number of tasks per rank. The default is 8 tasks per rank.

Note: This option is only available for data sources configured for JDBC.
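For a JDBC load under memory pressure, the tuning direction described above might be applied as in this sketch; the FROM REMOTE QUERY clause, data source name, and query text are assumptions for illustration.

LOAD DATA INTO example.remote_orders
FROM REMOTE QUERY 'SELECT * FROM warehouse.orders'
WITH OPTIONS (
    DATA SOURCE = 'orders_jdbc_ds',
    JDBC_FETCH_SIZE = 10000,       -- fetch fewer rows at a time from the remote database
    NUM_SPLITS_PER_RANK = 16,      -- more, smaller partitions per worker
    NUM_TASKS_PER_RANK = 4         -- fewer concurrent tasks per worker
)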
ON ERROR

When an error is encountered loading a record, handle it using either of the following modes. The default mode is ABORT.

| Mode | Description |
|---|---|
| SKIP | If an error is encountered parsing a source record, skip the record. |
| ABORT | If an error is encountered parsing a source record, stop the data load process. Primary key collisions are considered abortable errors in this mode. |
POLL_INTERVAL

Interval, in seconds, at which a data source is polled for updates. The number of seconds must be passed as a single-quoted string. The default interval is 60 seconds. This option is only applicable when SUBSCRIBE is TRUE.

Note: This option is not available for data sources configured for Kafka.
PRIMARY KEY

Optional primary key specification clause, where the value is a parenthesized comma-separated list of columns to use as the primary key for the table:

PRIMARY KEY = ('id', 'dept_id')
REMOTE_QUERY_INCREASING_COLUMN

For a JDBC query change data capture loading scheme, the remote query column that will be used to determine whether a record is new and should be loaded or not. This column should have an ever-increasing value and be of an integral or date/timestamp type. Often, this column will be a sequence-based ID or create/modify timestamp. This option is only applicable when SUBSCRIBE is TRUE.

Note: This option is only available for data sources configured for JDBC.
REMOTE_QUERY_NO_SPLIT

Whether to issue the remote data retrieval as a single query instead of distributing it across parallel queries for blocks of data at a time. The default is FALSE.

| Value | Description |
|---|---|
| TRUE | Issue the remote data retrieval as a single query. |
| FALSE | Distribute and parallelize the remote data retrieval in queries for blocks of data at a time. |

Note: This option is only available for data sources configured for JDBC.
REMOTE_QUERY_ORDER_BY

Ordering expression to use in partitioning remote data for retrieval. The remote data will be ordered according to this expression and then retrieved in sequential blocks from the first record. This is potentially less performant than using REMOTE_QUERY_PARTITION_COLUMN. If REMOTE_QUERY_NO_SPLIT is TRUE, if a valid REMOTE_QUERY_PARTITION_COLUMN is specified, or if the given ordering expression is invalid, this option is ignored.

Note: This option is only available for data sources configured for JDBC.
REMOTE_QUERY_PARTITION_COLUMN

Column to use to partition remote data for retrieval. The column must be numeric and should be relatively evenly distributed, so that queries using values of this column to partition data will retrieve relatively consistently-sized result sets. If REMOTE_QUERY_NO_SPLIT is TRUE or the given column is invalid, this option is ignored.

Note: This option is only available for data sources configured for JDBC.
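A change-data-capture style subscription using the options above might be sketched as follows; the column names, data source name, and quoting are illustrative assumptions.

LOAD DATA INTO example.remote_events
FROM REMOTE QUERY 'SELECT * FROM analytics.events'
WITH OPTIONS (
    DATA SOURCE = 'events_jdbc_ds',
    SUBSCRIBE = TRUE,                              -- reload on an interval
    REMOTE_QUERY_INCREASING_COLUMN = 'event_id',   -- only rows with new, higher IDs are loaded
    REMOTE_QUERY_PARTITION_COLUMN = 'event_id'     -- numeric column used to split retrieval
)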
SHARD KEY

Optional shard key specification clause, where the value is a parenthesized comma-separated list of columns to use as the shard key for the table:

SHARD KEY = ('id')
SKIP_LINES

Skip this number of lines when loading from a source file.
SUBSCRIBE

Whether to subscribe to the data source specified in the DATA SOURCE option. The default is FALSE.

| Value | Description |
|---|---|
| TRUE | Stream data from the specified data source. |
| FALSE | Load all data once from the specified data source. |

Note: This option is not available for data sources configured for HDFS.
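For example, a continuous load from a file-based data source, polling every two minutes, might look like this sketch (names illustrative):

LOAD DATA INTO example.sensor_readings
FROM FILE PATHS 'incoming/readings_*.csv'
FORMAT TEXT
WITH OPTIONS (
    DATA SOURCE = 'sensor_s3_ds',
    SUBSCRIBE = TRUE,          -- keep streaming new data as it arrives
    POLL_INTERVAL = '120'      -- poll every 120 seconds, passed as a single-quoted string
)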
TRUNCATE_STRINGS

Specify the string truncation policy for inserting text into VARCHAR columns that are not large enough to hold the entire text value. The default is FALSE.

| Value | Description |
|---|---|
| TRUE | Truncate any inserted string value at the maximum size for its column. |
| FALSE | Reject any record with a string value that is too long for its column. |
TRUNCATE_TABLE

Specify whether to truncate the target table before beginning the load or subscription. The default is FALSE (don't truncate the target table).
TYPE_INFERENCE_MODE

When making a type inference of the data values in order to define column types for the target table, use one of the following modes. The default mode is SPEED.

| Value | Description |
|---|---|
| ACCURACY | Scan all available data to arrive at column types that are the narrowest possible that can still hold all the data. |
| SPEED | Pick the widest possible column types from the minimum data scanned in order to quickly arrive at column types that should fit all data values. |
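To preview the schema that would be inferred from a file, the more thorough ACCURACY mode can be combined with a TYPE INFERENCE ingestion mode, roughly as in this sketch (names illustrative):

LOAD DATA INTO example.orders
FROM FILE PATHS 'data/orders.csv'
FORMAT TEXT
WITH OPTIONS (
    INGESTION MODE = TYPE INFERENCE,   -- infer and return column types only; no data loaded
    TYPE_INFERENCE_MODE = ACCURACY     -- scan all data for the narrowest fitting types
)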
UPDATE_ON_EXISTING_PK

Specify the record collision policy for inserting into a table with a primary key. If the specified table does not have a primary key, then this option has no effect. The default is FALSE.

| Value | Description |
|---|---|
| TRUE | Update existing records with records being inserted, when PKs match. |
| FALSE | Discard records being inserted when existing records' PKs match. |
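An upsert-style load, where incoming rows overwrite existing rows with matching keys, might be written as in the sketch below (names illustrative):

LOAD DATA INTO example.customers
FROM FILE PATHS 'data/customer_updates.csv'
FORMAT TEXT
WITH OPTIONS (UPDATE_ON_EXISTING_PK = TRUE)   -- records with matching PKs are updated in place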