BAD RECORD TABLE |
Name of the table containing records that failed to be loaded into the target table. This
bad record table will include the following columns:
Column Name |
Description |
line_number |
Number of the line in the input file containing the failed record |
char_number |
Position of the character within the failed record that marks the beginning of
the portion of the record that failed to process |
filename |
Name of file that contained the failed record |
line_rejected |
Text of the record that failed to process |
error_msg |
Error message associated with the record processing failure |
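For example, to route failed records to a table named sales_load_errors (the table name here is illustrative, and the option is assumed to take a single-quoted name, like the other string-valued options in this list):
BAD RECORD TABLE = 'sales_load_errors'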
Note
This option is not applicable for an ON ERROR mode of ABORT. In that
mode, processing stops at the first error and that error is returned to the user.
|
BATCH SIZE |
Use an ingest batch size of the given number of records.
The default batch size is 200,000.
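For example, to use a batch size of half a million records (assuming the OPTION = value form shown for the other options in this list):
BATCH SIZE = 500000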
Note
This option is not applicable to Kafka data sources.
|
COLUMN FORMATS |
Use the given type-specific formatting for the given column when parsing source data being
loaded into that column. This should be a map of column names to format specifications,
where each format specification is a map of column type to data format, all formatted as a
JSON string.
Supported column types include:
Column Type |
Source Data Format Codes |
date |
Apply the given date format to the given column.
Common date format codes follow. For the complete list, see
Date/Time Conversion Codes.
Code |
Description |
YYYY |
4-digit year |
MM |
2-digit month, where January is 01 |
DD |
2-digit day of the month, where the 1st
of each month is 01 |
|
time |
Apply the given time format to the given column.
Common time format codes follow. For the complete list, see
Date/Time Conversion Codes.
Code |
Description |
HH24 |
24-based hour, where 12:00 AM is 00 and 7:00 PM
is 19 |
MI |
2-digit minute of the hour |
SS |
2-digit second of the minute |
MS |
milliseconds |
|
datetime |
Apply the given date/time format to the given column. |
For example, to load dates of the format 2010.10.30 into date column d and times of
the 24-hour format 18:36:54.789 into time column t:
{
"d": {"date": "YYYY.MM.DD"},
"t": {"time": "HH24:MI:SS.MS"}
}
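When passed as an option value, this map is written as a single JSON string (assuming the single-quoted string form used by the other options in this list):
COLUMN FORMATS = '{"d": {"date": "YYYY.MM.DD"}, "t": {"time": "HH24:MI:SS.MS"}}'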
Note
This option is not applicable to JDBC data sources.
|
DATA SOURCE |
Load data from the given data source.
Data source connect privilege is required when
loading from a data source. |
DEFAULT COLUMN FORMATS |
Use the given formats for source data being loaded into target table columns with the
corresponding column types. This should be a map of target column type to source format
for data being loaded into columns of that type, formatted as a JSON string.
Supported column properties and source data formats are the same as those
listed in the description of the COLUMN FORMATS option.
For example, to make the default format for loading source data dates like 2010.10.30
and 24-hour times like 18:36:54.789:
{
"date": "YYYY.MM.DD",
"time": "HH24:MI:SS.MS",
"datetime": "YYYY.MM.DD HH24:MI:SS.MS"
}
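As an option value, this map is likewise written as a single JSON string (assuming the single-quoted string form used by the other options in this list):
DEFAULT COLUMN FORMATS = '{"date": "YYYY.MM.DD", "time": "HH24:MI:SS.MS", "datetime": "YYYY.MM.DD HH24:MI:SS.MS"}'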
Note
This option is not applicable to JDBC data sources.
|
FIELDS IGNORED BY <POSITION(<col#s>) | NAME(<field names>)> |
Choose a comma-separated list of fields from the source file(s) to ignore; only the
fields not in this list will be loaded, in the order in which they appear in the file.
Fields can be identified by either POSITION or NAME. If ignoring by NAME, the
specified names must match the source file field names exactly.
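For example, to ignore the 2nd and 4th fields by position, or to ignore fields named mid_init and suffix by name (the field names here are illustrative):
FIELDS IGNORED BY POSITION(2, 4)
FIELDS IGNORED BY NAME(mid_init, suffix)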
Note
When ignoring source data file fields and the target table exists, the fields
that are not ignored must align, in type and number and in their order in the source
file, with the target table columns into which the data will be loaded.
Important
Ignoring fields by POSITION is only supported for delimited text files.
|
FIELDS MAPPED BY <POSITION(<col#s>) | NAME(<field names>)> |
Choose a comma-separated list of fields from the source file(s) to load, in the specified
order, identifying fields by either POSITION or NAME. If mapping by NAME, the
specified names must match the source file field names exactly.
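For example, to load only the 1st, 2nd, and 5th fields by position, or only fields named id, name, and dept_id by name (the field names here are illustrative):
FIELDS MAPPED BY POSITION(1, 2, 5)
FIELDS MAPPED BY NAME(id, name, dept_id)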
Note
When mapping source data file fields and the target table exists, the identified
fields must align, in type and number and in the specified order, with the
target table columns into which data will be loaded.
Important
Mapping fields by POSITION is only supported for delimited text files.
|
INGESTION MODE |
Whether to do a full ingest of the data or perform a dry run or type inference instead.
The default mode is FULL.
Value |
Description |
DRY RUN |
No data will be inserted, but the file will be read with the
applied ON ERROR mode, and the number of valid records
that would normally be inserted will be returned. |
FULL |
Data is fully ingested according to the active ON ERROR mode. |
TYPE INFERENCE |
Infer the type of the source data and return it, without ingesting
any data. The inferred type is returned in the response as the
output of a SHOW TABLE command. |
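For example, to validate a source file and count its loadable records without inserting any data:
INGESTION MODE = DRY RUN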
|
KAFKA GROUP ID |
Consumer group ID to use for data sources configured for Kafka. If not specified, one
will be randomly generated. |
ON ERROR |
When an error is encountered loading a record, handle it using one of the
following modes. The default mode is ABORT.
Mode |
Description |
PERMISSIVE |
If an error is encountered parsing a source record, attempt
to insert as many of the valid fields from the record as
possible; insert a null for each errant value found. |
SKIP |
If an error is encountered parsing a source record, skip the
record. |
ABORT |
If an error is encountered parsing a source record, stop the
data load process. Primary key collisions are considered
abortable errors in this mode. |
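For example, to skip unparseable records rather than aborting the load:
ON ERROR = SKIP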
|
POLL_INTERVAL |
Interval, in seconds, at which a data source is polled for updates. The number of
seconds must be passed as a single-quoted string. Default is 60 seconds.
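For example, to poll the data source every two minutes (note the single-quoted value):
POLL_INTERVAL = '120'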
Important
This option is only available for data sources configured for Azure or
S3 and when SUBSCRIBE is true.
|
PRIMARY KEY |
Optional primary key specification clause, where the value is a
parenthesized comma-separated list of columns to use as the primary key for the table:
PRIMARY KEY = ('id', 'dept_id')
|
SHARD KEY |
Optional shard key specification clause, where the value is a
parenthesized comma-separated list of columns to use as the shard key for the table:
SHARD KEY = ('id')
|
SUBSCRIBE |
Whether to subscribe to the data source specified in the
DATA SOURCE option. The default value is FALSE.
Important
This option is only available for data sources configured for Azure,
GCS, Kafka, or S3.
Value |
Description |
TRUE |
Stream data from the specified data source. |
FALSE |
Load all data once from the specified data source. |
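For example, to stream data continuously from the configured data source:
SUBSCRIBE = TRUE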
|