| bad_record_table_name | Name of a table to which rejected records are written. The bad-record table has the following columns: line_number (long), line_rejected (string), and error_message (string). When error_handling is abort, the bad-record table is not populated. |
| bad_record_table_limit | A positive integer indicating the maximum number of records that can be written to the bad-record table. The default value is 10000. |
| batch_size | Number of records per batch when inserting data. |
| datasource_name | Name of an existing external data source from which the table will be loaded. |
| error_handling | Specifies how errors should be handled upon insertion. The default value is abort. Supported values are: permissive, in which records with missing columns are populated with nulls if possible and malformed records are otherwise skipped; ignore_bad_records, in which malformed records are skipped; and abort, which stops the current insertion and aborts the entire operation when an error is encountered (primary key collisions are considered abortable errors in this mode). |
| ignore_existing_pk | Specifies the record collision error-suppression policy for inserting into a table with a primary key; only used when not in upsert mode (upsert mode is disabled when update_on_existing_pk is false). If set to true, any record being inserted that is rejected for having primary key values that match those of an existing table record will be ignored with no error generated. If false, the rejection of any record for having primary key values matching an existing record will result in an error being reported, as determined by error_handling. If the specified table does not have a primary key or if upsert mode is in effect (update_on_existing_pk is true), then this option has no effect. The default value is false. Supported values are: true, to ignore new records whose primary key values collide with those of existing records; and false, to treat as errors any new records whose primary key values collide with those of existing records. |
| ingestion_mode | Whether to do a full load, do a dry run, or perform type inference on the source data. The default value is full. Supported values are: full, which runs type inference on the source data (if needed) and ingests; dry_run, which does not load data but walks through the source data and determines the number of valid records, taking into account the current mode of error_handling; and type_inference_only, which infers the type of the source data and returns without ingesting any data (the inferred type is returned in the response). |
| jdbc_fetch_size | The JDBC fetch size, which determines how many rows to fetch per round trip. |
| jdbc_session_init_statement | Statement to execute once per JDBC session before the actual load. The default value is ''. |
| num_splits_per_rank | Number of splits for reading data per rank. The default value is ''; when left empty, the value of external_file_reader_num_tasks is used. |
| num_tasks_per_rank | Number of tasks for reading data per rank. Defaults to the value of external_file_reader_num_tasks. |
| primary_keys | Comma-separated list of column names to set as primary keys when they are not specified in the type. The default value is ''. |
| shard_keys | Comma-separated list of column names to set as shard keys when they are not specified in the type. The default value is ''. |
| subscribe | Continuously poll the data source to check for new data and load it into the table. The default value is false. The supported values are true and false. |
| truncate_table | If set to true, truncates the table specified by input parameter table_name prior to loading the data. The default value is false. The supported values are true and false. |
| remote_query | Remote SQL query from which data will be sourced. |
| remote_query_order_by | Name of the column to be used for splitting the query into multiple sub-queries using the ordering of the given column. The default value is ''. |
| remote_query_filter_column | Name of the column to be used for splitting the query into multiple sub-queries using the data distribution of the given column. The default value is ''. |
| remote_query_increasing_column | Column of the subscribed remote query result whose value increases for new records (e.g., a TIMESTAMP). The default value is ''. |
| remote_query_partition_column | Alias name for remote_query_filter_column. The default value is ''. |
| truncate_strings | If set to true, truncates string values that are longer than the column's type size. The default value is false. The supported values are true and false. |
| update_on_existing_pk | Specifies the record collision policy for inserting into a table with a primary key. If set to true, any existing table record with primary key values that match those of a record being inserted will be replaced by that new record (the new data will be "upserted"). If set to false, any existing table record with primary key values that match those of a record being inserted will remain unchanged, while the new record will be rejected and the error handled as determined by ignore_existing_pk and error_handling. If the specified table does not have a primary key, then this option has no effect. The default value is false. Supported values are: true, to upsert new records when primary keys match existing records; and false, to reject new records when primary keys match existing records. |
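The precedence between update_on_existing_pk and ignore_existing_pk described above can be summarized as a small decision function. This is a minimal Python sketch of the documented rules only, not server code: the names resolve_pk_collision, Outcome, and _flag are hypothetical, and it assumes options arrive as a map of "true"/"false" strings.

```python
from enum import Enum
from typing import Dict, Optional


class Outcome(Enum):
    """What happens to a new record whose primary key matches an existing one."""
    UPSERTED = "upserted"  # existing record is replaced by the new record
    IGNORED = "ignored"    # new record is dropped with no error generated
    ERROR = "error"        # rejection is reported and handled per error_handling


def _flag(options: Dict[str, str], name: str) -> bool:
    # Assumption: option values are the strings "true"/"false"; both default to false.
    return options.get(name, "false").strip().lower() == "true"


def resolve_pk_collision(options: Dict[str, str], table_has_pk: bool = True) -> Optional[Outcome]:
    """Return the outcome of a primary key collision, or None when the
    table has no primary key (both options then have no effect)."""
    if not table_has_pk:
        return None
    if _flag(options, "update_on_existing_pk"):
        # Upsert mode is in effect, so ignore_existing_pk has no effect.
        return Outcome.UPSERTED
    if _flag(options, "ignore_existing_pk"):
        return Outcome.IGNORED
    return Outcome.ERROR
```

For example, `resolve_pk_collision({"ignore_existing_pk": "true"})` yields `Outcome.IGNORED`, while adding `"update_on_existing_pk": "true"` switches it to `Outcome.UPSERTED`. An `ERROR` outcome under error_handling set to abort stops the whole operation, since primary key collisions are abortable errors in that mode.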
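To illustrate how the three error_handling modes differ, here is a minimal Python sketch that loads comma-separated lines into a fixed number of columns. It is a hypothetical model, not the server's implementation: load_lines and LoadAborted are made-up names, and a record counts as malformed purely when its field count differs from the expected column count. Rejected records are collected as (line_number, line_rejected, error_message) tuples, mirroring the bad-record table columns, and capped by a limit that defaults to 10000 like bad_record_table_limit.

```python
from typing import List, Optional, Tuple

# One row of the bad-record table: (line_number, line_rejected, error_message).
BadRecord = Tuple[int, str, str]

MODES = {"permissive", "ignore_bad_records", "abort"}


class LoadAborted(Exception):
    """Raised when error_handling is 'abort' and a malformed record is found."""


def load_lines(
    lines: List[str],
    num_columns: int,
    error_handling: str = "abort",
    bad_record_limit: int = 10000,
) -> Tuple[List[List[Optional[str]]], List[BadRecord]]:
    """Split comma-separated lines into rows under one error_handling mode.

    Returns (loaded_rows, bad_records). Hypothetical model: a record is
    malformed when its field count differs from num_columns.
    """
    if error_handling not in MODES:
        raise ValueError(f"unsupported error_handling: {error_handling}")
    loaded: List[List[Optional[str]]] = []
    bad: List[BadRecord] = []
    for line_number, line in enumerate(lines, start=1):
        fields: List[Optional[str]] = list(line.split(","))
        if len(fields) == num_columns:
            loaded.append(fields)
            continue
        if error_handling == "permissive" and len(fields) < num_columns:
            # Missing trailing columns can be filled with nulls (None).
            loaded.append(fields + [None] * (num_columns - len(fields)))
            continue
        message = f"expected {num_columns} fields, got {len(fields)}"
        if error_handling == "abort":
            # Nothing is loaded and no bad records are written.
            raise LoadAborted(f"line {line_number}: {message}")
        # ignore_bad_records, or permissive when nulls cannot fix the record.
        if len(bad) < bad_record_limit:
            bad.append((line_number, line, message))
    return loaded, bad
```

In abort mode the exception discards everything collected so far, matching the documented behavior that the entire operation is aborted and the bad-record table is not populated.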