bad_record_table_name | Optional name of a table to which rejected records are written. The bad-record table has the following columns: line_number (long), line_rejected (string), error_message (string). When error_handling is abort, the bad-record table is not populated. |
bad_record_table_limit | A positive integer indicating the maximum number of records that can be written to the bad-record table. The default value is 10000. |
batch_size | Number of records per batch when inserting data. |
datasource_name | Name of an existing external data source from which the table will be loaded. |
error_handling | Specifies how errors should be handled upon insertion. The default value is abort. |
Supported Values | Description |
---|---|
permissive | Records with missing columns are populated with nulls if possible; otherwise, the malformed records are skipped. |
ignore_bad_records | Malformed records are skipped. |
abort | Stops current insertion and aborts entire operation when an error is encountered. Primary key collisions are considered abortable errors in this mode. |
ignore_existing_pk | Specifies the record collision error-suppression policy for inserting into a table with a primary key, only used when not in upsert mode (upsert mode is disabled when update_on_existing_pk is false). If set to true, any record being inserted that is rejected for having primary key values that match those of an existing table record will be ignored with no error generated. If false, the rejection of any record for having primary key values matching an existing record will result in an error being reported, as determined by error_handling. If the specified table does not have a primary key or if upsert mode is in effect (update_on_existing_pk is true), then this option has no effect. The default value is false. |
Supported Values | Description |
---|---|
true | Ignore new records whose primary key values collide with those of existing records. |
false | Treat as errors any new records whose primary key values collide with those of existing records. |
ingestion_mode | Whether to do a full load, dry run, or perform a type inference on the source data. The default value is full. |
Supported Values | Description |
---|---|
full | Run a type inference on the source data (if needed) and ingest. |
dry_run | Does not load data, but walks through the source data and determines the number of valid records, taking into account the current mode of error_handling. |
type_inference_only | Infer the type of the source data and return, without ingesting any data. The inferred type is returned in the response. |
jdbc_fetch_size | The JDBC fetch size, which determines how many rows to fetch per round trip. |
jdbc_session_init_statement | Statement executed once for each JDBC session before the actual load begins. The default value is ''. |
num_splits_per_rank | Optional: number of splits for reading data per rank. If left at the default value of '', external_file_reader_num_tasks is used. |
num_tasks_per_rank | Optional: number of tasks for reading data per rank. If not specified, external_file_reader_num_tasks is used. |
primary_keys | Optional: comma-separated list of column names to set as primary keys, when not specified in the type. The default value is ''. |
shard_keys | Optional: comma-separated list of column names to set as shard keys, when not specified in the type. The default value is ''. |
subscribe | Continuously poll the data source to check for new data and load it into the table. The default value is false. The supported values are true and false. |
truncate_table | If set to true, truncates the table specified by input parameter table_name prior to loading the data. The default value is false. The supported values are true and false. |
remote_query | Remote SQL query from which data will be sourced. |
remote_query_order_by | Name of the column used for splitting the query into multiple sub-queries, based on the ordering of that column. The default value is ''. |
remote_query_filter_column | Name of the column used for splitting the query into multiple sub-queries, based on the data distribution of that column. The default value is ''. |
remote_query_increasing_column | Column on subscribed remote query result that will increase for new records (e.g., TIMESTAMP). The default value is ''. |
remote_query_partition_column | Alias for remote_query_filter_column. The default value is ''. |
truncate_strings | If set to true, truncate string values that are longer than the column's type size. The default value is false. The supported values are true and false. |
update_on_existing_pk | Specifies the record collision policy for inserting into a table with a primary key. If set to true, any existing table record with primary key values that match those of a record being inserted will be replaced by that new record (the new data will be "upserted"). If set to false, any existing table record with primary key values that match those of a record being inserted will remain unchanged, while the new record will be rejected and the error handled as determined by ignore_existing_pk and error_handling. If the specified table does not have a primary key, then this option has no effect. The default value is false. |
Supported Values | Description |
---|---|
true | Upsert new records when primary keys match existing records. |
false | Reject new records when primary keys match existing records. |
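
The interaction between update_on_existing_pk, ignore_existing_pk, and error_handling can be hard to follow from the descriptions alone, so here is a minimal Python sketch of the precedence they describe. It does not call any database API. The function name `collision_outcome`, the returned labels, and the `table_has_pk` flag are hypothetical, and passing option values as the strings "true" and "false" is an assumption. The table does not say exactly what happens to a colliding record under permissive or ignore_bad_records, so the sketch only reports that the record is rejected.

```python
from typing import Dict

# Values listed for error_handling in the table above.
VALID_ERROR_HANDLING = {"permissive", "ignore_bad_records", "abort"}


def collision_outcome(options: Dict[str, str], table_has_pk: bool = True) -> str:
    """Describe what happens to a new record whose primary key values
    match those of an existing record (hypothetical helper)."""
    # Without a primary key, neither PK option has any effect.
    if not table_has_pk:
        return "no_collision"

    # Defaults taken from the option table: abort / false / false.
    error_handling = options.get("error_handling", "abort")
    if error_handling not in VALID_ERROR_HANDLING:
        raise ValueError(f"unsupported error_handling: {error_handling!r}")

    # Upsert mode takes precedence; ignore_existing_pk has no effect here.
    if options.get("update_on_existing_pk", "false") == "true":
        return "upsert"

    # Not in upsert mode: the colliding record is rejected.
    # ignore_existing_pk decides whether that rejection is an error.
    if options.get("ignore_existing_pk", "false") == "true":
        return "ignore"

    # The rejection is an error; abort mode treats it as abortable.
    if error_handling == "abort":
        return "abort"

    # permissive / ignore_bad_records: handled as determined by
    # error_handling; the table gives no further detail.
    return "reject"
```

For example, `collision_outcome({"ignore_existing_pk": "true"})` yields `"ignore"`, while an empty options map falls through to the defaults and yields `"abort"`.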