Insert Records from external database

Computes the result of a remote query and inserts the result data into a new or existing table.

Input Parameter Description

Name | Type | Description
table_name | string | Name of the table into which the data will be inserted, in [schema_name.]table_name format, using standard name resolution rules. If the table does not exist, the table will be created using either an existing type_id or the type inferred from the remote query, and the new table name will have to meet standard table naming criteria.
remote_query | string | Query for which result data needs to be imported.
modify_columns | map of string to maps of string to strings | Not implemented yet. The default value is an empty map ( {} ).
create_table_options | map of string to strings

Options used when creating the target table. The default value is an empty map ( {} ).

Supported Parameters (keys) | Parameter Description
type_id | ID of a currently registered type. The default value is ''.
no_error_if_exists

If true, prevents an error from occurring if the table already exists and is of the given type. If a table with the same ID but a different type exists, it is still an error. The default value is false. The supported values are:

  • true
  • false
is_replicated

Affects the distribution scheme for the table's data. If true and the given type has no explicit shard key defined, the table will be replicated. If false, the table will be sharded according to the shard key specified in the given type_id, or randomly sharded, if no shard key is specified. Note that a type containing a shard key cannot be used to create a replicated table. The default value is false. The supported values are:

  • true
  • false
foreign_keys | Semicolon-separated list of foreign keys, of the format '(source_column_name [, ...]) references target_table_name(primary_key_column_name [, ...]) [as foreign_key_name]'.
foreign_shard_key | Foreign shard key of the format 'source_column references shard_by_column from target_table(primary_key_column)'.
partition_type

Partitioning scheme to use.

Supported Values | Description
RANGE | Use range partitioning.
INTERVAL | Use interval partitioning.
LIST | Use list partitioning.
HASH | Use hash partitioning.
SERIES | Use series partitioning.
partition_keys | Comma-separated list of partition keys, which are the columns or column expressions by which records will be assigned to partitions defined by partition_definitions.
partition_definitions | Comma-separated list of partition definitions, whose format depends on the choice of partition_type. See range partitioning, interval partitioning, list partitioning, hash partitioning, or series partitioning for example formats.
is_automatic_partition

If true, a new partition will be created for values which don't fall into an existing partition. Currently only supported for list partitions. The default value is false. The supported values are:

  • true
  • false
ttl | Sets the TTL of the table specified in input parameter table_name.
chunk_size | Indicates the number of records per chunk to be used for this table.
is_result_table

Indicates whether the table is a memory-only table. A result table cannot contain columns with text_search data-handling, and it will not be retained if the server is restarted. The default value is false. The supported values are:

  • true
  • false
strategy_definition | The tier strategy for the table and its columns.
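As an illustration of the create_table_options map described above, the following is a minimal sketch in Python that assembles such a map without contacting a server. The helper name build_create_table_options and the column name region are hypothetical and not part of the API. The only rules encoded are the ones stated in the parameter descriptions: all values are strings, partition_keys is a comma-separated list, and is_automatic_partition is currently only supported for list partitions.

```python
# Sketch only: builds a create_table_options map; no database client is used.
PARTITION_TYPES = {"RANGE", "INTERVAL", "LIST", "HASH", "SERIES"}


def build_create_table_options(partition_type=None, partition_keys=(),
                               is_automatic_partition=False,
                               is_replicated=False,
                               no_error_if_exists=False):
    """Return create_table_options as a map of string to string (hypothetical helper)."""
    options = {
        # Boolean options are passed as the strings "true" / "false".
        "no_error_if_exists": str(no_error_if_exists).lower(),
        "is_replicated": str(is_replicated).lower(),
    }
    if partition_type is not None:
        if partition_type not in PARTITION_TYPES:
            raise ValueError(f"unsupported partition_type: {partition_type}")
        options["partition_type"] = partition_type
        # partition_keys is documented as a comma-separated list.
        options["partition_keys"] = ",".join(partition_keys)
    if is_automatic_partition:
        # Documented as currently only supported for list partitions.
        if partition_type != "LIST":
            raise ValueError("is_automatic_partition requires partition_type LIST")
        options["is_automatic_partition"] = "true"
    return options


opts = build_create_table_options(
    partition_type="LIST",
    partition_keys=["region"],  # hypothetical column name
    is_automatic_partition=True,
    no_error_if_exists=True,
)
print(opts)
```

The sketch leaves out partition_definitions because its format depends on the chosen partition_type and is described in the partitioning documentation rather than here.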
options | map of string to strings

Optional parameters. The default value is an empty map ( {} ).

Supported Parameters (keys) | Parameter Description
bad_record_table_name | Optional name of a table to which rejected records are written. The bad-record table has the following columns: line_number (long), line_rejected (string), error_message (string). When error_handling is abort, the bad-record table is not populated.
bad_record_table_limit | A positive integer indicating the maximum number of records that can be written to the bad-record table. The default value is 10000.
batch_size | Number of records per batch when inserting data.
datasource_name | Name of an existing external data source from which the table will be loaded.
error_handling

Specifies how errors should be handled upon insertion. The default value is abort.

Supported Values | Description
permissive | Records with missing columns are populated with nulls if possible; otherwise, the malformed records are skipped.
ignore_bad_records | Malformed records are skipped.
abort | Stops current insertion and aborts entire operation when an error is encountered. Primary key collisions are considered abortable errors in this mode.
ignore_existing_pk

Specifies the record collision error-suppression policy for inserting into a table with a primary key, only used when not in upsert mode (upsert mode is disabled when update_on_existing_pk is false). If set to true, any record being inserted that is rejected for having primary key values that match those of an existing table record will be ignored with no error generated. If false, the rejection of any record for having primary key values matching an existing record will result in an error being reported, as determined by error_handling. If the specified table does not have a primary key or if upsert mode is in effect (update_on_existing_pk is true), then this option has no effect. The default value is false.

Supported Values | Description
true | Ignore new records whose primary key values collide with those of existing records.
false | Treat as errors any new records whose primary key values collide with those of existing records.
ingestion_mode

Whether to do a full load, dry run, or perform a type inference on the source data. The default value is full.

Supported Values | Description
full | Run a type inference on the source data (if needed) and ingest.
dry_run | Does not load data, but walks through the source data and determines the number of valid records, taking into account the current mode of error_handling.
type_inference_only | Infer the type of the source data and return, without ingesting any data. The inferred type is returned in the response.
jdbc_fetch_size | The JDBC fetch size, which determines how many rows to fetch per round trip.
jdbc_session_init_statement | Statement executed once per JDBC session before the load begins. The default value is ''.
num_splits_per_rank | Optional: number of splits for reading data per rank. If not specified, external_file_reader_num_tasks is used. The default value is ''.
num_tasks_per_rank | Optional: number of tasks for reading data per rank. If not specified, external_file_reader_num_tasks is used.
primary_keys | Optional: comma-separated list of column names to set as primary keys, when not specified in the type. The default value is ''.
shard_keys | Optional: comma-separated list of column names to set as shard keys, when not specified in the type. The default value is ''.
subscribe

Continuously poll the data source to check for new data and load it into the table. The default value is false. The supported values are:

  • true
  • false
truncate_table

If set to true, truncates the table specified by input parameter table_name prior to loading the data. The default value is false. The supported values are:

  • true
  • false
remote_query | Remote SQL query from which data will be sourced.
remote_query_order_by | Name of column to be used for splitting the query into multiple sub-queries using ordering of given column. The default value is ''.
remote_query_filter_column | Name of column to be used for splitting the query into multiple sub-queries using the data distribution of given column. The default value is ''.
remote_query_increasing_column | Column on subscribed remote query result that will increase for new records (e.g., TIMESTAMP). The default value is ''.
remote_query_partition_column | Alias name for remote_query_filter_column. The default value is ''.
truncate_strings

If set to true, truncate string values that are longer than the column's type size. The default value is false. The supported values are:

  • true
  • false
update_on_existing_pk

Specifies the record collision policy for inserting into a table with a primary key. If set to true, any existing table record with primary key values that match those of a record being inserted will be replaced by that new record (the new data will be "upserted"). If set to false, any existing table record with primary key values that match those of a record being inserted will remain unchanged, while the new record will be rejected and the error handled as determined by ignore_existing_pk and error_handling. If the specified table does not have a primary key, then this option has no effect. The default value is false.

Supported Values | Description
true | Upsert new records when primary keys match existing records.
false | Reject new records when primary keys match existing records.
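The interaction between update_on_existing_pk, ignore_existing_pk, and error_handling can be easier to follow as a small decision function. The sketch below is one reading of the descriptions above, not the server's implementation. The function name resolve_pk_collision and the outcome labels are hypothetical, and it assumes option values arrive as the strings "true" and "false", as in the options map.

```python
# Sketch only: models the documented primary key collision rules in plain Python.
def resolve_pk_collision(options, table_has_primary_key=True):
    """Return a label for what happens to a new record whose primary key
    matches an existing record (hypothetical helper and labels)."""

    def flag(name):
        # Options are a map of string to string; both flags default to false.
        return options.get(name, "false") == "true"

    if not table_has_primary_key:
        # Without a primary key, neither option has any effect.
        return "insert"
    if flag("update_on_existing_pk"):
        # Upsert mode: the existing record is replaced by the new one.
        return "upsert"
    if flag("ignore_existing_pk"):
        # The new record is dropped and no error is generated.
        return "ignore"
    # Otherwise the rejection is reported as determined by error_handling.
    if options.get("error_handling", "abort") == "abort":
        # Collisions are considered abortable errors in abort mode.
        return "abort"
    return "reject"


print(resolve_pk_collision({}))
print(resolve_pk_collision({"ignore_existing_pk": "true"}))
print(resolve_pk_collision({"update_on_existing_pk": "true",
                            "ignore_existing_pk": "true"}))
```

Note that when update_on_existing_pk is true, the value of ignore_existing_pk is never consulted, which matches the statement that it has no effect in upsert mode.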

Output Parameter Description

Name | Type | Description
table_name | string | Value of input parameter table_name.
type_id | string | ID of the currently registered table structure type for the target table.
type_definition | string | A JSON string describing the columns of the target table.
type_label | string | The user-defined description associated with the target table's structure.
type_properties | map of string to arrays of strings | A mapping of each target table column name to an array of column properties associated with that column.
count_inserted | long | Number of records inserted into the target table.
count_skipped | long | Number of records skipped, when not running in abort error handling mode.
count_updated | long | [Not yet implemented] Number of records updated within the target table.
info | map of string to strings | Additional information.
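To show how the output fields might be consumed, here is a minimal Python sketch that summarizes a response. The response dictionary is invented for illustration: the field names follow the output table above, but every value, the internal layout of the type_definition JSON, and the "primary_key" property string are assumptions. The helper name summarize_response is also hypothetical.

```python
import json

# Hypothetical response: keys follow the output parameter table, values are invented.
response = {
    "table_name": "example.orders",
    "type_id": "hypothetical-type-id",
    # type_definition is documented as a JSON string; this layout is assumed.
    "type_definition": json.dumps(
        {"fields": [{"name": "id", "type": "long"},
                    {"name": "note", "type": "string"}]}
    ),
    "type_label": "",
    "type_properties": {"id": ["primary_key"], "note": []},
    "count_inserted": 120,
    "count_skipped": 3,
    "count_updated": 0,  # documented as not yet implemented
    "info": {},
}


def summarize_response(resp):
    """Collect the record counts and column names from a response (hypothetical helper)."""
    definition = json.loads(resp["type_definition"])  # JSON string -> dict
    columns = [field["name"] for field in definition.get("fields", [])]
    return {
        "table": resp["table_name"],
        "columns": columns,
        "inserted": resp["count_inserted"],
        "skipped": resp["count_skipped"],
        # Columns that carry at least one property in type_properties.
        "columns_with_properties": sorted(
            name for name, props in resp["type_properties"].items() if props
        ),
    }


summary = summarize_response(response)
print(summary)
```

Since count_skipped is only populated when not running in abort mode, a caller may want to check error_handling on the request before interpreting a zero there.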