Class InsertRecordsFromFilesRequest.Options
InsertRecordsFromFilesRequest parameter options.
Optional parameters.
Field Summary
Fields
Modifier and Type    Description
static final String    Stops current insertion and aborts entire operation when an error is encountered.
static final String    Scans data to get exactly-typed and sized columns for all data scanned.
static final String    Auto detect compression type.
static final String    Avro file format.
static final String    A positive integer indicating the maximum number of records that can be written to the bad-record-table.
static final String    For subscriptions, a positive integer indicating the maximum number of records that can be written to the bad-record-table per file/payload.
static final String    Name of a table to which records that were rejected are written.
static final String    Number of records to insert per batch when inserting data.
static final String    bzip2 file compression.
static final String    For each target column specified, applies the column-property-bound format to the source data loaded into that column.
static final String    Specifies a comma-delimited list of columns from the source data to load.
static final String    Specifies a comma-delimited list of columns from the source data to skip.
static final String    Source data compression type.
static final String    Name of an existing external data source from which data file(s) specified in filepaths will be loaded.
static final String    Specifies the default format to be applied to source data loaded into columns with the corresponding column property.
static final String    Delimited text file format; e.g., CSV, TSV, PSV, etc.
static final String    A single worker process on each node loads all files that are available to it.
static final String    The head node coordinates loading data by worker processes across all nodes from shared files available to all workers.
static final String    Does not load data, but walks through the source data and determines the number of valid records, taking into account the current mode of ERROR_HANDLING.
static final String
static final String    Applies only when upserting (when update_on_existing_pk is true).
static final String    Specifies how errors should be handled upon insertion.
static final String    Reject new records when primary keys match existing records.
static final String    Specifies the type of the file(s) whose records will be inserted.
static final String    Specifies how to handle nested columns.
static final String    Run a type inference on the source data (if needed) and ingest.
static final String    Comma separated list of gdal conf options, for the specific requests: key=value.
static final String    Esri/GDB file format.
static final String    gzip file compression.
static final String    The head node loads all data.
static final String    Malformed records are skipped.
static final String    Specifies the record collision error-suppression policy for inserting into a table with a primary key, only used when not in upsert mode (upsert mode is disabled when UPDATE_ON_EXISTING_PK is FALSE).
static final String    Whether to do a full load, dry run, or perform a type inference on the source data.
static final String    JSON file format.
static final String    Number of Kafka consumer threads per rank (valid range 1-6).
static final String    The group id to be used when consuming data from a Kafka topic (valid only for Kafka datasource subscriptions).
static final String    Policy to determine whether the Kafka data consumption starts either at earliest offset or latest offset.
static final String    Enable optimistic ingestion where Kafka topic offsets and table data are committed independently to achieve parallelism.
static final String    Sets the Kafka subscription lifespan (in minutes).
static final String    Maximum time to collect Kafka messages before type inferencing on the set of them.
static final String
static final String    Geo files layer(s) name(s): comma separated.
static final String    Scheme for distributing the extraction and loading of data from the source data file(s).
static final String    Apply an offset to Avro local timestamp columns.
static final String    Max records to skip due to schema related errors, before failing.
static final String    Limit the number of records to load in this request: if this number is larger than BATCH_SIZE, then the number of records loaded will be limited to the next whole number of BATCH_SIZE (per working thread).
static final String    Specifies a comma-delimited list of column names to be used as the source-data column names.
static final String    No compression.
static final String    Number of tasks for reading file per rank.
static final String    Apache Parquet file format.
static final String    Records with missing columns are populated with nulls if possible; otherwise, the malformed records are skipped.
static final String    If TRUE, the number of seconds between attempts to load external files into the table.
static final String    Comma separated list of column names to set as primary keys, when not specified in the type.
static final String    Confluent Schema registry connection timeout (in secs).
static final String    Confluent Schema registry connection timeout (in secs).
static final String    Max records to skip due to SR connection failures, before failing.
static final String    Name of the Avro schema in the schema registry to use when reading Avro records.
static final String    ShapeFile file format.
static final String    Comma separated list of column names to set as shard keys, when not specified in the type.
static final String    Insert all records into a single table.
static final String    Skip a number of lines from the beginning of the file.
static final String    Scans data and picks the widest possible column types so that ‘all’ values will fit with minimum data scanned.
static final String    Starting offsets by partition to fetch from kafka.
static final String    Continuously poll the data source to check for new data and load it into the table.
static final String    Insertion scheme to use when inserting records from multiple shapefiles.
static final String    Insert records from each file into a new table corresponding to that file.
static final String    Specifies the character string that should be interpreted as a comment line prefix in the source data.
static final String    Specifies the character delimiting field values in the source data and field names in the header (if present).
static final String    Specifies the character that is used to escape other characters in the source data.
static final String    Indicates whether the source data contains a header row.
static final String    Specifies the delimiter for column properties in the header row (if present).
static final String    Specifies the character string that should be interpreted as a null value in the source data.
static final String    Specifies the character that should be interpreted as a field value quoting character in the source data.
static final String    Add ‘text_search’ property to internally inferenced string columns.
static final String    Set the minimum column size for strings to apply the ‘text_search’ property to.
static final String    Comma-separated expressions, one per target table column.
static final String    If set to TRUE, remove leading or trailing space from fields.
static final String    Upsert new records when primary keys match existing records.
static final String    If set to TRUE, truncate string values that are longer than the column’s type size.
static final String
static final String
static final String    Optimize type inferencing for either speed or accuracy.
static final String    Infer the type of the source data and return, without ingesting any data.
static final String    Specifies the record collision policy for inserting into a table with a primary key.
Field Details
BAD_RECORD_TABLE_NAME
Name of a table to which records that were rejected are written. The bad-record-table has the following columns: line_number (long), line_rejected (string), error_message (string). When ERROR_HANDLING is ABORT, the bad-record-table is not populated.
BAD_RECORD_TABLE_LIMIT_PER_INPUT
For subscriptions, a positive integer indicating the maximum number of records that can be written to the bad-record-table per file/payload. The default value is BAD_RECORD_TABLE_LIMIT, and the total size of the table per rank is limited to BAD_RECORD_TABLE_LIMIT.
COLUMN_FORMATS
For each target column specified, applies the column-property-bound format to the source data loaded into that column. Each column format contains a mapping of one or more of its column properties to an appropriate format for each property. Currently supported column properties include date, time, and datetime. The parameter value must be formatted as a JSON string of maps of column names to maps of column properties to their corresponding column formats, e.g., '{ "order_date" : { "date" : "%Y.%m.%d" }, "order_time" : { "time" : "%H:%M:%S" } }'.
See DEFAULT_COLUMN_FORMATS for valid format syntax.
COLUMNS_TO_LOAD
Specifies a comma-delimited list of columns from the source data to load. If more than one file is being loaded, this list applies to all files.
Column numbers can be specified discretely or as a range. For example, a value of ‘5,7,1..3’ will insert values from the fifth column in the source data into the first column in the target table, from the seventh column in the source data into the second column in the target table, and from the first through third columns in the source data into the third through fifth columns in the target table.
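The column-number mapping described above can be sketched in a few lines of Java. This is only a client-side illustration of how a value such as ‘5,7,1..3’ reads; the class and method names are hypothetical and are not part of the API, and the database performs the actual mapping itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the API: expands a COLUMNS_TO_LOAD-style
// value into the source column numbers, listed in target-column order.
final class ColumnsToLoadSketch {

    static List<Integer> expand(String spec) {
        List<Integer> sourceColumns = new ArrayList<>();
        for (String part : spec.split(",")) {
            String token = part.trim();
            int dots = token.indexOf("..");
            if (dots < 0) {
                // A discrete column number, e.g. "5"
                sourceColumns.add(Integer.parseInt(token));
            } else {
                // An inclusive range, e.g. "1..3"
                int first = Integer.parseInt(token.substring(0, dots));
                int last = Integer.parseInt(token.substring(dots + 2));
                for (int column = first; column <= last; column++) {
                    sourceColumns.add(column);
                }
            }
        }
        return sourceColumns;
    }

    public static void main(String[] args) {
        List<Integer> source = expand("5,7,1..3");
        for (int i = 0; i < source.size(); i++) {
            // The i-th listed source column feeds the i-th target column.
            System.out.println("source column " + source.get(i)
                    + " -> target column " + (i + 1));
        }
    }
}
```

For ‘5,7,1..3’ the expanded list is 5, 7, 1, 2, 3, so source columns 1 through 3 land in target columns 3 through 5, matching the description above.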
If the source data contains a header, column names matching the file header names may be provided instead of column numbers. If the target table doesn’t exist, the table will be created with the columns in this order. If the target table does exist with columns in a different order than the source data, this list can be used to match the order of the target table. For example, a value of ‘C, B, A’ will create a three column table with column C, followed by column B, followed by column A; or will insert those fields in that order into a table created with columns in that order. If the target table exists, the column names must match the source data field names for a name-mapping to be successful.
Mutually exclusive with COLUMNS_TO_SKIP.
COLUMNS_TO_SKIP
Specifies a comma-delimited list of columns from the source data to skip. Mutually exclusive with COLUMNS_TO_LOAD.
DEFAULT_COLUMN_FORMATS
Specifies the default format to be applied to source data loaded into columns with the corresponding column property. Currently supported column properties include date, time, and datetime. This default column-property-bound format can be overridden by specifying a column property and format for a given target column in COLUMN_FORMATS. For each specified annotation, the format will apply to all columns with that annotation unless a custom COLUMN_FORMATS for that annotation is specified.
The parameter value must be formatted as a JSON string that is a map of column properties to their respective column formats, e.g., '{ "date" : "%Y.%m.%d", "time" : "%H:%M:%S" }'. Column formats are specified as a string of control characters and plain text. The supported control characters are ‘Y’, ‘m’, ‘d’, ‘H’, ‘M’, and ‘S’, which follow the Linux ‘strptime()’ specification, as well as ‘s’, which specifies seconds and fractional seconds (though the fractional component will be truncated past milliseconds).
Formats for the ‘date’ annotation must include the ‘Y’, ‘m’, and ‘d’ control characters. Formats for the ‘time’ annotation must include the ‘H’, ‘M’, and either ‘S’ or ‘s’ (but not both) control characters. Formats for the ‘datetime’ annotation must meet both the ‘date’ and ‘time’ control character requirements. For example, '{ "datetime" : "%m/%d/%Y %H:%M:%S" }' would be used to interpret text as “05/04/2000 12:12:11”.
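To make the control characters concrete, the sketch below translates a format such as "%m/%d/%Y %H:%M:%S" into a java.time pattern and parses the example text from above. It is an approximation for illustration only: the class and method names are hypothetical, the ‘s’ control character (fractional seconds) is deliberately not handled, java.time is stricter about digit counts than ‘strptime()’, and the database applies these formats on its own side.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of the API: maps the control characters
// Y, m, d, H, M and S onto an equivalent java.time pattern.
final class ColumnFormatSketch {

    static String toJavaPattern(String format) {
        StringBuilder pattern = new StringBuilder();
        for (int i = 0; i < format.length(); i++) {
            char c = format.charAt(i);
            if (c == '%' && i + 1 < format.length()) {
                char control = format.charAt(++i);
                switch (control) {
                    case 'Y': pattern.append("uuuu"); break; // four-digit year
                    case 'm': pattern.append("MM"); break;   // month
                    case 'd': pattern.append("dd"); break;   // day of month
                    case 'H': pattern.append("HH"); break;   // hour (00-23)
                    case 'M': pattern.append("mm"); break;   // minute
                    case 'S': pattern.append("ss"); break;   // whole seconds
                    default:
                        throw new IllegalArgumentException(
                                "Control character not handled in this sketch: " + control);
                }
            } else if (c == '\'') {
                pattern.append("''"); // a literal single quote
            } else {
                // Quote plain text so java.time treats it literally.
                pattern.append('\'').append(c).append('\'');
            }
        }
        return pattern.toString();
    }

    static LocalDateTime parse(String text, String format) {
        return LocalDateTime.parse(text, DateTimeFormatter.ofPattern(toJavaPattern(format)));
    }

    public static void main(String[] args) {
        System.out.println(parse("05/04/2000 12:12:11", "%m/%d/%Y %H:%M:%S"));
    }
}
```

Here the text is read as month 05, day 04, year 2000, which is the interpretation the ‘datetime’ example above describes.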
ERROR_HANDLING
Specifies how errors should be handled upon insertion. Supported values:
PERMISSIVE: Records with missing columns are populated with nulls if possible; otherwise, the malformed records are skipped.
IGNORE_BAD_RECORDS: Malformed records are skipped.
ABORT: Stops current insertion and aborts entire operation when an error is encountered. Primary key collisions are considered abortable errors in this mode.
The default value is ABORT.
FILE_TYPE
Specifies the type of the file(s) whose records will be inserted. Supported values:
AVRO: Avro file format.
DELIMITED_TEXT: Delimited text file format; e.g., CSV, TSV, PSV, etc.
GDB: Esri/GDB file format.
JSON: JSON file format.
PARQUET: Apache Parquet file format.
SHAPEFILE: ShapeFile file format.
The default value is DELIMITED_TEXT.
IGNORE_EXISTING_PK
Specifies the record collision error-suppression policy for inserting into a table with a primary key, only used when not in upsert mode (upsert mode is disabled when UPDATE_ON_EXISTING_PK is FALSE). If set to TRUE, any record being inserted that is rejected for having primary key values that match those of an existing table record will be ignored with no error generated. If FALSE, the rejection of any record for having primary key values matching an existing record will result in an error being reported, as determined by ERROR_HANDLING. If the specified table does not have a primary key or if upsert mode is in effect (UPDATE_ON_EXISTING_PK is TRUE), then this option has no effect. Supported values:
TRUE: Ignore new records whose primary key values collide with those of existing records.
FALSE: Treat as errors any new records whose primary key values collide with those of existing records.
The default value is FALSE.
INGESTION_MODE
Whether to do a full load, dry run, or perform a type inference on the source data. Supported values:
FULL: Run a type inference on the source data (if needed) and ingest.
DRY_RUN: Does not load data, but walks through the source data and determines the number of valid records, taking into account the current mode of ERROR_HANDLING.
TYPE_INFERENCE_ONLY: Infer the type of the source data and return, without ingesting any data. The inferred type is returned in the response.
The default value is FULL.
DRY_RUN
Does not load data, but walks through the source data and determines the number of valid records, taking into account the current mode of ERROR_HANDLING.
LOADING_MODE
Scheme for distributing the extraction and loading of data from the source data file(s). This option applies only when loading files that are local to the database. Supported values:
HEAD: The head node loads all data. All files must be available to the head node.
DISTRIBUTED_SHARED: The head node coordinates loading data by worker processes across all nodes from shared files available to all workers. NOTE: Instead of existing on a shared source, the files can be duplicated on a source local to each host to improve performance, though the files must appear as the same data set from the perspective of all hosts performing the load.
DISTRIBUTED_LOCAL: A single worker process on each node loads all files that are available to it. This option works best when each worker loads files from its own file system, to maximize performance. In order to avoid data duplication, either each worker performing the load needs to have visibility to a set of files unique to it (no file is visible to more than one node) or the target table needs to have a primary key (which will allow the worker to automatically deduplicate data). NOTE: If the target table doesn’t exist, the table structure will be determined by the head node. If the head node has no files local to it, it will be unable to determine the structure and the request will fail. If the head node is configured to have no worker processes, no data strictly accessible to the head node will be loaded.
The default value is HEAD.
DISTRIBUTED_SHARED
The head node coordinates loading data by worker processes across all nodes from shared files available to all workers.
NOTE: Instead of existing on a shared source, the files can be duplicated on a source local to each host to improve performance, though the files must appear as the same data set from the perspective of all hosts performing the load.
DISTRIBUTED_LOCAL
A single worker process on each node loads all files that are available to it. This option works best when each worker loads files from its own file system, to maximize performance. In order to avoid data duplication, either each worker performing the load needs to have visibility to a set of files unique to it (no file is visible to more than one node) or the target table needs to have a primary key (which will allow the worker to automatically deduplicate data).
NOTE: If the target table doesn’t exist, the table structure will be determined by the head node. If the head node has no files local to it, it will be unable to determine the structure and the request will fail.
If the head node is configured to have no worker processes, no data strictly accessible to the head node will be loaded.
MAX_RECORDS_TO_LOAD
Limits the number of records to load in this request. If this number is larger than BATCH_SIZE, the number of records loaded will be limited to the next whole multiple of BATCH_SIZE (per working thread).
NAME_COLUMNS_FROM_FILE
Specifies a comma-delimited list of column names to be used as the source-data column names. If the file has a header row (i.e., TEXT_HAS_HEADER is TRUE), these names override the file’s header names. If the file has no header row, these names are used as the source-data column names. Either way, the i-th name in this list applies to the i-th column in the file, enabling name-based matching against the target table’s columns (and use with COLUMNS_TO_LOAD/COLUMNS_TO_SKIP).
POLL_INTERVAL
If TRUE, the number of seconds between attempts to load external files into the table. If zero, polling will be continuous as long as data is found. If no data is found, the interval will steadily increase to a maximum of 60 seconds. The default value is ‘0’.
TABLE_INSERT_MODE
Insertion scheme to use when inserting records from multiple shapefiles. Supported values:
SINGLE: Insert all records into a single table.
TABLE_PER_FILE: Insert records from each file into a new table corresponding to that file.
The default value is SINGLE.
TEXT_COMMENT_STRING
Specifies the character string that should be interpreted as a comment line prefix in the source data. All lines in the data starting with the provided string are ignored.
For DELIMITED_TEXT FILE_TYPE only. The default value is ‘#’.
TEXT_DELIMITER
Specifies the character delimiting field values in the source data and field names in the header (if present).
For DELIMITED_TEXT FILE_TYPE only. The default value is ‘,’.
TEXT_ESCAPE_CHARACTER
Specifies the character that is used to escape other characters in the source data.
An ‘a’, ‘b’, ‘f’, ‘n’, ‘r’, ‘t’, or ‘v’ preceded by an escape character will be interpreted as the ASCII bell, backspace, form feed, line feed, carriage return, horizontal tab, and vertical tab, respectively. For example, the escape character followed by an ‘n’ will be interpreted as a newline within a field value.
The escape character can also be used to escape the quoting character, and will be treated as an escape character whether it is within a quoted field value or not.
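The escape rules above can be illustrated with a small Java sketch. The class and method names are hypothetical, and two behaviors are assumptions rather than documented facts: any other escaped character (such as the quoting character) is emitted literally, and a trailing escape character with nothing after it is kept as-is.

```java
// Hypothetical helper, not part of the API: applies the escape rules
// described above to a single field value.
final class TextEscapeSketch {

    static String unescape(String field, char escape) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < field.length(); i++) {
            char c = field.charAt(i);
            if (c != escape || i + 1 == field.length()) {
                // Ordinary character, or a trailing escape (assumption: kept as-is).
                out.append(c);
                continue;
            }
            char next = field.charAt(++i);
            switch (next) {
                case 'a': out.append((char) 0x07); break; // ASCII bell
                case 'b': out.append('\b'); break;        // backspace
                case 'f': out.append('\f'); break;        // form feed
                case 'n': out.append('\n'); break;        // line feed
                case 'r': out.append('\r'); break;        // carriage return
                case 't': out.append('\t'); break;        // horizontal tab
                case 'v': out.append((char) 0x0B); break; // vertical tab
                default:  out.append(next);               // assumption: literal character
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // With a backslash as the escape character, the two characters
        // backslash + 'n' become one line feed inside the field value.
        System.out.println(unescape("first\\nsecond", '\\'));
    }
}
```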
For DELIMITED_TEXT FILE_TYPE only.
TEXT_HEADER_PROPERTY_DELIMITER
Specifies the delimiter for column properties in the header row (if present). Cannot be set to the same value as TEXT_DELIMITER.
For DELIMITED_TEXT FILE_TYPE only. The default value is ‘|’.
TEXT_NULL_STRING
Specifies the character string that should be interpreted as a null value in the source data.
For DELIMITED_TEXT FILE_TYPE only. The default value is ‘\N’.
TEXT_QUOTE_CHARACTER
Specifies the character that should be interpreted as a field value quoting character in the source data. The character must appear at the beginning and end of a field value to take effect. Delimiters within quoted fields are treated as literals and not delimiters. Within a quoted field, two consecutive quote characters will be interpreted as a single literal quote character, effectively escaping it. To not have a quote character, specify an empty string.
For DELIMITED_TEXT FILE_TYPE only. The default value is ‘"’.
TEXT_SEARCH_COLUMNS
Adds the ‘text_search’ property to internally inferenced string columns. The value is a comma-separated list of column names, or ‘*’ for all columns. To add the ‘text_search’ property only to string columns greater than or equal to a minimum size, also set TEXT_SEARCH_MIN_COLUMN_LENGTH.
TEXT_SEARCH_MIN_COLUMN_LENGTH
Sets the minimum column size for strings to apply the ‘text_search’ property to. Used only when TEXT_SEARCH_COLUMNS has a value.
TYPE_INFERENCE_MODE
Optimize type inferencing for either speed or accuracy. Supported values:
ACCURACY: Scans data to get exactly-typed and sized columns for all data scanned.
SPEED: Scans data and picks the widest possible column types so that ‘all’ values will fit with minimum data scanned.
The default value is ACCURACY.
ENABLE_INPLACE_UPDATES
Applies only when upserting (when update_on_existing_pk is true). If set to true (the default), an existing record matched by primary key is modified in place. If set to false, the matched record is updated by deleting it and inserting a replacement (delete and insert), which prevents the change from being reflected in dependent materialized views until they are refreshed. Supported values:
TRUE
FALSE
The default value is TRUE.
UPDATE_ON_EXISTING_PK
Specifies the record collision policy for inserting into a table with a primary key. If set to TRUE, any existing table record with primary key values that match those of a record being inserted will be replaced by that new record (the new data will be ‘upserted’). If set to FALSE, any existing table record with primary key values that match those of a record being inserted will remain unchanged, while the new record will be rejected and the error handled as determined by IGNORE_EXISTING_PK and ERROR_HANDLING. If the specified table does not have a primary key, then this option has no effect. Supported values:
TRUE: Upsert new records when primary keys match existing records.
FALSE: Reject new records when primary keys match existing records.
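How UPDATE_ON_EXISTING_PK and IGNORE_EXISTING_PK combine on a primary key collision can be summarized as a small decision function. This is a sketch of the rules as described in this section, assuming the target table has a primary key; the class, enum, and method names are hypothetical and not part of the API.

```java
// Hypothetical illustration, not part of the API: the outcome for a new
// record whose primary key matches an existing record.
final class CollisionPolicySketch {

    enum Outcome { UPSERT, SKIP_SILENTLY, REJECT_WITH_ERROR }

    static Outcome onPrimaryKeyCollision(boolean updateOnExistingPk, boolean ignoreExistingPk) {
        if (updateOnExistingPk) {
            // Upsert mode: the existing record is replaced, and
            // IGNORE_EXISTING_PK has no effect.
            return Outcome.UPSERT;
        }
        if (ignoreExistingPk) {
            // The new record is dropped and no error is generated.
            return Outcome.SKIP_SILENTLY;
        }
        // The new record is rejected; what happens next is decided by
        // ERROR_HANDLING (for example, ABORT stops the whole operation).
        return Outcome.REJECT_WITH_ERROR;
    }

    public static void main(String[] args) {
        for (boolean update : new boolean[] {true, false}) {
            for (boolean ignore : new boolean[] {true, false}) {
                System.out.println("UPDATE_ON_EXISTING_PK=" + update
                        + ", IGNORE_EXISTING_PK=" + ignore
                        + " -> " + onPrimaryKeyCollision(update, ignore));
            }
        }
    }
}
```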
The default value is FALSE.
TRANSFORMATIONS
Comma-separated expressions, one per target table column. Each expression is evaluated per record. Empty entries (two consecutive commas) mean no transformation for that column; the value is then resolved from the input record, table default, NULL, or an error. Expressions may reference input columns by name or by position (2 for the second, etc.). The default value is ‘’ (an empty string).
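The positional layout of a TRANSFORMATIONS value can be shown with a short sketch that reports which target columns have an empty entry. The class and method names are hypothetical, and the plain comma split is a simplifying assumption: it would mis-split any expression that itself contains a comma, so it should not be read as how the database parses the value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the API: finds the 1-based target column
// positions whose entry is empty, i.e. columns with no transformation.
final class TransformationsSketch {

    static List<Integer> untransformedColumns(String transformations) {
        List<Integer> positions = new ArrayList<>();
        if (transformations.isEmpty()) {
            return positions; // the default value: no transformations listed at all
        }
        // A limit of -1 keeps trailing empty entries such as "a,,".
        String[] entries = transformations.split(",", -1);
        for (int i = 0; i < entries.length; i++) {
            if (entries[i].trim().isEmpty()) {
                positions.add(i + 1);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // "2,,1": target column 1 reads input column 2, target column 2 has
        // no transformation, and target column 3 reads input column 1.
        System.out.println(untransformedColumns("2,,1"));
    }
}
```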