Class InsertRecordsFromPayloadRequest.Options

java.lang.Object
com.gpudb.protocol.InsertRecordsFromPayloadRequest.Options

public static final class InsertRecordsFromPayloadRequest.Options extends Object
A set of string constants for the InsertRecordsFromPayloadRequest parameter options.

Optional parameters.

  • Field Details

    • BAD_RECORD_TABLE_NAME

      public static final String BAD_RECORD_TABLE_NAME
      Name of a table to which records that were rejected are written. The bad-record-table has the following columns: line_number (long), line_rejected (string), error_message (string).
    • BAD_RECORD_TABLE_LIMIT

      public static final String BAD_RECORD_TABLE_LIMIT
      A positive integer indicating the maximum number of records that can be written to the bad-record-table. Default value is 10000.
    • BAD_RECORD_TABLE_LIMIT_PER_INPUT

      public static final String BAD_RECORD_TABLE_LIMIT_PER_INPUT
      For subscriptions: a positive integer indicating the maximum number of records that can be written to the bad-record-table per file/payload. The default value is the value of ‘bad_record_table_limit’, and the total size of the table per rank is also limited to ‘bad_record_table_limit’.
    • BATCH_SIZE

      public static final String BATCH_SIZE
      Internal tuning parameter: the number of records per batch when inserting data.
    • COLUMN_FORMATS

      public static final String COLUMN_FORMATS
      For each target column specified, applies the column-property-bound format to the source data loaded into that column. Each column format will contain a mapping of one or more of its column properties to an appropriate format for each property. Currently supported column properties include date, time, and datetime. The parameter value must be formatted as a JSON string of maps of column names to maps of column properties to their corresponding column formats, e.g., '{"order_date" : {"date" : "%Y.%m.%d"}, "order_time" : {"time" : "%H:%M:%S"}}'.

      See DEFAULT_COLUMN_FORMATS for valid format syntax.
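
      The nested mapping described above can be assembled programmatically. The following is a minimal sketch, not part of the API: the class ColumnFormatsExample and its methods are hypothetical helpers, and the hand-rolled JSON writer only handles simple names and formats. It produces a string in the column-name to property to format shape that COLUMN_FORMATS expects.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of the API): builds a COLUMN_FORMATS value.
final class ColumnFormatsExample {

    /** Serializes column name -> (column property -> format) as a JSON object of objects. */
    static String toJson(Map<String, Map<String, String>> formats) {
        StringJoiner outer = new StringJoiner(", ", "{", "}");
        for (Map.Entry<String, Map<String, String>> column : formats.entrySet()) {
            StringJoiner inner = new StringJoiner(", ", "{", "}");
            for (Map.Entry<String, String> property : column.getValue().entrySet()) {
                inner.add(quote(property.getKey()) + ": " + quote(property.getValue()));
            }
            outer.add(quote(column.getKey()) + ": " + inner);
        }
        return outer.toString();
    }

    // Minimal JSON string quoting; enough for names and formats without control characters.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Rebuilds the order_date / order_time example from the description. */
    static String example() {
        Map<String, Map<String, String>> formats = new LinkedHashMap<>();
        formats.put("order_date", Map.of("date", "%Y.%m.%d"));
        formats.put("order_time", Map.of("time", "%H:%M:%S"));
        return toJson(formats);
    }
}
```

      The resulting string would then be stored in the request's options map under the COLUMN_FORMATS key.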

    • COLUMNS_TO_LOAD

      public static final String COLUMNS_TO_LOAD
      Specifies a comma-delimited list of columns from the source data to load. If more than one file is being loaded, this list applies to all files.

      Column numbers can be specified discretely or as a range. For example, a value of ‘5,7,1..3’ will insert values from the fifth column in the source data into the first column in the target table, from the seventh column in the source data into the second column in the target table, and from the first through third columns in the source data into the third through fifth columns in the target table.

      If the source data contains a header, column names matching the file header names may be provided instead of column numbers. If the target table doesn’t exist, the table will be created with the columns in this order. If the target table does exist with columns in a different order than the source data, this list can be used to match the order of the target table. For example, a value of ‘C, B, A’ will create a three column table with column C, followed by column B, followed by column A; or will insert those fields in that order into a table created with columns in that order. If the target table exists, the column names must match the source data field names for a name-mapping to be successful.

      Mutually exclusive with COLUMNS_TO_SKIP.
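
      To make the numeric form concrete, the sketch below expands a specification such as ‘5,7,1..3’ into the ordered list of source column numbers. The class ColumnsToLoadExample is a hypothetical illustration of the mapping rule described above, not the database's own parser; it handles only column numbers, not header names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of how a numeric COLUMNS_TO_LOAD value maps columns.
final class ColumnsToLoadExample {

    /**
     * Expands a spec such as "5,7,1..3" into 1-based source column numbers.
     * The value at list index i is loaded into target column i + 1.
     */
    static List<Integer> expand(String spec) {
        List<Integer> sourceColumns = new ArrayList<>();
        for (String part : spec.split(",")) {
            String token = part.trim();
            int dots = token.indexOf("..");
            if (dots < 0) {
                // A single column number.
                sourceColumns.add(Integer.parseInt(token));
            } else {
                // An inclusive range written as first..last.
                int first = Integer.parseInt(token.substring(0, dots).trim());
                int last = Integer.parseInt(token.substring(dots + 2).trim());
                for (int column = first; column <= last; column++) {
                    sourceColumns.add(column);
                }
            }
        }
        return sourceColumns;
    }
}
```

      For ‘5,7,1..3’ this yields the source columns 5, 7, 1, 2, 3, which correspond to target columns 1 through 5 in that order.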

    • COLUMNS_TO_SKIP

      public static final String COLUMNS_TO_SKIP
      Specifies a comma-delimited list of columns from the source data to skip. Mutually exclusive with COLUMNS_TO_LOAD.
    • COMPRESSION_TYPE

      public static final String COMPRESSION_TYPE
      Payload compression type. Supported values:
      • NONE: Uncompressed.
      • AUTO: Default. Auto detect compression type.
      • GZIP: gzip file compression.
      • BZIP2: bzip2 file compression.
      The default value is AUTO.
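
      As an illustration of the GZIP value, the sketch below compresses payload bytes on the client with the JDK's java.util.zip classes. It assumes the payload is compressed by the caller before the request is sent; the class GzipPayloadExample is hypothetical and not part of the API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: gzip-compresses payload bytes before they are sent.
final class GzipPayloadExample {

    /** Returns the gzip-compressed form of the given bytes. */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Decompresses gzip data; used here only to check the round trip. */
    static byte[] gunzip(byte[] compressed) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gz.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

      With a payload compressed this way, COMPRESSION_TYPE could be set to GZIP explicitly or left at AUTO.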
    • NONE

      public static final String NONE
      Uncompressed.
    • AUTO

      public static final String AUTO
      Default. Auto detect compression type.
    • GZIP

      public static final String GZIP
      gzip file compression.
    • BZIP2

      public static final String BZIP2
      bzip2 file compression.
    • DEFAULT_COLUMN_FORMATS

      public static final String DEFAULT_COLUMN_FORMATS
      Specifies the default format to be applied to source data loaded into columns with the corresponding column property. Currently supported column properties include date, time, and datetime. This default column-property-bound format can be overridden by specifying a column property and format for a given target column in COLUMN_FORMATS. For each specified annotation, the format will apply to all columns with that annotation unless a custom COLUMN_FORMATS for that annotation is specified.

      The parameter value must be formatted as a JSON string that is a map of column properties to their respective column formats, e.g., '{"date" : "%Y.%m.%d", "time" : "%H:%M:%S"}'. Column formats are specified as a string of control characters and plain text. The supported control characters are ‘Y’, ‘m’, ‘d’, ‘H’, ‘M’, and ‘S’, which follow the Linux ‘strptime()’ specification, as well as ‘s’, which specifies seconds and fractional seconds (though the fractional component will be truncated past milliseconds).

      Formats for the ‘date’ annotation must include the ‘Y’, ‘m’, and ‘d’ control characters. Formats for the ‘time’ annotation must include the ‘H’, ‘M’, and either ‘S’ or ‘s’ (but not both) control characters. Formats for the ‘datetime’ annotation must meet both the ‘date’ and ‘time’ control character requirements. For example, '{"datetime" : "%m/%d/%Y %H:%M:%S"}' would be used to interpret text such as “05/04/2000 12:12:11”.
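
      To show how such a format reads a value, the sketch below translates the ‘Y’, ‘m’, ‘d’, ‘H’, ‘M’, and ‘S’ control characters into a java.time formatter and parses the datetime example. This is only a client-side approximation for checking a format string: the class StrptimeFormatExample is hypothetical, the ‘s’ control character is not handled, and java.time is stricter than strptime() (for instance, it requires two-digit months).

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;

// Hypothetical helper: approximates a strptime-style column format with java.time.
final class StrptimeFormatExample {

    /** Builds a formatter from a format using the control characters Y, m, d, H, M, S. */
    static DateTimeFormatter toFormatter(String format) {
        DateTimeFormatterBuilder builder = new DateTimeFormatterBuilder();
        for (int i = 0; i < format.length(); i++) {
            char c = format.charAt(i);
            if (c != '%') {
                builder.appendLiteral(c); // plain text is matched literally
                continue;
            }
            if (++i >= format.length()) {
                throw new IllegalArgumentException("Dangling '%' at end of format");
            }
            switch (format.charAt(i)) {
                case 'Y': builder.appendPattern("uuuu"); break; // four-digit year
                case 'm': builder.appendPattern("MM"); break;   // month
                case 'd': builder.appendPattern("dd"); break;   // day of month
                case 'H': builder.appendPattern("HH"); break;   // hour, 00-23
                case 'M': builder.appendPattern("mm"); break;   // minute
                case 'S': builder.appendPattern("ss"); break;   // whole seconds
                default:
                    throw new IllegalArgumentException(
                            "Unsupported control character: " + format.charAt(i));
            }
        }
        return builder.toFormatter();
    }

    /** Parses text that contains both a date and a time. */
    static LocalDateTime parseDateTime(String text, String format) {
        return LocalDateTime.parse(text, toFormatter(format));
    }
}
```

      Parsing “05/04/2000 12:12:11” with “%m/%d/%Y %H:%M:%S” gives May 4, 2000 at 12:12:11.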

    • ERROR_HANDLING

      public static final String ERROR_HANDLING
      Specifies how errors should be handled upon insertion. Supported values:
      • PERMISSIVE: Records with missing columns are populated with nulls if possible; otherwise, the malformed records are skipped.
      • IGNORE_BAD_RECORDS: Malformed records are skipped.
      • ABORT: Stops current insertion and aborts entire operation when an error is encountered. Primary key collisions are considered abortable errors in this mode.
      The default value is ABORT.
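
      A short sketch of how error handling might be combined with the bad-record-table options in an options map follows. The key and value strings are assumptions: they take the lowercase form of the constant names (the form this page uses for ‘bad_record_table_limit’), and real code would reference the constants of this class instead. The table name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: skip malformed records and capture them in a side table.
final class ErrorHandlingOptionsExample {

    static Map<String, String> buildOptions() {
        Map<String, String> options = new LinkedHashMap<>();
        // Assumed string values of Options.ERROR_HANDLING and Options.IGNORE_BAD_RECORDS.
        options.put("error_handling", "ignore_bad_records");
        // Assumed string value of Options.BAD_RECORD_TABLE_NAME; the table name is made up.
        options.put("bad_record_table_name", "orders_rejected");
        // Assumed string value of Options.BAD_RECORD_TABLE_LIMIT; must be a positive integer.
        options.put("bad_record_table_limit", "500");
        return options;
    }
}
```

      Option values are passed as strings, so numeric limits are written as text.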
    • PERMISSIVE

      public static final String PERMISSIVE
      Records with missing columns are populated with nulls if possible; otherwise, the malformed records are skipped.
    • IGNORE_BAD_RECORDS

      public static final String IGNORE_BAD_RECORDS
      Malformed records are skipped.
    • ABORT

      public static final String ABORT
      Stops current insertion and aborts entire operation when an error is encountered. Primary key collisions are considered abortable errors in this mode.
    • FILE_TYPE

      public static final String FILE_TYPE
      Specifies the type of the file(s) whose records will be inserted. Supported values:
      • AVRO: Avro file format.
      • DELIMITED_TEXT: Delimited text file format; e.g., CSV, TSV, PSV, etc.
      • GDB: Esri/GDB file format.
      • JSON: JSON file format.
      • PARQUET: Apache Parquet file format.
      • SHAPEFILE: ShapeFile file format.
      The default value is DELIMITED_TEXT.
    • AVRO

      public static final String AVRO
      Avro file format.
    • DELIMITED_TEXT

      public static final String DELIMITED_TEXT
      Delimited text file format; e.g., CSV, TSV, PSV, etc.
    • GDB

      public static final String GDB
      Esri/GDB file format.
    • JSON

      public static final String JSON
      JSON file format.
    • PARQUET

      public static final String PARQUET
      Apache Parquet file format.
    • SHAPEFILE

      public static final String SHAPEFILE
      ShapeFile file format.
    • FLATTEN_COLUMNS

      public static final String FLATTEN_COLUMNS
      Specifies how to handle nested columns. Supported values:
      • TRUE: Break up nested columns to multiple columns.
      • FALSE: Treat nested columns as JSON columns instead of flattening.
      The default value is FALSE.
    • TRUE

      public static final String TRUE
      Upsert new records when primary keys match existing records.
    • FALSE

      public static final String FALSE
      Reject new records when primary keys match existing records.
    • GDAL_CONFIGURATION_OPTIONS

      public static final String GDAL_CONFIGURATION_OPTIONS
      Comma-separated list of GDAL configuration options for this request, each in the form key=value. The default value is ''.
    • IGNORE_EXISTING_PK

      public static final String IGNORE_EXISTING_PK
      Specifies the record collision error-suppression policy for inserting into a table with a primary key, only used when not in upsert mode (upsert mode is disabled when UPDATE_ON_EXISTING_PK is FALSE). If set to TRUE, any record being inserted that is rejected for having primary key values that match those of an existing table record will be ignored with no error generated. If FALSE, the rejection of any record for having primary key values matching an existing record will result in an error being reported, as determined by ERROR_HANDLING. If the specified table does not have a primary key or if upsert mode is in effect (UPDATE_ON_EXISTING_PK is TRUE), then this option has no effect. Supported values:
      • TRUE: Ignore new records whose primary key values collide with those of existing records.
      • FALSE: Treat as errors any new records whose primary key values collide with those of existing records.
      The default value is FALSE.
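
      The interaction between IGNORE_EXISTING_PK and UPDATE_ON_EXISTING_PK can be summarized as a small decision function. The sketch below restates the rules above; the class PkCollisionPolicyExample and its Outcome names are hypothetical and model only the decision, not the insertion itself.

```java
// Hypothetical model of the primary key collision rules described above.
final class PkCollisionPolicyExample {

    enum Outcome {
        INSERT,          // no collision, or the table has no primary key
        UPSERT,          // existing record is replaced by the new one
        IGNORE_SILENTLY, // new record is dropped and no error is generated
        REPORT_ERROR     // new record is rejected; handled per ERROR_HANDLING
    }

    static Outcome onInsert(boolean tableHasPrimaryKey,
                            boolean collidesWithExisting,
                            boolean updateOnExistingPk,
                            boolean ignoreExistingPk) {
        if (!tableHasPrimaryKey || !collidesWithExisting) {
            return Outcome.INSERT; // neither option has any effect here
        }
        if (updateOnExistingPk) {
            return Outcome.UPSERT; // upsert mode: IGNORE_EXISTING_PK is not consulted
        }
        return ignoreExistingPk ? Outcome.IGNORE_SILENTLY : Outcome.REPORT_ERROR;
    }
}
```

      With both options left at their default of FALSE, a colliding record falls into the REPORT_ERROR case and is then handled according to ERROR_HANDLING.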
    • INGESTION_MODE

      public static final String INGESTION_MODE
      Whether to do a full load, dry run, or perform a type inference on the source data. Supported values:
      • FULL: Run a type inference on the source data (if needed) and ingest.
      • DRY_RUN: Does not load data, but walks through the source data and determines the number of valid records, taking into account the current mode of ERROR_HANDLING.
      • TYPE_INFERENCE_ONLY: Infer the type of the source data and return, without ingesting any data. The inferred type is returned in the response.
      The default value is FULL.
    • FULL

      public static final String FULL
      Run a type inference on the source data (if needed) and ingest.
    • DRY_RUN

      public static final String DRY_RUN
      Does not load data, but walks through the source data and determines the number of valid records, taking into account the current mode of ERROR_HANDLING.
    • TYPE_INFERENCE_ONLY

      public static final String TYPE_INFERENCE_ONLY
      Infer the type of the source data and return, without ingesting any data. The inferred type is returned in the response.
    • LAYER

      public static final String LAYER
      Comma-separated list of the names of the geo file layers to load. The default value is ''.
    • LOADING_MODE

      public static final String LOADING_MODE
      Scheme for distributing the extraction and loading of data from the source data file(s). This option applies only when loading files that are local to the database. Supported values:
      • HEAD: The head node loads all data. All files must be available to the head node.
      • DISTRIBUTED_SHARED: The head node coordinates loading data by worker processes across all nodes from shared files available to all workers. NOTE: Instead of existing on a shared source, the files can be duplicated on a source local to each host to improve performance, though the files must appear as the same data set from the perspective of all hosts performing the load.
      • DISTRIBUTED_LOCAL: A single worker process on each node loads all files that are available to it. This option works best when each worker loads files from its own file system, to maximize performance. In order to avoid data duplication, either each worker performing the load needs to have visibility to a set of files unique to it (no file is visible to more than one node) or the target table needs to have a primary key (which will allow the worker to automatically deduplicate data). NOTE: If the target table doesn’t exist, the table structure will be determined by the head node. If the head node has no files local to it, it will be unable to determine the structure and the request will fail. If the head node is configured to have no worker processes, no data strictly accessible to the head node will be loaded.
      The default value is HEAD.
    • DISTRIBUTED_SHARED

      public static final String DISTRIBUTED_SHARED
      The head node coordinates loading data by worker processes across all nodes from shared files available to all workers.

      NOTE:

      Instead of existing on a shared source, the files can be duplicated on a source local to each host to improve performance, though the files must appear as the same data set from the perspective of all hosts performing the load.

    • DISTRIBUTED_LOCAL

      public static final String DISTRIBUTED_LOCAL
      A single worker process on each node loads all files that are available to it. This option works best when each worker loads files from its own file system, to maximize performance. In order to avoid data duplication, either each worker performing the load needs to have visibility to a set of files unique to it (no file is visible to more than one node) or the target table needs to have a primary key (which will allow the worker to automatically deduplicate data).

      NOTE:

      If the target table doesn’t exist, the table structure will be determined by the head node. If the head node has no files local to it, it will be unable to determine the structure and the request will fail.

      If the head node is configured to have no worker processes, no data strictly accessible to the head node will be loaded.

    • LOCAL_TIME_OFFSET

      public static final String LOCAL_TIME_OFFSET
      For Avro local timestamp columns.
    • MAX_RECORDS_TO_LOAD

      public static final String MAX_RECORDS_TO_LOAD
      Limits the number of records to load in this request. If this number is larger than batch_size, the number of records loaded will be limited to the next whole multiple of batch_size (per working thread). The default value is ''.
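
      The rounding behavior can be illustrated with a little arithmetic. The sketch below assumes that "next whole number of batch_size" means rounding up to the next multiple of the batch size, and that a limit no larger than the batch size is applied unchanged; both are interpretations, and the class MaxRecordsExample is hypothetical.

```java
// Hypothetical illustration of how a record limit interacts with the batch size.
final class MaxRecordsExample {

    /** Returns the assumed effective per-thread limit for the requested maximum. */
    static long effectiveLimit(long maxRecordsToLoad, long batchSize) {
        if (maxRecordsToLoad <= batchSize) {
            return maxRecordsToLoad; // assumption: small limits are applied as given
        }
        // Round up to the next whole multiple of the batch size.
        long batches = (maxRecordsToLoad + batchSize - 1) / batchSize;
        return batches * batchSize;
    }
}
```

      Under these assumptions, a limit of 25000 with a batch size of 10000 would allow up to 30000 records per working thread.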
    • NAME_COLUMNS_FROM_FILE

      public static final String NAME_COLUMNS_FROM_FILE
      Specifies a comma-delimited list of column names to be used as the source-data column names. If the payload has a header row (i.e., TEXT_HAS_HEADER is TRUE), these names override the payload’s header names. If the payload has no header row, these names are used as the source-data column names. Either way, the i-th name in this list applies to the i-th column in the payload, enabling name-based matching against the target table’s columns (and use with COLUMNS_TO_LOAD / COLUMNS_TO_SKIP).
    • NUM_TASKS_PER_RANK

      public static final String NUM_TASKS_PER_RANK
      Number of tasks per rank for reading files. The default is the value of external_file_reader_num_tasks.
    • POLL_INTERVAL

      public static final String POLL_INTERVAL
      If SUBSCRIBE is TRUE, the number of seconds between attempts to load external files into the table. If zero, polling will be continuous as long as data is found. If no data is found, the interval will steadily increase to a maximum of 60 seconds.
    • PRIMARY_KEYS

      public static final String PRIMARY_KEYS
      Comma-separated list of column names to set as primary keys when not specified in the type. The default value is ''.
    • SCHEMA_REGISTRY_CONNECTION_RETRIES

      public static final String SCHEMA_REGISTRY_CONNECTION_RETRIES
      Number of times to retry the Confluent Schema Registry connection.
    • SCHEMA_REGISTRY_CONNECTION_TIMEOUT

      public static final String SCHEMA_REGISTRY_CONNECTION_TIMEOUT
      Confluent Schema Registry connection timeout (in seconds).
    • SCHEMA_REGISTRY_MAX_CONSECUTIVE_CONNECTION_FAILURES

      public static final String SCHEMA_REGISTRY_MAX_CONSECUTIVE_CONNECTION_FAILURES
      Maximum number of records to skip due to Schema Registry connection failures before failing.
    • MAX_CONSECUTIVE_INVALID_SCHEMA_FAILURE

      public static final String MAX_CONSECUTIVE_INVALID_SCHEMA_FAILURE
      Maximum number of records to skip due to schema-related errors before failing.
    • SCHEMA_REGISTRY_SCHEMA_NAME

      public static final String SCHEMA_REGISTRY_SCHEMA_NAME
      Name of the Avro schema in the schema registry to use when reading Avro records.
    • SHARD_KEYS

      public static final String SHARD_KEYS
      Comma-separated list of column names to set as shard keys when not specified in the type. The default value is ''.
    • SKIP_LINES

      public static final String SKIP_LINES
      Skip a number of lines from the beginning of the file.
    • SUBSCRIBE

      public static final String SUBSCRIBE
      Continuously poll the data source to check for new data and load it into the table. Supported values:
      • TRUE
      • FALSE
      The default value is FALSE.
    • TABLE_INSERT_MODE

      public static final String TABLE_INSERT_MODE
      When inserting records from multiple files, TABLE_PER_FILE inserts the records from each file into a new table. Currently supported only for shapefiles. Supported values:
      • SINGLE
      • TABLE_PER_FILE
      The default value is SINGLE.
    • SINGLE

      public static final String SINGLE
    • TABLE_PER_FILE

      public static final String TABLE_PER_FILE
    • TEXT_COMMENT_STRING

      public static final String TEXT_COMMENT_STRING
      Specifies the character string that should be interpreted as a comment line prefix in the source data. All lines in the data starting with the provided string are ignored.

      For DELIMITED_TEXT FILE_TYPE only. The default value is ’#’.

    • TEXT_DELIMITER

      public static final String TEXT_DELIMITER
      Specifies the character delimiting field values in the source data and field names in the header (if present).

      For DELIMITED_TEXT FILE_TYPE only. The default value is ’,’.

    • TEXT_ESCAPE_CHARACTER

      public static final String TEXT_ESCAPE_CHARACTER
      Specifies the character that is used to escape other characters in the source data.

      An ‘a’, ‘b’, ‘f’, ‘n’, ‘r’, ‘t’, or ‘v’ preceded by an escape character will be interpreted as the ASCII bell, backspace, form feed, line feed, carriage return, horizontal tab, and vertical tab, respectively. For example, the escape character followed by an ‘n’ will be interpreted as a newline within a field value.

      The escape character can also be used to escape the quoting character, and will be treated as an escape character whether it is within a quoted field value or not.

      For DELIMITED_TEXT FILE_TYPE only.
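
      The escape rules above can be sketched as a small decoder. The class EscapeDecodingExample is a hypothetical illustration, not the database's parser; it maps the seven letters listed above to their control characters and, as an assumption, passes any other escaped character (such as a quote) through unchanged.

```java
// Hypothetical illustration of the escape sequences described above.
final class EscapeDecodingExample {

    /** Decodes escape sequences in a field value using the given escape character. */
    static String decode(String field, char escape) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < field.length(); i++) {
            char c = field.charAt(i);
            if (c != escape || i + 1 == field.length()) {
                out.append(c); // ordinary character, or a trailing escape with nothing after it
                continue;
            }
            char next = field.charAt(++i);
            switch (next) {
                case 'a': out.append((char) 7); break;  // ASCII bell
                case 'b': out.append('\b'); break;      // backspace
                case 'f': out.append('\f'); break;      // form feed
                case 'n': out.append('\n'); break;      // line feed
                case 'r': out.append('\r'); break;      // carriage return
                case 't': out.append('\t'); break;      // horizontal tab
                case 'v': out.append((char) 11); break; // vertical tab
                default: out.append(next);              // assumed: e.g. an escaped quote
            }
        }
        return out.toString();
    }
}
```

      With a backslash as the escape character, the four characters a, backslash, n, b decode to "a", a line feed, and "b".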

    • TEXT_HAS_HEADER

      public static final String TEXT_HAS_HEADER
      Indicates whether the source data contains a header row.

      For DELIMITED_TEXT FILE_TYPE only. Supported values:
      • TRUE
      • FALSE
      The default value is TRUE.
    • TEXT_HEADER_PROPERTY_DELIMITER

      public static final String TEXT_HEADER_PROPERTY_DELIMITER
      Specifies the delimiter for column properties in the header row (if present). Cannot be set to same value as TEXT_DELIMITER.

      For DELIMITED_TEXT FILE_TYPE only. The default value is ’|’.

    • TEXT_NULL_STRING

      public static final String TEXT_NULL_STRING
      Specifies the character string that should be interpreted as a null value in the source data.

      For DELIMITED_TEXT FILE_TYPE only. The default value is ‘\N’.

    • TEXT_QUOTE_CHARACTER

      public static final String TEXT_QUOTE_CHARACTER
      Specifies the character that should be interpreted as a field value quoting character in the source data. The character must appear at beginning and end of field value to take effect. Delimiters within quoted fields are treated as literals and not delimiters. Within a quoted field, two consecutive quote characters will be interpreted as a single literal quote character, effectively escaping it. To not have a quote character, specify an empty string.

      For DELIMITED_TEXT FILE_TYPE only. The default value is '"'.
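
      The quoting rules can be sketched as a simple line splitter. The class QuotedFieldExample is a hypothetical illustration rather than the database's parser: it treats delimiters inside quoted fields as literals and collapses two consecutive quote characters into one, but it only checks for the quote at the start of a field and does not handle escape characters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of quoted-field handling in delimited text.
final class QuotedFieldExample {

    /** Splits one line into field values, honoring the quote character. */
    static List<String> split(String line, char delimiter, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean quoted = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (quoted) {
                if (c == quote && i + 1 < line.length() && line.charAt(i + 1) == quote) {
                    current.append(quote); // two consecutive quotes -> one literal quote
                    i++;
                } else if (c == quote) {
                    quoted = false;        // closing quote
                } else {
                    current.append(c);     // delimiters are literal inside quotes
                }
            } else if (c == quote && current.length() == 0) {
                quoted = true;             // opening quote at the start of a field
            } else if (c == delimiter) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }
}
```

      For example, the line 1,"Smith, ""Al""",x splits into three fields: 1, then Smith, "Al", then x.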

    • TEXT_SEARCH_COLUMNS

      public static final String TEXT_SEARCH_COLUMNS
      Adds the ‘text_search’ property to internally inferred string columns. Specify a comma-separated list of column names, or ‘*’ for all columns. To add the text_search property only to string columns of at least a minimum size, also set the option ‘text_search_min_column_length’.
    • TEXT_SEARCH_MIN_COLUMN_LENGTH

      public static final String TEXT_SEARCH_MIN_COLUMN_LENGTH
      Sets the minimum column size for applying the text_search property. Used only when ‘text_search_columns’ has a value.
    • TRIM_SPACE

      public static final String TRIM_SPACE
      If set to TRUE, remove leading or trailing space from fields. Supported values:
      • TRUE
      • FALSE
      The default value is FALSE.
    • TRUNCATE_STRINGS

      public static final String TRUNCATE_STRINGS
      If set to TRUE, truncate string values that are longer than the column’s type size. Supported values:
      • TRUE
      • FALSE
      The default value is FALSE.
    • TRUNCATE_TABLE

      public static final String TRUNCATE_TABLE
      If set to TRUE, truncates the table specified by tableName prior to loading the file(s). Supported values:
      • TRUE
      • FALSE
      The default value is FALSE.
    • TYPE_INFERENCE_MAX_RECORDS_READ

      public static final String TYPE_INFERENCE_MAX_RECORDS_READ
      The default value is ''.
    • TYPE_INFERENCE_MODE

      public static final String TYPE_INFERENCE_MODE
      Optimize type inference mode. Supported values:
      • ACCURACY: Scans data to get exactly-typed and sized columns for all data scanned.
      • SPEED: Scans data and picks the widest possible column types so that ‘all’ values will fit with minimum data scanned.
      The default value is ACCURACY.
    • ACCURACY

      public static final String ACCURACY
      Scans data to get exactly-typed and sized columns for all data scanned.
    • SPEED

      public static final String SPEED
      Scans data and picks the widest possible column types so that ‘all’ values will fit with minimum data scanned.
    • ENABLE_INPLACE_UPDATES

      public static final String ENABLE_INPLACE_UPDATES
      Applies only when upserting (when UPDATE_ON_EXISTING_PK is TRUE). If set to TRUE (the default), an existing record matched by primary key is modified in place. If set to FALSE, the matched record is updated by deleting it and inserting a replacement (delete and insert), which prevents the change from being reflected in dependent materialized views until they are refreshed. Supported values:
      • TRUE
      • FALSE
      The default value is TRUE.
    • UPDATE_ON_EXISTING_PK

      public static final String UPDATE_ON_EXISTING_PK
      Specifies the record collision policy for inserting into a table with a primary key. If set to TRUE, any existing table record with primary key values that match those of a record being inserted will be replaced by that new record (the new data will be “upserted”). If set to FALSE, any existing table record with primary key values that match those of a record being inserted will remain unchanged, while the new record will be rejected and the error handled as determined by IGNORE_EXISTING_PK and ERROR_HANDLING. If the specified table does not have a primary key, then this option has no effect. Supported values:
      • TRUE: Upsert new records when primary keys match existing records.
      • FALSE: Reject new records when primary keys match existing records.
      The default value is FALSE.
    • TRANSFORMATIONS

      public static final String TRANSFORMATIONS
      Comma-separated expressions, one per target table column. Each expression is evaluated per record. Empty entries (two consecutive commas) mean no transformation for that column; the value is resolved from the input record, table default, NULL, or an error. Expressions may reference input columns by name or by position ($1 for the first input column, $2 for the second, etc.). The default value is ''.