public static final class InsertRecordsFromPayloadRequest.Options extends Object
AVRO_HEADER_BYTES
: Optional number of bytes to skip when reading an
avro record.
AVRO_NUM_RECORDS
: Optional number of avro records, if the data includes
only records (no schema).
AVRO_SCHEMA
: Optional string representing the avro schema, for inserting
records in avro format whose data does not include its schema.
AVRO_SCHEMALESS
: When the user provides 'avro_schema', the avro data is assumed
to be schemaless, unless specified otherwise. The default is 'true' when
avro_schema is given. Ignored when avro_schema is not given.
Supported values: TRUE, FALSE.
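For illustration, a minimal sketch of supplying these Avro options through the request's options map; the com.gpudb.protocol package location and the schema JSON are assumptions for the example, not taken from this page:

```java
// Assumed import path for the GPUdb Java API protocol classes.
import com.gpudb.protocol.InsertRecordsFromPayloadRequest.Options;

import java.util.HashMap;
import java.util.Map;

public class AvroOptionsSketch {
    public static Map<String, String> build() {
        Map<String, String> options = new HashMap<>();
        // The payload carries bare Avro records, so the schema is passed here
        // (illustrative schema).
        options.put(Options.AVRO_SCHEMA,
                "{\"type\":\"record\",\"name\":\"point\",\"fields\":"
              + "[{\"name\":\"x\",\"type\":\"double\"}]}");
        // Redundant with the default (true when avro_schema is given),
        // but explicit.
        options.put(Options.AVRO_SCHEMALESS, Options.TRUE);
        return options;
    }
}
```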
BAD_RECORD_TABLE_NAME
: Optional name of a table to which records that
were rejected are written. The bad-record-table has the following
columns: line_number (long), line_rejected (string), error_message
(string).
BAD_RECORD_TABLE_LIMIT
: A positive integer indicating the maximum
number of records that can be written to the bad-record-table.
The default value is '10000'.
BAD_RECORD_TABLE_LIMIT_PER_INPUT
: For subscriptions: A positive integer
indicating the maximum number of records that can be written to the
bad-record-table per file/payload. The default value is
'bad_record_table_limit', and the total size of the table per rank is
limited to 'bad_record_table_limit'.
BATCH_SIZE
: Internal tuning parameter--number of records per batch when
inserting data.
COLUMN_FORMATS
: For each target column specified, applies the
column-property-bound format to the source data
loaded into that column. Each column format will contain a mapping of
one or more of its column
properties to an appropriate format for each property. Currently
supported column properties
include date, time, & datetime. The parameter value must be formatted as
a JSON string of maps of
column names to maps of column properties to their corresponding column
formats, e.g.,
'{ "order_date" : { "date" : "%Y.%m.%d" }, "order_time" : { "time" :
"%H:%M:%S" } }'.
See default_column_formats
for valid format syntax.
COLUMNS_TO_LOAD
: Specifies a comma-delimited list of columns from the
source data to
load. If more than one file is being loaded, this list applies to all
files.
Column numbers can be specified discretely or as a range. For example, a value of '5,7,1..3' will insert values from the fifth column in the source data into the first column in the target table, from the seventh column in the source data into the second column in the target table, and from the first through third columns in the source data into the third through fifth columns in the target table.
If the source data contains a header, column names matching the file header names may be provided instead of column numbers. If the target table doesn't exist, the table will be created with the columns in this order. If the target table does exist with columns in a different order than the source data, this list can be used to match the order of the target table. For example, a value of 'C, B, A' will create a three column table with column C, followed by column B, followed by column A; or will insert those fields in that order into a table created with columns in that order. If the target table exists, the column names must match the source data field names for a name-mapping to be successful.
Mutually exclusive with columns_to_skip.
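A short sketch of the '5,7,1..3' example above, expressed through the options map (package location assumed as in the earlier sketch):

```java
import com.gpudb.protocol.InsertRecordsFromPayloadRequest.Options; // assumed path

import java.util.HashMap;
import java.util.Map;

public class ColumnSelectionSketch {
    public static Map<String, String> build() {
        Map<String, String> options = new HashMap<>();
        // Source columns 5, 7, and 1 through 3 fill target columns 1 through 5.
        options.put(Options.COLUMNS_TO_LOAD, "5,7,1..3");
        return options;
    }
}
```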
COLUMNS_TO_SKIP
: Specifies a comma-delimited list of columns from the
source data to
skip. Mutually exclusive with columns_to_load.
COMPRESSION_TYPE
: Optional: payload compression type.
Supported values:
NONE
: Uncompressed
AUTO
: Default. Auto detect compression type
GZIP
: gzip file compression.
BZIP2
: bzip2 file compression.
The default value is AUTO.
DEFAULT_COLUMN_FORMATS
: Specifies the default format to be applied to
source data loaded
into columns with the corresponding column property. Currently
supported column properties include
date, time, & datetime. This default column-property-bound format can
be overridden by specifying a
column property & format for a given target column in column_formats. For
each specified annotation, the format will apply to all columns with
that annotation unless a custom column_formats for that annotation is
specified.
The parameter value must be formatted as a JSON string that is a map of column properties to their respective column formats, e.g., '{ "date" : "%Y.%m.%d", "time" : "%H:%M:%S" }'. Column formats are specified as a string of control characters and plain text. The supported control characters are 'Y', 'm', 'd', 'H', 'M', and 'S', which follow the Linux 'strptime()' specification, as well as 's', which specifies seconds and fractional seconds (though the fractional component will be truncated past milliseconds).
Formats for the 'date' annotation must include the 'Y', 'm', and 'd' control characters. Formats for the 'time' annotation must include the 'H', 'M', and either 'S' or 's' (but not both) control characters. Formats for the 'datetime' annotation must meet both the 'date' and 'time' control character requirements. For example, '{ "datetime" : "%m/%d/%Y %H:%M:%S" }' would be used to interpret text such as "05/04/2000 12:12:11".
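A sketch combining default_column_formats with a per-column column_formats override, using the format strings shown above (package location assumed as before; the column name is illustrative):

```java
import com.gpudb.protocol.InsertRecordsFromPayloadRequest.Options; // assumed path

import java.util.HashMap;
import java.util.Map;

public class ColumnFormatSketch {
    public static Map<String, String> build() {
        Map<String, String> options = new HashMap<>();
        // Default format for every 'date'- and 'time'-annotated column.
        options.put(Options.DEFAULT_COLUMN_FORMATS,
                "{ \"date\" : \"%Y.%m.%d\", \"time\" : \"%H:%M:%S\" }");
        // Override for one column whose source uses a different layout
        // (column name is illustrative).
        options.put(Options.COLUMN_FORMATS,
                "{ \"order_date\" : { \"date\" : \"%m/%d/%Y\" } }");
        return options;
    }
}
```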
ERROR_HANDLING
: Specifies how errors should be handled upon insertion.
Supported values:
PERMISSIVE
: Records with missing columns are populated with nulls if
possible; otherwise, the malformed records are skipped.
IGNORE_BAD_RECORDS
: Malformed records are skipped.
ABORT
: Stops current insertion and aborts entire operation when an error is
encountered. Primary key collisions are considered abortable errors in
this mode.
The default value is ABORT.
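A sketch pairing error_handling with a bad-record table, assuming the same package location; the table name and limit are illustrative:

```java
import com.gpudb.protocol.InsertRecordsFromPayloadRequest.Options; // assumed path

import java.util.HashMap;
import java.util.Map;

public class ErrorHandlingSketch {
    public static Map<String, String> build() {
        Map<String, String> options = new HashMap<>();
        // Skip malformed records instead of aborting the whole operation.
        options.put(Options.ERROR_HANDLING, Options.IGNORE_BAD_RECORDS);
        // Keep up to 1000 rejected records for inspection
        // (table name and limit are illustrative).
        options.put(Options.BAD_RECORD_TABLE_NAME, "load_errors");
        options.put(Options.BAD_RECORD_TABLE_LIMIT, "1000");
        return options;
    }
}
```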
FILE_TYPE
: Specifies the type of the file(s) whose records will be
inserted.
Supported values:
AVRO
: Avro file format
DELIMITED_TEXT
: Delimited text file format; e.g., CSV, TSV, PSV, etc.
GDB
: Esri/GDB file format
JSON
: Json file format
PARQUET
: Apache Parquet file format
SHAPEFILE
: ShapeFile file format
The default value is DELIMITED_TEXT.
GDAL_CONFIGURATION_OPTIONS
: Comma-separated list of GDAL configuration options for the
specific request, as key=value pairs. The default value is ''.
IGNORE_EXISTING_PK
: Specifies the record collision error-suppression policy for
inserting into a table with a primary key, only used when not in upsert
mode (upsert mode is disabled when update_on_existing_pk is false). If
set to true, any record being inserted that is rejected for having
primary key values that match those of an existing table record will be
ignored with no error generated. If false, the rejection of any record
for having primary key values matching an existing record will result in
an error being reported, as determined by error_handling. If the
specified table does not have a primary key or if upsert mode is in
effect (update_on_existing_pk is true), then this option has no effect.
Supported values:
TRUE
: Ignore new records whose primary key values collide with those of
existing records
FALSE
: Treat as errors any new records whose primary key values collide with
those of existing records
The default value is FALSE.
INGESTION_MODE
: Whether to do a full load, dry run, or perform a type
inference on the source data.
Supported values:
FULL
: Run a type inference on the source data (if needed) and ingest
DRY_RUN
: Does not load data, but walks through the source data and
determines the number of valid records, taking into account the current
mode of error_handling.
TYPE_INFERENCE_ONLY
: Infer the type of the source data and return,
without ingesting any data. The inferred type is returned in the
response.
The default value is FULL.
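A sketch of a validation pass using dry_run before a real load (package location assumed as before):

```java
import com.gpudb.protocol.InsertRecordsFromPayloadRequest.Options; // assumed path

import java.util.HashMap;
import java.util.Map;

public class DryRunSketch {
    public static Map<String, String> build() {
        Map<String, String> options = new HashMap<>();
        // Walk the payload and count valid records without loading anything;
        // switch to Options.FULL once the counts look right.
        options.put(Options.INGESTION_MODE, Options.DRY_RUN);
        options.put(Options.ERROR_HANDLING, Options.PERMISSIVE);
        return options;
    }
}
```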
LAYER
: Optional: comma-separated list of geo file layer name(s). The
default value is ''.
LOADING_MODE
: Scheme for distributing the extraction and loading of
data from the source data file(s). This option applies only when loading
files that are local to the database.
Supported values:
HEAD
: The head node loads all data. All files must be available to the head
node.
DISTRIBUTED_SHARED
: The head node coordinates loading data by worker
processes across all nodes from shared files available to all workers.
NOTE:
Instead of existing on a shared source, the files can be duplicated on a source local to each host to improve performance, though the files must appear as the same data set from the perspective of all hosts performing the load.
DISTRIBUTED_LOCAL
: A single worker process on each node loads all files
that are available to it. This option works best when each worker loads
files from its own file
system, to maximize performance. In order to avoid data duplication,
either each worker performing
the load needs to have visibility to a set of files unique to it (no
file is visible to more than
one node) or the target table needs to have a primary key (which will
allow the worker to
automatically deduplicate data).
NOTE:
If the target table doesn't exist, the table structure will be determined by the head node. If the head node has no files local to it, it will be unable to determine the structure and the request will fail.
If the head node is configured to have no worker processes, no data strictly accessible to the head node will be loaded.
The default value is HEAD.
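A sketch selecting distributed_local loading (package location assumed as before):

```java
import com.gpudb.protocol.InsertRecordsFromPayloadRequest.Options; // assumed path

import java.util.HashMap;
import java.util.Map;

public class LoadingModeSketch {
    public static Map<String, String> build() {
        Map<String, String> options = new HashMap<>();
        // Each worker loads the files visible on its own file system; per the
        // note above, the target table should have a primary key if the same
        // file is visible to more than one node.
        options.put(Options.LOADING_MODE, Options.DISTRIBUTED_LOCAL);
        return options;
    }
}
```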
LOCAL_TIME_OFFSET
: For Avro local timestamp columns.
MAX_RECORDS_TO_LOAD
: Limit the number of records to load in this
request: if this number is larger than batch_size, then the number of
records loaded will be limited to the next whole number of batch_size
(per working thread). The default value is ''.
NUM_TASKS_PER_RANK
: Optional: number of tasks for reading files per
rank. The default value is external_file_reader_num_tasks.
POLL_INTERVAL
: When subscribe is true, the number of seconds between attempts
to load external files into the table. If zero, polling will be
continuous as long as data is found. If no data is found, the interval
will steadily increase to a maximum of 60 seconds.
PRIMARY_KEYS
: Optional: comma-separated list of column names to set as
primary keys, when not specified in the type. The default value is ''.
SCHEMA_REGISTRY_SCHEMA_ID
SCHEMA_REGISTRY_SCHEMA_NAME
SCHEMA_REGISTRY_SCHEMA_VERSION
SHARD_KEYS
: Optional: comma-separated list of column names to set as
shard keys, when not specified in the type. The default value is ''.
SKIP_LINES
: Skip this number of lines from the beginning of the file.
SUBSCRIBE
: Continuously poll the data source to check for new data and
load it into the table.
Supported values: TRUE, FALSE.
The default value is FALSE.
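A sketch of a subscription with an explicit polling interval (package location assumed as before; the 30-second interval is illustrative):

```java
import com.gpudb.protocol.InsertRecordsFromPayloadRequest.Options; // assumed path

import java.util.HashMap;
import java.util.Map;

public class SubscribeSketch {
    public static Map<String, String> build() {
        Map<String, String> options = new HashMap<>();
        // Keep polling the data source for new data, checking every 30 seconds.
        options.put(Options.SUBSCRIBE, Options.TRUE);
        options.put(Options.POLL_INTERVAL, "30");
        return options;
    }
}
```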
TABLE_INSERT_MODE
: Optional: when inserting records
from multiple files, if set to table_per_file, the records from each
file are inserted into a new table. Currently supported only for
shapefiles.
Supported values: SINGLE, TABLE_PER_FILE.
The default value is SINGLE.
TEXT_COMMENT_STRING
: Specifies the character string that should be
interpreted as a comment line
prefix in the source data. All lines in the data starting with the
provided string are ignored.
For the delimited_text file_type only. The default value is '#'.
TEXT_DELIMITER
: Specifies the character delimiting field values in the
source data
and field names in the header (if present).
For the delimited_text file_type only. The default value is ','.
TEXT_ESCAPE_CHARACTER
: Specifies the character that is used to escape
other characters in
the source data.
An 'a', 'b', 'f', 'n', 'r', 't', or 'v' preceded by an escape character will be interpreted as the ASCII bell, backspace, form feed, line feed, carriage return, horizontal tab, & vertical tab, respectively. For example, the escape character followed by an 'n' will be interpreted as a newline within a field value.
The escape character can also be used to escape the quoting character, and will be treated as an escape character whether it is within a quoted field value or not.
For the delimited_text file_type only.
TEXT_HAS_HEADER
: Indicates whether the source data contains a header
row.
For the delimited_text file_type only.
Supported values: TRUE, FALSE.
The default value is TRUE.
TEXT_HEADER_PROPERTY_DELIMITER
: Specifies the delimiter for
column properties in the header row (if
present). Cannot be set to the same value as text_delimiter.
For the delimited_text file_type only. The default value is '|'.
TEXT_NULL_STRING
: Specifies the character string that should be
interpreted as a null
value in the source data.
For the delimited_text file_type only. The default value is '\\N'.
TEXT_QUOTE_CHARACTER
: Specifies the character that should be
interpreted as a field value
quoting character in the source data. The character must appear at
beginning and end of field value
to take effect. Delimiters within quoted fields are treated as literals
and not delimiters. Within
a quoted field, two consecutive quote characters will be interpreted as
a single literal quote
character, effectively escaping it. To not have a quote character,
specify an empty string.
For the delimited_text file_type only. The default value is '"'.
TEXT_SEARCH_COLUMNS
: Add the 'text_search' property to internally
inferred string columns. Comma-separated list of column names, or '*'
for all columns. To add the text_search property only to string columns
of a minimum size, also set the option 'text_search_min_column_length'.
TEXT_SEARCH_MIN_COLUMN_LENGTH
: Set minimum column size. Used only when
'text_search_columns' has a value.
TRUNCATE_STRINGS
: If set to true, truncate string values that
are longer than the column's type size.
Supported values: TRUE, FALSE.
The default value is FALSE.
TRUNCATE_TABLE
: If set to true, truncates the table specified
by tableName prior to loading the file(s).
Supported values: TRUE, FALSE.
The default value is FALSE.
TYPE_INFERENCE_MODE
: Optimize type inference for:
Supported values:
ACCURACY
: Scans data to get exactly-typed & sized columns for all data
scanned.
SPEED
: Scans data and picks the widest possible column types so that 'all'
values will fit with minimum data scanned
The default value is SPEED.
UPDATE_ON_EXISTING_PK
: Specifies the record collision policy for inserting into a table
with a primary key. If set to true, any existing table record with
primary key values that match those of a record being inserted will be
replaced by that new record (the new data will be "upserted"). If set
to false, any existing table record with primary key values that match
those of a record being inserted will remain unchanged, while the new
record will be rejected and the error handled as determined by
ignore_existing_pk & error_handling. If the specified table does not
have a primary key, then this option has no effect.
Supported values:
TRUE
: Upsert new records when primary keys match existing records
FALSE
: Reject new records when primary keys match existing records
The default value is FALSE.
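A sketch of the non-upsert collision policy described above, keeping existing rows and suppressing collision errors (package location assumed as before):

```java
import com.gpudb.protocol.InsertRecordsFromPayloadRequest.Options; // assumed path

import java.util.HashMap;
import java.util.Map;

public class PkCollisionSketch {
    public static Map<String, String> build() {
        Map<String, String> options = new HashMap<>();
        // Not upserting: existing rows win on a primary key collision.
        options.put(Options.UPDATE_ON_EXISTING_PK, Options.FALSE);
        // Drop colliding inserts silently rather than raising errors.
        options.put(Options.IGNORE_EXISTING_PK, Options.TRUE);
        return options;
    }
}
```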
A set of string constants for the parameter options.
Field Summary

Modifier and Type | Field and Description
---|---
static String | ABORT : Stops current insertion and aborts entire operation when an error is encountered.
static String | ACCURACY : Scans data to get exactly-typed & sized columns for all data scanned.
static String | AUTO : Default. Auto detect compression type.
static String | AVRO : Avro file format
static String | AVRO_HEADER_BYTES : Optional number of bytes to skip when reading an avro record.
static String | AVRO_NUM_RECORDS : Optional number of avro records, if the data includes only records (no schema).
static String | AVRO_SCHEMA : Optional string representing the avro schema, for inserting records in avro format whose data does not include its schema.
static String | AVRO_SCHEMALESS : When the user provides 'avro_schema', the avro data is assumed to be schemaless, unless specified otherwise.
static String | BAD_RECORD_TABLE_LIMIT : A positive integer indicating the maximum number of records that can be written to the bad-record-table.
static String | BAD_RECORD_TABLE_LIMIT_PER_INPUT : For subscriptions: a positive integer indicating the maximum number of records that can be written to the bad-record-table per file/payload.
static String | BAD_RECORD_TABLE_NAME : Optional name of a table to which records that were rejected are written.
static String | BATCH_SIZE : Internal tuning parameter--number of records per batch when inserting data.
static String | BZIP2 : bzip2 file compression.
static String | COLUMN_FORMATS : For each target column specified, applies the column-property-bound format to the source data loaded into that column.
static String | COLUMNS_TO_LOAD : Specifies a comma-delimited list of columns from the source data to load.
static String | COLUMNS_TO_SKIP : Specifies a comma-delimited list of columns from the source data to skip.
static String | COMPRESSION_TYPE : Optional: payload compression type.
static String | DEFAULT_COLUMN_FORMATS : Specifies the default format to be applied to source data loaded into columns with the corresponding column property.
static String | DELIMITED_TEXT : Delimited text file format; e.g., CSV, TSV, PSV, etc.
static String | DISTRIBUTED_LOCAL : A single worker process on each node loads all files that are available to it.
static String | DISTRIBUTED_SHARED : The head node coordinates loading data by worker processes across all nodes from shared files available to all workers.
static String | DRY_RUN : Does not load data, but walks through the source data and determines the number of valid records, taking into account the current mode of error_handling.
static String | ERROR_HANDLING : Specifies how errors should be handled upon insertion.
static String | FALSE : Reject new records when primary keys match existing records
static String | FILE_TYPE : Specifies the type of the file(s) whose records will be inserted.
static String | FULL : Run a type inference on the source data (if needed) and ingest
static String | GDAL_CONFIGURATION_OPTIONS : Comma-separated list of GDAL configuration options for the specific request, as key=value pairs.
static String | GDB : Esri/GDB file format
static String | GZIP : gzip file compression.
static String | HEAD : The head node loads all data.
static String | IGNORE_BAD_RECORDS : Malformed records are skipped.
static String | IGNORE_EXISTING_PK : Specifies the record collision error-suppression policy for inserting into a table with a primary key.
static String | INGESTION_MODE : Whether to do a full load, dry run, or perform a type inference on the source data.
static String | JSON : Json file format
static String | LAYER : Optional: comma-separated list of geo file layer name(s).
static String | LOADING_MODE : Scheme for distributing the extraction and loading of data from the source data file(s).
static String | LOCAL_TIME_OFFSET : For Avro local timestamp columns.
static String | MAX_RECORDS_TO_LOAD : Limit the number of records to load in this request: if this number is larger than batch_size, then the number of records loaded will be limited to the next whole number of batch_size (per working thread).
static String | NONE : Uncompressed
static String | NUM_TASKS_PER_RANK : Optional: number of tasks for reading files per rank.
static String | PARQUET : Apache Parquet file format
static String | PERMISSIVE : Records with missing columns are populated with nulls if possible; otherwise, the malformed records are skipped.
static String | POLL_INTERVAL : When subscribe is true, the number of seconds between attempts to load external files into the table.
static String | PRIMARY_KEYS : Optional: comma-separated list of column names to set as primary keys, when not specified in the type.
static String | SCHEMA_REGISTRY_SCHEMA_ID
static String | SCHEMA_REGISTRY_SCHEMA_NAME
static String | SCHEMA_REGISTRY_SCHEMA_VERSION
static String | SHAPEFILE : ShapeFile file format
static String | SHARD_KEYS : Optional: comma-separated list of column names to set as shard keys, when not specified in the type.
static String | SINGLE : Insert all records into a single table.
static String | SKIP_LINES : Skip this number of lines from the beginning of the file.
static String | SPEED : Scans data and picks the widest possible column types so that 'all' values will fit with minimum data scanned
static String | SUBSCRIBE : Continuously poll the data source to check for new data and load it into the table.
static String | TABLE_INSERT_MODE : Optional: when inserting records from multiple files, if set to table_per_file, the records from each file are inserted into a new table.
static String | TABLE_PER_FILE : Insert the records from each file into a new table.
static String | TEXT_COMMENT_STRING : Specifies the character string that should be interpreted as a comment line prefix in the source data.
static String | TEXT_DELIMITER : Specifies the character delimiting field values in the source data and field names in the header (if present).
static String | TEXT_ESCAPE_CHARACTER : Specifies the character that is used to escape other characters in the source data.
static String | TEXT_HAS_HEADER : Indicates whether the source data contains a header row.
static String | TEXT_HEADER_PROPERTY_DELIMITER : Specifies the delimiter for column properties in the header row (if present).
static String | TEXT_NULL_STRING : Specifies the character string that should be interpreted as a null value in the source data.
static String | TEXT_QUOTE_CHARACTER : Specifies the character that should be interpreted as a field value quoting character in the source data.
static String | TEXT_SEARCH_COLUMNS : Add the 'text_search' property to internally inferred string columns.
static String | TEXT_SEARCH_MIN_COLUMN_LENGTH : Set the minimum column size. Used only when 'text_search_columns' has a value.
static String | TRUE : Upsert new records when primary keys match existing records
static String | TRUNCATE_STRINGS : If set to true, truncate string values that are longer than the column's type size.
static String | TRUNCATE_TABLE : If set to true, truncates the table specified by tableName prior to loading the file(s).
static String | TYPE_INFERENCE_MODE : Optimize type inference for either accuracy or speed.
static String | TYPE_INFERENCE_ONLY : Infer the type of the source data and return, without ingesting any data.
static String | UPDATE_ON_EXISTING_PK : Specifies the record collision policy for inserting into a table with a primary key.