public static final class GPUdbBase.InsertRecordsJsonRequest extends InsertRecordsFromPayloadRequest

Nested classes/interfaces inherited from class InsertRecordsFromPayloadRequest:
InsertRecordsFromPayloadRequest.CreateTableOptions, InsertRecordsFromPayloadRequest.Options
| Constructor and Description |
|---|
| InsertRecordsJsonRequest() |
| Modifier and Type | Method and Description |
|---|---|
| Map<String,String> | getCreateTableOptions() |
| ByteBuffer | getDataBytes() |
| String | getDataText() |
| Map<String,Map<String,String>> | getModifyColumns() |
| Map<String,String> | getOptions() |
| org.apache.avro.Schema | getSchema() This method supports the Avro framework and is not intended to be called directly by the user. |
| String | getTableName() |
| InsertRecordsFromPayloadRequest | setCreateTableOptions(Map<String,String> createTableOptions) |
| InsertRecordsFromPayloadRequest | setDataBytes(ByteBuffer dataBytes) |
| InsertRecordsFromPayloadRequest | setDataText(String dataText) |
| InsertRecordsFromPayloadRequest | setModifyColumns(Map<String,Map<String,String>> modifyColumns) |
| InsertRecordsFromPayloadRequest | setOptions(Map<String,String> options) |
| InsertRecordsFromPayloadRequest | setTableName(String tableName) |

Methods inherited from class InsertRecordsFromPayloadRequest:
equals, get, getClassSchema, hashCode, put, toString
public String getTableName()

Overrides:
getTableName in class InsertRecordsFromPayloadRequest
Returns:
Name of the table into which the data will be inserted, in [schema_name.]table_name format, using standard name resolution rules. If the table does not exist, the table will be created using either an existing type_id or the type inferred from the payload, and the new table name will have to meet standard table naming criteria.
public InsertRecordsFromPayloadRequest setTableName(String tableName)

Overrides:
setTableName in class InsertRecordsFromPayloadRequest
Parameters:
tableName - Name of the table into which the data will be inserted, in [schema_name.]table_name format, using standard name resolution rules. If the table does not exist, the table will be created using either an existing type_id or the type inferred from the payload, and the new table name will have to meet standard table naming criteria.
Returns:
this, to mimic the builder pattern.
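Since every setter returns this, a request can be configured fluently. Below is a minimal sketch of that pattern; it assumes the com.gpudb package layout, and the table name and JSON payload are hypothetical examples, not values from this page.

```java
import com.gpudb.GPUdbBase;

public class JsonRequestExample {
    public static void main(String[] args) {
        // Hypothetical JSON payload; any record text matching the target
        // table's type would do.
        String jsonRecords = "[{\"id\": 1, \"name\": \"alpha\"}]";

        GPUdbBase.InsertRecordsJsonRequest request =
                new GPUdbBase.InsertRecordsJsonRequest();

        // Each setter returns `this` (typed as the InsertRecordsFromPayloadRequest
        // superclass), so the calls can be chained.
        request.setTableName("example_schema.example_table")
               .setDataText(jsonRecords);
    }
}
```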
public String getDataText()

Overrides:
getDataText in class InsertRecordsFromPayloadRequest
Returns:
Records formatted as delimited text.
public InsertRecordsFromPayloadRequest setDataText(String dataText)

Overrides:
setDataText in class InsertRecordsFromPayloadRequest
Parameters:
dataText - Records formatted as delimited text.
Returns:
this, to mimic the builder pattern.
public ByteBuffer getDataBytes()

Overrides:
getDataBytes in class InsertRecordsFromPayloadRequest
Returns:
Records formatted as binary data.
public InsertRecordsFromPayloadRequest setDataBytes(ByteBuffer dataBytes)

Overrides:
setDataBytes in class InsertRecordsFromPayloadRequest
Parameters:
dataBytes - Records formatted as binary data.
Returns:
this, to mimic the builder pattern.
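The payload may be supplied either as text (setDataText) or as binary data (setDataBytes). As a sketch, an in-memory payload can be wrapped in a ByteBuffer as below; the payload bytes here are an illustrative assumption, not a format mandated by this page.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import com.gpudb.GPUdbBase;

public class BinaryPayloadExample {
    public static void main(String[] args) {
        // Hypothetical payload encoded to bytes; a real payload would be
        // whatever binary record data the server expects.
        byte[] payload = "[{\"id\": 2}]".getBytes(StandardCharsets.UTF_8);

        GPUdbBase.InsertRecordsJsonRequest request =
                new GPUdbBase.InsertRecordsJsonRequest();
        request.setTableName("example_schema.example_table")
               .setDataBytes(ByteBuffer.wrap(payload));
    }
}
```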
public Map<String,Map<String,String>> getModifyColumns()

Overrides:
getModifyColumns in class InsertRecordsFromPayloadRequest
Returns:
Not implemented yet. The default value is an empty Map.
public InsertRecordsFromPayloadRequest setModifyColumns(Map<String,Map<String,String>> modifyColumns)

Overrides:
setModifyColumns in class InsertRecordsFromPayloadRequest
Parameters:
modifyColumns - Not implemented yet. The default value is an empty Map.
Returns:
this, to mimic the builder pattern.
public Map<String,String> getCreateTableOptions()

Overrides:
getCreateTableOptions in class InsertRecordsFromPayloadRequest
Returns:
Options used when creating the target table. Includes the type to use. The other options match those in GPUdb.createTable(CreateTableRequest):

- TYPE_ID: ID of a currently registered type. The default value is ''.
- NO_ERROR_IF_EXISTS: If TRUE, prevents an error from occurring if the table already exists and is of the given type. If a table with the same ID but a different type exists, it is still an error. Supported values: TRUE, FALSE. The default value is FALSE.
- IS_REPLICATED: Affects the distribution scheme for the table's data. If TRUE and the given type has no explicit shard key defined, the table will be replicated. If FALSE, the table will be sharded according to the shard key specified in the given TYPE_ID, or randomly sharded, if no shard key is specified. Note that a type containing a shard key cannot be used to create a replicated table. Supported values: TRUE, FALSE. The default value is FALSE.
- FOREIGN_KEYS: Semicolon-separated list of foreign keys, of the format '(source_column_name [, ...]) references target_table_name(primary_key_column_name [, ...]) [as foreign_key_name]'.
- FOREIGN_SHARD_KEY: Foreign shard key of the format 'source_column references shard_by_column from target_table(primary_key_column)'.
- PARTITION_TYPE: Partitioning scheme to use. Supported values:
  - RANGE: Use range partitioning.
  - INTERVAL: Use interval partitioning.
  - LIST: Use list partitioning.
  - HASH: Use hash partitioning.
  - SERIES: Use series partitioning.
- PARTITION_KEYS: Comma-separated list of partition keys, which are the columns or column expressions by which records will be assigned to partitions defined by PARTITION_DEFINITIONS.
- PARTITION_DEFINITIONS: Comma-separated list of partition definitions, whose format depends on the choice of PARTITION_TYPE. See range partitioning, interval partitioning, list partitioning, hash partitioning, or series partitioning for example formats.
- IS_AUTOMATIC_PARTITION: If TRUE, a new partition will be created for values which don't fall into an existing partition. Currently only supported for list partitions. Supported values: TRUE, FALSE. The default value is FALSE.
- TTL: Sets the TTL of the table specified in tableName.
- CHUNK_SIZE: Indicates the number of records per chunk to be used for this table.
- IS_RESULT_TABLE: Indicates whether the table is a memory-only table. A result table cannot contain columns with store_only or text_search data-handling or that are non-charN strings, and it will not be retained if the server is restarted. Supported values: TRUE, FALSE. The default value is FALSE.
- STRATEGY_DEFINITION: The tier strategy for the table and its columns.

The default value is an empty Map.
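For illustration, a createTableOptions map that tolerates an existing table and requests a replicated table with a TTL could be built as below. The option keys are written in their documented lowercase string forms (the inherited InsertRecordsFromPayloadRequest.CreateTableOptions constants could be used instead), and the specific values are assumptions for the sketch.

```java
import java.util.HashMap;
import java.util.Map;

import com.gpudb.GPUdbBase;

public class CreateTableOptionsExample {
    public static void main(String[] args) {
        Map<String, String> createTableOptions = new HashMap<>();
        createTableOptions.put("no_error_if_exists", "true"); // don't fail if the table already exists with this type
        createTableOptions.put("is_replicated", "true");      // replicate rather than shard the table
        createTableOptions.put("ttl", "120");                 // hypothetical TTL value

        GPUdbBase.InsertRecordsJsonRequest request =
                new GPUdbBase.InsertRecordsJsonRequest();
        request.setTableName("example_schema.example_table")
               .setCreateTableOptions(createTableOptions);
    }
}
```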
public InsertRecordsFromPayloadRequest setCreateTableOptions(Map<String,String> createTableOptions)

Overrides:
setCreateTableOptions in class InsertRecordsFromPayloadRequest
Parameters:
createTableOptions - Options used when creating the target table. Includes the type to use. The other options match those in GPUdb.createTable(CreateTableRequest); the supported keys and values are the same as those listed for getCreateTableOptions() above. The default value is an empty Map.
Returns:
this, to mimic the builder pattern.
public Map<String,String> getOptions()

Overrides:
getOptions in class InsertRecordsFromPayloadRequest
Returns:
Optional parameters:

- AVRO_HEADER_BYTES: Optional number of bytes to skip when reading an Avro record.
- AVRO_NUM_RECORDS: Optional number of Avro records, if the data includes only records.
- AVRO_SCHEMA: Optional string representing the Avro schema, for inserting records in Avro format that do not include their schema.
- AVRO_SCHEMALESS: When the user provides 'avro_schema', the Avro data is assumed to be schemaless, unless specified otherwise. The default is 'true' when avro_schema is given; ignored when avro_schema is not given. Supported values: TRUE, FALSE.
- BAD_RECORD_TABLE_NAME: Optional name of a table to which records that were rejected are written. The bad-record table has the following columns: line_number (long), line_rejected (string), error_message (string).
- BAD_RECORD_TABLE_LIMIT: A positive integer indicating the maximum number of records that can be written to the bad-record table. The default value is 10000.
- BAD_RECORD_TABLE_LIMIT_PER_INPUT: For subscriptions: a positive integer indicating the maximum number of records that can be written to the bad-record table per file/payload. The default value will be BAD_RECORD_TABLE_LIMIT, and the total size of the table per rank is limited to BAD_RECORD_TABLE_LIMIT.
- BATCH_SIZE: Internal tuning parameter: the number of records per batch when inserting data.
- COLUMN_FORMATS: For each target column specified, applies the column-property-bound format to the source data loaded into that column. Each column format will contain a mapping of one or more of its column properties to an appropriate format for each property. Currently supported column properties include date, time, & datetime. The parameter value must be formatted as a JSON string of maps of column names to maps of column properties to their corresponding column formats, e.g., '{ "order_date" : { "date" : "%Y.%m.%d" }, "order_time" : { "time" : "%H:%M:%S" } }'. See DEFAULT_COLUMN_FORMATS for valid format syntax.
- COLUMNS_TO_LOAD: Specifies a comma-delimited list of columns from the source data to load. If more than one file is being loaded, this list applies to all files. Column numbers can be specified discretely or as a range. For example, a value of '5,7,1..3' will insert values from the fifth column in the source data into the first column in the target table, from the seventh column in the source data into the second column in the target table, and from the first through third columns in the source data into the third through fifth columns in the target table. If the source data contains a header, column names matching the file header names may be provided instead of column numbers. If the target table doesn't exist, the table will be created with the columns in this order. If the target table does exist with columns in a different order than the source data, this list can be used to match the order of the target table. For example, a value of 'C, B, A' will create a three-column table with column C, followed by column B, followed by column A; or will insert those fields in that order into a table created with columns in that order. If the target table exists, the column names must match the source data field names for a name-mapping to be successful. Mutually exclusive with COLUMNS_TO_SKIP.
- COLUMNS_TO_SKIP: Specifies a comma-delimited list of columns from the source data to skip. Mutually exclusive with COLUMNS_TO_LOAD.
- COMPRESSION_TYPE: Optional payload compression type. Supported values:
  - NONE: Uncompressed.
  - AUTO: Default; auto-detect the compression type.
  - GZIP: gzip file compression.
  - BZIP2: bzip2 file compression.
  The default value is AUTO.
- DEFAULT_COLUMN_FORMATS: Specifies the default format to be applied to source data loaded into columns with the corresponding column property. Currently supported column properties include date, time, & datetime. This default column-property-bound format can be overridden by specifying a column property & format for a given target column in COLUMN_FORMATS. For each specified annotation, the format will apply to all columns with that annotation unless a custom COLUMN_FORMATS for that annotation is specified. The parameter value must be formatted as a JSON string that is a map of column properties to their respective column formats, e.g., '{ "date" : "%Y.%m.%d", "time" : "%H:%M:%S" }'. Column formats are specified as a string of control characters and plain text. The supported control characters are 'Y', 'm', 'd', 'H', 'M', 'S', and 's', which follow the Linux 'strptime()' specification, as well as 's', which specifies seconds and fractional seconds (though the fractional component will be truncated past milliseconds). Formats for the 'date' annotation must include the 'Y', 'm', and 'd' control characters. Formats for the 'time' annotation must include the 'H', 'M', and either 'S' or 's' (but not both) control characters. Formats for the 'datetime' annotation meet both the 'date' and 'time' control character requirements. For example, '{"datetime" : "%m/%d/%Y %H:%M:%S" }' would be used to interpret text as "05/04/2000 12:12:11".
- ERROR_HANDLING: Specifies how errors should be handled upon insertion. Supported values:
  - PERMISSIVE: Records with missing columns are populated with nulls if possible; otherwise, the malformed records are skipped.
  - IGNORE_BAD_RECORDS: Malformed records are skipped.
  - ABORT: Stops the current insertion and aborts the entire operation when an error is encountered. Primary key collisions are considered abortable errors in this mode.
  The default value is ABORT.
- FILE_TYPE: Specifies the type of the file(s) whose records will be inserted. Supported values:
  - AVRO: Avro file format.
  - DELIMITED_TEXT: Delimited text file format, e.g., CSV, TSV, PSV, etc.
  - GDB: Esri/GDB file format.
  - JSON: JSON file format.
  - PARQUET: Apache Parquet file format.
  - SHAPEFILE: ShapeFile file format.
  The default value is DELIMITED_TEXT.
- GDAL_CONFIGURATION_OPTIONS: Comma-separated list of GDAL configuration options for the specific request, as key=value pairs. The default value is ''.
- IGNORE_EXISTING_PK: Specifies the record-collision error-suppression policy for inserting into a table with a primary key, used only when not in upsert mode (upsert mode is disabled when UPDATE_ON_EXISTING_PK is FALSE). If set to TRUE, any record being inserted that is rejected for having primary key values that match those of an existing table record will be ignored with no error generated. If FALSE, the rejection of any record for having primary key values matching an existing record will result in an error being reported, as determined by ERROR_HANDLING. If the specified table does not have a primary key or if upsert mode is in effect (UPDATE_ON_EXISTING_PK is TRUE), then this option has no effect. Supported values:
  - TRUE: Ignore new records whose primary key values collide with those of existing records.
  - FALSE: Treat as errors any new records whose primary key values collide with those of existing records.
  The default value is FALSE.
- INGESTION_MODE: Whether to do a full load, dry run, or perform a type inference on the source data. Supported values:
  - FULL: Run a type inference on the source data (if needed) and ingest.
  - DRY_RUN: Does not load data, but walks through the source data and determines the number of valid records, taking into account the current mode of ERROR_HANDLING.
  - TYPE_INFERENCE_ONLY: Infer the type of the source data and return, without ingesting any data. The inferred type is returned in the response.
  The default value is FULL.
- LAYER: Optional comma-separated list of layer name(s) for geo files. The default value is ''.
- LOADING_MODE: Scheme for distributing the extraction and loading of data from the source data file(s). This option applies only when loading files that are local to the database. Supported values:
  - HEAD: The head node loads all data. All files must be available to the head node.
  - DISTRIBUTED_SHARED: The head node coordinates loading data by worker processes across all nodes from shared files available to all workers. NOTE: Instead of existing on a shared source, the files can be duplicated on a source local to each host to improve performance, though the files must appear as the same data set from the perspective of all hosts performing the load.
  - DISTRIBUTED_LOCAL: A single worker process on each node loads all files that are available to it. This option works best when each worker loads files from its own file system, to maximize performance. In order to avoid data duplication, either each worker performing the load needs to have visibility to a set of files unique to it (no file is visible to more than one node) or the target table needs to have a primary key (which will allow the worker to automatically deduplicate data). NOTE: If the target table doesn't exist, the table structure will be determined by the head node. If the head node has no files local to it, it will be unable to determine the structure and the request will fail. If the head node is configured to have no worker processes, no data strictly accessible to the head node will be loaded.
  The default value is HEAD.
- LOCAL_TIME_OFFSET: For Avro local timestamp columns.
- MAX_RECORDS_TO_LOAD: Limit the number of records to load in this request. If this number is larger than BATCH_SIZE, then the number of records loaded will be limited to the next whole multiple of BATCH_SIZE (per working thread). The default value is ''.
- NUM_TASKS_PER_RANK: Optional number of tasks for reading files per rank. The default will be external_file_reader_num_tasks.
- POLL_INTERVAL: If TRUE, the number of seconds between attempts to load external files into the table. If zero, polling will be continuous as long as data is found. If no data is found, the interval will steadily increase to a maximum of 60 seconds.
- PRIMARY_KEYS: Optional comma-separated list of column names to set as primary keys, when not specified in the type. The default value is ''.
- SCHEMA_REGISTRY_SCHEMA_ID
- SCHEMA_REGISTRY_SCHEMA_NAME
- SCHEMA_REGISTRY_SCHEMA_VERSION
- SHARD_KEYS: Optional comma-separated list of column names to set as shard keys, when not specified in the type. The default value is ''.
- SKIP_LINES: Skip this number of lines from the beginning of the file.
- SUBSCRIBE: Continuously poll the data source to check for new data and load it into the table. Supported values: TRUE, FALSE. The default value is FALSE.
- TABLE_INSERT_MODE: Optional. When inserting records from multiple files, if set to TABLE_PER_FILE, insert from each file into a new table. Currently supported only for shapefiles. Supported values: SINGLE, TABLE_PER_FILE. The default value is SINGLE.
- TEXT_COMMENT_STRING: Specifies the character string that should be interpreted as a comment line prefix in the source data. All lines in the data starting with the provided string are ignored. For DELIMITED_TEXT FILE_TYPE only. The default value is '#'.
- TEXT_DELIMITER: Specifies the character delimiting field values in the source data and field names in the header (if present). For DELIMITED_TEXT FILE_TYPE only. The default value is ','.
- TEXT_ESCAPE_CHARACTER: Specifies the character that is used to escape other characters in the source data. An 'a', 'b', 'f', 'n', 'r', 't', or 'v' preceded by an escape character will be interpreted as the ASCII bell, backspace, form feed, line feed, carriage return, horizontal tab, & vertical tab, respectively. For example, the escape character followed by an 'n' will be interpreted as a newline within a field value. The escape character can also be used to escape the quoting character, and will be treated as an escape character whether it is within a quoted field value or not. For DELIMITED_TEXT FILE_TYPE only.
- TEXT_HAS_HEADER: Indicates whether the source data contains a header row. For DELIMITED_TEXT FILE_TYPE only. Supported values: TRUE, FALSE. The default value is TRUE.
- TEXT_HEADER_PROPERTY_DELIMITER: Specifies the delimiter for column properties in the header row (if present). Cannot be set to the same value as TEXT_DELIMITER. For DELIMITED_TEXT FILE_TYPE only. The default value is '|'.
- TEXT_NULL_STRING: Specifies the character string that should be interpreted as a null value in the source data. For DELIMITED_TEXT FILE_TYPE only. The default value is '\\N'.
- TEXT_QUOTE_CHARACTER: Specifies the character that should be interpreted as a field value quoting character in the source data. The character must appear at the beginning and end of a field value to take effect. Delimiters within quoted fields are treated as literals and not delimiters. Within a quoted field, two consecutive quote characters will be interpreted as a single literal quote character, effectively escaping it. To not have a quote character, specify an empty string. For DELIMITED_TEXT FILE_TYPE only. The default value is '"'.
- TEXT_SEARCH_COLUMNS: Add the 'text_search' property to internally inferred string columns. Comma-separated list of column names, or '*' for all columns. To add the 'text_search' property only to string columns of a minimum size, also set the option TEXT_SEARCH_MIN_COLUMN_LENGTH.
- TEXT_SEARCH_MIN_COLUMN_LENGTH: Set the minimum column size. Used only when TEXT_SEARCH_COLUMNS has a value.
- TRUNCATE_STRINGS: If set to TRUE, truncate string values that are longer than the column's type size. Supported values: TRUE, FALSE. The default value is FALSE.
- TRUNCATE_TABLE: If set to TRUE, truncates the table specified by tableName prior to loading the file(s). Supported values: TRUE, FALSE. The default value is FALSE.
- TYPE_INFERENCE_MODE: Optimize type inference for accuracy or speed. Supported values:
  - ACCURACY: Scans data to get exactly-typed & sized columns for all data scanned.
  - SPEED: Scans data and picks the widest possible column types so that 'all' values will fit with minimum data scanned.
  The default value is SPEED.
- UPDATE_ON_EXISTING_PK: Specifies the record collision policy for inserting into a table with a primary key. If set to TRUE, any existing table record with primary key values that match those of a record being inserted will be replaced by that new record (the new data will be "upserted"). If set to FALSE, any existing table record with primary key values that match those of a record being inserted will remain unchanged, while the new record will be rejected and the error handled as determined by IGNORE_EXISTING_PK & ERROR_HANDLING. If the specified table does not have a primary key, then this option has no effect. Supported values:
  - TRUE: Upsert new records when primary keys match existing records.
  - FALSE: Reject new records when primary keys match existing records.
  The default value is FALSE.

The default value is an empty Map.
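As an example, an options map that skips malformed records, routes them to a bad-record table, and upserts on primary key collisions could look like the following. Keys and values are written in their documented lowercase string forms (the inherited InsertRecordsFromPayloadRequest.Options constants could be used instead), and the table names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

import com.gpudb.GPUdbBase;

public class IngestOptionsExample {
    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put("error_handling", "ignore_bad_records");         // skip malformed records instead of aborting
        options.put("bad_record_table_name", "example.bad_records"); // hypothetical table for rejected records
        options.put("update_on_existing_pk", "true");                // upsert when primary keys collide

        GPUdbBase.InsertRecordsJsonRequest request =
                new GPUdbBase.InsertRecordsJsonRequest();
        request.setTableName("example_schema.example_table")
               .setOptions(options);
    }
}
```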
public InsertRecordsFromPayloadRequest setOptions(Map<String,String> options)

Overrides:
setOptions in class InsertRecordsFromPayloadRequest
Parameters:
options - Optional parameters. The supported keys and values are the same as those listed for getOptions() above. The default value is an empty Map.
Returns:
this, to mimic the builder pattern.
public org.apache.avro.Schema getSchema()

This method supports the Avro framework and is not intended to be called directly by the user.

Specified by:
getSchema in interface org.apache.avro.generic.GenericContainer
Overrides:
getSchema in class InsertRecordsFromPayloadRequest