public static final class CreateTableExternalRequest.Options extends Object
BAD_RECORD_TABLE_NAME: Name of a table to which records that were rejected are written. The bad-record table has the following columns: line_number (long), line_rejected (string), error_message (string). When error_handling is abort, the bad-record table is not populated.
BAD_RECORD_TABLE_LIMIT: A positive integer indicating the maximum number of records that can be written to the bad-record table. The default value is '10000'.
BAD_RECORD_TABLE_LIMIT_PER_INPUT: For subscriptions, a positive integer indicating the maximum number of records that can be written to the bad-record table per file/payload. The default value will be bad_record_table_limit, and the total size of the table per rank is limited to bad_record_table_limit.
BATCH_SIZE: Number of records to insert per batch when inserting data. The default value is '50000'.
COLUMN_FORMATS: For each target column specified, applies the column-property-bound format to the source data loaded into that column. Each column format will contain a mapping of one or more of its column properties to an appropriate format for each property. Currently supported column properties include date, time, & datetime. The parameter value must be formatted as a JSON string of maps of column names to maps of column properties to their corresponding column formats, e.g., '{ "order_date" : { "date" : "%Y.%m.%d" }, "order_time" : { "time" : "%H:%M:%S" } }'. See default_column_formats for valid format syntax.
COLUMNS_TO_LOAD: Specifies a comma-delimited list of columns from the source data to load. If more than one file is being loaded, this list applies to all files.
Column numbers can be specified discretely or as a range. For example, a value of '5,7,1..3' will insert values from the fifth column in the source data into the first column in the target table, from the seventh column in the source data into the second column in the target table, and from the first through third columns in the source data into the third through fifth columns in the target table.
If the source data contains a header, column names matching the file header names may be provided instead of column numbers. If the target table doesn't exist, the table will be created with the columns in this order. If the target table does exist with columns in a different order than the source data, this list can be used to match the order of the target table. For example, a value of 'C, B, A' will create a three-column table with column C, followed by column B, followed by column A; or will insert those fields in that order into a table created with columns in that order. If the target table exists, the column names must match the source data field names for a name-mapping to be successful.
Mutually exclusive with columns_to_skip.
COLUMNS_TO_SKIP: Specifies a comma-delimited list of columns from the source data to skip. Mutually exclusive with columns_to_load.
COMPRESSION_TYPE: Source data compression type.
Supported values:
NONE: No compression.
AUTO: Auto detect compression type.
GZIP: gzip file compression.
BZIP2: bzip2 file compression.
The default value is AUTO.
DATASOURCE_NAME: Name of an existing external data source from which the data file(s) specified in filepaths will be loaded.
DEFAULT_COLUMN_FORMATS: Specifies the default format to be applied to source data loaded into columns with the corresponding column property. Currently supported column properties include date, time, & datetime. This default column-property-bound format can be overridden by specifying a column property & format for a given target column in column_formats. For each specified annotation, the format will apply to all columns with that annotation unless a custom column_formats for that annotation is specified.
The parameter value must be formatted as a JSON string that is a map of column properties to their respective column formats, e.g., '{ "date" : "%Y.%m.%d", "time" : "%H:%M:%S" }'. Column formats are specified as a string of control characters and plain text. The supported control characters are 'Y', 'm', 'd', 'H', 'M', 'S', and 's', which follow the Linux 'strptime()' specification, as well as 's', which specifies seconds and fractional seconds (though the fractional component will be truncated past milliseconds).
Formats for the 'date' annotation must include the 'Y', 'm', and 'd' control characters. Formats for the 'time' annotation must include the 'H', 'M', and either 'S' or 's' (but not both) control characters. Formats for the 'datetime' annotation meet both the 'date' and 'time' control character requirements. For example, '{ "datetime" : "%m/%d/%Y %H:%M:%S" }' would be used to interpret text as "05/04/2000 12:12:11".
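As a minimal illustrative sketch (not normative API documentation), the following shows how column_formats and default_column_formats might be supplied via the constants in this class; the column names reuse the example above, and the com.gpudb.protocol package path is assumed.

```java
import java.util.HashMap;
import java.util.Map;

import com.gpudb.protocol.CreateTableExternalRequest.Options;

public class ColumnFormatOptionsSketch {
    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();

        // Per-column formats: a JSON map of column name -> { column property -> format }
        options.put(Options.COLUMN_FORMATS,
            "{ \"order_date\" : { \"date\" : \"%Y.%m.%d\" }, "
                + "\"order_time\" : { \"time\" : \"%H:%M:%S\" } }");

        // Fallback formats applied to any other column carrying the given property
        options.put(Options.DEFAULT_COLUMN_FORMATS,
            "{ \"date\" : \"%Y-%m-%d\", \"datetime\" : \"%m/%d/%Y %H:%M:%S\" }");

        System.out.println(options);
    }
}
```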
ERROR_HANDLING: Specifies how errors should be handled upon insertion.
Supported values:
PERMISSIVE: Records with missing columns are populated with nulls if possible; otherwise, the malformed records are skipped.
IGNORE_BAD_RECORDS: Malformed records are skipped.
ABORT: Stops current insertion and aborts entire operation when an error is encountered. Primary key collisions are considered abortable errors in this mode.
The default value is ABORT.
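Continuing the options-map sketch above, one plausible combination is to skip malformed rows rather than abort the load while capturing them for review; the bad-record table name and limit below are hypothetical.

```java
// Skip malformed records instead of aborting the whole operation
options.put(Options.ERROR_HANDLING, Options.IGNORE_BAD_RECORDS);

// Write rejected records (line number, rejected line, error message) to a side table
options.put(Options.BAD_RECORD_TABLE_NAME, "load_errors");   // hypothetical table name
options.put(Options.BAD_RECORD_TABLE_LIMIT, "5000");
```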
EXTERNAL_TABLE_TYPE: Specifies whether the external table holds a local copy of the external data.
Supported values:
MATERIALIZED: Loads a copy of the external data into the database, refreshed on demand.
LOGICAL: External data will not be loaded into the database; the data will be retrieved from the source upon servicing each query against the external table.
The default value is MATERIALIZED.
FILE_TYPE: Specifies the type of the file(s) whose records will be inserted.
Supported values:
AVRO: Avro file format.
DELIMITED_TEXT: Delimited text file format; e.g., CSV, TSV, PSV, etc.
GDB: Esri/GDB file format.
JSON: JSON file format.
PARQUET: Apache Parquet file format.
SHAPEFILE: ShapeFile file format.
The default value is DELIMITED_TEXT.
GDAL_CONFIGURATION_OPTIONS: Comma-separated list of GDAL configuration options for the specific request, in key=value form.
IGNORE_EXISTING_PK: Specifies the record collision error-suppression policy for inserting into a table with a primary key, only used when not in upsert mode (upsert mode is disabled when update_on_existing_pk is false). If set to true, any record being inserted that is rejected for having primary key values that match those of an existing table record will be ignored with no error generated. If false, the rejection of any record for having primary key values matching an existing record will result in an error being reported, as determined by error_handling. If the specified table does not have a primary key or if upsert mode is in effect (update_on_existing_pk is true), then this option has no effect.
Supported values:
TRUE: Ignore new records whose primary key values collide with those of existing records.
FALSE: Treat as errors any new records whose primary key values collide with those of existing records.
The default value is FALSE.
INGESTION_MODE: Whether to do a full load, dry run, or perform a type inference on the source data.
Supported values:
FULL: Run a type inference on the source data (if needed) and ingest.
DRY_RUN: Does not load data, but walks through the source data and determines the number of valid records, taking into account the current mode of error_handling.
TYPE_INFERENCE_ONLY: Infer the type of the source data and return, without ingesting any data. The inferred type is returned in the response.
The default value is FULL.
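A common pattern, sketched here under the same assumptions as the snippets above, is to validate the source with a dry run before switching to a full load.

```java
// First pass: walk the source data and count valid records without loading anything
options.put(Options.INGESTION_MODE, Options.DRY_RUN);
// ... submit the request and inspect the response ...

// Second pass: perform the actual ingest
options.put(Options.INGESTION_MODE, Options.FULL);
```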
JDBC_FETCH_SIZE: The JDBC fetch size, which determines how many rows to fetch per round trip. The default value is '50000'.
KAFKA_CONSUMERS_PER_RANK: Number of Kafka consumer threads per rank (valid range 1-6). The default value is '1'.
KAFKA_GROUP_ID: The group id to be used when consuming data from a Kafka topic (valid only for Kafka datasource subscriptions).
KAFKA_OFFSET_RESET_POLICY: Policy to determine whether the Kafka data consumption starts either at earliest offset or latest offset.
Supported values: EARLIEST, LATEST. The default value is EARLIEST.
KAFKA_OPTIMISTIC_INGEST: Enable optimistic ingestion where Kafka topic offsets and table data are committed independently to achieve parallelism.
Supported values: TRUE, FALSE. The default value is FALSE.
KAFKA_SUBSCRIPTION_CANCEL_AFTER: Sets the Kafka subscription lifespan (in minutes). Expired subscriptions will be cancelled automatically.
KAFKA_TYPE_INFERENCE_FETCH_TIMEOUT: Maximum time to collect Kafka messages before type inferencing on the set of them.
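For a Kafka-backed subscription, the Kafka-related options might be combined as in this hedged sketch; the data source name and consumer group id are hypothetical, and the subscribe option is documented further below.

```java
options.put(Options.DATASOURCE_NAME, "kafka_orders_ds");           // hypothetical Kafka data source
options.put(Options.SUBSCRIBE, Options.TRUE);                      // keep polling the topic for new data
options.put(Options.KAFKA_GROUP_ID, "orders_loader");              // hypothetical consumer group id
options.put(Options.KAFKA_OFFSET_RESET_POLICY, Options.EARLIEST);  // start from the earliest offset
options.put(Options.KAFKA_CONSUMERS_PER_RANK, "2");
options.put(Options.KAFKA_SUBSCRIPTION_CANCEL_AFTER, "1440");      // cancel the subscription after 24 hours
```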
LAYER: Geo files layer(s) name(s), comma-separated.
LOADING_MODE: Scheme for distributing the extraction and loading of data from the source data file(s). This option applies only when loading files that are local to the database.
Supported values:
HEAD: The head node loads all data. All files must be available to the head node.
DISTRIBUTED_SHARED: The head node coordinates loading data by worker processes across all nodes from shared files available to all workers.
NOTE: Instead of existing on a shared source, the files can be duplicated on a source local to each host to improve performance, though the files must appear as the same data set from the perspective of all hosts performing the load.
DISTRIBUTED_LOCAL: A single worker process on each node loads all files that are available to it. This option works best when each worker loads files from its own file system, to maximize performance. In order to avoid data duplication, either each worker performing the load needs to have visibility to a set of files unique to it (no file is visible to more than one node) or the target table needs to have a primary key (which will allow the worker to automatically deduplicate data).
NOTE: If the target table doesn't exist, the table structure will be determined by the head node. If the head node has no files local to it, it will be unable to determine the structure and the request will fail. If the head node is configured to have no worker processes, no data strictly accessible to the head node will be loaded.
The default value is HEAD.
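As a brief sketch, distributing the load across workers that each see their own files could be requested as follows, assuming the target table has a primary key so that any duplicate visibility is deduplicated as described above.

```java
// Each node's worker loads the files visible to that node; a primary key on the
// target table (or per-node-unique file sets) guards against duplicate loading
options.put(Options.LOADING_MODE, Options.DISTRIBUTED_LOCAL);
```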
LOCAL_TIME_OFFSET: Apply an offset to Avro local timestamp columns.
MAX_RECORDS_TO_LOAD: Limit the number of records to load in this request: if this number is larger than batch_size, then the number of records loaded will be limited to the next whole number of batch_size (per working thread).
NUM_TASKS_PER_RANK: Number of tasks for reading file per rank. Default will be the system configuration parameter, external_file_reader_num_tasks.
POLL_INTERVAL: If true, the number of seconds between attempts to load external files into the table. If zero, polling will be continuous as long as data is found. If no data is found, the interval will steadily increase to a maximum of 60 seconds. The default value is '0'.
PRIMARY_KEYS: Comma-separated list of column names to set as primary keys, when not specified in the type.
REFRESH_METHOD: Method by which the table can be refreshed from its source data.
Supported values:
MANUAL: Refresh only occurs when manually requested by invoking the refresh action of GPUdb.alterTable(AlterTableRequest) on this table.
ON_START: Refresh table on database startup and when manually requested by invoking the refresh action of GPUdb.alterTable(AlterTableRequest) on this table.
The default value is MANUAL.
SCHEMA_REGISTRY_SCHEMA_NAME: Name of the Avro schema in the schema registry to use when reading Avro records.
SHARD_KEYS: Comma-separated list of column names to set as shard keys, when not specified in the type.
SKIP_LINES: Skip number of lines from beginning of file.
SUBSCRIBE: Continuously poll the data source to check for new data and load it into the table.
Supported values: TRUE, FALSE. The default value is FALSE.
TABLE_INSERT_MODE: Insertion scheme to use when inserting records from multiple shapefiles.
Supported values:
SINGLE: Insert all records into a single table.
TABLE_PER_FILE: Insert records from each file into a new table corresponding to that file.
The default value is SINGLE.
TEXT_COMMENT_STRING: Specifies the character string that should be interpreted as a comment line prefix in the source data. All lines in the data starting with the provided string are ignored.
For delimited_text file_type only. The default value is '#'.
TEXT_DELIMITER: Specifies the character delimiting field values in the source data and field names in the header (if present).
For delimited_text file_type only. The default value is ','.
TEXT_ESCAPE_CHARACTER: Specifies the character that is used to escape other characters in the source data.
An 'a', 'b', 'f', 'n', 'r', 't', or 'v' preceded by an escape character will be interpreted as the ASCII bell, backspace, form feed, line feed, carriage return, horizontal tab, & vertical tab, respectively. For example, the escape character followed by an 'n' will be interpreted as a newline within a field value.
The escape character can also be used to escape the quoting character, and will be treated as an escape character whether it is within a quoted field value or not.
For delimited_text file_type only.
TEXT_HAS_HEADER: Indicates whether the source data contains a header row.
For delimited_text file_type only.
Supported values: TRUE, FALSE. The default value is TRUE.
TEXT_HEADER_PROPERTY_DELIMITER: Specifies the delimiter for column properties in the header row (if present). Cannot be set to the same value as text_delimiter.
For delimited_text file_type only. The default value is '|'.
TEXT_NULL_STRING: Specifies the character string that should be interpreted as a null value in the source data.
For delimited_text file_type only. The default value is '\\N'.
TEXT_QUOTE_CHARACTER: Specifies the character that should be interpreted as a field value quoting character in the source data. The character must appear at beginning and end of field value to take effect. Delimiters within quoted fields are treated as literals and not delimiters. Within a quoted field, two consecutive quote characters will be interpreted as a single literal quote character, effectively escaping it. To not have a quote character, specify an empty string.
For delimited_text file_type only. The default value is '"'.
TEXT_SEARCH_COLUMNS: Add 'text_search' property to internally inferred string columns. Comma-separated list of column names or '*' for all columns. To add the 'text_search' property only to string columns greater than or equal to a minimum size, also set text_search_min_column_length.
TEXT_SEARCH_MIN_COLUMN_LENGTH: Set the minimum column size for strings to apply the 'text_search' property to. Used only when text_search_columns has a value.
TRUNCATE_STRINGS: If set to true, truncate string values that are longer than the column's type size.
Supported values: TRUE, FALSE. The default value is FALSE.
TRUNCATE_TABLE: If set to true, truncates the table specified by tableName prior to loading the file(s).
Supported values: TRUE, FALSE. The default value is FALSE.
TYPE_INFERENCE_MODE: Optimize type inferencing for either speed or accuracy.
Supported values:
ACCURACY: Scans data to get exactly-typed & sized columns for all data scanned.
SPEED: Scans data and picks the widest possible column types so that 'all' values will fit with minimum data scanned.
The default value is SPEED.
REMOTE_QUERY: Remote SQL query from which data will be sourced.
REMOTE_QUERY_FILTER_COLUMN: Name of column to be used for splitting remote_query into multiple sub-queries using the data distribution of the given column.
REMOTE_QUERY_INCREASING_COLUMN: Column on subscribed remote query result that will increase for new records (e.g., TIMESTAMP).
REMOTE_QUERY_PARTITION_COLUMN: Alias name for remote_query_filter_column.
UPDATE_ON_EXISTING_PK: Specifies the record collision policy for inserting into a table with a primary key. If set to true, any existing table record with primary key values that match those of a record being inserted will be replaced by that new record (the new data will be 'upserted'). If set to false, any existing table record with primary key values that match those of a record being inserted will remain unchanged, while the new record will be rejected and the error handled as determined by ignore_existing_pk & error_handling. If the specified table does not have a primary key, then this option has no effect.
Supported values:
TRUE: Upsert new records when primary keys match existing records.
FALSE: Reject new records when primary keys match existing records.
The default value is FALSE.
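Sketching the two collision policies against the same options map: upsert mode replaces matching records, while the commented alternative keeps existing records and suppresses collision errors on new ones.

```java
// Upsert: replace existing records whose primary key values match incoming records
options.put(Options.UPDATE_ON_EXISTING_PK, Options.TRUE);

// Alternative: keep existing records and ignore colliding inserts without error
// options.put(Options.UPDATE_ON_EXISTING_PK, Options.FALSE);
// options.put(Options.IGNORE_EXISTING_PK, Options.TRUE);
```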
The default value is an empty Map.
A set of string constants for the parameter options.
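Putting the pieces together, here is a hedged end-to-end sketch; the connection URL, data source, file path, and table name are hypothetical, and the five-argument createTableExternal call shown here should be checked against the GPUdb Java client version in use.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import com.gpudb.GPUdb;
import com.gpudb.protocol.CreateTableExternalRequest.Options;

public class CreateTableExternalSketch {
    public static void main(String[] args) throws Exception {
        GPUdb db = new GPUdb("http://localhost:9191");   // hypothetical endpoint

        Map<String, String> options = new HashMap<>();
        options.put(Options.DATASOURCE_NAME, "orders_ds");               // hypothetical data source
        options.put(Options.FILE_TYPE, Options.DELIMITED_TEXT);
        options.put(Options.ERROR_HANDLING, Options.IGNORE_BAD_RECORDS);
        options.put(Options.EXTERNAL_TABLE_TYPE, Options.MATERIALIZED);
        options.put(Options.REFRESH_METHOD, Options.ON_START);

        // table name, file paths, per-column modifications, create-table options,
        // and the options documented above
        db.createTableExternal(
            "example.orders_ext",                   // hypothetical table name
            Arrays.asList("data/orders/*.csv"),     // hypothetical file path on the data source
            Collections.emptyMap(),                 // modifyColumns: none
            Collections.emptyMap(),                 // createTableOptions: defaults
            options);
    }
}
```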
Field Summary

Modifier and Type | Field and Description
---|---
static String | ABORT: Stops current insertion and aborts entire operation when an error is encountered.
static String | ACCURACY: Scans data to get exactly-typed & sized columns for all data scanned.
static String | AUTO: Auto detect compression type.
static String | AVRO: Avro file format.
static String | BAD_RECORD_TABLE_LIMIT: A positive integer indicating the maximum number of records that can be written to the bad-record table.
static String | BAD_RECORD_TABLE_LIMIT_PER_INPUT: For subscriptions, a positive integer indicating the maximum number of records that can be written to the bad-record table per file/payload.
static String | BAD_RECORD_TABLE_NAME: Name of a table to which records that were rejected are written.
static String | BATCH_SIZE: Number of records to insert per batch when inserting data.
static String | BZIP2: bzip2 file compression.
static String | COLUMN_FORMATS: For each target column specified, applies the column-property-bound format to the source data loaded into that column.
static String | COLUMNS_TO_LOAD: Specifies a comma-delimited list of columns from the source data to load.
static String | COLUMNS_TO_SKIP: Specifies a comma-delimited list of columns from the source data to skip.
static String | COMPRESSION_TYPE: Source data compression type.
static String | DATASOURCE_NAME: Name of an existing external data source from which the data file(s) specified in filepaths will be loaded.
static String | DEFAULT_COLUMN_FORMATS: Specifies the default format to be applied to source data loaded into columns with the corresponding column property.
static String | DELIMITED_TEXT: Delimited text file format; e.g., CSV, TSV, PSV, etc.
static String | DISTRIBUTED_LOCAL: A single worker process on each node loads all files that are available to it.
static String | DISTRIBUTED_SHARED: The head node coordinates loading data by worker processes across all nodes from shared files available to all workers.
static String | DRY_RUN: Does not load data, but walks through the source data and determines the number of valid records, taking into account the current mode of error_handling.
static String | EARLIEST
static String | ERROR_HANDLING: Specifies how errors should be handled upon insertion.
static String | EXTERNAL_TABLE_TYPE: Specifies whether the external table holds a local copy of the external data.
static String | FALSE: Reject new records when primary keys match existing records.
static String | FILE_TYPE: Specifies the type of the file(s) whose records will be inserted.
static String | FULL: Run a type inference on the source data (if needed) and ingest.
static String | GDAL_CONFIGURATION_OPTIONS: Comma-separated list of GDAL configuration options for the specific request, in key=value form.
static String | GDB: Esri/GDB file format.
static String | GZIP: gzip file compression.
static String | HEAD: The head node loads all data.
static String | IGNORE_BAD_RECORDS: Malformed records are skipped.
static String | IGNORE_EXISTING_PK: Specifies the record collision error-suppression policy for inserting into a table with a primary key.
static String | INGESTION_MODE: Whether to do a full load, dry run, or perform a type inference on the source data.
static String | JDBC_FETCH_SIZE: The JDBC fetch size, which determines how many rows to fetch per round trip.
static String | JSON: JSON file format.
static String | KAFKA_CONSUMERS_PER_RANK: Number of Kafka consumer threads per rank (valid range 1-6).
static String | KAFKA_GROUP_ID: The group id to be used when consuming data from a Kafka topic (valid only for Kafka datasource subscriptions).
static String | KAFKA_OFFSET_RESET_POLICY: Policy to determine whether the Kafka data consumption starts either at earliest offset or latest offset.
static String | KAFKA_OPTIMISTIC_INGEST: Enable optimistic ingestion where Kafka topic offsets and table data are committed independently to achieve parallelism.
static String | KAFKA_SUBSCRIPTION_CANCEL_AFTER: Sets the Kafka subscription lifespan (in minutes).
static String | KAFKA_TYPE_INFERENCE_FETCH_TIMEOUT: Maximum time to collect Kafka messages before type inferencing on the set of them.
static String | LATEST
static String | LAYER: Geo files layer(s) name(s), comma-separated.
static String | LOADING_MODE: Scheme for distributing the extraction and loading of data from the source data file(s).
static String | LOCAL_TIME_OFFSET: Apply an offset to Avro local timestamp columns.
static String | LOGICAL: External data will not be loaded into the database; the data will be retrieved from the source upon servicing each query against the external table.
static String | MANUAL: Refresh only occurs when manually requested by invoking the refresh action of GPUdb.alterTable(AlterTableRequest) on this table.
static String | MATERIALIZED: Loads a copy of the external data into the database, refreshed on demand.
static String | MAX_RECORDS_TO_LOAD: Limit the number of records to load in this request: if this number is larger than batch_size, then the number of records loaded will be limited to the next whole number of batch_size (per working thread).
static String | NONE: No compression.
static String | NUM_TASKS_PER_RANK: Number of tasks for reading file per rank.
static String | ON_START: Refresh table on database startup and when manually requested by invoking the refresh action of GPUdb.alterTable(AlterTableRequest) on this table.
static String | PARQUET: Apache Parquet file format.
static String | PERMISSIVE: Records with missing columns are populated with nulls if possible; otherwise, the malformed records are skipped.
static String | POLL_INTERVAL: If true, the number of seconds between attempts to load external files into the table.
static String | PRIMARY_KEYS: Comma-separated list of column names to set as primary keys, when not specified in the type.
static String | REFRESH_METHOD: Method by which the table can be refreshed from its source data.
static String | REMOTE_QUERY: Remote SQL query from which data will be sourced.
static String | REMOTE_QUERY_FILTER_COLUMN: Name of column to be used for splitting remote_query into multiple sub-queries using the data distribution of the given column.
static String | REMOTE_QUERY_INCREASING_COLUMN: Column on subscribed remote query result that will increase for new records (e.g., TIMESTAMP).
static String | REMOTE_QUERY_PARTITION_COLUMN: Alias name for remote_query_filter_column.
static String | SCHEMA_REGISTRY_SCHEMA_NAME: Name of the Avro schema in the schema registry to use when reading Avro records.
static String | SHAPEFILE: ShapeFile file format.
static String | SHARD_KEYS: Comma-separated list of column names to set as shard keys, when not specified in the type.
static String | SINGLE: Insert all records into a single table.
static String | SKIP_LINES: Skip number of lines from beginning of file.
static String | SPEED: Scans data and picks the widest possible column types so that 'all' values will fit with minimum data scanned.
static String | SUBSCRIBE: Continuously poll the data source to check for new data and load it into the table.
static String | TABLE_INSERT_MODE: Insertion scheme to use when inserting records from multiple shapefiles.
static String | TABLE_PER_FILE: Insert records from each file into a new table corresponding to that file.
static String | TEXT_COMMENT_STRING: Specifies the character string that should be interpreted as a comment line prefix in the source data.
static String | TEXT_DELIMITER: Specifies the character delimiting field values in the source data and field names in the header (if present).
static String | TEXT_ESCAPE_CHARACTER: Specifies the character that is used to escape other characters in the source data.
static String | TEXT_HAS_HEADER: Indicates whether the source data contains a header row.
static String | TEXT_HEADER_PROPERTY_DELIMITER: Specifies the delimiter for column properties in the header row (if present).
static String | TEXT_NULL_STRING: Specifies the character string that should be interpreted as a null value in the source data.
static String | TEXT_QUOTE_CHARACTER: Specifies the character that should be interpreted as a field value quoting character in the source data.
static String | TEXT_SEARCH_COLUMNS: Add 'text_search' property to internally inferred string columns.
static String | TEXT_SEARCH_MIN_COLUMN_LENGTH: Set the minimum column size for strings to apply the 'text_search' property to.
static String | TRUE: Upsert new records when primary keys match existing records.
static String | TRUNCATE_STRINGS: If set to true, truncate string values that are longer than the column's type size.
static String | TRUNCATE_TABLE: If set to true, truncates the table specified by tableName prior to loading the file(s).
static String | TYPE_INFERENCE_MODE: Optimize type inferencing for either speed or accuracy.
static String | TYPE_INFERENCE_ONLY: Infer the type of the source data and return, without ingesting any data.
static String | UPDATE_ON_EXISTING_PK: Specifies the record collision policy for inserting into a table with a primary key.