public class CreateTableExternalRequest extends Object implements org.apache.avro.generic.IndexedRecord

A set of parameters for GPUdb.createTableExternal(CreateTableExternalRequest).

Creates a new external table, which is a local database object whose source data is located externally to the database. The source data can be located either in KiFS; on the cluster, accessible to the database; or remotely, accessible via a pre-defined external data source.

The external table can have its structure defined explicitly, via createTableOptions, which contains many of the options from GPUdb.createTable(CreateTableRequest); or defined implicitly, inferred from the source data.
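As a minimal sketch of end-to-end usage (the connection URL, schema, table name, and KiFS path below are illustrative placeholders, not values taken from this page):

```java
import com.gpudb.GPUdb;
import com.gpudb.protocol.CreateTableExternalRequest;

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class CreateExternalTableExample {
    public static void main(String[] args) throws Exception {
        // Connect to the database (URL is a placeholder)
        GPUdb db = new GPUdb("http://localhost:9191");

        // Empty createTableOptions: let the table structure be inferred
        // from the source data; empty options: accept all defaults.
        CreateTableExternalRequest request = new CreateTableExternalRequest(
                "ki_home.ext_orders",                        // [schema_name.]table_name
                Arrays.asList("kifs://data/orders.csv"),     // source file in KiFS
                new HashMap<String, Map<String, String>>(),  // modifyColumns (not implemented yet)
                new HashMap<String, String>(),               // createTableOptions
                new HashMap<String, String>());              // options

        db.createTableExternal(request);
        System.out.println("Created external table: " + request.getTableName());
    }
}
```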
Modifier and Type | Class and Description
---|---
static class | CreateTableExternalRequest.CreateTableOptions: Options from GPUdb.createTable(CreateTableRequest), allowing the structure of the table to be defined independently of the data source.
static class | CreateTableExternalRequest.Options: Optional parameters.
Constructor and Description
---
CreateTableExternalRequest(): Constructs a CreateTableExternalRequest object with default parameters.
CreateTableExternalRequest(String tableName, List<String> filepaths, Map<String,Map<String,String>> modifyColumns, Map<String,String> createTableOptions, Map<String,String> options): Constructs a CreateTableExternalRequest object with the specified parameters.
Modifier and Type | Method and Description
---|---
boolean | equals(Object obj)
Object | get(int index): This method supports the Avro framework and is not intended to be called directly by the user.
static org.apache.avro.Schema | getClassSchema(): This method supports the Avro framework and is not intended to be called directly by the user.
Map<String,String> | getCreateTableOptions()
List<String> | getFilepaths()
Map<String,Map<String,String>> | getModifyColumns()
Map<String,String> | getOptions()
org.apache.avro.Schema | getSchema(): This method supports the Avro framework and is not intended to be called directly by the user.
String | getTableName()
int | hashCode()
void | put(int index, Object value): This method supports the Avro framework and is not intended to be called directly by the user.
CreateTableExternalRequest | setCreateTableOptions(Map<String,String> createTableOptions)
CreateTableExternalRequest | setFilepaths(List<String> filepaths)
CreateTableExternalRequest | setModifyColumns(Map<String,Map<String,String>> modifyColumns)
CreateTableExternalRequest | setOptions(Map<String,String> options)
CreateTableExternalRequest | setTableName(String tableName)
String | toString()
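Each setter returns this, so a request can also be assembled fluently. A brief sketch, assuming the imports from the example above plus java.util.Collections, and the usual String constants on the nested Options class (table and file names are illustrative):

```java
CreateTableExternalRequest request = new CreateTableExternalRequest()
        .setTableName("ki_home.ext_events")
        .setFilepaths(Arrays.asList("kifs://data/events.psv"))
        .setOptions(Collections.singletonMap(
                CreateTableExternalRequest.Options.FILE_TYPE,
                CreateTableExternalRequest.Options.DELIMITED_TEXT));
```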
public CreateTableExternalRequest()
public CreateTableExternalRequest(String tableName, List<String> filepaths, Map<String,Map<String,String>> modifyColumns, Map<String,String> createTableOptions, Map<String,String> options)
tableName - Name of the table to be created, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria.
filepaths
- A list of file paths from which data will be sourced;
For paths in KiFS, use the uri prefix of kifs://
followed by the path to
a file or directory. File matching by prefix is
supported, e.g. kifs://dir/file would match dir/file_1
and dir/file_2. When prefix matching is used, the path
must start with a full, valid KiFS directory name.
If an external data source is specified in datasource_name
, these file
paths must resolve to accessible files at that data
source location. Prefix matching is supported.
If the data source is hdfs, prefixes must be aligned
with directories, i.e. partial file names will
not match.
If no data source is specified, the files are assumed
to be local to the database and must all be
accessible to the gpudb user, residing on the path (or
relative to the path) specified by the
external files directory in the Kinetica
configuration file. Wildcards (*)
can be used to
specify a group of files. Prefix matching is
supported, the prefixes must be aligned with
directories.
If the first path ends in .tsv, the text delimiter
will be defaulted to a tab character.
If the first path ends in .psv, the text delimiter
will be defaulted to a pipe character (|).
modifyColumns - Not implemented yet. The default value is an empty Map.
createTableOptions
- Options from GPUdb.createTable(CreateTableRequest)
,
allowing the structure of the table to
be defined independently of the data source
TYPE_ID
: ID of a currently registered type.
NO_ERROR_IF_EXISTS: If true, prevents an error from occurring if the table already exists and is of the given type. If a table with the same name but a different type exists, it is still an error. Supported values: TRUE, FALSE. The default value is FALSE.
IS_REPLICATED: Affects the distribution scheme for the table's data. If true and the given table has no explicit shard key defined, the table will be replicated. If false, the table will be sharded according to the shard key specified in the given type_id, or randomly sharded, if no shard key is specified. Note that a type containing a shard key cannot be used to create a replicated table. Supported values: TRUE, FALSE. The default value is FALSE.
FOREIGN_KEYS
: Semicolon-separated list of
foreign keys, of the format
'(source_column_name [, ...]) references
target_table_name(primary_key_column_name [,
...]) [as foreign_key_name]'.
FOREIGN_SHARD_KEY
: Foreign shard key of the
format
'source_column references shard_by_column
from target_table(primary_key_column)'.
PARTITION_TYPE
: Partitioning scheme to use.
Supported values:
RANGE
: Use range partitioning.
INTERVAL
: Use interval partitioning.
LIST
: Use list partitioning.
HASH
: Use hash partitioning.
SERIES
: Use series partitioning.
PARTITION_KEYS
: Comma-separated list of
partition keys, which are the columns or
column expressions by which records will be
assigned to partitions defined by
partition_definitions
.
PARTITION_DEFINITIONS
: Comma-separated list
of partition definitions, whose format
depends
on the choice of partition_type
. See
range partitioning,
interval partitioning,
list partitioning,
hash partitioning, or
series partitioning for
example formats.
IS_AUTOMATIC_PARTITION: If true, a new partition will be created for values which don't fall into an existing partition. Currently, only supported for list partitions. Supported values: TRUE, FALSE. The default value is FALSE.
TTL
: Sets the TTL of the table specified
in tableName
.
CHUNK_SIZE
: Indicates the number of records
per chunk to be used for this table.
IS_RESULT_TABLE
: Indicates whether the table
is a
memory-only table. A result
table cannot contain
columns with store_only or text_search
data-handling or that are
non-charN strings, and it
will not be retained if
the server is restarted.
Supported values:
The default value is FALSE
.
STRATEGY_DEFINITION
: The tier strategy
for the table and its columns.
The default value is an empty Map.
options - Optional parameters.
BAD_RECORD_TABLE_NAME
: Name of a table to which records
that were rejected are written.
The bad-record-table has the following columns:
line_number (long), line_rejected (string),
error_message (string). When error_handling
is
abort
, bad records table is not populated.
BAD_RECORD_TABLE_LIMIT
: A positive integer indicating
the maximum number of records that can be
written to the bad-record-table. The default value is
'10000'.
BAD_RECORD_TABLE_LIMIT_PER_INPUT
: For subscriptions, a
positive integer indicating the maximum number
of records that can be written to the bad-record-table
per file/payload. Default value will be
bad_record_table_limit
and total size of the
table per rank is limited to
bad_record_table_limit
.
BATCH_SIZE
: Number of records to insert per batch when
inserting data. The default value is '50000'.
COLUMN_FORMATS
: For each target column specified,
applies the column-property-bound
format to the source data loaded into that column. Each
column format will contain a mapping of one
or more of its column properties to an appropriate
format for each property. Currently supported
column properties include date, time, & datetime. The
parameter value must be formatted as a JSON
string of maps of column names to maps of column
properties to their corresponding column formats,
e.g.,
'{ "order_date" : { "date" : "%Y.%m.%d" }, "order_time"
: { "time" : "%H:%M:%S" } }'.
See default_column_formats
for valid format
syntax.
COLUMNS_TO_LOAD
: Specifies a comma-delimited list of
columns from the source data to
load. If more than one file is being loaded, this list
applies to all files.
Column numbers can be specified discretely or as a
range. For example, a value of '5,7,1..3' will
insert values from the fifth column in the source data
into the first column in the target table,
from the seventh column in the source data into the
second column in the target table, and from the
first through third columns in the source data into the
third through fifth columns in the target
table.
If the source data contains a header, column names
matching the file header names may be provided
instead of column numbers. If the target table doesn't
exist, the table will be created with the
columns in this order. If the target table does exist
with columns in a different order than the
source data, this list can be used to match the order of
the target table. For example, a value of
'C, B, A' will create a three column table with column
C, followed by column B, followed by column
A; or will insert those fields in that order into a
table created with columns in that order. If
the target table exists, the column names must match the
source data field names for a name-mapping
to be successful.
Mutually exclusive with columns_to_skip
.
COLUMNS_TO_SKIP
: Specifies a comma-delimited list of
columns from the source data to
skip. Mutually exclusive with columns_to_load
.
COMPRESSION_TYPE
: Source data compression type
Supported values:
NONE
: No compression.
AUTO
: Auto detect compression type
GZIP
: gzip file compression.
BZIP2
: bzip2 file compression.
AUTO
.
DATASOURCE_NAME
: Name of an existing external data
source from which data file(s) specified in filepaths
will be loaded
DEFAULT_COLUMN_FORMATS
: Specifies the default format to
be applied to source data loaded
into columns with the corresponding column property.
Currently supported column properties include
date, time, & datetime. This default
column-property-bound format can be overridden by
specifying a
column property & format for a given target column in
column_formats
. For
each specified annotation, the format will apply to all
columns with that annotation unless a custom
column_formats
for that annotation is specified.
The parameter value must be formatted as a JSON string
that is a map of column properties to their
respective column formats, e.g., '{ "date" : "%Y.%m.%d",
"time" : "%H:%M:%S" }'. Column
formats are specified as a string of control characters
and plain text. The supported control
characters are 'Y', 'm', 'd', 'H', 'M', 'S', and 's',
which follow the Linux 'strptime()'
specification, as well as 's', which specifies seconds
and fractional seconds (though the fractional
component will be truncated past milliseconds).
Formats for the 'date' annotation must include the 'Y',
'm', and 'd' control characters. Formats for
the 'time' annotation must include the 'H', 'M', and
either 'S' or 's' (but not both) control
characters. Formats for the 'datetime' annotation meet
both the 'date' and 'time' control character
requirements. For example, '{"datetime" : "%m/%d/%Y
%H:%M:%S" }' would be used to interpret text
as "05/04/2000 12:12:11"
ERROR_HANDLING
: Specifies how errors should be handled
upon insertion.
Supported values:
PERMISSIVE
: Records with missing columns are populated
with nulls if possible; otherwise, the malformed records
are skipped.
IGNORE_BAD_RECORDS
: Malformed records are skipped.
ABORT
: Stops current insertion and aborts entire
operation when an error is encountered. Primary key
collisions are considered abortable errors in this mode.
ABORT
.
EXTERNAL_TABLE_TYPE
: Specifies whether the external
table holds a local copy of the external data.
Supported values:
MATERIALIZED
: Loads a copy of the external data into
the database, refreshed on demand
LOGICAL
: External data will not be loaded into the
database; the data will be retrieved from the source
upon servicing each query against the external table
MATERIALIZED
.
FILE_TYPE
: Specifies the type of the file(s) whose
records will be inserted.
Supported values:
AVRO
: Avro file format
DELIMITED_TEXT
: Delimited text file format; e.g., CSV,
TSV, PSV, etc.
GDB
: Esri/GDB file format
JSON
: Json file format
PARQUET
: Apache Parquet file format
SHAPEFILE
: ShapeFile file format
DELIMITED_TEXT
.
GDAL_CONFIGURATION_OPTIONS
: Comma-separated list of
GDAL configuration options for the specific request: key=value
IGNORE_EXISTING_PK
: Specifies the record collision
error-suppression policy for
inserting into a table with a primary key, only used when
not in upsert mode (upsert mode is disabled when update_on_existing_pk
is
false
). If set to
true
, any record being inserted that is rejected
for having primary key values that match those of an
existing table record will be ignored with no
error generated. If false
, the rejection of any
record for having primary key values matching an
existing record will result in an error being
reported, as determined by error_handling
. If
the specified table does not
have a primary key or if upsert mode is in effect
(update_on_existing_pk
is
true
), then this option has no effect.
Supported values:
TRUE
: Ignore new records whose primary key values
collide with those of existing records
FALSE
: Treat as errors any new records whose primary
key values collide with those of existing records
FALSE
.
INGESTION_MODE
: Whether to do a full load, dry run, or
perform a type inference on the source data.
Supported values:
FULL
: Run a type inference on the source data (if
needed) and ingest
DRY_RUN
: Does not load data, but walks through the
source data and determines the number of valid records,
taking into account the current mode of error_handling
.
TYPE_INFERENCE_ONLY
: Infer the type of the source data
and return, without ingesting any data. The inferred
type is returned in the response.
FULL
.
JDBC_FETCH_SIZE
: The JDBC fetch size, which determines
how many rows to fetch per round trip. The default
value is '50000'.
KAFKA_CONSUMERS_PER_RANK
: Number of Kafka consumer
threads per rank (valid range 1-6). The default value
is '1'.
KAFKA_GROUP_ID
: The group id to be used when consuming
data from a Kafka topic (valid only for Kafka datasource
subscriptions).
KAFKA_OFFSET_RESET_POLICY
: Policy to determine whether
the Kafka data consumption starts either at earliest
offset or latest offset.
Supported values:
The default value is EARLIEST
.
KAFKA_OPTIMISTIC_INGEST
: Enable optimistic ingestion
where Kafka topic offsets and table data are committed
independently to achieve parallelism.
Supported values:
The default value is FALSE
.
KAFKA_SUBSCRIPTION_CANCEL_AFTER
: Sets the Kafka
subscription lifespan (in minutes). Expired subscription
will be cancelled automatically.
KAFKA_TYPE_INFERENCE_FETCH_TIMEOUT
: Maximum time to
collect Kafka messages before type inferencing on the
set of them.
LAYER
: Geo files layer(s) name(s): comma separated.
LOADING_MODE
: Scheme for distributing the extraction
and loading of data from the source data file(s). This
option applies only when loading files that are local to
the database
Supported values:
HEAD
: The head node loads all data. All files must be
available to the head node.
DISTRIBUTED_SHARED
: The head node coordinates loading
data by worker
processes across all nodes from shared files available
to all workers.
NOTE:
Instead of existing on a shared source, the files can be
duplicated on a source local to each host
to improve performance, though the files must appear as
the same data set from the perspective of
all hosts performing the load.
DISTRIBUTED_LOCAL
: A single worker process on each node
loads all files
that are available to it. This option works best when
each worker loads files from its own file
system, to maximize performance. In order to avoid data
duplication, either each worker performing
the load needs to have visibility to a set of files
unique to it (no file is visible to more than
one node) or the target table needs to have a primary
key (which will allow the worker to
automatically deduplicate data).
NOTE:
If the target table doesn't exist, the table structure
will be determined by the head node. If the
head node has no files local to it, it will be unable to
determine the structure and the request
will fail.
If the head node is configured to have no worker
processes, no data strictly accessible to the head
node will be loaded.
HEAD
.
LOCAL_TIME_OFFSET
: Apply an offset to Avro local
timestamp columns.
MAX_RECORDS_TO_LOAD
: Limit the number of records to
load in this request: if this number
is larger than batch_size
, then the number of
records loaded will be
limited to the next whole number of batch_size
(per working thread).
NUM_TASKS_PER_RANK
: Number of tasks for reading file
per rank. Default will be system configuration
parameter, external_file_reader_num_tasks.
POLL_INTERVAL
: If true
, the number of
seconds between attempts to load external files into the
table. If zero, polling will be continuous
as long as data is found. If no data is found, the
interval will steadily increase to a maximum of
60 seconds. The default value is '0'.
PRIMARY_KEYS
: Comma separated list of column names to
set as primary keys, when not specified in the type.
REFRESH_METHOD
: Method by which the table can be
refreshed from its source data.
Supported values:
MANUAL
: Refresh only occurs when manually requested by
invoking the refresh action of GPUdb.alterTable(AlterTableRequest)
on this
table.
ON_START
: Refresh table on database startup and when
manually requested by invoking the refresh action of
GPUdb.alterTable(AlterTableRequest)
on
this table.
MANUAL
.
SCHEMA_REGISTRY_SCHEMA_NAME
: Name of the Avro schema in
the schema registry to use when reading Avro records.
SHARD_KEYS
: Comma separated list of column names to set
as shard keys, when not specified in the type.
SKIP_LINES
: Skip this number of lines from the beginning of the file.
SUBSCRIBE
: Continuously poll the data source to check
for new data and load it into the table.
Supported values:
The default value is FALSE
.
TABLE_INSERT_MODE
: Insertion scheme to use when
inserting records from multiple shapefiles.
Supported values:
SINGLE
: Insert all records into a single table.
TABLE_PER_FILE
: Insert records from each file into a
new table corresponding to that file.
SINGLE
.
TEXT_COMMENT_STRING
: Specifies the character string
that should be interpreted as a comment line
prefix in the source data. All lines in the data
starting with the provided string are ignored.
For delimited_text
file_type
only. The
default value is '#'.
TEXT_DELIMITER
: Specifies the character delimiting
field values in the source data
and field names in the header (if present).
For delimited_text
file_type
only. The
default value is ','.
TEXT_ESCAPE_CHARACTER
: Specifies the character that is
used to escape other characters in
the source data.
An 'a', 'b', 'f', 'n', 'r', 't', or 'v' preceded by an
escape character will be interpreted as the
ASCII bell, backspace, form feed, line feed, carriage
return, horizontal tab, & vertical tab,
respectively. For example, the escape character
followed by an 'n' will be interpreted as a newline
within a field value.
The escape character can also be used to escape the
quoting character, and will be treated as an
escape character whether it is within a quoted field
value or not.
For delimited_text
file_type
only.
TEXT_HAS_HEADER
: Indicates whether the source data
contains a header row.
For delimited_text
file_type
only.
Supported values:
The default value is TRUE
.
TEXT_HEADER_PROPERTY_DELIMITER
: Specifies the delimiter
for
column properties in the header row
(if
present). Cannot be set to same value as text_delimiter
.
For delimited_text
file_type
only. The
default value is '|'.
TEXT_NULL_STRING
: Specifies the character string that
should be interpreted as a null
value in the source data.
For delimited_text
file_type
only. The
default value is '\\N'.
TEXT_QUOTE_CHARACTER
: Specifies the character that
should be interpreted as a field value
quoting character in the source data. The character
must appear at beginning and end of field value
to take effect. Delimiters within quoted fields are
treated as literals and not delimiters. Within
a quoted field, two consecutive quote characters will be
interpreted as a single literal quote
character, effectively escaping it. To not have a quote
character, specify an empty string.
For delimited_text
file_type
only. The
default value is '"'.
TEXT_SEARCH_COLUMNS
: Add 'text_search' property to
internally inferred string columns.
Comma-separated list of column names or '*' for all
columns. To add 'text_search' property only to
string columns greater than or equal to a minimum size,
also set the
text_search_min_column_length
TEXT_SEARCH_MIN_COLUMN_LENGTH
: Set the minimum column
size for strings to apply the 'text_search' property to.
Used only when text_search_columns
has a value.
TRUNCATE_STRINGS
: If set to true
, truncate
string values that are longer than the column's type
size.
Supported values:
The default value is FALSE
.
TRUNCATE_TABLE
: If set to true
, truncates the
table specified by tableName
prior to loading
the file(s).
Supported values:
The default value is FALSE
.
TYPE_INFERENCE_MODE
: Optimize type inferencing for
either speed or accuracy.
Supported values:
ACCURACY
: Scans data to get exactly-typed & sized
columns for all data scanned.
SPEED
: Scans data and picks the widest possible column
types so that 'all' values will fit with minimum data
scanned
SPEED
.
REMOTE_QUERY
: Remote SQL query from which data will be
sourced
REMOTE_QUERY_FILTER_COLUMN
: Name of column to be used
for splitting remote_query
into multiple
sub-queries using the data distribution of given column
REMOTE_QUERY_INCREASING_COLUMN
: Column on subscribed
remote query result that will increase for new records
(e.g., TIMESTAMP).
REMOTE_QUERY_PARTITION_COLUMN
: Alias name for remote_query_filter_column
.
UPDATE_ON_EXISTING_PK
: Specifies the record collision
policy for inserting into a table
with a primary key. If set to
true
, any existing table record with primary
key values that match those of a record being inserted
will be replaced by that new record (the new
data will be 'upserted'). If set to false
,
any existing table record with primary key values that
match those of a record being inserted will
remain unchanged, while the new record will be rejected
and the error handled as determined by
ignore_existing_pk
& error_handling
. If
the
specified table does not have a primary key, then this
option has no effect.
Supported values:
TRUE
: Upsert new records when primary keys match
existing records
FALSE
: Reject new records when primary keys match
existing records
The default value is FALSE.
The default value is an empty Map.
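To make the options map above concrete, here is a hedged sketch of the five-argument constructor with a few populated entries (assuming the usual String constants on the nested Options class; the table name, file path, and date format are only illustrative):

```java
Map<String, String> options = new HashMap<>();
options.put(CreateTableExternalRequest.Options.FILE_TYPE,
            CreateTableExternalRequest.Options.DELIMITED_TEXT);
options.put(CreateTableExternalRequest.Options.ERROR_HANDLING,
            CreateTableExternalRequest.Options.PERMISSIVE);
// Interpret the source's order_date column as year.month.day
options.put(CreateTableExternalRequest.Options.COLUMN_FORMATS,
            "{ \"order_date\" : { \"date\" : \"%Y.%m.%d\" } }");

CreateTableExternalRequest request = new CreateTableExternalRequest(
        "ki_home.ext_sales",
        Arrays.asList("kifs://data/sales.csv"),
        new HashMap<String, Map<String, String>>(),  // modifyColumns: not implemented yet
        new HashMap<String, String>(),               // createTableOptions: infer the structure
        options);
```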
public static org.apache.avro.Schema getClassSchema()
public String getTableName()
public CreateTableExternalRequest setTableName(String tableName)
tableName
- Name of the table to be created, in
[schema_name.]table_name format, using
standard name resolution rules and meeting
table naming criteria.
Returns this to mimic the builder pattern.
public List<String> getFilepaths()
A list of file paths from which data will be sourced. If an external data source is specified in datasource_name, these file
paths must resolve to accessible files at that data source
location. Prefix matching is supported.
If the data source is hdfs, prefixes must be aligned with
directories, i.e. partial file names will
not match.
If no data source is specified, the files are assumed to be
local to the database and must all be
accessible to the gpudb user, residing on the path (or relative
to the path) specified by the
external files directory in the Kinetica
configuration file. Wildcards (*) can be used
to
specify a group of files. Prefix matching is supported, the
prefixes must be aligned with
directories.
If the first path ends in .tsv, the text delimiter will be
defaulted to a tab character.
If the first path ends in .psv, the text delimiter will be
defaulted to a pipe character (|).
public CreateTableExternalRequest setFilepaths(List<String> filepaths)
filepaths
- A list of file paths from which data will be sourced;
For paths in KiFS, use the uri prefix of kifs://
followed by the path to
a file or directory. File matching by prefix is
supported, e.g. kifs://dir/file would match dir/file_1
and dir/file_2. When prefix matching is used, the path
must start with a full, valid KiFS directory name.
If an external data source is specified in datasource_name
, these file
paths must resolve to accessible files at that data
source location. Prefix matching is supported.
If the data source is hdfs, prefixes must be aligned
with directories, i.e. partial file names will
not match.
If no data source is specified, the files are assumed
to be local to the database and must all be
accessible to the gpudb user, residing on the path (or
relative to the path) specified by the
external files directory in the Kinetica
configuration file. Wildcards (*)
can be used to
specify a group of files. Prefix matching is
supported, the prefixes must be aligned with
directories.
If the first path ends in .tsv, the text delimiter
will be defaulted to a tab character.
If the first path ends in .psv, the text delimiter
will be defaulted to a pipe character (|).
Returns this to mimic the builder pattern.
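A brief sketch of setting KiFS paths, including a wildcard group (directory and file names are placeholders; kifs://data/2023 is assumed to be an existing KiFS directory):

```java
CreateTableExternalRequest request = new CreateTableExternalRequest();
request.setFilepaths(Arrays.asList(
        "kifs://data/2023/orders.psv",     // single file; a .psv first path defaults the delimiter to '|'
        "kifs://data/2023/archive/part_*"  // wildcard selecting a group of files
));
```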
public Map<String,Map<String,String>> getModifyColumns()
Not implemented yet. The default value is an empty Map.
public CreateTableExternalRequest setModifyColumns(Map<String,Map<String,String>> modifyColumns)
modifyColumns - Not implemented yet. The default value is an empty Map.
Returns this to mimic the builder pattern.
public Map<String,String> getCreateTableOptions()
Options from GPUdb.createTable(CreateTableRequest), allowing the
structure of the table to
be defined independently of the data source
TYPE_ID
: ID of a currently registered type.
NO_ERROR_IF_EXISTS
: If true
,
prevents an error from occurring if the table already exists and
is of the given type. If a table with
the same name but a different type exists, it is still an error.
Supported values:
The default value is FALSE
.
IS_REPLICATED
: Affects the distribution scheme
for the table's data. If true
and the
given table has no explicit shard key defined, the
table will be replicated. If
false
, the table will be
sharded according to the shard key specified
in the
given type_id
, or
randomly sharded, if no shard key is
specified.
Note that a type containing a shard key cannot be used to create
a replicated table.
Supported values:
The default value is FALSE
.
FOREIGN_KEYS
: Semicolon-separated list of
foreign keys, of the format
'(source_column_name [, ...]) references
target_table_name(primary_key_column_name [, ...]) [as
foreign_key_name]'.
FOREIGN_SHARD_KEY
: Foreign shard key of the format
'source_column references shard_by_column from
target_table(primary_key_column)'.
PARTITION_TYPE
: Partitioning scheme to use.
Supported values:
RANGE
: Use range partitioning.
INTERVAL
: Use interval partitioning.
LIST
: Use list partitioning.
HASH
: Use hash partitioning.
SERIES
: Use series partitioning.
PARTITION_KEYS
: Comma-separated list of partition keys, which
are the columns or
column expressions by which records will be assigned to
partitions defined by
partition_definitions
.
PARTITION_DEFINITIONS
: Comma-separated list of partition
definitions, whose format depends
on the choice of partition_type
. See
range partitioning,
interval partitioning,
list partitioning,
hash partitioning, or
series partitioning for example formats.
IS_AUTOMATIC_PARTITION
: If true
,
a new partition will be created for values which don't fall into
an existing partition. Currently,
only supported for list partitions.
Supported values:
The default value is FALSE
.
TTL
: Sets the TTL of the table specified in tableName
.
CHUNK_SIZE
: Indicates the number of records per chunk to be
used for this table.
IS_RESULT_TABLE
: Indicates whether the table is a
memory-only table. A result table cannot
contain
columns with store_only or text_search
data-handling or that are
non-charN strings, and it will not be retained
if
the server is restarted.
Supported values:
The default value is FALSE
.
STRATEGY_DEFINITION
: The tier strategy
for the table and its columns.
The default value is an empty Map.
public CreateTableExternalRequest setCreateTableOptions(Map<String,String> createTableOptions)
createTableOptions
- Options from GPUdb.createTable(CreateTableRequest)
,
allowing the structure of the table to
be defined independently of the data source
TYPE_ID
: ID of a currently registered type.
NO_ERROR_IF_EXISTS
: If true
,
prevents an error from occurring if the table
already exists and is of the given type. If
a table with
the same name but a different type exists, it
is still an error.
Supported values:
The default value is FALSE
.
IS_REPLICATED
: Affects the distribution scheme
for the table's data. If true
and
the
given table has no explicit shard key defined, the
table will be replicated. If
false
, the table will be
sharded according to the
shard key specified in the
given type_id
, or
randomly sharded, if no
shard key is specified.
Note that a type containing a shard key
cannot be used to create a replicated table.
Supported values:
The default value is FALSE
.
FOREIGN_KEYS
: Semicolon-separated list of
foreign keys, of the format
'(source_column_name [, ...]) references
target_table_name(primary_key_column_name [,
...]) [as foreign_key_name]'.
FOREIGN_SHARD_KEY
: Foreign shard key of the
format
'source_column references shard_by_column
from target_table(primary_key_column)'.
PARTITION_TYPE
: Partitioning scheme to use.
Supported values:
RANGE
: Use range partitioning.
INTERVAL
: Use interval partitioning.
LIST
: Use list partitioning.
HASH
: Use hash partitioning.
SERIES
: Use series partitioning.
PARTITION_KEYS
: Comma-separated list of
partition keys, which are the columns or
column expressions by which records will be
assigned to partitions defined by
partition_definitions
.
PARTITION_DEFINITIONS
: Comma-separated list
of partition definitions, whose format
depends
on the choice of partition_type
. See
range partitioning,
interval partitioning,
list partitioning,
hash partitioning, or
series partitioning for
example formats.
IS_AUTOMATIC_PARTITION
: If true
,
a new partition will be created for values
which don't fall into an existing partition.
Currently,
only supported for list partitions.
Supported values:
The default value is FALSE
.
TTL
: Sets the TTL of the table specified
in tableName
.
CHUNK_SIZE
: Indicates the number of records
per chunk to be used for this table.
IS_RESULT_TABLE
: Indicates whether the table
is a
memory-only table. A result
table cannot contain
columns with store_only or text_search
data-handling or that are
non-charN strings, and it
will not be retained if
the server is restarted.
Supported values:
The default value is FALSE
.
STRATEGY_DEFINITION
: The tier strategy
for the table and its columns.
The default value is an empty Map.
Returns this to mimic the builder pattern.
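As an illustration of setCreateTableOptions, a sketch assuming the usual String constants on the nested CreateTableOptions class; the type ID placeholder stands for an ID returned by a prior type registration, and request refers to a CreateTableExternalRequest built as in the earlier sketches:

```java
Map<String, String> createTableOptions = new HashMap<>();
// Use an explicitly registered type instead of inferring the structure
createTableOptions.put(CreateTableExternalRequest.CreateTableOptions.TYPE_ID,
                       "<registered type id>");
// Don't error out if a table of this name and type already exists
createTableOptions.put(CreateTableExternalRequest.CreateTableOptions.NO_ERROR_IF_EXISTS,
                       CreateTableExternalRequest.CreateTableOptions.TRUE);
// Replicate the table across the cluster rather than sharding it
createTableOptions.put(CreateTableExternalRequest.CreateTableOptions.IS_REPLICATED,
                       CreateTableExternalRequest.CreateTableOptions.TRUE);

request.setCreateTableOptions(createTableOptions);
```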
public Map<String,String> getOptions()
BAD_RECORD_TABLE_NAME
: Name of a table to which records that
were rejected are written.
The bad-record-table has the following columns: line_number
(long), line_rejected (string),
error_message (string). When error_handling
is
abort
, bad records table is not populated.
BAD_RECORD_TABLE_LIMIT
: A positive integer indicating the
maximum number of records that can be
written to the bad-record-table. The default value is '10000'.
BAD_RECORD_TABLE_LIMIT_PER_INPUT
: For subscriptions, a positive
integer indicating the maximum number
of records that can be written to the bad-record-table per
file/payload. Default value will be
bad_record_table_limit
and total size of the table per
rank is limited to
bad_record_table_limit
.
BATCH_SIZE
: Number of records to insert per batch when
inserting data. The default value is '50000'.
COLUMN_FORMATS
: For each target column specified, applies the
column-property-bound
format to the source data loaded into that column. Each column
format will contain a mapping of one
or more of its column properties to an appropriate format for
each property. Currently supported
column properties include date, time, & datetime. The parameter
value must be formatted as a JSON
string of maps of column names to maps of column properties to
their corresponding column formats,
e.g.,
'{ "order_date" : { "date" : "%Y.%m.%d" }, "order_time" : {
"time" : "%H:%M:%S" } }'.
See default_column_formats
for valid format syntax.
COLUMNS_TO_LOAD
: Specifies a comma-delimited list of columns
from the source data to
load. If more than one file is being loaded, this list applies
to all files.
Column numbers can be specified discretely or as a range. For
example, a value of '5,7,1..3' will
insert values from the fifth column in the source data into the
first column in the target table,
from the seventh column in the source data into the second
column in the target table, and from the
first through third columns in the source data into the third
through fifth columns in the target
table.
If the source data contains a header, column names matching the
file header names may be provided
instead of column numbers. If the target table doesn't exist,
the table will be created with the
columns in this order. If the target table does exist with
columns in a different order than the
source data, this list can be used to match the order of the
target table. For example, a value of
'C, B, A' will create a three column table with column C,
followed by column B, followed by column
A; or will insert those fields in that order into a table
created with columns in that order. If
the target table exists, the column names must match the source
data field names for a name-mapping
to be successful.
Mutually exclusive with columns_to_skip
.
COLUMNS_TO_SKIP
: Specifies a comma-delimited list of columns
from the source data to
skip. Mutually exclusive with columns_to_load
.
COMPRESSION_TYPE
: Source data compression type
Supported values:
NONE
: No compression.
AUTO
: Auto detect compression type
GZIP
: gzip file compression.
BZIP2
: bzip2 file compression.
AUTO
.
DATASOURCE_NAME
: Name of an existing external data source from
which data file(s) specified in filepaths
will be loaded
DEFAULT_COLUMN_FORMATS
: Specifies the default format to be
applied to source data loaded
into columns with the corresponding column property. Currently
supported column properties include
date, time, & datetime. This default column-property-bound
format can be overridden by specifying a
column property & format for a given target column in column_formats
. For
each specified annotation, the format will apply to all columns
with that annotation unless a custom
column_formats
for that annotation is specified.
The parameter value must be formatted as a JSON string that is a
map of column properties to their
respective column formats, e.g., '{ "date" : "%Y.%m.%d", "time"
: "%H:%M:%S" }'. Column
formats are specified as a string of control characters and
plain text. The supported control
characters are 'Y', 'm', 'd', 'H', 'M', 'S', and 's', which
follow the Linux 'strptime()'
specification, as well as 's', which specifies seconds and
fractional seconds (though the fractional
component will be truncated past milliseconds).
Formats for the 'date' annotation must include the 'Y', 'm', and
'd' control characters. Formats for
the 'time' annotation must include the 'H', 'M', and either 'S'
or 's' (but not both) control
characters. Formats for the 'datetime' annotation meet both the
'date' and 'time' control character
requirements. For example, '{"datetime" : "%m/%d/%Y %H:%M:%S" }'
would be used to interpret text
as "05/04/2000 12:12:11"
ERROR_HANDLING
: Specifies how errors should be handled upon
insertion.
Supported values:
PERMISSIVE
: Records with missing columns are populated with
nulls if possible; otherwise, the malformed records are skipped.
IGNORE_BAD_RECORDS
: Malformed records are skipped.
ABORT
: Stops current insertion and aborts entire operation when
an error is encountered. Primary key collisions are considered
abortable errors in this mode.
ABORT
.
EXTERNAL_TABLE_TYPE
: Specifies whether the external table holds
a local copy of the external data.
Supported values:
MATERIALIZED
: Loads a copy of the external data into the
database, refreshed on demand
LOGICAL
: External data will not be loaded into the database;
the data will be retrieved from the source upon servicing each
query against the external table
MATERIALIZED
.
FILE_TYPE
: Specifies the type of the file(s) whose records will
be inserted.
Supported values:
AVRO
: Avro file format
DELIMITED_TEXT
: Delimited text file format; e.g., CSV, TSV,
PSV, etc.
GDB
:
Esri/GDB file format
JSON
: Json file format
PARQUET
: Apache Parquet file format
SHAPEFILE
: ShapeFile file format
DELIMITED_TEXT
.
GDAL_CONFIGURATION_OPTIONS
: Comma-separated list of GDAL configuration options for the specific request: key=value
IGNORE_EXISTING_PK
: Specifies the record collision
error-suppression policy for
inserting into a table with a primary key, only used when
not in upsert mode (upsert mode is disabled when update_on_existing_pk
is
false
). If set to
true
, any record being inserted that is rejected
for having primary key values that match those of an existing
table record will be ignored with no
error generated. If false
, the rejection of any
record for having primary key values matching an existing record
will result in an error being
reported, as determined by error_handling
. If the
specified table does not
have a primary key or if upsert mode is in effect (update_on_existing_pk
is
true
), then this option has no effect.
Supported values:
TRUE
: Ignore new records whose primary key values collide with
those of existing records
FALSE
: Treat as errors any new records whose primary key values
collide with those of existing records
FALSE
.
INGESTION_MODE
: Whether to do a full load, dry run, or perform
a type inference on the source data.
Supported values:
FULL
: Run a type inference on the source data (if needed) and
ingest
DRY_RUN
: Does not load data, but walks through the source data
and determines the number of valid records, taking into account
the current mode of error_handling
.
TYPE_INFERENCE_ONLY
: Infer the type of the source data and
return, without ingesting any data. The inferred type is
returned in the response.
FULL
.
JDBC_FETCH_SIZE
: The JDBC fetch size, which determines how many
rows to fetch per round trip. The default value is '50000'.
KAFKA_CONSUMERS_PER_RANK
: Number of Kafka consumer threads per
rank (valid range 1-6). The default value is '1'.
KAFKA_GROUP_ID
: The group id to be used when consuming data
from a Kafka topic (valid only for Kafka datasource
subscriptions).
KAFKA_OFFSET_RESET_POLICY
: Policy to determine whether the
Kafka data consumption starts either at earliest offset or
latest offset.
Supported values:
The default value is EARLIEST
.
KAFKA_OPTIMISTIC_INGEST
: Enable optimistic ingestion where
Kafka topic offsets and table data are committed independently
to achieve parallelism.
Supported values:
The default value is FALSE
.
KAFKA_SUBSCRIPTION_CANCEL_AFTER
: Sets the Kafka subscription
lifespan (in minutes). Expired subscription will be cancelled
automatically.
KAFKA_TYPE_INFERENCE_FETCH_TIMEOUT
: Maximum time to collect
Kafka messages before type inferencing on the set of them.
LAYER
: Geo files layer(s) name(s): comma separated.
LOADING_MODE
: Scheme for distributing the extraction and
loading of data from the source data file(s). This option
applies only when loading files that are local to the database
Supported values:
HEAD
: The head node loads all data. All files must be available
to the head node.
DISTRIBUTED_SHARED
: The head node coordinates loading data by
worker
processes across all nodes from shared files available to all
workers.
NOTE:
Instead of existing on a shared source, the files can be
duplicated on a source local to each host
to improve performance, though the files must appear as the same
data set from the perspective of
all hosts performing the load.
DISTRIBUTED_LOCAL
: A single worker process on each node loads
all files
that are available to it. This option works best when each
worker loads files from its own file
system, to maximize performance. In order to avoid data
duplication, either each worker performing
the load needs to have visibility to a set of files unique to it
(no file is visible to more than
one node) or the target table needs to have a primary key (which
will allow the worker to
automatically deduplicate data).
NOTE:
If the target table doesn't exist, the table structure will be
determined by the head node. If the
head node has no files local to it, it will be unable to
determine the structure and the request
will fail.
If the head node is configured to have no worker processes, no
data strictly accessible to the head
node will be loaded.
HEAD
.
LOCAL_TIME_OFFSET
: Apply an offset to Avro local timestamp
columns.
MAX_RECORDS_TO_LOAD
: Limit the number of records to load in
this request: if this number
is larger than batch_size
, then the number of records
loaded will be
limited to the next whole number of batch_size
(per
working thread).
NUM_TASKS_PER_RANK
: Number of tasks for reading file per rank.
Default will be system configuration parameter,
external_file_reader_num_tasks.
POLL_INTERVAL
: If true
, the number of
seconds between attempts to load external files into the table.
If zero, polling will be continuous
as long as data is found. If no data is found, the interval
will steadily increase to a maximum of
60 seconds. The default value is '0'.
PRIMARY_KEYS
: Comma separated list of column names to set as
primary keys, when not specified in the type.
REFRESH_METHOD
: Method by which the table can be refreshed from
its source data.
Supported values:
MANUAL
: Refresh only occurs when manually requested by invoking
the refresh action of GPUdb.alterTable(AlterTableRequest)
on this table.
ON_START
: Refresh table on database startup and when manually
requested by invoking the refresh action of GPUdb.alterTable(AlterTableRequest)
on this table.
MANUAL
.
SCHEMA_REGISTRY_SCHEMA_NAME
: Name of the Avro schema in the
schema registry to use when reading Avro records.
SHARD_KEYS
: Comma separated list of column names to set as
shard keys, when not specified in the type.
SKIP_LINES
: Skip this number of lines from the beginning of the file.
SUBSCRIBE
: Continuously poll the data source to check for new
data and load it into the table.
Supported values:
The default value is FALSE
.
TABLE_INSERT_MODE
: Insertion scheme to use when inserting
records from multiple shapefiles.
Supported values:
SINGLE
: Insert all records into a single table.
TABLE_PER_FILE
: Insert records from each file into a new table
corresponding to that file.
SINGLE
.
TEXT_COMMENT_STRING
: Specifies the character string that should
be interpreted as a comment line
prefix in the source data. All lines in the data starting with
the provided string are ignored.
For delimited_text
file_type
only. The default
value is '#'.
TEXT_DELIMITER
: Specifies the character delimiting field values
in the source data
and field names in the header (if present).
For delimited_text
file_type
only. The default
value is ','.
TEXT_ESCAPE_CHARACTER
: Specifies the character that is used to
escape other characters in
the source data.
An 'a', 'b', 'f', 'n', 'r', 't', or 'v' preceded by an escape
character will be interpreted as the
ASCII bell, backspace, form feed, line feed, carriage return,
horizontal tab, & vertical tab,
respectively. For example, the escape character followed by an
'n' will be interpreted as a newline
within a field value.
The escape character can also be used to escape the quoting
character, and will be treated as an
escape character whether it is within a quoted field value or
not.
For delimited_text
file_type
only.
TEXT_HAS_HEADER
: Indicates whether the source data contains a
header row.
For delimited_text
file_type
only.
Supported values:
The default value is TRUE
.
TEXT_HEADER_PROPERTY_DELIMITER
: Specifies the delimiter for
column properties in the header row (if
present). Cannot be set to same value as text_delimiter
.
For delimited_text
file_type
only. The default
value is '|'.
TEXT_NULL_STRING
: Specifies the character string that should be
interpreted as a null
value in the source data.
For delimited_text
file_type
only. The default
value is '\\N'.
TEXT_QUOTE_CHARACTER
: Specifies the character that should be
interpreted as a field value
quoting character in the source data. The character must appear
at beginning and end of field value
to take effect. Delimiters within quoted fields are treated as
literals and not delimiters. Within
a quoted field, two consecutive quote characters will be
interpreted as a single literal quote
character, effectively escaping it. To not have a quote
character, specify an empty string.
For delimited_text
file_type
only. The default
value is '"'.
TEXT_SEARCH_COLUMNS
: Add 'text_search' property to internally
inferred string columns.
Comma-separated list of column names or '*' for all columns. To
add 'text_search' property only to
string columns greater than or equal to a minimum size, also set
the
text_search_min_column_length
TEXT_SEARCH_MIN_COLUMN_LENGTH
: Set the minimum column size for
strings to apply the 'text_search' property to. Used only when
text_search_columns
has a value.
TRUNCATE_STRINGS
: If set to true
, truncate string
values that are longer than the column's type size.
Supported values:
The default value is FALSE
.
TRUNCATE_TABLE
: If set to true
, truncates the table
specified by tableName
prior to loading the file(s).
Supported values:
The default value is FALSE
.
TYPE_INFERENCE_MODE
: Optimize type inferencing for either speed
or accuracy.
Supported values:
ACCURACY
: Scans data to get exactly-typed & sized columns for
all data scanned.
SPEED
: Scans data and picks the widest possible column types so
that 'all' values will fit with minimum data scanned
SPEED
.
REMOTE_QUERY
: Remote SQL query from which data will be sourced
REMOTE_QUERY_FILTER_COLUMN
: Name of column to be used for
splitting remote_query
into multiple sub-queries using
the data distribution of given column
REMOTE_QUERY_INCREASING_COLUMN
: Column on subscribed remote
query result that will increase for new records (e.g.,
TIMESTAMP).
REMOTE_QUERY_PARTITION_COLUMN
: Alias name for remote_query_filter_column
.
UPDATE_ON_EXISTING_PK
: Specifies the record collision policy
for inserting into a table
with a primary key. If set to
true
, any existing table record with primary
key values that match those of a record being inserted will be
replaced by that new record (the new
data will be 'upserted'). If set to false
,
any existing table record with primary key values that match
those of a record being inserted will
remain unchanged, while the new record will be rejected and the
error handled as determined by
ignore_existing_pk
& error_handling
. If the
specified table does not have a primary key, then this option
has no effect.
Supported values:
TRUE
: Upsert new records when primary keys match existing
records
FALSE
: Reject new records when primary keys match existing
records
The default value is FALSE.
The default value is an empty Map.
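A hedged sketch of setOptions for loading from a pre-defined external data source with subscription polling (the data source name is a placeholder for one created beforehand, and request is assumed built as in the earlier sketches):

```java
Map<String, String> options = new HashMap<>();
options.put(CreateTableExternalRequest.Options.DATASOURCE_NAME, "order_stream_ds");
// Keep polling the data source for new data...
options.put(CreateTableExternalRequest.Options.SUBSCRIBE,
            CreateTableExternalRequest.Options.TRUE);
// ...checking every 60 seconds
options.put(CreateTableExternalRequest.Options.POLL_INTERVAL, "60");

request.setOptions(options);
```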
public CreateTableExternalRequest setOptions(Map<String,String> options)
options
- Optional parameters.
BAD_RECORD_TABLE_NAME
: Name of a table to which records
that were rejected are written.
The bad-record-table has the following columns:
line_number (long), line_rejected (string),
error_message (string). When error_handling
is
abort
, bad records table is not populated.
BAD_RECORD_TABLE_LIMIT
: A positive integer indicating
the maximum number of records that can be
written to the bad-record-table. The default value is
'10000'.
BAD_RECORD_TABLE_LIMIT_PER_INPUT
: For subscriptions, a
positive integer indicating the maximum number
of records that can be written to the bad-record-table
per file/payload. Default value will be
bad_record_table_limit
and total size of the
table per rank is limited to
bad_record_table_limit
.
BATCH_SIZE
: Number of records to insert per batch when
inserting data. The default value is '50000'.
COLUMN_FORMATS
: For each target column specified,
applies the column-property-bound
format to the source data loaded into that column. Each
column format will contain a mapping of one
or more of its column properties to an appropriate
format for each property. Currently supported
column properties include date, time, & datetime. The
parameter value must be formatted as a JSON
string of maps of column names to maps of column
properties to their corresponding column formats,
e.g.,
'{ "order_date" : { "date" : "%Y.%m.%d" }, "order_time"
: { "time" : "%H:%M:%S" } }'.
See default_column_formats
for valid format
syntax.
COLUMNS_TO_LOAD
: Specifies a comma-delimited list of
columns from the source data to
load. If more than one file is being loaded, this list
applies to all files.
Column numbers can be specified discretely or as a
range. For example, a value of '5,7,1..3' will
insert values from the fifth column in the source data
into the first column in the target table,
from the seventh column in the source data into the
second column in the target table, and from the
first through third columns in the source data into the
third through fifth columns in the target
table.
If the source data contains a header, column names
matching the file header names may be provided
instead of column numbers. If the target table doesn't
exist, the table will be created with the
columns in this order. If the target table does exist
with columns in a different order than the
source data, this list can be used to match the order of
the target table. For example, a value of
'C, B, A' will create a three column table with column
C, followed by column B, followed by column
A; or will insert those fields in that order into a
table created with columns in that order. If
the target table exists, the column names must match the
source data field names for a name-mapping
to be successful.
Mutually exclusive with columns_to_skip
.
COLUMNS_TO_SKIP
: Specifies a comma-delimited list of
columns from the source data to
skip. Mutually exclusive with columns_to_load
.
COMPRESSION_TYPE
: Source data compression type
Supported values:
NONE
: No compression.
AUTO
: Auto detect compression type
GZIP
: gzip file compression.
BZIP2
: bzip2 file compression.
AUTO
.
DATASOURCE_NAME
: Name of an existing external data
source from which data file(s) specified in filepaths
will be loaded
DEFAULT_COLUMN_FORMATS
: Specifies the default format to
be applied to source data loaded
into columns with the corresponding column property.
Currently supported column properties include
date, time, & datetime. This default
column-property-bound format can be overridden by
specifying a
column property & format for a given target column in
column_formats
. For
each specified annotation, the format will apply to all
columns with that annotation unless a custom
column_formats
for that annotation is specified.
The parameter value must be formatted as a JSON string
that is a map of column properties to their
respective column formats, e.g., '{ "date" : "%Y.%m.%d",
"time" : "%H:%M:%S" }'. Column
formats are specified as a string of control characters
and plain text. The supported control
characters are 'Y', 'm', 'd', 'H', 'M', 'S', and 's',
which follow the Linux 'strptime()'
specification, as well as 's', which specifies seconds
and fractional seconds (though the fractional
component will be truncated past milliseconds).
Formats for the 'date' annotation must include the 'Y',
'm', and 'd' control characters. Formats for
the 'time' annotation must include the 'H', 'M', and
either 'S' or 's' (but not both) control
characters. Formats for the 'datetime' annotation meet
both the 'date' and 'time' control character
requirements. For example, '{"datetime" : "%m/%d/%Y
%H:%M:%S" }' would be used to interpret text
as "05/04/2000 12:12:11"
ERROR_HANDLING
: Specifies how errors should be handled
upon insertion.
Supported values:
PERMISSIVE
: Records with missing columns are populated
with nulls if possible; otherwise, the malformed records
are skipped.
IGNORE_BAD_RECORDS
: Malformed records are skipped.
ABORT
: Stops current insertion and aborts entire
operation when an error is encountered. Primary key
collisions are considered abortable errors in this mode.
ABORT
.
EXTERNAL_TABLE_TYPE
: Specifies whether the external
table holds a local copy of the external data.
Supported values:
MATERIALIZED
: Loads a copy of the external data into
the database, refreshed on demand
LOGICAL
: External data will not be loaded into the
database; the data will be retrieved from the source
upon servicing each query against the external table
MATERIALIZED
.
FILE_TYPE
: Specifies the type of the file(s) whose
records will be inserted.
Supported values:
AVRO
: Avro file format
DELIMITED_TEXT
: Delimited text file format; e.g., CSV,
TSV, PSV, etc.
GDB
: Esri/GDB file format
JSON
: Json file format
PARQUET
: Apache Parquet file format
SHAPEFILE
: ShapeFile file format
DELIMITED_TEXT
.
GDAL_CONFIGURATION_OPTIONS
: Comma-separated list of
GDAL configuration options for the specific request: key=value
IGNORE_EXISTING_PK
: Specifies the record collision
error-suppression policy for
inserting into a table with a primary key, only used when
not in upsert mode (upsert mode is disabled when update_on_existing_pk
is
false
). If set to
true
, any record being inserted that is rejected
for having primary key values that match those of an
existing table record will be ignored with no
error generated. If false
, the rejection of any
record for having primary key values matching an
existing record will result in an error being
reported, as determined by error_handling
. If
the specified table does not
have a primary key or if upsert mode is in effect
(update_on_existing_pk
is
true
), then this option has no effect.
Supported values:
TRUE
: Ignore new records whose primary key values
collide with those of existing records
FALSE
: Treat as errors any new records whose primary
key values collide with those of existing records
FALSE
.
INGESTION_MODE
: Whether to do a full load, dry run, or
perform a type inference on the source data.
Supported values:
FULL
: Run a type inference on the source data (if
needed) and ingest
DRY_RUN
: Does not load data, but walks through the
source data and determines the number of valid records,
taking into account the current mode of error_handling
.
TYPE_INFERENCE_ONLY
: Infer the type of the source data
and return, without ingesting any data. The inferred
type is returned in the response.
FULL
.
JDBC_FETCH_SIZE
: The JDBC fetch size, which determines
how many rows to fetch per round trip. The default
value is '50000'.
KAFKA_CONSUMERS_PER_RANK
: Number of Kafka consumer
threads per rank (valid range 1-6). The default value
is '1'.
KAFKA_GROUP_ID
: The group id to be used when consuming
data from a Kafka topic (valid only for Kafka datasource
subscriptions).
KAFKA_OFFSET_RESET_POLICY
: Policy to determine whether
the Kafka data consumption starts either at earliest
offset or latest offset.
Supported values:
The default value is EARLIEST
.
KAFKA_OPTIMISTIC_INGEST
: Enable optimistic ingestion
where Kafka topic offsets and table data are committed
independently to achieve parallelism.
Supported values:
The default value is FALSE
.
KAFKA_SUBSCRIPTION_CANCEL_AFTER
: Sets the Kafka
subscription lifespan (in minutes). Expired subscription
will be cancelled automatically.
KAFKA_TYPE_INFERENCE_FETCH_TIMEOUT
: Maximum time to
collect Kafka messages before type inferencing on the
set of them.
LAYER
: Geo files layer(s) name(s): comma separated.
LOADING_MODE
: Scheme for distributing the extraction
and loading of data from the source data file(s). This
option applies only when loading files that are local to
the database
Supported values:
HEAD
: The head node loads all data. All files must be
available to the head node.
DISTRIBUTED_SHARED
: The head node coordinates loading
data by worker
processes across all nodes from shared files available
to all workers.
NOTE:
Instead of existing on a shared source, the files can be
duplicated on a source local to each host
to improve performance, though the files must appear as
the same data set from the perspective of
all hosts performing the load.
DISTRIBUTED_LOCAL
: A single worker process on each node
loads all files
that are available to it. This option works best when
each worker loads files from its own file
system, to maximize performance. In order to avoid data
duplication, either each worker performing
the load needs to have visibility to a set of files
unique to it (no file is visible to more than
one node) or the target table needs to have a primary
key (which will allow the worker to
automatically deduplicate data).
NOTE:
If the target table doesn't exist, the table structure
will be determined by the head node. If the
head node has no files local to it, it will be unable to
determine the structure and the request
will fail.
If the head node is configured to have no worker
processes, no data strictly accessible to the head
node will be loaded.
The default value is HEAD.
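The following sketch shows distributed_local loading, where each worker reads the files visible on its own file system and a primary key allows overlapping files to be deduplicated, per the note above. The endpoint, table, key columns, and per-host directory are assumptions.
// Minimal sketch of distributed_local loading with deduplication by primary key.
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import com.gpudb.GPUdb;
import com.gpudb.GPUdbException;
import com.gpudb.protocol.CreateTableExternalRequest;

public class DistributedLocalSketch {
    public static void main(String[] args) throws GPUdbException {
        GPUdb gpudb = new GPUdb("http://localhost:9191");

        Map<String, String> options = new HashMap<>();
        options.put("loading_mode", "distributed_local");
        options.put("primary_keys", "sensor_id,ts");  // enables automatic deduplication

        gpudb.createTableExternal(new CreateTableExternalRequest(
                "example.sensor_readings_ext",
                Collections.singletonList("/data/readings/"),  // hypothetical per-host directory
                Collections.emptyMap(),
                Collections.emptyMap(),
                options));
    }
}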
LOCAL_TIME_OFFSET
: Apply an offset to Avro local
timestamp columns.
MAX_RECORDS_TO_LOAD
: Limit the number of records to
load in this request: if this number
is larger than batch_size
, then the number of
records loaded will be
limited to the next whole multiple of batch_size
(per working thread).
NUM_TASKS_PER_RANK
: Number of tasks for reading files per rank. The default
is the system configuration parameter
external_file_reader_num_tasks.
POLL_INTERVAL
: If true
, the number of
seconds between attempts to load external files into the
table. If zero, polling will be continuous
as long as data is found. If no data is found, the
interval will steadily increase to a maximum of
60 seconds. The default value is '0'.
PRIMARY_KEYS
: Comma separated list of column names to
set as primary keys, when not specified in the type.
REFRESH_METHOD
: Method by which the table can be
refreshed from its source data.
Supported values:
MANUAL
: Refresh only occurs when manually requested by
invoking the refresh action of GPUdb.alterTable(AlterTableRequest)
on this
table.
ON_START
: Refresh table on database startup and when
manually requested by invoking the refresh action of
GPUdb.alterTable(AlterTableRequest)
on
this table.
The default value is MANUAL.
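A manually refreshed external table might be created and later refreshed on demand as sketched below; the refresh call assumes the expanded-parameter alterTable overload with the "refresh" action referenced above, and the endpoint, table, and path are illustrative.
// Minimal sketch of refresh_method=manual followed by an on-demand refresh.
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import com.gpudb.GPUdb;
import com.gpudb.GPUdbException;
import com.gpudb.protocol.CreateTableExternalRequest;

public class ManualRefreshSketch {
    public static void main(String[] args) throws GPUdbException {
        GPUdb gpudb = new GPUdb("http://localhost:9191");

        Map<String, String> options = new HashMap<>();
        options.put("refresh_method", "manual");  // refresh only when requested

        gpudb.createTableExternal(new CreateTableExternalRequest(
                "example.prices_ext",
                Collections.singletonList("kifs://data/prices.parquet"),
                Collections.emptyMap(),
                Collections.emptyMap(),
                options));

        // Later: re-pull the source data on demand.
        gpudb.alterTable("example.prices_ext", "refresh", "",
                Collections.<String, String>emptyMap());
    }
}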
SCHEMA_REGISTRY_SCHEMA_NAME
: Name of the Avro schema in
the schema registry to use when reading Avro records.
SHARD_KEYS
: Comma separated list of column names to set
as shard keys, when not specified in the type.
SKIP_LINES
: Number of lines to skip from the beginning of the file.
SUBSCRIBE
: Continuously poll the data source to check
for new data and load it into the table.
Supported values:
TRUE
FALSE
The default value is FALSE.
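As a sketch, subscribe and poll_interval can be combined for continuous file ingestion, with the source polled at a fixed interval; the specific values below are assumptions.
// Illustrative sketch of continuous ingestion options.
import java.util.HashMap;
import java.util.Map;

public class SubscribeOptionsSketch {
    public static Map<String, String> subscriptionOptions() {
        Map<String, String> options = new HashMap<>();
        options.put("subscribe", "true");    // keep checking the source for new data
        options.put("poll_interval", "60");  // seconds between load attempts
        return options;
    }
}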
TABLE_INSERT_MODE
: Insertion scheme to use when
inserting records from multiple shapefiles.
Supported values:
SINGLE
: Insert all records into a single table.
TABLE_PER_FILE
: Insert records from each file into a
new table corresponding to that file.
The default value is SINGLE.
TEXT_COMMENT_STRING
: Specifies the character string
that should be interpreted as a comment line
prefix in the source data. All lines in the data
starting with the provided string are ignored.
For delimited_text
file_type
only. The
default value is '#'.
TEXT_DELIMITER
: Specifies the character delimiting
field values in the source data
and field names in the header (if present).
For delimited_text
file_type
only. The
default value is ','.
TEXT_ESCAPE_CHARACTER
: Specifies the character that is
used to escape other characters in
the source data.
An 'a', 'b', 'f', 'n', 'r', 't', or 'v' preceded by an
escape character will be interpreted as the
ASCII bell, backspace, form feed, line feed, carriage
return, horizontal tab, & vertical tab,
respectively. For example, the escape character
followed by an 'n' will be interpreted as a newline
within a field value.
The escape character can also be used to escape the
quoting character, and will be treated as an
escape character whether it is within a quoted field
value or not.
For delimited_text
file_type
only.
TEXT_HAS_HEADER
: Indicates whether the source data
contains a header row.
For delimited_text
file_type
only.
Supported values:
TRUE
FALSE
The default value is TRUE.
TEXT_HEADER_PROPERTY_DELIMITER
: Specifies the delimiter
for
column properties in the header row
(if
present). Cannot be set to the same value as text_delimiter
.
For delimited_text
file_type
only. The
default value is '|'.
TEXT_NULL_STRING
: Specifies the character string that
should be interpreted as a null
value in the source data.
For delimited_text
file_type
only. The
default value is '\\N'.
TEXT_QUOTE_CHARACTER
: Specifies the character that
should be interpreted as a field value
quoting character in the source data. The character
must appear at beginning and end of field value
to take effect. Delimiters within quoted fields are
treated as literals and not delimiters. Within
a quoted field, two consecutive quote characters will be
interpreted as a single literal quote
character, effectively escaping it. To not have a quote
character, specify an empty string.
For delimited_text
file_type
only. The
default value is '"'.
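The delimited-text parsing options above might be configured as in the sketch below for a semicolon-delimited file with no header row and 'NULL' as the null marker; all of the specific values are assumptions.
// Illustrative sketch of delimited-text parsing options.
import java.util.HashMap;
import java.util.Map;

public class DelimitedTextOptionsSketch {
    public static Map<String, String> csvOptions() {
        Map<String, String> options = new HashMap<>();
        options.put("file_type", "delimited_text");
        options.put("text_delimiter", ";");         // field separator
        options.put("text_has_header", "false");    // no header row in the source
        options.put("text_comment_string", "#");    // ignore lines starting with '#'
        options.put("text_quote_character", "\"");  // standard double-quote quoting
        options.put("text_null_string", "NULL");    // treat 'NULL' as a null value
        return options;
    }
}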
TEXT_SEARCH_COLUMNS
: Add 'text_search' property to
internally inferred string columns.
Comma-separated list of column names or '*' for all
columns. To add 'text_search' property only to
string columns greater than or equal to a minimum size,
also set the
text_search_min_column_length
option.
TEXT_SEARCH_MIN_COLUMN_LENGTH
: Set the minimum column
size for strings to apply the 'text_search' property to.
Used only when text_search_columns
has a value.
TRUNCATE_STRINGS
: If set to true
, truncate
string values that are longer than the column's type
size.
Supported values:
TRUE
FALSE
The default value is FALSE.
TRUNCATE_TABLE
: If set to true
, truncates the
table specified by tableName
prior to loading
the file(s).
Supported values:
TRUE
FALSE
The default value is FALSE.
TYPE_INFERENCE_MODE
: Optimize type inferencing for
either speed or accuracy.
Supported values:
ACCURACY
: Scans data to get exactly-typed & sized
columns for all data scanned.
SPEED
: Scans data and picks the widest possible column
types so that 'all' values will fit with minimum data
scanned.
The default value is SPEED.
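To obtain an exact schema without loading any data, accuracy-mode inference can be paired with an inference-only ingestion mode, as in this sketch; the pairing shown is an illustrative choice, not a requirement.
// Illustrative sketch: infer an exactly-typed schema and return it, load nothing.
import java.util.HashMap;
import java.util.Map;

public class TypeInferenceOptionsSketch {
    public static Map<String, String> inferenceOptions() {
        Map<String, String> options = new HashMap<>();
        options.put("ingestion_mode", "type_inference_only");  // return the inferred type only
        options.put("type_inference_mode", "accuracy");        // scan for exactly-typed columns
        return options;
    }
}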
REMOTE_QUERY
: Remote SQL query from which data will be
sourced.
REMOTE_QUERY_FILTER_COLUMN
: Name of column to be used
for splitting remote_query
into multiple
sub-queries using the data distribution of the given column.
REMOTE_QUERY_INCREASING_COLUMN
: Column on subscribed
remote query result that will increase for new records
(e.g., TIMESTAMP).
REMOTE_QUERY_PARTITION_COLUMN
: Alias name for remote_query_filter_column
.
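The remote-query options above might be used as sketched below to source data from a pre-defined JDBC data source, split into sub-queries on a filter column. The data source name, query, column names, and endpoint are assumptions, as is leaving the file path list empty when remote_query supplies the data.
// Minimal sketch of ingesting from a remote SQL query over a JDBC data source.
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import com.gpudb.GPUdb;
import com.gpudb.GPUdbException;
import com.gpudb.protocol.CreateTableExternalRequest;

public class RemoteQuerySketch {
    public static void main(String[] args) throws GPUdbException {
        GPUdb gpudb = new GPUdb("http://localhost:9191");

        Map<String, String> options = new HashMap<>();
        options.put("datasource_name", "jdbc_ds");                   // assumed JDBC data source
        options.put("remote_query", "SELECT * FROM public.orders");  // hypothetical source query
        options.put("remote_query_filter_column", "order_id");       // split into sub-queries
        options.put("jdbc_fetch_size", "100000");                    // rows per round trip

        gpudb.createTableExternal(new CreateTableExternalRequest(
                "example.orders_remote_ext",
                Collections.<String>emptyList(),  // assumption: no file paths with a remote query
                Collections.emptyMap(),
                Collections.emptyMap(),
                options));
    }
}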
UPDATE_ON_EXISTING_PK
: Specifies the record collision
policy for inserting into a table
with a primary key. If set to
true
, any existing table record with primary
key values that match those of a record being inserted
will be replaced by that new record (the new
data will be 'upserted'). If set to false
,
any existing table record with primary key values that
match those of a record being inserted will
remain unchanged, while the new record will be rejected
and the error handled as determined by
ignore_existing_pk
& error_handling
. If
the
specified table does not have a primary key, then this
option has no effect.
Supported values:
TRUE
: Upsert new records when primary keys match
existing records
FALSE
: Reject new records when primary keys match
existing records
The default value is FALSE.
The default value is an empty Map.
Returns:
this to mimic the builder pattern.
public org.apache.avro.Schema getSchema()
Specified by:
getSchema in interface org.apache.avro.generic.GenericContainer
public Object get(int index)
Specified by:
get in interface org.apache.avro.generic.IndexedRecord
Parameters:
index - the position of the field to get
Throws:
IndexOutOfBoundsException
public void put(int index, Object value)
Specified by:
put in interface org.apache.avro.generic.IndexedRecord
Parameters:
index - the position of the field to set
value - the value to set
Throws:
IndexOutOfBoundsException
Copyright © 2024. All rights reserved.