public class InsertRecordsFromFilesRequest extends Object implements org.apache.avro.generic.IndexedRecord

A set of parameters for GPUdb.insertRecordsFromFiles(InsertRecordsFromFilesRequest).

Reads from one or more files located on the server and inserts the data into a new or existing table.
For CSV files, there are two loading schemes: positional and name-based. The name-based loading scheme is enabled when the file has a header present and text_has_header is set to true. In this scheme, the field names in the source file(s) must match the target table's column names exactly; however, the source file can have more fields than the target table has columns. If error_handling is set to permissive, the source file can have fewer fields than the target table has columns. If the name-based loading scheme is being used, names matching the file header's names may be provided to columns_to_load instead of numbers, but ranges are not supported.

Returns once all files are processed.
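Every knob described on this page is passed as a plain string-to-string map. As a minimal sketch (JDK collections only, no server connection; the column names are hypothetical), the options map for a name-based CSV load might be assembled like this:

```java
import java.util.HashMap;
import java.util.Map;

public class Main {
    // Builds the options map for a name-based CSV load, using the string
    // keys and values documented on this page. "id,name,price" are
    // hypothetical header names standing in for real source columns.
    static Map<String, String> nameBasedCsvOptions() {
        Map<String, String> options = new HashMap<>();
        options.put("file_type", "delimited_text");   // the default, shown for clarity
        options.put("text_has_header", "true");       // enables name-based loading
        options.put("error_handling", "permissive");  // tolerate missing source fields
        // With name-based loading, header names (not ranges) may be listed:
        options.put("columns_to_load", "id,name,price");
        return options;
    }

    public static void main(String[] args) {
        System.out.println(nameBasedCsvOptions());
    }
}
```

The resulting map would be passed as the options argument of the four-argument constructor or of setOptions(Map).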
Modifier and Type | Class and Description
---|---
static class | InsertRecordsFromFilesRequest.CreateTableOptions: Options used when creating a new table.
static class | InsertRecordsFromFilesRequest.Options: Optional parameters.
Constructor and Description
---
InsertRecordsFromFilesRequest(): Constructs an InsertRecordsFromFilesRequest object with default parameters.
InsertRecordsFromFilesRequest(String tableName, List<String> filepaths, Map<String,String> createTableOptions, Map<String,String> options): Constructs an InsertRecordsFromFilesRequest object with the specified parameters.
Modifier and Type | Method and Description
---|---
boolean | equals(Object obj)
Object | get(int index): This method supports the Avro framework and is not intended to be called directly by the user.
static org.apache.avro.Schema | getClassSchema(): This method supports the Avro framework and is not intended to be called directly by the user.
Map<String,String> | getCreateTableOptions()
List<String> | getFilepaths()
Map<String,String> | getOptions()
org.apache.avro.Schema | getSchema(): This method supports the Avro framework and is not intended to be called directly by the user.
String | getTableName()
int | hashCode()
void | put(int index, Object value): This method supports the Avro framework and is not intended to be called directly by the user.
InsertRecordsFromFilesRequest | setCreateTableOptions(Map<String,String> createTableOptions)
InsertRecordsFromFilesRequest | setFilepaths(List<String> filepaths)
InsertRecordsFromFilesRequest | setOptions(Map<String,String> options)
InsertRecordsFromFilesRequest | setTableName(String tableName)
String | toString()
public InsertRecordsFromFilesRequest()
public InsertRecordsFromFilesRequest(String tableName, List<String> filepaths, Map<String,String> createTableOptions, Map<String,String> options)
tableName - Name of the table into which the data will be inserted. If the table does not exist, the table will be created using either an existing type_id or the type inferred from the file.

filepaths - Absolute or relative filepath(s) from where files will be loaded. Relative filepaths are relative to the defined external_files_directory parameter in the server configuration. The filepaths may include wildcards (*). If the first path ends in .tsv, the text delimiter will default to a tab character. If the first path ends in .psv, the text delimiter will default to a pipe character (|).

createTableOptions - Options used when creating a new table.
TYPE_ID: ID of a currently registered type. The default value is ''.

NO_ERROR_IF_EXISTS: If true, prevents an error from occurring if the table already exists and is of the given type. If a table with the same ID but a different type exists, it is still an error. Supported values: TRUE, FALSE. The default value is FALSE.

COLLECTION_NAME: Name of a collection which is to contain the newly created table. If the collection provided is non-existent, the collection will be automatically created. If empty, then the newly created table will be a top-level table.

IS_REPLICATED: Affects the distribution scheme for the table's data. If true and the given type has no explicit shard key defined, the table will be replicated. If false, the table will be sharded according to the shard key specified in the given type_id, or randomly sharded if no shard key is specified. Note that a type containing a shard key cannot be used to create a replicated table. Supported values: TRUE, FALSE. The default value is FALSE.

FOREIGN_KEYS: Semicolon-separated list of foreign keys, of the format '(source_column_name [, ...]) references target_table_name(primary_key_column_name [, ...]) [as foreign_key_name]'.

FOREIGN_SHARD_KEY: Foreign shard key of the format 'source_column references shard_by_column from target_table(primary_key_column)'.

PARTITION_TYPE: Partitioning scheme to use. Supported values: RANGE (range partitioning), INTERVAL (interval partitioning), LIST (list partitioning), HASH (hash partitioning).

PARTITION_KEYS: Comma-separated list of partition keys, which are the columns or column expressions by which records will be assigned to partitions defined by partition_definitions.

PARTITION_DEFINITIONS: Comma-separated list of partition definitions, whose format depends on the choice of partition_type. See range partitioning, interval partitioning, list partitioning, or hash partitioning for example formats.

IS_AUTOMATIC_PARTITION: If true, a new partition will be created for values which don't fall into an existing partition. Currently only supported for list partitions. Supported values: TRUE, FALSE. The default value is FALSE.

TTL: Sets the TTL of the table specified in tableName.

CHUNK_SIZE: Indicates the number of records per chunk to be used for this table.

IS_RESULT_TABLE: Indicates whether the table is an in-memory table. A result table cannot contain store_only, text_search, or string columns (charN columns are acceptable), and it will not be retained if the server is restarted. Supported values: TRUE, FALSE. The default value is FALSE.

STRATEGY_DEFINITION: The tier strategy for the table and its columns. See tier strategy usage for format and tier strategy examples for examples.

The default value is an empty Map.
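Like options, createTableOptions is a plain string map keyed by the names above. A hedged sketch (JDK only; "order_date" is a hypothetical partition key column, and partition_definitions is omitted because its exact syntax depends on the partition type chosen):

```java
import java.util.HashMap;
import java.util.Map;

public class Main {
    // Sketch of a createTableOptions map for a range-partitioned table,
    // using the option names documented above as literal string keys.
    static Map<String, String> rangePartitionedCreateOptions() {
        Map<String, String> create = new HashMap<>();
        create.put("no_error_if_exists", "true"); // tolerate an existing table of the same type
        create.put("is_replicated", "false");     // shard rather than replicate
        create.put("partition_type", "RANGE");    // use range partitioning
        create.put("partition_keys", "order_date");
        // partition_definitions would also be required here; its format
        // is described in the range-partitioning documentation.
        return create;
    }

    public static void main(String[] args) {
        System.out.println(rangePartitionedCreateOptions());
    }
}
```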
options - Optional parameters.
BATCH_SIZE: Specifies the number of records to process before inserting.

COLUMN_FORMATS: For each target column specified, applies the column-property-bound format to the source data loaded into that column. Each column format will contain a mapping of one or more of its column properties to an appropriate format for each property. Currently supported column properties include date, time, & datetime. The parameter value must be formatted as a JSON string of maps of column names to maps of column properties to their corresponding column formats, e.g., { "order_date" : { "date" : "%Y.%m.%d" }, "order_time" : { "time" : "%H:%M:%S" } }. See default_column_formats for valid format syntax.

COLUMNS_TO_LOAD: For the delimited_text file_type only. Specifies a comma-delimited list of column positions or names to load instead of loading all columns in the file(s); if more than one file is being loaded, the list of columns will apply to all files. Column numbers can be specified discretely or as a range, e.g., a value of '5,7,1..3' will create a table with the first column in the table being the fifth column in the file, followed by the seventh column in the file, then the first column through the fourth column in the file.

DEFAULT_COLUMN_FORMATS: Specifies the default format to be applied to source data loaded into columns with the corresponding column property. This default column-property-bound format can be overridden by specifying a column property & format for a given target column in column_formats. For each specified annotation, the format will apply to all columns with that annotation unless a custom column_formats for that annotation is specified. The parameter value must be formatted as a JSON string that is a map of column properties to their respective column formats, e.g., { "date" : "%Y.%m.%d", "time" : "%H:%M:%S" }. Column formats are specified as a string of control characters and plain text. The supported control characters are 'Y', 'm', 'd', 'H', 'M', and 'S', which follow the Linux 'strptime()' specification, as well as 's', which specifies seconds and fractional seconds (though the fractional component will be truncated past milliseconds). Formats for the 'date' annotation must include the 'Y', 'm', and 'd' control characters. Formats for the 'time' annotation must include the 'H', 'M', and either 'S' or 's' (but not both) control characters. Formats for the 'datetime' annotation must meet both the 'date' and 'time' control character requirements. For example, '{"datetime" : "%m/%d/%Y %H:%M:%S" }' would be used to interpret text such as "05/04/2000 12:12:11".

DRY_RUN: If set to true, no data will be inserted, but the file will be read with the applied error_handling mode, and the number of valid records that would normally be inserted is returned. Supported values: TRUE, FALSE. The default value is FALSE.

ERROR_HANDLING: Specifies how errors should be handled upon insertion. Supported values: PERMISSIVE (records with missing columns are populated with nulls if possible; otherwise, the malformed records are skipped), IGNORE_BAD_RECORDS (malformed records are skipped), ABORT (stops the current insertion and aborts the entire operation when an error is encountered). The default value is PERMISSIVE.

FILE_TYPE: File type for the file(s). Supported values: DELIMITED_TEXT (indicates the file(s) are in delimited text format, e.g., CSV, TSV, PSV). The default value is DELIMITED_TEXT.

LOADING_MODE: Specifies how to divide data loading among nodes. Supported values: HEAD (the head node loads all data; all files must be available on the head node), DISTRIBUTED_SHARED (the worker nodes coordinate loading a set of files that are available to all of them; all files must be available on all nodes; this option is best when there is a shared file system), DISTRIBUTED_LOCAL (each worker node loads all files that are available to it; this option is best when each worker node has its own file system). The default value is HEAD.

TEXT_COMMENT_STRING: For the delimited_text file_type only. All lines in the file(s) starting with the provided string are ignored. The comment string has no effect unless it appears at the beginning of a line. The default value is '#'.

TEXT_DELIMITER: For the delimited_text file_type only. Specifies the delimiter for values and columns in the header row (if present). Must be a single character. The default value is ','.

TEXT_ESCAPE_CHARACTER: For the delimited_text file_type only. The character used in the file(s) to escape certain character sequences in text. For example, the escape character followed by a literal 'n' escapes to a newline character within the field. Can be used within a quoted string to escape a quote character. An empty value for this option does not specify an escape character.

TEXT_HAS_HEADER: For the delimited_text file_type only. Indicates whether the delimited text files have a header row. Supported values: TRUE, FALSE. The default value is TRUE.

TEXT_HEADER_PROPERTY_DELIMITER: For the delimited_text file_type only. Specifies the delimiter for column properties in the header row (if present). Cannot be set to the same value as text_delimiter. The default value is '|'.

TEXT_NULL_STRING: For the delimited_text file_type only. The value in the file(s) to treat as a null value in the database. The default value is ''.

TEXT_QUOTE_CHARACTER: For the delimited_text file_type only. The quote character used in the file(s), typically encompassing a field value. The character must appear at the beginning and end of a field to take effect. Delimiters within quoted fields are not treated as delimiters. Within a quoted field, double quotes (") can be used to escape a single literal quote character. To not have a quote character, specify an empty string (""). The default value is '"'.

TRUNCATE_TABLE: If set to true, truncates the table specified by tableName prior to loading the file(s). Supported values: TRUE, FALSE. The default value is FALSE.

NUM_TASKS_PER_RANK: Number of tasks for reading files per rank. The default will be external_file_reader_num_tasks.

The default value is an empty Map.
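The '5,7,1..3' example for columns_to_load mixes discrete positions and a range. A small hypothetical helper (not part of this API) that expands that syntax, just to make the documented semantics concrete:

```java
import java.util.ArrayList;
import java.util.List;

public class Main {
    // Expands a columns_to_load-style position list, e.g. "5,7,1..3",
    // into the ordered column positions it denotes: [5, 7, 1, 2, 3].
    static List<Integer> expandColumnsToLoad(String spec) {
        List<Integer> columns = new ArrayList<>();
        for (String part : spec.split(",")) {
            if (part.contains("..")) {
                // An inclusive range such as "1..3"
                String[] bounds = part.split("\\.\\.");
                int lo = Integer.parseInt(bounds[0].trim());
                int hi = Integer.parseInt(bounds[1].trim());
                for (int i = lo; i <= hi; i++) columns.add(i);
            } else {
                // A discrete column position
                columns.add(Integer.parseInt(part.trim()));
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        System.out.println(expandColumnsToLoad("5,7,1..3")); // [5, 7, 1, 2, 3]
    }
}
```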
public static org.apache.avro.Schema getClassSchema()
public String getTableName()

Returns: Name of the table into which the data will be inserted. If the table does not exist, the table will be created using either an existing type_id or the type inferred from the file.

public InsertRecordsFromFilesRequest setTableName(String tableName)

tableName - Name of the table into which the data will be inserted. If the table does not exist, the table will be created using either an existing type_id or the type inferred from the file.

Returns: this to mimic the builder pattern.

public List<String> getFilepaths()
public InsertRecordsFromFilesRequest setFilepaths(List<String> filepaths)

filepaths - Absolute or relative filepath(s) from where files will be loaded. Relative filepaths are relative to the defined external_files_directory parameter in the server configuration. The filepaths may include wildcards (*). If the first path ends in .tsv, the text delimiter will default to a tab character. If the first path ends in .psv, the text delimiter will default to a pipe character (|).

Returns: this to mimic the builder pattern.

public Map<String,String> getCreateTableOptions()
Returns: Options used when creating a new table. The supported options are the same as those documented for the constructor's createTableOptions parameter. The default value is an empty Map.

public InsertRecordsFromFilesRequest setCreateTableOptions(Map<String,String> createTableOptions)
createTableOptions - Options used when creating a new table. The supported options are the same as those documented for the constructor's createTableOptions parameter. The default value is an empty Map.

Returns: this to mimic the builder pattern.

public Map<String,String> getOptions()
Returns: Optional parameters. The supported options are the same as those documented for the constructor's options parameter. The default value is an empty Map.

public InsertRecordsFromFilesRequest setOptions(Map<String,String> options)
options - Optional parameters. The supported options are the same as those documented for the constructor's options parameter. The default value is an empty Map.

Returns: this to mimic the builder pattern.

public org.apache.avro.Schema getSchema()
Specified by: getSchema in interface org.apache.avro.generic.GenericContainer

public Object get(int index)

Specified by: get in interface org.apache.avro.generic.IndexedRecord

index - the position of the field to get

Throws: IndexOutOfBoundsException

public void put(int index, Object value)

Specified by: put in interface org.apache.avro.generic.IndexedRecord

index - the position of the field to set
value - the value to set

Throws: IndexOutOfBoundsException
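The column_formats and default_column_formats strings documented above follow Linux strptime() semantics on the server. As a client-side sanity-check sketch, simple patterns can be translated to java.time equivalents; the %Y.%m.%d to yyyy.MM.dd mapping below is an assumed equivalence that holds only for these basic control characters:

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

public class Main {
    // strptime "%Y.%m.%d" roughly corresponds to the java.time pattern "yyyy.MM.dd"
    static LocalDate parseDate(String value) {
        return LocalDate.parse(value, DateTimeFormatter.ofPattern("yyyy.MM.dd"));
    }

    // strptime "%H:%M:%S" roughly corresponds to "HH:mm:ss"
    static LocalTime parseTime(String value) {
        return LocalTime.parse(value, DateTimeFormatter.ofPattern("HH:mm:ss"));
    }

    public static void main(String[] args) {
        // Sample values in the style of the "05/04/2000 12:12:11" example above
        System.out.println(parseDate("2000.05.04")); // 2000-05-04
        System.out.println(parseTime("12:12:11"));   // 12:12:11
    }
}
```

Validating a handful of sample rows this way before a load can catch a mismatched format string earlier than a server-side dry_run.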
Copyright © 2020. All rights reserved.