public class ExportRecordsToFilesRequest extends Object implements org.apache.avro.generic.IndexedRecord

A set of parameters for GPUdb.exportRecordsToFiles(ExportRecordsToFilesRequest).

Export records from a table to files. Any table can be exported, in full or in part (see columns_to_export and columns_to_skip). Additional filtering can be applied when using export table with expression through SQL.

The default destination is KIFS, though other storage types (Azure, S3, GCS, and HDFS) are supported through datasink_name; see GPUdb.createDatasink(CreateDatasinkRequest). The server's local file system is not supported.

The default file format is delimited text. See options for the supported file types and the options available for each. The table is saved to a single file if it fits within the maximum file size limit (which may vary by datasink type); otherwise, the table is split into multiple files, each of which may be smaller than the maximum size limit. All filenames created are returned in the response.
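As a sketch of typical use, the snippet below builds an options map for a Parquet export. The literal keys are assumed to be the string values behind the ExportRecordsToFilesRequest.Options constants, and the commented-out client call (endpoint URL, table name, and KIFS path are placeholders) shows where the map would be supplied:

```java
import java.util.HashMap;
import java.util.Map;

public class ExportOptionsExample {
    // Builds an options map for a Parquet export of two columns.
    // The literal keys are assumed to mirror the constants in
    // ExportRecordsToFilesRequest.Options.
    public static Map<String, String> buildOptions() {
        Map<String, String> options = new HashMap<>();
        options.put("file_type", "parquet");          // instead of the delimited-text default
        options.put("columns_to_export", "id,name");  // export only these two columns
        options.put("single_file", "true");           // one output file if size limits allow
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> options = buildOptions();
        // Against a live cluster, the request would then be issued roughly as
        // (endpoint and names are hypothetical):
        //   GPUdb gpudb = new GPUdb("http://localhost:9191");
        //   gpudb.exportRecordsToFiles(new ExportRecordsToFilesRequest(
        //       "example_table", "kifs://export/out.parquet", options));
        System.out.println(options);
    }
}
```

The response would list the names of all files actually written.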
Modifier and Type | Class and Description
---|---
static class | ExportRecordsToFilesRequest.Options - Optional parameters.
Constructor and Description
---
ExportRecordsToFilesRequest() - Constructs an ExportRecordsToFilesRequest object with default parameters.
ExportRecordsToFilesRequest(String tableName, String filepath, Map<String,String> options) - Constructs an ExportRecordsToFilesRequest object with the specified parameters.
Modifier and Type | Method and Description
---|---
boolean | equals(Object obj)
Object | get(int index) - This method supports the Avro framework and is not intended to be called directly by the user.
static org.apache.avro.Schema | getClassSchema() - This method supports the Avro framework and is not intended to be called directly by the user.
String | getFilepath()
Map<String,String> | getOptions()
org.apache.avro.Schema | getSchema() - This method supports the Avro framework and is not intended to be called directly by the user.
String | getTableName()
int | hashCode()
void | put(int index, Object value) - This method supports the Avro framework and is not intended to be called directly by the user.
ExportRecordsToFilesRequest | setFilepath(String filepath)
ExportRecordsToFilesRequest | setOptions(Map<String,String> options)
ExportRecordsToFilesRequest | setTableName(String tableName)
String | toString()
public ExportRecordsToFilesRequest()
Constructs an ExportRecordsToFilesRequest object with default parameters.

public ExportRecordsToFilesRequest(String tableName, String filepath, Map<String,String> options)
Constructs an ExportRecordsToFilesRequest object with the specified parameters.
Parameters:
tableName - Name of the table from which records will be exported.
filepath - Path to the data export target. If filepath has a file extension, it is read as the name of a file. If filepath is a directory, then the source table name with a random UUID appended will be used as the name of each exported file, all written to that directory. If filepath is a filename, then all exported files will have a random UUID appended to the given name. In either case, the target directory specified or implied must exist. The names of all exported files are returned in the response.
options - Optional parameters.
BATCH_SIZE: Number of records to be exported as a batch. The default value is '1000000'.

COLUMN_FORMATS: For each source column specified, applies the column-property-bound format. Currently supported column properties include date, time, and datetime. The parameter value must be formatted as a JSON string of maps of column names to maps of column properties to their corresponding column formats, e.g., '{ "order_date" : { "date" : "%Y.%m.%d" }, "order_time" : { "time" : "%H:%M:%S" } }'. See default_column_formats for valid format syntax.

COLUMNS_TO_EXPORT: Specifies a comma-delimited list of columns from the source table to export, written to the output file in the order they are given. Column names can be provided, in which case the target file will use those names as the column headers as well. Alternatively, column numbers can be specified, discretely or as a range. For example, a value of '5,7,1..3' will write values from the fifth column in the source table into the first column in the target file, from the seventh column in the source table into the second column in the target file, and from the first through third columns in the source table into the third through fifth columns in the target file. Mutually exclusive with columns_to_skip.

COLUMNS_TO_SKIP: Comma-separated list of column names or column numbers to not export. All columns in the source table not specified will be written to the target file in the order they appear in the table definition. Mutually exclusive with columns_to_export.

DATASINK_NAME: Datasink name, created using GPUdb.createDatasink(CreateDatasinkRequest).

DEFAULT_COLUMN_FORMATS: Specifies the default format to use to write data. Currently supported column properties include date, time, and datetime. This default column-property-bound format can be overridden by specifying a column property and format for a given source column in column_formats. For each specified annotation, the format will apply to all columns with that annotation unless custom column_formats for that annotation are specified. The parameter value must be formatted as a JSON string that is a map of column properties to their respective column formats, e.g., '{ "date" : "%Y.%m.%d", "time" : "%H:%M:%S" }'. Column formats are specified as a string of control characters and plain text. The supported control characters are 'Y', 'm', 'd', 'H', 'M', and 'S', which follow the Linux 'strptime()' specification, as well as 's', which specifies seconds and fractional seconds (though the fractional component will be truncated past milliseconds). Formats for the 'date' annotation must include the 'Y', 'm', and 'd' control characters. Formats for the 'time' annotation must include the 'H', 'M', and either 'S' or 's' (but not both) control characters. Formats for the 'datetime' annotation must meet both the 'date' and 'time' control character requirements. For example, '{ "datetime" : "%m/%d/%Y %H:%M:%S" }' would be used to write text as "05/04/2000 12:12:11".

EXPORT_DDL: Save DDL to a separate file. The default value is 'false'.

FILE_EXTENSION: Extension to give the export file. The default value is '.csv'.

FILE_TYPE: Specifies the file format to use when exporting data. Supported values: DELIMITED_TEXT (delimited text file format, e.g., CSV, TSV, PSV), PARQUET. The default value is DELIMITED_TEXT.

KINETICA_HEADER: Whether to include a Kinetica proprietary header. Will not be written if text_has_header is false. Supported values: TRUE, FALSE. The default value is FALSE.

KINETICA_HEADER_DELIMITER: If a Kinetica proprietary header is included, specifies the property separator. Different from the column delimiter. The default value is '|'.

COMPRESSION_TYPE: File compression type. GZip can be applied to text and Parquet files. Snappy can only be applied to Parquet files, and is the default compression for them.

SINGLE_FILE: Save records to a single file. This option may be ignored if the file size exceeds internal file size limits (this limit will differ on different targets). Supported values: TRUE, FALSE. The default value is TRUE.

SINGLE_FILE_MAX_SIZE: Maximum file size (in MB) to allow saving to a single file. May be overridden by target limitations. The default value is ''.

TEXT_DELIMITER: Specifies the character to write out to delimit field values and field names in the header (if present). For the delimited_text file_type only. The default value is ','.

TEXT_HAS_HEADER: Indicates whether to write out a header row. For the delimited_text file_type only. Supported values: TRUE, FALSE. The default value is TRUE.

TEXT_NULL_STRING: Specifies the character string that should be written out for the null value in the data. For the delimited_text file_type only. The default value is '\\N'.

The default value is an empty Map.
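The column-number syntax accepted by columns_to_export can be illustrated with a small helper that expands a specification such as '5,7,1..3'. This helper is purely illustrative and not part of the API; the server performs this parsing itself:

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnSelectionExample {
    // Expands a columns_to_export column-number specification such as
    // "5,7,1..3" into the ordered list of source column numbers it selects.
    public static List<Integer> expand(String spec) {
        List<Integer> columns = new ArrayList<>();
        for (String part : spec.split(",")) {
            if (part.contains("..")) {
                String[] bounds = part.split("\\.\\.");
                int lo = Integer.parseInt(bounds[0].trim());
                int hi = Integer.parseInt(bounds[1].trim());
                for (int i = lo; i <= hi; i++) columns.add(i);  // inclusive range
            } else {
                columns.add(Integer.parseInt(part.trim()));      // discrete column
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        // Source columns 5 and 7, then 1 through 3, become target columns 1..5.
        System.out.println(expand("5,7,1..3"));   // [5, 7, 1, 2, 3]
    }
}
```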
public static org.apache.avro.Schema getClassSchema()
This method supports the Avro framework and is not intended to be called directly by the user.
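The strptime-style control characters described for column_formats and default_column_formats can be demonstrated by translating such a pattern into a java.time format. The mapping below is a sketch covering only the documented characters ('Y', 'm', 'd', 'H', 'M', 'S'); the server applies these formats itself during export:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class ColumnFormatExample {
    // Translates the strptime-style control characters documented for
    // column_formats into a java.time pattern. Illustrative sketch only.
    public static String toJavaPattern(String strptime) {
        return strptime
                .replace("%Y", "yyyy")
                .replace("%m", "MM")
                .replace("%d", "dd")
                .replace("%H", "HH")
                .replace("%M", "mm")
                .replace("%S", "ss");
    }

    public static String format(LocalDateTime value, String strptime) {
        return value.format(DateTimeFormatter.ofPattern(toJavaPattern(strptime)));
    }

    public static void main(String[] args) {
        LocalDateTime t = LocalDateTime.of(2000, 5, 4, 12, 12, 11);
        // Matches the documentation's '{ "datetime" : "%m/%d/%Y %H:%M:%S" }' example.
        System.out.println(format(t, "%m/%d/%Y %H:%M:%S"));   // 05/04/2000 12:12:11
    }
}
```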
public String getTableName()

public ExportRecordsToFilesRequest setTableName(String tableName)
Parameters:
tableName - Name of the table from which records will be exported.
Returns:
this to mimic the builder pattern.

public String getFilepath()
Returns:
Path to the data export target. If filepath has a file extension, it is read as the name of a file. If filepath is a directory, then the source table name with a random UUID appended will be used as the name of each exported file, all written to that directory. If filepath is a filename, then all exported files will have a random UUID appended to the given name. In either case, the target directory specified or implied must exist. The names of all exported files are returned in the response.

public ExportRecordsToFilesRequest setFilepath(String filepath)
Parameters:
filepath - Path to the data export target. If filepath has a file extension, it is read as the name of a file. If filepath is a directory, then the source table name with a random UUID appended will be used as the name of each exported file, all written to that directory. If filepath is a filename, then all exported files will have a random UUID appended to the given name. In either case, the target directory specified or implied must exist. The names of all exported files are returned in the response.
Returns:
this to mimic the builder pattern.

public Map<String,String> getOptions()
Returns:
The optional parameters, as listed for the options parameter of the constructor above. The default value is an empty Map.

public ExportRecordsToFilesRequest setOptions(Map<String,String> options)
Parameters:
options - Optional parameters, as listed for the options parameter of the constructor above. The default value is an empty Map.
Returns:
this to mimic the builder pattern.

public org.apache.avro.Schema getSchema()
This method supports the Avro framework and is not intended to be called directly by the user.
Specified by:
getSchema in interface org.apache.avro.generic.GenericContainer
public Object get(int index)
This method supports the Avro framework and is not intended to be called directly by the user.
Specified by:
get in interface org.apache.avro.generic.IndexedRecord
Parameters:
index - the position of the field to get
Throws:
IndexOutOfBoundsException

public void put(int index, Object value)
This method supports the Avro framework and is not intended to be called directly by the user.
Specified by:
put in interface org.apache.avro.generic.IndexedRecord
Parameters:
index - the position of the field to set
value - the value to set
Throws:
IndexOutOfBoundsException
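The Avro-facing get and put methods address fields by position. Assuming the declared field order matches the constructor (0: tableName, 1: filepath, 2: options; this order is an assumption, not stated explicitly above), their contract can be mimicked with a minimal stand-in:

```java
public class IndexedRecordSketch {
    // Minimal stand-in for the positional get/put contract of
    // org.apache.avro.generic.IndexedRecord, assuming three fields in
    // the order (0: tableName, 1: filepath, 2: options) -- an assumption.
    private final Object[] fields = new Object[3];

    public Object get(int index) {
        if (index < 0 || index >= fields.length)
            throw new IndexOutOfBoundsException("no field at index " + index);
        return fields[index];
    }

    public void put(int index, Object value) {
        if (index < 0 || index >= fields.length)
            throw new IndexOutOfBoundsException("no field at index " + index);
        fields[index] = value;
    }

    public static void main(String[] args) {
        IndexedRecordSketch r = new IndexedRecordSketch();
        r.put(0, "example_table");
        System.out.println(r.get(0));   // example_table
    }
}
```

In the real class these methods exist so the Avro serialization framework can read and write the request generically; user code should prefer the typed getters and setters.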
Copyright © 2024. All rights reserved.