public class ExportRecordsToFilesRequest extends Object implements org.apache.avro.generic.IndexedRecord
A set of parameters for GPUdb.exportRecordsToFiles.
Export records from a table to files. All tables can be exported, in full or in part (see COLUMNS_TO_EXPORT and COLUMNS_TO_SKIP). Additional filtering can be applied when using export table with expression through SQL. The default destination is KIFS, though other storage types (Azure, S3, GCS, and HDFS) are supported through DATASINK_NAME; see GPUdb.createDatasink. The server's local file system is not supported.

The default file format is delimited text. See the options for the supported file types and the options specific to each. The table is saved to a single file if it fits within the maximum file size limit (which may vary by datasink type); otherwise, the table is split into multiple files, each of which may be smaller than that limit.

The names of all files created are returned in the response.
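For orientation, a minimal end-to-end sketch of issuing this request through the Kinetica Java client is shown below. The connection URL, table name, and KIFS target path are illustrative, the option keys and values are written as their literal strings rather than the ExportRecordsToFilesRequest.Options constants, and the response accessor name getFiles() is an assumption based on the documented file-name list in the response; the sketch requires the Kinetica Java API jar on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

import com.gpudb.GPUdb;
import com.gpudb.GPUdbException;
import com.gpudb.protocol.ExportRecordsToFilesRequest;
import com.gpudb.protocol.ExportRecordsToFilesResponse;

public class ExportExample {
    public static void main(String[] args) throws GPUdbException {
        // Connect to the database (URL is illustrative)
        GPUdb gpudb = new GPUdb("http://localhost:9191");

        // Export as delimited text with a header row (option keys and
        // values are assumed literal strings; prefer the Options constants)
        Map<String, String> options = new HashMap<>();
        options.put("file_type", "delimited_text");
        options.put("text_has_header", "true");

        // Target path is a directory, so each exported file is named after
        // the source table with a random UUID appended
        ExportRecordsToFilesRequest request = new ExportRecordsToFilesRequest(
                "example_table", "export/", options);

        ExportRecordsToFilesResponse response = gpudb.exportRecordsToFiles(request);

        // The names of all files written are returned in the response
        System.out.println(response.getFiles());
    }
}
```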
| Modifier and Type | Class and Description |
|---|---|
| static class | ExportRecordsToFilesRequest.Options: A set of string constants for the ExportRecordsToFilesRequest parameter options. |
| Constructor and Description |
|---|
| ExportRecordsToFilesRequest(): Constructs an ExportRecordsToFilesRequest object with default parameters. |
| ExportRecordsToFilesRequest(String tableName, String filepath, Map&lt;String,String&gt; options): Constructs an ExportRecordsToFilesRequest object with the specified parameters. |
| Modifier and Type | Method and Description |
|---|---|
| boolean | equals(Object obj) |
| Object | get(int index): This method supports the Avro framework and is not intended to be called directly by the user. |
| static org.apache.avro.Schema | getClassSchema(): This method supports the Avro framework and is not intended to be called directly by the user. |
| String | getFilepath(): Path to data export target. |
| Map&lt;String,String&gt; | getOptions(): Optional parameters. |
| org.apache.avro.Schema | getSchema(): This method supports the Avro framework and is not intended to be called directly by the user. |
| String | getTableName() |
| int | hashCode() |
| void | put(int index, Object value): This method supports the Avro framework and is not intended to be called directly by the user. |
| ExportRecordsToFilesRequest | setFilepath(String filepath): Path to data export target. |
| ExportRecordsToFilesRequest | setOptions(Map&lt;String,String&gt; options): Optional parameters. |
| ExportRecordsToFilesRequest | setTableName(String tableName) |
| String | toString() |
public ExportRecordsToFilesRequest()
Constructs an ExportRecordsToFilesRequest object with default parameters.

public ExportRecordsToFilesRequest(String tableName, String filepath, Map<String,String> options)
Constructs an ExportRecordsToFilesRequest object with the specified parameters.

Parameters:
tableName - Name of the table from which records will be exported.
filepath - Path to data export target. If filepath has a file extension, it is read as the name of a file. If filepath is a directory, then the source table name with a random UUID appended will be used as the name of each exported file, all written to that directory. If filepath is a filename, then all exported files will have a random UUID appended to the given name. In either case, the target directory specified or implied must exist. The names of all exported files are returned in the response.
options - Optional parameters.
- BATCH_SIZE: Number of records to be exported as a batch. The default value is '1000000'.
- COLUMN_FORMATS: For each source column specified, applies the column-property-bound format. Currently supported column properties include date, time, & datetime. The parameter value must be formatted as a JSON string of maps of column names to maps of column properties to their corresponding column formats, e.g., '{ "order_date" : { "date" : "%Y.%m.%d" }, "order_time" : { "time" : "%H:%M:%S" } }'. See DEFAULT_COLUMN_FORMATS for valid format syntax.
- COLUMNS_TO_EXPORT: Specifies a comma-delimited list of columns from the source table to export, written to the output file in the order they are given. Column names can be provided, in which case the target file will use those names as the column headers as well. Alternatively, column numbers can be specified, discretely or as a range. For example, a value of '5,7,1..3' will write values from the fifth column in the source table into the first column in the target file, from the seventh column in the source table into the second column in the target file, and from the first through third columns in the source table into the third through fifth columns in the target file. Mutually exclusive with COLUMNS_TO_SKIP.
- COLUMNS_TO_SKIP: Comma-separated list of column names or column numbers to not export. All columns in the source table not specified will be written to the target file in the order they appear in the table definition. Mutually exclusive with COLUMNS_TO_EXPORT.
- DATASINK_NAME: Datasink name, created using GPUdb.createDatasink.
- DEFAULT_COLUMN_FORMATS: Specifies the default format to use to write data. Currently supported column properties include date, time, & datetime. This default column-property-bound format can be overridden by specifying a column property & format for a given source column in COLUMN_FORMATS. For each specified annotation, the format will apply to all columns with that annotation unless custom COLUMN_FORMATS for that annotation are specified. The parameter value must be formatted as a JSON string that is a map of column properties to their respective column formats, e.g., '{ "date" : "%Y.%m.%d", "time" : "%H:%M:%S" }'. Column formats are specified as a string of control characters and plain text. The supported control characters are 'Y', 'm', 'd', 'H', 'M', and 'S', which follow the Linux 'strptime()' specification, as well as 's', which specifies seconds and fractional seconds (though the fractional component will be truncated past milliseconds). Formats for the 'date' annotation must include the 'Y', 'm', and 'd' control characters. Formats for the 'time' annotation must include the 'H', 'M', and either 'S' or 's' (but not both) control characters. Formats for the 'datetime' annotation must meet both the 'date' and 'time' control character requirements. For example, '{ "datetime" : "%m/%d/%Y %H:%M:%S" }' would be used to write text as "05/04/2000 12:12:11".
- EXPORT_DDL: Save DDL to a separate file. The default value is 'false'.
- FILE_EXTENSION: Extension to give the export file. The default value is '.csv'.
- FILE_TYPE: Specifies the file format to use when exporting data. Supported values: DELIMITED_TEXT (delimited text file format; e.g., CSV, TSV, PSV, etc.), PARQUET. The default value is DELIMITED_TEXT.
- KINETICA_HEADER: Whether to include a Kinetica proprietary header. Will not be written if TEXT_HAS_HEADER is FALSE. Supported values: TRUE, FALSE. The default value is FALSE.
- KINETICA_HEADER_DELIMITER: If a Kinetica proprietary header is included, specifies the property separator, which is distinct from the column delimiter. The default value is '|'.
- COMPRESSION_TYPE: File compression type. GZip can be applied to text and Parquet files. Snappy can only be applied to Parquet files, and is the default compression for them.
- SINGLE_FILE: Save records to a single file. This option may be ignored if the file size exceeds internal file size limits (this limit will differ on different targets). Supported values: TRUE, FALSE. The default value is TRUE.
- SINGLE_FILE_MAX_SIZE: Max file size (in MB) to allow saving to a single file. May be overridden by target limitations. The default value is ''.
- TEXT_DELIMITER: Specifies the character to write out to delimit field values and field names in the header (if present). For the DELIMITED_TEXT FILE_TYPE only. The default value is ','.
- TEXT_HAS_HEADER: Indicates whether to write out a header row. For the DELIMITED_TEXT FILE_TYPE only. Supported values: TRUE, FALSE. The default value is TRUE.
- TEXT_NULL_STRING: Specifies the character string that should be written out for the null value in the data. For the DELIMITED_TEXT FILE_TYPE only. The default value is '\N'.
The default value is an empty Map.
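As a concrete illustration, the options above can be collected into the map passed as the options parameter. This is a sketch only: the keys and values are written as lowercase literal strings, which is an assumption about the values the ExportRecordsToFilesRequest.Options constants resolve to; in practice those constants should be preferred.

```java
import java.util.HashMap;
import java.util.Map;

public class ExportOptionsExample {
    // Build an options map for a delimited-text export of selected columns.
    // Keys/values are written as literal strings here (assumed lowercase);
    // the ExportRecordsToFilesRequest.Options constants are the safer choice.
    public static Map<String, String> buildOptions() {
        Map<String, String> options = new HashMap<>();
        options.put("file_type", "delimited_text");   // default file format
        options.put("text_delimiter", "|");           // write PSV instead of CSV
        options.put("file_extension", ".psv");        // match the delimiter choice
        options.put("columns_to_export", "5,7,1..3"); // columns 5, 7, then 1 through 3
        options.put("text_has_header", "true");       // include a header row
        return options;
    }

    public static void main(String[] args) {
        System.out.println(buildOptions());
    }
}
```

Note that columns_to_export and columns_to_skip are mutually exclusive, so a given map should carry at most one of them.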
public static org.apache.avro.Schema getClassSchema()
This method supports the Avro framework and is not intended to be called directly by the user.
public String getTableName()
Returns:
The value of tableName.

public ExportRecordsToFilesRequest setTableName(String tableName)
Parameters:
tableName - The new value for tableName.
Returns:
this to mimic the builder pattern.

public String getFilepath()
Path to data export target. If filepath has a file extension, it is read as the name of a file. If filepath is a directory, then the source table name with a random UUID appended will be used as the name of each exported file, all written to that directory. If filepath is a filename, then all exported files will have a random UUID appended to the given name. In either case, the target directory specified or implied must exist. The names of all exported files are returned in the response.
Returns:
The value of filepath.

public ExportRecordsToFilesRequest setFilepath(String filepath)
Path to data export target. If filepath has a file extension, it is read as the name of a file. If filepath is a directory, then the source table name with a random UUID appended will be used as the name of each exported file, all written to that directory. If filepath is a filename, then all exported files will have a random UUID appended to the given name. In either case, the target directory specified or implied must exist. The names of all exported files are returned in the response.
Parameters:
filepath - The new value for filepath.
Returns:
this to mimic the builder pattern.

public Map<String,String> getOptions()
Optional parameters. The supported options are the same as those documented for the ExportRecordsToFilesRequest(String tableName, String filepath, Map<String,String> options) constructor above. The default value is an empty Map.
Returns:
The value of options.

public ExportRecordsToFilesRequest setOptions(Map<String,String> options)
Optional parameters. The supported options are the same as those documented for the ExportRecordsToFilesRequest(String tableName, String filepath, Map<String,String> options) constructor above. The default value is an empty Map.
Parameters:
options - The new value for options.
Returns:
this to mimic the builder pattern.

public org.apache.avro.Schema getSchema()
This method supports the Avro framework and is not intended to be called directly by the user.
Specified by:
getSchema in interface org.apache.avro.generic.GenericContainer

public Object get(int index)
This method supports the Avro framework and is not intended to be called directly by the user.
Specified by:
get in interface org.apache.avro.generic.IndexedRecord
Parameters:
index - the position of the field to get
Throws:
IndexOutOfBoundsException

public void put(int index, Object value)
This method supports the Avro framework and is not intended to be called directly by the user.
Specified by:
put in interface org.apache.avro.generic.IndexedRecord
Parameters:
index - the position of the field to set
value - the value to set
Throws:
IndexOutOfBoundsException
Copyright © 2025. All rights reserved.