ExportRecordsToFilesRequest

java.lang.Object

com.gpudb.protocol.ExportRecordsToFilesRequest

All Implemented Interfaces:

org.apache.avro.generic.GenericContainer, org.apache.avro.generic.IndexedRecord

public class ExportRecordsToFilesRequest extends Object implements org.apache.avro.generic.IndexedRecord

A set of parameters for GPUdb.exportRecordsToFiles.

Exports records from a table to files. Any table can be exported, in full or in part (see COLUMNS_TO_EXPORT and COLUMNS_TO_SKIP). Additional filtering can be applied by exporting the table with an expression through SQL. The default destination is KIFS, though other storage types (Azure, S3, GCS, and HDFS) are supported through DATASINK_NAME; see GPUdb.createDatasink.

The server’s local file system is not supported. The default file format is delimited text; see the options for the other file types and the settings specific to each. The table is saved to a single file if it fits within the maximum file size limit (which may vary by datasink type). Otherwise, the table is split into multiple files, each of which may be smaller than the maximum size limit.

All filenames created are returned in the response.
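
As a rough illustration of how the options described on this page fit together, the sketch below assembles an options map for a pipe-delimited export using only the JDK. The literal keys and values (`file_type`, `delimited_text`, and so on) are assumptions standing in for the constants in ExportRecordsToFilesRequest.Options; real code should use those constants. The table name, file path, and column name are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ExportOptionsSketch {
    /**
     * Builds an options map for a pipe-delimited export with a header row.
     * The string literals are assumed stand-ins for the constants in
     * ExportRecordsToFilesRequest.Options.
     */
    static Map<String, String> pipeDelimitedOptions() {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("file_type", "delimited_text");    // assumed value of FILE_TYPE / DELIMITED_TEXT
        options.put("text_delimiter", "|");            // assumed value of TEXT_DELIMITER
        options.put("text_has_header", "true");        // assumed value of TEXT_HAS_HEADER / TRUE
        options.put("columns_to_skip", "internal_id"); // hypothetical column name
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> options = pipeDelimitedOptions();
        // With the Kinetica client on the classpath, this map would be the third
        // constructor argument (table name and path are hypothetical):
        //   new ExportRecordsToFilesRequest("example_table", "exports/orders.csv", options)
        System.out.println(options);
    }
}
```

COLUMNS_TO_SKIP is used here without COLUMNS_TO_EXPORT because the two options are mutually exclusive.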

Nested Class Summary
Nested Classes
| Modifier and Type | Class | Description |
| --- | --- | --- |
| static final class | ExportRecordsToFilesRequest.Options | A set of string constants for the ExportRecordsToFilesRequest parameter options. |
Constructor Summary
Constructors
| Constructor | Description |
| --- | --- |
| ExportRecordsToFilesRequest() | Constructs an ExportRecordsToFilesRequest object with default parameters. |
| ExportRecordsToFilesRequest(String tableName, String filepath, Map<String,String> options) | Constructs an ExportRecordsToFilesRequest object with the specified parameters. |
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| boolean | equals(Object obj) | |
| Object | get(int index) | This method supports the Avro framework and is not intended to be called directly by the user. |
| static org.apache.avro.Schema | getClassSchema() | This method supports the Avro framework and is not intended to be called directly by the user. |
| String | getFilepath() | Path to data export target. |
| Map<String,String> | getOptions() | Optional parameters. |
| org.apache.avro.Schema | getSchema() | This method supports the Avro framework and is not intended to be called directly by the user. |
| String | getTableName() | The name of the table whose records are to be exported. |
| int | hashCode() | |
| void | put(int index, Object value) | This method supports the Avro framework and is not intended to be called directly by the user. |
| ExportRecordsToFilesRequest | setFilepath(String filepath) | Path to data export target. |
| ExportRecordsToFilesRequest | setOptions(Map<String,String> options) | Optional parameters. |
| ExportRecordsToFilesRequest | setTableName(String tableName) | The name of the table whose records are to be exported. |
| String | toString() | |

Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait

Constructor Details
- ExportRecordsToFilesRequest
  public ExportRecordsToFilesRequest()
  Constructs an ExportRecordsToFilesRequest object with default parameters.
- ExportRecordsToFilesRequest
  public ExportRecordsToFilesRequest(String tableName, String filepath, Map<String,String> options)
  Constructs an ExportRecordsToFilesRequest object with the specified parameters.
  Parameters:
  tableName - The name of the table whose records are to be exported.
  filepath - Path to data export target. If filepath has a file extension, it is read as the name of a file. If filepath is a directory, then the source table name with a random UUID appended will be used as the name of each exported file, all written to that directory. If filepath is a filename, then all exported files will have a random UUID appended to the given name. In either case, the target directory specified or implied must exist. The names of all exported files are returned in the response.
  options - Optional parameters.
  BATCH_SIZE: Number of records to be exported as a batch. The default value is ‘1000000’.
  COLUMN_FORMATS: For each source column specified, applies the column-property-bound format. Currently supported column properties include date, time, and datetime. The parameter value must be formatted as a JSON string of maps of column names to maps of column properties to their corresponding column formats, e.g., ‘{ "order_date" : { "date" : "%Y.%m.%d" }, "order_time" : { "time" : "%H:%M:%S" } }’. See DEFAULT_COLUMN_FORMATS for valid format syntax.
  COLUMNS_TO_EXPORT: Specifies a comma-delimited list of columns from the source table to export, written to the output file in the order they are given. Column names can be provided, in which case the target file will use those names as the column headers as well. Alternatively, column numbers can be specified—discretely or as a range. For example, a value of ‘5,7,1..3’ will write values from the fifth column in the source table into the first column in the target file, from the seventh column in the source table into the second column in the target file, and from the first through third columns in the source table into the third through fifth columns in the target file. Mutually exclusive with COLUMNS_TO_SKIP.
  COLUMNS_TO_SKIP: Comma-separated list of column names or column numbers to not export. All columns in the source table not specified will be written to the target file in the order they appear in the table definition. Mutually exclusive with COLUMNS_TO_EXPORT.
  DATASINK_NAME: Datasink name, created using GPUdb.createDatasink.
  DEFAULT_COLUMN_FORMATS: Specifies the default format to use to write data. Currently supported column properties include date, time, and datetime. This default column-property-bound format can be overridden by specifying a column property and format for a given source column in COLUMN_FORMATS. For each specified annotation, the format will apply to all columns with that annotation unless custom COLUMN_FORMATS for that annotation are specified. The parameter value must be formatted as a JSON string that is a map of column properties to their respective column formats, e.g., ‘{ "date" : "%Y.%m.%d", "time" : "%H:%M:%S" }’. Column formats are specified as a string of control characters and plain text. The supported control characters are ‘Y’, ‘m’, ‘d’, ‘H’, ‘M’, and ‘S’, which follow the Linux ‘strptime()’ specification, as well as ‘s’, which specifies seconds and fractional seconds (though the fractional component will be truncated past milliseconds). Formats for the ‘date’ annotation must include the ‘Y’, ‘m’, and ‘d’ control characters. Formats for the ‘time’ annotation must include the ‘H’, ‘M’, and either ‘S’ or ‘s’ (but not both) control characters. Formats for the ‘datetime’ annotation must meet both the ‘date’ and ‘time’ control character requirements. For example, ‘{ "datetime" : "%m/%d/%Y %H:%M:%S" }’ would be used to write text as “05/04/2000 12:12:11”.
  EXPORT_DDL: Save DDL to a separate file. The default value is ‘false’.
  FILE_EXTENSION: Extension to give the export file. The default value is ‘.csv’.
  FILE_TYPE: Specifies the file format to use when exporting data. Supported values:
  DELIMITED_TEXT: Delimited text file format; e.g., CSV, TSV, PSV, etc.
  PARQUET
  The default value is DELIMITED_TEXT.
  KINETICA_HEADER: Whether to include a Kinetica proprietary header. Will not be written if TEXT_HAS_HEADER is FALSE. Supported values:
  TRUE
  FALSE
  The default value is FALSE.
  KINETICA_HEADER_DELIMITER: Property separator to use when a Kinetica proprietary header is included. This is distinct from the column delimiter. The default value is ‘|’.
  COMPRESSION_TYPE: File compression type. GZip can be applied to text and Parquet files. Snappy can only be applied to Parquet files, and is the default compression for them. Supported values:
  UNCOMPRESSED
  SNAPPY
  GZIP
  SINGLE_FILE: Save records to a single file. This option may be ignored if file size exceeds internal file size limits (this limit will differ on different targets). Supported values:
  TRUE
  FALSE
  OVERWRITE
  The default value is TRUE.
  SINGLE_FILE_MAX_SIZE: Max file size (in MB) to allow saving to a single file. May be overridden by target limitations. The default value is an empty string.
  TEXT_DELIMITER: Specifies the character to write out to delimit field values and field names in the header (if present). For DELIMITED_TEXT FILE_TYPE only. The default value is ‘,’.
  TEXT_HAS_HEADER: Indicates whether to write out a header row. For DELIMITED_TEXT FILE_TYPE only. Supported values:
  TRUE
  FALSE
  The default value is TRUE.
  TEXT_NULL_STRING: Specifies the character string that should be written out for the null value in the data. For DELIMITED_TEXT FILE_TYPE only. The default value is ‘\N’.
  The default value is an empty Map.
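
  The COLUMNS_TO_EXPORT mapping described above can be traced with a small stand-alone sketch. It expands a numeric specification such as ‘5,7,1..3’ into the list of source column numbers in target-file order. This only mirrors the documented example; it is not the server's parser, and how the server treats malformed or descending ranges is not covered here. The class and method names are hypothetical.

  ```java
  import java.util.ArrayList;
  import java.util.List;

  class ColumnSpecSketch {
      /** Expands a numeric spec such as "5,7,1..3" into source column numbers in target order. */
      static List<Integer> expand(String spec) {
          List<Integer> sourceColumns = new ArrayList<>();
          for (String part : spec.split(",")) {
              String token = part.trim();
              int range = token.indexOf("..");
              if (range < 0) {
                  // A single column number.
                  sourceColumns.add(Integer.parseInt(token));
              } else {
                  // An inclusive range written as first..last.
                  int first = Integer.parseInt(token.substring(0, range));
                  int last = Integer.parseInt(token.substring(range + 2));
                  for (int column = first; column <= last; column++) {
                      sourceColumns.add(column);
                  }
              }
          }
          return sourceColumns;
      }

      public static void main(String[] args) {
          List<Integer> order = expand("5,7,1..3");
          for (int target = 0; target < order.size(); target++) {
              System.out.println("target column " + (target + 1)
                      + " <- source column " + order.get(target));
          }
      }
  }
  ```

  For ‘5,7,1..3’ the expansion is 5, 7, 1, 2, 3, which matches the description: the fifth source column lands in the first target column, and source columns one through three land in target columns three through five.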
Method Details
- getClassSchema
  public static org.apache.avro.Schema getClassSchema()
  This method supports the Avro framework and is not intended to be called directly by the user.
  Returns:
  The schema for the class.
- getTableName
  public String getTableName()
  The name of the table whose records are to be exported.
  Returns:
  The current value of tableName.
- setTableName
  public ExportRecordsToFilesRequest setTableName(String tableName)
  The name of the table whose records are to be exported.
  Parameters:
  tableName - The new value for tableName.
  Returns:
  this to mimic the builder pattern.
- getFilepath
  public String getFilepath()
  Path to data export target. If filepath has a file extension, it is read as the name of a file. If filepath is a directory, then the source table name with a random UUID appended will be used as the name of each exported file, all written to that directory. If filepath is a filename, then all exported files will have a random UUID appended to the given name. In either case, the target directory specified or implied must exist. The names of all exported files are returned in the response.
  Returns:
  The current value of filepath.
- setFilepath
  public ExportRecordsToFilesRequest setFilepath(String filepath)
  Path to data export target. If filepath has a file extension, it is read as the name of a file. If filepath is a directory, then the source table name with a random UUID appended will be used as the name of each exported file, all written to that directory. If filepath is a filename, then all exported files will have a random UUID appended to the given name. In either case, the target directory specified or implied must exist. The names of all exported files are returned in the response.
  Parameters:
  filepath - The new value for filepath.
  Returns:
  this to mimic the builder pattern.
- getOptions
  public Map<String,String> getOptions()
  Optional parameters.
  BATCH_SIZE: Number of records to be exported as a batch. The default value is ‘1000000’.
  COLUMN_FORMATS: For each source column specified, applies the column-property-bound format. Currently supported column properties include date, time, and datetime. The parameter value must be formatted as a JSON string of maps of column names to maps of column properties to their corresponding column formats, e.g., ‘{ "order_date" : { "date" : "%Y.%m.%d" }, "order_time" : { "time" : "%H:%M:%S" } }’. See DEFAULT_COLUMN_FORMATS for valid format syntax.
  COLUMNS_TO_EXPORT: Specifies a comma-delimited list of columns from the source table to export, written to the output file in the order they are given. Column names can be provided, in which case the target file will use those names as the column headers as well. Alternatively, column numbers can be specified—discretely or as a range. For example, a value of ‘5,7,1..3’ will write values from the fifth column in the source table into the first column in the target file, from the seventh column in the source table into the second column in the target file, and from the first through third columns in the source table into the third through fifth columns in the target file. Mutually exclusive with COLUMNS_TO_SKIP.
  COLUMNS_TO_SKIP: Comma-separated list of column names or column numbers to not export. All columns in the source table not specified will be written to the target file in the order they appear in the table definition. Mutually exclusive with COLUMNS_TO_EXPORT.
  DATASINK_NAME: Datasink name, created using GPUdb.createDatasink.
  DEFAULT_COLUMN_FORMATS: Specifies the default format to use to write data. Currently supported column properties include date, time, and datetime. This default column-property-bound format can be overridden by specifying a column property and format for a given source column in COLUMN_FORMATS. For each specified annotation, the format will apply to all columns with that annotation unless custom COLUMN_FORMATS for that annotation are specified. The parameter value must be formatted as a JSON string that is a map of column properties to their respective column formats, e.g., ‘{ "date" : "%Y.%m.%d", "time" : "%H:%M:%S" }’. Column formats are specified as a string of control characters and plain text. The supported control characters are ‘Y’, ‘m’, ‘d’, ‘H’, ‘M’, and ‘S’, which follow the Linux ‘strptime()’ specification, as well as ‘s’, which specifies seconds and fractional seconds (though the fractional component will be truncated past milliseconds). Formats for the ‘date’ annotation must include the ‘Y’, ‘m’, and ‘d’ control characters. Formats for the ‘time’ annotation must include the ‘H’, ‘M’, and either ‘S’ or ‘s’ (but not both) control characters. Formats for the ‘datetime’ annotation must meet both the ‘date’ and ‘time’ control character requirements. For example, ‘{ "datetime" : "%m/%d/%Y %H:%M:%S" }’ would be used to write text as “05/04/2000 12:12:11”.
  EXPORT_DDL: Save DDL to a separate file. The default value is ‘false’.
  FILE_EXTENSION: Extension to give the export file. The default value is ‘.csv’.
  FILE_TYPE: Specifies the file format to use when exporting data. Supported values:
  DELIMITED_TEXT: Delimited text file format; e.g., CSV, TSV, PSV, etc.
  PARQUET
  The default value is DELIMITED_TEXT.
  KINETICA_HEADER: Whether to include a Kinetica proprietary header. Will not be written if TEXT_HAS_HEADER is FALSE. Supported values:
  TRUE
  FALSE
  The default value is FALSE.
  KINETICA_HEADER_DELIMITER: Property separator to use when a Kinetica proprietary header is included. This is distinct from the column delimiter. The default value is ‘|’.
  COMPRESSION_TYPE: File compression type. GZip can be applied to text and Parquet files. Snappy can only be applied to Parquet files, and is the default compression for them. Supported values:
  UNCOMPRESSED
  SNAPPY
  GZIP
  SINGLE_FILE: Save records to a single file. This option may be ignored if file size exceeds internal file size limits (this limit will differ on different targets). Supported values:
  TRUE
  FALSE
  OVERWRITE
  The default value is TRUE.
  SINGLE_FILE_MAX_SIZE: Max file size (in MB) to allow saving to a single file. May be overridden by target limitations. The default value is an empty string.
  TEXT_DELIMITER: Specifies the character to write out to delimit field values and field names in the header (if present). For DELIMITED_TEXT FILE_TYPE only. The default value is ‘,’.
  TEXT_HAS_HEADER: Indicates whether to write out a header row. For DELIMITED_TEXT FILE_TYPE only. Supported values:
  TRUE
  FALSE
  The default value is TRUE.
  TEXT_NULL_STRING: Specifies the character string that should be written out for the null value in the data. For DELIMITED_TEXT FILE_TYPE only. The default value is ‘\N’.
  The default value is an empty Map.
  Returns:
  The current value of options.
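
  To make the DEFAULT_COLUMN_FORMATS control characters concrete, the following sketch renders a timestamp with a format string built from ‘Y’, ‘m’, ‘d’, ‘H’, ‘M’, and ‘S’. It is a local illustration written against the JDK only, not the server's formatter; the fractional-seconds ‘s’ control character is deliberately left out, and the class and method names are hypothetical.

  ```java
  import java.time.LocalDateTime;

  class ColumnFormatSketch {
      /** Renders a timestamp using only the %Y, %m, %d, %H, %M and %S control characters. */
      static String render(String format, LocalDateTime value) {
          StringBuilder out = new StringBuilder();
          for (int i = 0; i < format.length(); i++) {
              char c = format.charAt(i);
              if (c != '%' || i + 1 == format.length()) {
                  out.append(c); // plain text is copied through as-is
                  continue;
              }
              char control = format.charAt(++i);
              switch (control) {
                  case 'Y': out.append(String.format("%04d", value.getYear())); break;
                  case 'm': out.append(String.format("%02d", value.getMonthValue())); break;
                  case 'd': out.append(String.format("%02d", value.getDayOfMonth())); break;
                  case 'H': out.append(String.format("%02d", value.getHour())); break;
                  case 'M': out.append(String.format("%02d", value.getMinute())); break;
                  case 'S': out.append(String.format("%02d", value.getSecond())); break;
                  default: out.append('%').append(control); // not handled in this sketch
              }
          }
          return out.toString();
      }

      public static void main(String[] args) {
          LocalDateTime value = LocalDateTime.of(2000, 5, 4, 12, 12, 11);
          System.out.println(render("%m/%d/%Y %H:%M:%S", value)); // prints 05/04/2000 12:12:11
      }
  }
  ```

  The datetime format satisfies both the date requirement (‘Y’, ‘m’, ‘d’) and the time requirement (‘H’, ‘M’, ‘S’), which is why it is accepted for the ‘datetime’ annotation.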
- setOptions
  public ExportRecordsToFilesRequest setOptions(Map<String,String> options)
  Optional parameters.
  BATCH_SIZE: Number of records to be exported as a batch. The default value is ‘1000000’.
  COLUMN_FORMATS: For each source column specified, applies the column-property-bound format. Currently supported column properties include date, time, and datetime. The parameter value must be formatted as a JSON string of maps of column names to maps of column properties to their corresponding column formats, e.g., ‘{ "order_date" : { "date" : "%Y.%m.%d" }, "order_time" : { "time" : "%H:%M:%S" } }’. See DEFAULT_COLUMN_FORMATS for valid format syntax.
  COLUMNS_TO_EXPORT: Specifies a comma-delimited list of columns from the source table to export, written to the output file in the order they are given. Column names can be provided, in which case the target file will use those names as the column headers as well. Alternatively, column numbers can be specified—discretely or as a range. For example, a value of ‘5,7,1..3’ will write values from the fifth column in the source table into the first column in the target file, from the seventh column in the source table into the second column in the target file, and from the first through third columns in the source table into the third through fifth columns in the target file. Mutually exclusive with COLUMNS_TO_SKIP.
  COLUMNS_TO_SKIP: Comma-separated list of column names or column numbers to not export. All columns in the source table not specified will be written to the target file in the order they appear in the table definition. Mutually exclusive with COLUMNS_TO_EXPORT.
  DATASINK_NAME: Datasink name, created using GPUdb.createDatasink.
  DEFAULT_COLUMN_FORMATS: Specifies the default format to use to write data. Currently supported column properties include date, time, and datetime. This default column-property-bound format can be overridden by specifying a column property and format for a given source column in COLUMN_FORMATS. For each specified annotation, the format will apply to all columns with that annotation unless custom COLUMN_FORMATS for that annotation are specified. The parameter value must be formatted as a JSON string that is a map of column properties to their respective column formats, e.g., ‘{ "date" : "%Y.%m.%d", "time" : "%H:%M:%S" }’. Column formats are specified as a string of control characters and plain text. The supported control characters are ‘Y’, ‘m’, ‘d’, ‘H’, ‘M’, and ‘S’, which follow the Linux ‘strptime()’ specification, as well as ‘s’, which specifies seconds and fractional seconds (though the fractional component will be truncated past milliseconds). Formats for the ‘date’ annotation must include the ‘Y’, ‘m’, and ‘d’ control characters. Formats for the ‘time’ annotation must include the ‘H’, ‘M’, and either ‘S’ or ‘s’ (but not both) control characters. Formats for the ‘datetime’ annotation must meet both the ‘date’ and ‘time’ control character requirements. For example, ‘{ "datetime" : "%m/%d/%Y %H:%M:%S" }’ would be used to write text as “05/04/2000 12:12:11”.
  EXPORT_DDL: Save DDL to a separate file. The default value is ‘false’.
  FILE_EXTENSION: Extension to give the export file. The default value is ‘.csv’.
  FILE_TYPE: Specifies the file format to use when exporting data. Supported values:
  DELIMITED_TEXT: Delimited text file format; e.g., CSV, TSV, PSV, etc.
  PARQUET
  The default value is DELIMITED_TEXT.
  KINETICA_HEADER: Whether to include a Kinetica proprietary header. Will not be written if TEXT_HAS_HEADER is FALSE. Supported values:
  TRUE
  FALSE
  The default value is FALSE.
  KINETICA_HEADER_DELIMITER: Property separator to use when a Kinetica proprietary header is included. This is distinct from the column delimiter. The default value is ‘|’.
  COMPRESSION_TYPE: File compression type. GZip can be applied to text and Parquet files. Snappy can only be applied to Parquet files, and is the default compression for them. Supported values:
  UNCOMPRESSED
  SNAPPY
  GZIP
  SINGLE_FILE: Save records to a single file. This option may be ignored if file size exceeds internal file size limits (this limit will differ on different targets). Supported values:
  TRUE
  FALSE
  OVERWRITE
  The default value is TRUE.
  SINGLE_FILE_MAX_SIZE: Max file size (in MB) to allow saving to a single file. May be overridden by target limitations. The default value is an empty string.
  TEXT_DELIMITER: Specifies the character to write out to delimit field values and field names in the header (if present). For DELIMITED_TEXT FILE_TYPE only. The default value is ‘,’.
  TEXT_HAS_HEADER: Indicates whether to write out a header row. For DELIMITED_TEXT FILE_TYPE only. Supported values:
  TRUE
  FALSE
  The default value is TRUE.
  TEXT_NULL_STRING: Specifies the character string that should be written out for the null value in the data. For DELIMITED_TEXT FILE_TYPE only. The default value is ‘\N’.
  The default value is an empty Map.
  Parameters:
  options - The new value for options.
  Returns:
  this to mimic the builder pattern.
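
  Because COLUMN_FORMATS expects a JSON string rather than a nested map, the value has to be serialized before it is placed in the options map. The sketch below is one minimal way to do that with the JDK alone, assuming column names and formats contain no characters that need JSON escaping; a JSON library would be the safer choice in production code. The class and method names are hypothetical.

  ```java
  import java.util.Collections;
  import java.util.LinkedHashMap;
  import java.util.Map;

  class ColumnFormatsJsonSketch {
      /**
       * Serializes column -> (property -> format) as a JSON object of objects.
       * Assumes no key or value contains characters that require JSON escaping.
       */
      static String toJson(Map<String, Map<String, String>> formats) {
          StringBuilder json = new StringBuilder("{");
          String columnSeparator = "";
          for (Map.Entry<String, Map<String, String>> column : formats.entrySet()) {
              json.append(columnSeparator).append('"').append(column.getKey()).append("\":{");
              String propertySeparator = "";
              for (Map.Entry<String, String> property : column.getValue().entrySet()) {
                  json.append(propertySeparator)
                      .append('"').append(property.getKey()).append("\":\"")
                      .append(property.getValue()).append('"');
                  propertySeparator = ",";
              }
              json.append('}');
              columnSeparator = ",";
          }
          return json.append('}').toString();
      }

      /** Builds the order_date / order_time example from the COLUMN_FORMATS description. */
      static String example() {
          Map<String, Map<String, String>> formats = new LinkedHashMap<>();
          formats.put("order_date", Collections.singletonMap("date", "%Y.%m.%d"));
          formats.put("order_time", Collections.singletonMap("time", "%H:%M:%S"));
          return toJson(formats);
      }

      public static void main(String[] args) {
          // The resulting string would be stored under the COLUMN_FORMATS key
          // of the map passed to setOptions.
          System.out.println(example());
      }
  }
  ```

  A LinkedHashMap keeps the columns in insertion order, which makes the generated string predictable; the order of keys carries no meaning in JSON itself.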
- getSchema
  public org.apache.avro.Schema getSchema()
  This method supports the Avro framework and is not intended to be called directly by the user.
  Specified by:
  getSchema in interface org.apache.avro.generic.GenericContainer
  Returns:
  The schema object describing this class.
- get
  public Object get(int index)
  This method supports the Avro framework and is not intended to be called directly by the user.
  Specified by:
  get in interface org.apache.avro.generic.IndexedRecord
  Parameters:
  index - the position of the field to get
  Returns:
  value of the field with the given index.
  Throws:
  IndexOutOfBoundsException
- put
  public void put(int index, Object value)
  This method supports the Avro framework and is not intended to be called directly by the user.
  Specified by:
  put in interface org.apache.avro.generic.IndexedRecord
  Parameters:
  index - the position of the field to set
  value - the value to set
  Throws:
  IndexOutOfBoundsException
- equals
  public boolean equals(Object obj)
  Overrides:
  equals in class Object
- toString
  public String toString()
  Overrides:
  toString in class Object
- hashCode
  public int hashCode()
  Overrides:
  hashCode in class Object
