Class InsertRecordsFromQueryRequest

  • All Implemented Interfaces:
    org.apache.avro.generic.GenericContainer, org.apache.avro.generic.IndexedRecord

    public class InsertRecordsFromQueryRequest
    extends Object
    implements org.apache.avro.generic.IndexedRecord
    A set of parameters for GPUdb.insertRecordsFromQuery.

    Computes a remote query result and inserts the result data into a new or existing table.

    • Constructor Detail

      • InsertRecordsFromQueryRequest

        public InsertRecordsFromQueryRequest()
        Constructs an InsertRecordsFromQueryRequest object with default parameters.
      • InsertRecordsFromQueryRequest

        public InsertRecordsFromQueryRequest​(String tableName,
                                             String remoteQuery,
                                             Map<String,​Map<String,​String>> modifyColumns,
                                             Map<String,​String> createTableOptions,
                                             Map<String,​String> options)
        Constructs an InsertRecordsFromQueryRequest object with the specified parameters.
        Parameters:
        tableName - Name of the table into which the data will be inserted, in [schema_name.]table_name format, using standard name resolution rules. If the table does not exist, the table will be created using either an existing TYPE_ID or the type inferred from the remote query, and the new table name will have to meet standard table naming criteria.
        remoteQuery - Query whose result data will be imported.
        modifyColumns - Not implemented yet. The default value is an empty Map.
        createTableOptions - Options used when creating the target table. The default value is an empty Map.
        options - Optional parameters.
        • BAD_RECORD_TABLE_NAME: Optional name of a table to which rejected records are written. The bad record table has the following columns: line_number (long), line_rejected (string), error_message (string). When ERROR_HANDLING is ABORT, the bad record table is not populated.
        • BAD_RECORD_TABLE_LIMIT: A positive integer indicating the maximum number of records that can be written to the bad record table. The default value is 10000.
        • BATCH_SIZE: Number of records per batch when inserting data.
        • DATASOURCE_NAME: Name of an existing external data source from which the table will be loaded
        • ERROR_HANDLING: Specifies how errors should be handled upon insertion. Supported values:
          • PERMISSIVE: Records with missing columns are populated with nulls if possible; otherwise, the malformed records are skipped.
          • IGNORE_BAD_RECORDS: Malformed records are skipped.
          • ABORT: Stops current insertion and aborts entire operation when an error is encountered. Primary key collisions are considered abortable errors in this mode.
          The default value is ABORT.
        • IGNORE_EXISTING_PK: Specifies the record collision error-suppression policy for inserting into a table with a primary key, only used when not in upsert mode (upsert mode is disabled when UPDATE_ON_EXISTING_PK is FALSE). If set to TRUE, any record being inserted that is rejected for having primary key values that match those of an existing table record will be ignored with no error generated. If FALSE, the rejection of any record for having primary key values matching an existing record will result in an error being reported, as determined by ERROR_HANDLING. If the specified table does not have a primary key or if upsert mode is in effect (UPDATE_ON_EXISTING_PK is TRUE), then this option has no effect. Supported values:
          • TRUE: Ignore new records whose primary key values collide with those of existing records
          • FALSE: Treat as errors any new records whose primary key values collide with those of existing records
          The default value is FALSE.
        • INGESTION_MODE: Whether to do a full load, dry run, or perform a type inference on the source data. Supported values:
          • FULL: Run a type inference on the source data (if needed) and ingest
          • DRY_RUN: Does not load data, but walks through the source data and determines the number of valid records, taking into account the current mode of ERROR_HANDLING.
          • TYPE_INFERENCE_ONLY: Infer the type of the source data and return, without ingesting any data. The inferred type is returned in the response.
          The default value is FULL.
        • JDBC_FETCH_SIZE: The JDBC fetch size, which determines how many rows to fetch per round trip.
        • JDBC_SESSION_INIT_STATEMENT: Statement to execute once per JDBC session before the actual load begins. The default value is ''.
        • NUM_SPLITS_PER_RANK: Optional: number of splits for reading data per rank. If not specified, defaults to external_file_reader_num_tasks. The default value is ''.
        • NUM_TASKS_PER_RANK: Optional: number of tasks for reading data per rank. If not specified, defaults to external_file_reader_num_tasks.
        • PRIMARY_KEYS: Optional: comma-separated list of column names to set as primary keys, when not specified in the type. The default value is ''.
        • SHARD_KEYS: Optional: comma-separated list of column names to set as shard keys, when not specified in the type. The default value is ''.
        • SUBSCRIBE: Continuously poll the data source to check for new data and load it into the table. Supported values: TRUE, FALSE. The default value is FALSE.
        • TRUNCATE_TABLE: If set to TRUE, truncates the table specified by tableName prior to loading the data. Supported values: TRUE, FALSE. The default value is FALSE.
        • REMOTE_QUERY: Remote SQL query from which data will be sourced.
        • REMOTE_QUERY_ORDER_BY: Name of the column to be used for splitting the query into multiple sub-queries, using the ordering of the given column. The default value is ''.
        • REMOTE_QUERY_FILTER_COLUMN: Name of the column to be used for splitting the query into multiple sub-queries, using the data distribution of the given column. The default value is ''.
        • REMOTE_QUERY_INCREASING_COLUMN: Column in the subscribed remote query result that will increase for new records (e.g., TIMESTAMP). The default value is ''.
        • REMOTE_QUERY_PARTITION_COLUMN: Alias for REMOTE_QUERY_FILTER_COLUMN. The default value is ''.
        • TRUNCATE_STRINGS: If set to TRUE, truncate string values that are longer than the column's type size. Supported values: TRUE, FALSE. The default value is FALSE.
        • UPDATE_ON_EXISTING_PK: Specifies the record collision policy for inserting into a table with a primary key. If set to TRUE, any existing table record with primary key values that match those of a record being inserted will be replaced by that new record (the new data will be "upserted"). If set to FALSE, any existing table record with primary key values that match those of a record being inserted will remain unchanged, while the new record will be rejected and the error handled as determined by IGNORE_EXISTING_PK and ERROR_HANDLING. If the specified table does not have a primary key, then this option has no effect. Supported values:
          • TRUE: Upsert new records when primary keys match existing records
          • FALSE: Reject new records when primary keys match existing records
          The default value is FALSE.
        The default value is an empty Map.
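As a rough illustration of assembling the constructor's arguments, the sketch below builds the options and modifyColumns maps with plain java.util collections. The option keys are written here as lowercase string literals and the table, query, and data source names are hypothetical; consult the API's own option constants before relying on the exact key spellings.

```java
import java.util.HashMap;
import java.util.Map;

public class InsertOptionsSketch {
    // Builds an options map like the one described above. Keys are written as
    // lowercase string literals here; the real API may expose them as constants.
    static Map<String, String> buildOptions() {
        Map<String, String> options = new HashMap<>();
        options.put("datasource_name", "jdbc_ds");           // hypothetical data source name
        options.put("error_handling", "ignore_bad_records"); // skip malformed records
        options.put("bad_record_table_name", "bad_rows");    // where rejected records land
        options.put("batch_size", "50000");                  // records per insert batch
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> options = buildOptions();
        Map<String, String> createTableOptions = new HashMap<>();          // empty = defaults
        Map<String, Map<String, String>> modifyColumns = new HashMap<>();  // not implemented yet

        // These maps would then be passed to the constructor, e.g.:
        // new InsertRecordsFromQueryRequest("my_schema.my_table",
        //         "SELECT * FROM remote_orders",
        //         modifyColumns, createTableOptions, options);
        System.out.println(options.size()); // prints 4
    }
}
```

Passing empty maps for modifyColumns and createTableOptions matches their documented defaults, so only the options that differ from the defaults need to be populated.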
    • Method Detail

      • getClassSchema

        public static org.apache.avro.Schema getClassSchema()
        This method supports the Avro framework and is not intended to be called directly by the user.
        Returns:
        The schema for the class.
      • getTableName

        public String getTableName()
        Name of the table into which the data will be inserted, in [schema_name.]table_name format, using standard name resolution rules. If the table does not exist, the table will be created using either an existing TYPE_ID or the type inferred from the remote query, and the new table name will have to meet standard table naming criteria.
        Returns:
        The current value of tableName.
      • setTableName

        public InsertRecordsFromQueryRequest setTableName​(String tableName)
        Name of the table into which the data will be inserted, in [schema_name.]table_name format, using standard name resolution rules. If the table does not exist, the table will be created using either an existing TYPE_ID or the type inferred from the remote query, and the new table name will have to meet standard table naming criteria.
        Parameters:
        tableName - The new value for tableName.
        Returns:
        this to mimic the builder pattern.
      • getRemoteQuery

        public String getRemoteQuery()
        Query whose result data will be imported.
        Returns:
        The current value of remoteQuery.
      • setRemoteQuery

        public InsertRecordsFromQueryRequest setRemoteQuery​(String remoteQuery)
        Query whose result data will be imported.
        Parameters:
        remoteQuery - The new value for remoteQuery.
        Returns:
        this to mimic the builder pattern.
      • getModifyColumns

        public Map<String,​Map<String,​String>> getModifyColumns()
        Not implemented yet. The default value is an empty Map.
        Returns:
        The current value of modifyColumns.
      • setModifyColumns

        public InsertRecordsFromQueryRequest setModifyColumns​(Map<String,​Map<String,​String>> modifyColumns)
        Not implemented yet. The default value is an empty Map.
        Parameters:
        modifyColumns - The new value for modifyColumns.
        Returns:
        this to mimic the builder pattern.
      • getOptions

        public Map<String,​String> getOptions()
        Optional parameters.
        • BAD_RECORD_TABLE_NAME: Optional name of a table to which rejected records are written. The bad record table has the following columns: line_number (long), line_rejected (string), error_message (string). When ERROR_HANDLING is ABORT, the bad record table is not populated.
        • BAD_RECORD_TABLE_LIMIT: A positive integer indicating the maximum number of records that can be written to the bad record table. The default value is 10000.
        • BATCH_SIZE: Number of records per batch when inserting data.
        • DATASOURCE_NAME: Name of an existing external data source from which the table will be loaded
        • ERROR_HANDLING: Specifies how errors should be handled upon insertion. Supported values:
          • PERMISSIVE: Records with missing columns are populated with nulls if possible; otherwise, the malformed records are skipped.
          • IGNORE_BAD_RECORDS: Malformed records are skipped.
          • ABORT: Stops current insertion and aborts entire operation when an error is encountered. Primary key collisions are considered abortable errors in this mode.
          The default value is ABORT.
        • IGNORE_EXISTING_PK: Specifies the record collision error-suppression policy for inserting into a table with a primary key, only used when not in upsert mode (upsert mode is disabled when UPDATE_ON_EXISTING_PK is FALSE). If set to TRUE, any record being inserted that is rejected for having primary key values that match those of an existing table record will be ignored with no error generated. If FALSE, the rejection of any record for having primary key values matching an existing record will result in an error being reported, as determined by ERROR_HANDLING. If the specified table does not have a primary key or if upsert mode is in effect (UPDATE_ON_EXISTING_PK is TRUE), then this option has no effect. Supported values:
          • TRUE: Ignore new records whose primary key values collide with those of existing records
          • FALSE: Treat as errors any new records whose primary key values collide with those of existing records
          The default value is FALSE.
        • INGESTION_MODE: Whether to do a full load, dry run, or perform a type inference on the source data. Supported values:
          • FULL: Run a type inference on the source data (if needed) and ingest
          • DRY_RUN: Does not load data, but walks through the source data and determines the number of valid records, taking into account the current mode of ERROR_HANDLING.
          • TYPE_INFERENCE_ONLY: Infer the type of the source data and return, without ingesting any data. The inferred type is returned in the response.
          The default value is FULL.
        • JDBC_FETCH_SIZE: The JDBC fetch size, which determines how many rows to fetch per round trip.
        • JDBC_SESSION_INIT_STATEMENT: Statement to execute once per JDBC session before the actual load begins. The default value is ''.
        • NUM_SPLITS_PER_RANK: Optional: number of splits for reading data per rank. If not specified, defaults to external_file_reader_num_tasks. The default value is ''.
        • NUM_TASKS_PER_RANK: Optional: number of tasks for reading data per rank. If not specified, defaults to external_file_reader_num_tasks.
        • PRIMARY_KEYS: Optional: comma-separated list of column names to set as primary keys, when not specified in the type. The default value is ''.
        • SHARD_KEYS: Optional: comma-separated list of column names to set as shard keys, when not specified in the type. The default value is ''.
        • SUBSCRIBE: Continuously poll the data source to check for new data and load it into the table. Supported values: TRUE, FALSE. The default value is FALSE.
        • TRUNCATE_TABLE: If set to TRUE, truncates the table specified by tableName prior to loading the data. Supported values: TRUE, FALSE. The default value is FALSE.
        • REMOTE_QUERY: Remote SQL query from which data will be sourced.
        • REMOTE_QUERY_ORDER_BY: Name of the column to be used for splitting the query into multiple sub-queries, using the ordering of the given column. The default value is ''.
        • REMOTE_QUERY_FILTER_COLUMN: Name of the column to be used for splitting the query into multiple sub-queries, using the data distribution of the given column. The default value is ''.
        • REMOTE_QUERY_INCREASING_COLUMN: Column in the subscribed remote query result that will increase for new records (e.g., TIMESTAMP). The default value is ''.
        • REMOTE_QUERY_PARTITION_COLUMN: Alias for REMOTE_QUERY_FILTER_COLUMN. The default value is ''.
        • TRUNCATE_STRINGS: If set to TRUE, truncate string values that are longer than the column's type size. Supported values: TRUE, FALSE. The default value is FALSE.
        • UPDATE_ON_EXISTING_PK: Specifies the record collision policy for inserting into a table with a primary key. If set to TRUE, any existing table record with primary key values that match those of a record being inserted will be replaced by that new record (the new data will be "upserted"). If set to FALSE, any existing table record with primary key values that match those of a record being inserted will remain unchanged, while the new record will be rejected and the error handled as determined by IGNORE_EXISTING_PK and ERROR_HANDLING. If the specified table does not have a primary key, then this option has no effect. Supported values:
          • TRUE: Upsert new records when primary keys match existing records
          • FALSE: Reject new records when primary keys match existing records
          The default value is FALSE.
        The default value is an empty Map.
        Returns:
        The current value of options.
      • setOptions

        public InsertRecordsFromQueryRequest setOptions​(Map<String,​String> options)
        Optional parameters.
        • BAD_RECORD_TABLE_NAME: Optional name of a table to which rejected records are written. The bad record table has the following columns: line_number (long), line_rejected (string), error_message (string). When ERROR_HANDLING is ABORT, the bad record table is not populated.
        • BAD_RECORD_TABLE_LIMIT: A positive integer indicating the maximum number of records that can be written to the bad record table. The default value is 10000.
        • BATCH_SIZE: Number of records per batch when inserting data.
        • DATASOURCE_NAME: Name of an existing external data source from which the table will be loaded
        • ERROR_HANDLING: Specifies how errors should be handled upon insertion. Supported values:
          • PERMISSIVE: Records with missing columns are populated with nulls if possible; otherwise, the malformed records are skipped.
          • IGNORE_BAD_RECORDS: Malformed records are skipped.
          • ABORT: Stops current insertion and aborts entire operation when an error is encountered. Primary key collisions are considered abortable errors in this mode.
          The default value is ABORT.
        • IGNORE_EXISTING_PK: Specifies the record collision error-suppression policy for inserting into a table with a primary key, only used when not in upsert mode (upsert mode is disabled when UPDATE_ON_EXISTING_PK is FALSE). If set to TRUE, any record being inserted that is rejected for having primary key values that match those of an existing table record will be ignored with no error generated. If FALSE, the rejection of any record for having primary key values matching an existing record will result in an error being reported, as determined by ERROR_HANDLING. If the specified table does not have a primary key or if upsert mode is in effect (UPDATE_ON_EXISTING_PK is TRUE), then this option has no effect. Supported values:
          • TRUE: Ignore new records whose primary key values collide with those of existing records
          • FALSE: Treat as errors any new records whose primary key values collide with those of existing records
          The default value is FALSE.
        • INGESTION_MODE: Whether to do a full load, dry run, or perform a type inference on the source data. Supported values:
          • FULL: Run a type inference on the source data (if needed) and ingest
          • DRY_RUN: Does not load data, but walks through the source data and determines the number of valid records, taking into account the current mode of ERROR_HANDLING.
          • TYPE_INFERENCE_ONLY: Infer the type of the source data and return, without ingesting any data. The inferred type is returned in the response.
          The default value is FULL.
        • JDBC_FETCH_SIZE: The JDBC fetch size, which determines how many rows to fetch per round trip.
        • JDBC_SESSION_INIT_STATEMENT: Statement to execute once per JDBC session before the actual load begins. The default value is ''.
        • NUM_SPLITS_PER_RANK: Optional: number of splits for reading data per rank. If not specified, defaults to external_file_reader_num_tasks. The default value is ''.
        • NUM_TASKS_PER_RANK: Optional: number of tasks for reading data per rank. If not specified, defaults to external_file_reader_num_tasks.
        • PRIMARY_KEYS: Optional: comma-separated list of column names to set as primary keys, when not specified in the type. The default value is ''.
        • SHARD_KEYS: Optional: comma-separated list of column names to set as shard keys, when not specified in the type. The default value is ''.
        • SUBSCRIBE: Continuously poll the data source to check for new data and load it into the table. Supported values: TRUE, FALSE. The default value is FALSE.
        • TRUNCATE_TABLE: If set to TRUE, truncates the table specified by tableName prior to loading the data. Supported values: TRUE, FALSE. The default value is FALSE.
        • REMOTE_QUERY: Remote SQL query from which data will be sourced.
        • REMOTE_QUERY_ORDER_BY: Name of the column to be used for splitting the query into multiple sub-queries, using the ordering of the given column. The default value is ''.
        • REMOTE_QUERY_FILTER_COLUMN: Name of the column to be used for splitting the query into multiple sub-queries, using the data distribution of the given column. The default value is ''.
        • REMOTE_QUERY_INCREASING_COLUMN: Column in the subscribed remote query result that will increase for new records (e.g., TIMESTAMP). The default value is ''.
        • REMOTE_QUERY_PARTITION_COLUMN: Alias for REMOTE_QUERY_FILTER_COLUMN. The default value is ''.
        • TRUNCATE_STRINGS: If set to TRUE, truncate string values that are longer than the column's type size. Supported values: TRUE, FALSE. The default value is FALSE.
        • UPDATE_ON_EXISTING_PK: Specifies the record collision policy for inserting into a table with a primary key. If set to TRUE, any existing table record with primary key values that match those of a record being inserted will be replaced by that new record (the new data will be "upserted"). If set to FALSE, any existing table record with primary key values that match those of a record being inserted will remain unchanged, while the new record will be rejected and the error handled as determined by IGNORE_EXISTING_PK and ERROR_HANDLING. If the specified table does not have a primary key, then this option has no effect. Supported values:
          • TRUE: Upsert new records when primary keys match existing records
          • FALSE: Reject new records when primary keys match existing records
          The default value is FALSE.
        The default value is an empty Map.
        Parameters:
        options - The new value for options.
        Returns:
        this to mimic the builder pattern.
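Because each setter returns this, setter calls can be chained fluently. The stand-in class below (FakeRequest is hypothetical, not part of the API) sketches that "return this" convention in isolation:

```java
import java.util.HashMap;
import java.util.Map;

public class BuilderPatternSketch {
    // Minimal stand-in mimicking the request class's chained setters.
    static class FakeRequest {
        private String tableName = "";
        private Map<String, String> options = new HashMap<>();

        FakeRequest setTableName(String tableName) {
            this.tableName = tableName;
            return this; // returning this enables chaining
        }

        FakeRequest setOptions(Map<String, String> options) {
            this.options = options;
            return this;
        }

        String getTableName() { return tableName; }
    }

    public static void main(String[] args) {
        // One expression configures the whole request:
        FakeRequest req = new FakeRequest()
                .setTableName("my_schema.my_table")
                .setOptions(new HashMap<>());
        System.out.println(req.getTableName()); // prints my_schema.my_table
    }
}
```

The same chaining works with the real request class: the no-argument constructor followed by chained setTableName, setRemoteQuery, and setOptions calls is equivalent to the five-argument constructor.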
      • getSchema

        public org.apache.avro.Schema getSchema()
        This method supports the Avro framework and is not intended to be called directly by the user.
        Specified by:
        getSchema in interface org.apache.avro.generic.GenericContainer
        Returns:
        The schema object describing this class.
      • get

        public Object get​(int index)
        This method supports the Avro framework and is not intended to be called directly by the user.
        Specified by:
        get in interface org.apache.avro.generic.IndexedRecord
        Parameters:
        index - the position of the field to get
        Returns:
        value of the field with the given index.
        Throws:
        IndexOutOfBoundsException
      • put

        public void put​(int index,
                        Object value)
        This method supports the Avro framework and is not intended to be called directly by the user.
        Specified by:
        put in interface org.apache.avro.generic.IndexedRecord
        Parameters:
        index - the position of the field to set
        value - the value to set
        Throws:
        IndexOutOfBoundsException
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class Object