Class CreateUnionRequest

  • All Implemented Interfaces:
    org.apache.avro.generic.GenericContainer, org.apache.avro.generic.IndexedRecord

    public class CreateUnionRequest
    extends Object
    implements org.apache.avro.generic.IndexedRecord
    A set of parameters for GPUdb.createUnion.

    Merges data from one or more tables with comparable data types into a new table.

    The following merges are supported:

    UNION (DISTINCT/ALL) - For data set union details and examples, see Union. For limitations, see Union Limitations and Cautions.

    INTERSECT (DISTINCT/ALL) - For data set intersection details and examples, see Intersect. For limitations, see Intersect Limitations.

    EXCEPT (DISTINCT/ALL) - For data set subtraction details and examples, see Except. For limitations, see Except Limitations.

    MERGE VIEWS - For a given set of filtered views on a single table, creates a single filtered view containing all of the unique records across all of the given filtered data sets.

    Non-charN 'string' and 'bytes' column types cannot be merged, nor can columns marked as store-only.
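    The row-retention rules of the merge types above can be illustrated locally. The following is a minimal sketch that models rows as plain strings in java.util collections; the class name MergeModesSketch and its methods are hypothetical, nothing here contacts a database, and it does not reflect how the server performs the merge. The ALL variants of EXCEPT and INTERSECT are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Local illustration only: rows are modeled as strings, not database records.
final class MergeModesSketch {

    // UNION ALL: every row from both inputs, duplicates included.
    static List<String> unionAll(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    // UNION (DISTINCT): every unique row from both inputs.
    static Set<String> unionDistinct(List<String> a, List<String> b) {
        Set<String> out = new LinkedHashSet<>(a);
        out.addAll(b);
        return out;
    }

    // EXCEPT (DISTINCT): unique rows of the first input absent from the second.
    static Set<String> except(List<String> a, List<String> b) {
        Set<String> out = new LinkedHashSet<>(a);
        out.removeAll(b);
        return out;
    }

    // INTERSECT (DISTINCT): unique rows present in both inputs.
    static Set<String> intersect(List<String> a, List<String> b) {
        Set<String> out = new LinkedHashSet<>(a);
        out.retainAll(b);
        return out;
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList("a", "a", "b", "c");
        List<String> second = Arrays.asList("b", "d");
        System.out.println("UNION ALL: " + unionAll(first, second));
        System.out.println("UNION:     " + unionDistinct(first, second));
        System.out.println("EXCEPT:    " + except(first, second));
        System.out.println("INTERSECT: " + intersect(first, second));
    }
}
```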

    • Constructor Detail

      • CreateUnionRequest

        public CreateUnionRequest()
        Constructs a CreateUnionRequest object with default parameters.
      • CreateUnionRequest

        public CreateUnionRequest(String tableName,
                                  List<String> tableNames,
                                  List<List<String>> inputColumnNames,
                                  List<String> outputColumnNames,
                                  Map<String,String> options)
        Constructs a CreateUnionRequest object with the specified parameters.
        Parameters:
        tableName - Name of the table to be created, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria.
        tableNames - The list of table names to merge, in [schema_name.]table_name format, using standard name resolution rules. Must contain the names of one or more existing tables.
        inputColumnNames - The list of columns from each of the corresponding input tables.
        outputColumnNames - The list of names of the columns to be stored in the output table.
        options - Optional parameters.
        • CREATE_TEMP_TABLE: If TRUE, a unique temporary table name will be generated in the sys_temp schema and used in place of tableName. If PERSIST is FALSE (or unspecified), then this is always allowed even if the caller does not have permission to create tables. The generated name is returned in QUALIFIED_TABLE_NAME. Supported values: TRUE, FALSE. The default value is FALSE.
        • COLLECTION_NAME: [DEPRECATED: specify the containing schema for the output table as part of tableName, and use GPUdb.createSchema to create the schema if it does not exist] Name of the schema for the output table. If the schema provided does not exist, it will be created automatically. The default value is ''.
        • MODE: Describes which rows of the tables being merged will be retained. Supported values:
          • UNION_ALL: Retains all rows from the specified tables.
          • UNION: Retains all unique rows from the specified tables (synonym for UNION_DISTINCT).
          • UNION_DISTINCT: Retains all unique rows from the specified tables.
          • EXCEPT: Retains all unique rows from the first table that do not appear in the second table (only works on 2 tables).
          • EXCEPT_ALL: Retains all rows (including duplicates) from the first table that do not appear in the second table (only works on 2 tables).
          • INTERSECT: Retains all unique rows that appear in both of the specified tables (only works on 2 tables).
          • INTERSECT_ALL: Retains all rows (including duplicates) that appear in both of the specified tables (only works on 2 tables).
          The default value is UNION_ALL.
        • LONG_HASH: When TRUE, uses a 128-bit hash for the UNION_DISTINCT, EXCEPT, EXCEPT_ALL, INTERSECT, and INTERSECT_ALL modes; otherwise, uses a 64-bit hash.
        • CHUNK_SIZE: Indicates the number of records per chunk to be used for this output table.
        • CHUNK_COLUMN_MAX_MEMORY: Indicates the target maximum data size for each column in a chunk to be used for this output table.
        • CHUNK_MAX_MEMORY: Indicates the target maximum data size for all columns in a chunk to be used for this output table.
        • CREATE_INDEXES: Comma-separated list of columns on which to create indexes on the output table. The columns specified must be present in outputColumnNames.
        • TTL: Sets the TTL of the output table specified in tableName.
        • PERSIST: If TRUE, then the output table specified in tableName will be persisted and will not expire unless a TTL is specified. If FALSE, then the output table will be an in-memory table and will expire unless a TTL is specified otherwise. Supported values: TRUE, FALSE. The default value is FALSE.
        • VIEW_ID: ID of the view of which this output table is a member. The default value is ''.
        • FORCE_REPLICATED: If TRUE, then the output table specified in tableName will be replicated even if the source tables are not. Supported values: TRUE, FALSE. The default value is FALSE.
        • STRATEGY_DEFINITION: The tier strategy for the table and its columns.
        • COMPRESSION_CODEC: The default compression codec for this table's columns.
        • NO_COUNT: If 'true', the response reports a count of 0 for the union table, avoiding the cost of counting; this optimization is intended for virtual unions with many chunks. The default value is 'false'.
        The default value is an empty Map.
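        As a rough illustration of how the five constructor arguments fit together, the sketch below builds them with java.util types only. The class name UnionArgsSketch, all table and column names, and the lowercase option strings ("mode", "union_distinct", "persist", "create_indexes") are assumptions made for this example; in client code the Options constants of this class would normally be used instead of string literals.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Builds constructor arguments only; all names and option strings are assumed.
final class UnionArgsSketch {

    static final String TABLE_NAME = "example.orders_all";

    static final List<String> TABLE_NAMES =
            Arrays.asList("example.orders_2023", "example.orders_2024");

    // One inner list per input table, in the same order as TABLE_NAMES.
    static final List<List<String>> INPUT_COLUMN_NAMES = Arrays.asList(
            Arrays.asList("id", "amount"),
            Arrays.asList("order_id", "total"));

    static final List<String> OUTPUT_COLUMN_NAMES = Arrays.asList("id", "amount");

    static Map<String, String> options() {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("mode", "union_distinct"); // assumed string form of MODE
        options.put("persist", "true");        // assumed string form of PERSIST
        options.put("create_indexes", "id");   // a column from OUTPUT_COLUMN_NAMES
        return options;
    }

    public static void main(String[] args) {
        // With the client library on the classpath, these values would be passed as
        // new CreateUnionRequest(TABLE_NAME, TABLE_NAMES, INPUT_COLUMN_NAMES,
        //                        OUTPUT_COLUMN_NAMES, options()).
        System.out.println(TABLE_NAME + " <- " + TABLE_NAMES + " " + options());
    }
}
```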
    • Method Detail

      • getClassSchema

        public static org.apache.avro.Schema getClassSchema()
        This method supports the Avro framework and is not intended to be called directly by the user.
        Returns:
        The schema for the class.
      • getTableNames

        public List<String> getTableNames()
        The list of table names to merge, in [schema_name.]table_name format, using standard name resolution rules. Must contain the names of one or more existing tables.
        Returns:
        The current value of tableNames.
      • setTableNames

        public CreateUnionRequest setTableNames(List<String> tableNames)
        The list of table names to merge, in [schema_name.]table_name format, using standard name resolution rules. Must contain the names of one or more existing tables.
        Parameters:
        tableNames - The new value for tableNames.
        Returns:
        this to mimic the builder pattern.
      • getInputColumnNames

        public List<List<String>> getInputColumnNames()
        The list of columns from each of the corresponding input tables.
        Returns:
        The current value of inputColumnNames.
      • setInputColumnNames

        public CreateUnionRequest setInputColumnNames(List<List<String>> inputColumnNames)
        The list of columns from each of the corresponding input tables.
        Parameters:
        inputColumnNames - The new value for inputColumnNames.
        Returns:
        this to mimic the builder pattern.
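        Because inputColumnNames holds one column list per corresponding input table, a client-side shape check before sending the request can catch mismatched arguments early. The sketch below is a hypothetical helper (UnionShapeCheck is not part of the API). It assumes that each inner list must have as many entries as outputColumnNames, which the text above does not state explicitly.

```java
import java.util.List;

// Hypothetical pre-flight check; not part of the client library.
final class UnionShapeCheck {

    static void check(List<String> tableNames,
                      List<List<String>> inputColumnNames,
                      List<String> outputColumnNames) {
        if (tableNames.isEmpty()) {
            throw new IllegalArgumentException("tableNames must name at least one table");
        }
        if (inputColumnNames.size() != tableNames.size()) {
            throw new IllegalArgumentException(
                    "expected one column list per table: " + tableNames.size()
                    + " tables, " + inputColumnNames.size() + " column lists");
        }
        // Assumption: every input column list maps position by position
        // onto the output columns, so the sizes must agree.
        for (int i = 0; i < inputColumnNames.size(); i++) {
            if (inputColumnNames.get(i).size() != outputColumnNames.size()) {
                throw new IllegalArgumentException(
                        "column list for " + tableNames.get(i) + " has "
                        + inputColumnNames.get(i).size() + " entries, expected "
                        + outputColumnNames.size());
            }
        }
    }
}
```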
      • getOutputColumnNames

        public List<String> getOutputColumnNames()
        The list of names of the columns to be stored in the output table.
        Returns:
        The current value of outputColumnNames.
      • setOutputColumnNames

        public CreateUnionRequest setOutputColumnNames(List<String> outputColumnNames)
        The list of names of the columns to be stored in the output table.
        Parameters:
        outputColumnNames - The new value for outputColumnNames.
        Returns:
        this to mimic the builder pattern.
      • getOptions

        public Map<String,String> getOptions()
        Optional parameters.
        • CREATE_TEMP_TABLE: If TRUE, a unique temporary table name will be generated in the sys_temp schema and used in place of tableName. If PERSIST is FALSE (or unspecified), then this is always allowed even if the caller does not have permission to create tables. The generated name is returned in QUALIFIED_TABLE_NAME. Supported values: TRUE, FALSE. The default value is FALSE.
        • COLLECTION_NAME: [DEPRECATED: specify the containing schema for the output table as part of tableName, and use GPUdb.createSchema to create the schema if it does not exist] Name of the schema for the output table. If the schema provided does not exist, it will be created automatically. The default value is ''.
        • MODE: Describes which rows of the tables being merged will be retained. Supported values:
          • UNION_ALL: Retains all rows from the specified tables.
          • UNION: Retains all unique rows from the specified tables (synonym for UNION_DISTINCT).
          • UNION_DISTINCT: Retains all unique rows from the specified tables.
          • EXCEPT: Retains all unique rows from the first table that do not appear in the second table (only works on 2 tables).
          • EXCEPT_ALL: Retains all rows (including duplicates) from the first table that do not appear in the second table (only works on 2 tables).
          • INTERSECT: Retains all unique rows that appear in both of the specified tables (only works on 2 tables).
          • INTERSECT_ALL: Retains all rows (including duplicates) that appear in both of the specified tables (only works on 2 tables).
          The default value is UNION_ALL.
        • LONG_HASH: When TRUE, uses a 128-bit hash for the UNION_DISTINCT, EXCEPT, EXCEPT_ALL, INTERSECT, and INTERSECT_ALL modes; otherwise, uses a 64-bit hash.
        • CHUNK_SIZE: Indicates the number of records per chunk to be used for this output table.
        • CHUNK_COLUMN_MAX_MEMORY: Indicates the target maximum data size for each column in a chunk to be used for this output table.
        • CHUNK_MAX_MEMORY: Indicates the target maximum data size for all columns in a chunk to be used for this output table.
        • CREATE_INDEXES: Comma-separated list of columns on which to create indexes on the output table. The columns specified must be present in outputColumnNames.
        • TTL: Sets the TTL of the output table specified in tableName.
        • PERSIST: If TRUE, then the output table specified in tableName will be persisted and will not expire unless a TTL is specified. If FALSE, then the output table will be an in-memory table and will expire unless a TTL is specified otherwise. Supported values: TRUE, FALSE. The default value is FALSE.
        • VIEW_ID: ID of the view of which this output table is a member. The default value is ''.
        • FORCE_REPLICATED: If TRUE, then the output table specified in tableName will be replicated even if the source tables are not. Supported values: TRUE, FALSE. The default value is FALSE.
        • STRATEGY_DEFINITION: The tier strategy for the table and its columns.
        • COMPRESSION_CODEC: The default compression codec for this table's columns.
        • NO_COUNT: If 'true', the response reports a count of 0 for the union table, avoiding the cost of counting; this optimization is intended for virtual unions with many chunks. The default value is 'false'.
        The default value is an empty Map.
        Returns:
        The current value of options.
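        Several MODE values are documented as working on exactly 2 tables, so a request can be checked against its options map before it is sent. The following sketch is a hypothetical helper (UnionModeCheck is not part of the API). The lowercase strings for the option key and mode values are assumed forms of the constants listed above, and an absent mode falls back to the documented default, UNION_ALL.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical pre-flight check; option and mode strings are assumed forms.
final class UnionModeCheck {

    // Modes documented as "only works on 2 tables".
    private static final Set<String> TWO_TABLE_MODES = new HashSet<>(
            Arrays.asList("except", "except_all", "intersect", "intersect_all"));

    static boolean tableCountAllowed(Map<String, String> options, int tableCount) {
        // An unset mode is treated as the documented default, UNION_ALL.
        String mode = options.getOrDefault("mode", "union_all");
        if (TWO_TABLE_MODES.contains(mode)) {
            return tableCount == 2;
        }
        // Other modes: tableNames must contain one or more tables.
        return tableCount >= 1;
    }
}
```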
      • setOptions

        public CreateUnionRequest setOptions(Map<String,String> options)
        Optional parameters.
        • CREATE_TEMP_TABLE: If TRUE, a unique temporary table name will be generated in the sys_temp schema and used in place of tableName. If PERSIST is FALSE (or unspecified), then this is always allowed even if the caller does not have permission to create tables. The generated name is returned in QUALIFIED_TABLE_NAME. Supported values: TRUE, FALSE. The default value is FALSE.
        • COLLECTION_NAME: [DEPRECATED: specify the containing schema for the output table as part of tableName, and use GPUdb.createSchema to create the schema if it does not exist] Name of the schema for the output table. If the schema provided does not exist, it will be created automatically. The default value is ''.
        • MODE: Describes which rows of the tables being merged will be retained. Supported values:
          • UNION_ALL: Retains all rows from the specified tables.
          • UNION: Retains all unique rows from the specified tables (synonym for UNION_DISTINCT).
          • UNION_DISTINCT: Retains all unique rows from the specified tables.
          • EXCEPT: Retains all unique rows from the first table that do not appear in the second table (only works on 2 tables).
          • EXCEPT_ALL: Retains all rows (including duplicates) from the first table that do not appear in the second table (only works on 2 tables).
          • INTERSECT: Retains all unique rows that appear in both of the specified tables (only works on 2 tables).
          • INTERSECT_ALL: Retains all rows (including duplicates) that appear in both of the specified tables (only works on 2 tables).
          The default value is UNION_ALL.
        • LONG_HASH: When TRUE, uses a 128-bit hash for the UNION_DISTINCT, EXCEPT, EXCEPT_ALL, INTERSECT, and INTERSECT_ALL modes; otherwise, uses a 64-bit hash.
        • CHUNK_SIZE: Indicates the number of records per chunk to be used for this output table.
        • CHUNK_COLUMN_MAX_MEMORY: Indicates the target maximum data size for each column in a chunk to be used for this output table.
        • CHUNK_MAX_MEMORY: Indicates the target maximum data size for all columns in a chunk to be used for this output table.
        • CREATE_INDEXES: Comma-separated list of columns on which to create indexes on the output table. The columns specified must be present in outputColumnNames.
        • TTL: Sets the TTL of the output table specified in tableName.
        • PERSIST: If TRUE, then the output table specified in tableName will be persisted and will not expire unless a TTL is specified. If FALSE, then the output table will be an in-memory table and will expire unless a TTL is specified otherwise. Supported values: TRUE, FALSE. The default value is FALSE.
        • VIEW_ID: ID of the view of which this output table is a member. The default value is ''.
        • FORCE_REPLICATED: If TRUE, then the output table specified in tableName will be replicated even if the source tables are not. Supported values: TRUE, FALSE. The default value is FALSE.
        • STRATEGY_DEFINITION: The tier strategy for the table and its columns.
        • COMPRESSION_CODEC: The default compression codec for this table's columns.
        • NO_COUNT: If 'true', the response reports a count of 0 for the union table, avoiding the cost of counting; this optimization is intended for virtual unions with many chunks. The default value is 'false'.
        The default value is an empty Map.
        Parameters:
        options - The new value for options.
        Returns:
        this to mimic the builder pattern.
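        Since each setter returns this, calls can be chained on a single request object. The sketch below shows that pattern with a small stand-in class; FluentRequestSketch is hypothetical and only mirrors two of the setters documented here, so it is an illustration of the chaining style rather than the class itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in that mirrors the "setter returns this" convention.
final class FluentRequestSketch {
    private List<String> tableNames = new ArrayList<>();
    private Map<String, String> options = new LinkedHashMap<>();

    FluentRequestSketch setTableNames(List<String> tableNames) {
        this.tableNames = tableNames;
        return this; // returning this is what allows chaining
    }

    FluentRequestSketch setOptions(Map<String, String> options) {
        this.options = options;
        return this;
    }

    List<String> getTableNames() { return tableNames; }

    Map<String, String> getOptions() { return options; }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("ttl", "20"); // assumed string form of the TTL option
        FluentRequestSketch request = new FluentRequestSketch()
                .setTableNames(Arrays.asList("example.t1", "example.t2"))
                .setOptions(options);
        System.out.println(request.getTableNames() + " " + request.getOptions());
    }
}
```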
      • getSchema

        public org.apache.avro.Schema getSchema()
        This method supports the Avro framework and is not intended to be called directly by the user.
        Specified by:
        getSchema in interface org.apache.avro.generic.GenericContainer
        Returns:
        The schema object describing this class.
      • get

        public Object get(int index)
        This method supports the Avro framework and is not intended to be called directly by the user.
        Specified by:
        get in interface org.apache.avro.generic.IndexedRecord
        Parameters:
        index - the position of the field to get
        Returns:
        value of the field with the given index.
        Throws:
        IndexOutOfBoundsException
      • put

        public void put(int index,
                        Object value)
        This method supports the Avro framework and is not intended to be called directly by the user.
        Specified by:
        put in interface org.apache.avro.generic.IndexedRecord
        Parameters:
        index - the position of the field to set
        value - the value to set
        Throws:
        IndexOutOfBoundsException
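        For readers unfamiliar with position-based record access, the sketch below shows the general shape of get and put by index, including the IndexOutOfBoundsException for an unknown position. IndexedFieldsSketch is a hypothetical stand-in with only two fields; the positions used (0 for tableName, 1 for tableNames) are an assumption based on the constructor's parameter order, and application code should still prefer the typed getters and setters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in; field positions are assumed, not taken from the schema.
final class IndexedFieldsSketch {
    private String tableName = "";
    private List<String> tableNames = new ArrayList<>();

    // Returns the field stored at the given position.
    Object get(int index) {
        switch (index) {
            case 0: return tableName;
            case 1: return tableNames;
            default: throw new IndexOutOfBoundsException("Invalid index: " + index);
        }
    }

    // Stores a value into the field at the given position.
    @SuppressWarnings("unchecked")
    void put(int index, Object value) {
        switch (index) {
            case 0: tableName = (String) value; break;
            case 1: tableNames = (List<String>) value; break;
            default: throw new IndexOutOfBoundsException("Invalid index: " + index);
        }
    }
}
```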
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class Object