Class AggregateKMeansRequest

  • All Implemented Interfaces:
    org.apache.avro.generic.GenericContainer, org.apache.avro.generic.IndexedRecord

    public class AggregateKMeansRequest
    extends Object
    implements org.apache.avro.generic.IndexedRecord
    A set of parameters for GPUdb.aggregateKMeans.

    This endpoint runs the k-means algorithm, a heuristic algorithm that attempts to perform k-means clustering. An ideal k-means clustering algorithm selects k points such that the sum of the squared distances from each member of the set to the nearest of the k points is minimized. The k-means algorithm, however, does not necessarily produce such an ideal clustering: it begins with a randomly selected set of k points, refines their locations iteratively, and settles into a local minimum. Various parameters and options are provided to control the heuristic search.
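The iterative refinement described above can be sketched locally as Lloyd's algorithm with random restarts. This is a minimal, CPU-only illustration under stated assumptions, not Kinetica's GPU implementation: the class and method names (`KMeansSketch`, `lloyd`, `kMeans`) are hypothetical, the convergence test (largest centroid movement below `tolerance`) is an assumed reading of the tolerance parameter, and `maxIters` and `numTries` only mirror the MAX_ITERS and NUM_TRIES options in spirit.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical local sketch of k-means (Lloyd's algorithm with random
// restarts); not Kinetica's server-side GPU implementation.
class KMeansSketch {

    // Squared Euclidean distance between two points of equal dimension.
    static double sqDist(double[] a, double[] b) {
        double s = 0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            s += diff * diff;
        }
        return s;
    }

    // Index of the centroid closest to p (the first one wins on ties).
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (sqDist(p, centroids[c]) < sqDist(p, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    // Objective: sum of squared distances from each point to its nearest centroid.
    static double cost(double[][] points, double[][] centroids) {
        double s = 0;
        for (double[] p : points) {
            s += sqDist(p, centroids[nearest(p, centroids)]);
        }
        return s;
    }

    // One run: start from k distinct random input points, then alternate
    // assignment and mean-update steps until every centroid moves by less
    // than tolerance, or maxIters iterations have run.
    static double[][] lloyd(double[][] points, int k, double tolerance, int maxIters, Random rnd) {
        int n = points.length, dim = points[0].length;
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        for (int i = n - 1; i > 0; i--) {          // Fisher-Yates shuffle
            int j = rnd.nextInt(i + 1);
            int t = idx[i]; idx[i] = idx[j]; idx[j] = t;
        }
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = points[idx[c]].clone();

        for (int iter = 0; iter < maxIters; iter++) {
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (double[] p : points) {            // assignment step
                int c = nearest(p, centroids);
                counts[c]++;
                for (int d = 0; d < dim; d++) sums[c][d] += p[d];
            }
            double maxShift = 0;
            for (int c = 0; c < k; c++) {          // update step
                if (counts[c] == 0) continue;      // empty cluster keeps its old centroid
                double[] next = new double[dim];
                for (int d = 0; d < dim; d++) next[d] = sums[c][d] / counts[c];
                maxShift = Math.max(maxShift, Math.sqrt(sqDist(next, centroids[c])));
                centroids[c] = next;
            }
            if (maxShift < tolerance) break;       // successive centroids barely moved
        }
        return centroids;
    }

    // Runs numTries independent starts and keeps the lowest-cost result.
    static double[][] kMeans(double[][] points, int k, double tolerance,
                             int maxIters, int numTries, long seed) {
        if (k < 1 || k > points.length || numTries < 1) {
            throw new IllegalArgumentException("need 1 <= k <= number of points and numTries >= 1");
        }
        Random rnd = new Random(seed);
        double[][] best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (int t = 0; t < numTries; t++) {
            double[][] candidate = lloyd(points, k, tolerance, maxIters, rnd);
            double c = cost(points, candidate);
            if (c < bestCost) {
                bestCost = c;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Six 2-D points (think: two selected columns) forming two obvious groups.
        double[][] pts = { {0, 0}, {0, 1}, {1, 0}, {10, 10}, {10, 11}, {11, 10} };
        double[][] centroids = kMeans(pts, 2, 1e-6, 10, 3, 42L);
        for (double[] c : centroids) System.out.println(Arrays.toString(c));
    }
}
```

Each input row stands in for one table row restricted to the requested columns, so two columns yield two-dimensional centroids. Restarts help because a single run only reaches the local minimum nearest its random starting points.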

    NOTE: The Kinetica instance being accessed must be running a CUDA (GPU-based) build to service this request.

    • Constructor Detail

      • AggregateKMeansRequest

        public AggregateKMeansRequest()
        Constructs an AggregateKMeansRequest object with default parameters.
      • AggregateKMeansRequest

        public AggregateKMeansRequest(String tableName,
                                      List<String> columnNames,
                                      int k,
                                      double tolerance,
                                      Map<String,String> options)
        Constructs an AggregateKMeansRequest object with the specified parameters.
        Parameters:
        tableName - Name of the table on which the operation will be performed. Must be an existing table, in [schema_name.]table_name format, using standard name resolution rules.
        columnNames - List of column names on which the operation will be performed. If n columns are provided, each of the k result points will have n dimensions, one per column.
        k - The number of mean points to be determined by the algorithm.
        tolerance - Stop iterating when the distances between successive points are less than the given tolerance.
        options - Optional parameters.
        • WHITEN: When set to 1, each of the columns is first normalized by its standard deviation. The default is not to whiten.
        • MAX_ITERS: Maximum number of iterations to run while trying to reach the tolerance limit before giving up. The default is 10.
        • NUM_TRIES: Number of times to run the k-means algorithm, each time with a different randomly selected set of starting points; this helps avoid a poor local minimum. The default is 1.
        • CREATE_TEMP_TABLE: If TRUE, a unique temporary table name will be generated in the sys_temp schema and used in place of RESULT_TABLE. If RESULT_TABLE_PERSIST is FALSE (or unspecified), this is always allowed, even if the caller does not have permission to create tables. The generated name is returned in QUALIFIED_RESULT_TABLE_NAME. Supported values: TRUE, FALSE. The default value is FALSE.
        • RESULT_TABLE: The name of a table used to store the results, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria. If this option is specified, the results are not returned in the response.
        • RESULT_TABLE_PERSIST: If TRUE, the result table specified in RESULT_TABLE will be persisted and will not expire unless a TTL is specified. If FALSE, the result table will be an in-memory table and will expire unless otherwise specified by a TTL. Supported values: TRUE, FALSE. The default value is FALSE.
        • TTL: Sets the TTL of the table specified in RESULT_TABLE.
        The default value is an empty Map.
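Whitening matters because k-means compares squared Euclidean distances, so a column on a much larger numeric scale dominates the clustering. The small local sketch below illustrates the WHITEN option under an explicit assumption: "normalized by its stdv" is read as dividing each column by its population standard deviation, without centring; whether the server centres the data or uses the sample standard deviation is not stated above. `WhitenSketch` and `whiten` are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical illustration of per-column whitening; assumes division by
// the population standard deviation, which may differ from the server.
class WhitenSketch {

    // Returns a copy of rows in which every column is divided by its
    // standard deviation. Constant columns (stdv == 0) are copied unchanged
    // to avoid dividing by zero.
    static double[][] whiten(double[][] rows) {
        int n = rows.length, dim = rows[0].length;
        double[][] out = new double[n][dim];
        for (int d = 0; d < dim; d++) {
            double mean = 0;
            for (double[] r : rows) mean += r[d];
            mean /= n;
            double var = 0;
            for (double[] r : rows) var += (r[d] - mean) * (r[d] - mean);
            double std = Math.sqrt(var / n);
            for (int i = 0; i < n; i++) {
                out[i][d] = std == 0 ? rows[i][d] : rows[i][d] / std;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Column 1 is on a scale 1000x larger than column 0; without
        // whitening it would dominate every squared distance.
        double[][] rows = { {1, 1000}, {3, 3000} };
        for (double[] r : whiten(rows)) System.out.println(Arrays.toString(r));
        // prints [1.0, 1.0] then [3.0, 3.0]
    }
}
```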
    • Method Detail

      • getClassSchema

        public static org.apache.avro.Schema getClassSchema()
        This method supports the Avro framework and is not intended to be called directly by the user.
        Returns:
        The schema for the class.
      • getTableName

        public String getTableName()
        Name of the table on which the operation will be performed. Must be an existing table, in [schema_name.]table_name format, using standard name resolution rules.
        Returns:
        The current value of tableName.
      • setTableName

        public AggregateKMeansRequest setTableName(String tableName)
        Name of the table on which the operation will be performed. Must be an existing table, in [schema_name.]table_name format, using standard name resolution rules.
        Parameters:
        tableName - The new value for tableName.
        Returns:
        this to mimic the builder pattern.
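Because each setter in this class returns `this`, calls can be chained, for example `new AggregateKMeansRequest().setTableName(...).setK(...)`. The stand-in class below (`FluentRequestSketch`, hypothetical and not part of the Kinetica API) reproduces that style in plain Java so it runs without the client library; it mirrors only the five fields documented on this page, and the table and column names are made up.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in showing the fluent "setter returns this" style.
class FluentRequestSketch {
    private String tableName;
    private List<String> columnNames;
    private int k;
    private double tolerance;
    private Map<String, String> options = new LinkedHashMap<>(); // documented default: empty map

    FluentRequestSketch setTableName(String tableName) {
        this.tableName = tableName;
        return this; // returning this is what makes chaining possible
    }

    FluentRequestSketch setColumnNames(List<String> columnNames) {
        this.columnNames = columnNames;
        return this;
    }

    FluentRequestSketch setK(int k) {
        this.k = k;
        return this;
    }

    FluentRequestSketch setTolerance(double tolerance) {
        this.tolerance = tolerance;
        return this;
    }

    FluentRequestSketch setOptions(Map<String, String> options) {
        this.options = options;
        return this;
    }

    String getTableName() { return tableName; }
    List<String> getColumnNames() { return columnNames; }
    int getK() { return k; }
    double getTolerance() { return tolerance; }
    Map<String, String> getOptions() { return options; }

    public static void main(String[] args) {
        FluentRequestSketch req = new FluentRequestSketch()
                .setTableName("example_schema.points")   // hypothetical table
                .setColumnNames(Arrays.asList("x", "y")) // hypothetical columns
                .setK(3)
                .setTolerance(1e-4);
        System.out.println(req.getTableName() + " k=" + req.getK()
                + " columns=" + req.getColumnNames());
    }
}
```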
      • getColumnNames

        public List<String> getColumnNames()
        List of column names on which the operation will be performed. If n columns are provided, each of the k result points will have n dimensions, one per column.
        Returns:
        The current value of columnNames.
      • setColumnNames

        public AggregateKMeansRequest setColumnNames(List<String> columnNames)
        List of column names on which the operation will be performed. If n columns are provided, each of the k result points will have n dimensions, one per column.
        Parameters:
        columnNames - The new value for columnNames.
        Returns:
        this to mimic the builder pattern.
      • getK

        public int getK()
        The number of mean points to be determined by the algorithm.
        Returns:
        The current value of k.
      • setK

        public AggregateKMeansRequest setK(int k)
        The number of mean points to be determined by the algorithm.
        Parameters:
        k - The new value for k.
        Returns:
        this to mimic the builder pattern.
      • getTolerance

        public double getTolerance()
        Stop iterating when the distances between successive points are less than the given tolerance.
        Returns:
        The current value of tolerance.
      • setTolerance

        public AggregateKMeansRequest setTolerance(double tolerance)
        Stop iterating when the distances between successive points are less than the given tolerance.
        Parameters:
        tolerance - The new value for tolerance.
        Returns:
        this to mimic the builder pattern.
      • getOptions

        public Map<String,String> getOptions()
        Optional parameters.
        • WHITEN: When set to 1, each of the columns is first normalized by its standard deviation. The default is not to whiten.
        • MAX_ITERS: Maximum number of iterations to run while trying to reach the tolerance limit before giving up. The default is 10.
        • NUM_TRIES: Number of times to run the k-means algorithm, each time with a different randomly selected set of starting points; this helps avoid a poor local minimum. The default is 1.
        • CREATE_TEMP_TABLE: If TRUE, a unique temporary table name will be generated in the sys_temp schema and used in place of RESULT_TABLE. If RESULT_TABLE_PERSIST is FALSE (or unspecified), this is always allowed, even if the caller does not have permission to create tables. The generated name is returned in QUALIFIED_RESULT_TABLE_NAME. Supported values: TRUE, FALSE. The default value is FALSE.
        • RESULT_TABLE: The name of a table used to store the results, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria. If this option is specified, the results are not returned in the response.
        • RESULT_TABLE_PERSIST: If TRUE, the result table specified in RESULT_TABLE will be persisted and will not expire unless a TTL is specified. If FALSE, the result table will be an in-memory table and will expire unless otherwise specified by a TTL. Supported values: TRUE, FALSE. The default value is FALSE.
        • TTL: Sets the TTL of the table specified in RESULT_TABLE.
        The default value is an empty Map.
        Returns:
        The current value of options.
      • setOptions

        public AggregateKMeansRequest setOptions(Map<String,String> options)
        Optional parameters.
        • WHITEN: When set to 1, each of the columns is first normalized by its standard deviation. The default is not to whiten.
        • MAX_ITERS: Maximum number of iterations to run while trying to reach the tolerance limit before giving up. The default is 10.
        • NUM_TRIES: Number of times to run the k-means algorithm, each time with a different randomly selected set of starting points; this helps avoid a poor local minimum. The default is 1.
        • CREATE_TEMP_TABLE: If TRUE, a unique temporary table name will be generated in the sys_temp schema and used in place of RESULT_TABLE. If RESULT_TABLE_PERSIST is FALSE (or unspecified), this is always allowed, even if the caller does not have permission to create tables. The generated name is returned in QUALIFIED_RESULT_TABLE_NAME. Supported values: TRUE, FALSE. The default value is FALSE.
        • RESULT_TABLE: The name of a table used to store the results, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria. If this option is specified, the results are not returned in the response.
        • RESULT_TABLE_PERSIST: If TRUE, the result table specified in RESULT_TABLE will be persisted and will not expire unless a TTL is specified. If FALSE, the result table will be an in-memory table and will expire unless otherwise specified by a TTL. Supported values: TRUE, FALSE. The default value is FALSE.
        • TTL: Sets the TTL of the table specified in RESULT_TABLE.
        The default value is an empty Map.
        Parameters:
        options - The new value for options.
        Returns:
        this to mimic the builder pattern.
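Options are passed as a `Map<String,String>`, so every value, numeric or not, is a string. The sketch below assembles such a map and adds a helper that applies the documented defaults (MAX_ITERS 10, NUM_TRIES 1) when an option is absent. Assumptions: the lowercase key strings stand in for the library's own option-name constants, which should be preferred in real code; `KMeansOptionsSketch`, `intOption`, and the table name `example_schema.kmeans_result` are hypothetical; and the helper is a client-side illustration, not how the server parses options.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of building the options map for setOptions.
class KMeansOptionsSketch {
    // ASSUMPTION: lowercase key strings; use the library's constants in real code.
    static final String WHITEN = "whiten";
    static final String MAX_ITERS = "max_iters";
    static final String NUM_TRIES = "num_tries";
    static final String RESULT_TABLE = "result_table";

    // Reads an integer option, falling back to the documented default when absent.
    static int intOption(Map<String, String> options, String key, int documentedDefault) {
        String v = options.get(key);
        return v == null ? documentedDefault : Integer.parseInt(v);
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put(WHITEN, "1");                                  // normalize each column by its stdv
        options.put(NUM_TRIES, "5");                               // five random restarts
        options.put(RESULT_TABLE, "example_schema.kmeans_result"); // hypothetical table name

        System.out.println(intOption(options, MAX_ITERS, 10)); // absent, so prints 10
        System.out.println(intOption(options, NUM_TRIES, 1));  // present, so prints 5
    }
}
```

Because RESULT_TABLE is set in this map, the results would be written to that table rather than returned in the response.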
      • getSchema

        public org.apache.avro.Schema getSchema()
        This method supports the Avro framework and is not intended to be called directly by the user.
        Specified by:
        getSchema in interface org.apache.avro.generic.GenericContainer
        Returns:
        The schema object describing this class.
      • get

        public Object get(int index)
        This method supports the Avro framework and is not intended to be called directly by the user.
        Specified by:
        get in interface org.apache.avro.generic.IndexedRecord
        Parameters:
        index - the position of the field to get
        Returns:
        value of the field with the given index.
        Throws:
        IndexOutOfBoundsException
      • put

        public void put(int index,
                        Object value)
        This method supports the Avro framework and is not intended to be called directly by the user.
        Specified by:
        put in interface org.apache.avro.generic.IndexedRecord
        Parameters:
        index - the position of the field to set
        value - the value to set
        Throws:
        IndexOutOfBoundsException
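Avro code reads and writes an `IndexedRecord` by field position through `get` and `put`, which is why both throw `IndexOutOfBoundsException` for an unknown index. The sketch below imitates that contract without the Avro library. Assumption: fields are numbered in constructor order (tableName, columnNames, k, tolerance, options); the real order is defined by the schema returned by `getClassSchema()`. `IndexedRecordSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical imitation of positional field access; field order is assumed.
class IndexedRecordSketch {
    private String tableName;
    private List<String> columnNames = new ArrayList<>();
    private int k;
    private double tolerance;
    private Map<String, String> options = new LinkedHashMap<>();

    Object get(int index) {
        switch (index) {
            case 0: return tableName;
            case 1: return columnNames;
            case 2: return k;          // autoboxed to Integer
            case 3: return tolerance;  // autoboxed to Double
            case 4: return options;
            default: throw new IndexOutOfBoundsException("No field at index " + index);
        }
    }

    @SuppressWarnings("unchecked")
    void put(int index, Object value) {
        switch (index) {
            case 0: tableName = (String) value; break;
            case 1: columnNames = (List<String>) value; break;
            case 2: k = (Integer) value; break;
            case 3: tolerance = (Double) value; break;
            case 4: options = (Map<String, String>) value; break;
            default: throw new IndexOutOfBoundsException("No field at index " + index);
        }
    }

    public static void main(String[] args) {
        IndexedRecordSketch r = new IndexedRecordSketch();
        r.put(2, 4);                  // set k by position
        System.out.println(r.get(2));
        try {
            r.get(5);                 // only indices 0..4 exist
        } catch (IndexOutOfBoundsException e) {
            System.out.println("index 5 rejected");
        }
    }
}
```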
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class Object