Class AggregateKMeansRequest

java.lang.Object
com.gpudb.protocol.AggregateKMeansRequest
All Implemented Interfaces:
org.apache.avro.generic.GenericContainer, org.apache.avro.generic.IndexedRecord

public class AggregateKMeansRequest extends Object implements org.apache.avro.generic.IndexedRecord
A set of parameters for GPUdb.aggregateKMeans.

This endpoint runs the k-means algorithm, a heuristic that attempts to perform k-means clustering. An ideal k-means clustering selects k points such that the sum of the mean squared distances of each member of the set to the nearest of the k points is minimized. The k-means algorithm, however, does not necessarily produce such an ideal clustering. It begins with a randomly selected set of k points, then iteratively refines their locations until it settles into a local minimum. Various parameters and options are provided to control the heuristic search.
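
The refinement loop described above can be sketched in plain Java. This is a minimal, client-side illustration of a Lloyd-style iteration, not the server's implementation: the class name KMeansSketch, the method names, and the seed parameter are hypothetical, and the stopping rule (largest center shift below the tolerance, or an iteration cap) is an assumption modeled on the tolerance and MAX_ITERS descriptions.

```java
import java.util.Arrays;
import java.util.Random;

public class KMeansSketch {

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    /** Index of the center closest to the given point (ties go to the lower index). */
    static int nearest(double[] point, double[][] centers) {
        int best = 0;
        for (int c = 1; c < centers.length; c++) {
            if (squaredDistance(point, centers[c]) < squaredDistance(point, centers[best])) {
                best = c;
            }
        }
        return best;
    }

    /**
     * Starts from k randomly chosen data points and refines them.
     * Stops when no center moves by tolerance or more, or after maxIters passes.
     */
    static double[][] cluster(double[][] data, int k, double tolerance, int maxIters, long seed) {
        Random random = new Random(seed);
        int dims = data[0].length;
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            centers[c] = data[random.nextInt(data.length)].clone();
        }
        for (int iter = 0; iter < maxIters; iter++) {
            // Assignment step: accumulate each point into its nearest center's running sum.
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (double[] point : data) {
                int c = nearest(point, centers);
                counts[c]++;
                for (int d = 0; d < dims; d++) {
                    sums[c][d] += point[d];
                }
            }
            // Update step: move each center to the mean of its assigned points.
            double maxShift = 0.0;
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous center
                }
                double[] updated = new double[dims];
                for (int d = 0; d < dims; d++) {
                    updated[d] = sums[c][d] / counts[c];
                }
                maxShift = Math.max(maxShift, Math.sqrt(squaredDistance(centers[c], updated)));
                centers[c] = updated;
            }
            if (maxShift < tolerance) {
                break;
            }
        }
        return centers;
    }

    public static void main(String[] args) {
        double[][] data = { {0, 0}, {0, 1}, {1, 0}, {10, 10}, {10, 11}, {11, 10} };
        System.out.println(Arrays.deepToString(cluster(data, 2, 1e-6, 10, 42L)));
    }
}
```

Because the starting points are random, different seeds can settle into different local minima, which is the behavior the NUM_TRIES option is meant to mitigate.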

NOTE: The Kinetica instance being accessed must be running a CUDA (GPU-based) build to service this request.

  • Constructor Details

    • AggregateKMeansRequest

      public AggregateKMeansRequest()
      Constructs an AggregateKMeansRequest object with default parameters.
    • AggregateKMeansRequest

      public AggregateKMeansRequest(String tableName, List<String> columnNames, int k, double tolerance, Map<String,String> options)
      Constructs an AggregateKMeansRequest object with the specified parameters.
      Parameters:
      tableName - Name of the table on which the operation will be performed. Must be an existing table, in [schema_name.]table_name format, using standard name resolution rules.
      columnNames - List of column names on which the operation will be performed. If n columns are provided, then each of the k result points will have n dimensions corresponding to the n columns.
      k - The number of mean points to be determined by the algorithm.
      tolerance - Stop iterating when the distance between successive points is less than the given tolerance.
      options - Optional parameters.
      • WHITEN: When set to 1, each column is first normalized by its standard deviation. The default is not to whiten.
      • MAX_ITERS: Number of times to try to hit the tolerance limit before giving up. The default is 10.
      • NUM_TRIES: Number of times to run the k-means algorithm, each with different randomly selected starting points, which helps avoid a poor local minimum. The default is 1.
      • CREATE_TEMP_TABLE: If TRUE, a unique temporary table name will be generated in the sys_temp schema and used in place of RESULT_TABLE. If RESULT_TABLE_PERSIST is FALSE (or unspecified), then this is always allowed even if the caller does not have permission to create tables. The generated name is returned in QUALIFIED_RESULT_TABLE_NAME. Supported values: TRUE, FALSE. The default value is FALSE.
      • RESULT_TABLE: The name of a table used to store the results, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria. If this option is specified, the results are not returned in the response.
      • RESULT_TABLE_PERSIST: If TRUE, then the result table specified in RESULT_TABLE will be persisted and will not expire unless a TTL is specified. If FALSE, then the result table will be an in-memory table and will expire unless a TTL is specified otherwise. Supported values: TRUE, FALSE. The default value is FALSE.
      • TTL: Sets the TTL of the table specified in RESULT_TABLE.
      The default value is an empty Map.
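
Since options is a Map<String,String>, numeric settings such as MAX_ITERS and NUM_TRIES have to be passed as strings. The sketch below builds such a map using only the JDK. The lowercase key strings ("whiten", "max_iters", and so on) are an assumption about how the option names above are spelled on the wire; the driver may expose them as constants, which would be preferable to literals. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class KMeansOptionsSketch {

    // Assumed key spellings: lowercase forms of the option names listed above.
    static final String WHITEN = "whiten";
    static final String MAX_ITERS = "max_iters";
    static final String NUM_TRIES = "num_tries";
    static final String RESULT_TABLE = "result_table";

    /**
     * Builds an options map. Pass null for resultTable to have the results
     * returned in the response instead of stored in a table.
     */
    static Map<String, String> buildOptions(int maxIters, int numTries, boolean whiten, String resultTable) {
        if (maxIters < 1 || numTries < 1) {
            throw new IllegalArgumentException("maxIters and numTries must be at least 1");
        }
        Map<String, String> options = new LinkedHashMap<>();
        options.put(MAX_ITERS, Integer.toString(maxIters));
        options.put(NUM_TRIES, Integer.toString(numTries));
        if (whiten) {
            options.put(WHITEN, "1"); // per the option text, 1 turns whitening on
        }
        if (resultTable != null) {
            options.put(RESULT_TABLE, resultTable);
        }
        return options;
    }

    public static void main(String[] args) {
        // "example.kmeans_result" is a made-up table name for illustration.
        System.out.println(buildOptions(25, 5, true, "example.kmeans_result"));
    }
}
```

A map built this way would be passed as the options argument of the five-argument constructor, or to setOptions.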
  • Method Details

    • getClassSchema

      public static org.apache.avro.Schema getClassSchema()
      This method supports the Avro framework and is not intended to be called directly by the user.
      Returns:
      The schema for the class.
    • getTableName

      public String getTableName()
      Name of the table on which the operation will be performed. Must be an existing table, in [schema_name.]table_name format, using standard name resolution rules.
      Returns:
      The current value of tableName.
    • setTableName

      public AggregateKMeansRequest setTableName(String tableName)
      Name of the table on which the operation will be performed. Must be an existing table, in [schema_name.]table_name format, using standard name resolution rules.
      Parameters:
      tableName - The new value for tableName.
      Returns:
      this, to mimic the builder pattern.
    • getColumnNames

      public List<String> getColumnNames()
      List of column names on which the operation will be performed. If n columns are provided, then each of the k result points will have n dimensions corresponding to the n columns.
      Returns:
      The current value of columnNames.
    • setColumnNames

      public AggregateKMeansRequest setColumnNames(List<String> columnNames)
      List of column names on which the operation will be performed. If n columns are provided, then each of the k result points will have n dimensions corresponding to the n columns.
      Parameters:
      columnNames - The new value for columnNames.
      Returns:
      this, to mimic the builder pattern.
    • getK

      public int getK()
      The number of mean points to be determined by the algorithm.
      Returns:
      The current value of k.
    • setK

      public AggregateKMeansRequest setK(int k)
      The number of mean points to be determined by the algorithm.
      Parameters:
      k - The new value for k.
      Returns:
      this, to mimic the builder pattern.
    • getTolerance

      public double getTolerance()
      Stop iterating when the distance between successive points is less than the given tolerance.
      Returns:
      The current value of tolerance.
    • setTolerance

      public AggregateKMeansRequest setTolerance(double tolerance)
      Stop iterating when the distance between successive points is less than the given tolerance.
      Parameters:
      tolerance - The new value for tolerance.
      Returns:
      this, to mimic the builder pattern.
    • getOptions

      public Map<String,String> getOptions()
      Optional parameters.
      • WHITEN: When set to 1, each column is first normalized by its standard deviation. The default is not to whiten.
      • MAX_ITERS: Number of times to try to hit the tolerance limit before giving up. The default is 10.
      • NUM_TRIES: Number of times to run the k-means algorithm, each with different randomly selected starting points, which helps avoid a poor local minimum. The default is 1.
      • CREATE_TEMP_TABLE: If TRUE, a unique temporary table name will be generated in the sys_temp schema and used in place of RESULT_TABLE. If RESULT_TABLE_PERSIST is FALSE (or unspecified), then this is always allowed even if the caller does not have permission to create tables. The generated name is returned in QUALIFIED_RESULT_TABLE_NAME. Supported values: TRUE, FALSE. The default value is FALSE.
      • RESULT_TABLE: The name of a table used to store the results, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria. If this option is specified, the results are not returned in the response.
      • RESULT_TABLE_PERSIST: If TRUE, then the result table specified in RESULT_TABLE will be persisted and will not expire unless a TTL is specified. If FALSE, then the result table will be an in-memory table and will expire unless a TTL is specified otherwise. Supported values: TRUE, FALSE. The default value is FALSE.
      • TTL: Sets the TTL of the table specified in RESULT_TABLE.
      The default value is an empty Map.
      Returns:
      The current value of options.
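
The WHITEN option listed above normalizes each column by its standard deviation before clustering, so that columns with large numeric ranges do not dominate the distance computation. A minimal sketch of that scaling follows. It assumes the population standard deviation and divides without subtracting the mean; the source does not say which variant the server uses, and the class and method names are hypothetical.

```java
import java.util.Arrays;

public class WhitenSketch {

    /** Population standard deviation of a column (an assumed variant). */
    static double standardDeviation(double[] column) {
        double mean = 0.0;
        for (double v : column) {
            mean += v;
        }
        mean /= column.length;
        double variance = 0.0;
        for (double v : column) {
            variance += (v - mean) * (v - mean);
        }
        variance /= column.length;
        return Math.sqrt(variance);
    }

    /** Divides every value by the column's standard deviation; constant columns are returned unchanged. */
    static double[] whiten(double[] column) {
        double sd = standardDeviation(column);
        double[] scaled = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            scaled[i] = (sd == 0.0) ? column[i] : column[i] / sd;
        }
        return scaled;
    }

    public static void main(String[] args) {
        double[] column = { 2, 4, 4, 4, 5, 5, 7, 9 };
        System.out.println(Arrays.toString(whiten(column)));
    }
}
```

After this scaling each column has unit standard deviation, so a tolerance value applies on a comparable scale across all dimensions.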
    • setOptions

      public AggregateKMeansRequest setOptions(Map<String,String> options)
      Optional parameters.
      • WHITEN: When set to 1, each column is first normalized by its standard deviation. The default is not to whiten.
      • MAX_ITERS: Number of times to try to hit the tolerance limit before giving up. The default is 10.
      • NUM_TRIES: Number of times to run the k-means algorithm, each with different randomly selected starting points, which helps avoid a poor local minimum. The default is 1.
      • CREATE_TEMP_TABLE: If TRUE, a unique temporary table name will be generated in the sys_temp schema and used in place of RESULT_TABLE. If RESULT_TABLE_PERSIST is FALSE (or unspecified), then this is always allowed even if the caller does not have permission to create tables. The generated name is returned in QUALIFIED_RESULT_TABLE_NAME. Supported values: TRUE, FALSE. The default value is FALSE.
      • RESULT_TABLE: The name of a table used to store the results, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria. If this option is specified, the results are not returned in the response.
      • RESULT_TABLE_PERSIST: If TRUE, then the result table specified in RESULT_TABLE will be persisted and will not expire unless a TTL is specified. If FALSE, then the result table will be an in-memory table and will expire unless a TTL is specified otherwise. Supported values: TRUE, FALSE. The default value is FALSE.
      • TTL: Sets the TTL of the table specified in RESULT_TABLE.
      The default value is an empty Map.
      Parameters:
      options - The new value for options.
      Returns:
      this, to mimic the builder pattern.
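
Because each setter returns this, a request can be configured in a single chained expression. The sketch below shows the pattern with RequestStandIn, a hypothetical stand-in that mirrors the setter signatures documented above so the example compiles without the Kinetica driver; with the driver on the classpath, the same chain would presumably apply to AggregateKMeansRequest itself. The table name, column names, and option key are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class FluentSetterSketch {

    /** Hypothetical stand-in mirroring the documented getter and setter signatures. */
    static class RequestStandIn {
        private String tableName = "";
        private List<String> columnNames = Collections.emptyList();
        private int k;
        private double tolerance;
        private Map<String, String> options = Collections.emptyMap();

        RequestStandIn setTableName(String tableName) { this.tableName = tableName; return this; }
        RequestStandIn setColumnNames(List<String> columnNames) { this.columnNames = columnNames; return this; }
        RequestStandIn setK(int k) { this.k = k; return this; }
        RequestStandIn setTolerance(double tolerance) { this.tolerance = tolerance; return this; }
        RequestStandIn setOptions(Map<String, String> options) { this.options = options; return this; }

        String getTableName() { return tableName; }
        List<String> getColumnNames() { return columnNames; }
        int getK() { return k; }
        double getTolerance() { return tolerance; }
        Map<String, String> getOptions() { return options; }
    }

    /** Configures a request in one chained expression. */
    static RequestStandIn example() {
        return new RequestStandIn()
            .setTableName("example.points")          // made-up table name
            .setColumnNames(Arrays.asList("x", "y")) // two columns, so 2-dimensional result points
            .setK(3)
            .setTolerance(1e-4)
            .setOptions(Collections.singletonMap("max_iters", "25")); // assumed key spelling
    }

    public static void main(String[] args) {
        RequestStandIn request = example();
        System.out.println(request.getTableName() + " k=" + request.getK());
    }
}
```

The chained form is equivalent to calling each setter as a separate statement; it only saves repeating the variable name.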
    • getSchema

      public org.apache.avro.Schema getSchema()
      This method supports the Avro framework and is not intended to be called directly by the user.
      Specified by:
      getSchema in interface org.apache.avro.generic.GenericContainer
      Returns:
      The schema object describing this class.
    • get

      public Object get(int index)
      This method supports the Avro framework and is not intended to be called directly by the user.
      Specified by:
      get in interface org.apache.avro.generic.IndexedRecord
      Parameters:
      index - the position of the field to get
      Returns:
      value of the field with the given index.
      Throws:
    • put

      public void put(int index, Object value)
      This method supports the Avro framework and is not intended to be called directly by the user.
      Specified by:
      put in interface org.apache.avro.generic.IndexedRecord
      Parameters:
      index - the position of the field to set
      value - the value to set
      Throws:
    • equals

      public boolean equals(Object obj)
      Overrides:
      equals in class Object
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object