gpudb::AggregateKMeansRequest

A set of parameters for GPUdb::aggregateKMeans.

#include <gpudb/protocol/aggregate_k_means.h>

Public Member Functions
	AggregateKMeansRequest ()
	Constructs an AggregateKMeansRequest object with default parameters.

	AggregateKMeansRequest (const std::string &tableName_, const std::vector< std::string > &columnNames_, const int32_t k_, const double tolerance_, const std::map< std::string, std::string > &options_)
	Constructs an AggregateKMeansRequest object with the specified parameters.

Public Attributes
std::string	tableName
	Name of the table on which the operation will be performed.

std::vector< std::string >	columnNames
	List of column names on which the operation will be performed.

int32_t	k
	The number of mean points to be determined by the algorithm.

double	tolerance
	Stop iterating when the distance between successive points is less than the given tolerance.

std::map< std::string, std::string >	options
	Optional parameters.

Detailed Description

A set of parameters for GPUdb::aggregateKMeans.

This endpoint runs the k-means algorithm, a heuristic that attempts k-means clustering. An ideal k-means clustering selects k points such that the sum of the mean squared distances from each member of the set to the nearest of the k points is minimized. The k-means algorithm, however, does not necessarily produce such an ideal clustering. It begins with a randomly selected set of k points, refines their locations iteratively, and settles at a local minimum. Various parameters and options are provided to control the heuristic search.
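The iterative refinement described above can be sketched as a Lloyd-style loop. This is an illustration only, not Kinetica's implementation: the names `Point`, `squaredDistance`, and `refineCenters` are hypothetical, the caller supplies the starting points instead of drawing them at random, and "distance between successive points" is assumed to mean the largest movement of any single center between two passes.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// One point with n coordinates (one per column). Hypothetical alias.
using Point = std::vector<double>;

// Squared Euclidean distance between two points of equal dimension.
double squaredDistance(const Point& a, const Point& b) {
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) {
        const double diff = a[d] - b[d];
        sum += diff * diff;
    }
    return sum;
}

// Refines `centers` in place and returns the number of passes performed.
// Stops once no center moved by `tolerance` or more (assumed reading of
// the tolerance parameter), or after `maxIters` passes.
int refineCenters(const std::vector<Point>& data, std::vector<Point>& centers,
                  double tolerance, int maxIters) {
    if (centers.empty()) return 0;
    const std::size_t dims = centers[0].size();
    int iter = 0;
    while (iter < maxIters) {
        ++iter;
        std::vector<Point> sums(centers.size(), Point(dims, 0.0));
        std::vector<std::size_t> counts(centers.size(), 0);

        // Assignment step: attach each data point to its nearest center.
        for (const Point& p : data) {
            std::size_t best = 0;
            double bestDist = std::numeric_limits<double>::max();
            for (std::size_t c = 0; c < centers.size(); ++c) {
                const double dist = squaredDistance(p, centers[c]);
                if (dist < bestDist) {
                    bestDist = dist;
                    best = c;
                }
            }
            for (std::size_t d = 0; d < dims; ++d) sums[best][d] += p[d];
            ++counts[best];
        }

        // Update step: move each center to the mean of its assigned points.
        double maxShift = 0.0;
        for (std::size_t c = 0; c < centers.size(); ++c) {
            if (counts[c] == 0) continue;  // an empty cluster keeps its center
            Point updated(dims);
            for (std::size_t d = 0; d < dims; ++d) {
                updated[d] = sums[c][d] / static_cast<double>(counts[c]);
            }
            maxShift = std::max(maxShift,
                                std::sqrt(squaredDistance(updated, centers[c])));
            centers[c] = updated;
        }
        if (maxShift < tolerance) break;  // converged to a local minimum
    }
    return iter;
}
```

Because the result depends on the starting centers, running the loop from several different starting sets and keeping the best outcome is the usual way to reduce the chance of a poor local minimum.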

NOTE: The Kinetica instance being accessed must be running a CUDA (GPU-based) build to service this request.

Definition at line 29 of file aggregate_k_means.h.

Constructor & Destructor Documentation

◆ AggregateKMeansRequest() [1/2]

gpudb::AggregateKMeansRequest::AggregateKMeansRequest ( )

inline

Constructs an AggregateKMeansRequest object with default parameters.

Definition at line 34 of file aggregate_k_means.h.

◆ AggregateKMeansRequest() [2/2]

gpudb::AggregateKMeansRequest::AggregateKMeansRequest	(	const std::string &	tableName_,
		const std::vector< std::string > &	columnNames_,
		const int32_t	k_,
		const double	tolerance_,
		const std::map< std::string, std::string > &	options_ )

inline

Constructs an AggregateKMeansRequest object with the specified parameters.

Parameters

[in]	tableName_	Name of the table on which the operation will be performed. Must be an existing table, in [schema_name.]table_name format, using standard name resolution rules.
[in]	columnNames_	List of column names on which the operation will be performed. If n columns are provided, then each of the k result points will have n dimensions corresponding to the n columns.
[in]	k_	The number of mean points to be determined by the algorithm.
[in]	tolerance_	Stop iterating when the distance between successive points is less than the given tolerance.
[in]	options_	Optional parameters.
		aggregate_k_means_whiten: When set to 1, each column is first normalized by its standard deviation. The default is not to whiten.
		aggregate_k_means_max_iters: Number of times to try to hit the tolerance limit before giving up. The default is 10.
		aggregate_k_means_num_tries: Number of times to run the k-means algorithm, each with a different randomly selected set of starting points. This helps avoid a poor local minimum. The default is 1.
		aggregate_k_means_create_temp_table: If true, a unique temporary table name will be generated in the sys_temp schema and used in place of result_table. If result_table_persist is false (or unspecified), then this is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_result_table_name. Supported values:
		- aggregate_k_means_true
		- aggregate_k_means_false
		The default value is aggregate_k_means_false.
		aggregate_k_means_result_table: The name of a table used to store the results, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria. If this option is specified, the results are not returned in the response.
		aggregate_k_means_result_table_persist: If true, then the result table specified in result_table will be persisted and will not expire unless a ttl is specified. If false, then the result table will be an in-memory table and will expire unless a ttl is specified otherwise. Supported values:
		- aggregate_k_means_true
		- aggregate_k_means_false
		The default value is aggregate_k_means_false.
		aggregate_k_means_ttl: Sets the TTL of the table specified in result_table.
		The default value is an empty map.
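As a rough illustration of how the five parameters fit together, the sketch below fills a local stand-in struct that mirrors the documented fields and constructor shape. Everything in the `sketch` namespace is hypothetical: it is not the real gpudb header, the string values assigned to the option-name constants are assumptions, the zero-initializing default constructor is an assumption, and the table and column names are invented for the example.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

namespace sketch {

// Assumed string values behind the documented option names.
// The real constants live in the gpudb headers; these are placeholders.
const std::string aggregate_k_means_whiten = "whiten";
const std::string aggregate_k_means_max_iters = "max_iters";
const std::string aggregate_k_means_num_tries = "num_tries";

// Local stand-in with the same fields and constructor shape as the
// documented struct. Not the real gpudb::AggregateKMeansRequest.
struct AggregateKMeansRequest {
    AggregateKMeansRequest() : k(0), tolerance(0.0) {}

    AggregateKMeansRequest(const std::string& tableName_,
                           const std::vector<std::string>& columnNames_,
                           const std::int32_t k_,
                           const double tolerance_,
                           const std::map<std::string, std::string>& options_)
        : tableName(tableName_),
          columnNames(columnNames_),
          k(k_),
          tolerance(tolerance_),
          options(options_) {}

    std::string tableName;
    std::vector<std::string> columnNames;
    std::int32_t k;
    double tolerance;
    std::map<std::string, std::string> options;
};

// Builds a request for 4 clusters over two hypothetical columns.
inline AggregateKMeansRequest makeExampleRequest() {
    std::map<std::string, std::string> options;
    options[aggregate_k_means_whiten] = "1";      // normalize each column first
    options[aggregate_k_means_max_iters] = "25";  // raise the documented default of 10
    options[aggregate_k_means_num_tries] = "3";   // three random restarts
    return AggregateKMeansRequest("example_schema.points",  // hypothetical table
                                  {"x", "y"},               // hypothetical columns
                                  4, 0.001, options);
}

}  // namespace sketch
```

Note that every option value is passed as a string, since the options map is declared as `std::map< std::string, std::string >`. With two column names, each of the four result points would have two dimensions.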

Definition at line 162 of file aggregate_k_means.h.

Member Data Documentation

◆ columnNames

std::vector<std::string> gpudb::AggregateKMeansRequest::columnNames

List of column names on which the operation will be performed.

If n columns are provided then each of the k result points will have n dimensions corresponding to the n columns.

Definition at line 184 of file aggregate_k_means.h.

◆ k

int32_t gpudb::AggregateKMeansRequest::k

The number of mean points to be determined by the algorithm.

Definition at line 189 of file aggregate_k_means.h.

◆ options

std::map<std::string, std::string> gpudb::AggregateKMeansRequest::options

Optional parameters.

aggregate_k_means_whiten: When set to 1, each column is first normalized by its standard deviation. The default is not to whiten.
aggregate_k_means_max_iters: Number of times to try to hit the tolerance limit before giving up. The default is 10.
aggregate_k_means_num_tries: Number of times to run the k-means algorithm, each with a different randomly selected set of starting points. This helps avoid a poor local minimum. The default is 1.
aggregate_k_means_create_temp_table: If true, a unique temporary table name will be generated in the sys_temp schema and used in place of result_table. If result_table_persist is false (or unspecified), then this is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_result_table_name. Supported values:
- aggregate_k_means_true
- aggregate_k_means_false
The default value is aggregate_k_means_false.
aggregate_k_means_result_table: The name of a table used to store the results, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria. If this option is specified, the results are not returned in the response.
aggregate_k_means_result_table_persist: If true, then the result table specified in result_table will be persisted and will not expire unless a ttl is specified. If false, then the result table will be an in-memory table and will expire unless a ttl is specified otherwise. Supported values:
- aggregate_k_means_true
- aggregate_k_means_false
The default value is aggregate_k_means_false.
aggregate_k_means_ttl: Sets the TTL of the table specified in result_table.

The default value is an empty map.
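To make the whitening option concrete, the sketch below scales one column by its standard deviation. It is an assumption-laden illustration, not the server's code: `whitenColumn` is a hypothetical helper, the population standard deviation is assumed (the source does not say whether population or sample is used), and no mean-centering is applied because the description only mentions normalizing by the deviation.

```cpp
#include <cmath>
#include <vector>

// Divides every value in `column` by the column's population standard
// deviation (assumed variant). An empty or constant column is returned
// unchanged to avoid dividing by zero.
std::vector<double> whitenColumn(const std::vector<double>& column) {
    if (column.empty()) return column;

    const double n = static_cast<double>(column.size());

    double mean = 0.0;
    for (double v : column) mean += v;
    mean /= n;

    double variance = 0.0;
    for (double v : column) variance += (v - mean) * (v - mean);
    variance /= n;

    const double stdev = std::sqrt(variance);
    if (stdev == 0.0) return column;

    std::vector<double> scaled;
    scaled.reserve(column.size());
    for (double v : column) scaled.push_back(v / stdev);
    return scaled;
}
```

Scaling like this keeps a column with a large numeric range from dominating the distance computation when the columns use different units.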

Definition at line 268 of file aggregate_k_means.h.

◆ tableName

std::string gpudb::AggregateKMeansRequest::tableName

Name of the table on which the operation will be performed.

Must be an existing table, in [schema_name.]table_name format, using standard name resolution rules.

Definition at line 177 of file aggregate_k_means.h.

◆ tolerance

double gpudb::AggregateKMeansRequest::tolerance

Stop iterating when the distance between successive points is less than the given tolerance.

Definition at line 195 of file aggregate_k_means.h.

The documentation for this struct was generated from the following file:

gpudb/protocol/aggregate_k_means.h
