/aggregate/kmeans

URL: http://GPUDB_IP_ADDRESS:GPUDB_PORT/aggregate/kmeans

This endpoint runs the k-means algorithm, a heuristic algorithm that attempts to perform k-means clustering. An ideal k-means clustering selects k points such that the sum of the squared distances from each member of the set to the nearest of the k points is minimized. The k-means algorithm, however, does not necessarily produce such an ideal clustering. It begins with a randomly selected set of k points, refines their locations iteratively, and settles into a local minimum. Various parameters and options are provided to control the heuristic search.
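The iterative refinement described above is the classic Lloyd's algorithm. The pure-Python sketch below illustrates it with controls analogous to the tolerance, max_iters, num_tries and whiten parameters documented on this page. It is an illustration only, not Kinetica's GPU implementation: the initialization (k distinct input rows chosen at random), the stopping rule (largest distance any mean moved between iterations), the population standard deviation used for whitening, and ranking the tries by summed RMS distance are all assumptions.

```python
import math
import random


def _dist2(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(p, means):
    """Index of the mean closest to point p (ties go to the lowest index)."""
    return min(range(len(means)), key=lambda j: _dist2(p, means[j]))


def whiten(points):
    """Scale each column by its population standard deviation (assumed definition)."""
    stdevs = []
    for col in zip(*points):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        stdevs.append(sd if sd > 0 else 1.0)  # leave constant columns unscaled
    return [[v / s for v, s in zip(p, stdevs)] for p in points]


def kmeans_once(points, k, tolerance, max_iters, rng):
    """One run of Lloyd's algorithm starting from k randomly chosen input points."""
    means = [list(p) for p in rng.sample(points, k)]
    shift, iters = math.inf, 0
    while iters < max_iters and shift >= tolerance:
        iters += 1
        # Assignment step: group every point with its nearest mean.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[_nearest(p, means)].append(p)
        # Update step: move each mean to the centroid of its cluster
        # (an empty cluster keeps its previous mean).
        new_means = [
            [sum(c) / len(members) for c in zip(*members)] if members else means[j]
            for j, members in enumerate(clusters)
        ]
        # Largest distance any mean moved; compared against the tolerance.
        shift = max(math.sqrt(_dist2(a, b)) for a, b in zip(means, new_means))
        means = new_means
    # Final statistics, shaped like the endpoint's documented result fields.
    counts, sq_sums = [0] * k, [0.0] * k
    for p in points:
        j = _nearest(p, means)
        counts[j] += 1
        sq_sums[j] += _dist2(p, means[j])
    rms_dists = [math.sqrt(s / c) if c else 0.0 for s, c in zip(sq_sums, counts)]
    return {"means": means, "counts": counts, "rms_dists": rms_dists,
            "count": len(points), "rms_dist": sum(rms_dists),
            "tolerance": shift, "num_iters": iters}


def kmeans(points, k, tolerance=1e-4, max_iters=10, num_tries=1,
           do_whiten=False, seed=None):
    """Best of num_tries independent runs, ranked by summed per-cluster RMS distance."""
    rng = random.Random(seed)
    data = whiten(points) if do_whiten else [list(p) for p in points]
    runs = [kmeans_once(data, k, tolerance, max_iters, rng) for _ in range(num_tries)]
    return min(runs, key=lambda r: r["rms_dist"])


points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
result = kmeans(points, k=2, num_tries=3, seed=42)
print(sorted(result["counts"]))  # [3, 3]
```

Running several tries from different random starting points, as num_tries does, trades extra work for a better chance of escaping a poor local minimum.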

NOTE: The Kinetica instance being accessed must be running a CUDA (GPU-based) build to service this request.

Input Parameter Description

Name | Type | Description
table_name | string | Name of the table on which the operation will be performed. Must be an existing table, in [schema_name.]table_name format, using standard name resolution rules.
column_names | array of strings | List of column names on which the operation will be performed. If n columns are provided, each of the k result points will have n dimensions corresponding to the n columns.
k | int | The number of mean points to be determined by the algorithm.
tolerance | double | Stop iterating when the distance between successive points is less than the given tolerance.
options | map of string to strings | Optional parameters. The default value is an empty map ( {} ). The supported keys are listed below.

Supported Parameters (keys) | Parameter Description
whiten | When set to 1, each of the columns is first normalized by its standard deviation. The default is not to whiten.
max_iters | Maximum number of iterations to attempt to reach the tolerance limit before giving up. The default is 10.
num_tries | Number of times to run the k-means algorithm with different randomly selected starting points; this helps avoid poor local minima. The default is 1.
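A request to this endpoint can be sketched as a JSON POST whose fields match the input parameters above. The sketch below only builds the request with Python's standard library; it assumes the server accepts a JSON body with a Content-Type of application/json, and the host, port, table name, and column names are hypothetical placeholders. Authentication, if your instance requires it, is not shown.

```python
import json
import urllib.request


def build_kmeans_request(host, port, table_name, column_names, k, tolerance,
                         options=None):
    """Build (but do not send) a JSON POST request for /aggregate/kmeans."""
    payload = {
        "table_name": table_name,
        "column_names": column_names,
        "k": k,
        "tolerance": tolerance,
        # options is a map of string to strings, so numbers are passed as text.
        "options": dict(options or {}),
    }
    return urllib.request.Request(
        f"http://{host}:{port}/aggregate/kmeans",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host, port, table, and columns; substitute your own.
request = build_kmeans_request(
    "localhost", 9191, "example_schema.readings", ["x", "y"], k=3, tolerance=0.001,
    options={"whiten": "1", "max_iters": "20", "num_tries": "5"},
)
# To actually send it to a running instance:
# with urllib.request.urlopen(request) as resp:
#     raw_body = resp.read().decode("utf-8")
```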

Output Parameter Description

The GPUdb server embeds the endpoint response inside a standard response structure which contains status information and the actual response to the query. Here is a description of the various fields of the wrapper:

Name | Type | Description
status | String | 'OK' or 'ERROR'
message | String | Empty on success; otherwise an error message
data_type | String | 'aggregate_k_means_request' or 'none' in case of an error
data | String | Empty string
data_str | JSON or String | Embedded JSON result of the query, described below
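Because the result is nested as a JSON string inside the wrapper, a client decodes it in two steps: parse the wrapper, check status, then parse data_str. A minimal sketch, assuming the raw response body is already available as a string; the function name is hypothetical.

```python
import json


def unwrap_response(raw_body):
    """Decode the standard wrapper and return the parsed data_str result."""
    wrapper = json.loads(raw_body)
    if wrapper["status"] != "OK":
        # On error, message carries the reason and there is no result to parse.
        raise RuntimeError(f"/aggregate/kmeans failed: {wrapper['message']}")
    return json.loads(wrapper["data_str"])
```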

This embedded JSON represents the result of the /aggregate/kmeans endpoint:

Name | Type | Description
means | array of arrays of doubles | The k mean points found, each with one value per requested column.
counts | array of longs | The number of elements in the cluster closest to each corresponding mean point.
rms_dists | array of doubles | The root mean squared distance of the elements in the cluster for each of the k mean points.
count | long | The total count of all the clusters; this will be the size of the input table.
rms_dist | double | The sum of all the rms_dists; the value the k-means algorithm is attempting to minimize.
tolerance | double | The distance between the last two iterations of the algorithm before it quit.
num_iters | int | The number of iterations the algorithm executed before it quit.
info | map of string to strings | Additional information.

In case of an error, data_str is an empty string instead of this JSON.
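The parallel arrays above are easiest to read when zipped into one row per cluster. The sketch below does that and checks two relationships stated in the field descriptions: counts add up to count, and rms_dists add up to rms_dist. The function name and the example values are hypothetical and made up for illustration; they are not output from a real server.

```python
import math


def summarize_clusters(result):
    """Return (mean, count, rms_dist) rows, largest cluster first."""
    # Consistency checks implied by the documented field descriptions.
    if sum(result["counts"]) != result["count"]:
        raise ValueError("counts do not add up to count")
    if not math.isclose(sum(result["rms_dists"]), result["rms_dist"]):
        raise ValueError("rms_dists do not add up to rms_dist")
    rows = zip(result["means"], result["counts"], result["rms_dists"])
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up values for illustration only.
example = {
    "means": [[1.0, 2.0], [8.5, 9.0]], "counts": [40, 60],
    "rms_dists": [0.5, 0.75], "count": 100, "rms_dist": 1.25,
    "tolerance": 0.0, "num_iters": 4, "info": {},
}
for mean, count, rms in summarize_clusters(example):
    print(mean, count, rms)
```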