Overview
Kinetica supports vector search by serving as a vector store database. Once a table has been created with a vector type column and a set of embeddings has been loaded into it, a variety of K-nearest neighbor searches can be performed on the data set.
Details on the vector type and its usage are found below. More complete walkthroughs of the functionality are also available in Jupyter notebook form.
Vector Type
The vector data type facilitates managing embeddings and issuing vector search queries. A vector is effectively an array of float values and is used as shown in the following sections.
Create Table
A vector column can optionally be configured to normalize the vector data inserted into it, giving each vector a magnitude (L2 norm) of 1. This can improve the performance of some vector operations with minimal overhead.
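As an illustration of what normalization means, the following is a minimal Python sketch, not Kinetica's implementation. The helper names (`l2_normalize`, `dot`, `cosine_distance`) are hypothetical. It shows that a normalized vector has an L2 norm of 1 and, as one plausible reason normalization can help, that cosine distance between unit vectors reduces to 1 minus their dot product.

```python
import math

def dot(v1, v2):
    """Sum of products of corresponding values."""
    return sum(a * b for a, b in zip(v1, v2))

def l2_normalize(vector):
    """Scale a vector so that its magnitude (L2 norm) is 1."""
    magnitude = math.sqrt(dot(vector, vector))
    if magnitude == 0.0:
        # Assumption for this sketch: a zero vector is returned unchanged.
        return list(vector)
    return [x / magnitude for x in vector]

def cosine_distance(v1, v2):
    """1 minus the cosine similarity of two non-zero vectors."""
    return 1.0 - dot(v1, v2) / (math.sqrt(dot(v1, v1)) * math.sqrt(dot(v2, v2)))

a = l2_normalize([3.0, 4.0])
b = l2_normalize([1.0, 0.0])

# The normalized vector has magnitude 1.
print(math.sqrt(dot(a, a)))

# For unit vectors, the two norm computations in cosine_distance are
# unnecessary: the distance is simply 1 - dot(a, b).
print(cosine_distance([3.0, 4.0], [1.0, 0.0]), 1.0 - dot(a, b))
```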
Vector Column without Normalization
Vector Column with Normalization
Insert Data
Retrieve Data
Vector Indexes
Two types of indexes are available to improve the performance of vector searches: CAGRA and HNSW.
CAGRA Vector Index
A CAGRA index can improve the performance of some vector searches. It must be manually refreshed to account for updates to the data in the corresponding table.
In SQL, the index can be applied during table creation, via CREATE TABLE, or afterwards to an existing table.
HNSW Vector Index
An HNSW index can improve the performance of some vector searches. It is automatically updated as the data in the corresponding table changes.
In SQL, the index can be applied during table creation, via CREATE TABLE, or afterwards to an existing table.
Vector Functions & Operators
Vector Column Functions
Function | Description |
---|---|
L1_NORM(v) | Calculates the sum of the absolute values of the given vector's values |
L2_NORM(v) | Calculates the square root of the sum of squares of the given vector's values |
LINF_NORM(v) | Returns the maximum of the given vector's values |
LP_NORM(v, p) | Calculates the Lp-space norm of the given vector in the space p |
NTH(v, n) | Returns the given vector's value at 0-based index n |
SIZE(v) | Returns the given vector's number of values |
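The definitions in this table can be sketched in plain Python. This is an illustration of the math only, not Kinetica code, and the lowercase function names are hypothetical stand-ins for the SQL functions. One assumption to note: `linf_norm` uses the standard mathematical definition (the largest absolute value), which matches the table's description for vectors with no negative values.

```python
import math

def l1_norm(v):
    """Sum of the absolute values."""
    return sum(abs(x) for x in v)

def l2_norm(v):
    """Square root of the sum of squares."""
    return math.sqrt(sum(x * x for x in v))

def linf_norm(v):
    """Largest absolute value (assumed standard L-infinity definition)."""
    return max(abs(x) for x in v)

def lp_norm(v, p):
    """Lp-space norm: p-th root of the sum of |x|^p."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def nth(v, n):
    """Value at 0-based index n."""
    return v[n]

def size(v):
    """Number of values in the vector."""
    return len(v)

v = [3.0, -4.0, 0.0]
print(l1_norm(v), l2_norm(v), linf_norm(v), lp_norm(v, 2), nth(v, 1), size(v))
```

With p = 1 and p = 2, `lp_norm` reproduces `l1_norm` and `l2_norm` respectively, which is a quick way to sanity-check the definitions.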
Vector Search Functions
A number of K-nearest neighbor functions have been implemented to support vector searches. For examples, see Vector Function Examples.
Function | Description |
---|---|
COSINE_DISTANCE(v1, v2) | Calculates 1 minus the cosine similarity (equality of angle) of the given vectors |
DOT_PRODUCT(v1, v2) | Calculates the sum of products of the given vectors' values |
EUCLIDEAN_DISTANCE(v1, v2) | Alias for L2_DISTANCE |
L1_DISTANCE(v1, v2) | Calculates the L1-space (taxicab) distance between the given vectors |
L2_DISTANCE(v1, v2) | Calculates the L2-space (Euclidean) distance between the given vectors |
L2_SQUAREDDISTANCE(v1, v2) | Calculates the sum of squares of distances between the given vectors' values |
L2_DISTSQ(v1, v2) | Alias for L2_SQUAREDDISTANCE |
LINF_DISTANCE(v1, v2) | Calculates the maximum of the distances between pairs of values in the given vectors |
LP_DISTANCE(v1, v2, p) | Calculates the Lp-space distance between the given vectors in the space p |
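The distance definitions above can be written out as a short Python sketch. It mirrors the table's descriptions only. The lowercase function names are hypothetical, the vectors are assumed to be of equal length, and nothing here reflects how Kinetica computes these values internally.

```python
import math

def dot_product(v1, v2):
    """Sum of products of corresponding values."""
    return sum(a * b for a, b in zip(v1, v2))

def cosine_distance(v1, v2):
    """1 minus the cosine similarity; both vectors must be non-zero."""
    norms = math.sqrt(dot_product(v1, v1)) * math.sqrt(dot_product(v2, v2))
    return 1.0 - dot_product(v1, v2) / norms

def l1_distance(v1, v2):
    """Taxicab distance: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(v1, v2))

def l2_squared_distance(v1, v2):
    """Sum of squared differences."""
    return sum((a - b) ** 2 for a, b in zip(v1, v2))

def l2_distance(v1, v2):
    """Euclidean distance: square root of the squared distance."""
    return math.sqrt(l2_squared_distance(v1, v2))

def linf_distance(v1, v2):
    """Largest absolute difference between corresponding values."""
    return max(abs(a - b) for a, b in zip(v1, v2))

def lp_distance(v1, v2, p):
    """Lp-space distance: p-th root of the sum of |difference|^p."""
    return sum(abs(a - b) ** p for a, b in zip(v1, v2)) ** (1.0 / p)

v1 = [1.0, 2.0, 3.0]
v2 = [4.0, 6.0, 3.0]
print(l1_distance(v1, v2), l2_distance(v1, v2), l2_squared_distance(v1, v2))
print(linf_distance(v1, v2), lp_distance(v1, v2, 1), dot_product(v1, v2))
```

Because the square root is monotonic, ranking by `l2_squared_distance` gives the same order as ranking by `l2_distance` while skipping the root.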
Vector Search Operators
These operators can be used as shorthand to apply vector functions to individual vector column values. For examples, see Vector Operator Examples.
Note
These operators are only available in SQL or in the native API via /execute/sql.
Operator | Equivalent Function |
---|---|
v1 <-> v2 | L2_DISTANCE(v1, v2) |
v1 <=> v2 | COSINE_DISTANCE(v1, v2) |
v1 <#> v2 | DOT_PRODUCT(v1, v2) |
Vector Search Examples
Vector searches can be performed using either named functions or the corresponding operators for select functions.
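To make the idea of a K-nearest neighbor search concrete, here is a brute-force Python sketch that ranks rows by the function each operator stands for. It is a conceptual model under stated assumptions, not Kinetica's query engine: `k_nearest`, `OPERATOR_FUNCTIONS`, and the sample rows are hypothetical, and treating a larger dot product as "nearer" is a choice made for this sketch.

```python
import math

def dot_product(v1, v2):
    return sum(a * b for a, b in zip(v1, v2))

def l2_distance(v1, v2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))

def cosine_distance(v1, v2):
    norms = math.sqrt(dot_product(v1, v1)) * math.sqrt(dot_product(v2, v2))
    return 1.0 - dot_product(v1, v2) / norms

# Operator-to-function mapping taken from the operator table.
OPERATOR_FUNCTIONS = {
    "<->": l2_distance,
    "<=>": cosine_distance,
    "<#>": dot_product,
}

def k_nearest(rows, query, k, operator="<->"):
    """Return the k (id, vector) rows closest to the query vector.

    Distances rank ascending. The dot product is a similarity, so this
    sketch ranks it descending (an assumption, not documented behavior).
    """
    score = OPERATOR_FUNCTIONS[operator]
    descending = operator == "<#>"
    ranked = sorted(rows, key=lambda row: score(row[1], query), reverse=descending)
    return ranked[:k]

rows = [
    (1, [1.0, 0.0]),
    (2, [0.0, 1.0]),
    (3, [0.9, 0.1]),
    (4, [-1.0, 0.0]),
]
query = [1.0, 0.0]

for op in ("<->", "<=>", "<#>"):
    print(op, [row_id for row_id, _ in k_nearest(rows, query, 2, op)])
```

A full scan like this costs one distance computation per row, which is the work that the vector indexes described earlier are meant to reduce.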
Vector Operator Examples
Vector Function Examples