Indexes

An index is used to improve the performance of data access within the system.

There are several types of indexes:

Primary Key Index

A primary key index is created by default when a table is created with a primary key specified. It will match the primary key in nature, in that it will be a single-column index if the primary key is on a single column and a multi-column composite index if the primary key is a composite primary key.

Tip

For primary key SQL syntax, see CREATE TABLE.

The primary key index is hash-based and optimizes the performance of equality-based filter expressions (e.g. (x = 0)). Note that if the index is a composite index, the filter expressions employed must reference all of the primary key columns to take advantage of the performance gains of the index.

Operations which make use of a primary key index include:

On-Disk Primary Key Index

The entire primary key index for a table is loaded into memory when that table is accessed, and depending on the size of the key columns and record count, this could have a large impact on available RAM. The primary key index can instead be stored solely on disk, saving RAM, but reducing some performance, by setting the primary_key_type to disk when creating the table.

Relational Index

A relational index is created as the result of applying a foreign key to a column. A description of the index and its requirements can be found under Foreign Keys.

Tip

In SQL, foreign keys can be applied during table creation, via CREATE TABLE, as well as afterwards, via ALTER TABLE.

Column Index

A column index (also known as an attribute index) can be applied to a column in a table to improve the performance of operations applied to that column in an expression.

Tip

In SQL, column indexes can be applied during table creation, via the CREATE TABLE index clause, as well as afterwards, via ALTER TABLE.

The column index is implemented as a b-tree, which provides performance improvements for both equality-based (e.g. (x = 0)) and range-based (e.g. (x > 0)) filter criteria on individual columns. Multi-column and function-based indexes are not supported at this time.

If the column index will be indexing a column with low cardinality data (e.g., number of unique values * 2 < total number of records) then using a low cardinality index should use less memory and still give the same performance improvements.

As primary key indexes only optimize the performance of equality-based filter criteria, a column index can be applied to a primary key column (or each of the columns in a composite primary key) for a performance boost on range-based filter criteria--the two index types are not mutually exclusive.

Column indexes optimize the performance of the following operations, when those operations are given equality-based or range-based filter expressions:

A column index will additionally improve the performance of the following operations:

To apply a column index to a column, the /alter/table endpoint should be called with the create_index action. The default index type is the column index, so no index_type option is necessary to specify.

For example, to create the index:

Python
SQL

Python
1
2
3
4
5
retobj = kinetica.alter_table(
    table_name = "example.employee",
    action = "create_index",
    value = "last_name"
)
SQL
1
2
ALTER TABLE example.employee
ADD INDEX (last_name)

To drop the index:

Python
SQL

Python
1
2
3
4
5
retobj = kinetica.alter_table(
    table_name = "example.employee",
    action = "delete_index",
    value = "last_name"
)
SQL
1
2
ALTER TABLE example.employee
DROP INDEX (last_name)

Limitations

A column index can be applied to any table column specific type except:
- wkt
- json
- array of any type
A column index can be applied to tables as well as to materialized views that do not have a filter or join as the last operation in their creation.

Low-Cardinality Index

A low-cardinality index is a column index intended for columns with low cardinality--those with many duplicate values. The less distinct values in a column are, the less memory this index will use.

It is recommended to use a low-cardinality index when the number of distinct values in a column is less than half the total number of values in the column.

To apply a low-cardinality index to a column, the /alter/table endpoint should be called with the create_index action and the low_cardinality index type.

For example, to create the index:

Python
SQL

Python
1
2
3
4
5
6
retobj = kinetica.alter_table(
    table_name = "example.employee",
    action = "create_index",
    value = "dept_id",
    options = {"index_type": "low_cardinality"}
)
SQL
1
2
ALTER TABLE example.employee
ADD LOW CARDINALITY INDEX (dept_id)

To drop the index:

Python
SQL

Python
1
2
3
4
5
6
retobj = kinetica.alter_table(
    table_name = "example.employee",
    action = "delete_index",
    value = "dept_id",
    options = {"index_type": "low_cardinality"}
)
SQL
1
2
ALTER TABLE example.employee
DROP LOW CARDINALITY INDEX (dept_id)

Chunk Skip Index

A chunk skip index can be applied to a column in a table to improve the performance of equality-based filtering expressions applied to that column. The primary use for this index is with partitioned tables, where the filter is applied to one or more of the partition key columns. The larger the data set and the greater the number of partitions generated, the more performant filters on the table will be when the index is applied.

Tip

In SQL, chunk skip indexes can be applied during table creation, via the CREATE TABLE index clause, as well as afterwards, via ALTER TABLE.

To apply a chunk skip index to a column, the /alter/table endpoint should be called with the create_index action and the chunk_skip index type.

For example, to create the index:

Python
SQL

Python
1
2
3
4
5
6
retobj = kinetica.alter_table(
    table_name = "example.employee",
    action = "create_index",
    value = "id",
    options = {"index_type": "chunk_skip"}
)
SQL
1
2
ALTER TABLE example.employee
ADD CHUNK SKIP INDEX (id)

To drop the index:

Python
SQL

Python
1
2
3
4
5
6
retobj = kinetica.alter_table(
    table_name = "example.employee",
    action = "delete_index",
    value = "id",
    options = {"index_type": "chunk_skip"}
)
SQL
1
2
ALTER TABLE example.employee
DROP CHUNK SKIP INDEX (id)

Limitations

A chunk skip index can be applied to any table column with a specific type matching any of the following:
- int
- long
- ulong
- charN
- date
- time
- datetime
- timestamp
A chunk skip index can be applied to tables as well as to materialized views that do not have a filter or join as the last operation in their creation.

Geospatial Index

A geospatial index can be applied to one or more columns in a table to improve the performance of geospatial functions applied to them.

Tip

In SQL, geospatial indexes can be applied during table creation, via the CREATE TABLE index clause, as well as afterwards, via ALTER TABLE.

Geospatial indexes can be applied to either of the following:

a WKT column
two float or double columns that constitute a coordinate pair

The performance of the following functions can be improved with the use of a geospatial index:

Scalar Functions	Enhanced Performance Scalar Functions
`ST_CONTAINS`	`STXY_CONTAINS`
`ST_CONTAINS_PROPERLY`	`STXY_CONTAINS_PROPERLY`
`ST_COVERED_BY`	`STXY_COVERED_BY`
`ST_COVERS`	`STXY_COVERS`
`ST_CROSSES`
`ST_DWITHIN`	`STXY_DWITHIN`
`ST_ENV_DWITHIN`	`STXY_ENV_DWITHIN`
`ST_ENV_INTERSECTS`	`STXY_ENV_INTERSECTS`
`ST_INTERSECTS`	`STXY_INTERSECTS`
`ST_OVERLAPS`
`ST_TOUCHES`	`STXY_TOUCHES`
`ST_WITHIN`	`STXY_WITHIN`

To apply a geospatial index to a column, the /alter/table endpoint should be called with the create_index action and the geospatial index type.

For example, to apply a geospatial index to a WKT column:

Python
SQL

Python
1
2
3
4
5
6
retobj = kinetica.alter_table(
    table_name = "example.employee",
    action = "create_index",
    value = "work_district",
    options = {"index_type": "geospatial"}
)
SQL
1
2
ALTER TABLE example.employee
ADD GEOSPATIAL INDEX (work_district)

To drop the index:

Python
SQL

Python
1
2
3
4
5
6
retobj = kinetica.alter_table(
    table_name = "example.employee",
    action = "delete_index",
    value = "work_district",
    options = {"index_type": "geospatial"}
)
SQL
1
2
ALTER TABLE example.employee
DROP GEOSPATIAL INDEX (work_district)

To apply a geospatial index to a coordinate pair of columns:

Python
SQL

Python
1
2
3
4
5
6
retobj = kinetica.alter_table(
    table_name = "example.employee",
    action = "create_index",
    value = "office_longitude,office_latitude",
    options = {"index_type": "geospatial"}
)
SQL
1
2
ALTER TABLE example.employee
ADD GEOSPATIAL INDEX (office_longitude, office_latitude)

To drop the index:

Python
SQL

Python
1
2
3
4
5
6
retobj = kinetica.alter_table(
    table_name = "example.employee",
    action = "delete_index",
    value = "office_longitude,office_latitude",
    options = {"index_type": "geospatial"}
)
SQL
1
2
ALTER TABLE example.employee
DROP GEOSPATIAL INDEX (office_longitude, office_latitude)

CAGRA Index

A CAGRA (Cuda Anns GRAph-based) index can be applied to a vector column in a table to improve the performance of vector searches applied to that column.

A CAGRA index is not automatically maintained and must be refreshed manually to reflect the latest set of records in the associated table.

Tip

In SQL, CAGRA indexes can be applied during table creation, via the CREATE TABLE index clause, as well as afterwards, via ALTER TABLE.

To apply a CAGRA index to a column, the /alter/table endpoint should be called with the create_index action and the cagra index type.

For example, to create the index:

Python
SQL

Python
1
2
3
4
5
6
7
8
retobj = kinetica.alter_table(
    table_name = "example.employee",
    action = "create_index",
    value = "profile",
    options = {
            "index_type": "cagra"
    }
)
SQL
1
2
ALTER TABLE example.employee
ADD CAGRA INDEX (profile)

To refresh the index:

Python
SQL

Python
1
2
3
4
5
6
retobj = kinetica.alter_table(
    table_name = "example.employee",
    action = "refresh_index",
    value = "profile",
    options = {"index_type": "cagra"}
)
SQL
1
2
ALTER TABLE example.employee
REFRESH CAGRA INDEX ON (profile)

To drop the index:

Python
SQL

Python
1
2
3
4
5
6
retobj = kinetica.alter_table(
    table_name = "example.employee",
    action = "delete_index",
    value = "profile",
    options = {"index_type": "cagra"}
)
SQL
1
2
ALTER TABLE example.employee
DROP CAGRA INDEX (profile)

HNSW Index

An HNSW (Hierarchical Navigable Small Worlds) index can be applied to a vector column in a table to improve the performance of vector searches applied to that column.

An HNSW index is automatically maintained and doesn't need to be refreshed manually to reflect the latest set of records in the associated table.

Tip

In SQL, HNSW indexes can be applied during table creation, via the CREATE TABLE index clause, as well as afterwards, via ALTER TABLE.

To apply an HNSW index to a column, the /alter/table endpoint should be called with the create_index action and the hnsw index type.

For example, to create the index:

Python
SQL

Python
1
2
3
4
5
6
7
8
retobj = kinetica.alter_table(
    table_name = "example.employee",
    action = "create_index",
    value = "profile",
    options = {
            "index_type": "hnsw"
    }
)
SQL
1
2
ALTER TABLE example.employee
ADD HNSW INDEX (profile)

To drop the index:

Python
SQL

Python
1
2
3
4
5
6
retobj = kinetica.alter_table(
    table_name = "example.employee",
    action = "delete_index",
    value = "profile",
    options = {"index_type": "hnsw"}
)
SQL
1
2
ALTER TABLE example.employee
DROP HNSW INDEX (profile)