Performance Optimization

I/O

Because Kinetica is designed to work with big data sets and many-node setups, it provides distributed input (multi-head ingest) and output (multi-head egress) mechanisms for fast data ingestion and fast record retrieval. Most interactions go through the single head node and are parceled out from there to the rest of the cluster. Distributed operations instead allow data to be inserted and retrieved through transactions with the cluster nodes directly, bypassing the head node and improving performance.
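
The routing idea behind multi-head ingest can be sketched in a few lines of Python. This is a conceptual toy, not Kinetica's client API: the worker count, the CRC32 hash, and the names worker_for and route_batches are all assumptions made for illustration. It shows only how a client could group records by shard key so that each batch can go straight to the node that owns it.

```python
from collections import defaultdict
from zlib import crc32

NUM_WORKERS = 4  # hypothetical cluster size, chosen for illustration


def worker_for(shard_key, num_workers=NUM_WORKERS):
    """Map a shard key to a worker index with a stable hash.

    CRC32 is an assumption here; the real hash function is not
    described in this section.
    """
    return crc32(str(shard_key).encode("utf-8")) % num_workers


def route_batches(records, key_field, num_workers=NUM_WORKERS):
    """Group records into one batch per worker, as a multi-head client
    might do before sending each batch directly to its worker."""
    batches = defaultdict(list)
    for record in records:
        batches[worker_for(record[key_field], num_workers)].append(record)
    return dict(batches)


records = [{"customer_id": f"c{i}", "amount": i * 10} for i in range(8)]
batches = route_batches(records, "customer_id")

for worker, batch in sorted(batches.items()):
    print(f"worker {worker}: {[r['customer_id'] for r in batch]}")
```

Because the hash is stable, every record with the same key lands in the same batch, which is what lets the head node be skipped on the data path.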

Queries

The performance of queries can be improved in several ways.

The most important of these is sharding. This data distribution method allows aggregations on sharded columns to be distributed evenly across the cluster, minimizing the need to share data between nodes to complete the processing. It also allows joins between sharded columns to be performed locally on each node, again avoiding the need to redistribute the data to complete processing.
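
A small simulation can make the aggregation benefit concrete. The sketch below is a toy model under stated assumptions (three in-memory "nodes", CRC32 as the shard hash, and the hypothetical helpers shard_of, distribute, and local_sum); it is not how Kinetica is implemented. When rows are sharded on the grouping column, each group lives entirely on one node, so the per-node results can be concatenated without any cross-node merge.

```python
from collections import Counter, defaultdict
from zlib import crc32


def shard_of(key, num_nodes):
    """Assumed shard function: stable hash of the key modulo node count."""
    return crc32(str(key).encode("utf-8")) % num_nodes


def distribute(rows, key_field, num_nodes):
    """Place each row on the node that owns its shard key."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[shard_of(row[key_field], num_nodes)].append(row)
    return nodes


def local_sum(rows, key_field, value_field):
    """Aggregate one node's rows without looking at any other node."""
    totals = Counter()
    for row in rows:
        totals[row[key_field]] += row[value_field]
    return totals


data = [("east", 5), ("west", 7), ("east", 3), ("north", 2), ("west", 1)]
rows = [{"region": region, "sales": sales} for region, sales in data]

nodes = distribute(rows, "region", num_nodes=3)
partials = [local_sum(node_rows, "region", "sales") for node_rows in nodes.values()]

# Each region appears in exactly one partial result, so combining them
# is a plain union rather than a second aggregation pass.
result = {}
for partial in partials:
    assert result.keys().isdisjoint(partial.keys())
    result.update(partial)

print(result)
```

The same property is what allows a join on the shard key to run node by node: matching keys from both tables are already on the same node.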

Many geospatial queries can be accelerated by geo-partitioning the queried data, which effectively groups spatial entities by location and allows for more effective geospatial processing. Queries involving constant filters can also be restructured to take advantage of this performance improvement.
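
The effect of grouping spatial entities by location can be illustrated with a simple grid. This sketch assumes a fixed 10-degree cell size and uses hypothetical helpers (cell_of, build_partitions, query_bbox); it is a generic bucketing example, not a description of Kinetica's geo-partitioning scheme. A bounding-box query inspects only the cells that overlap the box instead of scanning every point.

```python
from collections import defaultdict
from math import floor

CELL = 10.0  # hypothetical cell size in degrees


def cell_of(lon, lat, cell=CELL):
    """Return the grid cell that contains a point."""
    return (floor(lon / cell), floor(lat / cell))


def build_partitions(points, cell=CELL):
    """Group (name, lon, lat) points by grid cell."""
    parts = defaultdict(list)
    for name, lon, lat in points:
        parts[cell_of(lon, lat, cell)].append((name, lon, lat))
    return parts


def query_bbox(parts, min_lon, min_lat, max_lon, max_lat, cell=CELL):
    """Return matching names and the number of points actually examined."""
    x0, y0 = cell_of(min_lon, min_lat, cell)
    x1, y1 = cell_of(max_lon, max_lat, cell)
    hits, scanned = [], 0
    for cx in range(x0, x1 + 1):
        for cy in range(y0, y1 + 1):
            for name, lon, lat in parts.get((cx, cy), []):
                scanned += 1
                if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
                    hits.append(name)
    return sorted(hits), scanned


points = [
    ("a", 1.0, 1.0),
    ("b", 2.5, 3.5),
    ("c", 45.0, 45.0),
    ("d", -70.0, 40.0),
    ("e", 8.0, 9.5),
]
parts = build_partitions(points)
hits, scanned = query_bbox(parts, 0.0, 0.0, 5.0, 5.0)
print(hits, scanned)
```

Here the query touches a single cell and examines three of the five points; the two points in distant cells are never read.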