Cluster Resiliency Overview

Kinetica provides cluster resiliency by seamlessly failing over terminated processes from the head node or worker nodes to other hosts within the same cluster. Individual processes and entire nodes can be failed over to hosts with available resources.

Important

Cluster resiliency requires a shared file system accessible to all hosts and the enable_worker_http_servers configuration parameter set to true.
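
For reference, the parameter would appear in the database configuration file (gpudb.conf) roughly as follows. This is a minimal illustration assuming the file's standard key = value format; the comment is not part of the shipped file, and the file location varies by installation:

  # Required for cluster resiliency: run an HTTP server on each worker
  # node so it can accept redirected traffic after a failover.
  enable_worker_http_servers = true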

Spare nodes are typically configured during the install process using KAgent but can also be added later using the Cluster Management Interface. Note that failover for head and worker nodes (individually) will only occur if node failover is enabled; see Failover Policy for more details. The following processes are monitored to maintain cluster resiliency (a basic liveness-check sketch follows the list):

  • Head node process
  • Worker process(es)
  • Apache HTTPD
  • Host Manager
  • Graph server
  • Apache Calcite / Query Planner (SQL syntax engine and planner)
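
The sketch below shows one way an administrator might spot-check that these processes are present on a host. The pgrep patterns are hypothetical placeholders (substitute the actual process names of your installation); this is not Kinetica's internal monitoring mechanism.

  import subprocess

  # Hypothetical process-name patterns -- substitute the actual process names
  # of your installation; this is not Kinetica's internal monitor.
  MONITORED = {
      "head/worker process":   "gpudb",
      "Apache HTTPD":          "httpd",
      "Host Manager":          "host_manager",
      "Graph server":          "graph_server",
      "Calcite / SQL planner": "calcite",
  }

  def check_processes():
      """Report whether a matching process is found for each monitored role."""
      for role, pattern in MONITORED.items():
          # pgrep -f exits with 0 if any process command line matches
          found = subprocess.run(["pgrep", "-f", pattern],
                                 stdout=subprocess.DEVNULL).returncode == 0
          print(f"{role:24s} {'running' if found else 'NOT FOUND'}")

  if __name__ == "__main__":
      check_processes()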

Failover

Failover occurs when a key process terminates on a node and is brought up on another node within the cluster. There are two types of failover: node failover, where an entire node fails over to a spare node, and process failover, where a process fails over to an active node (or to a spare node when no active node has the necessary resources). Note that a process cannot fail over to the node on which it is running. When a process dies on a node, the host will no longer accept failover processes until a user marks it as accepting failover again using KAgent.

Before failover occurs, the system will check if sufficient resources (RAM and/or GPUs) are available on an active or spare node that has been marked as accepting failover in the cluster. If the head node process or a worker process is sharing a GPU with a worker process and fails over, the new host must have an available GPU for each process as they will not share GPUs post-failover. For example, if the head node process is sharing a GPU with a worker process (as is the case in all default Kinetica installations) and the head node process fails, it will only fail over to a new node if a whole GPU is available, in which case, the head node process will fail over and occupy the entire available GPU.
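
The pre-failover resource check described above can be pictured as a simple capacity test. The following is an illustrative Python sketch of that rule (enough free RAM plus a whole, unshared GPU per GPU-bound process), not Kinetica's actual scheduler:

  # Illustrative sketch of the pre-failover resource check; not Kinetica's
  # actual scheduler logic.

  def can_accept_failover(node, failed_processes):
      """Return True if 'node' can host every process in 'failed_processes'.

      node: dict with 'accepts_failover' (bool), 'free_ram_gb' (float), and
            'free_gpus' (int -- whole, unshared GPUs).
      failed_processes: dicts with 'ram_gb' (float) and 'needs_gpu' (bool).
            After failover, each GPU-bound process requires a whole GPU of
            its own -- GPUs are not shared post-failover.
      """
      if not node["accepts_failover"]:
          return False
      ram_needed = sum(p["ram_gb"] for p in failed_processes)
      gpus_needed = sum(1 for p in failed_processes if p["needs_gpu"])
      return (node["free_ram_gb"] >= ram_needed
              and node["free_gpus"] >= gpus_needed)

  # Example: a head node process that was sharing a GPU with a worker now
  # needs a whole GPU on the target node.
  spare = {"accepts_failover": True, "free_ram_gb": 64, "free_gpus": 1}
  print(can_accept_failover(spare, [{"ram_gb": 32, "needs_gpu": True}]))  # True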

When a failover occurs, there are settings to control how primary keys / column indexes are built and how data is loaded upon process failover; see Data Loading Policy for more information. Any non-persisted table (temporary tables and tables configured to be in-memory only, e.g., projections, views) will not survive failover. Failover is handled differently for the two node types:

Head Node

The head node refers to the head node process and any attendant processes, e.g., Calcite, HTTPD, and Host Manager. A head node has two fail states:

  • head node failover -- the head node process has died and/or one of the following processes has died on the head node:
    • Calcite
    • HTTPD
    • Host Manager
  • process failover -- any of the worker processes or the Graph server has died

Important

While a termination of the Active Analytics Workbench (AAW) process will not trigger failover, the AAW process will fail over with the head node should the head node fail.

In the case of a head node failover, the system will check if a node in the cluster has sufficient resources to take on the head node process and Calcite (HTTPD and Host Manager are already running on every node). These processes (and, if installed, AAW) will fail over to the target node; head node responsibilities are assigned to the target node, which will begin accepting connections as the new head node. Any healthy processes on the failed node are left running; if healthy worker processes remain, the failed node becomes a worker node. Primary keys and indexes will build and data will load on the target node. The API and SQL clients will automatically detect that the head node has failed over.
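
As a minimal illustration of this client-side behavior, the sketch below connects through the Python API with more than one candidate URL so the client can locate the head node after a failover. It assumes a recent gpudb Python API version that accepts a list of URLs (verify against your API version's documentation); the host names and credentials are placeholders:

  import gpudb

  # Placeholder URLs and credentials -- replace with your own.  Supplying
  # several candidate URLs is assumed to let the client re-resolve the head
  # node if it has failed over to another host (verify against your gpudb
  # API version's documentation).
  db = gpudb.GPUdb(
      host=["http://rank0-host:9191", "http://spare-host:9191"],
      username="admin",
      password="password",
  )

  # Requests are routed to whichever node currently holds the head node role.
  print(db.show_system_properties())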

Head node failure diagram

If the head node process fails, a head node failover is initiated and the head node process & Calcite will fail over to a spare node.


In the case of a process failover, the failing process(es) will be redistributed across the cluster to available resources; the system will first check if a node in the cluster has sufficient resources to take on the process(es). Primary keys and indexes will build and data will load on the target node.

Head node worker process failure diagram

If a worker process on the head node fails, the worker process will fail over to a new node.


If the head node enters a failed state but no resources are available for failover, a critical error is reported. If the cluster is in a ring with other clusters, the ring resiliency protocol will commence; otherwise, the cluster will be inoperable.

GAdmin

The full GAdmin functionality is only available while browsing GAdmin on the head node; only the Logging, Support, and User features are accessible from an active worker node's GAdmin instance. This means that if the head node fails over to another node, the URL where most users would normally access GAdmin will no longer offer full GAdmin functionality; these users will need to be provided with the new head node URL, which can be obtained from an administrative user or via KAgent.

Worker Node

The worker node refers to any node that is running worker process(es) but not the head node process. Worker nodes can run a varying set of processes. A worker node has two fail states:

  • worker node failover -- one of the following is true:
    • Host Manager has died
    • HTTPD has died
    • All worker processes on the node have died
  • process failover -- one or more worker processes (but not all) or the Graph server has died

In the case of a worker node failover, the system will check if one or more nodes in the cluster have sufficient resources to take on the worker process(es). These processes will fail over to the target node(s). The current head node of the cluster will temporarily stop sending query requests to the failed worker node. Any healthy processes on the failed worker node are left running. Primary keys and indexes will build and data will load on the target node(s) as the new worker node(s) begin accepting queries.
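
The redistribution of failed worker processes can be pictured as a first-fit placement onto nodes with spare capacity. The following is an illustrative sketch of the behavior described above, not Kinetica's actual placement algorithm:

  # Illustrative first-fit placement of failed worker processes; not
  # Kinetica's actual placement algorithm.

  def redistribute(failed_workers, nodes):
      """Map each failed worker to the first node with enough free RAM and a
      whole free GPU (if required).  Returns {worker name: node name}, or
      None if any worker cannot be placed (a critical error is reported)."""
      placement = {}
      for worker in failed_workers:
          for node in nodes:
              gpu_need = 1 if worker["needs_gpu"] else 0
              if (node["accepts_failover"]
                      and node["free_ram_gb"] >= worker["ram_gb"]
                      and node["free_gpus"] >= gpu_need):
                  node["free_ram_gb"] -= worker["ram_gb"]
                  node["free_gpus"] -= gpu_need
                  placement[worker["name"]] = node["name"]
                  break
          else:
              return None  # no node can take this worker
      return placement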

Worker node failure diagram

If Host Manager on a worker node fails, the entire worker node fails over to one or more new hosts.


In the case of a process failover, the failed process(es) will be redistributed to node(s) across the cluster with sufficient resources. Primary keys and indexes will build and data will load using the configured modes.

Worker node process failure diagram

If some worker processes fail (but not all), they will be redistributed accordingly throughout the cluster.


If the worker node enters a failed state but no resources are available for failover, a critical error is reported. If the cluster is in a ring with other clusters, ring resiliency protocol will commence.

Switchover

Switchover occurs when a process is manually reassigned from one node to another in the same cluster via user invocation in KAgent. An entire host can be switched over to a spare node via KAgent as well.

Before switchover occurs, the system will check if sufficient resources (RAM and/or GPUs) are available on a spare node or an active node in the cluster. If the head node process or a worker process is sharing a GPU with a worker process and switches over, the new host must have an available GPU for each process as they will not share GPUs post-switchover. For example, if the head node process is sharing a GPU with a worker process (as is the case in all default Kinetica installations) and the head node process is designated for switchover, it can only switch over to a new node if a whole GPU is available, in which case, the head node process will switch over and occupy the entire available GPU.

When a switchover occurs, there are settings to control how primary keys / column indexes are built and how data is loaded upon process switchover. See Data Loading Policy for more information. Any non-persisted table (temporary tables and tables configured to be in memory only, e.g., projections, views) will not survive switchover.

GAdmin

The full GAdmin functionality is only available while browsing GAdmin on the head node; only the Logging, Support, and User features are accessible from an active worker node's GAdmin instance. This means that if the head node switches over to another node, the URL where most users would normally access GAdmin will no longer offer full GAdmin functionality; these users will need to be provided with the new head node URL, which can be obtained from an administrative user or via KAgent.

APIs

Client API connections can be redirected to another node in the cluster in the event of a connection or database failure, even during a failover or switchover. The redirection procedure differs depending on whether the desired cluster is in a ring. See High Availability Configuration & Management for more information on ring configuration and API usage with a ring.

Intercluster Failover (Ring)

If the desired cluster is in a ring, i.e., additional clusters are present, the API will attempt to re-establish the current cluster connection a configurable number of times. If no response is received, it is assumed the head node has failed or switched over, and all worker processes in the current cluster connection are contacted to attempt to get an address for the head node. If a valid, new address is received, the connection will be retried. If the address is the same as it was prior to failover / switchover, or no response is received from the worker processes, a cluster will be chosen randomly or sequentially (based on database object configuration) from the list given for the initial connection. If no response is received from any of the clusters in the initial connection list, the connection will fail.
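
The sequence above can be sketched in client-side pseudologic as follows. The callables (try_connect, ask_workers_for_head) are hypothetical stand-ins for the real API's internal steps, not functions of the Kinetica API:

  import random

  # Illustrative sketch of the intercluster (ring) failover sequence; the
  # callables are hypothetical stand-ins for the API's internal steps.

  def ring_failover(cluster, other_clusters, try_connect, ask_workers_for_head,
                    retries=3, sequential=False):
      """Return a usable head node URL, or raise if every step fails.

      cluster / other_clusters: dicts with 'head_url' and 'worker_urls'.
      try_connect(url) -> bool; ask_workers_for_head(worker_urls) -> str or None.
      """
      # 1. Retry the current cluster a configurable number of times.
      for _ in range(retries):
          if try_connect(cluster["head_url"]):
              return cluster["head_url"]

      # 2. Assume the head node has failed or switched over; ask the worker
      #    processes for the current head node address and retry if a valid,
      #    new address comes back.
      new_head = ask_workers_for_head(cluster["worker_urls"])
      if new_head and new_head != cluster["head_url"] and try_connect(new_head):
          return new_head

      # 3. Otherwise choose another cluster from the initial connection list,
      #    sequentially or at random per the connection configuration.
      candidates = (list(other_clusters) if sequential
                    else random.sample(list(other_clusters), len(other_clusters)))
      for other in candidates:
          if try_connect(other["head_url"]):
              return other["head_url"]

      # 4. No cluster in the initial connection list responded.
      raise ConnectionError("all clusters unreachable; connection failed")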

Intracluster Failover (No Ring)

If the desired cluster is not in a ring, i.e., no additional clusters are present, intracluster failover is initiated. All other nodes in the current cluster connection are contacted to attempt to get an address for the head node. If a response is received, all nodes will be pinged to ensure their internal HTTP servers are up and running; the client will wait indefinitely until it has confirmation that all HTTP servers are ready. When a response is received from all nodes, the failed connection is retried. By default, the other nodes are asked for the new head node address three times.

If no response is received after three attempts, the API constructs qualified head node process URLs for each node in the cluster. Each of these URLs is tried for a connection until five minutes have passed. If a response is received before the five-minute timeout, all nodes will be pinged to ensure their internal HTTP servers are up and running; the client will wait indefinitely until it has confirmation that all HTTP servers are ready. If no response is received after five minutes, the connection will fail.
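
The intracluster sequence can likewise be sketched in client-side pseudologic. The callables are hypothetical stand-ins for the real API's internal steps, and the constants mirror the defaults described above (three address attempts, a five-minute URL retry window):

  import time

  # Illustrative sketch of the intracluster (no ring) failover sequence; the
  # callables are hypothetical stand-ins for the API's internal steps.

  HEAD_ADDRESS_ATTEMPTS = 3        # default attempts to learn the new address
  URL_RETRY_WINDOW_SEC = 5 * 60    # how long to cycle through candidate URLs

  def wait_for_http_servers(node_urls, all_http_servers_ready):
      """Block until every node's internal HTTP server reports ready."""
      while not all_http_servers_ready(node_urls):
          time.sleep(1)

  def intracluster_failover(node_urls, ask_node_for_head, all_http_servers_ready,
                            try_connect, build_head_url):
      # 1. Contact the other nodes for the new head node address, up to
      #    three times by default.
      for _ in range(HEAD_ADDRESS_ATTEMPTS):
          for node in node_urls:
              head_url = ask_node_for_head(node)
              if head_url:
                  wait_for_http_servers(node_urls, all_http_servers_ready)
                  if try_connect(head_url):
                      return head_url

      # 2. No address learned: construct a qualified head node process URL for
      #    each node and try them until five minutes have passed.
      deadline = time.monotonic() + URL_RETRY_WINDOW_SEC
      while time.monotonic() < deadline:
          for node in node_urls:
              candidate = build_head_url(node)
              if try_connect(candidate):
                  wait_for_http_servers(node_urls, all_http_servers_ready)
                  return candidate

      # 3. Five minutes elapsed with no response: the connection fails.
      raise ConnectionError("no head node found within the retry window")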