High Availability Architecture

The Kinetica HA solution eliminates the single point of failure in a standard Kinetica cluster and provides recovery from failure. It is implemented externally to the cluster, utilizing multiple replicas of the data, and provides an eventually consistent data store by default. Through configuration changes, immediate consistency can also be achieved, though at a performance cost.

HA Architecture

The Kinetica HA solution consists of four components: a front-end load balancer, high-availability process managers, one or more Kinetica clusters, and a Distributed Messaging Queue (DMQ). A high-level diagram of the Kinetica HA solution is shown in the HA Architecture diagram.

The diagram depicts a three-cluster Kinetica HA configuration. Each cluster has one HAProc (the main HA processing component) managing requests. This reduces the complexity of the HA solution and allows each HAProc component to run on the same hardware as the Kinetica cluster it serves. By doing this, it can be assumed that if an HAProc is down, then its associated cluster is also down. This assumption simplifies request processing and speeds up the HA solution. The diagram also shows that the DMQ contains one message queue per cluster. While the grouping of queues & clusters is a logical one, it can also be a physical one, reusing each cluster's hardware. However, it is recommended that the queues themselves exist on hardware separate from the cluster in order to optimize performance, separating synchronization management from request processing.
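
To make the pairing concrete, the following is a minimal sketch of a hypothetical three-cluster topology. None of the host names, ports, or keys below come from Kinetica configuration files; they exist only to illustrate the one-to-one grouping of cluster, HAProc, and DMQ queue.

```python
# Hypothetical illustration of the three-cluster HA topology described above.
# All names and addresses are examples, not Kinetica configuration syntax.
ha_topology = {
    "load_balancer": "ha-lb.example.com",
    "clusters": [
        {
            "name": "kinetica-a",
            "haproc": "haproc-a.example.com:9191",  # runs on the cluster's own hardware
            "dmq_queue": "queue-a",                 # one message queue per cluster
        },
        {
            "name": "kinetica-b",
            "haproc": "haproc-b.example.com:9191",
            "dmq_queue": "queue-b",
        },
        {
            "name": "kinetica-c",
            "haproc": "haproc-c.example.com:9191",
            "dmq_queue": "queue-c",
        },
    ],
}

# If an HAProc is unreachable, its cluster is treated as down and the load
# balancer fails requests over to the remaining HAProc managers.
```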

HAProc Failure and Recovery

An HAProc can go down due to either hardware or software failure. Depending on the kind of failure, the associated cluster services may or may not be down as well. In either case, since all access to those services is through the HAProc, once it goes down, the services are unavailable. During this time, any requests coming from clients will be failed over and distributed to the remaining HAProc managers of the other clusters.

For all operations, each live HAProc component will continue passing client requests to its corresponding cluster. For non-query operations, the HAProc will also write the request to the DMQ. When a crashed HAProc comes back online, before it starts servicing requests, it will fetch all of the operations queued in the DMQ during its downtime and replay them in sequence on its corresponding cluster. Query operations are not written to the DMQ by default, including those that create result set views, though this behavior can be altered via configuration. For all kinds of operations, once the load balancer is aware that an HAProc is down, each operation is seamlessly failed over to another HAProc that is still alive.

If an HAProc ever crashes while its corresponding cluster is still up and running, an administrator should restart the HAProc process using the standard start script. An HAProc does not store any data locally and always writes to and reads from the highly-available DMQ; therefore, there is no need to back up or restore any files on behalf of an HAProc component. To bring down an HAProc for planned maintenance, first gracefully shut down its companion cluster and then simply terminate the HAProc process.

Kinetica Failure and Recovery

A Kinetica cluster can become unavailable under two circumstances: the cluster itself terminates (either for planned maintenance or due to a crash) or its companion HAProc goes down (again, either planned or not, since all access to the cluster is through the HAProc). If a cluster goes down, its companion HAProc notices that fact (due to failure to communicate with the cluster) and brings itself down. This way, the load balancer does not try to communicate with an HAProc serving a downed cluster. Kinetica stores data in memory while keeping a copy on disk, which allows the database to be brought back up quickly with all of its data populated after going down.

During a planned maintenance phase, when no data ingestion is expected, Kinetica can be brought down and the persistence directories can be backed up following standard in-house procedures for file backup. In a default installation, the data directory is located at /opt/gpudb/persist, but this can be changed in the gpudb.conf file. The entire directory specified as the location of the persistent data should be backed up on all cluster nodes to create a completely recoverable copy of the database.
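
As an illustration only, a backup of the persistence directory on a single node might look like the following sketch. It assumes the default /opt/gpudb/persist location and a hypothetical local archive destination (BACKUP_DIR); any standard in-house backup tooling can be used instead, and the step must be repeated on every node in the cluster.

```python
import tarfile
from datetime import datetime, timezone
from pathlib import Path

# Illustrative sketch only: archive this node's persist directory while the
# cluster is down for planned maintenance.
PERSIST_DIR = Path("/opt/gpudb/persist")   # default; may be changed in gpudb.conf
BACKUP_DIR = Path("/backups/kinetica")     # hypothetical destination

def backup_persist_dir() -> Path:
    BACKUP_DIR.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archive = BACKUP_DIR / f"persist-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(str(PERSIST_DIR), arcname=PERSIST_DIR.name)
    return archive

if __name__ == "__main__":
    print(f"Wrote {backup_persist_dir()}")
```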

High Availability Processing

Kinetica achieves eventual consistency through effective rule-based operation handling.

Operation Handling

A typical interaction with Kinetica begins with a client sending a request to the system. The load balancer receives the request and sends it to a chosen HAProc. The HAProc will then process the request according to its internal request categorization scheme, comprising three operation types:

Synchronous Operations

Operations in this category are determined to be critical to the successful execution of subsequent requests and, in a default installation, include DDL and system/administrative endpoints. They will be relayed to all Kinetica clusters and only succeed if the request is processed successfully by all available clusters. Requests in this category should execute & return quickly, as no further requests will be processed by the receiving HAProc for the requesting client until that client's synchronous operation is finished. Also, requests are passed serially to the other clusters, so a three-cluster Kinetica system will take three times as long to complete each request in this category.

An example synchronous operation request is one to create a table. The table is created in all available clusters or, if an application-level error occurs, the request is rejected back to the client. In this way, the client can be assured that subsequent requests issued to Kinetica, which may be load balanced to other clusters, will have that table available. Note that this does not guarantee the presence of the table for other clients issuing requests shortly after the create table request; i.e. the system will not block further requests across all clusters until a given synchronous operation is processed.

HA Synchronous Operation Flow

Kinetica will handle requests in this category as follows:

  1. The receiving HAProc will issue the request to its corresponding cluster and receive a result. If an application-level error occurs, the request will be rejected back to the client. If an availability error occurs (connection timeout, connection refused, etc.), the request will be rejected back to the load balancer, which will forward the same request to the next available HAProc and restart this service flow.
  2. If the request is successful on the initial cluster, the receiving HAProc will then issue the same request directly to each of the other available clusters, one after the other. If any operation fails against a cluster, it will be stored in that cluster's message queue for later replay during recovery (the case shown with the downed Kinetica C in the diagram).
  3. Once the receiving HAProc has received responses from all other clusters, it will return a success response to the client via the load balancer.
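
The steps above can be summarized with a short sketch. This is illustrative Python only, not HAProc source code; the send & enqueue callables and the exception names are hypothetical stand-ins for the real request relay and DMQ publishing machinery.

```python
class ClusterUnavailable(Exception):
    """Availability error: connection timeout, connection refused, etc."""

class FailoverToNextHAProc(Exception):
    """Tells the load balancer to retry the request on another live HAProc."""

def handle_synchronous(request, own_cluster, other_clusters, send, enqueue):
    """Sketch of synchronous (e.g. DDL) handling; `send(cluster, request)`
    executes a request and `enqueue(cluster, request)` stores it on that
    cluster's DMQ queue for replay during recovery."""
    try:
        result = send(own_cluster, request)        # 1. issue to the local cluster first
    except ClusterUnavailable:
        raise FailoverToNextHAProc(request)        #    let the load balancer try elsewhere
    # An application-level error from the cluster is returned to the client as-is.

    for cluster in other_clusters:                 # 2. relay serially to every other cluster
        try:
            send(cluster, request)
        except ClusterUnavailable:
            enqueue(cluster, request)              #    downed cluster: queue for later replay
    return result                                  # 3. success response via the load balancer
```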

Asynchronous Operations

Operations in this category are determined to be important for data consistency across the Kinetica clusters, but not to the successful execution of subsequent requests. In a default installation, these include DML endpoints. They will be serviced immediately by the receiving cluster and added to the message queues of the remaining clusters, which will service them in a timely fashion. Success is determined by the result of the operation on the original cluster only. Because of this asynchronicity, requests in this category can be long-running and will only take as much time as the request takes on a single cluster, plus some marginal HA-related overhead.

An example asynchronous operation request is one to insert records into a table. The requested records are inserted immediately into the cluster associated with the receiving HAProc, and queued for insert into the remaining clusters, where they will be available shortly afterwards.

HA Asynchronous Operation Flow

Kinetica will handle requests in this category as follows:

  1. The receiving HAProc will issue the request to its corresponding cluster and receive a result. If an application-level error occurs, the request will be rejected back to the client. If an availability error occurs (connection timeout, connection refused, etc.), the request will be rejected back to the load balancer, which will forward the same request to the next available HAProc and restart this service flow.
  2. If the request is successful on the initial cluster, the receiving HAProc will then publish the request to the message queue associated with each of the other clusters for later processing. Note that requests are published to the message queues regardless of the up or down state of the corresponding clusters, though only alive clusters will pull operations off of the queue (the case shown with the up Kinetica B & downed Kinetica C in the diagram).
  3. Once the receiving HAProc has finished publishing the request, it will return a success response to the client via the load balancer.
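
As with the synchronous case, a short hypothetical sketch of the steps above follows; the publish callable stands in for writing to another cluster's DMQ queue, and none of these names are Kinetica APIs.

```python
class ClusterUnavailable(Exception):
    """Availability error: connection timeout, connection refused, etc."""

class FailoverToNextHAProc(Exception):
    """Tells the load balancer to retry the request on another live HAProc."""

def handle_asynchronous(request, own_cluster, other_clusters, send, publish):
    """Sketch of asynchronous (e.g. DML) handling; `publish(cluster, request)`
    adds the request to that cluster's DMQ queue, whether it is up or down."""
    try:
        result = send(own_cluster, request)    # 1. execute on the local cluster only
    except ClusterUnavailable:
        raise FailoverToNextHAProc(request)

    for cluster in other_clusters:             # 2. publish to every other cluster's queue,
        publish(cluster, request)              #    regardless of that cluster's current state
    return result                              # 3. success is based on the local result alone
```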

Queries

Operations in this category are determined to be able to run on any single available cluster without requiring any synchronization across clusters. In a default installation, these include data retrieval endpoints. They will be serviced by the receiving cluster and returned immediately. As they only run on the receiving cluster, success is determined solely by the result of the operation on that cluster.

An example query request is one to aggregate records by groups and return the resulting data set. Since all available clusters should be consistent, aside from any pending asynchronous operations, any single cluster should be able to service the request.

HA Query Operation Flow

Kinetica will handle requests in this category as follows:

  1. The receiving HAProc will issue the request to its corresponding cluster and receive a result. If an application-level error occurs, the request will be rejected back to the client. If an availability error occurs (connection timeout, connection refused, etc.), the request will be rejected back to the load balancer, which will forward the same request to the next available HAProc and restart this service flow (the case shown with the downed Kinetica A & up Kinetica B in the diagram).
  2. If the request is successful on the initial cluster, the receiving HAProc will return a success response to the client via the load balancer.
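
The query path is the simplest of the three; the hypothetical sketch below mirrors the two steps above, with no DMQ involvement, and reuses the same illustrative exception names as the earlier sketches.

```python
class ClusterUnavailable(Exception):
    """Availability error: connection timeout, connection refused, etc."""

class FailoverToNextHAProc(Exception):
    """Tells the load balancer to retry the request on another live HAProc."""

def handle_query(request, own_cluster, send):
    """Sketch of query handling: serve from the local cluster only, with no
    publishing to the DMQ (queries are not synchronized across clusters)."""
    try:
        return send(own_cluster, request)      # 1 & 2. serve and return immediately
    except ClusterUnavailable:
        raise FailoverToNextHAProc(request)    # otherwise fail over to another HAProc
```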

Eventual Consistency

The Kinetica clusters within a given installation will reach eventual consistency through the combined synchronous & asynchronous processing of operations from other clusters during normal conditions and through in-sequence processing of queued operations during recovery.

Data Synchronization

During normal operation, asynchronous operations received & processed by one cluster will be published to the message queue of each of the other clusters. In addition to receiving client requests, each HAProc also polls its respective message queue for new queued asynchronous operations and processes them. In the default configuration, insert requests are pulled off of the queue and executed in a multi-threaded fashion. Updates & deletes, on the other hand, will block until all in-flight insert requests have finished and will then be processed. This ordering mechanism only applies to requests pulled off the message queue and has no bearing on the handling of external client requests.
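
The ordering rule can be sketched as follows. This is illustrative Python only; the execute callable, the operation objects (with a hypothetical kind attribute), and the thread-pool size are all stand-ins rather than Kinetica internals.

```python
from concurrent.futures import ThreadPoolExecutor, wait

def drain_queue(operations, execute, max_workers=4):
    """Sketch of queued-operation processing: inserts run concurrently, while
    each update or delete first waits for all in-flight inserts to finish."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        in_flight = []
        for op in operations:                                # queued order
            if op.kind == "insert":
                in_flight.append(pool.submit(execute, op))   # inserts: multi-threaded
            else:
                wait(in_flight)                              # barrier on pending inserts
                in_flight.clear()
                execute(op)                                  # then apply the update/delete
```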

This leads to eventual consistency, as each cluster, either via handling asynchronous operations directly from clients, or indirectly via the message queue, will end up processing all client requests.

Recovery

HA Recovery Flow

When a cluster enters into a failed state, inbound client requests will begin to accumulate on its corresponding message queue. When the failure has been corrected and the cluster restarted, the HAProc will begin the recovery synchronization process as follows:

  1. As the cluster comes up and starts its recovery process, operations being serviced by the other live clusters will continue to be queued for it. At this point, the associated HAProc is still marked as unavailable and will not yet handle new inbound requests.
  2. The recovering cluster's HAProc will check its message queue for requests and service those in queued order.
  3. Once the queue has been drained, the HAProc will inform the load balancer that it is ready for new requests.

In the diagram, Kinetica C is in recovery mode and will be marked as unavailable until it has finished with its queued operations. While it recovers, Kinetica A & Kinetica B will continue to serve requests, adding all applicable operations to Kinetica C's message queue.
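
A hypothetical sketch of these recovery steps is below; fetch_all, replay, and mark_available are illustrative stand-ins for draining the DMQ queue, re-executing operations against the recovering cluster, and re-registering with the load balancer.

```python
def recover_cluster(queue, cluster, load_balancer, fetch_all, replay, mark_available):
    """Sketch of recovery synchronization for a cluster that has just restarted."""
    # 1. While this runs, the HAProc stays marked unavailable, so new client
    #    requests keep going to the other clusters (and onto this queue).
    for operation in fetch_all(queue):          # 2. drain the backlog in queued order
        replay(cluster, operation)
    mark_available(load_balancer, cluster)      # 3. ready to accept new requests again
```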