High Availability Configuration & Management

Pre-requisites

An HA installation requires the following components:

  1. Two or more clusters with matching Kinetica installations managed by KAgent

    Note

    If the clusters are not yet managed by KAgent, install KAgent and then add the clusters to it.

  2. Two nodes (across the clusters) with RabbitMQ installed

  3. etcd

  4. The KAgent UI

Configuration

Enable HA

HA is enabled on a ring using the KAgent UI. To enable HA using KAgent:

  1. Log into the KAgent service with a web browser:

    http://<kagent-host>:8081
    
  2. From the left menu, click Manage.

  3. On the Rings page, next to the ring containing your clusters, click Enable HA.

  4. Click Enable to confirm setup. The High Availability package will be installed on each cluster and automatically configured.

  5. Click Close. HA is now enabled for the ring.

    [Image: manage_rings.png]

Config HA

KAgent exposes the most important High Availability configuration settings via the KAgent UI. Information about the settings can be found in the table below.

Tip

All HA configuration settings can be edited via the etcd modify configuration window on the KAgent Rings dashboard. However, the settings can only be edited while the database is shut down.

Setting | Description | Default Value
qos | Quality of service; limits the number of messages that are prefetched and queued locally on the cluster for better consumption performance. | 1000
startup_queue_limit | The database appears down until the queue is drained of the set number of requests. | 100000
max_request_failures | Determines how many times a cluster can fail to receive a request that was successful on another cluster. | 3
request_failure_pause | Time to pause in seconds between request send retries. | 3
discard_failed_requests | If true, then requests that exceed the maximum failures (and timeout) will be discarded. | true
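
How the three retry-related settings might interact can be sketched in Python. This is only an illustration under stated assumptions, not the HA service's actual implementation: the function `replay_request`, its `send` callback, and the "kept" outcome are hypothetical, the sketch gives up once the failure count reaches max_request_failures (the real threshold behavior may differ), and the timeout mentioned in the table is not modeled.

```python
import time
from typing import Callable, Optional

# Defaults taken from the settings table above.
HA_DEFAULTS = {
    "max_request_failures": 3,
    "request_failure_pause": 3,  # seconds
    "discard_failed_requests": True,
}


def replay_request(
    send: Callable[[], bool],
    settings: Optional[dict] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try to deliver one request; return 'delivered', 'discarded', or 'kept'.

    Hypothetical sketch: `send` stands in for forwarding a request to a
    cluster and returns True on success.
    """
    settings = HA_DEFAULTS if settings is None else settings
    failures = 0
    while True:
        if send():
            return "delivered"
        failures += 1
        # Assumption: stop retrying once the failure count reaches the limit.
        if failures >= settings["max_request_failures"]:
            break
        # Pause between send retries.
        sleep(settings["request_failure_pause"])
    # Assumption: what happens to a request that is not discarded is not
    # described in the table, so this sketch only reports it as 'kept'.
    return "discarded" if settings["discard_failed_requests"] else "kept"
```

With the defaults, a request that never succeeds is attempted three times with two 3-second pauses in between, and is then reported as discarded.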

To update the settings:

  1. Log into the KAgent service with a web browser:

    http://<kagent-host>:8081
    
  2. From the left menu, click Manage.

  3. On the Rings page, next to the ring containing your clusters, click Config HA.

  4. Adjust the settings as necessary.

  5. Click Update.

APIs

Any client API connection made to an HA cluster ring should be configured to fail over from one cluster to another even during a node failover or switchover. For more information on the HA failover modes, consult High Availability Architecture. The enabled failover mode (active/active or active/passive) is governed by whether a primary URL is specified or implied as follows:

  • Active/Active - No primary URL is specified and more than one URL is specified
  • Active/Passive - A primary URL is specified or a single URL is specified
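
The rule above can be expressed as a small Python function. This is a sketch of the documented rule only; the name `failover_mode` and its parameters are hypothetical and are not part of any Kinetica API.

```python
from typing import Optional, Sequence


def failover_mode(urls: Sequence[str], primary_url: Optional[str] = None) -> str:
    """Return the failover mode implied by the connection arguments.

    Hypothetical helper that mirrors the documented rule; it is not part
    of the Kinetica client APIs.
    """
    if not urls:
        raise ValueError("at least one cluster URL is required")
    # A primary URL, whether given explicitly or implied by a single URL,
    # selects active/passive.
    if primary_url is not None or len(urls) == 1:
        return "active/passive"
    # Several URLs and no primary URL selects active/active.
    return "active/active"
```

For example, three URLs with no primary URL give active/active, while the same three URLs with any one of them named as primary give active/passive.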

In either failover mode, if a cluster is not reachable, the API will attempt to re-establish the connection to the current cluster a configurable number of times. If no response is received, the API assumes the head node has failed or switched over, and it contacts all worker processes in the current cluster to try to obtain an address for the head node. If a valid, new address is received, the connection is retried against it. If the address is the same as it was prior to the failover/switchover, or if no response is received from the worker processes, another cluster is selected from the list of known hosts to take over the connection. If no other cluster can take over the connection, the connection fails.
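
The sequence described above can be sketched in Python as follows. This is a simplified model, not the client libraries' actual code: `resolve_connection`, `is_reachable`, and `ask_workers_for_head` are hypothetical names, and the sketch assumes that a new head node address which also turns out to be unreachable is treated like no address at all.

```python
from typing import Callable, Optional, Sequence


def resolve_connection(
    current_head: str,
    other_clusters: Sequence[str],
    is_reachable: Callable[[str], bool],
    ask_workers_for_head: Callable[[], Optional[str]],
    max_retries: int = 3,
) -> str:
    """Return the head node URL to use after a connection problem.

    Hypothetical sketch; the callbacks stand in for real network calls.
    """
    # 1. Retry the current cluster a configurable number of times.
    for _ in range(max_retries):
        if is_reachable(current_head):
            return current_head

    # 2. Ask the workers of the current cluster for a head node address.
    new_head = ask_workers_for_head()
    if new_head is not None and new_head != current_head:
        # Assumption: an unreachable new address falls through to step 3.
        if is_reachable(new_head):
            return new_head

    # 3. Fall back to another cluster from the list of known hosts.
    for url in other_clusters:
        if is_reachable(url):
            return url

    # 4. No cluster could take over the connection.
    raise ConnectionError("no operational clusters found")
```

Passing the network checks in as callbacks keeps the sketch testable without a running cluster.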

C++

Tip

For more information on C++ API database object instantiation, review the C++ API Reference.

Active/Active Configuration

To instantiate a Kinetica connection object with failover in C++, pass a comma-delimited list of head node URLs to the constructor:

gpudb::GPUdb gpudb("http://172.1.2.3:9191,http://172.1.2.4:9191,http://172.1.2.5:9191");

In this case, a cluster will be chosen randomly or sequentially (based on database object configuration) from the given list for the initial connection. Subsequent requests through the instantiated connection object will go to the same cluster.

Important

If you provide a single URL to the GPUdb constructor, the failover mode will instead be Active/Passive and the URL will be treated as the primary URL.

Active/Passive Configuration

To designate a cluster from the list to always attempt to go to first, specify a primary URL:

gpudb::GPUdb::Options options = gpudb::GPUdb::Options().setPrimaryUrl("http://172.1.2.4:9191");
gpudb::GPUdb gpudb("http://172.1.2.3:9191,http://172.1.2.4:9191,http://172.1.2.5:9191", options);

In either case, if the current cluster has a failure, the connector will randomly or sequentially (based on database object configuration) choose a failover cluster from the list to send further requests. If no operational clusters are found, an error will be returned.

Java

Tip

For more information on Java API database object instantiation, review the Java API Reference.

Active/Active Configuration

To instantiate a Kinetica connection object with failover in Java, pass a comma-delimited list of head node URLs to the constructor:

GPUdb gpudb = new GPUdb("http://172.1.2.3:9191,http://172.1.2.4:9191,http://172.1.2.5:9191");

In this case, a cluster will be chosen randomly or sequentially (based on database object configuration) from the given list for the initial connection. Subsequent requests through the instantiated connection object will go to the same cluster.

Important

If you provide a single URL to the GPUdb constructor, the failover mode will instead be Active/Passive and the URL will be treated as the primary URL.

Active/Passive Configuration

To designate a cluster from the list to always attempt to go to first, specify a primary URL:

GPUdb.Options options = new GPUdb.Options().setPrimaryUrl("http://172.1.2.4:9191");
GPUdb gpudb = new GPUdb("http://172.1.2.3:9191,http://172.1.2.4:9191,http://172.1.2.5:9191", options);

In either case, if the current cluster has a failure, the connector will randomly or sequentially (based on database object configuration) choose a failover cluster from the list to send further requests. If no operational clusters are found, an error will be returned.

Python

Tip

For more information on Python API database object instantiation, review the Python API Reference.

Active/Active Configuration

To instantiate a Kinetica connection object with failover in Python, pass a list of head node URLs to the constructor:

kinetica = gpudb.GPUdb(host=['http://172.1.2.3:9191','http://172.1.2.4:9191','http://172.1.2.5:9191'])

In this case, a cluster will be chosen randomly or sequentially (based on database object configuration) from the given list for the initial connection. Subsequent requests through the instantiated connection object will go to the same cluster.

Active/Passive Configuration

To designate a cluster from the list to always attempt to go to first, specify a primary host:

kinetica = gpudb.GPUdb(host=['http://172.1.2.3:9191','http://172.1.2.4:9191','http://172.1.2.5:9191'], primary_host='http://172.1.2.3:9191')

In either case, if the current cluster has a failure, the connector will randomly or sequentially (based on database object configuration) choose a failover cluster from the list to send further requests. If no operational clusters are found, an error will be returned.
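
The random or sequential choice of clusters, with an optional primary tried first, can be illustrated with a short standalone sketch. It does not use or reproduce the gpudb package; `cluster_order` and its `randomize` flag are hypothetical names standing in for the database object configuration mentioned above.

```python
import random
from typing import List, Optional, Sequence


def cluster_order(
    urls: Sequence[str],
    primary: Optional[str] = None,
    randomize: bool = True,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return the order in which clusters would be attempted.

    Hypothetical sketch: the primary (if any) always comes first and the
    remaining clusters follow in random or sequential order.
    """
    rng = rng or random.Random()
    # Everything except the primary is a failover candidate.
    rest = [u for u in urls if u != primary]
    if randomize:
        rng.shuffle(rest)
    head = [primary] if primary is not None else []
    return head + rest


urls = [
    "http://172.1.2.3:9191",
    "http://172.1.2.4:9191",
    "http://172.1.2.5:9191",
]
# Sequential order with a primary: the primary first, then the others as listed.
print(cluster_order(urls, primary="http://172.1.2.4:9191", randomize=False))
```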

Connectors

Any Kinetica connectors used to interface with an HA cluster ring should also be configured to fail over from one cluster to another.

ODBC

See Failover Connections for the ODBC/JDBC failover configuration.

External Data

When using an external table or when loading data via /insert/records/fromfiles (LOAD INTO, in SQL), the source of data needs to be accessible to all clusters within the ring or it needs to be synced between locations accessible to each cluster individually.

Similarly, when using a data source, the source needs to be accessible to all clusters within the ring.

Management

Once HA has been configured, several commands are available to aid in the management of the cluster. Run the following service with one of the commands from the table below:

    service gpudb-ha <command>

Command | Description
all-start | Starts the gpudb, gpudb-ha, and gpudb-mq services.
all-status | Displays the status of the gpudb, gpudb-ha, and gpudb-mq services.
all-stop | Stops the gpudb, gpudb-ha, and gpudb-mq services.
gpudb-start | Starts the gpudb and gpudb-ha services.
gpudb-status | Displays the status of the gpudb and gpudb-ha services.
gpudb-stop | Stops the gpudb and gpudb-ha services.
ha-restart | Restarts the gpudb-ha service.
ha-start | Starts the gpudb-ha service.
ha-status | Displays the status of the gpudb-ha service.
ha-stop | Stops the gpudb-ha service.
mq-restart | Restarts the gpudb-mq service.
mq-start | Starts the gpudb-mq service.
mq-status | Displays the status of the gpudb-mq service.
mq-stop | Stops the gpudb-mq service.
restart | Restarts the gpudb-ha and backup processor services.
start | Starts the gpudb-ha and backup processor services.
status | Displays the status of the gpudb-ha and backup processor services.
stop | Stops the gpudb-ha and backup processor services.
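
For scripted administration, the command list above can be wrapped in a small Python helper that rejects anything not in the table before invoking the service. This is a convenience sketch, not a Kinetica tool: `ha_service_argv` and `run_ha_command` are hypothetical names, and the sketch assumes the `service` executable is on the PATH and the script has sufficient privileges.

```python
import subprocess
from typing import List

# Commands accepted by `service gpudb-ha`, as listed in the table above.
HA_COMMANDS = frozenset({
    "all-start", "all-status", "all-stop",
    "gpudb-start", "gpudb-status", "gpudb-stop",
    "ha-restart", "ha-start", "ha-status", "ha-stop",
    "mq-restart", "mq-start", "mq-status", "mq-stop",
    "restart", "start", "status", "stop",
})


def ha_service_argv(command: str) -> List[str]:
    """Build the argument list for `service gpudb-ha <command>`."""
    if command not in HA_COMMANDS:
        raise ValueError(f"unknown gpudb-ha command: {command!r}")
    return ["service", "gpudb-ha", command]


def run_ha_command(command: str) -> subprocess.CompletedProcess:
    """Run the command and return the completed process (hypothetical wrapper)."""
    # check=False leaves interpretation of the exit code to the caller.
    return subprocess.run(
        ha_service_argv(command), capture_output=True, text=True, check=False
    )
```

Calling `run_ha_command("ha-status")` would run `service gpudb-ha ha-status` and capture its output for inspection.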