KAgent

KAgent is a multi-faceted administration, installation, and configuration management tool. It provides a centralized way to perform a consistent install across a cluster of computers, whether the cluster already exists or has yet to be provisioned in the cloud. KAgent can also help automate tasks such as provisioning cloud hardware, configuring cluster security, adding and removing nodes, backing up and restoring data, monitoring cluster health, and managing and configuring cluster high availability. It has both a graphical (web) interface and a command line interface.

Features

KAgent facilitates or directly performs the following operations:

UI

The KAgent UI is usually available on port 8081 of the desired machine, so it can be accessed via http://<kagent-host>:8081. The KAgent UI has a navigation pane on the left-hand side and a Notifications pane on the right-hand side. Review Logging In / Out for information on accessing the KAgent UI.

[Figure: KAgent UI dashboard]

Notifications

The Notifications pane lists notifications from metric or event Alerts. If there are no unread notifications, click See Past Notifications to open the Alert History. Click Mark All as Read to mark all unread notifications as read; click Mark as Read to mark an individual unread notification as read.

Logging In / Out

Once a cluster is added to KAgent (either via the installation or cluster addition process), users must log in to KAgent any time they need to use any of its features. Conversely, if no clusters are in KAgent, there is no need to log in, and thus no way to log out (the button won't be available) until a cluster is added. To log in to KAgent:

  1. Navigate to KAgent (http://<kagent-host>:8081)

  2. Provide a username for the Username field. Only System Admin users have access to KAgent.

  3. Provide a password for the Password field.

  4. Select a cluster from the Authentication Cluster drop-down menu.

  5. Click Log In.

    Important

    Authenticating against a particular cluster does not restrict users from accessing other clusters that have been added to this particular instance of KAgent.

After a successful login, the KAgent UI displays the Dashboard page by default.

To log out of KAgent:

  1. From the KAgent UI (http://<kagent-host>:8081), click Logout in the navigation pane.

CLI

The KAgent CLI is available via the kagent executable typically stored in /opt/gpudb/kagent/bin/.

The form of the command is as follows:

kagent [-h] [--debug] [--quiet] [-f <path>] [--kagent-dir <path>]
   [-o <format>] [--user <username>] [-v]
   < ring | cluster | node | log | check | etcd-control |
   factory-reset | get-etcd-credentials | monitor | refresh-config | update >

Options

Option                  Description
-h, --help              Show the help menu. When used following one of the subcommands, the subcommand-specific help menu will be shown.
--debug                 Log debug messages.
--quiet                 Suppress all output messages.
-f, --dbfile <path>     Path to KAgent cluster configuration database.
--kagent-dir <path>     Path to where KAgent and its playbooks reside.
-o, --output <format>   The output message format to use:

  • human - (default) output in human-readable form
  • json - output in JSON format

--user <username>       Specify what user is performing the actions, for logging purposes.
-v, --verbose           Run ansible-playbook with -vvv to debug issues.

Subcommands

ring                    Manage rings.
cluster                 Manage clusters.
node                    Manage nodes.
log                     Manage logs.
check                   Check cluster connectivity.
etcd-control            Manage etcd nodes.
factory-reset           Reset KAgent to its original state and uninstall Kinetica packages.
get-etcd-credentials    Show kinetica-etcd credentials autogenerated during package installation.
monitor                 Set a monitor for checking cluster connectivity.
refresh-config          Force a refresh of the clusters from current status.
update                  Update global KAgent settings.
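
As a rough illustration of the command form, the sketch below shows where global options sit relative to the subcommand. It is a dry run by default: the KAGENT variable is a helper assumed for this example only (it is not part of KAgent) and prefixes each command with echo so that it is printed rather than executed. The username jdoe is hypothetical.

```shell
# Dry run by default: each command is printed instead of executed.
# KAGENT is a helper variable for this sketch only; to run for real, set
# KAGENT=/opt/gpudb/kagent/bin/kagent (the typical install location).
# $KAGENT is left unquoted on purpose so "echo kagent" splits into two words.
KAGENT="${KAGENT:-echo kagent}"

# Top-level help, then the help menu specific to the "ring" subcommand.
$KAGENT -h
$KAGENT ring -h

# Global options go before the subcommand: JSON output, with the acting
# user (hypothetical name "jdoe") recorded for logging purposes.
$KAGENT -o json --user jdoe cluster list
```

With the dry-run default, the last line prints `kagent -o json --user jdoe cluster list`.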

Ring

The form of the command to manage rings is as follows:

kagent ring <command>

Ring Command    Description

add [options] <name>

Add a new ring with the given name.

  Option              Description
  -a, --addr <addr>   Specify the load balancer address for the ring.

backup [options] <name>

Back up an existing ring with the given name.

  Option                 Description
  --backup-path <path>   Specify the path in which the backup will be created. Once given, this path becomes the default backup path for subsequent backups. The initial backup directory is /opt/backups.

control <name> <operation> <component>

Control the services of the ring with the given name.

Apply one of the following operations:

  • start
  • stop
  • restart

To one of the following components:

  • gpudb
  • host_manager
  • tomcat
  • reveal
  • kml
  • stats
  • text_search
  • httpd
  • ha
  • mq
  • all_gpudb
  • all

force-lock <name>

Force the ring with the given name to lock.

force-unlock <name>

Force the ring with the given name to unlock.

gather-logs [options] <name>

Download the logs from the ring with the given name to a destination on the KAgent host.

  Option                     Description
  --backtrace                Add a process backtrace to the database process.
  --kagent-logs              Add the KAgent logs to the archive, up to the point at which they are collected.
  --log-lines <line_count>   Specify the number of lines to collect from the log. The first 100 lines are always saved. Use 0 to collect the entire log or ERROR to collect only error log messages. Default is 100,000 log lines.
  --output-dir <path>        Specify the path where the log archive will be written. The directory must be writable by the gpudb user on the KAgent host.
  --package-verify           Verify the installed Kinetica packages.

inspect <name>

Inspect the details of the ring with the given name.

install <name>

Install the HA platform on the ring with the given name.

list

List all managed rings.

rabbit-recovery [options] <name>

Attempt a RabbitMQ recovery; the process will clear all queues.

  Option               Description
  --proceed <yes|no>   Pass the final recovery confirmation (yes or no).

remove <name>

Remove the ring with the given name.

update [options] <name>

Update the details of the ring with the given name.

  Option                  Description
  --addr <addr>           Specify the load balancer address for the ring.
  --ha-enabled <yes|no>   Specify that HA has been enabled (yes) or not (no) for the ring.

upgrade [options] <name>

Upgrade Kinetica to the latest version on the ring with the given name. This will perform a sequential in-place upgrade of each cluster within the ring.

  Option                              Description
  --offline-aaw-installer <path>      Specify the file path or URL of the location for the AAW installer package (rpm, deb).
  --offline-core-installer <path>     Specify the file path or URL of the location for the gpudb installer package (rpm, deb).
  --offline-rabbit-installer <path>   Specify the file path or URL of the location for the gpudb HA installer package (rpm, deb).
  --offline-etcd-installer <path>     Specify the file path or URL of the location for the kinetica-etcd installer package (rpm, deb).
  --etcd-node-hostnames <list>        Specify a comma-separated list of hostnames of existing nodes in the ring that will have kinetica-etcd installed. Use only when upgrading from versions prior to 7.1. If etcd is already installed, this option will be ignored.
  --rabbit-drain-timeout <timeout>    Specify the timeout in minutes that the upgrade will wait for queues to drain before beginning the upgrade. The upgrade will be aborted if the queues are not empty. Default is 3.
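
The ring commands above might be combined along the following lines. This is a minimal dry-run sketch, not a prescribed workflow: the ring name demo_ring, the load balancer address lb.example.com, and the output directory are all hypothetical, and the KAGENT helper variable is an assumption of this example that prints each command instead of running it.

```shell
# Dry run by default: commands are printed, not executed.
# Set KAGENT=/opt/gpudb/kagent/bin/kagent to run them for real.
KAGENT="${KAGENT:-echo kagent}"
RING="demo_ring"   # hypothetical ring name

# Create the ring behind a (hypothetical) load balancer, then look at it.
$KAGENT ring add --addr lb.example.com "$RING"
$KAGENT ring list
$KAGENT ring inspect "$RING"

# Restart every component on the ring: <name> <operation> <component>.
$KAGENT ring control "$RING" restart all

# Collect only error messages from the logs into a chosen directory.
$KAGENT ring gather-logs --log-lines ERROR --output-dir /tmp/kagent-logs "$RING"
```

Options come before the ring name for the commands documented as `[options] <name>`, while control takes its three positional arguments in the order name, operation, component.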

Cluster

The form of the command to manage clusters is as follows:

kagent cluster <command>

Cluster Command    Description

backup [options] <name>

Back up the data on the cluster with the given name.

  Option                  Description
  --backup-path <path>    Specify the path in which the backup will be created. Once given, this path becomes the default backup path for subsequent backups. The initial backup directory is /opt/backups.
  --list-schedule         List backup schedules by backup type.
  --schedule <schedule>   Specify the backup schedule. Use now to run an immediate backup. Use a quoted crontab-style expression to schedule a backup in cron. Use never to remove a backup schedule. Default is now.
  --table-list <list>     Specify a space-separated list of tables to back up.

backup-configuration-files [options] <name>

Back up all configuration files on the cluster with the given name.

  Option                 Description
  --backup-path <path>   Specify the path in which the backup will be created. Once given, this path becomes the default backup path for subsequent backups. The initial backup directory is /opt/backups. The directory must be writable by the gpudb user on the KAgent host.

backup-schedule <name>

List scheduled backups on the cluster with the given name.

bootstrap-kagent [options] <name>

Bootstrap the KAgent role to a different host in the cluster; further cluster management must happen through the KAgent on this different host.

  Option                     Description
  --kagent-hostname <host>   Specify the name of the host where KAgent will be bootstrapped.

check-for-upgrades <name>

Check if upgrades are available online for the cluster with the given name.

clone [options]

Clone one cluster into another.

  Option                      Description
  --authentication <yes|no>   Specify whether to copy (yes) or not copy (no) authentication settings.
  --data <yes|no>             Specify whether to copy (yes) or not copy (no) data.
  --destination <name>        Specify the name of the cluster to clone to.
  --graph <yes|no>            Specify whether to copy (yes) or not copy (no) persisted graph information.
  --source <name>             Specify the name of the cluster to clone from.
  --users <yes|no>            Specify whether to copy (yes) or not copy (no) users and permissions.

control <name> <operation> <component>

Control the services of the cluster with the given name.

Apply one of the following operations:

  • start
  • stop
  • restart

To one of the following components:

  • gpudb
  • host_manager
  • tomcat
  • reveal
  • kml
  • stats
  • text_search
  • httpd
  • ha
  • mq
  • all_gpudb
  • all

gather-logs [options] <name>

Download the logs from the cluster with the given name to a destination on the KAgent host.

  Option                     Description
  --backtrace                Add a process backtrace to the database process.
  --kagent-logs              Add the KAgent logs to the archive, up to the point at which they are collected.
  --log-lines <line_count>   Specify the number of lines to collect from the log. The first 100 lines are always saved. Use 0 to collect the entire log or ERROR to collect only error log messages. Default is 100,000 log lines.
  --output-dir <path>        Specify the path where the log archive will be written. The directory must be writable by the gpudb user on the KAgent host.
  --package-verify           Verify the installed Kinetica packages.

get-conf-properties <name>

Show database configuration properties of the cluster with the given name.

get-logger [options] <name>

Show logger and logging level for the cluster with the given name. To list the available loggers, run:

kagent cluster get-logger --ranks 0 <name>

  Option            Description
  --logger <name>   Specify the name of the logger to show.
  --ranks <ranks>   Specify the number of the rank from which to retrieve logging config.

init [options] <name>

Initialize a new cluster with the given name.

  Option                              Description
  --ring <name>                       Specify the name of the ring in which to place this cluster.
  -k, --ssh-key <path>                Specify the path to the SSH private key to use for cluster operations.
  -u, --ssh-user <username>           Specify the SSH username to use for cluster operations. This overrides the KAGENT_SSH_USER environment variable.
  -p, --ssh-password <password>       Specify the SSH password to use for the SSH user. This overrides the KAGENT_SSH_PASS environment variable.
  -su, --sudo-user <username>         Specify the sudo username to use for cluster operations, in the case where root logins are not allowed.
  --sudo-password <password>          Specify the sudo password to use for the sudo user. This overrides the KAGENT_SUDO_PASSWORD environment variable.
  -admpass, --admin-pass <password>   Specify the Kinetica admin user password.
  --connect-via <method>              Specify whether to connect to each node's internal IP address (ip_addr) or public IP address (public_ip_addr).
  -inf, --infrastructure-provider <provider_code>  Specify the cluster's infrastructure provider:

      Provider Code   Description
      onprem          On-premise (bare-metal) installation, or a cloud-based installation not provisioned via KAgent
      aws             Amazon Web Services, provisioned via KAgent
      azure           Microsoft Azure, provisioned via KAgent
      gcp             Google Cloud Platform, provisioned via KAgent

  -lic, --lic-key <key>               Specify the license key to use for this cluster.
  --aws-access-key <key>              Specify the AWS access key to use for cluster provisioning and operations. This overrides the KAGENT_AWS_ACCESS_KEY environment variable.
  --aws-secret-key <key>              Specify the AWS secret key to use for cluster provisioning and operations. This overrides the KAGENT_AWS_SECRET_KEY environment variable.
  --aws-ssh-key-name <name>           Specify the name of the SSH key to use to log into cluster nodes. If none is provided, a key will be created.
  --azure-client-id <id>              Specify the client id from the Azure login profile, usually found in . This overrides the KAGENT_AZURE_CLIENT_ID environment variable.
  --azure-secret <secret>             Specify the secret from the Azure login profile, usually found in . This overrides the KAGENT_AZURE_SECRET environment variable.
  --azure-subscription-id <id>        Specify the subscription id from the Azure login profile, usually found in . This overrides the KAGENT_AZURE_SUBSCRIPTION_ID environment variable.
  --azure-tenant <tenant>             Specify the tenant from the Azure login profile, usually found in . This overrides the KAGENT_AZURE_TENANT environment variable.
  --cloud-region <region>             Specify the AWS region, Azure location, or GCP zone for the cluster.
  --cloud-ssh-user <username>         Specify the username to create a login for on Azure or GCP provisioned instances.
  --cloud-ssh-public-key-file <path>  Specify the path to the public key to use for authentication on Azure or GCP instances.
  --gcp-project <project>             Specify the GCP project with which this cluster should be associated.
  --gcp-service-account-file <path>   Specify the GCP service account file (JSON) for the user.

inspect <name>

Inspect the details of the cluster with the given name.

install [options] <name>

Install Kinetica on a new cluster. Note: specifying any offline installer will switch the install to offline mode.

  Option                              Description
  --auto-config <yes|no>              Whether to update (yes) or not update (no) the configuration on the cluster during install. Default is to update the configuration.
  -c, --cuda <yes|no>                 Whether to use a CUDA (GPU) build (yes) or Intel (CPU) build (no).
  --enable-np1 <yes|no>               Whether to enable (yes) or not enable (no) cluster resiliency.
  --k8s-config-file <path>            Specify the path to the kubeconfig file of the external K8s cluster which AAW will use.
  --k8s-public-ip <addr>              Specify the IP address at which the K8s cluster is accessible by the Kinetica cluster.
  -nv, --nvidia <yes|no>              Whether to install (yes) or not install (no) the Nvidia driver when none is detected.
  --open-firewall-ports <yes|no>      Whether to open (yes) or not open (no) relevant firewall ports if an enabled firewall is detected.
  --offline-aaw-installer <path>      Specify the file path or URL of the location for the AAW installer package (rpm, deb).
  --offline-core-installer <path>     Specify the file path or URL of the location for the gpudb installer package (rpm, deb).
  --offline-etcd-installer <path>     Specify the file path or URL of the location for the kinetica-etcd installer package (rpm, deb).
  --offline-kagent-installer <path>   Specify the file path or URL of the location for the KAgent installer package (rpm, deb).
  --offline-nvidia-installer <path>   Specify the file path or URL of the location for the Nvidia installer package (rpm, deb).
  --offline-rabbit-installer <path>   Specify the file path or URL of the location for the gpudb HA installer package (rpm, deb).
  --reserve-k8s-gpus <number>         Specify the number of GPUs to reserve for K8s/AAW usage.

list

List all managed clusters.

list-backup-contents [options] <name>

List the contents of a backup on the cluster with the given name.

  Option                  Description
  --backup-path <path>    Specify the path to the backup directory.
  --restore-from <path>   Specify the backup whose contents will be listed; this will be the name of a backup directory under the path given in --backup-path.

list-backups [options] <name>

List the available backups on the cluster with the given name.

  Option                 Description
  --backup-path <path>   Specify the path to the backup directory.

list-cluster-contents <name>

List all of the tables on the cluster with the given name.

preflight <name>

Detect/regenerate environment settings for running KAgent commands on the cluster with the given name.

remove <name>

Remove the cluster with the given name.

restore [options] <name>

Restore the contents of a backup to the cluster with the given name.

  Option                        Description
  --backup-path <path>          Specify the path to the backup directory.
  --preserve-persist <yes|no>   Whether to move (yes) or not move (no) the existing database persist folder to a safe location before overwriting. Default is no.
  --restore-from <path>         Specify the backup to restore; this will be the name of a backup directory under the path given in --backup-path.
  --table-list <list>           Specify a space-delimited set of tables to restore from the backup.

secure [options] <name>

Secure the cluster with the given name by enabling HTTPS and/or authentication via LDAP, Active Directory, or Kerberos. Note: All parameters relevant to the desired authentication mechanism must be specified upon each invocation of this command; no existing settings will be used as defaults.

  Option                           Description
  --authentication <type>          Specify the type of authentication to use:

      • none
      • ad
      • kerberos
      • ldap

  --generate-certs <yes|no>        Whether to generate (yes) or not generate (no) self-signed certificates. Certificates can also be assigned directly to each node with the kagent node command.
  --ldap-host <name>               When using LDAP, the name of the LDAP bind host.
  --ldap-port <port>               When using LDAP, the port to bind to.
  --ldap-base-filter <filter>      When using LDAP, the filter to use when searching the directory for logins.
  --ldap-bind-user <username>      When using LDAP, the username of the account to use when connecting to the directory.
  --ldap-bind-pwd <password>       When using LDAP, the password of the account to use when connecting to the directory. This overrides the KAGENT_LDAP_BIND_PWD environment variable.
  --kerberos-realm <realm>         When using Kerberos, the realm to authenticate against. For example: MY-REALM.ACME.COM.
  --kerberos-service-name <name>   When using Kerberos, specify the Kerberos service location. For example: HTTP/kerb-server.acme.com.
  --kerberos-keytab <path>         When using Kerberos, specify the path to the keytab file to use.

set-conf-properties [options] <name>

Set database configuration properties for the cluster with the given name.

  Option                   Description
  --properties-map <map>   Specify a map of key-value pairs of database configuration parameters to set. For example:

{"np1.load_vectors_on_migration":"always"}

set-failover [options] <name>

Set the failover policies for the cluster with the given name.

  Option                              Description
  --enable-head-failover <yes|no>     Whether to allow (yes) or not allow (no) failover of the head node if cluster resiliency is enabled.
  --enable-worker-failover <yes|no>   Whether to allow (yes) or not allow (no) failover of worker nodes if cluster resiliency is enabled.

set-logger [options] <name>

Set logger and logging level for the cluster with the given name.

  Option            Description
  --level <level>   Specify the level of logging for the selected logger(s). One of:

      • TRACE
      • DEBUG
      • INFO
      • WARN
      • ERROR
      • FATAL
      • OFF

  --logger <name>   Specify the name of the logger to modify.
  --ranks <ranks>   Specify the number of the rank to which the logging modification will be applied. A comma-separated list of rank numbers can be used to specify multiple ranks to modify; e.g., 0,3,4. Use -1 to apply the modification across the cluster.

uninstall <name>

Remove the cluster with the given name, including all components except this instance of KAgent.

update [options] <name>

Modify select parameters of the cluster with the given name.

  Option                    Description
  --connect-via <method>    Specify whether to connect to each node's internal IP address (ip_addr) or public IP address (public_ip_addr).
  --is-installed <yes|no>   Whether to mark (yes) or not mark (no) this cluster as installed.
  --move-to-ring <name>     Specify the name of an existing ring to move this cluster into.

verify [options] <name>

Verify connectivity and basic configuration of the cluster with the given name.

  Option                 Description
  --include-dependency   Include info from related cluster services like RabbitMQ & etcd.
  --status-only          Only gather the service status on the nodes.

write-inventory [options] <name>

Write out an inventory file for the cluster with the given name.

  Option                        Description
  -i, --inventory-dir <path>    The path to write the inventory file to. Default is ./ansible-inventory-<cluster_name>.
  --vault-password <password>   Specify the password to use for the ansible vault. This overrides the KAGENT_VAULT_PASSWORD environment variable.
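
Two of the cluster options take values that need shell quoting: the crontab-style --schedule expression and the --properties-map map. The sketch below illustrates both in a dry run. The cluster name demo_cluster is hypothetical, the KAGENT helper variable is an assumption of this example, and passing the map as a single quoted argument is likewise an assumption based on the example value shown for that option.

```shell
# Dry run by default: commands are printed, not executed.
# Set KAGENT=/opt/gpudb/kagent/bin/kagent to run them for real.
KAGENT="${KAGENT:-echo kagent}"
CLUSTER="demo_cluster"      # hypothetical cluster name
BACKUP_DIR="/opt/backups"   # the documented initial backup directory

# Check connectivity and basic configuration, including RabbitMQ and etcd.
$KAGENT cluster verify --include-dependency "$CLUSTER"

# Schedule a backup for 02:00 every day. The crontab expression is quoted so
# the shell passes it as one argument and does not expand the asterisks.
$KAGENT cluster backup --schedule "0 2 * * *" --backup-path "$BACKUP_DIR" "$CLUSTER"

# See which backups exist under the backup directory.
$KAGENT cluster list-backups --backup-path "$BACKUP_DIR" "$CLUSTER"

# Set one configuration property; single quotes keep the braces and the
# inner double quotes of the map intact.
$KAGENT cluster set-conf-properties \
    --properties-map '{"np1.load_vectors_on_migration":"always"}' "$CLUSTER"
```

Because the dry run goes through echo, the printed commands appear without their quotes; the quoting still matters when the commands are run for real.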

Node

The form of the command to manage nodes is as follows:

kagent node <command>

Node Command    Description

discover-hostname [options] <name>

Attempt to auto-discover (and update) the hostname of the node with the given name.

gather-logs [options] <name>

Download the logs from the node with the given name to a destination on the KAgent host.

  Option                     Description
  --backtrace                Add a process backtrace to the database process.
  --kagent-logs              Add the KAgent logs to the archive, up to the point at which they are collected.
  --log-lines <line_count>   Specify the number of lines to collect from the log. The first 100 lines are always saved. Use 0 to collect the entire log or ERROR to collect only error log messages. Default is 100,000 log lines.
  --output-dir <path>        Specify the path where the log archive will be written. The directory must be writable by the gpudb user on the KAgent host.
  --package-verify           Verify the installed Kinetica packages.

init [options] <name> <addr> <cluster>

Initialize a new node with the given name and IP address on the given cluster. The name must be unique across all nodes in the cluster.

  Option                         Description
  --cloud-instance-name <name>   Specify an optional name for the node.
  --cloud-instance-type <type>   Specify the type of the node, based on the cloud provider.
  --data-size <size>             Specify the size of the storage to allocate for the node in GB.
  --gcp-gpu-card <card>          Specify the GPU card to attach to the node (if available and using GCP as the provider).
  --public-ip-addr <addr>        Specify the IP address of the node accessible outside the DMZ, if applicable.
  --public-hostname <hostname>   Specify the hostname of the node accessible outside the DMZ, if applicable.
  --roles <list>                 Specify a comma-separated list of roles for the node:

      head        Head node for the cluster
      worker      One of the worker nodes in the cluster
      graph       Graph node
      kml         AAW node
      ha_queues   RabbitMQ node for ring resiliency
      kagent      Bootstrapped in-cluster KAgent for cluster/ring management
      etcd        etcd configuration management node for the associated KAgent

  --ssh-port <port>              Specify the port to use for SSH connections to the node.
  --ssl-cert <path>              Specify the path to the SSL certificate for the node.
  --ssl-key <path>               Specify the path to the SSL key for the node.

inspect <name>

Inspect the details of the node with the given name.

list

List all managed nodes.

remove [options] <name>

Remove the node with the given name.

  Option        Description
  -f, --force   Always remove the node, even if some aspect of the removal fails.

set-failover [options] <name>

Set the failover policy for the node with the given name.

  Option                        Description
  --accepts-failover <yes|no>   Whether to accept (yes) or not accept (no) failover from other nodes if cluster resiliency is enabled.

switchover <src_hostname> <dst_hostname>

Immediately switch all components & processes running on a source node with the given src_hostname to a destination node in the same cluster with the given dst_hostname. The following caveats apply:

  • AAW moves with rank0
  • KAgent, RabbitMQ, & etcd will not move

update [options] <name>

Modify select parameters of the node with the given name.

  Option                         Description
  --public-hostname <hostname>   Specify the hostname of the node accessible outside the DMZ, if applicable.
  --ssl-cert <path>              Specify the path to the SSL certificate for the node.
  --ssl-key <path>               Specify the path to the SSL key for the node.
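
The positional arguments of the node commands are easy to mix up, so here is a short dry-run sketch of their order. The node name worker2, the address 192.0.2.12, the cluster name demo_cluster, and both hostnames are hypothetical, and the KAGENT helper variable is an assumption of this example.

```shell
# Dry run by default: commands are printed, not executed.
# Set KAGENT=/opt/gpudb/kagent/bin/kagent to run them for real.
KAGENT="${KAGENT:-echo kagent}"

# init takes options first, then <name> <addr> <cluster>.
$KAGENT node init --roles worker --ssh-port 22 worker2 192.0.2.12 demo_cluster
$KAGENT node inspect worker2

# Let this node accept failover from other nodes.
$KAGENT node set-failover --accepts-failover yes worker2

# switchover takes the source hostname first, then the destination hostname.
$KAGENT node switchover host-a.example.com host-b.example.com
```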

Log

The form of the command to manage KAgent logs is as follows:

kagent log <command>

Log Command    Description

list [options]

Show a list of KAgent log events.

  Option                  Description
  -n, --number <number>   Specify the maximum number of log events to show.

Check

This command is used to ensure that all the nodes of a cluster are up by checking for connectivity and then interjecting spare nodes, if available, to fill in any gaps.

The form of the command to perform this check is as follows:

kagent check [options]

Option                    Description
--retry-count <number>    Specify the number of connectivity check retries before failing over nodes.
--retry-delay <seconds>   Specify the seconds to wait between each connectivity check retry.

etcd

The form of the command to manage etcd is as follows:

kagent etcd-control <operation>

Operation   Description
start       Start all etcd services associated with this KAgent.
stop        Stop all etcd services associated with this KAgent.
restart     Restart all etcd services associated with this KAgent.

Factory Reset

This command uninstalls all Kinetica packages and resets KAgent configurations to an out-of-the-box condition. No directories will be removed unless requested.

The form of the command to perform a factory reset is as follows:

kagent factory-reset [options]

Option                  Description
--clear-data <yes|no>   Whether to remove (yes) or not remove (no) directories left by the installation. Default is to not remove directories.
--proceed <yes|no>      Whether to automatically proceed (yes) or ask for confirmation (no) before performing a reset.
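
Because a factory reset is destructive, it may help to see the two ends of the option range side by side. The following is a dry-run sketch only; the KAGENT helper variable is an assumption of this example and prints the commands instead of running them.

```shell
# Dry run by default: commands are printed, not executed.
# Set KAGENT=/opt/gpudb/kagent/bin/kagent to run them for real.
KAGENT="${KAGENT:-echo kagent}"

# Cautious form: keep directories left by the installation and ask for
# confirmation before the reset is performed.
$KAGENT factory-reset --clear-data no --proceed no

# Unattended form: proceed without confirmation and also remove the
# directories left by the installation.
$KAGENT factory-reset --clear-data yes --proceed yes
```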

Get etcd Credentials

This command shows kinetica-etcd credentials autogenerated during the installation of kinetica-etcd packages.

The form of the command to show etcd credentials is as follows:

kagent get-etcd-credentials

Monitor

This command sets a monitor for checking cluster connectivity.

The form of the command is as follows:

kagent monitor [options]

Option                    Description
--interval <schedule>     Specify how often the check command will be run, in crontab format. Default is */5 * * * *.
--retry-count <number>    Specify the number of check retries before failing over a node.
--retry-delay <seconds>   Specify the seconds to wait between each check retry.
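
The monitor schedules the same connectivity check that can be run by hand with the check command. The dry-run sketch below shows both with matching retry settings. The retry values and the ten-minute interval are arbitrary example choices, and the KAGENT helper variable is an assumption of this example.

```shell
# Dry run by default: commands are printed, not executed.
# Set KAGENT=/opt/gpudb/kagent/bin/kagent to run them for real.
KAGENT="${KAGENT:-echo kagent}"

# One-off connectivity check: retry 3 times, 10 seconds apart.
$KAGENT check --retry-count 3 --retry-delay 10

# Run the same check every 10 minutes. The crontab expression is quoted so
# the shell passes it as one argument and does not expand the asterisks.
$KAGENT monitor --interval "*/10 * * * *" --retry-count 3 --retry-delay 10
```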

Refresh Config

This command forces a refresh of the cluster configuration and roles, given its current status.

The form of the command is as follows:

kagent refresh-config

Update

This command updates global KAgent settings.

The form of the command is as follows:

kagent update [options]

Option                              Description
--is-bootstrapped <yes|no>          Whether to mark this KAgent as bootstrapped (yes) or not (no). A bootstrapped KAgent is one that is deployed into a cloud-provisioned cluster during installation. This marking will determine which set of IPs this KAgent will use in connecting via SSH to the cluster nodes.
--force-bootstrap-unlock <yes|no>   Whether to remove (yes) or not remove (no) the lock placed on this KAgent if it had been used to bootstrap an in-cluster KAgent.