Installation

Kinetica installation and configuration instructions.

System Requirements

Operating system, hardware, and network requirements to run Kinetica.

Certified OS List

CPU Platform   Linux Distribution   Versions
x86            RHEL                 6.x, 7.x
x86            CentOS               6.x, 7.x
x86            Ubuntu               14.x LTS, 16.x LTS
x86            SUSE                 11 SP3, 11 SP4
x86            Debian               8.x
ppc64le        RHEL                 7.2
ppc64le        CentOS               6.x, 7.x
ppc64le        Ubuntu               14.04 LTS, 16.x LTS

Minimum Operating System Requirements

Kinetica runs on the following 32- or 64-bit Linux-based operating systems.

OS                      Supported Versions
Amazon AMI              2012
CentOS                  6, 7
Fedora                  14+
RedHat                  6, 7
SUSE Linux Enterprise   11+
Ubuntu                  12+

Minimum Hardware Requirements

Component    Specification
CPU          Two-socket server with at least 8 cores; Intel x86-64, IBM POWER8 (ppc64le), or ARM processor.
GPU          NVIDIA K20, K40, K80, GTX 780 Ti, Tegra, or similar.
Memory       Minimum 8 GB; recommended 64-96 GB.
Hard Drive   SSD, or SATA 7200 RPM hard drive, with 3x the memory capacity.

Network Requirements

The Kinetica HTTP server receives requests on port 9191 and an administration web server operates on port 8080. These can both be configured after installation in the file /opt/gpudb/core/etc/gpudb.conf.
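
Once Kinetica is installed and running, a quick way to confirm both ports are listening on the host (this sketch assumes the ss utility from iproute2 is available):

ss -ltn | grep -E ':(9191|8080)'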

Single Machine Installation

1. Install the package

For Red Hat, CentOS, and Fedora

sudo yum install gpudb-<version>.rpm

For Ubuntu

sudo dpkg -i gpudb-<version>.deb

This installs the package to the directory /opt/gpudb, creates a user and group named 'gpudb' with a home directory of /home/gpudb, and registers Kinetica as a service. SSH keys are also created to allow password-less ssh access between machines for the gpudb user when configured as a cluster.
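
As a quick sanity check after the install (these commands are illustrative, not required), you can confirm the service account and installation directory were created:

id gpudb
ls /opt/gpudb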

2. Request a License

Run the program /opt/gpudb/core/bin/gpudb_keygen and send the output to your Kinetica support contact to receive a valid license.
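
For example, to capture the output in a file to attach to your support request (the file name here is arbitrary):

/opt/gpudb/core/bin/gpudb_keygen > keygen_output.txt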

3. Configure the installation

Edit the configuration file /opt/gpudb/core/etc/gpudb.conf. There are many different parameters to adjust the behavior and tune Kinetica to best suit your machine's characteristics, but the defaults should provide reasonably good performance out of the box.

  1. A valid license key, received by email, must be entered for the parameter:

    license_key = ...
    
  2. The number of processes should be set to the number of GPUs on the machine plus one extra process for the 'head-node' HTTP server. For example, if your machine has four attached GPUs, set the parameter:

    number_of_ranks = 5
    
  3. Specify which GPUs should be used by setting the parameters below (GPU device numbers can be listed with nvidia-smi; see the sketch after this list). Note that the rank0 'head-node' HTTP server process can and should share the GPU with the first worker rank:

    rank0.gpu = 0
    rank1.taskcalc_gpu = 0
    rank2.taskcalc_gpu = 1
    rank3.taskcalc_gpu = 2
    rank4.taskcalc_gpu = 3
    
  4. Choose a directory in which to store the data. Note that you can split where different types of data are stored if required:

    persist_directory = /opt/gpudb/persist
    
  5. If you will not be using Kibana, you can configure Kinetica to turn it off:

    enable_kibana_connector = false
    
  6. To configure the database to run over a specific interface or subnet:

    mpi_options = --mca btl_tcp_if_include eth0
    

    or to exclude a specific interface or subnet from use by the database:

    mpi_options = --mca btl_tcp_if_exclude 10.10.0.0/24
    
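If you are unsure how the GPUs on the machine are numbered, nvidia-smi (installed with the NVIDIA driver) can enumerate them; a minimal sketch:

nvidia-smi --query-gpu=index,name --format=csv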

4. Start Kinetica

Start Kinetica as the 'root' user by running

service gpudb start

Verify that Kinetica is running by browsing to http://<yourhostname>:8080/gadmin
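
Alternatively, a quick command-line check from the machine itself (expects an HTTP 200 response code):

curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/gadmin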

Cluster Install

The steps below are an example of how to configure a Kinetica cluster to illustrate the installation process. A single host configuration is also supported (see Quickstart) and is even simpler to configure.

Requirements

Other than the system requirements above, a cluster install needs the following.

  • No firewall restrictions between machines within a cluster
  • Same installation directory location across all nodes

Install

1. Install the package on all desired machines (see step 1 of Single Machine Installation).

In our example cluster we have three machines, each with a different number of GPUs. We have decided that the head-node HTTP process will run on the 172.30.20.4 node.

  • 172.30.20.4 has 1 GPU
  • 172.30.20.5 has 2 GPUs
  • 172.30.20.11 has 4 GPUs

2. Edit the hostsfile

The hostsfile /opt/gpudb/core/etc/hostsfile contains a list of the nodes to use in the cluster; 'slots' describes how many Kinetica processes to run on each node. Note that 'slots' and 'maxslots' must be equal for each host.

In our example, the file would be configured as follows:

172.30.20.4  slots=2 maxslots=2
172.30.20.5  slots=2 maxslots=2
172.30.20.11 slots=4 maxslots=4

There are two major considerations when filling out the hostsfile.

The first IP address in the hostsfile is the address of the 'head' HTTP server node that will handle requests. This is also the machine from which you will start and stop the Kinetica cluster.

In general, you should have one slot per GPU device, plus one extra slot on the first IP address line for the 'head-node' HTTP server process, which will share the GPU with the first Kinetica worker process on that host.
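
As a worked example based on the hostsfile above, the slot total drives the rank count: 2 + 2 + 4 = 8 slots, i.e. one head-node rank plus seven workers, so gpudb.conf would set:

number_of_ranks = 8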

3. Update your system config file

The system config file /opt/gpudb/core/etc/gpudb.conf should be updated with values specific to your setup (see the Configuration Reference).

  • A valid license key (sent by email) must be set

  • Kibana can be enabled or disabled

  • The number of ranks must be set to the head node plus the desired number of workers

  • The head IP address must be set to the external interface of the head node

  • For each worker rank, the specific GPU to use should be set. GPUs are enumerated per machine and can be listed using nvidia-smi, which comes with the NVIDIA GPU driver:

    rank0.gpu = 0
    rank1.taskcalc_gpu = 0
    rank0.numa_node =
    rank3.base_numa_node = 0-2
    rank3.data_numa_node = 3
    

See the Configuration Reference for a description of each key when personalizing further parameters.

4. Configure the cluster nodes for password-less ssh

Run the script /opt/gpudb/core/bin/gpudb_hosts_ssh_copy_id.sh to configure all the nodes for password-less ssh.
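
You can spot-check the result from the head node using the example cluster's worker IPs; if password-less ssh is configured correctly, these commands print the remote hostnames without prompting for a password:

su - gpudb
ssh 172.30.20.5 hostname
ssh 172.30.20.11 hostname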

5. Persistent storage

For all the nodes, ensure that the directory configured in Kinetica's persist_directory system parameter exists, is writable by the gpudb user, and has enough free space. It is recommended to have at least twice as much free disk space available as the installed system memory.
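
A quick way to compare free disk space against installed memory on each node (the persist path shown is the default; substitute your own persist_directory):

df -h /opt/gpudb/persist
free -g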

When required, gpudb_hosts_persist_clear.sh can be used to clear out the persisted data (ensure Kinetica is shut down before running the script).

6. Start Kinetica

Start Kinetica as the root user by running

service gpudb start

Verify that Kinetica is running by navigating to the URL http://<yourhostname>:8080/gadmin in the browser. This is the admin page for Kinetica where you can manage data, see stats, query Kinetica, and more.

User Authentication

To enable user authentication, use the program /opt/gpudb/core/bin/gpudb_accounts.py to manage users and privileges. This program is similar in operation to the Apache HTTP Server htpasswd program.

Users can be granted read, write or admin privileges.

Users with read privileges can access the following endpoints:

  • /aggregate/convexhull
  • /aggregate/groupby
  • /aggregate/histogram
  • /aggregate/kmeans
  • /aggregate/minmax
  • /aggregate/statistics
  • /aggregate/statistics/byrange
  • /aggregate/unique
  • /execute/proc
  • /filter
  • /filter/byarea
  • /filter/bybox
  • /filter/bygeometry
  • /filter/bylist
  • /filter/byradius
  • /filter/byseries
  • /filter/bystring
  • /filter/bytable
  • /filter/byvalue
  • /get/records
  • /get/records/bycolumn
  • /get/records/byseries
  • /get/records/fromcollection
  • /has/table
  • /has/type
  • /show/system/properties
  • /show/system/status
  • /show/system/timing
  • /show/table
  • /show/table/metadata
  • /show/table/properties
  • /show/tables/bytype
  • /show/triggers
  • /show/types
  • /visualize/image
  • /visualize/image/classbreak
  • /visualize/image/heatmap
  • /visualize/image/labels
  • /visualize/video
  • /visualize/video/heatmap

Users with write privileges can access all of the read-privileged endpoints above as well as the following endpoints:

  • /alter/table
  • /alter/table/metadata
  • /alter/table/properties
  • /clear/table
  • /clear/tablemonitor
  • /clear/trigger
  • /create/jointable
  • /create/table
  • /create/tablemonitor
  • /create/trigger/byarea
  • /create/trigger/byrange
  • /delete/records
  • /insert/records
  • /insert/records/random
  • /insert/symbol
  • /lock/table
  • /update/records
  • /update/records/byseries
  • /update/trigger

Users with admin privileges can access all of the read- and write-privileged endpoints above as well as the following endpoints:

  • /admin/getshardassignments
  • /admin/offline
  • /admin/rebalance
  • /admin/setshardassignments
  • /admin/shutdown
  • /admin/verifydb
  • /alter/system/properties

Update the 'accounts_file' setting in the system config file /opt/gpudb/core/etc/gpudb.conf to point to the accounts file, and restart Kinetica.

At this point users must provide their credentials along with each request to Kinetica, via HTTP Basic authentication.
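
For example, with curl, credentials are passed using the standard -u flag. The user name, password, and request body below are placeholders; the exact JSON payload depends on the endpoint being called:

curl -u jdoe:secret -d '{}' http://<yourhostname>:9191/show/system/status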

HTTPS/SSL Support

Kinetica supports HTTPS as a way to secure communication with the database. To enable HTTPS, edit the system config file /opt/gpudb/core/etc/gpudb.conf, specifying 'true' for the 'use_https' option. In addition, set 'https_key_file' and 'https_cert_file' to point to the appropriate .pem files.
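
The relevant section of gpudb.conf would then look like the sketch below (the certificate paths are placeholders; point them at your own .pem files):

use_https = true
# placeholder paths -- replace with your own certificate and key files
https_key_file = /opt/gpudb/certs/key.pem
https_cert_file = /opt/gpudb/certs/cert.pem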

Installation Best Practices

This section describes best practices for an optimal Kinetica installation. After modifying any of these parameter values, reboot the system for the modifications to take effect.

  • System-wide resource limits

    Linux by default imposes certain limits on the use of system-wide resources. As a safety measure, Kinetica recommends raising these per-user limits from the defaults so that the system never hits them. This can be done either by editing /etc/security/limits.conf or by running the Linux ulimit command, which takes effect only for the remainder of the session.

    • File descriptors limit

      Kinetica does not require a large number of simultaneously open files during operation; nevertheless, it is recommended to set the soft limit to 16384 and the hard limit to 63536. This can be done either with the Linux ulimit command or by directly updating (or adding) the following entries in the file /etc/security/limits.conf:

      gpudb soft nofile 16384
      gpudb hard nofile 63536
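
      The equivalent session-only settings via ulimit would be as follows (note that raising the hard limit requires root privileges):

      ulimit -Sn 16384
      ulimit -Hn 63536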
      
    • Process limit

      Kinetica does not require a large number of simultaneous processes during operation; nevertheless, it is recommended to set both the hard and soft limits to 16384. This can be done either with the Linux ulimit command or by directly updating (or adding) the following entries in the file /etc/security/limits.conf:

      gpudb soft nproc 16384
      gpudb hard nproc 16384
      
  • CPU performance throttling

    For optimal performance, the power-scaling setting in the file /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor should be set to performance to disable on-demand CPU throttling.
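
    A minimal sketch to apply this across all cores (run as root; the setting does not survive a reboot, and assumes the cpufreq interface is present):

    for gov in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
        echo performance > $gov
    done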


  • Kernel tuning parameters

    The following Linux virtual memory subsystem parameters should be tuned for optimal performance. Unless specified otherwise, these values are configured in the file /etc/sysctl.conf. Log in with root privileges to modify (or add) the values, then reboot the system for them to take effect.
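
    After editing /etc/sysctl.conf with the values described below, they can also be loaded immediately, without waiting for a reboot, by running as root:

    sysctl -p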

    • System memory swapping

      The swappiness value (which controls how aggressively the system swaps memory pages) should be set to 10 or less in sysctl.conf:

      vm.swappiness = 10

    • Zone reclaim mode

      The zone reclaim mode (which controls the approach the system takes to reclaim memory when a zone runs out of memory) should be set to 7 in sysctl.conf:

      vm.zone_reclaim_mode = 7

    • vfs_cache_pressure

      This parameter controls the tendency of the kernel to reclaim the memory used for caching directory and inode objects. It is recommended to set this value to 200 or greater:

      vm.vfs_cache_pressure = 200

    • Transparent hugepages

      Disable transparent hugepages for better performance. This value is specified in the file /sys/kernel/mm/transparent_hugepage/enabled and should be set to 'never' as follows:

      echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled
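
      To verify the change, read the file back; the active setting is shown in brackets:

      cat /sys/kernel/mm/transparent_hugepage/enabled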
      

Persist Migration Script

  • gpudb_migrate_persistence.py

    • Description:

      Reads types and tables from persist files (version 2015021000 or later) and creates them on a working Kinetica instance (version 5 or later), then sends all objects in those tables.

      Note: Run on rank0 first, to read types and tables, and then on the rest of the working ranks, to send the objects. Objects are sent only for tables that already exist in Kinetica (i.e., that were created from the rank0 files).

    • Usage:

      gpudb_migrate_persistence.py [options] path

      • Options:

        -h, --help

        show this help message and exit

        -d, --dry_run

        do a dry run; i.e. don't actually send any data to Kinetica

        -g HOST

        Kinetica host, e.g. 127.0.0.1:9191

      • For example:

        python gpudb_migrate_persistence.py /opt/gpudb/persist -g 127.0.0.1:9191
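
        As a cautious first pass, the documented dry-run flag can be used to validate the persist files without sending any data:

        python gpudb_migrate_persistence.py -d /opt/gpudb/persist -g 127.0.0.1:9191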