Kinetica installation and configuration instructions.
Operating system, hardware, and network requirements to run Kinetica.
CPU Platform | Linux Distribution | Versions |
---|---|---|
x86 | RHEL | 6.x, 7.x |
x86 | CentOS | 6.x, 7.x |
x86 | Ubuntu | 14.x LTS, 16.x LTS |
x86 | SUSE | 11 SP3, 11 SP4 |
x86 | Debian | 8.x |
ppc64le | RHEL | 7.2 |
ppc64le | CentOS | 6.x, 7.x |
ppc64le | Ubuntu | 14.04 LTS, 16.x LTS |
Kinetica runs on the following 32- or 64-bit Linux-based operating systems.
OS | Supported Versions |
---|---|
Amazon AMI | 2012 |
CentOS | 6, 7 |
Fedora | 14+ |
RedHat | 6, 7 |
SUSE Linux Enterprise | 11+ |
Ubuntu | 12+ |
Kinetica requires the following minimum hardware:
Component | Specification |
---|---|
CPU | Two-socket server with at least 8 cores; Intel x86-64, IBM POWER8 (ppc64le), or ARM processor. |
GPU | Nvidia K20, K40, K80, GTX 780 Ti, Tegra, or similar. |
Memory | Minimum 8GB; recommended 64-96GB. |
Hard Drive | SSD or 7200RPM SATA hard drive with 3x the memory capacity. |
The Kinetica HTTP server receives requests on port 9191, and an administration web server operates on port 8080. Both can be configured after installation in the file /opt/gpudb/core/etc/gpudb.conf.
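If the hosts run a firewall, both ports must be reachable. A minimal sketch using iptables (adapt to whatever firewall tooling your distribution uses):
iptables -A INPUT -p tcp --dport 9191 -j ACCEPT   # Kinetica HTTP server
iptables -A INPUT -p tcp --dport 8080 -j ACCEPT   # administration web server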
1. Install the package
For Red Hat, CentOS, and Fedora:
sudo yum install gpudb-<version>.rpm
For Ubuntu:
sudo apt-get install gpudb-<version>.deb
This installs the package to the directory /opt/gpudb, creates a user and group named 'gpudb' with a home directory in /home/gpudb, and registers Kinetica as a service. SSH keys are also created to allow password-less ssh access between machines for the gpudb user when configured as a cluster.
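A few quick checks to confirm the install (assuming the init script supports a status action):
id gpudb               # user and group created by the package
ls /opt/gpudb          # installation directory
service gpudb status   # Kinetica registered as a service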
2. Request a License
Run the program /opt/gpudb/core/bin/gpudb_keygen and send the output to your Kinetica support contact to receive a valid license.
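For example, the keygen output can be redirected to a file to attach to your support request (the file name is arbitrary):
/opt/gpudb/core/bin/gpudb_keygen > /tmp/gpudb_keygen.out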
3. Configure the installation
Edit the configuration file /opt/gpudb/core/etc/gpudb.conf. There are many different parameters to adjust the behavior and tune Kinetica to best suit your machine's characteristics, but the defaults should provide reasonably good performance out of the box.
A valid license key, received by email, must be entered for the parameter:
license_key = ...
The number of processes should be set to the number of GPUs on the machine plus one extra process for the 'head-node' HTTP server. For example, if your machine has four attached GPUs set the parameter:
number_of_ranks = 5
Specify which GPUs should be used by setting the parameters below. Note that the rank0 'head-node' HTTP server process can and should share the GPU with the first worker rank:
rank0.gpu = 0
rank1.taskcalc_gpu = 0
rank2.taskcalc_gpu = 1
rank3.taskcalc_gpu = 2
rank4.taskcalc_gpu = 3
Choose a directory to store the data in. Note that you can split where different types of data are stored if required:
persist_directory = /opt/gpudb/persist
If you will not be using Kibana, you can configure Kinetica to turn it off:
enable_kibana_connector = false
To configure the database to run over a specific interface or subnet:
mpi_options = --mca btl_tcp_if_include eth0
or to exclude a specific interface or subnet from use by the database:
mpi_options = --mca btl_tcp_if_exclude 10.10.0.0/24
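Taken together, a minimal gpudb.conf for the four-GPU example above might contain the following (the license key shown is a placeholder):
license_key = <your-license-key>
number_of_ranks = 5
rank0.gpu = 0
rank1.taskcalc_gpu = 0
rank2.taskcalc_gpu = 1
rank3.taskcalc_gpu = 2
rank4.taskcalc_gpu = 3
persist_directory = /opt/gpudb/persist
enable_kibana_connector = false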
4. Start Kinetica
Start Kinetica as the 'root' user by running
service gpudb start
Verify that Kinetica is running by browsing to
http://<yourhostname>:8080/gadmin
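As a command-line alternative, the /show/system/status endpoint listed later in this document can be queried on the HTTP server port; the empty JSON request body here is an assumption about the minimal payload:
curl -s -X POST -d '{}' http://<yourhostname>:9191/show/system/status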
The steps below illustrate the installation process with an example Kinetica cluster configuration. A single-host configuration is also supported (see Quickstart) and is even simpler to set up.
Other than the system requirements above, a cluster install needs the following.
1. Install the RPM on all desired machines (see step 1 of Single Machine Installation).
In our example cluster, we have three machines, each with a different number of GPUs. We want the head-node HTTP process to run on the 172.30.20.4 node.
- 172.30.20.4 has 1 GPU
- 172.30.20.5 has 2 GPUs
- 172.30.20.11 has 4 GPUs
2. Edit the hostsfile
The hostsfile /opt/gpudb/core/etc/hostsfile contains a list of the nodes to use in the cluster, where 'slots' describes how many Kinetica processes to run on each node. Note that 'slots' and 'maxslots' must equal each other per host. In our example, the file would be configured as follows:
172.30.20.4 slots=2 maxslots=2
172.30.20.5 slots=2 maxslots=2
172.30.20.11 slots=4 maxslots=4
There are two major considerations when filling out the hostsfile.
The first IP address in the hostsfile is the address of the 'head' HTTP server node that will handle requests. This is also the machine that you will use to start and stop the Kinetica cluster.
In general, you should have one slot per GPU device, plus one extra slot on the first IP address line for the 'head-node' HTTP server process, which will share the GPU with the first Kinetica worker process on that host.
3. Update your system config file
The system config file /opt/gpudb/core/etc/gpudb.conf should be updated with values that are specific to your setup (see Configuration Reference):
- A valid license key must be set (sent by email)
- Kibana can be enabled or disabled
- The number of ranks must be set to the head node plus the workers desired
- The head IP address must be set to the external interface of the head node
- For each worker rank, the number of a specific GPU should be set (enumerated based on the machine it is located on; GPUs can be numbered using nvidia-smi, which comes with the NVIDIA GPU driver), for example:
rank0.gpu = 0
rank1.taskcalc_gpu = 0
rank0.numa_node =
rank3.base_numa_node = 0-2
rank3.data_numa_node = 3
Use the config file description page (Configuration Reference) to help understand what to set for each key and to personalize more parameters.
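For the three-node example above (1 + 2 + 4 GPUs, i.e. seven worker ranks plus the head rank), the rank and GPU portion of gpudb.conf might look like the sketch below; the head_ip_address key name is an assumption, so confirm it against the Configuration Reference:
head_ip_address = 172.30.20.4   # assumption: verify the exact key name
number_of_ranks = 8
# ranks 0 and 1 run on 172.30.20.4 (1 GPU; rank0 shares GPU 0 with rank1)
rank0.gpu = 0
rank1.taskcalc_gpu = 0
# ranks 2 and 3 run on 172.30.20.5 (2 GPUs)
rank2.taskcalc_gpu = 0
rank3.taskcalc_gpu = 1
# ranks 4 through 7 run on 172.30.20.11 (4 GPUs)
rank4.taskcalc_gpu = 0
rank5.taskcalc_gpu = 1
rank6.taskcalc_gpu = 2
rank7.taskcalc_gpu = 3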
4. Configure the cluster nodes for password-less ssh
Run the script /opt/gpudb/core/bin/gpudb_hosts_ssh_copy_id.sh to configure all the nodes for password-less ssh.
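Password-less access can then be verified from the head node with a quick loop over the example hosts (adjust the list to your cluster):
for h in 172.30.20.4 172.30.20.5 172.30.20.11; do
    su - gpudb -c "ssh gpudb@$h hostname"
done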
5. Persistent storage
For all the nodes, ensure that the directory configured in Kinetica's persist_directory system parameter exists (and is writable by the gpudb user) and that it has enough free space. It is recommended to have at least twice as much free disk space as installed system memory.
When required, gpudb_hosts_persist_clear.sh can be used to clear out the persisted data (ensure Kinetica is shut down before running the script).
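A quick way to compare installed memory against free space on the persist volume on each node:
free -g                     # installed system memory, in GB
df -h /opt/gpudb/persist    # free space where persist_directory points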
6. Start Kinetica
Start Kinetica as the root user by running
service gpudb start
Verify that Kinetica is running by navigating to the URL
http://<yourhostname>:8080/gadmin
in the browser. This is the admin page for Kinetica where you can manage data, see stats, query Kinetica, and more.
To enable user authentication, use the program
/opt/gpudb/core/bin/gpudb_accounts.py
to manage users and privileges. This program is similar in operation to the Apache HTTP Server htpasswd program. Users can be granted read, write, or admin privileges.
Users with read privileges can access the following endpoints:
- /aggregate/convexhull
- /aggregate/groupby
- /aggregate/histogram
- /aggregate/kmeans
- /aggregate/minmax
- /aggregate/statistics
- /aggregate/statistics/byrange
- /aggregate/unique
- /execute/proc
- /filter
- /filter/byarea
- /filter/bybox
- /filter/bygeometry
- /filter/bylist
- /filter/byradius
- /filter/byseries
- /filter/bystring
- /filter/bytable
- /filter/byvalue
- /get/records
- /get/records/bycolumn
- /get/records/byseries
- /get/records/fromcollection
- /has/table
- /has/type
- /show/system/properties
- /show/system/status
- /show/system/timing
- /show/table
- /show/table/metadata
- /show/table/properties
- /show/tables/bytype
- /show/triggers
- /show/types
- /visualize/image
- /visualize/image/classbreak
- /visualize/image/heatmap
- /visualize/image/labels
- /visualize/video
- /visualize/video/heatmap
Users with write privileges can access all of the read-privileged endpoints above as well as the following endpoints:
- /alter/table
- /alter/table/metadata
- /alter/table/properties
- /clear/table
- /clear/tablemonitor
- /clear/trigger
- /create/jointable
- /create/table
- /create/tablemonitor
- /create/trigger/byarea
- /create/trigger/byrange
- /delete/records
- /insert/records
- /insert/records/random
- /insert/symbol
- /lock/table
- /update/records
- /update/records/byseries
- /update/trigger
Users with admin privileges can access all of the read- and write-privileged endpoints above as well as the following endpoints:
- /admin/getshardassignments
- /admin/offline
- /admin/rebalance
- /admin/setshardassignments
- /admin/shutdown
- /admin/verifydb
- /alter/system/properties
Update the 'accounts_file' setting in the system config file /opt/gpudb/core/etc/gpudb.conf to point to the accounts file and restart Kinetica. At this point, users must provide their credentials along with each request to Kinetica via HTTP Basic authentication.
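For example, an authenticated request from the command line might look like the following (user name and password are placeholders):
curl -s -u someuser:somepassword -X POST -d '{}' http://<yourhostname>:9191/show/system/status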
Kinetica supports HTTPS as a way to secure communication with the database. To enable HTTPS, edit the system config file /opt/gpudb/core/etc/gpudb.conf, specifying 'true' for the 'use_https' option. In addition, set the 'https_key_file' and 'https_cert_file' options to point to the appropriate .pem files.
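A sketch of the relevant gpudb.conf entries; the .pem paths shown are hypothetical:
use_https = true
https_key_file = /opt/gpudb/certs/key.pem    # hypothetical path
https_cert_file = /opt/gpudb/certs/cert.pem  # hypothetical path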
This section describes best practices for an optimal Kinetica installation. After modifying any of the parameter values below, it is recommended to reboot the system for the modifications to take effect.
System-wide resource limits
Linux by default imposes certain limits on the use of system-wide resources. As a safety measure, Kinetica recommends raising these per-user limits from the defaults to higher values to prevent the system from ever hitting them. This can be done either by editing /etc/security/limits.conf or by running the Linux ulimit command, whose changes are effective only for the remainder of the session.
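For example, to raise the open-file limit for the current shell session only:
ulimit -n 16384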
File descriptors limit
Kinetica does not require a large number of simultaneously open files during operation; nevertheless, it is recommended to set the soft limit to 16384 and the hard limit to 63536. This can be done either with the Linux ulimit command or by directly updating (or adding) the following entries in the file /etc/security/limits.conf:
gpudb soft nofile 16384
gpudb hard nofile 63536
Process limit
Kinetica does not require a large number of simultaneous processes during operation; nevertheless, it is recommended to set both the hard and the soft limits to 16384. This can be done either with the Linux ulimit command or by directly updating (or adding) the following entries in the file /etc/security/limits.conf:
gpudb soft nproc 16384
gpudb hard nproc 16384
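After updating limits.conf, the effective limits for the gpudb user can be confirmed in a fresh session:
su - gpudb -c 'ulimit -n; ulimit -u'   # open files, then max user processes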
CPU performance throttling
For optimal performance, the power scaling setting in the file /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor should be set to performance to disable on-demand CPU throttling.
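A minimal way to apply this across all CPUs as root (the setting does not persist across reboots without a tool such as cpufrequtils):
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance > "$g"
done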
Kernel tuning parameters
The following parameters of the Linux virtual memory subsystem should be tweaked for optimal performance. Unless specified otherwise, these values are configured in the file /etc/sysctl.conf. Log in with root privileges to modify (or add) the values, then reboot the system for them to take effect.
System memory swapping
The swappiness value (which controls how aggressively the system swaps memory pages) should be set to 10 or less in sysctl.conf:
vm.swappiness = 10
Zone reclaim mode
The zone reclaim mode (which controls the approach the system takes to reclaim memory when a zone runs out of memory) should be set to 7 in sysctl.conf:
vm.zone_reclaim_mode = 7
vfs_cache_pressure
This parameter controls the kernel's tendency to reclaim the memory used for caching directory and inode objects; it is recommended to set this value to 200 or greater:
vm.vfs_cache_pressure = 200
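Taken together, the recommended /etc/sysctl.conf entries are:
vm.swappiness = 10
vm.zone_reclaim_mode = 7
vm.vfs_cache_pressure = 200
To apply them immediately without waiting for a reboot, run as root:
sysctl -p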
Transparent hugepages
Disable transparent hugepages for better performance. This value is specified in the file /sys/kernel/mm/transparent_hugepage/enabled and should be set to 'never' as follows:
echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled
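The active mode can be confirmed afterwards (the current setting appears in brackets); to persist the setting across reboots, the transparent_hugepage=never kernel boot parameter can be added to the boot loader configuration:
cat /sys/kernel/mm/transparent_hugepage/enabled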
gpudb_migrate_persistence.py
Description:
Reads types and tables from persist files (version 2015021000 or later) and creates them on a working Kinetica instance (version 5 or later), then sends all objects in these tables.
Note: Run on rank0 first, to read types and tables, and then on the rest of the working ranks, to send the objects. Objects are sent only for tables that already exist in Kinetica (i.e., those created from the rank0 files).
Usage:
gpudb_migrate_persistence.py [options] path
Options:
-h, --help show this help message and exit
-d, --dry_run do a dry run; i.e. don't actually send any data to Kinetica
-g HOST Kinetica host, e.g. 127.0.0.1:9191
For example:
python gpudb_migrate_persistence.py /opt/gpudb/persist -g 127.0.0.1:9191
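The -d/--dry_run option is a safe first step to see what would be migrated without sending any data:
python gpudb_migrate_persistence.py -d /opt/gpudb/persist -g 127.0.0.1:9191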