Kinetica installation and configuration instructions.
Operating system, hardware, and network requirements to run Kinetica.
CPU Platform | Linux Distribution | Versions |
---|---|---|
x86 | RHEL | 6.x, 7.x |
x86 | CentOS | 6.x, 7.x |
x86 | Ubuntu | 14.x LTS, 16.x LTS |
x86 | SUSE | 12, 12 SP1, 12 SP2 |
x86 | Debian | 8.x |
ppc64le | RHEL | 7.2 |
ppc64le | CentOS | 6.x, 7.x |
ppc64le | Ubuntu | 14.04 LTS, 16.x LTS |
Kinetica runs on the following 32- or 64-bit Linux-based operating systems.
OS | Supported Versions |
---|---|
Amazon AMI | 2012 |
CentOS | 6, 7 |
Fedora | 20+ |
RedHat | 6, 7 |
SUSE Linux Enterprise | 12+ |
Ubuntu | 12+ |
Component | Specification |
---|---|
CPU | Two-socket server with at least 8 cores; Intel x86-64, PowerPC 8le, or ARM processor |
GPU | Nvidia K20, K40, K80, P100, GTX 780 Ti, Tegra, or similar |
Memory | Minimum 8GB |
Hard Drive | SSD or 7200 RPM SATA hard drive with 4x the memory capacity |
Swap space equal to 25-50% of a machine's available memory is recommended to avoid disk spilling and out-of-memory issues.
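To turn that guideline into concrete numbers, the sketch below (a generic helper, not a Kinetica tool) computes the 25-50% range from a total memory size in kB, as reported by `free` or /proc/meminfo:

```shell
# Compute the recommended swap range (25%-50% of RAM).
# Argument: total memory in kB.
swap_range_kb() {
  echo "$(( $1 / 4 )) $(( $1 / 2 ))"
}

# Example: a 16 GiB machine (16777216 kB) should get 4-8 GiB of swap.
swap_range_kb 16777216   # prints: 4194304 8388608
```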
Check if there is active swap using the command:
free -h
total used free shared buff/cache available
Mem: 15G 2.0G 5.3G 32M 8.2G 13G
Swap: 6.5G 0B 6.5G
To create a new swap file:
As root, run the dd command with bs set to the block size in bytes (usually 1024) and count set to the desired number of blocks; with bs=1024, count is the file size in kilobytes:
sudo dd if=/dev/zero of=</swapfile/path> bs=1024 count=<file-size>
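Since bs=1024 makes each block one kibibyte, the count argument is effectively the file size in kB. A small sketch (the helper name is hypothetical) for computing it from a target size in GiB:

```shell
# Number of 1 KiB blocks (dd's `count`) needed for a swap file of
# the given size in GiB, assuming bs=1024.
swap_block_count() {
  echo $(( $1 * 1024 * 1024 ))
}

# Example: a 4 GiB swap file needs count=4194304
swap_block_count 4   # prints: 4194304
```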
Make the file accessible only to root:
sudo chmod 600 </swapfile/path>
Mark the file as swap space:
sudo mkswap </swapfile/path>
Enable the swap file:
sudo swapon </swapfile/path>
Back up the /etc/fstab file, then add the swap file to it to make the swap permanent:
sudo cp /etc/fstab /etc/fstab.bak
echo '</swapfile/path> none swap sw 0 0' | sudo tee -a /etc/fstab
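Once the swap file is enabled, it should appear in the standard swap reports; this is a generic sanity check, not Kinetica-specific:

```shell
# The new swap file should be listed by both commands.
swapon --show
free -h
```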
Before installing Kinetica, a few steps should be followed to set up your network and server configuration.
The first step is to collect the IP addresses of the server or servers that will be running Kinetica. If deploying to a cluster, one server must be designated as the head node. This server receives user requests and parcels them out to the other worker nodes of the system. The head node of the cluster (or the only node in a single-node system) will also be used to administer the system and host all services & applications, and as such will require special handling during the installation process.
The Kinetica head node will require a number of ports to be open in order to communicate with its applications & services.
Any worker nodes will need ports opened to communicate with the head node and each other, though this set of ports will be smaller than that of the head node.
The default ports used for communication with Kinetica (and between servers, if operating in a cluster) follow. The Nodes column lists either Head, meaning the corresponding port only needs to be opened on the head node, or All, meaning the port needs to be opened on the head node & worker nodes.
Port | Function | Nodes | Usage |
---|---|---|---|
2003 | This port must be open to collect the runtime system statistics. | Head | Required Internally |
4000+N | For installations which have the external text search server enabled and communicating over TCP (rankN.text_index_address = tcp://…), there will be one instance of the text search server listening for each rank on every server in the cluster. Each of these daemons will listen on a port starting at 4000 on each server and incrementing by one for each additional rank. | All | Optional Internally |
5552 | Host Manager status notification channel | All | Required Internally |
5553 | Host Manager message publishing channel | All | Required Internally |
6555+N | Provides distributed processing of communications between the network and the different ranks used in Kinetica. There is one port for each rank running on each server, starting on each server at port 6555 and incrementing by one for each additional rank. | All | Required Internally |
8080 | The Tomcat listener for the Kinetica Administration Application (GAdmin) | Head | Optional Externally |
8082 | In installations where users need to be authenticated to access the database, a preconfigured HTTPd instance listens on this port and authenticates incoming HTTP requests before passing them along to Kinetica. When authorization is required, all requests to Kinetica should be sent here, rather than to the standard 9191+ ports. | All | Optional Externally |
8088 | This is the port on which Kinetica Reveal is exposed. For installations which have this feature enabled, it should be exposed to users. | Head | Optional Externally |
8181 | This is the port used to host the system and process stats server. | Head | Optional Externally |
9001 | Database trigger ZMQ publishing server port. Users of database triggers will need the ability to connect to this port to receive data generated via the trigger. | Head | Optional Externally |
9002 | Table monitor publishing server port. Users of database table monitors will need the ability to connect to this port to receive data generated via the table monitor. | Head | Optional Externally |
9191+N | The primary port(s) used for public and internal Kinetica communications. There is one port for each rank running on each server, starting on each server at port 9191 and incrementing by one for each additional rank. These should be exposed for any system using the Kinetica APIs without authorization and must be exposed between all servers in the cluster. For installations where users should be authenticated, these ports should NOT be exposed publicly, but should still be exposed between servers within the cluster. | All | Required Internally, Optional Externally |
9292 | Port on which the ODBC Server listens for connections | Head | Optional Externally |
9300 | Port used to query Host Manager for status | All | Required Internally |
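To verify which of these ports are reachable on a given host, a quick probe can be scripted. The sketch below uses bash's built-in /dev/tcp pseudo-device (the host and port list are examples; extend the 4000+N, 6555+N, and 9191+N ranges to match your rank count):

```shell
# Probe a list of TCP ports on a host and report open/closed.
check_ports() {
  local host=$1; shift
  local port
  for port in "$@"; do
    # Opening /dev/tcp/host/port in a subshell succeeds only if
    # something is listening; the fd closes when the subshell exits.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      echo "$port open"
    else
      echo "$port closed"
    fi
  done
}

# Example: the fixed head-node ports from the table above
check_ports localhost 2003 5552 5553 8080 8082 8088 8181 9001 9002 9292 9300
```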
Kinetica strongly encourages maintaining proper firewalls to protect the database and the network at large. A full tutorial on how to properly set up a firewall is beyond the scope of this document, but the following are some best practices and starting points for more research.
All machines connected to the Internet at large should be protected from intrusion. As shown in the list above, no ports are necessarily required to be accessible from outside of a trusted network, so we recommend opening to the Internet and/or untrusted network(s) only those ports that are truly needed based on requirements.
There are some common scenarios which can act as guidelines on which ports should be available.
If Kinetica is running on a server where it will be accessible to the Internet at large, we strongly suggest that security and authentication be used and that ports 9191+N and 8080 NOT be exposed to the public, if possible. Those ports can potentially allow users to run commands anonymously, and unless security is configured to prevent it, any users connecting to them will have full control of the database.
For applications in which requests are made to Kinetica via client APIs that do not use authentication, the 9191+N ports should be made available to the relevant set of servers. For applications using authentication via the bundled version of httpd, port 8082 should be opened. It is possible to have both ports open at the same time in cases where anonymous access is permitted; however, the security settings should be carefully set in this case to ensure that anonymous users have the appropriate access limitations. Additionally, if the API client is using table monitors or triggers, ports 9001 and/or 9002 should also be opened as needed.
In cases where the GUI interface to Reveal is required, port 8088 should be made available.
System administrators may wish to have access to the administrative web interface, in which case port 8080 should be opened, but carefully controlled.
RHEL 6 uses iptables by default to configure its firewall settings. These can be updated using the /etc/sysconfig/iptables file; or, if you have X Server running, there is also a GUI for editing the firewall that can be run using the command:
system-config-firewall
RHEL 7 continues to use iptables under the hood, but the preferred way to interact with it is now the firewall-cmd command or the firewall-config GUI. For example, the following commands will open up port 8082 publicly:
firewall-cmd --zone=public --add-port=8082/tcp --permanent
firewall-cmd --reload
Ubuntu 12 uses iptables by default to configure its firewall settings. These rules can be updated using the iptables command:
sudo iptables -A INPUT -p tcp --dport 8181 -j ACCEPT
sudo iptables-save
Ubuntu 14 & 16 come with the ufw (Uncomplicated Firewall) command, which controls the firewall; for example:
sudo ufw allow 8181
Each server in the Kinetica cluster should be properly prepared before installing Kinetica.
While every system is unique, there are several system parameters which are generally recommended to be set on every installation.
For optimal performance, the power scaling governor should be set to performance in the file /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor to disable on-demand CPU throttling. The example below assumes 8 cores; adjust the {0..7} range to match your core count:
sudo bash -c 'for i in {0..7}; do cpufreq-set -c $i -g performance; done'
Verify that the setting was updated:
cpufreq-info
Transparent Huge Pages are the kernel's attempt to reduce the overhead of Translation Lookaside Buffer (TLB) lookups by increasing the size of memory pages. This setting is enabled by default, but can lead to sparse memory usage and decreased performance. Disable it:
sudo sh -c 'echo "never" > /sys/kernel/mm/transparent_hugepage/enabled'
If Nvidia GPUs are present in the target servers, but the drivers have not been installed yet, they should be installed now. See either Install Nvidia Drivers on RHEL or Install Nvidia Drivers on Debian/Ubuntu for details.
Installation of Kinetica involves the deployment of the installation package, and either a browser-based or console-driven initialization step. Afterwards, passwordless SSH should be configured for ease of management of the system.
The installation process also requires a license key. To receive a license key, contact support at support@kinetica.com.
The Kinetica application needs to be deployed to all servers in the target cluster. Deploy the package using the standard procedures for a local package.
On RHEL:
sudo yum install ./gpudb-<gpuhardware>-<licensetype>-<version>-<release>.<architecture>.rpm
On Debian/Ubuntu:
sudo apt install ./gpudb-<gpuhardware>-<licensetype>-<version>-<release>.<architecture>.deb
This installs the package to the directory /opt/gpudb, creates a group named gpudb, and creates two users (gpudb & gpudb_proc) whose home directory is located at /home/gpudb. SSH keys are also created to allow password-less SSH access between servers for the gpudb user when configured as a cluster. This will also register two services: gpudb & gpudb_host_manager.
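A quick post-deployment check (a generic sketch, not part of the Kinetica tooling) confirms the accounts described above were created:

```shell
# Report whether a given user account exists on this machine.
user_exists() {
  id "$1" >/dev/null 2>&1
}

# Example: verify the two accounts the package creates.
for u in gpudb gpudb_proc; do
  user_exists "$u" && echo "$u present" || echo "$u MISSING"
done
```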
Once the application has been deployed, choose the configuration method:
The Visual Installer is run through the Kinetica Administration Application (GAdmin) and simplifies the installation of Kinetica across a cluster.
Browse to the head node, using IP or host name:
http://localhost:8080/
Once you've arrived at the login page, you'll need to change your password and initialize the system using the following steps:
Log into the admin application with the default credentials: username admin, password admin.
If a license key has not already been configured, a Product Activation page will be displayed, where the license key is to be entered:
At the Setup Wizard page, configure the system basics:
Important
For additional configuration options, see the Configuration Reference.
Start the system. This will start all Kinetica processes on the head node, and if in a clustered environment, the corresponding processes on the worker nodes.
Follow instructions here to update the administration account's password.
Skip ahead to Passwordless SSH.
System configuration is done primarily through the configuration file /opt/gpudb/core/etc/gpudb.conf; while all nodes in a cluster have this file, only the copy on the head node needs to be modified.
Log in to the head node and open /opt/gpudb/core/etc/gpudb.conf in an editor.
Specify the head node IP address, the total number of database ranks, and the distribution of ranks across hosts. In this example, there are two servers with three ranks on the first and two ranks on the second:
number_of_ranks = 5
rank0.host = 192.168.0.100
rank1.host = 192.168.0.100
rank2.host = 192.168.0.100
rank3.host = 192.168.0.101
rank4.host = 192.168.0.101
head_ip_address = 192.168.0.100
For CUDA builds, the GPUs need to be assigned to ranks. To display the installed GPUs and their status run:
nvidia-smi
If the program is not installed or doesn't run, see Install Nvidia Drivers.
Once the number of GPUs on each server has been established, enter them into the configuration file by associated rank. In this example, there are two servers with a GPU assigned to each of two ranks per host (none for rank0):
rank0.gpu = 0 # This GPU can be shared with a worker rank, typically rank 1.
rank1.taskcalc_gpu = 0
rank2.taskcalc_gpu = 1
rank3.taskcalc_gpu = 0 # On new host, restart at 0
rank4.taskcalc_gpu = 1
For non-CUDA builds, the Numa CPUs need to be assigned to ranks. To display the Numa nodes, run:
numactl -H
Once the number of Numa nodes on each server has been established, enter them into the configuration file by associated rank. In this example, there are two servers with a Numa node assigned to each of two ranks per host (none for rank0):
rank0.numa_node = # Preferring a node for the head node HTTP server is often not necessary.
rank1.base_numa_node = 0
rank2.base_numa_node = 1
rank3.base_numa_node = 0 # On new host, restart at 0
rank4.base_numa_node = 1
rank1.data_numa_node = 0
rank2.data_numa_node = 1
rank3.data_numa_node = 0 # On new host, restart at 0
rank4.data_numa_node = 1
Determine the directory in which database files will be stored; it must be accessible to (and writable by) the gpudb user. Enter the database file directory path into the configuration:
persist_directory = /opt/gpudb/persist
Set the license key:
license_key = ...
Important
For additional configuration options, see the Configuration Reference.
To bring up the system, start the gpudb service:
service gpudb start
This will start all Kinetica processes on the head node, and if in a clustered environment, processes on the worker nodes.
If Kinetica is installed in a clustered environment, configuring passwordless SSH will make management considerably easier. Run the following command on the head node to set up passwordless SSH between the head node and the worker nodes for the gpudb user created during deployment:
sudo /opt/gpudb/core/bin/gpudb_hosts_ssh_copy_id.sh
To validate that Kinetica has been installed and started properly, you can perform the following tests.
To ensure that Kinetica has started (you may have to wait a moment while the system initializes), you can run curl on the head node to check whether the server is responding and the port is reachable through any running firewalls:
$ curl localhost:9191
Kinetica is running!
You can also run a test to ensure that the API is responding properly. There is an admin simulator project in Python provided with the Python API, which pulls statistics from the Kinetica instance. Running this on the head node, you should see:
$ python /opt/gpudb/api/python/gpudb/gadmin_sim.py
**********************
Total tables: 0
Total top-level tables: 0
Total collections: 0
Total number of elements: 0
Total number of objects: 0
The administrative interface itself can be used to validate that the system is functioning properly. Simply log into GAdmin. Browse to Dashboard to view the status of the overall system and Ranks to view the status breakdown by rank.
The log file located at /opt/gpudb/core/logs/gpudb.log should be the first place to check for any system errors. Any issue which would prevent successful start-up of Kinetica will be logged as ERROR in the log. Consequently, running the following command will return enough information to provide a good starting point for further investigation:
grep ERROR /opt/gpudb/core/logs/gpudb.log | head -n 10
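The same grep can be wrapped into a small triage helper; the function name is hypothetical, and the default path is the Kinetica log location above:

```shell
# Summarize start-up problems in a log file: print a count of
# ERROR lines, then the first five of them.
log_errors() {
  local log=${1:-/opt/gpudb/core/logs/gpudb.log}
  echo "errors: $(grep -c ERROR "$log")"
  grep -m 5 ERROR "$log"
}
```

A clean start-up should report `errors: 0`.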
Should you need to uninstall Kinetica, you'll need to shut down the system, remove the package, and remove related files, directories, & user accounts.
Remove the package from your machine
On RHEL:
sudo yum remove gpudb-<gpuhardware>-<licensetype>.<architecture>
On Debian-based:
sudo dpkg -r gpudb-<gpuhardware>-<licensetype>.<architecture>
Remove any user-defined persist directories (these directories are set in /opt/gpudb/core/etc/gpudb.conf)
Clean up all Kinetica artifacts (for both RHEL and Debian-based):
sudo rm -rf /opt/gpudb
Remove the gpudb & gpudb_proc users from the machine
On RHEL:
sudo userdel -r gpudb
sudo userdel -r gpudb_proc
On Debian-based:
sudo deluser --remove-home gpudb
sudo deluser --remove-home gpudb_proc
Remove the gpudb group from the machine:
sudo groupdel gpudb