The following guide provides step by step instructions to get started using R with Kinetica. The RKinetica connector provides access to the Kinetica database through a REST interface. The RKinetica package can be built locally from source or a prebuilt package release can be downloaded from the release page. Kinetica strongly recommends that release builds be used instead of building the package locally.
RKinetica interfaces with Kinetica similarly to other R-database interactions: using the DBI and dplyr packages. Interacting with RKinetica is easiest using the RStudio interface. RStudio and RKinetica require that R be installed locally.
Source code for the connector can be found at:
Contents
RKinetica depends on several other R packages from the comprehensive R archive network (CRAN):
Note
If installing the RKinetica package in RStudio, the dependencies should be installed automatically prior to RKinetica being installed. Otherwise, all packages can be installed via RStudio (or the R console) like so:
install.packages(c("DBI", "rjson", "httr", "bit64", "hms", "methods"))
If opting to build the RKinetica package instead of downloading, the connector package can be built as follows:
$ git clone https://github.com/kineticadb/RKinetica.git -b release/v7.0 --single-branch
$ R CMD build RKinetica
This sequence produces a tar.gz
file, which, once installed, is made
available to R. The tar.gz
file is created in the same directory the
build
command was issued. Verify the tar.gz
file was created before
installing the RKinetica package:
ls RKinetica*
To install the RKinetica package via the command line:
$ R CMD install RKinetica_<version>.tar.gz
To install the RKinetica package in RStudio (or R console):
> install.packages("/path/to/RKinetica_<version>.tar.gz", repos = NULL, type = "source")
Before using RKinetica, the package must be loaded:
library(RKinetica)
Create a Kinetica connection object using the dbConnect()
method,
passing in Kinetica URL, user, and password parameters:
con <- dbConnect(RKinetica::Kinetica(),
url = "http://<kinetica-host>:9191",
username = "<user>",
password = "<password>")
Note
A parameter cannot be added to an existing KineticaConnection object. Instead, a new KineticaConnection object must be created to properly initialize any functionality enabled by additional parameters.
Important
If using RStudio, you can use the rstudioapi package to instead prompt for username and password:
con <- dbConnect(RKinetica::Kinetica(),
url = "http://<kinetica-host>:9191",
username = rstudioapi::askForPassword("Database username?"),
password = rstudioapi::askForPassword("Database password?"))
If you expect a result set from your queries to exceed 10,000 rows, set the
row_limit
parameter value accordingly:
con <- dbConnect(RKinetica::Kinetica(),
url = "http://<kinetica-host>:9191",
username = "<user>",
password = "<password>",
row_limit=1000000L)
You can then use the KineticaConnection object as a regular DBI connection. Some of the most common commands:
Print connection info:
dbGetInfo(con)
List top level database objects and collections with their types:
dbListObjects(con)
Get a list of top level database tables and collections names:
dbListTables(con)
Drop a table if it exists:
dbRemoveTable(con, "tableA")
Check if the table exists:
dbExistsTable(con, "tableA")
Write a table with 3 columns:
dbWriteTable(con, "tableA", data.frame(a = 1L, b = 2L, c = 3.0), row.names = NULL)
Check if the table exists now:
dbExistsTable(con, "tableA")
List tableA fields:
dbListFields(con, "tableA")
Add records to tableA:
dbAppendTable(con, "tableA", data.frame(a = 2L:3L, b = 3L:4L, c = 4.0:5.0), row.names = NULL)
Read table into variable:
rows <- dbReadTable(con, "tableA")
print(rows)
Disconnect:
dbDisconnect(con)
Large tables, views, or query results can have pagination parameters passed into
the dbSendQuery
or dbSendStatement
methods. Assuming that the
connection was established with a row limit of 1,000,000, i.e.
row_limit = 1000000L
, the following example query extracts all records
from the acquisition
table sorted by l_id
. The data is retrieved in
batches of one million records and the offset
is increased by 1 million each
batch. This loop continues until the dbSendQuery()
resultset is returned
empty:
sql_query <- "SELECT l_id, product_type, term, score FROM acquisition ORDER BY l_id"
offset <- 0L
repeat {
result <- dbSendQuery(con, sql_query, limit = 1000000L, offset = offset)
# work with current data.frame provided in result@data
if (nrow(result@data) > 0) {
# increase offset to get next page of data
offset <- offset + 1000000L
} else {
# exit pagination loop when data.frame is empty
break
}
}
Additional code examples are available in the examples
subdirectory of the
RKinetica repository.
When two or more Kinetica clusters have been configured for an HA ring via
KAgent, RKinetica will automatically discover the additional Kinetica
instance URLs available in the ring. If the connection to the URL of the primary
cluster fails, each additional URL will be tried until a successful connection
is established; every failed connection will result in a warning message. If
all connection attempts fail, an error message will be thrown. Only the URL of
the primary cluster to connect to needs to be specified (via the url
parameter in the dbConnect
method); the URLs for the failover clusters will
be retrieved from the primary cluster upon first connecting to it. The
KineticaConnection object has additional parameters to store these failover
URLs (via the ha_ring
parameter) as well as other connection information:
con <- dbConnect(RKinetica::Kinetica(), url = "http://172.123.45.61:9191")
dbGetInfo(con)
$url
[1] "http://172.123.45.61:9191"
$host
[1] "172.123.45.61"
$port
[1] 9191
$ha_enabled
[1] TRUE
$ha_ring
[1] "http://172.123.45.63:9191" "http://172.123.45.61:9191" "http://172.123.45.62:9191"
URLs in ha_ring
list are randomly selected to balance load on secondary URL
instances when the primary URL fails. You can use a show()
command on a
KineticaConnection object at any time to check which URL is being used in the
current connection:
show(con)
<KineticaConnection>
HA enabled
Current url: http://172.123.45.62:9191
If you want to provide URLs for failover clusters manually, you can do so by
adding the ha_ring
parameter to the dbConnect()
method with a
comma-separated list of URIs for the secondary cluster(s):
con <- dbConnect(RKinetica::Kinetica(), url = "http://172.123.45.61:9191",
ha_ring = "http://172.123.45.62:9191,http://172.123.45.63:9191")
show(con)
<KineticaConnection>
Self-provided HA enabled
Current url: http://172.123.45.61:9191
Detailed RKinetica information including method and function descriptions, usage examples, and arguments can be found in the RKinetica PDF.