Installation & Configuration
Prerequisites
RKinetica depends on the system libraries methods and stats, as well as
several other R packages from the Comprehensive R Archive Network (CRAN):
All dependencies should be installed prior to installing RKinetica. To
install the dependencies in RStudio or the R Console:
Prebuilt Package Installation
- Download the latest package tar.gz file from the release page.
- Install the RKinetica package in RStudio or R console:
Manually Built Package Installation
It is strongly recommended that a release build be used instead of building the package locally, but the RKinetica package can be built as follows:
- Clone the latest RKinetica repository version:
- In the same directory, use the R CLI to build the package:
  This sequence produces a tar.gz file, which, once installed, is made available to R. The tar.gz file is created in the same directory in which the build command was issued.
- Verify the tar.gz file was created before installing the RKinetica package:
- Install the RKinetica package via the R CLI:
Usage
Before using RKinetica, the package must be loaded:
A connection is then created with the dbConnect() method,
passing in Kinetica URL, user, and password parameters:
Strings vs. Factors
When RKinetica reads a character list into an R dataframe, it can be converted into a factor. This behavior is controlled by the stringsAsFactors
environment property, whose value is passed to the corresponding as.data.frame() parameter:
To set the value to TRUE or FALSE explicitly, use the
following syntax at the beginning of your R script or once per session:
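As a rough illustration of what the setting changes, the sketch below uses only base R, with no Kinetica connection involved. It assumes the property referred to above is R's global `stringsAsFactors` option; the `records` list is made-up sample data.

```r
# Set the global option once per session (assumed to be the property
# the text refers to; FALSE keeps strings as character vectors).
options(stringsAsFactors = FALSE)

# Made-up character data standing in for values read from a table.
records <- list(city = c("Austin", "Boston", "Austin"), n = c(1L, 2L, 3L))

# Passing the parameter explicitly shows both outcomes side by side.
as_chr <- as.data.frame(records, stringsAsFactors = FALSE)
as_fct <- as.data.frame(records, stringsAsFactors = TRUE)

class(as_chr$city)   # character column
class(as_fct$city)   # factor column
levels(as_fct$city)  # distinct values stored as factor levels
```

Factors save memory for columns with few distinct values, while character columns are usually easier to manipulate as free text.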
Schema Support
Kinetica schemas, tables, and views must meet standard naming criteria and, when referenced, follow the rules for name resolution. The KineticaConnection object has a read-only attribute default_schema
that stores the user’s default schema name:
The user’s default schema cannot be set or managed by the user.
The table functions (dbCreateTable(), dbAppendTable(), dbReadTable(), dbWriteTable(),
dbExistsTable(), and dbRemoveTable()) have a name argument that
supports passing a character value, such as <schema name>.<table name> (or
simply <table name> if utilizing the user’s default schema), or a
KineticaId object modeled after the DBI::Id class that encapsulates a
named vector with schema and table parameter values defined separately:
A KineticaId object can also be created by submitting a named character
value that will be parsed into a schema/table pair:
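Since KineticaId is described as modeled after DBI::Id, the idea of a separate schema/table pair can be sketched with DBI::Id itself. This is not the KineticaId constructor, whose exact signature is not shown here; `split_qualified_name` is a hypothetical helper written only for illustration, and `example` and `tableA` are the names used elsewhere in this document.

```r
library(DBI)

# DBI::Id keeps the schema and table as separate named components.
tbl_id <- Id(schema = "example", table = "tableA")

# Hypothetical helper (not part of RKinetica or DBI): split a
# "<schema name>.<table name>" string into the same two components.
split_qualified_name <- function(name) {
  parts <- strsplit(name, ".", fixed = TRUE)[[1]]
  if (length(parts) == 1L) {
    # No schema given: the user's default schema would apply.
    c(table = parts)
  } else {
    c(schema = parts[1], table = paste(parts[-1], collapse = "."))
  }
}

parsed <- split_qualified_name("example.tableA")
bare   <- split_qualified_name("tableA")

# Both representations carry the same table component.
identical(unname(parsed["table"]), unname(tbl_id@name["table"]))
```

Keeping the two parts separate avoids ambiguity when a name is passed around between functions, which is presumably why an Id-style object is offered alongside the plain string form.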
Additional Connection Configuration
Row Limits
If you expect a result set from your queries to exceed 10,000 rows, set the row_limit parameter value accordingly:
A parameter cannot be added to an existing KineticaConnection object.
Instead, a new KineticaConnection object must be created to properly
initialize any functionality enabled by additional parameters.
High Availability (HA)
Automatic Discovery
When two or more Kinetica clusters have been configured for an HA ring via KAgent, RKinetica will automatically discover the additional Kinetica instance URLs available in the ring. If the connection to the URL of the primary cluster fails, each additional URL will be tried until a successful connection is established; every failed connection will result in a warning message. If all connection attempts fail, an error message will be thrown.

Only the URL of the primary cluster to connect to needs to be specified (via the url
parameter in the dbConnect method); the URLs for the failover clusters will
be retrieved from the primary cluster upon first connecting to it. The
KineticaConnection object has additional parameters to store these failover
URLs (via the ha_ring parameter) as well as other connection information:
URLs in the ha_ring list are randomly selected to balance load on secondary URL
instances when the primary URL fails. You can use a show() command on a
KineticaConnection object at any time to check which URL is being used in the
current connection:
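The failover behavior described in this section (primary first, then randomly ordered secondary URLs, a warning per failure, an error if nothing connects) can be sketched in plain R. This is only an illustration of the described logic, not RKinetica's implementation; `connect_with_failover`, `connect_fn`, `fake_connect`, and the URLs are all hypothetical.

```r
# Hypothetical helper: try each URL in turn and return the first that connects.
# `connect_fn` stands in for whatever opens a connection to a single URL.
connect_with_failover <- function(primary_url, ha_ring, connect_fn) {
  # Primary first, then the failover URLs in random order to spread load.
  # Index sampling avoids sample()'s special case for length-one input.
  candidates <- c(primary_url, ha_ring[sample.int(length(ha_ring))])
  for (url in candidates) {
    conn <- tryCatch(
      connect_fn(url),
      error = function(e) {
        # Each failed attempt produces a warning, then the loop moves on.
        warning("Connection to ", url, " failed: ", conditionMessage(e),
                call. = FALSE)
        NULL
      }
    )
    if (!is.null(conn)) return(conn)
  }
  stop("All connection attempts failed", call. = FALSE)
}

# Made-up connector: the primary is unreachable, the secondaries respond.
fake_connect <- function(url) {
  if (url == "http://primary:9191") stop("unreachable")
  list(url = url)
}

conn <- suppressWarnings(
  connect_with_failover("http://primary:9191",
                        c("http://b:9191", "http://c:9191"),
                        fake_connect)
)
conn$url  # one of the two secondary URLs
```

Shuffling the secondary URLs on each attempt means repeated failovers do not all land on the same cluster.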
Manual Configuration
If you want to provide URLs for failover clusters manually, you can do so by adding the ha_ring parameter to the dbConnect() method with a
comma-separated list of URIs for the secondary cluster(s):
Examples
The following examples assume that a KineticaConnection object has already been established, which you can then use as a regular DBI connection:
- Print connection info:
- Get the current user’s default schema:
- List top-level schemas the user has access to, the tables contained within
  those schemas, and basic information about the schemas and tables:
- Get a list of tables within the ki_home schema:
- Write a table to the user’s default schema with 3 columns:
- Check if the table exists in the user’s default schema:
- Check if the table exists in the example schema using the KineticaId object:
- List the fields of tableA (located in the user’s default schema):
- Add records to tableA (located in the user’s default schema):
- Read tableA (located in the user’s default schema) into a variable:
- Drop a table in the user’s default schema if it exists:
- Disconnect:
SQL statements can be run with the dbSendQuery or dbSendStatement methods. Assuming that the
connection was established with a row limit of 1,000,000, i.e.
row_limit = 1000000L, the following example query extracts all records
from the acquisition table in the example schema sorted by l_id.
The data is retrieved in batches of one million records, and the offset is
increased by one million with each batch. The loop continues until
dbSendQuery() returns an empty result set:
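The batching pattern itself can be sketched without a database. In the sketch below, `fetch_page` is a hypothetical stand-in for a query with a limit and an offset, the `acquisition` data frame is made-up in-memory data, and the batch size is shrunk from one million to ten so the loop is easy to follow.

```r
# In-memory stand-in for the remote table, already sorted by l_id.
acquisition <- data.frame(l_id = 1:25, value = (1:25) * 10)

# Hypothetical stand-in for a LIMIT/OFFSET query: returns at most
# `limit` rows, skipping the first `offset` rows.
fetch_page <- function(offset, limit) {
  first <- offset + 1L
  if (first > nrow(acquisition)) return(acquisition[0, , drop = FALSE])
  last <- min(offset + limit, nrow(acquisition))
  acquisition[first:last, , drop = FALSE]
}

batch_size <- 10L  # one million in the text above
offset <- 0L
pages <- list()

repeat {
  page <- fetch_page(offset, batch_size)
  if (nrow(page) == 0L) break          # an empty result set ends the loop
  pages[[length(pages) + 1L]] <- page  # keep the batch
  offset <- offset + batch_size        # advance to the next batch
}

all_rows <- do.call(rbind, pages)
nrow(all_rows)
```

Sorting by a stable key such as l_id matters here: without a fixed order, successive offsets are not guaranteed to partition the table cleanly.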
Additional examples can be found in the examples subdirectory of the
RKinetica repository.