Load Data from JDBC

Ingest from any compatible source using a JDBC driver

Loading data from a JDBC data source can be done in three easy steps:

  1. Load the JDBC Driver into KiFS: Load the JDBC driver for the specified data source into the Kinetica File System (KiFS).
  2. Create a data source: Create a data source that holds JDBC connection information.
  3. Ingest the data: Load data from your data source over JDBC using a remote query.

You will need the corresponding JDBC driver for your data source. Kinetica will use this driver to connect to your data source and initiate the data ingress. You will also need to refer to the JDBC driver documentation of your data source to gather the connection details and write the remote query, as these will vary from vendor to vendor.

Before You Begin

To illustrate data ingress from a JDBC data source, let's look at an example that uses PostgreSQL. If you would like to follow along with the examples in this article, you will need:

  • A running instance of PostgreSQL
  • The PostgreSQL JDBC driver for your version of PostgreSQL. We will be using the postgresql-42.3.6.jar for this tutorial.
  • A schema and table name in your PostgreSQL database, from which we will load the data. For this tutorial we will be loading data from public.pg_table.
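Before wiring the database into Kinetica, it can be worth confirming that the example table is reachable and holds data. A quick check, run directly against the source PostgreSQL instance (for example, in a psql session):

```sql
-- Run against the source PostgreSQL database, not Kinetica.
-- Confirms public.pg_table exists and previews a few rows.
SELECT COUNT(*) FROM public.pg_table;
SELECT * FROM public.pg_table LIMIT 5;
```

If these queries fail, fix connectivity or permissions on the PostgreSQL side first; the same credentials will be used by Kinetica's data source below.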

Upload JDBC Driver

First, we need to upload the JDBC driver into KiFS. For this tutorial, we will upload the file into a directory in KiFS called drivers. The final file path will look like this: kifs://drivers/postgresql-42.3.6.jar.

KiSQL

To upload the driver using KiSQL, Kinetica's CLI SQL interface, issue the following commands.

It is assumed that the driver jar is located in the same directory as KiSQL.

Connect to Kinetica with KiSQL
kisql --host "http://localhost:9191;RowsPerFetch=20000" --user auser
Create a Directory in KiFS
CREATE DIRECTORY 'drivers'
Upload the Driver
UPLOAD FILE 'postgresql-42.3.6.jar' INTO 'drivers'

Create Data Source

Once the driver is uploaded into KiFS, it's time to create a data source. Use the driver documentation to construct the LOCATION string. Refer to the Loading the Driver and the Connecting to the Database pages to find JDBC_DRIVER_CLASS_NAME and the connection details respectively.

Note

Kinetica's native JDBC ingress differs from other native ingress modes in that you do not need to first create a credential. Authentication parameters can be passed directly into the data source at the time of creation.

Create Data Source
CREATE OR REPLACE DATA SOURCE postgres_ds
LOCATION = 'jdbc:postgresql://example.com:5432/my_database'
USER = 'myuser'
PASSWORD = 'mysecret'
WITH OPTIONS
(
  JDBC_DRIVER_JAR_PATH = 'kifs://drivers/postgresql-42.3.6.jar',
  JDBC_DRIVER_CLASS_NAME = 'org.postgresql.Driver'
)

Ingest Data

Now that the data source is created, let's ingest the data. The query given in the REMOTE QUERY clause is executed on the data source itself and defines the result set that Kinetica receives in return.

Load Data from JDBC Data Source
LOAD INTO ki_home.kinetica_pg_table
FROM REMOTE QUERY 'SELECT * FROM public.pg_table'
WITH OPTIONS (DATA SOURCE = 'postgres_ds')

To load only the rows that satisfy a condition, you can use a WHERE clause in the REMOTE QUERY like this:

Load Data from JDBC Data Source with Filter
LOAD INTO ki_home.kinetica_pg_table
FROM REMOTE QUERY 'SELECT * FROM public.pg_table WHERE col_1 > 25'
WITH OPTIONS (DATA SOURCE = 'postgres_ds')

Considerations

The following are some common options used when loading data. For the full list of options, see LOAD INTO. For copy/paste examples of many of the options, see Loading Data.

Error Handling

Kinetica has several error handling modes for dealing with erroneous data. To halt ingestion when a bad record is found, use ABORT mode. To skip erroneous records and continue the ingest, use SKIP mode. To ingest as much of each erroneous record as Kinetica is able to parse and continue the ingest, use PERMISSIVE mode.

To inspect erroneous records, use the BAD RECORD TABLE NAME option; all bad records encountered will be stored in that table for review. The bad record table is limited to 10,000 records by default, and this limit can be changed with the BAD RECORD TABLE LIMIT option.

Abort
LOAD INTO ki_home.kinetica_pg_table
FROM REMOTE QUERY 'SELECT * FROM public.pg_table'
WITH OPTIONS (
    DATA SOURCE = 'postgres_ds',
    ON ERROR = ABORT
)
Log Bad Records
LOAD INTO ki_home.kinetica_pg_table
FROM REMOTE QUERY 'SELECT * FROM public.pg_table'
WITH OPTIONS (
    DATA SOURCE = 'postgres_ds',
    BAD RECORD TABLE NAME = 'error_example_invalid',
    ON ERROR = SKIP
)
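PERMISSIVE mode follows the same pattern. The sketch below also raises the default bad record table size via BAD RECORD TABLE LIMIT; the table name and limit value here are illustrative:

```sql
LOAD INTO ki_home.kinetica_pg_table
FROM REMOTE QUERY 'SELECT * FROM public.pg_table'
WITH OPTIONS (
    DATA SOURCE = 'postgres_ds',
    BAD RECORD TABLE NAME = 'error_example_invalid',
    BAD RECORD TABLE LIMIT = 50000,
    ON ERROR = PERMISSIVE
)
```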

Load Specific Columns

In some cases, you may want to store only certain columns from your source data. Simply list the columns to load in your REMOTE QUERY.

Load Specific Columns Example
LOAD INTO ki_home.kinetica_pg_table
FROM REMOTE QUERY 'SELECT column_1, column_2 FROM public.pg_table'
WITH OPTIONS (DATA SOURCE = 'postgres_ds')
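Because the remote query runs on the source database, you can also push heavier work, such as an aggregation, down to PostgreSQL and ingest only the summarized result. A sketch, with an illustrative target table and column names:

```sql
LOAD INTO ki_home.kinetica_pg_summary
FROM REMOTE QUERY 'SELECT column_1, COUNT(*) AS row_count FROM public.pg_table GROUP BY column_1'
WITH OPTIONS (DATA SOURCE = 'postgres_ds')
```

This keeps the bulk of the data on the source side and transfers only the grouped rows over JDBC.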