Load Data from CData JDBC Drivers

Ingest from a variety of sources using CData JDBC drivers

Loading data from a CData data source can be done in two easy steps:

  1. Create a data source: Create a data source that holds the connection information.
  2. Ingest the data: Load data from the data source using a remote JDBC query.

Kinetica provides pre-packaged, third-party JDBC drivers, out of the box, that allow you to connect to varied and popular data sources. While they are similar to Kinetica's generic JDBC driver data source, they differ in two important ways:

  • JDBC JARs do not need to be uploaded to Kinetica's file system (KiFS) to connect to the data source.
  • They are documented externally. The documentation for the drivers can be found on CData's website. (Be sure to use only the JDBC documentation for each driver.)

Before You Begin

To illustrate data ingress from a CData JDBC data source, let's look at an example that uses PostgreSQL. If you would like to follow along with the examples in this article, you will need:

  • A running instance of PostgreSQL
  • A schema and table name in your PostgreSQL database, from which we will load the data. For this tutorial we will be loading data from public.pg_table.

Create Data Source

The JDBC connection string parameters can be found on CData's PostgreSQL JDBC Driver documentation page. CData connection strings, specified in the LOCATION statement, use the following format:

jdbc:<driver_name>:<param>=<value>;[...]

Note

Like Kinetica's JDBC data source, you do not need to first create a credential. Authentication parameters can be passed directly into the connection string of the data source at the time of creation.

For this example, we will use the following connection parameters, but you may need others for your use case:

  • Server: The host name or IP address of the server.
  • Port: The port number of the PostgreSQL server.
  • Database: The name of the PostgreSQL database.
  • User: The PostgreSQL user account used to authenticate.
  • Password: The password used to authenticate the user.
Create Data Source
1
2
CREATE DATA SOURCE cdata_pg_ds
LOCATION = 'jdbc:postgresql:Server=example.com;Port=5432;Database=my_database;User=my_user;Password=my_pass'

Ingest Data

To initiate the ingress of data from the PostgreSQL instance, issue the following command. The REMOTE QUERY will be executed on the remote data source and it defines the data Kinetica will receive in return.

Load Data from JDBC Data Source
1
2
3
LOAD INTO ki_home.kinetica_pg_table
FROM REMOTE QUERY 'SELECT * FROM public.pg_table'
WITH OPTIONS (DATA SOURCE = 'cdata_pg_ds')

If you only want to return data that satisfies a condition, a WHERE clause can be used to filter the data coming from the data source.

Load Data from JDBC Data Source with Filter
1
2
3
LOAD INTO ki_home.kinetica_pg_table
FROM REMOTE QUERY 'SELECT * FROM public.pg_table WHERE col_1 > 25'
WITH OPTIONS (DATA SOURCE = 'cdata_pg_ds')

Considerations

The following are some common options used when loading data. For the full list of options, see LOAD INTO. For copy/paste examples of many of the options, see Loading Data.

Error Handling

Kinetica has two different error handling modes for dealing with erroneous data. To halt ingestion after a bad record is found, use the ABORT mode. To skip erroneous records and continue the ingest, use the SKIP mode.

To inspect erroneous records, you may use the BAD RECORD TABLE NAME option. All bad records encountered will be stored there for review. The bad records table is limited to 10,000 records by default and may be overridden using the BAD RECORD TABLE LIMIT option.

Abort
1
2
3
4
5
6
LOAD INTO ki_home.kinetica_pg_table
FROM REMOTE QUERY 'SELECT * FROM public.pg_table'
WITH OPTIONS (
    DATA SOURCE = 'postgres_ds',
    ON ERROR = ABORT
)
Log Bad Records
1
2
3
4
5
6
7
LOAD INTO ki_home.kinetica_pg_table
FROM REMOTE QUERY 'SELECT * FROM public.pg_table'
WITH OPTIONS (
    DATA SOURCE = 'postgres_ds',
    BAD RECORD TABLE NAME = 'error_example_invalid',
    ON ERROR = SKIP
)

Load Specific Columns

In same cases, you may only want to store certain columns from your source data. You can simply list the columns you want to load in your REMOTE QUERY statement.

Load Specific Columns Example
1
2
3
4
5
LOAD INTO ki_home.kinetica_pg_table
FROM REMOTE QUERY 'SELECT column_1, column_2 FROM public.pg_table'
WITH OPTIONS (
  DATA SOURCE = 'cdata_pg_ds'
)