The following guide provides step-by-step instructions to get started integrating Kinetica with Beam.
This project aims to make Kinetica accessible to Beam pipelines, both for ingest and egress, meaning data can be read from or written to a
Kinetica table via a Beam pipeline.
Important
The Kinetica Beam connector currently only supports Java Beam pipelines
The two generic classes that integrate Kinetica with Beam are:
- com.kinetica.beam.io.KineticaIO.Read<T> -- A class that takes a standard Kinetica table type class to enable reading data from a table of that type
- com.kinetica.beam.io.KineticaIO.Write<T> -- A class that takes a standard Kinetica table type class to enable writing data to a table of that type
Source code for the connector can be found in the Kinetica Beam Project on GitHub.
Installation
One JAR file is produced by this project:
- apache-beam-kineticaio-<ver>.jar -- complete connector JAR
To install and use the connector:
- Copy the apache-beam-kineticaio-<ver>.jar library to the target server
- Build your pipeline project against the JAR
Important
Because of the Kinetica Beam connector's dependence on the Kinetica table type class, your pipeline code requires access to the Kinetica Java API
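For illustration, a table type class for use with the connector might look like the following minimal sketch. The class name MyRecord and its columns are hypothetical; it follows the standard Kinetica Java API pattern of extending com.gpudb.RecordObject with @RecordObject.Column annotations, and implements Serializable so it can be paired with a serializable Beam coder.

```java
import java.io.Serializable;

import com.gpudb.ColumnProperty;
import com.gpudb.RecordObject;

// Hypothetical table type class; the class name and columns are examples only
public class MyRecord extends RecordObject implements Serializable {
    @RecordObject.Column(order = 0, properties = { ColumnProperty.CHAR64 })
    public String symbol;

    @RecordObject.Column(order = 1)
    public double price;

    // Public no-argument constructor, as expected by the Kinetica Java API
    public MyRecord() {}
}
```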
Transforming Data from Kinetica into a Pipeline
The Read<T> class (and its respective method .<T>read()) is used to transform data read from a table in Kinetica into a Beam pipeline as a Beam PCollection class object. A PCollection represents the data set the Beam pipeline will operate on. To access the data in Kinetica, the .<T>read() transform method is applied to the pipeline, producing a PCollection backed by the Kinetica table type class.
The .<T>read() transform method is configured using the following methods, which are called on the Read<T> class object:
Method Name | Required | Description |
---|---|---|
withHeadNodeURL() | Y | The URL of the Kinetica head node (host and port) |
withUsername() | N | Username for authentication |
withPassword() | N | Password for authentication |
withTable() | Y | The name of the table in the database |
withEntity() | Y | The name of the table class |
withCoder() | Y | A serializable coder of the table class |
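Putting these together, a read pipeline might look like the following sketch. The host, port, table name, and MyRecord class are placeholders, and the exact argument types of the configuration methods (for example, whether withEntity() takes a Class object) should be confirmed against the connector source.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.SerializableCoder;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

import com.kinetica.beam.io.KineticaIO;

public class ReadFromKinetica {
    public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        // Read records from the Kinetica table into a PCollection of MyRecord objects
        PCollection<MyRecord> records = pipeline.apply(
            KineticaIO.<MyRecord>read()
                .withHeadNodeURL("http://kinetica-host:9191")  // head node host & port (placeholder)
                .withUsername("user")                          // optional
                .withPassword("pass")                          // optional
                .withTable("my_table")                         // source table name (placeholder)
                .withEntity(MyRecord.class)                    // the table type class
                .withCoder(SerializableCoder.of(MyRecord.class)));

        // ... apply further transforms to 'records' here ...

        pipeline.run().waitUntilFinish();
    }
}
```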
Transforming Data from a Pipeline into Kinetica
The Write<T> class (and its respective method .<T>write()) is used to transform data from a Beam pipeline (backed by a Beam PCollection class object) and write it to a table in Kinetica. A PCollection represents the data set the Beam pipeline will operate on. To write the data in the pipeline to Kinetica, the .<T>write() transform method is applied to the PCollection, mapping it to the target table.
The .<T>write() transform method is configured using the following methods, which are called on the Write<T> class object:
Method Name | Required | Description |
---|---|---|
withHeadNodeURL() | Y | The URL of the Kinetica head node (host and port) |
withUsername() | N | Username for authentication |
withPassword() | N | Password for authentication |
withTable() | Y | The name of the table in the database |
withEntity() | Y | The name of the table class |
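A corresponding write might look like the sketch below, again with placeholder connection details and the hypothetical MyRecord class; here the .<T>write() transform is applied to the PCollection being persisted.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.SerializableCoder;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;

import com.kinetica.beam.io.KineticaIO;

public class WriteToKinetica {
    public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        // Build a small in-memory PCollection of MyRecord objects for illustration
        MyRecord record = new MyRecord();
        record.symbol = "AAPL";
        record.price = 123.45;

        PCollection<MyRecord> records = pipeline.apply(
            Create.of(record).withCoder(SerializableCoder.of(MyRecord.class)));

        // Write the records to the target Kinetica table (which must already exist)
        records.apply(
            KineticaIO.<MyRecord>write()
                .withHeadNodeURL("http://kinetica-host:9191")  // head node host & port (placeholder)
                .withTable("my_table")                         // target table name (placeholder)
                .withEntity(MyRecord.class));                  // the table type class

        pipeline.run().waitUntilFinish();
    }
}
```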
Important
Before attempting to use a pipeline to write to a table in Kinetica, the table must be created. The table may be created either within the pipeline code (since the Beam connector requires the Kinetica Java API anyway) or separately, before the pipeline runs.
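For example, the target table could be created up front with the Kinetica Java API before the pipeline runs. This sketch assumes the hypothetical MyRecord type class from above and uses the Java API's RecordObject.createType() and GPUdb.createTable() calls; connection details are placeholders.

```java
import java.util.HashMap;

import com.gpudb.GPUdb;
import com.gpudb.GPUdbException;
import com.gpudb.RecordObject;

public class CreateTargetTable {
    public static void main(String[] args) throws GPUdbException {
        // Connect to the Kinetica head node (placeholder URL)
        GPUdb gpudb = new GPUdb("http://kinetica-host:9191");

        // Register the type derived from the table type class, then create the table
        String typeId = RecordObject.createType(MyRecord.class, gpudb);
        gpudb.createTable("my_table", typeId, new HashMap<String, String>());
    }
}
```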