Java Developer Guide
Step-by-step instructions on writing Java applications with Kinetica
The following guide provides step-by-step instructions to get started writing Java applications using Kinetica. This guide demonstrates only a small set of the available API. A detailed description of the complete interface is available under Java API Reference.
The tutorial Java file references a data file in the current directory. This path can be updated to point to a valid path on the host where the file will be located, or the program can be run with the data file in the current directory.
We suggest using Maven as the build tool for your Java project. To use the Kinetica Java API, you must add our Nexus repository and the Kinetica Java API dependency that matches that of the targeted Kinetica database, as shown below:
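As a sketch, the relevant pom.xml additions might look like the following; the repository URL and version shown are placeholders and must be replaced with the Nexus URL and API version matching your target database:

```xml
<!-- Placeholder repository URL and version; substitute the values
     matching your target Kinetica release -->
<repositories>
    <repository>
        <id>kinetica-releases</id>
        <url>https://your-kinetica-nexus-repository-url</url>
    </repository>
</repositories>

<dependencies>
    <dependency>
        <groupId>com.gpudb</groupId>
        <artifactId>gpudb-api</artifactId>
        <version>[your database version]</version>
    </dependency>
</dependencies>
```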
Important
The pom.xml file used for the tutorial can be found below.
The source code for the Java API is also available for download from the GitHub repository kineticadb/kinetica-api-java. Follow the instructions in the included README file to build the API library.
The tutorial was set up as shown in the diagram below:

[Diagram: tutorial setup]
To connect to the database, instantiate an object of the GPUdb class, providing the connection URL, username, & password to the database server. For more details on connecting to the database, see Connecting via API.
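A minimal sketch, assuming a database listening at localhost on port 9191 and placeholder credentials:

```java
import com.gpudb.GPUdb;
import com.gpudb.GPUdbBase;

// Placeholder URL and credentials; replace with your server's values
GPUdbBase.Options options = new GPUdbBase.Options();
options.setUsername("admin");
options.setPassword("password");

// All subsequent API calls are made through this object
GPUdb gpudb = new GPUdb("http://localhost:9191", options);
```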
All tables & views must be created within a schema. So, the first step will be to create a schema to contain all of the tables & views created within this tutorial.
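For example, assuming a connected GPUdb instance gpudb and a schema named tutorial_java (the name is illustrative):

```java
// Create the schema that will hold all of the tutorial's tables & views
String schemaName = "tutorial_java";
gpudb.createSchema(schemaName, null);
```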
Before a table can be created and any data can be loaded into it, a type needs to be defined. The type is a class, extended from RecordObject, using annotations to describe which class instance variables are fields (i.e., columns), what type they are, and any special handling they should receive.
For this tutorial, three types will be used to create three corresponding tables: vendor, payment, and taxi trip data.
The definition of the type for the vendor table follows:
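A sketch of such a class; the column names and properties below are illustrative and may not match the tutorial's exact vendor schema:

```java
import com.gpudb.ColumnProperty;
import com.gpudb.RecordObject;

public class Vendor extends RecordObject {
    // Column properties mark this field as a 4-character primary key
    @RecordObject.Column(order = 0, properties = { ColumnProperty.PRIMARY_KEY, ColumnProperty.CHAR4 })
    public String vendor_id;

    @RecordObject.Column(order = 1, properties = { ColumnProperty.CHAR64 })
    public String vendor_name;

    @RecordObject.Column(order = 2)
    public int fleet_size;

    // A no-parameter constructor is required once any constructor is defined
    public Vendor() {}

    public Vendor(String vendor_id, String vendor_name, int fleet_size) {
        this.vendor_id = vendor_id;
        this.vendor_name = vendor_name;
        this.fleet_size = fleet_size;
    }
}
```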
Note
Although a constructor is not required, if the class does have any constructors, it must have a constructor with no parameters. Other constructors can be added, as necessary.
Once defined, the type needs to be created with a call to createType():
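Assuming a RecordObject-derived Vendor class and a connected GPUdb instance gpudb:

```java
import com.gpudb.RecordObject;

// Register the Vendor type with the database; the returned identifier
// is used when creating the corresponding table
String vendorTypeId = RecordObject.createType(Vendor.class, gpudb);
```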
The returned object from the createType() call contains a unique type identifier assigned by the system. This identifier can then be used in the request to create a corresponding table. The examples below outline creating a table with either a single request object (vendor table) or a parameterized call (payment table):
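A sketch of both forms, assuming type IDs vendorTypeId and paymentTypeId returned by createType() and a connected GPUdb instance gpudb:

```java
import com.gpudb.protocol.CreateTableRequest;

// Request-object form: create the vendor table
CreateTableRequest createVendorTable = new CreateTableRequest()
    .setTableName("tutorial_java.vendor")
    .setTypeId(vendorTypeId);
gpudb.createTable(createVendorTable);

// Parameterized form: create the payment table
gpudb.createTable("tutorial_java.payment", paymentTypeId, null);
```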
Once the table is created, data can be inserted into it. There is a convenience class called BulkInserter, which facilitates inserting records into a table in batches; see Multi-Head Ingest for details. For this tutorial, only the native Java API call insertRecords() will be used.
In this example, values are assigned individually to fields after creating a Payment object with the default constructor.
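A sketch, assuming a Payment type class with the fields shown (the field names are illustrative) and a connected GPUdb instance gpudb:

```java
import java.util.ArrayList;
import java.util.List;

List<Payment> payments = new ArrayList<>();

// Build a record field-by-field via the default constructor
Payment payment = new Payment();
payment.payment_id = 1;
payment.payment_type = "Cash";
payment.fare_amount = 10.5f;
payments.add(payment);

// Insert the batch into the payment table
gpudb.insertRecords("tutorial_java.payment", payments, null);
```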
In this example, values are passed in to the Vendor object's parameterized constructor.
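A sketch, assuming a Vendor type class with a matching parameterized constructor (the values are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Construct each record in one call, then insert the batch
List<Vendor> vendors = new ArrayList<>();
vendors.add(new Vendor("VTS", "Vine Taxi Service", 450));
vendors.add(new Vendor("YCAB", "Yellow Cab", 1200));

gpudb.insertRecords("tutorial_java.vendor", vendors, null);
```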
CSV and other data files can be loaded into Kinetica via KiFS using the GPUdbFileHandler. This can upload files into KiFS for later ingestion or stage them temporarily and ingest them all at once.
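A sketch of uploading the tutorial data file; the exact upload() signature and arguments shown here are assumptions, so consult the GPUdbFileHandler reference for the precise API:

```java
import com.gpudb.filesystem.GPUdbFileHandler;

// The upload() arguments below (local file, KiFS directory) are assumptions;
// check the GPUdbFileHandler documentation for the exact signature
GPUdbFileHandler fileHandler = new GPUdbFileHandler(gpudb);
fileHandler.upload("taxi_trip_data.csv", "data", null, null);
```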
Once the table is populated with data, the data can be retrieved from the system by a call to getRecords(tableName, offset, limit, options) using in-line parameter-passing.
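A sketch, assuming a Vendor type class, a populated vendor table, and a connected GPUdb instance gpudb:

```java
import com.gpudb.protocol.GetRecordsResponse;

// Retrieve up to 100 vendor records, starting from the first
GetRecordsResponse<Vendor> vendorRecords =
    gpudb.getRecords(Vendor.class, "tutorial_java.vendor", 0, 100, null);

for (Vendor vendor : vendorRecords.getData())
    System.out.println(vendor);
```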
One can also invoke getRecords(request) using the GetRecordsRequest request class. This object contains all the parameters for the method call, and can be reused in successive calls, avoiding re-specifying query parameters.
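A sketch of the request-object form, under the same assumptions:

```java
import com.gpudb.protocol.GetRecordsRequest;
import com.gpudb.protocol.GetRecordsResponse;

// Build the request once; it can be reused later (e.g., with a new offset)
GetRecordsRequest recordsRequest = new GetRecordsRequest()
    .setTableName("tutorial_java.vendor")
    .setOffset(0)
    .setLimit(100);

GetRecordsResponse<Vendor> vendorRecords =
    gpudb.getRecords(Vendor.class, recordsRequest);
```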
For large tables, the data can easily be retrieved in smaller blocks by using the offset and limit parameters. The returned response also contains the type (column definitions) of the results.
Also, note that all query-related methods have the two versions shown above: one taking a request object and one taking the parameters passed directly to the method.
Use updateRecords() to update matching key values for any records in a table.
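A sketch; the expression and new-value map are illustrative, and the parameterized signature used here should be checked against the API reference:

```java
import com.gpudb.GPUdb;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Match records by key, then assign replacement values to named columns
List<String> expressions = GPUdb.list("vendor_id = 'VTS'");
List<Map<String, String>> newValues =
    GPUdb.list(GPUdb.options("vendor_name", "Vine Taxi Svc, Inc."));

// No new records are being supplied, so the data list is empty
gpudb.updateRecords("tutorial_java.vendor", expressions, newValues,
    new ArrayList<Vendor>(), null);
```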
Use deleteRecords() to delete records from a table. A list can be used to specify which records to delete based on matching expressions. Set DELETE_ALL_RECORDS to TRUE to delete all records in a table.
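A sketch, with an illustrative matching expression:

```java
import com.gpudb.GPUdb;
import java.util.List;

// Delete all cash payments from the payment table
List<String> deleteExpressions = GPUdb.list("payment_type = 'Cash'");
gpudb.deleteRecords("tutorial_java.payment", deleteExpressions, null);
```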
Some properties can be altered or added after table creation, using alterTable(), including indexes and dictionary encoding.
Using the alterTable() method, you can create indexes on columns using the CREATE_INDEX action paired with a column name.
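A sketch, using an illustrative column name:

```java
import com.gpudb.protocol.AlterTableRequest;

// Index a column commonly used in filter expressions
gpudb.alterTable(
    "tutorial_java.taxi_trip_data",
    AlterTableRequest.Action.CREATE_INDEX,
    "passenger_count",
    null
);
```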
Applying dictionary encoding via alterTable() involves adding a new property to a column using the CHANGE_COLUMN action paired with a column name & definition.
Important
To add a new property, all existing column properties must be listed along with any new properties.
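A sketch of applying dictionary encoding; the column name, its existing property list, and the option name used below are assumptions to verify against the API reference:

```java
import com.gpudb.ColumnProperty;
import com.gpudb.GPUdb;
import com.gpudb.protocol.AlterTableRequest;

// Restate the column's existing properties and append the dict property
gpudb.alterTable(
    "tutorial_java.taxi_trip_data",
    AlterTableRequest.Action.CHANGE_COLUMN,
    "vendor_id",
    GPUdb.options(
        AlterTableRequest.Options.COLUMN_PROPERTIES,
        ColumnProperty.CHAR4 + "," + ColumnProperty.DICT
    )
);
```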
Filters are an easy way to reduce larger tables into more concise views using expressions.
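A sketch of one such filter; the expression and names are illustrative:

```java
// Create a view containing only trips with more than two passengers
String filterExpression = "passenger_count > 2";
gpudb.filter(
    "tutorial_java.taxi_trip_data",
    "tutorial_java.large_trips",
    filterExpression,
    null
);
```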
Kinetica supports various aggregate and group-by queries, which group and aggregate your data to return counts and useful statistics.
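A sketch of a group-by aggregation; the column names are illustrative:

```java
import com.gpudb.GPUdb;
import com.gpudb.protocol.AggregateGroupByResponse;
import java.util.List;

// Group trips by vendor, counting trips and averaging fares
List<String> groupByColumns = GPUdb.list(
    "vendor_id",
    "count(*) as total_trips",
    "avg(fare_amount) as avg_fare"
);
AggregateGroupByResponse aggResponse = gpudb.aggregateGroupBy(
    "tutorial_java.taxi_trip_data", groupByColumns, 0, 100, null
);
```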
Joins allow you to link multiple tables together, along their relations, retrieving associated information from any or all of them. Tables can only be joined if they're sharded similarly or replicated. All join types can be performed using the createJoinTable() method.
An inner join returns only records that have matching values in both tables.
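A sketch of an inner join; the table aliases and column names are illustrative:

```java
import com.gpudb.GPUdb;
import java.util.List;

// Join trip records to vendor details on the shared vendor_id column
List<String> joinExpressions = GPUdb.list("t.vendor_id = v.vendor_id");
gpudb.createJoinTable(
    "tutorial_java.vendor_trips",
    GPUdb.list("tutorial_java.taxi_trip_data as t", "tutorial_java.vendor as v"),
    GPUdb.list("t.vendor_id", "v.vendor_name", "t.fare_amount"),
    joinExpressions,
    null
);
```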
A left join returns all of the records an inner join does, but additionally, for each record in the table on the left side of the join that has no match along the relation to a record in the table on the right side of the join, a corresponding record will be returned with "left-side" columns populated with the "left-side" record data and the "right-side" columns populated with nulls. Note the usage of left join in the given expression.
Note
Full outer joins require both tables to be replicated or joined on their shard keys. Set merges that perform deduplication of records, like Union Distinct, Intersect, and Except, also need to use replicated tables to ensure correct results, so a replicated version of the taxi table (taxi_trip_data_replicated) is created at this point in the tutorial using mergeRecords().
A full outer join returns all of the records a left join does, but additionally, for each record in the table on the right side of the join that has no match along the relation to a record in the table on the left side of the join, a corresponding record will be returned with "right-side" columns populated with the "right-side" record data and the "left-side" columns populated with nulls.
You can create projections using the createProjection() method.
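A sketch, projecting a few illustrative columns into a new result table:

```java
import com.gpudb.GPUdb;
import java.util.List;

// Project a subset of trip columns
List<String> projectionColumns =
    GPUdb.list("vendor_id", "passenger_count", "total_amount");
gpudb.createProjection(
    "tutorial_java.taxi_trip_data",
    "tutorial_java.trip_summary",
    projectionColumns,
    null
);
```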
To persist a projection:
Union can be used to combine homogeneous data sets into one larger data set. When calling the createUnion() method, a MODE of UNION or UNION_DISTINCT combines data sets but retains only the records that are unique across the chosen columns, removing all duplicates. A MODE of UNION_ALL combines data sets, retaining all records from the source data sets.
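A sketch of a UNION_DISTINCT over two single-column inputs; the table and column names are illustrative:

```java
import com.gpudb.GPUdb;
import com.gpudb.protocol.CreateUnionRequest;

// Combine two vendor-name columns, keeping only distinct names
gpudb.createUnion(
    "tutorial_java.all_vendor_names",
    GPUdb.list("tutorial_java.vendor", "tutorial_java.vendor_archive"),
    GPUdb.list(GPUdb.list("vendor_name"), GPUdb.list("vendor_name")),
    GPUdb.list("name"),
    GPUdb.options(CreateUnionRequest.Options.MODE,
                  CreateUnionRequest.Options.UNION_DISTINCT)
);
```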
Calling the createUnion() method with a MODE of INTERSECT will perform a set intersection, which will combine data sets but only include the records found in both data sets, removing duplicate result records. A MODE of INTERSECT_ALL will retain intersecting duplicates from both sets.
Calling the createUnion() method with a MODE of EXCEPT will perform a set except (subtraction), which will return records that appear in the first data set but not the second data set. Note that the data sets on each side of the except will have duplicates removed first, and then the set subtraction will be processed. A MODE of EXCEPT_ALL will retain duplicates from the first set.
Included below is a complete example containing all the above requests, the data file, output, compiled jar, and pom files.
To run the complete sample, ensure the taxi_trip_data.csv file is in the correct location, per the directions given here, and run the tutorial JAR as an executable JAR. For instance, if running from the API project root:
Note
As this script creates a schema and several database objects within it, system admin permission is required to run it.