Prerequisites
Data File
The tutorial java file makes reference to a data file in the current directory. This path can be updated to point to a valid path on the host where the file will be located, or the script can be run with the data file in the current directory.Taxi Data File Path
API Download
Maven Download
We suggest using Maven as the build tool for your Java project. To use the Kinetica Java API, ensure the Maven Central repository is reachable and add the Kinetica Java API dependency that matches that of the targeted Kinetica database, as shown below:API Version
API Dependency
The
pom.xml file used for the tutorial can be found
below.Manual Download
The source code for the Java API is also available for download from the GitHub repository kineticadb/kinetica-api-java. Follow the instructions in the included README file to build the API library.File Hierarchy
The tutorial was setup like the below diagram:Connecting to the Database
To connect to the database, instantiate an object of the GPUdb class, providing the connection URL, username, & password to the database server. For more details on connecting to the database, see Connecting via API.Connect to the Database
Creating a Schema
All tables & views must be created within a schema. So, the first step will be to create a schema to contain all of the tables & views created within this tutorial.Create Tutorial Schema
Creating a Type
Before a table can be created and any data can be loaded into it, a type needs to be defined. The type is a class, extended from RecordObject, using annotations to describe which class instance variables are fields (i.e., columns), what type they are, and any special handling they should receive. For this tutorial, three types will be used to create three corresponding tables:- A base table containing a sample of NYC taxi cab ride data
- A lookup table containing sample cab vendor information (created in the example below)
- A 1-to-1 relation table containing payment information for a subset of the cab-ride transactions
vendor table follows:
Define a Type
Although a constructor is not required, if the class does have any
constructors, it must have a constructor with no parameters. Other
constructors can be added, as necessary
Create a Type
Creating a Table
The returned object from the createType() call contains a unique type identifier assigned by the system. This identifier can then be used in the request to create a corresponding table. The examples below outline creating a table with either a single request object (vendor table) or a
parameterized call
(payment table):
Create a Table (via Request Object)
Create a Table (via Parameters)
Inserting Data
Once the table is created, data can be inserted into it. There is a convenience class called BulkInserter, which facilitates inserting records into a table in batches; see Distributed Ingest for details. For this tutorial, only the native Java API call insertRecords() will be used.Key/Value Record
In this example, values are assigned individually to fields after creating aPayment object with the default constructor.
Assign After Default Constructor
In-Line Record
In this example, values are passed in to theVendor object’s parameterized
constructor.
Assign by Parameterized Constructor
CSV
CSV and other data files can be loaded into Kinetica via KiFS using the GPUdbFileHandler. This can upload files into KiFS for later ingestion or stage them temporarily and ingest them all at once.Insert from File
Retrieving Data
Once the table is populated with data, the data can be retrieved from the system by a call to getRecords(tableName, offset, limit, options) using in-line parameter-passing.Retrieve Records (via Parameters)
Retrieve Records (via Request Object)
offset and limit parameters. The returned response also
contains the type (column definitions) of the results.
Also, note that all query related methods have the above two versions—with the
request object and with the parameters passed directly to the method.
Updating Records
Use updateRecords() to update matching key values for any records in a table.Update Records
Deleting Records
Use deleteRecords() to delete records from a table. A list can be used to specify which records to delete based on matching expressions. Set DELETE_ALL_RECORDS to TRUE to delete all records in a table.Delete Records
Alter Table
Some properties can be altered or added after table creation, using alterTable(), including indexes and dictionary encoding.Indexes
Using the alterTable() method, you can create indexes on columns using the CREATE_INDEX action paired with a column name.Index Column
Dictionary Encoding
Applying dictionary encoding via alterTable() involves adding a new property to a column using the CHANGE_COLUMN action paired with a column name & definition.Dictionary Encode Column
To add a new property, all existing column properties must be
listed along with any new properties.
Filters
Filters are an easy way to reduce larger tables into more concise views using expressions.Count Value-Filtered Records
Count Using Filter Chaining
Count List-Filtered Records
Count Range-Filtered Records
Aggregates
Kinetica supports various aggregate and group-by queries, which group and aggregate your data to return counts and useful statistics.Calculate Count/Min/Mean/Max
Retrieve Unique Values
Report Number of Records per Group
Generate Histogram
Joins
Joins allow you to link multiple tables together, along their relations, retrieving associated information from any or all of them. Tables can only be joined if they’re sharded similarly or replicated. All join types can be performed using the createJoinTable() method. An inner join returns only records that have matching values in both tables.Inner Join
left join in the given expression.
Left Join
Full outer joins require both tables to be replicated or joined on
their shard keys. Set merges that perform deduplication of records, like
Union Distinct,
Intersect, and
Except also need to use replicated tables to
ensure the correct results, so a replicated version of the taxi
(
taxi_trip_data_replicated) table is created at this point in the
tutorial using
createProjection().Create Replicated Table for Full Outer Join
Full Outer Join
Projections
You can create projections using the createProjection() method.Create a Temporary Projection
Create a Permanent Projection
Union, Intersect, & Except
Union can be used to combine homogeneous data sets into one larger data set. When calling the createUnion() method, a MODE of UNION or UNION_DISTINCT will both combine data sets, but only retain the records that are unique across the chosen columns, removing all duplicates. A MODE of UNION_ALL will combine data sets, retaining all records from the source data sets.Set Union Retaining Duplicates
Set Intersection
Set Exception (Subtraction)
Download & Run
Included below is a complete example containing all the above requests, the data file, output, compiled jar, and pom files. To run the complete sample, ensure the taxi_trip_data.csv is in the correct location, given the direction here, and run the tutorial JAR as an executable jar. For instance, if running from the API project root:Run Tutorial
As this script creates a schema and several database objects within
it, system admin permission is
required to run it.