> ## Documentation Index > Fetch the complete documentation index at: https://docs.kinetica.com/llms.txt > Use this file to discover all available pages before exploring further. # Java Developer Guide > Step-by-step instructions on writing Java applications with Kinetica The following guide provides step-by-step instructions to get started writing *Java* applications using *Kinetica*. This guide demonstrates only a small set of the available API. A detailed description of the complete interface is available under [Java API Reference](/content/api/java). ## Prerequisites ### Data File The [tutorial java file](https://raw.githubusercontent.com/kineticadb/kinetica-docs/master/content/examples/java/java_tutorial/src/main/java/Tutorial.java) makes reference to a [data file](https://raw.githubusercontent.com/kineticadb/kinetica-docs/master/content/examples/data/taxi_trip_data.csv) in the current directory. This path can be updated to point to a valid path on the host where the file will be located, or the script can be run with the data file in the current directory. ```java Taxi Data File Path theme={null} final String CSV_FILE_PATH = "./taxi_trip_data.csv"; ``` ### API Download #### Maven Download We suggest using *Maven* as the build tool for your *Java* project. To use the *Kinetica Java API*, ensure the Maven Central repository is reachable and add the *Kinetica Java API* dependency that matches that of the targeted *Kinetica* database, as shown below: ```xml API Version theme={null}

[7.2.3.9,7.3.0.0-SNAPSHOT)

``` ```xml API Dependency theme={null} com.gpudb gpudb-api ${gpudb-api.version} jar ``` The `pom.xml` file used for the tutorial can be found [below](/content/guides/java_guide#complete-sample-java). #### Manual Download The source code for the *Java* API is also available for download from the *GitHub* repository [kineticadb/kinetica-api-java](https://github.com/kineticadb/kinetica-api-java). Follow the instructions in the included README file to build the API library. ### File Hierarchy The tutorial was setup like the below diagram: ```bash theme={null} java_tutorial/ ├── java_tutorial.out ├── pom.xml ├── src/ │ └── main/ │ └── java/ │ └── Tutorial.java └── target/ └── docsite-tutorial-3.0-jar-with-dependencies.jar ``` ## Connecting to the Database To connect to the database, instantiate an object of the [GPUdb](/content/api/java/com/gpudb/GPUdb) class, providing the connection URL, username, & password to the database server. For more details on connecting to the database, see [Connecting via API](/content/api/concepts#api-conn). ```java Connect to the Database theme={null} GPUdb.Options options = new GPUdb.Options(); options.setUsername(user); options.setPassword(pass); GPUdb kdb = new GPUdb(url, options); ``` ## Creating a Schema All [tables](/content/concepts/tables) & [views](/content/concepts/views) must be created within a [schema](/content/concepts/schemas). So, the first step will be to create a *schema* to contain all of the *tables* & *views* created within this tutorial. ```java Create Tutorial Schema theme={null} final String SCHEMA_NAME = "tutorial_java"; kdb.createSchema(SCHEMA_NAME, null); ``` ## Creating a Type Before a *table* can be created and any data can be loaded into it, a [type](/content/concepts/types) needs to be defined. The *type* is a class, extended from [RecordObject](/content/api/java/com/gpudb/RecordObject), using annotations to describe which class instance variables are fields (i.e., columns), what type they are, and any special handling they should receive. For this tutorial, three *types* will be used to create three corresponding *tables*: * A base *table* containing a sample of NYC taxi cab ride data * A lookup *table* containing sample cab vendor information (created in the example below) * A 1-to-1 relation *table* containing payment information for a subset of the cab-ride transactions The definition of the *type* for the `vendor` *table* follows: ```java Define a Type theme={null} public static class Vendor extends RecordObject { /* Create column(s), establish its ordering, give it property * sub-type(s), give it a column type, and give it a name. */ @RecordObject.Column(order = 0, properties = { "char4", "primary_key" }) public String vendor_id; @RecordObject.Column(order = 1, properties = { "char64" }) public String vendor_name; @RecordObject.Column(order = 2, properties = { "char16", "nullable" }) public String phone; @RecordObject.Column(order = 3, properties = { "char64", "nullable" }) public String email; @RecordObject.Column(order = 4, properties = { "char64" }) public String hq_street; @RecordObject.Column(order = 5, properties = { "char8", "dict" }) public String hq_city; @RecordObject.Column(order = 6, properties = { "char2", "dict" }) public String hq_state; @RecordObject.Column(order = 7) public Integer hq_zip; @RecordObject.Column(order = 8) public Integer num_emps; @RecordObject.Column(order = 9) public Integer num_cabs; public Vendor() {} /* Create a constructor for the class that will take parameters so that * Bulk Inserting is easier */ public Vendor( String vendor_id, String vendor_name, String phone, String email, String hq_street, String hq_city, String hq_state, Integer hq_zip, Integer num_emps, Integer num_cabs ) { this.vendor_id = vendor_id; this.vendor_name = vendor_name; this.phone = phone; this.email = email; this.hq_street = hq_street; this.hq_city = hq_city; this.hq_state = hq_state; this.hq_zip = hq_zip; this.num_emps = num_emps; this.num_cabs = num_cabs; } } ``` Although a constructor is not required, if the class does have any constructors, it must have a constructor with no parameters. Other constructors can be added, as necessary Once defined, the *type* needs to be created with a call to [createType()](/content/api/java/com/gpudb/RecordObject#createType$java.lang.Class,com.gpudb.GPUdb$): ```java Create a Type theme={null} String vendorTypeId = RecordObject.createType(Vendor.class, kdb); ``` ## Creating a Table The returned object from the [createType()](/content/api/java/com/gpudb/RecordObject#createType$java.lang.Class,com.gpudb.GPUdb$) call contains a unique *type* identifier assigned by the system. This identifier can then be used in the request to create a corresponding *table*. The examples below outline creating a *table* with either a [single request object](/content/api/java/com/gpudb/GPUdb#createTable$com.gpudb.protocol.CreateTableRequest$) (`vendor` *table*) or a [parameterized call](/content/api/java/com/gpudb/GPUdb#createTable$java.lang.String,java.lang.String,java.util.Map$) (`payment` *table*): ```java Create a Table (via Request Object) theme={null} CreateTableRequest vendorCreateReq = new CreateTableRequest(); vendorCreateReq.setTableName(TABLE_VENDOR); vendorCreateReq.setTypeId(vendorTypeId); vendorCreateReq.setOptions(GPUdb.options(CreateTableRequest.Options.IS_REPLICATED, "true")); kdb.createTable(vendorCreateReq); ``` ```java Create a Table (via Parameters) theme={null} kdb.createTable(TABLE_PAYMENT, paymentTypeId, null); ``` ## Inserting Data Once the table is created, data can be inserted into it. There is a convenience class called [BulkInserter](/content/api/java/com/gpudb/BulkInserter), which facilitates inserting records into a table in batches; see [Distributed Ingest](/content/tuning/multihead/multihead_ingest) for details. For this tutorial, only the native Java API call [insertRecords()](/content/api/java/com/gpudb/GPUdb#insertRecords$java.lang.String,java.util.List,java.util.Map$) will be used. ### Key/Value Record In this example, values are assigned individually to fields after creating a `Payment` object with the default constructor. ```java Assign After Default Constructor theme={null} // Create a record object and assign values to properties Payment paymentDatum = new Payment(); paymentDatum.payment_id = 189; paymentDatum.payment_type = "No Charge"; paymentDatum.credit_type = null; paymentDatum.payment_timestamp = null; paymentDatum.fare_amount = 6.5; paymentDatum.surcharge = 0; paymentDatum.mta_tax = 0.6; paymentDatum.tip_amount = 0; paymentDatum.tolls_amount = 0; paymentDatum.total_amount = 7.1; // Insert the record into the table int numInserted = kdb.insertRecords(TABLE_PAYMENT, Arrays.asList(paymentDatum), null).getCountInserted(); ``` ### In-Line Record In this example, values are passed in to the `Vendor` object's parameterized constructor. ```java Assign by Parameterized Constructor theme={null} /* Create a list of in-line records. The order of the values must match * the column order in the type */ List vendorRecords = new ArrayList<>(); vendorRecords.add(new Vendor( "VTS","Vine Taxi Service","9998880001","admin@vtstaxi.com", "26 Summit St.","Flushing","NY",11354,450,400)); vendorRecords.add(new Vendor( "YCAB","Yes Cab","7895444321",null, "97 Edgemont St.","Brooklyn","NY",11223,445,425)); vendorRecords.add(new Vendor( "NYC","New York City Cabs",null,"support@nyc-taxis.com", "9669 East Bayport St.","Bronx","NY",10453,505,500)); vendorRecords.add(new Vendor( "DDS","Dependable Driver Service",null,null, "8554 North Homestead St.","Bronx","NY",10472,200,124)); vendorRecords.add(new Vendor( "CMT","Crazy Manhattan Taxi","9778896500","admin@crazymanhattantaxi.com", "950 4th Road Suite 78","Brooklyn","NY",11210,500,468)); vendorRecords.add(new Vendor( "TNY","Taxi New York",null,null, "725 Squaw Creek St.","Bronx","NY",10458,315,305)); vendorRecords.add(new Vendor( "NYMT","New York Metro Taxi",null,null, "4 East Jennings St.","Brooklyn","NY",11228,166,150)); vendorRecords.add(new Vendor( "5BTC","Five Boroughs Taxi Co.","4566541278","mgmt@5btc.com", "9128 Lantern Street","Brooklyn","NY",11229,193,175)); // Insert the records into the Vendor table numInserted = kdb.insertRecords(TABLE_VENDOR, vendorRecords, null).getCountInserted(); ``` ### CSV CSV and other data files can be loaded into Kinetica via [KiFS](/content/tools/kifs) using the [GPUdbFileHandler](/content/api/java/com/gpudb/GPUdbFileHandler). This can [upload](/content/api/java/com/gpudb/filesystem/GPUdbFileHandler#upload$java.lang.String,java.lang.String$) files into *KiFS* for later ingestion or stage them temporarily and [ingest](/content/api/java/com/gpudb/filesystem/GPUdbFileHandler#ingest$java.util.List,java.lang.String,com.gpudb.filesystem.ingest.IngestOptions,com.gpudb.filesystem.ingest.TableCreationOptions$) them all at once. ```java Insert from File theme={null} GPUdbFileHandler fh = new GPUdbFileHandler(kdb); fh.ingest(Arrays.asList(CSV_FILE_PATH), TABLE_TAXI, null, null); ``` ## Retrieving Data Once the *table* is populated with data, the data can be retrieved from the system by a call to [getRecords(tableName, offset, limit, options)](/content/api/java/com/gpudb/GPUdb#getRecords$java.lang.String,long,long,java.util.Map$) using in-line parameter-passing. ```java Retrieve Records (via Parameters) theme={null} // Retrieve no more than 10 records from payments using in-line request parameters GetRecordsResponse getPaymentRecordsResp = kdb.getRecords( TABLE_PAYMENT, 0, 10, GPUdb.options(GetRecordsRequest.Options.SORT_BY,"payment_id") ); System.out.println( "Payment ID Payment Type Credit Type Payment Timestamp " + "Fare Amount Surcharge MTA Tax Tip Amount Tolls Amount Total Amount" ); System.out.println( "========== ============ =========== ================= " + "=========== ========= ======= ========== ============ ============" ); for (Payment p : getPaymentRecordsResp.getData()) System.out.printf( "%10d %-12s %-11s %17s %11.2f %9.2f %7.2f %10.2f %12.2f %12.2f%n", p.payment_id, Objects.toString(p.payment_type, ""), Objects.toString(p.credit_type, ""), Objects.toString(p.payment_timestamp, ""), p.fare_amount, p.surcharge, p.mta_tax, p.tip_amount, p.tolls_amount, p.total_amount ); ``` One can also invoke [getRecords(request)](/content/api/java/com/gpudb/GPUdb#getRecords$com.gpudb.protocol.GetRecordsRequest$) using the [GetRecordsRequest](/content/api/java/com/gpudb/protocol/GetRecordsRequest) request class. This object contains all the parameters for the method call, and can be reused in successive calls, avoiding re-specifying query parameters. ```java Retrieve Records (via Request Object) theme={null} // Retrieve all records from the Vendor table using a request object GetRecordsRequest vendorReq = new GetRecordsRequest(); vendorReq.setTableName(TABLE_VENDOR); vendorReq.setOffset(0); vendorReq.setLimit(GPUdb.END_OF_SET); vendorReq.setOptions(GPUdb.options(GetRecordsRequest.Options.SORT_BY, "vendor_id")); GetRecordsResponse vendorResp = kdb.getRecords(vendorReq); System.out.println( "Vendor ID Vendor Name Phone Email " + "HQ Street HQ City HQ State HQ Zip # Employees # Cabs" ); System.out.println( "========= ========================== =========== ============================= " + "======================== ======== ======== ====== =========== ======" ); for (Vendor v : vendorResp.getData()) System.out.printf( "%-9s %-26s %-11s %-29s %-24s %-8s %-8s %-6d %11d %6d%n", v.vendor_id, v.vendor_name, Objects.toString(v.phone, ""), Objects.toString(v.email, ""), v.hq_street, v.hq_city, v.hq_state, v.hq_zip, v.num_emps, v.num_cabs ); ``` For large *tables*, the data can be easily be retrieved in smaller blocks by using the `offset` and `limit` parameters. The returned response also contains the *type* (column definitions) of the results. Also, note that all query related methods have the above two versions--with the request object and with the parameters passed directly to the method. ## Updating Records Use [updateRecords()](/content/api/java/com/gpudb/GPUdb#updateRecords$java.lang.String,java.util.List,java.util.List,java.util.List,java.util.Map$) to update matching key values for any records in a *table*. ```java Update Records theme={null} // Update the e-mail, number of employees, and number of cabs of the DDS vendor List> newValsList = new ArrayList<>(); Map newVals = new HashMap<>(); newVals.put("email", "'management@ddstaxico.com'"); newVals.put("num_emps", "num_emps + 2"); newVals.put("num_cabs", "num_cabs + 1"); newValsList.add(newVals); kdb.updateRecords( TABLE_VENDOR, Arrays.asList("vendor_id = 'DDS'"), newValsList, null, GPUdb.options( UpdateRecordsRequest.Options.USE_EXPRESSIONS_IN_NEW_VALUES_MAPS, UpdateRecordsRequest.Options.TRUE ) ); ``` ## Deleting Records Use [deleteRecords()](/content/api/java/com/gpudb/GPUdb#deleteRecords$java.lang.String,java.util.List,java.util.Map$) to delete records from a *table*. A list can be used to specify which records to delete based on matching expressions. Set [DELETE\_ALL\_RECORDS](/content/api/java/com/gpudb/protocol/DeleteRecordsRequest.Options#DELETE_ALL_RECORDS) to [TRUE](/content/api/java/com/gpudb/protocol/DeleteRecordsRequest.Options#TRUE) to delete all records in a table. ```java Delete Records theme={null} // Delete payment 189 String delExpr = "payment_id = 189"; kdb.deleteRecords(TABLE_PAYMENT, Arrays.asList(delExpr), null); ``` ## Alter Table Some properties can be altered or added after *table* creation, using [alterTable()](/content/api/java/com/gpudb/GPUdb#alterTable$java.lang.String,java.lang.String,java.lang.String,java.util.Map$), including [indexes](/content/concepts/indexes) and [dictionary encoding](/content/concepts/dictionary_encoding). ### Indexes Using the [alterTable()](/content/api/java/com/gpudb/GPUdb#alterTable$java.lang.String,java.lang.String,java.lang.String,java.util.Map$) method, you can create *indexes* on columns using the [CREATE\_INDEX](/content/api/java/com/gpudb/protocol/AlterTableRequest.Action#CREATE_INDEX) *action* paired with a column name. ```java Index Column theme={null} /* Add column indexes on: * - payment table, fare_amount (for query-chaining filter example) * - taxi table, passenger_count (for filter-by-range example) */ kdb.alterTable(TABLE_PAYMENT, AlterTableRequest.Action.CREATE_INDEX, "fare_amount", null); kdb.alterTable(TABLE_TAXI, AlterTableRequest.Action.CREATE_INDEX, "passenger_count", null); ``` ### Dictionary Encoding Applying *dictionary encoding* via [alterTable()](/content/api/java/com/gpudb/GPUdb#alterTable$java.lang.String,java.lang.String,java.lang.String,java.util.Map$) involves adding a new property to a column using the [CHANGE\_COLUMN](/content/api/java/com/gpudb/protocol/AlterTableRequest.Action#CHANGE_COLUMN) *action* paired with a column name & definition. ```java Dictionary Encode Column theme={null} // Apply dictionary encoding to the payment type column AlterTableResponse dictEncResp = kdb.alterTable( TABLE_TAXI, AlterTableRequest.Action.CHANGE_COLUMN, columnName, GPUdb.options( AlterTableRequest.Options.COLUMN_PROPERTIES, "char4,dict" ) ); ``` To add a new property, all existing column properties must be listed along with any new properties. ## Filters Filters are an easy way to reduce larger *tables* into more concise [views](/content/concepts/views) using [expressions](/content/concepts/expressions). ```java Count Value-Filtered Records theme={null} // Filter for only payments with no corresponding payment type, returning the // count of records found long f1Count = kdb.filter(TABLE_PAYMENT, VIEW_EXAMPLE1, "IS_NULL(payment_type)", null).getCount(); ``` ```java Count Using Filter Chaining theme={null} // Using query chaining, filter null payment type records with a fare amount greater than 8 long f2Count = kdb.filter(VIEW_EXAMPLE1, VIEW_EXAMPLE2, "fare_amount > 8", null).getCount(); ``` ```java Count List-Filtered Records theme={null} // Filter by list where vendor ID is either NYC or YCAB Map> columnValuesMap = new HashMap<>(); columnValuesMap.put("vendor_id", Arrays.asList("NYC", "YCAB")); long f3Count = kdb.filterByList(TABLE_TAXI, VIEW_EXAMPLE3, columnValuesMap, null).getCount(); ``` ```java Count Range-Filtered Records theme={null} // Filter by range trip with passenger count between 1 and 3 long f4Count = kdb.filterByRange(TABLE_TAXI, VIEW_EXAMPLE4, "passenger_count", 1, 3, null).getCount(); ``` ## Aggregates *Kinetica* supports various aggregate and group-by queries, which group and aggregate your data to return counts and useful statistics. ```java Calculate Count/Min/Mean/Max theme={null} // Aggregate count, min, mean, and max on the trip distance Map a1Resp = kdb.aggregateStatistics( TABLE_TAXI, "trip_distance", AggregateStatisticsRequest.Stats.COUNT + "," + AggregateStatisticsRequest.Stats.MIN + "," + AggregateStatisticsRequest.Stats.MAX + "," + AggregateStatisticsRequest.Stats.MEAN, null ).getStats(); System.out.println("Statistics of values in the trip_distance column:"); System.out.printf( "\tCount: %5.0f%n\tMin: %5.2f%n\tMean: %5.2f%n\tMax: %5.2f%n%n", a1Resp.get(AggregateStatisticsRequest.Stats.COUNT), a1Resp.get(AggregateStatisticsRequest.Stats.MIN), a1Resp.get(AggregateStatisticsRequest.Stats.MEAN), a1Resp.get(AggregateStatisticsRequest.Stats.MAX) ); ``` ```java Retrieve Unique Values theme={null} // Find unique taxi vendor IDs List a2Resp = kdb.aggregateUnique(TABLE_TAXI, "vendor_id", 0, GPUdb.END_OF_SET, null).getData(); System.out.println("Unique vendor IDs in the taxi trip table:"); for (Record vendor : a2Resp) System.out.println("\t* " + vendor.get("vendor_id")); ``` ```java Report Number of Records per Group theme={null} // Find number of trips per vendor List colNames = Arrays.asList("vendor_id", "count(vendor_id)"); List a3Resp = kdb.aggregateGroupBy( TABLE_TAXI, colNames, 0, GPUdb.END_OF_SET, GPUdb.options( AggregateGroupByRequest.Options.SORT_BY, AggregateGroupByRequest.Options.KEY ) ).getData(); System.out.println("Trips per vendor:"); for (Record vendor : a3Resp) System.out.printf("\t%-6s %3d%n", vendor.get("vendor_id") + ":", vendor.get("count(vendor_id)")); ``` ```java Generate Histogram theme={null} // Create a histogram for the different groups of passenger counts float start = 1; float end = 6; float interval = 1; List a4Resp = kdb.aggregateHistogram( TABLE_TAXI, "passenger_count", start, end, interval, null ).getCounts(); System.out.println("Passenger count groups by size:"); System.out.println("Passengers Total Trips"); System.out.println("========== ==========="); List countGroups = Arrays.asList("1", "2", "3", "4", ">5"); for (int hgNum = 0; hgNum < a4Resp.size(); hgNum++) System.out.printf("%10s %11.0f%n", countGroups.get(hgNum), a4Resp.get(hgNum)); ``` ## Joins [Joins](/content/concepts/joins) allow you to link multiple *tables* together, along their relations, retrieving associated information from any or all of them. Tables can only be *joined* if they're [sharded](/content/concepts/tables#sharded) similarly or [replicated](/content/concepts/tables#replicated). All join types can be performed using the [createJoinTable()](/content/api/java/com/gpudb/GPUdb#createJoinTable$java.lang.String,java.util.List,java.util.List,java.util.List,java.util.Map$) method. An *inner join* returns only records that have matching values in both *tables*. ```java Inner Join theme={null} /* Retrieve cab ride transactions and the full name of the associated * vendor for rides having more than three passengers between April 1st * & 16th, 2015 */ kdb.createJoinTable( JOIN_TABLE_INNER, Arrays.asList(TABLE_TAXI + " as t", TABLE_PAYMENT + " as p"), Arrays.asList( "t.payment_id", "payment_type", "total_amount", "passenger_count", "vendor_id", "trip_distance" ), Arrays.asList("t.payment_id = p.payment_id", "passenger_count > 3"), null ); ``` A *left join* returns all of the records an *inner join* does, but additionally, for each record in the table on the *left* side of the *join* that has no match along the relation to a record in the table on the *right* side of the *join*, a corresponding record will be returned with "left-side" columns populated with the "left-side" record data and the "right-side" columns populated with *nulls*. Note the usage of `left join` in the given expression. ```java Left Join theme={null} /* Retrieve cab ride transactions and the full name of the associated * vendor (if available--blank if vendor name is unknown) for * transactions with associated payment data, sorting by increasing * values of transaction ID. */ kdb.createJoinTable( JOIN_TABLE_LEFT, Arrays.asList(TABLE_TAXI + " as t", TABLE_VENDOR + " as v"), Arrays.asList("transaction_id", "pickup_datetime", "trip_distance", "t.vendor_id", "vendor_name"), Arrays.asList("left join t, v on (t.vendor_id = v.vendor_id)", "payment_id <> 0"), null ); ``` Full outer *joins* require both tables to be replicated or joined on their shard keys. Set merges that perform deduplication of records, like [Union Distinct](/content/concepts/unions), [Intersect](/content/concepts/intersect), and [Except](/content/concepts/except) also need to use replicated tables to ensure the correct results, so a replicated version of the taxi (`taxi_trip_data_replicated`) *table* is created at this point in the tutorial using [createProjection()](/content/api/java/com/gpudb/GPUdb#createProjection$java.lang.String,java.lang.String,java.util.List,java.util.Map$). ```java Create Replicated Table for Full Outer Join theme={null} kdb.createProjection( TABLE_TAXI, TABLE_TAXI_REPLICATED, Arrays.asList( "transaction_id", "payment_id", "vendor_id", "pickup_datetime", "dropoff_datetime", "passenger_count", "trip_distance", "pickup_longitude", "pickup_latitude", "dropoff_longitude", "dropoff_latitude" ), GPUdb.options(CreateProjectionRequest.Options.IS_REPLICATED, "true") ); ``` A *full outer join* returns all of the records a *left join* does, but additionally, for each record in the table on the *right* side of the *join* that has no match along the relation to a record in the table on the *left* side of the *join*, a corresponding record will be returned with "right-side" columns populated with the "right-side" record data and the "left-side" columns populated with *nulls*. ```java Full Outer Join theme={null} /* Retrieve the vendor IDs of known vendors with no recorded cab ride * transactions, as well as the vendor ID and number of transactions * for unknown vendors with recorded cab ride transactions */ kdb.createJoinTable( JOIN_TABLE_OUTER, Arrays.asList(TABLE_TAXI_REPLICATED + " as t", TABLE_VENDOR + " as v"), Arrays.asList("t.vendor_id as vendor_id", "v.vendor_id as vendor_id_1"), Arrays.asList("full_outer join t,v on ((v.vendor_id = t.vendor_id))"), null ); ``` ## Projections You can create [projections](/content/concepts/projections) using the [createProjection()](/content/api/java/com/gpudb/GPUdb#createProjection$java.lang.String,java.lang.String,java.util.List,java.util.Map$) method. ```java Create a Temporary Projection theme={null} // Create a projection containing all payments by credit card kdb.createProjection( TABLE_PAYMENT, PROJECTION_EXAMPLE1, Arrays.asList( "payment_id", "payment_type", "credit_type", "payment_timestamp", "fare_amount", "surcharge", "mta_tax", "tip_amount", "tolls_amount", "total_amount" ), GPUdb.options(CreateProjectionRequest.Options.EXPRESSION, "payment_type = 'Credit'") ); ``` To persist a *projection*: ```java Create a Permanent Projection theme={null} /* Create a persisted table with cab ride transactions greater than 5 * miles whose trip started during lunch hours */ kdb.createProjection( TABLE_TAXI, PROJECTION_EXAMPLE2, Arrays.asList( "hour(pickup_datetime) as hour_of_day", "vendor_id", "passenger_count", "trip_distance" ), GPUdb.options( CreateProjectionRequest.Options.EXPRESSION, "(hour(pickup_datetime) >= 11) AND " + "(hour(pickup_datetime) <= 14) AND " + "(trip_distance > 5)", CreateProjectionRequest.Options.PERSIST, CreateProjectionRequest.Options.TRUE ) ); ``` ## Union, Intersect, & Except [Union](/content/concepts/unions) can be used to combine homogeneous data sets into one larger data set. When calling the [createUnion()](/content/api/java/com/gpudb/GPUdb#createUnion$java.lang.String,java.util.List,java.util.List,java.util.List,java.util.Map$) method, a [MODE](/content/api/java/com/gpudb/protocol/CreateUnionRequest.Options#MODE) of [UNION](/content/api/java/com/gpudb/protocol/CreateUnionRequest.Options#UNION) or [UNION\_DISTINCT](/content/api/java/com/gpudb/protocol/CreateUnionRequest.Options#UNION_DISTINCT) will both combine data sets, but only retain the records that are unique across the chosen columns, removing all duplicates. A [MODE](/content/api/java/com/gpudb/protocol/CreateUnionRequest.Options#MODE) of [UNION\_ALL](/content/api/java/com/gpudb/protocol/CreateUnionRequest.Options#UNION_ALL) will combine data sets, retaining all records from the source data sets. ```java Set Union Retaining Duplicates theme={null} /* Calculate the average number of passengers, as well as the shortest, * average, and longest trips for all trips in each of the two time * periods--from April 1st through the 15th, 2015 and from April 16th * through the 23rd, 2015--and return those two sets of statistics in a * single result set. */ kdb.aggregateGroupBy( TABLE_TAXI, Arrays.asList( "avg(passenger_count) as avg_pass_count", "avg(trip_distance) as avg_trip_dist", "min(trip_distance) as min_trip_dist", "max(trip_distance) as max_trip_dist" ), 0, GPUdb.END_OF_SET, GPUdb.options( AggregateGroupByRequest.Options.EXPRESSION, "((pickup_datetime >= '2015-04-01') AND " + "(pickup_datetime <= '2015-04-15 23:59:59.999'))", AggregateGroupByRequest.Options.RESULT_TABLE, AGG_GRPBY_UNION_ALL_SRC1 ) ); kdb.aggregateGroupBy( TABLE_TAXI, Arrays.asList( "avg(passenger_count) as avg_pass_count", "avg(trip_distance) as avg_trip_dist", "min(trip_distance) as min_trip_dist", "max(trip_distance) as max_trip_dist" ), 0, GPUdb.END_OF_SET, GPUdb.options( AggregateGroupByRequest.Options.EXPRESSION, "((pickup_datetime >= '2015-04-16') AND " + "(pickup_datetime <= '2015-04-23 23:59:59.999'))", AggregateGroupByRequest.Options.RESULT_TABLE, AGG_GRPBY_UNION_ALL_SRC2 ) ); kdb.createUnion( UNION_ALL_TABLE, Arrays.asList(AGG_GRPBY_UNION_ALL_SRC1, AGG_GRPBY_UNION_ALL_SRC2), Arrays.asList( Arrays.asList( "'2015-04-01 - 2015-04-15'", "avg_pass_count", "avg_trip_dist", "min_trip_dist", "max_trip_dist" ), Arrays.asList( "'2015-04-16 - 2015-04-23'", "avg_pass_count", "avg_trip_dist", "min_trip_dist", "max_trip_dist" ) ), Arrays.asList( "pickup_window_range", "avg_pass_count", "avg_trip", "min_trip", "max_trip" ), GPUdb.options(CreateUnionRequest.Options.MODE, CreateUnionRequest.Options.UNION_ALL) ); ``` Calling the [createUnion()](/content/api/java/com/gpudb/GPUdb#createUnion$java.lang.String,java.util.List,java.util.List,java.util.List,java.util.Map$) method with a [MODE](/content/api/java/com/gpudb/protocol/CreateUnionRequest.Options#MODE) of [INTERSECT](/content/api/java/com/gpudb/protocol/CreateUnionRequest.Options#INTERSECT) will perform a set [intersection](/content/concepts/intersect), which will combine data sets but only include the records found in both data sets, removing duplicate result records. A [MODE](/content/api/java/com/gpudb/protocol/CreateUnionRequest.Options#MODE) of [INTERSECT\_ALL](/content/api/java/com/gpudb/protocol/CreateUnionRequest.Options#INTERSECT_ALL) will retain intersecting duplicates from both sets. ```java Set Intersection theme={null} /* Retrieve locations (as lat/lon pairs) that were both pick-up and * drop-off points */ kdb.createUnion( UNION_INTERSECT_TABLE, Arrays.asList(TABLE_TAXI_REPLICATED, TABLE_TAXI_REPLICATED), Arrays.asList( Arrays.asList("pickup_latitude", "pickup_longitude"), Arrays.asList("dropoff_latitude", "dropoff_longitude") ), Arrays.asList("latitude", "longitude"), GPUdb.options(CreateUnionRequest.Options.MODE, CreateUnionRequest.Options.INTERSECT) ); ``` Calling the [createUnion()](/content/api/java/com/gpudb/GPUdb#createUnion$java.lang.String,java.util.List,java.util.List,java.util.List,java.util.Map$) method with a [MODE](/content/api/java/com/gpudb/protocol/CreateUnionRequest.Options#MODE) of [EXCEPT](/content/api/java/com/gpudb/protocol/CreateUnionRequest.Options#EXCEPT) will perform a set [except](/content/concepts/except) (subtraction), which will return records that appear in the first data set but not the second data set. Note that the data sets on each side of the *except* will have duplicates removed *first*, and then the set subtraction will be processed. A [MODE](/content/api/java/com/gpudb/protocol/CreateUnionRequest.Options#MODE) of [EXCEPT\_ALL](/content/api/java/com/gpudb/protocol/CreateUnionRequest.Options#EXCEPT_ALL) will retain duplicates from the first set. ```java Set Exception (Subtraction) theme={null} /* Show vendors that operate before noon, but not after noon: retrieve * the unique list of IDs of vendors who provided cab rides between * midnight and noon, and remove from that list the IDs of any vendors * who provided cab rides between noon and midnight */ kdb.createProjection( TABLE_TAXI_REPLICATED, PROJECTION_EXCEPT_SRC1, Arrays.asList("vendor_id"), GPUdb.options( CreateProjectionRequest.Options.EXPRESSION, "((HOUR(pickup_datetime) >= 0) AND (HOUR(pickup_datetime) <= 11))" ) ); kdb.createProjection( TABLE_TAXI_REPLICATED, PROJECTION_EXCEPT_SRC2, Arrays.asList("vendor_id"), GPUdb.options( CreateProjectionRequest.Options.EXPRESSION, "((HOUR(pickup_datetime) >= 12) AND (HOUR(pickup_datetime) <= 23))" ) ); kdb.createUnion( UNION_EXCEPT_TABLE, Arrays.asList(PROJECTION_EXCEPT_SRC1, PROJECTION_EXCEPT_SRC2), Arrays.asList(Arrays.asList("vendor_id"), Arrays.asList("vendor_id")), Arrays.asList("vendor_id"), GPUdb.options(CreateUnionRequest.Options.MODE, CreateUnionRequest.Options.EXCEPT) ); ``` ## Download & Run Included below is a complete example containing all the above requests, the data file, output, compiled jar, and pom files. * [Tutorial.java](https://raw.githubusercontent.com/kineticadb/kinetica-docs/master/content/examples/java/java_tutorial/src/main/java/Tutorial.java) * [Taxi Data File](https://raw.githubusercontent.com/kineticadb/kinetica-docs/master/content/examples/data/taxi_trip_data.csv) * [Java Output](https://raw.githubusercontent.com/kineticadb/kinetica-docs/master/content/examples/java/java_tutorial/java_tutorial.out) * [Tutorial jar](https://raw.githubusercontent.com/kineticadb/kinetica-docs/master/content/examples/java/java_tutorial/target/docsite-tutorial-3.0-jar-with-dependencies.jar) * [pom.xml](https://raw.githubusercontent.com/kineticadb/kinetica-docs/master/content/examples/java/java_tutorial/pom.xml) To run the complete sample, ensure the taxi\_trip\_data.csv is in the correct location, given the direction [here](/content/guides/java_guide#java-tutorial-data-file), and run the tutorial JAR as an executable jar. For instance, if running from the API project root: ```bash title="Run Tutorial" theme={null} java -jar target/docsite-tutorial-3.0-jar-with-dependencies.jar ``` As this script creates a schema and several database objects within it, [system admin permission](/content/security/sec_concepts#security-concepts-permissions-system) is required to run it.