Version:

Java Developer Manual

The following guide provides step-by-step instructions to get started writing Java applications using Kinetica. This guide demonstrates only a small set of the available API. A detailed description of the complete interface is available under Java API Reference.

Prerequisites

Before running the sample Tutorial.java file:

  1. Download the API
  2. Import the Taxi Trip Data CSV file found below using GAdmin

API Download

Maven Download

We suggest using Maven as the build tool for your Java project. To use the Kinetica Java API, you must add our Nexus repository and the Kinetica Java API dependency that matches that of the targeted Kinetica database, as shown below:

<properties>
	<gpudb-api.version>6.1.0</gpudb-api.version>
	<repository>
		<id>gpudb-releases</id>
		<url>http://files.kinetica.com/nexus/content/repositories/releases/</url>
	</repository>
	<dependency>
		<groupId>com.gpudb</groupId>
		<artifactId>gpudb-api</artifactId>
		<version>${gpudb-api.version}</version>
		<type>jar</type>
	</dependency>

Manual Download

The source code for the Java API is also available for download from the GitHub repository kineticadb/kinetica-api-java. Follow the instructions in the included README file to build the API library.

Connecting to the Database

To connect to the database, instantiate an object of the GPUdb class, providing the connection URL, including host & port of the database server:

GPUdb gpudb = new GPUdb("http://localhost:9191");

Creating a Collection

Use createTable() with the IS_COLLECTION option set to true to create a collection. The taxi_info collection is created to house most of the tables and views created within the tutorial.

// Create a collection to house the main tables
gpudb.createTable(COLLECTION_TAXI_INFO, null,
        GPUdb.options(CreateTableRequest.Options.IS_COLLECTION,
                "true"));
System.out.println("Taxi collection created:  OK");

Creating a Type

Before any data can be loaded into the system, a Type needs to be defined in the system. The type is a class, extended from RecordObject, using annotations to describe which class instance variables are fields (i.e. columns), what type they are, and any special handling they should receive. Each field consists of a name and a data type:

   public static class Vendor extends RecordObject
   {
	/* Create column(s), establish its ordering, give it property sub-type(s), give it a
		column type, and give it a name.
	 */

       @RecordObject.Column(order = 0, properties = { "char4", "primary_key" })
       public String vendor_id;

       @RecordObject.Column(order = 1, properties = { "char64" })
       public String vendor_name;

       @RecordObject.Column(order = 2, properties = { "char16", "nullable" })
       public String phone;

       @RecordObject.Column(order = 3, properties = { "char64", "nullable" })
       public String email;

       @RecordObject.Column(order = 4, properties = { "char64" })
       public String hq_street;

       @RecordObject.Column(order = 5, properties = { "char8", "dict" })
       public String hq_city;

       @RecordObject.Column(order = 6, properties = { "char2", "dict" })
       public String hq_state;

       @RecordObject.Column(order = 7)
       public Integer hq_zip;

       @RecordObject.Column(order = 8)
       public Integer num_emps;

       @RecordObject.Column(order = 9)
       public Integer num_cabs;

       public Vendor() {}

       /* Create a constructor for the class that will take parameters so that
        * Bulk Inserting is easier */
       public Vendor(String vendor_id, String vendor_name, String phone,
                     String email, String hq_street, String hq_city,
                     String hq_state, Integer hq_zip, Integer num_emps,
                     Integer num_cabs)
       {
           this.vendor_id = vendor_id;
           this.vendor_name = vendor_name;
           this.phone = phone;
           this.email = email;
           this.hq_street = hq_street;
           this.hq_city = hq_city;
           this.hq_state = hq_state;
           this.hq_zip = hq_zip;
           this.num_emps = num_emps;
           this.num_cabs = num_cabs;
       }
   }

   public static class Payment extends RecordObject
   {

       @RecordObject.Column(order = 0, properties = { "primary_key" })
       public long payment_id;

       @RecordObject.Column(order = 1, properties = { "char16", "nullable" })
       public String payment_type;

       @RecordObject.Column(order = 2, properties = { "char16", "nullable",
               "dict" })
       public String credit_type;

       @RecordObject.Column(order = 3, properties = { "timestamp",
               "nullable" })
       public Long payment_timestamp;

       @RecordObject.Column(order = 4, properties = { "nullable" })
       public double fare_amount;

       @RecordObject.Column(order = 5, properties = { "nullable" })
       public double surcharge;

       @RecordObject.Column(order = 6, properties = { "nullable" })
       public double mta_tax;

       @RecordObject.Column(order = 7, properties = { "nullable" })
       public double tip_amount;

       @RecordObject.Column(order = 8, properties = { "nullable" })
       public double tolls_amount;

       @RecordObject.Column(order = 9, properties = { "nullable" })
       public double total_amount;

       public Payment() {}

       public Payment(long payment_id, String payment_type, String credit_type,
                      Long payment_timestamp, double fare_amount,
                      double surcharge, double mta_tax, double tip_amount,
                      double tolls_amount, double total_amount)
       {
           this.payment_id = payment_id;
           this.payment_type = payment_type;
           this.credit_type = credit_type;
           this.payment_timestamp = payment_timestamp;
           this.fare_amount = fare_amount;
           this.surcharge = surcharge;
           this.mta_tax = mta_tax;
           this.tip_amount = tip_amount;
           this.tolls_amount = tolls_amount;
           this.total_amount = total_amount;
       }
   }

Note

Although a constructor is not required, if the class does have any constructors, it must have a constructor with no parameters. Other constructors can be added, as necessary

Next, the types need to be created:

      String vendorTypeId = RecordObject.createType(Vendor.class, gpudb);
      String paymentTypeId = RecordObject.createType(Payment.class, gpudb);

Creating a Table

The returned object from the call contains a unique type identifier allocated by the system. This identifier can then be used in the request to create a new table as follows:

      // Create tables from the types
      Map<String, String> optionsVendor = new HashMap<>();
      optionsVendor.put("collection_name", COLLECTION_TAXI_INFO);
      optionsVendor.put("is_replicated","true");
      gpudb.createTable(TABLE_VENDOR, vendorTypeId, optionsVendor);
      System.out.println("Vendor table created:  OK");

      gpudb.createTable(TABLE_PAYMENT, paymentTypeId,
              GPUdb.options(CreateTableRequest.Options.COLLECTION_NAME,
                      COLLECTION_TAXI_INFO));
      System.out.println("Payment table created:  OK");

Inserting Data

Once the table is created, data can be inserted into it. There is a convenience class called BulkInserter, which facilitates inserting records into a table in batches, documented under Ingestion. For this tutorial, only the native Java API call insertRecords() will be shown.

Key/Value Record

      // Create a record object and assign values to properties
      Payment payment = new Payment();
      payment.payment_id = 189;
      payment.payment_type = "No Charge";
      payment.credit_type = null;
      payment.payment_timestamp = null;
      payment.fare_amount = 6.5;
      payment.surcharge = 0;
      payment.mta_tax = 0.6;
      payment.tip_amount = 0;
      payment.tolls_amount = 0;
      payment.total_amount = 7;

      // Insert the record within a list into the table
      int numInserted = gpudb.insertRecords(TABLE_PAYMENT,
              Arrays.asList(payment), null).getCountInserted();
      System.out.println("Number of single records inserted into the payments table:  " + numInserted);

In-Line Record

      // Add vendor records to a list
      List<Vendor> vendors = new ArrayList<>();
      vendors.add(new Vendor("VTS","Vine Taxi Service",
              "9998880001","admin@vtstaxi.com",
              "26 Summit St.","Flushing","NY",
              11354,450,400));
      vendors.add(new Vendor("YCAB","Yes Cab",
              "7895444321",null,"97 Edgemont St.",
              "Brooklyn","NY",11223,445,
              425));
      vendors.add(new Vendor("NYC","New York City Cabs",
              null,"support@nyc-taxis.com",
              "9669 East Bayport St.","Bronx","NY",
              10453,505,500));
      vendors.add(new Vendor("DDS",
              "Dependable Driver Service",null,null,
              "8554 North Homestead St.","Bronx",
              "NY",10472,200,124));
      vendors.add(new Vendor("CMT",
              "Crazy Manhattan Taxi","9778896500",
              "admin@crazymanhattantaxi.com",
              "950 4th Road Suite 78","Brooklyn",
              "NY",11210,500,468));
      vendors.add(new Vendor("TNY","Taxi New York",
              null,null,"725 Squaw Creek St.",
              "Bronx","NY",10458,315,
              305));
      vendors.add(new Vendor("NYMT",
              "New York Metro Taxi",null,null,
              "4 East Jennings St.","Brooklyn","NY",
              11228,166,150));
      vendors.add(new Vendor("5BTC",
              "Five Boroughs Taxi Co.","4566541278",
              "mgmt@5btc.com","9128 Lantern Street",
              "Brooklyn","NY",11229,193,
              175));

      // Insert the records into the table
      numInserted = gpudb.insertRecords(TABLE_VENDOR, vendors,
              null).getCountInserted();
      System.out.println("Number of batch records inserted into the vendor table:  " + numInserted);

Important

Additional records are inserted at this point, which can be found in the full Tutorial.java file below.

Retrieving Data

Once the table is populated with data, the data can be retrieved from the system by a call to getRecords(tableName, offset, limit, options), using in-line parameter-passing.

      /* Retrieve no more than 10 records from payments using in-line
       * request parameters */
      Map<String,String> options = GPUdb.options(GetRecordsRequest.Options.SORT_BY,
              COL_PAYTYPE);
      GetRecordsResponse<Payment> grResp = gpudb.getRecords(TABLE_PAYMENT,
              0, 10, options);
      System.out.printf("Payment ID Payment Type Credit Type      Payment Timestamp      Fare Amount Surcharge MTA Tax Tip Amount Tolls Amount Total Amount%n");
      System.out.printf("========== ============ ================ ====================== =========== ========= ======= ========== ============ ============%n");
      for (Payment p : grResp.getData())
          System.out.printf
                  (
                          "%-10d %-12s %-16s %-22d %-11.2f %-9.2f %-7.2f %-10.2f %-12.2f %-12.2f %n",
                          p.payment_id, p.payment_type, p.credit_type,
                          p.payment_timestamp, p.fare_amount, p.surcharge,
                          p.mta_tax, p.tip_amount, p.tolls_amount,
                          p.total_amount
                  );

One can also invoke getRecords(request) using the GetRecordsRequest request class. This object contains all the parameters for the method call, and can be reused in successive calls, avoiding re-specifying query parameters.

        /* Retrieve the remaining records from the payments table
         * using a request object */
        GetRecordsRequest grReq = new GetRecordsRequest();
        grReq.setTableName(TABLE_PAYMENT);
        grReq.setOffset(10);
        grReq.setLimit(15);
        grReq.setOptions(options);
        grResp = gpudb.getRecords(grReq);
        for (Payment p : grResp.getData())
            System.out.printf
                    (
                            "%-10d %-12s %-16s %-22d %-11.2f %-9.2f %-7.2f %-10.2f %-12.2f %-12.2f %n",
                            p.payment_id, p.payment_type, p.credit_type,
                            p.payment_timestamp, p.fare_amount, p.surcharge,
                            p.mta_tax, p.tip_amount, p.tolls_amount,
                            p.total_amount
                    );
        System.out.println("\nNumber of records in new table:  " +
                grResp.getTotalNumberOfRecords());

For large tables, the data can be easily be retrieved in smaller blocks by using the offset and limit parameters. The returned response also contains the schema (or data type) of the results.

Also, note that all query related methods have the above two versions--with the request object and with the parameters passed directly to the method.

Updating and Deleting Data

Use updateRecords() to update matching key values for all records in a table.

        /* Update the e-mail, number of employees, and number of cabs of the DDS
         * vendor */
        List<String> updExp = Arrays.asList("vendor_id = 'DDS'");
        List<Map<String, String>> newValsList = new ArrayList<>();
        Map<String,String> newVals = new HashMap<>();
        newVals.put(COL_EMAIL, "management@ddstaxico.com");
        newVals.put(COL_NUM_EMPS, "156");
        newVals.put(COL_NUM_CABS, "213");
        newValsList.add(newVals);
        gpudb.updateRecords(TABLE_VENDOR, updExp, newValsList, null,
                null);
        GetRecordsResponse<Vendor> updVendRsp = gpudb.getRecords(TABLE_VENDOR,
                0, 10, GPUdb.options(
                        GetRecordsRequest.Options.EXPRESSION,
                        "vendor_id = 'DDS'"));
        System.out.println("Updated DDS vendor information:");
        System.out.printf("Vendor ID Vendor Name               Phone Email                    HQ Street                HQ City HQ State HQ Zip Code # Employees # Cabs%n");
        System.out.printf("========= ========================= ===== ======================== ======================== ======= ======== =========== =========== ======%n");
        for (Vendor v : updVendRsp.getData())
            System.out.printf
                    (
                            "%-9s %-25s %-5s %-24s %-24s %-7s %-8s %-11d %-11d %-6d %n",
                            v.vendor_id, v.vendor_name, v.phone, v.email,
                            v.hq_street, v.hq_city, v.hq_state, v.hq_zip,
                            v.num_emps, v.num_cabs
                    );
        System.out.println();

Use deleteRecords() to delete records from a table. A list can be used to specify which records delete based on matching expressions.

        // Delete payment 189
        List<String> delExp = Arrays.asList("payment_id = 189");
        GetRecordsResponse<Payment> preDel = gpudb.getRecords(TABLE_PAYMENT,
                0, 30, options);
        System.out.println("Records in the payment table (before delete): " +
                preDel.getTotalNumberOfRecords());
        System.out.println("Deleting record where " + delExp);
        gpudb.deleteRecords(TABLE_PAYMENT, delExp, null);
        GetRecordsResponse<Payment> postDel = gpudb.getRecords(TABLE_PAYMENT,
                0, 30, options);
        System.out.println("Record deleted; amount of records in the payment table (after delete): " +
                postDel.getTotalNumberOfRecords());
        System.out.println();

Alter Table

Some properties can be altered or added after table creation, including indexes, dictionary encoding, and compression. Use the alterTable() to specify the table and property you want to alter. The create_index option will add a column index, while set_column_compression option adds the specified compression type to the desired column.

        /* Add column indexes to vendor_id in both the vendor and taxi_trip_data
         * tables */
        AlterTableResponse indexVendRsp = gpudb.alterTable(TABLE_VENDOR,
                "create_index", COL_VENDORID, null);
        System.out.println("Index added to Vendor table: \n" + indexVendRsp);

        AlterTableResponse indexTaxiRsp = gpudb.alterTable(TABLE_TAXI,
                "create_index", COL_VENDORID, null);
        System.out.println("Index added to Taxi table: \n" + indexTaxiRsp);
        System.out.println();

        /* Apply the snappy compression algorithm to the pickup and dropoff
         * datetime columns */
        Map<String,String> snappyComp = GPUdb.options(
                AlterTableRequest.Options.COMPRESSION_TYPE, "snappy");
        AlterTableResponse compPickupRsp = gpudb.alterTable(TABLE_TAXI,
                "set_column_compression", COL_PICKUP, snappyComp);
        System.out.println("Compression added 'pickup_datetime' column: \n" +
                compPickupRsp);

        AlterTableResponse compDropoffRsp = gpudb.alterTable(TABLE_TAXI,
                "set_column_compression", COL_DROPOFF, snappyComp);
        System.out.println("Compression added 'dropoff_datetime' column: \n" +
                compDropoffRsp);
        System.out.println();

Running Queries

Filters

Filters are an easy way to filter down a larger table into more concise views.

Filter Example 1

      // Selects all payments with no corresponding payment type
      Map<String, String> fOptions = new HashMap<>();
      fOptions.put("collection_name", COLLECTION_TAXI_INFO);
      FilterResponse fResp = gpudb.filter(TABLE_PAYMENT, VIEW_NULL_PAYMENTS,
              "IS_NULL(payment_type)", fOptions);
      System.out.println("Number of null payments:  " + fResp.getCount());

Filter Example 2

/* Using query chaining, filter null payment type records with a fare
       * amount greater than 8 */
      fResp = gpudb.filter(VIEW_NULL_PAYMENTS, VIEW_NULLPAY_GT8,
              "fare_amount > 8", fOptions);
      System.out.println("Number of null payments with a fare amount greater than $8.00 (with query chaining):  " +
              fResp.getCount());

Filter Example 3

      // Filter by list where vendor ID is either NYC or YCAB
      Map<String, List<String>> columnValuesMap = new HashMap<>();
      columnValuesMap.put(COL_VENDORID, Arrays.asList("NYC", "YCAB"));
      FilterByListResponse fblResp = gpudb.filterByList(TABLE_TAXI,
              VIEW_VENDOR_NYC_YCAB, columnValuesMap, null);
      System.out.println("Number of records where vendor_id is either NYC or YCAB:  " +
              fblResp.getCount());

Filter Example 4

      // Filter by range trip with passenger count between 1 and 3
      FilterByRangeResponse fbrResp = gpudb.filterByRange(TABLE_TAXI,
              VIEW_PASS_COUNT_BTW1_3, COL_PASS_COUNT, 1,
              3, null);
      System.out.println("Number of trips with passenger_count between 1 and 3:  " +
              fbrResp.getCount());

Aggregates

Kinetica supports various aggregate and group-by queries, which group and aggregate your data to return counts and useful statistics.

Aggregate Example 1

// Aggregate count, min, mean, and max on the trip distance
      Map<String,Double> stats = gpudb.aggregateStatistics(TABLE_TAXI,
              COL_TRIP_DISTANCE, "count,min,max,mean",
              null).getStats();
      System.out.println("Statistics of values in the trip_distance column:");
      System.out.printf
              (
                      "\tCount: %.0f%n\tMin:  %4.2f%n\tMean: %4.2f%n\tMax:  %4.2f%n%n",
                      stats.get("count"), stats.get("min"), stats.get("mean"),
                      stats.get("max")
              );

Aggregate Example 2

// Find unique taxi vendor IDs
      List<Record> taxiVendors = gpudb.aggregateUnique(TABLE_TAXI,
              COL_VENDORID, 0, 1081, null).getData();
      System.out.println("Unique vendor IDs in the taxi trip table:");
      for (Record vendorIds : taxiVendors)
          System.out.println("\t* " + vendorIds.get(COL_VENDORID));
      System.out.println();

Aggregate Example 3

      // Find number of trips per vendor; use binary decoding
      List <String> colNames = Arrays.asList(COL_VENDORID, "count(vendor_id)");
      List<Record> trips = gpudb.aggregateGroupBy(TABLE_TAXI, colNames,
              0, 1081, null).getData();
      System.out.println("Trips per vendor:");
      for (Record trip : trips)
          System.out.printf("\t%-10s%2d%n", trip.get(COL_VENDORID) + ":",
                  trip.get("count(vendor_id)"));
      System.out.println();

Aggregate Example 4

      // Create a histogram for the different groups of passenger counts
      float start = 1;
      float end = 6;
      float interval = 1;
      List<Double> histoGroups = gpudb.aggregateHistogram(TABLE_TAXI,
              COL_PASS_COUNT, start, end, interval, null).getCounts();
      System.out.println("Passenger count groups by size:");
      for (int hgNum = 0; hgNum < histoGroups.size(); hgNum++)
          System.out.printf("\t%.0fs: %2.0f%n", start + hgNum * interval,
                  histoGroups.get(hgNum));

Joins

Joins allow you to link multiple tables together, along their relations, retrieving associated information from any or all of them. Tables can only be joined if they're sharded similarly or replicated.

An inner join returns only records that have matching values in both tables.

        /* Retrieve cab ride transactions and the full name of the associated
         * vendor for rides having more than three passengers between April 1st
         * & 16th, 2015 */
        List<String> innerJoinTables = new ArrayList<>();
        innerJoinTables.add(TABLE_TAXI + " as t");
        innerJoinTables.add(TABLE_VENDOR + " as v");

        List<String> innerJoinCols = new ArrayList<>();
        innerJoinCols.add(COL_TRANSACTIONID);
        innerJoinCols.add(COL_PASS_COUNT);
        innerJoinCols.add(COL_PICKUP);
        innerJoinCols.add(COL_DROPOFF);
        innerJoinCols.add(COL_VENDORNAME);

        List<String> innerJoinExpr = new ArrayList<>();
        innerJoinExpr.add("t.vendor_id = v.vendor_id");
        innerJoinExpr.add("passenger_count > 3");
        innerJoinExpr.add("pickup_datetime >= '2015-04-01'");
        innerJoinExpr.add("pickup_datetime <= '2015-04-16'");

        Map<String, String> joinOptions = new HashMap<>();
        joinOptions.put("collection_name", COLLECTION_TAXI_INFO);

        gpudb.createJoinTable(JOIN_TABLE_INNER, innerJoinTables, innerJoinCols,
                innerJoinExpr, joinOptions);
        GetRecordsByColumnResponse getInnerJoinRecs = gpudb.getRecordsByColumn(
                JOIN_TABLE_INNER, innerJoinCols, 0, 25,
                null);
        List<Record> innerJoinRecs = getInnerJoinRecs.getData();
        System.out.println("Inner join table:");
        System.out.printf("Transaction ID Pass. Count    Pickup         Dropoff        Vendor Name       %n");
        System.out.printf("============== ============== ============== ============== ==================%n");
        for (Record rec : innerJoinRecs){
            Type type = rec.getType();
            List<Type.Column> columns = type.getColumns();
            for (int i=0;i<columns.size();i++){
                System.out.printf("%-15s", rec.get(i));
            }
            System.out.println();
        }
        System.out.println();

A left join returns all of the records an inner join does, but additionally, for each record in the table on the left side of the join that has no match along the relation to a record in the table on the right side of the join, a corresponding record will be returned with "left-side" columns populated with the "left-side" record data and the "right-side" columns populated with nulls. Note the usage of left join in the given expression.

        /* Retrieve cab ride transactions and the full name of the associated
         * vendor (if available--blank if vendor name is unknown) for
         * transactions with associated payment data, sorting by increasing
         * values of transaction ID. */
        List<String> leftJoinTables = new ArrayList<>();
        leftJoinTables.add(TABLE_TAXI + " as t");
        leftJoinTables.add(TABLE_VENDOR + " as v");

        List<String> leftJoinCols = new ArrayList<>();
        leftJoinCols.add(COL_TRANSACTIONID);
        leftJoinCols.add(COL_PICKUP);
        leftJoinCols.add(COL_TRIP_DISTANCE);
        leftJoinCols.add("t." + COL_VENDORID);
        leftJoinCols.add(COL_VENDORNAME);

        List<String> leftJoinExpr = new ArrayList<>();
        leftJoinExpr.add("left join t, v on (t.vendor_id = v.vendor_id)");
        leftJoinExpr.add("payment_id <> 0");

        gpudb.createJoinTable(JOIN_TABLE_LEFT, leftJoinTables, leftJoinCols,
                leftJoinExpr, joinOptions);
        leftJoinCols.remove(3);
        leftJoinCols.add(3, COL_VENDORID);
        GetRecordsByColumnResponse getLeftJoinRecs = gpudb.getRecordsByColumn(
                JOIN_TABLE_LEFT, leftJoinCols, 0, 35, null);
        List<Record> leftJoinRecs = getLeftJoinRecs.getData();
        System.out.println("Left join table:");
        System.out.printf("Transaction ID Pickup         Trip Distance  Vendor ID      Vendor Name       %n");
        System.out.printf("============== ============== ============== ============== ==================%n");
        for (Record rec : leftJoinRecs){
            Type type = rec.getType();
            List<Type.Column> columns = type.getColumns();
            for (int i=0;i<columns.size();i++){
                System.out.printf("%-15s", rec.get(i));
            }
            System.out.println();
        }

Note

Full outer joins require both tables to be replicated. Set merges that perform deduplication of records, like UNION DISTINCT, INTERSECT, and EXCEPT also need to use replicated tables to ensure the correct results, so a replicated version of the taxi table (taxi_trip_data_replicated) is created at this point in the tutorial using /merge/records. You could also use /append/records, but the taxi_trip_data_replicated would need to be created before appending records to it

        /* Copy the records from the taxi trip table to a replicated taxi trip
         * table using /merge/records */
        List<String> srcTables = Arrays.asList(TABLE_TAXI);
        List<Map<String, String>> fieldMaps = new ArrayList<>();
        Map<String,String> colMap = new HashMap<>();
        colMap.put(COL_TRANSACTIONID, COL_TRANSACTIONID);
        colMap.put(COL_PAYMENTID, COL_PAYMENTID);
        colMap.put(COL_VENDORID, COL_VENDORID);
        colMap.put(COL_PICKUP, COL_PICKUP);
        colMap.put(COL_DROPOFF, COL_DROPOFF);
        colMap.put(COL_PASS_COUNT, COL_PASS_COUNT);
        colMap.put(COL_TRIP_DISTANCE, COL_TRIP_DISTANCE);
        colMap.put(COL_PICKUP_LONG, COL_PICKUP_LONG);
        colMap.put(COL_PICKUP_LAT, COL_PICKUP_LAT);
        colMap.put(COL_DROPOFF_LONG, COL_DROPOFF_LONG);
        colMap.put(COL_DROPOFF_LAT, COL_DROPOFF_LAT);
        fieldMaps.add(colMap);
        gpudb.mergeRecords(TABLE_TAXI_REPLICATED, srcTables, fieldMaps,
                optionsVendor);

A full outer join returns all of the records a left join does, but additionally, for each record in the table on the right side of the join that has no match along the relation to a record in the table on the left side of the join, a corresponding record will be returned with "right-side" columns populated with the "right-side" record data and the "left-side" columns populated with nulls.

        /* Retrieve the vendor IDs of known vendors with no recorded cab ride
         * transactions, as well as the vendor ID and number of transactions for
         * unknown vendors with recorded cab ride transactions */
        List<String> outerJoinTables = new ArrayList<>();
        outerJoinTables.add(TABLE_TAXI_REPLICATED + " as t");
        outerJoinTables.add(TABLE_VENDOR + " as v");

        List<String> outerJoinCols = new ArrayList<>();
        outerJoinCols.add("t.vendor_id as vendor_id");
        outerJoinCols.add("v.vendor_id as vendor_id_1");

        List<String> outerJoinExpr = new ArrayList<>();
        outerJoinExpr.add("full_outer join t,v on ((v.vendor_id = t.vendor_id))");

        gpudb.createJoinTable(JOIN_TABLE_OUTER, outerJoinTables, outerJoinCols,
                outerJoinExpr, joinOptions);

        /* Aggregate the join table results by vendor ID and count the amount of
         * records */
        colNames = Arrays.asList("vendor_id_1 as vend_table_vendors",
                "vendor_id as taxi_table_vendors", "count(*) as total_records");
        Map<String,String> grpOuterJoinOpts = new HashMap<>();
        grpOuterJoinOpts.put("expression","(is_null(vendor_id_1) OR is_null(vendor_id))");

        List<Record> vendorsList = gpudb.aggregateGroupBy(JOIN_TABLE_OUTER,
                colNames, 0, -9999, grpOuterJoinOpts).getData();
        System.out.println();
        System.out.println("Outer join table:");
        System.out.println("Vend. Table Vendors Taxi Table Vendors Total Records");
        System.out.println("=================== ================== =============");
        for (Record vendor : vendorsList)
            System.out.printf("%-20s%-19s%-19s%n",
                    vendor.get("vend_table_vendors"),
                    vendor.get("taxi_table_vendors"),
                    vendor.get("total_records"));

Projections

Projections essentially create a copy of a specified table, but you can also customize which columns are copied over as well as specify expressions to reduce the amount of data copied over. Projections expire by default, but you can persist them.

Projection Example 1

        List<String> creditProjCols = new ArrayList<>();
        creditProjCols.add(COL_PAYMENTID);
        creditProjCols.add(COL_PAYTYPE);
        creditProjCols.add(COL_CRED_TYPE);
        creditProjCols.add(COL_PAY_TIME);
        creditProjCols.add(COL_FARE);
        creditProjCols.add(COL_SURCHARGE);
        creditProjCols.add(COL_MTA_TAX);
        creditProjCols.add(COL_TIP_AMT);
        creditProjCols.add(COL_TOLLS_AMT);
        creditProjCols.add(COL_TOTAL_AMT);

        Map<String, String> creditProjOptions = new HashMap<>();
        creditProjOptions.put("collection_name", COLLECTION_TAXI_INFO);

        gpudb.createProjection(TABLE_PAYMENT, PROJECTION_CREDIT_PAYMENTS,
                creditProjCols, creditProjOptions);

        GetRecordsByColumnResponse getCreditProjRecs = gpudb.getRecordsByColumn(
                PROJECTION_CREDIT_PAYMENTS, creditProjCols, 0, 50,
                null);
        List<Record> credPayRecs = getCreditProjRecs.getData();
        System.out.println("Projection of only credit payment types:");
        System.out.printf("Payment ID       Payment Type     Credit Type      Timestamp        Fare             Surcharge        MTA Tax          Tip              Tolls            Total%n");
        System.out.printf("================ ================ ================ ================ ================ ================ ================ ================ ================ ================%n");
        for (Record rec : credPayRecs){
            Type type = rec.getType();
            List<Type.Column> columns = type.getColumns();
            for (int i=0;i<columns.size();i++){
                System.out.printf("%-17s", rec.get(i));
            }
            System.out.println();
        }

Projection Example 2

         * started during lunch hours */
        List<String> lunchProjCols = new ArrayList<>();
        lunchProjCols.add("hour(pickup_datetime) as hour_of_day");
        lunchProjCols.add(COL_VENDORID);
        lunchProjCols.add(COL_PASS_COUNT);
        lunchProjCols.add(COL_TRIP_DISTANCE);

        Map<String,String> lunchProjOpts = new HashMap<>();
        lunchProjOpts.put("expression",
                "(hour(pickup_datetime) >= 11) AND (hour(pickup_datetime) <= 14)");
        lunchProjOpts.put("persist", "true");
        lunchProjOpts.put("collection_name", COLLECTION_TAXI_INFO);

        gpudb.createProjection(TABLE_TAXI, PROJECTION_LUNCH_RIDES,
                lunchProjCols, lunchProjOpts);

        lunchProjCols.remove(0);
        lunchProjCols.add(3, "hour_of_day");
        GetRecordsByColumnResponse getLunchProjRecs = gpudb.getRecordsByColumn(
                PROJECTION_LUNCH_RIDES, lunchProjCols, 0, 50,
                null);
        List<Record> lunchRecs = getLunchProjRecs.getData();
        System.out.println();
        System.out.println("Projection of trips taken during lunch hours:");
        System.out.printf("Vendor ID        Passenger Count  Trip Distance    Hour of Day%n");
        System.out.printf("================ ================ ================ ===========%n");
        for (Record rec : lunchRecs){
            Type type = rec.getType();
            List<Type.Column> columns = type.getColumns();
            for (int i=0;i<columns.size();i++){
                System.out.printf("%-17s", rec.get(i));
            }
            System.out.println();
        }

Union, Intersect, and Except

UNION can be used to combine homogeneous data sets into one larger data set. UNION & UNION DISTINCT will both combine data sets but only retain the records that are unique across the chosen columns, removing all duplicates. UNION ALL will combine data sets, retaining all records from the source data sets.

         * average, and longest trips for all trips in each of the two time
         * periods--from April 1st through the 15th, 2015 and from April 16th
         * through the 23rd, 2015--and return those two sets of statistics in a
         * single result set. */

        List<String> grpByUnionTable1Cols = new ArrayList<>();
        grpByUnionTable1Cols.add("avg(passenger_count) as avg_pass_count");
        grpByUnionTable1Cols.add("avg(trip_distance) as avg_trip_dist");
        grpByUnionTable1Cols.add("min(trip_distance) as min_trip_dist");
        grpByUnionTable1Cols.add("max(trip_distance) as max_trip_dist");

        Map<String,String> unionAllTable1Opts = new HashMap<>();
        unionAllTable1Opts.put("expression",
                "((pickup_datetime >= '2015-04-01') AND (pickup_datetime <= '2015-04-15 23:59:59.999'))");
        unionAllTable1Opts.put("result_table", AGG_GRP_UNION_TABLE1);
        unionAllTable1Opts.put("collection_name", COLLECTION_TAXI_INFO);

        gpudb.aggregateGroupBy(TABLE_TAXI, grpByUnionTable1Cols, 0,
                -9999, unionAllTable1Opts);

        List<String> grpByUnionTable2Cols = new ArrayList<>();
        grpByUnionTable2Cols.add("avg(passenger_count) as avg_pass_count");
        grpByUnionTable2Cols.add("avg(trip_distance) as avg_trip_dist");
        grpByUnionTable2Cols.add("min(trip_distance) as min_trip_dist");
        grpByUnionTable2Cols.add("max(trip_distance) as max_trip_dist");

        Map<String,String> unionAllTable2Opts = new HashMap<>();
        unionAllTable2Opts.put("expression",
                "((pickup_datetime >= '2015-04-16') AND (pickup_datetime <= '2015-04-23 23:59:59.999'))");
        unionAllTable2Opts.put("result_table", AGG_GRP_UNION_TABLE2);
        unionAllTable2Opts.put("collection_name", COLLECTION_TAXI_INFO);

        gpudb.aggregateGroupBy(TABLE_TAXI, grpByUnionTable2Cols, 0,
                -9999, unionAllTable2Opts);

        List<String> unionAllTables = new ArrayList<>();
        unionAllTables.add(AGG_GRP_UNION_TABLE1);
        unionAllTables.add(AGG_GRP_UNION_TABLE2);

        List<List<String>> unionAllCols = new ArrayList<>();
        List<String> unionAllColsTable1 = new ArrayList<>();
        unionAllColsTable1.add("'2015-04-01 - 2015-04-15'");
        unionAllColsTable1.add("avg_pass_count");
        unionAllColsTable1.add("avg_trip_dist");
        unionAllColsTable1.add("min_trip_dist");
        unionAllColsTable1.add("max_trip_dist");
        List<String> unionAllColsTable2 = new ArrayList<>();
        unionAllColsTable2.add("'2015-04-16 - 2015-04-23'");
        unionAllColsTable2.add("avg_pass_count");
        unionAllColsTable2.add("avg_trip_dist");
        unionAllColsTable2.add("min_trip_dist");
        unionAllColsTable2.add("max_trip_dist");
        unionAllCols.add(unionAllColsTable1);
        unionAllCols.add(unionAllColsTable2);

        List<String> unionedTableCols = new ArrayList<>();
        unionedTableCols.add("pickup_window_range");
        unionedTableCols.add("avg_pass_count");
        unionedTableCols.add("avg_trip");
        unionedTableCols.add("min_trip");
        unionedTableCols.add("max_trip");

        Map<String, String> unionAllOpts = new HashMap<>();
        unionAllOpts.put("mode","union_all");
        unionAllOpts.put("collection_name",COLLECTION_TAXI_INFO);

        gpudb.createUnion(UNION_ALL_TABLE, unionAllTables, unionAllCols,
                unionedTableCols, unionAllOpts);
        GetRecordsByColumnResponse getUnionAllRecs = gpudb.getRecordsByColumn(
                UNION_ALL_TABLE, unionedTableCols, 0, 25,
                null);
        List<Record> unionAllRecs = getUnionAllRecs.getData();
        System.out.println("Union All table:");
        System.out.printf("Pickup Window Range     Avg. Pass. Count        Avg. Trip               Min. Trip               Max Trip           %n");
        System.out.printf("======================= ======================= ======================= ======================= ===================%n");
        for (Record rec : unionAllRecs){
            Type type = rec.getType();
            List<Type.Column> columns = type.getColumns();
            for (int i=0;i<columns.size();i++){
                System.out.printf("%-24s", rec.get(i));
            }
            System.out.println();
        }
        System.out.println();

INTERSECT will combine data sets but only include the records found in both data sets, removing duplicate result records.

         * drop-off points */

        List<String> intProjCols = new ArrayList<>();
        intProjCols.add(COL_PICKUP_LAT);
        intProjCols.add(COL_PICKUP_LONG);

        Map<String,String> intProjOpts = new HashMap<>();
        intProjOpts.put("expression",
                "((pickup_latitude <> 0) AND (pickup_longitude <> 0))");
        intProjOpts.put("collection_name", COLLECTION_TAXI_INFO);

        gpudb.createProjection(TABLE_TAXI_REPLICATED, PROJECTION_INTERSECT_SRC,
                intProjCols, intProjOpts);

        List<String> intersectTables = new ArrayList<>();
        intersectTables.add(PROJECTION_INTERSECT_SRC);
        intersectTables.add(TABLE_TAXI_REPLICATED);

        List<List<String>> intersectCols = new ArrayList<>();
        List<String> intersectColsTable1 = new ArrayList<>();
        intersectColsTable1.add(COL_PICKUP_LAT);
        intersectColsTable1.add(COL_PICKUP_LONG);
        List<String> intersectColsTable2 = new ArrayList<>();
        intersectColsTable2.add(COL_DROPOFF_LAT);
        intersectColsTable2.add(COL_DROPOFF_LONG);
        intersectCols.add(intersectColsTable1);
        intersectCols.add(intersectColsTable2);

        List<String> intersectedTableCols = new ArrayList<>();
        intersectedTableCols.add(COL_PICKUP_LAT);
        intersectedTableCols.add(COL_PICKUP_LONG);

        Map<String, String> intersectOpts = new HashMap<>();
        intersectOpts.put("mode","intersect");
        intersectOpts.put("collection_name", COLLECTION_TAXI_INFO);

        gpudb.createUnion(UNION_INTERSECT_TABLE, intersectTables, intersectCols,
                intersectedTableCols, intersectOpts);
        GetRecordsByColumnResponse getIntersectRecs = gpudb.getRecordsByColumn(
                UNION_INTERSECT_TABLE, intersectedTableCols, 0, 25,
                null);
        List<Record> intersectRecs = getIntersectRecs.getData();
        System.out.println("Intersect table:");
        System.out.printf("Pickup Latitude Pickup Longitude%n");
        System.out.printf("=============== =================%n");
        for (Record rec : intersectRecs){
            Type type = rec.getType();
            List<Type.Column> columns = type.getColumns();
            for (int i=0;i<columns.size();i++){
                System.out.printf("%-16s", rec.get(i));
            }
            System.out.println();
        }
        System.out.println();

EXCEPT will return records that appear in the first data set but not the second data set. Note that the data sets on each side of the EXCEPT will have duplicates removed first, and then the set subtraction will be processed.

         * the unique list of IDs of vendors who provided cab rides between
         * midnight and noon, and remove from that list the IDs of any vendors
         * who provided cab rides between noon and midnight */

        List<String> exSrcTableCols = new ArrayList<>();
        exSrcTableCols.add(COL_VENDORID);

        Map<String,String> exSrcTable1Opts = new HashMap<>();
        exSrcTable1Opts.put("expression",
                "((HOUR(pickup_datetime) >= 0) AND (HOUR(pickup_datetime) <= 11))");
        exSrcTable1Opts.put("collection_name", COLLECTION_TAXI_INFO);

        Map<String,String> exSrcTable2Opts = new HashMap<>();
        exSrcTable2Opts.put("expression",
                "((HOUR(pickup_datetime) >= 12) AND (HOUR(pickup_datetime) <= 23))");
        exSrcTable2Opts.put("collection_name", COLLECTION_TAXI_INFO);

        gpudb.createProjection(TABLE_TAXI_REPLICATED, PROJECTION_EXCEPT_SRC1,
                exSrcTableCols, exSrcTable1Opts);
        gpudb.createProjection(TABLE_TAXI_REPLICATED, PROJECTION_EXCEPT_SRC2,
                exSrcTableCols, exSrcTable2Opts);

        List<String> exceptTables = new ArrayList<>();
        exceptTables.add(PROJECTION_EXCEPT_SRC1);
        exceptTables.add(PROJECTION_EXCEPT_SRC2);

        List<List<String>> exceptCols = new ArrayList<>();
        List<String> exceptColsTable1 = new ArrayList<>();
        exceptColsTable1.add(COL_VENDORID);
        List<String> exceptColsTable2 = new ArrayList<>();
        exceptColsTable2.add(COL_VENDORID);
        exceptCols.add(exceptColsTable1);
        exceptCols.add(exceptColsTable2);

        List<String> exceptedTableCols = new ArrayList<>();
        exceptedTableCols.add(COL_VENDORID);

        Map<String, String> exceptOpts = new HashMap<>();
        exceptOpts.put("mode","except");
        exceptOpts.put("collection_name", COLLECTION_TAXI_INFO);

        gpudb.createUnion(UNION_EXCEPT_TABLE, exceptTables, exceptCols,
                exceptedTableCols, exceptOpts);
        GetRecordsByColumnResponse getExceptRecs = gpudb.getRecordsByColumn(
                UNION_EXCEPT_TABLE, exceptedTableCols, 0, 25,
                null);
        List<Record> exceptRecs = getExceptRecs.getData();
        System.out.println("Except table:");
        System.out.printf("Vendor ID%n");
        System.out.printf("=========%n");
        for (Record rec : exceptRecs){
            Type type = rec.getType();
            List<Type.Column> columns = type.getColumns();
            for (int i=0;i<columns.size();i++){
                System.out.printf("%-16s", rec.get(i));
            }
            System.out.println();
        }

Deleting Data

Using filter expressions to delete large swaths of data can expedite the process of cleaning up tables.

         * be deleted; delete the records, then confirm the deletion by
         * refiltering */
        String deleteExpression = "payment_type = 'Cash'";

        fResp = gpudb.filter(TABLE_PAYMENT, null, deleteExpression,
                null);
        System.out.println("Number of records that meet deletion criteria before deleting:  " +
                fResp.getCount());

        gpudb.deleteRecords(TABLE_PAYMENT, Arrays.asList(deleteExpression),
                null);

        fResp = gpudb.filter(TABLE_PAYMENT, null, deleteExpression,
                null);
        System.out.println("Number of records that meet deletion criteria after deleting:  " +
                fResp.getCount());

Clearing Tables and Collections

Use clearTable() to remove both tables and collections.

        gpudb.clearTable(AGG_GRP_UNION_TABLE2, null, null);
        gpudb.clearTable(COLLECTION_TAXI_INFO, null, null);
        gpudb.clearTable(VIEW_PASS_COUNT_BTW1_3, null,null);
        gpudb.clearTable(VIEW_VENDOR_NYC_YCAB, null,null);

Complete Sample

Included below is a complete example containing all the above requests, the data file, and output.