Geospatial Examples

The following sections demonstrate some of the geospatial capabilities of Kinetica. Details about the geospatial functions can be found here; a list of the available geospatial operations is here. All geospatial functions can be used in both native (API) expressions and SQL expressions.

Prerequisites

SQL

The ODBC connector needs to be enabled and the isql command needs to be working properly, e.g.,

isql <database-name> <username> <password> -v -m25 -3

Should return:

+---------------------------------------+
| Connected!                            |
|                                       |
| sql-statement                         |
| help [tablename]                      |
| quit                                  |
|                                       |
+---------------------------------------+
SQL>

For tips on using iSQL, see SQL Developer Manual.

APIs

Before running any sample files:

  1. Install the native API for the language of choice (e.g., the Python API); a minimal connection sketch for the Python examples is shown after this list
  2. Import the following CSV files using GAdmin (see Import for more information): nyc_neighborhood.csv, taxi_trip_data.csv, and taxi_zone.csv
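
Before running the Python snippets in this guide, a connection to the database also needs to be established, and the snippets make use of the standard json module. The following is a minimal sketch; the host, port, and encoding shown are assumptions and should be adjusted for the target environment:

    import json

    import gpudb

    # Connect to the database; the host, port, and encoding below are
    # placeholders for this sketch and should be changed as needed
    h_db = gpudb.GPUdb(encoding="BINARY", host="127.0.0.1", port="9191")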

Examples via SQL

Creating a Table and Inserting WKT Data

A schema is created to house the example tables created below.

CREATE SCHEMA geospatial_example;

Per Types, WKT data can be stored as either bytes or strings. In the example below, src1_poi and src2_poi are string-based WKT columns, so the inserted data is enclosed in quotes.

CREATE OR REPLACE TABLE geospatial_example.points_of_interest
(
  src1_poi WKT NOT NULL,
  src2_poi WKT NOT NULL
);

A few example tables are also created.

CREATE OR REPLACE REPLICATED TABLE geospatial_example.nyc_neighborhood
(
  gid INT NOT NULL,
  geom WKT NOT NULL,
  CTLabel VARCHAR(16) NOT NULL,
  BoroCode VARCHAR(16) NOT NULL,
  BoroName VARCHAR(16) NOT NULL,
  CT2010 VARCHAR(16) NOT NULL,
  BoroCT2010 VARCHAR(16) NOT NULL,
  CDEligibil VARCHAR(16) NOT NULL,
  NTACode VARCHAR(16) NOT NULL,
  NTAName VARCHAR(64) NOT NULL,
  PUMA VARCHAR(16) NOT NULL,
  Shape_Leng DOUBLE NOT NULL,
  Shape_Area DOUBLE NOT NULL
);

CREATE OR REPLACE REPLICATED TABLE geospatial_example.taxi_trip_data
(
  transaction_id LONG NOT NULL,
  payment_id LONG NOT NULL,
  vendor_id VARCHAR(4) NOT NULL,
  pickup_datetime TYPE_TIMESTAMP,
  dropoff_datetime TYPE_TIMESTAMP,
  passenger_count TINYINT,
  trip_distance REAL,
  pickup_longitude REAL,
  pickup_latitude REAL,
  dropoff_longitude REAL,
  dropoff_latitude REAL
);

CREATE OR REPLACE TABLE geospatial_example.taxi_zone
(
  geom WKT NOT NULL,
  objectid SMALLINT NOT NULL,
  shape_length DOUBLE NOT NULL,
  shape_area DOUBLE NOT NULL,
  zone VARCHAR(64) NOT NULL,
  locationid INT NOT NULL,
  borough VARCHAR(16) NOT NULL,
  PRIMARY KEY (objectid)
);

Data is inserted into the WKT table, and then the insertion is verified.

INSERT INTO points_of_interest
VALUES
  ('POINT(-73.8455003 40.7577272)','POINT(-73.84550029 40.75772724)'),
  ('POLYGON((-73.9258506 40.8287493, -73.9258292 40.828723, -73.9257795 40.8287387, -73.925801 40.8287635,-73.9258506 40.8287493))','POLYGON((-73.9257795 40.8287387, -73.925801 40.8287635, -73.9258506 40.8287493, -73.9258506 40.8287493,-73.9258292 40.828723))'),
  ('LINESTRING(-73.9942208 40.7504289, -73.993856 40.7500753, -73.9932525 40.7499941)','LINESTRING(-70.9942208 42.7504289, -72.993856 43.7500753, -73.9932525 44.7499941)'),
  ('LINESTRING(-73.9760944 40.6833433, -73.9764404 40.6830626, -73.9763761 40.6828897)','LINESTRING(-70.9761 41.6834, -72.9765 39.6831, -76.9764 38.6829)');

/* Verify the records were inserted successfully */

SELECT *
FROM points_of_interest;

CSV files are used to populate the other example tables. The file paths below are placeholders and should be updated to point to valid paths on the ODBC server host where the files are located:

INSERT INTO nyc_neighborhood
(
  gid,
  geom,
  CTLabel,
  BoroCode,
  BoroName,
  CT2010,
  BoroCT2010,
  CDEligibil,
  NTACode,
  NTAName,
  PUMA,
  Shape_Leng,
  Shape_Area
)
SELECT *
FROM FILE."/tmp/data/nyc_neighborhood.csv";

INSERT INTO taxi_trip_data
(
  transaction_id,
  payment_id,
  vendor_id,
  pickup_datetime,
  dropoff_datetime,
  passenger_count,
  trip_distance,
  pickup_longitude,
  pickup_latitude,
  dropoff_longitude,
  dropoff_latitude
)
SELECT *
FROM FILE."/tmp/data/taxi_trip_data.csv";

INSERT INTO taxi_zone
(
  geom,
  objectid,
  shape_length,
  shape_area,
  zone,
  locationid,
  borough
)
SELECT *
FROM FILE."/tmp/data/taxi_zone.csv";

Scalar Functions

Scalar Example 1

/* Using GEODIST(), calculate the distance between pickup and dropoff points to
 * see where the calculated distance is less than the recorded trip distance */

SELECT *
FROM taxi_trip_data
WHERE GEODIST(pickup_longitude, pickup_latitude, dropoff_longitude, dropoff_latitude) < trip_distance;

Scalar Example 2

/* Calculate the area of each taxi zone and find the five largest zones */

SELECT TOP 5
  objectid,
  zone,
  ST_AREA(geom) as area
FROM taxi_zone
ORDER BY area DESC;

Geospatial Aggregations

/* Dissolve the boundaries between Queens neighborhood tabulation areas to
 * create a single borough boundary */

SELECT
  ST_DISSOLVE(geom) as neighborhoods
FROM nyc_neighborhood
WHERE BoroName = 'Queens';

Below is a picture of the dissolved neighborhood boundaries:

[Image: queens_neighborhoods.png]

Geospatial Joins

Join Example 1

/* Locate the borough and neighborhood in which each taxi pickup point occurred
 * and display a subset of the points */

SELECT TOP 25
  vendor_id,
  passenger_count,
  pickup_longitude,
  pickup_latitude,
  BoroName,
  NTAName
FROM taxi_trip_data, nyc_neighborhood
WHERE STXY_CONTAINS(geom, pickup_longitude, pickup_latitude) = 1;

Join Example 2

/* Find the top 10 neighborhood pickup locations */

SELECT TOP 10
  ntaname AS "Pickup_NTA",
  COUNT(*) AS "Total_Pickups"
FROM
  taxi_trip_data
  JOIN nyc_neighborhood
    ON STXY_Intersects(pickup_longitude, pickup_latitude, geom) = 1
GROUP BY
  ntaname
ORDER BY
  "Total_Pickups" DESC;

Geospatial Equality

There are three ways of checking whether two geometries are equal, each of which has a different threshold:

  • ST_EQUALS()
  • ST_ALMOSTEQUALS()
  • ST_EQUALSEXACT()

For details on these functions see here.

The data inserted in the first section includes two POINTs that vary slightly in precision, two POLYGONs that vary in the ordering of their boundary points, and two pairs of LINESTRINGs that vary in both precision and scale. Each pair of geometries is compared to see whether it is spatially equal, as defined by each of the equality functions:

SELECT
  ST_ALMOSTEQUALS(src1_poi, src2_poi, 7) as almostequal,
  ST_EQUALS(src1_poi, src2_poi) as equals,
  ST_EQUALSEXACT(src1_poi, src2_poi, 4) as equalsexact
FROM points_of_interest;

Examples via the Python API
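
The snippets in this section assume a database handle, h_db (see the connection sketch in the Prerequisites), as well as several name variables for the collection, tables, views, and projections they create. These are defined once in the complete sample; a sketch of the assignments, with assumed string values, is shown below:

    # Object names used throughout the snippets below; the string values are
    # assumptions and can be changed to suit the target environment
    geo_collection = "geospatial_example"
    poi_table = "points_of_interest"
    nyc_nta_table = "nyc_neighborhood"
    taxi_trip_table = "taxi_trip_data"
    taxi_zones_table = "taxi_zone"
    dist_check_view = "dist_check_view"
    zones_by_area_proj = "zones_by_area"
    queens_agb = "queens_dissolved"
    pickup_point_location = "pickup_point_locations"
    top_pickup_locations = "top_pickup_locations"
    pickup_agb = "pickup_counts"
    poi_comp_table = "poi_equality_comparison"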

Creating a Table and Inserting WKT Data

A collection is created to house the example tables created below. Per Types, WKT data can be stored as either bytes or strings. In the example below, src1_poi and src2_poi are string-based WKT columns, so the data is given as quoted strings. A table is created for the WKT data and the records are inserted into it.

    # Create a collection to house the main tables
    response = h_db.create_table(table_name=geo_collection, type_id="", options={"is_collection": "true"})
    print "Collection created: {}".format(response['status_info']['status'])

    # Create a type using the GPUdbRecordType object
    columns = [gpudb.GPUdbRecordColumn("src1_poi", gpudb.GPUdbRecordColumn._ColumnType.STRING,
                                       [gpudb.GPUdbColumnProperty.WKT]),
               gpudb.GPUdbRecordColumn("src2_poi", gpudb.GPUdbRecordColumn._ColumnType.STRING,
                                       [gpudb.GPUdbColumnProperty.WKT])]
    poi_record_type = gpudb.GPUdbRecordType(columns, label="poi_record_type")
    poi_record_type.create_type(h_db)
    poi_type_id = poi_record_type.type_id
    # Create a table from the poi_record_type
    response = h_db.create_table(table_name=poi_table, type_id=poi_type_id, options={"collection_name": geo_collection})
    print "Table created: {}".format(response['status_info']['status'])

    # Prepare the data to be inserted
    pois = [
        "POINT(-73.8455003 40.7577272)",
        "POLYGON((-73.9258506 40.8287493, -73.9258292 40.828723, -73.9257795 40.8287387, -73.925801 40.8287635, "
        "-73.9258506 40.8287493))",
        "LINESTRING(-73.9942208 40.7504289, -73.993856 40.7500753, -73.9932525 40.7499941)",
        "LINESTRING(-73.9760944 40.6833433, -73.9764404 40.6830626, -73.9763761 40.6828897)"
    ]
    pois_compare = [
        "POINT(-73.84550029 40.75772724)",
        "POLYGON((-73.9257795 40.8287387, -73.925801 40.8287635, -73.9258506 40.8287493, -73.9258506 40.8287493, "
        "-73.9258292 40.828723))",
        "LINESTRING(-70.9942208 42.7504289, -72.993856 43.7500753, -73.9932525 44.7499941)",
        "LINESTRING(-70.9761 41.6834, -72.9765 39.6831, -76.9764 38.6829)"
    ]
    # Insert the data into the table using a list
    encoded_obj_list = []
    for val in range(0, 4):
        poi1_val = pois[val]
        poi2_val = pois_compare[val]
        record = gpudb.GPUdbRecord(poi_record_type, [poi1_val, poi2_val])
        encoded_obj_list.append(record.binary_data)
    response = h_db.insert_records(table_name=poi_table, data=encoded_obj_list, list_encoding="binary")
    print "Number of records inserted:  {}".format(response['count_inserted'])

Scalar Functions

Scalar Example 1

    """ Using GEODIST(), calculate the distance between pickup and dropoff points to see where the calculated distance
        is less than the recorded trip distance """
    response = h_db.filter(table_name=taxi_trip_table, view_name=dist_check_view,
                           expression="GEODIST(pickup_longitude, pickup_latitude, dropoff_longitude, dropoff_latitude) "
                                      "< trip_distance",
                           options={"collection_name": geo_collection}
                           )
    print "Number of records where the calculated geographic distance is less than the recorded distance: {}".format(
        response["count"])
    print

Scalar Example 2

    # Calculate the area of each taxi zone and copy the selected data to a projection
    h_db.create_projection(table_name=taxi_zones_table, projection_name=zones_by_area_proj, column_names=[
        "objectid", "zone", "ST_AREA(geom) as area"], options={"order_by": "area desc", "limit": "5",
                                                               "collection_name": geo_collection}
    )
    # Display the top 5 largest taxi zones
    large_zones = h_db.get_records(table_name=zones_by_area_proj, offset=0, limit=5, encoding="json")['records_json']
    print "Top 5 largest taxi zones:"
    print "{:<9s} {:<28s}".format("objectid", "zone")
    print "{:=<9s} {:=<28s}".format("", "", "")
    for zone in large_zones:
        print "{objectid:<9d} {zone:<28s}".format(**json.loads(zone))

Geospatial Aggregations

    # Dissolve the boundaries between Queens neighborhood tabulation areas to create a single borough boundary
    response = h_db.aggregate_group_by(table_name=nyc_nta_table, column_names=["st_dissolve(geom) as neighborhoods"],
                                       offset=0, limit=1, encoding="json",
                                       options={"expression": "BoroName = 'Queens'", "result_table": queens_agb,
                                                "collection_name": geo_collection}
                                       )
    print "Queens neighborhood boundaries dissolved:  {}".format(response['status_info']['status'])

Below is a picture of the dissolved neighborhood boundaries:

[Image: queens_neighborhoods.png]

Geospatial Joins

Join Example 1

    # Locate the borough and neighborhood in which each taxi pickup point occurred
    h_db.create_join_table(join_table_name=pickup_point_location,
                           table_names=[taxi_trip_table + " as t", nyc_nta_table + " as n"],
                           column_names=["t.vendor_id", "t.passenger_count", "t.pickup_longitude", "t.pickup_latitude",
                                         "n.BoroName", "n.NTAName"],
                           expressions=["STXY_CONTAINS(n.geom, t.pickup_longitude, t.pickup_latitude)"],
                           options={"collection_name": geo_collection}
                           )
    # Display a subset of the pickup point locations
    pickup_points = h_db.get_records(table_name=pickup_point_location, offset=0, limit=25,
                                     encoding="json")['records_json']
    print "Subset of pickup point neighborhood locations:"
    print "{:<10s} {:<12s} {:<13s} {:<12s} {:<11s} {:<43s}".format("Vendor ID", "Pass. Count", "Pickup Long.",
                                                                   "Pickup Lat.", "Borough", "NTA")
    print "{:=<10s} {:=<12s} {:=<13s} {:=<12s} {:=<11s} {:=<43s}".format("", "", "", "", "", "")
    for point in pickup_points:
        print "{vendor_id:<10s} {passenger_count:<12d} {pickup_longitude:<13f} {pickup_latitude:<12f} {BoroName:<11s} "\
              "{NTAName:<43s}".format(**json.loads(point))
    print

Join Example 2

    # Find the top neighborhood pickup locations
    h_db.create_join_table(join_table_name=top_pickup_locations,
                           table_names=[taxi_trip_table + " as t", nyc_nta_table + " as n"],
                           column_names=["n.NTAName as NTAName"],
                           expressions=["((STXY_INTERSECTS(t.pickup_longitude, "
                                        "t.pickup_latitude, n.geom) = 1))"],
                           options={"collection_name": geo_collection}
                           )
    # Aggregate the neighborhoods and the count of records per neighborhood
    h_db.aggregate_group_by(table_name=top_pickup_locations,
                            column_names=["NTAName as Pickup_NTA", "count(*) as Total_Pickups"], offset=0, limit=-9999,
                            options={"result_table": pickup_agb, "collection_name": geo_collection}
                            )
    # Display the top 10 neighborhood pickup locations
    response = h_db.get_records_by_column(table_name=pickup_agb, column_names=["Pickup_NTA", "Total_Pickups"], offset=0,
                                          limit=10, encoding="json", options={"order_by": "Total_Pickups desc"}
                                          )
    data = h_db.parse_dynamic_response(response)['response']
    print "Top 10 neighborhood pickup locations:"
    print "{:<43s} {:<14s}".format("Pickup NTA", "Total Pickups")
    print "{:=<43s} {:=<14s}".format("", "")
    for pickupLocs in zip(data["Pickup_NTA"], data["Total_Pickups"]):
        print "{:<43s} {:<14d}".format(*pickupLocs)

Geospatial Equality

There are three ways of checking whether two geometries are equal, each of which has a different threshold:

  • ST_EQUALS()
  • ST_ALMOSTEQUALS()
  • ST_EQUALSEXACT()

For details on these functions see here.

The data inserted in the first section includes two POINTs that vary slightly in precision, two POLYGONs that vary in the ordering of their boundary points, and two pairs of LINESTRINGs that vary in both precision and scale. Each pair of geometries is compared to see whether it is spatially equal, as defined by each of the equality functions:

    # Calculate types of geospatial equality for differing sources of geometry
    h_db.create_projection(
        table_name=poi_table,
        projection_name=poi_comp_table,
        column_names=[
            "ST_ALMOSTEQUALS(src1_poi, src2_poi, 7) as almostequal",
            "ST_EQUALS(src1_poi, src2_poi) as equals",
            "ST_EQUALSEXACT(src1_poi, src2_poi, 4) as equalsexact"
        ],
        options={"collection_name": geo_collection}
    )
    equality_comps = h_db.get_records(table_name=poi_comp_table, offset=0, limit=10, encoding="json")['records_json']
    print "Comparing ST_ALMOSTEQUALS(), ST_EQUALS(), and ST_EQUALSEXACT():"
    print "{:<12s} {:<7s} {:<12s}".format("almostequal", "equals", "equalsexact")
    print "{:=<12s} {:=<7s} {:=<12s}".format("", "", "")
    for comp in equality_comps:
        print "{almostequal:<12d} {equals:<7d} {equalsexact:<12d}".format(**json.loads(comp))

Complete Samples

Included below are complete example files containing all of the above requests, along with their output.