> ## Documentation Index > Fetch the complete documentation index at: https://docs.kinetica.com/llms.txt > Use this file to discover all available pages before exploring further. # Using WKT Data and Geospatial Functions > Learn how to ingest and use WKT geometry with geospatial functions The following sections demonstrate how to ingest and work with WKT data as well as how to use geospatial functions via SQL and the *Kinetica Python* API. Details about the geospatial functions can be found under [Geospatial/Geometry Functions](/content/location_intelligence/geo_functions). All geospatial functions are compatible in both native API and SQL. ## Prerequisites * [KiSQL](/content/tools/kisql) *(if running the SQL examples)* * [Python API](/content/guides/python_guide#python-api-installation) *(if running the Python examples)* * Sample CSV data files, which should be downloaded & copied to the host running the SQL client. The data files can be downloaded here: * [NYC Neighborhood](https://raw.githubusercontent.com/kineticadb/kinetica-docs/master/content/examples/data/nyc_neighborhood.csv) * [Taxi Trip Data](https://raw.githubusercontent.com/kineticadb/kinetica-docs/master/content/examples/data/taxi_trip_data.csv) * [Taxi Zone](https://raw.githubusercontent.com/kineticadb/kinetica-docs/master/content/examples/data/taxi_zone.csv) ### Loading Sample Data Data loading will be done via SQL. First log in to the database: ```bash theme={null} kisql --url --user --pwd ``` Create a [schema](/content/sql/ddl#sql-create-schema) to house the example tables. ```sql Create Geospatial Example Schema theme={null} CREATE SCHEMA example_geospatial ``` Create tables for geospatial & temporal data within the *schema*. ```sql Create NYC Neighborhood Shape Table theme={null} CREATE OR REPLACE REPLICATED TABLE example_geospatial.nyc_neighborhood ( gid INT NOT NULL, geom WKT NOT NULL, CTLabel VARCHAR(16) NOT NULL, BoroCode VARCHAR(16) NOT NULL, BoroName VARCHAR(16) NOT NULL, CT2010 VARCHAR(16) NOT NULL, BoroCT2010 VARCHAR(16) NOT NULL, CDEligibil VARCHAR(16) NOT NULL, NTACode VARCHAR(16) NOT NULL, NTAName VARCHAR(64) NOT NULL, PUMA VARCHAR(16) NOT NULL, Shape_Leng DOUBLE NOT NULL, Shape_Area DOUBLE NOT NULL ) ``` ```sql Create NYC Taxi Trip Table theme={null} CREATE OR REPLACE REPLICATED TABLE example_geospatial.taxi_trip_data ( transaction_id LONG NOT NULL, payment_id LONG NOT NULL, vendor_id VARCHAR(4) NOT NULL, pickup_datetime TYPE_TIMESTAMP, dropoff_datetime TYPE_TIMESTAMP, passenger_count TINYINT, trip_distance REAL, pickup_longitude REAL, pickup_latitude REAL, dropoff_longitude REAL, dropoff_latitude REAL ) ``` ```sql Create NYC Taxi Zone Shape Table theme={null} CREATE OR REPLACE TABLE example_geospatial.taxi_zone ( geom WKT NOT NULL, objectid SMALLINT NOT NULL, shape_length DOUBLE NOT NULL, shape_area DOUBLE NOT NULL, zone VARCHAR(64) NOT NULL, locationid INT NOT NULL, borough VARCHAR(16) NOT NULL, PRIMARY KEY (objectid) ) ``` Populate the example tables with source data. The files are staged into [KiFS](/content/sql/kifs) from the directory that is local to where *KiSQL* is running. From there, each is loaded into its respective table. Modify these statements as necessary to reflect the location of the data files on the client machine. ```sql Create KiFS Directory for Geospatial Data Files theme={null} CREATE DIRECTORY 'geo_data' ``` ```sql Stage NYC Neighborhood Data theme={null} UPLOAD FILE '../data/nyc_neighborhood.csv' INTO 'geo_data' ``` ```sql Load NYC Neighborhood Data theme={null} LOAD INTO example_geospatial.nyc_neighborhood FROM FILE PATHS 'kifs://geo_data/nyc_neighborhood.csv' ``` ```sql Stage NYC Taxi Trip Data theme={null} UPLOAD FILE '../data/taxi_trip_data.csv' INTO 'geo_data' ``` ```sql Load NYC Taxi Trip Data theme={null} LOAD INTO example_geospatial.taxi_trip_data FROM FILE PATHS 'kifs://geo_data/taxi_trip_data.csv' ``` ```sql Stage NYC Taxi Zone Data theme={null} UPLOAD FILE '../data/taxi_zone.csv' INTO 'geo_data' ``` ```sql Load NYC Taxi Zone Data theme={null} LOAD INTO example_geospatial.taxi_zone FROM FILE PATHS 'kifs://geo_data/taxi_zone.csv' ``` ## Examples via SQL ### Creating a Table and Inserting WKT Data First, a points-of-interest table is created and loaded with sample WKT data. ```sql Create POI Table theme={null} CREATE OR REPLACE TABLE example_geospatial.points_of_interest ( src1_poi WKT NOT NULL, src2_poi WKT NOT NULL ) ``` Then, data is inserted into the WKT table. ```sql Populate POI Table theme={null} INSERT INTO example_geospatial.points_of_interest VALUES ('POINT(-73.8455003 40.7577272)','POINT(-73.84550029 40.75772724)'), ('POLYGON((-73.9258506 40.8287493, -73.9258292 40.828723, -73.9257795 40.8287387, -73.925801 40.8287635,-73.9258506 40.8287493))','POLYGON((-73.9257795 40.8287387, -73.925801 40.8287635, -73.9258506 40.8287493, -73.9258506 40.8287493,-73.9258292 40.828723))'), ('LINESTRING(-73.9942208 40.7504289, -73.993856 40.7500753, -73.9932525 40.7499941)','LINESTRING(-70.9942208 42.7504289, -72.993856 43.7500753, -73.9932525 44.7499941)'), ('LINESTRING(-73.9760944 40.6833433, -73.9764404 40.6830626, -73.9763761 40.6828897)','LINESTRING(-70.9761 41.6834, -72.9765 39.6831, -76.9764 38.6829)') ``` ### Scalar Functions For the complete list of scalar functions, see both [Scalar Functions](/content/sql/query#sql-geo-functions-scalar) and [Enhanced Performance Scalar Functions](/content/sql/query#sql-geo-functions-scalar-ep). ```sql GEODIST Example theme={null} -- Using GEODIST(), calculate the distance between pickup and dropoff points to -- see where the calculated distance is less than the recorded trip distance SELECT * FROM example_geospatial.taxi_trip_data WHERE GEODIST(pickup_longitude, pickup_latitude, dropoff_longitude, dropoff_latitude) < trip_distance ``` ```sql ST_AREA Example theme={null} -- Calculate the area of each taxi zone and find the five largest zones SELECT TOP 5 objectid AS "Zone_ID", zone AS "Zone_Name", DECIMAL(ST_AREA(geom, 1) / 1000000) AS "Zone_Area_in_Square_Kilometers" FROM example_geospatial.taxi_zone ORDER BY 3 DESC ``` ### Aggregation Functions For the complete list of aggregation functions, see [Aggregation Functions](/content/sql/query#sql-geo-functions-aggregation). ```sql ST_DISSOLVE Example theme={null} -- Dissolve the boundaries between Queens neighborhood tabulation areas to -- create a single borough boundary SELECT ST_DISSOLVE(geom) AS neighborhoods FROM example_geospatial.nyc_neighborhood WHERE BoroName = 'Queens' ``` Below is a picture of the dissolved neighborhood boundaries:

### Joins Many geospatial/geometry functions can be used to join data sets. For the complete list of join and other scalar functions, see both [Scalar Functions](/content/sql/query#sql-geo-functions-scalar) and [Enhanced Performance Scalar Functions](/content/sql/query#sql-geo-functions-scalar-ep). ```sql Implicit Geospatial Containment Join theme={null} SELECT TOP 25 vendor_id AS "Pickup_Vendor_ID", passenger_count AS "Total_Passengers", pickup_longitude AS "Pickup_Longitude", pickup_latitude AS "Pickup_Latitude", BoroName AS "Pickup_Borough", NTAName AS "Pickup_NTA" FROM example_geospatial.taxi_trip_data, example_geospatial.nyc_neighborhood WHERE STXY_CONTAINS(geom, pickup_longitude, pickup_latitude) = 1 ``` ```sql Explicit Geospatial Intersection Join theme={null} -- Find the top 10 neighborhood pickup locations SELECT TOP 10 ntaname AS "Pickup_NTA", COUNT(*) AS "Total_Pickups" FROM example_geospatial.taxi_trip_data JOIN example_geospatial.nyc_neighborhood ON STXY_INTERSECTS(pickup_longitude, pickup_latitude, geom) = 1 GROUP BY ntaname ORDER BY "Total_Pickups" DESC ``` In some cases, it may be useful to aggregate on the geospatial column itself, applying standard aggregation functions to other joined columns. ```sql Explicit Geospatial Intersection Join with Geospatial Aggregation theme={null} -- Find the geospatial objects representing the top 10 neighborhoods, -- measured by distance covered over all taxi trips originating in them SELECT TOP 10 SUM(trip_distance) AS "Total_Trip_Distance", geom AS "Pickup_Geo" FROM example_geospatial.taxi_trip_data JOIN example_geospatial.nyc_neighborhood ON STXY_INTERSECTS(pickup_longitude, pickup_latitude, geom) = 1 GROUP BY geom ORDER BY "Total_Trip_Distance" DESC ``` ### Equality There are three ways of checking if two geometries are equal, which each have different thresholds: * `ST_EQUALS()` * `ST_ALMOSTEQUALS()` * `ST_EQUALSEXACT()` For details on these functions see [Scalar Functions](/content/sql/query#sql-geo-functions-scalar). The data inserted in the first section includes two POINTs that vary slightly in precision, two POLYGONs that vary in the ordering of their boundary points, and two pairs of LINESTRINGs that vary in both precision and scale. Each of these geometry types is compared to see if each given pair is spatially equal as defined by the equality functions: ```sql Equality Comparisons Between Two Points theme={null} SELECT ST_GEOMETRYTYPE(src1_poi) AS "Source_1_Type", ST_GEOMETRYTYPE(src2_poi) AS "Source_2_Type", ST_ALMOSTEQUALS(src1_poi, src2_poi, 7) AS "Almost_Equals", ST_EQUALS(src1_poi, src2_poi) AS "Equals", ST_EQUALSEXACT(src1_poi, src2_poi, 4) AS "Equals_Exact" FROM example_geospatial.points_of_interest ORDER BY 1 ``` ## Examples via the Python API ### Creating a Table and Inserting WKT Data First, a points-of-interest table is created and loaded with sample WKT data. ```python Create POI Table theme={null} # Create a type using the GPUdbRecordType object columns = [gpudb.GPUdbRecordColumn("src1_poi", "string", [gpudb.GPUdbColumnProperty.WKT]), gpudb.GPUdbRecordColumn("src2_poi", "string", [gpudb.GPUdbColumnProperty.WKT])] poi_record_type = gpudb.GPUdbRecordType(columns, label="poi_record_type") poi_record_type.create_type(kinetica) poi_type_id = poi_record_type.type_id # Create a table from the poi_record_type response = kinetica.create_table(table_name=poi_table, type_id=poi_type_id) print(f"Table created: {response['status_info']['status']}") ``` Then, data is inserted into the WKT table, and the insertion is verified. ```python Populate POI Table theme={null} # Prepare the data to be inserted pois = [ "POINT(-73.8455003 40.7577272)", "POLYGON((-73.9258506 40.8287493, -73.9258292 40.828723, -73.9257795 40.8287387, -73.925801 40.8287635, " "-73.9258506 40.8287493))", "LINESTRING(-73.9942208 40.7504289, -73.993856 40.7500753, -73.9932525 40.7499941)", "LINESTRING(-73.9760944 40.6833433, -73.9764404 40.6830626, -73.9763761 40.6828897)" ] pois_compare = [ "POINT(-73.84550029 40.75772724)", "POLYGON((-73.9257795 40.8287387, -73.925801 40.8287635, -73.9258506 40.8287493, -73.9258506 40.8287493, " "-73.9258292 40.828723))", "LINESTRING(-70.9942208 42.7504289, -72.993856 43.7500753, -73.9932525 44.7499941)", "LINESTRING(-70.9761 41.6834, -72.9765 39.6831, -76.9764 38.6829)" ] # Insert the data into the table using a list encoded_obj_list = [] for val in range(0, 4): poi1_val = pois[val] poi2_val = pois_compare[val] record = gpudb.GPUdbRecord(poi_record_type, [poi1_val, poi2_val]) encoded_obj_list.append(record.binary_data) response = kinetica.insert_records(table_name=poi_table, data=encoded_obj_list, list_encoding="binary") print(f"Number of records inserted: {response['count_inserted']}\n") ``` ### Scalar Functions For the complete list of scalar functions, see both [Scalar Functions](/content/location_intelligence/geo_functions#geo-functions-scalar) and [Enhanced Performance Scalar Functions](/content/location_intelligence/geo_functions#geo-functions-scalar-ep). ```python GEODIST Example theme={null} """ Using GEODIST(), calculate the distance between pickup and dropoff points to see where the calculated distance is less than the recorded trip distance """ response = kinetica.filter( table_name=taxi_trip_table, view_name=dist_check_view, expression="GEODIST(pickup_longitude, pickup_latitude, dropoff_longitude, dropoff_latitude) " "< trip_distance" ) print(f"Number of trips where geographic distance < recorded distance: {response['count']}\n") ``` ```python ST_AREA Example theme={null} # Calculate the area of each taxi zone and copy the selected data to a projection kinetica.create_projection( table_name=taxi_zones_table, projection_name=zones_by_area_proj, column_names=["objectid", "zone", "ST_AREA(geom,1) / 1000000 AS area"], options={"order_by": "area desc", "limit": "5"} ) # Display the top 5 largest taxi zones large_zones = kinetica.get_records( table_name=zones_by_area_proj, offset=0, limit=gpudb.GPUdb.END_OF_SET, encoding="json", options={"sort_by":"area", "sort_order": "descending"} )['records_json'] print("Top 5 largest taxi zones:") print("{:<7s} {:<27s} {:<30s}".format("Zone_ID", "Zone_Name", "Zone_Area_in_Square_Kilometers")) print("{:=<7s} {:=<27s} {:=<30s}".format("", "", "")) for zone in large_zones: print("{objectid:>7d} {zone:<27s} {area:>30.4f}".format(**json.loads(zone))) ``` ### Aggregation Functions For the complete list of aggregation functions, see [Aggregation Functions](/content/location_intelligence/geo_functions#geo-functions-aggregation). ```python ST_DISSOLVE Example theme={null} # Dissolve the boundaries between Queens neighborhood tabulation areas to create a single borough boundary response = kinetica.aggregate_group_by( table_name=nyc_nta_table, column_names=["ST_DISSOLVE(geom) AS neighborhoods"], offset=0, limit=1, encoding="json", options={ "expression": "BoroName = 'Queens'", "result_table": queens_agb } ) print(f"Queens neighborhood boundaries dissolved: {response['status_info']['status']}") ``` Below is a picture of the dissolved neighborhood boundaries:

### Joins Many geospatial/geometry functions can be used to join data sets. For the complete list of join and other scalar functions, see both [Scalar Functions](/content/location_intelligence/geo_functions#geo-functions-scalar) and [Enhanced Performance Scalar Functions](/content/location_intelligence/geo_functions#geo-functions-scalar-ep). ```python Implicit Geospatial Containment Join theme={null} # Locate which borough and neighborhood taxi pickup points occurred kinetica.create_join_table( join_table_name=pickup_point_location, table_names=[taxi_trip_table + " AS t", nyc_nta_table + " AS n"], column_names=[ "t.vendor_id", "t.passenger_count", "t.pickup_longitude", "t.pickup_latitude", "n.BoroName", "n.NTAName" ], expressions=["STXY_CONTAINS(n.geom, t.pickup_longitude, t.pickup_latitude)"] ) # Display a subset of the pickup point locations pickup_points = kinetica.get_records( table_name=pickup_point_location, offset=0, limit=25, encoding="json", options={"sort_by":"pickup_longitude"} )['records_json'] print("Subset of pickup point neighborhood locations:") print("{:<10s} {:<12s} {:<13s} {:<12s} {:<11s} {:<43s}".format("Vendor ID", "Pass. Count", "Pickup Long.", "Pickup Lat.", "Borough", "NTA")) print("{:=<10s} {:=<12s} {:=<13s} {:=<12s} {:=<11s} {:=<43s}".format("", "", "", "", "", "")) for point in pickup_points: print("{vendor_id:<10s} {passenger_count:<12d} {pickup_longitude:<13f} {pickup_latitude:<12f} " "{BoroName:<11s} {NTAName:<43s}".format(**json.loads(point))) ``` ```python Implicit Geospatial Intersection Join theme={null} # Find the top neighborhood pickup locations kinetica.create_join_table( join_table_name=top_pickup_locations_by_frequency, table_names=[taxi_trip_table + " AS t", nyc_nta_table + " AS n"], column_names=["n.NTAName AS NTAName"], expressions=["((STXY_INTERSECTS(t.pickup_longitude, t.pickup_latitude, n.geom) = 1))"] ) # Aggregate the neighborhoods and the count of records per neighborhood kinetica.aggregate_group_by( table_name=top_pickup_locations_by_frequency, column_names=["NTAName AS Pickup_NTA", "COUNT(*) AS Total_Pickups"], offset=0, limit=gpudb.GPUdb.END_OF_SET, options={"result_table": pickup_agb} ) # Display the top 10 neighborhood pickup locations response = kinetica.get_records_by_column( table_name=pickup_agb, column_names=["Pickup_NTA", "Total_Pickups"], offset=0, limit=10, encoding="json", options={"order_by": "Total_Pickups desc"} ) data = kinetica.parse_dynamic_response(response)['response'] print("Top 10 neighborhood pickup locations:") print("{:<43s} {:<14s}".format("Pickup NTA", "Total Pickups")) print("{:=<43s} {:=<14s}".format("", "")) for pickupLocs in zip(data["Pickup_NTA"], data["Total_Pickups"]): print("{:<43s} {:<14d}".format(*pickupLocs)) ``` In some cases, it may be useful to aggregate on the geospatial column itself, applying standard aggregation functions to other joined columns. ```python Implicit Geospatial Intersection Join with Geospatial Aggregation theme={null} # Find the geospatial objects representing the top 10 neighborhoods by # distance covered over all taxi trips originating in them response = kinetica.create_join_table( join_table_name=top_pickup_locations_by_distance, table_names=[taxi_trip_table + " AS t", nyc_nta_table + " AS n"], column_names=["n.geom AS geom", "t.trip_distance AS trip_distance"], expressions=["((STXY_INTERSECTS(t.pickup_longitude, t.pickup_latitude, n.geom) = 1))"] ) # Aggregate the neighborhoods and the count of records per neighborhood response = kinetica.aggregate_group_by( table_name=top_pickup_locations_by_distance, column_names=["geom AS Pickup_Geo", "SUM(trip_distance) AS Total_Trip_Distance"], encoding = "json", offset = 0, limit = 10, options = {"sort_by":"value", "sort_order":"descending"} ) ``` ### Equality There are three ways of checking if two geometries are equal, which each have different thresholds: * `ST_EQUALS()` * `ST_ALMOSTEQUALS()` * `ST_EQUALSEXACT()` For details on these functions see [Scalar Functions](/content/location_intelligence/geo_functions#geo-functions-scalar). The data inserted in the first section includes two POINTs that vary slightly in precision, two POLYGONs that vary in the ordering of their boundary points, and two pairs of LINESTRINGs that vary in both precision and scale. Each of these geometry types is compared to see if each given pair is spatially equal as defined by the equality functions: ```python Equality Comparisons Between Two Points theme={null} # Calculate types of geospatial equality for differing sources of geometry kinetica.create_projection( table_name=poi_table, projection_name=poi_comp_table, column_names=[ "ST_GEOMETRYTYPE(src1_poi) AS src1type", "ST_GEOMETRYTYPE(src2_poi) AS src2type", "ST_ALMOSTEQUALS(src1_poi, src2_poi, 7) AS almostequals", "ST_EQUALS(src1_poi, src2_poi) AS equals", "ST_EQUALSEXACT(src1_poi, src2_poi, 4) AS equalsexact" ] ) equality_comps = kinetica.get_records( table_name=poi_comp_table, offset=0, limit=10, encoding="json", options={"sort_by":"src1type"} )['records_json'] print("Comparing ST_ALMOSTEQUALS(), ST_EQUALS(), and ST_EQUALSEXACT():") print("{:<13s} {:<13s} {:<13s} {:<6s} {:<12s}".format( "Source_1_Type", "Source_2_Type", "Almost_Equals", "Equals", "Equals_Exact" )) print("{:=<13s} {:=<13s} {:=<13s} {:=<6s} {:=<12s}".format("", "", "", "", "")) for comp in equality_comps: print("{src1type:<13s} {src2type:<13s} {almostequals:>13d} {equals:>6d} {equalsexact:>12d}".format(**json.loads(comp))) ``` ## Complete Samples Included below are complete examples containing all the above requests and the output. * Python *(see* [Python Developer Guide](/content/guides/python_guide) *for configuration & execution* *details)* * [geospatial\_examples.py](https://raw.githubusercontent.com/kineticadb/kinetica-docs/master/content/examples/python/geospatial_examples.py) * [Python Output](https://raw.githubusercontent.com/kineticadb/kinetica-docs/master/content/examples/python/geospatial_examples.out) * SQL *(see* [ODBC/JDBC Connector](/content/connectors/sql_guide) *for configuration & execution details)* * [geospatial\_examples.kisql](https://raw.githubusercontent.com/kineticadb/kinetica-docs/master/content/examples/sql/geospatial_examples.kisql) * [SQL Output](https://raw.githubusercontent.com/kineticadb/kinetica-docs/master/content/examples/sql/geospatial_examples.out)