Before using this guide, you will need access to an instance of Kinetica in
AWS. If you do not already have one, you can provision an
instance from the AWS Marketplace by following the
instructions in the
provisioning documentation.
Navigate to the Quick Start Workbook
Once your instance has been provisioned,
locate the workbench URL in the output
information provided by AWS.
In Kinetica Workbench, on the left side of the screen, you will see a tabbed
panel called the Explorer. Click on Workbooks and find
and click on the Quick Start Guide workbook.
The SQL in this workbook can be either:
run collectively, by clicking Run All
run individually, by clicking the arrow at the top left of each SQL block
This guide will walk through each SQL block and cover its use.
Initiate Streaming Data Ingest
First, we need to start a stream of synthetic taxi pickups and drop-offs from a
Kafka queue to show Kinetica's ability to build materialized views on top of
dynamic data.
For this guide, we'll use several files with geospatial data in them. The
nyct2010.csv file provides geospatial boundaries for neighborhoods in New
York. The taxi_data_historical.parquet file provides a list of taxi pickups
and dropoffs in New York.
First, let's create the data source. In this case, it is an S3 bucket containing
the data.
Finally, let's load the historical taxi data from S3. Here we leverage
Kinetica's built-in type inferencing, but we will need to correct the
vendor_id and payment_type columns to ensure that they use a
VARCHAR instead of a string so that we can use those columns in the
analytics and Reveal dashboard later.
Compare today's shortest, longest, average, and total trips against our
historical data. For this analytic, we create a materialized view that is
refreshed on every query. It blends the latest streaming data with our
historical data to produce a combined report with a single query.
What are the shortest, average, and longest trip lengths for each vendor? In
this analytic, we use the MIN, MAX, AVG, and COUNT functions to
find these statistics for each vendor.
Q1: What were the 10 most frequent destination neighborhoods for JFK pickups, and on average, what did it cost to go there?
Joining the NTA table and the taxi data using the STXY_Intersects function,
you can group by neighborhood boundary to determine the total number of trips to
the neighborhood and average fare for a trip from JFK to that neighborhood.
+----------------------------------------------+---------------+----------------+
| Neighborhood | Total_Trips | Average_Fare |
+----------------------------------------------+---------------+----------------+
| Midtown-Midtown South | 3219 | 39.2489 |
| Turtle Bay-East Midtown | 1250 | 36.2404 |
| Hudson Yards-Chelsea-Flatiron-Union Square | 757 | 39.4635 |
| Murray Hill-Kips Bay | 675 | 37.3777 |
| Battery Park City-Lower Manhattan | 629 | 43.5058 |
| Upper West Side | 584 | 37.2917 |
| Clinton | 580 | 41.7162 |
| West Village | 493 | 40.4975 |
| Upper East Side-Carnegie Hill | 459 | 35.0270 |
| SoHo-TriBeCa-Civic Center-Little Italy | 455 | 41.9072 |
+----------------------------------------------+---------------+----------------+
Total Trips by Neighborhood
Q2: For neighborhoods with more than 500 pickups, how many had more than half of their pickups at night and by what taxi vendor?
You can use neighborhood boundary data and pickup time to calculate in which NTA
vendors were picking up passengers at night and display the results in a pivot
table containing columns of percentages for all vendors and each individual
vendor.
Q3: How many pickups per hour were there at JFK International Airport?
Using the STXY_Intersects function to determine which pickup points were
located in the JFK International Airport neighborhood boundary, it's simple to
calculate the number of pickups per hour there were at JFK for all cab types.
Kinetica's main visualization tools are Reveal and
the Web Mapping Service (WMS). For our
purposes, let's use the WMS feature.
Kinetica's web map service renders geospatial images that are overlaid on map
tools like Mapbox, OpenLayers, and Esri. Simply point your basemap provider to
our /wms endpoint with your desired rendering parameters, and Kinetica will
generate geospatial imagery on demand. Every pan and zoom will render a new
image of your data, so you can visualize streaming updates in real time.
Q4: Given my storefront's location, how can I calculate the number of drop offs that occurred within 150 meters?
Say you own a small business or restaurant in Chelsea that relies on nearby foot
traffic for new customers. You can easily visualize the density of taxi drop
offs nearby with the help of the GEODIST function.
We used this query to create a table so the results can be seen on a map. To
output the dropoff points to a map, we'll use Kinetica's WMS functionality.
Let's validate this query.
In Workbench, click the Data tab of the Explorer pane.
Under Tables & Views, click the arrow next to ki_home to
expand the listing of tables contained within it. Click on store_front_dropoffs
and then click on WMS Preview.
This will bring up the Map Configuration pop-up for the map.
Move the Blur Radius slider to the right, until it is at 10,
and then click Update at the bottom right to view the map.
The output is a heatmap showing the dropoff points within 150 meters of your
storefront.
Click Close to close the map pop-up window.
Load a Reveal Dashboard
Load the "NYC Taxi" Reveal dashboard from Kinetica's GitHub account.
Congratulations, you have successfully completed the Quick Start Guide! But, you
have only just begun your journey with Kinetica. To learn more about in depth
usage, check out these articles: