Before using this guide, you will need to provision an instance from the
Azure Marketplace. Follow the instructions in the
provisioning documentation before you proceed.
Find the Workbench URL
Once your instance has been provisioned,
locate the workbench URL
in the output information provided by Azure. Use the Workbench
URL to log into Kinetica.
Navigate to the Quick Start Workbook
In Kinetica Workbench, on the left side of the screen, you will see a tabbed
panel called the Explorer. Open the Workbooks panel and find the Quick Start
Guide workbook. Select it and press the Open menu item to navigate to the
workbook.
Ingest Streaming Data
First, we need to start a stream of synthetic taxi pickups and drop-offs from a
Kafka queue to show Kinetica's ability to build materialized views on top of
dynamic data.
For this guide, we'll use several files with geospatial data in them. The
nyct2010.csv file provides geospatial boundaries for neighborhoods in New
York. The taxi_data_historical.parquet file provides a list of taxi pickups
and dropoffs in New York.
Navigate to Workbooks > Quick Start Guide to see the following blocks of SQL.
First, let's create the data source. In this case, it is an S3 bucket containing
the data.
Finally, let's load the historical taxi data from S3. Here we leverage
Kinetica's built-in type inferencing, but we will need to correct the
vendor_id and payment_type columns to ensure that they use a
VARCHAR instead of a string so that we can use those columns in the
analytics and Reveal dashboard below.
Compare today's shortest, longest, average, and total trips against our
historical data. For this analytic, we create a materialized view that is
refreshed on every query. It blends the latest streaming data with our
historical data to produce a combined report with a single query.
What are the shortest, average, and longest trip lengths for each vendor? In
this analytic, we use the MIN, MAX, AVG, and COUNT functions to
find these statistics for each vendor.
Q1: What were the 10 most frequent destination neighborhoods for JFK pickups, and on average, what did it cost to go there?
Joining the NTA table and the taxi data using the STXY_Intersects function,
you can group by neighborhood boundary to determine the total number of trips to
the neighborhood and average fare for a trip from JFK to that neighborhood.
+----------------------------------------------+---------------+----------------+
| Neighborhood | Total_Trips | Average_Fare |
+----------------------------------------------+---------------+----------------+
| Midtown-Midtown South | 3219 | 39.2489 |
| Turtle Bay-East Midtown | 1250 | 36.2404 |
| Hudson Yards-Chelsea-Flatiron-Union Square | 757 | 39.4635 |
| Murray Hill-Kips Bay | 675 | 37.3777 |
| Battery Park City-Lower Manhattan | 629 | 43.5058 |
| Upper West Side | 584 | 37.2917 |
| Clinton | 580 | 41.7162 |
| West Village | 493 | 40.4975 |
| Upper East Side-Carnegie Hill | 459 | 35.0270 |
| SoHo-TriBeCa-Civic Center-Little Italy | 455 | 41.9072 |
+----------------------------------------------+---------------+----------------+
Total Trips by Neighborhood
Q2: For neighborhoods with more than 500 pickups, how many had more than half of their pickups at night and by what taxi vendor?
You can use neighborhood boundary data and pickup time to calculate in which NTA
vendors were picking up passengers at night and display the results in a pivot
table containing columns of percentages for all vendors and each individual
vendor.
Q3: How many pickups per hour were there at JFK International Airport?
Using the STXY_Intersects function to determine which pickup points were
located in the JFK International Airport neighborhood boundary, it's simple to
calculate the number of pickups per hour there were at JFK for all cab types.
Kinetica's main visualization tools are Reveal and the Web Mapping Service
(WMS). For our purposes, let's use the WMS feature.
Kinetica's web map service renders geospatial images that are overlaid on map
tools like Mapbox, OpenLayers, and Esri. Simply point your basemap provider to
our /wms endpoint with your desired rendering parameters, and Kinetica will
generate geospatial imagery on demand. Every pan and zoom will render a new
image of your data, so you can visualize streaming updates in real time.
Q4: Given my storefront's location, how can I calculate the number of drop offs that occurred within 150 meters?
Say you own a small business or restaurant in Chelsea that relies on nearby foot
traffic for new customers. You can easily visualize the density of taxi drop
offs nearby with the help of the GEODIST function.
We used this query to create a table so the results can be seen on a map. To
output the dropoff points to a map, we'll use the WMS functionality of GAdmin.
Let's validate this query.
Next to the store_front_dropoffs table, click Map. The output is a map
showing the dropoff points within 150 meters of your storefront.
Load a Reveal Dashboard
Load the "NYC Taxi" Reveal dashboard from Kinetica's Github account and then navigate to https://<hostname>/reveal and log in to see the dashboard.
Congratulations, you have successfully completed the Quick Start Guide! But, you
have only just begun your journey with Kinetica. To learn more about in depth
usage, check out these articles: