Quick Start Guide

Get started with Kinetica Developer Edition

Install Kinetica

To install the Kinetica Developer Edition. Copy and paste the following command into your terminal. Note, the Developer Edition requires Docker to be pre-installed and running.

Note that you will need to give Docker enough RAM to run smoothly. We recommend 8GB or higher.

Linux/Mac OS

1
curl http://files.kinetica.com/install/kinetica.sh -o kinetica && chmod u+x kinetica && ./kinetica start

Windows 10

From the command prompt (Press WIN + r, then type cmd and press Enter)

1
curl http://files.kinetica.com/install/kinetica.bat -o kinetica.bat && ./kinetica.bat start

You can use the kinetica script to manage your installation, start, and stop the database. To see a full list of commands, run ./kinetica --help on Linux/Mac, or .\kinetica --help on Windows.

Import Data

For this guide, we'll use several files with geospatial data in them. The nyct2010.csv file provides geospatial boundaries for neighborhoods in New York. The taxi_data_historical.parquet file provides a list of taxi pickups and dropoffs in New York.

Navigate to GAdmin > Query > SQL and run the following code.

Create the Data Source
1
2
3
4
5
6
7
8
CREATE DATA SOURCE quickstart
LOCATION = 'S3'
USER = 'AKIA3TZ6GK7TB72X6L52'
PASSWORD = '5NBt4IoDNOKFmnYc8iBVeVEGupCgdk/e9dGccybe'
WITH OPTIONS (
    BUCKET NAME = 'kineticaqsdata',
    REGION = 'us-east-1'
)
Load Boundary Data from S3
1
2
3
4
5
6
LOAD DATA INTO nyct2010 
FROM FILE PATHS 'nyct2010.csv'
FORMAT TEXT 
WITH OPTIONS (
    DATA SOURCE = 'quickstart'
)
Load Taxi Data from S3
1
2
3
4
5
6
LOAD DATA INTO taxi_data_historical
FROM FILE PATHS 'taxi_data.parquet'
FORMAT PARQUET
WITH OPTIONS (
    DATA SOURCE = 'quickstart'
)
Correct Column Types
1
2
ALTER TABLE taxi_data_historical
ALTER COLUMN vendor_id varchar(4)

Query data with SQL

What are the shortest, average, and longest trip lengths for each vendor?

Trip Statistics
1
2
3
4
5
6
7
8
SELECT
    NVL(vendor_id, '<ALL VENDORS>') AS Vendor_Name,
    DECIMAL(MIN(trip_distance)) AS Shortest_Trip,
    DECIMAL(AVG(trip_distance)) AS Average_Trip,
    DECIMAL(MAX(trip_distance)) AS Longest_Trip,
    COUNT(*) AS Hist_Total_Trips
FROM taxi_data_historical
GROUP BY vendor_id
Results
1
2
3
4
5
6
7
8
9
+---------------+-----------------+----------------+----------------+--------------------+
| Vendor_Name   |   Shortest_Trip |   Average_Trip |   Longest_Trip |   Hist_Total_Trips |
+---------------+-----------------+----------------+----------------+--------------------+
| VTS           |          0.0100 |         2.8620 |        53.7500 |             116597 |
| NYC           |          0.0100 |         3.0071 |        55.8700 |             145395 |
| CMT           |          0.1000 |         2.7427 |        50.0000 |              82452 |
| YCAB          |          0.1000 |         2.8762 |        50.9000 |             130815 |
| DDS           |          0.1000 |         2.9158 |        46.0000 |               3584 |
+---------------+-----------------+----------------+----------------+--------------------+

Location Analytics

Q1: What were the 10 most frequent destination neighborhoods for JFK pickups, and on average, what did it cost to go there?

Joining the NTA table and the taxi data using the STXY_Intersects function, you can group by neighborhood boundary to determine the total number of trips to the neighborhood and average fare for a trip from JFK to that neighborhood.

Frequent Destination Neighborhoods from JFK
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
SELECT TOP 10
    n_dropoff.NTAName AS "Neighborhood",
    COUNT(*) AS "Total_Trips",
    DECIMAL(AVG(fare_amount)) AS "Average_Fare"
FROM
    taxi_data_historical t
    JOIN nyct2010 n_pickup
        ON STXY_Intersects(t.pickup_longitude, t.pickup_latitude, n_pickup.geom) = 1
        AND n_pickup.NTAName = 'Airport'
    JOIN nyct2010 n_dropoff
        ON STXY_Intersects(t.dropoff_longitude, t.dropoff_latitude, n_dropoff.geom) = 1
WHERE pickup_datetime < '2019-01-01'
GROUP BY 1
ORDER BY 2 DESC;
Results
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
+----------------------------------------------+---------------+----------------+
| Neighborhood                                 |   Total_Trips |   Average_Fare |
+----------------------------------------------+---------------+----------------+
| Midtown-Midtown South                        |          3219 |        39.2489 |
| Turtle Bay-East Midtown                      |          1250 |        36.2404 |
| Hudson Yards-Chelsea-Flatiron-Union Square   |           757 |        39.4635 |
| Murray Hill-Kips Bay                         |           675 |        37.3777 |
| Battery Park City-Lower Manhattan            |           629 |        43.5058 |
| Upper West Side                              |           584 |        37.2917 |
| Clinton                                      |           580 |        41.7162 |
| West Village                                 |           493 |        40.4975 |
| Upper East Side-Carnegie Hill                |           459 |        35.0270 |
| SoHo-TriBeCa-Civic Center-Little Italy       |           455 |        41.9072 |
+----------------------------------------------+---------------+----------------+

Q2: For neighborhoods with more than 500 pickups, how many had more than half of their pickups at night and by what taxi vendor?

You can use neighborhood boundary data and pickup time to calculate in which NTA vendors were picking up passengers at night and display the results in a pivot table containing columns of percentages for all vendors and each individual vendor.

Majority of Night Pickups by Vendor
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
SELECT
    NTANAme AS "Neighborhood",
    all_npp AS "All",
    cmt_npp AS "CMT", nyc_npp AS "NYC", vts_npp AS "VTS", ycab_npp AS "YCAB"
FROM
(
    SELECT
        NTAName,
        IF (GROUPING(vendor_id) = 1, CAST ('ALL' as varchar(4)), vendor_id) AS vendor_name,
        COUNT(*) AS total_pickups,
        DECIMAL(SUM(IF(HOUR(pickup_datetime) BETWEEN 5 AND 19, 0, 1)))
            / COUNT(*) * 100 AS night_pickup_percentage
    FROM
        taxi_data_historical t
        JOIN nyct2010 n
            ON STXY_Intersects(t.pickup_longitude, t.pickup_latitude, n.geom) = 1
    WHERE pickup_datetime < '2019-01-01'
    GROUP BY
        NTAName,
        ROLLUP(vendor_id)
)
PIVOT
(
    MAX(total_pickups) AS tp,
    MAX(night_pickup_percentage) AS npp
    FOR vendor_name IN ('ALL', 'CMT', 'NYC', 'VTS', 'YCAB')
)
WHERE all_tp > 500 AND all_npp > 50
ORDER BY all_npp DESC;
Results
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
+-------------------------+-----------+-----------+-----------+-----------+-----------+
| Neighborhood            |       All |       CMT |       NYC |       VTS |      YCAB |
+-------------------------+-----------+-----------+-----------+-----------+-----------+
| East Williamsburg       |   83.1400 |   84.2100 |   81.2500 |   85.2600 |   83.2300 |
| North Side-South Side   |   78.6400 |   80.7500 |   78.7300 |   82.0000 |   74.6600 |
| Park Slope-Gowanus      |   66.7700 |   72.0000 |   67.6800 |   71.2700 |   58.9600 |
| Chinatown               |   64.2100 |   59.1700 |   66.9800 |   64.1300 |   64.7400 |
| Fort Greene             |   60.6200 |   56.2500 |   63.2500 |   62.8000 |   58.1200 |
| East Village            |   56.2800 |   57.6000 |   56.1700 |   55.1900 |   56.7700 |
| Lower East Side         |   52.6500 |   53.7100 |   54.1300 |   50.3300 |   52.9400 |
+-------------------------+-----------+-----------+-----------+-----------+-----------+

Q3: How many pickups per hour were there at JFK International Airport?

Using the STXY_Intersects function to determine which pickup points were located in the JFK International Airport neighborhood boundary, it’s simple to calculate the number of pickups per hour there were at JFK for all cab types.

Pickups per Hour from JFK
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
SELECT
    RPAD(LPAD(CHAR2(HOUR(t.pickup_datetime)), 2, '0'), 5, ':00') AS "Pickup_Hour",
    COUNT(*) AS "Total_Pickups"
FROM
    taxi_data_historical t
    JOIN nyct2010 n
    ON STXY_Intersects(t.pickup_longitude, t.pickup_latitude, n.geom) = 1
WHERE
    NTAName = 'Airport' AND
    pickup_datetime < '2019-01-01'
GROUP BY
    HOUR(t.pickup_datetime)
ORDER BY
    1;
Results
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
+---------------+-----------------+
| Pickup_Hour   |   Total_Pickups |
+---------------+-----------------+
| 00:00:00      |             662 |
| 01:00:00      |             270 |
| 02:00:00      |              91 |
| 03:00:00      |              40 |
| 04:00:00      |              24 |
| 05:00:00      |             180 |
| 06:00:00      |             344 |
| 07:00:00      |             472 |
| 08:00:00      |             617 |
| 09:00:00      |             652 |
| 10:00:00      |             746 |
| 11:00:00      |             816 |
| 12:00:00      |             832 |
| 13:00:00      |             905 |
| 14:00:00      |            1040 |
| 15:00:00      |            1156 |
| 16:00:00      |            1181 |
| 17:00:00      |            1249 |
| 18:00:00      |            1110 |
| 19:00:00      |            1134 |
| 20:00:00      |            1172 |
| 21:00:00      |            1284 |
| 22:00:00      |            1199 |
| 23:00:00      |            1102 |
+---------------+-----------------+

Visualization

Kinetica’s main visualization tools are Reveal and the Web Mapping Service (WMS). For our purposes, let's use the WMS feature.

Kinetica’s web map service renders geospatial images that are overlaid on map tools like Mapbox, OpenLayers, and Esri. Simply point your basemap provider to our /wms endpoint with your desired rendering parameters, and Kinetica will generate geospatial imagery on demand. Every pan and zoom will render a new image of your data, so you can visualize streaming updates in real time.

Q4: Given my storefront’s location, how can I calculate the number of drop offs that occurred within 150 meters?

Say you own a small business or restaurant in Chelsea that relies on nearby foot traffic for new customers. You can easily visualize the density of taxi drop offs nearby with the help of the GEODIST function.

Number of Dropoffs Near Storefront
1
2
3
4
CREATE OR REPLACE TABLE store_front_dropoffs AS
SELECT *
FROM taxi_data_historical
WHERE GEODIST(-74.00378, 40.743193, dropoff_longitude, dropoff_latitude) < 150;

We used this query to create a table so the results can be seen on a map. To output the dropoff points to a map, we’ll use the WMS functionality of GAdmin. Let’s validate this query.

Next to the store_front_dropoffs table, click Map. The output is a map showing the dropoff points within 150 meters of your storefront.

storefront_wms_map.png

Next Steps

Congratulations, you have successfully completed the Quick Start Guide! But, you have only just begun your journey with Kinetica. To learn more about in depth usage, check out these articles: