A projection is a concept unique to Kinetica. A projection is created
through the /create/projection endpoint and represents a
set of columns from and/or column expressions applied to a source data set
(table or view). Projections are
in-memory tables by default, but can
be persisted (like a table) using the persist
option.
While projections can simply represent a whole-cloth cross-section of an
existing table, filtering clauses can be applied before creating the column
to create an entirely new data set. Expressions that can be used to define
columns or filters can be found here.
A projection has several advantages:
materialize_on_gpu
option to cache the
projection on the GPU.Projections can be created from any table or view and will create a new type schema using the columns and expressions specified in the /create/projection column list from the source schema. Note that if the source data set is replicated the projection will also be replicated, by default. If the source data set is sharded, the projection will also be sharded by default, but only if the entire shard key is included in the /create/projection call and is specified in the same order as it appears in the source data set. The sharded columns do not have to be listed continguously.
For example, a source table employees
has four columns (in order),
employee_name
, salary
, employee_id
, and has_benefits
, and two
of those columns, salary
and employee_id
, comprise the shard key.
To create a sharded projection using the employees
table, the
shard key will need to be listed in the same order as the source table,
e.g., column_names = ["salary", "employee_name", "employee_id"]
.
A projection of an existing table or view (sharded, replicated, or
otherwise) can be sharded on entirely different columns using the
shard_key
option on the /create/projection endpoint,
regardless of the column(s) the table/view was previously sharded on (if
any). See Shard Keys for more information on shard keys and sharding
restrictions.
A projection is also a means to perform window functions. For examples of using window functions, see Examples.
Several limitations are discussed in further detail in the Limitations and Cautions section.
To create a projection, the /create/projection endpoint requires three parameters:
TableA.id
, TableB.id as b_id
,
length(column_name) as column_name_length
, ...)In Python, given source table nyctaxi
, a projection of the pickup data
for single passengers can be created via:
h_db.create_projection(
table_name = "nyctaxi",
projection_name = "nyctaxi_single_pickups",
column_names = [
"vendor_id",
"pickup_latitude",
"pickup_longitude",
"pickup_datetime",
"trip_distance",
"total_amount"
],
options = {
"expression": "passenger_count = 1"
}
)
To create a projection in Python that calculates a 1-before and 3-after moving average on the trip distance for single passengers per vendor, using the projection created above as the source:
h_db.create_projection(
table_name = "nyctaxi_single_pickups",
projection_name = "nyctaxi_single_pickups_local_avg_dist",
column_names = [
"vendor_id",
"pickup_datetime",
"trip_distance",
"AVG(trip_distance) " \
"OVER (" \
" PARTITION BY vendor_id " \
" ORDER BY pickup_datetime " \
" ROWS BETWEEN 1 PRECEDING AND 3 FOLLOWING " \
") AS local_avg_dist"
],
options = {
"order_by": "vendor_id, pickup_datetime"
}
)
Lastly, to create a projection in Python that calculates the running total revenue per vendor for single passenger trips, using the first projection created above as the source:
h_db.create_projection(
table_name = "nyctaxi_single_pickups",
projection_name = "nyctaxi_single_pickups_running_total_revenue",
column_names = [
"vendor_id",
"pickup_datetime",
"SUM(total_amount) " \
"OVER (" \
" PARTITION BY vendor_id " \
" ORDER BY pickup_datetime " \
") AS running_total_revenue"
],
options = {
"order_by": "vendor_id, pickup_datetime"
}
)
All endpoints currently implemented are able to operate against a projection.
For a Python example filter using the first projection created in the Creating a Projection section, the following would show pickup data for single passengers that traveled more than 10 miles:
h_db.filter(
table_name = "nyctaxi_single_pickups",
view_name = "nyctaxi_single_pickups_over_10_miles",
expression = "trip_distance > 10"
)
When filtering against a projection, the /filter endpoint produces a view, which means you cannot add, edit, or delete rows. A chain of these filters can be used to create more and more restrictive projections of the original data set.
In Python, to aggregate per vendor statistics for single passengers, using the same first projection as the source, the following call can be made:
h_db.aggregate_group_by(
table_name = "nyctaxi_single_pickups",
column_names = [
"vendor_id",
"MIN(total_amount) AS min_revenue",
"AVG(total_amount) AS avg_revenue",
"MAX(total_amount) AS max_revenue"
],
offset = 0,
limit = -9999,
options = {"result_table": "nyctaxi_single_pickups_statistics"}
)
There are several limitations to creating and using projections:
length(column_name) as column_name_length
. For more details on
column naming restrictions, see Tablespersist
option of true
will make the
projection permanent and not expire