A projection is a concept unique to Kinetica. A projection is created through the /create/projection endpoint and represents a set of columns from and/or column expressions applied to a source data set (table or view). While projections can simply represent a whole-cloth cross-section of an existing table, filtering clauses can be applied to before creating the column to create an entirely new data set. Expressions that can be used to define columns or filters can be found here.
A projection has several advantages:
materialize_on_gpu
option to cache the
projection on the GPU./create/projection
call can calculate derived columns natively/create/projection
endpoint is the only endpoint that can calculate
a moving averageProjections can be created from any table or view and will
create a new type schema using the columns and expressions specified in the
/create/projection
column list from the source schema. Note that if the
source data set is replicated the projection will also be
replicated. If the source data set is sharded, the
projection can also be sharded but only if all the sharded columns are
listed in the /create/projection
call and are specified in the same order
as they appear in the source data set. The sharded columns do not have to be
listed continguously.
For example, a source table employees
has four columns (in order),
employee_name
, salary
, employee_id
, and has_benefits
, and two
of those columns, salary
and employee_id
, are sharded. To create a
sharded projection using the employees
table, the sharded columns will
need to be listed in the same order as the source table, e.g.,
column_names = ["salary", "employee_name", "employee_id"]
.
A projection of an existing table or view (sharded, replicated, or
otherwise) can be sharded on entirely different columns using the
shard_key
option on the /create/projection
endpoint, regardless of the
column(s) the table/view was previously sharded on (if any). See
Shard Keys for more information on shard keys and sharding
restrictions.
A projection is also a means to perform a moving average window function,
which is similar to an aggregate function except instead of grouping rows into
a single output row, the rows will still exist separately. A moving average
computes the average over the previous n records and the subsequent n
records for each record in the given column. Any records that were not able to
contain a complete window are set to null
. For more information
on using projections to perform a moving average window function, see
Examples.
The /create/projection endpoint operates within each processing node individually, so all sorting and moving average calculations are only completed with the data that is in each processing node. This means that sorting and/or moving average calculations should be used on a replicated data set or if the data is sharded such that all the records that need to be used for calculations are on the same processing node.
Several limitations are discussed in further detail in the Limitations and Cautions section.
To create a projection, the /create/projection endpoint requires three parameters:
TableA.id
, TableB.id as b_id
,
length(column_name) as column_name_length
, ...)In Python, given source table employees
, a projection can be created
via:
response = gpudb.create_projection(
table_name = "sales",
projection_name = "sales_id_revenue_region"
column_names = ["id", "revenue", "region"]
)
To create a projection in Python that calculates the moving average on
the salary
column using the previous record (1) and the next three records
(3):
response = gpudb.create_projection(
table_name = "sales",
projection_name = "sales_moving_avg_revenue_id",
column_names = ["moving_average(revenue,1,3)", "id"],
options = {"order_by":"revenue asc"}
)
All endpoints currently implemented are able to operate against a projection.
A Python example filter on the projection created in the Creating a Projection section for sales in the Northeast region:
response = gpudb.filter(
table_name = "sales_id_revenue_region",
view_name = "northeast_sales",
expression = "region = Northeast"
)
When filtering against a projection, the /filter endpoint produces a view that is also a projection, which means you cannot add, edit, or delete rows. A chain of these filters can be used to create more and more restrictive projections of the original data set.
In Python, to aggregate average, min, and max revenue by region in the projection created earlier:
response = gpudb.aggregate_group_by(
table_name = "sales_id_revenue_region",
column_names = [
"region",
"avg(revenue) as avg_revenue",
"min(revenue) as min_revenue",
"max(revenue) as max_revenue"
],
offset = 0,
limit = 50,
encoding = "binary"
)
There are several limitations to creating and using projections:
persist
option upon creation, the view
will become a table and no longer have this restrictionlength(column_name) as column_name_length
. For more details on
column naming restrictions, see Tables