persist option. While projections can simply represent a whole-cloth
cross-section of an existing table, filtering clauses can be applied before
creating the column to create an entirely new data set. Expressions that can be
used to define columns or filters can be found
here.
A projection is somewhat analogous to creating a table from a
single-table source query. See CREATE TABLE … AS for
details.
- Because a projection is a copy of your data, querying the projection will be faster than querying a join. You could create a projection of a join to increase the speed of queries against the data set. This also means derived columns and join lookups are precalculated.
- A /create/projection call can calculate derived columns natively
- A projection can be sharded on entirely different columns than its source table or view; a projection can also be sharded even if its source table or view is randomly sharded or replicated
- A projection can also be replicated even if its source table is sharded
employees has four columns (in order),
employee_name, salary, employee_id, and has_benefits, and two
of those columns, salary and employee_id, comprise the shard key.
To create a sharded projection using the employees table, the
shard key will need to be listed in the same order as the source table,
e.g., column_names = ["salary", "employee_name", "employee_id"].
A projection of an existing table or view (sharded, replicated, or
otherwise) can be sharded on entirely different columns using the
shard_key option on the /create/projection endpoint,
regardless of the column(s) the table/view was previously sharded on (if
any). See Shard Keys for more information on shard keys and sharding
restrictions.
A projection is also a means to perform
window functions. For examples of using
window functions, see Examples.
Several limitations are discussed in further detail in the
Limitations and Cautions section.
Creating a Projection
To create a projection, the /create/projection endpoint requires three parameters:- the name of the data set to create the projection from
- the name of the projection to create
- the list of columns and/or column expressions selected from the source data
set with optional aliases (
TableA.id,TableB.id as b_id,length(column_name) as column_name_length, …)
Examples
In Python, given source tablenyctaxi, a projection of the pickup data
for single passengers (filtering out some bad data) can be created via:
Operating on a Projection
All endpoints currently implemented are able to operate against a projection.Examples
For a Python example filter using the first projection created in the Creating a Projection section, the following would show pickup data for single passengers that traveled more than 10 miles:Limitations and Cautions
There are several limitations to creating and using projections:- Performing a projection results in an entirely new data set, so be mindful of the memory usage implications.
- If a projection is created and its source data set’s rows are updated (or removed), the projection will not be updated to reflect the changes in the source data set.
- If using column functions to define projection column names in the
column_names parameter, an alias must be used for the column name,
e.g.,
length(column_name) as column_name_length. For more details on column naming restrictions, see Tables. - A projection is transient, by default, and will expire after the default TTL setting.
- A projection is not persisted, by default, and will not survive a database
restart; specifying a
persistoption oftruewill make the projection permanent and not expire.