Materialized Views

Kinetica supports the concept of a materialized view, which is a greater-performant query across one or more tables. For details on interfacing with materialized views from SQL, see CREATE MATERIALIZED VIEW.

A materialized view name must adhere to the standard naming criteria. Each materialized view exists within a schema and follows the standard name resolution rules for tables.

A materialized view consists of one or more operations against one or more input tables. These operations will result in member tables that will be used to construct the materialized view and keep its data current. The member tables will be managed by the materialized view and will not be available to be operated on directly outside of the materialized view.

Member tables can be the result of any of the following:

The materialized view can be accessed via its root table, which is the final member table in the materialized view creation sequence.

When a change occurs in an input table, the member tables that are derived from that input table are marked as stale and in need of rebuilding if and when a data refresh is requested. When such a request is received, any stale member tables are rebuilt, and the root table will once again contain current data.

There are seven events that will cause the member tables of a materialized view to be marked as stale:

Refresh

There are four modes that govern the way in which a materialized view's data can be kept current:

Note

The materialized view manages the refresh processing for all entities contained within it. The specified refresh mode of any member table (join, for instance) will be ignored.

Manual Refresh

A manual refresh of a materialized view can be performed on any materialized view, regardless of operating mode, to update the view's data. If a refresh is already in progress, the manual refresh request will wait until the currently executing refresh is complete before executing.

An example, in Python, of performing a manual refresh of a materialized view (customer_order_count):

1
h_db.alter_table('example.customer_order_count', 'refresh', '')

Manual Mode

In manual mode, the materialized view will only be made fully current by performing a manual refresh of the view. This will result in the immediate rebuilding of any of the view's member tables that require it.

On Change Mode

In on change mode, the materialized view will be kept current by detecting changes in any member tables or input tables and updating any member tables that require it. This behavior means that the returned data will always be current, with no overhead on query, but with potentially significant refresh processing for each change in the supporting data.

Note

The granularity of the change detection is at the endpoint call level; e.g., a call to /update/records that resulted in 1,000,000 records being updated would be detected as one change, and one refresh would be initiated.

On Query Mode

In on query mode, the materialized view will be refreshed upon query. This behavior means that the returned data will always be current, with some overhead on query.

Periodic Mode

In periodic mode, the materialized view will be kept current by refreshing the data at a given interval. This behavior means that the returned data may not be current for a given query, but there will be no query overhead and no refresh waiting time for modification calls as with on change mode.

When the beginning of a period is reached, the time of the following period is calculated and a refresh process initiated. If the refresh process does not complete by that pre-calculated time, another refresh process will be initiated immediately after the first completes. When this next refresh begins, the time of the next period in the sequence that occurs at some time past the current time will be calculated and the cycle repeated. In this manner, only one catch-up refresh will be processed regardless of the number of periods skipped by a long-running refresh.

Creating a Materialized View

There are three steps in creating a materialized view using the native API:

  1. Reserve the name of the materialized view and receive a unique ID that will be used to identify member tables as being part of the materialized view. The /create/materializedview endpoint will be used for this purpose. Options such as refresh mode and TTL will also be configured in this step.
  2. Issue requests against the input tables and against any result tables created by those requests, tagging each result table created with the unique ID for the materialized view.
  3. Issue the final processing request, naming the resulting table with the reserved name of the materialized view and tagging it with the unique ID. This member table becomes the root table and the materialized view creation sequence is complete.

To create a materialized view in SQL, see CREATE MATERIALIZED VIEW.

Examples

The following examples use the Python API to construct a materialized view.

To create a materialized view (recent_orders) on a single table (orders), and have recent_orders show records in orders that were ordered in the last 24 hours (as denoted by the table's o_orderdate column), refreshed every time the data in orders is updated:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
view_id = h_db.create_materialized_view(
    table_name = 'example.recent_orders',
    options = {'refresh_method': 'on_change'}
)['view_id']

retobj = h_db.filter(
    table_name = 'tpch.orders',
    view_name = 'example.recent_orders',
    expression = 'o_orderdate > NOW() - 1',
    options = {'view_id': view_id}
)

To create a materialized view (customer_order_count) across two tables: a base table (orders) and a parent table (customer), and refresh the order total per customer in customer_order_count every hour:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
view_id = h_db.create_materialized_view(
    table_name = 'example.customer_order_count',
    options = {'refresh_method': 'periodic', 'refresh_period': '3600'}
)['view_id']

h_db.create_join_table(
    join_table_name = 'example.customer_order',
    table_names = ['tpch.orders', 'tpch.customer'],
    column_names = ['*'],
    expressions = ['o_custkey = c_custkey'],
    options = {'view_id': view_id}
)

response = h_db.aggregate_group_by(
    table_name = 'example.customer_order',
    column_names = ['c_name', 'count(*) as num_orders'],
    options = {
        'result_table': 'example.customer_order_count',
        'view_id': view_id
    }
)

Limitations and Cautions

  • A materialized view is not persisted, by default, and will not survive a database restart. However, specifying a persist option of true when creating one will make the view permanent and not expire; database restarts will cause the materialized view to be rebuilt from its input tables.
  • A persisted materialized view cannot have a temporary table or view as a member table; e.g., attempting to add a projection created with a persist option set to false to a persisted materialized view will fail.
  • A /merge/records operation cannot be part of a materialized view.
  • When performing a refresh, a materialized view will not take into account DML operations on member tables that occurred during its creation; these include:
  • When any one of the input tables to a materialized view is altered or dropped, the materialized view will also be dropped.