Kinetica supports the concept of a materialized view, which is a greater-performant query across one or more tables.
In Kinetica, this view can be insert-triggered; meaning that it can be configured to refresh based on the presence of new records--updates & deletes will not prompt the refresh behavior. This scheme is available primarily because Kinetica serves the OLAP use case best, where a system is heavy on inserts into a fact table and light on updates & deletes.
An update to a record in a table supporting the materialized view will be reflected automatically in the view, regardless of whether the update would cause the record to be excluded from the result set, were the query re-issued. For instance, if a record was updated in a way that made it no longer pass the query's filter criteria, it would still be present in the materialized view.
The deletion of a record in a table supporting the materialized view will not be reflected in the view until it is manually refreshed.
A materialized view query will be analyzed by Kinetica in an attempt to determine a base table, which is defined as either:
- the sole table in the query
- the root table, linked directly or indirectly to all other tables through relationships with the primary keys of those tables; it is therefore necessary that all other tables have primary keys
The materialized view will follow an operational course based on whether a base table could be determined, what mode the materialized view has been set to, and what data manipulation operations have been applied to the tables composing it.
There are three modes in which a materialized view can operate: manual, on query, and on insert.
In manual mode, the materialized view will only be made fully current by performing a manual refresh of the view. If an incremental refresh is applied, only inserts into the base table will be reflected in the updated content of the materialized view. If inserts are made into tables other than the base table, a full refresh will be performed. This will result in a rebuilding of all the view's data, accounting for any updates & deletes.
In on query mode, the materialized view will be made current when a query is made against it and records have been inserted into any of the data sets that compose the view. If records have only been added to the base table, an incremental refresh will be performed; otherwise, a full refresh will be performed. If no data is added between the issuance of two queries against the materialized view, no rebuilding of the view will be performed.
In on insert mode, the materialized view will be kept current by automatically adding to it each record inserted into the base table. Outside of base table inserts, this mode functions the same as on query mode.
The primary use case for this mode is for a view against a set of tables where the data manipulation operations consist almost entirely of base table inserts and the queries against the materialized view need to be fast and current, with respect to those inserts. An example would be an OLAP system consisting of mostly inserts into a fact table.
A manual refresh of a materialized view can be performed on any materialized view, regardless of operating mode, to update the view's data. If a materialized view is insert-triggered, it may be necessary to refresh the data periodically to properly reflect updates & deletes that have taken place since the materialized view was created or since the last refresh. There are two types of refresh operations that can be performed: full & incremental.
A full refresh will cause a rebuilding of the entire materialized view, irrespective of what data manipulation operations have taken place. This operation takes the same time as creating the original view, but is the one that is guaranteed to properly reflect updates & deletes made to the underlying tables.
An incremental refresh is insert-based. If one is requested on a materialized view in which the only new data are records inserted into the base table, only those records will be added to the materialized view. If inserts have been made into tables other than the base table, a full refresh will be performed.
Note that it is generally unnecessary to issue an incremental refresh against a materialized view in on insert mode or on query mode, as both of these use the incremental refresh automatically to stay current.
An example, in Python, of performing an incremental refresh of a
materialized view (recent_customer_order
):
retobj = gpudb.create_join_table(
join_table_name = 'recent_customer_order',
table_names = [],
options = {'refresh':'refresh'}
)
An example of performing a full refresh on the same materialized view:
retobj = gpudb.create_join_table(
join_table_name = 'recent_customer_order',
table_names = [],
options = {'refresh':'full_refresh'}
)
A materialized view differs from a filtered view in several important ways:
A standard SQL view acts as a persisted query on a given data set. When a user queries the view, the view's query is re-executed and the new results returned. Kinetica can mimic this behavior, to some extent, with a materialized view. A materialized view can be created in manual mode (to avoid the extra processing that occurs during inserts in on insert mode) and then have a manual full refresh requested of it before being requeried. This will ensure that all recent inserts, updates, & deletes are accurately reflected in the result set.
The following are examples of working with materialized views in Python.
To create a materialized view (recent_order
) on a single table
(order
), and keep recent_order
current as new records are inserted into
order
:
retobj = gpudb.create_join_table(
join_table_name = 'recent_order',
table_names = ['orders'],
column_names = ['*'],
options = {'refresh_method':'on_insert'}
)
To create a materialized view (recent_customer_order
) across three
tables: a base table (orders
), a parent table (customer
), and a
parent table to that (nation
), and keep recent_customer_order
current
as new records are inserted into orders
:
retobj = gpudb.create_join_table(
join_table_name = 'recent_customer_order',
table_names = ['orders','customer','nation'],
column_names = ['*'],
expressions = ['o_custkey = c_custkey','c_nationkey = n_nationkey'],
options = {'refresh_method':'on_insert'}
)
NOTE: In the example above, the customer
& nation
tables must
have primary keys on custkey
& nationkey
,
respectively, for the on insert scheme to function properly.
To create a materialized view (customer_location
) across three tables: a
table (customer
), its parent table (nation
), and the parent table
of that (region
), and make customer_location
current upon querying it
after new records are inserted into customer
, nation
, or region
:
retobj = gpudb.create_join_table(
join_table_name = 'customer_location',
table_names = ['customer','nation','region'],
column_names = ['*'],
expressions = ['c_nationkey = n_nationkey','n_regionkey = r_regionkey'],
options = {'refresh_method':'on_query'}
)
NOTE: In the example above, there are no primary key requirements.