Kinetica supports the concept of a materialized view, which is a greater-performant query across one or more tables.
A materialized view consists of one or more operations against one or more input tables. These operations will result in member tables that will be used to construct the materialized view and keep its data current. The member tables will be managed by the materialized view and will not be available to be operated on directly outside of the materialized view.
Member tables can be the result of any of the following:
result_table
optionresult_table
optionresult_table
optionThe materialized view can be accessed via its root table, which is the final member table in the materialized view creation sequence.
When a change occurs in an input table, the member tables that are derived from that input table are marked as stale and in need of rebuilding if and when a data refresh is requested. When such a request is received, any stale member tables are rebuilt, and the root table will once again contain current data.
There are seven events that will cause the member tables of a materialized view to be marked as stale:
There are three modes that govern the way in which a materialized view's data can be kept current:
Note
The materialized view manages the refresh processing for all entities contained within it. The specified refresh mode of any member table (join, for instance) will be ignored.
In manual mode, the materialized view will only be made fully current by performing a manual refresh of the view. This will result in the immediate rebuilding of any of the view's member tables that require it.
In on change mode, the materialized view will be kept current by detecting changes in any member tables or input tables and updating any member tables that require it. This behavior means that the returned data will always be current, with no overhead on query, but with potentially significant refresh processing for each change in the supporting data.
Note
The granularity of the change detection is at the endpoint call level; e.g., a call to /update/records that resulted in 1,000,000 records being updated would be detected as one change, and one refresh would be initiated.
In periodic mode, the materialized view will be kept current by refreshing the data at a given interval. This behavior means that the returned data may not be current for a given query, but there will be no query overhead and no refresh waiting time for modification calls as with on change mode.
When the beginning of a period is reached, the time of the following period is calculated and a refresh process initiated. If the refresh process does not complete by that pre-calculated time, another refresh process will be initiated immediately after the first completes. When this next refresh begins, the time of the next period in the sequence that occurs at some time past the current time will be calculated and the cycle repeated. In this manner, only one catch-up refresh will be processed regardless of the number of periods skipped by a long-running refresh.
A manual refresh of a materialized view can be performed on any materialized view, regardless of operating mode, to update the view's data. If a refresh is already in progress, the manual refresh request will wait until the currently executing refresh is complete before executing.
An example, in Python, of performing a manual refresh of a
materialized view (customer_order_count
):
retobj = gpudb.alter_table(
table_name = 'customer_order_count',
action = 'refresh',
value = ''
)
There are three steps in creating a materialized view using the native API:
To create a materialized view in SQL, see the CREATE VIEW section under SQL Support.
The following examples use the Python API to construct a materialized view.
To create a materialized view (large_order
) on a single table
(order
), and have large_order
show records in order
that have a
total value greater than $10,000 (as denoted by the table's total
column),
refreshed every time the data in order
is updated:
view_id = gpudb.create_materialized_view(
table_name = 'large_order',
options = {'refresh_method':'on_change'}
)['view_id']
retobj = gpudb.filter(
table_name = 'order',
view_name = 'large_order',
expression = 'total > 10000',
options = {'view_id':view_id}
)
To create a materialized view (customer_order_count
) across two
tables: a base table (order
) and a parent table (customer
), and
refresh the order total per customer in customer_order_count
every hour:
view_id = gpudb.create_materialized_view(
table_name = 'customer_order_count',
options = {'refresh_method':'periodic', 'refresh_period':'3600'}
)['view_id']
retobj = gpudb.create_join_table(
join_table_name = 'customer_order',
table_names = ['order','customer'],
column_names = ['*'],
expressions = ['o_custkey = c_custkey'],
options = {'view_id':view_id}
)
retobj = gpudb.aggregate_group_by(
table_name = 'customer_order',
column_names = ['c_custname', 'count(*) as num_orders'],
options = {
'result_table':'customer_order_count',
'view_id':view_id
}
)
persist
option set to
true
to a materialized view will fail. The materialized view's own
persist
setting will determine whether it is persisted or not.