An intersect is somewhat analogous to creating a table from a SQL
INTERSECT of two tables. See CREATE TABLE … AS and
INTERSECT for details.intersect or intersect_all mode:
- Intersect — all unique rows that exist in both specified data sets
- Intersect All — all rows (including duplicates) that exist in both specified data sets
You can only perform an intersect two data sets, and the columns between the
two must have similar data types. Kinetica will cast compatible
data types as depicted here.
Performing an intersect creates a separate
memory-only table containing the results.
Intersect results can be persisted (like tables) using
the
persist option.
An intersect result table name must adhere to the standard
naming criteria. Each intersect result
exists within a schema and follows the standard
name resolution rules for tables.
Note that if the source data sets are replicated,
the results of the intersect will also be replicated. If the included data
sets are sharded, the resulting memory-only table
from the intersect will also be sharded; this also means that if a
non-sharded data set is included, the resulting memory-only table will also be
non-sharded.
Limitations on using intersect are discussed in further detail in the
Limitations section.
Performing an Intersect
To perform an intersect of two data sets, the /create/union endpoint requires five parameters:- the name of the memory-only table to be created
- the list of the two member data sets to be used in the intersect operation; the result will contain all of the elements from the first data set that are also in the second one
- the list of columns from each of the given data sets to be used in the intersect operation
- the list of column names to be output to the resulting memory-only table
- the intersect mode specified in the
optionsinput parameter
Example
An intersect between thelunch_menu table and the
dinner_menu table would look like:
lunch_menu and
dinner_menu.
Since the example includes
price and all columns selected must
match between the two sets for an item to be included, a lunch item
that is priced differently as a dinner item would not appear in the
result set.Retrieving Intersect Data
To retrieve records from the intersect results:Limitations
- Performing an intersect between two data sets results in an entirely new data set, so be mindful of the memory usage implications.
- All data sets have to be replicated or not replicated, e.g., you cannot intersect replicated and non-replicated data sets.
- If attempting to intersect sharded data sets, all data sets have to be sharded similarly (if all data is not on the same processing node, the intersect can’t be calculated properly).
- The result of an intersect operation does not get updated if source data set(s) are updated.
- The
input_column_nameparameter vector size needs to match the number of data sets listed, i.e. if you want to intersect a data set to itself, the data set will need to be listed twice in thetable_namesparameter. - The
input_column_nameparameter vectors need to be listed in the same order as their source data sets, e.g., if two data sets are listed in thetable_namesparameter, the first data set’s columns should be listed first in theinput_column_nameparameter, etc. - The result of an intersect is transient, by default, and will expire after the default TTL setting.
- The result of an intersect is not persisted, by default, and will not
survive a database restart; specifying a
persistoption oftruewill make the table permanent and not expire.