An intersect is a representation of all rows that appear in both of a pair of specified data sets (tables or views). Intersects on collections are not supported. See INTERSECT for the SQL version of this operation.
An intersect is performed via the /create/union endpoint
using the intersect or intersect_all mode:
You can only perform an intersect two data sets, and the columns between the two must have similar data types. Kinetica will cast compatible data types as depicted here.
Performing an intersect creates a separate
memory-only table containing the results.
Intersects can be persisted (like tables) using the
persist option.
Note that if the source tables or views are replicated, the results of the intersect will also be replicated. If the included tables or views are sharded, the resulting memory-only table from the intersect will also be sharded; this also means that if a non-sharded table or view is included, the resulting view will also not be sharded.
Limitations on using intersect are discussed in further detail in the Limitations section.
To perform an intersect of two data sets, the /create/union endpoint requires five parameters:
options input parameterIn Python, an intersect between the lunch_menu table and the
dinner_menu table would look like:
h_db.create_union(
table_name = "example.lunch_and_dinner_menu",
table_names = ["example.lunch_menu", "example.dinner_menu"],
input_column_names = [
["food_name", "category", "price"],
["food_name", "category", "price"]
],
output_column_names = ["lunch_and_dinner_food_name", "category", "price"],
options = {"mode": "intersect"}
)
The results from the above call would contain only the menu items (excluding
duplicates) found in the extracted columns from both lunch_menu and
dinner_menu.
The intersect created in the Performing an Intersect section can be filtered for lunch & dinner food items that are in the sandwich category:
h_db.filter(
table_name = "example.lunch_and_dinner_menu",
view_name = "example.lunch_and_dinner_sandwiches",
expression = "category = 'sandwich'"
)
When executed against an intersect, the /filter endpoint produces filtered view. A chain of these filters could be used to create more and more restrictive filtered views from the original intersect operation.
To retrieve records from the intersect results in Python:
product_table = gpudb.GPUdbTable(
_type = None,
name = "example.lunch_and_dinner_menu",
db = h_db
)
product_table.get_records_by_column(
[
"lunch_and_dinner_food_name",
"category",
"price"
],
options = {"order_by": "lunch_and_dinner_food_name"},
print_data = True
)
input_column_name parameter vector size needs to match the number of
data sets listed, i.e. if you want to intersect a data set to itself, the
data set will need to be listed twice in the table_names parameter.input_column_name parameter vectors need to be listed in the same
order as their source data sets, e.g., if two data sets are listed in the
table_names parameter, the first data set's columns should be listed first
in the input_column_name parameter, etc.persist option of true will
make the table permanent and not expire.