Kinetica supports the concept of set union, which is a representation of all rows that appear across a group of specified data sets (tables or views). Unions on collections are not supported. See UNION for the SQL version of this operation.
A union is performed using the /create/union endpoint, using one of the following modes:
OR operation, e.g., if filter 1
creates a filtered view using the expression x = 10 and filter 2 creates
a filtered view using the expression x <= 10, then the merge views
operation creates a new view using the expression x = 10 OR x <= 10Note
Set intersection and set substraction are also available, but their descriptions and limitations can be found on Intersect and Except, respectively.
You can union any number or combination of data sets as long as the columns across the data sets being used have similar data types. Kinetica will cast compatible data types as follows:
| Type | Compatible With | Result Type |
|---|---|---|
| int8 | int8 | int8 |
| int16 | int16 | int16 |
| int | int | int |
| long | long | |
| float | float | |
| double | double | |
| decimal | decimal | |
| long | int | long |
| long | ||
| float | float | |
| double | double | |
| decimal | decimal | |
| float | int | float |
| long | ||
| float | ||
| double | double | |
| decimal | float | |
| double | int | double |
| long | ||
| float | ||
| double | ||
| decimal | ||
| decimal | int | decimal |
| long | ||
| float | float | |
| double | double | |
| decimal | decimal | |
| date | date | date |
| datetime | datetime | |
| timestamp | timestamp | |
| time | time | time |
| datetime | date | datetime |
| datetime | ||
| timestamp | date | timestamp |
| timestamp | ||
| charN | charN | charN (whichever N is larger) |
| wkt | wkt | wkt |
| ipv4 | ipv4 | ipv4 |
Performing a union creates a separate
memory-only table containing the results. Unions
can be persisted (like tables) using the persist option.
Note that if the source data sets are replicated, the results of the union will also be replicated. If the included data sets are sharded, the resulting memory-only table from the union will also be sharded; this also means that if a non-sharded data set is included, the resulting memory-only table will also be non-sharded.
Limitations on using union are discussed in further detail in the Limitations and Cautions section.
To perform a union of data sets, the /create/union endpoint requires five parameters:
options input parameterunion_all (the default option)union or union_distinctmerge_views -- Note that if this option is selected, the
input_column_names AND output_column_names parameters are ignoredNote
If you do not specify a union mode, union_all will be used.
In Python, a union all between the lunch_menu table and the
dinner_menu table would look like:
h_db.create_union(
table_name = "example.lunch_or_dinner_menu_all",
table_names = ["example.lunch_menu", "example.dinner_menu"],
input_column_names = [
["food_name", "category", "price"],
["food_name", "category", "price"]
],
output_column_names = ["lunch_or_dinner_food_name", "category", "price"]
)
The results from the above call would contain all menu items (including
duplicates) in the extracted columns from lunch_menu and dinner_menu.
A union (or union distinct) using the same tables can be performed via:
h_db.create_union(
table_name = "example.lunch_or_dinner_menu_unique",
table_names = ["example.lunch_menu", "example.dinner_menu"],
input_column_names = [
["food_name", "category", "price"],
["food_name", "category", "price"]
],
output_column_names = ["lunch_or_dinner_food_name", "category", "price"],
options = {"mode": "union"}
)
The result of the union distinct call above would contain all menu items
(excluding duplicates) in the extracted columns from lunch_menu and
dinner_menu.
If there were a single menu table with two filtered views on it, one filtering for lunch sandwiches and one filtering for dinner sandwiches, a merge views of the two could be performed to arrive at a filtered view of all sandwiches via:
h_db.create_union(
table_name = "example.lunch_merge_dinner_sandwiches",
table_names = ["example.lunch_sandwiches", "example.dinner_sandwiches"],
options = {"mode": "merge_views"}
)
The union created in the Performing a Union section can be filtered for lunch or dinner food items that are in the sandwich category:
h_db.filter(
table_name = "example.lunch_or_dinner_menu_unique",
view_name = "example.lunch_or_dinner_sandwiches_unique",
expression = "category = 'sandwich'"
)
When executed against a union, the /filter endpoint produces a filtered view. A chain of these filters could be used to create more and more restrictive views from the original union operation.
To retrieve records from the union results in Python:
product_table = gpudb.GPUdbTable(
_type = None,
name = "example.lunch_or_dinner_menu_unique",
db = h_db
)
product_table.get_records_by_column(
[
"lunch_or_dinner_food_name",
"category",
"price"
],
options = {"order_by": "lunch_or_dinner_food_name"},
print_data = True
)
There are several limitations to creating and using unions:
input_column_name parameter vector size needs to match the number of
data sets listed, i.e. if you want to union a data set to itself, the
data set will need to be listed twice in the table_names parameter.input_column_name parameter vectors need to be listed in the same
order as their source data sets, e.g., if three data sets are listed in the
table_names parameter, the first data set's columns should be listed first
in the input_column_name parameter, etc.persist option of true will make the
table permanent and not expire.