An except is a representation of all unique rows in one data set (table or view) that do not appear in another. Excepts on collections are not supported. See EXCEPT for the SQL version of this operation.
An except is performed via the /create/union endpoint
using the except
or except_all
mode:
You can only perform an except two data sets, and the columns between the two must have similar data types. Kinetica will cast compatible data types as depicted here.
Performing an except creates a separate
memory-only table containing the results. Excepts
can be persisted (like tables) using the persist
option.
Note that if the source tables or views are replicated, the results of the except will also be replicated. If the included tables or views are sharded, the resulting memory-only table from the except will also be sharded; this also means that if a non-sharded table or view is included, the resulting memory-only table will also not be sharded.
Limitations on using except are discussed in further detail in the Limitations section.
To perform an except on two data sets, the /create/union endpoint requires five parameters:
options
input parameterIn Python, an except between the lunch_menu
table and the
dinner_menu
table would look like:
h_db.create_union(
table_name = "example.lunch_minus_dinner_menu",
table_names = ["example.lunch_menu", "example.dinner_menu"],
input_column_names = [
["food_name", "category", "price"],
["food_name", "category", "price"]
],
output_column_names = ["lunch_food_name", "category", "price"],
options = {"mode": "except"}
)
The results from the above call would contain all menu items (excluding
duplicates) found in the extracted columns from the lunch
table that are
not found in the extracted columns from the dinner
table.
Note
Since the example includes price
and all columns selected must
match between the two sets for an item to be eliminated, a lunch item
that is priced differently as a dinner item would still appear in the
result set.
The except created in the Performing an Except section can be filtered for lunch food items that are in the sandwich category:
h_db.filter(
table_name = "example.lunch_minus_dinner_menu",
view_name = "example.lunch_only_sandwiches",
expression = "category = 'sandwich'"
)
When executed against an except, the /filter endpoint produces a filtered view. A chain of these filters could be used to create more and more restrictive views from the original except operation.
To retrieve records from the except results in Python:
product_table = gpudb.GPUdbTable(
_type = None,
name = "example.lunch_minus_dinner_menu",
db = h_db
)
product_table.get_records_by_column(
[
"lunch_food_name",
"category",
"price"
],
options = {"order_by": "lunch_food_name"},
print_data = True
)
input_column_name
parameter vector size needs to match the number of
data sets listed, i.e. if you want to perform an except between a data set
and itself, the data set will need to be listed twice in the table_names
parameter.input_column_name
parameter vectors need to be listed in the same
order as their source data sets, e.g., if two data sets are listed in the
table_names
parameter, the first data set's columns should be listed first
in the input_column_name
parameter, etc.persist
option of true
will make the
table permanent and not expire.