An except is a representation of all unique rows in one data set
(table or view) that do not appear in another.
Performing an except creates a separate view containing the results. An
except is performed via the /create/union endpoint using the
except
mode.
You can only perform an except on two data sets, but those data sets can be any combination of tables or views as long as the columns across the data sets being used have similar data types; note that you cannot perform an except on collections. Kinetica will cast compatible data types and create a new type schema. The list of compatible data types can be found here.
Note that if the source tables or views are replicated, the results of the except will also be replicated. If the included tables or views are sharded, the resulting view from the except will also be sharded; this also means that if a non-sharded table or view is included, the resulting view will also not be sharded.
Limitations on using except are discussed in further detail in the Limitations section.
To perform an except, the /create/union endpoint requires five parameters:
except
mode specified in the options
input parameterIn Python, an except between tables dinner_menu
and
lunch_menu
would look like:
gpudb.create_union(
table_name = "menu_minus_dinner",
table_names = ["lunch_menu","dinner_menu"],
input_column_names = [
["food_name", "category", "price"],
["food_name", "category", "price"]
],
output_column_names = ["lunch_food", "category", "price"],
options = {"mode":"except"}
)
The results from the above call would contain all menu items (excluding
duplicates) found in the extracted columns from the lunch
table that are
not found in the extracted columns from the dinner
table. The result would
match what would be produced by the SQL
SELECT
food_name,
category,
price
FROM
lunch_menu
EXCEPT
SELECT
food_name,
category,
price
FROM
dinner_menu
Note
Because the example includes price
, if you had two of the same
items that were priced differently for lunch and dinner, the item
would appear twice in the resulting except view because the rows
would not be exact duplicates of each other.
A Python example filter on the except created in the Performing an Except section for food that are the sandwich category:
gpudb.filter(
table_name = "menu_minus_dinner",
view_name = "sandwiches",
expression = "category = sandwich"
)
When excecuted against an except, the /filter endpoint produces a view. A chain of these filters could be used to create more and more restrictive views from the original except operation.
To retrieve records from the except results in Python:
gpudb.get_records(
table_name = "menu_minus_dinner",
offset = 0,
limit = 30,
encoding = "binary"
)
persist
option upon creation, the view
will become a table and no longer have this restrictioninput_column_name
parameter vector size needs to match the number of
data sets listed, i.e. if you want to perform an except between a data set
and itself, the data set will need to be listed twice in the table_names
parameterinput_column_name
parameter vectors need to be listed in the same
order as their source data sets, e.g., if two data sets are listed in the
table_names
parameter, the first data set's columns should be listed first
in the input_column_name
parameter, etc.