Version:

Except

An except is a representation of all unique rows in one data set (table or view) that do not appear in another. Excepts on collections are not supported.

An except is performed via the /create/union endpoint using the except or except_all mode:

  • Except -- all unique rows that exist in one data set, but not the other
  • Except All -- all rows (including duplicates) that exist in one data set, but not the other

You can only perform an except two data sets, and the columns between the two must have similar data types. Kinetica will cast compatible data types as depicted here.

Performing an except creates a separate in-memory table containing the results. Excepts can be persisted (like tables) using the persist option.

Note that if the source tables or views are replicated, the results of the except will also be replicated. If the included tables or views are sharded, the resulting in-memory table from the except will also be sharded; this also means that if a non-sharded table or view is included, the resulting in-memory table will also not be sharded.

Limitations on using except are discussed in further detail in the Limitations section.

Performing an Except

To perform an except on two data sets, the /create/union endpoint requires five parameters:

  1. the name of the in-memory table to be created
  2. the list of the two member data sets to be used in the except operation
  3. the list of columns from each of the given data sets to be used in the except operation
  4. the list of column names to be output to the resulting in-memory table
  5. the except mode specified in the options input parameter

Example

In Python, an except between tables dinner_menu and lunch_menu would look like:

gpudb.create_union(
  table_name = "menu_minus_dinner",
  table_names = ["lunch_menu","dinner_menu"],
  input_column_names = [
    ["food_name", "category", "price"],
    ["food_name", "category", "price"]
  ],
  output_column_names = ["lunch_food", "category", "price"],
  options = {"mode":"except"}
)

The results from the above call would contain all menu items (excluding duplicates) found in the extracted columns from the lunch table that are not found in the extracted columns from the dinner table. The result would match what would be produced by the SQL:

SELECT
  food_name,
  category,
  price
FROM
  lunch_menu

EXCEPT

SELECT
  food_name,
  category,
  price
FROM
  dinner_menu

Note

Because the example includes price, if you had two of the same items that were priced differently for lunch and dinner, the item would appear twice in the resulting except view because the rows would not be exact duplicates of each other.

Operating on an Except

Examples

A Python example filter on the except created in the Performing an Except section for food that are the sandwich category:

gpudb.filter(
  table_name = "menu_minus_dinner",
  view_name = "sandwiches",
  expression = "category = sandwich"
)

When excecuted against an except, the /filter endpoint produces a filtered view. A chain of these filters could be used to create more and more restrictive views from the original except operation.

To retrieve records from the except results in Python:

gpudb.get_records(
  table_name = "menu_minus_dinner",
  offset = 0,
  limit = 30,
  encoding = "binary"
)

Limitations

  • Performing an except between two data sets results in an entirely new data set, so be mindful of the memory usage implications
  • All data sets have to be replicated or not replicated, e.g., you cannot perform an except on replicated and non-replicated data sets
  • If attempting to perform an except on sharded data sets, all data sets have to be sharded similarly (if all data is not on the same processing node, the except can't be calculated properly)
  • The result of an except operation does not get updated if source data set(s) are updated
  • The result of an except has no permanence by default because it is not protected and will expire after the default TTL setting
  • The input_column_name parameter vector size needs to match the number of data sets listed, i.e. if you want to perform an except between a data set and itself, the data set will need to be listed twice in the table_names parameter
  • The input_column_name parameter vectors need to be listed in the same order as their source data sets, e.g., if two data sets are listed in the table_names parameter, the first data set's columns should be listed first in the input_column_name parameter, etc.
  • An except cannot be performed using a data set with non-charN string fields
  • Store-only columns and non-charN string column types cannot have except operations applied to them