An except is a representation of all rows in one data set (table or view) that do not appear in another.
Note
An except is somewhat analogous to creating a table from a SQL EXCEPT of two tables. See CREATE TABLE ... AS and EXCEPT for details.
An except is performed via the /create/union endpoint, using the except or except_all mode:
- Except -- all unique rows that exist in one data set, but not the other
- Except All -- all rows (including duplicates) that exist in one data set, but not the other
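The difference between the two modes can be sketched in plain Python, with rows modeled as tuples. This illustrates the semantics described above and is not Kinetica's implementation. The function names and sample rows are hypothetical. The handling of a row that is duplicated in the first set and present in the second (dropped entirely here) is an assumption.

```python
def except_distinct(first, second):
    """Unique rows of `first` that do not appear in `second` (first-seen order)."""
    excluded = set(second)
    seen = set()
    result = []
    for row in first:
        if row not in excluded and row not in seen:
            seen.add(row)
            result.append(row)
    return result


def except_all(first, second):
    """Rows of `first` that do not appear in `second`, duplicates kept."""
    excluded = set(second)
    return [row for row in first if row not in excluded]


# Hypothetical sample rows: (item, price)
lunch = [("soup", 4.0), ("soup", 4.0), ("salad", 6.0), ("steak", 18.0)]
dinner = [("steak", 18.0), ("salad", 9.0)]

distinct_rows = except_distinct(lunch, dinner)
all_rows = except_all(lunch, dinner)
```

Here `distinct_rows` keeps one copy of the soup row while `all_rows` keeps both. The salad row survives in both cases because its price differs between the two lists.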
Note
Set union and set intersection are also available, and their descriptions and limitations can be found on Union and Intersect, respectively.
You can only perform an except on two data sets, and the corresponding columns of the two must have similar data types. Kinetica will cast compatible data types as depicted here.
Performing an except creates a separate memory-only table containing the results. Except results can be persisted (like tables) using the persist option.
An except result table name must adhere to the standard naming criteria. Each except result exists within a schema and follows the standard name resolution rules for tables.
If the source data sets are replicated, the result of the except is also replicated. If the source data sets are sharded, the resulting memory-only table is also sharded; conversely, if a non-sharded data set is included, the resulting memory-only table is non-sharded.
Limitations on using except are discussed in further detail in the Limitations section.
Performing an Except
To perform an except on two data sets, the /create/union endpoint requires five parameters:
- the name of the memory-only table to be created
- the list of the two member data sets to be used in the except operation; the result will contain all of the elements from the first data set that don't also exist in the second one
- the list of columns from each of the given data sets to be used in the except operation
- the list of column names to be output to the resulting memory-only table
- the except mode specified in the options input parameter
Example
An except between the lunch_menu table and the dinner_menu table would look like:
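A minimal sketch of how the five inputs might be assembled for this call is given here as a plain Python dictionary. The key names `table_name`, `input_column_names`, `output_column_names`, and `options` are assumed to match the /create/union request fields. The result table name `lunch_only_menu` and the columns `food_name` and `category` are hypothetical. The helper `build_except_request` is not part of any Kinetica API.

```python
def build_except_request(result_table, first_table, second_table,
                         columns, mode="except", persist=False):
    """Assemble the five /create/union inputs for an except as a plain dict.

    Assumes both data sets expose the same column names; the result holds
    rows of `first_table` that do not appear in `second_table`.
    """
    if mode not in ("except", "except_all"):
        raise ValueError("mode must be 'except' or 'except_all'")
    options = {"mode": mode}
    if persist:
        # Assumed string form of the persist option
        options["persist"] = "true"
    return {
        "table_name": result_table,                   # memory-only table to create
        "table_names": [first_table, second_table],   # order matters
        "input_column_names": [list(columns), list(columns)],
        "output_column_names": list(columns),
        "options": options,
    }


request = build_except_request(
    "lunch_only_menu",                  # hypothetical result table name
    "lunch_menu",
    "dinner_menu",
    ["food_name", "category", "price"], # hypothetical column names
)
```

If the Kinetica Python API is in use, these values would presumably be passed to its `create_union` method on an open connection; check the API reference for the exact signature before relying on it.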
The results from the above call would contain all menu items (excluding duplicates) found in the extracted columns from the lunch table that are not found in the extracted columns from the dinner table.
Note
Since the example includes price, and all selected columns must match between the two sets for an item to be eliminated, a lunch item that is priced differently from its dinner counterpart would still appear in the result set.
Retrieving Except Data
To retrieve records from the except results:
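The except result is read like any other table. As a hedged sketch, the helper below builds paged read requests, assuming the /get/records endpoint with `table_name`, `offset`, `limit`, `encoding`, and `options` fields. The helper name, the table name `lunch_only_menu`, the row count, and the page size are all hypothetical.

```python
def page_requests(table_name, total_rows, page_size=1000):
    """Yield one assumed /get/records request body per page of results."""
    for offset in range(0, total_rows, page_size):
        yield {
            "table_name": table_name,
            "offset": offset,
            "limit": min(page_size, total_rows - offset),
            "encoding": "json",   # assumed encoding value
            "options": {},
        }


# Hypothetical: the except result holds 2500 rows
pages = list(page_requests("lunch_only_menu", 2500, page_size=1000))
```

Each dictionary in `pages` describes one read; sending it to the database is left to whichever client library or HTTP tool is in use.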
Limitations
- Performing an except between two data sets results in an entirely new data set, so be mindful of the memory usage implications.
- Both data sets must be replicated or both must be non-replicated; you cannot perform an except between a replicated data set and a non-replicated one.
- If attempting to perform an except on sharded data sets, all data sets have to be sharded similarly (if all data is not on the same processing node, the except can't be calculated properly).
- The result of an except operation does not get updated if source data set(s) are updated.
- The input_column_names parameter vector size needs to match the number of data sets listed, e.g., if you want to except a data set from itself, the data set will need to be listed twice in the table_names parameter.
- The input_column_names parameter vectors need to be listed in the same order as their source data sets, e.g., if two data sets are listed in the table_names parameter, the first data set's columns should be listed first in the input_column_names parameter, etc.
- The result of an except is transient, by default, and will expire after the default TTL setting.
- The result of an except is not persisted, by default, and will not survive a database restart; specifying a persist option of true will make the table permanent and not expire.
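Some of the structural limitations above can be checked on the client before a request is sent. The following is a sketch only. `check_except_inputs` is a hypothetical helper, and the rule that each input column list must be as long as the output column list is an assumption not stated in the list above. Replication and sharding compatibility cannot be checked this way, since they depend on how the data sets are stored.

```python
def check_except_inputs(table_names, input_column_names, output_column_names):
    """Return a list of problems found in the except inputs (empty if none)."""
    problems = []
    if len(table_names) != 2:
        problems.append("an except takes exactly two data sets")
    if len(input_column_names) != len(table_names):
        problems.append("one input column list is needed per listed data set")
    # Assumption: every input column list lines up with the output columns
    if any(len(cols) != len(output_column_names) for cols in input_column_names):
        problems.append("input column lists must match the output column count")
    return problems


# Excepting a data set from itself: it must be listed twice
ok = check_except_inputs(
    ["lunch_menu", "lunch_menu"],
    [["price"], ["price"]],
    ["price"],
)
bad = check_except_inputs(["lunch_menu"], [["price"], ["price"]], ["price"])
```

In this example `ok` is empty, while `bad` reports both the missing second data set and the mismatch between the number of column lists and the number of data sets.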