Union - Kinetica Docs

A union is a representation of all rows that appear in any of a set of specified data sets (tables or views).

A union is somewhat analogous to creating a table from a SQL UNION of two or more tables. See CREATE TABLE … AS and UNION for details.

A union is performed via the /create/union endpoint, using the union or union_all mode:

Union — retains all unique rows from the specified data sets
Union All — retains all rows from the specified data sets

Set intersection and set subtraction are also available, and their descriptions and limitations can be found on Intersect and Except, respectively.

Performing a union creates a separate memory-only table containing the results. Union results can be persisted (like tables) using the persist option. A union result table name must adhere to the standard naming criteria. Each union result exists within a schema and follows the standard name resolution rules for tables. Note that if the source data sets are replicated, the results of the union will also be replicated. If the included data sets are sharded, the resulting memory-only table from the union will also be sharded; this also means that if a non-sharded data set is included, the resulting memory-only table will also be non-sharded. Limitations on using union are discussed in further detail in the Limitations and Cautions section.

Union-Compatible Data Types

You can union any number or combination of data sets as long as the columns across the data sets being used have similar data types. Kinetica will cast compatible data types as follows:

int8

Compatible With	Result Type
int8	int8

int16

Compatible With	Result Type
int16	int16

int

Compatible With	Result Type
int	int
long	long
float	float
double	double
decimal	decimal

long

Compatible With	Result Type
int	long
long	long
float	float
double	double
decimal	decimal

float

Compatible With	Result Type
int	float
long	float
float	float
double	double
decimal	float

double

Compatible With	Result Type
int	double
long	double
float	double
double	double
decimal	double

decimal

Compatible With	Result Type
int	decimal
long	decimal
float	float
double	double
decimal	decimal

date

Compatible With	Result Type
date	date
datetime	datetime
timestamp	timestamp

time

Compatible With	Result Type
time	time

datetime

Compatible With	Result Type
date	datetime
datetime	datetime

timestamp

Compatible With	Result Type
date	timestamp
timestamp	timestamp

charN

Compatible With	Result Type
charN	charN (whichever N is larger)

wkt

Compatible With	Result Type
wkt	wkt

ipv4

Compatible With	Result Type
ipv4	ipv4

Performing a Union

To perform a union of data sets, the /create/union endpoint requires five parameters:

the name of the memory-only table to be created
the list of member data sets to be used in the union operation; the result will contain all of the elements from the first data set and all of the elements from the second one
the list of columns from each of the given data sets to be used in the union operation
the list of column names to be output to the resulting memory-only table
the union mode specified in the options input parameter
- union_all (the default option)
- union or union_distinct

If you do not specify a union mode, union_all will be used.

Examples

A union all between the lunch_menu table and the dinner_menu table would look like:

CREATE OR REPLACE TABLE example.lunch_or_dinner_menu_all AS
SELECT
    food_name AS lunch_or_dinner_food_name,
    category,
    price
FROM
    example.lunch_menu
UNION ALL
SELECT
    food_name,
    category,
    price
FROM
    example.dinner_menu

kinetica.create_union(
    table_name = "example.lunch_or_dinner_menu_all",
    table_names = ["example.lunch_menu", "example.dinner_menu"],
    input_column_names = [
        ["food_name", "category", "price"],
        ["food_name", "category", "price"]
    ],
    output_column_names = ["lunch_or_dinner_food_name", "category", "price"]
)

The results from the above call would contain all menu items (including duplicates) in the extracted columns from lunch_menu and dinner_menu. A union (or union distinct) using the same tables can be performed via:

CREATE OR REPLACE TABLE example.lunch_or_dinner_menu_unique AS
SELECT
    food_name AS lunch_or_dinner_food_name,
    category,
    price
FROM
    example.lunch_menu
UNION
SELECT
    food_name,
    category,
    price
FROM
    example.dinner_menu

kinetica.create_union(
    table_name = "example.lunch_or_dinner_menu_unique",
    table_names = ["example.lunch_menu", "example.dinner_menu"],
    input_column_names = [
        ["food_name", "category", "price"],
        ["food_name", "category", "price"]
    ],
    output_column_names = ["lunch_or_dinner_food_name", "category", "price"],
    options = {"mode": "union"}
)

The result of the union call above would contain all menu items (excluding duplicates) in the extracted columns from lunch_menu and dinner_menu.

Retrieving Union Data

To retrieve records from the union results:

SELECT lunch_or_dinner_food_name, category, price
FROM example.lunch_or_dinner_menu_unique
ORDER BY lunch_or_dinner_food_name

gpudb.GPUdbTable(name = "example.lunch_or_dinner_menu_unique", db = kinetica).get_records_by_column(
    ["lunch_or_dinner_food_name", "category", "price"],
    options = {"order_by": "lunch_or_dinner_food_name"},
    print_data = True
)

Limitations and Cautions

Performing a union between two data sets results in an entirely new data set, so be mindful of the memory usage implications.
All data sets have to be replicated or not replicated, e.g., you cannot union replicated and non-replicated data sets.
If attempting to perform a distinct union on sharded data sets, all data sets have to be sharded similarly (if all data is not on the same processing node, the distinct union can’t be calculated properly).
The result of a union operation does not get updated if source data set(s) are updated.
The input_column_name parameter vector size needs to match the number of data sets listed, i.e. if you want to union a data set to itself, the data set will need to be listed twice in the table_names parameter.
The input_column_name parameter vectors need to be listed in the same order as their source data sets, e.g., if three data sets are listed in the table_names parameter, the first data set’s columns should be listed first in the input_column_name parameter, etc.
The result of a union is transient, by default, and will expire after the default TTL setting.
The result of a union is not persisted, by default, and will not survive a database restart; specifying a persist option of true will make the table permanent and not expire.

​Union-Compatible Data Types

​int8

​int16

​int

​long

​float

​double

​decimal

​date

​time

​datetime

​timestamp

​charN

​wkt

​ipv4

​Performing a Union

​Examples

​Retrieving Union Data

​Limitations and Cautions

Union-Compatible Data Types

int8

int16

int

long

float

double

decimal

date

time

datetime

timestamp

charN

wkt

ipv4

Performing a Union

Examples

Retrieving Union Data

Limitations and Cautions