Version:

Merge Records Tables

Merging Records is similar to a union in that multiple tables or views are merged together to create a new merge records table. Since a table is created by the merge records operation, records can be added, modified, or deleted. If you want to duplicate the records in a table or view, merging a table or view with itself is also possible, but the data set will need to be listed twice in the source_table_names parameter. The endpoint is also one of the few to allow operations involving unrestricted string columns, meaning you can merge any number of unrestricted string columns together.

Merge records tables cannot be sharded, regardless of the composition of the source data sets being merged. Merge records tables are also persisted by default, even if only views are merged.

Note that it is possible to create n new columns in a merge records table from different columns in separate tables, but this would add nulls where there normally would be corresponding values.

For example, given the following two tables:

tableA

col_x col_y
3 7
4 8

tableB

col_z col_q
1 9
2 10

Merging the records in such a way that col_x becomes the merged table's first column and col_q becomes the second column, the resulting table would resemble the following:

new_col1 new_col2
3 <null>
4 <null>
<null> 9
<null> 10

Creating a Merge Records Table

To create a merge records table using the /merge/records endpoint requires three parameters:

  1. the name of the table to be created
  2. the name(s) of the source dataset(s)
  3. maps containing the new columns mapped to columns (or column expressions) from the source dataset(s)

Example

Given a lunch_menu table and a dinner_menu table, a merge records table can be created, which would combine all of the items from both menus into a single list with three columns: food_name, category, & price. In Python:

h_db.merge_records(
    table_name = "example.lunch_and_dinner_menu_combined",
    source_table_names = ["example.lunch_menu", "example.dinner_menu"],
    field_maps = [
        {"food_name": "lunch_item_name", "category": "category", "price": "lunch_cost"},
        {"food_name": "dinner_item_name", "category": "category", "price": "dinner_cost"}
    ]
)

Limitations and Cautions

  • A merge records table is persisted by default, which means its TTL is set to never cause it to expire. Note that the merge records table is not protected, which means it can still be deleted.

  • The merge records table will only inherit nullability from its source tables. Other column properties (e.g., compression, store only, etc.), any keys (shard or primary), or replication will not be inherited.

  • The /merge/records endpoint won't merge two columns from one dataset into the same new column in the same map, e.g., the value in the below field_maps parameter is valid, but the values in the make column won't be available in the merge records table, as the second assignment of model to vehicle_type will override the first:

    field_maps = [
        {"vehicle_type": "make", "vehicle_type": "model"}
    ]
    
  • Columns being merged must match types, e.g., a column type of int16 will merge successfully with another column type of int16, but attempting to merge with a type of int, int8, double, etc., will not work.

  • Store-only columns cannot have merge records operations applied to them.

  • Collections cannot have merge records operations applied to them.

  • The merge records table does not get updated if source data set(s) are updated.

  • The number of data sets in the source_table_names parameter needs to match the number of maps included in the field_maps parameter and vice versa.

  • The order of the maps in the field_maps parameter need to match the order of the data sets listed in the source_table_names parameter, e.g. the following is not valid and will return an error:

    h_db.merge_records(
        table_name = "example.invalid_lunch_and_dinner_merge",
        source_table_names = ["example.lunch_menu", "example.dinner_menu"],
        field_maps = [
            {"food_name": "dinner_item_name", "category": "category", "price": "dinner_cost"},
            {"food_name": "lunch_item_name", "category": "category", "price": "lunch_cost"}
        ]
    )