Merging Records is similar to a
union in that multiple tables or
views are merged together to create a new merge records table.
Since a table is created by the merge records operation, records can be
added, modified, or deleted. If you want to duplicate the records in a table
or view, merging a table or view with itself is also possible, but the
data set will need to be listed twice in the source_table_names
parameter.
The endpoint is also one of the few to allow operations involving unrestricted
string columns, meaning you can merge any number of unrestricted string columns
together.
Merge records tables cannot be sharded, regardless of the composition of the source data sets being merged. Merge records tables are also persisted by default, even if only views are merged.
Note that it is possible to create n new columns in a merge records table from different columns in separate tables, but this would add nulls where there normally would be corresponding values.
For example, given the following two tables:
tableA
col_x | col_y |
---|---|
3 | 7 |
4 | 8 |
tableB
col_z | col_q |
---|---|
1 | 9 |
2 | 10 |
Merging the records in such a way that col_x
becomes the merged table's
first column and col_q
becomes the second column, the resulting table would
resemble the following:
new_col1 | new_col2 |
---|---|
3 | <null> |
4 | <null> |
<null> | 9 |
<null> | 10 |
To create a merge records table using the /merge/records endpoint requires three parameters:
Given a lunch_menu
table and a dinner_menu
table, a
merge records table can be created, which would combine all of the items from
both menus into a single list with three columns: food_name
, category
, &
price
. In Python:
h_db.merge_records(
table_name = "example.lunch_and_dinner_menu_combined",
source_table_names = ["example.lunch_menu", "example.dinner_menu"],
field_maps = [
{"food_name": "lunch_item_name", "category": "category", "price": "lunch_cost"},
{"food_name": "dinner_item_name", "category": "category", "price": "dinner_cost"}
]
)
A merge records table is persisted by default, which means its TTL is set to never cause it to expire. Note that the merge records table is not protected, which means it can still be deleted.
The merge records table will only inherit nullability from its source tables. Other column properties (e.g., compression, store only, etc.), any keys (shard or primary), or replication will not be inherited.
The /merge/records endpoint won't merge
two columns from one dataset into the same new column in the same map, e.g.,
the value in the below field_maps
parameter is valid, but the values
in the make
column won't be available in the merge records table, as the
second assignment of model
to vehicle_type
will override the first:
field_maps = [
{"vehicle_type": "make", "vehicle_type": "model"}
]
Columns being merged must match types, e.g., a column
type of int16
will merge successfully with another column type of
int16
, but attempting to merge with a type of int
, int8
,
double
, etc., will not work.
Store-only columns cannot have merge records operations applied to them.
Collections cannot have merge records operations applied to them.
The merge records table does not get updated if source data set(s) are updated.
The number of data sets in the source_table_names
parameter needs to match
the number of maps included in the field_maps
parameter and vice versa.
The order of the maps in the field_maps
parameter need to match the order
of the data sets listed in the source_table_names
parameter, e.g. the
following is not valid and will return an error:
h_db.merge_records(
table_name = "example.invalid_lunch_and_dinner_merge",
source_table_names = ["example.lunch_menu", "example.dinner_menu"],
field_maps = [
{"food_name": "dinner_item_name", "category": "category", "price": "dinner_cost"},
{"food_name": "lunch_item_name", "category": "category", "price": "lunch_cost"}
]
)