Merging records is similar to a union in that multiple tables or views are merged together to create a new merge records table. Since the merge records operation creates a table, records can be added, modified, or deleted afterward. If you want to duplicate the records in a table or view, you can merge the table or view with itself, but the data set will need to be listed twice in the source_table_names parameter.
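For instance, a minimal sketch of duplicating a table's records by merging it with itself, assuming gpudb is an established GPUdb connection object (as in the examples below) and using hypothetical table and column names:

gpudb.merge_records(
    table_name = "my_table_doubled",
    source_table_names = ["my_table", "my_table"],  # the same data set listed twice
    field_maps = [
        {"col_a": "col_a", "col_b": "col_b"},  # map for the first listing
        {"col_a": "col_a", "col_b": "col_b"}   # map for the second listing
    ]
)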
The endpoint is also one of the few to allow operations involving unrestricted string columns, meaning you can merge any number of unrestricted string columns together.
Merge records tables cannot be sharded, regardless of the composition of the source data sets being merged. Merge records tables are also persisted by default, even if only views are merged.
Note that it is possible to create any number of new columns in a merge records table from different columns in separate source tables, but this will insert nulls wherever a source table has no corresponding value.
For example, if you have two tables that look like this:
tableA

| col_x | col_y |
|---|---|
| 3 | 7 |
| 4 | 8 |

tableB

| col_z | col_q |
|---|---|
| 1 | 9 |
| 2 | 10 |
And you performed the following merge (in Python):
gpudb.merge_records(
    table_name = "tableAB",
    source_table_names = ["tableA", "tableB"],
    field_maps = [
        {"new_col1": "col_x"},
        {"new_col2": "col_q"}
    ]
)
The new table would look like this:
| new_col1 | new_col2 |
|---|---|
| 3 | <null> |
| 4 | <null> |
| <null> | 9 |
| <null> | 10 |
To create a merge records table, the /merge/records endpoint requires three parameters: the target table_name, the source_table_names to merge, and the field_maps defining how source columns map to the new table's columns.
The example below outlines merging the june_weather and july_weather tables into the merge records table summer_weather, in Python:
gpudb.merge_records(
    table_name = "summer_weather",
    source_table_names = ["june_weather", "july_weather"],
    field_maps = [
        {"date": "june_date_of_record", "avg_summer_temps": "june_avg_temp", "time": "june_time_of_record"},
        {"date": "july_date_of_record", "avg_summer_temps": "july_avg_temp", "time": "july_time_of_record"}
    ]
)
A merge records table is persisted by default, which means its TTL will never expire. Note that the merge records table is not protected, which means it can still be deleted.
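If different lifecycle behavior is needed, the call also accepts an options map. The sketch below is an assumption-labeled example, not confirmed by this page: it passes persist and ttl options (both option names and values are assumptions) to create a non-persisted result that expires:

gpudb.merge_records(
    table_name = "summer_weather_temp",
    source_table_names = ["june_weather", "july_weather"],
    field_maps = [
        {"date": "june_date_of_record"},
        {"date": "july_date_of_record"}
    ],
    options = {
        "persist": "false",  # assumed option: do not persist the merge records table
        "ttl": "120"         # assumed option: time-to-live in minutes
    }
)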
The merge records table will only inherit nullability from its source tables. Other column properties (e.g., compression, store-only, etc.), keys (shard or primary), and replication will not be inherited.
The /merge/records endpoint won't merge two columns from one data set into the same new column within the same map; e.g., the field_maps parameter below is valid, but the values from the tableA_col1 column won't be available in the merge records table:
field_maps = [
    {"merged_col1": "tableA_col1", "merged_col1": "tableA_col2"}
]
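This follows from Python dictionary semantics: when a key is repeated in a dict literal, the last value wins, so only the tableA_col2 mapping ever reaches the endpoint. A quick demonstration using the same hypothetical column names:

# Duplicate keys in a dict literal are silently collapsed;
# the last assignment to "merged_col1" is the only one kept.
field_map = {"merged_col1": "tableA_col1", "merged_col1": "tableA_col2"}
print(field_map)  # {'merged_col1': 'tableA_col2'}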
Columns being merged must have matching types; e.g., a column of type int16 will merge successfully with another column of type int16, but attempting to merge it with a column of type int, int8, double, etc., will not work.
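As a pre-flight check, you could compare the declared types of the columns you plan to map together before calling the endpoint. The sketch below is purely illustrative; the column-type dictionaries are hypothetical stand-ins for whatever type metadata you have on hand (e.g., gathered via /show/table):

# Hypothetical column-type lookups for the two source tables.
tableA_types = {"col_x": "int16", "col_y": "int"}
tableB_types = {"col_z": "int16", "col_q": "double"}

def check_field_maps(field_maps, type_lookups):
    """Verify every merged column maps to source columns of one consistent type."""
    merged_types = {}
    for field_map, types in zip(field_maps, type_lookups):
        for new_col, source_col in field_map.items():
            source_type = types[source_col]
            if merged_types.setdefault(new_col, source_type) != source_type:
                raise ValueError(
                    "Type mismatch for %s: %s vs %s"
                    % (new_col, merged_types[new_col], source_type)
                )

# Raises ValueError: col_x (int16) and col_q (double) cannot share a merged column.
check_field_maps(
    [{"merged_col1": "col_x"}, {"merged_col1": "col_q"}],
    [tableA_types, tableB_types]
)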
Store-only columns cannot have merge records operations applied to them.
Collections cannot have merge records operations applied to them.
The merge records table does not get updated if the source data set(s) are updated.
The number of data sets in the source_table_names parameter must match the number of maps included in the field_maps parameter.
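A quick sanity check before calling the endpoint, using hypothetical variables shaped like the weather example above:

source_table_names = ["june_weather", "july_weather"]
field_maps = [
    {"date": "june_date_of_record"},
    {"date": "july_date_of_record"}
]

# Exactly one field map per source table, in the same order as source_table_names.
assert len(source_table_names) == len(field_maps), \
    "field_maps must contain exactly one map per source table"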
The order of the maps in the field_maps parameter needs to match the order of the data sets listed in the source_table_names parameter; e.g., the following is not valid:
gpudb.merge_records(
    table_name = "merged_table",
    source_table_names = ["table1", "table2"],
    field_maps = [
        {"merged_col1": "col1_table2"},
        {"merged_col1": "col1_table1"}
    ]
)
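For contrast, reordering the maps so they follow the order of source_table_names makes the same (hypothetical) call valid:

gpudb.merge_records(
    table_name = "merged_table",
    source_table_names = ["table1", "table2"],
    field_maps = [
        {"merged_col1": "col1_table1"},  # map for table1, listed first
        {"merged_col1": "col1_table2"}   # map for table2, listed second
    ]
)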