/admin/rebalance

URL: http://<db.host>:<db.port>/admin/rebalance

Rebalance the data in the cluster so that all nodes contain an equal number of records approximately and/or rebalance the shards to be equally distributed (as much as possible) across all the ranks.

The database must be offline for this operation, see /admin/offline

  • If /admin/rebalance is invoked after a change is made to the cluster, e.g., a host was added or removed, sharded data will be evenly redistributed across the cluster by number of shards per rank while unsharded data will be redistributed across the cluster by data size per rank
  • If /admin/rebalance is invoked at some point when unsharded data (a.k.a. randomly-sharded) in the cluster is unevenly distributed over time, sharded data will not move while unsharded data will be redistributed across the cluster by data size per rank

NOTE: Replicated data will not move as a result of this call

This endpoint's processing time depends on the amount of data in the system, thus the API call may time out if run directly. It is recommended to run this endpoint asynchronously via /create/job.

Input Parameter Description

NameTypeDescription
optionsmap of string to strings

Optional parameters. The default value is an empty map ( {} ).

Supported Parameters (keys)Parameter Description
rebalance_sharded_data

If true, sharded data will be rebalanced approximately equally across the cluster. Note that for clusters with large amounts of sharded data, this data transfer could be time consuming and result in delayed query responses. The default value is true. The supported values are:

  • true
  • false
rebalance_unsharded_data

If true, unsharded data (a.k.a. randomly-sharded) will be rebalanced approximately equally across the cluster. Note that for clusters with large amounts of unsharded data, this data transfer could be time consuming and result in delayed query responses. The default value is true. The supported values are:

  • true
  • false
table_includesComma-separated list of unsharded table names to rebalance. Not applicable to sharded tables because they are always rebalanced. Cannot be used simultaneously with table_excludes. This parameter is ignored if rebalance_unsharded_data is false.
table_excludesComma-separated list of unsharded table names to not rebalance. Not applicable to sharded tables because they are always rebalanced. Cannot be used simultaneously with table_includes. This parameter is ignored if rebalance_unsharded_data is false.
aggressivenessInfluences how much data is moved at a time during rebalance. A higher aggressiveness will complete the rebalance faster. A lower aggressiveness will take longer but allow for better interleaving between the rebalance and other queries. Valid values are constants from 1 (lowest) to 10 (highest). The default value is '10'.
compact_after_rebalance

Perform compaction of deleted records once the rebalance completes to reclaim memory and disk space. Default is true, unless repair_incorrectly_sharded_data is set to true. The default value is true. The supported values are:

  • true
  • false
compact_only

If set to true, ignore rebalance options and attempt to perform compaction of deleted records to reclaim memory and disk space without rebalancing first. The default value is false. The supported values are:

  • true
  • false
repair_incorrectly_sharded_data

Scans for any data sharded incorrectly and re-routes the data to the correct location. Only necessary if /admin/verifydb reports an error in sharding alignment. This can be done as part of a typical rebalance after expanding the cluster or in a standalone fashion when it is believed that data is sharded incorrectly somewhere in the cluster. Compaction will not be performed by default when this is enabled. If this option is set to true, the time necessary to rebalance and the memory used by the rebalance may increase. The default value is false. The supported values are:

  • true
  • false

Output Parameter Description

The GPUdb server embeds the endpoint response inside a standard response structure which contains status information and the actual response to the query. Here is a description of the various fields of the wrapper:

NameTypeDescription
statusString'OK' or 'ERROR'
messageStringEmpty if success or an error message
data_typeString'admin_rebalance_response' or 'none' in case of an error
dataStringEmpty string
data_strJSON or String

This embedded JSON represents the result of the /admin/rebalance endpoint:

NameTypeDescription
infomap of string to stringsAdditional information.

Empty string in case of an error.