/admin/rebalance

URL: http://<db.host>:<db.port>/admin/rebalance

Rebalance the data in the cluster so that all nodes contain approximately the same number of records, and/or rebalance the shards so that they are distributed as evenly as possible across all the ranks.

The database must be offline for this operation; see /admin/offline.

  • If /admin/rebalance is invoked after a change is made to the cluster, e.g., a host was added or removed, sharded data will be evenly redistributed across the cluster by number of shards per rank, while unsharded data will be redistributed across the cluster by data size per rank.
  • If /admin/rebalance is invoked after unsharded (a.k.a. randomly-sharded) data in the cluster has become unevenly distributed over time, sharded data will not move, while unsharded data will be redistributed across the cluster by data size per rank.

NOTE: Replicated data will not move as a result of this call.

This endpoint's processing time depends on the amount of data in the system, so the API call may time out if run directly. Running this endpoint asynchronously via /create/job is recommended.
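
As a rough illustration of the asynchronous approach, the sketch below wraps a /admin/rebalance request in a payload intended for /create/job. The field names used here (endpoint, request_encoding, data, data_str, options) and the helper wrap_in_job are assumptions made for this example; they are not described on this page and should be checked against the /create/job reference before use.

```python
import json


def wrap_in_job(endpoint, request):
    """Build a /create/job-style payload around another endpoint's request.

    All field names below are ASSUMED for illustration; this page does not
    document the /create/job request schema.
    """
    return {
        "endpoint": endpoint,                # endpoint to run asynchronously
        "request_encoding": "json",          # assumed: payload is JSON text
        "data": "",                          # assumed: unused for JSON payloads
        "data_str": json.dumps(request),     # the wrapped request as a string
        "options": {},
    }


# Wrap a rebalance request that uses only default options.
job_request = wrap_in_job("/admin/rebalance", {"options": {}})
```

The resulting job_request would then be submitted to /create/job in place of calling /admin/rebalance directly.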

Input Parameter Description

Name Type Description
options map of string to strings

Optional parameters. The default value is an empty map ( {} ).

Supported Parameters (keys) Parameter Description
rebalance_sharded_data

If true, sharded data will be rebalanced approximately equally across the cluster. Note that for clusters with large amounts of sharded data, this data transfer could be time consuming and result in delayed query responses. The default value is true. The supported values are:

  • true
  • false
rebalance_unsharded_data

If true, unsharded data (a.k.a. randomly-sharded) will be rebalanced approximately equally across the cluster. Note that for clusters with large amounts of unsharded data, this data transfer could be time consuming and result in delayed query responses. The default value is true. The supported values are:

  • true
  • false
table_includes Comma-separated list of unsharded table names to rebalance. Not applicable to sharded tables, which are always rebalanced. Cannot be used together with table_excludes. This parameter is ignored if rebalance_unsharded_data is false.
table_excludes Comma-separated list of unsharded table names to exclude from the rebalance. Not applicable to sharded tables, which are always rebalanced. Cannot be used together with table_includes. This parameter is ignored if rebalance_unsharded_data is false.
aggressiveness Influences how much data is moved at a time during rebalance. A higher aggressiveness will complete the rebalance faster. A lower aggressiveness will take longer but allow for better interleaving between the rebalance and other queries. Valid values are 1 (lowest) through 10 (highest). The default value is '10'.
compact_after_rebalance

Perform compaction of deleted records once the rebalance completes to reclaim memory and disk space. The default value is true, unless repair_incorrectly_sharded_data is set to true, in which case compaction is not performed by default. The supported values are:

  • true
  • false
compact_only

If set to true, ignore rebalance options and attempt to perform compaction of deleted records to reclaim memory and disk space without rebalancing first. The default value is false. The supported values are:

  • true
  • false
repair_incorrectly_sharded_data

Scans for any data sharded incorrectly and re-routes the data to the correct location. Only necessary if /admin/verifydb reports an error in sharding alignment. This can be done as part of a typical rebalance after expanding the cluster or in a standalone fashion when it is believed that data is sharded incorrectly somewhere in the cluster. Compaction will not be performed by default when this is enabled. If this option is set to true, the time necessary to rebalance and the memory used by the rebalance may increase. The default value is false. The supported values are:

  • true
  • false
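
The constraints described above can be checked on the client before a request is sent. The following is a minimal sketch, assuming the request is sent as a JSON body with a single options map; the helper name build_rebalance_options and the table names are hypothetical and used only for illustration.

```python
import json

# Options whose supported values are "true" and "false".
BOOLEAN_OPTIONS = {
    "rebalance_sharded_data",
    "rebalance_unsharded_data",
    "compact_after_rebalance",
    "compact_only",
    "repair_incorrectly_sharded_data",
}


def build_rebalance_options(**opts):
    """Return a string-to-string options map for /admin/rebalance."""
    # table_includes and table_excludes cannot be used simultaneously.
    if "table_includes" in opts and "table_excludes" in opts:
        raise ValueError("table_includes and table_excludes cannot be combined")

    options = {}
    for key, value in opts.items():
        if key in BOOLEAN_OPTIONS:
            if not isinstance(value, bool):
                raise ValueError(f"{key} must be True or False")
            options[key] = "true" if value else "false"
        elif key in ("table_includes", "table_excludes"):
            # Accept a list of names and join it into comma-separated form.
            options[key] = ",".join(value)
        elif key == "aggressiveness":
            valid = isinstance(value, int) and not isinstance(value, bool)
            if not valid or not 1 <= value <= 10:
                raise ValueError("aggressiveness must be an integer from 1 to 10")
            options[key] = str(value)
        else:
            raise ValueError(f"unsupported option: {key}")
    return options


# Hypothetical table names; the JSON body shape is an assumption.
request_body = json.dumps({"options": build_rebalance_options(
    rebalance_sharded_data=False,
    table_includes=["example.table_a", "example.table_b"],
    aggressiveness=3,
)})
```

Every value is converted to a string because options is a map of string to strings. Options left out of the map fall back to the defaults listed above.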

Output Parameter Description

The GPUdb server embeds the endpoint response inside a standard response structure which contains status information and the actual response to the query. Here is a description of the various fields of the wrapper:

Name Type Description
status String 'OK' or 'ERROR'
message String Empty if success or an error message
data_type String 'admin_rebalance_response' or 'none' in case of an error
data String Empty string
data_str JSON or String

This embedded JSON represents the result of the /admin/rebalance endpoint:

Name Type Description
info map of string to strings Additional information.

The data_str field is an empty string in case of an error.
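
A client can unwrap this structure by checking status first and decoding data_str only on success. The sketch below assumes the wrapper arrives as JSON text and that data_str holds either a JSON-encoded string or an already-decoded object; the function name parse_rebalance_response and the sample response are made up for illustration.

```python
import json


def parse_rebalance_response(raw):
    """Unwrap the standard response structure and return the info map."""
    wrapper = json.loads(raw)
    if wrapper.get("status") != "OK":
        # On error, message carries the reason and data_str is empty.
        raise RuntimeError(wrapper.get("message") or "request failed")

    payload = wrapper["data_str"]
    if isinstance(payload, str):
        # Assumed case: the embedded response is JSON encoded as a string.
        payload = json.loads(payload)
    return payload.get("info", {})


# Made-up sample wrapper, not real server output.
sample = json.dumps({
    "status": "OK",
    "message": "",
    "data_type": "admin_rebalance_response",
    "data": "",
    "data_str": json.dumps({"info": {}}),
})
info = parse_rebalance_response(sample)
```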