Controlling Shard Rebalance in Elasticsearch and OpenSearch

Shard rebalancing in Elasticsearch is the redistribution of index shards across the nodes of a cluster. It sometimes requires supervision to reach an ideal state. In this post, we discuss what shard rebalancing is and how it can be configured.

What is Shard Rebalancing?

An imbalanced Elasticsearch cluster is one in which shards of different sizes, different usage patterns, or both are concentrated on one or a few nodes, creating hotspots that often cause performance issues.

In Elasticsearch and OpenSearch, shard rebalancing is the process of redistributing shards across the cluster in order to restore balance. Automated shard rebalancing is enabled by default and can be triggered by events such as a node reaching the high disk watermark threshold (i.e. the node is approaching its storage limits), a node joining or leaving the cluster, and a few other cluster and node events.

Elasticsearch is constantly working to maintain a balance amongst the nodes within a cluster. In an ideal world, every node would have the exact same number of shards, every shard would store the same amount of data and have an equally distributed load. During the process of shard rebalancing, Elasticsearch attempts to achieve and maintain this equilibrium amongst nodes.

In practice, however, the rebalancing effort is rarely able to achieve such an equilibrium. Indices can be configured poorly and have too few shards to distribute evenly across the nodes in the cluster, or simply a number of shards that is not a multiple of the number of data nodes. Additionally, the rebalancer adheres to the cluster's allocation filtering and forced awareness settings, which can prevent it from achieving a true balance across nodes. Lastly, the shard rebalancing algorithm has traditionally been unsophisticated: it tries to balance only the number of shards, not their size or the indexing and search work they perform (newer Elasticsearch releases also weigh disk usage and write load, but they still cannot see every hotspot).

To achieve the best performance, it is not enough to balance the number of shards per node. An Elasticsearch cluster administrator needs to work towards detecting and dissolving cluster hotspots, while keeping shard balance and available disk space in mind.
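To make the difference between a balanced shard count and a balanced load concrete, here is a minimal Python sketch. The node names, index names, and shard sizes are invented purely for illustration. It is not how Elasticsearch computes balance internally.

```python
from collections import defaultdict

# Hypothetical shard placement: (node, index, shard size in GB).
# All names and sizes here are made up for illustration.
SHARDS = [
    ("node1", "logs-a", 50), ("node1", "logs-b", 45),
    ("node2", "logs-c", 5), ("node2", "logs-d", 4),
    ("node3", "logs-e", 6), ("node3", "logs-f", 3),
]


def summarize(shards):
    """Return {node: (shard_count, total_gb)} for the given placement."""
    counts = defaultdict(int)
    sizes = defaultdict(int)
    for node, _index, size_gb in shards:
        counts[node] += 1
        sizes[node] += size_gb
    return {node: (counts[node], sizes[node]) for node in counts}


summary = summarize(SHARDS)
for node, (count, total_gb) in sorted(summary.items()):
    print(f"{node}: {count} shards, {total_gb} GB")
```

Every node holds two shards, so a count-based view reports a perfectly balanced cluster, yet node1 stores roughly ten times more data than the others.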

Shard Allocation in Elasticsearch and OpenSearch

In order to understand how shard rebalancing works, one must understand shard allocation in Elasticsearch. Shard allocation is, simply put, the method by which Elasticsearch decides on which node each shard should reside, and shard rebalancing is just an attempt to optimize the allocation of shards to nodes.

Elasticsearch has over a dozen different allocation deciders that define how shards should be allocated across nodes. Shard rebalancing is subject to the rules defined in these deciders (or, more specifically, a subset of them such as the ClusterRebalanceAllocationDecider, DiskThresholdDecider, AwarenessAllocationDecider, etc.). As such, it is subject to both the cluster-level and index-level shard allocation and routing settings configured within the cluster.

For example, when a node meets or exceeds the high disk watermark threshold, a rebalancing event is triggered. Elasticsearch will attempt to move shards off this node in order to free up disk space and bring the node back below the disk threshold. In this instance, the DiskThresholdDecider decides whether the node Elasticsearch wants to move a shard to is viable. If moving a shard from the overloaded node to a new node would cause that node to exceed the high disk watermark threshold as well, the decider will ‘decide’ not to move the shard to that node. Elasticsearch will then look for another node to move the shard to instead.
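The decision described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the actual DiskThresholdDecider logic. The helper names and the candidate nodes are hypothetical, and the only value taken from Elasticsearch is the default high watermark of 90% disk usage.

```python
HIGH_WATERMARK = 0.90  # default high disk watermark (90% of disk used)


def would_exceed_high_watermark(used_gb, total_gb, shard_gb,
                                watermark=HIGH_WATERMARK):
    """True if adding a shard of shard_gb pushes the node past the watermark."""
    return (used_gb + shard_gb) / total_gb > watermark


def pick_target(nodes, shard_gb):
    """Return the first candidate node that can take the shard, or None."""
    for name, (used_gb, total_gb) in nodes.items():
        if not would_exceed_high_watermark(used_gb, total_gb, shard_gb):
            return name
    return None


# Hypothetical candidates: node name -> (used GB, total GB).
candidates = {"node2": (850, 1000), "node3": (400, 1000)}
target = pick_target(candidates, shard_gb=80)
print(target)
```

Here node2 is rejected because an 80 GB shard would take it to 93% usage, so the sketch falls through to node3, mirroring how Elasticsearch keeps searching for a viable target.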

For most use cases, the settings for rebalancing and shard allocation should be left at their default values. Improper configuration of these settings can lead to cluster-wide failures (e.g. disabling rebalancing can cause a single node to store a disproportionate amount of data and reach its maximum storage limitation while other nodes are nowhere near their limits).

Shard Rack Awareness

In addition to the shard allocation and routing settings, you can also configure shard allocation awareness, which is set at the cluster level and relies on custom attributes assigned to each node.

Shard allocation awareness refers to a set of configurations that allow you to fine-tune shard allocation in Elasticsearch and OpenSearch. It is a feature typically reserved for self-hosted clusters, intended to let you spread shard copies across ‘zones’. This is useful for disaster recovery: if ‘zone1’ has a blackout, ‘zone2’ can still function as needed. For example, if index1 has 1 shard and 1 replica and the nodes are tagged as belonging to zone1 or zone2, shard allocation awareness keeps the primary and its replica in different zones, so the index will still be searchable in the event of a failure of either zone (so long as both zones don’t fail simultaneously).

This feature is rarely useful, and often not available, on managed versions of Elasticsearch and OpenSearch.
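For self-hosted clusters, the following Python sketch builds a cluster settings body that turns on awareness. It assumes each node has already been started with a custom attribute (for example node.attr.zone), and the attribute name ‘zone’ and the values ‘zone1’ and ‘zone2’ are just the example names used above. The helper function itself is hypothetical. It only assembles the JSON and does not contact a cluster.

```python
import json


def awareness_settings(attribute, values=None):
    """Build a _cluster/settings body that enables allocation awareness.

    attribute: node attribute name (nodes must define node.attr.<attribute>).
    values: optional list of expected attribute values. When given, forced
            awareness is configured too, so that all copies are not packed
            into the surviving zone when another zone is lost.
    """
    settings = {"cluster.routing.allocation.awareness.attributes": attribute}
    if values:
        key = f"cluster.routing.allocation.awareness.force.{attribute}.values"
        settings[key] = ",".join(values)
    return {"persistent": settings}


body = awareness_settings("zone", ["zone1", "zone2"])
print("PUT _cluster/settings")
print(json.dumps(body, indent=2))
```

The printed body can be pasted into the console as a regular cluster settings update.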

When to Disable Shard Rebalancing

Though rebalancing should normally be enabled to allow the cluster to “breathe” and manage itself, there are instances where you may want to temporarily or permanently disable rebalancing events in order to prevent these resource-heavy operations from occurring. If you are performing server maintenance (e.g. installing security patches) on a node or set of nodes, for example, you may want to temporarily disable rebalancing during the maintenance period. This way Elasticsearch will not unnecessarily shuffle shards between nodes while some of them are down for maintenance.

One instance in which you may want to disable shard rebalancing, most likely permanently, is when your index lifecycle management (ILM) policy is built in such a way that indices are deleted and created daily in equal numbers. Since the cluster is generally balanced, it can maintain itself well most of the time, and you can trigger a rebalance manually from time to time to optimize shard distribution. Disabling automatic, continuous shard rebalancing in this case will usually improve performance during normal operation, because shard rebalancing is a resource-intensive operation that, in such cases, does not need to run on a regular basis.

If you choose to disable rebalancing, you can do so by modifying the cluster settings like so:

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.rebalance.enable": "none"
  }
}

To re-enable rebalancing, set the value to “all” or to null (setting a value to null restores the default for that setting).
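If you script these changes, a small helper can build the request body for each case. The following is a minimal Python sketch that only assembles the JSON. The function name is hypothetical, and the accepted values are those documented for cluster.routing.rebalance.enable.

```python
import json

# Values accepted by cluster.routing.rebalance.enable.
VALID_MODES = {"all", "primaries", "replicas", "none"}


def rebalance_settings(mode, scope="transient"):
    """Build a _cluster/settings body for cluster.routing.rebalance.enable.

    Passing mode=None serializes to JSON null, which resets the setting
    to its default value.
    """
    if mode is not None and mode not in VALID_MODES:
        raise ValueError(f"unsupported rebalance mode: {mode!r}")
    return {scope: {"cluster.routing.rebalance.enable": mode}}


disable = rebalance_settings("none")
reset = rebalance_settings(None)
print(json.dumps(disable))
print(json.dumps(reset))
```

Python's None becomes null in the serialized body, which is what the cluster settings API expects when restoring a default.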

Forcing a Manual Shard Rebalance

There are ways to force a shard rebalancing event to occur, but keep in mind that this is a resource-intensive operation, so take care when forcing a rebalance. One such method is to use the _cluster/reroute API. This API allows you to manually move a shard to the desired node, which then triggers a rebalance event so that Elasticsearch can maintain the cluster equilibrium. The API works as follows:

POST /_cluster/reroute?metric=none
{
  "commands": [
    {
      "move": {
        "index": "test", "shard": 0,
        "from_node": "node1", "to_node": "node2"
      }
    },
    {
      "allocate_replica": {
        "index": "test", "shard": 1,
        "node": "node3"
      }
    }
  ]
}

In this example the ‘move’ command is instructing Elasticsearch to move shard 0 from the index named test from node1 to node2. Additionally, the ‘allocate_replica’ command is instructing Elasticsearch to allocate unassigned replica shard 1 from index test to node3.

This API also allows you to perform a “dry run” by adding the ‘dry_run=true’ parameter to the request. This is recommended before performing a manual allocation, as you can see what the end result will be before any actual movement occurs. The response will contain the calculated state of the cluster after the commands and the subsequent rebalancing have been applied, without actually performing those actions.

Another feature of this API is that it allows you to retry failed allocations. This can be used in cases where an allocation failure was due to some temporary or transient issue with the cluster. Once the issue is resolved, you can simply retry the allocations using the ‘retry_failed’ parameter like so:

POST /_cluster/reroute?retry_failed=true
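When these calls are scripted, it helps to assemble the query string in one place so that a dry run and the real request differ by a single flag. The sketch below uses only the Python standard library. The function name is hypothetical, and it builds the request path without sending anything to a cluster.

```python
from urllib.parse import urlencode


def reroute_path(dry_run=False, retry_failed=False, explain=False):
    """Build the path and query string for a _cluster/reroute request."""
    params = {}
    if dry_run:
        params["dry_run"] = "true"  # simulate only, apply nothing
    if retry_failed:
        params["retry_failed"] = "true"  # retry shards whose allocation failed
    if explain:
        params["explain"] = "true"  # ask why each command can or cannot run
    query = urlencode(params)
    return "/_cluster/reroute" + (f"?{query}" if query else "")


print("POST " + reroute_path(dry_run=True, explain=True))
print("POST " + reroute_path(retry_failed=True))
```

Running the dry-run variant first and reviewing the response before dropping the flag keeps manual moves from causing surprises.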

Conclusion

Having a balanced cluster is important and necessary for your cluster to keep performing at its optimal level. By balancing shards across nodes, Elasticsearch / OpenSearch is able to distribute load and disk utilization across nodes. Manual configuration of rebalancing and allocation settings can be done, but it is not recommended for most use cases. Shard rebalancing is a resource-intensive process, so care should be taken when forcing a rebalancing event to occur.

The algorithm Elasticsearch uses to perform shard rebalancing is not perfect. Traditionally, it has looked only at the number of shards and has not taken into account key metrics such as the size of the shards. In some situations, it is even advisable to disable shard rebalancing completely, monitor the shard distribution manually, and fix things when needed.

If you are experiencing issues with shard allocation and/or rebalancing in your cluster and would like some assistance, check out our Pulse solution. It can offer insights into your cluster with actionable recommendations on items such as shard allocation. It also allows you to tap into world-class Elasticsearch experts to help with your needs. If you’re interested in learning more, please reach out to us.