Rolling Upgrade
Rolling upgrades, sometimes referred to as “node replacement upgrades,” can be performed on running clusters with virtually no downtime. Nodes are individually stopped and upgraded in place. Alternatively, nodes can be stopped and replaced, one at a time, by hosts running the new version. During this process, you can continue to index and query data in your cluster.
This document serves as a high-level, platform-agnostic overview of the rolling upgrade procedure. For specific examples of commands, scripts, and configuration files, refer to the Appendix.
Preparing to upgrade
Review Upgrading Lucenia for recommendations about backing up your configuration files and creating a snapshot of the cluster state and indexes before you make any changes to your Lucenia cluster.
Important: Lucenia nodes cannot be downgraded. If you need to revert the upgrade, you must perform a fresh installation of Lucenia and restore the cluster from a snapshot. Take a snapshot and store it in a remote repository before beginning the upgrade procedure.
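The snapshot precaution can be sketched as follows. This is a minimal Python example, assuming Lucenia exposes an OpenSearch-compatible snapshot API (PUT /_snapshot/<repository>/<snapshot>) and that a remote snapshot repository has already been registered; the repository and snapshot names are hypothetical placeholders. The helpers only build the request and check the response, so you can send the request with whatever HTTP client you already use.

```python
import json
from urllib.parse import quote


def snapshot_request(repository: str, snapshot: str) -> tuple[str, str, dict]:
    """Build (method, path, body) for a snapshot of all indexes and cluster state.

    Assumes an OpenSearch-compatible snapshot API and an already registered
    remote repository. wait_for_completion=true makes the call return only
    after the snapshot finishes, so the response carries its final state.
    """
    path = (
        f"/_snapshot/{quote(repository, safe='')}/{quote(snapshot, safe='')}"
        "?wait_for_completion=true"
    )
    body = {"indices": "*", "include_global_state": True}
    return "PUT", path, body


def snapshot_succeeded(response: dict) -> bool:
    """Return True if a wait_for_completion response reports state SUCCESS."""
    return response.get("snapshot", {}).get("state") == "SUCCESS"


# Hypothetical repository and snapshot names; substitute your own.
method, path, body = snapshot_request("my-remote-repo", "pre-upgrade")
print(method, path)  # PUT /_snapshot/my-remote-repo/pre-upgrade?wait_for_completion=true
print(json.dumps(body))
```

Including the global state keeps the cluster state alongside the indexes, which matches the recommendation to snapshot both before making changes.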
Performing the upgrade
- Verify the health of your Lucenia cluster before you begin. You should resolve any index or shard allocation issues prior to upgrading to ensure that your data is preserved. A status of green indicates that all primary and replica shards are allocated. See Cluster health for more information. The following command queries the _cluster/health API endpoint:
GET "/_cluster/health?pretty"
The response should look similar to the following example:
{
  "cluster_name" : "lucenia-dev-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 4,
  "number_of_data_nodes" : 4,
  "discovered_cluster_manager" : true,
  "active_primary_shards" : 1,
  "active_shards" : 4,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
- Disable replica shard allocation to prevent replica shards from being created while nodes are being taken offline. This avoids unnecessary copying of Lucene index segments between the nodes in your cluster. You can restrict allocation to primary shards by updating the cluster settings through the _cluster/settings API endpoint:
PUT "/_cluster/settings?pretty"
{
  "persistent": {
    "cluster.routing.allocation.enable": "primaries"
  }
}
The response should look similar to the following example:
{
  "acknowledged" : true,
  "persistent" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "enable" : "primaries"
        }
      }
    }
  },
  "transient" : { }
}
- Perform a flush operation on the cluster to commit transaction log entries to the Lucene index:
POST "/_flush?pretty"
The response should look similar to the following example:
{ "_shards" : { "total" : 4, "successful" : 4, "failed" : 0 } }
- Review your cluster and identify the first node to upgrade. Eligible cluster manager nodes should be upgraded last because a node running the new version can join a cluster whose cluster manager nodes run an older version, but a node still running the older version cannot join a cluster in which all cluster manager nodes run the newer version.
- Query the _cat/nodes endpoint to identify which node is the elected cluster manager, marked with an asterisk (*) in the cluster_manager column. The following command includes additional query parameters that request only the name, version, node.role, and cluster_manager columns:
GET "/_cat/nodes?v&h=name,version,node.role,cluster_manager" | column -t
The response should look similar to the following example:
name                       version  node.role  cluster_manager
lucenia-cluster-manager-1  0.1.0    dimr       *
lucenia-cluster-manager-0  0.1.0    dimr       -
lucenia-cluster-manager-2  0.1.0    dimr       -
lucenia-cluster-manager-3  0.1.0    dimr       -
- Stop the node you are upgrading, for example, lucenia-cluster-manager-0, and delete its container. Do not delete the volume associated with the container: the new Lucenia container will use the existing volume, and deleting the volume will result in data loss.
- Confirm that the node has left the cluster by querying the _cat/nodes API endpoint:
GET "/_cat/nodes?v&h=name,version,node.role,cluster_manager" | column -t
The response should look similar to the following example:
name                       version  node.role  cluster_manager
lucenia-cluster-manager-1  0.1.0    dimr       *
lucenia-cluster-manager-2  0.1.0    dimr       -
lucenia-cluster-manager-3  0.1.0    dimr       -
lucenia-cluster-manager-0 is no longer listed because the container has been stopped and deleted.
- Deploy a new container running the desired version of Lucenia and mapped to the same volume as the container you deleted.
- After Lucenia is running on the new node, query the _cat/nodes endpoint to confirm that the node has joined the cluster:
GET "/_cat/nodes?v&h=name,version,node.role,cluster_manager" | column -t
The response should look similar to the following example:
name                       version  node.role  cluster_manager
lucenia-cluster-manager-1  0.1.0    dimr       *
lucenia-cluster-manager-0  0.1.0    dimr       -
lucenia-cluster-manager-2  0.1.0    dimr       -
lucenia-cluster-manager-3  0.1.0    dimr       -
In the example output, the new Lucenia node reports a running version of 0.1.0 to the cluster. This is the result of the compatibility.override_main_response_version setting, which is used when connecting to a cluster with legacy clients that check for a version. You can manually confirm the version of the node by calling the /_nodes API endpoint, as in the following command. Replace <nodeName> with the name of your node. See Nodes API to learn more.
GET "/_nodes/<nodeName>?pretty=true" | jq -r '.nodes | .[] | "\(.name) v\(.version)"'
The response should look similar to the following example:
lucenia-cluster-manager-0 v0.2.0
- Repeat the preceding steps, from querying the _cat/nodes endpoint to identify the elected cluster manager through confirming that the new node has joined the cluster, for each remaining node. Remember to upgrade an eligible cluster manager node last. After replacing the last node, query the _cat/nodes API endpoint to confirm that all nodes have joined the cluster and are running the new version of Lucenia:
GET "/_cat/nodes?v&h=name,version,node.role,cluster_manager" | column -t
The response should look similar to the following example:
name                       version  node.role  cluster_manager
lucenia-cluster-manager-1  0.2.0    dimr       *
lucenia-cluster-manager-0  0.2.0    dimr       -
lucenia-cluster-manager-2  0.2.0    dimr       -
lucenia-cluster-manager-3  0.2.0    dimr       -
- Re-enable shard allocation for all shards:
PUT "/_cluster/settings?pretty"
{
  "persistent": {
    "cluster.routing.allocation.enable": "all"
  }
}
The response should look similar to the following example:
{
  "acknowledged" : true,
  "persistent" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "enable" : "all"
        }
      }
    }
  },
  "transient" : { }
}
- Confirm that the cluster is healthy:
GET "/_cluster/health?pretty"
The response should look similar to the following example:
{
  "cluster_name" : "lucenia-dev-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 4,
  "number_of_data_nodes" : 4,
  "discovered_cluster_manager" : true,
  "active_primary_shards" : 1,
  "active_shards" : 4,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
- The upgrade is now complete, and you can begin enjoying the latest features and fixes!
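The cluster-level checks in the steps above (health before and after the upgrade, the allocation toggle, and the flush) can be scripted. The following is a minimal Python sketch using only the standard library. It assumes an unauthenticated HTTP endpoint at a hypothetical base URL such as http://localhost:9200; a secured cluster would also need TLS and credentials, which are omitted here. The wait_for_status and timeout health parameters come from the OpenSearch-compatible cluster health API and are an assumption for Lucenia.

```python
import json
import urllib.request


def call(base_url, method, path, body=None):
    """Send one JSON request and return the decoded JSON response.

    TLS and authentication are deliberately omitted from this sketch.
    """
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        base_url + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def allocation_body(mode):
    """Body for PUT /_cluster/settings: "primaries" before the upgrade, "all" after."""
    if mode not in ("primaries", "all"):
        raise ValueError(f"unexpected allocation mode for this procedure: {mode}")
    return {"persistent": {"cluster.routing.allocation.enable": mode}}


def allocation_applied(response, mode):
    """True if the settings update was acknowledged with the expected value."""
    enable = (
        response.get("persistent", {})
        .get("cluster", {})
        .get("routing", {})
        .get("allocation", {})
        .get("enable")
    )
    return response.get("acknowledged") is True and enable == mode


def cluster_is_healthy(health):
    """Green, not timed out, and no shards unassigned or relocating."""
    return (
        health.get("status") == "green"
        and health.get("timed_out") is False
        and health.get("unassigned_shards") == 0
        and health.get("relocating_shards") == 0
    )


def flush_ok(response):
    """A flush succeeded if no shard copy reported a failure."""
    return response["_shards"]["failed"] == 0


# Example wiring against a live cluster (not executed here):
# base = "http://localhost:9200"  # hypothetical endpoint
# health = call(base, "GET", "/_cluster/health?wait_for_status=green&timeout=60s")
# assert cluster_is_healthy(health)
# ack = call(base, "PUT", "/_cluster/settings", allocation_body("primaries"))
# assert allocation_applied(ack, "primaries")
# assert flush_ok(call(base, "POST", "/_flush"))
```

Restricting allocation_body to the two values this procedure uses is a deliberate choice to catch typos; the underlying setting accepts other values as well.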
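The node-level checks in the steps above (finding the elected cluster manager, choosing a replacement order, and confirming that every node has rejoined on the new version) can be sketched in Python by parsing the same _cat/nodes output. In this minimal sketch, placing the currently elected cluster manager at the very end goes slightly beyond the guidance to upgrade eligible cluster manager nodes last; it is a design choice intended so that, in principle, only one cluster manager election is needed. The assumption that the letter "m" in node.role marks a cluster-manager-eligible node comes from OpenSearch's role abbreviations.

```python
def parse_cat_nodes(text):
    """Parse output of _cat/nodes?v&h=name,version,node.role,cluster_manager.

    The first line holds the column headers. Splitting on whitespace works
    because none of these four columns contain spaces.
    """
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    header, values = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in values]


def elected_manager(nodes):
    """Name of the node marked with * in the cluster_manager column, or None."""
    return next((n["name"] for n in nodes if n["cluster_manager"] == "*"), None)


def upgrade_order(nodes):
    """Order node names for replacement, one at a time.

    Nodes that are not cluster-manager-eligible come first, eligible nodes
    after them, and the currently elected cluster manager last. Assumes "m"
    in node.role marks an eligible node, as in OpenSearch's role letters.
    """
    elected = elected_manager(nodes)

    def sort_key(node):
        eligible = "m" in node["node.role"]
        return (eligible, node["name"] == elected, node["name"])

    return [n["name"] for n in sorted(nodes, key=sort_key)]


def not_upgraded(nodes, expected_names, version):
    """Expected names that are missing from the listing or report another version."""
    seen = {n["name"]: n["version"] for n in nodes}
    return sorted(name for name in expected_names if seen.get(name) != version)


before = """
name                       version  node.role  cluster_manager
lucenia-cluster-manager-1  0.1.0    dimr       *
lucenia-cluster-manager-0  0.1.0    dimr       -
lucenia-cluster-manager-2  0.1.0    dimr       -
lucenia-cluster-manager-3  0.1.0    dimr       -
"""
nodes = parse_cat_nodes(before)
print(elected_manager(nodes))  # lucenia-cluster-manager-1
print(upgrade_order(nodes))
# ['lucenia-cluster-manager-0', 'lucenia-cluster-manager-2',
#  'lucenia-cluster-manager-3', 'lucenia-cluster-manager-1']
```

Because _cat/nodes can report an overridden version through compatibility.override_main_response_version, a not_upgraded check based on this output is best paired with a per-node check against the /_nodes API.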
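The jq filter used in the version check above can also be expressed in Python, which may help where jq is unavailable or when you want to check every node in one response (for example, by calling /_nodes without a node name). This minimal sketch parses an already fetched _nodes response; the sample is trimmed to the fields the filter reads, and its node ID key is a hypothetical placeholder.

```python
import json


def node_versions(nodes_response):
    r"""Python equivalent of the jq filter '.nodes | .[] | "\(.name) v\(.version)"'.

    The _nodes response keys each node by its node ID, so iterate over the values.
    """
    return [f"{n['name']} v{n['version']}" for n in nodes_response["nodes"].values()]


def reports_version(nodes_response, version):
    """True if the response lists at least one node and every node reports version."""
    nodes = nodes_response["nodes"].values()
    return bool(nodes) and all(n["version"] == version for n in nodes)


# Trimmed sample response; the node ID key is a hypothetical placeholder.
sample = json.loads("""
{
  "_nodes": {"total": 1, "successful": 1, "failed": 0},
  "cluster_name": "lucenia-dev-cluster",
  "nodes": {
    "example-node-id": {"name": "lucenia-cluster-manager-0", "version": "0.2.0"}
  }
}
""")
for line in node_versions(sample):
    print(line)  # lucenia-cluster-manager-0 v0.2.0
```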