Issue:
Sometimes Solr cannot establish a leader for a shard. The options below are ways to try to force one to come up. The FORCELEADER command is notorious for not successfully forcing a leader.
Environment:
Solr
Resolution:
There are a few things to take note of here, but we can start with the simplest way out of the situation. It is often best to stop updates during this time.
Option 1 - DELETE/ADDREPLICAs
1) Figure out a node you want to establish as a leader: Look at the terms node to find the replica with the highest term value. If the terms are not in sync, use the replica with the highest number. You MUST also confirm that your chosen leader's index is not corrupt; corrupt indexes are OFTEN responsible for leadership issues. The only way to repair a corrupt index is with Lucene's CheckIndex tool, and you will likely lose documents/segments in the process.
2) Delete Replicas: Use DELETEREPLICA to get the shard down to 1 replica, then restart that one node. It should establish itself as leader because it is the only possible candidate.
3) Add Replicas Back: Use ADDREPLICA to re-create all the replicas you deleted.
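As a rough sketch of Option 1, the Python below picks the replica with the highest term and lists the Collections API calls to issue. It assumes the terms node content is a flat JSON map of core node name to integer term. The base URL, collection, shard, and replica names are placeholders, and `option_one_plan` is a hypothetical helper, not part of Solr. It only builds URLs; it does not send requests or check the index for corruption.

```python
import json
from urllib.parse import urlencode

# Placeholder base URL (assumption); point this at a live Solr node.
SOLR = "http://localhost:8983/solr"


def pick_highest_term(terms_json: str) -> str:
    """Return the core node name with the highest term.

    Assumes a flat map such as {"core_node3": 5, "core_node5": 4}.
    On a tie, the first replica listed wins.
    """
    terms = json.loads(terms_json)
    return max(terms, key=terms.get)


def collections_url(action: str, **params: str) -> str:
    """Build a Collections API URL for the given action."""
    query = urlencode({"action": action, **params})
    return f"{SOLR}/admin/collections?{query}"


def option_one_plan(collection, shard, terms_json, replica_nodes):
    """Return (kept replica, DELETEREPLICA URLs, ADDREPLICA URLs).

    replica_nodes maps core node name -> Solr node name, as in state.json.
    """
    keep = pick_highest_term(terms_json)
    others = sorted(r for r in replica_nodes if r != keep)
    deletes = [
        collections_url("DELETEREPLICA", collection=collection, shard=shard, replica=r)
        for r in others
    ]
    adds = [
        collections_url("ADDREPLICA", collection=collection, shard=shard, node=replica_nodes[r])
        for r in others
    ]
    return keep, deletes, adds
```

Issue the DELETEREPLICA calls first, restart the node hosting the kept replica and confirm it has become leader, and only then issue the ADDREPLICA calls.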
These steps are less feasible with a large index and many replicas because of the massive amount of data that must be recopied to the replicas. In that case, consider Option 2.
Option 2 - Restarting Nodes
1) Figure out a node you want to establish as a leader: Same as above
2) Restart Nodes: Bring all the nodes down, then restart only your chosen leader.
3) Wait: The leader will see that other replicas are part of the shard and will wait for them to come up. Make sure they do not come up until the leader is up and has established itself as leader.
4) Start the non-leaders: Bring up the non-leader nodes. They should sync with the leader and come online.
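The wait in step 3 can be scripted by polling the Collections API CLUSTERSTATUS action until the chosen replica is reported as the active leader. This is a minimal sketch: `fetch_status` is a hypothetical callable that returns the parsed CLUSTERSTATUS JSON, and the timeout and poll interval are arbitrary values, not Solr defaults.

```python
import time
from typing import Callable, Optional


def active_leader(cluster_status: dict, collection: str, shard: str) -> Optional[str]:
    """Return the core node name of the shard's active leader, or None."""
    replicas = (
        cluster_status.get("cluster", {})
        .get("collections", {})
        .get(collection, {})
        .get("shards", {})
        .get(shard, {})
        .get("replicas", {})
    )
    for name, info in replicas.items():
        # CLUSTERSTATUS marks the leader with the string "true".
        if info.get("leader") == "true" and info.get("state") == "active":
            return name
    return None


def wait_for_leader(
    fetch_status: Callable[[], dict],
    collection: str,
    shard: str,
    expected: str,
    timeout_s: float = 300.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until `expected` is the active leader; False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if active_leader(fetch_status(), collection, shard) == expected:
            return True
        sleep(poll_s)
    return False
```

Start the non-leader nodes only after `wait_for_leader` returns True for the replica you chose in step 1.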
Option 3 - Reestablishing a proper node structure
Background: Sometimes the reason Solr is not establishing a leader is a mismatch between the election nodes, the leader node, state.json, and the live nodes. This often happens when a replica somewhere has a corrupt index: it cannot open a new searcher, so it fails to create its node and the election fails. The top "election" node is the one that should win the election, and the "leader" node should point to the same replica, which should also be the leader reflected in state.json. If these do not match, you can restore a proper order with a minimal number of restarts.
Note that the "election" nodes and the "leader" node are ephemeral (they disappear when the Solr node that owns them is restarted). One way to repair the structure is therefore to restart the Solr nodes that own the ephemeral nodes causing the mismatch. Solr can then right the ship automatically.
1) Establish whether there is a mismatch between state.json, the live nodes, and the election nodes. If there is, you may be able to DELETE a corrupt replica or restart a particular node to reestablish a proper setup.
This is often a little more complicated to understand, so the easiest route is usually Option 1 or Option 2. In general, though, Option 3 can be the least invasive.
Election Node: (Note the ephemeral owner. If this value is 0, the node is not ephemeral.)
Leader Node: (ephemeral)
Terms Node (state.json is right above it): neither of these is an ephemeral node.
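The consistency check in Option 3 can be sketched as a comparison of the three views of the leader plus the live nodes. The function below is illustrative only. It assumes you have already read the head of the election queue, the replica named in the leader node, and the leader recorded in state.json, each expressed as a core node name. `find_mismatches` and its parameters are hypothetical names, not a Solr API.

```python
from typing import Dict, List, Optional, Set


def find_mismatches(
    election_head: Optional[str],
    leader_znode: Optional[str],
    state_leader: Optional[str],
    replica_nodes: Dict[str, str],
    live_nodes: Set[str],
) -> List[str]:
    """Return a list of problems; an empty list means every view agrees.

    replica_nodes maps core node name -> Solr node name, as in state.json.
    """
    views = {
        "election head": election_head,
        "leader znode": leader_znode,
        "state.json leader": state_leader,
    }
    problems = []
    for label, replica in views.items():
        if replica is None:
            problems.append(f"{label} is missing")
        elif replica_nodes.get(replica) not in live_nodes:
            problems.append(f"{label} {replica} is not on a live node")
    # All views that name a replica should name the same one.
    named = {r for r in views.values() if r is not None}
    if len(named) > 1:
        problems.append(
            "views disagree: " + ", ".join(f"{k}={v}" for k, v in views.items())
        )
    return problems
```

Each reported problem points at a replica or Solr node that is a candidate for the targeted delete or restart described in step 1.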