Issue:
This can happen when you set a tight socketTimeout on a query from a SolrJ CloudSolrClient. A bad query that can bring down your server may be retried on another server, then after the next socketTimeout on yet another server, and so on. You should use timeAllowed to control this behavior rather than socketTimeout; timeAllowed should be less than the socketTimeout setting.
Environment:
Solr
Resolution:
You should use timeAllowed to control this behavior rather than relying on socketTimeout; timeAllowed should be significantly less than the socketTimeout setting. You could also write something like the following pseudo-code to take control of the retry behavior of CloudSolrClient (a sketch of the timeAllowed approach follows the usage example below):
import java.io.IOException;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.LBHttpSolrClient;

class FixedRetriesLBHttpSolrClient extends LBHttpSolrClient {
....
    // Default cap on how many servers a single request may be tried against.
    private Integer maxNumRetries = null;

    public void setMaxNumRetries(Integer val) {
        this.maxNumRetries = val;
    }

    @Override
    public Rsp request(Req req) throws SolrServerException, IOException {
        // Rebuild the request with an explicit numServersToTry so retries are bounded.
        Integer potentiallyOverriddenRetries = determineNumRetries(req);
        Req reqWithRetries = new Req(req.getRequest(), req.getServers(), potentiallyOverriddenRetries);
        return super.request(reqWithRetries);
    }

    private Integer determineNumRetries(Req req) {
        // Respect an explicit per-request limit; otherwise fall back to the configured default.
        return req.getNumServersToTry() != null ? req.getNumServersToTry() : maxNumRetries;
    }
}
...
FixedRetriesLBHttpSolrClient lbClient = ...;
lbClient.setMaxNumRetries(3);
CloudSolrClient queryClient = new CloudSolrClient.Builder(
        Collections.singletonList("localhost:2181"), Optional.of("/solr"))
    .withLBHttpSolrClient(lbClient)
    .build();
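For the timeAllowed approach itself, here is a minimal sketch of pairing a short timeAllowed with a much larger socketTimeout. It assumes a SolrJ 7.x-style builder (withSocketTimeout on CloudSolrClient.Builder); the ZooKeeper address, collection name, and the 5-second/30-second values are illustrative only:

import java.util.Collections;
import java.util.Optional;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
...
// Generous client-side socket timeout, so a slow-but-healthy response is not
// abandoned and retried on another server (illustrative value).
CloudSolrClient client = new CloudSolrClient.Builder(
        Collections.singletonList("localhost:2181"), Optional.of("/solr"))
    .withSocketTimeout(30000)
    .build();

// Server-side limit well below the socket timeout: Solr stops the query after
// 5 seconds and returns partial results instead of letting the client time out.
SolrQuery query = new SolrQuery("*:*");
query.setTimeAllowed(5000);

client.query("mycollection", query);  // hypothetical collection name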
Cause:
This is caused by CloudSolrClient, upon receiving a socketTimeout (or other exception), passing the query on to the other servers in its list to try. This is circumvented by using the timeAllowed query parameter. You could also control this behavior by using your own modified LBHttpSolrClient, as shown above.