Issue
Sometimes a single query can bring your whole system down. Generally what happens is that something about the query causes a huge increase in memory requirements that your system can't handle, putting it eventually into an OOM situation or Full GCs stopping the world potentially extremely frequently.
Environment
Solr
Resolution
There are generally two ways to attack this - a head dump, or log analysis.
1) Get a heap dump. You can look at the thread and it has an object called SolrRequestParsers$1 and it's MultiMapSolrParams to take a look at the query parameters from the thread. Unfortunately you need to catch the query to still be executing for this to be relevant, which may bring us down to options 2 and 3
2. You can find this through logs, but you have to update your logging to a debug message to catch this unfortunately in the future. Here we have to set the Request Logger to DEBUG (yes, this may cause an issue, because you will be logging two messages per query and resetting your logs is probably wise
Now, we can be a bit clever by looking at the logs. What we know is that for every query that comes in, it will be logged twice by the same thread. Now, we can do the following call:
cmd> grep -F '/select' solr.log | grep "INFO" | awk '{print $4, $(NF-3)}' | sort > info.out
(qtp1458091526-100) params={q=*:*&debug=true&json.nl=arrarr&echoParams=all&fl=score,*&start=0&rows=10&wt=json&qid=13}
(qtp1458091526-101) params={q=*:*&debug=true&json.nl=arrarr&echoParams=all&fl=score,*&start=0&rows=10&wt=json&qid=7}
(qtp1458091526-17) params={q=*:*&debug=true&json.nl=arrarr&echoParams=all&fl=score,*&start=0&rows=10&wt=json&qid=19}
This will print the query along with the thread id that served it for the INFO messages that were successful. Now we can:
cmd> grep -F '/select' solr.log | grep "DEBUG" | awk '{print $4, $NF}' | sort > debug.out
debug.out should if things are working right, be identical. Now we can:
cmd> diff -w info.out debug.out
10d9
< (qtp1458091526-22) params={q=*:*&debug=true&json.nl=arrarr&echoParams=all&fl=score,*&start=0&rows=10&wt=json&qid=0}
3. There is yet another easier way we can skin this cat through the logs. We can:
cmd> grep -F '/select' solr.log | awk '{print $4, $13}' | sort | uniq -c
2 (qtp1458091526-100) params={q=*:*&debug=true&json.nl=arrarr&echoParams=all&fl=score,*&start=0&rows=10&wt=json&qid=13}
2 (qtp1458091526-101) params={q=*:*&debug=true&json.nl=arrarr&echoParams=all&fl=score,*&start=0&rows=10&wt=json&qid=7}
1 (qtp1458091526-17) params={q=*:*&debug=true&json.nl=arrarr&echoParams=all&fl=score,*&start=0&rows=10&wt=json&qid=19}
Theoretically, if we had the same thread ID serving the exact same unfinished query, then this would give us an even number. So this is fallible, but not extremely likely. Your logs may need different awk tokenization. It defaults to spaces, just count up the number of tokens and use that number for the field you want.
Comments
0 comments
Article is closed for comments.