Goal:
Approximately how much space is each of my fields using in the fieldCache? Some deployments hold a lot of data in the fieldCache. The current best practice is to keep the fieldCache empty and use docValues instead, but the command below gives an approximate estimate of how much space each field takes up in the fieldCache. It only counts entries reported in MB, though it could be adapted to count KB entries as well.
Environment:
Solr with fields that do not use docValues and therefore consume too much fieldCache
Guide:
Use this command when you are examining the fieldCache entries, which can be retrieved from either the metrics handler or Plugin Stats.
Note: the fieldCache is node-wide, so these stats are the same no matter which core you view them from. The entries are also per field only; they do not show which collection a field belongs to.
Save the entries to a file (here, fieldCache.ent). Example entries:
.....
entry#8:'org.apache.lucene.index.SegmentCoreReaders@160b4b5f'=>'field1',class org.apache.lucene.index.SortedDocValues,0.5=>org.apache.lucene.uninverting.FieldCacheImpl$SortedDocValuesImpl#1291985840 (size =~ 3.2 MB)
entry#9:'org.apache.lucene.index.SegmentCoreReaders@160b4b5f'=>'field2',class org.apache.lucene.index.SortedDocValues,0.5=>org.apache.lucene.uninverting.FieldCacheImpl$SortedDocValuesImpl#886403925 (size =~ 27.6 MB)
.....
Command:
cmd> awk -F '=>' '{print $2, $3}' fieldCache.ent | awk -F ',' '{print $1, $3}' | egrep -o "\'.*\'|size.*" | sed 's/size =~//g' | sed -e s/\)//g | xargs -n 3 echo | grep 'MB' | awk '{counts[$1] += $2} END {for (key in counts) printf("%i MB %s\n", counts[key], key)}' | sort -rn
This gives you the total size per field, summed across all matching entries and truncated to whole MB. The output looks like:
50 MB <field1>
40 MB <field2>
30 MB <field3>
If your fieldCache is too large, these are the first fields to consider switching to docValues.
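As noted in the Goal, the command above ignores entries that are not reported in MB. The following is a minimal sketch of one way to include them, not a tested replacement for the command above. It assumes the entry layout shown in the example (a quoted field name after the first "=>", and a trailing "(size =~ N UNIT)") and that the unit is KB, MB, or GB; entries with any other unit are skipped. The function name sum_fieldcache is made up for this example.

```shell
# Sketch: sum fieldCache entry sizes per field, converting KB and GB to MB.
# Reads entry lines on stdin. Assumes the "=>'field',... (size =~ N UNIT)"
# layout from the example entries; lines that do not match are skipped.
sum_fieldcache() {
  awk -v q="'" '
    {
      # Field name: the quoted token right after the first "=>".
      if (!match($0, "=>" q "[^" q "]*" q)) next
      field = substr($0, RSTART + 3, RLENGTH - 4)

      # Size and unit from the "(size =~ 3.2 MB)" suffix.
      if (!match($0, /size =~ [0-9.]+ [KMG]B/)) next
      split(substr($0, RSTART + 8, RLENGTH - 8), part, " ")
      size = part[1] + 0
      if (part[2] == "KB") size /= 1024
      else if (part[2] == "GB") size *= 1024

      total[field] += size
    }
    END {
      for (f in total) printf("%.1f MB %s\n", total[f], f)
    }
  ' | sort -rn
}
```

Usage: sum_fieldcache < fieldCache.ent. Unlike the original command, this prints one decimal place instead of truncating to whole MB, so small fields do not all show up as 0 MB.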