Issue
When searching for part numbers ending in uppercase "S" (e.g., AB12CDES), Solr stems the term to ab12cde during query parsing, but the term is stored in the index as ab12cdes, resulting in no matches. The search term fails to match even when enclosed in double quotes.
Diagnosis
This issue arises from a mismatch in the order of filters used in the Solr schema’s index-time and query-time analyzers. Specifically, in the index analyzer for the _text_ field (typically using the text_general type), the LowerCaseFilterFactory is applied after the stemmer, whereas in the query analyzer, it is applied before the stemmer.
At query time, the lowercase filter runs before the stemmer: the analyzer converts AB12CDES to ab12cdes, and the stemmer then reduces it to ab12cde because it treats a trailing lowercase "s" as a plural suffix. At index time, the stemmer runs first and sees the original uppercase token, which it leaves unchanged; the lowercase filter then produces ab12cdes. The query token ab12cde therefore does not match the indexed form ab12cdes.
Environment
- Fusion 5.x
- Solr schema field type: text_general
- Solr field name: _text_
- Issue occurs when stemming is enabled at both index and query time
Cause
The inconsistency in filter application order leads to different tokenization between indexing and querying. Specifically:
- Index-time analyzer applies the lowercase filter after stemming.
- Query-time analyzer applies the lowercase filter before stemming.
Because query terms are lowercased before they are stemmed, the stemmer strips the trailing "s" at query time, while the same term was indexed with its "s" intact.
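For illustration, a field type with this mismatch might look roughly like the sketch below. It is not copied from an actual Fusion schema: the tokenizer, the positionIncrementGap value, and the stemmer class (solr.EnglishMinimalStemFilterFactory) are assumptions, since the stemmer in use is not named here. Only the relative order of the stemmer and LowerCaseFilterFactory reflects the problem described above.

```xml
<!-- Illustrative sketch only; tokenizer and stemmer class are assumptions. -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Stemmer runs first and sees the original token AB12CDES -->
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    <!-- Lowercasing happens after stemming, yielding ab12cdes -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Lowercasing happens first: AB12CDES becomes ab12cdes -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Stemmer now sees a lowercase trailing "s" and removes it -->
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
</fieldType>
```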
Resolution
To fix the mismatch and preserve expected tokenization, reorder the query-time analysis chain to apply the LowerCaseFilterFactory after the stemmer, matching the index-time analyzer.
Update the query analyzer in the Solr schema configuration as follows:
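A minimal sketch of the reordered query analyzer might look like the following. The tokenizer and the stemmer class (solr.EnglishMinimalStemFilterFactory) are assumptions used for illustration; keep whatever tokenizer, stemmer, and other filters your schema already defines, and change only the relative order of the stemmer and LowerCaseFilterFactory.

```xml
<!-- Illustrative sketch only; tokenizer and stemmer class are assumptions. -->
<analyzer type="query">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <!-- Stemmer now runs before lowercasing, matching the index-time order -->
  <filter class="solr.EnglishMinimalStemFilterFactory"/>
  <!-- Lowercase filter moved after the stemmer -->
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```

Schema changes typically take effect only after the collection is reloaded.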
Note: Always validate schema changes in a lower environment before applying to production.