Issue
When configuring the Highlight stage in a Fusion query pipeline, the snippet length parameter does not consistently control the size of the returned text. The fragSize setting is ignored in some cases, resulting in highlights that are either very short or excessively long.
Diagnosis
To confirm this behavior:
Run the query with the debug=true parameter enabled to review the Solr query parameters being sent.
Review Solr logs to verify whether the hl.fragSize parameter is applied as expected.
Check whether the highlighting stage includes any configuration for hl.bs.type. By default, Solr uses SENTENCE as the break iterator type, which can cause inconsistent snippet lengths when the highlighted text does not contain sentence boundaries.
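The check for hl.bs.type can also be scripted against the JSON response. The following is a minimal sketch, assuming the response echoes its request parameters under responseHeader.params (Solr does this according to the handler's echoParams setting). The helper names and the sample response are hypothetical and only illustrate the shape; they are not output from a real system. Note that the Solr Reference Guide spells the fragment size parameter hl.fragsize.

```python
def echoed_highlight_params(response):
    """Return the hl and hl.* parameters echoed in a Solr response header."""
    params = response.get("responseHeader", {}).get("params", {})
    return {k: v for k, v in params.items() if k == "hl" or k.startswith("hl.")}


def break_iterator_type(response):
    """Return the echoed hl.bs.type, or None when the request did not set it."""
    return echoed_highlight_params(response).get("hl.bs.type")


# Illustrative response shape only (assumption), not captured from a real system.
sample = {
    "responseHeader": {
        "status": 0,
        "params": {"q": "text:solr", "hl": "true", "hl.fragsize": "100"},
    }
}

if __name__ == "__main__":
    print(echoed_highlight_params(sample))
    print(break_iterator_type(sample))
```

If break_iterator_type returns None, the request did not set the parameter, so Solr's default (SENTENCE) is in effect unless solrconfig.xml overrides it.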
Environment
Fusion 5.9.1
Solr 9.1.1
Cause
This issue occurs because Solr’s Unified Highlighter uses a break iterator type to divide text before applying the snippet length (fragSize). By default, the iterator type is set to SENTENCE. If the text does not break cleanly into sentences, the configured fragment size may not be respected.
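To see why the break iterator type matters, consider the toy model below. It is not Solr's implementation: it assumes, for illustration only, that a fragment ends at the first boundary at or beyond the requested size, and it approximates sentence and word boundaries with simple regular expressions. The function names are hypothetical.

```python
import re


def boundaries(text, bs_type):
    """Return candidate break offsets for a simplified break iterator."""
    if bs_type == "CHARACTER":
        return list(range(1, len(text) + 1))
    if bs_type == "WORD":
        # Break after each run of non-whitespace characters.
        ends = [m.end() for m in re.finditer(r"\S+", text)]
    elif bs_type == "SENTENCE":
        # Break after '.', '!' or '?' when followed by whitespace or end of text.
        ends = [m.end() for m in re.finditer(r"[.!?](?=\s|$)", text)]
    else:
        raise ValueError(f"unsupported type in this sketch: {bs_type}")
    # The end of the text is always a boundary.
    if not ends or ends[-1] != len(text):
        ends.append(len(text))
    return ends


def first_fragment(text, fragsize, bs_type):
    """Cut the text at the first boundary at or beyond fragsize characters."""
    for end in boundaries(text, bs_type):
        if end >= fragsize:
            return text[:end]
    return text


# Text with no sentence punctuation, such as a product title or keyword list.
text = "alpha beta gamma delta epsilon zeta eta theta"

if __name__ == "__main__":
    for bs_type in ("CHARACTER", "WORD", "SENTENCE"):
        fragment = first_fragment(text, 12, bs_type)
        print(bs_type, len(fragment), repr(fragment))
```

With a requested size of 12, the CHARACTER model returns exactly 12 characters, the WORD model runs to the end of the word in progress, and the SENTENCE model returns the whole text because it finds no sentence boundary before the end.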
Resolution
Option 1: Override with a query pipeline stage
Add a set-params stage to explicitly set the break iterator type for highlighting. This approach limits the change to a single pipeline. Example configuration:
{
  "id": "highlight-breakiterator",
  "params": [
    {
      "key": "hl.bs.type",
      "value": "CHARACTER",
      "policy": "replace"
    }
  ],
  "type": "set-params",
  "skip": false,
  "label": "Set highlight breakiterator type"
}
Values supported for hl.bs.type include CHARACTER and WORD. Using CHARACTER ensures that snippet length is consistently applied by character count.
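Before changing the pipeline, it can help to try a break iterator type directly against Solr. The sketch below only builds the query string and sends nothing. The field name body_t and the helper name are hypothetical. The parameter names and the list of accepted hl.bs.type values follow the Solr Reference Guide for the Unified Highlighter, which spells the fragment size parameter hl.fragsize.

```python
from urllib.parse import urlencode

# Values the Solr Reference Guide lists for hl.bs.type.
SUPPORTED_BS_TYPES = {"SEPARATOR", "SENTENCE", "WORD", "CHARACTER", "LINE", "WHOLE"}


def highlight_query_string(query, field, fragsize, bs_type):
    """Build a URL-encoded Solr query string with explicit highlight settings."""
    if bs_type not in SUPPORTED_BS_TYPES:
        raise ValueError(f"unsupported hl.bs.type: {bs_type}")
    params = {
        "q": query,
        "hl": "true",
        "hl.method": "unified",
        "hl.fl": field,
        "hl.fragsize": fragsize,
        "hl.bs.type": bs_type,
    }
    return urlencode(params)


if __name__ == "__main__":
    # body_t is a hypothetical field name; substitute a field from your schema.
    print(highlight_query_string("body_t:solr", "body_t", 100, "CHARACTER"))
```

Append the resulting string to your collection's /select URL and compare the snippets returned for CHARACTER, WORD, and SENTENCE on the same query.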
Option 2: Update Solr default configuration
If you want consistent highlighting behavior across all pipelines, modify Solr’s solrconfig.xml to set the default break iterator type:
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="hl.bs.type">WORD</str>
  </lst>
</requestHandler>
This change applies to every collection that uses the modified solrconfig.xml and to every pipeline that queries those collections through the /select handler. Use this approach if you want uniform snippet length handling across your deployment.