Goal:
Fusion offers several ways to implement spell check functionality. A common one is Solr's index-based spell check, which uses an edit-distance algorithm to find indexed terms that closely match a misspelled query term.
It returns alternatives for misspelled terms in the spellcheck suggestions section of the response. However, it does not auto-correct the query or merge corrected results into the main response.
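To make distance-based matching concrete, here is a minimal Python sketch of Levenshtein edit distance and of picking suggestions within a maximum number of edits. It is an illustration only: Solr uses its own internal implementation, and the function names and the sample vocabulary are assumptions made for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or match at no cost
        prev = curr
    return prev[-1]


def suggest(term, vocabulary, max_edits=2):
    """Return vocabulary terms within max_edits of term, closest first."""
    scored = sorted((edit_distance(term, word), word) for word in vocabulary)
    # Distance 0 is an exact match, so it is not offered as a correction.
    return [word for distance, word in scored if 0 < distance <= max_edits]


# Hypothetical indexed terms, for illustration only.
vocabulary = ["package", "packet", "tour", "passage"]
print(suggest("packge", vocabulary))  # ['package', 'packet']
```

Here "package" is one edit away from "packge" and "packet" is two, so both fall within a limit of 2 edits, while "passage" (three edits) and "tour" do not.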
Environment:
Fusion 3.x and above
Guide:
- In the Fusion UI, select the main application (RakeshTestSpellCheck in this example) and identify the fields you’d like to run text search and spell check on. Typical fields are title, description, name, etc.
- Check the field type of these fields. The field type should not apply stemming and should use a less aggressive tokenizer. As of Solr 7.x, under the default managed schema, “text_general” is a suitable type. If a field has a type other than “text_general”, you can create a copy field and enable spell check on the copy field.
- The spell-check-enabled field should be multivalued.
- Once the fields are decided, make the changes in solrconfig.xml.
(The section below is already present in solrconfig.xml, so in most cases you only need to verify it and check that the “field” value of the spellchecker points to the field you chose.)
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_general</str>
  <!-- Multiple "Spell Checkers" can be declared and used by this component -->
  <!-- a spellchecker built from a field of the main index -->
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">_text_</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- the spellcheck distance measure used, the default is the internal levenshtein -->
    <str name="distanceMeasure">internal</str>
    <!-- minimum accuracy needed to be considered a valid spellcheck suggestion -->
    <float name="accuracy">0.5</float>
    <!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
    <int name="maxEdits">2</int>
    <!-- the minimum shared prefix when enumerating terms -->
    <int name="minPrefix">1</int>
    <!-- maximum number of inspections per result. -->
    <int name="maxInspections">5</int>
    <!-- minimum length of a query term to be considered for correction -->
    <int name="minQueryLength">4</int>
    <!-- maximum threshold of documents a query term can appear to be considered for correction -->
    <float name="maxQueryFrequency">0.01</float>
    <!-- uncomment this to require suggestions to occur in 1% of the documents
    <float name="thresholdTokenFrequency">.01</float>
    -->
  </lst>
  <!-- a spellchecker that can break or combine words. See "/spell" handler below for usage -->
  <!--
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">name</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
  </lst>
  -->
</searchComponent>
<!-- A request handler for demonstrating the spellcheck component.
     NOTE: This is purely as an example. The whole purpose of the
     SpellCheckComponent is to hook it into the request handler that
     handles your normal user queries so that a separate request is
     not needed to get suggestions.
     IN OTHER WORDS, THERE IS REALLY GOOD CHANCE THE SETUP BELOW IS
     NOT WHAT YOU WANT FOR YOUR PRODUCTION SYSTEM!
     See http://wiki.apache.org/solr/SpellCheckComponent for details
     on the request parameters.
-->
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <!-- Solr uses suggestions from the 'default' spellchecker here.
         If the 'wordbreak' spellchecker is also enabled, add a second
         spellcheck.dictionary entry for it; Solr then combines the
         suggestions, and collations (re-written queries) can include
         corrections from both spellcheckers -->
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.alternativeTermCount">5</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">10</str>
    <str name="spellcheck.maxCollations">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
- Note the spellcheck.build=true parameter in the first request below. It is needed only once, to build the spellcheck index from the main Solr index. Building takes time, so do not send it with every request. (solr.DirectSolrSpellChecker reads terms directly from the main index, so the build step matters mainly for spellcheckers that keep a separate index, such as solr.IndexBasedSpellChecker.)
- To return spellcheck suggestions along with the search results payload, configure the request handlers for both “select” and “spell”.
- Lastly, send the spell check requests and check the responses.
- First, build the spellcheck index:
http://34.83.233.21:6764/api/apps/RakeshTestSpellCheck/query-pipelines/RakeshTestSpellCheck/collections/RakeshTestSpellCheck/spell?echoParams=all&wt=json&json.nl=arrarr&debug=timing&debug=query&debug=results&fl=score,*&sort&start=0&spellcheck=true&spellcheck.collate=true&spellcheck.build=true
- Now send the actual query:
http://34.83.233.21:6764/api/apps/RakeshTestSpellCheck/query-pipelines/RakeshTestSpellCheck/collections/RakeshTestSpellCheck/spell?echoParams=all&wt=json&json.nl=arrarr&debug=timing&debug=query&debug=results&fl=score,*&sort&start=0&spellcheck.q=Tour%20packge&spellcheck=true
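The two requests above can also be assembled programmatically. The following Python sketch builds the one-time build URL and the per-query URL with only the spellcheck-related parameters; the base URL is the example endpoint from this article, and the helper name build_spell_url is an assumption for illustration, not part of Fusion or Solr.

```python
from urllib.parse import quote, urlencode

# Example endpoint from this article; replace host, app, pipeline and collection with your own.
BASE = ("http://34.83.233.21:6764/api/apps/RakeshTestSpellCheck/query-pipelines/"
        "RakeshTestSpellCheck/collections/RakeshTestSpellCheck/spell")


def build_spell_url(base, user_query=None, build=False):
    """Return a spell check request URL (hypothetical helper).

    build=True adds spellcheck.build=true, which should be sent once,
    not with every user query.
    """
    params = [("wt", "json"), ("spellcheck", "true"), ("spellcheck.collate", "true")]
    if build:
        params.append(("spellcheck.build", "true"))
    if user_query is not None:
        params.append(("spellcheck.q", user_query))
    # quote_via=quote encodes spaces as %20, matching the URLs in this article.
    return base + "?" + urlencode(params, quote_via=quote)


print(build_spell_url(BASE, build=True))      # one-time build request
print(build_spell_url(BASE, "Tour packge"))   # regular spell check request
```

Keeping the build flag as an explicit argument makes it harder to send spellcheck.build=true with regular traffic by accident.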
Limitations:
- Spell check should not be performed on a query that has already had Fusion synonyms or query rewrites applied to it. Within the query pipeline, it is best practice to save the original query under a separate parameter, such as “orig_q” or “originalQueryTerms”, and perform spell checking on that parameter.
- Due to the behavior of the Solr spell check feature, it is best practice to identify potentially expensive queries and not perform spell check on them. Queries may be unfit for the spell check feature due to characteristics such as:
  - a query longer than 100 characters
  - a significant number of changes in character type (alpha, numeric, other)
  - a significant number of term delimiters in the query (spaces and commas)
  - more than 7 consecutive digits in the query string
  - more than 7 consecutive non-alphanumeric characters in the query string
- Apply Solr synonyms before spell check.
- Include only specific fields in the spellcheck field: for the “field” parameter, ONLY include fields with content known to be spelled correctly and with clear terms.
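The checks for potentially expensive queries and the practice of keeping the original query can be sketched in Python as follows. This is a client-side illustration, not Fusion's pipeline stage API. The limits of 100 characters and 7 consecutive characters come from the list above; the thresholds for "significant" type changes and delimiters are assumptions, as are the function names. The parameter name orig_q is the one suggested above.

```python
import re

MAX_LENGTH = 100        # from the list above
MAX_RUN = 7             # from the list above: digit and non-alphanumeric runs
MAX_TYPE_CHANGES = 10   # assumption: the article only says "significant"
MAX_DELIMITERS = 10     # assumption: the article only says "significant"


def char_type(ch):
    """Classify a character as alpha, numeric, or other."""
    if ch.isalpha():
        return "alpha"
    if ch.isdigit():
        return "numeric"
    return "other"


def should_spellcheck(query):
    """Return False for queries that look too expensive to spell check."""
    if len(query) > MAX_LENGTH:
        return False
    if re.search(r"\d{%d,}" % (MAX_RUN + 1), query):      # more than 7 digits in a row
        return False
    if re.search(r"[\W_]{%d,}" % (MAX_RUN + 1), query):   # more than 7 non-alphanumerics in a row
        return False
    types = [char_type(ch) for ch in query]
    type_changes = sum(1 for a, b in zip(types, types[1:]) if a != b)
    if type_changes > MAX_TYPE_CHANGES:
        return False
    delimiters = query.count(" ") + query.count(",")
    return delimiters <= MAX_DELIMITERS


def spellcheck_params(params):
    """Keep the original query in orig_q and spell check only that value."""
    out = dict(params)
    original = out.setdefault("orig_q", out.get("q", ""))
    if should_spellcheck(original):
        out["spellcheck"] = "true"
        out["spellcheck.q"] = original
    else:
        out["spellcheck"] = "false"
    return out


print(spellcheck_params({"q": "Tour packge"}))
```

Because spellcheck.q is always taken from orig_q, later rewrites of q (synonyms, query rewrites) do not affect what gets spell checked.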