Best Practices: Auto-suggest Tips for Single Words

There are four common ways to do auto-suggest: 

1) Download terms from the search index 
There are two techniques for this: facet queries and the terms component. A facet query runs a search and returns the terms from all documents that match that search, sorted by the number of matching documents that contain each term. This is not often used, as it is slower than the following techniques. The terms component walks the terms dictionary in the index directly and reports, for each term in a field, the number of documents that contain it. It is very fast because it does not filter against searches. Functionally it is the same as pulling facets for "all documents".
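
As a rough illustration of the two download styles, the sketch below only builds the request URLs. The host, core name ("music"), and field name ("artist") are hypothetical, and it assumes a request handler exposing the terms component is registered at /terms in solrconfig.xml.

```python
from urllib.parse import urlencode

# Hypothetical host, core, and field; replace with values from your own setup.
SOLR_BASE = "http://localhost:8983/solr/music"
FIELD = "artist"


def terms_url(limit=100):
    """URL for the terms component: reads the term dictionary of FIELD directly."""
    params = {
        "terms": "true",
        "terms.fl": FIELD,       # field whose terms are listed
        "terms.limit": limit,    # maximum number of terms returned
        "terms.sort": "count",   # most frequent terms first
        "wt": "json",
    }
    return SOLR_BASE + "/terms?" + urlencode(params)


def facet_url(query="*:*", limit=100):
    """URL for a facet query: counts FIELD terms in documents matching `query`."""
    params = {
        "q": query,              # "*:*" matches all documents
        "rows": 0,               # only the facet counts are needed, not the documents
        "facet": "true",
        "facet.field": FIELD,
        "facet.limit": limit,
        "wt": "json",
    }
    return SOLR_BASE + "/select?" + urlencode(params)
```

With the default "*:*" query the facet request covers all documents, which is the case where the two approaches return equivalent counts. Passing a narrower query to facet_url restricts the terms to the matching documents.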

2) Wildcards on facets
In this technique, when the user types 'mad', the auto-suggest runs a wildcard search 'mad*' against the facet values and finds 'madonna'. Facets are sorted by total number of hits, so artists are ordered by the number of documents they appear in. To boost a particular artist, the index has to include that artist multiple times.
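
The ordering and boosting behaviour can be sketched in plain Python, without Solr. This is only a simulation of counting facet values and matching a prefix; the "artist" field and the sample documents are made up for illustration.

```python
from collections import Counter


def suggest_from_facets(docs, prefix, limit=5):
    """Return (artist, document_count) pairs matching `prefix`, most frequent first."""
    wanted = prefix.lower()
    # Facet count = number of documents carrying each artist value.
    counts = Counter(doc["artist"] for doc in docs)
    matches = [(artist, n) for artist, n in counts.items() if artist.startswith(wanted)]
    # Sort by descending count; break ties alphabetically for stable output.
    matches.sort(key=lambda pair: (-pair[1], pair[0]))
    return matches[:limit]


# Hypothetical sample index.
docs = [
    {"id": 1, "artist": "madonna"},
    {"id": 2, "artist": "madonna"},
    {"id": 3, "artist": "madness"},
    {"id": 4, "artist": "mad season"},
    {"id": 5, "artist": "moby"},
]

suggestions = suggest_from_facets(docs, "mad")

# "Boosting" an artist means indexing more documents that carry it.
boosted_docs = docs + [{"id": 6, "artist": "madness"}, {"id": 7, "artist": "madness"}]
boosted = suggest_from_facets(boosted_docs, "mad")
```

In the first call 'madonna' ranks first because two documents carry it. After two more 'madness' documents are added, 'madness' moves to the top, which is the only lever this technique offers for boosting.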

3) SpellCheckComponent 
This technique uses the spellchecking component to provide spelling suggestions. The spellchecker supplies variations on a word. For example, when the user types 'mod' on a music site, the spelling suggestions may include 'Madonna', 'Modern Jazz Quartet', and 'OMD'. This is currently the most commonly used technique. There are a few options for where to get the suggested words and for how the variations are created. These are all documented in the Solr wiki page for the SpellCheckComponent:

http://wiki.apache.org/solr/SpellCheckComponent

4) Ternary Search Tree (TST) 
A Ternary Search Tree is a memory-only data structure. It is popular for high-volume implementations of auto-suggest. Netflix, for example, deploys several auto-suggest servers using a custom TST implementation. There is also a TST implementation in Solr.
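
To show how prefix lookup works in such a tree, here is a minimal sketch of a ternary search tree. It is not the Solr or Netflix implementation; the class and method names are chosen for illustration, and it stores plain words with no weights or ranking.

```python
class Node:
    """One character of a stored word, with three child links."""
    __slots__ = ("ch", "lo", "eq", "hi", "is_word")

    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None  # smaller char / next char / larger char
        self.is_word = False                # True if a word ends at this node


class TernarySearchTree:
    def __init__(self):
        self.root = None

    def insert(self, word):
        if word:
            self.root = self._insert(self.root, word, 0)

    def _insert(self, node, word, i):
        ch = word[i]
        if node is None:
            node = Node(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, word, i)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, word, i)
        elif i + 1 < len(word):
            node.eq = self._insert(node.eq, word, i + 1)
        else:
            node.is_word = True
        return node

    def _find(self, prefix):
        """Return the node matching the last character of `prefix`, or None."""
        node, i = self.root, 0
        while node is not None:
            ch = prefix[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            else:
                i += 1
                if i == len(prefix):
                    return node
                node = node.eq
        return None

    def suggest(self, prefix):
        """Return all stored words starting with `prefix`, in sorted order."""
        results = []
        node = self._find(prefix) if prefix else None
        if node is None:
            return results
        if node.is_word:
            results.append(prefix)
        self._collect(node.eq, prefix, results)
        return results

    def _collect(self, node, prefix, results):
        # In-order walk: smaller characters, this character, larger characters.
        if node is None:
            return
        self._collect(node.lo, prefix, results)
        if node.is_word:
            results.append(prefix + node.ch)
        self._collect(node.eq, prefix + node.ch, results)
        self._collect(node.hi, prefix, results)


tree = TernarySearchTree()
for word in ["madonna", "madness", "moby", "mad", "modern"]:
    tree.insert(word)
```

Calling tree.suggest("mad") walks down to the node for the final 'd' and then collects everything beneath it. A production version would typically also store a weight per word so that suggestions can be ranked by popularity instead of alphabetically.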

Grant Ingersoll (Chief Architect at LucidWorks) interviewed Walter Underwood about implementing autocomplete for Netflix: 
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Podcasts-and-Videos/Interview-Walter-Underwood

Notes on filtering: 

The TST component, terms component and dictionary-based spellchecker do not do any filtering. That is, they do not allow different users to see different subsets of the full word list. This makes them faster, but not usable when different searchers should see different subsets of the available suggestions. Examples include multi-tenant data in one index, role-based security that controls access to documents, and content personalized for users by country or language. The DirectSpellChecker (the default in Solr 4.x and the trunk) can do filtering because it searches directly against the main index. And, of course, facets can also do this.
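
The difference between an unfiltered word list and a filtered lookup can be illustrated with a small simulation. This is not Solr code; the "tenant" and "title" fields and the sample documents are hypothetical.

```python
def global_suggestions(docs, prefix):
    """Dictionary-style lookup: every matching term in the index, no access checks."""
    return sorted({d["title"] for d in docs if d["title"].startswith(prefix)})


def filtered_suggestions(docs, prefix, tenant):
    """Index-backed lookup: only terms from documents this tenant may see."""
    visible = [d for d in docs if d["tenant"] == tenant]
    return global_suggestions(visible, prefix)


# Hypothetical multi-tenant index.
docs = [
    {"tenant": "acme", "title": "merger plan"},
    {"tenant": "acme", "title": "menu"},
    {"tenant": "globex", "title": "meeting notes"},
]

everyone = global_suggestions(docs, "me")
acme_only = filtered_suggestions(docs, "me", "acme")
```

The unfiltered list exposes 'meeting notes' to every user, including users of the "acme" tenant who cannot open that document. The filtered lookup avoids the leak but has to consult the documents on each request, which is why it is slower.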
