How to implement Multilingual Search using Solr

I am going to show you how to implement multilingual search in Solr. This solution assumes that we know the language each incoming document is written in and the language the query is likely to be in.

This use case is fairly common. For example, I might index news from different countries (I know where each news article originated), and at search time I know which region the user is searching from.

Indexing

I index each document into the field for its language. Solr's default schema.xml provides field types with language-specific analyzers for several languages (for example text_en and text_de), so we define one field per language using these field types and index each incoming document into the matching field.

<field name="body_en" type="text_en" indexed="true" stored="true" multiValued="true"/>

<field name="body_de" type="text_de" indexed="true" stored="true" multiValued="true"/>

You can define additional fields for other languages.
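Field definitions for further languages follow the same body_<lang> / text_<lang> pattern. The Python sketch below (the helper name `language_field_xml` is hypothetical) prints one `<field>` element per language code. It assumes a matching `text_<lang>` field type, such as `text_fr` or `text_es`, is already defined in your schema.xml, so check the schema before pasting the output in.

```python
from xml.sax.saxutils import quoteattr


def language_field_xml(lang: str) -> str:
    """Return a <field> element for a language code, e.g. "en" -> body_en.

    Hypothetical helper; assumes a text_<lang> field type exists in schema.xml.
    """
    name = quoteattr(f"body_{lang}")   # quoteattr adds the surrounding quotes
    ftype = quoteattr(f"text_{lang}")
    return (f'<field name={name} type={ftype} '
            'indexed="true" stored="true" multiValued="true"/>')


if __name__ == "__main__":
    # Print definitions for English, German and two additional languages.
    for lang in ["en", "de", "fr", "es"]:
        print(language_field_xml(lang))
```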

Each document must then be indexed into the field for its language.
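A minimal sketch of this routing step in Python, assuming each document arrives with a known language code, the schema's uniqueKey field is `id`, and only `body_en` and `body_de` are defined. The `to_solr_doc` helper, the `SUPPORTED_LANGS` set and the sample texts are hypothetical. The printed JSON array is the kind of payload you could send to Solr's JSON update handler; the sketch itself does not contact a server.

```python
import json

# Hypothetical: languages that have a body_<lang> field in the schema.
SUPPORTED_LANGS = {"en", "de"}


def to_solr_doc(doc_id: str, text: str, lang: str) -> dict:
    """Build a Solr document that puts the text into its language's field."""
    if lang not in SUPPORTED_LANGS:
        # Refuse to index into a field the schema does not define.
        raise ValueError(f"no body_{lang} field defined for language {lang!r}")
    return {"id": doc_id, f"body_{lang}": text}


if __name__ == "__main__":
    docs = [
        to_solr_doc("1", "Fusion research reaches a milestone", "en"),
        to_solr_doc("2", "Kernfusion erreicht einen Meilenstein", "de"),
    ]
    # JSON array of documents, ready to be POSTed to an update handler.
    print(json.dumps(docs, ensure_ascii=False, indent=2))
```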

Querying

At query time I know which region the user is searching from. I search across all language fields but boost matches in the region's native language so that those documents rank at the top.

Suppose a user from an English-speaking region searches for "fusion".

This is what my Solr query parameters would look like (the // comments are annotations, not part of the request):

&q=fusion

&defType=edismax   // Use the Extended DisMax (edismax) query parser

&qf=body_en^5 body_de^1   // Boost English documents in this case
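The boost can be chosen per request. The Python sketch below builds the same edismax parameters from the user's region; the region-to-language table, the boost values 5 and 1, and the helper names are assumptions for illustration. `urlencode` escapes the space and the `^` characters in `qf`, which a raw URL requires.

```python
from urllib.parse import urlencode

# Hypothetical region-to-language mapping; extend it for your own regions.
REGION_LANG = {"US": "en", "GB": "en", "DE": "de", "AT": "de"}
LANGS = ["en", "de"]           # languages that have a body_<lang> field
NATIVE_BOOST, OTHER_BOOST = 5, 1


def build_qf(region: str) -> str:
    """Give the region's native-language field a higher boost than the rest."""
    native = REGION_LANG.get(region)   # None for unknown regions
    return " ".join(
        f"body_{lang}^{NATIVE_BOOST if lang == native else OTHER_BOOST}"
        for lang in LANGS
    )


def build_query(user_query: str, region: str) -> str:
    """Return an encoded query string for an edismax search."""
    params = {"q": user_query, "defType": "edismax", "qf": build_qf(region)}
    return urlencode(params)   # space -> '+', '^' -> '%5E'


if __name__ == "__main__":
    print(build_query("fusion", "US"))
```

For a user in the US, `build_query("fusion", "US")` yields `q=fusion&defType=edismax&qf=body_en%5E5+body_de%5E1`, which can be appended to your collection's `/select` URL. For a region missing from the table, every field gets the same boost, so no language is favored.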

And that's it: you can search across all languages and adjust the boosts dynamically based on the user's region.

The book Solr in Action by Trey Grainger and Timothy Potter also has a great chapter on this topic.

Happy searching!