Goal
Determine the number of tokens in a user query during query processing and dynamically adjust the query logic (e.g., switching between AND and OR operators) based on the token count. This is particularly useful when optimizing for both precision and recall in multilingual environments.
Environment
Fusion 5.5.1+
Guide
To adjust query behavior based on the number of tokens in the user query, use the Solr analysis API within a custom JavaScript query pipeline stage. This allows you to retrieve token information in real time and make conditional modifications to the query request.
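The conditional part of this approach can be sketched as a small pure function, kept separate from any HTTP or pipeline code. The function name chooseQueryParams and its threshold argument are illustrative assumptions, not part of Fusion or Solr; the values (fewer than 4 tokens, mm of 100% or 70%) mirror the example snippet in Step 2 and should be tuned for your data.

```javascript
// Hypothetical helper: maps a token count to Solr query parameters.
// The default threshold of 4 and the mm values follow the Step 2 example.
function chooseQueryParams(tokenCount, threshold) {
    var limit = (typeof threshold === "number") ? threshold : 4;
    if (tokenCount < limit) {
        // Short query: require every term to match
        return { "q.op": "AND", "mm": "100%" };
    }
    // Longer query: relax matching to improve recall
    return { "q.op": "OR", "mm": "70%" };
}

console.log(chooseQueryParams(3)["q.op"]); // prints "AND"
console.log(chooseQueryParams(6)["q.op"]); // prints "OR"
```

Keeping the decision in one function makes it easy to unit test the thresholds without a running Solr instance.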
Step 1: Make a call to the Solr Analysis API
Use the Solr /analysis/field endpoint to analyze the input text using a specific field type. This ensures tokenization aligns with Solr’s internal processing, including filters like stopwords and stemming.
Example API request:
GET /api/solr/<collection-name>/analysis/field?wt=json
&analysis.fieldvalue=How to find a splunk forwarder
&analysis.fieldtype=text_en
Replace:
- <collection-name> with the name of the Solr collection
- analysis.fieldvalue with the user query
- analysis.fieldtype with the appropriate Solr field type based on the query language
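Because the user query can contain spaces and reserved characters, the request URL is best assembled with each value URL-encoded. The following is a minimal sketch; buildAnalysisUrl is a hypothetical helper name, and the path prefix is copied from the example request above.

```javascript
// Hypothetical helper: builds the analysis request path with encoded values.
function buildAnalysisUrl(collection, fieldValue, fieldType) {
    return "/api/solr/" + encodeURIComponent(collection) + "/analysis/field?wt=json" +
        "&analysis.fieldvalue=" + encodeURIComponent(fieldValue) +
        "&analysis.fieldtype=" + encodeURIComponent(fieldType);
}

console.log(buildAnalysisUrl("my_collection", "How to find a splunk forwarder", "text_en"));
// prints "/api/solr/my_collection/analysis/field?wt=json&analysis.fieldvalue=How%20to%20find%20a%20splunk%20forwarder&analysis.fieldtype=text_en"
```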
The API response returns a breakdown of tokens at each stage of analysis. Use the last element of the index array for the final, fully processed token list (e.g., output from filters like PorterStemFilter).
Step 2: Parse the token count in a JavaScript query pipeline stage
Add a JavaScript query stage to your pipeline that makes the Solr analysis API call, parses the JSON response, and counts the number of tokens from the final stage of analysis.
Example JavaScript snippet:
// Assumes an HTTP helper is available to the stage as "http";
// substitute your environment's HTTP client if it is not.
var http = require("http");
var fieldValue = request.queryParams.q; // Or wherever the query string is sourced
var lang = request.queryParams.lang || "en"; // Assumes language is passed in
var fieldTypeMap = {
    "en": "text_en",
    "de": "text_de",
    "fr": "text_fr"
};
var fieldType = fieldTypeMap[lang] || "text_en";
var analysisUrl = "/api/solr/my_collection/analysis/field?wt=json" +
    "&analysis.fieldvalue=" + encodeURIComponent(fieldValue) +
    "&analysis.fieldtype=" + encodeURIComponent(fieldType);
var response = http.get(analysisUrl);
// Parse the body if the client returns it as a JSON string
if (typeof response === "string") {
    response = JSON.parse(response);
}
var tokens = [];
if (response && response.analysis && response.analysis.field_types && response.analysis.field_types[fieldType]) {
    var indexArray = response.analysis.field_types[fieldType].index;
    var finalStageTokens = indexArray[indexArray.length - 1]; // Use last filter stage
    tokens = finalStageTokens.map(function(token) {
        return token.text;
    });
}
if (tokens.length < 4) {
    // Apply AND logic
    request.queryParams.mm = "100%";
    // Bracket notation: "q.op" is a single parameter name, not a property of q
    request.queryParams["q.op"] = "AND";
} else {
    // Apply OR logic with mm
    request.queryParams.mm = "70%";
    request.queryParams["q.op"] = "OR";
}
Notes:
- Ensure the JavaScript stage is placed before any parsing or Solr query execution stages.
- This approach works in environments with multiple languages by mapping query language to the appropriate Solr field type.
- Field types must be configured in your Solr schema with appropriate analyzers and filters.
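The token extraction can be exercised on its own against a stored response before wiring it into a pipeline. The sketch below uses a hand-written, abbreviated sample (not captured from a real Solr instance, with intermediate filter stages omitted) and assumes the index array alternates analyzer class names and token lists; extractFinalTokens is a hypothetical helper name. Verify the layout against the response from your own Solr version.

```javascript
// Hand-written sample in the assumed shape of an analysis response.
var sampleResponse = {
    analysis: {
        field_types: {
            text_en: {
                index: [
                    "org.apache.lucene.analysis.standard.StandardTokenizer",
                    [{ text: "How" }, { text: "to" }, { text: "find" },
                     { text: "a" }, { text: "splunk" }, { text: "forwarder" }],
                    "org.apache.lucene.analysis.en.PorterStemFilter",
                    [{ text: "how" }, { text: "find" }, { text: "splunk" }, { text: "forward" }]
                ]
            }
        }
    }
};

// Hypothetical helper: returns token texts from the last analysis stage,
// or an empty array when the expected structure is missing.
function extractFinalTokens(analysisResponse, fieldType) {
    var fieldTypes = analysisResponse && analysisResponse.analysis &&
        analysisResponse.analysis.field_types;
    var entry = fieldTypes && fieldTypes[fieldType];
    if (!entry || !Array.isArray(entry.index) || entry.index.length === 0) {
        return [];
    }
    var last = entry.index[entry.index.length - 1];
    if (!Array.isArray(last)) {
        return [];
    }
    return last.map(function (token) { return token.text; });
}

console.log(extractFinalTokens(sampleResponse, "text_en").length); // prints 4
console.log(extractFinalTokens(sampleResponse, "text_de").length); // prints 0
```

Returning an empty array on a malformed or missing response means the caller sees a token count of zero, so decide explicitly which query logic should apply in that case.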
Step 3: Configure field type mapping
For multilingual support, maintain a mapping from language codes to Solr field types. Each language should have a field type defined with proper analyzers, such as text_en, text_de, etc.
Update the field type map in the JavaScript stage to match your schema configuration.
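One way to keep the mapping robust is to normalize the incoming language code before the lookup and fall back to a default field type. This is a sketch only: resolveFieldType is a hypothetical helper, and reducing a tag such as "en-US" to its primary subtag is an assumption about how language values reach the pipeline.

```javascript
// Example mapping; the field type names must exist in your Solr schema.
var fieldTypeMap = {
    "en": "text_en",
    "de": "text_de",
    "fr": "text_fr"
};

// Hypothetical helper: resolves a language code to a field type,
// falling back to a default when the language is missing or unmapped.
function resolveFieldType(lang, map, fallback) {
    // Lowercase and keep only the primary subtag ("en-US" becomes "en")
    var key = String(lang || "").toLowerCase().split(/[-_]/)[0];
    if (Object.prototype.hasOwnProperty.call(map, key)) {
        return map[key];
    }
    return fallback;
}

console.log(resolveFieldType("de", fieldTypeMap, "text_en"));    // prints "text_de"
console.log(resolveFieldType("en-US", fieldTypeMap, "text_en")); // prints "text_en"
console.log(resolveFieldType("ja", fieldTypeMap, "text_en"));    // prints "text_en"
```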
Additional tips
- Use the REST Query pipeline stage if you prefer to externalize the analysis API call outside of the JavaScript stage logic.
- For debugging, inspect the full token output of the analysis API to understand how various filters (e.g., StopFilter, LowerCaseFilter, PorterStemFilter) transform the input text.
- When extracting tokens, always use the last filter stage as the source of truth unless your use case requires otherwise.
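For the debugging tip above, a per-stage summary makes it easier to see where tokens are dropped or rewritten. The sketch below assumes the index array alternates analyzer class names and token lists, which should be confirmed against a real response; summarizeStages is a hypothetical helper and the sample data is hand-written.

```javascript
// Hypothetical helper: pairs each stage name with the token texts it produced.
function summarizeStages(indexArray) {
    var stages = [];
    for (var i = 0; i + 1 < indexArray.length; i += 2) {
        var name = String(indexArray[i]);
        var tokens = Array.isArray(indexArray[i + 1]) ? indexArray[i + 1] : [];
        stages.push({
            // Keep only the simple class name for readability
            stage: name.substring(name.lastIndexOf(".") + 1),
            tokens: tokens.map(function (token) { return token.text; })
        });
    }
    return stages;
}

// Hand-written sample, not captured from a real Solr instance.
var sampleIndex = [
    "org.apache.lucene.analysis.standard.StandardTokenizer",
    [{ text: "Splunk" }, { text: "Forwarders" }],
    "org.apache.lucene.analysis.core.LowerCaseFilter",
    [{ text: "splunk" }, { text: "forwarders" }]
];

summarizeStages(sampleIndex).forEach(function (entry) {
    console.log(entry.stage + ": " + entry.tokens.join(", "));
});
// prints "StandardTokenizer: Splunk, Forwarders"
// prints "LowerCaseFilter: splunk, forwarders"
```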