Issue
Search queries return the same number of matching documents across two environments, but the ordering of results and the maximum score values differ. This occurs even when data, schema, query pipelines, and query formation are identical.
Diagnosis
To confirm whether collection size differences are affecting ranking:
In Fusion, open the Query Workbench for the collection in question.
Set the view mode to Debug to inspect scoring components.
Compare the idf (inverse document frequency) and tf (term frequency) values for the same document across the two environments.
explain https://someurl.com/page
3.088692 = weight(_text_:something in 0) [SchemaSimilarity], result of:
  3.088692 = score(freq=4.0), computed as boost * idf * tf from:
    3.6557271 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
      11 = n, number of documents containing term
      444 = N, total number of documents with field
    0.8448913 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
      4.0 = freq, occurrences of term within document
      1.2 = k1, term saturation parameter
      0.75 = b, length normalization parameter
      344.0 = dl, length of field (approximate)
      712.8108 = avgdl, average length of field
Check the total document count in each collection.
Example command to view collection document counts in Solr:
curl -u USERNAME:PASSWORD "https://FUSION_HOST/api/solr/COLLECTION_NAME/select?q=*:*&rows=0"
In the JSON response, review the numFound value.
If idf values differ significantly, it is likely due to the collections having different total document counts.
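The values in the explain output above can be checked by hand. The following is a minimal Python sketch that recomputes idf, tf, and the final score from the formulas printed in that output. The function names are illustrative and are not part of Fusion or Solr, and a boost of 1.0 is assumed because the output shows no other boost.

```python
import math

def bm25_idf(n: int, N: int) -> float:
    """idf as printed in the explain output: log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (N - n + 0.5) / (n + 0.5))

def bm25_tf(freq: float, k1: float, b: float, dl: float, avgdl: float) -> float:
    """tf as printed in the explain output: freq / (freq + k1 * (1 - b + b * dl / avgdl))."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))

# Inputs copied from the explain output above.
idf = bm25_idf(n=11, N=444)
tf = bm25_tf(freq=4.0, k1=1.2, b=0.75, dl=344.0, avgdl=712.8108)

# Assumption: boost is 1.0 (no other boost appears in the output).
score = 1.0 * idf * tf

print(f"idf={idf:.4f} tf={tf:.4f} score={score:.4f}")
```

The results should agree with the explain output to roughly four decimal places; small differences in the last digits are expected because Lucene uses single-precision floats. Running the same calculation with the n and N values from the second environment shows how much of the score difference comes from idf alone.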
Environment
Any Fusion version. Applicable to all collections using Solr's default tf-idf-based scoring (BM25, as shown by the k1 and b parameters in the explain output).
Cause
In Solr's default similarity algorithm, idf is calculated from the total number of documents in the collection that have the field (N) and the number of documents that contain the term (n), not from the number of documents that match the query. When two collections contain the same matching documents but have different total document counts, N differs, so the idf values for the same terms differ. This leads to different final scores and potentially a different ranking order.
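As an illustration of how a change in N alone can reorder results, here is a small Python sketch using the idf formula from the explain output. The two documents, their n values, and their tf components are hypothetical and chosen only to make the effect visible: each document is assumed to match a different query term, one rare and one common.

```python
import math

def bm25_idf(n: int, N: int) -> float:
    """idf formula from the explain output."""
    return math.log(1 + (N - n + 0.5) / (n + 0.5))

# Hypothetical documents: (name, n for the term it matches, tf component).
# doc_a matches a rare term weakly; doc_b matches a common term strongly.
docs = [("doc_a", 11, 0.3), ("doc_b", 200, 0.9)]

def rank(N: int):
    """Score each document as idf * tf (boost assumed 1.0), highest first."""
    scored = [(name, bm25_idf(n, N) * tf) for name, n, tf in docs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Same matching documents and same n, but different collection sizes.
for N in (444, 100_000):
    print(N, [name for name, _ in rank(N)])
```

In this sketch doc_a ranks first in the 444-document collection and doc_b ranks first in the 100,000-document collection, even though the matching documents themselves are unchanged. The idf of the common term grows proportionally more than the idf of the rare term as N increases, which is enough to flip the order.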
Resolution
To achieve consistent ranking between environments:
Ensure that collections being compared contain the same total number of documents.
If only a subset of documents is needed in one environment, expect score and rank differences due to idf variations.
For testing or staging environments, mirror the full production index when validating search order.
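Before comparing result order, it can help to confirm that the two collections actually hold the same number of documents. The sketch below assumes the JSON bodies returned by the q=*:*&rows=0 request in the Diagnosis section have already been fetched from each environment, and only compares their numFound values. The function names and the sample bodies are hypothetical.

```python
import json

def num_found(response_body: str) -> int:
    """Extract response.numFound from a Solr select response in JSON format."""
    return json.loads(response_body)["response"]["numFound"]

def counts_match(body_a: str, body_b: str) -> bool:
    """Return True when both collections report the same total document count."""
    return num_found(body_a) == num_found(body_b)

# Hypothetical response bodies standing in for the two environments.
production = '{"response": {"numFound": 100000, "start": 0, "docs": []}}'
staging = '{"response": {"numFound": 444, "start": 0, "docs": []}}'

if not counts_match(production, staging):
    print("Document counts differ; expect idf, score, and rank differences.")
```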