Issue
Why does the number of documents shown in Collection Manager differ from the number of documents shown in Fields Manager, even after a full reindex?
Diagnosis
This discrepancy is usually observed in the Fusion UI after reindexing a collection. For example:
Collection Manager reports: 13,186,210 documents
Fields Manager reports: 15,113,516 index documents for a field
You can confirm this by comparing the output of a *:* query in Query Workbench (which should match Collection Manager) with the field-level document count reported by Fields Manager.
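The same comparison can also be made against Solr directly. The following is a minimal sketch, not a supported tool: the base URL, the collection name my_collection, and the field name title_t are placeholders, and it assumes the Solr endpoints are reachable from where the script runs. It uses the standard /select handler for the live count and the Luke request handler (/admin/luke) for the per-field count. Luke reports statistics for the core that answers the request, so on a multi-shard collection the field figure may cover only one shard.

```python
import json
import urllib.parse
import urllib.request


def fetch_json(url):
    """GET a Solr URL and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def select_url(base, collection):
    """URL for a match-all query that returns only the hit count."""
    params = urllib.parse.urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return f"{base}/{collection}/select?{params}"


def luke_url(base, collection, field):
    """URL for Luke handler statistics restricted to one field."""
    params = urllib.parse.urlencode({"fl": field, "numTerms": 0, "wt": "json"})
    return f"{base}/{collection}/admin/luke?{params}"


def live_count(select_response):
    """Live, retrievable documents (what a *:* query reports)."""
    return select_response["response"]["numFound"]


def field_doc_count(luke_response, field):
    """Documents counted for the field in the index, deleted ones included."""
    return luke_response["fields"][field]["docs"]


# Example usage (placeholders, requires network access to Solr):
#   base = "http://localhost:8983/solr"
#   live = live_count(fetch_json(select_url(base, "my_collection")))
#   luke = fetch_json(luke_url(base, "my_collection", "title_t"))
#   print(live, field_doc_count(luke, "title_t"))
```

If the two numbers differ by roughly the number of documents that were updated or deleted since the last merge, the discrepancy is the one described in this article.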
Environment
Fusion 5.11.0
Solr 9.6.1
Kubernetes (AKS)
Cause
The count in Collection Manager reflects the number of live, retrievable documents in the collection.
The higher count in Fields Manager comes from the Solr terms index and includes all documents where the field is present, even if the document has been deleted or overwritten. A reindex that overwrites existing documents by ID marks each previous version as deleted rather than removing it, which is why the gap can appear immediately after a full reindex. Deleted documents remain on disk in segment files until Solr merges those segments. Until then, they still count as indexed documents but are no longer retrievable via query.
This behavior is expected and is part of how Solr handles deletions and updates.
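As a rough worked example, the gap between the two counts from the Diagnosis section can be read as the number of deleted or overwritten entries still awaiting a merge. This sketch assumes the field is present in every live document; if it is not, the difference understates the number of stale entries. The function name is illustrative only.

```python
def deleted_estimate(live_docs, field_docs):
    """Return (stale_entries, stale_percent_of_indexed) for two UI counts."""
    stale = field_docs - live_docs
    return stale, 100.0 * stale / field_docs


stale, pct = deleted_estimate(13_186_210, 15_113_516)
print(f"{stale:,} stale entries ({pct:.2f}% of indexed documents)")
# prints: 1,927,306 stale entries (12.75% of indexed documents)
```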
Resolution
Understand Solr’s segment merging behavior
Solr deletes documents by marking them as deleted in segment metadata. These documents continue to occupy disk space and contribute to field-level statistics until a merge operation rewrites the segments and removes them physically.
Over time, as indexing continues, Solr will automatically merge segments that have a high proportion of deletions. This process frees up disk space and updates field-level metrics.
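The mark-then-merge behavior can be illustrated with a toy model. This is a deliberately simplified sketch and not how Lucene stores segments internally: the Segment and ToyIndex classes are hypothetical, and merging here collapses everything into one segment, whereas a real merge policy selects segments selectively.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    # Each entry is [doc_id, is_live]; deleting only flips the flag.
    docs: list = field(default_factory=list)


class ToyIndex:
    def __init__(self):
        self.segments = [Segment()]

    def add(self, doc_id):
        """Index a document; an existing copy is marked deleted, not removed."""
        for seg in self.segments:
            for entry in seg.docs:
                if entry[0] == doc_id and entry[1]:
                    entry[1] = False
        self.segments[-1].docs.append([doc_id, True])

    def flush(self):
        """Start a new segment, loosely as a commit would."""
        self.segments.append(Segment())

    def live_count(self):
        """Documents a query could retrieve."""
        return sum(e[1] for s in self.segments for e in s.docs)

    def indexed_count(self):
        """Entries still present in segments, deleted ones included."""
        return sum(len(s.docs) for s in self.segments)

    def merge(self):
        """Rewrite all segments into one, keeping only live documents."""
        live = [e for s in self.segments for e in s.docs if e[1]]
        self.segments = [Segment(live)]


index = ToyIndex()
for doc_id in ("a", "b", "c"):
    index.add(doc_id)
index.flush()
for doc_id in ("a", "b"):  # reindex two of the three documents
    index.add(doc_id)

print(index.live_count(), index.indexed_count())  # prints: 3 5
index.merge()
print(index.live_count(), index.indexed_count())  # prints: 3 3
```

Before the merge, the live count and the indexed count disagree in the same way Collection Manager and Fields Manager do; after the merge they converge.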
Optional: Force deletion cleanup manually
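While generally not recommended in production, you can initiate explicit segment merging with a Solr optimize request (a Lucene forceMerge) or a commit with expungeDeletes enabled. These are resource-intensive operations: they rewrite segments on disk and can compete with indexing and queries for I/O, so use them with caution.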
For more information and best practices, see:
Note: These operations should never be triggered automatically by client applications or scripts. If used, schedule them during off-peak hours and validate their impact before running them in production.
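For a one-off, operator-initiated cleanup, the request is an ordinary call to the collection's update handler. The sketch below only builds the URL so it can be reviewed and sent by hand; it intentionally does not send anything. The base URL and collection name are placeholders, and cleanup_url is a hypothetical helper. The parameters themselves (commit, expungeDeletes, optimize, maxSegments) are standard Solr update handler parameters.

```python
import urllib.parse


def cleanup_url(base, collection, mode="expungeDeletes", max_segments=None):
    """Build (but do not send) an update-handler URL for a manual cleanup."""
    if mode == "expungeDeletes":
        # Commit and merge away segments that contain deleted documents.
        params = {"commit": "true", "expungeDeletes": "true"}
    elif mode == "optimize":
        # Force-merge the index, optionally down to max_segments segments.
        params = {"optimize": "true"}
        if max_segments is not None:
            params["maxSegments"] = str(max_segments)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return f"{base}/{collection}/update?{urllib.parse.urlencode(params)}"


print(cleanup_url("http://localhost:8983/solr", "my_collection"))
# prints: http://localhost:8983/solr/my_collection/update?commit=true&expungeDeletes=true
```

Of the two modes, expungeDeletes is usually the lighter option because it targets segments containing deletions, while optimize rewrites the index more aggressively.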
Consider blue/green deployment architecture
For clients needing optimized collections, a common approach is to use a blue/green deployment strategy:
One collection handles queries (blue), while another is prepared (green)
The green collection is cleared, reindexed, and optimized
Once ready, green is swapped into production and blue is prepped for the next cycle
This strategy allows you to manage segment merging proactively without impacting query performance.
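One way to implement the swap at the Solr level is a collection alias that queries always target, re-pointed with the Collections API CREATEALIAS action once the freshly built collection is ready. This is a sketch under assumptions: the alias name products and the collection names products_blue and products_green are hypothetical, and a Fusion deployment may instead switch the collection at the application or query pipeline level. As above, the code only builds the request URL.

```python
import urllib.parse


def next_color(current):
    """Return the color to rebuild next, given the one currently serving."""
    return "green" if current == "blue" else "blue"


def alias_swap_url(base, alias, target_collection):
    """URL that points an alias at target_collection (creates or re-points it)."""
    params = urllib.parse.urlencode(
        {"action": "CREATEALIAS", "name": alias, "collections": target_collection}
    )
    return f"{base}/admin/collections?{params}"


serving = "blue"
standby = next_color(serving)
print(alias_swap_url("http://localhost:8983/solr", "products", f"products_{standby}"))
# prints: http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=products&collections=products_green
```

Because the standby collection is cleared and rebuilt from scratch each cycle, it starts without accumulated deletions, so its live and field-level counts should agree at the moment of the swap.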