Discrepancy between document counts in Collection Manager and Fields Manager – Lucidworks

Issue

Why does the number of documents shown in Collection Manager differ from the number of documents shown in Fields Manager, even after a full reindex?

Diagnosis

This discrepancy is usually observed in the Fusion UI after reindexing a collection. For example:

Collection Manager reports: 13,186,210 documents
Fields Manager reports: 15,113,516 index documents for a field

You can confirm this by comparing the output of a *:* query in Query Workbench (which should match Collection Manager) with the field-level document count reported by Fields Manager.
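The same comparison can be made directly against Solr. The sketch below is a minimal illustration, not the mechanism the Fusion UI is known to use: it assumes that the field-level figure corresponds to the "docs" value returned by Solr's Luke request handler, and the base URL, collection name, and field name are hypothetical. It only builds the request URLs and reads counts from already-parsed JSON, so no request is actually sent. Note that the Luke handler reports on a single core, so for a sharded collection the per-core values would need to be summed.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # hypothetical base URL


def select_count_url(collection: str, base: str = SOLR_BASE) -> str:
    """URL for a *:* query that returns only the live document count."""
    params = {"q": "*:*", "rows": 0, "wt": "json"}
    return f"{base}/{collection}/select?{urlencode(params)}"


def luke_field_url(core: str, field: str, base: str = SOLR_BASE) -> str:
    """URL for the Luke handler, restricted to a single field."""
    params = {"fl": field, "numTerms": 0, "wt": "json"}
    return f"{base}/{core}/admin/luke?{urlencode(params)}"


def live_docs(select_response: dict) -> int:
    """Live, retrievable documents as reported by a select response."""
    return select_response["response"]["numFound"]


def field_docs(luke_response: dict, field: str) -> int:
    """Documents containing the field, deleted ones included (assumed Luke layout)."""
    return luke_response["fields"][field]["docs"]


# Sample responses shaped like Solr's JSON output, using the counts from this article.
# "title_s" is a hypothetical field name.
sample_select = {"response": {"numFound": 13186210, "docs": []}}
sample_luke = {"fields": {"title_s": {"type": "string", "docs": 15113516}}}

print(select_count_url("my_collection"))
print(luke_field_url("my_collection", "title_s"))
print(live_docs(sample_select), field_docs(sample_luke, "title_s"))
```

If the two numbers differ only by documents that have been deleted or overwritten, the gap should shrink as segments are merged.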

Environment

Fusion 5.11.0
Solr 9.6.1
Kubernetes (AKS)

Cause

The count in Collection Manager reflects the number of live, retrievable documents in the collection.

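The higher count in Fields Manager comes from the Solr terms index and includes every document in which the field is present, even if that document has since been deleted or overwritten. Solr implements an update as a delete of the old version followed by an add of the new one, so reindexing documents that already exist in the collection leaves each previous version behind as a deleted document. Deleted documents remain on disk in segment files until Solr merges those segments. Until then, they still count toward field-level statistics but are no longer retrievable via query.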

This behavior is expected and is part of how Solr handles deletions and updates.
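To get a feel for the size of the gap, the two counts from the example can be turned into a rough estimate of how many deleted documents are still waiting to be merged away. This is a sketch under one stated assumption: every live document contains the field in question. If some live documents lack the field, the real number of deleted documents is higher than the estimate, so the result is best read as a lower bound.

```python
def deleted_estimate(live_docs: int, field_docs: int) -> tuple[int, float]:
    """Estimate unmerged deleted documents and their share of the field-level count.

    live_docs:  count shown in Collection Manager (or numFound for *:*)
    field_docs: count shown in Fields Manager for one field
    """
    deleted = max(field_docs - live_docs, 0)
    share = deleted / field_docs if field_docs else 0.0
    return deleted, share


# Counts taken from the example in this article.
deleted, share = deleted_estimate(13_186_210, 15_113_516)
print(f"{deleted:,} deleted documents still in segments ({share:.1%} of the field-level count)")
```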

Resolution

Understand Solr’s segment merging behavior

Solr deletes documents by marking them as deleted in segment metadata. These documents continue to occupy disk space and contribute to field-level statistics until a merge operation rewrites the segments and removes them physically.

Over time, as indexing continues, Solr's merge policy (TieredMergePolicy by default) automatically merges segments, favoring those with a high proportion of deletions. Each merge frees disk space and brings the field-level counts closer to the live document count.

Optional: Force deletion cleanup manually

While generally not recommended in production, you can trigger explicit segment merging through Solr's update handler, either with an optimize request (a Lucene forceMerge) or with a commit that sets expungeDeletes=true. Both are resource-intensive operations and should be run with caution.

For more information and best practices, see:

Apache Solr 9.6 documentation on segment merging

Note: These operations should never be triggered automatically by client applications or scripts. If used, schedule them during off-peak hours and validate their impact before running them in production.
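For a deliberate, one-off manual run, the two requests can be sketched as follows. The code only constructs the URLs for Solr's update handler and does not send anything, in keeping with the note above. The base URL and collection name are hypothetical, and maxSegments=1 is just an illustrative target, not a recommendation.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # hypothetical base URL


def expunge_deletes_url(collection: str, base: str = SOLR_BASE) -> str:
    """Commit that asks Solr to merge away segments containing deleted documents."""
    params = {"commit": "true", "expungeDeletes": "true"}
    return f"{base}/{collection}/update?{urlencode(params)}"


def optimize_url(collection: str, max_segments: int = 1, base: str = SOLR_BASE) -> str:
    """Optimize (forceMerge) down to at most max_segments segments."""
    params = {"optimize": "true", "maxSegments": max_segments}
    return f"{base}/{collection}/update?{urlencode(params)}"


# Print the URLs for review; an operator would issue them manually during off-peak hours.
print(expunge_deletes_url("my_collection"))
print(optimize_url("my_collection"))
```

Of the two, expungeDeletes is usually the lighter option because it targets segments with deletions, whereas optimize rewrites the index down to the requested segment count.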

Consider blue/green deployment architecture

For clients needing optimized collections, a common approach is to use a blue/green deployment strategy:

One collection handles queries (blue), while another is prepared (green)
The green collection is cleared, reindexed, and optimized
Once ready, green is swapped into production and blue is prepped for the next cycle

This strategy lets you run the merge on the collection that is not serving queries, so segment merging can be managed proactively without impacting query performance.
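The article does not say how the swap itself is performed. One possible approach, shown here purely as an assumption, is to have queries target a Solr alias and re-point that alias with the Collections API CREATEALIAS action once the green collection is ready. The alias and collection names below are hypothetical, and whether an alias fits a given Fusion setup would need to be verified separately. As before, the code only builds the request URL.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # hypothetical base URL


def swap_alias_url(alias: str, target_collection: str, base: str = SOLR_BASE) -> str:
    """Collections API call that points an alias at the given collection."""
    params = {"action": "CREATEALIAS", "name": alias, "collections": target_collection}
    return f"{base}/admin/collections?{urlencode(params)}"


def standby_color(serving: str) -> str:
    """Return the color that is being prepared while `serving` handles queries."""
    return "green" if serving == "blue" else "blue"


serving = "blue"
standby = standby_color(serving)

# After the standby collection has been cleared, reindexed, and optimized,
# re-point the (hypothetical) "products" alias at it.
print(swap_alias_url("products", f"products_{standby}"))
```

After the swap, the roles reverse: the former serving collection becomes the one that is cleared and rebuilt in the next cycle.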