Issue description
Fusion deployments using the Prometheus solr-exporter integration may intermittently encounter the following error in logs:
SolrCore is loading
503 Service Unavailable
org.apache.solr.client.solrj.impl.BaseHttpSolrClient$RemoteSolrException
This message appears when the solr-exporter attempts to collect metrics from a Solr core that is still initializing.
Cause
This issue typically occurs when the solr-exporter, which collects Solr metrics for Prometheus, queries a Solr core before that core has finished loading. Until loading completes, Solr responds with 503 Service Unavailable and the message SolrCore is loading. This can happen:
Shortly after a Solr pod has started
During core reloads (e.g., triggered by config changes)
If a core is slow to initialize due to index size or I/O constraints
The solr-exporter logs this error but does not retry immediately, so the affected core's metrics can be missing until the next collection. Depending on the Prometheus and alerting configuration, these transient gaps may also trigger alerts.
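To illustrate why this error is transient, the following Python sketch shows how a client could treat this specific 503 as retryable while passing every other response through unchanged. It is not how solr-exporter is implemented; `fetch` is a hypothetical stand-in for one HTTP request to Solr, and the attempt count and exponential backoff values are assumptions.

```python
import time
from typing import Callable, Tuple

# Hypothetical: a zero-argument callable that performs one HTTP request to a
# Solr metrics endpoint and returns (status_code, response_body).
Fetch = Callable[[], Tuple[int, str]]

LOADING_MARKER = "SolrCore is loading"


def is_core_loading(status: int, body: str) -> bool:
    """True if the response matches the transient 'core still loading' 503."""
    return status == 503 and LOADING_MARKER in body


def fetch_with_retry(
    fetch: Fetch,
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Retry only while the core is loading, doubling the delay each time.

    Any other response (success or a different error) is returned at once.
    If the core is still loading after `attempts` tries, the last loading
    response is returned so the caller can log or escalate it.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = base_delay
    for attempt in range(attempts):
        status, body = fetch()
        if not is_core_loading(status, body):
            return status, body
        if attempt < attempts - 1:  # no point sleeping after the final try
            sleep(delay)
            delay *= 2
    return status, body


# Usage with a stubbed fetch: the core finishes loading on the third request.
responses = iter([(503, "SolrCore is loading"), (503, "SolrCore is loading"), (200, "{}")])
print(fetch_with_retry(lambda: next(responses), sleep=lambda s: None))  # (200, '{}')
```

Retrying only on the loading marker keeps genuine failures (for example a 500 or a 503 with a different cause) visible immediately instead of masking them behind retries.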
Recommendations
Retry behavior: Allow Solr more time to initialize. This error is usually transient and resolves automatically once the core is fully loaded.
Prometheus scrape interval: Consider increasing the scrape interval slightly to reduce the chance of querying Solr before core initialization completes.
Alert thresholds: If alerts are triggered based on metrics availability, review threshold sensitivity to avoid false positives due to short-lived 503s.
Monitoring slow core loading: If this message appears persistently or frequently, investigate the affected core’s size, schema, and startup behavior. Persistent loading issues may require tuning of Solr heap or filesystem I/O performance.
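Prometheus alerting rules handle short-lived failures through the `for` clause, which keeps an alert pending until its condition has held for the given duration. As a simplified model of that idea, the Python sketch below reports an alert only after several consecutive failed collections; the function name and the threshold of 3 are illustrative assumptions, not Fusion defaults.

```python
from typing import Sequence


def is_firing(scrape_ok: Sequence[bool], min_consecutive_failures: int) -> bool:
    """Return True only if the last `min_consecutive_failures` scrapes all failed.

    A short-lived 503 while a core loads is followed by a successful scrape,
    which resets the condition before it reaches the threshold.
    """
    if min_consecutive_failures < 1:
        raise ValueError("min_consecutive_failures must be at least 1")
    recent = scrape_ok[-min_consecutive_failures:]
    return len(recent) == min_consecutive_failures and not any(recent)


# True = metrics collected, False = collection failed (e.g. SolrCore is loading).
print(is_firing([True, False, False, True, True], 3))  # False: brief gap during a reload
print(is_firing([True, False, False, False], 3))       # True: sustained outage
```

Raising the threshold trades slower detection of real outages for fewer false positives during pod restarts and core reloads.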
Additional notes
This issue does not typically indicate a fault with solr-exporter itself. However, if metrics remain unavailable for a core long after startup, or if this behavior is observed consistently during normal operation, contact the Lucidworks Support Team for deeper investigation.
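To help separate normal startup noise from the persistent case described above, one option is to compare when loading errors occur against when the Solr pod started. The Python sketch below assumes you have already extracted (core name, timestamp) pairs for each SolrCore is loading error, for example from exporter logs; that extraction step, the function name, and the 10-minute grace period are hypothetical and not part of Fusion or solr-exporter.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def cores_loading_past_grace(
    errors: Iterable[Tuple[str, datetime]],
    started_at: datetime,
    grace: timedelta,
) -> List[str]:
    """Return the cores that still reported 'SolrCore is loading' after the grace period.

    `errors` holds (core_name, timestamp) pairs, one per loading error observed.
    Errors inside the grace window are expected during startup and are ignored.
    """
    cutoff = started_at + grace
    return sorted({core for core, ts in errors if ts > cutoff})


start = datetime(2024, 1, 1, 12, 0)
errors = [
    ("products", start + timedelta(seconds=30)),  # normal startup noise
    ("logs", start + timedelta(minutes=20)),      # still loading long after startup
]
print(cores_loading_past_grace(errors, start, timedelta(minutes=10)))  # ['logs']
```

Cores returned by a check like this are the ones worth investigating for size, schema, heap, or I/O issues before contacting support.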