Goal
Export data from an existing Solr collection and import it into a new collection. This is commonly done for archiving, data segmentation (such as by year), or testing in lower environments.
This article outlines how to correctly perform large-scale data exports and imports using Solr’s /export and /select handlers, especially when working with collections exceeding 10 million records.
Environment
Fusion environments running Solr 8.x (including Solr 8.11.2)
Applicable for:
Clients managing large datasets (>10M documents)
Use cases requiring year-wise export, schema compatibility, or full collection duplication
Guide
Prepare schema for export
The /export handler requires that:
All fields listed in the fl parameter have docValues="true"
The field used for sort is single-valued and has docValues="true"
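One way to check these requirements before running an export is to query the Schema API for each field. The following is a minimal sketch, assuming the fields are declared explicitly in the schema (dynamic fields are not covered) and that jq is installed. The variables SOLR_URL, SOLR_USER, and SOLR_PASSWORD and the function names are illustrative, not part of Solr or Fusion.

```shell
# Assumed connection settings; adjust for your environment.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Build the Schema API URL for one explicitly declared field.
# showDefaults=true includes properties inherited from the field type.
field_url() {
  local collection="$1" field="$2"
  printf '%s/%s/schema/fields/%s?showDefaults=true&wt=json' \
    "$SOLR_URL" "$collection" "$field"
}

# Print docValues and multiValued for each field name given.
# A value of "null" means the property was not present in the response.
check_fields() {
  local collection="$1" field
  shift
  for field in "$@"; do
    curl -sS --user "$SOLR_USER:$SOLR_PASSWORD" \
      "$(field_url "$collection" "$field")" |
      jq -r --arg f "$field" \
        '"\($f): docValues=\(.field.docValues) multiValued=\(.field.multiValued)"'
  done
}

# Example (hypothetical names):
#   check_fields my_collection id field1 field2 field3
```

Any field reported without docValues=true, or a sort field reported as multiValued=true, would need to be handled before it can be used with /export.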
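If some fields do not meet these requirements, consider using <copyField> directives in the schema to copy their values into fields that do support docValues. Because copyField is applied at index time, existing documents must be reindexed before the new field is populated.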
Example workaround:
<copyField source="legacy_field" dest="legacy_field_docval"/>
<field name="legacy_field_docval" type="string" docValues="true" stored="true"/>
Export documents from a collection
Use Solr's /export handler via curl. This handler supports efficient streaming of large result sets.
curl --user USER:PASSWORD "http://localhost:8983/solr/<collection_name>/export?q=*:*&fq=fiscal_year:2025&sort=id+asc&fl=field1,field2,field3" -o /path/to/output.json
Important notes:
Wildcards in fl (such as fl=*) are not supported
All fields in fl must be explicitly listed and must have docValues=true
/export does not return numFound in the response
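Because every field has to be listed explicitly, it can help to build the fl value from a list and let curl handle URL encoding. The sketch below is one possible approach, not the only one. The function names and the SOLR_URL, SOLR_USER, and SOLR_PASSWORD variables are assumptions made for illustration.

```shell
# Assumed connection settings; adjust for your environment.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Join the arguments with commas to form an explicit fl list.
join_fields() {
  local IFS=','
  printf '%s' "$*"
}

# Usage: export_slice <collection> <fq> <outfile> <field>...
# --get sends the --data-urlencode values as URL query parameters,
# so characters such as '*', ':' and spaces are encoded by curl.
export_slice() {
  local collection="$1" filter="$2" outfile="$3"
  shift 3
  curl -sS --fail --user "$SOLR_USER:$SOLR_PASSWORD" \
    --get "$SOLR_URL/$collection/export" \
    --data-urlencode "q=*:*" \
    --data-urlencode "fq=$filter" \
    --data-urlencode "sort=id asc" \
    --data-urlencode "fl=$(join_fields "$@")" \
    -o "$outfile"
}

# Example (hypothetical names):
#   export_slice my_collection "fiscal_year:2025" /path/to/output.json id field1 field2
```

The --fail option makes curl exit with a non-zero status on an HTTP error, which helps avoid mistaking an error page for a valid export file.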
Verify document counts
Because /export does not return total document counts (numFound), run a separate /select query to get the actual expected count:
curl "http://localhost:8983/solr/<collection_name>/select?q=fiscal_year:2025&rows=0&wt=json"
This returns the numFound field in the response for validation.
To count the number of records actually exported to disk:
jq -c '.response.docs[]' /path/to/output.json | wc -l
Alternatively:
jq '.response.docs | length' /path/to/output.json
Understand distributed export limitations
The /export handler is not a distributed search. This means:
It must be run on each shard/replica individually
Results must be consolidated manually if needed
Data mismatch may occur if soft commits are pending or if replicas are not fully synchronized
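To run the export shard by shard, the core URL of each shard's leader is needed. A possible way to list them is the Collections API CLUSTERSTATUS action, as sketched below. This assumes the Solr 8.x response layout, in which each replica entry carries core, base_url, and a leader value of "true" for the leader. Verify the layout against your own cluster before relying on it. The function and variable names are illustrative.

```shell
# Assumed connection settings; adjust for your environment.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Build the CLUSTERSTATUS URL for one collection.
clusterstatus_url() {
  printf '%s/admin/collections?action=CLUSTERSTATUS&collection=%s&wt=json' \
    "$SOLR_URL" "$1"
}

# Print "<shard><TAB><core URL>" for the leader replica of each shard.
list_leader_cores() {
  local collection="$1"
  curl -sS --user "$SOLR_USER:$SOLR_PASSWORD" \
    "$(clusterstatus_url "$collection")" |
    jq -r --arg c "$collection" '
      .cluster.collections[$c].shards
      | to_entries[]
      | .key as $shard
      | .value.replicas[]
      | select(.leader == "true")
      | "\($shard)\t\(.base_url)/\(.core)"'
}

# Example (hypothetical name):
#   list_leader_cores my_collection
```

Each core URL printed this way can take the place of the collection URL in the /export and /select commands.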
To avoid discrepancies:
Export only from leader replicas of each shard
Ensure soft commits are flushed before export
Run /export and /select on the same replica if verifying counts
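The steps above can be sketched as a commit followed by a per-core count check. This is a minimal sketch under stated assumptions: core_url is the full URL of one core (for example http://localhost:8983/solr/<core_name>), jq is installed, and the export file has the .response.docs layout used by the jq commands in this article. The function names and the SOLR_USER and SOLR_PASSWORD variables are illustrative.

```shell
# Compare an expected and an actual document count.
compare_counts() {
  local expected="$1" actual="$2"
  if [ "$expected" -eq "$actual" ]; then
    echo "OK: $actual documents"
  else
    echo "MISMATCH: expected $expected, exported $actual"
    return 1
  fi
}

# Issue a commit so that pending updates become visible before exporting.
commit_core() {
  curl -sS --fail --user "$SOLR_USER:$SOLR_PASSWORD" \
    "$1/update?commit=true" > /dev/null
}

# Usage: verify_core_export <core_url> <query> <export_file>
# distrib=false restricts the /select count to this core only.
verify_core_export() {
  local core_url="$1" query="$2" export_file="$3" expected actual
  expected=$(curl -sS --user "$SOLR_USER:$SOLR_PASSWORD" \
    --get "$core_url/select" \
    --data-urlencode "q=$query" -d rows=0 -d distrib=false -d wt=json |
    jq '.response.numFound')
  actual=$(jq '.response.docs | length' "$export_file")
  compare_counts "$expected" "$actual"
}

# Suggested order (hypothetical values):
#   commit_core "$core_url"
#   ... run /export against "$core_url" ...
#   verify_core_export "$core_url" "fiscal_year:2025" /path/to/output.json
```

Running the commit before the export, rather than after it, keeps the /select count and the exported file based on the same view of the index, provided no new documents are indexed in between.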
Import data into a new collection
Import data using the /dataimport handler or direct indexing (e.g., POST to /update/json).
Ensure:
Schema compatibility between source and target collections
Field mappings align for the import method used
Example using curl to POST exported data:
curl --user USER:PASSWORD "http://localhost:8983/solr/<new_collection>/update?commit=true" \
-H "Content-Type: application/json" \
--data-binary @/path/to/output.json
For structured ingestion, transform the data as needed to match the schema before importing.
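One transformation that is likely to be needed concerns the shape of the export file. The jq commands in this article read the documents from .response.docs, which means the file wraps them in a response envelope, while the JSON update handler accepts a plain JSON array of documents. The sketch below extracts that array before posting it. It assumes jq is installed. The function names and the SOLR_URL, SOLR_USER, and SOLR_PASSWORD variables are illustrative.

```shell
# Assumed connection settings; adjust for your environment.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Build the update URL for the target collection.
update_url() {
  printf '%s/%s/update?commit=true' "$SOLR_URL" "$1"
}

# Usage: extract_docs <export_file> <docs_file>
# Write only the array of documents, dropping the response envelope.
extract_docs() {
  jq -c '.response.docs' "$1" > "$2"
}

# Usage: post_docs <collection> <docs_file>
post_docs() {
  local collection="$1" docs_file="$2"
  curl -sS --fail --user "$SOLR_USER:$SOLR_PASSWORD" \
    "$(update_url "$collection")" \
    -H "Content-Type: application/json" \
    --data-binary @"$docs_file"
}

# Example (hypothetical names):
#   extract_docs /path/to/output.json /path/to/docs.json
#   post_docs new_collection /path/to/docs.json
```

For very large files, splitting the array into smaller batches before posting may be preferable to a single request. That is left out of this sketch.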
Additional considerations
Use the rows parameter with /select if performing test exports
For collections with uninvertible field types or useDocValuesAsStored, docValues=true is required or Solr will return errors
When working with over 10 million records, break the export into year or segment-based queries using filters (fq)
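Splitting by year can be scripted as a simple loop that writes one file per slice. The following is a sketch, assuming the fiscal_year field from the earlier example and an id field suitable for sorting. The function names, the file naming scheme, and the SOLR_URL, SOLR_USER, and SOLR_PASSWORD variables are illustrative.

```shell
# Assumed connection settings; adjust for your environment.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Build the output path for one year's slice.
slice_path() {
  printf '%s/%s_%s.json' "$1" "$2" "$3"
}

# Usage: export_by_year <collection> <out_dir> <first_year> <last_year> <fl>
# Exports one file per year and stops at the first failed request.
export_by_year() {
  local collection="$1" out_dir="$2" first="$3" last="$4" fl="$5" year
  year="$first"
  while [ "$year" -le "$last" ]; do
    curl -sS --fail --user "$SOLR_USER:$SOLR_PASSWORD" \
      --get "$SOLR_URL/$collection/export" \
      --data-urlencode "q=*:*" \
      --data-urlencode "fq=fiscal_year:$year" \
      --data-urlencode "sort=id asc" \
      --data-urlencode "fl=$fl" \
      -o "$(slice_path "$out_dir" "$collection" "$year")" || return 1
    year=$((year + 1))
  done
}

# Example (hypothetical names):
#   export_by_year my_collection /path/to/exports 2020 2025 "id,field1,field2"
```

Keeping each year in its own file also makes it easier to verify and re-run a single slice if its count does not match.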
Let us know if you need further help with schema changes or handling bulk ingestion workflows.