Search and index nested documents while avoiding child document duplication – Lucidworks

Goal

Enable search across both parent and child documents so that:

Child documents are only returned within their parent document.
All parent and child fields are searchable.

Environment

Fusion 5.x

Guide

Querying: Return parent documents with children, excluding standalone children

To return only parent documents, with their child documents embedded, add the filter query fq=NOT _nest_path_:*. When the schema defines the _nest_path_ field, Solr sets it only on child documents, so this filter keeps only top-level (parent) documents in the results:

q=your_search_term&start=0&rows=50&fl=*,score,[child]&fq=NOT _nest_path_:*

The [child] document transformer in fl embeds each parent's child documents in its result, and the fq filter keeps child documents from also being returned as separate results.
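A minimal Python sketch of the filtered query above, assuming direct HTTP access to the collection's /select handler. The URL and collection name are placeholders, and the code only builds the request URL; it does not send it.

```python
from urllib.parse import urlencode

# Placeholder: replace the host and collection name with your own.
SOLR_SELECT_URL = "http://localhost:8983/solr/my_collection/select"


def parent_only_params(term: str, start: int = 0, rows: int = 50) -> list[tuple[str, str]]:
    """Parameters that return top-level documents with their children embedded."""
    return [
        ("q", term),
        ("start", str(start)),
        ("rows", str(rows)),
        # The [child] transformer embeds each parent's child documents.
        ("fl", "*,score,[child]"),
        # Child documents have a _nest_path_ value; top-level documents do not.
        ("fq", "NOT _nest_path_:*"),
    ]


def build_select_url(params: list[tuple[str, str]], base_url: str = SOLR_SELECT_URL) -> str:
    # urlencode percent-encodes characters such as *, [, ] and : and turns spaces into +.
    return f"{base_url}?{urlencode(params)}"
```

For example, build_select_url(parent_only_params("your_search_term")) yields a URL you can pass to any HTTP client. Keeping the parameters as a list of pairs preserves their order and lets you repeat a key such as fq if you need more filters.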

Querying: Search both parent and child fields

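To search both parent and child fields while still returning only parent documents with their children embedded, combine a regular query with a block join parent query ({!parent}). For example:

q=(term OR {!parent which='*:* -_nest_path_:*' v='term'})&fl=*,score,[child]&fq=NOT _nest_path_:*

Replace term (in both places) with the desired search term. The which clause must match every parent document and no child documents; a filter such as *:* is not valid here because it also matches the children, which causes the block join query to fail. If your parent documents are identified by a field value instead, you can use that field in the which clause, provided it matches all parents and no children:

q=(term OR {!parent which='positionCode_t:ABC' v='term'})&fl=*,score,[child]&fq=NOT _nest_path_:*

This technique ensures that if either a parent or one of its children matches the term, the parent is returned with its children embedded. The fq filter is still needed, because the first clause (term) can match child documents on their own. Note that the [child] transformer returns at most 10 children per parent by default; use its limit parameter to change this.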
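The block join query above can be assembled in Python as a sketch. It assumes that child documents carry _nest_path_ and parents do not, so *:* -_nest_path_:* serves as the default parent filter. Instead of quoting the child query as v='term', it uses Solr's parameter dereferencing (v=$childq), so the child query travels as its own request parameter and never needs escaping inside the local params. The parameter name childq is hypothetical.

```python
# Assumption: child documents carry _nest_path_ and parents do not, so this
# filter matches every parent and no child, as the which clause requires.
DEFAULT_PARENT_FILTER = "*:* -_nest_path_:*"


def parent_or_child_params(
    term: str, parent_filter: str = DEFAULT_PARENT_FILTER, rows: int = 50
) -> dict[str, str]:
    """Return parents that match directly or through a child, with children embedded."""
    # Builds: (term OR {!parent which='<parent_filter>' v=$childq})
    # The first clause matches documents on the default search field; the
    # {!parent} clause runs the child query and maps each hit to its parent.
    q = f"({term} OR {{!parent which='{parent_filter}' v=$childq}})"
    return {
        "q": q,
        # Hypothetical parameter name, dereferenced by v=$childq above.
        "childq": term,
        "fl": "*,score,[child]",
        # Removes children that matched the first clause on their own.
        "fq": "NOT _nest_path_:*",
        "rows": str(rows),
    }
```

Pass a different parent_filter (for example positionCode_t:ABC) only if that filter matches every parent and no child. The resulting dict can be encoded with urllib.parse.urlencode and appended to the /select URL.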

Scoring considerations for child documents

When using block join queries, scoring is only applied to parent documents. Child documents do not receive individual scores and will show 0.0 in the results. This is a limitation of how scoring works with Solr's block join architecture.
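By default, the {!parent} parser's score local parameter is none, so the scores of matching children are not carried over to the parent. Setting it to avg, max, min, or total aggregates the matching children's scores into the parent's score; the children themselves still do not get individual scores. The helper below is a hypothetical sketch that builds such a clause, assuming the *:* -_nest_path_:* parent filter and a child query passed by reference as $childq.

```python
# Score modes accepted by the {!parent} parser's score local parameter.
SCORE_MODES = {"none", "avg", "max", "min", "total"}


def scored_parent_clause(
    child_param: str = "childq",
    parent_filter: str = "*:* -_nest_path_:*",
    score_mode: str = "max",
) -> str:
    """Build a {!parent} clause that aggregates matching child scores into the parent."""
    if score_mode not in SCORE_MODES:
        raise ValueError(f"unsupported score mode: {score_mode}")
    # v=$<child_param> dereferences the child query from a separate request parameter.
    return f"{{!parent which='{parent_filter}' score={score_mode} v=${child_param}}}"
```

For example, the clause can replace the plain {!parent} clause in q=(term OR ...)&childq=term. Choosing max ranks a parent by its best-matching child, while total favors parents with many matching children.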

Indexing: Handling nested documents in Fusion

By default, Fusion's index pipeline does not preserve nested JSON structure; it flattens nested documents unless explicitly configured otherwise.

To retain child documents during indexing:

Option 1: Use the Solr Partial Update Indexer stage to manage child documents if you use Solr's atomic update API.
Option 2: Bypass Fusion pipelines and index directly into Solr using the _childDocuments_ format.
Option 3: Use a custom JavaScript stage to extract child nodes and index them as separate documents, maintaining the parent-child relationships manually.
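For Option 2, a minimal Python sketch of a JSON update payload that nests children under _childDocuments_ and posts it to Solr's /update handler. The URL, collection name, and field names (title_t, name_t) are placeholders. Solr indexes a parent and its children together as one block, and each document in the block needs its own unique id. This sketch does not check how your schema tags anonymous _childDocuments_ children with _nest_path_, so confirm that fq=NOT _nest_path_:* still excludes them before relying on it.

```python
import json
import urllib.request

# Placeholder: replace the host and collection name. commit=true makes the
# documents searchable as soon as the request succeeds.
SOLR_UPDATE_URL = "http://localhost:8983/solr/my_collection/update?commit=true"


def nested_payload(parent: dict, children: list[dict]) -> list[dict]:
    """Nest children under the parent with Solr's anonymous _childDocuments_ key."""
    for doc in [parent, *children]:
        if "id" not in doc:
            raise ValueError("every parent and child document needs a unique id")
    block = dict(parent)  # copy so the caller's dict is left untouched
    block["_childDocuments_"] = [dict(child) for child in children]
    # The JSON update handler accepts an array of documents.
    return [block]


def post_to_solr(payload: list[dict], url: str = SOLR_UPDATE_URL) -> bytes:
    """POST the payload as JSON; urlopen raises urllib.error.HTTPError on an error status."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

Usage: post_to_solr(nested_payload({"id": "p1", "title_t": "Parent"}, [{"id": "c1", "name_t": "Child"}])). Building the payload separately from sending it keeps the block structure easy to inspect before anything reaches Solr.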