Run cross-collection joins using Fusion query pipelines – Lucidworks

Goal

Run a cross-collection query in Fusion that joins documents from one Solr collection to another, applies user-based authorization filtering, and returns the joined results through a Fusion query pipeline.

Environment

Fusion 5.9.x deployed on Kubernetes
Apache Solr 9.x

This approach applies to self-hosted Fusion clusters where collections are configured to support cross-collection joins.

Guide

Use Solr cross-collection join syntax

Fusion query pipelines ultimately execute against Solr, so a cross-collection join can be expressed with Solr’s join query parser by setting method=crossCollection in the join's local parameters.

Example Solr JSON request:

{
  "query": "kyle",
  "filter": "{!join from=id to=id fromIndex=DirectReports method=crossCollection}authGroup_benefits_ss:\"fdb6ac16-a0fb-4f53-99a5-c956288b1a1e\"",
  "limit": 50,
  "offset": 0,
  "fields": "*"
}

In this example:

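fromIndex names the secondary collection (DirectReports), which is the "from" side of the join.
from is the join field in the fromIndex collection; to is the join field in the collection being queried.
The query after the closing brace (authGroup_benefits_ss:"...") runs against the fromIndex collection and selects the documents whose from values feed the join.
The main query ("kyle") runs against the primary collection, and the filter restricts its results to documents whose to value matches one of the joined from values.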

Ensure that:

Both collections exist in the same SolrCloud cluster.
Both collections have healthy, active replicas that the nodes serving the primary collection can reach.
The join fields are indexed and have compatible types in both collections.
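The request shown above can be assembled programmatically. The following is a minimal sketch in plain JavaScript, not Fusion or Solr API code: the function names (escapeQuoted, buildJoinFilter, buildJsonRequest) and the option names are illustrative assumptions, and the collection, field, and value are taken from the example in this article.

```javascript
// Sketch: assemble a Solr JSON request body with a cross-collection join filter.
// All function and option names below are hypothetical helpers, not Solr or Fusion APIs.

// Escape backslashes and double quotes for use inside a quoted Solr term.
function escapeQuoted(value) {
  return String(value).replace(/(["\\])/g, "\\$1");
}

// Build the {!join ...} local parameters plus the query that runs against fromIndex.
function buildJoinFilter(opts) {
  return "{!join from=" + opts.from +
         " to=" + opts.to +
         " fromIndex=" + opts.fromIndex +
         " method=crossCollection}" +
         opts.field + ':"' + escapeQuoted(opts.value) + '"';
}

// Build the request body that is sent to the primary collection.
function buildJsonRequest(userQuery, joinFilter) {
  return {
    query: userQuery,
    filter: joinFilter,
    limit: 50,
    offset: 0,
    fields: "*"
  };
}

var filter = buildJoinFilter({
  from: "id",
  to: "id",
  fromIndex: "DirectReports",
  field: "authGroup_benefits_ss",
  value: "fdb6ac16-a0fb-4f53-99a5-c956288b1a1e"
});
var body = buildJsonRequest("kyle", filter);
console.log(JSON.stringify(body, null, 2));
```

Keeping the join prefix and the from-side query in one helper makes it harder to send a join whose local parameters and query have drifted apart.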

Pass join filters through a Fusion query pipeline

To execute this logic through Fusion:

Open the Fusion UI.
Navigate to Query Pipelines.
Select the pipeline associated with the application.
Add or edit stages to construct the join filter dynamically.

Fusion’s Solr Query stage forwards filter queries to Solr. The join expression must be constructed before this stage executes.

Build the join filter using a JavaScript query stage

A JavaScript stage can dynamically construct the filter query based on the logged-in user context.

Example logic:

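// Read the caller's authorization group from the request parameters.
var authGroup = request.getParam("authGroup");
// Escape backslashes and double quotes so the value cannot break out of the quoted term.
var safeGroup = String(authGroup).replace(/(["\\])/g, '\\$1');
var joinFilter = '{!join from=id to=id fromIndex=DirectReports method=crossCollection}' +
                 'authGroup_benefits_ss:"' + safeGroup + '"';

// Append the join as a filter query before the Solr Query stage runs.
request.addFilterQuery(joinFilter);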

This stage:

Extracts a user-specific authorization value.
Builds the join filter.
Appends it to the request before the Solr Query stage runs.

Ensure that:

The Solr Query stage remains after the JavaScript stage.
The filter is added using addFilterQuery.
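The authorization value comes from a trusted source (a client-supplied request parameter can be forged) and is validated and escaped before it is concatenated into the filter.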
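The validation step in the checklist above could look like the following sketch. It is plain JavaScript that does not touch the Fusion request object, and it rests on two assumptions: that authorization groups are always UUIDs, as in the example value in this article, and that a pure negative filter (-*:*) is an acceptable way to return no documents when the value is rejected. The names UUID_PATTERN, isValidAuthGroup, and joinFilterFor are hypothetical.

```javascript
// Sketch: validate the authorization value before it is placed in a filter query.
// Assumption: authorization groups are UUIDs, as in this article's example value.
var UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function isValidAuthGroup(value) {
  return typeof value === "string" && UUID_PATTERN.test(value);
}

// Return the join filter for a valid value, or a match-nothing filter otherwise,
// so a missing or malformed value fails closed instead of returning everything.
function joinFilterFor(authGroup) {
  if (!isValidAuthGroup(authGroup)) {
    return "-*:*";
  }
  return "{!join from=id to=id fromIndex=DirectReports method=crossCollection}" +
         'authGroup_benefits_ss:"' + authGroup + '"';
}

console.log(joinFilterFor("fdb6ac16-a0fb-4f53-99a5-c956288b1a1e"));
console.log(joinFilterFor('x" OR *:*'));
```

Because a valid UUID contains no quotes or backslashes, no further escaping is needed once the pattern check passes. Inside a Fusion JavaScript stage the parameter may arrive as a Java object rather than a JavaScript string, so it would likely need converting with String(...) before this check.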

Use Fusion security trimming stages

If the requirement is user-based authorization filtering, consider using Fusion’s built-in security trimming stages:

Security trimming query stage
Security trimming graph query stage

These stages generate filter queries at query time and correctly handle result counts and facet calculations.

They are appropriate when:

An ACL collection exists.
Authorization metadata is maintained separately.
Consistent security enforcement across applications is required.

Use an ACL collection strategy

For complex authorization models:

Build a dedicated ACL collection.
Populate it using a connector or batch process.
Maintain mappings between user identities and document identifiers.

At query time:

Use security trimming stages or custom join filters.
Avoid complex runtime logic where possible.

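Keeping authorization data in its own collection lets it be updated independently of the content and keeps the query pipeline logic simple, which improves maintainability and separation of concerns.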
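As an illustration of the mapping described above, the sketch below creates one ACL document per user and document pair and builds the matching custom join filter. The collection name (acl) and field names (user_id_s, doc_id_s) are hypothetical and are not defined anywhere in this article; a real ACL collection populated by a connector may use a different schema.

```javascript
// Sketch: one ACL document per (user, document) pair, plus the matching join filter.
// The collection "acl" and the fields user_id_s / doc_id_s are hypothetical names.

// Turn a user's list of permitted document IDs into ACL documents for indexing.
function toAclDocs(userId, docIds) {
  return docIds.map(function (docId) {
    return { id: userId + "|" + docId, user_id_s: userId, doc_id_s: docId };
  });
}

// Build a filter that keeps only primary documents the user is mapped to.
function aclJoinFilter(userId) {
  var safe = String(userId).replace(/(["\\])/g, "\\$1");
  return "{!join from=doc_id_s to=id fromIndex=acl method=crossCollection}" +
         'user_id_s:"' + safe + '"';
}

var aclDocs = toAclDocs("alice", ["doc-1", "doc-2"]);
console.log(JSON.stringify(aclDocs));
console.log(aclJoinFilter("alice"));
```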

Use single-collection filtering with partial updates

If authorization logic is straightforward:

Store authorization fields directly in the primary collection.
Use a simple filter query instead of a join.

Example:

fq=authGroup_benefits_ss:"fdb6ac16-a0fb-4f53-99a5-c956288b1a1e"

To keep authorization data current:

Use the Solr Partial Update Indexer stage.
Update only the authorization field as needed.
Avoid full reindexing when possible.

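This approach reduces query complexity and improves performance compared to cross-collection joins, because the filter is evaluated entirely within one collection and no query against a second collection is needed.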
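For reference, the kind of document a partial update sends to Solr can be sketched as follows. This uses Solr's atomic update syntax, where a "set" modifier replaces the value of a single field. The helper name authGroupUpdate and the document ID doc-1 are hypothetical, and the Fusion stage itself is configured in the UI, so its settings are not shown here.

```javascript
// Sketch: a Solr atomic (partial) update that replaces only the authorization field.
// authGroupUpdate and the ID "doc-1" are hypothetical; the field name is from this article.
function authGroupUpdate(docId, authGroups) {
  return {
    id: docId,
    // "set" replaces the field's current values and leaves other fields untouched.
    authGroup_benefits_ss: { set: authGroups }
  };
}

var payload = [
  authGroupUpdate("doc-1", ["fdb6ac16-a0fb-4f53-99a5-c956288b1a1e"])
];
console.log(JSON.stringify(payload, null, 2));
```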

Validate deployment configuration

Before using cross-collection joins:

Confirm that Solr collections are deployed with adequate replicas.
Ensure ZooKeeper coordination is healthy.
Verify that inter-collection communication is permitted within the Kubernetes namespace.
Test join performance under expected load.

Cross-collection joins introduce distributed query overhead. Evaluate performance impact in non-production environments before rollout.
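When comparing join and non-join queries in a test environment, it can help to summarize response times as percentiles rather than averages. The sketch below computes a nearest-rank percentile over a list of timings. The function name is hypothetical, and the sample numbers are made-up placeholders, not measurements from any Fusion or Solr system.

```javascript
// Sketch: nearest-rank percentile over response times collected during a load test.
function percentile(samples, p) {
  if (samples.length === 0) {
    throw new Error("no samples");
  }
  // Sort a copy numerically so the caller's array is left unchanged.
  var sorted = samples.slice().sort(function (a, b) { return a - b; });
  var rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Placeholder timings in milliseconds (illustrative only, not real measurements).
var joinTimes = [120, 95, 310, 150, 101, 98, 400, 130, 110, 105];

console.log("p50:", percentile(joinTimes, 50));
console.log("p95:", percentile(joinTimes, 95));
```

Running the same summary for the equivalent single-collection filter query gives a like-for-like view of the overhead the join adds.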

This approach allows Fusion query pipelines to support cross-collection joins while maintaining flexibility for authorization and additional processing stages.