Goal:
What are Fusion's Ground Truth and Ranking Metrics jobs, and how can we set them up?
Environment:
Fusion 4.2.6 & above, Spark
Guide:
Fusion's Ground Truth and Ranking Metrics jobs are used to obtain an objective measure of the search quality a given query pipeline delivers. This allows us to iterate on, test, and measure different relevance models in an offline manner.
Overview
Ground truth is obtained from signals: signals are analysed to infer which documents are relevant to each query based on users' click/skip behaviour.
Steps to configure
Step 1: From the Fusion UI -> Create a new app (e.g. Rakesh-TestGT).
Step 2: Select the Rakesh-TestGT app -> Datasources -> Add any datasource of your choice (for demonstration I'll go ahead and create a web datasource - TestWebDS; give it any name of your choice).
Configure the web datasource by providing a startLink; in this case I would like to limit the crawl to 100 documents.
Save the datasource -> start the job -> and wait until it successfully completes the crawl.
Step 3: The Ground Truth job requires click and response signal types, so we will have to send some click signals to the <YOURAPPNAME>_signals collection (an auxiliary collection created by Fusion).
We will use Fusion's built-in App Studio UI to send click signals; a programmatic alternative is sketched below.
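(Optional) If you prefer to send a click signal programmatically instead of through App Studio, a minimal sketch is shown below. It assumes Fusion 4.x's Signals REST endpoint (POST /api/signals/<collection>) and the default raw-signal shape (type, timestamp, params); the host, port, credentials, and docId are placeholders, so verify the endpoint and field names against your Fusion version before using it.

```python
# Minimal sketch: post a raw click signal to Fusion's Signals API.
# Endpoint, port, credentials and field names are assumptions for Fusion 4.x;
# adjust them for your environment.
import datetime
import requests

FUSION = "http://localhost:8764"      # Fusion proxy host:port (placeholder)
COLLECTION = "Rakesh-TestGT"          # primary collection; the signal is stored
                                      # in the associated <collection>_signals
AUTH = ("admin", "password123")       # placeholder credentials

click_signal = [{
    "type": "click",                  # Ground Truth needs click and response signals
    "timestamp": datetime.datetime.utcnow().isoformat() + "Z",
    "params": {
        "query": "backpack",          # search term the user issued
        "docId": "https://example.com/some-crawled-page",  # clicked document id (placeholder)
        "collection": COLLECTION,
    },
}]

resp = requests.post(
    f"{FUSION}/api/signals/{COLLECTION}",
    json=click_signal,
    params={"commit": "true"},
    auth=AUTH,
)
resp.raise_for_status()
print("Signal accepted:", resp.status_code)
```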
Step 4: From the Fusion UI -> App Studio -> Build new UI.
Select your profile -> Click Next.
On the next screen it will analyse your data -> Click Next.
Step 5: Leave the default configuration under "Set Result Title" -> Click Next.
Step 6: Leave the default configuration under "Set Result Description" -> Click Next.
Step 7: Leave the default configuration under "Set Facets and Additional Fields" -> Click Next.
Step 8: Customise UI -> Enter a title of your choice (GroundTruthTest), select a theme of your choice -> Click Save and Launch UI.
It will launch the App Studio UI, where you can search your documents and send click signals.
Step 9: If a popup appears while launching the UI -> click OK.
From the App Studio UI -> enter a search query of your choice in the search box (backpack) -> hit Search -> it will list all the documents that contain the term backpack.
Send click signals by clicking on three of the documents listed in the UI; be sure to select documents with different titles to get more clarity on how ground truth works.
Step 10: Repeat step 9 with a different search term of your choice (waterfall) and send 3 more click signals.
Step 11: In the Fusion admin UI -> Jobs -> Run the <YOURAPPNAME>_click_signal_aggregation job and wait until it completes successfully.
Step 12: Fusion admin UI -> Query Workbench -> Select the <YOURAPPNAME>_signals collection -> Add field facets on the (type) and (query) fields.
Step 13: Filter on query (backpack) and type (click).
Copy the value of "fusion_query_id"; this click signal should have a corresponding response signal tied to it via the same fusion_query_id.
Step 14: Clear all the existing filters, and now filter on query (backpack) and type (response).
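The same verification can be scripted. The sketch below assumes Fusion's Solr passthrough endpoint (/api/solr/<collection>/select) and the signal fields used in the Query Workbench above (type, query, fusion_query_id); the host, port, and credentials are placeholders.

```python
# Sketch: confirm a click signal has a matching response signal via fusion_query_id.
# The /api/solr/... passthrough endpoint and credentials are assumptions; adjust as needed.
import requests

FUSION = "http://localhost:8764"
SIGNALS_COLLECTION = "Rakesh-TestGT_signals"   # <YOURAPPNAME>_signals
AUTH = ("admin", "password123")

def search_signals(query_string, rows=10):
    """Query the signals collection through Fusion's Solr passthrough."""
    resp = requests.get(
        f"{FUSION}/api/solr/{SIGNALS_COLLECTION}/select",
        params={"q": query_string, "rows": rows, "wt": "json"},
        auth=AUTH,
    )
    resp.raise_for_status()
    return resp.json()["response"]["docs"]

# Step 13 equivalent: find click signals for the query "backpack".
clicks = search_signals('type:click AND query:"backpack"')
fusion_query_id = clicks[0]["fusion_query_id"]

# Step 14 equivalent: the response signal should carry the same fusion_query_id.
responses = search_signals(f'type:response AND fusion_query_id:"{fusion_query_id}"')
print(f"clicks: {len(clicks)}, matching responses: {len(responses)}")
```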
Step 15: Under Fusion Query Pipelines -> add a new pipeline -> disable all stages except Query Fields, Field Facet, and Solr Query, and save the pipeline.
Step 16: Fusion UI -> Query Profiles -> Select your query profile -> Enable Experiment.
Follow the screenshots to create a ground truth job.
Note: While capturing the above screenshot I forgot to fill in the Base collection for signals field, so I've added it as a comment later.
Step 17: After saving the experiment you should see a Run Experiment button.
Go ahead and hit the Run Experiment button; it will create two jobs under Jobs and an App Insights dashboard that will capture the ground truth data.
Step 18: Fusion UI -> Jobs -> In the search bar, start typing "relevance"; you should see 2 jobs created:
- Exp-Rakesh-TestGT-groundTruth-relevance-metric (Ground Truth)
- Exp-Rakesh-TestGT-rankingMetrics-relevance-metric (Ranking Metrics)
Step 19: Leave the job configs as they are and run the Ground Truth job; on success you can verify docsWritten in the job output.
Step 20: Run the Ranking Metrics job; on success you can verify groundTruthQueries.
Step 21: Fusion UI -> App Insights -> Experiment Results -> Click on the experiment (Exp-Rakesh-TestGT); it will launch a new dashboard.
We can also export the data to an Excel sheet and use it for analysis.
How is weight calculated?
The weight is calculated from click/skip behaviour. For a document doc_i returned for a given query, let C_doc_i be the number of times the document is clicked and S_doc_i the number of times it is skipped. The relevance weight is then:

Rel_doc_i = C_doc_i / (S_doc_i + C_doc_i)
So for the query waterfall we have three documents; the third document was clicked once and skipped twice, therefore its weight is 1 / (2 + 1) = 0.3333.
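As a quick sketch, the same calculation in Python:

```python
# Relevance weight from click/skip counts: Rel = C / (S + C).
def relevance_weight(clicks: int, skips: int) -> float:
    """Fraction of impressions (clicks + skips) in which the document was clicked."""
    return clicks / (skips + clicks)

# Third document for the query "waterfall": clicked once, skipped twice.
print(round(relevance_weight(clicks=1, skips=2), 4))  # 0.3333
```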