Deep diving into the trending recommendation job – Lucidworks

Environment

Fusion 5.5, 5.6, 5.7, 5.8, 5.9, 5.10, 5.11

Prerequisites for the job

Training collection: <Your_Main_Collection>_signals
Output collection: <Your_Main_Collection>_trending
Click or response signals must be present in your <Your_Main_Collection>_signals collection, depending on which signal events you want to base the recommendations on.
The doc_id and query fields must be present on the respective signals.

Job Guide

Use this job to identify spikes in popularity for specific items or queries, then display those items to your users or analyze the trends for business purposes.

Trending items or Trending queries

To fetch trending items or trending queries, change the value of the Document ID Field in the job configuration. Sample output documents from the trending recommendation job are shown below for both types: the first is a trending query, the second is a trending item.

To fetch trending items, set Document ID Field: doc_id in the job configuration.

To fetch trending queries, set Document ID Field: query in the job configuration.

{
"query" : "hcms strategy and services testing",
"ref_hits" : 5,
"ref_rank" : 1,
"trgt_hits" : 5,
"trgt_rank" : 1,
"vol_diff" : 3.333333333333333,
"average_weekly_vol" : 1.6666666666666667,
"hit_vol_ratio" : 3.0,
"combine_score" : 10.0,
"vol_diff_ratio" : 1.9999999999999998,
"ref_wt_vol_diff_ratio" : 9.999999999999998,
"vol_diff_wt_vol_diff_ratio" : 6.666666666666665,
"log_diff_wt_ratio" : 4.203972804325936,
"trend-type" : "prds_weekly",
"id" : "10613eb1-4f10-4369-b7a4-834a8a8b8f49",
"_version_" : 1785723116807258112,
"score" : 1.0
}

{
"doc_id": 250,
"ref_hits": 1,
"ref_rank": 1,
"trgt_hits": 1,
"trgt_rank": 1,
"vol_diff": 0.5,
"average_weekly_vol": 0.5,
"hit_vol_ratio": 2,
"combine_score": 1,
"vol_diff_ratio": 1,
"ref_wt_vol_diff_ratio": 1,
"vol_diff_wt_vol_diff_ratio": 0.5,
"log_diff_wt_ratio": 1.3068528194400546,
"trend-type": "prds_weekly",
"id": "284f930d-d750-49a2-90ac-be4692bddda9",
"_version_": 1682995654191743000
}

Fetching the latest relative recommendations based on historical data

Using Reference time days and Target time days

Sort your result set by log_diff_wt_ratio to surface the latest recommendations. Keep in mind that every time you re-run the job, the recommendations are computed over signals data that overlaps with the data used by previous runs.

Let's analyze the situation using the figures below. Suppose you are running the job for the very first time with 30 days of data in your signals collection. Customers usually keep 30 days of signals in their raw signals collection (the signal cleanup job is typically configured for 30 days by default). Now you want to analyze trend spikes for any product document over the last 10 days compared with the first 20 days of signals data. In that case, the configuration would be:

30 days of data in your signals collection = RefTimeDays (20 days) + TarTimeDays (10 days)

In this case, the job analyzes the 30 days of signals data but focuses only on spikes for any product during the last 10 days.
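The 20/10 split described above can be sketched as date arithmetic. This is only an illustration of the idea: the function name trend_windows and the exact boundary handling are assumptions, and the job itself may compute its windows differently.

```python
from datetime import datetime, timedelta, timezone


def trend_windows(now, ref_time_days, target_time_days):
    """Return (reference_window, target_window) as (start, end) pairs.

    Assumption: the target window is the most recent `target_time_days`,
    and the reference window is the `ref_time_days` immediately before it.
    """
    target_start = now - timedelta(days=target_time_days)
    ref_start = target_start - timedelta(days=ref_time_days)
    return (ref_start, target_start), (target_start, now)


# Example: 30 days of signals, split into 20 reference days and 10 target days.
now = datetime(2024, 1, 31, tzinfo=timezone.utc)
reference, target = trend_windows(now, ref_time_days=20, target_time_days=10)
print("reference:", reference[0].date(), "to", reference[1].date())
print("target:   ", target[0].date(), "to", target[1].date())
```

Together the two windows cover the full 30 days, with the target window holding the newest signals.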

Another use case this solves is analyzing trends on fresh signals data only. Say you want the latest set of recommendations: set TarTimeDays to the number of days for which you have fresh signals data, and RefTimeDays to the number of days of historical data in your signals collection.

Common Issues while working with the trending recommendation job

Missing log_diff_wt_ratio field in the output collection

Items with a negative ratio (that is, items trending downwards) will not have this field, because you cannot take the logarithm of a negative number. The field is logarithmic so that you can sort the items that are trending upwards.
Using the log scale gives more meaningful results: sorting by the log_diff_wt_ratio field in Query Workbench ranks the upward-trending items in a useful order.
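Because some output documents lack the field, it can help to exclude them explicitly when sorting. The sketch below builds standard Solr query parameters (an existence filter plus a descending sort) and shows the same logic applied in memory. The helper names are hypothetical, doc_id 251 and 252 are made-up sample documents, and how you send the parameters (Query Workbench, a query pipeline, or a direct Solr request) depends on your setup.

```python
from urllib.parse import urlencode


def trending_query_params(rows=10):
    """Build Solr query parameters for the strongest upward trends."""
    return {
        "q": "*:*",
        # Range query matching any value: keeps only documents that have the field.
        "fq": "log_diff_wt_ratio:[* TO *]",
        "sort": "log_diff_wt_ratio desc",
        "rows": rows,
    }


def sort_client_side(docs):
    """Same idea in memory: drop documents without the field, sort descending."""
    with_field = [d for d in docs if "log_diff_wt_ratio" in d]
    return sorted(with_field, key=lambda d: d["log_diff_wt_ratio"], reverse=True)


docs = [
    {"doc_id": 250, "log_diff_wt_ratio": 1.3068528194400546},
    {"doc_id": 251},  # hypothetical downward-trending item: field is absent
    {"doc_id": 252, "log_diff_wt_ratio": 2.5},  # hypothetical value
]

print(urlencode(trending_query_params(rows=5)))
print([d["doc_id"] for d in sort_client_side(docs)])
```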

Filtering your data set based on recommendation

Filtering the output by timestamp or any other filter field is not possible, because these fields are not present in the job's output documents. The whole concept is to sort by log_diff_wt_ratio to identify the trending queries or trending items. If you still want to apply filter logic, first run the job to produce the trending items, which contain the doc_id values. Use those doc_id values to fetch the desired fields from your main collection, then perform a partial update on your trending collection so that the filter fields are available in your trends data.
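The enrichment step described above can be sketched as follows. This only builds the partial-update payloads using Solr's atomic update "set" syntax; fetching the documents and posting the updates is left out. The function name build_partial_updates, the filter field category, and the sample main-collection document are all hypothetical.

```python
import json


def build_partial_updates(trending_docs, main_docs_by_id, filter_field):
    """Build Solr atomic-update documents that copy `filter_field`
    from the main collection onto the matching trending documents."""
    updates = []
    for trend in trending_docs:
        main_doc = main_docs_by_id.get(str(trend["doc_id"]))
        if main_doc is None or filter_field not in main_doc:
            continue  # nothing to copy for this trending item
        updates.append({
            "id": trend["id"],  # id of the document in the trending collection
            filter_field: {"set": main_doc[filter_field]},
        })
    return updates


# Trending output (id and doc_id taken from the sample document above).
trending_docs = [{"id": "284f930d-d750-49a2-90ac-be4692bddda9", "doc_id": 250}]
# Hypothetical main-collection documents keyed by their id.
main_docs_by_id = {"250": {"id": "250", "category": "laptops"}}

updates = build_partial_updates(trending_docs, main_docs_by_id, "category")
print(json.dumps(updates, indent=2))
```

Once the field exists on the trending documents, it can be used in a filter query alongside the log_diff_wt_ratio sort.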

Reference articles

How the trending job works:

https://doc.lucidworks.com/fusion/5.10/yihzwd/trending-recommender-jobs?q=trending%20recommender%20jobs

Job configuration in detail:

https://doc.lucidworks.com/docs/managed-fusion/07-improve-your-queries/recommendations/trending-items-queries#identify-trending-items-or-queries