Issue:
Looking for tutorials or examples for the Queries for Query and Query-to-Query Similarity documentation:
https://doc.lucidworks.com/fusion-ai/4.2/concepts/boosting/queries-for-query.html
https://doc.lucidworks.com/fusion-ai/4.2/reference/jobs/query-to-query-similarity-computation.html
Environment:
Fusion 4.2
Resolution:
The Query-to-Query Similarity job output supports recommendations on search results pages, specifically Items for Query, which is one of the outputs of this job. Details on this recommender are here:
https://doc.lucidworks.com/fusion-ai/4.2/concepts/boosting/items-for-query.html
For example, if you search for "red jacket", similar queries might be "green jacket", "wind jacket", or "green coat". Many customers have something like a related-queries widget, which could be sourced from the queries-for-query output of the similarity job.
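As a rough illustration of the related-queries idea, the sketch below turns queries-for-query records into a short list for a widget. The record layout (`query`, `similar_query`, `similarity`), the function name `related_queries`, the lowercase normalization, and the similarity values are all hypothetical, invented for illustration; check the actual field names in your job output before adapting it.

```python
from typing import Dict, List


def related_queries(records: List[Dict], query: str, top_k: int = 3) -> List[str]:
    """Return up to top_k queries similar to `query`, highest similarity first.

    Hypothetical record layout: {"query": ..., "similar_query": ..., "similarity": ...}.
    """
    normalized = query.strip().lower()  # simple normalization (assumption)
    matches = [
        r for r in records
        if r["query"].strip().lower() == normalized
        and r["similar_query"].strip().lower() != normalized  # skip self-matches
    ]
    # Sort by similarity score, best match first.
    matches.sort(key=lambda r: r["similarity"], reverse=True)
    return [r["similar_query"] for r in matches[:top_k]]


# Invented example records, not real job output.
records = [
    {"query": "red jacket", "similar_query": "green jacket", "similarity": 0.82},
    {"query": "red jacket", "similar_query": "wind jacket", "similarity": 0.64},
    {"query": "red jacket", "similar_query": "green coat", "similarity": 0.71},
    {"query": "blue hat", "similar_query": "wool hat", "similarity": 0.90},
]
print(related_queries(records, "Red Jacket"))
# ['green jacket', 'green coat', 'wind jacket']
```

A widget would typically call something like this once per search and render the returned strings as links that re-run the search.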
In 4.2, the “Query-to-Query Similarity” job takes its input from the regular click signal aggregations, produced by the same job that feeds query-time signal boosts (typically “_click_signals_aggregation”).
Like the “ALS Recommender” job, which produces items-for-item and items-for-user recommendations, this is an ALS job, so it takes resources and time to run. When tuning it, we suggest starting with the Training Data Filter Query: filter on the weight_d field generated by the click signals aggregation job to limit the amount of data fed into this job.
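One way to pick a starting value for that filter is to look at a sample of weight_d values and choose a cutoff that keeps only the strongest signals. The sketch below assumes you have already exported such a sample (how you do that is not shown here); the function name `weight_filter_query` and the keep fraction are hypothetical. It only formats a standard Solr range query, and the result is a starting point for tuning, not a recommended value.

```python
import math
from typing import List


def weight_filter_query(weights: List[float], keep_fraction: float) -> str:
    """Build a Solr range filter on weight_d that keeps roughly the top
    `keep_fraction` of signals, based on a sample of weight_d values.
    Ties at the cutoff can make the kept share slightly larger."""
    if not weights or not 0 < keep_fraction <= 1:
        raise ValueError("need a non-empty sample and 0 < keep_fraction <= 1")
    ranked = sorted(weights, reverse=True)
    # Index of the smallest weight that still falls inside the kept fraction.
    cutoff_index = math.ceil(len(ranked) * keep_fraction) - 1
    threshold = ranked[cutoff_index]
    # Solr range syntax: inclusive lower bound, open upper bound.
    return f"weight_d:[{threshold} TO *]"


sample = [0.5, 1.0, 3.0, 7.5, 12.0, 2.0, 0.2, 9.0]  # invented sample values
print(weight_filter_query(sample, 0.5))  # weight_d:[3.0 TO *]
```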
Out of the box, Fusion query pipelines provide two recommendation stages: Recommend Items for Item and Recommend Items for User. You can drop either into a pipeline, and they read directly from the output of the item recommender job. However, there is no out-of-the-box stage that reads directly from the queries-for-query or items-for-query output. Instead, reuse the Recommend Items for Item and Recommend Items for User stages and configure them to read the queries-for-query and items-for-query output.
Query Similarity Recommender Job Output
The field names in the records that the query similarity job outputs are confusing out of the box, but they can all be mapped. Each record the job creates has an itemId field and a userId field: itemId holds the query (the query terms), and userId holds the item, that is, the document ID. This can be confusing because we talk about document IDs while the recommenders talk about items. In the scope of the recommendation jobs, "item" is a generic term: it can be a document ID, a product ID, or any other ID you key off of.
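The field mapping described above can be made explicit with a small renaming step, for example when inspecting the job output or post-processing it outside Fusion. This is a minimal sketch: only `itemId` and `userId` come from the job output as described; the function name `to_query_item`, the new names `query` and `doc_id`, and the example record are hypothetical.

```python
from typing import Dict


def to_query_item(record: Dict) -> Dict:
    """Rename the recommender's generic fields to what they hold in
    Query-to-Query Similarity output: itemId is the query terms and
    userId is the document (item) ID."""
    return {
        "query": record["itemId"],
        "doc_id": record["userId"],
        # Carry over any other fields unchanged.
        **{k: v for k, v in record.items() if k not in ("itemId", "userId")},
    }


example = {"itemId": "red jacket", "userId": "SKU-1042"}  # hypothetical record
print(to_query_item(example))  # {'query': 'red jacket', 'doc_id': 'SKU-1042'}
```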
Augmenting your Query Pipeline with the “Recommend Items for Item” Stage
In your pipeline, you can drop in the out-of-the-box Recommend Items for Item stage and use it for items for query. When the stage finishes, it creates a set of boosts; the Boost Field is the field those boosts apply to. The Boost Param is the parameter the stage puts the document IDs into. You can keep the item ID and user ID fields as they are; just know that the job has already mapped them to the query terms and the document ID. The stage behaves much like a click signal boost stage: its output adds boosts to the documents that the query similarity job produced for the query. Below is an example:
Parameter | Setting | Notes |
Boost Field | id | The field the boosts apply to. Change this to the document/product ID field that makes sense in your implementation. |
Boost Method | query-param | Or query-parser, as necessary for your application. |
Boost Param | bq | Or boost, depending on what makes sense for your application. You could post-process the bq/boost parameter and run a subsequent subquery to retrieve metadata that augments the final query pipeline payload. |
Item ID Request Parameter | preparedSearchTerms (or q) | Search terms typically come into the pipeline in the q request parameter. |
Item ID Field | itemId | Matches how the job maps the fields: itemId holds the query terms. |
Recommended Item ID Field | userId | Matches how the job maps the fields: userId holds the document ID. |
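To see what the Boost Field and Boost Param settings amount to, the sketch below builds a Solr bq-style string from items-for-query records, boosting the documents recommended for the incoming query. It is an illustration of the idea, not how the Fusion stage is implemented. Assumptions: the function name `build_bq` and the `weight` field are hypothetical, query matching is exact, and space-separated clauses are combined with Solr's default OR operator.

```python
from typing import Dict, List


def build_bq(records: List[Dict], query: str, boost_field: str = "id") -> str:
    """Build a Solr bq-style string that boosts documents recommended for `query`.

    itemId holds the query terms and userId holds the document ID, matching the
    Item ID Field / Recommended Item ID Field settings in the table above.
    """
    clauses = []
    for rec in records:
        if rec["itemId"] != query:  # exact match on the query terms
            continue
        # Escape backslashes and quotes so the ID is safe inside a quoted term.
        doc_id = rec["userId"].replace("\\", "\\\\").replace('"', '\\"')
        weight = rec.get("weight", 1.0)  # hypothetical weight field; defaults to 1.0
        clauses.append(f'{boost_field}:"{doc_id}"^{weight}')
    # Space-separated clauses are OR-ed under Solr's default operator.
    return " ".join(clauses)


records = [  # invented example records
    {"itemId": "red jacket", "userId": "SKU-1", "weight": 2.0},
    {"itemId": "red jacket", "userId": "SKU-2"},
    {"itemId": "blue hat", "userId": "SKU-3"},
]
print(build_bq(records, "red jacket"))  # id:"SKU-1"^2.0 id:"SKU-2"^1.0
```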
Configuring and Tuning the Query-to-Query Similarity Job
- Run the job with its out-of-the-box settings.
- Look at the job output and map the fields as described above. We recommend these initial settings for the Query-to-Query Similarity job properties:
Property | Initial Value | Notes |
Training Data Filter Query | *:* | *:* matches all signals, so every signal is sent to the job. Put a filter here (for example on weight_d) so that only the signals you want considered are sent. |
Training Collection Item Id Field | page_id | Out of the box this is document_id or item_id; make sure it matches the ID field in your source collection. |
Grid Search Width | 0 | 0 turns grid search off. If you set it to 1, the job tries to find the best settings and writes that information to the log file when it finishes. This can help you tune the job further, but it makes the job run longer. |
- Use the output at query time. Drop either the Recommend Items for User stage or the Recommend Items for Item stage into a query pipeline; these two stages map to the queries-for-query and items-for-query outputs. Once in the pipeline, the stages retrieve the recommendations the job created. You can then post-process those recommendations, return them unchanged in the response payload, or leave them as boosts on the existing results, depending on your use case.
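For the "leave them as boosts" option, the sketch below shows the effect in simplified form: recommended documents get extra score and move up the existing result list. This is a simplified additive stand-in for illustration only; actual Solr scores depend on the query parser and boost method you configure. The function name `apply_boosts`, the scores, and the boost values are hypothetical.

```python
from typing import Dict, List, Tuple


def apply_boosts(
    results: List[Tuple[str, float]], boosts: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Re-rank (doc_id, score) results by adding a recommendation boost.

    Simplified additive model for illustration; not Solr's actual scoring.
    """
    boosted = [(doc_id, score + boosts.get(doc_id, 0.0)) for doc_id, score in results]
    # Highest score first; Python's sort is stable, so ties keep original order.
    return sorted(boosted, key=lambda pair: pair[1], reverse=True)


results = [("SKU-1", 3.0), ("SKU-2", 2.5), ("SKU-3", 2.0)]  # invented scores
print(apply_boosts(results, {"SKU-3": 1.5}))
# [('SKU-3', 3.5), ('SKU-1', 3.0), ('SKU-2', 2.5)]
```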