Issue:
What are signals? How do I configure them, why use them and how do I capture them?
Environment:
Fusion
Resolution:
Note: This article is intended for users running Fusion 3.1.x or 4.x. For other versions, please consult the Fusion Documentation.
What are signals?
Fusion signals are recorded events such as queries and clicks that inform you of user behavior on your application.
Why use signals?
Signals are useful for improving search relevancy and for analytics purposes. More specifically, signals can be used to:
• enrich the result set for a search query, i.e., improve the items returned for that query.
• enrich the information about a clicked item, i.e., improve the queries for that item.
• uncover similarities between items, i.e., cluster items based on the queries for which they were clicked.
• make recommendations of the form: “other customers who entered this query clicked on that”, “customers who bought this also bought that”.
How do I capture signals?
To begin using signals, you must first determine which events should be considered signal events. Common event types are click, rating/like/dislike, add-to-cart, and purchase.
For each signal event, you will also determine which attributes you would like to collect about a user to use later for personalization or analytics. Any information that your organization deems important can be captured in a signal. Common attributes to capture are as follows:
• Search term
• Document/product clicked
• Document/product category
• Geo-location
• User behavior and preferences
• User history and past orders
• Device
• User role
• Rating
Your client application is responsible for capturing the signal attributes and then formatting them as a JSON file like the sample listed below:
Sample Signal (click.json file):
{
  "timestamp": "2015-06-01T23:44:52.533Z",
  "params": {
    "query": "ipad",
    "docId": "2125233",
    "location": "us-west",
    "role": "agent",
    "filterQueries": [
      "cat00000",
      "abcat0100000",
      "abcat0101000",
      "abcat0101001"
    ]
  },
  "type": "click"
}
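As a rough illustration of how a client application might assemble a signal like the sample above, here is a minimal Python sketch. The function name build_click_signal and its parameters are hypothetical and are not part of Fusion. The field names simply mirror the sample, and your own signals may carry different attributes.

```python
import json
from datetime import datetime, timezone


def build_click_signal(query, doc_id, location, role, filter_queries, when=None):
    """Return a dict shaped like the click.json sample (hypothetical helper)."""
    # Default to the current time; callers may pass a timezone-aware datetime.
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    # ISO 8601 in UTC with millisecond precision and a trailing "Z", as in the sample.
    timestamp = when.strftime("%Y-%m-%dT%H:%M:%S.") + f"{when.microsecond // 1000:03d}Z"
    return {
        "timestamp": timestamp,
        "params": {
            "query": query,
            "docId": doc_id,
            "location": location,
            "role": role,
            "filterQueries": list(filter_queries),
        },
        "type": "click",
    }


signal = build_click_signal(
    query="ipad",
    doc_id="2125233",
    location="us-west",
    role="agent",
    filter_queries=["cat00000", "abcat0100000"],
    when=datetime(2015, 6, 1, 23, 44, 52, 533000, tzinfo=timezone.utc),
)

# Write the signal to click.json so it can be posted as a file.
with open("click.json", "w", encoding="utf-8") as handle:
    json.dump(signal, handle, indent=2)
```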
Once you've captured signal events and their associated attributes, you can post this information to the Fusion Signals API. Each signal event is submitted to a signals index pipeline, which formats the signal and then indexes it into a “signals collection”, e.g. products_signals.
Signals API:
curl -u admin:password -X POST -H 'Content-type:application/json' -d @click.json "http://fusion-host:8764/api/signals/{primary_collection}"
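If your client application is written in Python, the same request could be sketched with the standard library as shown below. This is only an approximation of the curl command: the host, port, credentials, and endpoint path are copied from it, "products" is an example primary collection name, and build_signals_request is a hypothetical helper. The call that actually sends the request is left commented out so the sketch runs without a Fusion server.

```python
import base64
import json
import urllib.request


def build_signals_request(host, collection, username, password, signal):
    """Build (but do not send) a POST request mirroring the curl command."""
    url = f"http://{host}:8764/api/signals/{collection}"
    # HTTP Basic auth header, equivalent to curl's -u admin:password.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(signal).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


request = build_signals_request(
    host="fusion-host",
    collection="products",  # example primary collection name
    username="admin",
    password="password",
    signal={"type": "click", "params": {"query": "ipad", "docId": "2125233"}},
)

# To send the request against a running Fusion instance:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```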
Aggregating Signals
The following steps walk you through locating Fusion’s built-in click_signals_aggregation job in Fusion 4.
1. Select the “Collections” panel in the Fusion menu and navigate to “Jobs”.
2. Use the search filter in the Jobs panel to search for “click” and filter down to click_signals_aggregation jobs.
3. Open the {primary_collection}_click_signals_aggregation job and enable Advanced Settings.
The click_signals_aggregation job is configured to aggregate the raw signals stored in the {primary_collection}_signals collection. You can modify this configuration to aggregate based on the attributes you're capturing in a signal. For example, a common technique is to aggregate signals based on a user's location. To do this, enable “Legacy Aggregations” and add the "location" field to the GroupingFields property in the aggregation job configuration. You can also write your own SQL aggregation to aggregate data using custom logic.
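To make the effect of adding a grouping field more concrete, the Python sketch below counts raw click signals per group. It is a conceptual illustration only and does not reproduce how Fusion's aggregation job runs or what it outputs. The function aggregate_clicks and the sample data are hypothetical.

```python
from collections import Counter


def aggregate_clicks(raw_signals, grouping_fields=("query", "docId")):
    """Count raw signals per combination of grouping-field values."""
    counts = Counter()
    for signal in raw_signals:
        params = signal.get("params", {})
        # One key per distinct combination of the grouping fields.
        key = tuple(params.get(field) for field in grouping_fields)
        counts[key] += 1
    return counts


raw_signals = [
    {"type": "click", "params": {"query": "ipad", "docId": "2125233", "location": "us-west"}},
    {"type": "click", "params": {"query": "ipad", "docId": "2125233", "location": "us-west"}},
    {"type": "click", "params": {"query": "ipad", "docId": "2125233", "location": "us-east"}},
]

# Without "location", all three clicks fall into a single group.
by_query_and_doc = aggregate_clicks(raw_signals)

# Adding "location" splits the same clicks into one group per region.
by_location = aggregate_clicks(raw_signals, ("query", "docId", "location"))
```

With the location field added, the same query and document yield separate counts per region, which is what allows results to be boosted differently for users in different locations.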
It is good practice to keep raw signals and back them up for analytics purposes or in case an aggregation job fails.
Use the Fusion scheduler tool to schedule aggregation jobs.
Set the interval to a time period which supports your use case and business needs.
Boosting with Signals
The final step in implementing signals is to boost popular documents/products using the signals data you’ve gathered. Fusion does this for you using the “Boost with Signals” query pipeline stage.
Using this stage, you can specify which field you would like to boost results on, the number of signals to process, any custom parameters you would like to send in when retrieving signals, and more. You can also use the “Scale Boosts” property to scale the overall weights (scores) of boosted results within a finite range. This prevents extreme re-ranking of boosted results by normalizing all result scores.
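The idea of scaling weights into a finite range can be illustrated with simple min-max normalization, sketched below in Python. This is an assumption made for illustration: the formula Fusion actually applies for “Scale Boosts” is not described in this article and may differ. The function scale_boosts and the range 1 to 10 are hypothetical.

```python
def scale_boosts(weights, low=1.0, high=10.0):
    """Linearly rescale raw boost weights into the range [low, high]."""
    if not weights:
        return {}
    smallest = min(weights.values())
    largest = max(weights.values())
    if largest == smallest:
        # All weights are equal, so every document gets the same boost.
        return {doc_id: high for doc_id in weights}
    span = largest - smallest
    return {
        doc_id: low + (weight - smallest) * (high - low) / span
        for doc_id, weight in weights.items()
    }


# One document has far more clicks than the others.
raw_weights = {"doc-a": 1.0, "doc-b": 500.5, "doc-c": 1000.0}
scaled = scale_boosts(raw_weights)
```

After scaling, the most-clicked document still ranks highest among boosted results, but its weight is capped at the top of the range instead of being a thousand times larger than the least-clicked one.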
The Cold Start Problem
The "cold start" problem means it is hard to personalize the search experience when insufficient signals have been captured and aggregated. You will likely run into this issue when you first implement signals and do not have enough signals data to offer true automated relevancy tuning. This issue also presents itself when offering personalized recommendations and a new user visits your application. A common solution to this issue is to boost on freshness (boost by recency) or on some other field until you’ve aggregated enough data to automate relevancy.
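One way to picture the fallback is a simple switch between signal-based boosting and a recency boost, sketched below in Python. Everything here is an assumption for illustration: the helper choose_boost, the threshold of 100 aggregated signals, and the publish_date field are hypothetical. The recip(ms(NOW,...)) expression is the recency function commonly shown for Solr function queries; how you pass it through your Fusion query pipeline depends on your configuration.

```python
def choose_boost(aggregated_signal_count, min_signals=100, date_field="publish_date"):
    """Pick a boosting strategy based on how much signal data exists."""
    if aggregated_signal_count >= min_signals:
        # Enough data: rely on signal-based boosting.
        return {"strategy": "signals"}
    # Cold start: fall back to boosting newer documents.
    return {
        "strategy": "recency",
        "bf": f"recip(ms(NOW,{date_field}),3.16e-11,1,1)",
    }


cold = choose_boost(aggregated_signal_count=12)
warm = choose_boost(aggregated_signal_count=5000)
```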
Summary
1. Capture signal events and attributes and POST them to the Fusion Signals API.
2. Aggregate signals using the click_signals_aggregation job.
3. Use the "Boost with Signals" query pipeline stage to boost popular results.
Additional Resources:
Blog post introduction to signals