Technical documentation about how to setup and configure Correlated Items.
The input into the correlated item algorithm is sale transaction data (Point-of-sale data), i.e. what each customer puts in their cart and then purchases.
So the first thing we need to do is to data-map sale transaction and transform them to the right format.
The process goes like this:
[nav].[TransSalesEntry]
.[raw].[v_SALE_TRANSACTION]
to table [raw].[SALE_TRANSACTION]
.[ml].[sale_transaction_stg_to_ready]
. The desired format is so called basket format where we have item_no or item_id of all items that were bought together in the same transaction.[ml].[sale_transaction]
[ml].[sale_transaction]
For sale transactions there are two settings:
setting_key | Default setting_value | Description |
---|---|---|
ml_days_back_sale_transaction_data |
35 | Number of days to retrieve new sale transaction data |
ml_days_back_keep_sale_trans_baskets |
366 | Number of days to keep sale transactions in [ml].[sale_transaction] table - older ones will be deleted |
(Remember that the standard method is to run the data mapping for correlated items once every month.)
The default setting is to keep sale transaction for 1 year (366 days) back in time.
Then, as the job is ran once every month, each month we retrieve sale transaction from the last 35 days and delete transaction older than 1 year (see image):
The first time setting up correlated item, we want to retrieve sale transactions over the whole period we are interested in finding correlated items.
It depends on each customer how long period this period should be. The standard setup is to gather in the beginning sale transaction over 1 year.
So when first setting up, we need to change the setting that controls how long back in time we are gathering transactions.
Change ml_days_back_sale_transaction_data
, from 35
to 366
.
Next, execute the stg_group for sale transaction, use this code snippet:
EXEC [core].[execute_stg_run] @group_list='ml_sale_transaction'
Note, this run can take quite some time to execute the first time, it depends on the size of [raw].[SALE_TRANSACTION]
table.
Now we have sale transaction data on preferred format ready in table [ml].[sale_transaction]
for 1 year.
So there is no need to always get the sale transaction data 1 year back in time, so we change the setting back to 35 days. (Change back the setting ml_days_back_sale_transaction_data
, from 366
to 35
).
Now when the monthly job will run, we get new sale transaction data from the last 35 days(ml_days_back_sale_transaction_data
), but delete all data older than 366 days(ml_days_back_keep_sale_trans_baskets
)
Note that the choice of the periods depends on each customer. If there are few transaction data, you can retrieve data further back in time.
And if we have a lot of transaction sale data, it could be wise to use sale transactions from a shorter period, as the POS data can be very heavy.
The process for finding correlated items from sale transaction data is ran with stg_group ‘ml_correlated_item’. But broken down into steps, it goes like this:
The process for correlated item goes like this:
[ml].[correlated_item_run]
.
[ml].[correlated_item_create]
, which contains the python code for correlated item calculcations. This procedure should not be changed at any time..[ml].[correlated_item_stg]
[ml].[correlated_item]
There are four settings used in the correlated algorithm:
setting_key | Default setting_value | Description |
---|---|---|
ml_correlated_item_confidence | 0.1 | Confidence tells how often the rule has been found to be true (see description below) |
ml_correlated_item_level | item_sku | Possible values are ‘item_no’ or ‘item_sku’ - tells what item level should be used |
ml_correlated_item_maximum_items | 20 | Maximum number of calculated correlated items that are inserted into the ml.correlated_item_stg table |
ml_correlated_item_support | 0.01 | Support tells how frequently the parent item appears in the database (see description below) |
When you have altered the setting values to your customer needs, it is now time to run the correlated item process.
Run this code snippet:
EXEC [core].[execute_stg_run] @group_list='ml_correlated_item'`
This run can take some time to execute. For Krónan a supermarket chain in Iceland it took on around 30 minutes.
But the run time depends on the setting values and the amount of sale transactions.
It is possible to filter the correlated items results in the process when mapping to prod.
Use the stg_element ml_correlated_item_stg_to_prod
and change data_merge_filter
field to alter the results that are shown on prod.
Examples could be:
** add screenshot as an example **
If you want to learn more about how correlated items works take a look at this article.
If the lift value is higher than 1, it means that the occurrences of two items are dependent on one another. If lift is lower than 1, it means that the presence of one item has a negative effect on the presence of the other item. So, we are only interested when the lift is higher than 1. And the higher the lift gets, the greater are the chances of preference to buy item B when already having item A in the cart. Meaning, the higher the lift the more relevant the correlated items are. Sometimes, lift is referred to as the measure that can help store managers to decide product placements in stores.
Support tells how frequently the parent item appears in the database.
Value of support helps us identify the rules worth considering for further analysis. For example, one might want to consider only items that occur at least 50 times out of 10,000 transactions i.e. support = 0.005.
So we need to consider how often we like the item to be out of total transaction.
E.g. if we have 100.000 transactions over 1 year, and we would like to only consider items that were bought on average at least twice every day, we get support=(2*730)/100.000 = 0.0073.
Keep in mind that as the support level gets lower, the algorithm will yield more correlated item results and it will also take longer time for the run to execute.
This is because when lowering the support value, you are including more items into the correlated item pool.
We have confidence, which is the measure of the likeliness of buying item B when already having item A in the cart. Possible scenario could be, that when having pizza sauce in the cart, in around 29% of cases the customer will also buy pizza cheese. Meaning, confidence = 0.29 .