Correlated Items

Technical documentation about how to setup and configure Correlated Items.

Sale transactions data mapping - TransSalesEntry

The input into the correlated item algorithm is sale transaction data (Point-of-sale data), i.e. what each customer puts in their cart and then purchases.
So the first thing we need to do is to data-map sale transaction and transform them to the right format. The process goes like this:

  1. The transaction data is found in NAV in table [nav].[TransSalesEntry].
  2. The data is mapped through view [raw].[v_SALE_TRANSACTION] to table [raw].[SALE_TRANSACTION].
  3. Then the data is transformed to the right format using procedure [ml].[sale_transaction_stg_to_ready]. The desired format is so called basket format where we have item_no or item_id of all items that were bought together in the same transaction.
  4. The basket format data is kept in table [ml].[sale_transaction]

Configure [ml].[sale_transaction]

For sale transactions there are two settings:

setting_key Default setting_value Description
ml_days_back_sale_transaction_data 35 Number of days to retrieve new sale transaction data
ml_days_back_keep_sale_trans_baskets 366 Number of days to keep sale transactions in [ml].[sale_transaction] table - older ones will be deleted

(Remember that the standard method is to run the data mapping for correlated items once every month.)
The default setting is to keep sale transaction for 1 year (366 days) back in time. Then, as the job is ran once every month, each month we retrieve sale transaction from the last 35 days and delete transaction older than 1 year (see image):

sale_transaction_time_periods


First time running sale transactions

The first time setting up correlated item, we want to retrieve sale transactions over the whole period we are interested in finding correlated items.
It depends on each customer how long period this period should be. The standard setup is to gather in the beginning sale transaction over 1 year.

So when first setting up, we need to change the setting that controls how long back in time we are gathering transactions.
Change ml_days_back_sale_transaction_data, from 35 to 366.

Next, execute the stg_group for sale transaction, use this code snippet:

EXEC [core].[execute_stg_run] @group_list='ml_sale_transaction'

Note, this run can take quite some time to execute the first time, it depends on the size of [raw].[SALE_TRANSACTION] table.

Now we have sale transaction data on preferred format ready in table [ml].[sale_transaction] for 1 year.

So there is no need to always get the sale transaction data 1 year back in time, so we change the setting back to 35 days. (Change back the setting ml_days_back_sale_transaction_data, from 366 to 35).
Now when the monthly job will run, we get new sale transaction data from the last 35 days(ml_days_back_sale_transaction_data), but delete all data older than 366 days(ml_days_back_keep_sale_trans_baskets)

Note that the choice of the periods depends on each customer. If there are few transaction data, you can retrieve data further back in time.
And if we have a lot of transaction sale data, it could be wise to use sale transactions from a shorter period, as the POS data can be very heavy.


Creating correlated items

The process for finding correlated items from sale transaction data is ran with stg_group ‘ml_correlated_item’. But broken down into steps, it goes like this:

The process for correlated item goes like this:

  1. Run the correlated algorithm using procedure [ml].[correlated_item_run].
    • This procedure runs this one [ml].[correlated_item_create], which contains the python code for correlated item calculcations. This procedure should not be changed at any time..
  2. Generated correlated items are inserted into table [ml].[correlated_item_stg]
  3. Then, correlated items are mapped to prod into table [ml].[correlated_item]

There are four settings used in the correlated algorithm:

setting_key Default setting_value Description
ml_correlated_item_confidence 0.1 Confidence tells how often the rule has been found to be true (see description below)
ml_correlated_item_level item_sku Possible values are ‘item_no’ or ‘item_sku’ - tells what item level should be used
ml_correlated_item_maximum_items 20 Maximum number of calculated correlated items that are inserted into the ml.correlated_item_stg table
ml_correlated_item_support 0.01 Support tells how frequently the parent item appears in the database (see description below)

Run correlated items for the first time

When you have altered the setting values to your customer needs, it is now time to run the correlated item process.
Run this code snippet:

EXEC [core].[execute_stg_run] @group_list='ml_correlated_item'`

This run can take some time to execute. For Krónan a supermarket chain in Iceland it took on around 30 minutes.
But the run time depends on the setting values and the amount of sale transactions.


Filter Correlated item before going to prod

It is possible to filter the correlated items results in the process when mapping to prod.
Use the stg_element ml_correlated_item_stg_to_prod and change data_merge_filter field to alter the results that are shown on prod. Examples could be:

  • You like to only include results with higher lift than the standard minimum lift of 1. Then add e.g. ’lift>2.5’
  • Exclude all results containing a certain item. E.g. exclude all results with banana, add e.g. item_id<>‘12345’

** add screenshot as an example **


Correlated item measures

If you want to learn more about how correlated items works take a look at this article.

Lift

If the lift value is higher than 1, it means that the occurrences of two items are dependent on one another. If lift is lower than 1, it means that the presence of one item has a negative effect on the presence of the other item. So, we are only interested when the lift is higher than 1. And the higher the lift gets, the greater are the chances of preference to buy item B when already having item A in the cart. Meaning, the higher the lift the more relevant the correlated items are. Sometimes, lift is referred to as the measure that can help store managers to decide product placements in stores.

Support

Support tells how frequently the parent item appears in the database.
Value of support helps us identify the rules worth considering for further analysis. For example, one might want to consider only items that occur at least 50 times out of 10,000 transactions i.e. support = 0.005. So we need to consider how often we like the item to be out of total transaction. E.g. if we have 100.000 transactions over 1 year, and we would like to only consider items that were bought on average at least twice every day, we get support=(2*730)/100.000 = 0.0073. Keep in mind that as the support level gets lower, the algorithm will yield more correlated item results and it will also take longer time for the run to execute.
This is because when lowering the support value, you are including more items into the correlated item pool.

Confidence

We have confidence, which is the measure of the likeliness of buying item B when already having item A in the cart. Possible scenario could be, that when having pizza sauce in the cart, in around 29% of cases the customer will also buy pizza cheese. Meaning, confidence = 0.29 .