Technical documentation on how to set up and configure Similar Items.
The following tables are used for similar items:
[ml].[similar_item]
[ml].[similar_item_setting]
The following procedures are used for similar items:
[ml].[similar_item_run]
[ml].[similar_item_create]
To compute the similarity between items, several settings need to be configured.
These settings are inserted into the [ml].[similar_item_setting] table:
id (int - required)
input_item_selection (nvarchar - not required)
input_features (nvarchar - required)
For example: item_id,name,description,product_group_name_level_1,product_group_name_level_2.
language_stop_words (nvarchar - required)
no_of_similar_items (int - required)
If no_of_similar_items = 5, at most 5 similar items are generated for each item. If the algorithm finds more than 5 similar items, it yields only the 5 with the highest similarity score. This setting therefore limits the number of similar items; to include all similar items, enter a high number in this field.
min_similarity_score (decimal - required)
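The effect of no_of_similar_items can be illustrated with a short Python sketch. This is not the code inside the procedures; the function name limit_similar_items and the sample data are hypothetical. It also assumes that min_similarity_score acts as a lower threshold on the similarity score, which the setting name suggests but which is not confirmed here.

```python
def limit_similar_items(candidates, no_of_similar_items, min_similarity_score):
    """Keep the best-scoring candidates for one item.

    candidates: list of (similar_item_id, similarity_score) tuples.
    Assumption: candidates scoring below min_similarity_score are dropped.
    """
    # Drop candidates below the threshold (assumed semantics).
    eligible = [c for c in candidates if c[1] >= min_similarity_score]
    # Highest similarity score first.
    eligible.sort(key=lambda c: c[1], reverse=True)
    # Cap the list at no_of_similar_items entries.
    return eligible[:no_of_similar_items]


# Hypothetical candidates found for a single item.
candidates = [
    ("A", 0.91), ("B", 0.42), ("C", 0.77), ("D", 0.88),
    ("E", 0.15), ("F", 0.69), ("G", 0.80),
]
top = limit_similar_items(candidates, no_of_similar_items=5, min_similarity_score=0.2)
```

With no_of_similar_items = 5, six candidates pass the threshold but only the five highest-scoring ones are kept.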
In most cases, this settings table contains only one row, but more rows can be added.
For example, to create similar items based on different features or a different similarity score, add a new row with your preferred settings and then change which @control_table_id is used in [core].[stg_element]. The similar items are then generated based on the features and settings in that row.
In [core].[stg_element] we specify which setting from [ml].[similar_item_setting] should be used when generating the similar items.
For the stg_element with name = ml_run_similar_item, the parameters field should contain @control_table_id = 1, where 1 is the id of the setting row in [ml].[similar_item_setting].
The parameters field should look like this: @batch_id = *BATCH_ID*, @origin_id = *ORIGIN_ID*, @parent_id = *PARENT_ID*, @control_table_id = 1
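When switching between several setting rows, only the last value in the parameters text changes. The following Python sketch builds that text for a given setting id; the helper name build_parameters is hypothetical and is not part of the product.

```python
def build_parameters(control_table_id):
    """Return the parameters text for the ml_run_similar_item element.

    control_table_id: id of the row in [ml].[similar_item_setting] to use.
    """
    # The three placeholders are kept literally, as in the documented text.
    return (
        "@batch_id = *BATCH_ID*, @origin_id = *ORIGIN_ID*, "
        "@parent_id = *PARENT_ID*, "
        f"@control_table_id = {int(control_table_id)}"
    )


# Example: point the element at the setting row with id 2.
parameters = build_parameters(2)
```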
If you would like to learn more about the similarity algorithm, you can take a look at this article.
[ml].[similar_item_create] [procedure] - contains the Python algorithm that finds the similar items.
[ml].[similar_item_run] [procedure] - takes into account the settings from [ml].[similar_item_setting] and runs the create procedure.
The similarity algorithm works on item_id level, where it compares all items based on the features provided.
So all items are compared within each location (it is not possible to exclude some locations).
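To give an idea of what comparing all items on their features can look like, here is a minimal Python sketch. It is not the algorithm in [ml].[similar_item_create], which is not described here. It assumes that the feature values are joined into one text per item, that stop words are removed, and that items are scored with cosine similarity on word counts. All function names and the sample items are hypothetical, and location handling is left out.

```python
import math
import re
from collections import Counter


def tokenize(text, stop_words):
    """Lowercase the text, split it into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in stop_words]


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def compare_all_items(items, input_features, stop_words):
    """Score every pair of items on the joined text of the chosen features."""
    vectors = {}
    for item in items:
        text = " ".join(str(item[f]) for f in input_features)
        vectors[item["item_id"]] = Counter(tokenize(text, stop_words))
    scores = {}
    ids = list(vectors)
    for i, left in enumerate(ids):
        for right in ids[i + 1:]:
            scores[(left, right)] = cosine_similarity(vectors[left], vectors[right])
    return scores


# Hypothetical items; the feature names follow the input_features example.
items = [
    {"item_id": 1, "name": "Red cotton shirt", "description": "A shirt for the summer"},
    {"item_id": 2, "name": "Blue cotton shirt", "description": "A shirt for the winter"},
    {"item_id": 3, "name": "Steel hammer", "description": "A hammer for the workshop"},
]
scores = compare_all_items(items, ["name", "description"], stop_words={"a", "for", "the"})
```

In this sketch the two shirts share most of their words and receive a high score, while the hammer shares none with either shirt and scores zero.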