Using Similar Items Detector to Solve Association Problems

Rulex generates description-based and sales-based replacement rules with the Similar Items Detector task.

This task uses description-based matching, which can be used with newly introduced items and helps solve cold start problems.

Prerequisites

A Rulex process has been created.
The required datasets have been imported into the process.
The data used for the model has been prepared.
A Frequent Itemsets Mining task is present in the process and provides input data for the Similar Items Detector task.

Additional tabs

The following additional tabs are provided:

Documentation tab, where you can document your task.
Parametric options tab, where you can configure process variables instead of fixed values. Parametric equivalents are expressed in italics on this page.
Replacement rules and Results tabs, where you can see the output of the task computation. See the Results section below.

Procedure

Drag and drop the Similar Items Detector task onto the stage.
Connect a task that contains frequent itemsets to the new task.
Double click the Similar Items Detector task. The left-hand pane displays a list of all the available attributes in the dataset, which can be ordered and searched as required.
To generate description-based replacement rules, click on the Text based matching tab and configure the options as described in the table below.
To generate sales-based replacement rules, click on the Sales based matching tab and configure the options as described in the table below.
Save and compute the task.

Name	Parametric options	Description
Similar Items Detector options
Text based matching options
Category attribute	popcatname	Select the attribute that represents the category from the drop-down list. This can be used to match only descriptions that belong to the same category.
Description attribute	popdescname	Select the attribute that represents the description from the drop-down list, which will be used for text matching.
Word separator	popwordsep	Select how words are separated from one of the following possibilities: Space, Tab, Newline.
Minimum word length	popminwordlen	Words that are shorter than the value entered here will not be used for text matching. This helps to eliminate words such as the, a, one and at.
Minimum unadjusted similarity cosine	popsimcosth	The minimum similarity of pure text matching, without considering Preferential requirements attributes. Entering 1 means the text must be identical; entering 0 means no match is required.
Case sensitive matching	popcasesens	If selected, the upper or lower case will be taken into consideration when matching text.
Item key attributes	popdescname	Drag and drop the nominal attributes that uniquely identify the item from the Attributes list. Instead of manually dragging and dropping attributes, they can be defined via a filtered list.
Preferential requirements attributes	mbaitemchildnames	Drag and drop the attributes which will influence the similarity score when they match. When they match, a weight is added to the similarity score. This weight is defined in the Preferential requirements weights. These attributes could, for example, define brand, packaging or size. Instead of manually dragging and dropping attributes, they can be defined via a filtered list.
Ignored char list	popignoredchars	Select the characters you want to eliminate from text matching.
Preferential requirements weights		The weight awarded to matching Preferential requirements attributes.
Sales based matching options
Takes also sales data into account	popusetransactions	Select this option to include sales data in the task execution.
Minimum alternativeness coefficient	popminalternativeness	The degree of alternativeness between the purchase of two items: 1 (max) if they are never sold together, 0 (min) if one item is always sold with the other one. If a pair of items does not reach the Minimum alternativeness coefficient, the corresponding replacement rule is discarded.
Minimum volume replacement score	poprepcoeffth	The minimum percentage of orders in which a replaced item is expected to be replaceable by the replacing item. If this minimum threshold is not satisfied by a replacement rule, it is discarded.
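
The options above can be easier to tune with a rough picture of what they control. The following Python sketch is an illustration only and is not the Rulex implementation: it assumes a plain bag-of-words cosine for the unadjusted similarity, adds the weight of each matching preferential attribute to that score as described in the table, and uses a hypothetical formula for the alternativeness coefficient that is merely consistent with the two extremes stated above (1 if never sold together, 0 if one item is always sold with the other). All function and argument names are invented for this example.

```python
import math
from collections import Counter


def tokenize(description, separator=" ", min_word_len=3,
             ignored_chars="", case_sensitive=False):
    """Split a description into the words used for matching (illustrative)."""
    if not case_sensitive:
        description = description.lower()
    # Remove every character listed in ignored_chars before splitting.
    for ch in ignored_chars:
        description = description.replace(ch, "")
    # Words shorter than min_word_len are excluded from matching.
    return [w for w in description.split(separator) if len(w) >= min_word_len]


def cosine_similarity(words_a, words_b):
    """Cosine between two word-count vectors (assumed 'unadjusted' score)."""
    a, b = Counter(words_a), Counter(words_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def adjusted_score(unadjusted, item_a, item_b, weights):
    """Add the weight of each preferential attribute whose values match."""
    bonus = sum(w for attr, w in weights.items()
                if attr in item_a and item_a.get(attr) == item_b.get(attr))
    return unadjusted + bonus


def alternativeness(orders_a, orders_b):
    """Hypothetical coefficient: 1 - shared orders / smaller order count."""
    smaller = min(len(orders_a), len(orders_b))
    if smaller == 0:
        return 1.0  # assumption: no sales history means no co-occurrence
    return 1.0 - len(orders_a & orders_b) / smaller


# Example: two descriptions of a similar product, same brand.
old_words = tokenize("Olive oil extra virgin 1L")
new_words = tokenize("Extra virgin olive oil 500ml")
text_sim = cosine_similarity(old_words, new_words)
score = adjusted_score(text_sim, {"brand": "Acme"}, {"brand": "Acme"},
                       {"brand": 0.1})
```

In this sketch a candidate pair would be kept only if `text_sim` reaches the Minimum unadjusted similarity cosine, and, when sales data is taken into account, only if `alternativeness` computed on the sets of order IDs containing each item reaches the Minimum alternativeness coefficient.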

Results

The results of the task are displayed in two separate tabs:

The Replacement rules tab displays the generated replacement rules, where:
- Rule Replacement ID: the sequential ID number for replacement rules.
- Category:
- Replaced item ID: IDs of replaced items
- Replacing item ID: IDs of replacing items
- Similarity score:

The Results tab displays details on the execution of the analysis, where:
- Task Identifier: the ID code for the task, internally used by the Rulex engine.
- Task Name: simply the name of the task.
- Elapsed time (sec): the time required for the latest computation (in seconds).
- Number of generated replacement rules: the number of replacement rules which were generated by the task.