...
...
...
Rulex extracts frequent sequences from event logs with the Sequence Analysis task.
Prerequisites
a dataset containing an event log has been imported into the process
the data used for the model has been well prepared
Info |
---|
Additional tabs: the following additional tabs are provided: |
Procedure
Drag and drop the Sequence Analysis task onto the stage.
Connect a task which contains event log data to the Sequence Analysis task.
Double-click the Sequence Analysis task. The left-side pane displays a list of all the available attributes in the dataset, which can be ordered and searched as required.
Configure the basic and advanced options as described in the table below.
Save and compute the task.
Sequence Analysis Basic options | ||
---|---|---|
Parameter Name | PO | Description |
Minimum event support (#samples) | supth | All events which occur in sequences fewer times than this threshold are discarded. This value is relevant only if the Auto (specify #events) option is not selected. |
Auto (specify #events) | mbaspecnitem | If this option is selected, the minimum support for events is computed automatically: specify the number of events to take into account (most frequent first) in the #Events to consider option. |
#Events to consider | mbanitemsup | Number of events to take into account (most frequent first). This value is relevant only if the Auto (specify #events) option is selected. |
Minimum sequence support (#samples) | assupth | All sequences which are verified fewer times than this threshold are discarded. This value is relevant only if the Auto (above average) option is not selected. |
Auto (above average) | abavassupth | If this option is selected, the minimum sequence support is set to the average support of sequences with the same cardinality (i.e. made up of the same number of events). |
Maximum sequence cardinality | fitmaxdim | Maximum cardinality of generated sequences. |
No maximum sequence cardinality | fitnomaxdim | If this option is selected, all sequences with higher support than the specified threshold are generated, regardless of their cardinality. |
Time attribute | seqname, timeunit | Attribute including the timestamp for each of the events. The reference time unit can also be specified via the drop-down menu. |
Minimum and maximum interval between sequence elements | sanminintseq, sanmaxintseq | Consecutive events in sequences are bound to these minimum and maximum thresholds of temporal distance. |
Allow repetitions (the same event can occur more than one time in a sequence) | sanallrep | If this option is selected, repetitions of the same event in a single sequence are allowed. |
Only print cyclic sequences (start event and end event have the same ID) | sanonlycic | If this option is selected, the output includes only the sequences whose first and last events have the same ID. |
Sequence ID attributes (NOMINAL) | mbaorderkeynames | Drag and drop here the nominal attributes which identify the sequences. Instead of manually dragging and dropping attributes, they can be defined via a filtered list. |
Event ID attributes (NOMINAL) | mbaitemkeynames | Drag and drop here the nominal attributes which characterize the events. Instead of manually dragging and dropping attributes, they can be defined via a filtered list. |
Sequence Analysis Advanced options | ||
Attribute to filter to select relevant data | | Drag and drop here the attribute you want to use as a filter to select relevant data, from the Available attributes or Proximity attributes lists, and configure the filter in the attribute filter dialog box. Instead of manually dragging and dropping attributes, they can be defined via a filtered list. |
Attribute to filter to discard irrelevant data | | Drag and drop here the attribute you want to use as a filter to discard irrelevant data, from the Available attributes or Proximity attributes lists, and configure the filter in the attribute filter dialog box. Instead of manually dragging and dropping attributes, they can be defined via a filtered list. |
Proximity attributes | mbaitemchildnames | Drag and drop here the ordered attributes which, together with time, characterize the proximity among events, and then set the corresponding thresholds in the Minimum-maximum proximity thresholds edit box. For example, to mine frequent sub-sequences of events which occur in locations close to each other, drag the spatial coordinates into this list. Instead of manually dragging and dropping attributes, they can be defined via a filtered list. |
Minimum-maximum proximity thresholds | | Set the minimum and maximum proximity thresholds for the corresponding attribute in the Proximity attributes edit box. |
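As a rough illustration of the support options in the table above, the following Python sketch shows one way the event thresholds and the above-average sequence threshold could behave. It is not Rulex code. The function names, the input layout (a dictionary from sequence ID to a list of event IDs) and the assumption that the support of an event is the number of sequences containing it are all hypothetical.

```python
from collections import Counter, defaultdict


def select_events(sequences, min_support=None, top_n=None):
    """Return the set of event IDs that survive the event-support filter.

    sequences: dict mapping a sequence ID to a list of event IDs.
    min_support: mimics "Minimum event support" (assumed semantics).
    top_n: mimics "Auto (specify #events)" with "#Events to consider".
    """
    # Assumption: an event's support is the number of sequences containing it.
    support = Counter()
    for events in sequences.values():
        support.update(set(events))

    if top_n is not None:
        # Keep the top_n most frequent events; ties are broken by event ID
        # so that the result is deterministic.
        ranked = sorted(support, key=lambda e: (-support[e], e))
        return set(ranked[:top_n])

    return {event for event, count in support.items() if count >= min_support}


def above_average_thresholds(pattern_support):
    """Average support per cardinality, as in "Auto (above average)".

    pattern_support: dict mapping a tuple of event IDs to its support.
    Returns a dict mapping cardinality to the average support of the
    patterns with that cardinality.
    """
    by_cardinality = defaultdict(list)
    for pattern, count in pattern_support.items():
        by_cardinality[len(pattern)].append(count)
    return {k: sum(v) / len(v) for k, v in by_cardinality.items()}


if __name__ == "__main__":
    log = {"s1": ["A", "B", "A"], "s2": ["A", "C"], "s3": ["B", "C"], "s4": ["A"]}
    print(select_events(log, min_support=3))
    print(select_events(log, top_n=2))
    print(above_average_thresholds({("A", "B"): 4, ("B", "C"): 2, ("A", "B", "C"): 1}))
```

In this sketch the two event options are mutually exclusive, mirroring the note in the table that the explicit threshold is relevant only when the Auto option is not selected.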
Info |
---|
...
Results The results of the Sequence Analysis task can be viewed in two separate tabs (the respective columns are described in the table below):
|
...
|
Example
In the example process, frequent sequences are extracted from a dataset with the Sequence Analysis task.
Info |
---|
Scenario data can be found in the Datasets folder in your Rulex installation. |
...
The following steps were performed:
First we import the dataset.
The dataset is rearranged in the Reshape To Long task.
Frequent sequences are extracted with the Sequence Analysis task.
The resulting sequences are viewed via the Take a look functionality.
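The rearrangement performed in the Reshape To Long step can be pictured with a few lines of plain Python. This is only a sketch of the idea, not the task's implementation: the row layout (sequence ID, date, then a variable number of event IDs) follows the description of the example dataset, while the function name and the sample values are hypothetical.

```python
def reshape_to_long(rows):
    """Turn one wide row per sequence into one long row per event.

    rows: iterable of tuples (sequence_id, date, event_1, event_2, ...),
          where trailing event cells may be empty.
    Returns a list of (sequence_id, date, event_id) tuples.
    """
    long_rows = []
    for seq_id, date, *events in rows:
        for event in events:
            # Skip empty cells left by sequences shorter than the widest row.
            if event not in (None, ""):
                long_rows.append((seq_id, date, event))
    return long_rows


if __name__ == "__main__":
    wide = [
        ("S1", "2020-01-01", "A", "B", ""),
        ("S2", "2020-01-02", "C"),
    ]
    for row in reshape_to_long(wide):
        print(row)
```

After this step every row carries a single Sequence ID/Event ID pair, which is the shape expected by the Sequence ID attributes and Event ID attributes options.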
Procedure | Screenshot |
---|---|
First we import the san-test dataset, retrieving attribute names from line 1 and attribute types from line 2. Each row of the dataset represents a sequence, made up of a Sequence ID, the date of occurrence, and a variable number of Event IDs. |
Then we add a Reshape to Long task to the process to rearrange the dataset, so that the information concerning a sequence of N events is distributed over N rows, with each row including a Sequence ID/Event ID pair. |
Then, we connect the Sequence Analysis task to the Reshape to Long task. Configure the task as follows:
|
|
|
|
The extracted frequent sequences can be seen in the Frequent Sequences tab. |
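To make the interplay of the task options more concrete, the sketch below counts frequent sequences on a toy event log. It is a deliberately simplified illustration and not the algorithm used by Rulex: it only considers runs of events that are adjacent in time, treats timestamps as plain numbers, reports patterns of at least two events, and every name in it (function, parameters, sample data) is hypothetical.

```python
from collections import Counter


def frequent_sequences(log, min_support, max_len,
                       min_gap=0, max_gap=float("inf"),
                       allow_repetitions=True, only_cyclic=False):
    """Count patterns of time-adjacent events and keep the frequent ones.

    log: dict mapping a sequence ID to a list of (timestamp, event_id) pairs.
    min_support: minimum number of sequences that must contain a pattern.
    max_len: maximum number of events in a pattern (cardinality).
    min_gap, max_gap: allowed time distance between consecutive events.
    """
    support = Counter()
    for events in log.values():
        ordered = sorted(events)  # order events by timestamp
        found = set()             # each pattern counts once per sequence
        for start in range(len(ordered)):
            pattern = [ordered[start][1]]
            stop = min(start + max_len, len(ordered))
            for nxt in range(start + 1, stop):
                gap = ordered[nxt][0] - ordered[nxt - 1][0]
                if not (min_gap <= gap <= max_gap):
                    break  # interval constraint violated
                if not allow_repetitions and ordered[nxt][1] in pattern:
                    break  # the same event may not occur twice
                pattern.append(ordered[nxt][1])
                found.add(tuple(pattern))
        support.update(found)

    return {
        pattern: count
        for pattern, count in support.items()
        if count >= min_support
        and (not only_cyclic or pattern[0] == pattern[-1])
    }


if __name__ == "__main__":
    toy_log = {
        "s1": [(1, "A"), (2, "B"), (3, "A")],
        "s2": [(1, "A"), (2, "B"), (10, "C")],
        "s3": [(5, "B"), (6, "A")],
    }
    print(frequent_sequences(toy_log, min_support=2, max_len=3, max_gap=5))
```

The keyword arguments loosely correspond to the minimum sequence support, maximum sequence cardinality, minimum and maximum interval, Allow repetitions and Only print cyclic sequences options. The mapping is an assumption made for illustration.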