...

Info

Additional tabs

The following additional tabs are provided:

  • Documentation tab, where you can document your task.

  • Parametric options tab, where you can configure process variables instead of fixed values. Parametric equivalents are shown in italics on this page (PO).

  • Anomaly Detection and Results tabs, where you can see the output of the task computation. See Results table below.

...

Info

Scenario data can be found in the Datasets folder in your Rulex installation.

The following steps were performed:

...

Procedure

Screenshot

First, we import the san-test dataset, retrieving attribute names from line 1 and attribute types from line 2. Each row of the dataset represents a sequence, composed of a Sequence ID, the date of occurrence, and a variable number of Event IDs.



Then we add a Reshape to Long task to the process to re-arrange the dataset, so that the information concerning a sequence of N events is distributed over N rows, with each row including a Sequence ID/Event ID pair.
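The effect of this reshaping step can be pictured with a minimal Python sketch. It is not the Rulex implementation: the function name, the "events" key and the sample rows are hypothetical, and only the Sequence ID, Date and Wide_1 attribute names come from this page.

```python
from datetime import date


def reshape_to_long(rows):
    """Turn one wide row per sequence into one long row per event.

    Hypothetical helper: each input row is a dict holding a Sequence ID,
    a Date and a variable-length list of event IDs under "events".
    """
    long_rows = []
    for row in rows:
        for event_id in row["events"]:
            long_rows.append({
                "Sequence ID": row["Sequence ID"],
                "Date": row["Date"],
                "Wide_1": event_id,  # attribute name used later on this page
            })
    return long_rows


# Invented sample data, for illustration only.
wide = [
    {"Sequence ID": 1, "Date": date(2020, 1, 1), "events": ["A", "B", "C"]},
    {"Sequence ID": 2, "Date": date(2020, 1, 2), "events": ["A"]},
]
long_rows = reshape_to_long(wide)
```

A row with three events becomes three rows that share the same Sequence ID and Date, which is the layout the Sequence Analysis task is configured against in the following steps.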

Then, we connect the Sequence Analysis task to the Reshape to Long task.

Configure the task as follows:

  • Drag and drop the Sequence ID attribute in the Sequence ID attributes list and the Wide_1 attribute in the Event ID attributes list.

  • Select the Auto option (to the right) for the Minimum event support.

  • Set the #Events to consider to 30 (if you have problems setting this number, deselect and reselect the Auto option above).

  • Deselect the Auto (above average) option for Minimum sequence support (#samples) and set the value to 10.

  • Set the Maximum sequence cardinality to 2.

  • Select Date as the Time attribute (and Day as the unit of measure).

  • Set the Minimum and maximum interval between sequence elements to 0 and 1, respectively.
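To make the meaning of these options more concrete, the following sketch counts two-event sequences under a minimum support and a 0 to 1 day interval. It is a naive illustration under stated assumptions, not the algorithm Rulex uses: the function name, the tuple layout and the sample rows are all hypothetical, and support is assumed to be the number of distinct sequences containing the pair.

```python
from collections import defaultdict
from datetime import date


def frequent_pairs(rows, min_support, min_gap_days=0, max_gap_days=1):
    """Return ordered event pairs whose support reaches min_support.

    rows: iterable of (sequence_id, day, event_id) tuples (hypothetical layout).
    Support is assumed to be the number of distinct sequences with the pair.
    """
    by_seq = defaultdict(list)
    for seq_id, day, event in rows:
        by_seq[seq_id].append((day, event))

    support = defaultdict(set)
    for seq_id, events in by_seq.items():
        events.sort(key=lambda e: e[0])  # order by date; stable for ties
        for i, (day_a, ev_a) in enumerate(events):
            for day_b, ev_b in events[i + 1:]:
                gap = (day_b - day_a).days
                if min_gap_days <= gap <= max_gap_days:
                    support[(ev_a, ev_b)].add(seq_id)

    return {pair: len(ids) for pair, ids in support.items()
            if len(ids) >= min_support}


# Invented sample data: in sequence 3 the gap is 4 days, so it is not counted.
rows = [
    (1, date(2020, 1, 1), "A"), (1, date(2020, 1, 2), "B"),
    (2, date(2020, 1, 1), "A"), (2, date(2020, 1, 2), "B"),
    (3, date(2020, 1, 1), "A"), (3, date(2020, 1, 5), "B"),
]
pairs = frequent_pairs(rows, min_support=2)
```

With a maximum cardinality of 2, pairs like these are the longest sequences considered. In the scenario above the minimum support would be 10 rather than 2.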

The extracted frequent sequences can be seen in the Frequent Sequences tab.

Now we connect the Anomaly Detection task to the Sequence Analysis task, and configure it as follows:

  • Drag and drop the Sequence ID attribute in the Sequence ID attributes list and the Wide_1 attribute in the Event ID attributes list.

  • Select Date as the Time attribute.

In the Compression tab of the Options panel, select Closed frequent sequences as Model compression method.
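As background, a frequent sequence is usually called closed when no longer sequence containing it has the same support. The sketch below illustrates that general definition only. Whether Rulex applies it exactly this way is an assumption, and the function names and support values are hypothetical.

```python
def is_subsequence(short, long):
    """True if the items of short appear in long in the same order."""
    it = iter(long)
    return all(item in it for item in short)


def closed_sequences(supports):
    """Keep sequences not absorbed by a longer one with equal support.

    supports: dict mapping a tuple of event IDs to its support count.
    """
    closed = {}
    for seq, sup in supports.items():
        absorbed = any(
            len(other) > len(seq)
            and other_sup == sup
            and is_subsequence(seq, other)
            for other, other_sup in supports.items()
        )
        if not absorbed:
            closed[seq] = sup
    return closed


# Invented supports: ("B",) is absorbed by ("A", "B"), which has the same support.
supports = {("A",): 12, ("B",): 10, ("A", "B"): 10}
closed = closed_sequences(supports)
```

Compression of this kind shrinks the model without losing support information, since the support of a dropped sequence can be read from the longer sequence that absorbs it.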

To check the results of the computation, right-click the task in the process and select Take a look.

The Anomaly Detection task has generated supplementary attributes, which allow us to determine whether, with respect to the considered model, an event is an anomaly.

For each anomalous event, if previous events forming an incomplete frequent sequence that involves it were detected, their IDs are written in the Detected Event column(s), and the event that should have come next is written in the Missing Event column. The Timeout column stores the period that elapsed without the missing event being detected. Otherwise, if the event is anomalous by itself, i.e. it is not frequent enough to be included in the (compressed) frequent sequences model, the Detected Event column is filled with the ID of the event itself, and both the Timeout and the Missing Event columns are left blank.
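The two cases can be sketched as follows. This is a simplified interpretation rather than the task's actual logic: it assumes a model made only of two-event sequences, integer day indices instead of dates, and a timeout equal to the maximum interval. The function name and the sample data are hypothetical.

```python
def flag_anomalies(events, model, timeout_days=1):
    """Report anomalies for one sequence of events.

    events: list of (day_index, event_id), sorted by day (hypothetical layout).
    model: set of frequent (first_event, next_event) pairs.
    """
    known = {event for pair in model for event in pair}
    report = []
    for i, (day, event) in enumerate(events):
        if event not in known:
            # Anomalous by itself: not part of any frequent sequence.
            report.append({"Detected Event": event,
                           "Missing Event": None, "Timeout": None})
            continue
        for first, expected in sorted(model):
            if first != event:
                continue
            followed = any(
                e == expected and 0 <= d - day <= timeout_days
                for d, e in events[i + 1:]
            )
            if not followed:
                # Incomplete frequent sequence: the expected event never came.
                report.append({"Detected Event": event,
                               "Missing Event": expected,
                               "Timeout": timeout_days})
    return report


# Invented data: the second "A" is never followed by "B", and "Z" is unknown.
events = [(0, "A"), (1, "B"), (3, "A"), (3, "Z")]
report = flag_anomalies(events, model={("A", "B")})
```

Here the first "A" is followed by "B" within one day and raises nothing, the second "A" yields a row with "B" as the Missing Event, and "Z" yields a row with blank Missing Event and Timeout values.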

...