Applying Models to Data

The Apply Model task applies models generated by classification, regression and clustering tasks to new datasets.

The options to be configured in the task depend greatly on the model the task receives as input.

Prerequisites

a Rulex process has been created
the required datasets have been imported into the process
there is a task in the process before the Apply Model task that has generated an applicable model.

Additional tabs

The following additional tabs are provided:

Documentation tab where you can document your task,
Parametric options tab where you can configure process variables instead of fixed values. Parametric equivalents are shown in italics on this page (PO).

Procedure

Drag and drop the Apply Model task onto the stage.
Connect a task which contains the rules or clusters to be applied to the Apply Model task.
Double-click the Apply Model task.
Configure the options described in the Apply Model options table below.
Save and compute the task.

Name	PO	Description
Apply model general options
Available models		Select the currently available input you want to apply from the drop-down list. Possible options are: Rules (LLM tasks); Association rules (Association tasks); Models; Clusters (standard Clustering task). If no model is selected, the last generated model is used by default.
Save confusion matrix	saveconfmat	If selected, the confusion matrix is saved in the execution information of the task. This information is displayed in the Results tab of the computed task. As this may result in a large amount of data, it may be preferable not to save it.
Use output to index previous clustering	combineclust	If selected, both rules and clusters are applied, when applicable. Consequently, when rules are applied, the characteristics of the clusters associated with the rule output are added (for example, the centroid of cluster 7 is added to rule 7).
Append results	append	If selected, the results of the current computation are appended to the dataset, otherwise they replace the results of the previous computations.
Apply Model LLM options
Choose method for testing	testmode	Select how you want to apply rules to the data. Standard test: one output value is considered at a time, and the relevances of all the rules that pertain to that value and are satisfied by the input pattern are summed; the relevance values obtained are then compared, and the output associated with the greatest value is assigned to the pattern. Modified test: similar to the standard test, but it also considers a relevance measure for each single condition in the rules satisfied by the input pattern. AND-OR test: rules are listed according to their relevance, and the output associated with the first rule that covers the input pattern is assigned to it.
Add output score (classification rules only)	addscore	If selected, a column is added with a continuous value between -1 and +1, which represents the precision of the classification. For example, if the class "true" is +1, a score of 0.99 means the output almost certainly belongs to the class "true".
Add verified rules for each pattern	addrules	If selected, all verified rules are displayed, instead of the most important rule only.
Add probabilities for output values (classification rules only)	addprobs	If selected, a column is added with the probability associated with the classification output.
Add equivalent group indexes to output results	addgroups	If selected, the index of the ambiguity group is added. An ambiguity group is a group of rows with the same input values. For example: 1, 10, 35 > ambiguity group 1; 1, 10, 35 > ambiguity group 1; 10, 20, 40 > ambiguity group 2; ...
Use absolute weights instead of relative ones	absconf	If selected, the frequency of the class within the training set is considered when calculating the weight associated with each rule.
Delete rules after execution	deleterules	If selected, rules are deleted after they are applied. This is useful when you want to apply the rules once only.
Merge results with original data	addaux	If selected, once the model has been applied, the attributes and the results are saved in the same structure.
Put results next to the related output attribute	nearout	If selected, the results of each attribute are displayed next to the attribute itself. This option is available only if the previous option, Merge results with original data, is selected.
Apply model association rules options
Print suggestions corresponding to items included in current order	forembaprconfirm	If selected, the Apply Model task can also suggest an item that is already included in the order, as a confirmation. Otherwise, only items that are not included in the order are suggested to extend it.
Maximum number of suggestions per order	forembamaxnsugg	Enter the maximum number of suggested items for each order.
Apply model cluster options
Distance method for evaluation	evaldistmethod	Select the distance method from the following values: Euclidean, Euclidean (normalized), Manhattan, Manhattan (normalized), Pearson. For details on these methods, see the Managing Attribute Properties page.
Replace output after forecast	replaceout	If selected, during the execution the Apply Model task searches for a Cluster id column and turns it into an Output. Each row of this column is then filled with the index value of the corresponding cluster.
Use distance between profiles in Label Clustering	usedistance	Applies label clustering using profiles instead of labels, as if it were a normal clustering system.
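
The difference between the Standard test and the AND-OR test described in the Apply Model LLM options can be seen in a small example. The Python sketch below is not Rulex code: the `Rule` class, its fields and the toy rules are hypothetical and only illustrate the two procedures as they are worded in the table. The Modified test is left out because it relies on per-condition relevance values that this page does not define.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Pattern = Dict[str, float]


@dataclass
class Rule:
    """Hypothetical rule: a condition on a pattern, an output value and a relevance."""
    condition: Callable[[Pattern], bool]
    output: str
    relevance: float


def standard_test(rules: List[Rule], pattern: Pattern) -> Optional[str]:
    """Sum the relevance of the satisfied rules per output; return the output with the largest sum."""
    totals: Dict[str, float] = {}
    for rule in rules:
        if rule.condition(pattern):
            totals[rule.output] = totals.get(rule.output, 0.0) + rule.relevance
    if not totals:
        return None  # no rule covers the pattern
    return max(totals, key=lambda output: totals[output])


def and_or_test(rules: List[Rule], pattern: Pattern) -> Optional[str]:
    """Scan the rules by decreasing relevance; return the output of the first rule that covers the pattern."""
    for rule in sorted(rules, key=lambda r: r.relevance, reverse=True):
        if rule.condition(pattern):
            return rule.output
    return None  # no rule covers the pattern


# Toy rules and pattern (illustrative values only).
rules = [
    Rule(lambda p: p["age"] > 30, "yes", 0.6),
    Rule(lambda p: p["income"] < 20, "no", 0.4),
    Rule(lambda p: p["age"] < 50, "no", 0.3),
]
pattern = {"age": 40, "income": 10}

standard_result = standard_test(rules, pattern)
and_or_result = and_or_test(rules, pattern)
```

With these toy values all three rules are satisfied. The standard test sums the relevance of the two "no" rules, which together outweigh the single "yes" rule, whereas the AND-OR test stops at the single most relevant rule, so the two methods assign different outputs to the same pattern.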

Results

Apart from the general information on the task, such as its ID, name and elapsed time, the contents of the Results pane depend on the task input. However, the results always consist of a set of additional columns which are added to the input data. For example, in a classification scenario the Apply Model task produces:

the predicted output class
the prediction confidence
the index of the rule used to determine the output
the error associated with the prediction (0 if correct, otherwise 1).
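
As an illustration of how these columns relate to the input data, the sketch below appends them to a few rows in plain Python. The column names (`pred`, `conf`, `rule`, `err`), the function name and the sample values are hypothetical and are not the names Rulex uses; only the error definition (0 if correct, otherwise 1) is taken from this page.

```python
from typing import Any, Dict, List, Tuple


def add_result_columns(
    rows: List[Dict[str, Any]],
    forecasts: List[Tuple[str, float, int]],
    output_name: str = "class",
) -> List[Dict[str, Any]]:
    """Return copies of the rows extended with four classification result columns.

    Each forecast is a tuple: (predicted class, confidence, index of the rule used).
    """
    results = []
    for row, (predicted, confidence, rule_index) in zip(rows, forecasts):
        extended = dict(row)  # copy, so the input data is left untouched
        extended["pred"] = predicted
        extended["conf"] = confidence
        extended["rule"] = rule_index
        # The error is 0 when the prediction matches the actual output, otherwise 1.
        extended["err"] = 0 if predicted == row[output_name] else 1
        results.append(extended)
    return results


# Illustrative input rows and forecasts (made-up values).
rows = [
    {"age": 40, "class": "yes"},
    {"age": 25, "class": "no"},
]
forecasts = [("yes", 0.9, 3), ("yes", 0.6, 1)]
extended_rows = add_result_columns(rows, forecasts)
```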

Values such as accuracy and precision are reported for the training and test sets, the whole dataset and the valid datasets.
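
For readers unfamiliar with these measures, the sketch below computes accuracy and the precision of one class from actual and predicted outputs. It uses the usual textbook definitions (accuracy = correct predictions / all predictions; precision = correct predictions of a class / all predictions of that class). The function names are illustrative, and this page does not state exactly how Rulex computes or aggregates these values, so treat it as a generic reference rather than a description of the task's internals.

```python
from typing import Optional, Sequence


def accuracy(actual: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of rows whose predicted output equals the actual output."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def precision(
    actual: Sequence[str], predicted: Sequence[str], positive: str
) -> Optional[float]:
    """Among the rows predicted as `positive`, the fraction that really are `positive`."""
    actual_of_predicted = [a for a, p in zip(actual, predicted) if p == positive]
    if not actual_of_predicted:
        return None  # undefined when the class is never predicted
    hits = sum(a == positive for a in actual_of_predicted)
    return hits / len(actual_of_predicted)


# Made-up outputs for four rows.
actual = ["yes", "no", "yes", "no"]
predicted = ["yes", "yes", "yes", "no"]

acc = accuracy(actual, predicted)                # 3 of the 4 predictions are correct
prec_yes = precision(actual, predicted, "yes")   # 2 of the 3 "yes" predictions are correct
```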