Splitting Data with the Data Manager

When you want more control over how the dataset is divided, you can split it with a Data Manager task.

This lets you specify the criteria by which the dataset is split.

Prerequisites

Data Manager

For more information on what you can do with the Data Manager task, see Overview of Data Exploration in the Data Manager.

 

Procedure

  1. Add a new Data Manager task to the process.

  2. Drag the attributes you want to use to divide the dataset onto the Filter column in the Query Manager.

  3. Configure the filters to create the required view.

  4. Right-click any cell in the data sheet and select Assign view to > Test set, Training set or Validation set, as required. By default, all patterns are in the training set.

  5. Remove the filter by selecting the filter cell in the Query Manager and pressing DELETE.

  6. Save and compute the task.
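
The procedure above amounts to giving every pattern a default set and then overriding it for the patterns that a filter selects. The following is a minimal Python sketch of that logic, not Rulex code: the `assign_view` helper, the `set` key and the sample rows are all hypothetical and only illustrate the idea.

```python
from typing import Callable, Dict, List

Pattern = Dict[str, object]

VALID_SETS = {"training", "test", "validation"}


def with_default_set(rows: List[Pattern]) -> List[Pattern]:
    """Return copies of the rows, all placed in the training set (the default)."""
    return [{**row, "set": "training"} for row in rows]


def assign_view(rows: List[Pattern],
                view_filter: Callable[[Pattern], bool],
                target: str) -> None:
    """Move every row selected by the filter into the target set (hypothetical helper)."""
    if target not in VALID_SETS:
        raise ValueError(f"unknown set: {target}")
    for row in rows:
        if view_filter(row):
            row["set"] = target


# Invented rows, for illustration only.
patterns = with_default_set([
    {"age": 25},
    {"age": 52},
    {"age": 38},
])

# Steps 2-4: filter the view, then assign the filtered view to a set.
assign_view(patterns, lambda row: row["age"] >= 40, "validation")

# Step 5: removing the filter shows all rows again; the assignment is kept.
sets = [row["set"] for row in patterns]
```

In this sketch only the row with age 52 leaves the training set, which mirrors the fact that the assignment made in step 4 remains in place after the filter is removed in step 5.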


Example

The following examples are based on the Adult dataset.

 

Scenario data can be found in the Datasets folder in your Rulex installation.

 

 

The following steps are performed:

  1. We import the Adult dataset.

  2. Then we add a Data Manager task to visualize the initial data and to create the training and test sets.

Procedure

Screenshot

After importing the adult.set dataset via an Import from Text File task, add a Data Manager task to the stage to display its contents.

As we can see, the original dataset contains 32561 patterns.

We want to divide the data from the source as follows:

  • The training set contains the patterns whose hours-per-week value is lower than 50.

  • The test set contains the remaining patterns.

We do not need a validation set.

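So we need to drag the hours-per-week attribute onto the Filter column in the Query Manager and configure the filter so that the view shows only the patterns with an hours-per-week value of 50 or higher, which are the ones intended for the test set.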

We now right-click a cell in the filtered dataset and select Assign view to > Test set.

Then we select the cell in the Filter column and press DELETE to remove the filter.

Save and compute the task.

The dataset is now divided into a training set and a test set, as can be checked from the corresponding drop-down list in the top right-hand corner.
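
The split produced in this example corresponds to a simple threshold rule on hours-per-week. The sketch below expresses that rule in plain Python on a few invented values rather than on the actual Adult data; `split_by_hours` and its `threshold` parameter are illustrative names and are not part of Rulex.

```python
def split_by_hours(rows, threshold=50):
    """Split rows into (training, test) by their hours-per-week value.

    Rows below the threshold go to the training set; the remaining rows
    go to the test set. No validation set is produced.
    """
    training = [row for row in rows if row["hours-per-week"] < threshold]
    test = [row for row in rows if row["hours-per-week"] >= threshold]
    return training, test


# Invented sample values, not taken from the Adult dataset.
sample = [{"hours-per-week": hours} for hours in (20, 40, 49, 50, 60)]

training, test = split_by_hours(sample)

# Every row lands in exactly one of the two sets.
total = len(training) + len(test)
```

Because the two conditions are complementary, the sizes of the two sets always add up to the size of the original dataset, which is a quick check to run against the counts shown in the Data Manager.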



© 2024 Rulex, Inc. All rights reserved.