Splitting Data into Training and Test Sets

Before building a predictive model, it is recommended that you split the dataset into subsets.

Splitting the dataset allows us to build our predictive models on one subset of the data and then immediately test the validity of these models on different data.

The following datasets can be created:

the training set, used to identify patterns in the data and build the model,
the test set, used to assess the accuracy of the model, and
the optional validation set, which can be used for tuning the model parameters.
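
The three subsets described above can be illustrated outside Rulex with a short, generic Python sketch. This is not Rulex code and does not reflect how Rulex tasks are implemented; the function name split_three_way, the 70/20/10 proportions and the fixed seed are assumptions chosen only for illustration.

```python
import random


def split_three_way(rows, train_frac=0.7, test_frac=0.2, seed=0):
    """Shuffle rows and return (training, test, validation) lists.

    Hypothetical helper: the fractions and the seed are illustrative
    assumptions, not Rulex defaults. Whatever is left after the training
    and test portions becomes the validation set.
    """
    if train_frac < 0 or test_frac < 0 or train_frac + test_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")

    shuffled = list(rows)                  # copy, so the input is not modified
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for repeatability

    n_train = round(len(shuffled) * train_frac)
    n_test = round(len(shuffled) * test_frac)

    training = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return training, test, validation


rows = list(range(100))  # stand-in for 100 dataset rows
training, test, validation = split_three_way(rows)
print(len(training), len(test), len(validation))
```

Every row ends up in exactly one subset, so the model is never assessed on rows it was built from.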

Splitting methods

There are two distinct tasks for splitting datasets in Rulex:

Task name	Description	Corresponding page
Split Data	Splits the dataset randomly or sequentially.	Splitting Data with the Split Data Task
Data Manager	Splits datasets according to specified criteria.	Splitting Data with the Data Manager
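
To make the difference between random, sequential and criteria-based splitting concrete, here is a minimal generic Python sketch. It is only an illustration of the three ideas named in the table, not a description of how the Split Data or Data Manager tasks behave; all function names, the 80% proportion and the sample records are hypothetical.

```python
import random


def split_sequential(rows, first_frac=0.8):
    """Keep the original row order: the first part goes to one set, the rest to the other."""
    cut = round(len(rows) * first_frac)
    return rows[:cut], rows[cut:]


def split_random(rows, first_frac=0.8, seed=0):
    """Shuffle a copy of the rows with a fixed seed, then cut as in the sequential case."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return split_sequential(shuffled, first_frac)


def split_by_condition(rows, condition):
    """Send rows that satisfy the condition to one set and all other rows to the other."""
    matching = [row for row in rows if condition(row)]
    remaining = [row for row in rows if not condition(row)]
    return matching, remaining


# Hypothetical records, used only to show a criteria-based split.
records = [
    {"id": 1, "year": 2021},
    {"id": 2, "year": 2022},
    {"id": 3, "year": 2023},
    {"id": 4, "year": 2024},
]
older, newer = split_by_condition(records, lambda row: row["year"] < 2023)
print([row["id"] for row in older], [row["id"] for row in newer])
```

A sequential split is usually the natural choice when row order carries meaning (for example, time-ordered data), whereas a random split avoids any bias introduced by the ordering of the rows.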