Rulex Language Functions

Rulex provides a set of functions for computing formulas in the Data Manager.

These functions are categorized according to their main use:

Arrays
Logical
Statistical
Math and trigonometry
Text
Date/Time
Graphs
System
Data

Mandatory and optional parameters and examples of use are displayed in Rulex when you click on the name of the function in the function bar.

When entering the parameters for a function, you can either:

simply respect the order of the parameters; for example, in cast($"Att1", $"Att2"), the first attribute (Att1) is the column, while the second attribute (Att2) is the value for newtype.
specify which parameter the value applies to by using keywords, for example root($"att1", $"att2", whichpath = "all", separator = "-"). This is particularly useful when there are many parameters, and you don't need to provide a value for all of them.
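
The difference between the two calling styles can be sketched with a Python analogy. This is not Rulex code: bind_root is a hypothetical stand-in that only mirrors the parameter order of root(parent, son, group, whichpath, separator, weights, operator) from the table below, and every default except the "-" separator is a placeholder assumption.

```python
# Hypothetical Python stand-in, for illustration only; it is NOT the Rulex root() function.
# It mirrors the documented parameter order and reports how the arguments were bound.
# Assumption: None is used as a placeholder default; only separator="-" comes from the docs.
def bind_root(parent, son, group=None, whichpath=None, separator="-",
              weights=None, operator=None):
    return {
        "parent": parent,
        "son": son,
        "group": group,
        "whichpath": whichpath,
        "separator": separator,
        "weights": weights,
        "operator": operator,
    }


# Positional style: every parameter up to the last one needed is supplied in order,
# so a value (here None) has to be given for group as well.
by_position = bind_root("att1", "att2", None, "all", "-")

# Keyword style: group is skipped entirely and keeps its default.
by_keyword = bind_root("att1", "att2", whichpath="all", separator="-")

print(by_position == by_keyword)  # True
```

With keywords, the call stays readable and does not depend on remembering the position of each optional parameter.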

Available functions

Finding functions

Click on the header columns in the following table to sort by category or function.

Function	Formula	Description
Arrays
enum	enum(group)	Enumerates the patterns inside each group for a given group of attributes
fillDown	fillDown(column, group)	Returns a copy of the column, filling all the missing values with the last valid value, according to the groups defined in the group parameter.
fillLinear	fillLinear(column, group)	Returns a copy of the column, filling all the missing values of the column with the linear interpolation, according to the groups defined in the group parameter.
fillUp	fillUp(column, group)	Returns a copy of the column, filling all the missing values with the subsequent valid value, according to the groups defined in the group parameter.
len	len(column)	Returns the column with all values equal to the size (i.e. the total number of elements) of the column.
perm	perm(column)	Returns a random permutation of the column.
shift	shift(column, shift, group, cyclic)	Returns the attribute column shifted by the shift value. The shift can be performed according to the groups defined in the group parameter.
Logical
ifelse	ifelse(condition, iftrue, iffalse)	Returns the column with the value of iftrue if the condition is verified, or iffalse if not. If the value of the condition is missing, missing is returned.
isDate	isDate(string, binary)	Checks whether a string corresponds to a date value.
isDatetime	isDatetime(string, binary)	Checks whether a string corresponds to a datetime value.
isFloat	isFloat(string, binary)	Checks whether a string corresponds to a float value.
isInteger	isInteger(string, binary)	Checks whether a string corresponds to an integer value.
isMonth	isMonth(string, binary)	Checks whether a string corresponds to a month value.
isQuarter	isQuarter(string, binary)	Checks whether a string corresponds to a quarter value.
isTime	isTime(string, binary)	Checks whether a string corresponds to a time value.
isType	isType(string, type, binary)	Checks whether a string corresponds to a given type.
isWeek	isWeek(string, binary)	Checks whether a string corresponds to a week value.
Statistical
anovap	anovap(column, attclass, group, usemissing)	Returns the column with all values equal to the p value associated with the ANOVA t statistics relative to the column, according to the groups defined in the group parameter.
anovat	anovat(column, attclass, group, usemissing)	Returns the column with all values equal to the value of the ANOVA t statistics associated with the column, according to the groups defined in the group parameter.
argMax	argMax(column, group)	Returns the position (index) of the maximum value of the column, evaluated within groups defined by the group parameter if required.
argMin	argMin(column, group)	Returns the position (index) of the minimum value of the column, evaluated within groups defined by the group parameter if required.
chisquare	chisquare(column1, column2, group, usemissing)	Returns the column with the chisquare statistics computed from the contingency table of column1 and column2, according to the groups defined in the group parameter.
chisquarep	chisquarep(column1, column2, group, usemissing)	Returns the column with the p value associated with the chisquare statistics computed from the contingency table of column1 and column2, evaluated within groups defined by the group parameter if required.
cohenk	cohenk(column1, column2, group, usemissing)	Returns the Cohen K coefficient between column1 and column2, evaluated within groups defined by the group parameter if required.
count	count(group)	Returns the number of different combinations of values in the list group.
countIf	countIf(condition, group)	Returns the count of the records satisfying a given condition, according to the groups defined in the group parameter, if required.
covariance	covariance(column1, column2, group)	Evaluates the covariance between column1 and column2, according to the groups defined in the group parameter, if required.
cumMax	cumMax(column, group)	Returns the cumulative maximum of the column, evaluated within groups defined by the group parameter if required.
cumMin	cumMin(column, group)	Returns the cumulative minimum of the column, evaluated within groups defined by the group parameter if required.
distinct	distinct(column, group)	Returns the number of distinct values of the column, evaluated within groups defined by the group parameter if required.
entropy	entropy(column, group, usemissing)	Returns the entropy of the column, evaluated within groups defined by the group parameter if required.
gini	gini(column, group, usemissing)	Returns the Gini index of the column, evaluated within groups defined by the group parameter if required.
inIqr	inIqr(column, coeff)	Returns the column with a True/False value according to the interquartile range. If the value of the column is in [Q1-coeff(Q3-Q1), Q3+coeff(Q3-Q1)] (where Q1 and Q3 are the first and the third quartiles, respectively, and coeff is a parameter fixed by the user), inIqr returns True, otherwise it returns False. The parameter coeff is set to 1.5 by default.
max	max(column, group)	Returns the maximum of the column, evaluated within groups defined by the group parameter if required.
max2	max2(column1, column2)	Returns the column with values equal to the maximum value between column1 and column2.
maxyoudencut	maxyoudencut(column, attclass, defclass, group)	Returns the value which maximizes the Youden index of the ROC curve defined by the column and by the class attclass. The default value for the class attribute (if more than two values are present) can be specified as the optional parameter defclass. The computation can be performed according to the groups defined in the group parameter, if required.
mean	mean(column, group)	Returns the mean of the column, evaluated within groups defined by the group parameter if required.
median	median(column, group)	Returns the median of the column, evaluated within groups defined by the group parameter if required.
min	min(column, group)	Returns the minimum of the column, evaluated within groups defined by the group parameter if required.
min2	min2(column1, column2)	Returns the column with values equal to the minimum value between column1 and column2.
mode	mode(column, group, usemissing)	Returns the mode of the column, evaluated within groups defined by the group parameter if required.
movMean	movMean(column, lag, group, front)	Returns the moving average of the column, evaluated over lag contiguous rows, within groups defined by the group parameter if required.
pearson	pearson(column1, column2, group)	Returns the Pearson coefficient between column1 and column2, evaluated within groups defined by the group parameter if required.
quantile	quantile(column, quant, group, weights)	Returns the quant quantile of the column, evaluated within groups defined by the group parameter if required. A column of weights can also be defined.
roc	roc(column, attclass, defclass, group)	Returns the area under the ROC curve defined by the column and by the class attclass. The default value for the class attribute (if more than two values are present) can be specified as the optional parameter defclass. The computation can be performed according to the groups defined in the group parameter, if required.
std	std(column, group)	Returns the standard deviation of the column, evaluated within groups defined by the group parameter if required.
variance	variance(column, group)	Returns the variance of the column, evaluated within groups defined by the group parameter if required.
Math and trigonometry
abs	abs(column)	Returns the absolute value of each row of the column.
acos	acos(column)	Returns the arccosine values of each row of the column.
acosh	acosh(column)	Returns the hyperbolic arccosine of each row of the column.
asin	asin(column)	Returns the arcsine of each row of the column.
asinh	asinh(column)	Returns the hyperbolic arcsine of each row of the column.
atan	atan(column)	Returns the arctangent of each row of the column.
atanh	atanh(column)	Returns the hyperbolic arctangent of each row of the column.
baseConv	baseConv(column, basein, baseout, compflagin, compflagout)	Converts a base 10 integer, or a string that corresponds to an integer, to a different base. Optional parameters allow the user to have a two's complement code (if set to True) in the input and/or in the output value.
ceil	ceil(column)	Returns, for each row of the column, the smallest integer greater than or equal to the value.
cos	cos(column)	Returns the cosine of each row of the column.
cosh	cosh(column)	Returns the hyperbolic cosine of each row of the column.
cumProd	cumProd(column, group)	Returns the cumulative product of the column, evaluated within groups defined by the group parameter if required.
cumSum	cumSum(column, group)	Returns the cumulative sum of the column, evaluated within groups defined by the group parameter if required.
exp	exp(column)	Returns the exponential of each row of the column.
floor	floor(column)	Returns, for each row of the column, the largest integer less than or equal to the value.
isInteger	isInteger(string, binary)	Checks whether a string corresponds to an integer value.
log	log(column)	Returns the natural logarithm of each row of the column.
log10	log10(column)	Returns the base-10 logarithm of each row of the column.
prod	prod(column, group)	Returns the product of the column, evaluated within groups defined by the group parameter if required.
rand	rand(n, seed)	Returns a random column with the specified number of elements n. If the number of elements is not specified, the column is created with as many elements as the number of examples.
round	round(column)	Returns the nearest integer value of each row of the column.
sign	sign(column)	Returns the sign of each row of the column.
sin	sin(column)	Returns the sine of each row of the column.
sinh	sinh(column)	Returns the hyperbolic sine of each row of the column.
sqrt	sqrt(column)	Returns the square root of each row of the column.
sum	sum(column, group)	Returns the sum of the column, evaluated within groups defined by the group parameter if required.
tan	tan(column)	Returns the tangent of each row of the column.
tanh	tanh(column)	Returns the hyperbolic tangent of each row of the column.
Text
distance	distance(column1, column2, method)	Computes the distance between the values of two columns, column1 and column2, according to one of the following methods: "levenshtein" ("l"), "damerau-levenshtein" ("dl"), "lcs", "hamming".
find	find(column, value, binary)	Returns True in each row of the result if the value is contained in the corresponding row of the column; otherwise False.
head	head(column, nchar)	Returns, in each row of the result, the first nchar characters of the corresponding value contained in the column.
isPrefix	isPrefix(column, value, binary)	Returns True in the rows of the column starting with the string value; otherwise False.
isSuffix	isSuffix(column, value, binary)	Returns True in the rows of the column ending with the string value; otherwise False.
isWord	isWord(substring, delimiter, binary)	Returns a column with value 1 if substring (a string or an attribute such as $"att2") is a word, separated by the specified delimiter, in the values of $"att1"; otherwise 0. The default delimiter is a space.
numExt	numExt(column, onlyint, separator)	Returns a string containing only the numerical characters of the input string. If more than one number is present, numbers are delimited by a separator decided by the user (by default "-").
pad	pad(column, len, value, where)	Returns, in each row of the result, the values of the column, filled (padded) with the string value to reach the specified length len. The string can be added at the beginning (where = "begin", the default) or at the end (where = "end") of the string, according to the value of the flag where.
phonetic	phonetic(column, component)	Returns the phonetic encoding of the strings contained in the column using the Metaphone algorithm. Phonetic may return the primary Metaphone component (component = "Primary" or component = "P") or the secondary component (component = "Secondary" or component = "S"). By default the primary component is returned.
prefix	prefix(column, value)	Returns, for each string contained in the column, the part of the string before the passed string value (the prefix).
replace	replace(column, oldvalue, newvalue, ntimes)	Returns, in each row of the result, the values of the column, with the first (last) ntimes occurrences of oldvalue replaced by newvalue in each row.
strip	strip(column, value, where, ischarlist)	Returns, in each row of the result, the values of the column, in which all the characters included in a given string are removed from the beginning (where = "begin"), the end (where = "end") or from both of them (where = "both", the default). The string can be treated as a list of characters or as a substring, according to the ischarlist parameter.
suffix	suffix(column, value, last)	Returns, for each string contained in the column, the part of the string after the passed string value (the suffix).
tail	tail(column, nchar)	Returns, in each row of the result, the last nchar characters of the corresponding value contained in the column.
textConcat	textConcat(column, separator, group)	Returns the concatenation of the strings in the column (using a customized separator, if provided), evaluated within groups defined by the group parameter if required. The column must be nominal.
textExtract	textExtract(column, startpos, endpos)	Returns in each row of the result the part of the corresponding string in the column between the starting startpos and the ending endpos position.
textLen	textLen(column)	Returns the length of the string contained in each row of the column.
textSort	textSort(column, ascending)	Returns a copy of the column with the strings contained in each row sorted in ascending or descending order, according to the ascending parameter.
Date/Time
addMonth	addMonth(date, nmonth)	Adds a given number of months to a date attribute.
addQuarter	addQuarter(date, nquarter)	Adds a given number of quarters to a date attribute.
addWorkingDays	addWorkingDays(date, nday)	Adds a given number of working days (excluding weekends) to a date attribute.
addYear	addYear(date, nyear)	Adds a given number of years to a date attribute.
currDate	currDate(utc)	Returns the current date according to local or UTC settings.
currDatetime	currDatetime(utc)	Returns the current datetime according to local or UTC settings.
date	date(year, month, day)	Returns a column with all values equal to the date consisting of given year, month and day.
datetime	datetime(date, time)	Returns in each row of the result the datetime value obtained by the composition of the date value contained in the date entry and the time value contained in the time entry.
day	day(date)	Returns the day value of date.
hour	hour(time)	Returns the hour value of time.
isDate	isDate(string, binary)	Checks whether a string corresponds to a date value.
isDatetime	isDatetime(string, binary)	Checks whether a string corresponds to a datetime value.
isMonth	isMonth(string, binary)	Checks whether a string corresponds to a month value.
isQuarter	isQuarter(string, binary)	Checks whether a string corresponds to a quarter value.
isTime	isTime(string, binary)	Checks whether a string corresponds to a time value.
isWeek	isWeek(string, binary)	Checks whether a string corresponds to a week value.
minute	minute(time)	Returns the minute value of time.
month	month(date)	Returns the month value of date.
second	second(time)	Returns the second value of time.
time	time(hour, minute, second)	Composes a time starting from hours, minutes and seconds.
timeZone	timeZone()	Returns the current timezone, i.e. the difference between local time and UTC time. The resulting type is time.
week	week(date)	Returns the week value of date.
weekDay	weekDay(date, mondaystart)	Returns the day of the week as an integer for each value of date. If mondaystart is True, Monday is 1 and Sunday is 7; otherwise, Sunday is 1 and Saturday is 7.
year	year(date)	Returns the year value of date.
Graphs
connComp	connComp(parent, son, group)	This function operates on a directed graph, whose relationship is defined by the parent and son parameters. It returns in each row of the result the index of the connected component associated with the node contained in the son attribute, according to the groups defined in the group parameter, if required.
leaf	leaf(parent, son, group, whichpath, separator, weights, operator)	This function operates on a directed graph, whose relationship is defined by the parent and son parameters. It returns in each row of the result the leaf of the node contained in the son attribute, according to the groups defined in the group parameter, if required. The whichpath parameter allows the user to choose which path is to be considered: the shortest one (whichpath = "minimum"), the longest one (whichpath = "maximum") or all the paths (whichpath = "all"; in this case the leaves are concatenated in a single string with a separator settable by the user. The default separator is "-"). It is also possible to introduce weights into the computation and control the way the result must be shown.
leafDistance	leafDistance(parent, son, group, whichpath, separator, weights, operator)	This function operates on a directed graph, whose relationship is defined by the parent and son parameters. It returns in each row of the result the inverse level (distance from the leaf) of the node contained in the son attribute, according to the groups defined in the group parameter, if required. The whichpath parameter allows the user to choose which path is to be considered: the shortest one (whichpath = "minimum"), the longest one (whichpath = "maximum") or all the paths (whichpath = "all", in this case the distances are concatenated in a single string with a separator settable by the user. The default separator is "-"). It is also possible to introduce weights into the computation and control the way the result must be shown.
root	root(parent, son, group, whichpath, separator, weights, operator)	This function operates on a directed graph, whose relationship is defined by the parent and son parameters. It returns in each row of the result the root of the node contained in the son attribute, according to the groups defined in the group parameter, if required. The whichpath parameter allows the user to choose which path is to be considered: the shortest one (whichpath = "minimum"), the longest one (whichpath = "maximum") or all the paths (whichpath = "all"; in this case the roots are concatenated in a single string with a separator settable by the user. The default separator is "-"). It is also possible to introduce weights into the computation and control the way the result must be shown.
rootDistance	rootDistance(parent, son, group, whichpath, separator, weights, operator)	This function operates on a directed graph, whose relationship is defined by the parent and son parameters. It returns in each row of the result the level (distance from the root) of the node contained in the son attribute, according to the groups defined in the group parameter, if required. The whichpath parameter allows the user to choose which path is to be considered: the shortest one (whichpath = "minimum"), the longest one (whichpath = "maximum") or all the paths (whichpath = "all", in this case the distances are concatenated in a single string with a separator settable by the user. The default separator is "-"). It is also possible to introduce weights into the computation and control the way the result must be shown.
System
currDate	currDate(utc)	Returns the current date according to local or UTC settings.
currDatetime	currDatetime(utc)	Returns the current datetime according to local or UTC settings.
hostName	hostName()	Returns the hostname of the machine where Rulex is running.
ipAddress	ipAddress()	Returns the IP address of the machine where Rulex is running.
timeZone	timeZone()	Returns the current timezone, i.e. the difference between local time and UTC time. The resulting type is time.
Data
cast	cast(column, newtype, forced)	Casts a column to the specified newtype. If the flag forced is set to True (False by default), the operation is performed even if this would cause a loss of information.
catNames	catNames(indatt, values, separator, negate)	Concatenates the name of the list of attributes passed in the indatt parameter, according to the condition specified in the values entry. The separator parameter can be used to introduce a separator in the concatenation.
decideType	decideType(column)	Automatically converts a column to the correct type, based on the data.
disc	disc(column, cutoffs, rank)	Discretizes an attribute according to the provided cutoffs vector. If the flag rank is True (False by default), the rank (i.e. an integer 1,2,3,...) is returned.
discEqualFrequencies	discef(column, nvalue, rank, quantile)	Discretizes an attribute according to an equal frequency criterion.
discEqualWidth	discew(column, nvalue, rank, min, max)	Discretizes an attribute according to an equal width criterion.
discretize	discretize(column, nvalue, cutoffs, mode, rank, quantile, min, max)	Discretizes an attribute using the set of provided cut-offs, or using an equal width or equal frequency criterion.
isAttribute	isAttribute(name, binary)	Checks whether an attribute with a given name is present in the dataset.
isFloat	isFloat(string, binary)	Checks whether a string corresponds to a float value.
isType	isType(string, type, binary)	Checks whether a string corresponds to a given type.
type	type(column)	Returns the type of a column as a string.
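
To make a few of the descriptions above more concrete, the following Python sketch mimics the documented behavior of fillDown, fillUp, inIqr and weekDay on plain lists. It is an illustration under stated assumptions, not the Rulex implementation: the helper names are hypothetical, the group parameter is ignored, missing values are represented by None, and the quartiles are computed with the "inclusive" method of Python's statistics module, which may differ from the method Rulex uses.

```python
from datetime import date
from statistics import quantiles


def fill_down(column):
    """Replace each missing value (None) with the last valid value before it."""
    filled, last = [], None
    for value in column:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def fill_up(column):
    """Replace each missing value (None) with the next valid value after it."""
    return fill_down(column[::-1])[::-1]


def in_iqr(column, coeff=1.5):
    """True where the value lies in [Q1 - coeff*(Q3-Q1), Q3 + coeff*(Q3-Q1)]."""
    # Assumption: "inclusive" quartiles; the quartile method used by Rulex is not documented here.
    q1, _, q3 = quantiles(column, n=4, method="inclusive")
    low, high = q1 - coeff * (q3 - q1), q3 + coeff * (q3 - q1)
    return [low <= value <= high for value in column]


def week_day(day, mondaystart=True):
    """Monday=1..Sunday=7 if mondaystart, otherwise Sunday=1..Saturday=7."""
    iso = day.isoweekday()  # Monday = 1 ... Sunday = 7
    return iso if mondaystart else iso % 7 + 1


print(fill_down([None, 1, None, None, 4, None]))  # [None, 1, 1, 1, 4, 4]
print(fill_up([None, 1, None, None, 4, None]))    # [1, 1, 4, 4, 4, None]
print(in_iqr([1, 2, 3, 4, 100]))                  # [True, True, True, True, False]
print(week_day(date(2024, 1, 7), mondaystart=True))   # 7 (a Sunday)
print(week_day(date(2024, 1, 7), mondaystart=False))  # 1
```

Note that leading missing values stay missing in fill_down, and trailing ones stay missing in fill_up, because there is no valid value to copy from.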