This function creates a grid of pairs of columns, where the first column of each pair is taken from xvars and the second from yvars (see also var_grid()). It then enumerates all conditions created from data in x (by calling dig()). For each such condition and for each row of the grid, a user-defined function f is executed on a sub-data frame of x that contains all rows satisfying the condition and only the two columns named in that grid row.
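The procedure above can be sketched in base R for logical conditions. This is a hypothetical illustration, not the nuggets implementation: the helper sketch_dig_grid() is made up for this example, the conditions are passed in explicitly instead of being enumerated by dig(), the grid is a plain data frame with xvar and yvar columns standing in for the output of var_grid(), and na_rm only mimics the argument of the same name.

```r
# Hypothetical sketch of the dig_grid() procedure (type = "bool" only).
# NOT the nuggets implementation: conditions are given explicitly as
# character vectors of logical column names, and `grid` is a plain data
# frame with `xvar` and `yvar` columns standing in for var_grid().
sketch_dig_grid <- function(x, f, conditions, grid, na_rm = FALSE) {
  rows <- list()
  for (cond in conditions) {
    # TRUE for rows where every predicate of the condition holds;
    # the empty condition character(0) keeps all rows
    keep <- Reduce(`&`, x[cond], rep(TRUE, nrow(x)))
    for (i in seq_len(nrow(grid))) {
      # sub-data: matching rows, only the two columns of this grid row
      d <- x[keep, c(grid$xvar[i], grid$yvar[i]), drop = FALSE]
      if (na_rm) {
        # drop rows with missing values before calling f
        d <- d[complete.cases(d), , drop = FALSE]
      }
      # f returns a list of scalars, which becomes one result row
      rows[[length(rows) + 1]] <- data.frame(
        condition = paste(cond, collapse = " & "),
        xvar = grid$xvar[i],
        yvar = grid$yvar[i],
        f(d)
      )
    }
  }
  do.call(rbind, rows)
}

x <- data.frame(
  a = c(TRUE, TRUE, FALSE, TRUE),
  b = c(TRUE, FALSE, TRUE, TRUE),
  u = c(1, 2, 3, 4),
  v = c(2, NA, 6, 9)
)
grid <- data.frame(xvar = "u", yvar = "v", stringsAsFactors = FALSE)
res <- sketch_dig_grid(
  x,
  f = function(d) list(n = nrow(d), sum_y = sum(d[[2]])),
  conditions = list(character(0), "a", c("a", "b")),
  grid = grid,
  na_rm = TRUE
)
print(res)
```

Each result row pairs one condition with one grid row; with na_rm = TRUE, row 2 (where v is missing) is dropped from every sub-data frame before f sees it.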
Usage
dig_grid(
  x,
  f,
  condition = where(is.logical),
  xvars = where(is.numeric),
  yvars = where(is.numeric),
  na_rm = FALSE,
  type = "bool",
  min_length = 0L,
  max_length = Inf,
  min_support = 0,
  threads = 1,
  ...
)
Arguments
- x
a matrix or data frame with data to search in.
- f
the callback function to be executed for each generated condition. The arguments of the callback function differ based on the value of the type argument (see below). If type = "bool", the callback function f must accept a single argument d of type data.frame with two columns (xvar and yvar). It is a subset of the original data frame with all rows that satisfy the generated condition. If type = "fuzzy", the callback function f must accept an argument d of type data.frame with two columns (xvar and yvar) and a numeric weights argument with the same length as the number of rows in d. The weights argument contains the truth degree of the generated condition for each row of d. The truth degree is a number in the interval \([0, 1]\) that represents the degree to which the row satisfies the condition. In all cases, the function must return a list of scalar values, which will be converted into a single row of the final tibble.
- condition
a tidyselect expression (see tidyselect syntax) specifying the columns to use as condition predicates. The selected columns must be logical or numeric. If numeric, fuzzy conditions are considered.
- xvars
a tidyselect expression (see tidyselect syntax) specifying the columns of x whose names are used as the domain for the first element (xvar) of each pair in the grid.
- yvars
a tidyselect expression (see tidyselect syntax) specifying the columns of x whose names are used as the domain for the second element (yvar) of each pair in the grid.
- na_rm
a logical value indicating whether to remove rows with missing values from the sub-data before the callback function f is called.
- type
a character string specifying the type of conditions to be processed. The "bool" type accepts only logical columns as condition predicates. The "fuzzy" type accepts both logical and numeric columns as condition predicates, where numeric data must lie in the interval \([0, 1]\). The callback function f differs based on the value of the type argument (see the description of f above).
- min_length
the minimum size (the minimum number of predicates) of the condition to be generated (must be greater than or equal to 0). If 0, the empty condition is generated first.
- max_length
the maximum size (the maximum number of predicates) of the condition to be generated. If equal to Inf, the maximum length of conditions is limited only by the number of available predicates.
- min_support
the minimum support a condition must have to trigger the callback function. The support of a condition is its relative frequency in the dataset x. For logical data, it equals the relative frequency of rows in which all condition predicates are TRUE. For numerical (double) input, the support is computed as the mean (over all rows) of the products of predicate values.
- threads
the number of threads to use for parallel computation.
- ...
Further arguments, currently unused.
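The two callback signatures described for f can be illustrated in base R. The names f_bool and f_fuzzy and the statistics they compute (row count, Pearson correlation, and a weighted correlation via stats::cov.wt()) are illustrative choices, not part of nuggets; each callback returns a named list of scalars, as required.

```r
# Example callback for type = "bool": d has two columns, xvar first and
# yvar second, already restricted to rows satisfying the condition.
f_bool <- function(d) {
  list(
    n = nrow(d),
    # a correlation needs at least two rows
    cor = if (nrow(d) > 1) cor(d[[1]], d[[2]]) else NA_real_
  )
}

# Example callback for type = "fuzzy": `weights` holds the truth degree of
# the condition for each row of d and is used as observation weights
# (cov.wt() requires them to be non-negative and not all zero).
f_fuzzy <- function(d, weights) {
  wc <- stats::cov.wt(as.matrix(d), wt = weights, cor = TRUE)
  list(
    sum_weights = sum(weights),
    w_cor = wc$cor[1, 2]
  )
}

d <- data.frame(u = c(1, 2, 3, 4), v = c(1, 3, 2, 5))
str(f_bool(d))
str(f_fuzzy(d, weights = c(1, 0.5, 0.5, 1)))
```

With all weights equal, the weighted correlation reduces to the ordinary Pearson correlation, so the fuzzy callback behaves like the crisp one when every row fully satisfies the condition.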
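How min_length, max_length and min_support restrict the conditions can be shown with a brute-force enumeration that uses the support definition given for min_support. This is a hypothetical sketch: the helper candidate_conditions() is made up for this example, it tries every subset of predicates instead of searching the way dig() does, and it treats every column of x as a condition predicate.

```r
# Hypothetical brute-force sketch, NOT how dig() searches: enumerate all
# predicate subsets of size min_length..max_length and keep those whose
# support reaches min_support.
candidate_conditions <- function(x, min_length = 0L, max_length = Inf,
                                 min_support = 0) {
  preds <- names(x)
  # max_length = Inf is limited by the number of available predicates
  max_len <- min(max_length, length(preds))
  if (min_length > max_len) return(list())
  conds <- list()
  for (k in seq(min_length, max_len)) {
    if (k == 0) {
      conds <- c(conds, list(character(0)))  # the empty condition
    } else {
      conds <- c(conds, combn(preds, k, simplify = FALSE))
    }
  }
  # support = mean over rows of the product of predicate values; for
  # logical predicates this is the share of rows where all are TRUE
  support <- vapply(conds, function(cond) {
    mean(Reduce(`*`, x[cond], rep(1, nrow(x))))
  }, numeric(1))
  conds[support >= min_support]
}

x <- data.frame(
  a = c(TRUE, TRUE, FALSE, TRUE),
  b = c(TRUE, FALSE, TRUE, TRUE),
  c = c(0.5, 1, 0.2, 0.8)
)
conds <- candidate_conditions(x, min_length = 1, max_length = 2,
                              min_support = 0.5)
str(conds)
```

Here the condition b & c is dropped because its support is (0.5 + 0 + 0.2 + 0.8) / 4 = 0.375, below min_support = 0.5, while a & b (support 0.5) is kept.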
See also
dig(), var_grid(), and dig_correlations() (the latter uses this function internally).