
Convert columns of a data frame to Boolean or fuzzy sets (triangular, trapezoidal, or raised-cosine)
Source: R/partition.R
Transform selected columns of a data frame into either dummy logical variables or membership degrees of fuzzy sets, while leaving all remaining columns unchanged. Each transformed column typically produces multiple new columns in the output.
Usage
partition(
  .data,
  .what = everything(),
  ...,
  .breaks = NULL,
  .labels = NULL,
  .na = TRUE,
  .keep = FALSE,
  .method = "crisp",
  .right = TRUE,
  .span = 1,
  .inc = 1
)
Arguments
- .data
A data frame to be processed.
- .what
A tidyselect expression (see tidyselect syntax) selecting the columns to transform.
- ...
Additional tidyselect expressions selecting more columns.
- .breaks
Ignored if .method = "dummy". For other methods, either an integer scalar (number of intervals/sets) or a numeric vector of breakpoints.
- .labels
Optional character vector with labels used for new column names. If NULL, labels are generated automatically.
- .na
If TRUE, an extra logical column is created for each source column that contains NA values (e.g. x=NA).
- .keep
If TRUE, keep the original columns in the output.
- .method
Transformation method for numeric columns: "dummy", "crisp", "triangle", or "raisedcos".
- .right
For "crisp", whether intervals are right-closed and left-open (TRUE), or left-closed and right-open (FALSE).
- .span
Number of consecutive breaks forming a set. For "crisp", controls interval width. For "triangle"/"raisedcos", .span = 1 produces triangular sets, .span = 2 trapezoidal sets.
- .inc
Step size for shifting breaks when generating successive sets. With .inc = 1, all possible sets are created; larger values skip sets.
Details
These transformations are most often used as a preprocessing step before calling dig() or one of its derivatives, such as dig_correlations(), dig_paired_baseline_contrasts(), or dig_associations().
The transformation depends on the column type:
- logical column x is expanded into two logical columns: x=TRUE and x=FALSE;
- factor column x with levels l1, l2, l3 becomes three logical columns: x=l1, x=l2, and x=l3;
- numeric column x is transformed according to .method:
  - .method = "dummy": the column is treated as a factor with one level for each unique value, then expanded to dummy columns. This produces one logical column per unique value;
  - .method = "crisp": the column is discretized into intervals (defined by .breaks) and then expanded to dummy columns representing these intervals;
  - .method = "triangle" or .method = "raisedcos": the column is converted into one or more fuzzy sets. Each new column contains values in \([0,1]\) representing degrees of membership in the fuzzy set (triangular or raised-cosine shaped).
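The expansion of a factor into dummy columns can be illustrated with a minimal base R sketch. The helper dummy_expand() is hypothetical and only mimics the naming scheme described above; it is not the package's implementation.

```r
# Hypothetical helper (not part of the package): expand a factor into one
# logical column per level, named "<name>=<level>".
dummy_expand <- function(x, name) {
  x <- as.factor(x)
  cols <- lapply(levels(x), function(lv) x == lv)
  names(cols) <- paste0(name, "=", levels(x))
  # check.names = FALSE keeps the "=" in the column names
  data.frame(cols, check.names = FALSE)
}

b <- factor(c("A", "B", "A"))
res <- dummy_expand(b, "b")
print(res)
```

The same idea applies to a numeric column under .method = "dummy" if it is first converted with as.factor(), giving one column per unique value.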
Details of numeric transformations are controlled by .breaks, .labels, .right, .span, and .inc.
Crisp partitioning is recommended for efficiency and works best when sharp category boundaries are meaningful for the analysis.
Fuzzy partitioning is useful when attributes change gradually or when uncertainty should be modeled explicitly. It allows smooth transitions between categories and may yield more interpretable patterns, but is more computationally demanding.
Crisp transformation of numeric data
For .method = "crisp", numeric columns are converted into sets of dummy logical variables, each representing one interval of values defined by .breaks.
- If .breaks is an integer, it specifies the number of equal-width intervals into which the column range is divided. The first and last intervals extend to infinity.
- If .breaks is a numeric vector, it specifies interval boundaries directly. Infinite values are allowed.
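One plausible way to derive breakpoints from an integer .breaks is sketched below in base R. The helper equal_width_breaks() is hypothetical; it is an assumption that happens to agree with the interval labels shown in the CO2 example, not a statement of how the package computes them.

```r
# Hypothetical helper: n equal-width intervals over the range of x, with the
# outermost boundaries replaced by -Inf and Inf.
equal_width_breaks <- function(x, n) {
  edges <- seq(min(x, na.rm = TRUE), max(x, na.rm = TRUE), length.out = n + 1)
  c(-Inf, edges[-c(1, n + 1)], Inf)
}

# CO2 ships with base R (package datasets); conc ranges from 95 to 1000.
brk <- equal_width_breaks(CO2$conc, 3)
print(round(brk))  # -Inf 397 698 Inf

# Count observations per interval (right-closed, as with .right = TRUE).
counts <- table(cut(CO2$conc, brk))
print(counts)
```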
With .span = 1 and .inc = 1, the intervals are consecutive and non-overlapping. For example, with .breaks = c(1, 3, 5, 7, 9, 11) and .right = TRUE, the intervals are \((1;3]\), \((3;5]\), \((5;7]\), \((7;9]\), and \((9;11]\). If .right = FALSE, the intervals are left-closed: \([1;3)\), \([3;5)\), etc.
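The effect of interval closedness can be reproduced with base R's cut(), whose right argument plays the same role as .right here. This is only an analogy for illustration; note that cut() separates bounds with a comma rather than a semicolon.

```r
x <- c(2, 3, 4.5, 7, 10)
breaks <- c(1, 3, 5, 7, 9, 11)

# Right-closed intervals: a value equal to a break belongs to the interval
# that ends at that break.
right_closed <- cut(x, breaks, right = TRUE)

# Left-closed intervals: the same value belongs to the interval that starts
# at that break.
left_closed <- cut(x, breaks, right = FALSE)

print(data.frame(x, right_closed, left_closed))
```

The values 3 and 7 sit exactly on breaks, so they are the ones that change interval between the two settings.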
Larger .span values make intervals overlap. For example, with .span = 2, .inc = 1, and .right = TRUE, the intervals are \((1;5]\), \((3;7]\), \((5;9]\), and \((7;11]\).
The .inc argument controls how far the window shifts along .breaks between successive intervals. For example:
- .span = 1, .inc = 2 → \((1;3]\), \((5;7]\), \((9;11]\).
- .span = 2, .inc = 3 → \((1;5]\), \((7;11]\).
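The interplay of .span and .inc can be sketched as a sliding window over the break vector. The helper window_intervals() below is hypothetical and written only to mirror the description above: each interval starts at a break and ends span breaks later, and the start position advances by inc.

```r
# Hypothetical helper: enumerate crisp intervals as (lower; upper] pairs.
window_intervals <- function(breaks, span = 1, inc = 1) {
  starts <- seq(1, length(breaks) - span, by = inc)
  data.frame(lower = breaks[starts], upper = breaks[starts + span])
}

breaks <- c(1, 3, 5, 7, 9, 11)
print(window_intervals(breaks, span = 1, inc = 1))  # (1;3] (3;5] (5;7] (7;9] (9;11]
print(window_intervals(breaks, span = 2, inc = 1))  # (1;5] (3;7] (5;9] (7;11]
print(window_intervals(breaks, span = 1, inc = 2))  # (1;3] (5;7] (9;11]
print(window_intervals(breaks, span = 2, inc = 3))  # (1;5] (7;11]
```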
Fuzzy transformation of numeric data
For .method = "triangle" or .method = "raisedcos", numeric columns are converted into fuzzy membership degrees in \([0,1]\).
- If .breaks is an integer, it specifies the number of fuzzy sets to generate (breakpoints are chosen automatically).
- If .breaks is a numeric vector, it directly defines the fuzzy set boundaries. Infinite values are allowed and produce fuzzy sets with open ends.
With .span = 1, each fuzzy set is defined by three consecutive breaks: membership is 0 outside the outer breaks, increases to 1 at the middle break, and then decreases back to 0. This yields triangular or raised-cosine sets.
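The two membership shapes can be written down in a few lines of base R. The helpers triangle_mu() and raisedcos_mu() are hypothetical names, handle only finite breaks a < b < c, and assume the usual raised-cosine form \((1 - \cos(\pi t))/2\) applied to the linear slope; they are a sketch of the shapes, not the package's code.

```r
# Hypothetical helper: triangular membership with feet at a and c, peak at b.
triangle_mu <- function(x, a, b, c) {
  pmax(0, pmin((x - a) / (b - a), (c - x) / (c - b)))
}

# Hypothetical helper: raised-cosine membership over the same three breaks.
# The linear slope value t in [0, 1] is smoothed with (1 - cos(pi * t)) / 2.
raisedcos_mu <- function(x, a, b, c) {
  t <- triangle_mu(x, a, b, c)
  (1 - cos(pi * t)) / 2
}

x <- 1:6
tri <- triangle_mu(x, 1, 3, 5)
rc <- raisedcos_mu(x, 1, 3, 5)
print(data.frame(x, triangle = tri, raisedcos = round(rc, 3)))
```

Both shapes agree at 0, 0.5, and 1; they differ in between, where the raised cosine is flatter near the feet and the peak.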
With .span > 1, fuzzy sets are defined by four breaks: the degree increases between the first two, stays 1 between the middle two, and decreases between the last two. This produces trapezoidal fuzzy sets, with linear borders if .method = "triangle", or cosine-shaped borders if .method = "raisedcos".
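A trapezoidal membership function with linear borders can be sketched the same way. The helper trapezoid_mu() is hypothetical and assumes four finite breaks a < b < c < d; the breaks used below are the rounded values from the column label conc=(276;457;638;819) in the examples.

```r
# Hypothetical helper: rises on [a, b], stays at 1 on [b, c], falls on [c, d].
trapezoid_mu <- function(x, a, b, c, d) {
  pmax(0, pmin((x - a) / (b - a), 1, (d - x) / (d - c)))
}

conc <- c(350, 500, 675, 1000)
mu <- trapezoid_mu(conc, 276, 457, 638, 819)
print(round(mu, 3))  # 0.409 1.000 0.796 0.000
```

For cosine-shaped borders, the same smoothing \((1 - \cos(\pi t))/2\) could be applied to the result, under the same assumption about the raised-cosine form.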
As with crisp sets, .inc determines how far the break window shifts when creating the next fuzzy set. For example:
- .span = 1, .inc = 1 → \((1;3;5)\), \((3;5;7)\), \((5;7;9)\), \((7;9;11)\).
- .span = 2, .inc = 1 → \((1;3;5;7)\), \((3;5;7;9)\), \((5;7;9;11)\).
- .span = 1, .inc = 3 → \((1;3;5)\), \((7;9;11)\).
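The enumeration above can be reproduced with a small sliding-window sketch. The helper fuzzy_windows() is hypothetical and covers only the two cases listed, .span = 1 (three breaks per set) and .span = 2 (four breaks per set); how larger spans map to breaks is not assumed here.

```r
# Hypothetical helper: list the break tuples that define successive fuzzy sets.
fuzzy_windows <- function(breaks, span = 1, inc = 1) {
  stopifnot(span %in% c(1, 2))  # only the documented cases are sketched
  width <- span + 2             # number of breaks per fuzzy set
  starts <- seq(1, length(breaks) - width + 1, by = inc)
  lapply(starts, function(i) breaks[i:(i + width - 1)])
}

breaks <- c(1, 3, 5, 7, 9, 11)
str(fuzzy_windows(breaks, span = 1, inc = 1))  # (1;3;5) (3;5;7) (5;7;9) (7;9;11)
str(fuzzy_windows(breaks, span = 2, inc = 1))  # (1;3;5;7) (3;5;7;9) (5;7;9;11)
str(fuzzy_windows(breaks, span = 1, inc = 3))  # (1;3;5) (7;9;11)
```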
See the examples for further details.
Examples
# Transform logical columns and factors
d <- data.frame(a = c(TRUE, TRUE, FALSE),
                b = factor(c("A", "B", "A")),
                c = c(1, 2, 3))
partition(d, a, b, c, .method = "dummy")
#> # A tibble: 3 × 7
#> `a=T` `a=F` `b=A` `b=B` `c=1` `c=2` `c=3`
#> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl>
#> 1 TRUE FALSE TRUE FALSE TRUE FALSE FALSE
#> 2 TRUE FALSE FALSE TRUE FALSE TRUE FALSE
#> 3 FALSE TRUE TRUE FALSE FALSE FALSE TRUE
# Crisp transformation of numeric data
partition(CO2, conc:uptake, .method = "crisp", .breaks = 3)
#> # A tibble: 84 × 9
#> Plant Type Treatment `conc=(-Inf;397]` `conc=(397;698]` `conc=(698;Inf]`
#> <ord> <fct> <fct> <lgl> <lgl> <lgl>
#> 1 Qn1 Quebec nonchilled TRUE FALSE FALSE
#> 2 Qn1 Quebec nonchilled TRUE FALSE FALSE
#> 3 Qn1 Quebec nonchilled TRUE FALSE FALSE
#> 4 Qn1 Quebec nonchilled TRUE FALSE FALSE
#> 5 Qn1 Quebec nonchilled FALSE TRUE FALSE
#> 6 Qn1 Quebec nonchilled FALSE TRUE FALSE
#> 7 Qn1 Quebec nonchilled FALSE FALSE TRUE
#> 8 Qn2 Quebec nonchilled TRUE FALSE FALSE
#> 9 Qn2 Quebec nonchilled TRUE FALSE FALSE
#> 10 Qn2 Quebec nonchilled TRUE FALSE FALSE
#> # ℹ 74 more rows
#> # ℹ 3 more variables: `uptake=(-Inf;20.3]` <lgl>, `uptake=(20.3;32.9]` <lgl>,
#> # `uptake=(32.9;Inf]` <lgl>
# Triangular fuzzy sets
partition(CO2, conc:uptake, .method = "triangle", .breaks = 3)
#> # A tibble: 84 × 9
#> Plant Type Treatment `conc=(-Inf;95;548)` `conc=(95;548;1000)`
#> <ord> <fct> <fct> <dbl> <dbl>
#> 1 Qn1 Quebec nonchilled 1 0
#> 2 Qn1 Quebec nonchilled 0.823 0.177
#> 3 Qn1 Quebec nonchilled 0.658 0.342
#> 4 Qn1 Quebec nonchilled 0.437 0.563
#> 5 Qn1 Quebec nonchilled 0.106 0.894
#> 6 Qn1 Quebec nonchilled 0 0.719
#> 7 Qn1 Quebec nonchilled 0 0
#> 8 Qn2 Quebec nonchilled 1 0
#> 9 Qn2 Quebec nonchilled 0.823 0.177
#> 10 Qn2 Quebec nonchilled 0.658 0.342
#> # ℹ 74 more rows
#> # ℹ 4 more variables: `conc=(548;1000;Inf)` <dbl>,
#> # `uptake=(-Inf;7.7;26.6)` <dbl>, `uptake=(7.7;26.6;45.5)` <dbl>,
#> # `uptake=(26.6;45.5;Inf)` <dbl>
# Raised-cosine fuzzy sets
partition(CO2, conc:uptake, .method = "raisedcos", .breaks = 3)
#> # A tibble: 84 × 9
#> Plant Type Treatment `conc=(-Inf;95;548)` `conc=(95;548;1000)`
#> <ord> <fct> <fct> <dbl> <dbl>
#> 1 Qn1 Quebec nonchilled 1 0
#> 2 Qn1 Quebec nonchilled 0.925 0.0750
#> 3 Qn1 Quebec nonchilled 0.738 0.262
#> 4 Qn1 Quebec nonchilled 0.402 0.598
#> 5 Qn1 Quebec nonchilled 0.0274 0.973
#> 6 Qn1 Quebec nonchilled 0 0.818
#> 7 Qn1 Quebec nonchilled 0 0
#> 8 Qn2 Quebec nonchilled 1 0
#> 9 Qn2 Quebec nonchilled 0.925 0.0750
#> 10 Qn2 Quebec nonchilled 0.738 0.262
#> # ℹ 74 more rows
#> # ℹ 4 more variables: `conc=(548;1000;Inf)` <dbl>,
#> # `uptake=(-Inf;7.7;26.6)` <dbl>, `uptake=(7.7;26.6;45.5)` <dbl>,
#> # `uptake=(26.6;45.5;Inf)` <dbl>
# Trapezoidal fuzzy sets, overlapping to satisfy the Ruspini condition
partition(CO2, conc:uptake, .method = "triangle", .breaks = 3,
          .span = 2, .inc = 2)
#> # A tibble: 84 × 9
#> Plant Type Treatment `conc=(-Inf;95;276;457)` `conc=(276;457;638;819)`
#> <ord> <fct> <fct> <dbl> <dbl>
#> 1 Qn1 Quebec nonchilled 1 0
#> 2 Qn1 Quebec nonchilled 1 0
#> 3 Qn1 Quebec nonchilled 1 0
#> 4 Qn1 Quebec nonchilled 0.591 0.409
#> 5 Qn1 Quebec nonchilled 0 1
#> 6 Qn1 Quebec nonchilled 0 0.796
#> 7 Qn1 Quebec nonchilled 0 0
#> 8 Qn2 Quebec nonchilled 1 0
#> 9 Qn2 Quebec nonchilled 1 0
#> 10 Qn2 Quebec nonchilled 1 0
#> # ℹ 74 more rows
#> # ℹ 4 more variables: `conc=(638;819;1000;Inf)` <dbl>,
#> # `uptake=(-Inf;7.7;15.3;22.8)` <dbl>, `uptake=(15.3;22.8;30.4;37.9)` <dbl>,
#> # `uptake=(30.4;37.9;45.5;Inf)` <dbl>
# Complex transformation with different settings per column
CO2 |>
  partition(Plant:Treatment) |>
  partition(conc,
            .method = "raisedcos",
            .breaks = c(-Inf, 95, 175, 350, 675, 1000, Inf)) |>
  partition(uptake,
            .method = "triangle",
            .breaks = c(-Inf, 7.7, 28.3, 45.5, Inf),
            .labels = c("low", "medium", "high"))
#> # A tibble: 84 × 24
#> `Plant=Qn1` `Plant=Qn2` `Plant=Qn3` `Plant=Qc1` `Plant=Qc3` `Plant=Qc2`
#> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl>
#> 1 TRUE FALSE FALSE FALSE FALSE FALSE
#> 2 TRUE FALSE FALSE FALSE FALSE FALSE
#> 3 TRUE FALSE FALSE FALSE FALSE FALSE
#> 4 TRUE FALSE FALSE FALSE FALSE FALSE
#> 5 TRUE FALSE FALSE FALSE FALSE FALSE
#> 6 TRUE FALSE FALSE FALSE FALSE FALSE
#> 7 TRUE FALSE FALSE FALSE FALSE FALSE
#> 8 FALSE TRUE FALSE FALSE FALSE FALSE
#> 9 FALSE TRUE FALSE FALSE FALSE FALSE
#> 10 FALSE TRUE FALSE FALSE FALSE FALSE
#> # ℹ 74 more rows
#> # ℹ 18 more variables: `Plant=Mn3` <lgl>, `Plant=Mn2` <lgl>, `Plant=Mn1` <lgl>,
#> # `Plant=Mc2` <lgl>, `Plant=Mc3` <lgl>, `Plant=Mc1` <lgl>,
#> # `Type=Quebec` <lgl>, `Type=Mississippi` <lgl>,
#> # `Treatment=nonchilled` <lgl>, `Treatment=chilled` <lgl>,
#> # `conc=(-Inf;95;175)` <dbl>, `conc=(95;175;350)` <dbl>,
#> # `conc=(175;350;675)` <dbl>, `conc=(350;675;1000)` <dbl>, …