
Transform selected columns of a data frame into either dummy logical variables or membership degrees of fuzzy sets, while leaving all remaining columns unchanged. Each transformed column typically produces multiple new columns in the output.

Usage

partition(
  .data,
  .what = everything(),
  ...,
  .breaks = NULL,
  .labels = NULL,
  .na = TRUE,
  .keep = FALSE,
  .method = "crisp",
  .right = TRUE,
  .span = 1,
  .inc = 1
)

Arguments

.data

A data frame to be processed.

.what

A tidyselect expression (see tidyselect syntax) selecting the columns to transform.

...

Additional tidyselect expressions selecting more columns.

.breaks

Ignored if .method = "dummy". For other methods, either an integer scalar (number of intervals/sets) or a numeric vector of breakpoints.

.labels

Optional character vector with labels used for new column names. If NULL, labels are generated automatically.

.na

If TRUE, an extra logical column indicating missing values (named e.g. x=NA) is created for each source column that contains NA values.

.keep

If TRUE, keep the original columns in the output.

.method

Transformation method for numeric columns: "dummy", "crisp", "triangle", or "raisedcos".

.right

For "crisp", whether intervals are right-closed and left-open (TRUE), or left-closed and right-open (FALSE).

.span

Controls how many consecutive breaks form one set. For "crisp", it sets the interval width: each interval is bounded by breaks .span positions apart. For "triangle"/"raisedcos", .span = 1 produces triangular sets and .span = 2 produces trapezoidal sets.

.inc

Step size for shifting breaks when generating successive sets. With .inc = 1, all possible sets are created; larger values skip sets.

Value

A tibble with .data transformed into Boolean or fuzzy predicates.

Details

These transformations are most often used as a preprocessing step before calling dig() or one of its derivatives, such as dig_correlations(), dig_paired_baseline_contrasts(), or dig_associations().

The transformation depends on the column type:

  • a logical column x is expanded into two logical columns: x=TRUE and x=FALSE;

  • a factor column x with levels l1, l2, l3 becomes three logical columns: x=l1, x=l2, and x=l3;

  • a numeric column x is transformed according to .method:

    • .method = "dummy": the column is treated as a factor with one level per unique value and then expanded to dummy columns, producing one logical column per unique value;

    • .method = "crisp": the column is discretized into intervals (defined by .breaks) and then expanded to dummy columns representing these intervals;

    • .method = "triangle" or .method = "raisedcos": the column is converted into one or more fuzzy sets. Each new column contains values in \([0,1]\) representing degrees of membership to the fuzzy set (triangular or raised-cosine shaped).
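The dummy expansion rules above can be approximated in base R. This is a minimal sketch, not the package's implementation: dummy_expand() is a hypothetical helper that handles one column at a time, names its outputs with the column=level pattern used in the text, and also adds the optional x=NA indicator described under .na.

```r
# Hypothetical helper (a sketch, not part of the package): expand one
# logical, factor, or numeric column into dummy logical columns.
dummy_expand <- function(x, name, na = TRUE) {
  if (is.logical(x)) {
    x <- factor(x, levels = c(TRUE, FALSE))  # levels "TRUE", "FALSE"
  } else if (is.numeric(x)) {
    x <- factor(x)                           # one level per unique value
  }
  # one logical column per level; missing inputs map to FALSE
  cols <- lapply(levels(x), function(l) !is.na(x) & x == l)
  labels <- paste0(name, "=", levels(x))
  # optional missingness indicator, added only if the column has NAs
  if (na && anyNA(x)) {
    cols <- c(cols, list(is.na(x)))
    labels <- c(labels, paste0(name, "=NA"))
  }
  res <- as.data.frame(cols)
  names(res) <- labels  # set names afterwards so "b=A" etc. are kept verbatim
  res
}

dummy_expand(factor(c("A", "B", "A")), "b")  # columns b=A, b=B
dummy_expand(c(1, 2, NA), "c")               # columns c=1, c=2, c=NA
```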

Details of numeric transformations are controlled by .breaks, .labels, .right, .span, and .inc.

  • Crisp partitioning is computationally cheaper and works best when sharp category boundaries are meaningful for the analysis.

  • Fuzzy partitioning is useful when attributes change gradually or when uncertainty should be modeled explicitly. It allows smooth transitions between categories and may yield more interpretable patterns, but is more computationally demanding.

Crisp transformation of numeric data

For .method = "crisp", numeric columns are converted into sets of dummy logical variables, each representing one interval of values defined by .breaks.

  • If .breaks is an integer, it specifies the number of equal-width intervals into which the column range is divided. The first and last intervals extend to infinity.

  • If .breaks is a numeric vector, it specifies interval boundaries directly. Infinite values are allowed.
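For an integer .breaks, the equal-width breakpoints can be reproduced in base R. The sketch below assumes that the observed range is split with seq() and that the outermost breaks are then replaced by -Inf and Inf; this is inferred from the CO2 example output (whose interval labels appear rounded), not taken from the package source, and equal_width_breaks() is a hypothetical name. The dummy columns are then obtained with base cut().

```r
# Hypothetical helper: equal-width breaks for an integer .breaks = n,
# with the first and last intervals made open-ended.
equal_width_breaks <- function(x, n) {
  b <- seq(min(x, na.rm = TRUE), max(x, na.rm = TRUE), length.out = n + 1)
  b[1] <- -Inf       # first interval extends to -Inf
  b[n + 1] <- Inf    # last interval extends to Inf
  b
}

conc <- datasets::CO2$conc
b <- equal_width_breaks(conc, 3)
round(b)  # -Inf 397 698 Inf

# right = TRUE gives right-closed intervals (lo; hi], as with .right = TRUE
intervals <- cut(conc, breaks = b, right = TRUE)
# one logical dummy column per interval
dummies <- sapply(levels(intervals), function(l) intervals == l)
colSums(dummies)  # 48, 24 and 12 rows fall into the three intervals
```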

With .span = 1 and .inc = 1, the intervals are consecutive and non-overlapping. For example, with .breaks = c(1, 3, 5, 7, 9, 11) and .right = TRUE, the intervals are \((1;3]\), \((3;5]\), \((5;7]\), \((7;9]\), and \((9;11]\). If .right = FALSE, the intervals are left-closed: \([1;3)\), \([3;5)\), etc.

Larger .span values make intervals overlap. For example, with .span = 2, .inc = 1, and .right = TRUE, the intervals are \((1;5]\), \((3;7]\), \((5;9]\), and \((7;11]\).

The .inc argument modifies how far the window shifts along .breaks. For example:

  • .span = 1, .inc = 2 → \((1;3]\), \((5;7]\), \((9;11]\).

  • .span = 2, .inc = 3 → \((1;5]\), \((7;11]\).
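The sliding-window rule behind .span and .inc can be sketched in a few lines. crisp_windows() is a hypothetical helper that only builds interval labels; it assumes each interval is bounded by breaks .span positions apart and that successive windows start .inc positions apart, which reproduces the intervals listed above for .span = 2, .inc = 1 and for .span = 1, .inc = 2.

```r
# Hypothetical helper: labels of the crisp intervals obtained by sliding a
# window of span + 1 consecutive breaks along `breaks`, advancing by `inc`.
crisp_windows <- function(breaks, span = 1, inc = 1, right = TRUE) {
  stopifnot(length(breaks) > span)
  starts <- seq(1, length(breaks) - span, by = inc)
  lo <- breaks[starts]          # lower bound of each interval
  hi <- breaks[starts + span]   # upper bound, span breaks further on
  if (right) {
    paste0("(", lo, ";", hi, "]")   # right-closed, as with .right = TRUE
  } else {
    paste0("[", lo, ";", hi, ")")   # left-closed, as with .right = FALSE
  }
}

b <- c(1, 3, 5, 7, 9, 11)
crisp_windows(b, span = 2)           # "(1;5]" "(3;7]" "(5;9]" "(7;11]"
crisp_windows(b, span = 1, inc = 2)  # "(1;3]" "(5;7]" "(9;11]"
crisp_windows(b, right = FALSE)      # "[1;3)" "[3;5)" ... "[9;11)"
```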

Fuzzy transformation of numeric data

For .method = "triangle" or .method = "raisedcos", numeric columns are converted into fuzzy membership degrees \([0,1]\).

  • If .breaks is an integer, it specifies the number of fuzzy sets to generate (breakpoints are chosen automatically).

  • If .breaks is a numeric vector, it directly defines the fuzzy set boundaries. Infinite values are allowed, which produces fuzzy sets with open ends.

With .span = 1, each fuzzy set is defined by three consecutive breaks: membership is 0 outside the outer breaks, increases to 1 at the middle break, and then decreases back to 0. This yields triangular or raised-cosine sets.

With .span > 1, fuzzy sets are defined by four breaks: the degree increases between the first two, stays 1 between the middle two, and decreases between the last two. This produces trapezoidal fuzzy sets, with linear borders if .method = "triangle", or cosine-shaped borders if .method = "raisedcos".
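Both shapes can be sketched as a single membership function. fuzzy_membership() is hypothetical: it takes the three breaks of a triangular set or the four breaks of a trapezoidal set, treats an infinite outer break as an open end (membership stays 1 on that side), and for "raisedcos" assumes the border is (1 - cos(pi * t)) / 2 applied to the linear border t. That formula is an assumption, but with breaks read from the printed column labels it reproduces the values shown in the CO2 examples, e.g. 0.177 ("triangle") and 0.0750 ("raisedcos") for conc = 175 in the set (95;548;1000).

```r
# Hypothetical helper: membership degrees of x in a fuzzy set given by
# 3 breaks (a; b; c) or 4 breaks (a; b; c; d).
fuzzy_membership <- function(x, breaks, method = c("triangle", "raisedcos")) {
  method <- match.arg(method)
  # a triangle is a trapezoid whose two middle breaks coincide
  if (length(breaks) == 3) breaks <- breaks[c(1, 2, 2, 3)]
  stopifnot(length(breaks) == 4)
  a <- breaks[1]; b <- breaks[2]; cc <- breaks[3]; d <- breaks[4]
  # rising border: 0 at a, 1 at b (constant 1 for an open left end)
  up <- if (is.infinite(a)) rep(1, length(x)) else pmin(pmax((x - a) / (b - a), 0), 1)
  # falling border: 1 at cc, 0 at d (constant 1 for an open right end)
  down <- if (is.infinite(d)) rep(1, length(x)) else pmin(pmax((d - x) / (d - cc), 0), 1)
  t <- pmin(up, down)
  # assumed raised-cosine border: a smooth, monotone reshaping of t
  if (method == "raisedcos") t <- (1 - cos(pi * t)) / 2
  t
}

fuzzy_membership(c(95, 175, 675), c(95, 548, 1000))   # approx. 0, 0.177, 0.719
fuzzy_membership(175, c(95, 548, 1000), "raisedcos")  # approx. 0.0750
# overlapping trapezoids from the .span = 2, .inc = 2 example sum to 1
fuzzy_membership(350, c(-Inf, 95, 276, 457)) +
  fuzzy_membership(350, c(276, 457, 638, 819))        # 1
```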

As with crisp sets, .inc determines how far the break window shifts when creating the next fuzzy set. For example:

  • .span = 1, .inc = 1 → \((1;3;5)\), \((3;5;7)\), \((5;7;9)\), \((7;9;11)\).

  • .span = 2, .inc = 1 → \((1;3;5;7)\), \((3;5;7;9)\), \((5;7;9;11)\).

  • .span = 1, .inc = 3 → \((1;3;5)\), \((7;9;11)\).
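The same window logic can be sketched for fuzzy sets. fuzzy_windows() is hypothetical and covers only the two cases described in this section: .span = 1 (three breaks per set) and .span = 2 (four breaks per set). Applied to the breaks that can be read off the labels of the trapezoidal CO2 example, it reproduces that example's three conc sets for .span = 2, .inc = 2.

```r
# Hypothetical helper: labels of the break tuples defining successive fuzzy sets.
fuzzy_windows <- function(breaks, span = 1, inc = 1) {
  stopifnot(span %in% c(1, 2))   # only the 3- and 4-break cases are sketched
  width <- span + 2              # breaks per set: 3 (triangle) or 4 (trapezoid)
  stopifnot(length(breaks) >= width)
  starts <- seq(1, length(breaks) - width + 1, by = inc)
  sets <- lapply(starts, function(i) breaks[i:(i + width - 1)])
  # label each set in the (b1;b2;...) notation used above
  vapply(sets, function(s) paste0("(", paste(s, collapse = ";"), ")"), character(1))
}

b <- c(1, 3, 5, 7, 9, 11)
fuzzy_windows(b, span = 1, inc = 3)  # "(1;3;5)" "(7;9;11)"
fuzzy_windows(b, span = 2, inc = 1)  # "(1;3;5;7)" "(3;5;7;9)" "(5;7;9;11)"
fuzzy_windows(c(-Inf, 95, 276, 457, 638, 819, 1000, Inf), span = 2, inc = 2)
# "(-Inf;95;276;457)" "(276;457;638;819)" "(638;819;1000;Inf)"
```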

See the examples for further details.

Author

Michal Burda

Examples

# Dummy transformation of logical, factor, and numeric columns
d <- data.frame(a = c(TRUE, TRUE, FALSE),
                b = factor(c("A", "B", "A")),
                c = c(1, 2, 3))
partition(d, a, b, c, .method = "dummy")
#> # A tibble: 3 × 7
#>   `a=T` `a=F` `b=A` `b=B` `c=1` `c=2` `c=3`
#>   <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl>
#> 1 TRUE  FALSE TRUE  FALSE TRUE  FALSE FALSE
#> 2 TRUE  FALSE FALSE TRUE  FALSE TRUE  FALSE
#> 3 FALSE TRUE  TRUE  FALSE FALSE FALSE TRUE 

# Crisp transformation of numeric data
partition(CO2, conc:uptake, .method = "crisp", .breaks = 3)
#> # A tibble: 84 × 9
#>    Plant Type   Treatment  `conc=(-Inf;397]` `conc=(397;698]` `conc=(698;Inf]`
#>    <ord> <fct>  <fct>      <lgl>             <lgl>            <lgl>           
#>  1 Qn1   Quebec nonchilled TRUE              FALSE            FALSE           
#>  2 Qn1   Quebec nonchilled TRUE              FALSE            FALSE           
#>  3 Qn1   Quebec nonchilled TRUE              FALSE            FALSE           
#>  4 Qn1   Quebec nonchilled TRUE              FALSE            FALSE           
#>  5 Qn1   Quebec nonchilled FALSE             TRUE             FALSE           
#>  6 Qn1   Quebec nonchilled FALSE             TRUE             FALSE           
#>  7 Qn1   Quebec nonchilled FALSE             FALSE            TRUE            
#>  8 Qn2   Quebec nonchilled TRUE              FALSE            FALSE           
#>  9 Qn2   Quebec nonchilled TRUE              FALSE            FALSE           
#> 10 Qn2   Quebec nonchilled TRUE              FALSE            FALSE           
#> # ℹ 74 more rows
#> # ℹ 3 more variables: `uptake=(-Inf;20.3]` <lgl>, `uptake=(20.3;32.9]` <lgl>,
#> #   `uptake=(32.9;Inf]` <lgl>

# Triangular fuzzy sets
partition(CO2, conc:uptake, .method = "triangle", .breaks = 3)
#> # A tibble: 84 × 9
#>    Plant Type   Treatment  `conc=(-Inf;95;548)` `conc=(95;548;1000)`
#>    <ord> <fct>  <fct>                     <dbl>                <dbl>
#>  1 Qn1   Quebec nonchilled                1                    0    
#>  2 Qn1   Quebec nonchilled                0.823                0.177
#>  3 Qn1   Quebec nonchilled                0.658                0.342
#>  4 Qn1   Quebec nonchilled                0.437                0.563
#>  5 Qn1   Quebec nonchilled                0.106                0.894
#>  6 Qn1   Quebec nonchilled                0                    0.719
#>  7 Qn1   Quebec nonchilled                0                    0    
#>  8 Qn2   Quebec nonchilled                1                    0    
#>  9 Qn2   Quebec nonchilled                0.823                0.177
#> 10 Qn2   Quebec nonchilled                0.658                0.342
#> # ℹ 74 more rows
#> # ℹ 4 more variables: `conc=(548;1000;Inf)` <dbl>,
#> #   `uptake=(-Inf;7.7;26.6)` <dbl>, `uptake=(7.7;26.6;45.5)` <dbl>,
#> #   `uptake=(26.6;45.5;Inf)` <dbl>

# Raised-cosine fuzzy sets
partition(CO2, conc:uptake, .method = "raisedcos", .breaks = 3)
#> # A tibble: 84 × 9
#>    Plant Type   Treatment  `conc=(-Inf;95;548)` `conc=(95;548;1000)`
#>    <ord> <fct>  <fct>                     <dbl>                <dbl>
#>  1 Qn1   Quebec nonchilled               1                    0     
#>  2 Qn1   Quebec nonchilled               0.925                0.0750
#>  3 Qn1   Quebec nonchilled               0.738                0.262 
#>  4 Qn1   Quebec nonchilled               0.402                0.598 
#>  5 Qn1   Quebec nonchilled               0.0274               0.973 
#>  6 Qn1   Quebec nonchilled               0                    0.818 
#>  7 Qn1   Quebec nonchilled               0                    0     
#>  8 Qn2   Quebec nonchilled               1                    0     
#>  9 Qn2   Quebec nonchilled               0.925                0.0750
#> 10 Qn2   Quebec nonchilled               0.738                0.262 
#> # ℹ 74 more rows
#> # ℹ 4 more variables: `conc=(548;1000;Inf)` <dbl>,
#> #   `uptake=(-Inf;7.7;26.6)` <dbl>, `uptake=(7.7;26.6;45.5)` <dbl>,
#> #   `uptake=(26.6;45.5;Inf)` <dbl>

# Trapezoidal fuzzy sets whose overlaps satisfy the Ruspini condition (memberships sum to 1)
partition(CO2, conc:uptake, .method = "triangle", .breaks = 3,
          .span = 2, .inc = 2)
#> # A tibble: 84 × 9
#>    Plant Type   Treatment  `conc=(-Inf;95;276;457)` `conc=(276;457;638;819)`
#>    <ord> <fct>  <fct>                         <dbl>                    <dbl>
#>  1 Qn1   Quebec nonchilled                    1                        0    
#>  2 Qn1   Quebec nonchilled                    1                        0    
#>  3 Qn1   Quebec nonchilled                    1                        0    
#>  4 Qn1   Quebec nonchilled                    0.591                    0.409
#>  5 Qn1   Quebec nonchilled                    0                        1    
#>  6 Qn1   Quebec nonchilled                    0                        0.796
#>  7 Qn1   Quebec nonchilled                    0                        0    
#>  8 Qn2   Quebec nonchilled                    1                        0    
#>  9 Qn2   Quebec nonchilled                    1                        0    
#> 10 Qn2   Quebec nonchilled                    1                        0    
#> # ℹ 74 more rows
#> # ℹ 4 more variables: `conc=(638;819;1000;Inf)` <dbl>,
#> #   `uptake=(-Inf;7.7;15.3;22.8)` <dbl>, `uptake=(15.3;22.8;30.4;37.9)` <dbl>,
#> #   `uptake=(30.4;37.9;45.5;Inf)` <dbl>

# Complex transformation with different settings per column
CO2 |>
  partition(Plant:Treatment) |>
  partition(conc,
            .method = "raisedcos",
            .breaks = c(-Inf, 95, 175, 350, 675, 1000, Inf)) |>
  partition(uptake,
            .method = "triangle",
            .breaks = c(-Inf, 7.7, 28.3, 45.5, Inf),
            .labels = c("low", "medium", "high"))
#> # A tibble: 84 × 24
#>    `Plant=Qn1` `Plant=Qn2` `Plant=Qn3` `Plant=Qc1` `Plant=Qc3` `Plant=Qc2`
#>    <lgl>       <lgl>       <lgl>       <lgl>       <lgl>       <lgl>      
#>  1 TRUE        FALSE       FALSE       FALSE       FALSE       FALSE      
#>  2 TRUE        FALSE       FALSE       FALSE       FALSE       FALSE      
#>  3 TRUE        FALSE       FALSE       FALSE       FALSE       FALSE      
#>  4 TRUE        FALSE       FALSE       FALSE       FALSE       FALSE      
#>  5 TRUE        FALSE       FALSE       FALSE       FALSE       FALSE      
#>  6 TRUE        FALSE       FALSE       FALSE       FALSE       FALSE      
#>  7 TRUE        FALSE       FALSE       FALSE       FALSE       FALSE      
#>  8 FALSE       TRUE        FALSE       FALSE       FALSE       FALSE      
#>  9 FALSE       TRUE        FALSE       FALSE       FALSE       FALSE      
#> 10 FALSE       TRUE        FALSE       FALSE       FALSE       FALSE      
#> # ℹ 74 more rows
#> # ℹ 18 more variables: `Plant=Mn3` <lgl>, `Plant=Mn2` <lgl>, `Plant=Mn1` <lgl>,
#> #   `Plant=Mc2` <lgl>, `Plant=Mc3` <lgl>, `Plant=Mc1` <lgl>,
#> #   `Type=Quebec` <lgl>, `Type=Mississippi` <lgl>,
#> #   `Treatment=nonchilled` <lgl>, `Treatment=chilled` <lgl>,
#> #   `conc=(-Inf;95;175)` <dbl>, `conc=(95;175;350)` <dbl>,
#> #   `conc=(175;350;675)` <dbl>, `conc=(350;675;1000)` <dbl>, …