Convert columns of data frame to Boolean or fuzzy sets (of triangular, trapezoidal, or raised-cosinal shape)

Convert the selected columns of the data frame into either dummy logical columns, or into membership degrees of fuzzy sets, while leaving the remaining columns untouched. Each column selected for transformation typically results in multiple columns in the output.

Usage

partition(
  .data,
  .what = everything(),
  ...,
  .breaks = NULL,
  .labels = NULL,
  .na = TRUE,
  .keep = FALSE,
  .method = "crisp",
  .right = TRUE,
  .span = 1,
  .inc = 1
)

Arguments

.data: the data frame to be processed
.what: a tidyselect expression (see tidyselect syntax) specifying the columns to be transformed
...: optional other tidyselect expressions selecting additional columns to be processed
.breaks: for numeric columns, this has to be either an integer scalar or a numeric vector. If .breaks is an integer scalar, it specifies the number of resulting intervals to break the numeric column into (for .method="crisp") or the number of target fuzzy sets (for .method="triangle" or .method="raisedcos"). If .breaks is a vector, the values specify the borders of intervals (for .method="crisp") or the breaking points of fuzzy sets.
.labels: character vector specifying the names used to construct the newly created column names. If NULL, the labels are generated automatically.
.na: if TRUE, an additional logical column is created for each source column that contains NA values. For a column named x, the newly created column's name will be x=NA.
.keep: if TRUE, the original columns being transformed remain present in the resulting data frame.
.method: The method of transformation for numeric columns. Either "crisp", "triangle", or "raisedcos" is required.
.right: If .method="crisp", this argument specifies whether the intervals should be closed on the right (and open on the left) or vice versa.
.span: The span of the intervals for numeric columns. If .method="crisp", this argument specifies the number of consecutive breaks in a single resulting interval. If .method="triangle" or .method="raisedcos", this argument specifies the number of breaks that should form the core of the fuzzy set (i.e., where the membership degrees are 1). For .span = 1, the fuzzy set has a triangular shape with only a single value with membership equal to 1; for .span = 2, the fuzzy set has a trapezoidal shape.
.inc: how many breaks to move to the right when creating the next column from a numeric column of .data. In other words, if .inc = 1, all resulting columns are created (by shifting breaks by 1); if .inc = 2, the first, third, fifth, etc. columns are created, i.e., every second resulting column is skipped.

Value

A tibble created by transforming .data.

Details

Transformations performed by this function are typically useful as a preprocessing step before using the dig() function or some of its derivatives (dig_correlations(), dig_paired_baseline_contrasts(), dig_associations()).

The transformation of the selected columns differs based on their type. Concretely:

a logical column x is transformed into a pair of logical columns, x=TRUE and x=FALSE;
a factor column x, which has levels l1, l2, and l3, is transformed into three logical columns named x=l1, x=l2, and x=l3;
a numeric column x is transformed according to the .method argument:
- if .method="crisp", the column is first transformed into a factor with intervals as factor levels and then it is processed as a factor (see above);
- for the other methods (.method="triangle" or .method="raisedcos"), several new columns are created, where each column has numeric values from the interval \([0,1]\) and represents a certain fuzzy set (either triangular or raised-cosinal). Details of the transformation of numeric columns can be specified with additional arguments (.breaks, .labels, .right).
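The logical and factor cases above amount to ordinary dummy coding. The following base-R sketch illustrates the idea; `dummy_columns()` is a hypothetical helper written only for this illustration (it is not part of the package), and it does not reproduce the .na or .keep handling.

```r
# Hypothetical helper (not part of the package): dummy-code one factor
# or logical vector into logical columns named "<name>=<level>".
dummy_columns <- function(x, name) {
  # Treat a logical vector as a factor with levels TRUE and FALSE.
  if (is.logical(x)) {
    x <- factor(x, levels = c(TRUE, FALSE))
  }
  lv <- levels(x)
  # One logical column per level; NA values map to FALSE in this sketch.
  cols <- lapply(lv, function(l) !is.na(x) & x == l)
  names(cols) <- paste0(name, "=", lv)
  data.frame(cols, check.names = FALSE)
}

b <- factor(c("A", "B", "A"))
res <- dummy_columns(b, "b")
print(res)
```

Here the factor b yields two logical columns, b=A and b=B, one per level.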

The processing of source numeric columns is quite complex and depends on the following arguments: .method, .breaks, .right, .span, and .inc.

Crisp transformation of numeric data

For .method = "crisp", the numeric column is transformed into a set of logical columns where each column represents a certain interval of values. The intervals are determined by the .breaks argument.

If .breaks is an integer scalar, it specifies the number of resulting intervals to break the numeric column into. The intervals are obtained automatically from the source column by splitting the range of the source values into .breaks intervals of equal length. The first interval extends from negative infinity to the first break, and the last interval extends from the last break to positive infinity.
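As a rough illustration of the equal-length splitting described above, the borders can be computed in base R as follows. `equal_width_breaks()` is a hypothetical helper, and the package may compute or round its breaks differently; the sketch only shows the principle on the CO2 data set that ships with R.

```r
# Hypothetical helper: equal-width interval borders for `n` crisp intervals,
# with the outermost borders opened to -Inf and Inf.
equal_width_breaks <- function(x, n) {
  b <- seq(min(x, na.rm = TRUE), max(x, na.rm = TRUE), length.out = n + 1)
  b[1] <- -Inf
  b[length(b)] <- Inf
  b
}

# CO2 is part of the 'datasets' package; conc ranges from 95 to 1000.
brk <- equal_width_breaks(CO2$conc, 3)
print(round(brk))
```

The rounded inner borders, 397 and 698, agree with the column names `conc=(-Inf;397]`, `conc=(397;698]`, and `conc=(698;Inf]` shown in the Examples section.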

If .breaks is a vector, the values specify the manual borders of intervals. (Infinite values are allowed.)

For .span = 1 and .inc = 1, the intervals are consecutive and non-overlapping. If .breaks = c(1, 3, 5, 7, 9, 11) and .right = TRUE, for example, the following intervals are considered: \((1;3]\), \((3;5]\), \((5;7]\), \((7;9]\), and \((9;11]\). (If .right = FALSE, the intervals are: \([1;3)\), \([3;5)\), \([5;7)\), \([7;9)\), and \([9;11)\).)
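The effect of closing intervals on the right or on the left can be tried out with base R's cut(), which follows the same convention. This is only an analogy: whether partition() uses cut() internally is not stated here, and cut() labels intervals with a comma rather than a semicolon.

```r
x <- c(1.5, 3, 4.2, 11)
brk <- c(1, 3, 5, 7, 9, 11)

# Right-closed intervals: the value 3 belongs to (1,3].
right_closed <- cut(x, breaks = brk, right = TRUE)
# Left-closed intervals: the value 3 belongs to [3,5), and 11 falls outside.
left_closed <- cut(x, breaks = brk, right = FALSE)

print(as.character(right_closed))
print(as.character(left_closed))
```

A value that coincides with a break therefore moves to the neighboring interval when the closed side is switched.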

For .span > 1, each interval is formed by merging .span consecutive elementary intervals, so neighboring intervals overlap. For .span = 2, .inc = 1, and .right = TRUE, the intervals are: \((1;5]\), \((3;7]\), \((5;9]\), and \((7;11]\).

As can be seen, so far each next interval was created by shifting by one position in .breaks. The .inc argument modifies that shift. If .inc = 2 and .span = 1, the intervals are: \((1;3]\), \((5;7]\), and \((9;11]\). For .span = 2 and .inc = 3, the intervals are: \((1;5]\) and \((7;11]\).
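The interplay of .span and .inc can be sketched by enumerating the interval borders directly. `crisp_intervals()` below is a hypothetical helper, written under the assumption that each interval starts at a break and ends `span` breaks further on; it is not the package's implementation.

```r
# Hypothetical helper: list the (lower, upper) borders of the crisp intervals
# produced from `breaks` for a given span and increment.
crisp_intervals <- function(breaks, span = 1, inc = 1) {
  # Index of the first break of each interval.
  starts <- seq(1, length(breaks) - span, by = inc)
  data.frame(lower = breaks[starts], upper = breaks[starts + span])
}

brk <- c(1, 3, 5, 7, 9, 11)
print(crisp_intervals(brk))            # five consecutive intervals
print(crisp_intervals(brk, span = 2))  # four overlapping intervals
print(crisp_intervals(brk, inc = 2))   # every second interval
```

With these settings the helper lists the same borders as the intervals given in the text for .span = 1 and .inc = 1, for .span = 2 and .inc = 1, and for .span = 1 and .inc = 2.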

Fuzzy transformation of numeric data

For .method = "triangle" or .method = "raisedcos", the numeric column is transformed into a set of columns where each column represents membership degrees to a certain fuzzy set. The shape of the underlying fuzzy sets is again determined by the .breaks argument.

If .breaks is an integer scalar, it specifies the number of target fuzzy sets. The breaks are determined automatically from the source data column, similarly to the crisp transformation described above.

If .breaks is a vector, the values specify the breaking points of fuzzy sets. Infinite values as breaks produce fuzzy sets with open borders.

For .span = 1, each underlying fuzzy set is determined by three consecutive breaks. Outside of these breaks, the membership degree is 0. In the interval between the first two breaks, the membership degree is increasing and in the interval between the last two breaks, the membership degree is decreasing. Hence the membership degree 1 is obtained for values equal to the middle break. This practically forms fuzzy sets of triangular or raised-cosinal shape.

For .span > 1, the fuzzy set is determined by four breaks. Outside of these breaks, the membership degree is 0. In the interval between the first and the second break, the membership degree is increasing, in the interval between the third and the fourth break, the membership degree is decreasing, and in the interval between the second and the third break, the membership degree is 1. This practically forms fuzzy sets of trapezoidal shape.
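The shapes described above can be written down as a single membership function of four breaks, where the triangular case is obtained by letting the two middle breaks coincide. The sketch below is an assumption-laden illustration: `fuzzy_membership()` and its argument names are hypothetical, the raised-cosine border is assumed to be 0.5 * (1 - cos(pi * t)) applied to the linear degree t, and an infinite outer break is assumed to give degree 1 on that side.

```r
# Hypothetical helper: membership degrees of a fuzzy set given by four breaks
# (lo, core_lo, core_hi, hi). Use core_lo == core_hi for the triangular or
# raised-cosine shape with a single peak.
fuzzy_membership <- function(x, lo, core_lo, core_hi, hi,
                             method = c("triangle", "raisedcos")) {
  method <- match.arg(method)
  # Linear position on the rising and falling borders, clamped to [0, 1].
  # An infinite outer break is treated as an open border with degree 1.
  rise <- if (is.infinite(lo)) rep(1, length(x)) else
    pmin(pmax((x - lo) / (core_lo - lo), 0), 1)
  fall <- if (is.infinite(hi)) rep(1, length(x)) else
    pmin(pmax((hi - x) / (hi - core_hi), 0), 1)
  t <- pmin(rise, fall)
  if (method == "raisedcos") {
    t <- 0.5 * (1 - cos(pi * t))  # assumed raised-cosine border
  }
  t
}

x <- c(1, 2, 3, 4, 5)
tri <- fuzzy_membership(x, 1, 3, 3, 5)                  # triangle on (1;3;5)
rc <- fuzzy_membership(x, 1, 3, 3, 5, "raisedcos")      # raised cosine on (1;3;5)
trap <- fuzzy_membership(c(1, 2, 4, 6, 7), 1, 3, 5, 7)  # trapezoid on (1;3;5;7)
print(tri)
print(rc)
print(trap)
```

Under these assumptions, a value of 175 in a set with breaks (95; 548; 1000) gets a degree of about 0.177 for the triangular shape and about 0.075 for the raised-cosine shape, which is consistent with the output printed in the Examples section.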

Similar to the crisp transformation, the .inc argument determines the shift of breaks when creating the next underlying fuzzy set.

Let .breaks = c(1, 3, 5, 7, 9, 11). For .span = 1 and .inc = 1, the fuzzy sets are determined by the following triplets having effectively the triangular or raised-cosinal shape: \((1;3;5)\), \((3;5;7)\), \((5;7;9)\), and \((7;9;11)\).

For .span = 2 and .inc = 1, the fuzzy sets are determined by the following quadruplets: \((1;3;5;7)\), \((3;5;7;9)\), and \((5;7;9;11)\). These fuzzy sets have the trapezoidal shape with linear (if .method = "triangle") or cosine (if .method = "raisedcos") increasing and decreasing border-parts.

For .span = 1 and .inc = 3, the fuzzy sets are determined by the following triplets: \((1;3;5)\) and \((7;9;11)\), while the 2nd and 3rd fuzzy sets are skipped.
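The triplets and quadruplets listed above can be reproduced by sliding a window over the breaks. `fuzzy_tuples()` is a hypothetical helper for .span equal to 1 or 2 only; it assumes that each fuzzy set uses `span + 2` consecutive breaks and that the window moves by `inc` breaks each time.

```r
# Hypothetical helper: enumerate the break tuples that define the fuzzy sets
# (triplets for span = 1, quadruplets for span = 2).
fuzzy_tuples <- function(breaks, span = 1, inc = 1) {
  width <- span + 2  # number of breaks per fuzzy set
  starts <- seq(1, length(breaks) - width + 1, by = inc)
  lapply(starts, function(i) breaks[i:(i + width - 1)])
}

brk <- c(1, 3, 5, 7, 9, 11)
str(fuzzy_tuples(brk))            # four triplets
str(fuzzy_tuples(brk, span = 2))  # three quadruplets
str(fuzzy_tuples(brk, inc = 3))   # triplets (1;3;5) and (7;9;11)
```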

See the examples for more details.

Author

Michal Burda

Examples

# transform logical columns and factors
d <- data.frame(a = c(TRUE, TRUE, FALSE),
                b = factor(c("A", "B", "A")),
                c = c(1, 2, 3))
partition(d, a, b)
#> # A tibble: 3 × 5
#>       c `a=T` `a=F` `b=A` `b=B`
#>   <dbl> <lgl> <lgl> <lgl> <lgl>
#> 1     1 TRUE  FALSE TRUE  FALSE
#> 2     2 TRUE  FALSE FALSE TRUE 
#> 3     3 FALSE TRUE  TRUE  FALSE

# transform numeric columns to logical columns (crisp transformation)
partition(CO2, conc:uptake, .method = "crisp", .breaks = 3)
#> # A tibble: 84 × 9
#>    Plant Type   Treatment  `conc=(-Inf;397]` `conc=(397;698]` `conc=(698;Inf]`
#>    <ord> <fct>  <fct>      <lgl>             <lgl>            <lgl>           
#>  1 Qn1   Quebec nonchilled TRUE              FALSE            FALSE           
#>  2 Qn1   Quebec nonchilled TRUE              FALSE            FALSE           
#>  3 Qn1   Quebec nonchilled TRUE              FALSE            FALSE           
#>  4 Qn1   Quebec nonchilled TRUE              FALSE            FALSE           
#>  5 Qn1   Quebec nonchilled FALSE             TRUE             FALSE           
#>  6 Qn1   Quebec nonchilled FALSE             TRUE             FALSE           
#>  7 Qn1   Quebec nonchilled FALSE             FALSE            TRUE            
#>  8 Qn2   Quebec nonchilled TRUE              FALSE            FALSE           
#>  9 Qn2   Quebec nonchilled TRUE              FALSE            FALSE           
#> 10 Qn2   Quebec nonchilled TRUE              FALSE            FALSE           
#> # ℹ 74 more rows
#> # ℹ 3 more variables: `uptake=(-Inf;20.3]` <lgl>, `uptake=(20.3;32.9]` <lgl>,
#> #   `uptake=(32.9;Inf]` <lgl>

# transform numeric columns to triangular fuzzy sets:
partition(CO2, conc:uptake, .method = "triangle", .breaks = 3)
#> # A tibble: 84 × 9
#>    Plant Type   Treatment  `conc=(-Inf;95;548)` `conc=(95;548;1000)`
#>    <ord> <fct>  <fct>                     <dbl>                <dbl>
#>  1 Qn1   Quebec nonchilled                1                    0    
#>  2 Qn1   Quebec nonchilled                0.823                0.177
#>  3 Qn1   Quebec nonchilled                0.658                0.342
#>  4 Qn1   Quebec nonchilled                0.437                0.563
#>  5 Qn1   Quebec nonchilled                0.106                0.894
#>  6 Qn1   Quebec nonchilled                0                    0.719
#>  7 Qn1   Quebec nonchilled                0                    0    
#>  8 Qn2   Quebec nonchilled                1                    0    
#>  9 Qn2   Quebec nonchilled                0.823                0.177
#> 10 Qn2   Quebec nonchilled                0.658                0.342
#> # ℹ 74 more rows
#> # ℹ 4 more variables: `conc=(548;1000;Inf)` <dbl>,
#> #   `uptake=(-Inf;7.7;26.6)` <dbl>, `uptake=(7.7;26.6;45.5)` <dbl>,
#> #   `uptake=(26.6;45.5;Inf)` <dbl>

# transform numeric columns to raised-cosinal fuzzy sets
partition(CO2, conc:uptake, .method = "raisedcos", .breaks = 3)
#> # A tibble: 84 × 9
#>    Plant Type   Treatment  `conc=(-Inf;95;548)` `conc=(95;548;1000)`
#>    <ord> <fct>  <fct>                     <dbl>                <dbl>
#>  1 Qn1   Quebec nonchilled               1                    0     
#>  2 Qn1   Quebec nonchilled               0.925                0.0750
#>  3 Qn1   Quebec nonchilled               0.738                0.262 
#>  4 Qn1   Quebec nonchilled               0.402                0.598 
#>  5 Qn1   Quebec nonchilled               0.0274               0.973 
#>  6 Qn1   Quebec nonchilled               0                    0.818 
#>  7 Qn1   Quebec nonchilled               0                    0     
#>  8 Qn2   Quebec nonchilled               1                    0     
#>  9 Qn2   Quebec nonchilled               0.925                0.0750
#> 10 Qn2   Quebec nonchilled               0.738                0.262 
#> # ℹ 74 more rows
#> # ℹ 4 more variables: `conc=(548;1000;Inf)` <dbl>,
#> #   `uptake=(-Inf;7.7;26.6)` <dbl>, `uptake=(7.7;26.6;45.5)` <dbl>,
#> #   `uptake=(26.6;45.5;Inf)` <dbl>

# transform numeric columns to trapezoidal fuzzy sets overlapping in non-core
# regions so that the membership degrees sum to 1 along the consecutive fuzzy sets
# (i.e., the so-called Ruspini condition is met)
partition(CO2, conc:uptake, .method = "triangle", .breaks = 3, .span = 2, .inc = 2)
#> # A tibble: 84 × 9
#>    Plant Type   Treatment  `conc=(-Inf;95;276;457)` `conc=(276;457;638;819)`
#>    <ord> <fct>  <fct>                         <dbl>                    <dbl>
#>  1 Qn1   Quebec nonchilled                    1                        0    
#>  2 Qn1   Quebec nonchilled                    1                        0    
#>  3 Qn1   Quebec nonchilled                    1                        0    
#>  4 Qn1   Quebec nonchilled                    0.591                    0.409
#>  5 Qn1   Quebec nonchilled                    0                        1    
#>  6 Qn1   Quebec nonchilled                    0                        0.796
#>  7 Qn1   Quebec nonchilled                    0                        0    
#>  8 Qn2   Quebec nonchilled                    1                        0    
#>  9 Qn2   Quebec nonchilled                    1                        0    
#> 10 Qn2   Quebec nonchilled                    1                        0    
#> # ℹ 74 more rows
#> # ℹ 4 more variables: `conc=(638;819;1000;Inf)` <dbl>,
#> #   `uptake=(-Inf;7.7;15.3;22.8)` <dbl>, `uptake=(15.3;22.8;30.4;37.9)` <dbl>,
#> #   `uptake=(30.4;37.9;45.5;Inf)` <dbl>

# complex transformation with different settings for each column
CO2 |>
    partition(Plant:Treatment) |>
    partition(conc,
              .method = "raisedcos",
              .breaks = c(-Inf, 95, 175, 350, 675, 1000, Inf)) |>
    partition(uptake,
              .method = "triangle",
              .breaks = c(-Inf, 7.7, 28.3, 45.5, Inf),
              .labels = c("low", "medium", "high"))
#> # A tibble: 84 × 24
#>    `Plant=Qn1` `Plant=Qn2` `Plant=Qn3` `Plant=Qc1` `Plant=Qc3` `Plant=Qc2`
#>    <lgl>       <lgl>       <lgl>       <lgl>       <lgl>       <lgl>      
#>  1 TRUE        FALSE       FALSE       FALSE       FALSE       FALSE      
#>  2 TRUE        FALSE       FALSE       FALSE       FALSE       FALSE      
#>  3 TRUE        FALSE       FALSE       FALSE       FALSE       FALSE      
#>  4 TRUE        FALSE       FALSE       FALSE       FALSE       FALSE      
#>  5 TRUE        FALSE       FALSE       FALSE       FALSE       FALSE      
#>  6 TRUE        FALSE       FALSE       FALSE       FALSE       FALSE      
#>  7 TRUE        FALSE       FALSE       FALSE       FALSE       FALSE      
#>  8 FALSE       TRUE        FALSE       FALSE       FALSE       FALSE      
#>  9 FALSE       TRUE        FALSE       FALSE       FALSE       FALSE      
#> 10 FALSE       TRUE        FALSE       FALSE       FALSE       FALSE      
#> # ℹ 74 more rows
#> # ℹ 18 more variables: `Plant=Mn3` <lgl>, `Plant=Mn2` <lgl>, `Plant=Mn1` <lgl>,
#> #   `Plant=Mc2` <lgl>, `Plant=Mc3` <lgl>, `Plant=Mc1` <lgl>,
#> #   `Type=Quebec` <lgl>, `Type=Mississippi` <lgl>,
#> #   `Treatment=nonchilled` <lgl>, `Treatment=chilled` <lgl>,
#> #   `conc=(-Inf;95;175)` <dbl>, `conc=(95;175;350)` <dbl>,
#> #   `conc=(175;350;675)` <dbl>, `conc=(350;675;1000)` <dbl>, …