Remove almost constant columns from a data frame
Source: R/remove_almost_constant.R
The function tests all columns selected by the .what argument (and by any additional expressions passed via ...) and removes those that are almost constant. A column is considered almost constant if the proportion of its most frequent value is greater than or equal to the threshold specified by the .threshold argument. See is_almost_constant() for details.
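As a rough illustration of this rule, the following base-R sketch computes the proportion of the most frequent value in a column. The helpers most_frequent_prop() and almost_constant_sketch() are hypothetical and not part of the package. The NA handling (NA counted as an ordinary value unless na_rm = TRUE, and a vector that is empty after removing NA treated as constant) is an assumption inferred from the examples on this page; is_almost_constant() remains the authoritative reference.

```r
# Hypothetical helper (NOT part of the package): proportion of the most
# frequent value in a vector. Assumption: NA counts as an ordinary value
# unless na_rm = TRUE; an empty vector (e.g. all-NA with na_rm = TRUE)
# is treated as constant and gets proportion 1.
most_frequent_prop <- function(x, na_rm = FALSE) {
  if (na_rm) x <- x[!is.na(x)]
  if (length(x) == 0L) return(1)
  counts <- table(x, useNA = "ifany")  # NA gets its own count if present
  max(counts) / length(x)
}

# Hypothetical predicate mirroring the rule: almost constant when the
# proportion of the most frequent value is >= threshold
almost_constant_sketch <- function(x, threshold = 1, na_rm = FALSE) {
  most_frequent_prop(x, na_rm) >= threshold
}
```

For column d of the example data frame, the proportion is 0.4 with na_rm = FALSE (4 of 10 values) and 0.5 with na_rm = TRUE (4 of 8 non-NA values), which is why d survives .threshold = 0.5 only when NA values are kept.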
Usage
remove_almost_constant(
.data,
.what = everything(),
...,
.threshold = 1,
.na_rm = FALSE,
.verbose = FALSE
)
Arguments
- .data
a data frame
- .what
a tidyselect expression (see tidyselect syntax) selecting the columns to be processed
- ...
optional other tidyselect expressions selecting additional columns to be processed
- .threshold
a numeric scalar in the range [0, 1] specifying the threshold for the proportion of the most frequent value
- .na_rm
a logical scalar indicating whether to remove NA values before computing the proportion of the most frequent value. See is_almost_constant() for details of how NA values are handled.
- .verbose
a logical scalar indicating whether to print a message about removed columns
Value
A data frame from which all columns selected by .what (and ...) that are (almost) constant have been removed; all other columns are kept.
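To make the return value concrete, here is a minimal base-R sketch of the documented behaviour. The function remove_almost_constant_sketch() and its cols argument (a character vector of column names standing in for the tidyselect arguments .what and ...) are hypothetical and not the package's implementation; the NA handling is an assumption inferred from the examples on this page.

```r
# Hypothetical base-R sketch (NOT the package's implementation): among the
# selected columns, drop those whose most frequent value makes up at least
# `threshold` of the values; unselected columns are always kept.
remove_almost_constant_sketch <- function(data, cols = names(data),
                                          threshold = 1, na_rm = FALSE) {
  is_almost_const <- vapply(data[cols], function(x) {
    if (na_rm) x <- x[!is.na(x)]
    if (length(x) == 0L) return(TRUE)  # nothing left: treat as constant
    # Assumption: NA counts as an ordinary value when na_rm = FALSE
    max(table(x, useNA = "ifany")) / length(x) >= threshold
  }, logical(1))
  drop <- cols[is_almost_const]
  # Preserve the original column order; drop = FALSE keeps a data frame
  data[, setdiff(names(data), drop), drop = FALSE]
}
```

Applied to the data frame d from the examples, this sketch keeps the same columns as the documented outputs, e.g. only a1 and a2 for threshold = 0.5 and na_rm = TRUE.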
Examples
d <- data.frame(a1 = 1:10,
a2 = c(1:9, NA),
b1 = "b",
b2 = NA,
c1 = rep(c(TRUE, FALSE), 5),
c2 = rep(c(TRUE, NA), 5),
d = c(rep(TRUE, 4), rep(FALSE, 4), NA, NA))
remove_almost_constant(d, .threshold = 1.0, .na_rm = FALSE)
#> # A tibble: 10 × 5
#> a1 a2 c1 c2 d
#> <int> <int> <lgl> <lgl> <lgl>
#> 1 1 1 TRUE TRUE TRUE
#> 2 2 2 FALSE NA TRUE
#> 3 3 3 TRUE TRUE TRUE
#> 4 4 4 FALSE NA TRUE
#> 5 5 5 TRUE TRUE FALSE
#> 6 6 6 FALSE NA FALSE
#> 7 7 7 TRUE TRUE FALSE
#> 8 8 8 FALSE NA FALSE
#> 9 9 9 TRUE TRUE NA
#> 10 10 NA FALSE NA NA
remove_almost_constant(d, .threshold = 1.0, .na_rm = TRUE)
#> # A tibble: 10 × 4
#> a1 a2 c1 d
#> <int> <int> <lgl> <lgl>
#> 1 1 1 TRUE TRUE
#> 2 2 2 FALSE TRUE
#> 3 3 3 TRUE TRUE
#> 4 4 4 FALSE TRUE
#> 5 5 5 TRUE FALSE
#> 6 6 6 FALSE FALSE
#> 7 7 7 TRUE FALSE
#> 8 8 8 FALSE FALSE
#> 9 9 9 TRUE NA
#> 10 10 NA FALSE NA
remove_almost_constant(d, .threshold = 0.5, .na_rm = FALSE)
#> # A tibble: 10 × 3
#> a1 a2 d
#> <int> <int> <lgl>
#> 1 1 1 TRUE
#> 2 2 2 TRUE
#> 3 3 3 TRUE
#> 4 4 4 TRUE
#> 5 5 5 FALSE
#> 6 6 6 FALSE
#> 7 7 7 FALSE
#> 8 8 8 FALSE
#> 9 9 9 NA
#> 10 10 NA NA
remove_almost_constant(d, .threshold = 0.5, .na_rm = TRUE)
#> # A tibble: 10 × 2
#> a1 a2
#> <int> <int>
#> 1 1 1
#> 2 2 2
#> 3 3 3
#> 4 4 4
#> 5 5 5
#> 6 6 6
#> 7 7 7
#> 8 8 8
#> 9 9 9
#> 10 10 NA
remove_almost_constant(d, a1:b2, .threshold = 0.5, .na_rm = TRUE)
#> # A tibble: 10 × 5
#> a1 a2 c1 c2 d
#> <int> <int> <lgl> <lgl> <lgl>
#> 1 1 1 TRUE TRUE TRUE
#> 2 2 2 FALSE NA TRUE
#> 3 3 3 TRUE TRUE TRUE
#> 4 4 4 FALSE NA TRUE
#> 5 5 5 TRUE TRUE FALSE
#> 6 6 6 FALSE NA FALSE
#> 7 7 7 TRUE TRUE FALSE
#> 8 8 8 FALSE NA FALSE
#> 9 9 9 TRUE TRUE NA
#> 10 10 NA FALSE NA NA