Check whether a vector contains (almost) the same value in most of its
elements. The function returns TRUE if the proportion of the most frequent
value in x is greater than or equal to the specified threshold.
Arguments
- x
A vector to be tested.
- threshold
A numeric scalar in the interval \([0,1]\) specifying the minimum required proportion of the most frequent value. Defaults to 1.
- na_rm
Logical; if TRUE, NA values are removed before computing proportions. If FALSE, NA is treated as an ordinary value, so a large number of NAs can cause the function to return TRUE.
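To illustrate how na_rm changes the proportion being compared against threshold, here is a small sketch. The helper top_prop() is hypothetical and is not part of the package. It assumes the proportion is computed as "count of the most frequent value divided by the length of x", with NA counted as its own value when it is not removed.

```r
# Hypothetical helper (not the package's code): proportion of the most
# frequent value, following the documented na_rm semantics.
top_prop <- function(x, na_rm) {
  if (na_rm) x <- x[!is.na(x)]           # drop NAs before counting
  if (length(x) == 0L) return(NA_real_)  # nothing left to count
  counts <- table(x, useNA = "ifany")    # NA becomes its own level if present
  max(counts) / length(x)
}

x <- c(NA, NA, NA, 1, 2)
top_prop(x, na_rm = FALSE)  # 3 NAs out of 5 values: 0.6
top_prop(x, na_rm = TRUE)   # 1 and 2 each appear once out of 2: 0.5
```

With na_rm = FALSE the three NAs dominate, which is why a vector that is mostly missing can be reported as almost constant.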
Value
A logical scalar. Returns TRUE in the following cases:
- x is empty or has length one.
- x contains only NA values.
- The proportion of the most frequent value in x is greater than or equal to threshold.
Otherwise, returns FALSE.
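The decision rule above can be sketched in a few lines of base R. This is a hypothetical re-implementation for illustration only; the name is_almost_constant_sketch() is invented here, and the package's actual code may differ, for example in how "almost" equal values are grouped. It assumes exact equality when counting values.

```r
# Hypothetical sketch of the documented rule; not the package's source.
is_almost_constant_sketch <- function(x, threshold = 1, na_rm = FALSE) {
  stopifnot(is.numeric(threshold), length(threshold) == 1L,
            threshold >= 0, threshold <= 1)
  if (length(x) <= 1L) return(TRUE)     # empty or length one
  if (all(is.na(x))) return(TRUE)       # only NA values
  if (na_rm) x <- x[!is.na(x)]          # at least one non-NA value remains
  counts <- table(x, useNA = "ifany")   # NA kept as a level when present
  max(counts) / length(x) >= threshold  # proportion of most frequent value
}

is_almost_constant_sketch(1:10)                                 # FALSE
is_almost_constant_sketch(c(NA, NA, NA, 1, 2), threshold = 0.5) # TRUE
```

Under these assumptions the sketch reproduces the outputs listed in the Examples section.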
Details
This is useful for detecting low-variability or degenerate variables, which may be uninformative in modeling or analysis.
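As a rough illustration of that use case, the sketch below screens the columns of a data frame for near-constant variables. It computes the proportion of the most frequent value per column directly in base R rather than calling the package; the helper most_frequent_prop(), the toy data, and the 0.9 cut-off are all assumptions chosen for the example.

```r
# Hypothetical screening helper; counts NA as a value, like na_rm = FALSE.
most_frequent_prop <- function(col) {
  if (length(col) == 0L) return(1)
  max(table(col, useNA = "ifany")) / length(col)
}

df <- data.frame(
  id    = 1:20,                    # all distinct
  flag  = c(rep(0, 19), 1),        # 19 of 20 values are 0
  group = rep(c("a", "b"), 10)     # evenly split
)

props <- vapply(df, most_frequent_prop, numeric(1))
names(props)[props >= 0.9]  # "flag"
```

Columns flagged this way are candidates for removal or closer inspection before modeling.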
Examples
is_almost_constant(1)
#> [1] TRUE
is_almost_constant(1:10)
#> [1] FALSE
is_almost_constant(c(NA, NA, NA), na_rm = TRUE)
#> [1] TRUE
is_almost_constant(c(NA, NA, NA), na_rm = FALSE)
#> [1] TRUE
is_almost_constant(c(NA, NA, NA, 1, 2), threshold = 0.5, na_rm = FALSE)
#> [1] TRUE
is_almost_constant(c(NA, NA, NA, 1, 2), threshold = 0.5, na_rm = TRUE)
#> [1] TRUE
