Balance Binary Data by Resampling: Under-Over Sampling

This function balances a given data.frame by resampling its rows so that the two values of a binary feature appear at a given ratio (rate).

Usage

balance_data(df, var, rate = 1, target = "auto", seed = 0, quiet = FALSE)

Arguments

df: Vector or Dataframe. Contains different variables in each column, separated by a specific character
var: Variable. Which variable should be used to resample the dataset?
rate: Numeric. How many X for every Y do we need? Default: 1. If there are more than 2 unique values, rate will represent a percentage of the number of rows
target: Character. If binary, which value should be reduced? If left as "auto", the most frequent value will be reduced.
seed: Numeric. Seed used to make the resampling reproducible
quiet: Boolean. Keep quiet? If not, informative messages will be shown.

Value

data.frame. Resampled (reduced) data.frame in which the values of the selected variable appear at the requested rate.

Examples

data(dft) # Titanic dataset
df <- balance_data(dft, Survived, rate = 1)
#> Resampled from: 549 v 342
#> Reducing size for label: FALSE
#> New label distribution: 342 v 342
df <- balance_data(dft, .data$Survived, rate = 0.5, target = "TRUE")
#> Resampled from: 549 v 342
#> Reducing size for label: TRUE
#> New label distribution: 549 v 274
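
The counts printed above are consistent with a simple rule: the reduced label keeps roughly rate times the number of rows of the other label (342 * 1 = 342, and 549 * 0.5 rounded down to 274). The base R sketch below illustrates that arithmetic on a toy data.frame with the same label counts. The helper undersample_binary() and the toy data are hypothetical and are not part of the package; the rounding rule is an assumption inferred from the example output, and the actual implementation may differ.

```r
# Hypothetical helper (not the package's code): undersample one value of a
# binary column so it keeps about `rate` rows per row of the other value.
undersample_binary <- function(df, var, rate = 1, target = "auto", seed = 0) {
  set.seed(seed)  # make the random row selection reproducible
  values <- as.character(df[[var]])
  counts <- table(values)
  stopifnot(length(counts) == 2)  # this sketch only handles binary columns
  # "auto" selects the most frequent value as the one to reduce
  if (target == "auto") target <- names(counts)[which.max(counts)]
  other <- setdiff(names(counts), target)
  # Assumed rule: keep floor(rate * n_other) rows, never more than available
  n_keep <- min(counts[[target]], floor(rate * counts[[other]]))
  target_rows <- which(values == target)
  keep <- c(which(values == other),
            target_rows[sample.int(length(target_rows), n_keep)])
  df[sort(keep), , drop = FALSE]  # preserve the original row order
}

# Toy data with the same label counts as the Titanic example: 549 v 342
toy <- data.frame(id = 1:891,
                  Survived = rep(c(FALSE, TRUE), times = c(549, 342)))

balanced <- undersample_binary(toy, "Survived", rate = 1)
halved <- undersample_binary(toy, "Survived", rate = 0.5, target = "TRUE")
table(balanced$Survived)
table(halved$Survived)
```

Under these assumptions the two tables show 342 v 342 and 549 v 274, matching the label distributions reported in the examples. Note that this sketch only undersamples; it never duplicates rows.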
