
Use Lasso regression to identify the variables most relevant for predicting another variable. You might want to compare the results with those of corr_var() and/or x2y() to complement the analysis. No need to standardize, center, or scale your data. Tidyverse friendly.

Usage

lasso_vars(
  df,
  variable,
  ignore = NULL,
  nlambdas = 100,
  nfolds = 10,
  top = 20,
  quiet = FALSE,
  seed = 123,
  ...
)

Arguments

df

Dataframe. Any dataframe is valid, as ohse() will be applied to process categorical values and values will be standardized automatically.

variable

Variable. Dependent variable or response.

ignore

Character vector. Variables to exclude from study.

nlambdas

Integer. Number of lambda (regularization strength) values to try in the search.

nfolds

Integer. Number of folds for K-fold cross-validation (>= 2).

top

Integer. Plot top n results only.

quiet

Boolean. Keep quiet? Else, show messages.

seed

Numeric. Seed for reproducible results.

...

Additional parameters passed to ohse().
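
To give a rough idea of the preprocessing described for df (one-hot encoding of categorical values followed by standardization), here is a minimal base R sketch. It is not the ohse() implementation, and its exact output may differ from what ohse() produces; the toy data frame and the object names toy, x and x_scaled are hypothetical.

```r
# Toy data frame with one numeric and one categorical column (hypothetical data)
toy <- data.frame(
  age = c(22, 38, 26, 35),
  sex = factor(c("male", "female", "female", "male"))
)

# One-hot encode: "- 1" drops the intercept so each level of sex gets a column
x <- model.matrix(~ age + sex - 1, data = toy)

# Standardize every column to mean 0 and standard deviation 1
x_scaled <- scale(x)

print(colnames(x_scaled))
print(round(colMeans(x_scaled), 10))
```

After this step every column is numeric and on a comparable scale, which is what lets a Lasso penalty treat all predictors evenly.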

Value

List. Contains lasso model coefficients, performance metrics, the actual model fitted and a plot.

Examples

if (FALSE) {
# CRAN
Sys.unsetenv("LARES_FONT") # Temporary
data(dft) # Titanic dataset

m <- lasso_vars(dft, Survived, ignore = c("Cabin"))
print(m$coef)
print(m$metrics)
plot(m$plot)
}
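
For readers who want to see the general technique in isolation, the following sketch runs a cross-validated Lasso with the glmnet package on the built-in mtcars data. This is an assumption-laden illustration: glmnet must be installed, it is not necessarily the engine lasso_vars() uses internally, and the values passed for nlambda and nfolds simply mirror the defaults listed under Usage.

```r
library(glmnet)  # assumption: glmnet is installed; lasso_vars() may use another engine

set.seed(123)  # fold assignment is random, so fix the seed for reproducibility

# Predictors as a numeric matrix, response as a vector
x <- as.matrix(mtcars[, setdiff(names(mtcars), "mpg")])
y <- mtcars$mpg

# alpha = 1 selects the Lasso penalty; glmnet standardizes predictors by default
cv <- cv.glmnet(x, y, alpha = 1, nlambda = 100, nfolds = 10)

# Coefficients at the lambda with the lowest cross-validated error
b <- as.matrix(coef(cv, s = "lambda.min"))

# Keep only predictors whose coefficient was not shrunk to exactly zero
selected <- b[b[, 1] != 0 & rownames(b) != "(Intercept)", , drop = FALSE]
print(selected)
```

The predictors that survive with nonzero coefficients are the ones the Lasso considers relevant. Note that glmnet reports coefficients on the original scale of each variable, so their magnitudes are not directly comparable across predictors.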