Use Lasso regression to identify the variables most relevant for
predicting another variable. You might want to compare the results
with corr_var() and/or x2y() to complement
the analysis. No need to standardize, center, or scale your data.
Tidyverse friendly.
Usage
lasso_vars(
df,
variable,
ignore = NULL,
nlambdas = 100,
nfolds = 10,
top = 20,
quiet = FALSE,
seed = 123,
...
)
Arguments
- df
Dataframe. Any dataframe is valid, as
ohse will be applied to process categorical values, and values will be standardized automatically.
- variable
Variable. Dependent variable or response.
- ignore
Character vector. Variables to exclude from study.
- nlambdas
Integer. Number of lambdas to be used in a search.
- nfolds
Integer. Number of folds for K-fold cross-validation (>= 2).
- top
Integer. Plot top n results only.
- quiet
Boolean. Keep quiet? If not, informative messages will be shown.
- seed
Numeric. Random seed, for reproducible results.
- ...
Additional parameters passed to
ohse().
Value
List. Contains lasso model coefficients, performance metrics, the actual model fitted and a plot.
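The call below is a minimal sketch of how the arguments above fit together, not an official example. It assumes that the lares package is installed, that it ships a Titanic dataset named dft with Survived and Cabin columns, and that the model is fitted through the h2o package. Because the names of the returned list elements are not spelled out here, the sketch inspects them with names() rather than assuming them.

```r
# Minimal sketch under stated assumptions (see the paragraph above):
# - lares provides lasso_vars() and a Titanic dataset called `dft`
# - `Survived` and `Cabin` are columns of `dft`
# - h2o is the modelling backend and must be installed
if (requireNamespace("lares", quietly = TRUE) &&
    requireNamespace("h2o", quietly = TRUE)) {
  library(lares)
  data(dft)

  m <- lasso_vars(
    dft,
    Survived,            # dependent variable, passed unquoted
    ignore = c("Cabin"), # variables to exclude from the study
    nlambdas = 100,      # number of lambdas to search
    nfolds = 10,         # K-fold cross-validation
    top = 20,            # plot only the top 20 results
    seed = 123           # fixed seed for reproducibility
  )

  # The result is a list; check what it contains before using it
  print(names(m))
} else {
  message("Install 'lares' and 'h2o' to run this sketch.")
}
```

Keeping seed fixed while changing nfolds or nlambdas makes it easier to tell whether a change in the selected variables comes from the settings rather than from random fold assignment.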
See also
Other Machine Learning:
ROC(),
conf_mat(),
export_results(),
gain_lift(),
h2o_automl(),
h2o_predict_MOJO(),
h2o_selectmodel(),
impute(),
iter_seeds(),
model_metrics(),
model_preprocess(),
msplit()
Other Exploratory:
corr_cross(),
corr_var(),
crosstab(),
df_str(),
distr(),
freqs(),
freqs_df(),
freqs_list(),
freqs_plot(),
missingness(),
plot_cats(),
plot_df(),
plot_nums(),
tree_var()
