Use Lasso regression to identify the most relevant variables that
can predict/identify another variable. You might want to compare
with corr_var()
and/or x2y()
results to complement
the analysis. No need to standardize, center or scale your data.
Tidyverse friendly.
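For instance, a minimal sketch of that side-by-side workflow (attaching lares, the mtcars data, and the mpg target are illustrative assumptions, not part of this page):

library(lares)

# Rank predictors of mpg via Lasso, then cross-check with plain correlations
lasso_vars(mtcars, mpg, top = 10)
corr_var(mtcars, mpg, top = 10)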
Usage
lasso_vars(
df,
variable,
ignore = NULL,
nlambdas = 100,
nfolds = 10,
top = 20,
quiet = FALSE,
seed = 123,
...
)
Arguments
- df
Dataframe. Any dataframe is valid as ohse() will be applied to process categorical values, and values will be standardized automatically.
- variable
Variable. Dependent variable or response.
- ignore
Character vector. Variables to exclude from study.
- nlambdas
Integer. Number of lambdas to be used in the search.
- nfolds
Integer. Number of folds for K-fold cross-validation (>= 2).
- top
Integer. Plot top n results only.
- quiet
Boolean. Keep quiet? Else, show messages.
- seed
Numeric. Seed for reproducible results.
- ...
Additional parameters passed to
ohse()
.
Value
List. Contains the Lasso model coefficients, performance metrics, the fitted model, and a plot.
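A hedged sketch of inspecting that return value (the element names below are guesses based on the Value description, not confirmed by this page; the dataset and target are illustrative):

library(lares)

m <- lasso_vars(mtcars, mpg, nfolds = 5, top = 10, seed = 1)

# Assumed element names, mirroring the Value description above:
m$coef     # Lasso coefficients per variable
m$metrics  # cross-validated performance metrics
m$model    # the fitted model
m$plot     # plot of the top variables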
See also
Other Machine Learning:
ROC(), conf_mat(), export_results(), gain_lift(), h2o_automl(), h2o_predict_MOJO(), h2o_selectmodel(), impute(), iter_seeds(), model_metrics(), model_preprocess(), msplit()

Other Exploratory:
corr_cross(), corr_var(), crosstab(), df_str(), distr(), freqs(), freqs_df(), freqs_list(), freqs_plot(), missingness(), plot_cats(), plot_df(), plot_nums(), tree_var()