This function lets the user create a robust and fast model using H2O's AutoML function. The result is a list with the best model, its parameters, the datasets used, performance metrics, variable importance, and plots. Read more about the h2o_automl() pipeline here.

h2o_automl(
  df,
  y = "tag",
  ignore = NULL,
  train_test = NA,
  split = 0.7,
  weight = NULL,
  target = "auto",
  balance = FALSE,
  impute = FALSE,
  no_outliers = TRUE,
  unique_train = TRUE,
  center = FALSE,
  scale = FALSE,
  thresh = 10,
  seed = 0,
  nfolds = 5,
  max_models = 3,
  max_time = 10 * 60,
  start_clean = FALSE,
  exclude_algos = c("StackedEnsemble", "DeepLearning"),
  include_algos = NULL,
  plots = TRUE,
  alarm = TRUE,
  quiet = FALSE,
  print = TRUE,
  save = FALSE,
  subdir = NA,
  project = "AutoML Results",
  ...
)

# S3 method for h2o_automl
plot(x, ...)

# S3 method for h2o_automl
print(x, importance = TRUE, ...)

Arguments

df

Dataframe. Dataframe containing all your data, including the independent variable, labeled as 'tag'. If you want to use a different column as the target, set the y parameter.

y

Variable or Character. Name of the independent variable. It may be passed unquoted or as a character string (see Examples).

ignore

Character vector. Columns to force the model to ignore.

train_test

Character. Optionally, the name of a column in df containing 'train' and 'test' values, used to split the data.

split

Numeric. Value between 0 and 1 for the train/test split; the value sets the proportion used for training. Set to 1 to train with all available data and test with the same data (cross-validation will still be used when training). If train_test is set, this value will be overwritten by its actual split rate.
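
For instance, a minimal sketch of both splitting approaches (df and my_split are illustrative names, not from this page):

# random 80/20 split handled internally
r <- h2o_automl(df, y = "tag", split = 0.8)
# or pre-assign rows yourself; 'split' is then overwritten
df$my_split <- ifelse(runif(nrow(df)) < 0.8, "train", "test")
r <- h2o_automl(df, y = "tag", train_test = "my_split")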

weight

Column with observation weights. Giving an observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed.
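
A hypothetical sketch, assuming (as with train_test) a column name is passed; the w and VIP columns are illustrative only:

df$w <- ifelse(df$VIP, 2, 1)  # hypothetical: count VIP rows twice
r <- h2o_automl(df, y = "tag", weight = "w")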

target

Value. Which is your positive target value? If set to 'auto', the class with the largest mean(score) will be selected. Set a value to override. Only used in binary classification models.
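
For a binary model you can set the positive class explicitly, as the first Example below does with the Titanic data:

r <- h2o_automl(dft, y = Survived, target = "TRUE")  # treat TRUE as positive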

balance

Boolean. Auto-balance train dataset with under-sampling?

impute

Boolean. Fill NA values with MICE?

no_outliers

Boolean/Numeric. Remove y's outliers from the dataset? Values farther than n standard deviations from the independent variable's mean (Z-score) will be removed. Set to TRUE for the default multiplier (3), or to a numeric value to use a different one.
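
A sketch using the regression Example's Fare target:

r <- h2o_automl(dft, y = "Fare", no_outliers = TRUE)  # default: 3 SDs
r <- h2o_automl(dft, y = "Fare", no_outliers = 2)     # stricter: 2 SDs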

unique_train

Boolean. Keep only unique row observations for training data?

center, scale

Boolean. Using the base function scale, do you wish to center and/or scale all numerical values?

thresh

Integer. Threshold for choosing between classification and regression models: the number of unique values allowed in 'tag' (more than thresh: regression; fewer: classification).
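
For instance, a sketch with a hypothetical numeric target rating holding 12 unique values:

# 12 unique values > default thresh (10), so a regression is fit;
# raising the threshold forces a classification model instead
r <- h2o_automl(df, y = "rating", thresh = 15)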

seed

Integer. Set a seed for reproducibility. AutoML can only guarantee reproducibility when max_models is used to bound the run, because max_time depends on available computing resources.

nfolds

Number of folds for k-fold cross-validation. Must be >= 2; defaults to 5. Use 0 to disable cross-validation; this will also disable Stacked Ensemble (thus decreasing the overall model performance).

max_models, max_time

Numeric. Maximum number of models and maximum number of seconds the function should iterate for. Note that max_models guarantees reproducibility while max_time does not (because it depends entirely on your machine's computational characteristics).
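
A reproducible run therefore bounds the iteration by models rather than time, e.g.:

# both runs should select the same leader model
r1 <- h2o_automl(dft, y = "Fare", max_models = 3, seed = 123)
r2 <- h2o_automl(dft, y = "Fare", max_models = 3, seed = 123)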

start_clean

Boolean. Erase everything in the current h2o instance before training starts? You may or may not want to keep previously trained models. To group results into a custom common AutoML project, use the project_name argument.

exclude_algos, include_algos

Vector of character strings. Algorithms to skip or include during the model-building phase. Set to NULL to ignore. When both are defined, only include_algos is used.
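
For example (the first call mirrors the defaults):

r <- h2o_automl(df, y = "tag", exclude_algos = c("StackedEnsemble", "DeepLearning"))
# or whitelist algorithms; include_algos takes precedence when both are set
r <- h2o_automl(df, y = "tag", include_algos = c("GBM", "DRF"))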

plots

Boolean. Create plot objects?

alarm

Boolean. Ping (sound) when done. Requires the beepr package.

quiet

Boolean. Suppress all messages, warnings, and recommendations?

print

Boolean. Print summary when process ends?

save

Boolean. Do you wish to save/export results into your working directory?

subdir

Character. In which directory do you wish to save the results? Defaults to the working directory.

project

Character. Your project's name.

...

Additional parameters passed to h2o::h2o.automl.

x

An h2o_automl object.

importance

Boolean. Print important variables?

Value

List. Trained model, predicted scores, the datasets used, performance metrics, parameters, importance data.frame, seed, and plots when plots = TRUE.
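
A sketch of how these elements are typically accessed (the names match lapply(r, names) in the Examples below):

r$model            # trained H2O leader model
r$metrics$metrics  # performance metrics
r$importance       # variable importance data.frame
r$plots$dashboard  # available when plots = TRUE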

List of algorithms

-> Read more here

DRF

Distributed Random Forest, including Random Forest (RF) and Extremely Randomized Trees (XRT)

GLM

Generalized Linear Model

XGBoost

eXtreme Gradient Boosting

GBM

Gradient Boosting Machine

DeepLearning

Fully-connected multi-layer artificial neural network

StackedEnsemble

Stacked Ensemble

Methods

print

Use the print method to print model stats and summary

plot

Use the plot method to plot results using mplot_full()
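
For instance, once a model r has been trained:

print(r, importance = FALSE)  # stats and summary, without variable importances
plot(r)                       # full results dashboard via mplot_full()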

Examples

# \donttest{
# CRAN
data(dft) # Titanic dataset
dft <- subset(dft, select = -c(Ticket, PassengerId, Cabin))

# Classification: Binomial - 2 Classes
r <- h2o_automl(dft, y = Survived, max_models = 1, impute = FALSE, target = "TRUE", alarm = FALSE)
#> 2022-01-27 15:56:23 | Started process...
#> - INDEPENDENT VARIABLE: Survived
#> - MODEL TYPE: Classification
#> # A tibble: 2 × 5
#>   tag       n     p order  pcum
#>   <lgl> <int> <dbl> <int> <dbl>
#> 1 FALSE   549  61.6     1  61.6
#> 2 TRUE    342  38.4     2 100  
#> - MISSINGS: The following variables contain missing observations: Age (19.87%). Consider using the impute parameter.
#> - CATEGORICALS: There are 3 non-numerical features. Consider using ohse() or equivalent prior to encode categorical variables.
#> >>> Splitting data: train = 0.7 & test = 0.3
#> train_size  test_size 
#>        623        268 
#> - REPEATED: There were 57 repeated rows which are being suppressed from the train dataset
#> - ALGORITHMS: excluded 'StackedEnsemble', 'DeepLearning'
#> - CACHE: Previous models are not being erased. You may use 'start_clean' [clear] or 'project_name' [join]
#> - UI: You may check results using H2O Flow's interactive platform: http://localhost:54321/flow/index.html
#> >>> Iterating until 1 models or 600 seconds...
#>   |======================================================================| 100%
#> 15:56:29.573: Project: AutoML_1_20220127_155629
#> 15:56:29.579: Setting stopping tolerance adaptively based on the training frame: 0.0400641540107502
#> 15:56:29.579: Build control seed: 0
#> 15:56:29.580: training frame: Frame key: AutoML_1_20220127_155629_training_train_sid_a6cb_1    cols: 8    rows: 623  chunks: 1    size: 9336  checksum: 375468970279251540
#> 15:56:29.580: validation frame: NULL
#> 15:56:29.580: leaderboard frame: NULL
#> 15:56:29.580: blending frame: NULL
#> 15:56:29.580: response column: tag
#> 15:56:29.581: fold column: null
#> 15:56:29.581: weights column: null
#> 15:56:29.591: Loading execution steps: [{XGBoost : [def_2 (1g, 10w), def_1 (2g, 10w), def_3 (3g, 10w), grid_1 (4g, 90w), lr_search (6g, 30w)]}, {GLM : [def_1 (1g, 10w)]}, {DRF : [def_1 (2g, 10w), XRT (3g, 10w)]}, {GBM : [def_5 (1g, 10w), def_2 (2g, 10w), def_3 (2g, 10w), def_4 (2g, 10w), def_1 (3g, 10w), grid_1 (4g, 60w), lr_annealing (6g, 10w)]}, {DeepLearning : [def_1 (3g, 10w), grid_1 (4g, 30w), grid_2 (5g, 30w), grid_3 (5g, 30w)]}, {completion : [resume_best_grids (10g, 60w)]}, {StackedEnsemble : [best_of_family_1 (1g, 5w), best_of_family_2 (2g, 5w), best_of_family_3 (3g, 5w), best_of_family_4 (4g, 5w), best_of_family_5 (5g, 5w), all_2 (2g, 10w), all_3 (3g, 10w), all_4 (4g, 10w), all_5 (5g, 10w), monotonic (6g, 10w), best_of_family_xgboost (6g, 10w), best_of_family_gbm (6g, 10w), all_xgboost (7g, 10w), all_gbm (7g, 10w), best_of_family_xglm (8g, 10w), all_xglm (8g, 10w), best_of_family (10g, 10w), best_N (10g, 10w)]}]
#> 15:56:29.625: Disabling Algo: StackedEnsemble as requested by the user.
#> 15:56:29.625: Disabling Algo: DeepLearning as requested by the user.
#> 15:56:29.625: Defined work allocations: [Work{def_2, XGBoost, ModelBuild, group=1, weight=10}, Work{def_1, GLM, ModelBuild, group=1, weight=10}, Work{def_5, GBM, ModelBuild, group=1, weight=10}, Work{def_1, XGBoost, ModelBuild, group=2, weight=10}, Work{def_1, DRF, ModelBuild, group=2, weight=10}, Work{def_2, GBM, ModelBuild, group=2, weight=10}, Work{def_3, GBM, ModelBuild, group=2, weight=10}, Work{def_4, GBM, ModelBuild, group=2, weight=10}, Work{def_3, XGBoost, ModelBuild, group=3, weight=10}, Work{XRT, DRF, ModelBuild, group=3, weight=10}, Work{def_1, GBM, ModelBuild, group=3, weight=10}, Work{grid_1, XGBoost, HyperparamSearch, group=4, weight=90}, Work{grid_1, GBM, HyperparamSearch, group=4, weight=60}, Work{lr_search, XGBoost, Selection, group=6, weight=30}, Work{lr_annealing, GBM, Selection, group=6, weight=10}, Work{resume_best_grids, virtual, Dynamic, group=10, weight=60}]
#> 15:56:29.625: Actual work allocations: [Work{def_2, XGBoost, ModelBuild, group=1, weight=10}, Work{def_1, GLM, ModelBuild, group=1, weight=10}, Work{def_5, GBM, ModelBuild, group=1, weight=10}, Work{def_1, XGBoost, ModelBuild, group=2, weight=10}, Work{def_1, DRF, ModelBuild, group=2, weight=10}, Work{def_2, GBM, ModelBuild, group=2, weight=10}, Work{def_3, GBM, ModelBuild, group=2, weight=10}, Work{def_4, GBM, ModelBuild, group=2, weight=10}, Work{def_3, XGBoost, ModelBuild, group=3, weight=10}, Work{XRT, DRF, ModelBuild, group=3, weight=10}, Work{def_1, GBM, ModelBuild, group=3, weight=10}, Work{grid_1, XGBoost, HyperparamSearch, group=4, weight=90}, Work{grid_1, GBM, HyperparamSearch, group=4, weight=60}, Work{lr_search, XGBoost, Selection, group=6, weight=30}, Work{lr_annealing, GBM, Selection, group=6, weight=10}, Work{resume_best_grids, virtual, Dynamic, group=10, weight=60}]
#> 15:56:29.626: AutoML job created: 2022.01.27 15:56:29.547
#> 15:56:29.627: AutoML build started: 2022.01.27 15:56:29.627
#> 15:56:29.631: Time assigned for XGBoost_1_AutoML_1_20220127_155629: 199.999s
#> 15:56:29.634: AutoML: starting XGBoost_1_AutoML_1_20220127_155629 model training
#> 15:56:29.715: XGBoost_1_AutoML_1_20220127_155629 [XGBoost def_2] started
#> 15:56:31.814: XGBoost_1_AutoML_1_20220127_155629 [XGBoost def_2] complete
#> 15:56:31.815: Adding model XGBoost_1_AutoML_1_20220127_155629 to leaderboard Leaderboard_AutoML_1_20220127_155629@@tag. Training time: model=0s, total=1s
#> 15:56:31.836: New leader: XGBoost_1_AutoML_1_20220127_155629, auc: 0.8445152383986407
#> 15:56:31.836: AutoML: hit the max_models limit; skipping GLM def_1
#> 15:56:31.836: AutoML: hit the max_models limit; skipping GBM def_5
#> 15:56:31.837: Skipping StackedEnsemble 'best_of_family_1' due to the exclude_algos option or it is already trained.
#> 15:56:31.837: AutoML: hit the max_models limit; skipping XGBoost def_1
#> 15:56:31.837: AutoML: hit the max_models limit; skipping DRF def_1
#> 15:56:31.837: AutoML: hit the max_models limit; skipping GBM def_2
#> 15:56:31.837: AutoML: hit the max_models limit; skipping GBM def_3
#> 15:56:31.837: AutoML: hit the max_models limit; skipping GBM def_4
#> 15:56:31.837: Skipping StackedEnsemble 'best_of_family_2' due to the exclude_algos option or it is already trained.
#> 15:56:31.838: Skipping StackedEnsemble 'all_2' due to the exclude_algos option or it is already trained.
#> 15:56:31.838: AutoML: hit the max_models limit; skipping XGBoost def_3
#> 15:56:31.838: AutoML: hit the max_models limit; skipping DRF XRT (Extremely Randomized Trees)
#> 15:56:31.838: AutoML: hit the max_models limit; skipping GBM def_1
#> 15:56:31.838: AutoML: hit the max_models limit; skipping DeepLearning def_1
#> 15:56:31.838: Skipping StackedEnsemble 'best_of_family_3' due to the exclude_algos option or it is already trained.
#> 15:56:31.838: Skipping StackedEnsemble 'all_3' due to the exclude_algos option or it is already trained.
#> 15:56:31.838: AutoML: hit the max_models limit; skipping XGBoost grid_1
#> 15:56:31.838: AutoML: hit the max_models limit; skipping GBM grid_1
#> 15:56:31.838: AutoML: hit the max_models limit; skipping DeepLearning grid_1
#> 15:56:31.839: Skipping StackedEnsemble 'best_of_family_4' due to the exclude_algos option or it is already trained.
#> 15:56:31.839: Skipping StackedEnsemble 'all_4' due to the exclude_algos option or it is already trained.
#> 15:56:31.839: AutoML: hit the max_models limit; skipping DeepLearning grid_2
#> 15:56:31.839: AutoML: hit the max_models limit; skipping DeepLearning grid_3
#> 15:56:31.839: Skipping StackedEnsemble 'best_of_family_5' due to the exclude_algos option or it is already trained.
#> 15:56:31.839: Skipping StackedEnsemble 'all_5' due to the exclude_algos option or it is already trained.
#> 15:56:31.840: AutoML: hit the max_models limit; skipping XGBoost lr_search
#> 15:56:31.840: AutoML: hit the max_models limit; skipping GBM lr_annealing
#> 15:56:31.840: Skipping StackedEnsemble 'monotonic' due to the exclude_algos option or it is already trained.
#> 15:56:31.841: Skipping StackedEnsemble 'best_of_family_xgboost' due to the exclude_algos option or it is already trained.
#> 15:56:31.841: Skipping StackedEnsemble 'best_of_family_gbm' due to the exclude_algos option or it is already trained.
#> 15:56:31.841: Skipping StackedEnsemble 'all_xgboost' due to the exclude_algos option or it is already trained.
#> 15:56:31.841: Skipping StackedEnsemble 'all_gbm' due to the exclude_algos option or it is already trained.
#> 15:56:31.842: Skipping StackedEnsemble 'best_of_family_xglm' due to the exclude_algos option or it is already trained.
#> 15:56:31.842: Skipping StackedEnsemble 'all_xglm' due to the exclude_algos option or it is already trained.
#> 15:56:31.842: AutoML: hit the max_models limit; skipping completion resume_best_grids
#> 15:56:31.842: Skipping StackedEnsemble 'best_of_family' due to the exclude_algos option or it is already trained.
#> 15:56:31.845: Skipping StackedEnsemble 'best_N' due to the exclude_algos option or it is already trained.
#> 15:56:31.845: Actual modeling steps: [{XGBoost : [def_2 (1g, 10w)]}]
#> 15:56:31.845: AutoML build stopped: 2022.01.27 15:56:31.845
#> 15:56:31.846: AutoML build done: built 1 models
#> 15:56:31.846: AutoML duration:  2.218 sec
#> 15:56:31.857: Verifying training frame immutability. . .
#> 15:56:31.857: Training frame was not mutated (as expected).
#> - EUREKA: Succesfully generated 1 models
#>                             model_id       auc   logloss     aucpr
#> 1 XGBoost_1_AutoML_1_20220127_155629 0.8445152 0.4667281 0.8209007
#>   mean_per_class_error     rmse       mse
#> 1            0.2133057 0.385946 0.1489543
#> SELECTED MODEL: XGBoost_1_AutoML_1_20220127_155629
#> - NOTE: The following variables were the least important: SibSp, Embarked.S, Parch, Pclass.2, Embarked.C
#> >>> Running predictions for Survived...
#>   |======================================================================| 100%
#>   |======================================================================| 100%
#> Warning: Test/Validation dataset column 'Embarked' has levels not trained on: [""]
#> Target value: TRUE
#> >>> Generating plots...
#> Model (1/1): XGBoost_1_AutoML_1_20220127_155629
#> Independent Variable: Survived
#> Type: Classification (2 classes)
#> Algorithm: XGBOOST
#> Split: 70% training data (of 891 observations)
#> Seed: 0
#> 
#> Test metrics:
#>    AUC = 0.87173
#>    ACC = 0.14925
#>    PRC = 0.125
#>    TPR = 0.28571
#>    TNR = 0.086957
#> 
#> Most important variables:
#>    Sex.female (25.8%)
#>    Age (18.4%)
#>    Fare (16.2%)
#>    Sex.male (16.1%)
#>    Pclass.3 (13.2%)
#> Process duration: 18.6s

# Let's see all the stuff we have inside:
lapply(r, names)
#> $model
#> NULL
#> 
#> $y
#> NULL
#> 
#> $scores_test
#> [1] "tag"    "scores"
#> 
#> $metrics
#> [1] "dictionary"       "confusion_matrix" "gain_lift"        "metrics"         
#> [5] "cv_metrics"       "max_metrics"     
#> 
#> $parameters
#>  [1] "model_id"                          "training_frame"                   
#>  [3] "nfolds"                            "keep_cross_validation_models"     
#>  [5] "keep_cross_validation_predictions" "fold_assignment"                  
#>  [7] "stopping_tolerance"                "seed"                             
#>  [9] "distribution"                      "categorical_encoding"             
#> [11] "ntrees"                            "max_depth"                        
#> [13] "min_rows"                          "min_child_weight"                 
#> [15] "sample_rate"                       "subsample"                        
#> [17] "col_sample_rate"                   "colsample_bylevel"                
#> [19] "col_sample_rate_per_tree"          "colsample_bytree"                 
#> [21] "score_tree_interval"               "tree_method"                      
#> [23] "dmatrix_type"                      "backend"                          
#> [25] "x"                                 "y"                                
#> 
#> $importance
#> [1] "variable"            "relative_importance" "scaled_importance"  
#> [4] "importance"         
#> 
#> $datasets
#> [1] "global" "test"  
#> 
#> $scoring_history
#> [1] "timestamp"                     "duration"                     
#> [3] "number_of_trees"               "training_rmse"                
#> [5] "training_logloss"              "training_auc"                 
#> [7] "training_pr_auc"               "training_lift"                
#> [9] "training_classification_error"
#> 
#> $categoricals
#> [1] "Pclass"     "Sex"        "Embarked"   "train_test" "predict"   
#> 
#> $type
#> NULL
#> 
#> $split
#> NULL
#> 
#> $threshold
#> NULL
#> 
#> $model_name
#> NULL
#> 
#> $algorithm
#> NULL
#> 
#> $leaderboard
#> [1] "model_id"             "auc"                  "logloss"             
#> [4] "aucpr"                "mean_per_class_error" "rmse"                
#> [7] "mse"                 
#> 
#> $project
#> NULL
#> 
#> $seed
#> NULL
#> 
#> $h2o
#> NULL
#> 
#> $plots
#> [1] "dashboard"  "metrics"    "importance"
#> 

# Classification: Multi-Categorical - 3 Classes
r <- h2o_automl(dft, Pclass, ignore = c("Fare", "Cabin"), max_time = 30, plots = FALSE)
#> 2022-01-27 15:56:42 | Started process...
#> - INDEPENDENT VARIABLE: Pclass
#> - MODEL TYPE: Classification
#> # A tibble: 3 × 5
#>   tag       n     p order  pcum
#>   <fct> <int> <dbl> <int> <dbl>
#> 1 n_3     491  55.1     1  55.1
#> 2 n_1     216  24.2     2  79.4
#> 3 n_2     184  20.6     3 100  
#> - MISSINGS: The following variables contain missing observations: Age (19.87%). Consider using the impute parameter.
#> - CATEGORICALS: There are 3 non-numerical features. Consider using ohse() or equivalent prior to encode categorical variables.
#> >>> Splitting data: train = 0.7 & test = 0.3
#> train_size  test_size 
#>        623        268 
#> - REPEATED: There were 65 repeated rows which are being suppressed from the train dataset
#> - ALGORITHMS: excluded 'StackedEnsemble', 'DeepLearning'
#> - CACHE: Previous models are not being erased. You may use 'start_clean' [clear] or 'project_name' [join]
#> - UI: You may check results using H2O Flow's interactive platform: http://localhost:54321/flow/index.html
#> >>> Iterating until 3 models or 30 seconds...
#> 
#> 15:56:42.810: Project: AutoML_2_20220127_155642
#> 15:56:42.811: Setting stopping tolerance adaptively based on the training frame: 0.0400641540107502
#> 15:56:42.811: Build control seed: 0
#> 15:56:42.811: training frame: Frame key: AutoML_2_20220127_155642_training_train_sid_9574_82    cols: 8    rows: 623  chunks: 1    size: 9412  checksum: -6266075352297987636
#> 15:56:42.811: validation frame: NULL
#> 15:56:42.811: leaderboard frame: NULL
#> 15:56:42.811: blending frame: NULL
#> 15:56:42.812: response column: tag
#> 15:56:42.812: fold column: null
#> 15:56:42.812: weights column: null
#> 15:56:42.812: Loading execution steps: [{XGBoost : [def_2 (1g, 10w), def_1 (2g, 10w), def_3 (3g, 10w), grid_1 (4g, 90w), lr_search (6g, 30w)]}, {GLM : [def_1 (1g, 10w)]}, {DRF : [def_1 (2g, 10w), XRT (3g, 10w)]}, {GBM : [def_5 (1g, 10w), def_2 (2g, 10w), def_3 (2g, 10w), def_4 (2g, 10w), def_1 (3g, 10w), grid_1 (4g, 60w), lr_annealing (6g, 10w)]}, {DeepLearning : [def_1 (3g, 10w), grid_1 (4g, 30w), grid_2 (5g, 30w), grid_3 (5g, 30w)]}, {completion : [resume_best_grids (10g, 60w)]}, {StackedEnsemble : [best_of_family_1 (1g, 5w), best_of_family_2 (2g, 5w), best_of_family_3 (3g, 5w), best_of_family_4 (4g, 5w), best_of_family_5 (5g, 5w), all_2 (2g, 10w), all_3 (3g, 10w), all_4 (4g, 10w), all_5 (5g, 10w), monotonic (6g, 10w), best_of_family_xgboost (6g, 10w), best_of_family_gbm (6g, 10w), all_xgboost (7g, 10w), all_gbm (7g, 10w), best_of_family_xglm (8g, 10w), all_xglm (8g, 10w), best_of_family (10g, 10w), best_N (10g, 10w)]}]
#> 15:56:42.814: Disabling Algo: StackedEnsemble as requested by the user.
#> 15:56:42.814: Disabling Algo: DeepLearning as requested by the user.
#> 15:56:42.814: Defined work allocations: [Work{def_2, XGBoost, ModelBuild, group=1, weight=10}, Work{def_1, GLM, ModelBuild, group=1, weight=10}, Work{def_5, GBM, ModelBuild, group=1, weight=10}, Work{def_1, XGBoost, ModelBuild, group=2, weight=10}, Work{def_1, DRF, ModelBuild, group=2, weight=10}, Work{def_2, GBM, ModelBuild, group=2, weight=10}, Work{def_3, GBM, ModelBuild, group=2, weight=10}, Work{def_4, GBM, ModelBuild, group=2, weight=10}, Work{def_3, XGBoost, ModelBuild, group=3, weight=10}, Work{XRT, DRF, ModelBuild, group=3, weight=10}, Work{def_1, GBM, ModelBuild, group=3, weight=10}, Work{grid_1, XGBoost, HyperparamSearch, group=4, weight=90}, Work{grid_1, GBM, HyperparamSearch, group=4, weight=60}, Work{lr_search, XGBoost, Selection, group=6, weight=30}, Work{lr_annealing, GBM, Selection, group=6, weight=10}, Work{resume_best_grids, virtual, Dynamic, group=10, weight=60}]
#> 15:56:42.814: Actual work allocations: [Work{def_2, XGBoost, ModelBuild, group=1, weight=10}, Work{def_1, GLM, ModelBuild, group=1, weight=10}, Work{def_5, GBM, ModelBuild, group=1, weight=10}, Work{def_1, XGBoost, ModelBuild, group=2, weight=10}, Work{def_1, DRF, ModelBuild, group=2, weight=10}, Work{def_2, GBM, ModelBuild, group=2, weight=10}, Work{def_3, GBM, ModelBuild, group=2, weight=10}, Work{def_4, GBM, ModelBuild, group=2, weight=10}, Work{def_3, XGBoost, ModelBuild, group=3, weight=10}, Work{XRT, DRF, ModelBuild, group=3, weight=10}, Work{def_1, GBM, ModelBuild, group=3, weight=10}, Work{grid_1, XGBoost, HyperparamSearch, group=4, weight=90}, Work{grid_1, GBM, HyperparamSearch, group=4, weight=60}, Work{lr_search, XGBoost, Selection, group=6, weight=30}, Work{lr_annealing, GBM, Selection, group=6, weight=10}, Work{resume_best_grids, virtual, Dynamic, group=10, weight=60}]
#> 15:56:42.814: AutoML job created: 2022.01.27 15:56:42.810
#> 15:56:42.816: AutoML build started: 2022.01.27 15:56:42.816
#> 15:56:42.817: Time assigned for XGBoost_1_AutoML_2_20220127_155642: 10.0s
#> 15:56:42.817: AutoML: starting XGBoost_1_AutoML_2_20220127_155642 model training
#> 15:56:42.836: XGBoost_1_AutoML_2_20220127_155642 [XGBoost def_2] started
#> 15:56:43.894: XGBoost_1_AutoML_2_20220127_155642 [XGBoost def_2] complete
#> 15:56:43.894: Adding model XGBoost_1_AutoML_2_20220127_155642 to leaderboard Leaderboard_AutoML_2_20220127_155642@@tag. Training time: model=0s, total=0s
#> 15:56:43.896: New leader: XGBoost_1_AutoML_2_20220127_155642, mean_per_class_error: 0.4946035431716546
#> 15:56:43.897: Time assigned for GLM_1_AutoML_2_20220127_155642: 14.4595s
#> 15:56:43.899: AutoML: starting GLM_1_AutoML_2_20220127_155642 model training
#> 15:56:43.918: GLM_1_AutoML_2_20220127_155642 [GLM def_1] started
#> 15:56:50.131: GLM_1_AutoML_2_20220127_155642 [GLM def_1] complete
#> 15:56:50.131: Adding model GLM_1_AutoML_2_20220127_155642 to leaderboard Leaderboard_AutoML_2_20220127_155642@@tag. Training time: model=1s, total=5s
#> 15:56:50.133: New leader: GLM_1_AutoML_2_20220127_155642, mean_per_class_error: 0.47423245614035087
#> 15:56:50.133: Time assigned for GBM_1_AutoML_2_20220127_155642: 22.683s
#> 15:56:50.135: AutoML: starting GBM_1_AutoML_2_20220127_155642 model training
#> 15:56:50.136: GBM_1_AutoML_2_20220127_155642 [GBM def_5] started
#> 15:56:51.208: GBM_1_AutoML_2_20220127_155642 [GBM def_5] complete
#> 15:56:51.208: Adding model GBM_1_AutoML_2_20220127_155642 to leaderboard Leaderboard_AutoML_2_20220127_155642@@tag. Training time: model=0s, total=1s
#> 15:56:51.209: Skipping StackedEnsemble 'best_of_family_1' due to the exclude_algos option or it is already trained.
#> 15:56:51.210: AutoML: hit the max_models limit; skipping XGBoost def_1
#> 15:56:51.210: AutoML: hit the max_models limit; skipping DRF def_1
#> 15:56:51.210: AutoML: hit the max_models limit; skipping GBM def_2
#> 15:56:51.210: AutoML: hit the max_models limit; skipping GBM def_3
#> 15:56:51.210: AutoML: hit the max_models limit; skipping GBM def_4
#> 15:56:51.210: Skipping StackedEnsemble 'best_of_family_2' due to the exclude_algos option or it is already trained.
#> 15:56:51.210: Skipping StackedEnsemble 'all_2' due to the exclude_algos option or it is already trained.
#> 15:56:51.210: AutoML: hit the max_models limit; skipping XGBoost def_3
#> 15:56:51.210: AutoML: hit the max_models limit; skipping DRF XRT (Extremely Randomized Trees)
#> 15:56:51.210: AutoML: hit the max_models limit; skipping GBM def_1
#> 15:56:51.210: AutoML: hit the max_models limit; skipping DeepLearning def_1
#> 15:56:51.210: Skipping StackedEnsemble 'best_of_family_3' due to the exclude_algos option or it is already trained.
#> 15:56:51.210: Skipping StackedEnsemble 'all_3' due to the exclude_algos option or it is already trained.
#> 15:56:51.211: AutoML: hit the max_models limit; skipping XGBoost grid_1
#> 15:56:51.211: AutoML: hit the max_models limit; skipping GBM grid_1
#> 15:56:51.211: AutoML: hit the max_models limit; skipping DeepLearning grid_1
#> 15:56:51.211: Skipping StackedEnsemble 'best_of_family_4' due to the exclude_algos option or it is already trained.
#> 15:56:51.211: Skipping StackedEnsemble 'all_4' due to the exclude_algos option or it is already trained.
#> 15:56:51.211: AutoML: hit the max_models limit; skipping DeepLearning grid_2
#> 15:56:51.211: AutoML: hit the max_models limit; skipping DeepLearning grid_3
#> 15:56:51.211: Skipping StackedEnsemble 'best_of_family_5' due to the exclude_algos option or it is already trained.
#> 15:56:51.211: Skipping StackedEnsemble 'all_5' due to the exclude_algos option or it is already trained.
#> 15:56:51.212: AutoML: hit the max_models limit; skipping XGBoost lr_search
#> 15:56:51.212: AutoML: hit the max_models limit; skipping GBM lr_annealing
#> 15:56:51.212: Skipping StackedEnsemble 'monotonic' due to the exclude_algos option or it is already trained.
#> 15:56:51.212: Skipping StackedEnsemble 'best_of_family_xgboost' due to the exclude_algos option or it is already trained.
#> 15:56:51.212: Skipping StackedEnsemble 'best_of_family_gbm' due to the exclude_algos option or it is already trained.
#> 15:56:51.212: Skipping StackedEnsemble 'all_xgboost' due to the exclude_algos option or it is already trained.
#> 15:56:51.213: Skipping StackedEnsemble 'all_gbm' due to the exclude_algos option or it is already trained.
#> 15:56:51.213: Skipping StackedEnsemble 'best_of_family_xglm' due to the exclude_algos option or it is already trained.
#> 15:56:51.213: Skipping StackedEnsemble 'all_xglm' due to the exclude_algos option or it is already trained.
#> 15:56:51.213: AutoML: hit the max_models limit; skipping completion resume_best_grids
#> 15:56:51.213: Skipping StackedEnsemble 'best_of_family' due to the exclude_algos option or it is already trained.
#> 15:56:51.214: Skipping StackedEnsemble 'best_N' due to the exclude_algos option or it is already trained.
#> 15:56:51.214: Actual modeling steps: [{XGBoost : [def_2 (1g, 10w)]}, {GLM : [def_1 (1g, 10w)]}, {GBM : [def_5 (1g, 10w)]}]
#> 15:56:51.214: AutoML build stopped: 2022.01.27 15:56:51.214
#> 15:56:51.214: AutoML build done: built 3 models
#> 15:56:51.214: AutoML duration:  8.398 sec
#> 15:56:51.222: Verifying training frame immutability. . .
#> 15:56:51.222: Training frame was not mutated (as expected).
#> - EUREKA: Succesfully generated 3 models
#>                             model_id mean_per_class_error   logloss      rmse
#> 1     GLM_1_AutoML_2_20220127_155642            0.4742325 0.8170807 0.5409388
#> 2 XGBoost_1_AutoML_2_20220127_155642            0.4946035 0.8255072 0.5392879
#> 3     GBM_1_AutoML_2_20220127_155642            0.5037168 0.8620436 0.5616772
#>         mse
#> 1 0.2926147
#> 2 0.2908315
#> 3 0.3154812
#> SELECTED MODEL: GLM_1_AutoML_2_20220127_155642
#> - NOTE: The following variables were the least important: Sex.male, Sex.female, Parch
#> >>> Running predictions for Pclass...
#>   |======================================================================| 100%
#>   |======================================================================| 100%
#> Model (1/3): GLM_1_AutoML_2_20220127_155642
#> Independent Variable: Pclass
#> Type: Classification (3 classes)
#> Algorithm: GLM
#> Split: 70% training data (of 891 observations)
#> Seed: 0
#> 
#> Test metrics:
#>    AUC = 0.76337
#>    ACC = 0.64179
#> 
#> Most important variables:
#>    Embarked.Q (25.3%)
#>    Embarked.C (13.5%)
#>    Embarked.S (13.3%)
#>    Age (11.9%)
#>    Survived.FALSE (10.6%)
#> Process duration: 16s

# Regression: Continuous Values
r <- h2o_automl(dft, y = "Fare", ignore = c("Pclass"), exclude_algos = NULL, quiet = TRUE)
#> 
#> 15:56:58.813: Project: AutoML_3_20220127_155658
#> 15:56:58.813: Setting stopping tolerance adaptively based on the training frame: 0.04052204492365539
#> 15:56:58.813: Build control seed: 0
#> 15:56:58.814: training frame: Frame key: AutoML_3_20220127_155658_training_train_sid_93b3_174    cols: 8    rows: 609  chunks: 1    size: 9258  checksum: 6553451394746990112
#> 15:56:58.814: validation frame: NULL
#> 15:56:58.814: leaderboard frame: NULL
#> 15:56:58.814: blending frame: NULL
#> 15:56:58.814: response column: tag
#> 15:56:58.814: fold column: null
#> 15:56:58.814: weights column: null
#> 15:56:58.814: Loading execution steps: [{XGBoost : [def_2 (1g, 10w), def_1 (2g, 10w), def_3 (3g, 10w), grid_1 (4g, 90w), lr_search (6g, 30w)]}, {GLM : [def_1 (1g, 10w)]}, {DRF : [def_1 (2g, 10w), XRT (3g, 10w)]}, {GBM : [def_5 (1g, 10w), def_2 (2g, 10w), def_3 (2g, 10w), def_4 (2g, 10w), def_1 (3g, 10w), grid_1 (4g, 60w), lr_annealing (6g, 10w)]}, {DeepLearning : [def_1 (3g, 10w), grid_1 (4g, 30w), grid_2 (5g, 30w), grid_3 (5g, 30w)]}, {completion : [resume_best_grids (10g, 60w)]}, {StackedEnsemble : [best_of_family_1 (1g, 5w), best_of_family_2 (2g, 5w), best_of_family_3 (3g, 5w), best_of_family_4 (4g, 5w), best_of_family_5 (5g, 5w), all_2 (2g, 10w), all_3 (3g, 10w), all_4 (4g, 10w), all_5 (5g, 10w), monotonic (6g, 10w), best_of_family_xgboost (6g, 10w), best_of_family_gbm (6g, 10w), all_xgboost (7g, 10w), all_gbm (7g, 10w), best_of_family_xglm (8g, 10w), all_xglm (8g, 10w), best_of_family (10g, 10w), best_N (10g, 10w)]}]
#> 15:56:58.815: Defined work allocations: [Work{def_2, XGBoost, ModelBuild, group=1, weight=10}, Work{def_1, GLM, ModelBuild, group=1, weight=10}, Work{def_5, GBM, ModelBuild, group=1, weight=10}, Work{best_of_family_1, StackedEnsemble, ModelBuild, group=1, weight=5}, Work{def_1, XGBoost, ModelBuild, group=2, weight=10}, Work{def_1, DRF, ModelBuild, group=2, weight=10}, Work{def_2, GBM, ModelBuild, group=2, weight=10}, Work{def_3, GBM, ModelBuild, group=2, weight=10}, Work{def_4, GBM, ModelBuild, group=2, weight=10}, Work{best_of_family_2, StackedEnsemble, ModelBuild, group=2, weight=5}, Work{all_2, StackedEnsemble, ModelBuild, group=2, weight=10}, Work{def_3, XGBoost, ModelBuild, group=3, weight=10}, Work{XRT, DRF, ModelBuild, group=3, weight=10}, Work{def_1, GBM, ModelBuild, group=3, weight=10}, Work{def_1, DeepLearning, ModelBuild, group=3, weight=10}, Work{best_of_family_3, StackedEnsemble, ModelBuild, group=3, weight=5}, Work{all_3, StackedEnsemble, ModelBuild, group=3, weight=10}, Work{grid_1, XGBoost, HyperparamSearch, group=4, weight=90}, Work{grid_1, GBM, HyperparamSearch, group=4, weight=60}, Work{grid_1, DeepLearning, HyperparamSearch, group=4, weight=30}, Work{best_of_family_4, StackedEnsemble, ModelBuild, group=4, weight=5}, Work{all_4, StackedEnsemble, ModelBuild, group=4, weight=10}, Work{grid_2, DeepLearning, HyperparamSearch, group=5, weight=30}, Work{grid_3, DeepLearning, HyperparamSearch, group=5, weight=30}, Work{best_of_family_5, StackedEnsemble, ModelBuild, group=5, weight=5}, Work{all_5, StackedEnsemble, ModelBuild, group=5, weight=10}, Work{lr_search, XGBoost, Selection, group=6, weight=30}, Work{lr_annealing, GBM, Selection, group=6, weight=10}, Work{monotonic, StackedEnsemble, ModelBuild, group=6, weight=10}, Work{best_of_family_xgboost, StackedEnsemble, ModelBuild, group=6, weight=10}, Work{best_of_family_gbm, StackedEnsemble, ModelBuild, group=6, weight=10}, Work{all_xgboost, StackedEnsemble, ModelBuild, group=7, weight=10}, Work{all_gbm, StackedEnsemble, ModelBuild, group=7, weight=10}, Work{best_of_family_xglm, StackedEnsemble, ModelBuild, group=8, weight=10}, Work{all_xglm, StackedEnsemble, ModelBuild, group=8, weight=10}, Work{resume_best_grids, virtual, Dynamic, group=10, weight=60}, Work{best_of_family, StackedEnsemble, ModelBuild, group=10, weight=10}, Work{best_N, StackedEnsemble, ModelBuild, group=10, weight=10}]
#> 15:56:58.815: Actual work allocations: [Work{def_2, XGBoost, ModelBuild, group=1, weight=10}, Work{def_1, GLM, ModelBuild, group=1, weight=10}, Work{def_5, GBM, ModelBuild, group=1, weight=10}, Work{best_of_family_1, StackedEnsemble, ModelBuild, group=1, weight=5}, Work{def_1, XGBoost, ModelBuild, group=2, weight=10}, Work{def_1, DRF, ModelBuild, group=2, weight=10}, Work{def_2, GBM, ModelBuild, group=2, weight=10}, Work{def_3, GBM, ModelBuild, group=2, weight=10}, Work{def_4, GBM, ModelBuild, group=2, weight=10}, Work{best_of_family_2, StackedEnsemble, ModelBuild, group=2, weight=5}, Work{all_2, StackedEnsemble, ModelBuild, group=2, weight=10}, Work{def_3, XGBoost, ModelBuild, group=3, weight=10}, Work{XRT, DRF, ModelBuild, group=3, weight=10}, Work{def_1, GBM, ModelBuild, group=3, weight=10}, Work{def_1, DeepLearning, ModelBuild, group=3, weight=10}, Work{best_of_family_3, StackedEnsemble, ModelBuild, group=3, weight=5}, Work{all_3, StackedEnsemble, ModelBuild, group=3, weight=10}, Work{grid_1, XGBoost, HyperparamSearch, group=4, weight=90}, Work{grid_1, GBM, HyperparamSearch, group=4, weight=60}, Work{grid_1, DeepLearning, HyperparamSearch, group=4, weight=30}, Work{best_of_family_4, StackedEnsemble, ModelBuild, group=4, weight=5}, Work{all_4, StackedEnsemble, ModelBuild, group=4, weight=10}, Work{grid_2, DeepLearning, HyperparamSearch, group=5, weight=30}, Work{grid_3, DeepLearning, HyperparamSearch, group=5, weight=30}, Work{best_of_family_5, StackedEnsemble, ModelBuild, group=5, weight=5}, Work{all_5, StackedEnsemble, ModelBuild, group=5, weight=10}, Work{lr_search, XGBoost, Selection, group=6, weight=30}, Work{lr_annealing, GBM, Selection, group=6, weight=10}, Work{monotonic, StackedEnsemble, ModelBuild, group=6, weight=10}, Work{best_of_family_xgboost, StackedEnsemble, ModelBuild, group=6, weight=10}, Work{best_of_family_gbm, StackedEnsemble, ModelBuild, group=6, weight=10}, Work{all_xgboost, StackedEnsemble, ModelBuild, group=7, weight=10}, Work{all_gbm, StackedEnsemble, ModelBuild, group=7, weight=10}, Work{best_of_family_xglm, StackedEnsemble, ModelBuild, group=8, weight=10}, Work{all_xglm, StackedEnsemble, ModelBuild, group=8, weight=10}, Work{resume_best_grids, virtual, Dynamic, group=10, weight=60}, Work{best_of_family, StackedEnsemble, ModelBuild, group=10, weight=10}, Work{best_N, StackedEnsemble, ModelBuild, group=10, weight=10}]
#> 15:56:58.815: AutoML job created: 2022.01.27 15:56:58.813
#> 15:56:58.816: AutoML build started: 2022.01.27 15:56:58.816
#> 15:56:58.816: Time assigned for XGBoost_1_AutoML_3_20220127_155658: 171.428578125s
#> 15:56:58.816: AutoML: starting XGBoost_1_AutoML_3_20220127_155658 model training
#> 15:56:58.821: XGBoost_1_AutoML_3_20220127_155658 [XGBoost def_2] started
#> 15:56:59.824: XGBoost_1_AutoML_3_20220127_155658 [XGBoost def_2] complete
#> 15:56:59.824: Adding model XGBoost_1_AutoML_3_20220127_155658 to leaderboard Leaderboard_AutoML_3_20220127_155658@@tag. Training time: model=0s, total=0s
#> 15:56:59.825: New leader: XGBoost_1_AutoML_3_20220127_155658, mean_residual_deviance: 830.4367206836267
#> 15:56:59.826: Time assigned for GLM_1_AutoML_3_20220127_155658: 239.596s
#> 15:56:59.826: AutoML: starting GLM_1_AutoML_3_20220127_155658 model training
#> 15:56:59.827: GLM_1_AutoML_3_20220127_155658 [GLM def_1] started
#> 15:57:00.953: GLM_1_AutoML_3_20220127_155658 [GLM def_1] complete
#> 15:57:00.953: Adding model GLM_1_AutoML_3_20220127_155658 to leaderboard Leaderboard_AutoML_3_20220127_155658@@tag. Training time: model=0s, total=0s
#> 15:57:00.955: New leader: GLM_1_AutoML_3_20220127_155658, mean_residual_deviance: 739.2163401351919
#> 15:57:00.955: Time assigned for GBM_1_AutoML_3_20220127_155658: 398.574s
#> 15:57:00.955: AutoML: starting GBM_1_AutoML_3_20220127_155658 model training
#> 15:57:00.976: GBM_1_AutoML_3_20220127_155658 [GBM def_5] started
#> 15:57:03.0: GBM_1_AutoML_3_20220127_155658 [GBM def_5] complete
#> 15:57:03.0: Adding model GBM_1_AutoML_3_20220127_155658 to leaderboard Leaderboard_AutoML_3_20220127_155658@@tag. Training time: model=0s, total=1s
#> 15:57:03.8: Time assigned for StackedEnsemble_BestOfFamily_1_AutoML_3_20220127_155658: 595.808s
#> 15:57:03.10: AutoML: starting StackedEnsemble_BestOfFamily_1_AutoML_3_20220127_155658 model training
#> 15:57:03.25: StackedEnsemble_BestOfFamily_1_AutoML_3_20220127_155658 [StackedEnsemble best_of_family_1 (built with AUTO metalearner, using top model from each algorithm type)] started
#> 15:57:04.109: StackedEnsemble_BestOfFamily_1_AutoML_3_20220127_155658 [StackedEnsemble best_of_family_1 (built with AUTO metalearner, using top model from each algorithm type)] complete
#> 15:57:04.109: Adding model StackedEnsemble_BestOfFamily_1_AutoML_3_20220127_155658 to leaderboard Leaderboard_AutoML_3_20220127_155658@@tag. Training time: model=0s, total=0s
#> 15:57:04.111: New leader: StackedEnsemble_BestOfFamily_1_AutoML_3_20220127_155658, mean_residual_deviance: 730.9754634709968
#> 15:57:04.111: AutoML: hit the max_models limit; skipping XGBoost def_1
#> 15:57:04.111: AutoML: hit the max_models limit; skipping DRF def_1
#> 15:57:04.111: AutoML: hit the max_models limit; skipping GBM def_2
#> 15:57:04.111: AutoML: hit the max_models limit; skipping GBM def_3
#> 15:57:04.111: AutoML: hit the max_models limit; skipping GBM def_4
#> 15:57:04.111: AutoML: hit the max_models limit; skipping XGBoost def_3
#> 15:57:04.111: AutoML: hit the max_models limit; skipping DRF XRT (Extremely Randomized Trees)
#> 15:57:04.111: AutoML: hit the max_models limit; skipping GBM def_1
#> 15:57:04.111: AutoML: hit the max_models limit; skipping DeepLearning def_1
#> 15:57:04.112: AutoML: hit the max_models limit; skipping XGBoost grid_1
#> 15:57:04.112: AutoML: hit the max_models limit; skipping GBM grid_1
#> 15:57:04.112: AutoML: hit the max_models limit; skipping DeepLearning grid_1
#> 15:57:04.112: AutoML: hit the max_models limit; skipping DeepLearning grid_2
#> 15:57:04.112: AutoML: hit the max_models limit; skipping DeepLearning grid_3
#> 15:57:04.113: AutoML: hit the max_models limit; skipping XGBoost lr_search
#> 15:57:04.113: AutoML: hit the max_models limit; skipping GBM lr_annealing
#> 15:57:04.113: No base models, due to timeouts or the exclude_algos option. Skipping StackedEnsemble 'monotonic'.
#> 15:57:04.113: Time assigned for StackedEnsemble_BestOfFamily_2_AutoML_3_20220127_155658: 99.117171875s
#> 15:57:04.114: AutoML: starting StackedEnsemble_BestOfFamily_2_AutoML_3_20220127_155658 model training
#> 15:57:04.119: StackedEnsemble_BestOfFamily_2_AutoML_3_20220127_155658 [StackedEnsemble best_of_family_xgboost (built with xgboost metalearner, using top model from each algorithm type)] started
#> 15:57:09.711: StackedEnsemble_BestOfFamily_2_AutoML_3_20220127_155658 [StackedEnsemble best_of_family_xgboost (built with xgboost metalearner, using top model from each algorithm type)] complete
#> 15:57:09.711: Adding model StackedEnsemble_BestOfFamily_2_AutoML_3_20220127_155658 to leaderboard Leaderboard_AutoML_3_20220127_155658@@tag. Training time: model=4s, total=4s
#> 15:57:09.715: Time assigned for StackedEnsemble_BestOfFamily_3_AutoML_3_20220127_155658: 117.820203125s
#> 15:57:09.715: AutoML: starting StackedEnsemble_BestOfFamily_3_AutoML_3_20220127_155658 model training
#> 15:57:09.725: StackedEnsemble_BestOfFamily_3_AutoML_3_20220127_155658 [StackedEnsemble best_of_family_gbm (built with gbm metalearner, using top model from each algorithm type)] started
#> 15:57:13.966: StackedEnsemble_BestOfFamily_3_AutoML_3_20220127_155658 [StackedEnsemble best_of_family_gbm (built with gbm metalearner, using top model from each algorithm type)] complete
#> 15:57:13.967: Adding model StackedEnsemble_BestOfFamily_3_AutoML_3_20220127_155658 to leaderboard Leaderboard_AutoML_3_20220127_155658@@tag. Training time: model=4s, total=4s
#> 15:57:13.970: Time assigned for StackedEnsemble_BestOfFamily_4_AutoML_3_20220127_155658: 292.4235s
#> 15:57:13.970: AutoML: starting StackedEnsemble_BestOfFamily_4_AutoML_3_20220127_155658 model training
#> 15:57:13.972: StackedEnsemble_BestOfFamily_4_AutoML_3_20220127_155658 [StackedEnsemble best_of_family_xglm (built with AUTO metalearner, using top model from each algorithm type)] started
#> 15:57:15.81: StackedEnsemble_BestOfFamily_4_AutoML_3_20220127_155658 [StackedEnsemble best_of_family_xglm (built with AUTO metalearner, using top model from each algorithm type)] complete
#> 15:57:15.81: Adding model StackedEnsemble_BestOfFamily_4_AutoML_3_20220127_155658 to leaderboard Leaderboard_AutoML_3_20220127_155658@@tag. Training time: model=0s, total=0s
#> 15:57:15.83: AutoML: hit the max_models limit; skipping completion resume_best_grids
#> 15:57:15.84: Actual modeling steps: [{XGBoost : [def_2 (1g, 10w)]}, {GLM : [def_1 (1g, 10w)]}, {GBM : [def_5 (1g, 10w)]}, {StackedEnsemble : [best_of_family_1 (1g, 5w), best_of_family_xgboost (6g, 10w), best_of_family_gbm (6g, 10w), best_of_family_xglm (8g, 10w)]}]
#> 15:57:15.84: AutoML build stopped: 2022.01.27 15:57:15.84
#> 15:57:15.84: AutoML build done: built 3 models
#> 15:57:15.84: AutoML duration: 16.268 sec
#> 15:57:15.91: Verifying training frame immutability. . .
#> 15:57:15.91: Training frame was not mutated (as expected).
print(r)
#> Model (1/7): StackedEnsemble_BestOfFamily_1_AutoML_3_20220127_155658
#> Independent Variable: Fare
#> Type: Regression
#> Algorithm: STACKEDENSEMBLE
#> Split: 70% training data (of 871 observations)
#> Seed: 0
#> 
#> Test metrics:
#>    rmse = 20.17
#>    mae = 14.079
#>    mape = 0.068862
#>    mse = 406.82
#>    rsq = 0.367
#>    rsqa = 0.3645
#> 
#> 

# WITH PRE-DEFINED TRAIN/TEST DATAFRAMES
splits <- msplit(dft, size = 0.8)
#> train_size  test_size 
#>        712        179 
splits$train$split <- "train"
splits$test$split <- "test"
df <- rbind(splits$train, splits$test)
r <- h2o_automl(df, "Survived", max_models = 1, train_test = "split")
#> 2022-01-27 15:57:27 | Started process...
#> - INDEPENDENT VARIABLE: Survived
#> - MODEL TYPE: Classification
#> # A tibble: 2 × 5
#>   tag       n     p order  pcum
#>   <lgl> <int> <dbl> <int> <dbl>
#> 1 FALSE   549  61.6     1  61.6
#> 2 TRUE    342  38.4     2 100  
#> - MISSINGS: The following variables contain missing observations: Age (19.87%). Consider using the impute parameter.
#> - CATEGORICALS: There are 4 non-numerical features. Consider using ohse() or equivalent prior to encode categorical variables.
#> 
#>  test train 
#>   179   712 
#> - REPEATED: There were 77 repeated rows which are being suppressed from the train dataset
#> - ALGORITHMS: excluded 'StackedEnsemble', 'DeepLearning'
#> - CACHE: Previous models are not being erased. You may use 'start_clean' [clear] or 'project_name' [join]
#> - UI: You may check results using H2O Flow's interactive platform: http://localhost:54321/flow/index.html
#> >>> Iterating until 1 models or 600 seconds...
#> 
#> 15:57:27.747: Project: AutoML_4_20220127_155727
#> 15:57:27.747: Setting stopping tolerance adaptively based on the training frame: 0.03747658444979307
#> 15:57:27.747: Build control seed: 0
#> 15:57:27.748: training frame: Frame key: AutoML_4_20220127_155727_training_train_sid_80c2_272    cols: 9    rows: 712  chunks: 1    size: 10886  checksum: 2760773517365953337
#> 15:57:27.748: validation frame: NULL
#> 15:57:27.748: leaderboard frame: NULL
#> 15:57:27.748: blending frame: NULL
#> 15:57:27.748: response column: tag
#> 15:57:27.748: fold column: null
#> 15:57:27.748: weights column: null
#> 15:57:27.748: Loading execution steps: [{XGBoost : [def_2 (1g, 10w), def_1 (2g, 10w), def_3 (3g, 10w), grid_1 (4g, 90w), lr_search (6g, 30w)]}, {GLM : [def_1 (1g, 10w)]}, {DRF : [def_1 (2g, 10w), XRT (3g, 10w)]}, {GBM : [def_5 (1g, 10w), def_2 (2g, 10w), def_3 (2g, 10w), def_4 (2g, 10w), def_1 (3g, 10w), grid_1 (4g, 60w), lr_annealing (6g, 10w)]}, {DeepLearning : [def_1 (3g, 10w), grid_1 (4g, 30w), grid_2 (5g, 30w), grid_3 (5g, 30w)]}, {completion : [resume_best_grids (10g, 60w)]}, {StackedEnsemble : [best_of_family_1 (1g, 5w), best_of_family_2 (2g, 5w), best_of_family_3 (3g, 5w), best_of_family_4 (4g, 5w), best_of_family_5 (5g, 5w), all_2 (2g, 10w), all_3 (3g, 10w), all_4 (4g, 10w), all_5 (5g, 10w), monotonic (6g, 10w), best_of_family_xgboost (6g, 10w), best_of_family_gbm (6g, 10w), all_xgboost (7g, 10w), all_gbm (7g, 10w), best_of_family_xglm (8g, 10w), all_xglm (8g, 10w), best_of_family (10g, 10w), best_N (10g, 10w)]}]
#> 15:57:27.749: Disabling Algo: StackedEnsemble as requested by the user.
#> 15:57:27.749: Disabling Algo: DeepLearning as requested by the user.
#> 15:57:27.749: Defined work allocations: [Work{def_2, XGBoost, ModelBuild, group=1, weight=10}, Work{def_1, GLM, ModelBuild, group=1, weight=10}, Work{def_5, GBM, ModelBuild, group=1, weight=10}, Work{def_1, XGBoost, ModelBuild, group=2, weight=10}, Work{def_1, DRF, ModelBuild, group=2, weight=10}, Work{def_2, GBM, ModelBuild, group=2, weight=10}, Work{def_3, GBM, ModelBuild, group=2, weight=10}, Work{def_4, GBM, ModelBuild, group=2, weight=10}, Work{def_3, XGBoost, ModelBuild, group=3, weight=10}, Work{XRT, DRF, ModelBuild, group=3, weight=10}, Work{def_1, GBM, ModelBuild, group=3, weight=10}, Work{grid_1, XGBoost, HyperparamSearch, group=4, weight=90}, Work{grid_1, GBM, HyperparamSearch, group=4, weight=60}, Work{lr_search, XGBoost, Selection, group=6, weight=30}, Work{lr_annealing, GBM, Selection, group=6, weight=10}, Work{resume_best_grids, virtual, Dynamic, group=10, weight=60}]
#> 15:57:27.749: Actual work allocations: [Work{def_2, XGBoost, ModelBuild, group=1, weight=10}, Work{def_1, GLM, ModelBuild, group=1, weight=10}, Work{def_5, GBM, ModelBuild, group=1, weight=10}, Work{def_1, XGBoost, ModelBuild, group=2, weight=10}, Work{def_1, DRF, ModelBuild, group=2, weight=10}, Work{def_2, GBM, ModelBuild, group=2, weight=10}, Work{def_3, GBM, ModelBuild, group=2, weight=10}, Work{def_4, GBM, ModelBuild, group=2, weight=10}, Work{def_3, XGBoost, ModelBuild, group=3, weight=10}, Work{XRT, DRF, ModelBuild, group=3, weight=10}, Work{def_1, GBM, ModelBuild, group=3, weight=10}, Work{grid_1, XGBoost, HyperparamSearch, group=4, weight=90}, Work{grid_1, GBM, HyperparamSearch, group=4, weight=60}, Work{lr_search, XGBoost, Selection, group=6, weight=30}, Work{lr_annealing, GBM, Selection, group=6, weight=10}, Work{resume_best_grids, virtual, Dynamic, group=10, weight=60}]
#> 15:57:27.750: AutoML job created: 2022.01.27 15:57:27.747
#> 15:57:27.759: AutoML build started: 2022.01.27 15:57:27.759
#> 15:57:27.759: Time assigned for XGBoost_1_AutoML_4_20220127_155727: 200.0s
#> 15:57:27.760: AutoML: starting XGBoost_1_AutoML_4_20220127_155727 model training
#> 15:57:27.767: _train param, Dropping bad and constant columns: [train_test]
#> 15:57:27.769: XGBoost_1_AutoML_4_20220127_155727 [XGBoost def_2] started
#> 15:57:29.987: XGBoost_1_AutoML_4_20220127_155727 [XGBoost def_2] complete
#> 15:57:29.987: Adding model XGBoost_1_AutoML_4_20220127_155727 to leaderboard Leaderboard_AutoML_4_20220127_155727@@tag. Training time: model=0s, total=1s
#> 15:57:29.989: New leader: XGBoost_1_AutoML_4_20220127_155727, auc: 0.8450812059141
#> 15:57:29.989: AutoML: hit the max_models limit; skipping GLM def_1
#> 15:57:29.989: AutoML: hit the max_models limit; skipping GBM def_5
#> 15:57:29.989: Skipping StackedEnsemble 'best_of_family_1' due to the exclude_algos option or it is already trained.
#> 15:57:29.989: AutoML: hit the max_models limit; skipping XGBoost def_1
#> 15:57:29.989: AutoML: hit the max_models limit; skipping DRF def_1
#> 15:57:29.989: AutoML: hit the max_models limit; skipping GBM def_2
#> 15:57:29.989: AutoML: hit the max_models limit; skipping GBM def_3
#> 15:57:29.989: AutoML: hit the max_models limit; skipping GBM def_4
#> 15:57:29.989: Skipping StackedEnsemble 'best_of_family_2' due to the exclude_algos option or it is already trained.
#> 15:57:29.989: Skipping StackedEnsemble 'all_2' due to the exclude_algos option or it is already trained.
#> 15:57:29.990: AutoML: hit the max_models limit; skipping XGBoost def_3
#> 15:57:29.990: AutoML: hit the max_models limit; skipping DRF XRT (Extremely Randomized Trees)
#> 15:57:29.990: AutoML: hit the max_models limit; skipping GBM def_1
#> 15:57:29.990: AutoML: hit the max_models limit; skipping DeepLearning def_1
#> 15:57:29.990: Skipping StackedEnsemble 'best_of_family_3' due to the exclude_algos option or it is already trained.
#> 15:57:29.990: Skipping StackedEnsemble 'all_3' due to the exclude_algos option or it is already trained.
#> 15:57:29.991: AutoML: hit the max_models limit; skipping XGBoost grid_1
#> 15:57:29.991: AutoML: hit the max_models limit; skipping GBM grid_1
#> 15:57:29.991: AutoML: hit the max_models limit; skipping DeepLearning grid_1
#> 15:57:29.991: Skipping StackedEnsemble 'best_of_family_4' due to the exclude_algos option or it is already trained.
#> 15:57:29.991: Skipping StackedEnsemble 'all_4' due to the exclude_algos option or it is already trained.
#> 15:57:29.991: AutoML: hit the max_models limit; skipping DeepLearning grid_2
#> 15:57:29.991: AutoML: hit the max_models limit; skipping DeepLearning grid_3
#> 15:57:29.992: Skipping StackedEnsemble 'best_of_family_5' due to the exclude_algos option or it is already trained.
#> 15:57:29.992: Skipping StackedEnsemble 'all_5' due to the exclude_algos option or it is already trained.
#> 15:57:29.992: AutoML: hit the max_models limit; skipping XGBoost lr_search
#> 15:57:29.992: AutoML: hit the max_models limit; skipping GBM lr_annealing
#> 15:57:29.992: Skipping StackedEnsemble 'monotonic' due to the exclude_algos option or it is already trained.
#> 15:57:29.992: Skipping StackedEnsemble 'best_of_family_xgboost' due to the exclude_algos option or it is already trained.
#> 15:57:29.992: Skipping StackedEnsemble 'best_of_family_gbm' due to the exclude_algos option or it is already trained.
#> 15:57:29.993: Skipping StackedEnsemble 'all_xgboost' due to the exclude_algos option or it is already trained.
#> 15:57:29.993: Skipping StackedEnsemble 'all_gbm' due to the exclude_algos option or it is already trained.
#> 15:57:29.993: Skipping StackedEnsemble 'best_of_family_xglm' due to the exclude_algos option or it is already trained.
#> 15:57:29.993: Skipping StackedEnsemble 'all_xglm' due to the exclude_algos option or it is already trained.
#> 15:57:29.993: AutoML: hit the max_models limit; skipping completion resume_best_grids
#> 15:57:29.993: Skipping StackedEnsemble 'best_of_family' due to the exclude_algos option or it is already trained.
#> 15:57:29.994: Skipping StackedEnsemble 'best_N' due to the exclude_algos option or it is already trained.
#> 15:57:29.994: Actual modeling steps: [{XGBoost : [def_2 (1g, 10w)]}]
#> 15:57:29.994: AutoML build stopped: 2022.01.27 15:57:29.994
#> 15:57:29.994: AutoML build done: built 1 models
#> 15:57:29.994: AutoML duration:  2.235 sec
#> 15:57:29.999: Verifying training frame immutability. . .
#> 15:57:29.999: Training frame was not mutated (as expected).
#> - EUREKA: Succesfully generated 1 models
#>                             model_id       auc   logloss     aucpr
#> 1 XGBoost_1_AutoML_4_20220127_155727 0.8450812 0.4500589 0.8188965
#>   mean_per_class_error      rmse       mse
#> 1            0.2150513 0.3784227 0.1432037
#> SELECTED MODEL: XGBoost_1_AutoML_4_20220127_155727
#> - NOTE: The following variables were the least important: Embarked.S, SibSp, Embarked.C, Pclass.2
#> >>> Running predictions for Survived...
#>   |======================================================================| 100%
#>   |======================================================================| 100%
#> Target value: FALSE
#> >>> Generating plots...
#> Model (1/1): XGBoost_1_AutoML_4_20220127_155727
#> Independent Variable: Survived
#> Type: Classification (2 classes)
#> Algorithm: XGBOOST
#> Split: 80% training data (of 891 observations)
#> Seed: 0
#> 
#> Test metrics:
#>    AUC = 0.87174
#>    ACC = 0.84358
#>    PRC = 0.89091
#>    TPR = 0.69014
#>    TNR = 0.94444
#> 
#> Most important variables:
#>    Sex.female (32.8%)
#>    Fare (21.8%)
#>    Age (11.7%)
#>    Pclass.3 (11.6%)
#>    Sex.male (9.2%)
#> Process duration: 13.9s
# }