.. _sec_tabularadvanced: Predicting Columns in a Table - In Depth ======================================== **Tip**: If you are new to AutoGluon, review :ref:`sec_tabularquick` to learn the basics of the AutoGluon API. This tutorial describes how you can exert greater control when using AutoGluon's ``fit()`` by specifying the appropriate arguments. Using the same census data table as :ref:`sec_tabularquick`, we will try to predict the ``occupation`` of an individual - a multi-class classification problem. Start by importing AutoGluon, specifying TabularPrediction as the task, and loading the data. .. code:: python import autogluon as ag from autogluon import TabularPrediction as task train_data = task.Dataset(file_path='https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv') train_data = train_data.head(500) # subsample 500 data points for faster demo (comment this out to run on full dataset instead) print(train_data.head()) val_data = task.Dataset(file_path='https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv') label_column = 'occupation' print("Summary of occupation column: \n", train_data['occupation'].describe()) .. parsed-literal:: :class: output Loaded data from: https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv | Columns = 15 / 15 | Rows = 39073 -> 39073 Loaded data from: https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv | Columns = 15 / 15 | Rows = 9769 -> 9769 .. parsed-literal:: :class: output age workclass fnlwgt education education-num marital-status \ 0 25 Private 178478 Bachelors 13 Never-married 1 23 State-gov 61743 5th-6th 3 Never-married 2 46 Private 376789 HS-grad 9 Never-married 3 55 ? 200235 HS-grad 9 Married-civ-spouse 4 36 Private 224541 7th-8th 4 Married-civ-spouse occupation relationship race sex capital-gain \ 0 Tech-support Own-child White Female 0 1 Transport-moving Not-in-family White Male 0 2 Other-service Not-in-family White Male 0 3 ? Husband White Male 0 4 Handlers-cleaners Husband White Male 0 capital-loss hours-per-week native-country class 0 0 40 United-States <=50K 1 0 35 United-States <=50K 2 0 15 United-States <=50K 3 0 50 United-States >50K 4 0 40 El-Salvador <=50K Summary of occupation column: count 500 unique 14 top Exec-managerial freq 69 Name: occupation, dtype: object To demonstrate how you can provide your own validation dataset against which AutoGluon tunes hyperparameters, we'll use the test dataset from the previous tutorial as validation data. If you don't have a strong reason to provide your own validation dataset, we recommend you omit the ``tuning_data`` argument. This lets AutoGluon automatically select validation data from your provided training set (it uses smart strategies such as stratified sampling). For greater control, you can specify the ``holdout_frac`` argument to tell AutoGluon what fraction of the provided training data to hold out for validation. **Caution:** Since AutoGluon tunes internal knobs based on this validation data, performance estimates reported on this data may be over-optimistic. For unbiased performance estimates, you should always call ``predict()`` on a separate dataset (that was never passed to ``fit()``), as we did in the previous **Quick-Start** tutorial. We also emphasize that most options specified in this tutorial are chosen to minimize runtime for the purposes of demonstration and you should select more reasonable values in order to obtain high-quality models. ``fit()`` trains neural networks and various types of tree ensembles by default. 
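Before customizing individual models, here is a minimal sketch (not part of the original example) of the ``holdout_frac`` option mentioned above; the ``0.2`` value and the folder name are arbitrary illustrations, and this option is only relevant when you do not pass ``tuning_data``:

.. code:: python

    # Hedged illustration: let AutoGluon carve out its own validation split from train_data.
    # holdout_frac=0.2 is an arbitrary example value; omit it to let AutoGluon choose automatically.
    predictor_auto_val = task.fit(train_data=train_data, label=label_column,
                                  holdout_frac=0.2,  # hold out 20% of train_data for internal validation
                                  output_directory='agModels-holdoutDemo')  # hypothetical folder name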
You can specify various hyperparameter values for each type of model. For each hyperparameter, you can either specify a single fixed value or a search space of values to consider during hyperparameter optimization. Hyperparameters that you do not specify are left at default settings chosen automatically by AutoGluon, which may be fixed values or search spaces.

.. code:: python

    hp_tune = True  # whether or not to do hyperparameter optimization

    nn_options = {  # specifies non-default hyperparameter values for neural network models
        'num_epochs': 10,  # number of training epochs (controls training time of NN models)
        'learning_rate': ag.space.Real(1e-4, 1e-2, default=5e-4, log=True),  # learning rate used in training (real-valued hyperparameter searched on log-scale)
        'activation': ag.space.Categorical('relu', 'softrelu', 'tanh'),  # activation function used in NN (categorical hyperparameter, default = first entry)
        'layers': ag.space.Categorical([100], [1000], [200, 100], [300, 200, 100]),  # each choice for categorical hyperparameter 'layers' corresponds to list of sizes for each NN layer to use
        'dropout_prob': ag.space.Real(0.0, 0.5, default=0.1),  # dropout probability (real-valued hyperparameter)
    }

    gbm_options = {  # specifies non-default hyperparameter values for lightGBM gradient boosted trees
        'num_boost_round': 100,  # number of boosting rounds (controls training time of GBM models)
        'num_leaves': ag.space.Int(lower=26, upper=66, default=36),  # number of leaves in trees (integer hyperparameter)
    }

    hyperparameters = {'NN': nn_options, 'GBM': gbm_options}  # hyperparameters of each model type
    # If one of these keys is missing from the hyperparameters dict, then no models of that type are trained.

    time_limits = 2*60  # train various models for ~2 min
    num_trials = 5  # try at most 5 different hyperparameter configurations for each type of model
    search_strategy = 'skopt'  # to tune hyperparameters using SKopt Bayesian optimization routine

    output_directory = 'agModels-predictOccupation'  # folder where to store trained models

    predictor = task.fit(train_data=train_data, tuning_data=val_data, label=label_column,
                         output_directory=output_directory, time_limits=time_limits,
                         num_trials=num_trials, hyperparameter_tune=hp_tune,
                         hyperparameters=hyperparameters, search_strategy=search_strategy)

.. parsed-literal::
    :class: output

    Warning: `hyperparameter_tune=True` is currently experimental and may cause the process to hang. Setting `auto_stack=True` instead is recommended to achieve maximum quality models.
    Beginning AutoGluon training ... Time limit = 120s
    AutoGluon will save models to agModels-predictOccupation/
    AutoGluon Version: 0.0.12b20200713
    Train Data Rows: 500
    Train Data Columns: 15
    Tuning Data Rows: 9769
    Tuning Data Columns: 15
    Preprocessing data ...
    Here are the first 10 unique label values in your data: [' Tech-support', ' Transport-moving', ' Other-service', ' ?', ' Handlers-cleaners', ' Sales', ' Craft-repair', ' Adm-clerical', ' Exec-managerial', ' Prof-specialty']
    AutoGluon infers your prediction problem is: multiclass (because dtype of label-column == object).
    If this is wrong, please specify `problem_type` argument in fit() instead (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
    Warning: Some classes in the training set have fewer than 10 examples. AutoGluon will only keep 13 out of 14 classes for training and will not try to predict the rare classes.
To keep more classes, increase the number of datapoints from these rare classes in the training data or reduce label_count_threshold. Fraction of data from classes with at least 10 examples that will be kept for training models: 0.998 Train Data Class Count: 13 Feature Generator processed 10223 data points with 14 features Original Features (raw dtypes): int64 features: 6 object features: 8 Original Features (inferred dtypes): int features: 6 object features: 8 Generated Features (special dtypes): Final Features (raw dtypes): int features: 6 category features: 8 Final Features: int features: 6 category features: 8 Data preprocessing and feature engineering runtime = 0.11s ... AutoGluon will gauge predictive performance using evaluation metric: accuracy To change this, specify the eval_metric argument of fit() AutoGluon will early stop models using evaluation metric: accuracy scheduler_options: Key 'training_history_callback_delta_secs': Imputing default value 60 scheduler_options: Key 'delay_get_config': Imputing default value True Starting Experiments Num of Finished Tasks is 0 Num of Pending Tasks is 5 .. parsed-literal:: :class: output HBox(children=(FloatProgress(value=0.0, max=5.0), HTML(value=''))) .. parsed-literal:: :class: output Time out (secs) is 54.0 .. parsed-literal:: :class: output .. parsed-literal:: :class: output 0.2866 = Validation accuracy score 6.68s = Training runtime 0.04s = Validation runtime 0.2738 = Validation accuracy score 14.1s = Training runtime 0.04s = Validation runtime 0.2919 = Validation accuracy score 5.93s = Training runtime 0.06s = Validation runtime 0.2849 = Validation accuracy score 9.18s = Training runtime 0.26s = Validation runtime 0.2984 = Validation accuracy score 6.09s = Training runtime 0.06s = Validation runtime scheduler_options: Key 'training_history_callback_delta_secs': Imputing default value 60 scheduler_options: Key 'delay_get_config': Imputing default value True Starting Experiments Num of Finished Tasks is 0 Num of Pending Tasks is 5 .. parsed-literal:: :class: output HBox(children=(FloatProgress(value=0.0, max=5.0), HTML(value=''))) .. parsed-literal:: :class: output Time out (secs) is 54.0 .. parsed-literal:: :class: output .. parsed-literal:: :class: output Please either provide filename or allow plot in get_training_curves 0.1223 = Validation accuracy score 9.6s = Training runtime 0.84s = Validation runtime 0.0994 = Validation accuracy score 9.92s = Training runtime 0.86s = Validation runtime 0.1344 = Validation accuracy score 9.49s = Training runtime 0.84s = Validation runtime 0.2432 = Validation accuracy score 9.94s = Training runtime 0.78s = Validation runtime 0.2817 = Validation accuracy score 9.76s = Training runtime 0.78s = Validation runtime Fitting model: weighted_ensemble_k0_l1 ... Training model for up to 119.89s of the 15.45s of remaining time. 0.3102 = Validation accuracy score 2.58s = Training runtime 0.0s = Validation runtime AutoGluon training complete, total runtime = 107.17s ... .. figure:: output_tabular-indepth_108df8_3_9.png We again demonstrate how to use the trained models to predict on the validation data (We caution again that performance estimates here are biased because the same data was used to tune hyperparameters). .. 
code:: python

    test_data = val_data.copy()
    y_test = test_data[label_column]
    test_data = test_data.drop(labels=[label_column], axis=1)  # delete label column
    y_pred = predictor.predict(test_data)
    print("Predictions: ", list(y_pred)[:5])
    perf = predictor.evaluate_predictions(y_true=y_test, y_pred=y_pred, auxiliary_metrics=False)

.. parsed-literal::
    :class: output

    Evaluation: accuracy on test data: 0.3087317023236769

.. parsed-literal::
    :class: output

    Predictions: [' Other-service', ' ?', ' Exec-managerial', ' Sales', ' Other-service']

Use the following to view a summary of what happened during ``fit()``. This command shows details of the hyperparameter-tuning process for each type of model:

.. code:: python

    results = predictor.fit_summary()

.. parsed-literal::
    :class: output

    *** Summary of fit() ***
    Estimated performance of each model:
        model                        score_val  pred_time_val  fit_time   pred_time_val_marginal  fit_time_marginal  stack_level  can_infer
    0   weighted_ensemble_k0_l1      0.310160   3.446678       74.166373  0.003475                2.577527           1            True
    1   LightGBMClassifier/trial_4   0.298437   0.059784       6.093583   0.059784                6.093583           0            True
    2   LightGBMClassifier/trial_2   0.291855   0.058525       5.929050   0.058525                5.929050           0            True
    3   LightGBMClassifier/trial_0   0.286610   0.044747       6.675452   0.044747                6.675452           0            True
    4   LightGBMClassifier/trial_3   0.284862   0.259758       9.181610   0.259758                9.181610           0            True
    5   NeuralNetClassifier/trial_9  0.281674   0.775697       9.759481   0.775697                9.759481           0            True
    6   LightGBMClassifier/trial_1   0.273756   0.039599       14.100856  0.039599                14.100856          0            True
    7   NeuralNetClassifier/trial_8  0.243213   0.779406       9.939281   0.779406                9.939281           0            True
    8   NeuralNetClassifier/trial_7  0.134410   0.844418       9.492824   0.844418                9.492824           0            True
    9   NeuralNetClassifier/trial_5  0.122275   0.841027       9.598320   0.841027                9.598320           0            True
    10  NeuralNetClassifier/trial_6  0.099445   0.858345       9.923985   0.858345                9.923985           0            True
    Number of models trained: 11
    Types of models trained: {'LGBModel', 'TabularNeuralNetModel', 'WeightedEnsembleModel'}
    Bagging used: False
    Stack-ensembling used: False
    Hyperparameter-tuning used: True
    User-specified hyperparameters: {'default': {'NN': [{'num_epochs': 10, 'learning_rate': Real: lower=0.0001, upper=0.01, 'activation': Categorical['relu', 'softrelu', 'tanh'], 'layers': Categorical[[100], [1000], [200, 100], [300, 200, 100]], 'dropout_prob': Real: lower=0.0, upper=0.5}], 'GBM': [{'num_boost_round': 100, 'num_leaves': Int: lower=26, upper=66}]}}
    Plot summary of models saved to file: agModels-predictOccupation/SummaryOfModels.html
    Plot summary of models saved to file: agModels-predictOccupation/LightGBMClassifier_HPOmodelsummary.html
    Plot summary of models saved to file: LightGBMClassifier_HPOmodelsummary.html
    Plot of HPO performance saved to file: agModels-predictOccupation/LightGBMClassifier_HPOperformanceVStrials.png

.. figure:: output_tabular-indepth_108df8_7_1.png

.. parsed-literal::
    :class: output

    Plot summary of models saved to file: agModels-predictOccupation/NeuralNetClassifier_HPOmodelsummary.html
    Plot summary of models saved to file: NeuralNetClassifier_HPOmodelsummary.html
    Plot of HPO performance saved to file: agModels-predictOccupation/NeuralNetClassifier_HPOperformanceVStrials.png

.. figure:: output_tabular-indepth_108df8_7_3.png

.. parsed-literal::
    :class: output

    *** Details of Hyperparameter optimization ***
    HPO for LightGBMClassifier model: Num.
configurations tried = 5, Time spent = 43.257811307907104, Search strategy = skopt Best hyperparameter-configuration (validation-performance: accuracy = -0.7163718634306869): {'feature_fraction': 0.879600073116919, 'learning_rate': 0.015098990505197188, 'min_data_in_leaf': 14, 'num_leaves': 60} HPO for NeuralNetClassifier model: Num. configurations tried = 5, Time spent = 54.27782368659973, Search strategy = skopt Best hyperparameter-configuration (validation-performance: accuracy = 0.2816742081447964): {'activation.choice': 2, 'dropout_prob': 0.30984082610744046, 'embedding_size_factor': 0.9844052303126255, 'layers.choice': 3, 'learning_rate': 0.0009630384401533714, 'network_type.choice': 1, 'use_batchnorm.choice': 0, 'weight_decay': 4.663554649835668e-06} *** End of fit() summary *** In the above example, the predictive performance may be poor because we specified very little training to ensure quick runtimes. You can call ``fit()`` multiple times while modifying the above settings to better understand how these choices affect performance outcomes. For example: you can comment out the ``train_data.head`` command to train using a larger dataset, increase the ``time_limits``, and increase the ``num_epochs`` and ``num_boost_round`` hyperparameters. To see more detailed output during the execution of ``fit()``, you can also pass in the argument: ``verbosity = 3``. Specifying performance metrics ------------------------------ Performance in certain applications may be measured by different metrics than the ones AutoGluon optimizes for by default. If you know the metric that counts most in your application, you can specify it as done below to utilize the balanced accuracy metric instead of standard accuracy (the default): .. code:: python metric = 'balanced_accuracy' predictor = task.fit(train_data=train_data, label=label_column, eval_metric=metric, output_directory=output_directory, time_limits=60) performance = predictor.evaluate(val_data) .. parsed-literal:: :class: output Beginning AutoGluon training ... Time limit = 60s AutoGluon will save models to agModels-predictOccupation/ AutoGluon Version: 0.0.12b20200713 Train Data Rows: 500 Train Data Columns: 15 Preprocessing data ... Here are the first 10 unique label values in your data: [' Tech-support', ' Transport-moving', ' Other-service', ' ?', ' Handlers-cleaners', ' Sales', ' Craft-repair', ' Adm-clerical', ' Exec-managerial', ' Prof-specialty'] AutoGluon infers your prediction problem is: multiclass (because dtype of label-column == object). If this is wrong, please specify `problem_type` argument in fit() instead (You may specify problem_type as one of: ['binary', 'multiclass', 'regression']) Warning: Some classes in the training set have fewer than 10 examples. AutoGluon will only keep 13 out of 14 classes for training and will not try to predict the rare classes. To keep more classes, increase the number of datapoints from these rare classes in the training data or reduce label_count_threshold. Fraction of data from classes with at least 10 examples that will be kept for training models: 0.998 Train Data Class Count: 13 Feature Generator processed 499 data points with 14 features Original Features (raw dtypes): int64 features: 6 object features: 8 Original Features (inferred dtypes): int features: 6 object features: 8 Generated Features (special dtypes): Final Features (raw dtypes): int features: 6 category features: 8 Final Features: int features: 6 category features: 8 Data preprocessing and feature engineering runtime = 0.06s ... 
AutoGluon will gauge predictive performance using evaluation metric: balanced_accuracy To change this, specify the eval_metric argument of fit() AutoGluon will early stop models using evaluation metric: balanced_accuracy Fitting model: RandomForestClassifierGini ... Training model for up to 59.94s of the 59.94s of remaining time. 0.256 = Validation balanced_accuracy score 0.61s = Training runtime 0.11s = Validation runtime Fitting model: RandomForestClassifierEntr ... Training model for up to 59.19s of the 59.19s of remaining time. 0.2462 = Validation balanced_accuracy score 0.61s = Training runtime 0.11s = Validation runtime Fitting model: ExtraTreesClassifierGini ... Training model for up to 58.45s of the 58.45s of remaining time. 0.2088 = Validation balanced_accuracy score 0.5s = Training runtime 0.11s = Validation runtime Fitting model: ExtraTreesClassifierEntr ... Training model for up to 57.79s of the 57.79s of remaining time. 0.2069 = Validation balanced_accuracy score 0.5s = Training runtime 0.11s = Validation runtime Fitting model: KNeighborsClassifierUnif ... Training model for up to 57.14s of the 57.14s of remaining time. 0.0902 = Validation balanced_accuracy score 0.01s = Training runtime 0.11s = Validation runtime Fitting model: KNeighborsClassifierDist ... Training model for up to 57.02s of the 57.02s of remaining time. 0.1136 = Validation balanced_accuracy score 0.01s = Training runtime 0.11s = Validation runtime Fitting model: LightGBMClassifier ... Training model for up to 56.91s of the 56.91s of remaining time. 0.2171 = Validation balanced_accuracy score 7.08s = Training runtime 0.01s = Validation runtime Fitting model: CatboostClassifier ... Training model for up to 49.81s of the 49.81s of remaining time. 0.2852 = Validation balanced_accuracy score 5.57s = Training runtime 0.01s = Validation runtime Fitting model: NeuralNetClassifier ... Training model for up to 44.22s of the 44.22s of remaining time. 0.1565 = Validation balanced_accuracy score 3.4s = Training runtime 0.03s = Validation runtime Fitting model: LightGBMClassifierCustom ... Training model for up to 40.78s of the 40.78s of remaining time. Ran out of time, early stopping on iteration 145. Best iteration is: [133] train_set's multi_logloss: 0.0993952 train_set's balanced_accuracy: 1 valid_set's multi_logloss: 2.73162 valid_set's balanced_accuracy: 0.195132 0.1951 = Validation balanced_accuracy score 41.38s = Training runtime 0.02s = Validation runtime Fitting model: weighted_ensemble_k0_l1 ... Training model for up to 59.94s of the -1.99s of remaining time. 0.3144 = Validation balanced_accuracy score 0.53s = Training runtime 0.0s = Validation runtime AutoGluon training complete, total runtime = 62.54s ... .. parsed-literal:: :class: output Predictive performance on given dataset: balanced_accuracy = 0.24542813882862827 Some other non-default metrics you might use include things like: ``f1`` (for binary classification), ``roc_auc`` (for binary classification), ``log_loss`` (for classification), ``mean_absolute_error`` (for regression), ``median_absolute_error`` (for regression). You can also define your own custom metric function, see examples in the folder: ``autogluon/utils/tabular/metrics/`` Model ensembling with stacking/bagging -------------------------------------- Beyond hyperparameter-tuning with a correctly-specified evaluation metric, two other methods to boost predictive performance are bagging and stack-ensembling. 
You'll often see performance improve if you specify ``num_bagging_folds`` = 5-10 and ``stack_ensemble_levels`` = 1-3 in the call to ``fit()``, but this will increase training times.

.. code:: python

    predictor = task.fit(train_data=train_data, label=label_column, eval_metric=metric,
                         num_bagging_folds=5, stack_ensemble_levels=1,
                         hyperparameters={'NN': {'num_epochs': 5}, 'GBM': {'num_boost_round': 100}})

.. parsed-literal::
    :class: output

    No output_directory specified. Models will be saved in: AutogluonModels/ag-20200713_072410/
    Beginning AutoGluon training ...
    AutoGluon will save models to AutogluonModels/ag-20200713_072410/
    AutoGluon Version: 0.0.12b20200713
    Train Data Rows: 500
    Train Data Columns: 15
    Preprocessing data ...
    Here are the first 10 unique label values in your data: [' Tech-support', ' Transport-moving', ' Other-service', ' ?', ' Handlers-cleaners', ' Sales', ' Craft-repair', ' Adm-clerical', ' Exec-managerial', ' Prof-specialty']
    AutoGluon infers your prediction problem is: multiclass (because dtype of label-column == object).
    If this is wrong, please specify `problem_type` argument in fit() instead (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
    Warning: Some classes in the training set have fewer than 10 examples. AutoGluon will only keep 13 out of 14 classes for training and will not try to predict the rare classes.
    To keep more classes, increase the number of datapoints from these rare classes in the training data or reduce label_count_threshold.
    Fraction of data from classes with at least 10 examples that will be kept for training models: 0.998
    Train Data Class Count: 13
    Feature Generator processed 499 data points with 14 features
    Original Features (raw dtypes):
        int64 features: 6
        object features: 8
    Original Features (inferred dtypes):
        int features: 6
        object features: 8
    Generated Features (special dtypes):
    Final Features (raw dtypes):
        int features: 6
        category features: 8
    Final Features:
        int features: 6
        category features: 8
    Data preprocessing and feature engineering runtime = 0.07s ...
    AutoGluon will gauge predictive performance using evaluation metric: balanced_accuracy
    To change this, specify the eval_metric argument of fit()
    AutoGluon will early stop models using evaluation metric: balanced_accuracy
    Fitting model: LightGBMClassifier_STACKER_l0 ...
        0.2122 = Validation balanced_accuracy score
        21.93s = Training runtime
        0.05s = Validation runtime
    Fitting model: NeuralNetClassifier_STACKER_l0 ...
        0.0996 = Validation balanced_accuracy score
        2.98s = Training runtime
        0.16s = Validation runtime
    Fitting model: weighted_ensemble_k0_l1 ...
        0.2122 = Validation balanced_accuracy score
        0.18s = Training runtime
        0.0s = Validation runtime
    Fitting model: LightGBMClassifier_STACKER_l1 ...
        0.2154 = Validation balanced_accuracy score
        23.24s = Training runtime
        0.07s = Validation runtime
    Fitting model: NeuralNetClassifier_STACKER_l1 ...
        0.1014 = Validation balanced_accuracy score
        3.13s = Training runtime
        0.2s = Validation runtime
    Fitting model: weighted_ensemble_k0_l2 ...
        0.2154 = Validation balanced_accuracy score
        0.19s = Training runtime
        0.0s = Validation runtime
    AutoGluon training complete, total runtime = 52.49s ...

You should not provide ``tuning_data`` when stacking/bagging; instead, provide all of your available data as ``train_data`` (which AutoGluon will split in more intelligent ways). Rather than manually searching for good bagging/stacking values yourself, AutoGluon will automatically select good values for you if you specify ``auto_stack`` instead:

..
code:: python predictor = task.fit(train_data=train_data, label=label_column, eval_metric=metric, auto_stack=True, hyperparameters = {'NN':{'num_epochs':5}, 'GBM':{'num_boost_round':100}}, time_limits = 60) # last 2 arguments are just for quick demo, should be omitted .. parsed-literal:: :class: output No output_directory specified. Models will be saved in: AutogluonModels/ag-20200713_072503/ Beginning AutoGluon training ... Time limit = 60s AutoGluon will save models to AutogluonModels/ag-20200713_072503/ AutoGluon Version: 0.0.12b20200713 Train Data Rows: 500 Train Data Columns: 15 Preprocessing data ... Here are the first 10 unique label values in your data: [' Tech-support', ' Transport-moving', ' Other-service', ' ?', ' Handlers-cleaners', ' Sales', ' Craft-repair', ' Adm-clerical', ' Exec-managerial', ' Prof-specialty'] AutoGluon infers your prediction problem is: multiclass (because dtype of label-column == object). If this is wrong, please specify `problem_type` argument in fit() instead (You may specify problem_type as one of: ['binary', 'multiclass', 'regression']) Warning: Some classes in the training set have fewer than 10 examples. AutoGluon will only keep 13 out of 14 classes for training and will not try to predict the rare classes. To keep more classes, increase the number of datapoints from these rare classes in the training data or reduce label_count_threshold. Fraction of data from classes with at least 10 examples that will be kept for training models: 0.998 Train Data Class Count: 13 Feature Generator processed 499 data points with 14 features Original Features (raw dtypes): int64 features: 6 object features: 8 Original Features (inferred dtypes): int features: 6 object features: 8 Generated Features (special dtypes): Final Features (raw dtypes): int features: 6 category features: 8 Final Features: int features: 6 category features: 8 Data preprocessing and feature engineering runtime = 0.07s ... AutoGluon will gauge predictive performance using evaluation metric: balanced_accuracy To change this, specify the eval_metric argument of fit() AutoGluon will early stop models using evaluation metric: balanced_accuracy Fitting model: LightGBMClassifier_STACKER_l0 ... Training model for up to 59.93s of the 59.93s of remaining time. 0.2122 = Validation balanced_accuracy score 21.93s = Training runtime 0.05s = Validation runtime Fitting model: NeuralNetClassifier_STACKER_l0 ... Training model for up to 37.88s of the 37.88s of remaining time. 0.0825 = Validation balanced_accuracy score 3.0s = Training runtime 0.16s = Validation runtime Repeating k-fold bagging: 2/20 Fitting model: LightGBMClassifier_STACKER_l0 ... Training model for up to 34.69s of the 34.69s of remaining time. 0.2021 = Validation balanced_accuracy score 43.82s = Training runtime 0.11s = Validation runtime Fitting model: NeuralNetClassifier_STACKER_l0 ... Training model for up to 12.7s of the 12.7s of remaining time. 0.0816 = Validation balanced_accuracy score 6.02s = Training runtime 0.3s = Validation runtime Completed 2/20 k-fold bagging repeats ... Fitting model: weighted_ensemble_k0_l1 ... Training model for up to 59.93s of the 9.49s of remaining time. 0.2055 = Validation balanced_accuracy score 0.18s = Training runtime 0.0s = Validation runtime AutoGluon training complete, total runtime = 50.7s ... 
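Because no ``tuning_data`` was passed to this ``auto_stack`` call, the test file loaded at the start of the tutorial was never seen by ``fit()``. As a quick, unbiased sanity check, you could reuse the ``evaluate()`` call shown earlier (a sketch, not part of the original example):

.. code:: python

    # val_data was not used by this fit() call, so this estimate is not biased
    # by validation-based tuning or model selection on that data.
    performance = predictor.evaluate(val_data)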
Getting predictions (inference-time options)
--------------------------------------------

Even if you've started a new Python session since last calling ``fit()``, you can still load a previously trained predictor from disk:

.. code:: python

    predictor = task.load(output_directory)

Here, ``output_directory`` is the same folder previously passed to ``fit()``, in which all the trained models have been saved. You can easily train models on one machine and deploy them on another. Simply copy the ``output_directory`` folder to the new machine and specify its new path in ``task.load()``.

``predictor`` can make a prediction on an individual example rather than a full dataset:

.. code:: python

    datapoint = test_data.iloc[[0]]  # Note: .iloc[0] won't work because it returns pandas Series instead of DataFrame
    print(datapoint)
    print(predictor.predict(datapoint))

.. parsed-literal::
    :class: output

       age workclass  fnlwgt education  education-num      marital-status \
    0   31   Private  169085      11th              7  Married-civ-spouse

      relationship   race     sex  capital-gain  capital-loss  hours-per-week \
    0         Wife  White  Female             0             0              20

      native-country  class
    0  United-States  <=50K

    [' Other-service']

To output predicted class probabilities instead of predicted classes, you can use:

.. code:: python

    class_probs = predictor.predict_proba(datapoint)
    print(class_probs)

.. parsed-literal::
    :class: output

    [[0.02064052 0.14969262 0.06306401 0.08968029 0.02190279 0.04729904
      0.1106412  0.22255376 0.         0.08385979 0.02047405 0.09446129
      0.02806029 0.04767033]]

By default, ``predict()`` and ``predict_proba()`` will utilize the model that AutoGluon thinks is most accurate, which is usually an ensemble of many individual models. We can instead specify a particular model to use for predictions (e.g. to reduce inference latency). Before deciding which model to use, let's evaluate all of the models AutoGluon has previously trained using our validation dataset:

.. code:: python

    results = predictor.leaderboard(val_data)

.. parsed-literal::
    :class: output

        model                       score_test  score_val  pred_time_test  pred_time_val  fit_time   pred_time_test_marginal  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer
    0   weighted_ensemble_k0_l1     0.245428    0.314354   1.873537        0.382244       17.706696  0.011260                 0.000920                 0.533870           1            True
    1   CatboostClassifier          0.242090    0.285176   0.046611        0.011814       5.572967   0.046611                 0.011814                 5.572967           0            True
    2   RandomForestClassifierGini  0.239836    0.256020   0.229170        0.110779       0.607253   0.229170                 0.110779                 0.607253           0            True
    3   RandomForestClassifierEntr  0.235068    0.246162   0.229928        0.110748       0.608642   0.229928                 0.110748                 0.608642           0            True
    4   ExtraTreesClassifierEntr    0.232380    0.206911   0.241727        0.110790       0.502666   0.241727                 0.110790                 0.502666           0            True
    5   ExtraTreesClassifierGini    0.232062    0.208832   0.255994        0.110586       0.501962   0.255994                 0.110586                 0.501962           0            True
    6   LightGBMClassifier          0.196499    0.217144   0.030356        0.010706       7.080877   0.030356                 0.010706                 7.080877           0            True
    7   LightGBMClassifierCustom    0.178989    0.195132   0.543602        0.015779       41.375606  0.543602                 0.015779                 41.375606          0            True
    8   NeuralNetClassifier         0.159724    0.156478   1.204566        0.028949       3.401347   1.204566                 0.028949                 3.401347           0            True
    9   KNeighborsClassifierUnif    0.073979    0.090152   0.109848        0.108286       0.007716   0.109848                 0.108286                 0.007716           0            True
    10  KNeighborsClassifierDist    0.071566    0.113624   0.111167        0.108525       0.007238   0.111167                 0.108525                 0.007238           0            True

Here's how to specify a particular model to use for prediction instead of AutoGluon's default model-choice:

.. code:: python

    i = 0  # index of model to use
    model_to_use = predictor.model_names[i]
    model_pred = predictor.predict(datapoint, model=model_to_use)
    print("Prediction from %s model: %s" % (model_to_use, model_pred))

.. parsed-literal::
    :class: output

    WARNING: `predictor.model_names` is a deprecated `predictor` variable. Use `predictor.get_model_names()` instead. Use of `predictor.model_names` will result in an exception starting in autogluon==0.1

.. parsed-literal::
    :class: output

    Prediction from RandomForestClassifierGini model: [' Other-service']

The ``predictor`` also remembers which metric its predictions should be evaluated with; evaluation with ground-truth labels can be done as follows:

::

    y_pred = predictor.predict(test_data)
    predictor.evaluate_predictions(y_true=y_test, y_pred=y_pred, auxiliary_metrics=True)

However, you must be careful here, as certain metrics require predicted probabilities rather than classes. Since the label column remains in the ``val_data`` DataFrame, we can instead use the shorthand:

::

    predictor.evaluate(val_data)

which will correctly select between ``predict()`` and ``predict_proba()`` depending on the evaluation metric.

Maximizing predictive performance
---------------------------------

To get the best predictive accuracy with AutoGluon, you should generally use it like this:

.. code:: python

    long_time = 60  # for quick demonstration only; you should set this to the longest time you are willing to wait
    predictor = task.fit(train_data=train_data, label=label_column, eval_metric=metric,
                         auto_stack=True, time_limits=long_time)

.. parsed-literal::
    :class: output

    No output_directory specified. Models will be saved in: AutogluonModels/ag-20200713_072558/
    Beginning AutoGluon training ... Time limit = 60s
    AutoGluon will save models to AutogluonModels/ag-20200713_072558/
    AutoGluon Version: 0.0.12b20200713
    Train Data Rows: 500
    Train Data Columns: 15
    Preprocessing data ...
    Here are the first 10 unique label values in your data: [' Tech-support', ' Transport-moving', ' Other-service', ' ?', ' Handlers-cleaners', ' Sales', ' Craft-repair', ' Adm-clerical', ' Exec-managerial', ' Prof-specialty']
    AutoGluon infers your prediction problem is: multiclass (because dtype of label-column == object).
    If this is wrong, please specify `problem_type` argument in fit() instead (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
    Warning: Some classes in the training set have fewer than 10 examples. AutoGluon will only keep 13 out of 14 classes for training and will not try to predict the rare classes.
    To keep more classes, increase the number of datapoints from these rare classes in the training data or reduce label_count_threshold.
    Fraction of data from classes with at least 10 examples that will be kept for training models: 0.998
    Train Data Class Count: 13
    Feature Generator processed 499 data points with 14 features
    Original Features (raw dtypes):
        int64 features: 6
        object features: 8
    Original Features (inferred dtypes):
        int features: 6
        object features: 8
    Generated Features (special dtypes):
    Final Features (raw dtypes):
        int features: 6
        category features: 8
    Final Features:
        int features: 6
        category features: 8
    Data preprocessing and feature engineering runtime = 0.06s ...
    AutoGluon will gauge predictive performance using evaluation metric: balanced_accuracy
    To change this, specify the eval_metric argument of fit()
    AutoGluon will early stop models using evaluation metric: balanced_accuracy
    Fitting model: RandomForestClassifierGini_STACKER_l0 ...
Training model for up to 59.94s of the 59.94s of remaining time. 0.2257 = Validation balanced_accuracy score 3.04s = Training runtime 0.55s = Validation runtime Fitting model: RandomForestClassifierEntr_STACKER_l0 ... Training model for up to 56.21s of the 56.21s of remaining time. 0.2115 = Validation balanced_accuracy score 3.05s = Training runtime 0.55s = Validation runtime Fitting model: ExtraTreesClassifierGini_STACKER_l0 ... Training model for up to 52.48s of the 52.48s of remaining time. 0.2214 = Validation balanced_accuracy score 2.52s = Training runtime 0.55s = Validation runtime Fitting model: ExtraTreesClassifierEntr_STACKER_l0 ... Training model for up to 49.22s of the 49.22s of remaining time. 0.212 = Validation balanced_accuracy score 2.52s = Training runtime 0.55s = Validation runtime Fitting model: KNeighborsClassifierUnif_STACKER_l0 ... Training model for up to 45.96s of the 45.96s of remaining time. 0.0689 = Validation balanced_accuracy score 0.05s = Training runtime 0.55s = Validation runtime Fitting model: KNeighborsClassifierDist_STACKER_l0 ... Training model for up to 45.36s of the 45.36s of remaining time. 0.0708 = Validation balanced_accuracy score 0.05s = Training runtime 0.54s = Validation runtime Fitting model: LightGBMClassifier_STACKER_l0 ... Training model for up to 44.76s of the 44.76s of remaining time. Ran out of time, early stopping on iteration 165. Best iteration is: [25] train_set's multi_logloss: 1.29038 train_set's balanced_accuracy: 0.625395 valid_set's multi_logloss: 2.27283 valid_set's balanced_accuracy: 0.220647 Ran out of time, early stopping on iteration 174. Best iteration is: [82] train_set's multi_logloss: 0.478952 train_set's balanced_accuracy: 0.985162 valid_set's multi_logloss: 2.56851 valid_set's balanced_accuracy: 0.227756 Ran out of time, early stopping on iteration 185. Best iteration is: [37] train_set's multi_logloss: 1.00849 train_set's balanced_accuracy: 0.789105 valid_set's multi_logloss: 2.37467 valid_set's balanced_accuracy: 0.175384 0.2122 = Validation balanced_accuracy score 37.74s = Training runtime 0.05s = Validation runtime Fitting model: CatboostClassifier_STACKER_l0 ... Training model for up to 6.9s of the 6.9s of remaining time. 0.2677 = Validation balanced_accuracy score 6.52s = Training runtime 0.04s = Validation runtime Fitting model: NeuralNetClassifier_STACKER_l0 ... Training model for up to 0.32s of the 0.32s of remaining time. Ran out of time, stopping training early. Time limit exceeded... Skipping NeuralNetClassifier_STACKER_l0. Completed 1/20 k-fold bagging repeats ... Fitting model: weighted_ensemble_k0_l1 ... Training model for up to 59.94s of the -0.01s of remaining time. 0.2692 = Validation balanced_accuracy score 0.58s = Training runtime 0.0s = Validation runtime AutoGluon training complete, total runtime = 60.6s ... This command implements the following strategy to maximize accuracy: - Specify the ``auto_stack`` argument, which allows AutoGluon to automatically construct model ensembles based on multi-layer stack ensembling with repeated bagging, and will greatly improve the resulting predictions if granted sufficient training time. - Provide the ``eval_metric`` if you know what metric will be used to evaluate predictions in your application (e.g. ``roc_auc``, ``log_loss``, ``mean_absolute_error``, etc.) - Include all your data in ``train_data`` and do not provide ``tuning_data`` (AutoGluon will split the data more intelligently to fit its needs). 
- Do not specify the ``hyperparameter_tune`` argument (counterintuitively, hyperparameter tuning is not the best way to spend a limited training time budget, as model ensembling is often superior). We recommend you only use ``hyperparameter_tune`` if your goal is to deploy a single model rather than an ensemble.
- Do not specify the ``hyperparameters`` argument (allow AutoGluon to adaptively select which models/hyperparameters to use).
- Set ``time_limits`` to the longest amount of time (in seconds) that you are willing to wait. AutoGluon's predictive performance improves the longer ``fit()`` is allowed to run.
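Putting these recommendations together, a minimal end-to-end sketch might look like the following (the time budget is illustrative and the output folder name is hypothetical; all calls were introduced earlier in this tutorial):

.. code:: python

    save_path = 'agModels-maxAccuracy'  # hypothetical folder name for the trained models
    metric = 'balanced_accuracy'        # substitute whichever metric matters in your application

    predictor = task.fit(train_data=train_data, label=label_column, eval_metric=metric,
                         auto_stack=True, output_directory=save_path,
                         time_limits=4*3600)  # illustrative budget of 4 hours; use the longest you can afford

    # Later (possibly in a new session or on another machine), reload the predictor and
    # evaluate it on data that was never passed to fit() for unbiased performance estimates.
    predictor = task.load(save_path)
    performance = predictor.evaluate(val_data)
    leaderboard = predictor.leaderboard(val_data)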