.. _sec_tabularadvanced: Predicting Columns in a Table - In Depth ======================================== **Tip**: If you are new to AutoGluon, review :ref:`sec_tabularquick` to learn the basics of the AutoGluon API. This tutorial describes how you can exert greater control when using AutoGluon's ``fit()`` by specifying the appropriate arguments. Using the same census data table as :ref:`sec_tabularquick`, we will try to predict the ``occupation`` of an individual - a multi-class classification problem. Start by importing AutoGluon, specifying TabularPrediction as the task, and loading the data. .. code:: python import autogluon as ag from autogluon import TabularPrediction as task train_data = task.Dataset(file_path='https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv') train_data = train_data.head(500) # subsample 500 data points for faster demo (comment this out to run on full dataset instead) print(train_data.head()) val_data = task.Dataset(file_path='https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv') label_column = 'occupation' print("Summary of occupation column: \n", train_data['occupation'].describe()) .. parsed-literal:: :class: output Loaded data from: https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv | Columns = 15 / 15 | Rows = 39073 -> 39073 Loaded data from: https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv | Columns = 15 / 15 | Rows = 9769 -> 9769 .. parsed-literal:: :class: output age workclass fnlwgt education education-num marital-status \ 0 25 Private 178478 Bachelors 13 Never-married 1 23 State-gov 61743 5th-6th 3 Never-married 2 46 Private 376789 HS-grad 9 Never-married 3 55 ? 200235 HS-grad 9 Married-civ-spouse 4 36 Private 224541 7th-8th 4 Married-civ-spouse occupation relationship race sex capital-gain \ 0 Tech-support Own-child White Female 0 1 Transport-moving Not-in-family White Male 0 2 Other-service Not-in-family White Male 0 3 ? Husband White Male 0 4 Handlers-cleaners Husband White Male 0 capital-loss hours-per-week native-country class 0 0 40 United-States <=50K 1 0 35 United-States <=50K 2 0 15 United-States <=50K 3 0 50 United-States >50K 4 0 40 El-Salvador <=50K Summary of occupation column: count 500 unique 14 top Exec-managerial freq 69 Name: occupation, dtype: object To demonstrate how you can provide your own validation dataset against which AutoGluon tunes hyperparameters, we'll use the test dataset from the previous tutorial as validation data. If you don't have a strong reason to provide your own validation dataset, we recommend you omit the ``tuning_data`` argument. This lets AutoGluon automatically select validation data from your provided training set (it uses smart strategies such as stratified sampling). For greater control, you can specify the ``holdout_frac`` argument to tell AutoGluon what fraction of the provided training data to hold out for validation. **Caution:** Since AutoGluon tunes internal knobs based on this validation data, performance estimates reported on this data may be over-optimistic. For unbiased performance estimates, you should always call ``predict()`` on a separate dataset (that was never passed to ``fit()``), as we did in the previous **Quick-Start** tutorial. We also emphasize that most options specified in this tutorial are chosen to minimize runtime for the purposes of demonstration and you should select more reasonable values in order to obtain high-quality models. ``fit()`` trains neural networks and various types of tree ensembles by default. 
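Before customizing individual models, here is a minimal sketch (not part of the original example) of the ``holdout_frac`` option mentioned above; the ``0.2`` value and the folder name are arbitrary illustrations, and this option is only relevant when you do not pass ``tuning_data``:

.. code:: python

    # Hedged illustration: let AutoGluon carve out its own validation split from train_data.
    # holdout_frac=0.2 is an arbitrary example value; omit it to let AutoGluon choose automatically.
    predictor_auto_val = task.fit(train_data=train_data, label=label_column,
                                  holdout_frac=0.2,  # hold out 20% of train_data for internal validation
                                  output_directory='agModels-holdoutDemo')  # hypothetical folder name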
You can specify various hyperparameter values for each type of model. For each hyperparameter, you can either specify a single fixed value or a search space of values to consider during hyperparameter optimization. Hyperparameters that you do not specify are left at default settings chosen automatically by AutoGluon, which may be fixed values or search spaces.

.. code:: python

    hp_tune = True  # whether or not to do hyperparameter optimization

    nn_options = {  # specifies non-default hyperparameter values for neural network models
        'num_epochs': 10,  # number of training epochs (controls training time of NN models)
        'learning_rate': ag.space.Real(1e-4, 1e-2, default=5e-4, log=True),  # learning rate used in training (real-valued hyperparameter searched on log-scale)
        'activation': ag.space.Categorical('relu', 'softrelu', 'tanh'),  # activation function used in NN (categorical hyperparameter, default = first entry)
        'layers': ag.space.Categorical([100], [1000], [200, 100], [300, 200, 100]),  # each choice for categorical hyperparameter 'layers' corresponds to list of sizes for each NN layer to use
        'dropout_prob': ag.space.Real(0.0, 0.5, default=0.1),  # dropout probability (real-valued hyperparameter)
    }

    gbm_options = {  # specifies non-default hyperparameter values for lightGBM gradient boosted trees
        'num_boost_round': 100,  # number of boosting rounds (controls training time of GBM models)
        'num_leaves': ag.space.Int(lower=26, upper=66, default=36),  # number of leaves in trees (integer hyperparameter)
    }

    hyperparameters = {'NN': nn_options, 'GBM': gbm_options}  # hyperparameters of each model type
    # If one of these keys is missing from the hyperparameters dict, then no models of that type are trained.

    time_limits = 2*60  # train various models for ~2 min
    num_trials = 5  # try at most 5 different hyperparameter configurations for each type of model
    search_strategy = 'skopt'  # to tune hyperparameters using SKopt Bayesian optimization routine

    output_directory = 'agModels-predictOccupation'  # folder where to store trained models

    predictor = task.fit(train_data=train_data, tuning_data=val_data, label=label_column,
                         output_directory=output_directory, time_limits=time_limits,
                         num_trials=num_trials, hyperparameter_tune=hp_tune,
                         hyperparameters=hyperparameters, search_strategy=search_strategy)

.. parsed-literal::
    :class: output

    Warning: `hyperparameter_tune=True` is currently experimental and may cause the process to hang. Setting `auto_stack=True` instead is recommended to achieve maximum quality models.
    Beginning AutoGluon training ... Time limit = 120s
    AutoGluon will save models to agModels-predictOccupation/
    AutoGluon Version: 0.0.12b20200713
    Train Data Rows: 500
    Train Data Columns: 15
    Tuning Data Rows: 9769
    Tuning Data Columns: 15
    Preprocessing data ...
    Here are the first 10 unique label values in your data: [' Tech-support', ' Transport-moving', ' Other-service', ' ?', ' Handlers-cleaners', ' Sales', ' Craft-repair', ' Adm-clerical', ' Exec-managerial', ' Prof-specialty']
    AutoGluon infers your prediction problem is: multiclass (because dtype of label-column == object).
    If this is wrong, please specify `problem_type` argument in fit() instead (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
    Warning: Some classes in the training set have fewer than 10 examples. AutoGluon will only keep 13 out of 14 classes for training and will not try to predict the rare classes.
To keep more classes, increase the number of datapoints from these rare classes in the training data or reduce label_count_threshold. Fraction of data from classes with at least 10 examples that will be kept for training models: 0.998 Train Data Class Count: 13 Feature Generator processed 10223 data points with 14 features Original Features (raw dtypes): int64 features: 6 object features: 8 Original Features (inferred dtypes): int features: 6 object features: 8 Generated Features (special dtypes): Final Features (raw dtypes): int features: 6 category features: 8 Final Features: int features: 6 category features: 8 Data preprocessing and feature engineering runtime = 0.11s ... AutoGluon will gauge predictive performance using evaluation metric: accuracy To change this, specify the eval_metric argument of fit() AutoGluon will early stop models using evaluation metric: accuracy scheduler_options: Key 'training_history_callback_delta_secs': Imputing default value 60 scheduler_options: Key 'delay_get_config': Imputing default value True Starting Experiments Num of Finished Tasks is 0 Num of Pending Tasks is 5 .. parsed-literal:: :class: output HBox(children=(FloatProgress(value=0.0, max=5.0), HTML(value=''))) .. parsed-literal:: :class: output Time out (secs) is 54.0 .. parsed-literal:: :class: output .. parsed-literal:: :class: output 0.2866 = Validation accuracy score 6.68s = Training runtime 0.04s = Validation runtime 0.2738 = Validation accuracy score 14.1s = Training runtime 0.04s = Validation runtime 0.2919 = Validation accuracy score 5.93s = Training runtime 0.06s = Validation runtime 0.2849 = Validation accuracy score 9.18s = Training runtime 0.26s = Validation runtime 0.2984 = Validation accuracy score 6.09s = Training runtime 0.06s = Validation runtime scheduler_options: Key 'training_history_callback_delta_secs': Imputing default value 60 scheduler_options: Key 'delay_get_config': Imputing default value True Starting Experiments Num of Finished Tasks is 0 Num of Pending Tasks is 5 .. parsed-literal:: :class: output HBox(children=(FloatProgress(value=0.0, max=5.0), HTML(value=''))) .. parsed-literal:: :class: output Time out (secs) is 54.0 .. parsed-literal:: :class: output .. parsed-literal:: :class: output Please either provide filename or allow plot in get_training_curves 0.1223 = Validation accuracy score 9.6s = Training runtime 0.84s = Validation runtime 0.0994 = Validation accuracy score 9.92s = Training runtime 0.86s = Validation runtime 0.1344 = Validation accuracy score 9.49s = Training runtime 0.84s = Validation runtime 0.2432 = Validation accuracy score 9.94s = Training runtime 0.78s = Validation runtime 0.2817 = Validation accuracy score 9.76s = Training runtime 0.78s = Validation runtime Fitting model: weighted_ensemble_k0_l1 ... Training model for up to 119.89s of the 15.45s of remaining time. 0.3102 = Validation accuracy score 2.58s = Training runtime 0.0s = Validation runtime AutoGluon training complete, total runtime = 107.17s ... .. figure:: output_tabular-indepth_108df8_3_9.png We again demonstrate how to use the trained models to predict on the validation data (We caution again that performance estimates here are biased because the same data was used to tune hyperparameters). .. 
code:: python

    test_data = val_data.copy()
    y_test = test_data[label_column]
    test_data = test_data.drop(labels=[label_column], axis=1)  # delete label column
    y_pred = predictor.predict(test_data)
    print("Predictions: ", list(y_pred)[:5])
    perf = predictor.evaluate_predictions(y_true=y_test, y_pred=y_pred, auxiliary_metrics=False)

.. parsed-literal::
    :class: output

    Evaluation: accuracy on test data: 0.3087317023236769

.. parsed-literal::
    :class: output

    Predictions: [' Other-service', ' ?', ' Exec-managerial', ' Sales', ' Other-service']

Use the following to view a summary of what happened during ``fit()``. This command shows details of the hyperparameter-tuning process for each type of model:

.. code:: python

    results = predictor.fit_summary()

.. parsed-literal::
    :class: output

    *** Summary of fit() ***
    Estimated performance of each model:
        model                        score_val  pred_time_val  fit_time   pred_time_val_marginal  fit_time_marginal  stack_level  can_infer
    0   weighted_ensemble_k0_l1      0.310160   3.446678       74.166373  0.003475                2.577527           1            True
    1   LightGBMClassifier/trial_4   0.298437   0.059784       6.093583   0.059784                6.093583           0            True
    2   LightGBMClassifier/trial_2   0.291855   0.058525       5.929050   0.058525                5.929050           0            True
    3   LightGBMClassifier/trial_0   0.286610   0.044747       6.675452   0.044747                6.675452           0            True
    4   LightGBMClassifier/trial_3   0.284862   0.259758       9.181610   0.259758                9.181610           0            True
    5   NeuralNetClassifier/trial_9  0.281674   0.775697       9.759481   0.775697                9.759481           0            True
    6   LightGBMClassifier/trial_1   0.273756   0.039599       14.100856  0.039599                14.100856          0            True
    7   NeuralNetClassifier/trial_8  0.243213   0.779406       9.939281   0.779406                9.939281           0            True
    8   NeuralNetClassifier/trial_7  0.134410   0.844418       9.492824   0.844418                9.492824           0            True
    9   NeuralNetClassifier/trial_5  0.122275   0.841027       9.598320   0.841027                9.598320           0            True
    10  NeuralNetClassifier/trial_6  0.099445   0.858345       9.923985   0.858345                9.923985           0            True
    Number of models trained: 11
    Types of models trained: {'LGBModel', 'TabularNeuralNetModel', 'WeightedEnsembleModel'}
    Bagging used: False
    Stack-ensembling used: False
    Hyperparameter-tuning used: True
    User-specified hyperparameters: {'default': {'NN': [{'num_epochs': 10, 'learning_rate': Real: lower=0.0001, upper=0.01, 'activation': Categorical['relu', 'softrelu', 'tanh'], 'layers': Categorical[[100], [1000], [200, 100], [300, 200, 100]], 'dropout_prob': Real: lower=0.0, upper=0.5}], 'GBM': [{'num_boost_round': 100, 'num_leaves': Int: lower=26, upper=66}]}}
    Plot summary of models saved to file: agModels-predictOccupation/SummaryOfModels.html
    Plot summary of models saved to file: agModels-predictOccupation/LightGBMClassifier_HPOmodelsummary.html
    Plot summary of models saved to file: LightGBMClassifier_HPOmodelsummary.html
    Plot of HPO performance saved to file: agModels-predictOccupation/LightGBMClassifier_HPOperformanceVStrials.png

.. figure:: output_tabular-indepth_108df8_7_1.png

.. parsed-literal::
    :class: output

    Plot summary of models saved to file: agModels-predictOccupation/NeuralNetClassifier_HPOmodelsummary.html
    Plot summary of models saved to file: NeuralNetClassifier_HPOmodelsummary.html
    Plot of HPO performance saved to file: agModels-predictOccupation/NeuralNetClassifier_HPOperformanceVStrials.png

.. figure:: output_tabular-indepth_108df8_7_3.png

.. parsed-literal::
    :class: output

    *** Details of Hyperparameter optimization ***
    HPO for LightGBMClassifier model: Num.
configurations tried = 5, Time spent = 43.257811307907104, Search strategy = skopt Best hyperparameter-configuration (validation-performance: accuracy = -0.7163718634306869): {'feature_fraction': 0.879600073116919, 'learning_rate': 0.015098990505197188, 'min_data_in_leaf': 14, 'num_leaves': 60} HPO for NeuralNetClassifier model: Num. configurations tried = 5, Time spent = 54.27782368659973, Search strategy = skopt Best hyperparameter-configuration (validation-performance: accuracy = 0.2816742081447964): {'activation.choice': 2, 'dropout_prob': 0.30984082610744046, 'embedding_size_factor': 0.9844052303126255, 'layers.choice': 3, 'learning_rate': 0.0009630384401533714, 'network_type.choice': 1, 'use_batchnorm.choice': 0, 'weight_decay': 4.663554649835668e-06} *** End of fit() summary *** In the above example, the predictive performance may be poor because we specified very little training to ensure quick runtimes. You can call ``fit()`` multiple times while modifying the above settings to better understand how these choices affect performance outcomes. For example: you can comment out the ``train_data.head`` command to train using a larger dataset, increase the ``time_limits``, and increase the ``num_epochs`` and ``num_boost_round`` hyperparameters. To see more detailed output during the execution of ``fit()``, you can also pass in the argument: ``verbosity = 3``. Specifying performance metrics ------------------------------ Performance in certain applications may be measured by different metrics than the ones AutoGluon optimizes for by default. If you know the metric that counts most in your application, you can specify it as done below to utilize the balanced accuracy metric instead of standard accuracy (the default): .. code:: python metric = 'balanced_accuracy' predictor = task.fit(train_data=train_data, label=label_column, eval_metric=metric, output_directory=output_directory, time_limits=60) performance = predictor.evaluate(val_data) .. parsed-literal:: :class: output Beginning AutoGluon training ... Time limit = 60s AutoGluon will save models to agModels-predictOccupation/ AutoGluon Version: 0.0.12b20200713 Train Data Rows: 500 Train Data Columns: 15 Preprocessing data ... Here are the first 10 unique label values in your data: [' Tech-support', ' Transport-moving', ' Other-service', ' ?', ' Handlers-cleaners', ' Sales', ' Craft-repair', ' Adm-clerical', ' Exec-managerial', ' Prof-specialty'] AutoGluon infers your prediction problem is: multiclass (because dtype of label-column == object). If this is wrong, please specify `problem_type` argument in fit() instead (You may specify problem_type as one of: ['binary', 'multiclass', 'regression']) Warning: Some classes in the training set have fewer than 10 examples. AutoGluon will only keep 13 out of 14 classes for training and will not try to predict the rare classes. To keep more classes, increase the number of datapoints from these rare classes in the training data or reduce label_count_threshold. Fraction of data from classes with at least 10 examples that will be kept for training models: 0.998 Train Data Class Count: 13 Feature Generator processed 499 data points with 14 features Original Features (raw dtypes): int64 features: 6 object features: 8 Original Features (inferred dtypes): int features: 6 object features: 8 Generated Features (special dtypes): Final Features (raw dtypes): int features: 6 category features: 8 Final Features: int features: 6 category features: 8 Data preprocessing and feature engineering runtime = 0.06s ... 
AutoGluon will gauge predictive performance using evaluation metric: balanced_accuracy To change this, specify the eval_metric argument of fit() AutoGluon will early stop models using evaluation metric: balanced_accuracy Fitting model: RandomForestClassifierGini ... Training model for up to 59.94s of the 59.94s of remaining time. 0.256 = Validation balanced_accuracy score 0.61s = Training runtime 0.11s = Validation runtime Fitting model: RandomForestClassifierEntr ... Training model for up to 59.19s of the 59.19s of remaining time. 0.2462 = Validation balanced_accuracy score 0.61s = Training runtime 0.11s = Validation runtime Fitting model: ExtraTreesClassifierGini ... Training model for up to 58.45s of the 58.45s of remaining time. 0.2088 = Validation balanced_accuracy score 0.5s = Training runtime 0.11s = Validation runtime Fitting model: ExtraTreesClassifierEntr ... Training model for up to 57.79s of the 57.79s of remaining time. 0.2069 = Validation balanced_accuracy score 0.5s = Training runtime 0.11s = Validation runtime Fitting model: KNeighborsClassifierUnif ... Training model for up to 57.14s of the 57.14s of remaining time. 0.0902 = Validation balanced_accuracy score 0.01s = Training runtime 0.11s = Validation runtime Fitting model: KNeighborsClassifierDist ... Training model for up to 57.02s of the 57.02s of remaining time. 0.1136 = Validation balanced_accuracy score 0.01s = Training runtime 0.11s = Validation runtime Fitting model: LightGBMClassifier ... Training model for up to 56.91s of the 56.91s of remaining time. 0.2171 = Validation balanced_accuracy score 7.08s = Training runtime 0.01s = Validation runtime Fitting model: CatboostClassifier ... Training model for up to 49.81s of the 49.81s of remaining time. 0.2852 = Validation balanced_accuracy score 5.57s = Training runtime 0.01s = Validation runtime Fitting model: NeuralNetClassifier ... Training model for up to 44.22s of the 44.22s of remaining time. 0.1565 = Validation balanced_accuracy score 3.4s = Training runtime 0.03s = Validation runtime Fitting model: LightGBMClassifierCustom ... Training model for up to 40.78s of the 40.78s of remaining time. Ran out of time, early stopping on iteration 145. Best iteration is: [133] train_set's multi_logloss: 0.0993952 train_set's balanced_accuracy: 1 valid_set's multi_logloss: 2.73162 valid_set's balanced_accuracy: 0.195132 0.1951 = Validation balanced_accuracy score 41.38s = Training runtime 0.02s = Validation runtime Fitting model: weighted_ensemble_k0_l1 ... Training model for up to 59.94s of the -1.99s of remaining time. 0.3144 = Validation balanced_accuracy score 0.53s = Training runtime 0.0s = Validation runtime AutoGluon training complete, total runtime = 62.54s ... .. parsed-literal:: :class: output Predictive performance on given dataset: balanced_accuracy = 0.24542813882862827 Some other non-default metrics you might use include things like: ``f1`` (for binary classification), ``roc_auc`` (for binary classification), ``log_loss`` (for classification), ``mean_absolute_error`` (for regression), ``median_absolute_error`` (for regression). You can also define your own custom metric function, see examples in the folder: ``autogluon/utils/tabular/metrics/`` Model ensembling with stacking/bagging -------------------------------------- Beyond hyperparameter-tuning with a correctly-specified evaluation metric, two other methods to boost predictive performance are bagging and stack-ensembling. 
You'll often see performance improve if you specify ``num_bagging_folds`` = 5-10 and ``stack_ensemble_levels`` = 1-3 in the call to ``fit()``, but this will increase training times.

.. code:: python

    predictor = task.fit(train_data=train_data, label=label_column, eval_metric=metric,
                         num_bagging_folds=5, stack_ensemble_levels=1,
                         hyperparameters={'NN': {'num_epochs': 5}, 'GBM': {'num_boost_round': 100}})

.. parsed-literal::
    :class: output

    No output_directory specified. Models will be saved in: AutogluonModels/ag-20200713_072410/
    Beginning AutoGluon training ...
    AutoGluon will save models to AutogluonModels/ag-20200713_072410/
    AutoGluon Version: 0.0.12b20200713
    Train Data Rows: 500
    Train Data Columns: 15
    Preprocessing data ...
    Here are the first 10 unique label values in your data: [' Tech-support', ' Transport-moving', ' Other-service', ' ?', ' Handlers-cleaners', ' Sales', ' Craft-repair', ' Adm-clerical', ' Exec-managerial', ' Prof-specialty']
    AutoGluon infers your prediction problem is: multiclass (because dtype of label-column == object).
    If this is wrong, please specify `problem_type` argument in fit() instead (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
    Warning: Some classes in the training set have fewer than 10 examples. AutoGluon will only keep 13 out of 14 classes for training and will not try to predict the rare classes.
    To keep more classes, increase the number of datapoints from these rare classes in the training data or reduce label_count_threshold.
    Fraction of data from classes with at least 10 examples that will be kept for training models: 0.998
    Train Data Class Count: 13
    Feature Generator processed 499 data points with 14 features
    Original Features (raw dtypes):
        int64 features: 6
        object features: 8
    Original Features (inferred dtypes):
        int features: 6
        object features: 8
    Generated Features (special dtypes):
    Final Features (raw dtypes):
        int features: 6
        category features: 8
    Final Features:
        int features: 6
        category features: 8
    Data preprocessing and feature engineering runtime = 0.07s ...
    AutoGluon will gauge predictive performance using evaluation metric: balanced_accuracy
    To change this, specify the eval_metric argument of fit()
    AutoGluon will early stop models using evaluation metric: balanced_accuracy
    Fitting model: LightGBMClassifier_STACKER_l0 ...
        0.2122 = Validation balanced_accuracy score
        21.93s = Training runtime
        0.05s = Validation runtime
    Fitting model: NeuralNetClassifier_STACKER_l0 ...
        0.0996 = Validation balanced_accuracy score
        2.98s = Training runtime
        0.16s = Validation runtime
    Fitting model: weighted_ensemble_k0_l1 ...
        0.2122 = Validation balanced_accuracy score
        0.18s = Training runtime
        0.0s = Validation runtime
    Fitting model: LightGBMClassifier_STACKER_l1 ...
        0.2154 = Validation balanced_accuracy score
        23.24s = Training runtime
        0.07s = Validation runtime
    Fitting model: NeuralNetClassifier_STACKER_l1 ...
        0.1014 = Validation balanced_accuracy score
        3.13s = Training runtime
        0.2s = Validation runtime
    Fitting model: weighted_ensemble_k0_l2 ...
        0.2154 = Validation balanced_accuracy score
        0.19s = Training runtime
        0.0s = Validation runtime
    AutoGluon training complete, total runtime = 52.49s ...

You should not provide ``tuning_data`` when stacking/bagging; instead, provide all of your available data as ``train_data`` (which AutoGluon will split in more intelligent ways). Rather than manually searching for good bagging/stacking values yourself, AutoGluon will automatically select good values for you if you specify ``auto_stack`` instead:

..
code:: python predictor = task.fit(train_data=train_data, label=label_column, eval_metric=metric, auto_stack=True, hyperparameters = {'NN':{'num_epochs':5}, 'GBM':{'num_boost_round':100}}, time_limits = 60) # last 2 arguments are just for quick demo, should be omitted .. parsed-literal:: :class: output No output_directory specified. Models will be saved in: AutogluonModels/ag-20200713_072503/ Beginning AutoGluon training ... Time limit = 60s AutoGluon will save models to AutogluonModels/ag-20200713_072503/ AutoGluon Version: 0.0.12b20200713 Train Data Rows: 500 Train Data Columns: 15 Preprocessing data ... Here are the first 10 unique label values in your data: [' Tech-support', ' Transport-moving', ' Other-service', ' ?', ' Handlers-cleaners', ' Sales', ' Craft-repair', ' Adm-clerical', ' Exec-managerial', ' Prof-specialty'] AutoGluon infers your prediction problem is: multiclass (because dtype of label-column == object). If this is wrong, please specify `problem_type` argument in fit() instead (You may specify problem_type as one of: ['binary', 'multiclass', 'regression']) Warning: Some classes in the training set have fewer than 10 examples. AutoGluon will only keep 13 out of 14 classes for training and will not try to predict the rare classes. To keep more classes, increase the number of datapoints from these rare classes in the training data or reduce label_count_threshold. Fraction of data from classes with at least 10 examples that will be kept for training models: 0.998 Train Data Class Count: 13 Feature Generator processed 499 data points with 14 features Original Features (raw dtypes): int64 features: 6 object features: 8 Original Features (inferred dtypes): int features: 6 object features: 8 Generated Features (special dtypes): Final Features (raw dtypes): int features: 6 category features: 8 Final Features: int features: 6 category features: 8 Data preprocessing and feature engineering runtime = 0.07s ... AutoGluon will gauge predictive performance using evaluation metric: balanced_accuracy To change this, specify the eval_metric argument of fit() AutoGluon will early stop models using evaluation metric: balanced_accuracy Fitting model: LightGBMClassifier_STACKER_l0 ... Training model for up to 59.93s of the 59.93s of remaining time. 0.2122 = Validation balanced_accuracy score 21.93s = Training runtime 0.05s = Validation runtime Fitting model: NeuralNetClassifier_STACKER_l0 ... Training model for up to 37.88s of the 37.88s of remaining time. 0.0825 = Validation balanced_accuracy score 3.0s = Training runtime 0.16s = Validation runtime Repeating k-fold bagging: 2/20 Fitting model: LightGBMClassifier_STACKER_l0 ... Training model for up to 34.69s of the 34.69s of remaining time. 0.2021 = Validation balanced_accuracy score 43.82s = Training runtime 0.11s = Validation runtime Fitting model: NeuralNetClassifier_STACKER_l0 ... Training model for up to 12.7s of the 12.7s of remaining time. 0.0816 = Validation balanced_accuracy score 6.02s = Training runtime 0.3s = Validation runtime Completed 2/20 k-fold bagging repeats ... Fitting model: weighted_ensemble_k0_l1 ... Training model for up to 59.93s of the 9.49s of remaining time. 0.2055 = Validation balanced_accuracy score 0.18s = Training runtime 0.0s = Validation runtime AutoGluon training complete, total runtime = 50.7s ... 
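Because no ``tuning_data`` was passed to this ``auto_stack`` call, the test file loaded at the start of the tutorial was never seen by ``fit()``. As a quick, unbiased sanity check, you could reuse the ``evaluate()`` call shown earlier (a sketch, not part of the original example):

.. code:: python

    # val_data was not used by this fit() call, so this estimate is not biased
    # by validation-based tuning or model selection on that data.
    performance = predictor.evaluate(val_data)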
Getting predictions (inference-time options)
--------------------------------------------

Even if you've started a new Python session since last calling ``fit()``, you can still load a previously trained predictor from disk:

.. code:: python

    predictor = task.load(output_directory)

Here, ``output_directory`` is the same folder previously passed to ``fit()``, in which all the trained models have been saved. You can easily train models on one machine and deploy them on another. Simply copy the ``output_directory`` folder to the new machine and specify its new path in ``task.load()``.

``predictor`` can make a prediction on an individual example rather than a full dataset:

.. code:: python

    datapoint = test_data.iloc[[0]]  # Note: .iloc[0] won't work because it returns pandas Series instead of DataFrame
    print(datapoint)
    print(predictor.predict(datapoint))

.. parsed-literal::
    :class: output

       age workclass  fnlwgt education  education-num      marital-status \
    0   31   Private  169085      11th              7  Married-civ-spouse

      relationship   race     sex  capital-gain  capital-loss  hours-per-week \
    0         Wife  White  Female             0             0              20

      native-country  class
    0  United-States  <=50K

    [' Other-service']

To output predicted class probabilities instead of predicted classes, you can use:

.. code:: python

    class_probs = predictor.predict_proba(datapoint)
    print(class_probs)

.. parsed-literal::
    :class: output

    [[0.02064052 0.14969262 0.06306401 0.08968029 0.02190279 0.04729904
      0.1106412  0.22255376 0.         0.08385979 0.02047405 0.09446129
      0.02806029 0.04767033]]

By default, ``predict()`` and ``predict_proba()`` will utilize the model that AutoGluon thinks is most accurate, which is usually an ensemble of many individual models. We can instead specify a particular model to use for predictions (e.g. to reduce inference latency). Before deciding which model to use, let's evaluate all of the models AutoGluon has previously trained using our validation dataset:

.. code:: python

    results = predictor.leaderboard(val_data)

.. parsed-literal::
    :class: output

        model                       score_test  score_val  pred_time_test  pred_time_val  fit_time   pred_time_test_marginal  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer
    0   weighted_ensemble_k0_l1     0.245428    0.314354   1.873537        0.382244       17.706696  0.011260                 0.000920                 0.533870           1            True
    1   CatboostClassifier          0.242090    0.285176   0.046611        0.011814       5.572967   0.046611                 0.011814                 5.572967           0            True
    2   RandomForestClassifierGini  0.239836    0.256020   0.229170        0.110779       0.607253   0.229170                 0.110779                 0.607253           0            True
    3   RandomForestClassifierEntr  0.235068    0.246162   0.229928        0.110748       0.608642   0.229928                 0.110748                 0.608642           0            True
    4   ExtraTreesClassifierEntr    0.232380    0.206911   0.241727        0.110790       0.502666   0.241727                 0.110790                 0.502666           0            True
    5   ExtraTreesClassifierGini    0.232062    0.208832   0.255994        0.110586       0.501962   0.255994                 0.110586                 0.501962           0            True
    6   LightGBMClassifier          0.196499    0.217144   0.030356        0.010706       7.080877   0.030356                 0.010706                 7.080877           0            True
    7   LightGBMClassifierCustom    0.178989    0.195132   0.543602        0.015779       41.375606  0.543602                 0.015779                 41.375606          0            True
    8   NeuralNetClassifier         0.159724    0.156478   1.204566        0.028949       3.401347   1.204566                 0.028949                 3.401347           0            True
    9   KNeighborsClassifierUnif    0.073979    0.090152   0.109848        0.108286       0.007716   0.109848                 0.108286                 0.007716           0            True
    10  KNeighborsClassifierDist    0.071566    0.113624   0.111167        0.108525       0.007238   0.111167                 0.108525                 0.007238           0            True

Here's how to specify a particular model to use for prediction instead of AutoGluon's default model-choice:

.. code:: python

    i = 0  # index of model to use
    model_to_use = predictor.model_names[i]
    model_pred = predictor.predict(datapoint, model=model_to_use)
    print("Prediction from %s model: %s" % (model_to_use, model_pred))

.. parsed-literal::
    :class: output

    WARNING: `predictor.model_names` is a deprecated `predictor` variable. Use `predictor.get_model_names()` instead. Use of `predictor.model_names` will result in an exception starting in autogluon==0.1

.. parsed-literal::
    :class: output

    Prediction from RandomForestClassifierGini model: [' Other-service']

The ``predictor`` also remembers which metric its predictions should be evaluated with; evaluation with ground-truth labels can be done as follows:

::

    y_pred = predictor.predict(test_data)
    predictor.evaluate_predictions(y_true=y_test, y_pred=y_pred, auxiliary_metrics=True)

However, you must be careful here, as certain metrics require predicted probabilities rather than classes. Since the label column remains in the ``val_data`` DataFrame, we can instead use the shorthand:

::

    predictor.evaluate(val_data)

which will correctly select between ``predict()`` and ``predict_proba()`` depending on the evaluation metric.

Maximizing predictive performance
---------------------------------

To get the best predictive accuracy with AutoGluon, you should generally use it like this:

.. code:: python

    long_time = 60  # for quick demonstration only; you should set this to the longest time you are willing to wait
    predictor = task.fit(train_data=train_data, label=label_column, eval_metric=metric,
                         auto_stack=True, time_limits=long_time)

.. parsed-literal::
    :class: output

    No output_directory specified. Models will be saved in: AutogluonModels/ag-20200713_072558/
    Beginning AutoGluon training ... Time limit = 60s
    AutoGluon will save models to AutogluonModels/ag-20200713_072558/
    AutoGluon Version: 0.0.12b20200713
    Train Data Rows: 500
    Train Data Columns: 15
    Preprocessing data ...
    Here are the first 10 unique label values in your data: [' Tech-support', ' Transport-moving', ' Other-service', ' ?', ' Handlers-cleaners', ' Sales', ' Craft-repair', ' Adm-clerical', ' Exec-managerial', ' Prof-specialty']
    AutoGluon infers your prediction problem is: multiclass (because dtype of label-column == object).
    If this is wrong, please specify `problem_type` argument in fit() instead (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
    Warning: Some classes in the training set have fewer than 10 examples. AutoGluon will only keep 13 out of 14 classes for training and will not try to predict the rare classes.
    To keep more classes, increase the number of datapoints from these rare classes in the training data or reduce label_count_threshold.
    Fraction of data from classes with at least 10 examples that will be kept for training models: 0.998
    Train Data Class Count: 13
    Feature Generator processed 499 data points with 14 features
    Original Features (raw dtypes):
        int64 features: 6
        object features: 8
    Original Features (inferred dtypes):
        int features: 6
        object features: 8
    Generated Features (special dtypes):
    Final Features (raw dtypes):
        int features: 6
        category features: 8
    Final Features:
        int features: 6
        category features: 8
    Data preprocessing and feature engineering runtime = 0.06s ...
    AutoGluon will gauge predictive performance using evaluation metric: balanced_accuracy
    To change this, specify the eval_metric argument of fit()
    AutoGluon will early stop models using evaluation metric: balanced_accuracy
    Fitting model: RandomForestClassifierGini_STACKER_l0 ...
Training model for up to 59.94s of the 59.94s of remaining time. 0.2257 = Validation balanced_accuracy score 3.04s = Training runtime 0.55s = Validation runtime Fitting model: RandomForestClassifierEntr_STACKER_l0 ... Training model for up to 56.21s of the 56.21s of remaining time. 0.2115 = Validation balanced_accuracy score 3.05s = Training runtime 0.55s = Validation runtime Fitting model: ExtraTreesClassifierGini_STACKER_l0 ... Training model for up to 52.48s of the 52.48s of remaining time. 0.2214 = Validation balanced_accuracy score 2.52s = Training runtime 0.55s = Validation runtime Fitting model: ExtraTreesClassifierEntr_STACKER_l0 ... Training model for up to 49.22s of the 49.22s of remaining time. 0.212 = Validation balanced_accuracy score 2.52s = Training runtime 0.55s = Validation runtime Fitting model: KNeighborsClassifierUnif_STACKER_l0 ... Training model for up to 45.96s of the 45.96s of remaining time. 0.0689 = Validation balanced_accuracy score 0.05s = Training runtime 0.55s = Validation runtime Fitting model: KNeighborsClassifierDist_STACKER_l0 ... Training model for up to 45.36s of the 45.36s of remaining time. 0.0708 = Validation balanced_accuracy score 0.05s = Training runtime 0.54s = Validation runtime Fitting model: LightGBMClassifier_STACKER_l0 ... Training model for up to 44.76s of the 44.76s of remaining time. Ran out of time, early stopping on iteration 165. Best iteration is: [25] train_set's multi_logloss: 1.29038 train_set's balanced_accuracy: 0.625395 valid_set's multi_logloss: 2.27283 valid_set's balanced_accuracy: 0.220647 Ran out of time, early stopping on iteration 174. Best iteration is: [82] train_set's multi_logloss: 0.478952 train_set's balanced_accuracy: 0.985162 valid_set's multi_logloss: 2.56851 valid_set's balanced_accuracy: 0.227756 Ran out of time, early stopping on iteration 185. Best iteration is: [37] train_set's multi_logloss: 1.00849 train_set's balanced_accuracy: 0.789105 valid_set's multi_logloss: 2.37467 valid_set's balanced_accuracy: 0.175384 0.2122 = Validation balanced_accuracy score 37.74s = Training runtime 0.05s = Validation runtime Fitting model: CatboostClassifier_STACKER_l0 ... Training model for up to 6.9s of the 6.9s of remaining time. 0.2677 = Validation balanced_accuracy score 6.52s = Training runtime 0.04s = Validation runtime Fitting model: NeuralNetClassifier_STACKER_l0 ... Training model for up to 0.32s of the 0.32s of remaining time. Ran out of time, stopping training early. Time limit exceeded... Skipping NeuralNetClassifier_STACKER_l0. Completed 1/20 k-fold bagging repeats ... Fitting model: weighted_ensemble_k0_l1 ... Training model for up to 59.94s of the -0.01s of remaining time. 0.2692 = Validation balanced_accuracy score 0.58s = Training runtime 0.0s = Validation runtime AutoGluon training complete, total runtime = 60.6s ... This command implements the following strategy to maximize accuracy: - Specify the ``auto_stack`` argument, which allows AutoGluon to automatically construct model ensembles based on multi-layer stack ensembling with repeated bagging, and will greatly improve the resulting predictions if granted sufficient training time. - Provide the ``eval_metric`` if you know what metric will be used to evaluate predictions in your application (e.g. ``roc_auc``, ``log_loss``, ``mean_absolute_error``, etc.) - Include all your data in ``train_data`` and do not provide ``tuning_data`` (AutoGluon will split the data more intelligently to fit its needs). 
- Do not specify the ``hyperparameter_tune`` argument (counterintuitively, hyperparameter tuning is not the best way to spend a limited training time budget, as model ensembling is often superior). We recommend you only use ``hyperparameter_tune`` if your goal is to deploy a single model rather than an ensemble.
- Do not specify the ``hyperparameters`` argument (allow AutoGluon to adaptively select which models/hyperparameters to use).
- Set ``time_limits`` to the longest amount of time (in seconds) that you are willing to wait. AutoGluon's predictive performance improves the longer ``fit()`` is allowed to run.
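Putting these recommendations together, a minimal end-to-end sketch might look like the following (the time budget is illustrative and the output folder name is hypothetical; all calls were introduced earlier in this tutorial):

.. code:: python

    save_path = 'agModels-maxAccuracy'  # hypothetical folder name for the trained models
    metric = 'balanced_accuracy'        # substitute whichever metric matters in your application

    predictor = task.fit(train_data=train_data, label=label_column, eval_metric=metric,
                         auto_stack=True, output_directory=save_path,
                         time_limits=4*3600)  # illustrative budget of 4 hours; use the longest you can afford

    # Later (possibly in a new session or on another machine), reload the predictor and
    # evaluate it on data that was never passed to fit() for unbiased performance estimates.
    predictor = task.load(save_path)
    performance = predictor.evaluate(val_data)
    leaderboard = predictor.leaderboard(val_data)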