Forecasting Time-Series - In-Depth Tutorial

This more advanced tutorial describes how you can exert greater control over AutoGluon’s time-series modeling. As an example forecasting task, we again use the Covid-19 dataset previously described in the Forecasting Time-Series - Quick Start tutorial.

from autogluon.forecasting import ForecastingPredictor
from autogluon.forecasting import TabularDataset

train_data = TabularDataset("https://autogluon.s3-us-west-2.amazonaws.com/datasets/CovidTimeSeries/train.csv")

save_path = "agModels-covidforecast"
eval_metric = "mean_wQuantileLoss"  # just for demonstration, this is already the default evaluation metric

Specifying hyperparameters and tuning them

While AutoGluon-Forecasting automatically tunes certain hyperparameters of its time-series models depending on the presets setting, you can also control the hyperparameter optimization (HPO) process manually. The presets argument of predictor.fit() determines which search spaces are considered for certain hyperparameter values, as well as how many HPO trials are run when searching each space for the best value. Instead of specifying presets, you can specify all of these items yourself. Below we demonstrate how to tune the context_length hyperparameter (see the GluonTS extended tutorial: https://ts.gluon.ai/tutorials/forecasting/extended_tutorial.html) for just the MQCNN and DeepAR models; context_length controls how much past history a trained model conditions on in any one forecast.

import autogluon.core as ag
from gluonts.mx.distribution.neg_binomial import NegativeBinomialOutput

context_search_space = ag.Int(75, 100)  # integer values spanning the listed range
epochs = 2  # small value used for quick demo, omit this or use much larger value in real applications!
num_batches_per_epoch = 5  # small value used for quick demo, omit this or use larger value in real applications!
num_hpo_trials = 2  # small value used for quick demo, use much larger value in real applications!

mqcnn_params = {
    "context_length": context_search_space,
    "epochs": epochs,
    "num_batches_per_epoch": num_batches_per_epoch,
}
deepar_params = {
    "context_length": context_search_space,
    "epochs": epochs,
    "num_batches_per_epoch": num_batches_per_epoch,
    "distr_output": NegativeBinomialOutput(),
}

predictor = ForecastingPredictor(path=save_path, eval_metric=eval_metric).fit(
    train_data, prediction_length=19, quantiles=[0.1, 0.5, 0.9],
    index_column="name", target_column="ConfirmedCases", time_column="Date",
    hyperparameter_tune_kwargs={'scheduler': 'local', 'searcher': 'bayesopt', 'num_trials': num_hpo_trials},
    hyperparameters={"MQCNN": mqcnn_params, "DeepAR": deepar_params}
)
Training with dataset in tabular format...
Finish rebuilding the data, showing the top five rows.
           name  2020-01-22  2020-01-23  2020-01-24  2020-01-25  2020-01-26
0  Afghanistan_         0.0         0.0         0.0         0.0         0.0
1      Albania_         0.0         0.0         0.0         0.0         0.0
2      Algeria_         0.0         0.0         0.0         0.0         0.0
3      Andorra_         0.0         0.0         0.0         0.0         0.0
4       Angola_         0.0         0.0         0.0         0.0         0.0

   2020-01-27  2020-01-28  2020-01-29  2020-01-30  ...  2020-03-24
0         0.0         0.0         0.0         0.0  ...        74.0
1         0.0         0.0         0.0         0.0  ...       123.0
2         0.0         0.0         0.0         0.0  ...       264.0
3         0.0         0.0         0.0         0.0  ...       164.0
4         0.0         0.0         0.0         0.0  ...         3.0

   2020-03-25  2020-03-26  2020-03-27  2020-03-28  2020-03-29  2020-03-30
0        84.0        94.0       110.0       110.0       120.0       170.0
1       146.0       174.0       186.0       197.0       212.0       223.0
2       302.0       367.0       409.0       454.0       511.0       584.0
3       188.0       224.0       267.0       308.0       334.0       370.0
4         3.0         4.0         4.0         5.0         7.0         7.0

   2020-03-31  2020-04-01  2020-04-02
0       174.0       237.0       273.0
1       243.0       259.0       277.0
2       716.0       847.0       986.0
3       376.0       390.0       428.0
4         7.0         8.0         8.0

[5 rows x 73 columns]
Validation data is None, will do auto splitting...
Finished processing data, using 0.307506799697876s.
Random seed set to 0
All models will be trained for quantiles [0.1, 0.5, 0.9].
Beginning AutoGluon training ...
AutoGluon will save models to agModels-covidforecast/
Start hyperparameter tuning for MQCNN
Training model MQCNN/trial_0...
Start model training
Epoch[0] Learning rate is 0.001

Number of parameters in ForkingSeq2SeqTrainingNetwork: 57784
100%|██████████| 5/5 [00:00<00:00, 51.67it/s, epoch=1/2, avg_epoch_loss=131]
Epoch[0] Elapsed time 0.100 seconds
Epoch[0] Evaluation metric 'epoch_loss'=130.531328

Number of parameters in ForkingSeq2SeqTrainingNetwork: 57784
10it [00:00, 78.57it/s, epoch=1/2, validation_avg_epoch_loss=133]
Epoch[0] Elapsed time 0.129 seconds
Epoch[0] Evaluation metric 'validation_epoch_loss'=132.996351
Epoch[1] Learning rate is 0.001

100%|██████████| 5/5 [00:00<00:00, 62.41it/s, epoch=2/2, avg_epoch_loss=1.23]
Epoch[1] Elapsed time 0.083 seconds
Epoch[1] Evaluation metric 'epoch_loss'=1.226578

10it [00:00, 80.24it/s, epoch=2/2, validation_avg_epoch_loss=132]
Epoch[1] Elapsed time 0.127 seconds
Epoch[1] Evaluation metric 'validation_epoch_loss'=132.171909
Computing averaged parameters.
Loading averaged parameters.
End model training
Evaluating model MQCNN/trial_0 with metric mean_wQuantileLoss on validation data...

Forecast is not sample based. Ignoring parameter num_samples from predict method.

100%|██████████| 313/313 [00:00<00:00, 2448.79it/s]

100%|██████████| 313/313 [00:00<00:00, 3571.14it/s]

Running evaluation: 100%|██████████| 313/313 [00:00<00:00, 1180.64it/s]
Validation score for model MQCNN/trial_0 is -0.9916079769993384
Training model MQCNN/trial_1...
Start model training
Epoch[0] Learning rate is 0.001

Number of parameters in ForkingSeq2SeqTrainingNetwork: 57784
100%|██████████| 5/5 [00:00<00:00, 52.73it/s, epoch=1/2, avg_epoch_loss=132]
Epoch[0] Elapsed time 0.097 seconds
Epoch[0] Evaluation metric 'epoch_loss'=131.697012

Number of parameters in ForkingSeq2SeqTrainingNetwork: 57784
10it [00:00, 76.57it/s, epoch=1/2, validation_avg_epoch_loss=134]
Epoch[0] Elapsed time 0.133 seconds
Epoch[0] Evaluation metric 'validation_epoch_loss'=134.007222
Epoch[1] Learning rate is 0.001

100%|██████████| 5/5 [00:00<00:00, 63.04it/s, epoch=2/2, avg_epoch_loss=1.34]
Epoch[1] Elapsed time 0.082 seconds
Epoch[1] Evaluation metric 'epoch_loss'=1.337818

10it [00:00, 79.43it/s, epoch=2/2, validation_avg_epoch_loss=133]
Epoch[1] Elapsed time 0.128 seconds
Epoch[1] Evaluation metric 'validation_epoch_loss'=133.267226
Computing averaged parameters.
Loading averaged parameters.
End model training
Evaluating model MQCNN/trial_1 with metric mean_wQuantileLoss on validation data...

100%|██████████| 313/313 [00:00<00:00, 2427.10it/s]

100%|██████████| 313/313 [00:00<00:00, 3321.79it/s]

Running evaluation: 100%|██████████| 313/313 [00:00<00:00, 1173.27it/s]
Validation score for model MQCNN/trial_1 is -0.9964028387055066
Start hyperparameter tuning for DeepAR
Training model DeepAR/trial_0...
Start model training
Epoch[0] Learning rate is 0.001

Number of parameters in DeepARTrainingNetwork: 25843

100%|██████████| 5/5 [00:17<00:00,  3.60s/it, epoch=1/2, avg_epoch_loss=6.86]
Epoch[0] Elapsed time 17.997 seconds
Epoch[0] Evaluation metric 'epoch_loss'=6.862702

Number of parameters in DeepARTrainingNetwork: 25843

10it [00:11,  1.19s/it, epoch=1/2, validation_avg_epoch_loss=43.6]
Epoch[0] Elapsed time 11.899 seconds
Epoch[0] Evaluation metric 'validation_epoch_loss'=43.641521
Epoch[1] Learning rate is 0.001

100%|██████████| 5/5 [00:00<00:00, 14.72it/s, epoch=2/2, avg_epoch_loss=5.64]
Epoch[1] Elapsed time 0.342 seconds
Epoch[1] Evaluation metric 'epoch_loss'=5.637449

10it [00:00, 26.14it/s, epoch=2/2, validation_avg_epoch_loss=32.4]
Epoch[1] Elapsed time 0.385 seconds
Epoch[1] Evaluation metric 'validation_epoch_loss'=32.405570
Computing averaged parameters.
Loading averaged parameters.
End model training
Evaluating model DeepAR/trial_0 with metric mean_wQuantileLoss on validation data...

100%|██████████| 313/313 [00:02<00:00, 138.46it/s]

100%|██████████| 313/313 [00:00<00:00, 3911.75it/s]

Running evaluation: 100%|██████████| 313/313 [00:00<00:00, 990.81it/s]
Validation score for model DeepAR/trial_0 is -0.8336604527711646
Training model DeepAR/trial_1...
Start model training
Epoch[0] Learning rate is 0.001

Number of parameters in DeepARTrainingNetwork: 25843

100%|██████████| 5/5 [00:14<00:00,  2.98s/it, epoch=1/2, avg_epoch_loss=5.25]
Epoch[0] Elapsed time 14.883 seconds
Epoch[0] Evaluation metric 'epoch_loss'=5.253252

Number of parameters in DeepARTrainingNetwork: 25843

10it [00:13,  1.36s/it, epoch=1/2, validation_avg_epoch_loss=42.9]
Epoch[0] Elapsed time 13.583 seconds
Epoch[0] Evaluation metric 'validation_epoch_loss'=42.857225
Epoch[1] Learning rate is 0.001

100%|██████████| 5/5 [00:00<00:00, 13.32it/s, epoch=2/2, avg_epoch_loss=1.83]
Epoch[1] Elapsed time 0.378 seconds
Epoch[1] Evaluation metric 'epoch_loss'=1.828084

10it [00:00, 24.72it/s, epoch=2/2, validation_avg_epoch_loss=34.5]
Epoch[1] Elapsed time 0.407 seconds
Epoch[1] Evaluation metric 'validation_epoch_loss'=34.543380
Computing averaged parameters.
Loading averaged parameters.
End model training
Evaluating model DeepAR/trial_1 with metric mean_wQuantileLoss on validation data...

100%|██████████| 313/313 [00:02<00:00, 130.84it/s]

100%|██████████| 313/313 [00:00<00:00, 3634.50it/s]

Running evaluation: 100%|██████████| 313/313 [00:00<00:00, 964.50it/s]
Validation score for model DeepAR/trial_1 is -0.863341598531469
AutoGluon training complete, total runtime = 73.66s ...

To ensure quick runtimes, we specified that only 2 HPO trials should be run when tuning each model's hyperparameters, which is far too few for real applications. HPO was performed here via a Bayesian-optimization searcher, with the trials that evaluate candidate hyperparameter configurations executed by a local sequential job scheduler. See the AutoGluon Searcher/Scheduler documentation and tutorials for more details.
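
If you prefer a different search strategy, you can swap in another searcher. Below is a minimal sketch, assuming the 'random' searcher name from the AutoGluon Searcher/Scheduler docs; confirm the available names against your installed version:

hyperparameter_tune_kwargs = {
    'scheduler': 'local',   # run HPO trials sequentially on this machine
    'searcher': 'random',   # sample candidate configurations at random
    'num_trials': 20,       # use far more trials in real applications
}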

Above we set the epochs, num_batches_per_epoch, and distr_output hyperparameters to fixed values. You may set some hyperparameters to search spaces and others to fixed values, as sketched below. Any hyperparameter you do not specify a value or search space for is left at its default. AutoGluon only trains the models that appear as keys in the hyperparameters dict passed into fit(), so in this case only the MQCNN and DeepAR models are trained. Refer to the GluonTS documentation for the individual GluonTS models to see all of the hyperparameters you may specify for them.
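
For instance, here is a hedged sketch that mixes a search space with fixed values; num_cells is a hyperparameter of the GluonTS DeepAREstimator (verify the name against the GluonTS docs for your installed version):

deepar_params = {
    "context_length": ag.Int(75, 100),    # tuned over an integer range
    "num_cells": ag.Categorical(30, 40),  # tuned over the listed values
    "epochs": 2,                          # held fixed at this value
}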

Viewing additional information

We can view a summary of the HPO process, which will show the validation score achieved in each HPO trial as well as which hyperparameter configuration was evaluated in the corresponding trial:

predictor.fit_summary()
Generating leaderboard for all models trained...
* Summary of fit() *
Estimated performance of each model:
            model  val_score  fit_order
0  DeepAR/trial_0  -0.833660          3
1  DeepAR/trial_1  -0.863342          4
2   MQCNN/trial_0  -0.991608          1
3   MQCNN/trial_1  -0.996403          2
Number of models trained: 4
Types of models trained:
{'MQCNNModel', 'DeepARModel'}
Hyperparameter-tuning used: True
User-specified hyperparameters:
{'MQCNN': {'context_length': Int: lower=75, upper=100, 'epochs': 2, 'num_batches_per_epoch': 5}, 'DeepAR': {'context_length': Int: lower=75, upper=100, 'epochs': 2, 'num_batches_per_epoch': 5, 'distr_output': gluonts.mx.distribution.neg_binomial.NegativeBinomialOutput()}}
Feature Metadata (Processed):
(raw dtype, special dtypes):
* Details of Hyperparameter optimization *
HPO for MQCNN model:  Num. configurations tried = 2, Time spent = 6.654607772827148s
Best hyperparameter-configuration (validation-performance: mean_wQuantileLoss = -0.9916079769993384):
{'context_length': 88}
HPO for DeepAR model:  Num. configurations tried = 2, Time spent = 66.95265889167786s
Best hyperparameter-configuration (validation-performance: mean_wQuantileLoss = -0.8336604527711646):
{'context_length': 88}
* End of fit() summary *
{'model_types': {'MQCNN/trial_0': 'MQCNNModel',
  'MQCNN/trial_1': 'MQCNNModel',
  'DeepAR/trial_0': 'DeepARModel',
  'DeepAR/trial_1': 'DeepARModel'},
 'model_performance': {'MQCNN/trial_0': -0.9916079769993384,
  'MQCNN/trial_1': -0.9964028387055066,
  'DeepAR/trial_0': -0.8336604527711646,
  'DeepAR/trial_1': -0.863341598531469},
 'model_best': None,
 'model_paths': {'MQCNN/trial_0': 'agModels-covidforecast/models/MQCNN/trial_0/',
  'MQCNN/trial_1': 'agModels-covidforecast/models/MQCNN/trial_1/',
  'DeepAR/trial_0': 'agModels-covidforecast/models/DeepAR/trial_0/',
  'DeepAR/trial_1': 'agModels-covidforecast/models/DeepAR/trial_1/'},
 'model_fit_times': {'MQCNN/trial_0': 3.8433167934417725,
  'MQCNN/trial_1': 0.5129275321960449,
  'DeepAR/trial_0': 30.80358338356018,
  'DeepAR/trial_1': 29.442209482192993},
 'hyperparameter_tune': True,
 'hyperparameters_userspecified': {'MQCNN': {'context_length': Int: lower=75, upper=100,
   'epochs': 2,
   'num_batches_per_epoch': 5},
  'DeepAR': {'context_length': Int: lower=75, upper=100,
   'epochs': 2,
   'num_batches_per_epoch': 5,
   'distr_output': gluonts.mx.distribution.neg_binomial.NegativeBinomialOutput()}},
 'hpo_results': {'MQCNN': {'best_reward': -0.9916079769993384,
   'best_config': {'context_length': 88},
   'total_time': 6.654607772827148,
   'metadata': {'stop_criterion': {'time_limits': None, 'max_reward': None},
    'resources_per_trial': {'num_cpus': 'auto', 'num_gpus': 'auto'}},
   'reward_attr': 'validation_performance',
   'args': {'util_args': {'train_data_path': 'dataset_train.p',
     'val_data_path': 'dataset_val.p',
     'directory': 'agModels-covidforecast/models/MQCNN/',
     'model': MQCNN,
     'time_start': 1632359430.06532,
     'time_limit': None},
    'freq': 'D',
    'prediction_length': 19,
    'context_length': 88,
    'epochs': 2,
    'num_batches_per_epoch': 5,
    'use_feat_static_cat': False,
    'use_feat_static_real': False,
    'cardinality': None,
    'quantiles': [0.1, 0.5, 0.9],
    'callbacks': [<autogluon.forecasting.models.gluonts_model.abstract_gluonts.callback.EpochCounter at 0x7effe15964d0>]},
   'trial_info': {0: {'config': {'context_length': 88},
     'history': [{'epoch': 1,
       'trial': 0,
       'time_this_iter': 5.026702642440796,
       'time_since_start': 5.026702642440796}],
     'metadata': {'epoch': 1,
      'trial': 0,
      'time_this_iter': 5.026702642440796,
      'time_since_start': 5.026702642440796},
     'validation_performance': -0.9916079769993384},
    1: {'config': {'context_length': 97},
     'history': [{'epoch': 1,
       'trial': 1,
       'time_this_iter': 1.5954809188842773,
       'time_since_start': 1.5954809188842773}],
     'metadata': {'epoch': 1,
      'trial': 1,
      'time_this_iter': 1.5954809188842773,
      'time_since_start': 1.5954809188842773},
     'validation_performance': -0.9964028387055066}},
   'validation_performance': -0.9916079769993384,
   'search_space': OrderedDict([('context_length',
                 Int: lower=75, upper=100)])},
  'DeepAR': {'best_reward': -0.8336604527711646,
   'best_config': {'context_length': 88},
   'total_time': 66.95265889167786,
   'metadata': {'stop_criterion': {'time_limits': None, 'max_reward': None},
    'resources_per_trial': {'num_cpus': 'auto', 'num_gpus': 'auto'}},
   'reward_attr': 'validation_performance',
   'args': {'util_args': {'train_data_path': 'dataset_train.p',
     'val_data_path': 'dataset_val.p',
     'directory': 'agModels-covidforecast/models/DeepAR/',
     'model': DeepAR,
     'time_start': 1632359436.7424335,
     'time_limit': None},
    'freq': 'D',
    'prediction_length': 19,
    'context_length': 88,
    'epochs': 2,
    'num_batches_per_epoch': 5,
    'distr_output': gluonts.mx.distribution.neg_binomial.NegativeBinomialOutput(),
    'use_feat_static_cat': False,
    'use_feat_static_real': False,
    'cardinality': None,
    'quantiles': [0.1, 0.5, 0.9],
    'callbacks': [<autogluon.forecasting.models.gluonts_model.abstract_gluonts.callback.EpochCounter at 0x7f00944cd9d0>]},
   'trial_info': {0: {'config': {'context_length': 88},
     'history': [{'epoch': 1,
       'trial': 0,
       'time_this_iter': 34.0602662563324,
       'time_since_start': 34.0602662563324}],
     'metadata': {'epoch': 1,
      'trial': 0,
      'time_this_iter': 34.0602662563324,
      'time_since_start': 34.0602662563324},
     'validation_performance': -0.8336604527711646},
    1: {'config': {'context_length': 96},
     'history': [{'epoch': 1,
       'trial': 1,
       'time_this_iter': 32.85720777511597,
       'time_since_start': 32.85720777511597}],
     'metadata': {'epoch': 1,
      'trial': 1,
      'time_this_iter': 32.85720777511597,
      'time_since_start': 32.85720777511597},
     'validation_performance': -0.863341598531469}},
   'validation_performance': -0.8336604527711646,
   'search_space': OrderedDict([('context_length',
                 Int: lower=75, upper=100)])}},
 'model_hyperparams': {'MQCNN/trial_0': {'freq': 'D',
   'prediction_length': 19,
   'context_length': 88,
   'epochs': 2,
   'num_batches_per_epoch': 5,
   'use_feat_static_cat': False,
   'use_feat_static_real': False,
   'cardinality': None,
   'quantiles': [0.1, 0.5, 0.9],
   'callbacks': [<autogluon.forecasting.models.gluonts_model.abstract_gluonts.callback.EpochCounter at 0x7f00d03cc1d0>,
    <autogluon.forecasting.models.gluonts_model.abstract_gluonts.callback.TimeLimitCallback at 0x7f00d03cc150>],
   'hybridize': False},
  'MQCNN/trial_1': {'freq': 'D',
   'prediction_length': 19,
   'context_length': 97,
   'epochs': 2,
   'num_batches_per_epoch': 5,
   'use_feat_static_cat': False,
   'use_feat_static_real': False,
   'cardinality': None,
   'quantiles': [0.1, 0.5, 0.9],
   'callbacks': [<autogluon.forecasting.models.gluonts_model.abstract_gluonts.callback.EpochCounter at 0x7effe1f31f50>,
    <autogluon.forecasting.models.gluonts_model.abstract_gluonts.callback.TimeLimitCallback at 0x7effe167c210>],
   'hybridize': False},
  'DeepAR/trial_0': {'freq': 'D',
   'prediction_length': 19,
   'context_length': 88,
   'epochs': 2,
   'num_batches_per_epoch': 5,
   'distr_output': gluonts.mx.distribution.neg_binomial.NegativeBinomialOutput(),
   'use_feat_static_cat': False,
   'use_feat_static_real': False,
   'cardinality': None,
   'quantiles': [0.1, 0.5, 0.9],
   'callbacks': [<autogluon.forecasting.models.gluonts_model.abstract_gluonts.callback.EpochCounter at 0x7f00944cdd90>,
    <autogluon.forecasting.models.gluonts_model.abstract_gluonts.callback.TimeLimitCallback at 0x7f00944cded0>]},
  'DeepAR/trial_1': {'freq': 'D',
   'prediction_length': 19,
   'context_length': 96,
   'epochs': 2,
   'num_batches_per_epoch': 5,
   'distr_output': gluonts.mx.distribution.neg_binomial.NegativeBinomialOutput(),
   'use_feat_static_cat': False,
   'use_feat_static_real': False,
   'cardinality': None,
   'quantiles': [0.1, 0.5, 0.9],
   'callbacks': [<autogluon.forecasting.models.gluonts_model.abstract_gluonts.callback.EpochCounter at 0x7f00944fb9d0>,
    <autogluon.forecasting.models.gluonts_model.abstract_gluonts.callback.TimeLimitCallback at 0x7effe01ca950>]}},
 'leaderboard':             model  val_score  fit_order
 0  DeepAR/trial_0  -0.833660          3
 1  DeepAR/trial_1  -0.863342          4
 2   MQCNN/trial_0  -0.991608          1
 3   MQCNN/trial_1  -0.996403          2}

The 'best_config' field in this summary indicates the hyperparameter configuration that performed best for each model. We can alternatively use the leaderboard to view the performance of each evaluated model/hyperparameter configuration:

predictor.leaderboard()
Generating leaderboard for all models trained...
            model  val_score  fit_order
0  DeepAR/trial_0  -0.833660          3
1  DeepAR/trial_1  -0.863342          4
2   MQCNN/trial_0  -0.991608          1
3   MQCNN/trial_1  -0.996403          2

Here is another way to see which model AutoGluon believes is best (based on validation score); this is the model automatically used for prediction by default:

predictor._trainer.get_model_best()
'DeepAR/trial_0'
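
If you instead want forecasts from one particular trained model, here is a hedged sketch. It assumes this version of predict() accepts an optional model argument naming a trained model, mirroring the optional model argument that a later evaluate() call's comment mentions; confirm against your installed version:

specific_predictions = predictor.predict(train_data, model="MQCNN/trial_0")  # hypothetical model argument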

We can also view information about any model AutoGluon has trained:

models_trained = predictor._trainer.get_model_names_all()
specific_model = predictor._trainer.load_model(models_trained[0])
specific_model.get_info()
{'name': 'MQCNN/trial_0',
 'model_type': 'MQCNNModel',
 'eval_metric': 'mean_wQuantileLoss',
 'fit_time': 3.8433167934417725,
 'predict_time': 1.0593109130859375,
 'val_score': -0.9916079769993384,
 'hyperparameters': {'freq': 'D',
  'prediction_length': 19,
  'context_length': 88,
  'epochs': 2,
  'num_batches_per_epoch': 5,
  'use_feat_static_cat': False,
  'use_feat_static_real': False,
  'cardinality': None,
  'quantiles': [0.1, 0.5, 0.9],
  'callbacks': [<autogluon.forecasting.models.gluonts_model.abstract_gluonts.callback.EpochCounter at 0x7f0094533610>,
   <autogluon.forecasting.models.gluonts_model.abstract_gluonts.callback.TimeLimitCallback at 0x7effe156ead0>],
  'hybridize': False}}

Evaluating trained models

Given some more recent held-out test data, here's how to evaluate only the default model AutoGluon uses for forecasting, rather than all of the trained models as in leaderboard():

test_data = TabularDataset("https://autogluon.s3-us-west-2.amazonaws.com/datasets/CovidTimeSeries/test.csv")
predictor.evaluate(test_data)  # to evaluate specific model, can also specify optional argument: model
Loaded data from: https://autogluon.s3-us-west-2.amazonaws.com/datasets/CovidTimeSeries/test.csv | Columns = 3 / 3 | Rows = 28483 -> 28483
Does not specify model, will by default use the model with the best validation score for evaluation
100%|██████████| 313/313 [00:02<00:00, 131.64it/s]
100%|██████████| 313/313 [00:00<00:00, 3785.06it/s]
Running evaluation: 100%|██████████| 313/313 [00:00<00:00, 961.39it/s]
0.8176319396478796

Be aware that without extra test_data, AutoGluon's reported validation scores may be slightly optimistic due to adaptive decisions like selecting models/hyperparameters based on the validation data. It is therefore always a good idea to reserve some truly held-out test data for an unbiased final evaluation after training has completed.
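
As the comment in the evaluate() call above notes, you can also score one specific trained model by passing its name (as listed by leaderboard()) via the optional model argument:

predictor.evaluate(test_data, model="MQCNN/trial_0")  # evaluate one specific model instead of the best one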

The ground-truth time-series targets are often unavailable at the time we produce forecasts and only become available later. In such a workflow, we may first produce predictions using AutoGluon, and then evaluate them later without having to recompute the predictions:

predictions = predictor.predict(train_data)  # before test data have been observed

predictor = ForecastingPredictor.load(save_path)  # reload predictor in future after test data are observed
ForecastingPredictor.evaluate_predictions(forecasts=predictions,
                                          targets=test_data,
                                          index_column=predictor.index_column,
                                          time_column=predictor.time_column,
                                          target_column=predictor.target_column,
                                          eval_metric=predictor.eval_metric)
Does not specify model, will by default use the model with the best validation score for prediction
Predicting with model DeepAR/trial_0
Loading predictor from path agModels-covidforecast/
Running evaluation: 100%|██████████| 313/313 [00:00<00:00, 958.89it/s]
0.817738727172412

Static features

In some forecasting problems involving multiple time-series, each individual time-series may be associated with static features that do not change over time. For example, if forecasting demand for products over time, each product may be associated with an item category (a categorical static feature) and an item vector embedding from a recommender system (numeric static features). AutoGluon allows you to provide such static features so that its models condition their predictions on them:

static_features = TabularDataset("https://autogluon.s3-us-west-2.amazonaws.com/datasets/CovidTimeSeries/toy_static_features.csv")
static_features.head()
Loaded data from: https://autogluon.s3-us-west-2.amazonaws.com/datasets/CovidTimeSeries/toy_static_features.csv | Columns = 3 / 3 | Rows = 313 -> 313
           name  static_cat_feature  static_real_feature
0  Afghanistan_                   2                67.03
1      Albania_                   1                 4.33
2      Algeria_                   5                19.63
3      Andorra_                   2                42.49
4       Angola_                   3                21.58

Note that static_features must contain one row for each unique value of index_column in our time-series data, with the series identifier stored in a column whose name matches the index_column and the remaining columns holding the feature values for that individual series. AutoGluon can automatically infer which static features are categorical vs. numeric when they are passed into fit():

predictor_static = ForecastingPredictor(path=save_path, eval_metric=eval_metric).fit(
    train_data, static_features=static_features, prediction_length=19, quantiles=[0.1, 0.5, 0.9],
    index_column="name", target_column="ConfirmedCases", time_column="Date",
    presets="low_quality"  # last argument is just here for quick demo, omit it in real applications!
)
Warning: path already exists! This predictor may overwrite an existing predictor! path="agModels-covidforecast"
presets is set to be low_quality
Training with dataset in tabular format...
Finish rebuilding the data, showing the top five rows.
           name  2020-01-22  2020-01-23  2020-01-24  2020-01-25  2020-01-26
0  Afghanistan_         0.0         0.0         0.0         0.0         0.0
1      Albania_         0.0         0.0         0.0         0.0         0.0
2      Algeria_         0.0         0.0         0.0         0.0         0.0
3      Andorra_         0.0         0.0         0.0         0.0         0.0
4       Angola_         0.0         0.0         0.0         0.0         0.0

   2020-01-27  2020-01-28  2020-01-29  2020-01-30  ...  2020-03-24
0         0.0         0.0         0.0         0.0  ...        74.0
1         0.0         0.0         0.0         0.0  ...       123.0
2         0.0         0.0         0.0         0.0  ...       264.0
3         0.0         0.0         0.0         0.0  ...       164.0
4         0.0         0.0         0.0         0.0  ...         3.0

   2020-03-25  2020-03-26  2020-03-27  2020-03-28  2020-03-29  2020-03-30
0        84.0        94.0       110.0       110.0       120.0       170.0
1       146.0       174.0       186.0       197.0       212.0       223.0
2       302.0       367.0       409.0       454.0       511.0       584.0
3       188.0       224.0       267.0       308.0       334.0       370.0
4         3.0         4.0         4.0         5.0         7.0         7.0

   2020-03-31  2020-04-01  2020-04-02
0       174.0       237.0       273.0
1       243.0       259.0       277.0
2       716.0       847.0       986.0
3       376.0       390.0       428.0
4         7.0         8.0         8.0

[5 rows x 73 columns]
Validation data is None, will do auto splitting...
static feature column static_cat_feature has 10 or less unique values, assuming it is categorical.
Fitting IdentityFeatureGenerator...
    Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
Fitting IdentityFeatureGenerator...
    Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
Fitting CategoryFeatureGenerator...
    Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
    Fitting CategoryMemoryMinimizeFeatureGenerator...
Using previous inferred feature columns...
Static Cat Features Dataframe including ['static_cat_feature']
Static Real Features Dataframe including ['static_real_feature']
Finished processing data, using 1.2074267864227295s.
Random seed set to 0
All models will be trained for quantiles [0.1, 0.5, 0.9].
Beginning AutoGluon training ...
AutoGluon will save models to agModels-covidforecast/
Fitting model: SFF ...
Training model SFF...
Start model training
Epoch[0] Learning rate is 0.001
Number of parameters in SimpleFeedForwardTrainingNetwork: 31523
100%|██████████| 10/10 [00:02<00:00,  3.70it/s, epoch=1/5, avg_epoch_loss=0.645]
Epoch[0] Elapsed time 2.705 seconds
Epoch[0] Evaluation metric 'epoch_loss'=0.645329
Number of parameters in SimpleFeedForwardTrainingNetwork: 31523
10it [00:00, 252.92it/s, epoch=1/5, validation_avg_epoch_loss=9.65]
Epoch[0] Elapsed time 0.041 seconds
Epoch[0] Evaluation metric 'validation_epoch_loss'=9.653509
Epoch[1] Learning rate is 0.001
100%|██████████| 10/10 [00:00<00:00, 195.29it/s, epoch=2/5, avg_epoch_loss=-.11]
Epoch[1] Elapsed time 0.053 seconds
Epoch[1] Evaluation metric 'epoch_loss'=-0.109619
10it [00:00, 313.17it/s, epoch=2/5, validation_avg_epoch_loss=9.15]
Epoch[1] Elapsed time 0.033 seconds
Epoch[1] Evaluation metric 'validation_epoch_loss'=9.146872
Epoch[2] Learning rate is 0.001
100%|██████████| 10/10 [00:00<00:00, 194.88it/s, epoch=3/5, avg_epoch_loss=0.0362]
Epoch[2] Elapsed time 0.053 seconds
Epoch[2] Evaluation metric 'epoch_loss'=0.036189
10it [00:00, 303.55it/s, epoch=3/5, validation_avg_epoch_loss=8.81]
Epoch[2] Elapsed time 0.034 seconds
Epoch[2] Evaluation metric 'validation_epoch_loss'=8.807447
Epoch[3] Learning rate is 0.001
100%|██████████| 10/10 [00:00<00:00, 195.24it/s, epoch=4/5, avg_epoch_loss=0.842]
Epoch[3] Elapsed time 0.053 seconds
Epoch[3] Evaluation metric 'epoch_loss'=0.842182
10it [00:00, 308.68it/s, epoch=4/5, validation_avg_epoch_loss=8.6]
Epoch[3] Elapsed time 0.034 seconds
Epoch[3] Evaluation metric 'validation_epoch_loss'=8.597749
Epoch[4] Learning rate is 0.001
100%|██████████| 10/10 [00:00<00:00, 192.90it/s, epoch=5/5, avg_epoch_loss=-.19]
Epoch[4] Elapsed time 0.054 seconds
Epoch[4] Evaluation metric 'epoch_loss'=-0.189805
10it [00:00, 306.27it/s, epoch=5/5, validation_avg_epoch_loss=8.34]
Epoch[4] Elapsed time 0.034 seconds
Epoch[4] Evaluation metric 'validation_epoch_loss'=8.344242
Computing averaged parameters.
Loading averaged parameters.
End model training
100%|██████████| 313/313 [00:00<00:00, 3948.88it/s]
100%|██████████| 313/313 [00:00<00:00, 3734.04it/s]
Running evaluation: 100%|██████████| 313/313 [00:00<00:00, 943.49it/s]
Fitting model: MQCNN ...
Training model MQCNN...
Start model training
Epoch[0] Learning rate is 0.001
Number of parameters in ForkingSeq2SeqTrainingNetwork: 58218
100%|██████████| 10/10 [00:00<00:00, 63.30it/s, epoch=1/5, avg_epoch_loss=100]
Epoch[0] Elapsed time 0.160 seconds
Epoch[0] Evaluation metric 'epoch_loss'=100.159810
Number of parameters in ForkingSeq2SeqTrainingNetwork: 58218
10it [00:00, 85.18it/s, epoch=1/5, validation_avg_epoch_loss=426]
Epoch[0] Elapsed time 0.119 seconds
Epoch[0] Evaluation metric 'validation_epoch_loss'=425.933828
Epoch[1] Learning rate is 0.001
100%|██████████| 10/10 [00:00<00:00, 68.97it/s, epoch=2/5, avg_epoch_loss=98.5]
Epoch[1] Elapsed time 0.147 seconds
Epoch[1] Evaluation metric 'epoch_loss'=98.538936
10it [00:00, 85.11it/s, epoch=2/5, validation_avg_epoch_loss=423]
Epoch[1] Elapsed time 0.119 seconds
Epoch[1] Evaluation metric 'validation_epoch_loss'=422.785052
Epoch[2] Learning rate is 0.001
100%|██████████| 10/10 [00:00<00:00, 69.43it/s, epoch=3/5, avg_epoch_loss=96.8]
Epoch[2] Elapsed time 0.146 seconds
Epoch[2] Evaluation metric 'epoch_loss'=96.827922
10it [00:00, 85.13it/s, epoch=3/5, validation_avg_epoch_loss=419]
Epoch[2] Elapsed time 0.120 seconds
Epoch[2] Evaluation metric 'validation_epoch_loss'=418.784891
Epoch[3] Learning rate is 0.001
100%|██████████| 10/10 [00:00<00:00, 69.81it/s, epoch=4/5, avg_epoch_loss=94.4]
Epoch[3] Elapsed time 0.145 seconds
Epoch[3] Evaluation metric 'epoch_loss'=94.388038
10it [00:00, 83.42it/s, epoch=4/5, validation_avg_epoch_loss=413]
Epoch[3] Elapsed time 0.121 seconds
Epoch[3] Evaluation metric 'validation_epoch_loss'=413.194976
Epoch[4] Learning rate is 0.001
100%|██████████| 10/10 [00:00<00:00, 69.57it/s, epoch=5/5, avg_epoch_loss=90.9]
Epoch[4] Elapsed time 0.145 seconds
Epoch[4] Evaluation metric 'epoch_loss'=90.850428
10it [00:00, 84.72it/s, epoch=5/5, validation_avg_epoch_loss=404]
Epoch[4] Elapsed time 0.120 seconds
Epoch[4] Evaluation metric 'validation_epoch_loss'=403.642245
Computing averaged parameters.
Loading averaged parameters.
End model training
100%|██████████| 313/313 [00:00<00:00, 2614.64it/s]
100%|██████████| 313/313 [00:00<00:00, 3741.37it/s]
Running evaluation: 100%|██████████| 313/313 [00:00<00:00, 934.85it/s]
Fitting model: DeepAR ...
Training model DeepAR...
Start model training
Epoch[0] Learning rate is 0.001
Number of parameters in DeepARTrainingNetwork: 26218
100%|██████████| 10/10 [00:03<00:00,  2.78it/s, epoch=1/5, avg_epoch_loss=-2.29]
Epoch[0] Elapsed time 3.600 seconds
Epoch[0] Evaluation metric 'epoch_loss'=-2.289171
Number of parameters in DeepARTrainingNetwork: 26218
10it [00:00, 11.91it/s, epoch=1/5, validation_avg_epoch_loss=8.88]
Epoch[0] Elapsed time 0.841 seconds
Epoch[0] Evaluation metric 'validation_epoch_loss'=8.879016
Epoch[1] Learning rate is 0.001
100%|██████████| 10/10 [00:00<00:00, 36.57it/s, epoch=2/5, avg_epoch_loss=-.744]
Epoch[1] Elapsed time 0.275 seconds
Epoch[1] Evaluation metric 'epoch_loss'=-0.744277
10it [00:00, 59.21it/s, epoch=2/5, validation_avg_epoch_loss=8.75]
Epoch[1] Elapsed time 0.171 seconds
Epoch[1] Evaluation metric 'validation_epoch_loss'=8.748208
Epoch[2] Learning rate is 0.001
100%|██████████| 10/10 [00:00<00:00, 37.73it/s, epoch=3/5, avg_epoch_loss=-.418]
Epoch[2] Elapsed time 0.267 seconds
Epoch[2] Evaluation metric 'epoch_loss'=-0.418038
10it [00:00, 59.09it/s, epoch=3/5, validation_avg_epoch_loss=8.51]
Epoch[2] Elapsed time 0.171 seconds
Epoch[2] Evaluation metric 'validation_epoch_loss'=8.507644
Epoch[3] Learning rate is 0.001
100%|██████████| 10/10 [00:00<00:00, 37.18it/s, epoch=4/5, avg_epoch_loss=-.681]
Epoch[3] Elapsed time 0.271 seconds
Epoch[3] Evaluation metric 'epoch_loss'=-0.681060
10it [00:00, 58.33it/s, epoch=4/5, validation_avg_epoch_loss=8.3]
Epoch[3] Elapsed time 0.173 seconds
Epoch[3] Evaluation metric 'validation_epoch_loss'=8.304434
Epoch[4] Learning rate is 0.001
100%|██████████| 10/10 [00:00<00:00, 37.22it/s, epoch=5/5, avg_epoch_loss=-1.9]
Epoch[4] Elapsed time 0.271 seconds
Epoch[4] Evaluation metric 'epoch_loss'=-1.904672
10it [00:00, 58.79it/s, epoch=5/5, validation_avg_epoch_loss=8.03]
Epoch[4] Elapsed time 0.172 seconds
Epoch[4] Evaluation metric 'validation_epoch_loss'=8.026868
Computing averaged parameters.
Loading averaged parameters.
End model training
100%|██████████| 313/313 [00:00<00:00, 313.62it/s]
100%|██████████| 313/313 [00:00<00:00, 3835.08it/s]
Running evaluation: 100%|██████████| 313/313 [00:00<00:00, 943.59it/s]
AutoGluon training complete, total runtime = 15.19s ...

Recall that we only used presets = "low_quality" to ensure this example runs quickly. This is NOT a good setting: you should either omit this argument or set presets = "best_quality" if you want to benchmark the best accuracy that AutoGluon can obtain, as in the sketch below.
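
A hedged sketch of such a run (we save to a fresh, hypothetical path to avoid overwriting the earlier predictor; expect far longer runtimes):

predictor_best = ForecastingPredictor(path="agModels-covidforecast-best", eval_metric=eval_metric).fit(
    train_data, static_features=static_features, prediction_length=19, quantiles=[0.1, 0.5, 0.9],
    index_column="name", target_column="ConfirmedCases", time_column="Date",
    presets="best_quality",  # highest-accuracy preset; no quick-demo shortcuts
)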

If you provided static features to fit(), then the static features must also be provided when using leaderboard(), evaluate(), or predict():

predictor_static.leaderboard(test_data, static_features=static_features)
Using previous inferred feature columns...
Static Cat Features Dataframe including ['static_cat_feature']
Static Real Features Dataframe including ['static_real_feature']
Generating leaderboard for all models trained...
Additional data provided, testing on the additional data...
100%|██████████| 313/313 [00:00<00:00, 3942.52it/s]
100%|██████████| 313/313 [00:00<00:00, 3808.06it/s]
Running evaluation: 100%|██████████| 313/313 [00:00<00:00, 943.20it/s]
100%|██████████| 313/313 [00:00<00:00, 2445.15it/s]
100%|██████████| 313/313 [00:00<00:00, 3911.75it/s]
Running evaluation: 100%|██████████| 313/313 [00:00<00:00, 950.05it/s]
100%|██████████| 313/313 [00:00<00:00, 314.12it/s]
100%|██████████| 313/313 [00:00<00:00, 3918.63it/s]
Running evaluation: 100%|██████████| 313/313 [00:00<00:00, 951.37it/s]
    model  val_score  fit_order  test_score
0     SFF  -0.660771          1   -0.267795
1  DeepAR  -0.764094          3   -0.590745
2   MQCNN  -0.920152          2   -0.865667

AutoGluon forecast predictions will now be based on the static features in addition to the historical time-series observations:

predictions = predictor_static.predict(test_data, static_features=static_features)
print(predictions["Afghanistan_"])
Using previous inferred feature columns...
Static Cat Features Dataframe including ['static_cat_feature']
Static Real Features Dataframe including ['static_real_feature']
Does not specify model, will by default use the model with the best validation score for prediction
Predicting with model SFF
                    0.1          0.5          0.9
2020-04-22   848.724426  1433.520386  2100.333252
2020-04-23   525.479980  1203.280273  1828.540649
2020-04-24   406.676422  1320.063599  2089.422852
2020-04-25   135.429993  1292.610840  2219.305908
2020-04-26    59.995323  1354.163574  2438.460938
2020-04-27   586.250061  1494.719849  2511.511475
2020-04-28   -85.886353  1529.408325  3028.596924
2020-04-29   367.283447  1467.030396  2635.018311
2020-04-30   559.825256  1562.025513  2969.750732
2020-05-01   -58.859753  1607.525146  2800.288818
2020-05-02   311.700958  1617.724609  3071.249023
2020-05-03  -322.320007  1401.588867  2945.138428
2020-05-04  -319.960236  1088.040771  2829.933105
2020-05-05  -403.656525  1754.966675  3624.205811
2020-05-06  -348.057587  1366.859619  4198.331543
2020-05-07   -34.126774  1619.350708  3284.425293
2020-05-08 -1168.174927  1551.574707  3387.256348
2020-05-09 -1259.406860  1204.493530  4268.099121
2020-05-10  -189.284729  1399.963745  3504.257812
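
Each entry of predictions is a DataFrame indexed by forecast date, with one column per requested quantile, as shown above. A sketch for extracting just the median forecast for one series (whether the quantile column labels are floats or strings may depend on your version, so adjust accordingly):

median_forecast = predictions["Afghanistan_"]["0.5"]  # the 0.5-quantile (median) forecasts
print(median_forecast.head())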