Forecasting Time-Series - Quick Start

Via a simple fit() call, AutoGluon can train models to produce forecasts for time-series data. This tutorial demonstrates how to quickly use AutoGluon to forecast Covid-19 case counts in each country from that country’s historical data. Let’s first import AutoGluon’s ForecastingPredictor and TabularDataset classes, where the latter is used to load time-series data stored in a tabular file format:

from autogluon.forecasting import ForecastingPredictor
from autogluon.forecasting import TabularDataset

We load the time-series data to use for training from a CSV file into an AutoGluon TabularDataset object. This object is essentially equivalent to a Pandas DataFrame and the same methods can be applied to both.

train_data = TabularDataset("https://autogluon.s3-us-west-2.amazonaws.com/datasets/CovidTimeSeries/train.csv")
print(train_data[50:60])
          Date  ConfirmedCases          name
50  2020-03-12             7.0  Afghanistan_
51  2020-03-13             7.0  Afghanistan_
52  2020-03-14            11.0  Afghanistan_
53  2020-03-15            16.0  Afghanistan_
54  2020-03-16            21.0  Afghanistan_
55  2020-03-17            22.0  Afghanistan_
56  2020-03-18            22.0  Afghanistan_
57  2020-03-19            22.0  Afghanistan_
58  2020-03-20            24.0  Afghanistan_
59  2020-03-21            24.0  Afghanistan_

Note that we loaded data from a CSV file stored in the cloud (an AWS S3 bucket), but you can specify a local file path instead if you have already downloaded the CSV file to your own machine (e.g., using wget). Our goal is to train models on this data that can forecast Covid case counts in each country at future dates. This corresponds to a forecasting problem with many related individual time-series (one per country). Each row in the table train_data corresponds to one observation of one time-series at a particular time.

The dataset you use for autogluon.forecasting should usually contain three columns: a time_column with the time information (here “Date”), an index_column with a categorical index ID that specifies which (out of multiple) time-series is being observed (here “name”, where each country corresponds to a different time-series in our example), and a target_column with the observed value of this particular time-series at this particular time (here “ConfirmedCases”). When forecasting future values of one particular time-series, AutoGluon may rely on historical observations of not only this time-series but also all of the other time-series in the dataset. You can use NA to represent missing observations in the data. If your data contains observations of only a single time-series, then index_column can be omitted. Currently only continuous numeric values are supported in the target_column.
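As a minimal sketch of this long format, the snippet below builds a tiny table by hand with pandas and runs a few basic checks on the three columns. The values and the series name "Exampleland_" are made up for illustration, and the helper check_long_format is hypothetical, not part of AutoGluon.

```python
import pandas as pd

# Hypothetical toy data in the same long format as train_data:
# one row per (time series, time point) observation.
toy_data = pd.DataFrame({
    "Date": ["2020-03-12", "2020-03-13", "2020-03-12", "2020-03-13"],
    "ConfirmedCases": [7.0, 7.0, 2.0, None],  # None becomes NaN: a missing observation
    "name": ["Afghanistan_", "Afghanistan_", "Exampleland_", "Exampleland_"],
})


def check_long_format(df, time_column, index_column, target_column):
    """Illustrative helper (not an AutoGluon function): basic sanity checks."""
    missing = {time_column, index_column, target_column} - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    # Each (series, time) pair should appear at most once.
    if df.duplicated(subset=[index_column, time_column]).any():
        raise ValueError("duplicate observations for the same series and time")
    # The target must be numeric (NaN is allowed for missing values).
    if not pd.api.types.is_numeric_dtype(df[target_column]):
        raise TypeError("target column must be numeric")
    return df[index_column].nunique()


num_series = check_long_format(toy_data, "Date", "name", "ConfirmedCases")
print(num_series)  # number of distinct time series in the toy table
```

The same kind of check can be run on train_data before calling fit(), since a TabularDataset supports the usual DataFrame methods.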

Now let’s use AutoGluon to train some forecasting models. Below we instruct AutoGluon to fit models that can forecast up to 19 time-points into the future (prediction_length) and save them in the folder save_path. Because of the inherent uncertainty involved in this prediction problem, these models are trained to probabilistically forecast 3 different quantiles of the “ConfirmedCases” distribution: the central 0.5 quantile (median), a low 0.1 quantile, and a high 0.9 quantile. The first of these can be used as our forecasted value, while the latter two can be used as a prediction interval for this value (we are 80% confident the true value lies within this interval).
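To make the role of the quantile levels more concrete, here is a minimal sketch of the quantile (pinball) loss, a standard way to score a single quantile forecast. It is a textbook definition written in plain Python, not AutoGluon's internal code, and the forecast numbers are made up.

```python
def pinball_loss(y_true, y_pred, q):
    """Quantile (pinball) loss for a single observation at quantile level q."""
    diff = y_true - y_pred
    # Under-prediction is weighted by q, over-prediction by (1 - q).
    return max(q * diff, (q - 1) * diff)


y_true = 100.0
# Hypothetical forecasts of the 0.1, 0.5 and 0.9 quantiles.
forecast = {0.1: 60.0, 0.5: 90.0, 0.9: 150.0}

losses = {q: pinball_loss(y_true, pred, q) for q, pred in forecast.items()}
# The 0.1 and 0.9 quantile forecasts bound an 80% prediction interval.
inside_interval = forecast[0.1] <= y_true <= forecast[0.9]
print(losses, inside_interval)
```

A low quantile such as 0.1 is penalized lightly for sitting below the observed value and heavily for sitting above it, which is what pushes it toward the lower end of the distribution.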

save_path = "agModels-covidforecast"

predictor = ForecastingPredictor(path=save_path).fit(
    train_data,
    prediction_length=19,
    index_column="name",
    target_column="ConfirmedCases",
    time_column="Date",
    quantiles=[0.1, 0.5, 0.9],
    presets="low_quality",  # only here to keep this demo quick; omit it in real applications!
)
Warning: path already exists! This predictor may overwrite an existing predictor! path="agModels-covidforecast"
presets is set to be low_quality
Training with dataset in tabular format...
Finish rebuilding the data, showing the top five rows.
           name  2020-01-22  2020-01-23  2020-01-24  2020-01-25  2020-01-26
0  Afghanistan_         0.0         0.0         0.0         0.0         0.0
1      Albania_         0.0         0.0         0.0         0.0         0.0
2      Algeria_         0.0         0.0         0.0         0.0         0.0
3      Andorra_         0.0         0.0         0.0         0.0         0.0
4       Angola_         0.0         0.0         0.0         0.0         0.0

   2020-01-27  2020-01-28  2020-01-29  2020-01-30  ...  2020-03-24
0         0.0         0.0         0.0         0.0  ...        74.0
1         0.0         0.0         0.0         0.0  ...       123.0
2         0.0         0.0         0.0         0.0  ...       264.0
3         0.0         0.0         0.0         0.0  ...       164.0
4         0.0         0.0         0.0         0.0  ...         3.0

   2020-03-25  2020-03-26  2020-03-27  2020-03-28  2020-03-29  2020-03-30
0        84.0        94.0       110.0       110.0       120.0       170.0
1       146.0       174.0       186.0       197.0       212.0       223.0
2       302.0       367.0       409.0       454.0       511.0       584.0
3       188.0       224.0       267.0       308.0       334.0       370.0
4         3.0         4.0         4.0         5.0         7.0         7.0

   2020-03-31  2020-04-01  2020-04-02
0       174.0       237.0       273.0
1       243.0       259.0       277.0
2       716.0       847.0       986.0
3       376.0       390.0       428.0
4         7.0         8.0         8.0

[5 rows x 73 columns]
Validation data is None, will do auto splitting...
Finished processing data, using 0.3105947971343994s.
Random seed set to 0
All models will be trained for quantiles [0.1, 0.5, 0.9].
Beginning AutoGluon training ...
AutoGluon will save models to agModels-covidforecast/
Fitting model: SFF ...
Training model SFF...
Start model training
Epoch[0] Learning rate is 0.001
  0%|          | 0/10 [00:00<?, ?it/s]Number of parameters in SimpleFeedForwardTrainingNetwork: 31523
100%|██████████| 10/10 [00:01<00:00,  6.68it/s, epoch=1/5, avg_epoch_loss=-3.4]
Epoch[0] Elapsed time 1.499 seconds
Epoch[0] Evaluation metric 'epoch_loss'=-3.399109
0it [00:00, ?it/s]Number of parameters in SimpleFeedForwardTrainingNetwork: 31523
10it [00:00, 256.60it/s, epoch=1/5, validation_avg_epoch_loss=9.7]
Epoch[0] Elapsed time 0.041 seconds
Epoch[0] Evaluation metric 'validation_epoch_loss'=9.702665
Epoch[1] Learning rate is 0.001
100%|██████████| 10/10 [00:00<00:00, 194.89it/s, epoch=2/5, avg_epoch_loss=-2.2]
Epoch[1] Elapsed time 0.053 seconds
Epoch[1] Evaluation metric 'epoch_loss'=-2.204393
10it [00:00, 325.37it/s, epoch=2/5, validation_avg_epoch_loss=9.23]
Epoch[1] Elapsed time 0.032 seconds
Epoch[1] Evaluation metric 'validation_epoch_loss'=9.231876
Epoch[2] Learning rate is 0.001
100%|██████████| 10/10 [00:00<00:00, 198.72it/s, epoch=3/5, avg_epoch_loss=-.357]
Epoch[2] Elapsed time 0.052 seconds
Epoch[2] Evaluation metric 'epoch_loss'=-0.357210
10it [00:00, 318.91it/s, epoch=3/5, validation_avg_epoch_loss=8.87]
Epoch[2] Elapsed time 0.033 seconds
Epoch[2] Evaluation metric 'validation_epoch_loss'=8.873803
Epoch[3] Learning rate is 0.001
100%|██████████| 10/10 [00:00<00:00, 199.88it/s, epoch=4/5, avg_epoch_loss=-1.01]
Epoch[3] Elapsed time 0.052 seconds
Epoch[3] Evaluation metric 'epoch_loss'=-1.005207
10it [00:00, 310.13it/s, epoch=4/5, validation_avg_epoch_loss=8.61]
Epoch[3] Elapsed time 0.034 seconds
Epoch[3] Evaluation metric 'validation_epoch_loss'=8.606632
Epoch[4] Learning rate is 0.001
100%|██████████| 10/10 [00:00<00:00, 203.36it/s, epoch=5/5, avg_epoch_loss=-1.2]
Epoch[4] Elapsed time 0.051 seconds
Epoch[4] Evaluation metric 'epoch_loss'=-1.195356
10it [00:00, 325.66it/s, epoch=5/5, validation_avg_epoch_loss=8.5]
Epoch[4] Elapsed time 0.032 seconds
Epoch[4] Evaluation metric 'validation_epoch_loss'=8.499021
Computing averaged parameters.
Loading averaged parameters.
End model training
100%|██████████| 313/313 [00:00<00:00, 4182.21it/s]
100%|██████████| 313/313 [00:00<00:00, 3815.28it/s]
Running evaluation: 100%|██████████| 313/313 [00:00<00:00, 1137.83it/s]
Fitting model: MQCNN ...
Training model MQCNN...
Start model training
Epoch[0] Learning rate is 0.001
  0%|          | 0/10 [00:00<?, ?it/s]Number of parameters in ForkingSeq2SeqTrainingNetwork: 57784
100%|██████████| 10/10 [00:00<00:00, 65.46it/s, epoch=1/5, avg_epoch_loss=98.9]
Epoch[0] Elapsed time 0.155 seconds
Epoch[0] Evaluation metric 'epoch_loss'=98.895254
0it [00:00, ?it/s]Number of parameters in ForkingSeq2SeqTrainingNetwork: 57784
10it [00:00, 84.81it/s, epoch=1/5, validation_avg_epoch_loss=424]
Epoch[0] Elapsed time 0.119 seconds
Epoch[0] Evaluation metric 'validation_epoch_loss'=424.208052
Epoch[1] Learning rate is 0.001
100%|██████████| 10/10 [00:00<00:00, 70.80it/s, epoch=2/5, avg_epoch_loss=97.6]
Epoch[1] Elapsed time 0.143 seconds
Epoch[1] Evaluation metric 'epoch_loss'=97.625159
10it [00:00, 85.74it/s, epoch=2/5, validation_avg_epoch_loss=422]
Epoch[1] Elapsed time 0.118 seconds
Epoch[1] Evaluation metric 'validation_epoch_loss'=421.774191
Epoch[2] Learning rate is 0.001
100%|██████████| 10/10 [00:00<00:00, 71.88it/s, epoch=3/5, avg_epoch_loss=96.3]
Epoch[2] Elapsed time 0.141 seconds
Epoch[2] Evaluation metric 'epoch_loss'=96.322472
10it [00:00, 87.63it/s, epoch=3/5, validation_avg_epoch_loss=419]
Epoch[2] Elapsed time 0.115 seconds
Epoch[2] Evaluation metric 'validation_epoch_loss'=418.761470
Epoch[3] Learning rate is 0.001
100%|██████████| 10/10 [00:00<00:00, 72.05it/s, epoch=4/5, avg_epoch_loss=94.5]
Epoch[3] Elapsed time 0.140 seconds
Epoch[3] Evaluation metric 'epoch_loss'=94.466040
10it [00:00, 87.92it/s, epoch=4/5, validation_avg_epoch_loss=414]
Epoch[3] Elapsed time 0.115 seconds
Epoch[3] Evaluation metric 'validation_epoch_loss'=414.468602
Epoch[4] Learning rate is 0.001
100%|██████████| 10/10 [00:00<00:00, 71.47it/s, epoch=5/5, avg_epoch_loss=91.6]
Epoch[4] Elapsed time 0.142 seconds
Epoch[4] Evaluation metric 'epoch_loss'=91.640606
10it [00:00, 88.57it/s, epoch=5/5, validation_avg_epoch_loss=406]
Epoch[4] Elapsed time 0.114 seconds
Epoch[4] Evaluation metric 'validation_epoch_loss'=406.421242
Computing averaged parameters.
Loading averaged parameters.
End model training
  0%|          | 0/313 [00:00<?, ?it/s]Forecast is not sample based. Ignoring parameter num_samples from predict method.
100%|██████████| 313/313 [00:00<00:00, 2736.95it/s]
100%|██████████| 313/313 [00:00<00:00, 1645.36it/s]
Running evaluation: 100%|██████████| 313/313 [00:00<00:00, 1137.56it/s]
Fitting model: DeepAR ...
Training model DeepAR...
Start model training
Epoch[0] Learning rate is 0.001
  0%|          | 0/10 [00:00<?, ?it/s]Number of parameters in DeepARTrainingNetwork: 25884
100%|██████████| 10/10 [00:04<00:00,  2.35it/s, epoch=1/5, avg_epoch_loss=0.253]
Epoch[0] Elapsed time 4.266 seconds
Epoch[0] Evaluation metric 'epoch_loss'=0.252794
0it [00:00, ?it/s]Number of parameters in DeepARTrainingNetwork: 25884
10it [00:00, 11.83it/s, epoch=1/5, validation_avg_epoch_loss=8.16]
Epoch[0] Elapsed time 0.847 seconds
Epoch[0] Evaluation metric 'validation_epoch_loss'=8.158110
Epoch[1] Learning rate is 0.001
100%|██████████| 10/10 [00:00<00:00, 37.54it/s, epoch=2/5, avg_epoch_loss=-1.39]
Epoch[1] Elapsed time 0.268 seconds
Epoch[1] Evaluation metric 'epoch_loss'=-1.386599
10it [00:00, 62.07it/s, epoch=2/5, validation_avg_epoch_loss=7.7]
Epoch[1] Elapsed time 0.163 seconds
Epoch[1] Evaluation metric 'validation_epoch_loss'=7.703091
Epoch[2] Learning rate is 0.001
100%|██████████| 10/10 [00:00<00:00, 39.65it/s, epoch=3/5, avg_epoch_loss=-2.51]
Epoch[2] Elapsed time 0.254 seconds
Epoch[2] Evaluation metric 'epoch_loss'=-2.514444
10it [00:00, 61.58it/s, epoch=3/5, validation_avg_epoch_loss=7.43]
Epoch[2] Elapsed time 0.164 seconds
Epoch[2] Evaluation metric 'validation_epoch_loss'=7.431931
Epoch[3] Learning rate is 0.001
100%|██████████| 10/10 [00:00<00:00, 38.71it/s, epoch=4/5, avg_epoch_loss=-1.07]
Epoch[3] Elapsed time 0.260 seconds
Epoch[3] Evaluation metric 'epoch_loss'=-1.071596
10it [00:00, 62.33it/s, epoch=4/5, validation_avg_epoch_loss=7.23]
Epoch[3] Elapsed time 0.162 seconds
Epoch[3] Evaluation metric 'validation_epoch_loss'=7.230093
Epoch[4] Learning rate is 0.001
100%|██████████| 10/10 [00:00<00:00, 39.71it/s, epoch=5/5, avg_epoch_loss=-.417]
Epoch[4] Elapsed time 0.254 seconds
Epoch[4] Evaluation metric 'epoch_loss'=-0.417213
10it [00:00, 61.66it/s, epoch=5/5, validation_avg_epoch_loss=7.12]
Epoch[4] Elapsed time 0.164 seconds
Epoch[4] Evaluation metric 'validation_epoch_loss'=7.119217
Computing averaged parameters.
Loading averaged parameters.
End model training
100%|██████████| 313/313 [00:00<00:00, 321.07it/s]
100%|██████████| 313/313 [00:00<00:00, 3886.14it/s]
Running evaluation: 100%|██████████| 313/313 [00:00<00:00, 1105.04it/s]
AutoGluon training complete, total runtime = 17.75s ...

Note: We used presets="low_quality" above to ensure this example runs quickly, but this is NOT a good setting! To obtain good performance in real applications you should either delete this argument or set presets to one of: "best_quality", "high_quality", "good_quality", "medium_quality". Higher-quality presets will generally produce superior forecasting accuracy but take longer to train and may produce less efficient models. The low_quality preset is intended only for quickly verifying that AutoGluon can be run on your data; you should generally use the best_quality preset when benchmarking the accuracy of AutoGluon.

We can print a summary of what happened during fit():

predictor.fit_summary()
Generating leaderboard for all models trained...
* Summary of fit() *
Estimated performance of each model:
    model  val_score  fit_order
0     SFF  -0.676774          1
1  DeepAR  -0.836025          3
2   MQCNN  -0.928002          2
Number of models trained: 3
Types of models trained:
{'SimpleFeedForwardModel', 'MQCNNModel', 'DeepARModel'}
Hyperparameter-tuning used: False
User-specified hyperparameters:
toy
Feature Metadata (Processed):
(raw dtype, special dtypes):
* End of fit() summary *
{'model_types': {'SFF': 'SimpleFeedForwardModel',
  'MQCNN': 'MQCNNModel',
  'DeepAR': 'DeepARModel'},
 'model_performance': {'SFF': -0.6767744950415352,
  'MQCNN': -0.92800191364702,
  'DeepAR': -0.83602518266847},
 'model_best': None,
 'model_paths': {'SFF': 'agModels-covidforecast/models/SFF/',
  'MQCNN': 'agModels-covidforecast/models/MQCNN/',
  'DeepAR': 'agModels-covidforecast/models/DeepAR/'},
 'model_fit_times': {'SFF': 5.295125722885132,
  'MQCNN': 1.3960041999816895,
  'DeepAR': 6.93402624130249},
 'hyperparameter_tune': False,
 'hyperparameters_userspecified': 'toy',
 'model_hyperparams': {'SFF': {'freq': 'D',
   'prediction_length': 19,
   'epochs': 5,
   'num_batches_per_epoch': 10,
   'context_length': 5,
   'use_feat_static_cat': False,
   'use_feat_static_real': False,
   'cardinality': None,
   'quantiles': [0.1, 0.5, 0.9],
   'callbacks': [<autogluon.forecasting.models.gluonts_model.abstract_gluonts.callback.EpochCounter at 0x7f45d0362210>,
    <autogluon.forecasting.models.gluonts_model.abstract_gluonts.callback.TimeLimitCallback at 0x7f45d0362a90>]},
  'MQCNN': {'freq': 'D',
   'prediction_length': 19,
   'epochs': 5,
   'num_batches_per_epoch': 10,
   'context_length': 5,
   'use_feat_static_cat': False,
   'use_feat_static_real': False,
   'cardinality': None,
   'quantiles': [0.1, 0.5, 0.9],
   'callbacks': [<autogluon.forecasting.models.gluonts_model.abstract_gluonts.callback.EpochCounter at 0x7f45d03623d0>,
    <autogluon.forecasting.models.gluonts_model.abstract_gluonts.callback.TimeLimitCallback at 0x7f468a5b3f10>],
   'hybridize': False},
  'DeepAR': {'freq': 'D',
   'prediction_length': 19,
   'epochs': 5,
   'num_batches_per_epoch': 10,
   'context_length': 5,
   'use_feat_static_cat': False,
   'use_feat_static_real': False,
   'cardinality': None,
   'quantiles': [0.1, 0.5, 0.9],
   'callbacks': [<autogluon.forecasting.models.gluonts_model.abstract_gluonts.callback.EpochCounter at 0x7f468a5cc150>,
    <autogluon.forecasting.models.gluonts_model.abstract_gluonts.callback.TimeLimitCallback at 0x7f468a5cc7d0>]}},
 'leaderboard':     model  val_score  fit_order
 0     SFF  -0.676774          1
 1  DeepAR  -0.836025          3
 2   MQCNN  -0.928002          2}
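The dictionary returned by fit_summary() can be inspected like any other Python dict. As a small sketch, the snippet below copies the 'model_performance' entry printed above into a literal (so that it runs without AutoGluon) and ranks the models by validation score.

```python
# Values copied from the 'model_performance' entry of the summary printed above.
model_performance = {
    "SFF": -0.6767744950415352,
    "MQCNN": -0.92800191364702,
    "DeepAR": -0.83602518266847,
}

# Scores are negated losses, so the largest value is the best model.
ranking = sorted(model_performance, key=model_performance.get, reverse=True)
best_model = ranking[0]
print(ranking)  # ['SFF', 'DeepAR', 'MQCNN']
```

The resulting order matches the leaderboard shown in the summary, with SFF first.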

Now let’s load some more recent test data to examine the forecasting performance of our trained models:

test_data = TabularDataset("https://autogluon.s3-us-west-2.amazonaws.com/datasets/CovidTimeSeries/test.csv")
Loaded data from: https://autogluon.s3-us-west-2.amazonaws.com/datasets/CovidTimeSeries/test.csv | Columns = 3 / 3 | Rows = 28483 -> 28483

The code below is not needed here; it is included only to demonstrate how to reload a trained Predictor object from disk (for example, in a new Python session):

predictor = ForecastingPredictor.load(save_path)
Loading predictor from path agModels-covidforecast/

We can view the test performance of each model AutoGluon has trained via the leaderboard() function, where higher scores correspond to better predictive performance (the evaluation metric here is a loss, so it is reported with a negative sign to ensure that higher is better):

predictor.leaderboard(test_data)
Generating leaderboard for all models trained...
Additional data provided, testing on the additional data...
100%|██████████| 313/313 [00:00<00:00, 4206.30it/s]
100%|██████████| 313/313 [00:00<00:00, 4018.07it/s]
Running evaluation: 100%|██████████| 313/313 [00:00<00:00, 1117.67it/s]
100%|██████████| 313/313 [00:00<00:00, 2562.23it/s]
100%|██████████| 313/313 [00:00<00:00, 3965.32it/s]
Running evaluation: 100%|██████████| 313/313 [00:00<00:00, 1111.00it/s]
100%|██████████| 313/313 [00:00<00:00, 318.32it/s]
100%|██████████| 313/313 [00:00<00:00, 3907.89it/s]
Running evaluation: 100%|██████████| 313/313 [00:00<00:00, 1133.81it/s]
    model  val_score  fit_order  test_score
0     SFF  -0.676774          1   -0.278595
1  DeepAR  -0.836025          3   -0.747373
2   MQCNN  -0.928002          2   -0.878101

Here test_score quantifies the performance of predictions on the held-out part of the test data (time points after the latest time observed in the original training data), while val_score quantifies the performance of predictions on an internal validation set that AutoGluon held out during fit(). By default the validation set consists of the latest time points in train_data, but you can also provide your own validation data through the fit() argument val_data. You can also call predictor.leaderboard() without any test_data argument to display only val_score. By default, AutoGluon scores probabilistic forecasts of multiple time-series via the weighted quantile loss, but you can specify a different eval_metric in fit() to instruct AutoGluon to optimize for that metric instead (e.g., eval_metric="MAPE"). For more details about the individual time-series models that AutoGluon can train, see the GluonTS documentation or the AutoGluon source code folder autogluon/forecasting/models/.
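The idea of holding out the latest time points of each series for validation can be sketched with pandas as follows. This only illustrates the concept on made-up data: the helper split_last_points is hypothetical, and AutoGluon's own splitting logic may differ (for instance in how much history the validation set keeps).

```python
import pandas as pd


def split_last_points(df, index_column, time_column, n_last):
    """Hypothetical helper: hold out the last n_last time points of every series."""
    df = df.sort_values([index_column, time_column])
    # tail() on a groupby keeps the final n_last rows of each group.
    val = df.groupby(index_column).tail(n_last)
    train = df.drop(val.index)
    return train, val


toy = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"] * 2),
    "name": ["A_"] * 4 + ["B_"] * 4,
    "ConfirmedCases": [0.0, 1.0, 3.0, 6.0, 2.0, 2.0, 5.0, 9.0],
})

train_part, val_part = split_last_points(toy, "name", "Date", n_last=2)
print(len(train_part), len(val_part))  # 4 4
```

Splitting by time rather than at random matters for forecasting, because a random split would let a model see observations from later than the ones it is asked to predict.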

We can also make forecasts further into the future based on the most recent data. When we call predict(), AutoGluon automatically forecasts with the model that had the best validation performance during training (this is the model at the top of leaderboard() when called without any data). The predictions returned by predict() form a dictionary whose keys index each time-series (in this example, country) and whose values are DataFrames containing quantile forecasts for each time-series (in this example, predicted quantiles of the case counts in each country at dates after the last one observed in test_data).

predictions = predictor.predict(test_data)
print(predictions['Afghanistan_'])  # quantile forecasts for the Afghanistan time-series
Does not specify model, will by default use the model with the best validation score for prediction
Predicting with model SFF
                    0.1          0.5          0.9
2020-04-22   238.829971  1182.778687  1932.760254
2020-04-23   292.640137  1286.876831  2406.811279
2020-04-24   129.684311  1362.938843  2444.654297
2020-04-25     1.438057  1273.847778  2290.441895
2020-04-26   263.676788  1384.483032  2708.584229
2020-04-27   215.322830  1575.142334  3049.258789
2020-04-28    73.187408  1584.535278  2793.485840
2020-04-29  -237.644989  1579.732056  3183.700439
2020-04-30    84.962097  1472.296021  3427.439941
2020-05-01  -736.102356  1614.746216  3914.478516
2020-05-02   134.536484  1722.437500  3610.033447
2020-05-03  -392.093353  1552.886841  3763.156250
2020-05-04   284.896149  1357.460815  3524.158203
2020-05-05  -886.666504  1731.957031  4186.833008
2020-05-06  -824.011108  1287.756348  3940.486572
2020-05-07 -1039.792236  1383.604980  4043.714844
2020-05-08 -1171.404175  1779.110962  3805.483154
2020-05-09 -1153.900146  1306.392456  3292.706787
2020-05-10  -945.160034  1354.559814  4442.953125
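Each value in this dictionary can be handled like an ordinary DataFrame. The sketch below re-enters the first three rows of the output above by hand, so that it runs without a trained predictor, and derives the median forecast and the width of the 80% interval. Labeling the columns with the strings "0.1", "0.5" and "0.9" is an assumption; inspect predictions['Afghanistan_'].columns to see the actual labels.

```python
import pandas as pd

# First three rows of the forecast printed above, re-entered by hand.
# Assumption: the quantile columns are labeled with strings.
forecast = pd.DataFrame(
    {
        "0.1": [238.829971, 292.640137, 129.684311],
        "0.5": [1182.778687, 1286.876831, 1362.938843],
        "0.9": [1932.760254, 2406.811279, 2444.654297],
    },
    index=pd.to_datetime(["2020-04-22", "2020-04-23", "2020-04-24"]),
)

median_forecast = forecast["0.5"]                    # point forecast
interval_width = forecast["0.9"] - forecast["0.1"]   # width of the 80% interval
print(interval_width.round(2))
```

Note that the full output above contains negative values for the 0.1 quantile, which cannot occur for case counts; this is not surprising for models trained with the quick low_quality preset.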

Rather than forecasting with the model that had the best validation score, you can specify which model to use for prediction, and also restrict prediction to certain time-series of interest:

model_touse = "MQCNN"
time_series_to_predict = ["Germany_", "Zimbabwe_"]
predictions = predictor.predict(test_data, model=model_touse, time_series_to_predict=time_series_to_predict)
Predicting with model MQCNN

In predict(), AutoGluon makes predictions for prediction_length (= 19 in this example) time points into the future, after the last time observed in the dataset fed into predict(). In evaluate() and leaderboard(), AutoGluon makes predictions for the first prediction_length time points exceeding the last time observed in the train_data originally fed into fit(), and then scores these predictions against the target values at the corresponding times in the dataset fed into these methods. Because some models base their predictions on lengthy histories, it is important that in either case, the test_data you provide contains the train_data as a subset! You can verify the train_data are contained within the test_data in the example above.
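One way to run such a containment check is a left merge with pandas' indicator option, sketched below on made-up data. The helper is_contained is hypothetical (not an AutoGluon function); for the tutorial data you would pass train_data and test_data with the three column names.

```python
import pandas as pd


def is_contained(train_df, test_df, key_columns):
    """Hypothetical helper: True if every row of train_df also appears in test_df."""
    merged = train_df.merge(
        test_df.drop_duplicates(), on=key_columns, how="left", indicator=True
    )
    # "_merge" is "both" for rows found in test_df and "left_only" otherwise.
    return bool((merged["_merge"] == "both").all())


columns = ["Date", "name", "ConfirmedCases"]
train_toy = pd.DataFrame(
    [["2020-03-12", "A_", 7.0], ["2020-03-13", "A_", 7.0]], columns=columns
)
test_toy = pd.DataFrame(
    [["2020-03-12", "A_", 7.0], ["2020-03-13", "A_", 7.0], ["2020-03-14", "A_", 11.0]],
    columns=columns,
)

contained = is_contained(train_toy, test_toy, columns)
reverse = is_contained(test_toy, train_toy, columns)
print(contained, reverse)  # True False
```

Merging on all three columns means that a row counts as contained only if its date, series name, and target value all match.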

After you no longer need a particular trained Predictor, remember to delete the save_path folder to free disk space on your machine. This is especially important to avoid running out of space if training many Predictors in sequence.
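Deleting the folder can be done from Python with the standard library. The sketch below uses a temporary directory as a stand-in for save_path so that it is safe to run anywhere; the file name dummy_model.bin is made up.

```python
import os
import shutil
import tempfile

# Stand-in for the save_path folder so the sketch is safe to run anywhere;
# in the tutorial you would use "agModels-covidforecast" instead.
save_path = tempfile.mkdtemp(prefix="agModels-demo-")
open(os.path.join(save_path, "dummy_model.bin"), "w").close()

# Recursively delete the folder and everything in it.
shutil.rmtree(save_path, ignore_errors=True)
print(os.path.exists(save_path))  # False
```

Because rmtree deletes recursively and without confirmation, double-check the path before running it on a real folder.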