.. _sec_forecastingindepth:

Forecasting Time-Series - In Depth Tutorial
===========================================

This more advanced tutorial describes how you can exert greater control over AutoGluon's time-series modeling. As an example forecasting task, we again use the Covid-19 dataset previously described in the :ref:`sec_forecastingquick` tutorial.

.. code:: python

   from autogluon.forecasting import ForecastingPredictor
   from autogluon.forecasting import TabularDataset

   train_data = TabularDataset("https://autogluon.s3-us-west-2.amazonaws.com/datasets/CovidTimeSeries/train.csv")
   save_path = "agModels-covidforecast"
   eval_metric = "mean_wQuantileLoss"  # just for demonstration, this is already the default evaluation metric

.. parsed-literal::
   :class: output

   /var/lib/jenkins/workspace/workspace/autogluon-forecasting-py3-v3/venv/lib/python3.7/site-packages/gluonts/json.py:46: UserWarning: Using `json`-module for json-handling. Consider installing one of `orjson`, `ujson` to speed up serialization and deserialization.
     "Using `json`-module for json-handling. "

Specifying hyperparameters and tuning them
------------------------------------------

While AutoGluon-Forecasting automatically tunes certain hyperparameters of time-series models depending on the ``presets`` setting, you can also control the hyperparameter optimization (HPO) process manually. The ``presets`` argument of ``predictor.fit()`` determines which search spaces to consider for certain hyperparameters, as well as how many HPO trials to run when searching for the best value in the chosen search space. Instead of specifying ``presets``, you can specify all of these items yourself. Below we demonstrate how to tune the ``context_length`` hyperparameter, which controls how much past history a trained model conditions on in any one forecast, for just the MQCNN and DeepAR models.

.. code:: python

   import autogluon.core as ag
   from gluonts.mx.distribution.neg_binomial import NegativeBinomialOutput

   context_search_space = ag.Int(75, 100)  # integer values spanning the listed range
   epochs = 2  # small value used for quick demo, omit this or use much larger value in real applications!
   num_batches_per_epoch = 5  # small value used for quick demo, omit this or use larger value in real applications!
   num_hpo_trials = 2  # small value used for quick demo, use much larger value in real applications!

   mqcnn_params = {
       "context_length": context_search_space,
       "epochs": epochs,
       "num_batches_per_epoch": num_batches_per_epoch,
   }
   deepar_params = {
       "context_length": context_search_space,
       "epochs": epochs,
       "num_batches_per_epoch": num_batches_per_epoch,
       "distr_output": NegativeBinomialOutput(),
   }

   predictor = ForecastingPredictor(path=save_path, eval_metric=eval_metric).fit(
       train_data,
       prediction_length=19,
       quantiles=[0.1, 0.5, 0.9],
       index_column="name",
       target_column="ConfirmedCases",
       time_column="Date",
       hyperparameter_tune_kwargs={'scheduler': 'local', 'searcher': 'bayesopt', 'num_trials': num_hpo_trials},
       hyperparameters={"MQCNN": mqcnn_params, "DeepAR": deepar_params}
   )

.. parsed-literal::
   :class: output

   Training with dataset in tabular format...
   Finish rebuilding the data, showing the top five rows.
              name  2020-01-22  2020-01-23  2020-01-24  2020-01-25  2020-01-26  \
   0  Afghanistan_         0.0         0.0         0.0         0.0         0.0
   1      Albania_         0.0         0.0         0.0         0.0         0.0
   2      Algeria_         0.0         0.0         0.0         0.0         0.0
   3      Andorra_         0.0         0.0         0.0         0.0         0.0
   4       Angola_         0.0         0.0         0.0         0.0         0.0

      2020-01-27  2020-01-28  2020-01-29  2020-01-30  ...  2020-03-24  \
   0         0.0         0.0         0.0         0.0  ...        74.0
   1         0.0         0.0         0.0         0.0  ...       123.0
   2         0.0         0.0         0.0         0.0  ...       264.0
   3         0.0         0.0         0.0         0.0  ...       164.0
   4         0.0         0.0         0.0         0.0  ...         3.0

      2020-03-25  2020-03-26  2020-03-27  2020-03-28  2020-03-29  2020-03-30  \
   0        84.0        94.0       110.0       110.0       120.0       170.0
   1       146.0       174.0       186.0       197.0       212.0       223.0
   2       302.0       367.0       409.0       454.0       511.0       584.0
   3       188.0       224.0       267.0       308.0       334.0       370.0
   4         3.0         4.0         4.0         5.0         7.0         7.0

      2020-03-31  2020-04-01  2020-04-02
   0       174.0       237.0       273.0
   1       243.0       259.0       277.0
   2       716.0       847.0       986.0
   3       376.0       390.0       428.0
   4         7.0         8.0         8.0

   [5 rows x 73 columns]
   Validation data is None, will do auto splitting...
   Finished processing data, using 0.307506799697876s.
   Random seed set to 0
   All models will be trained for quantiles [0.1, 0.5, 0.9].
   Beginning AutoGluon training ...
   AutoGluon will save models to agModels-covidforecast/
   Start hyperparameter tuning for MQCNN

.. parsed-literal::
   :class: output

   0%|          | 0/2 [00:00

`__ for individual GluonTS models to see all of the hyperparameters you may specify for them.

Viewing additional information
------------------------------

We can view a summary of the HPO process, which shows the validation score achieved in each HPO trial as well as the hyperparameter configuration evaluated in that trial:

.. code:: python

   predictor.fit_summary()

.. parsed-literal::
   :class: output

   Generating leaderboard for all models trained...

.. parsed-literal::
   :class: output

   *** Summary of fit() ***
   Estimated performance of each model:
               model  val_score  fit_order
   0  DeepAR/trial_0  -0.833660          3
   1  DeepAR/trial_1  -0.863342          4
   2   MQCNN/trial_0  -0.991608          1
   3   MQCNN/trial_1  -0.996403          2
   Number of models trained: 4
   Types of models trained:
   {'MQCNNModel', 'DeepARModel'}
   Hyperparameter-tuning used: True
   User-specified hyperparameters:
   {'MQCNN': {'context_length': Int: lower=75, upper=100, 'epochs': 2, 'num_batches_per_epoch': 5}, 'DeepAR': {'context_length': Int: lower=75, upper=100, 'epochs': 2, 'num_batches_per_epoch': 5, 'distr_output': gluonts.mx.distribution.neg_binomial.NegativeBinomialOutput()}}
   Feature Metadata (Processed):
   (raw dtype, special dtypes):
   *** Details of Hyperparameter optimization ***
   HPO for MQCNN model:  Num. configurations tried = 2, Time spent = 6.654607772827148s
   Best hyperparameter-configuration (validation-performance: mean_wQuantileLoss = -0.9916079769993384):
   {'context_length': 88}
   HPO for DeepAR model:  Num. configurations tried = 2, Time spent = 66.95265889167786s
   Best hyperparameter-configuration (validation-performance: mean_wQuantileLoss = -0.8336604527711646):
   {'context_length': 88}
   *** End of fit() summary ***

..
parsed-literal:: :class: output {'model_types': {'MQCNN/trial_0': 'MQCNNModel', 'MQCNN/trial_1': 'MQCNNModel', 'DeepAR/trial_0': 'DeepARModel', 'DeepAR/trial_1': 'DeepARModel'}, 'model_performance': {'MQCNN/trial_0': -0.9916079769993384, 'MQCNN/trial_1': -0.9964028387055066, 'DeepAR/trial_0': -0.8336604527711646, 'DeepAR/trial_1': -0.863341598531469}, 'model_best': None, 'model_paths': {'MQCNN/trial_0': 'agModels-covidforecast/models/MQCNN/trial_0/', 'MQCNN/trial_1': 'agModels-covidforecast/models/MQCNN/trial_1/', 'DeepAR/trial_0': 'agModels-covidforecast/models/DeepAR/trial_0/', 'DeepAR/trial_1': 'agModels-covidforecast/models/DeepAR/trial_1/'}, 'model_fit_times': {'MQCNN/trial_0': 3.8433167934417725, 'MQCNN/trial_1': 0.5129275321960449, 'DeepAR/trial_0': 30.80358338356018, 'DeepAR/trial_1': 29.442209482192993}, 'hyperparameter_tune': True, 'hyperparameters_userspecified': {'MQCNN': {'context_length': Int: lower=75, upper=100, 'epochs': 2, 'num_batches_per_epoch': 5}, 'DeepAR': {'context_length': Int: lower=75, upper=100, 'epochs': 2, 'num_batches_per_epoch': 5, 'distr_output': gluonts.mx.distribution.neg_binomial.NegativeBinomialOutput()}}, 'hpo_results': {'MQCNN': {'best_reward': -0.9916079769993384, 'best_config': {'context_length': 88}, 'total_time': 6.654607772827148, 'metadata': {'stop_criterion': {'time_limits': None, 'max_reward': None}, 'resources_per_trial': {'num_cpus': 'auto', 'num_gpus': 'auto'}}, 'reward_attr': 'validation_performance', 'args': {'util_args': {'train_data_path': 'dataset_train.p', 'val_data_path': 'dataset_val.p', 'directory': 'agModels-covidforecast/models/MQCNN/', 'model': MQCNN, 'time_start': 1632359430.06532, 'time_limit': None}, 'freq': 'D', 'prediction_length': 19, 'context_length': 88, 'epochs': 2, 'num_batches_per_epoch': 5, 'use_feat_static_cat': False, 'use_feat_static_real': False, 'cardinality': None, 'quantiles': [0.1, 0.5, 0.9], 'callbacks': []}, 'trial_info': {0: {'config': {'context_length': 88}, 'history': [{'epoch': 
1, 'trial': 0, 'time_this_iter': 5.026702642440796, 'time_since_start': 5.026702642440796}], 'metadata': {'epoch': 1, 'trial': 0, 'time_this_iter': 5.026702642440796, 'time_since_start': 5.026702642440796}, 'validation_performance': -0.9916079769993384}, 1: {'config': {'context_length': 97}, 'history': [{'epoch': 1, 'trial': 1, 'time_this_iter': 1.5954809188842773, 'time_since_start': 1.5954809188842773}], 'metadata': {'epoch': 1, 'trial': 1, 'time_this_iter': 1.5954809188842773, 'time_since_start': 1.5954809188842773}, 'validation_performance': -0.9964028387055066}}, 'validation_performance': -0.9916079769993384, 'search_space': OrderedDict([('context_length', Int: lower=75, upper=100)])}, 'DeepAR': {'best_reward': -0.8336604527711646, 'best_config': {'context_length': 88}, 'total_time': 66.95265889167786, 'metadata': {'stop_criterion': {'time_limits': None, 'max_reward': None}, 'resources_per_trial': {'num_cpus': 'auto', 'num_gpus': 'auto'}}, 'reward_attr': 'validation_performance', 'args': {'util_args': {'train_data_path': 'dataset_train.p', 'val_data_path': 'dataset_val.p', 'directory': 'agModels-covidforecast/models/DeepAR/', 'model': DeepAR, 'time_start': 1632359436.7424335, 'time_limit': None}, 'freq': 'D', 'prediction_length': 19, 'context_length': 88, 'epochs': 2, 'num_batches_per_epoch': 5, 'distr_output': gluonts.mx.distribution.neg_binomial.NegativeBinomialOutput(), 'use_feat_static_cat': False, 'use_feat_static_real': False, 'cardinality': None, 'quantiles': [0.1, 0.5, 0.9], 'callbacks': []}, 'trial_info': {0: {'config': {'context_length': 88}, 'history': [{'epoch': 1, 'trial': 0, 'time_this_iter': 34.0602662563324, 'time_since_start': 34.0602662563324}], 'metadata': {'epoch': 1, 'trial': 0, 'time_this_iter': 34.0602662563324, 'time_since_start': 34.0602662563324}, 'validation_performance': -0.8336604527711646}, 1: {'config': {'context_length': 96}, 'history': [{'epoch': 1, 'trial': 1, 'time_this_iter': 32.85720777511597, 'time_since_start': 
32.85720777511597}], 'metadata': {'epoch': 1, 'trial': 1, 'time_this_iter': 32.85720777511597, 'time_since_start': 32.85720777511597}, 'validation_performance': -0.863341598531469}}, 'validation_performance': -0.8336604527711646, 'search_space': OrderedDict([('context_length', Int: lower=75, upper=100)])}}, 'model_hyperparams': {'MQCNN/trial_0': {'freq': 'D', 'prediction_length': 19, 'context_length': 88, 'epochs': 2, 'num_batches_per_epoch': 5, 'use_feat_static_cat': False, 'use_feat_static_real': False, 'cardinality': None, 'quantiles': [0.1, 0.5, 0.9], 'callbacks': [, ], 'hybridize': False}, 'MQCNN/trial_1': {'freq': 'D', 'prediction_length': 19, 'context_length': 97, 'epochs': 2, 'num_batches_per_epoch': 5, 'use_feat_static_cat': False, 'use_feat_static_real': False, 'cardinality': None, 'quantiles': [0.1, 0.5, 0.9], 'callbacks': [, ], 'hybridize': False}, 'DeepAR/trial_0': {'freq': 'D', 'prediction_length': 19, 'context_length': 88, 'epochs': 2, 'num_batches_per_epoch': 5, 'distr_output': gluonts.mx.distribution.neg_binomial.NegativeBinomialOutput(), 'use_feat_static_cat': False, 'use_feat_static_real': False, 'cardinality': None, 'quantiles': [0.1, 0.5, 0.9], 'callbacks': [, ]}, 'DeepAR/trial_1': {'freq': 'D', 'prediction_length': 19, 'context_length': 96, 'epochs': 2, 'num_batches_per_epoch': 5, 'distr_output': gluonts.mx.distribution.neg_binomial.NegativeBinomialOutput(), 'use_feat_static_cat': False, 'use_feat_static_real': False, 'cardinality': None, 'quantiles': [0.1, 0.5, 0.9], 'callbacks': [, ]}}, 'leaderboard': model val_score fit_order 0 DeepAR/trial_0 -0.833660 3 1 DeepAR/trial_1 -0.863342 4 2 MQCNN/trial_0 -0.991608 1 3 MQCNN/trial_1 -0.996403 2} The ``'best_config'`` field in this summary indicates the hyperparameter configuration that performed best for each model. We can alternatively use the leaderboard to view the performance of each evaluated model/hyperparameter configuration: .. code:: python predictor.leaderboard() .. 
parsed-literal:: :class: output Generating leaderboard for all models trained... .. raw:: html
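
   <table>
     <thead>
       <tr><th></th><th>model</th><th>val_score</th><th>fit_order</th></tr>
     </thead>
     <tbody>
       <tr><th>0</th><td>DeepAR/trial_0</td><td>-0.833660</td><td>3</td></tr>
       <tr><th>1</th><td>DeepAR/trial_1</td><td>-0.863342</td><td>4</td></tr>
       <tr><th>2</th><td>MQCNN/trial_0</td><td>-0.991608</td><td>1</td></tr>
       <tr><th>3</th><td>MQCNN/trial_1</td><td>-0.996403</td><td>2</td></tr>
     </tbody>
   </table>
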
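
The ranking in the leaderboard can also be reproduced by hand. The sketch below is plain Python rather than an AutoGluon call: it copies the validation scores from the output above into a dictionary and assumes, as the leaderboard ordering suggests, that a higher (less negative) ``val_score`` is better. The names ``val_scores`` and ``best_model`` are introduced here purely for illustration.

.. code:: python

   # Validation scores copied from the leaderboard output above.
   val_scores = {
       "MQCNN/trial_0": -0.991608,
       "MQCNN/trial_1": -0.996403,
       "DeepAR/trial_0": -0.833660,
       "DeepAR/trial_1": -0.863342,
   }


   def best_model(scores):
       """Return the name of the model with the highest validation score."""
       return max(scores, key=scores.get)


   # Sort model names from best (highest score) to worst.
   ranking = sorted(val_scores, key=val_scores.get, reverse=True)

   print(best_model(val_scores))  # DeepAR/trial_0
   print(ranking)
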

Here is yet another way to see which model AutoGluon believes to be the best (based on validation score). This is the model used for prediction by default:

.. code:: python

   predictor._trainer.get_model_best()

.. parsed-literal::
   :class: output

   'DeepAR/trial_0'

We can also view information about any model AutoGluon has trained:

.. code:: python

   models_trained = predictor._trainer.get_model_names_all()
   specific_model = predictor._trainer.load_model(models_trained[0])
   specific_model.get_info()

.. parsed-literal::
   :class: output

   {'name': 'MQCNN/trial_0',
    'model_type': 'MQCNNModel',
    'eval_metric': 'mean_wQuantileLoss',
    'fit_time': 3.8433167934417725,
    'predict_time': 1.0593109130859375,
    'val_score': -0.9916079769993384,
    'hyperparameters': {'freq': 'D',
     'prediction_length': 19,
     'context_length': 88,
     'epochs': 2,
     'num_batches_per_epoch': 5,
     'use_feat_static_cat': False,
     'use_feat_static_real': False,
     'cardinality': None,
     'quantiles': [0.1, 0.5, 0.9],
     'callbacks': [, ],
     'hybridize': False}}

Evaluating trained models
-------------------------

Given some more recent held-out test data, here's how to evaluate only the default model AutoGluon uses for forecasting, without evaluating all of the other models as ``leaderboard()`` does:

.. code:: python

   test_data = TabularDataset("https://autogluon.s3-us-west-2.amazonaws.com/datasets/CovidTimeSeries/test.csv")
   predictor.evaluate(test_data)  # to evaluate specific model, can also specify optional argument: model

.. parsed-literal::
   :class: output

   Loaded data from: https://autogluon.s3-us-west-2.amazonaws.com/datasets/CovidTimeSeries/test.csv | Columns = 3 / 3 | Rows = 28483 -> 28483
   Does not specify model, will by default use the model with the best validation score for evaluation
   100%|██████████| 313/313 [00:02<00:00, 131.64it/s]
   100%|██████████| 313/313 [00:00<00:00, 3785.06it/s]
   Running evaluation: 100%|██████████| 313/313 [00:00<00:00, 961.39it/s]

.. parsed-literal::
   :class: output

   0.8176319396478796

Be aware that without extra ``test_data``, AutoGluon's reported validation scores may be slightly optimistic, because adaptive decisions such as selecting models and hyperparameters are themselves based on the validation data. It is therefore always a good idea to use some truly held-out test data for an unbiased final evaluation after training has completed.

The ground-truth time-series targets are often not available when we produce forecasts and only become available later. In such a workflow, we may first produce predictions using AutoGluon, and then evaluate them later without having to recompute the predictions:

.. code:: python

   predictions = predictor.predict(train_data)  # before test data have been observed
   predictor = ForecastingPredictor.load(save_path)  # reload predictor in future after test data are observed
   ForecastingPredictor.evaluate_predictions(
       forecasts=predictions,
       targets=test_data,
       index_column=predictor.index_column,
       time_column=predictor.time_column,
       target_column=predictor.target_column,
       eval_metric=predictor.eval_metric,
   )

.. parsed-literal::
   :class: output

   Does not specify model, will by default use the model with the best validation score for prediction
   Predicting with model DeepAR/trial_0
   Loading predictor from path agModels-covidforecast/
   Running evaluation: 100%|██████████| 313/313 [00:00<00:00, 958.89it/s]

.. parsed-literal::
   :class: output

   0.817738727172412

Static features
---------------

In some forecasting problems involving multiple time-series, each individual time-series may be associated with static features that do not change over time. For example, if forecasting demand for products over time, each product may be associated with an item category (categorical static feature) and an item vector embedding from a recommender system (numeric static features). AutoGluon allows you to provide such static features so that its models condition their predictions on them:

.. code:: python

   static_features = TabularDataset("https://autogluon.s3-us-west-2.amazonaws.com/datasets/CovidTimeSeries/toy_static_features.csv")
   static_features.head()

.. parsed-literal::
   :class: output

   Loaded data from: https://autogluon.s3-us-west-2.amazonaws.com/datasets/CovidTimeSeries/toy_static_features.csv | Columns = 3 / 3 | Rows = 313 -> 313

.. raw:: html

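   <table>
     <thead>
       <tr><th></th><th>name</th><th>static_cat_feature</th><th>static_real_feature</th></tr>
     </thead>
     <tbody>
       <tr><th>0</th><td>Afghanistan_</td><td>2</td><td>67.03</td></tr>
       <tr><th>1</th><td>Albania_</td><td>1</td><td>4.33</td></tr>
       <tr><th>2</th><td>Algeria_</td><td>5</td><td>19.63</td></tr>
       <tr><th>3</th><td>Andorra_</td><td>2</td><td>42.49</td></tr>
       <tr><th>4</th><td>Angola_</td><td>3</td><td>21.58</td></tr>
     </tbody>
   </table>
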
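
Because static features are looked up per series, it can be worth confirming before calling ``fit()`` that every series name in the time-series data has a row in the static-feature table. The following is a minimal sketch in plain Python. ``missing_static_rows`` is a hypothetical helper, not part of AutoGluon, and the two short lists are stand-ins for the ``name`` columns of the time-series data and of ``static_features``.

.. code:: python

   def missing_static_rows(series_ids, static_ids):
       """Return the series identifiers that have no row in the static-feature table."""
       known = set(static_ids)
       return sorted(set(series_ids) - known)


   # Stand-in identifiers; "Angola_" is deliberately left out of the static table.
   series_ids = ["Afghanistan_", "Albania_", "Algeria_", "Andorra_", "Angola_"]
   static_ids = ["Afghanistan_", "Albania_", "Algeria_", "Andorra_"]

   print(missing_static_rows(series_ids, static_ids))  # ['Angola_']

With the real data one would presumably pass the unique values of the ``name`` column from each table; an empty result means every series is covered.
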

Note that each unique value of ``index_column`` in our time series data must be represented as a row in ``static_features`` (in a column whose name matches the ``index_column``) that contains the feature values corresponding to this individual series. AutoGluon can automatically infer which static features are categorical vs. numeric when they are passed into ``fit()``:

.. code:: python

   predictor_static = ForecastingPredictor(path=save_path, eval_metric=eval_metric).fit(
       train_data,
       static_features=static_features,
       prediction_length=19,
       quantiles=[0.1, 0.5, 0.9],
       index_column="name",
       target_column="ConfirmedCases",
       time_column="Date",
       presets="low_quality"  # last argument is just here for quick demo, omit it in real applications!
   )

.. parsed-literal::
   :class: output

   Warning: path already exists! This predictor may overwrite an existing predictor! path="agModels-covidforecast"
   presets is set to be low_quality
   Training with dataset in tabular format...
   Finish rebuilding the data, showing the top five rows.
              name  2020-01-22  2020-01-23  2020-01-24  2020-01-25  2020-01-26  \
   0  Afghanistan_         0.0         0.0         0.0         0.0         0.0
   1      Albania_         0.0         0.0         0.0         0.0         0.0
   2      Algeria_         0.0         0.0         0.0         0.0         0.0
   3      Andorra_         0.0         0.0         0.0         0.0         0.0
   4       Angola_         0.0         0.0         0.0         0.0         0.0

      2020-01-27  2020-01-28  2020-01-29  2020-01-30  ...  2020-03-24  \
   0         0.0         0.0         0.0         0.0  ...        74.0
   1         0.0         0.0         0.0         0.0  ...       123.0
   2         0.0         0.0         0.0         0.0  ...       264.0
   3         0.0         0.0         0.0         0.0  ...       164.0
   4         0.0         0.0         0.0         0.0  ...         3.0

      2020-03-25  2020-03-26  2020-03-27  2020-03-28  2020-03-29  2020-03-30  \
   0        84.0        94.0       110.0       110.0       120.0       170.0
   1       146.0       174.0       186.0       197.0       212.0       223.0
   2       302.0       367.0       409.0       454.0       511.0       584.0
   3       188.0       224.0       267.0       308.0       334.0       370.0
   4         3.0         4.0         4.0         5.0         7.0         7.0

      2020-03-31  2020-04-01  2020-04-02
   0       174.0       237.0       273.0
   1       243.0       259.0       277.0
   2       716.0       847.0       986.0
   3       376.0       390.0       428.0
   4         7.0         8.0         8.0

   [5 rows x 73 columns]
   Validation data is None, will do auto splitting...
   static feature column static_cat_feature has 10 or less unique values, assuming it is categorical.
   Fitting IdentityFeatureGenerator...
   Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
   Fitting IdentityFeatureGenerator...
   Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
   Fitting CategoryFeatureGenerator...
   Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
   Fitting CategoryMemoryMinimizeFeatureGenerator...
   Using previous inferred feature columns...
   Static Cat Features Dataframe including ['static_cat_feature']
   Static Real Features Dataframe including ['static_real_feature']
   Finished processing data, using 1.2074267864227295s.
   Random seed set to 0
   All models will be trained for quantiles [0.1, 0.5, 0.9].
   Beginning AutoGluon training ...
   AutoGluon will save models to agModels-covidforecast/
   Fitting model: SFF ...
   Training model SFF...
   Start model training
   Epoch[0] Learning rate is 0.001
   0%|          | 0/10 [00:00
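
.. raw:: html

   <table>
     <thead>
       <tr><th></th><th>model</th><th>val_score</th><th>fit_order</th><th>test_score</th></tr>
     </thead>
     <tbody>
       <tr><th>0</th><td>SFF</td><td>-0.660771</td><td>1</td><td>-0.267795</td></tr>
       <tr><th>1</th><td>DeepAR</td><td>-0.764094</td><td>3</td><td>-0.590745</td></tr>
       <tr><th>2</th><td>MQCNN</td><td>-0.920152</td><td>2</td><td>-0.865667</td></tr>
     </tbody>
   </table>
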
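
The training log above reports that ``static_cat_feature`` was treated as categorical because it has 10 or fewer unique values. The sketch below illustrates that kind of rule in plain Python. It is only an illustration of the idea suggested by the log message, not AutoGluon's actual implementation, which may also take data types into account. ``looks_categorical`` and its ``max_unique`` threshold are names assumed here, and the feature values are made up.

.. code:: python

   def looks_categorical(values, max_unique=10):
       """Guess that a column is categorical when it has few distinct values."""
       return len(set(values)) <= max_unique


   # Made-up columns: one cycles through 5 codes, the other has 20 distinct reals.
   cat_like = [i % 5 + 1 for i in range(20)]
   real_like = [round(0.37 * i + 1.5, 2) for i in range(20)]

   print(looks_categorical(cat_like))   # True
   print(looks_categorical(real_like))  # False
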

AutoGluon forecast predictions will now be based on the static features in addition to the historical time-series observations:

.. code:: python

   predictions = predictor_static.predict(test_data, static_features=static_features)
   print(predictions["Afghanistan_"])

.. parsed-literal::
   :class: output

   Using previous inferred feature columns...
   Static Cat Features Dataframe including ['static_cat_feature']
   Static Real Features Dataframe including ['static_real_feature']
   Does not specify model, will by default use the model with the best validation score for prediction
   Predicting with model SFF

.. parsed-literal::
   :class: output

                        0.1           0.5           0.9
   2020-04-22    848.724426   1433.520386   2100.333252
   2020-04-23    525.479980   1203.280273   1828.540649
   2020-04-24    406.676422   1320.063599   2089.422852
   2020-04-25    135.429993   1292.610840   2219.305908
   2020-04-26     59.995323   1354.163574   2438.460938
   2020-04-27    586.250061   1494.719849   2511.511475
   2020-04-28    -85.886353   1529.408325   3028.596924
   2020-04-29    367.283447   1467.030396   2635.018311
   2020-04-30    559.825256   1562.025513   2969.750732
   2020-05-01    -58.859753   1607.525146   2800.288818
   2020-05-02    311.700958   1617.724609   3071.249023
   2020-05-03   -322.320007   1401.588867   2945.138428
   2020-05-04   -319.960236   1088.040771   2829.933105
   2020-05-05   -403.656525   1754.966675   3624.205811
   2020-05-06   -348.057587   1366.859619   4198.331543
   2020-05-07    -34.126774   1619.350708   3284.425293
   2020-05-08  -1168.174927   1551.574707   3387.256348
   2020-05-09  -1259.406860   1204.493530   4268.099121
   2020-05-10   -189.284729   1399.963745   3504.257812
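
To make the three quantile columns more concrete, here is a sketch of how a weighted quantile loss such as ``mean_wQuantileLoss`` could be computed from quantile forecasts. It is plain Python and assumes the GluonTS-style definition (twice the pinball loss summed over time steps, divided by the sum of absolute targets, then averaged over quantile levels); the exact computation inside AutoGluon may differ. The function names and the toy numbers are made up for illustration.

.. code:: python

   def quantile_loss(targets, forecasts, q):
       """Pinball loss at quantile level q, summed over time steps and scaled by 2."""
       total = 0.0
       for y, f in zip(targets, forecasts):
           indicator = 1.0 if y <= f else 0.0
           total += abs((f - y) * (indicator - q))
       return 2.0 * total


   def mean_weighted_quantile_loss(targets, forecasts_by_quantile):
       """Average over quantile levels of quantile_loss / sum(|target|)."""
       abs_target_sum = sum(abs(y) for y in targets)
       weighted = [
           quantile_loss(targets, forecasts, q) / abs_target_sum
           for q, forecasts in forecasts_by_quantile.items()
       ]
       return sum(weighted) / len(weighted)


   # Toy example: two observed values and forecasts at three quantile levels.
   targets = [10.0, 20.0]
   forecasts_by_quantile = {
       0.1: [8.0, 15.0],
       0.5: [12.0, 18.0],
       0.9: [15.0, 25.0],
   }

   score = mean_weighted_quantile_loss(targets, forecasts_by_quantile)
   print(round(score, 4))  # 0.0822

Lower values of this loss indicate better-calibrated quantile forecasts. At the 0.5 level the per-quantile term reduces to the summed absolute error divided by the summed absolute targets.
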