TimeSeriesPredictor.fit

TimeSeriesPredictor.fit(train_data: Union[TimeSeriesDataFrame, DataFrame, str], tuning_data: Optional[Union[TimeSeriesDataFrame, DataFrame, str]] = None, time_limit: Optional[int] = None, presets: Optional[str] = None, hyperparameters: Dict[Union[str, Type], Any] = None, hyperparameter_tune_kwargs: Optional[Union[str, Dict]] = None, excluded_model_types: Optional[List[str]] = None, num_val_windows: int = 1, val_step_size: Optional[int] = None, refit_every_n_windows: int = 1, refit_full: bool = False, enable_ensemble: bool = True, random_seed: Optional[int] = None, verbosity: Optional[int] = None) → TimeSeriesPredictor

Fit probabilistic forecasting models to the given time series dataset.
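A minimal end-to-end sketch (the file path and prediction_length are illustrative; any long-format table with item_id, timestamp, and target columns works):

```python
from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor

# Hypothetical path to long-format data with item_id, timestamp, and target columns
train_data = TimeSeriesDataFrame.from_path("train.csv")

predictor = TimeSeriesPredictor(prediction_length=48)
predictor.fit(train_data, presets="medium_quality", time_limit=600)
```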

Parameters
  • train_data (Union[TimeSeriesDataFrame, pd.DataFrame, str]) –

    Training data in the TimeSeriesDataFrame format.

    Time series with length <= (num_val_windows + 1) * prediction_length will be ignored during training. See num_val_windows for details.

    If known_covariates_names were specified when creating the predictor, train_data must include the columns listed in known_covariates_names with the covariates values aligned with the target time series. The known covariates must have a numeric (float or integer) dtype.

    All columns of train_data except the target column and those listed in known_covariates_names will be interpreted as past_covariates, i.e., covariates that are known only in the past.

    If train_data has static features (i.e., train_data.static_features is a pandas DataFrame), the predictor will interpret columns with int and float dtypes as continuous (real-valued) features, columns with object and str dtypes as categorical features, and will ignore the remaining columns.

    For example, to ensure that column “store_id” with dtype int is interpreted as a category, we need to change its type to category:

    data.static_features["store_id"] = data.static_features["store_id"].astype("category")
    

    If provided data is an instance of pandas DataFrame, AutoGluon will attempt to automatically convert it to a TimeSeriesDataFrame.

  • tuning_data (Union[TimeSeriesDataFrame, pd.DataFrame, str], optional) –

    Data reserved for model selection and hyperparameter tuning, rather than training individual models. Also used to compute the validation scores. Note that only the last prediction_length time steps of each time series are used for computing the validation score.

    If tuning_data is provided, multi-window backtesting on training data will be disabled, the num_val_windows will be set to 0, and refit_full will be set to False.

    Leaving this argument empty and letting AutoGluon automatically generate the validation set from train_data is a good default.

    If known_covariates_names were specified when creating the predictor, tuning_data must also include the columns listed in known_covariates_names with the covariates values aligned with the target time series.

    If train_data has past covariates or static features, tuning_data must also include them (with the same column names and dtypes).

    If provided data is an instance of pandas DataFrame, AutoGluon will attempt to automatically convert it to a TimeSeriesDataFrame.

  • time_limit (int, optional) – Approximately how long fit() will run (wall-clock time in seconds). If not specified, fit() will run until all models have completed training.

  • presets (str, optional) –

    Optional preset configurations for various arguments in fit().

    Can significantly impact predictive accuracy, memory footprint, inference latency of trained models, and various other properties of the returned predictor. It is recommended to specify presets and avoid specifying most other fit() arguments or model hyperparameters prior to becoming familiar with AutoGluon. For example, set presets="high_quality" to get a high-accuracy predictor, or set presets="fast_training" to quickly fit multiple simple statistical models. Any user-specified arguments in fit() will override the values used by presets.

    Available presets:

    • "fast_training": fit simple statistical models (ETS, Theta, Naive, SeasonalNaive) + fast tree-based model RecursiveTabular. These models are fast to train but may not be very accurate.

    • "medium_quality": all models mentioned above + deep learning model DeepAR. Default setting that produces good forecasts with reasonable training time.

    • "high_quality": all models mentioned above + automatically tuned statistical models (AutoETS, AutoARIMA) + tree-based model DirectTabular + deep learning models TemporalFusionTransformer and PatchTST. Much more accurate than medium_quality, but takes longer to train.

    • "best_quality": all models mentioned above + more tabular models + training multiple copies of DeepAR. Usually better than high_quality, but takes even longer to train.

    Details for these presets can be found in autogluon/timeseries/configs/presets_configs.py. If presets is not provided, the user-provided values for hyperparameters and hyperparameter_tune_kwargs will be used (falling back to the defaults described below).
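    For example, a sketch in the style of the calls below (train_data is assumed to be prepared already; the values are illustrative):

```python
predictor.fit(
    train_data,
    presets="high_quality",
    time_limit=3600,  # cap total training time at one hour
)
```

    Here presets selects which models and hyperparameters are tried, while the explicitly passed time_limit bounds total training time.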

  • hyperparameters (str or dict, default = "medium_quality") –

    Determines what models are trained and what hyperparameters are used by each model.

    If a str is passed, a preset hyperparameter configuration defined in `autogluon/timeseries/trainer/models/presets.py` will be used.

    If a dict is provided, the keys are strings or types that indicate which models to train. Each value is itself a dict containing hyperparameters for the corresponding model, or a list of such dicts. Any hyperparameters not specified here will be set to their default values. For example:

    predictor.fit(
        ...
        hyperparameters={
            "DeepAR": {},
            "ETS": [
                {"seasonal": "add"},
                {"seasonal": None},
            ],
        }
    )
    

    The above example will train three models:

    • DeepAR with default hyperparameters

    • ETS with additive seasonality (all other parameters set to their defaults)

    • ETS with seasonality disabled (all other parameters set to their defaults)

    Full list of available models and their hyperparameters is provided in forecasting_zoo.

    The hyperparameters for each model can be fixed values (as shown above), or search spaces over which hyperparameter optimization is performed. A search space should only be provided when hyperparameter_tune_kwargs is given (i.e., hyperparameter-tuning is utilized). For example:

    from autogluon.common import space
    
    predictor.fit(
        ...
        hyperparameters={
            "DeepAR": {
                "hidden_size": space.Int(20, 100),
                "dropout_rate": space.Categorical(0.1, 0.3),
            },
        },
        hyperparameter_tune_kwargs="auto",
    )
    

    In the above example, multiple versions of the DeepAR model with different values of the parameters “hidden_size” and “dropout_rate” will be trained.

  • hyperparameter_tune_kwargs (str or dict, optional) –

    Hyperparameter tuning strategy and kwargs (for example, how many HPO trials to run). If None, then hyperparameter tuning will not be performed.

    Currently, only HPO based on random search is supported for time series models.

    Setting this parameter to the string "random" performs 10 trials of random search per model.

    We can change the number of random search trials per model by passing a dictionary as hyperparameter_tune_kwargs. The dict must include the following keys:

    • "num_trials": int, number of configurations to train for each tuned model

    • "searcher": currently, the only supported option is "random" (random search)

    • "scheduler": currently, the only supported option is "local" (all models trained on the same machine)

    Example:

    predictor.fit(
        ...
        hyperparameter_tune_kwargs={
            "num_trials": 5,
            "searcher": "random",
            "scheduler": "local",
        },
    )
    

  • excluded_model_types (List[str], optional) –

    Banned subset of model types to avoid training during fit(), even if present in hyperparameters. For example, the following code will train all models included in the high_quality presets except DeepAR:

    predictor.fit(
        ...,
        presets="high_quality",
        excluded_model_types=["DeepAR"],
    )
    

  • num_val_windows (int or None, default = 1) –

    Number of backtests done on train_data for each trained model to estimate the validation performance. If num_val_windows=None, the predictor will attempt to set this parameter automatically based on the length of time series in train_data (at most to 5).

    Increasing this parameter increases the training time roughly by a factor of num_val_windows // refit_every_n_windows. See refit_every_n_windows and val_step_size for details.

    For example, for prediction_length=2, num_val_windows=3 and val_step_size=1 the folds are:

    |-------------------|
    | x x x x x y y - - |
    | x x x x x x y y - |
    | x x x x x x x y y |
    

    where x are the train time steps and y are the validation time steps.

    This argument has no effect if tuning_data is provided.

  • val_step_size (int or None, default = None) –

    Step size between consecutive validation windows. If set to None, defaults to prediction_length provided when creating the predictor.

    This argument has no effect if tuning_data is provided.

  • refit_every_n_windows (int or None, default = 1) – When performing cross-validation, each model will be retrained every refit_every_n_windows validation windows. If set to None, each model will only be fit once, on the first validation window.
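    A sketch combining the three backtesting knobs (train_data is assumed to be prepared already; the values are illustrative):

```python
predictor.fit(
    train_data,
    num_val_windows=3,        # evaluate each model on 3 backtest windows
    val_step_size=1,          # shift consecutive windows by a single time step
    refit_every_n_windows=2,  # retrain only on every 2nd window
)
```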

  • refit_full (bool, default = False) – If True, after training is complete, AutoGluon will attempt to re-train all models using all of the training data (including the data initially reserved for validation). This argument has no effect if tuning_data is provided.

  • enable_ensemble (bool, default = True) – If True, the TimeSeriesPredictor will fit a simple weighted ensemble on top of the models specified via hyperparameters.

  • random_seed (int, optional) – If provided, fixes the seed of the random number generator for all models. This guarantees reproducible results for most models (except those trained on GPU because of the non-determinism of GPU operations).

  • verbosity (int, optional) – If provided, overrides the verbosity value used when creating the TimeSeriesPredictor. See documentation for TimeSeriesPredictor for more details.