TimeSeriesPredictor.evaluate
- TimeSeriesPredictor.evaluate(data: TimeSeriesDataFrame | DataFrame | Path | str, model: str | None = None, metrics: str | TimeSeriesScorer | List[str | TimeSeriesScorer] | None = None, display: bool = False, use_cache: bool = True) → Dict[str, float]
Evaluate the forecast accuracy for the given dataset.

This method measures forecast accuracy using the last self.prediction_length time steps of each time series in data as a hold-out set.

Note

Metrics are always reported in "higher is better" format. This means that error metrics such as MASE or MAPE will be multiplied by -1, so their values will be negative. This convention means you do not need to know whether lower or higher values of a particular metric are better when inspecting the evaluation results.
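For illustration, a minimal sketch of this sign convention, assuming an already-fitted predictor and a compatible hold-out set test_data (the printed values are made up):

```python
# `predictor` is an already-fitted TimeSeriesPredictor and `test_data` a
# compatible hold-out dataset -- both assumed for this sketch.
scores = predictor.evaluate(test_data, metrics=["MASE", "MAPE"])
print(scores)  # e.g. {"MASE": -0.92, "MAPE": -0.15}; error metrics come back negative
raw_mape = -scores["MAPE"]  # flip the sign to recover the conventional positive error
```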
- Parameters:
  - data (Union[TimeSeriesDataFrame, pd.DataFrame, Path, str]) – The data to evaluate the best model on. The last prediction_length time steps of each time series in data will be held out for prediction, and forecast accuracy will be calculated on these time steps.
    - Must include both historical and future data (i.e., the length of each time series in data must be at least prediction_length + 1).
    - If known_covariates_names were specified when creating the predictor, data must include the columns listed in known_covariates_names, with the covariate values aligned with the target time series.
    - If the train_data used to train the predictor contained past covariates or static features, then data must also include them (with the same column names and dtypes).
    - If the provided data is an instance of pandas DataFrame, AutoGluon will attempt to automatically convert it to a TimeSeriesDataFrame (see the sketch after this list).
  - model (str, optional) – Name of the model that you would like to evaluate. By default, the best model during training (the one with the highest validation score) will be used.
  - metrics (str, TimeSeriesScorer, or List[Union[str, TimeSeriesScorer]], optional) – Metric or list of metrics to compute scores with. Defaults to self.eval_metric. Supports both metric names as strings and custom metrics based on TimeSeriesScorer.
  - display (bool, default = False) – If True, the scores will be printed.
  - use_cache (bool, default = True) – If True, will attempt to use cached predictions. If False, cached predictions will be ignored. This argument is ignored if cache_predictions was set to False when creating the TimeSeriesPredictor.
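As a sketch of the automatic pandas conversion mentioned above (the frame contents, the column layout, and the already-fitted predictor are illustrative assumptions):

```python
import pandas as pd

# A hypothetical long-format frame using the default column names AutoGluon
# expects (item_id, timestamp, target). evaluate() will attempt to convert it
# to a TimeSeriesDataFrame automatically.
df = pd.DataFrame({
    "item_id": ["store_1"] * 40,
    "timestamp": pd.date_range("2024-01-01", periods=40, freq="D"),
    "target": range(40),
})
# Valid as long as each series has at least prediction_length + 1 time steps.
scores = predictor.evaluate(df)
```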
- Returns:
  scores_dict – Dictionary where keys are metric names and values are the performance along each metric. For consistency, error metrics will have their signs flipped to obey this convention; for example, negative MAPE values will be reported. To get the eval_metric score, use output[predictor.eval_metric.name].
- Return type:
  Dict[str, float]
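A hedged end-to-end sketch tying these pieces together; the file names, prediction_length, and metric choices below are placeholders, not part of this API:

```python
from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor

# Illustrative paths -- substitute your own files. Each file must be
# convertible to a TimeSeriesDataFrame (item_id, timestamp, target columns),
# and every series in test.csv needs at least prediction_length + 1 steps.
train_data = TimeSeriesDataFrame.from_path("train.csv")
test_data = TimeSeriesDataFrame.from_path("test.csv")

predictor = TimeSeriesPredictor(prediction_length=48, eval_metric="MASE")
predictor.fit(train_data)

# Score the best model on the held-out tail of each series in test_data
scores = predictor.evaluate(test_data, metrics=["MASE", "WQL"], display=True)

# Look up the score for the predictor's own eval_metric
print(scores[predictor.eval_metric.name])
```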