autogluon.timeseries.TimeSeriesDataFrame
- class autogluon.timeseries.TimeSeriesDataFrame(data: DataFrame | str | Path | Iterable, static_features: DataFrame | str | Path | None = None, id_column: str | None = None, timestamp_column: str | None = None, num_cpus: int = -1, *args, **kwargs)
A collection of univariate time series, where each row is identified by an (item_id, timestamp) pair.
For example, a time series data frame could represent the daily sales of a collection of products, where each item_id corresponds to a product and timestamp corresponds to the day of the record.
- Parameters:
data (pd.DataFrame, str, pathlib.Path or Iterable) –
Time series data to construct a TimeSeriesDataFrame. The class currently supports four input formats.

1. Time series data in a pandas DataFrame format without multi-index. For example:

       item_id  timestamp  target
    0        0 2019-01-01       0
    1        0 2019-01-02       1
    2        0 2019-01-03       2
    3        1 2019-01-01       3
    4        1 2019-01-02       4
    5        1 2019-01-03       5
    6        2 2019-01-01       6
    7        2 2019-01-02       7
    8        2 2019-01-03       8

   You can also use from_data_frame() for loading data in such format.

2. Path to a data file in CSV or Parquet format. The file must contain columns item_id and timestamp, as well as columns with time series values. This is similar to Option 1 above (pandas DataFrame format without multi-index). Both remote (e.g., S3) and local paths are accepted. You can also use from_path() for loading data in such format.

3. Time series data in pandas DataFrame format with multi-index on item_id and timestamp. For example:

                        target
    item_id timestamp
    0       2019-01-01       0
            2019-01-02       1
            2019-01-03       2
    1       2019-01-01       3
            2019-01-02       4
            2019-01-03       5
    2       2019-01-01       6
            2019-01-02       7
            2019-01-03       8

4. Time series data in Iterable format. For example:

    iterable_dataset = [
        {"target": [0, 1, 2], "start": pd.Period("01-01-2019", freq='D')},
        {"target": [3, 4, 5], "start": pd.Period("01-01-2019", freq='D')},
        {"target": [6, 7, 8], "start": pd.Period("01-01-2019", freq='D')}
    ]

   You can also use from_iterable_dataset() for loading data in such format.

static_features (pd.DataFrame, str or pathlib.Path, optional) –
An optional data frame describing the metadata of each individual time series that does not change with time. Can take real-valued or categorical values. For example, if TimeSeriesDataFrame contains sales of various products, static features may refer to time-independent features like color or brand.

The index of static_features must contain a single entry for each item present in the respective TimeSeriesDataFrame. For example, the following TimeSeriesDataFrame:

                        target
    item_id timestamp
    A       2019-01-01       0
            2019-01-02       1
            2019-01-03       2
    B       2019-01-01       3
            2019-01-02       4
            2019-01-03       5

is compatible with the following static_features:

             feat_1 feat_2
    item_id
    A           2.0    bar
    B           5.0    foo

TimeSeriesDataFrame will ensure consistency of static features during serialization/deserialization, copy and slice operations.

If static_features are provided during fit, the TimeSeriesPredictor expects the same metadata to be available during prediction time.

id_column (str, optional) – Name of the item_id column, if it’s different from the default. This argument is only used when constructing a TimeSeriesDataFrame using format 1 (DataFrame without multi-index) or 2 (path to a file).

timestamp_column (str, optional) – Name of the timestamp column, if it’s different from the default. This argument is only used when constructing a TimeSeriesDataFrame using format 1 (DataFrame without multi-index) or 2 (path to a file).

num_cpus (int, default = -1) – Number of CPU cores used to process the iterable dataset in parallel. Set to -1 to use all cores. This argument is only used when constructing a TimeSeriesDataFrame using format 4 (iterable dataset).
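The relationship between the input formats can be sketched in plain pandas. The snippet below is only an illustration of the shapes described above, not the library's own loading code: it rebuilds the format 1 example, derives the format 3 layout from it, and expands the format 4 iterable into the long layout. The helper iterable_to_long is hypothetical, and using the position of each entry as its item_id is an assumption.

```python
import pandas as pd

# Format 1: long-format DataFrame with plain columns (values from the example above).
long_df = pd.DataFrame(
    {
        "item_id": [0, 0, 0, 1, 1, 1, 2, 2, 2],
        "timestamp": pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03"] * 3),
        "target": list(range(9)),
    }
)

# Format 3: the same rows, indexed by the (item_id, timestamp) pair.
multi_df = long_df.set_index(["item_id", "timestamp"])

# Format 4: one dict per series, holding its values and its start period.
iterable_dataset = [
    {"target": [0, 1, 2], "start": pd.Period("01-01-2019", freq="D")},
    {"target": [3, 4, 5], "start": pd.Period("01-01-2019", freq="D")},
    {"target": [6, 7, 8], "start": pd.Period("01-01-2019", freq="D")},
]


def iterable_to_long(dataset):
    """Hypothetical helper: expand format 4 into the long layout of format 1.

    Assumption: the position of each entry is used as its item_id.
    """
    frames = []
    for item_id, entry in enumerate(dataset):
        # The frequency is taken from the start Period itself.
        periods = pd.period_range(start=entry["start"], periods=len(entry["target"]))
        frames.append(
            pd.DataFrame(
                {
                    "item_id": item_id,
                    "timestamp": periods.to_timestamp(),
                    "target": entry["target"],
                }
            )
        )
    return pd.concat(frames, ignore_index=True)


expanded = iterable_to_long(iterable_dataset)
```

Going by the parameter descriptions above, either long_df or multi_df could be passed as data. If the identifier columns carried other names, say product and date (hypothetical names), they would be supplied through id_column and timestamp_column.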
- __init__(data: DataFrame | str | Path | Iterable, static_features: DataFrame | str | Path | None = None, id_column: str | None = None, timestamp_column: str | None = None, num_cpus: int = -1, *args, **kwargs)
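The requirement that static_features holds a single entry for each item can be checked before constructing the object. The following is a minimal sketch in plain pandas using the A/B example from the parameter description. The helper check_static_features is hypothetical and is not part of AutoGluon; the library's own validation may differ.

```python
import pandas as pd

# Time series in multi-index form, matching the A/B example above.
ts_df = pd.DataFrame(
    {"target": [0, 1, 2, 3, 4, 5]},
    index=pd.MultiIndex.from_product(
        [["A", "B"], pd.date_range("2019-01-01", periods=3, freq="D")],
        names=["item_id", "timestamp"],
    ),
)

# One row of time-independent metadata per item, indexed by item_id.
static_features = pd.DataFrame(
    {"feat_1": [2.0, 5.0], "feat_2": ["bar", "foo"]},
    index=pd.Index(["A", "B"], name="item_id"),
)


def check_static_features(ts, static):
    """Hypothetical helper: raise unless every item has exactly one static row."""
    item_ids = ts.index.get_level_values("item_id").unique()
    if static.index.has_duplicates:
        raise ValueError("static_features has more than one row for some item_id")
    missing = item_ids.difference(static.index)
    if len(missing) > 0:
        raise ValueError(f"static_features is missing items: {list(missing)}")


check_static_features(ts_df, static_features)  # compatible, so nothing is raised
```

Running such a check on both the training and the prediction data is one way to catch early the case where metadata supplied during fit is absent at prediction time.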
Methods
Assign new columns to the time series dataframe.
Convert each time series in the data frame to the given frequency.
Make a copy of the TimeSeriesDataFrame.
Drop rows containing NaNs.
Fill missing values represented by NaN.
Construct a TimeSeriesDataFrame from a pandas DataFrame.
Construct a TimeSeriesDataFrame from an Iterable of dictionaries, each of which represents a single time series.
Construct a TimeSeriesDataFrame from a CSV or Parquet file.
Convenience method to read pickled time series data frames.
Prepare model inputs necessary to predict the last prediction_length time steps of each time series in the dataset.
Infer the time series frequency based on the timestamps of the observations.
Length of each time series in the dataframe.
Select a subsequence from each time series between start (inclusive) and end (exclusive) timestamps.
Select a subsequence from each time series between start (inclusive) and end (exclusive) indices.
Sort object by labels (along an axis).
Split the dataframe into two different TimeSeriesDataFrames before and after a certain cutoff_time.
Convert TimeSeriesDataFrame to a pandas.DataFrame.
Generate a train/test split from the given dataset.
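The inclusive-start, exclusive-end convention of the slicing methods, and the idea of a per-series train/test split, can be illustrated in plain pandas. This is a sketch of the semantics only, not the library's implementation. The helpers slice_by_time_window and hold_out_last are hypothetical, and holding out the last prediction_length steps of every series as the test horizon is an assumption about how the split behaves.

```python
import pandas as pd

# Two series (A and B) with five daily observations each.
df = pd.DataFrame(
    {"target": range(10)},
    index=pd.MultiIndex.from_product(
        [["A", "B"], pd.date_range("2019-01-01", periods=5, freq="D")],
        names=["item_id", "timestamp"],
    ),
)


def slice_by_time_window(frame, start, end):
    """Hypothetical helper: keep rows with start <= timestamp < end in every series."""
    timestamps = frame.index.get_level_values("timestamp")
    return frame[(timestamps >= start) & (timestamps < end)]


def hold_out_last(frame, prediction_length):
    """Hypothetical helper: train on each series minus its last prediction_length steps.

    Assumption: the test set is the full data, so it still contains the history.
    """
    # Counts down to 0 within each series: 0 marks the final observation.
    steps_from_end = frame.groupby(level="item_id").cumcount(ascending=False)
    train = frame[steps_from_end >= prediction_length]
    return train, frame


window = slice_by_time_window(df, pd.Timestamp("2019-01-02"), pd.Timestamp("2019-01-04"))
train, test = hold_out_last(df, prediction_length=2)
```

Here window keeps 2019-01-02 and 2019-01-03 for both items and drops 2019-01-04, since the end bound is exclusive, while train keeps the first three observations of each series.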
Attributes
freq – Inferred pandas-compatible frequency of the timestamps in the data frame.
item_ids – List of unique time series IDs contained in the data set.
num_items – Number of items (time series) in the data set.
static_features
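The quantities these attributes describe can be approximated on a multi-indexed pandas DataFrame. The snippet below is a sketch under that assumption and does not reproduce how the class computes them. In particular, inferring the frequency from the first series alone is a simplification.

```python
import pandas as pd

# Two series (A and B) with three daily observations each.
df = pd.DataFrame(
    {"target": [0, 1, 2, 3, 4, 5]},
    index=pd.MultiIndex.from_product(
        [["A", "B"], pd.date_range("2019-01-01", periods=3, freq="D")],
        names=["item_id", "timestamp"],
    ),
)

# Unique time series IDs and their count.
item_ids = df.index.unique(level="item_id")
num_items = len(item_ids)

# Frequency inferred from the timestamps of the first series only (simplification).
first_series_timestamps = df.loc[item_ids[0]].index
freq = pd.infer_freq(first_series_timestamps)
```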