autogluon.timeseries.TimeSeriesDataFrame
- class autogluon.timeseries.TimeSeriesDataFrame(data: DataFrame | str | Path | Iterable, static_features: DataFrame | str | Path | None = None, id_column: str | None = None, timestamp_column: str | None = None, num_cpus: int = -1, *args, **kwargs)
A collection of univariate time series, where each row is identified by an (item_id, timestamp) pair.

For example, a time series data frame could represent the daily sales of a collection of products, where each item_id corresponds to a product and timestamp corresponds to the day of the record.

- Parameters:
data (pd.DataFrame, str, pathlib.Path or Iterable) –
Time series data to construct a TimeSeriesDataFrame. The class currently supports four input formats.

1. Time series data in a pandas DataFrame format without multi-index. For example:

       item_id  timestamp  target
    0        0 2019-01-01       0
    1        0 2019-01-02       1
    2        0 2019-01-03       2
    3        1 2019-01-01       3
    4        1 2019-01-02       4
    5        1 2019-01-03       5
    6        2 2019-01-01       6
    7        2 2019-01-02       7
    8        2 2019-01-03       8

   You can also use from_data_frame() for loading data in such format.

2. Path to a data file in CSV or Parquet format. The file must contain columns item_id and timestamp, as well as columns with time series values. This is similar to format 1 above (pandas DataFrame format without multi-index). Both remote (e.g., S3) and local paths are accepted. You can also use from_path() for loading data in such format.

3. Time series data in pandas DataFrame format with multi-index on item_id and timestamp. For example:

                       target
    item_id timestamp
    0       2019-01-01      0
            2019-01-02      1
            2019-01-03      2
    1       2019-01-01      3
            2019-01-02      4
            2019-01-03      5
    2       2019-01-01      6
            2019-01-02      7
            2019-01-03      8

4. Time series data in Iterable format. For example:

    import pandas as pd

    iterable_dataset = [
        {"target": [0, 1, 2], "start": pd.Period("01-01-2019", freq="D")},
        {"target": [3, 4, 5], "start": pd.Period("01-01-2019", freq="D")},
        {"target": [6, 7, 8], "start": pd.Period("01-01-2019", freq="D")},
    ]

   You can also use from_iterable_dataset() for loading data in such format.

static_features (pd.DataFrame, str or pathlib.Path, optional) –
An optional data frame describing the metadata of each individual time series that does not change with time. Its columns can take real-valued or categorical values. For example, if a TimeSeriesDataFrame contains sales of various products, static features may refer to time-independent features like color or brand.

The index of static_features must contain a single entry for each item present in the respective TimeSeriesDataFrame. For example, the following TimeSeriesDataFrame:

                       target
    item_id timestamp
    A       2019-01-01      0
            2019-01-02      1
            2019-01-03      2
    B       2019-01-01      3
            2019-01-02      4
            2019-01-03      5

is compatible with the following static_features:

             feat_1 feat_2
    item_id
    A           2.0    bar
    B           5.0    foo

TimeSeriesDataFrame will ensure consistency of static features during serialization/deserialization, copy and slice operations.

If static_features are provided during fit, the TimeSeriesPredictor expects the same metadata to be available at prediction time.

id_column (str, optional) – Name of the item_id column, if it differs from the default name item_id. This argument is only used when constructing a TimeSeriesDataFrame using format 1 (DataFrame without multi-index) or 2 (path to a file).

timestamp_column (str, optional) – Name of the timestamp column, if it differs from the default name timestamp. This argument is only used when constructing a TimeSeriesDataFrame using format 1 (DataFrame without multi-index) or 2 (path to a file).

num_cpus (int, default = -1) – Number of CPU cores used to process the iterable dataset in parallel. Set to -1 to use all cores. This argument is only used when constructing a TimeSeriesDataFrame using format 4 (iterable dataset).
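The relationship between input formats 1, 3 and 4, and the rule that static_features holds exactly one row per item, can be illustrated with plain pandas. This sketch does not call AutoGluon. It assumes that format 3 is simply format 1 with item_id and timestamp moved into the index via set_index; iterable_to_long and static_features_match are hypothetical helpers written for this illustration, and numbering items by their position in the iterable is likewise an assumption.

```python
import pandas as pd

# Input format 1: one row per (item_id, timestamp) pair, no multi-index.
df_long = pd.DataFrame(
    {
        "item_id": [0, 0, 0, 1, 1, 1, 2, 2, 2],
        "timestamp": pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03"] * 3),
        "target": list(range(9)),
    }
)

# Input format 3 (assumed equivalent): same rows, (item_id, timestamp) multi-index.
df_multi = df_long.set_index(["item_id", "timestamp"])

# Input format 4: an iterable of dicts with "target" values and a "start" period.
iterable_dataset = [
    {"target": [0, 1, 2], "start": pd.Period("2019-01-01", freq="D")},
    {"target": [3, 4, 5], "start": pd.Period("2019-01-01", freq="D")},
    {"target": [6, 7, 8], "start": pd.Period("2019-01-01", freq="D")},
]


def iterable_to_long(entries):
    """Hypothetical helper: expand format-4 entries into format-1 rows.

    Items are numbered by their position in the iterable (illustrative assumption).
    """
    frames = []
    for item_id, entry in enumerate(entries):
        # period_range takes its frequency from the "start" Period.
        periods = pd.period_range(start=entry["start"], periods=len(entry["target"]))
        frames.append(
            pd.DataFrame(
                {
                    "item_id": item_id,
                    "timestamp": periods.to_timestamp(),
                    "target": entry["target"],
                }
            )
        )
    return pd.concat(frames, ignore_index=True)


# The static_features example from the parameter description: items "A" and "B".
data_ab = pd.DataFrame(
    {"target": range(6)},
    index=pd.MultiIndex.from_product(
        [["A", "B"], pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03"])],
        names=["item_id", "timestamp"],
    ),
)
static_features = pd.DataFrame(
    {"feat_1": [2.0, 5.0], "feat_2": ["bar", "foo"]},
    index=pd.Index(["A", "B"], name="item_id"),
)


def static_features_match(data, static):
    """Hypothetical check: exactly one static row for every item_id in data."""
    item_ids = data.index.unique(level="item_id")
    return static.index.is_unique and set(static.index) == set(item_ids)
```

In practice, from_data_frame() and from_iterable_dataset() handle loading these layouts; the sketch only makes the expected shapes concrete. For instance, static_features_match(data_ab, static_features) is true, while dropping the row for "B" or duplicating a row makes it false.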
- freq
A pandas-compatible string describing the frequency of the time series, for example "D" for daily data or "h" for hourly data. This attribute is determined automatically based on the timestamps. For the full list of possible values, see the pandas documentation on offset aliases.
- Type:
str
- num_items
Number of items (time series) in the data set.
- Type:
int
- item_ids
List of unique time series IDs contained in the data set.
- Type:
pd.Index
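Assuming the multi-index layout of input format 3, the attributes freq, num_items and item_ids correspond to simple pandas operations. The sketch below is plain pandas, not AutoGluon's implementation; infer_common_freq is a hypothetical helper, and treating freq as one frequency shared by all items is an assumption drawn from its description as a single string.

```python
import pandas as pd

# A small frame in input format 3: items "A" and "B", three daily observations each.
df = pd.DataFrame(
    {"target": range(6)},
    index=pd.MultiIndex.from_product(
        [["A", "B"], pd.date_range("2019-01-01", periods=3, freq="D")],
        names=["item_id", "timestamp"],
    ),
)

# item_ids: unique values of the item_id index level, returned as a pd.Index.
item_ids = df.index.unique(level="item_id")
# num_items: number of distinct time series.
num_items = len(item_ids)


def infer_common_freq(frame):
    """Hypothetical helper: infer one pandas frequency string shared by all items.

    Returns None if the items disagree or inference fails. Note that
    pd.infer_freq needs at least 3 timestamps per item.
    """
    freqs = set()
    for _, group in frame.groupby(level="item_id"):
        timestamps = group.index.get_level_values("timestamp")
        freqs.add(pd.infer_freq(timestamps))
    if len(freqs) == 1:
        return freqs.pop()  # may itself be None if no frequency was found
    return None
```

Here num_items is 2, item_ids contains "A" and "B", and infer_common_freq(df) returns "D".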
- __init__(data: DataFrame | str | Path | Iterable, static_features: DataFrame | str | Path | None = None, id_column: str | None = None, timestamp_column: str | None = None, num_cpus: int = -1, *args, **kwargs)
Methods
- Convert each time series in the data frame to the given frequency.
- Make a copy of the TimeSeriesDataFrame.
- Drop rows containing NaNs.
- Fill missing values represented by NaN.
- Construct a TimeSeriesDataFrame from a pandas DataFrame.
- Construct a TimeSeriesDataFrame from an Iterable of dictionaries, each of which represents a single time series.
- Construct a TimeSeriesDataFrame from a CSV or Parquet file.
- Convenience method to read pickled time series data frames.
- Prepare model inputs necessary to predict the last prediction_length time steps of each time series in the dataset.
- Length of each time series in the data frame.
- Select a subsequence from each time series between start (inclusive) and end (exclusive) timestamps.
- Select a subsequence from each time series between start (inclusive) and end (exclusive) indices.
- Split the data frame into two TimeSeriesDataFrames, before and after a certain cutoff_time.
- Generate a train/test split from the given dataset.
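The per-item selection rules in the method summary (start inclusive, end exclusive, applied to each time series separately; splitting before and after a cutoff) can be made concrete with plain pandas on a multi-indexed frame. This is a sketch under assumptions, not AutoGluon's implementation: every function name below is hypothetical, negative positions follow Python slice semantics (which may or may not match the library), and placing the row at cutoff_time in the second part of the split is an assumption.

```python
import pandas as pd

# Two daily series, "A" and "B", four observations each (input format 3 layout).
df = pd.DataFrame(
    {"target": range(8)},
    index=pd.MultiIndex.from_product(
        [["A", "B"], pd.date_range("2019-01-01", periods=4, freq="D")],
        names=["item_id", "timestamp"],
    ),
)


def timesteps_per_item(frame):
    """Hypothetical helper: length of each time series, indexed by item_id."""
    return frame.groupby(level="item_id", sort=False).size()


def slice_by_position(frame, start_index, end_index):
    """Hypothetical helper: keep positions [start_index, end_index) of every item.

    Uses Python slice semantics: negative indices count from the end of each
    series and None means "unbounded".
    """
    pieces = [
        group.iloc[start_index:end_index]
        for _, group in frame.groupby(level="item_id", sort=False)
    ]
    return pd.concat(pieces)


def slice_by_timestamp(frame, start_time, end_time):
    """Hypothetical helper: keep rows with start_time <= timestamp < end_time."""
    timestamps = frame.index.get_level_values("timestamp")
    return frame[(timestamps >= start_time) & (timestamps < end_time)]


def split_at_cutoff(frame, cutoff_time):
    """Hypothetical helper: rows strictly before cutoff_time, and all remaining rows."""
    timestamps = frame.index.get_level_values("timestamp")
    return frame[timestamps < cutoff_time], frame[timestamps >= cutoff_time]
```

For example, slice_by_position(df, -2, None) keeps the last two observations of each item, and split_at_cutoff(df, pd.Timestamp("2019-01-03")) puts 2019-01-01 and 2019-01-02 of both items in the first part.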
Attributes