autogluon.timeseries.TimeSeriesDataFrame
- class autogluon.timeseries.TimeSeriesDataFrame(data: DataFrame | str | Path | Iterable, static_features: DataFrame | str | Path | None = None, id_column: str | None = None, timestamp_column: str | None = None, num_cpus: int = -1, *args, **kwargs)
- A collection of univariate time series, where each row is identified by an (item_id, timestamp) pair.
  For example, a time series data frame could represent the daily sales of a collection of products, where each item_id corresponds to a product and timestamp corresponds to the day of the record.
- Parameters:
- data (pd.DataFrame, str, pathlib.Path or Iterable) – Time series data to construct a TimeSeriesDataFrame. The class currently supports four input formats.

  1. Time series data in a pandas DataFrame format without multi-index. For example:

            item_id   timestamp  target
         0        0  2019-01-01       0
         1        0  2019-01-02       1
         2        0  2019-01-03       2
         3        1  2019-01-01       3
         4        1  2019-01-02       4
         5        1  2019-01-03       5
         6        2  2019-01-01       6
         7        2  2019-01-02       7
         8        2  2019-01-03       8

     You can also use from_data_frame() for loading data in such format.

  2. Path to a data file in CSV or Parquet format. The file must contain columns item_id and timestamp, as well as columns with time series values. This is similar to Option 1 above (pandas DataFrame format without multi-index). Both remote (e.g., S3) and local paths are accepted. You can also use from_path() for loading data in such format.

  3. Time series data in pandas DataFrame format with multi-index on item_id and timestamp. For example:

                             target
         item_id timestamp
         0       2019-01-01       0
                 2019-01-02       1
                 2019-01-03       2
         1       2019-01-01       3
                 2019-01-02       4
                 2019-01-03       5
         2       2019-01-01       6
                 2019-01-02       7
                 2019-01-03       8

  4. Time series data in Iterable format. For example:

         iterable_dataset = [
             {"target": [0, 1, 2], "start": pd.Period("01-01-2019", freq='D')},
             {"target": [3, 4, 5], "start": pd.Period("01-01-2019", freq='D')},
             {"target": [6, 7, 8], "start": pd.Period("01-01-2019", freq='D')}
         ]

     You can also use from_iterable_dataset() for loading data in such format.
- static_features (pd.DataFrame, str or pathlib.Path, optional) – An optional data frame describing the metadata of each individual time series that does not change with time. Can take real-valued or categorical values. For example, if TimeSeriesDataFrame contains sales of various products, static features may refer to time-independent features like color or brand.

  The index of static_features must contain a single entry for each item present in the respective TimeSeriesDataFrame. For example, the following TimeSeriesDataFrame:

                          target
      item_id timestamp
      A       2019-01-01       0
              2019-01-02       1
              2019-01-03       2
      B       2019-01-01       3
              2019-01-02       4
              2019-01-03       5

  is compatible with the following static_features:

               feat_1 feat_2
      item_id
      A           2.0    bar
      B           5.0    foo

  TimeSeriesDataFrame will ensure consistency of static features during serialization/deserialization, copy and slice operations.

  If static_features are provided during fit, the TimeSeriesPredictor expects the same metadata to be available during prediction time.
- id_column (str, optional) – Name of the item_id column, if it’s different from the default. This argument is only used when constructing a TimeSeriesDataFrame using format 1 (DataFrame without multi-index) or 2 (path to a file).
- timestamp_column (str, optional) – Name of the timestamp column, if it’s different from the default. This argument is only used when constructing a TimeSeriesDataFrame using format 1 (DataFrame without multi-index) or 2 (path to a file).
- num_cpus (int, default = -1) – Number of CPU cores used to process the iterable dataset in parallel. Set to -1 to use all cores. This argument is only used when constructing a TimeSeriesDataFrame using format 4 (iterable dataset). 
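The input formats described for data carry the same information in different layouts. The sketch below uses only pandas to expand the iterable layout (format 4) into the long layout (format 1) and then into the multi-index layout (format 3). The helper name iterable_to_long is hypothetical, and numbering items by their position in the iterable is an assumption made for illustration; this is not AutoGluon's own conversion code.

```python
import pandas as pd

iterable_dataset = [
    {"target": [0, 1, 2], "start": pd.Period("01-01-2019", freq="D")},
    {"target": [3, 4, 5], "start": pd.Period("01-01-2019", freq="D")},
    {"target": [6, 7, 8], "start": pd.Period("01-01-2019", freq="D")},
]


def iterable_to_long(dataset):
    """Hypothetical helper: expand the iterable layout into long-format rows.

    Assumption: item_id is the position of each entry in the iterable.
    """
    frames = []
    for item_id, entry in enumerate(dataset):
        # The frequency is taken from the Period stored under "start".
        periods = pd.period_range(start=entry["start"], periods=len(entry["target"]))
        frames.append(
            pd.DataFrame(
                {
                    "item_id": item_id,
                    "timestamp": periods.to_timestamp(),
                    "target": entry["target"],
                }
            )
        )
    return pd.concat(frames, ignore_index=True)


# Long layout: one row per (item_id, timestamp) pair, plain integer index.
long_df = iterable_to_long(iterable_dataset)

# Multi-index layout: the same rows, indexed by (item_id, timestamp).
multi_df = long_df.set_index(["item_id", "timestamp"])
```

Either long_df or multi_df has the shape that the description above says the constructor accepts directly.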
 
 - __init__(data: DataFrame | str | Path | Iterable, static_features: DataFrame | str | Path | None = None, id_column: str | None = None, timestamp_column: str | None = None, num_cpus: int = -1, *args, **kwargs)
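One way to picture the id_column and timestamp_column arguments is as a mapping from custom column names onto the default item_id and timestamp names. The sketch below performs that mapping by hand with pandas. The column names product_id and date are hypothetical, and the rename is only an illustration of the intent, not a description of what the constructor does internally.

```python
import pandas as pd

# Hypothetical raw data whose column names differ from the defaults.
raw = pd.DataFrame(
    {
        "product_id": ["A", "A", "B", "B"],
        "date": pd.to_datetime(
            ["2019-01-01", "2019-01-02", "2019-01-01", "2019-01-02"]
        ),
        "target": [0, 1, 2, 3],
    }
)

# Map the custom names onto the default item_id / timestamp names.
# rename() returns a new frame and leaves raw unchanged.
renamed = raw.rename(columns={"product_id": "item_id", "date": "timestamp"})

# With AutoGluon installed, the signature above suggests the same intent can be
# expressed without renaming (not executed here):
# TimeSeriesDataFrame(raw, id_column="product_id", timestamp_column="date")
```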
 - Methods
   - Assign new columns to the time series dataframe.
   - Convert each time series in the data frame to the given frequency.
   - Make a copy of the TimeSeriesDataFrame.
   - Drop rows containing NaNs.
   - Fill missing values represented by NaN.
   - Construct a TimeSeriesDataFrame from a pandas DataFrame.
   - Construct a TimeSeriesDataFrame from an Iterable of dictionaries, each of which represents a single time series.
   - Construct a TimeSeriesDataFrame from a CSV or Parquet file.
   - Convenience method to read pickled time series data frames.
   - Prepare model inputs necessary to predict the last prediction_length time steps of each time series in the dataset.
   - Infer the time series frequency based on the timestamps of the observations.
   - Length of each time series in the dataframe.
   - Select a subsequence from each time series between start (inclusive) and end (exclusive) timestamps.
   - Select a subsequence from each time series between start (inclusive) and end (exclusive) indices.
   - Sort object by labels (along an axis).
   - Split the data frame into two TimeSeriesDataFrames, before and after a certain cutoff_time.
   - Convert TimeSeriesDataFrame to a pandas.DataFrame.
   - Generate a train/test split from the given dataset.
 - Attributes
   - freq – Inferred pandas-compatible frequency of the timestamps in the data frame.
   - item_ids – List of unique time series IDs contained in the data set.
   - num_items – Number of items (time series) in the data set.
   - static_features
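The requirement that static_features hold a single entry for each item can be checked before constructing the object. The sketch below rebuilds the two-item example from the static_features parameter description with plain pandas and compares the item IDs on both sides. The helper check_static_features is hypothetical and is not part of AutoGluon.

```python
import pandas as pd

# Time series values indexed by (item_id, timestamp), as in the example above.
ts = pd.DataFrame(
    {"target": [0, 1, 2, 3, 4, 5]},
    index=pd.MultiIndex.from_product(
        [["A", "B"], pd.date_range("2019-01-01", periods=3, freq="D")],
        names=["item_id", "timestamp"],
    ),
)

# One row of time-independent metadata per item.
static = pd.DataFrame(
    {"feat_1": [2.0, 5.0], "feat_2": ["bar", "foo"]},
    index=pd.Index(["A", "B"], name="item_id"),
)


def check_static_features(ts_df, static_df):
    """Hypothetical helper: return (missing, duplicated) item IDs.

    missing: items present in ts_df that have no row in static_df.
    duplicated: items that appear more than once in static_df.
    """
    item_ids = ts_df.index.get_level_values("item_id").unique()
    missing = sorted(set(item_ids) - set(static_df.index))
    duplicated = sorted(set(static_df.index[static_df.index.duplicated()]))
    return missing, duplicated


missing, duplicated = check_static_features(ts, static)
```

Both lists are empty for the compatible example. Dropping the row for item B from static would report B as missing.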