Predicting Columns in a Table - Deployment Optimization

This tutorial covers the end-to-end AutoML process for creating an optimized, deployable AutoGluon artifact for production use.

This tutorial assumes you have already read Predicting Columns in a Table - Quick Start and Predicting Columns in a Table - In Depth.

Fitting a TabularPredictor

We will again use the AdultIncome dataset from the previous tutorials and train a predictor to predict whether a person’s income exceeds $50,000, which is recorded in the class column of this table.

from autogluon.tabular import TabularDataset, TabularPredictor
train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
label = 'class'
subsample_size = 500  # subsample the data for a faster demo; try much larger values
train_data = train_data.sample(n=subsample_size, random_state=0)
train_data.head()

	age	workclass	fnlwgt	education	education-num	marital-status	occupation	relationship	race	sex	capital-gain	capital-loss	hours-per-week	native-country	class
6118	51	Private	39264	Some-college	10	Married-civ-spouse	Exec-managerial	Wife	White	Female	0	0	40	United-States	>50K
23204	58	Private	51662	10th	6	Married-civ-spouse	Other-service	Wife	White	Female	0	0	8	United-States	<=50K
29590	40	Private	326310	Some-college	10	Married-civ-spouse	Craft-repair	Husband	White	Male	0	0	44	United-States	<=50K
18116	37	Private	222450	HS-grad	9	Never-married	Sales	Not-in-family	White	Male	0	2339	40	El-Salvador	<=50K
33964	62	Private	109190	Bachelors	13	Married-civ-spouse	Exec-managerial	Husband	White	Male	15024	0	40	United-States	>50K

save_path = 'agModels-predictClass-deployment'  # specifies folder to store trained models
predictor = TabularPredictor(label=label, path=save_path).fit(train_data)

/home/ci/autogluon/core/src/autogluon/core/utils/utils.py:564: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
  with pd.option_context("mode.use_inf_as_na", True):  # treat None, NaN, INF, NINF as NA
Beginning AutoGluon training ...
AutoGluon will save models to "agModels-predictClass-deployment"
AutoGluon Version:  0.8.3b20231012
Python Version:     3.10.8
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Tue Nov 30 00:17:50 UTC 2021
Disk Space Avail:   231.06 GB / 274.87 GB (84.1%)
Train Data Rows:    500
Train Data Columns: 14
Label Column: class
Preprocessing data ...
/home/ci/autogluon/core/src/autogluon/core/utils/utils.py:564: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
  with pd.option_context("mode.use_inf_as_na", True):  # treat None, NaN, INF, NINF as NA
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
	2 unique label values:  [' >50K', ' <=50K']
	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
/home/ci/autogluon/tabular/src/autogluon/tabular/learner/default_learner.py:215: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
  with pd.option_context("mode.use_inf_as_na", True):  # treat None, NaN, INF, NINF as NA
Selected class <--> label mapping:  class 1 =  >50K, class 0 =  <=50K
	Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class.
	To explicitly set the positive_class, either rename classes to 1 and 0, or specify positive_class in Predictor init.
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    31024.29 MB
	Train Data (Original)  Memory Usage: 0.29 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
			Note: Converting 1 features to boolean dtype as they only contain 2 unique values.
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
/home/ci/autogluon/features/src/autogluon/features/generators/fillna.py:58: FutureWarning: The 'downcast' keyword in fillna is deprecated and will be removed in a future version. Use res.infer_objects(copy=False) to infer non-object dtype, or pd.to_numeric with the 'downcast' keyword to downcast numeric results.
  X.fillna(self._fillna_feature_map, inplace=True, downcast=False)
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
		Fitting CategoryFeatureGenerator...
			Fitting CategoryMemoryMinimizeFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Stage 5 Generators:
		Fitting DropDuplicatesFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('int', [])    : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
		('object', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
	Types of features in processed data (raw dtype, special dtypes):
		('category', [])  : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
		('int', [])       : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
		('int', ['bool']) : 1 | ['sex']
	0.1s = Fit runtime
	14 features in original data used to generate 14 features in processed data.
/home/ci/autogluon/common/src/autogluon/common/utils/pandas_utils.py:50: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '1097.3333333333335' has dtype incompatible with int64, please explicitly cast to a compatible dtype first.
  memory_usage[column] = (
	Train Data (Processed) Memory Usage: 0.03 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.1s ...
AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
	To change this, specify the eval_metric parameter of Predictor()
Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 400, Val Rows: 100
User-specified model hyperparameters to be fit:
{
	'NN_TORCH': {},
	'GBM': [{'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}, {}, 'GBMLarge'],
	'CAT': {},
	'XGB': {},
	'FASTAI': {},
	'RF': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
	'XT': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
	'KNN': [{'weights': 'uniform', 'ag_args': {'name_suffix': 'Unif'}}, {'weights': 'distance', 'ag_args': {'name_suffix': 'Dist'}}],
}
Fitting 13 L1 models ...
Fitting model: KNeighborsUnif ...
	0.73	 = Validation score   (accuracy)
	0.01s	 = Training   runtime
	0.02s	 = Validation runtime
Fitting model: KNeighborsDist ...
	0.65	 = Validation score   (accuracy)
	0.01s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: LightGBMXT ...
/home/ci/autogluon/common/src/autogluon/common/utils/pandas_utils.py:50: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '997.3333333333334' has dtype incompatible with int64, please explicitly cast to a compatible dtype first.
  memory_usage[column] = (
	0.83	 = Validation score   (accuracy)
	0.23s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: LightGBM ...
	0.85	 = Validation score   (accuracy)
	0.19s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: RandomForestGini ...
	0.84	 = Validation score   (accuracy)
	0.53s	 = Training   runtime
	0.06s	 = Validation runtime
Fitting model: RandomForestEntr ...
	0.83	 = Validation score   (accuracy)
	0.47s	 = Training   runtime
	0.06s	 = Validation runtime
Fitting model: CatBoost ...
	0.85	 = Validation score   (accuracy)
	0.84s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: ExtraTreesGini ...
	0.82	 = Validation score   (accuracy)
	0.47s	 = Training   runtime
	0.06s	 = Validation runtime
Fitting model: ExtraTreesEntr ...
	0.81	 = Validation score   (accuracy)
	0.48s	 = Training   runtime
	0.06s	 = Validation runtime
Fitting model: NeuralNetFastAI ...
/home/ci/opt/venv/lib/python3.10/site-packages/fastai/data/transforms.py:225: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
  if is_categorical_dtype(col):
/home/ci/opt/venv/lib/python3.10/site-packages/fastai/tabular/core.py:233: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
  if not is_categorical_dtype(c):
	0.82	 = Validation score   (accuracy)
	2.21s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: XGBoost ...
/home/ci/opt/venv/lib/python3.10/site-packages/xgboost/data.py:440: FutureWarning: is_sparse is deprecated and will be removed in a future version. Check `isinstance(dtype, pd.SparseDtype)` instead.
  if is_sparse(data):
	0.87	 = Validation score   (accuracy)
	0.32s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: NeuralNetTorch ...
	0.83	 = Validation score   (accuracy)
	1.1s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: LightGBMLarge ...
	0.83	 = Validation score   (accuracy)
	0.42s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: WeightedEnsemble_L2 ...
	0.87	 = Validation score   (accuracy)
	0.7s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 8.53s ... Best model: "WeightedEnsemble_L2"
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("agModels-predictClass-deployment")

Next, load separate test data to demonstrate how to make predictions on new examples at inference time:

test_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')
y_test = test_data[label]  # values to predict
test_data.head()

Loaded data from: https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv | Columns = 15 / 15 | Rows = 9769 -> 9769

	age	workclass	fnlwgt	education	education-num	marital-status	occupation	relationship	race	sex	capital-loss	hours-per-week	native-country	class
0	31	Private	169085	11th	7	Married-civ-spouse	Sales	Wife	White	Female	0	20	United-States	<=50K
1	17	Self-emp-not-inc	226203	12th	8	Never-married	Sales	Own-child	White	Male	0	45	United-States	<=50K
2	47	Private	54260	Assoc-voc	11	Married-civ-spouse	Exec-managerial	Husband	White	Male	1887	60	United-States	>50K
3	21	Private	176262	Some-college	10	Never-married	Exec-managerial	Own-child	White	Female	0	30	United-States	<=50K
4	17	Private	241185	12th	8	Never-married	Prof-specialty	Own-child	White	Male	0	20	United-States	<=50K

We use our trained models to make predictions on the new data:

predictor = TabularPredictor.load(save_path)  # unnecessary, just demonstrates how to load previously-trained predictor from file

y_pred = predictor.predict(test_data)
y_pred

/home/ci/autogluon/features/src/autogluon/features/generators/fillna.py:58: FutureWarning: The 'downcast' keyword in fillna is deprecated and will be removed in a future version. Use res.infer_objects(copy=False) to infer non-object dtype, or pd.to_numeric with the 'downcast' keyword to downcast numeric results.
  X.fillna(self._fillna_feature_map, inplace=True, downcast=False)

0        <=50K
1        <=50K
2        <=50K
3        <=50K
4        <=50K
         ...  
9764     <=50K
9765     <=50K
9766     <=50K
9767     <=50K
9768     <=50K
Name: class, Length: 9769, dtype: object
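
Since y_test holds the true labels, the predictions can be scored directly. The sketch below computes plain accuracy in pure Python. The helper name `accuracy` is hypothetical (it is not an AutoGluon function), and the short label lists are made-up stand-ins for y_test and y_pred.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of positions where the prediction equals the true label."""
    y_true, y_pred = list(y_true), list(y_pred)
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty set of predictions")
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# Made-up labels for illustration; note the leading space, as in this dataset.
demo_true = [' <=50K', ' >50K', ' <=50K', ' <=50K']
demo_pred = [' <=50K', ' <=50K', ' <=50K', ' <=50K']
demo_score = accuracy(demo_true, demo_pred)
print(demo_score)
```

The same helper should also accept pandas Series such as y_test and y_pred, since `list()` iterates over their values. In practice, `predictor.evaluate(test_data)` reports the evaluation metric without any hand-written code.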

We can use leaderboard to evaluate the performance of each individual trained model on our labeled test data:

predictor.leaderboard(test_data, silent=True)

/home/ci/autogluon/features/src/autogluon/features/generators/fillna.py:58: FutureWarning: The 'downcast' keyword in fillna is deprecated and will be removed in a future version. Use res.infer_objects(copy=False) to infer non-object dtype, or pd.to_numeric with the 'downcast' keyword to downcast numeric results.
  X.fillna(self._fillna_feature_map, inplace=True, downcast=False)
/home/ci/opt/venv/lib/python3.10/site-packages/fastai/tabular/core.py:233: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
  if not is_categorical_dtype(c):

	model	score_test	score_val	pred_time_test	pred_time_val	fit_time	pred_time_test_marginal	pred_time_val_marginal	fit_time_marginal	stack_level	can_infer	fit_order
0	RandomForestGini	0.842870	0.84	0.159847	0.059480	0.525134	0.159847	0.059480	0.525134	1	True	5
1	CatBoost	0.842461	0.85	0.012938	0.006115	0.842724	0.012938	0.006115	0.842724	1	True	7
2	RandomForestEntr	0.841130	0.83	0.130912	0.059772	0.472541	0.130912	0.059772	0.472541	1	True	6
3	LightGBM	0.839799	0.85	0.020402	0.005797	0.190635	0.020402	0.005797	0.190635	1	True	4
4	XGBoost	0.837445	0.87	0.076095	0.008456	0.315308	0.076095	0.008456	0.315308	1	True	11
5	WeightedEnsemble_L2	0.837445	0.87	0.077545	0.009279	1.020229	0.001450	0.000823	0.704921	2	True	14
6	LightGBMXT	0.836421	0.83	0.011515	0.007926	0.234879	0.011515	0.007926	0.234879	1	True	3
7	ExtraTreesGini	0.833862	0.82	0.128559	0.059644	0.473758	0.128559	0.059644	0.473758	1	True	8
8	ExtraTreesEntr	0.833862	0.81	0.131432	0.060292	0.477145	0.131432	0.060292	0.477145	1	True	9
9	NeuralNetTorch	0.833555	0.83	0.056816	0.011365	1.104888	0.056816	0.011365	1.104888	1	True	12
10	LightGBMLarge	0.828949	0.83	0.022899	0.006103	0.415096	0.022899	0.006103	0.415096	1	True	13
11	NeuralNetFastAI	0.818610	0.82	0.144206	0.011861	2.205262	0.144206	0.011861	2.205262	1	True	10
12	KNeighborsUnif	0.725970	0.73	0.029566	0.015216	0.006952	0.029566	0.015216	0.006952	1	True	1
13	KNeighborsDist	0.695158	0.65	0.025363	0.014079	0.005082	0.025363	0.014079	0.005082	1	True	2
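
For deployment, accuracy is only part of the picture, because inference time matters too. As a minimal sketch, the snippet below picks the fastest model whose score_test is within a tolerance of the best one, using rows copied from the leaderboard above. The function name `fastest_within_tolerance` and the 0.005 tolerance are assumptions chosen for illustration, not part of AutoGluon.

```python
# (model, score_test, pred_time_test) copied from the leaderboard above.
LEADERBOARD_ROWS = [
    ("RandomForestGini", 0.842870, 0.159847),
    ("CatBoost", 0.842461, 0.012938),
    ("RandomForestEntr", 0.841130, 0.130912),
    ("LightGBM", 0.839799, 0.020402),
    ("XGBoost", 0.837445, 0.076095),
    ("WeightedEnsemble_L2", 0.837445, 0.077545),
    ("LightGBMXT", 0.836421, 0.011515),
]


def fastest_within_tolerance(rows, tolerance):
    """Return the name of the fastest model scoring within `tolerance` of the best."""
    best_score = max(score for _, score, _ in rows)
    candidates = [row for row in rows if row[1] >= best_score - tolerance]
    # Among the near-best models, prefer the lowest prediction time.
    return min(candidates, key=lambda row: row[2])[0]


chosen = fastest_within_tolerance(LEADERBOARD_ROWS, tolerance=0.005)
print(chosen)
```

Selecting on test scores, as done here for brevity, leaks test information into the choice; applying the same rule to score_val and pred_time_val is the cleaner option.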

Snapshot a Predictor with .clone()

Now that we have a working predictor artifact, we may want to alter it in a variety of ways to better suit our needs. For example, we may want to delete certain models to reduce disk usage via .delete_models(), or train additional models on top of the ones we already have via .fit_extra().

While you can do all of these operations on your predictor, you may want to be able to revert to a prior state of the predictor in case something goes wrong. This is where predictor.clone() comes in.

predictor.clone() allows you to create a snapshot of the given predictor, cloning the artifacts of the predictor to a new location. You can then freely play around with the predictor and always load the earlier snapshot in case you want to undo your actions.

All you need to do to clone a predictor is specify a new directory path to clone to:

save_path_clone = save_path + '-clone'
# will return the path to the cloned predictor, identical to save_path_clone
path_clone = predictor.clone(path=save_path_clone)

Cloned TabularPredictor located in 'agModels-predictClass-deployment' to 'agModels-predictClass-deployment-clone'.
	To load the cloned predictor: predictor_clone = TabularPredictor.load(path="agModels-predictClass-deployment-clone")

Note that cloning doubles disk usage, because it copies every predictor artifact on disk to make an exact replica.
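
To see the disk cost of a clone for yourself, you can sum the file sizes under each predictor directory. The helper below is a standard-library sketch; `dir_size_bytes` is a hypothetical name, not an AutoGluon function, and the demo runs on a throwaway directory standing in for a predictor folder.

```python
import os
import tempfile


def dir_size_bytes(path):
    """Sum the sizes of all regular files under `path` (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


# Self-contained demo: two dummy files totalling 1024 bytes.
with tempfile.TemporaryDirectory() as demo_dir:
    os.makedirs(os.path.join(demo_dir, "models"))
    with open(os.path.join(demo_dir, "models", "model.pkl"), "wb") as f:
        f.write(b"x" * 1000)
    with open(os.path.join(demo_dir, "predictor.pkl"), "wb") as f:
        f.write(b"y" * 24)
    demo_size = dir_size_bytes(demo_dir)

print(demo_size)
```

With the paths from this tutorial, `dir_size_bytes(save_path)` and `dir_size_bytes(save_path_clone)` should report roughly the same number right after cloning.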

Now we can load the cloned predictor:

predictor_clone = TabularPredictor.load(path=path_clone)
# You can alternatively load the cloned TabularPredictor at the time of cloning:
# predictor_clone = predictor.clone(path=save_path_clone, return_clone=True)

We can see that the cloned predictor has the same leaderboard and functionality as the original:

y_pred_clone = predictor_clone.predict(test_data)
y_pred_clone

/home/ci/autogluon/features/src/autogluon/features/generators/fillna.py:58: FutureWarning: The 'downcast' keyword in fillna is deprecated and will be removed in a future version. Use res.infer_objects(copy=False) to infer non-object dtype, or pd.to_numeric with the 'downcast' keyword to downcast numeric results.
  X.fillna(self._fillna_feature_map, inplace=True, downcast=False)

0        <=50K
1        <=50K
2        <=50K
3        <=50K
4        <=50K
         ...  
9764     <=50K
9765     <=50K
9766     <=50K
9767     <=50K
9768     <=50K
Name: class, Length: 9769, dtype: object

y_pred.equals(y_pred_clone)

True

predictor_clone.leaderboard(test_data, silent=True)

/home/ci/autogluon/features/src/autogluon/features/generators/fillna.py:58: FutureWarning: The 'downcast' keyword in fillna is deprecated and will be removed in a future version. Use res.infer_objects(copy=False) to infer non-object dtype, or pd.to_numeric with the 'downcast' keyword to downcast numeric results.
  X.fillna(self._fillna_feature_map, inplace=True, downcast=False)
/home/ci/opt/venv/lib/python3.10/site-packages/fastai/tabular/core.py:233: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
  if not is_categorical_dtype(c):

	model	score_test	score_val	pred_time_test	pred_time_val	fit_time	pred_time_test_marginal	pred_time_val_marginal	fit_time_marginal	stack_level	can_infer	fit_order
0	RandomForestGini	0.842870	0.84	0.163769	0.059480	0.525134	0.163769	0.059480	0.525134	1	True	5
1	CatBoost	0.842461	0.85	0.012742	0.006115	0.842724	0.012742	0.006115	0.842724	1	True	7
2	RandomForestEntr	0.841130	0.83	0.128566	0.059772	0.472541	0.128566	0.059772	0.472541	1	True	6
3	LightGBM	0.839799	0.85	0.023390	0.005797	0.190635	0.023390	0.005797	0.190635	1	True	4
4	XGBoost	0.837445	0.87	0.069778	0.008456	0.315308	0.069778	0.008456	0.315308	1	True	11
5	WeightedEnsemble_L2	0.837445	0.87	0.071240	0.009279	1.020229	0.001462	0.000823	0.704921	2	True	14
6	LightGBMXT	0.836421	0.83	0.013749	0.007926	0.234879	0.013749	0.007926	0.234879	1	True	3
7	ExtraTreesGini	0.833862	0.82	0.127773	0.059644	0.473758	0.127773	0.059644	0.473758	1	True	8
8	ExtraTreesEntr	0.833862	0.81	0.129687	0.060292	0.477145	0.129687	0.060292	0.477145	1	True	9
9	NeuralNetTorch	0.833555	0.83	0.056010	0.011365	1.104888	0.056010	0.011365	1.104888	1	True	12
10	LightGBMLarge	0.828949	0.83	0.022320	0.006103	0.415096	0.022320	0.006103	0.415096	1	True	13
11	NeuralNetFastAI	0.818610	0.82	0.147886	0.011861	2.205262	0.147886	0.011861	2.205262	1	True	10
12	KNeighborsUnif	0.725970	0.73	0.025684	0.015216	0.006952	0.025684	0.015216	0.006952	1	True	1
13	KNeighborsDist	0.695158	0.65	0.025233	0.014079	0.005082	0.025233	0.014079	0.005082	1	True	2
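
With the clone verified, the snapshot-and-revert workflow described earlier can be wrapped in a small helper. This is a sketch under stated assumptions: `run_with_snapshot` is a hypothetical name, and it relies only on the `clone(path=...)` and `load(path)` calls shown above, so any object that provides those two methods will work.

```python
def run_with_snapshot(predictor, snapshot_path, operation):
    """Clone `predictor` to `snapshot_path`, then apply `operation` to it.

    Returns (predictor, None) on success. If `operation` raises, returns
    (restored, error), where `restored` is loaded from the snapshot directory.
    """
    predictor.clone(path=snapshot_path)  # take the snapshot before changing anything
    try:
        operation(predictor)
    except Exception as error:
        # Fall back to the untouched snapshot; it lives at snapshot_path,
        # not at the original predictor's path.
        restored = type(predictor).load(snapshot_path)
        return restored, error
    return predictor, None
```

A call might look like `run_with_snapshot(predictor, save_path + '-snapshot', lambda p: p.delete_models(models_to_keep='best', dry_run=False))`, where the snapshot path is an illustrative choice. The snapshot itself still occupies disk space until you delete it.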

Now let’s perform some additional operations on the clone, such as calling refit_full:

predictor_clone.refit_full()

predictor_clone.leaderboard(test_data, silent=True)

Refitting models via `predictor.refit_full` using all of the data (combined train and validation)...
	Models trained in this way will have the suffix "_FULL" and have NaN validation score.
	This process is not bound by time_limit, but should take less time than the original `predictor.fit` call.
	To learn more, refer to the `.refit_full` method docstring which explains how "_FULL" models differ from normal models.
Fitting 1 L1 models ...
Fitting model: KNeighborsUnif_FULL ...
	0.0s	 = Training   runtime
Fitting 1 L1 models ...
Fitting model: KNeighborsDist_FULL ...
	0.0s	 = Training   runtime
Fitting 1 L1 models ...
Fitting model: LightGBMXT_FULL ...
/home/ci/autogluon/common/src/autogluon/common/utils/pandas_utils.py:50: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '1097.3333333333335' has dtype incompatible with int64, please explicitly cast to a compatible dtype first.
  memory_usage[column] = (
	0.14s	 = Training   runtime
Fitting 1 L1 models ...
Fitting model: LightGBM_FULL ...
	0.17s	 = Training   runtime
Fitting 1 L1 models ...
Fitting model: RandomForestGini_FULL ...
	0.51s	 = Training   runtime
Fitting 1 L1 models ...
Fitting model: RandomForestEntr_FULL ...
	0.48s	 = Training   runtime
Fitting 1 L1 models ...
Fitting model: CatBoost_FULL ...
	0.02s	 = Training   runtime
Fitting 1 L1 models ...
Fitting model: ExtraTreesGini_FULL ...
	0.49s	 = Training   runtime
Fitting 1 L1 models ...
Fitting model: ExtraTreesEntr_FULL ...
	0.49s	 = Training   runtime
Fitting 1 L1 models ...
Fitting model: NeuralNetFastAI_FULL ...
/home/ci/opt/venv/lib/python3.10/site-packages/fastai/data/transforms.py:225: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
  if is_categorical_dtype(col):
/home/ci/opt/venv/lib/python3.10/site-packages/fastai/tabular/core.py:233: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
  if not is_categorical_dtype(c):
No improvement since epoch 0: early stopping
	0.36s	 = Training   runtime
Fitting 1 L1 models ...
Fitting model: XGBoost_FULL ...
/home/ci/opt/venv/lib/python3.10/site-packages/xgboost/data.py:440: FutureWarning: is_sparse is deprecated and will be removed in a future version. Check `isinstance(dtype, pd.SparseDtype)` instead.
  if is_sparse(data):
	0.14s	 = Training   runtime
Fitting 1 L1 models ...
Fitting model: NeuralNetTorch_FULL ...
	0.58s	 = Training   runtime
Fitting 1 L1 models ...
Fitting model: LightGBMLarge_FULL ...
	0.21s	 = Training   runtime
Fitting model: WeightedEnsemble_L2_FULL | Skipping fit via cloning parent ...
	0.7s	 = Training   runtime
Updated best model to "WeightedEnsemble_L2_FULL" (Previously "WeightedEnsemble_L2"). AutoGluon will default to using "WeightedEnsemble_L2_FULL" for predict() and predict_proba().
Refit complete, total runtime = 3.99s ... Best model: "WeightedEnsemble_L2_FULL"
/home/ci/autogluon/features/src/autogluon/features/generators/fillna.py:58: FutureWarning: The 'downcast' keyword in fillna is deprecated and will be removed in a future version. Use res.infer_objects(copy=False) to infer non-object dtype, or pd.to_numeric with the 'downcast' keyword to downcast numeric results.
  X.fillna(self._fillna_feature_map, inplace=True, downcast=False)
/home/ci/opt/venv/lib/python3.10/site-packages/fastai/tabular/core.py:233: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
  if not is_categorical_dtype(c):

	model	score_test	score_val	pred_time_test	pred_time_val	fit_time	pred_time_test_marginal	pred_time_val_marginal	fit_time_marginal	stack_level	can_infer	fit_order
0	CatBoost_FULL	0.842870	NaN	0.012193	NaN	0.023516	0.012193	NaN	0.023516	1	True	21
1	RandomForestGini	0.842870	0.84	0.160603	0.059480	0.525134	0.160603	0.059480	0.525134	1	True	5
2	CatBoost	0.842461	0.85	0.012979	0.006115	0.842724	0.012979	0.006115	0.842724	1	True	7
3	RandomForestEntr	0.841130	0.83	0.134090	0.059772	0.472541	0.134090	0.059772	0.472541	1	True	6
4	LightGBM_FULL	0.840823	NaN	0.024113	NaN	0.166357	0.024113	NaN	0.166357	1	True	18
5	LightGBM	0.839799	0.85	0.021576	0.005797	0.190635	0.021576	0.005797	0.190635	1	True	4
6	RandomForestGini_FULL	0.839390	NaN	0.161188	NaN	0.506023	0.161188	NaN	0.506023	1	True	19
7	RandomForestEntr_FULL	0.839185	NaN	0.129494	NaN	0.483882	0.129494	NaN	0.483882	1	True	20
8	LightGBMXT_FULL	0.837957	NaN	0.014557	NaN	0.143598	0.014557	NaN	0.143598	1	True	17
9	XGBoost	0.837445	0.87	0.073125	0.008456	0.315308	0.073125	0.008456	0.315308	1	True	11
10	WeightedEnsemble_L2	0.837445	0.87	0.074626	0.009279	1.020229	0.001501	0.000823	0.704921	2	True	14
11	LightGBMXT	0.836421	0.83	0.014218	0.007926	0.234879	0.014218	0.007926	0.234879	1	True	3
12	ExtraTreesEntr_FULL	0.835705	NaN	0.131094	NaN	0.487304	0.131094	NaN	0.487304	1	True	23
13	NeuralNetTorch_FULL	0.835091	NaN	0.049277	NaN	0.581399	0.049277	NaN	0.581399	1	True	26
14	ExtraTreesGini	0.833862	0.82	0.128682	0.059644	0.473758	0.128682	0.059644	0.473758	1	True	8
15	ExtraTreesEntr	0.833862	0.81	0.129024	0.060292	0.477145	0.129024	0.060292	0.477145	1	True	9
16	NeuralNetTorch	0.833555	0.83	0.064971	0.011365	1.104888	0.064971	0.011365	1.104888	1	True	12
17	ExtraTreesGini_FULL	0.833453	NaN	0.130448	NaN	0.486876	0.130448	NaN	0.486876	1	True	22
18	XGBoost_FULL	0.831610	NaN	0.071745	NaN	0.144373	0.071745	NaN	0.144373	1	True	25
19	WeightedEnsemble_L2_FULL	0.831610	NaN	0.073138	NaN	0.849294	0.001393	NaN	0.704921	2	True	28
20	LightGBMLarge	0.828949	0.83	0.027756	0.006103	0.415096	0.027756	0.006103	0.415096	1	True	13
21	LightGBMLarge_FULL	0.820964	NaN	0.022908	NaN	0.212195	0.022908	NaN	0.212195	1	True	27
22	NeuralNetFastAI	0.818610	0.82	0.154398	0.011861	2.205262	0.154398	0.011861	2.205262	1	True	10
23	NeuralNetFastAI_FULL	0.769270	NaN	0.148217	NaN	0.356267	0.148217	NaN	0.356267	1	True	24
24	KNeighborsUnif	0.725970	0.73	0.039380	0.015216	0.006952	0.039380	0.015216	0.006952	1	True	1
25	KNeighborsUnif_FULL	0.725151	NaN	0.027706	NaN	0.004486	0.027706	NaN	0.004486	1	True	15
26	KNeighborsDist	0.695158	0.65	0.027143	0.014079	0.005082	0.027143	0.014079	0.005082	1	True	2
27	KNeighborsDist_FULL	0.685434	NaN	0.030020	NaN	0.004329	0.030020	NaN	0.004329	1	True	16

The leaderboard now includes the additional _FULL models, but we may want to undo this operation.

Luckily, our original predictor is untouched!

predictor.leaderboard(test_data, silent=True)

	model	score_test	score_val	pred_time_test	pred_time_val	fit_time	pred_time_test_marginal	pred_time_val_marginal	fit_time_marginal	stack_level	can_infer	fit_order
0	RandomForestGini	0.842870	0.84	0.160484	0.059480	0.525134	0.160484	0.059480	0.525134	1	True	5
1	CatBoost	0.842461	0.85	0.012280	0.006115	0.842724	0.012280	0.006115	0.842724	1	True	7
2	RandomForestEntr	0.841130	0.83	0.134449	0.059772	0.472541	0.134449	0.059772	0.472541	1	True	6
3	LightGBM	0.839799	0.85	0.018282	0.005797	0.190635	0.018282	0.005797	0.190635	1	True	4
4	XGBoost	0.837445	0.87	0.067650	0.008456	0.315308	0.067650	0.008456	0.315308	1	True	11
5	WeightedEnsemble_L2	0.837445	0.87	0.069390	0.009279	1.020229	0.001740	0.000823	0.704921	2	True	14
6	LightGBMXT	0.836421	0.83	0.010262	0.007926	0.234879	0.010262	0.007926	0.234879	1	True	3
7	ExtraTreesGini	0.833862	0.82	0.120041	0.059644	0.473758	0.120041	0.059644	0.473758	1	True	8
8	ExtraTreesEntr	0.833862	0.81	0.129467	0.060292	0.477145	0.129467	0.060292	0.477145	1	True	9
9	NeuralNetTorch	0.833555	0.83	0.047309	0.011365	1.104888	0.047309	0.011365	1.104888	1	True	12
10	LightGBMLarge	0.828949	0.83	0.022615	0.006103	0.415096	0.022615	0.006103	0.415096	1	True	13
11	NeuralNetFastAI	0.818610	0.82	0.139838	0.011861	2.205262	0.139838	0.011861	2.205262	1	True	10
12	KNeighborsUnif	0.725970	0.73	0.025429	0.015216	0.006952	0.025429	0.015216	0.006952	1	True	1
13	KNeighborsDist	0.695158	0.65	0.014953	0.014079	0.005082	0.014953	0.014079	0.005082	1	True	2

Because the original is unchanged, we can simply clone a new predictor from it; the new clone will not be affected by the refit_full call made on the earlier clone.

Snapshot a deployment-optimized Predictor via .clone_for_deployment()#

Instead of cloning an exact copy, we can clone a copy that contains only the minimal set of artifacts needed for prediction.

Note that this optimized clone will have very limited functionality outside of calling predict and predict_proba. For example, it will be unable to train more models.

save_path_clone_opt = save_path + '-clone-opt'
# will return the path to the cloned predictor, identical to save_path_clone_opt
path_clone_opt = predictor.clone_for_deployment(path=save_path_clone_opt)

Cloned TabularPredictor located in 'agModels-predictClass-deployment' to 'agModels-predictClass-deployment-clone-opt'.
	To load the cloned predictor: predictor_clone = TabularPredictor.load(path="agModels-predictClass-deployment-clone-opt")
Clone: Keeping minimum set of models required to predict with best model 'WeightedEnsemble_L2'...
Deleting model KNeighborsUnif. All files under agModels-predictClass-deployment-clone-opt/models/KNeighborsUnif will be removed.
Deleting model KNeighborsDist. All files under agModels-predictClass-deployment-clone-opt/models/KNeighborsDist will be removed.
Deleting model LightGBMXT. All files under agModels-predictClass-deployment-clone-opt/models/LightGBMXT will be removed.
Deleting model LightGBM. All files under agModels-predictClass-deployment-clone-opt/models/LightGBM will be removed.
Deleting model RandomForestGini. All files under agModels-predictClass-deployment-clone-opt/models/RandomForestGini will be removed.
Deleting model RandomForestEntr. All files under agModels-predictClass-deployment-clone-opt/models/RandomForestEntr will be removed.
Deleting model CatBoost. All files under agModels-predictClass-deployment-clone-opt/models/CatBoost will be removed.
Deleting model ExtraTreesGini. All files under agModels-predictClass-deployment-clone-opt/models/ExtraTreesGini will be removed.
Deleting model ExtraTreesEntr. All files under agModels-predictClass-deployment-clone-opt/models/ExtraTreesEntr will be removed.
Deleting model NeuralNetFastAI. All files under agModels-predictClass-deployment-clone-opt/models/NeuralNetFastAI will be removed.
Deleting model NeuralNetTorch. All files under agModels-predictClass-deployment-clone-opt/models/NeuralNetTorch will be removed.
Deleting model LightGBMLarge. All files under agModels-predictClass-deployment-clone-opt/models/LightGBMLarge will be removed.
Clone: Removing artifacts unnecessary for prediction. NOTE: Clone can no longer fit new models, and most functionality except for predict and predict_proba will no longer work

predictor_clone_opt = TabularPredictor.load(path=path_clone_opt)

To avoid reloading the models from disk on every prediction call, we can persist them in memory:

predictor_clone_opt.persist_models()

Persisting 2 models in memory. Models will require 0.0% of memory.

['WeightedEnsemble_L2', 'XGBoost']
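
To judge whether persisting models pays off for a given workload, one option is to time repeated prediction calls before and after calling persist_models(). The helper below is a minimal sketch and not part of AutoGluon: `time_predict` and `fake_predict` are hypothetical names, and the stand-in model exists only so that the snippet runs without AutoGluon installed.

```python
import statistics
import time
from typing import Any, Callable


def time_predict(predict_fn: Callable[[Any], Any], batch: Any, repeats: int = 20) -> dict:
    """Call predict_fn(batch) `repeats` times and summarize wall-clock latency in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict_fn(batch)
        durations.append(time.perf_counter() - start)
    return {
        "repeats": repeats,
        "median_s": statistics.median(durations),
        "max_s": max(durations),
    }


# Hypothetical stand-in for a real predictor; it only mimics the call shape.
def fake_predict(rows):
    return ["<=50K" for _ in rows]


stats = time_predict(fake_predict, batch=[{"age": 51}], repeats=5)
print(stats)
```

With the objects from this tutorial, one would presumably pass `predictor_clone_opt.predict` as `predict_fn` and a small slice such as `test_data.head(1)` as `batch`, since the benefit of persisting is most visible on small, frequent requests.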

We can see that the optimized clone still makes the same predictions:

y_pred_clone_opt = predictor_clone_opt.predict(test_data)
y_pred_clone_opt

      <=50K
      <=50K
      <=50K
      <=50K
      <=50K
         ...  
   <=50K
   <=50K
   <=50K
   <=50K
   <=50K
Name: class, Length: 9769, dtype: object

y_pred.equals(y_pred_clone_opt)

True

predictor_clone_opt.leaderboard(test_data, silent=True)

	model	score_test	score_val	pred_time_test	pred_time_val	fit_time	pred_time_test_marginal	pred_time_val_marginal	fit_time_marginal	stack_level	can_infer	fit_order
0	XGBoost	0.837445	0.87	0.030474	0.008456	0.315308	0.030474	0.008456	0.315308	1	True	1
1	WeightedEnsemble_L2	0.837445	0.87	0.031405	0.009279	1.020229	0.000931	0.000823	0.704921	2	True	2

We can check the disk usage of the optimized clone compared to the original:

size_original = predictor.get_size_disk()
size_opt = predictor_clone_opt.get_size_disk()
print(f'Size Original:  {size_original} bytes')
print(f'Size Optimized: {size_opt} bytes')
print(f'Optimized predictor achieved a {round((1 - (size_opt/size_original)) * 100, 1)}% reduction in disk usage.')

Size Original:  18606972 bytes
Size Optimized: 772023 bytes
Optimized predictor achieved a 95.9% reduction in disk usage.
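
The arithmetic behind the reported reduction can be checked in isolation. The sketch below reuses the two byte counts printed above; `percent_reduction` and `human_readable` are hypothetical helper names introduced here for illustration and are not AutoGluon functions.

```python
def percent_reduction(size_original: int, size_optimized: int) -> float:
    """Percentage of disk space saved, rounded to one decimal place."""
    if size_original <= 0:
        raise ValueError("size_original must be positive")
    return round((1 - size_optimized / size_original) * 100, 1)


def human_readable(num_bytes: int) -> str:
    """Format a byte count using binary units (1 KiB = 1024 bytes)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} GiB"


# Byte counts copied from the output above.
size_original = 18606972
size_opt = 772023

print(human_readable(size_original))               # 17.7 MiB
print(human_readable(size_opt))                    # 753.9 KiB
print(percent_reduction(size_original, size_opt))  # 95.9
```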

We can also investigate the difference in the files that exist in the original and optimized predictor.

Original:

predictor.get_size_disk_per_file()

/models/ExtraTreesGini/model.pkl                        5064766
/models/ExtraTreesEntr/model.pkl                        5022996
/models/RandomForestGini/model.pkl                      3407741
/models/RandomForestEntr/model.pkl                      3266140
/models/XGBoost/xgb.ubj                                  564906
/models/LightGBMLarge/model.pkl                          470734
/models/NeuralNetTorch/model.pkl                         253757
/models/NeuralNetFastAImodel-internals.pkl               168974
/models/LightGBM/model.pkl                               146032
/models/LightGBMXT/model.pkl                              42065
/models/KNeighborsDist/model.pkl                          39986
/models/KNeighborsUnif/model.pkl                          39985
/utils/data/X.pkl                                         27612
/models/CatBoost/model.pkl                                21239
/metadata.json                                            11134
/learner.pkl                                              10138
/utils/data/X_val.pkl                                      8378
/models/WeightedEnsemble_L2/model.pkl                      7660
/utils/data/y.pkl                                          7461
/models/XGBoost/model.pkl                                  5998
/models/trainer.pkl                                        4643
/models/NeuralNetFastAI/model.pkl                          2498
/utils/data/y_val.pkl                                      2354
/models/WeightedEnsemble_L2/utils/model_template.pkl       1082
/predictor.pkl                                              765
/models/WeightedEnsemble_L2/utils/oof.pkl                   764
/utils/attr/LightGBM/y_pred_proba_val.pkl                   550
/utils/attr/LightGBMLarge/y_pred_proba_val.pkl              550
/utils/attr/ExtraTreesEntr/y_pred_proba_val.pkl             550
/utils/attr/NeuralNetFastAI/y_pred_proba_val.pkl            550
/utils/attr/XGBoost/y_pred_proba_val.pkl                    550
/utils/attr/NeuralNetTorch/y_pred_proba_val.pkl             550
/utils/attr/KNeighborsUnif/y_pred_proba_val.pkl             550
/utils/attr/LightGBMXT/y_pred_proba_val.pkl                 550
/utils/attr/KNeighborsDist/y_pred_proba_val.pkl             550
/utils/attr/RandomForestGini/y_pred_proba_val.pkl           550
/utils/attr/CatBoost/y_pred_proba_val.pkl                   550
/utils/attr/RandomForestEntr/y_pred_proba_val.pkl           550
/utils/attr/ExtraTreesGini/y_pred_proba_val.pkl             550
/__version__                                                 14
Name: size, dtype: int64

Optimized:

predictor_clone_opt.get_size_disk_per_file()

/models/XGBoost/xgb.ubj                       564906
/models/NeuralNetFastAImodel-internals.pkl    168974
/metadata.json                                 11134
/learner.pkl                                   10138
/models/WeightedEnsemble_L2/model.pkl           7695
/models/XGBoost/model.pkl                       6019
/models/trainer.pkl                             2378
/predictor.pkl                                   765
/__version__                                      14
Name: size, dtype: int64
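
Rather than comparing the two listings by eye, the removed files can be computed directly. The following is a minimal sketch that works on plain dictionaries mapping file path to size; the entries are a small subset copied from the listings above, and `removed_files` is a hypothetical helper name. It assumes the per-file sizes have first been converted to a dict (a pandas Series offers `.to_dict()` for this).

```python
def removed_files(original: dict, optimized: dict) -> dict:
    """Return the files (path -> size in bytes) present in `original` but absent from `optimized`."""
    return {path: size for path, size in original.items() if path not in optimized}


# Subset of the per-file sizes shown above, for illustration only.
original_sizes = {
    "/models/ExtraTreesGini/model.pkl": 5064766,
    "/models/XGBoost/xgb.ubj": 564906,
    "/utils/data/X.pkl": 27612,
    "/metadata.json": 11134,
}
optimized_sizes = {
    "/models/XGBoost/xgb.ubj": 564906,
    "/metadata.json": 11134,
}

dropped = removed_files(original_sizes, optimized_sizes)
bytes_saved = sum(dropped.values())

# List the dropped files from largest to smallest.
for path, size in sorted(dropped.items(), key=lambda item: item[1], reverse=True):
    print(f"{path}: {size} bytes")
print(f"Total removed in this subset: {bytes_saved} bytes")
```

Sorting by size makes it easy to see which artifacts dominate the savings; in the full listing above, those are the tree-ensemble model files and the cached training data.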

Compile models for maximized inference speed#

To further improve inference efficiency, we can call .compile_models() to automatically convert sklearn function calls into their ONNX equivalents. Note that this is currently an experimental feature that only improves RandomForest and TabularNeuralNetwork models. Compilation and the resulting inference speedup require the skl2onnx and onnxruntime packages. To install supported versions of these packages automatically, run pip install autogluon.tabular[skl2onnx] on top of an existing AutoGluon installation, or pip install autogluon.tabular[all,skl2onnx] for a new AutoGluon installation.

It is important to compile only a cloned predictor, because once its models are compiled, the predictor no longer supports fitting.

predictor_clone_opt.compile_models()

Compiling 2 Models ...
Skipping compilation for WeightedEnsemble_L2 ... (No config specified)
Skipping compilation for XGBoost ... (No config specified)
Finished compiling models, total runtime = 0s.

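With a compiled predictor, the prediction results might not be exactly the same as before, but they should be very close. Note that in the run above, the log reports that compilation was skipped for both models ("No config specified").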

y_pred_compile_opt = predictor_clone_opt.predict(test_data)
y_pred_compile_opt

      <=50K
      <=50K
      <=50K
      <=50K
      <=50K
         ...  
   <=50K
   <=50K
   <=50K
   <=50K
   <=50K
Name: class, Length: 9769, dtype: object
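
Because compiled models may not reproduce the original outputs exactly, a strict equality check can be too blunt. One way to quantify how close two sets of predictions are is the fraction of rows on which they agree. The sketch below is an illustration with made-up label lists; `agreement_rate` is a hypothetical helper, not an AutoGluon function.

```python
def agreement_rate(preds_a, preds_b) -> float:
    """Fraction of positions at which two equal-length prediction sequences match."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction sequences must have the same length")
    if len(preds_a) == 0:
        return 1.0
    matches = sum(1 for a, b in zip(preds_a, preds_b) if a == b)
    return matches / len(preds_a)


# Made-up labels for illustration; they are not taken from the tutorial's output.
before = ["<=50K", ">50K", "<=50K", "<=50K"]
after = ["<=50K", ">50K", ">50K", "<=50K"]

print(agreement_rate(before, after))  # 0.75
```

In this tutorial, the two arguments would presumably be `y_pred` and `y_pred_compile_opt`; a rate noticeably below 1.0 would be a reason to inspect the compiled models before deploying them.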

Now all that is left is to upload the optimized predictor to a centralized storage location such as S3. To use this predictor on a new machine or system, simply download the artifact to local disk and load the predictor. When loading a predictor, make sure to use the same Python version and AutoGluon version that were used during training to avoid instability.
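
One way to make the version requirement checkable is to ship a small manifest alongside the archived predictor folder and compare it against the target environment before loading. The sketch below uses only the Python standard library. The manifest file name, the helper names `package_artifact` and `check_environment`, and the version string "1.0.0" are all assumptions made for illustration; none of this is an AutoGluon convention.

```python
import json
import platform
import tarfile
import tempfile
from pathlib import Path


def package_artifact(predictor_dir: str, archive_path: str, autogluon_version: str) -> dict:
    """Archive the predictor folder and write a sidecar manifest with version info."""
    manifest = {
        "python_version": platform.python_version(),
        "autogluon_version": autogluon_version,
    }
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(predictor_dir, arcname=Path(predictor_dir).name)
    # Hypothetical naming scheme: "<archive>.manifest.json" next to the archive.
    Path(archive_path + ".manifest.json").write_text(json.dumps(manifest))
    return manifest


def check_environment(manifest_path: str, autogluon_version: str) -> list:
    """Return a list of mismatch descriptions; an empty list means the versions match."""
    manifest = json.loads(Path(manifest_path).read_text())
    problems = []
    if manifest["python_version"] != platform.python_version():
        problems.append(
            f"Python {platform.python_version()} != trained with {manifest['python_version']}"
        )
    if manifest["autogluon_version"] != autogluon_version:
        problems.append(
            f"AutoGluon {autogluon_version} != trained with {manifest['autogluon_version']}"
        )
    return problems


# Demo on a throwaway folder standing in for a real predictor directory.
with tempfile.TemporaryDirectory() as tmp:
    fake_predictor_dir = Path(tmp) / "agModels-example"
    fake_predictor_dir.mkdir()
    (fake_predictor_dir / "predictor.pkl").write_bytes(b"placeholder")

    archive = str(Path(tmp) / "predictor.tar.gz")
    package_artifact(str(fake_predictor_dir), archive, autogluon_version="1.0.0")

    problems_same = check_environment(archive + ".manifest.json", autogluon_version="1.0.0")
    problems_diff = check_environment(archive + ".manifest.json", autogluon_version="0.0.0")
    print(problems_same)
    print(problems_diff)
```

In practice, the installed version string could be obtained with, for example, `importlib.metadata.version("autogluon.tabular")`, and the archive and manifest would be uploaded together. Comparing the full Python patch version is a deliberately strict choice; matching only the major and minor version may be sufficient for some setups.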