.. _sec_tabularinterpretability:

Interpretable rule-based modeling
=================================

*Note*: This addition was made through collaboration with the Yu-Group at UC Berkeley.

**Tip**: Prior to reading this tutorial, it is recommended to have a basic understanding of the TabularPredictor API covered in :ref:`sec_tabularquick`.

In this tutorial, we will explain how to automatically use interpretable models powered by integration with `🔍 the imodels package <https://github.com/csinva/imodels>`__. This allows for automatically learning rule-based models which are extremely concise and can be useful for (1) understanding data or (2) building a transparent predictive model.

Begin by loading in the data to predict. Note: interpretable rule-based modeling is currently only supported for binary classification.

.. code:: python

    from autogluon.tabular import TabularDataset, TabularPredictor

    train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
    subsample_size = 500  # subsample the data for a faster demo; try setting this to much larger values
    train_data = train_data.sample(n=subsample_size, random_state=0)
    train_data.head()
.. parsed-literal::
    :class: output

            age workclass  fnlwgt     education  education-num      marital-status       occupation   relationship   race     sex  capital-gain  capital-loss  hours-per-week native-country  class
    6118     51   Private   39264  Some-college             10  Married-civ-spouse  Exec-managerial           Wife  White  Female             0             0              40  United-States   >50K
    23204    58   Private   51662          10th              6  Married-civ-spouse    Other-service           Wife  White  Female             0             0               8  United-States  <=50K
    29590    40   Private  326310  Some-college             10  Married-civ-spouse     Craft-repair        Husband  White    Male             0             0              44  United-States  <=50K
    18116    37   Private  222450       HS-grad              9       Never-married            Sales  Not-in-family  White    Male             0          2339              40    El-Salvador  <=50K
    33964    62   Private  109190     Bachelors             13  Married-civ-spouse  Exec-managerial        Husband  White    Male         15024             0              40  United-States   >50K
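Since interpretable rule-based modeling currently supports only binary classification, it can be worth confirming up front that the label column contains exactly two classes. Below is a minimal sketch using plain pandas operations (a ``TabularDataset`` behaves like a pandas DataFrame), assuming the label column is named ``class`` as in the table above:

.. code:: python

    # Inspect the label distribution; the interpretable presets expect a binary label.
    print(train_data['class'].value_counts())
    assert train_data['class'].nunique() == 2, "expected a binary classification label"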
Now, we create a predictor and fit it to the data. By specifying ``presets='interpretable'``, we tell the predictor to fit only interpretable models.

.. code:: python

    predictor = TabularPredictor(label='class')
    predictor.fit(train_data, presets='interpretable')
    predictor.leaderboard()

.. parsed-literal::
    :class: output

    No path specified. Models will be saved in: "AutogluonModels/ag-20220531_155743/"
    Presets specified: ['interpretable']
    Beginning AutoGluon training ...
    AutoGluon will save models to "AutogluonModels/ag-20220531_155743/"
    AutoGluon Version:  0.4.2b20220531
    Python Version:     3.9.13
    Operating System:   Linux
    Train Data Rows:    500
    Train Data Columns: 14
    Label Column: class
    Preprocessing data ...
    AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
        2 unique label values:  [' >50K', ' <=50K']
        If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
    Selected class <--> label mapping:  class 1 =  >50K, class 0 =  <=50K
        Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class.
        To explicitly set the positive_class, either rename classes to 1 and 0, or specify positive_class in Predictor init.
    Using Feature Generators to preprocess the data ...
    Fitting AutoMLPipelineFeatureGenerator...
        Available Memory:                    19634.38 MB
        Train Data (Original)  Memory Usage: 0.29 MB (0.0% of available memory)
        Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
        Stage 1 Generators:
            Fitting AsTypeFeatureGenerator...
                Note: Converting 1 features to boolean dtype as they only contain 2 unique values.
        Stage 2 Generators:
            Fitting FillNaFeatureGenerator...
        Stage 3 Generators:
            Fitting IdentityFeatureGenerator...
            Fitting CategoryFeatureGenerator...
                Fitting CategoryMemoryMinimizeFeatureGenerator...
        Stage 4 Generators:
            Fitting DropUniqueFeatureGenerator...
        Types of features in original data (raw dtype, special dtypes):
            ('int', [])    : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
            ('object', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
        Types of features in processed data (raw dtype, special dtypes):
            ('category', [])  : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
            ('int', [])       : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
            ('int', ['bool']) : 1 | ['sex']
        0.1s = Fit runtime
        14 features in original data used to generate 14 features in processed data.
        Train Data (Processed) Memory Usage: 0.03 MB (0.0% of available memory)
    Data preprocessing and feature engineering runtime = 0.08s ...
    AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
        To change this, specify the eval_metric parameter of Predictor()
    Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 400, Val Rows: 100
    Fitting 6 L1 models ...
    Fitting model: RuleFit ...
        0.73     = Validation score   (accuracy)
        1.87s    = Training   runtime
        0.03s    = Validation runtime
    Fitting model: RuleFit_2 ...
        0.73     = Validation score   (accuracy)
        1.55s    = Training   runtime
        0.06s    = Validation runtime
    Fitting model: RuleFit_3 ...
        0.81     = Validation score   (accuracy)
        1.67s    = Training   runtime
        0.08s    = Validation runtime
    Fitting model: GreedyTree ...
        0.81     = Validation score   (accuracy)
        0.02s    = Training   runtime
        0.01s    = Validation runtime
    Fitting model: BoostedRules ...
        0.78     = Validation score   (accuracy)
        0.02s    = Training   runtime
        0.01s    = Validation runtime
    Fitting model: BoostedRules_2 ...
        0.82     = Validation score   (accuracy)
        0.03s    = Training   runtime
        0.01s    = Validation runtime
    Fitting model: WeightedEnsemble_L2 ...
        0.85     = Validation score   (accuracy)
        0.14s    = Training   runtime
        0.0s     = Validation runtime
    AutoGluon training complete, total runtime = 5.77s ...
    Best model: "WeightedEnsemble_L2"
    TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20220531_155743/")

.. parsed-literal::
    :class: output

                     model  score_val  pred_time_val  fit_time  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
    0  WeightedEnsemble_L2       0.85       0.103743  1.865120                0.000564           0.144954            2       True          7
    1       BoostedRules_2       0.82       0.011969  0.030569                0.011969           0.030569            1       True          6
    2           GreedyTree       0.81       0.010208  0.020375                0.010208           0.020375            1       True          4
    3            RuleFit_3       0.81       0.081002  1.669222                0.081002           1.669222            1       True          3
    4         BoostedRules       0.78       0.010581  0.023773                0.010581           0.023773            1       True          5
    5              RuleFit       0.73       0.032896  1.865255                0.032896           1.865255            1       True          1
    6            RuleFit_2       0.73       0.060955  1.547798                0.060955           1.547798            1       True          2
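The leaderboard above is returned as a pandas DataFrame, so it can also be queried programmatically, and inference can be run with any single fitted model instead of the weighted ensemble. Below is a small sketch of both ideas (assuming, as in recent AutoGluon versions, that ``leaderboard`` returns a DataFrame sorted by validation score and that ``predict`` accepts a ``model`` argument naming one of the fitted models):

.. code:: python

    # Fetch the leaderboard as a DataFrame and pick the best single rule-based model
    # (stack level 1, i.e. not the ensemble).
    lb = predictor.leaderboard(silent=True)
    best_rule_model = lb.loc[lb['stack_level'] == 1, 'model'].iloc[0]

    # Run inference with that one interpretable model rather than the ensemble.
    single_model_preds = predictor.predict(train_data.drop(columns=['class']), model=best_rule_model)
    single_model_preds.head()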
The rule-based models take slightly different forms (see below), but all try to optimize predictive performance using as few rules as possible. See the `imodels package <https://github.com/csinva/imodels>`__ for more details.

.. figure:: https://raw.githubusercontent.com/csinva/imodels/master/docs/img/model_table_rules.png

In addition to the usual functions of ``TabularPredictor``, a predictor fitted with interpretable models has some additional functionality. For example, we can now inspect the complexity of the fitted models (i.e., how many rules they contain).

.. code:: python

    predictor.interpretable_models_summary()
.. parsed-literal::
    :class: output

                          model_types  model_performance  complexity           model_best                                         model_paths  model_fit_times  model_pred_times  num_bag_folds  max_stack_level  num_classes                           model_hyperparams
    BoostedRules_2  BoostedRulesModel               0.82        20.0  WeightedEnsemble_L2  AutogluonModels/ag-20220531_155743/models/Boos...         0.030569          0.011969              0                2            2    {'random_state': 0, 'n_estimators': 10}
    GreedyTree        GreedyTreeModel               0.81        19.0  WeightedEnsemble_L2  AutogluonModels/ag-20220531_155743/models/Gree...         0.020375          0.010208              0                2            2  {'random_state': 0, 'max_leaf_nodes': 20}
    RuleFit_3            RuleFitModel               0.81        49.0  WeightedEnsemble_L2  AutogluonModels/ag-20220531_155743/models/Rule...         1.669222          0.081002              0                2            2       {'random_state': 0, 'max_rules': 20}
    BoostedRules    BoostedRulesModel               0.78        10.0  WeightedEnsemble_L2  AutogluonModels/ag-20220531_155743/models/Boos...         0.023773          0.010581              0                2            2     {'random_state': 0, 'n_estimators': 5}
    RuleFit              RuleFitModel               0.73         8.0  WeightedEnsemble_L2  AutogluonModels/ag-20220531_155743/models/Rule...         1.865255          0.032896              0                2            2        {'random_state': 0, 'max_rules': 7}
    RuleFit_2            RuleFitModel               0.73        29.0  WeightedEnsemble_L2  AutogluonModels/ag-20220531_155743/models/Rule...         1.547798          0.060955              0                2            2       {'random_state': 0, 'max_rules': 12}
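Since the summary above is itself a DataFrame, the accuracy/complexity trade-off can be examined directly, for example by ranking models by how many rules they use. A short sketch, assuming the column names shown above:

.. code:: python

    # Rank models from fewest to most rules and compare against validation accuracy.
    summary = predictor.interpretable_models_summary()
    print(summary[['model_performance', 'complexity']].sort_values('complexity'))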
We can also explicitly inspect the rules of the best-performing model.

.. code:: python

    predictor.print_interpretable_rules()  # can optionally specify a model name or complexity threshold

.. parsed-literal::
    :class: output

    BoostedRules:
    Rule → predicted probability (final prediction is weighted sum of all predictions)
        If marital-status_1.0 <= 0.5 → 0.00 (weight: 0.50)
        If marital-status_1.0 > 0.5 → 0.51 (weight: 0.56)
        If marital-status_1.0 <= 0.5 → 0.29 (weight: 0.46)
        If marital-status_1.0 > 0.5 → 0.06 (weight: 0.24)
        If education-num <= 11.5 → 0.79 (weight: 0.35)

In some cases, these rules are sufficient to accurately make predictions. In other cases, they may just be used to gain a better understanding of the data before proceeding with more black-box models.
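If the learned rules look adequate, the fitted predictor can then be used like any other ``TabularPredictor``, e.g., to predict and evaluate on held-out data. A brief sketch; the test CSV URL below is an assumption, mirroring the train CSV loaded earlier:

.. code:: python

    # Evaluate the interpretable predictor on held-out data
    # (this test.csv URL is assumed to sit alongside the train.csv loaded above).
    test_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')
    y_pred = predictor.predict(test_data)
    print(predictor.evaluate(test_data))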