.. _sec_tabularinterpretability:

Interpretable rule-based modeling
=================================

*Note*: This addition was made through collaboration with the Yu-Group at UC Berkeley.

**Tip**: Prior to reading this tutorial, it is recommended to have a basic understanding of the TabularPredictor API covered in :ref:`sec_tabularquick`.

In this tutorial, we will explain how to automatically use interpretable models powered by integration with `🔍 the imodels package <https://github.com/csinva/imodels>`__. This allows for automatically learning rule-based models which are extremely concise and can be useful for (1) understanding data or (2) building a transparent predictive model.

Begin by loading in the data to predict. Note: interpretable rule-based modeling is currently only supported for binary classification.

.. code:: python

    from autogluon.tabular import TabularDataset, TabularPredictor

    train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
    subsample_size = 500  # subsample the data for a faster demo; try setting this to much larger values
    train_data = train_data.sample(n=subsample_size, random_state=0)
    train_data.head()
.. parsed-literal::
    :class: output

            age workclass  fnlwgt     education  education-num      marital-status       occupation   relationship   race     sex  capital-gain  capital-loss  hours-per-week native-country  class
    6118     51   Private   39264  Some-college             10  Married-civ-spouse  Exec-managerial           Wife  White  Female             0             0              40  United-States   >50K
    23204    58   Private   51662          10th              6  Married-civ-spouse    Other-service           Wife  White  Female             0             0               8  United-States  <=50K
    29590    40   Private  326310  Some-college             10  Married-civ-spouse     Craft-repair        Husband  White    Male             0             0              44  United-States  <=50K
    18116    37   Private  222450       HS-grad              9       Never-married            Sales  Not-in-family  White    Male             0          2339              40    El-Salvador  <=50K
    33964    62   Private  109190     Bachelors             13  Married-civ-spouse  Exec-managerial        Husband  White    Male         15024             0              40  United-States   >50K
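Since interpretable rule-based modeling currently supports only binary classification, it can be worth confirming up front that the label column contains exactly two classes. Below is a minimal sketch using plain pandas operations (a ``TabularDataset`` behaves like a pandas DataFrame), assuming the label column is named ``class`` as in the table above:

.. code:: python

    # Inspect the label distribution; the interpretable presets expect a binary label.
    print(train_data['class'].value_counts())
    assert train_data['class'].nunique() == 2, "expected a binary classification label"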
Now, we create a predictor and fit it to the data. By specifying ``presets='interpretable'``, we tell the predictor to fit only interpretable models.

.. code:: python

    predictor = TabularPredictor(label='class')
    predictor.fit(train_data, presets='interpretable')
    predictor.leaderboard()

.. parsed-literal::
    :class: output

    No path specified. Models will be saved in: "AutogluonModels/ag-20220531_155743/"
    Presets specified: ['interpretable']
    Beginning AutoGluon training ...
    AutoGluon will save models to "AutogluonModels/ag-20220531_155743/"
    AutoGluon Version:  0.4.2b20220531
    Python Version:     3.9.13
    Operating System:   Linux
    Train Data Rows:    500
    Train Data Columns: 14
    Label Column: class
    Preprocessing data ...
    AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
        2 unique label values:  [' >50K', ' <=50K']
        If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
    Selected class <--> label mapping:  class 1 =  >50K, class 0 =  <=50K
        Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class.
        To explicitly set the positive_class, either rename classes to 1 and 0, or specify positive_class in Predictor init.
    Using Feature Generators to preprocess the data ...
    Fitting AutoMLPipelineFeatureGenerator...
        Available Memory:                    19634.38 MB
        Train Data (Original)  Memory Usage: 0.29 MB (0.0% of available memory)
        Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
        Stage 1 Generators:
            Fitting AsTypeFeatureGenerator...
                Note: Converting 1 features to boolean dtype as they only contain 2 unique values.
        Stage 2 Generators:
            Fitting FillNaFeatureGenerator...
        Stage 3 Generators:
            Fitting IdentityFeatureGenerator...
            Fitting CategoryFeatureGenerator...
                Fitting CategoryMemoryMinimizeFeatureGenerator...
        Stage 4 Generators:
            Fitting DropUniqueFeatureGenerator...
        Types of features in original data (raw dtype, special dtypes):
            ('int', [])    : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
            ('object', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
        Types of features in processed data (raw dtype, special dtypes):
            ('category', [])  : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
            ('int', [])       : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
            ('int', ['bool']) : 1 | ['sex']
        0.1s = Fit runtime
        14 features in original data used to generate 14 features in processed data.
        Train Data (Processed) Memory Usage: 0.03 MB (0.0% of available memory)
    Data preprocessing and feature engineering runtime = 0.08s ...
    AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
        To change this, specify the eval_metric parameter of Predictor()
    Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 400, Val Rows: 100
    Fitting 6 L1 models ...
    Fitting model: RuleFit ...
        0.73     = Validation score   (accuracy)
        1.87s    = Training   runtime
        0.03s    = Validation runtime
    Fitting model: RuleFit_2 ...
        0.73     = Validation score   (accuracy)
        1.55s    = Training   runtime
        0.06s    = Validation runtime
    Fitting model: RuleFit_3 ...
        0.81     = Validation score   (accuracy)
        1.67s    = Training   runtime
        0.08s    = Validation runtime
    Fitting model: GreedyTree ...
        0.81     = Validation score   (accuracy)
        0.02s    = Training   runtime
        0.01s    = Validation runtime
    Fitting model: BoostedRules ...
        0.78     = Validation score   (accuracy)
        0.02s    = Training   runtime
        0.01s    = Validation runtime
    Fitting model: BoostedRules_2 ...
        0.82     = Validation score   (accuracy)
        0.03s    = Training   runtime
        0.01s    = Validation runtime
    Fitting model: WeightedEnsemble_L2 ...
        0.85     = Validation score   (accuracy)
        0.14s    = Training   runtime
        0.0s     = Validation runtime
    AutoGluon training complete, total runtime = 5.77s ...
    Best model: "WeightedEnsemble_L2"
    TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20220531_155743/")

.. parsed-literal::
    :class: output

                     model  score_val  pred_time_val  fit_time  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
    0  WeightedEnsemble_L2       0.85       0.103743  1.865120                0.000564           0.144954            2       True          7
    1       BoostedRules_2       0.82       0.011969  0.030569                0.011969           0.030569            1       True          6
    2           GreedyTree       0.81       0.010208  0.020375                0.010208           0.020375            1       True          4
    3            RuleFit_3       0.81       0.081002  1.669222                0.081002           1.669222            1       True          3
    4         BoostedRules       0.78       0.010581  0.023773                0.010581           0.023773            1       True          5
    5              RuleFit       0.73       0.032896  1.865255                0.032896           1.865255            1       True          1
    6            RuleFit_2       0.73       0.060955  1.547798                0.060955           1.547798            1       True          2
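The leaderboard above is returned as a pandas DataFrame, so it can also be queried programmatically, and inference can be run with any single fitted model instead of the weighted ensemble. Below is a small sketch of both ideas (assuming, as in recent AutoGluon versions, that ``leaderboard`` returns a DataFrame sorted by validation score and that ``predict`` accepts a ``model`` argument naming one of the fitted models):

.. code:: python

    # Fetch the leaderboard as a DataFrame and pick the best single rule-based model
    # (stack level 1, i.e. not the ensemble).
    lb = predictor.leaderboard(silent=True)
    best_rule_model = lb.loc[lb['stack_level'] == 1, 'model'].iloc[0]

    # Run inference with that one interpretable model rather than the ensemble.
    single_model_preds = predictor.predict(train_data.drop(columns=['class']), model=best_rule_model)
    single_model_preds.head()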
The rule-based models take slightly different forms (see below), but all try to optimize predictive performance using as few rules as possible. See the `imodels package <https://github.com/csinva/imodels>`__ for more details.

.. figure:: https://raw.githubusercontent.com/csinva/imodels/master/docs/img/model_table_rules.png

In addition to the usual functions of ``TabularPredictor``, a predictor fitted with interpretable models has some additional functionality. For example, we can now inspect the complexity of the fitted models (i.e., how many rules they contain).

.. code:: python

    predictor.interpretable_models_summary()
.. parsed-literal::
    :class: output

                          model_types  model_performance  complexity           model_best                                         model_paths  model_fit_times  model_pred_times  num_bag_folds  max_stack_level  num_classes                           model_hyperparams
    BoostedRules_2  BoostedRulesModel               0.82        20.0  WeightedEnsemble_L2  AutogluonModels/ag-20220531_155743/models/Boos...         0.030569          0.011969              0                2            2    {'random_state': 0, 'n_estimators': 10}
    GreedyTree        GreedyTreeModel               0.81        19.0  WeightedEnsemble_L2  AutogluonModels/ag-20220531_155743/models/Gree...         0.020375          0.010208              0                2            2  {'random_state': 0, 'max_leaf_nodes': 20}
    RuleFit_3            RuleFitModel               0.81        49.0  WeightedEnsemble_L2  AutogluonModels/ag-20220531_155743/models/Rule...         1.669222          0.081002              0                2            2       {'random_state': 0, 'max_rules': 20}
    BoostedRules    BoostedRulesModel               0.78        10.0  WeightedEnsemble_L2  AutogluonModels/ag-20220531_155743/models/Boos...         0.023773          0.010581              0                2            2     {'random_state': 0, 'n_estimators': 5}
    RuleFit              RuleFitModel               0.73         8.0  WeightedEnsemble_L2  AutogluonModels/ag-20220531_155743/models/Rule...         1.865255          0.032896              0                2            2        {'random_state': 0, 'max_rules': 7}
    RuleFit_2            RuleFitModel               0.73        29.0  WeightedEnsemble_L2  AutogluonModels/ag-20220531_155743/models/Rule...         1.547798          0.060955              0                2            2       {'random_state': 0, 'max_rules': 12}
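Since the summary above is itself a DataFrame, the accuracy/complexity trade-off can be examined directly, for example by ranking models by how many rules they use. A short sketch, assuming the column names shown above:

.. code:: python

    # Rank models from fewest to most rules and compare against validation accuracy.
    summary = predictor.interpretable_models_summary()
    print(summary[['model_performance', 'complexity']].sort_values('complexity'))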
We can also explicitly inspect the rules of the best-performing model.

.. code:: python

    predictor.print_interpretable_rules()  # can optionally specify a model name or complexity threshold

.. parsed-literal::
    :class: output

    BoostedRules:
    Rule → predicted probability (final prediction is weighted sum of all predictions)
        If marital-status_1.0 <= 0.5 → 0.00 (weight: 0.50)
        If marital-status_1.0 > 0.5 → 0.51 (weight: 0.56)
        If marital-status_1.0 <= 0.5 → 0.29 (weight: 0.46)
        If marital-status_1.0 > 0.5 → 0.06 (weight: 0.24)
        If education-num <= 11.5 → 0.79 (weight: 0.35)

In some cases, these rules are sufficient to accurately make predictions. In other cases, they may just be used to gain a better understanding of the data before proceeding with more black-box models.
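If the learned rules look adequate, the fitted predictor can then be used like any other ``TabularPredictor``, e.g., to predict and evaluate on held-out data. A brief sketch; the test CSV URL below is an assumption, mirroring the train CSV loaded earlier:

.. code:: python

    # Evaluate the interpretable predictor on held-out data
    # (this test.csv URL is assumed to sit alongside the train.csv loaded above).
    test_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')
    y_pred = predictor.predict(test_data)
    print(predictor.evaluate(test_data))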