AutoMM Presets

Open In Colab Open In SageMaker Studio Lab

It is well-known that we usually need to set hyperparameters before the learning process begins. Deep learning models, e.g., pretrained foundation models, can have anywhere from a few hyperparameters to a few hundred. The hyperparameters can impact training speed, final model performance, and inference latency. However, choosing the proper hyperparameters may be challenging for many users with limited expertise.

In this tutorial, we will introduce the easy-to-use presets in AutoMM. Our presets can condense the complex hyperparameter setups into simple strings. More specifically, AutoMM supports three presets: medium_quality, high_quality, and best_quality.

import warnings

warnings.filterwarnings('ignore')

Dataset

For demonstration, we use a subsampled Stanford Sentiment Treebank (SST) dataset, which consists of movie reviews and their associated sentiment. Given a new movie review, the goal is to predict the sentiment reflected in the text (in this case, a binary classification, where reviews are labeled as 1 if they conveyed a positive opinion and 0 otherwise). To get started, let’s download and prepare the dataset.

from autogluon.core.utils.loaders import load_pd

train_data = load_pd.load('https://autogluon-text.s3-accelerate.amazonaws.com/glue/sst/train.parquet')
test_data = load_pd.load('https://autogluon-text.s3-accelerate.amazonaws.com/glue/sst/dev.parquet')
subsample_size = 1000  # subsample data for faster demo, try setting this to larger values
train_data = train_data.sample(n=subsample_size, random_state=0)
train_data.head(10)
sentence label
43787 very pleasing at its best moments 1
16159 , american chai is enough to make you put away... 0
59015 too much like an infomercial for ram dass 's l... 0
5108 a stirring visual sequence 1
67052 cool visual backmasking 1
35938 hard ground 0
49879 the striking , quietly vulnerable personality ... 1
51591 pan nalin 's exposition is beautiful and myste... 1
56780 wonderfully loopy 1
28518 most beautiful , evocative 1

Medium Quality

In some situations, we prefer fast training and inference to the prediction quality. medium_quality is designed for this purpose. Among the three presets, medium_quality has the smallest model size. Now let’s fit the predictor using the medium_quality preset. Here we set a tight time budget for a quick demo.

from autogluon.multimodal import MultiModalPredictor

predictor = MultiModalPredictor(label='label', eval_metric='acc', presets="medium_quality")
predictor.fit(
    train_data=train_data,
    time_limit=20, # seconds
)
No path specified. Models will be saved in: "AutogluonModels/ag-20250618_165810"
=================== System Info ===================
AutoGluon Version:  1.3.2b20250618
Python Version:     3.12.10
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count:          8
Pytorch Version:    2.6.0+cu124
CUDA Version:       12.4
GPU Count:          1
Memory Avail:       28.49 GB / 30.95 GB (92.0%)
Disk Space Avail:   163.52 GB / 255.99 GB (63.9%)
===================================================
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
	2 unique label values:  [np.int64(1), np.int64(0)]
	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])

AutoMM starts to create your model. ✨✨✨

To track the learning progress, you can open a terminal and launch Tensorboard:
    ```shell
    # Assume you have installed tensorboard
    tensorboard --logdir /home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250618_165810
    ```
Seed set to 0
GPU Count: 1
GPU Count to be Used: 1
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name              | Type                         | Params | Mode 
---------------------------------------------------------------------------
0 | model             | HFAutoModelForTextPrediction | 13.5 M | train
1 | validation_metric | MulticlassAccuracy           | 0      | train
2 | loss_func         | CrossEntropyLoss             | 0      | train
---------------------------------------------------------------------------
13.5 M    Trainable params
0         Non-trainable params
13.5 M    Total params
53.934    Total estimated model params size (MB)
230       Modules in train mode
0         Modules in eval mode
Epoch 0, global step 3: 'val_accuracy' reached 0.47000 (best 0.47000), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250618_165810/epoch=0-step=3.ckpt' as top 3
Epoch 0, global step 7: 'val_accuracy' reached 0.58000 (best 0.58000), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250618_165810/epoch=0-step=7.ckpt' as top 3
Epoch 1, global step 10: 'val_accuracy' reached 0.61000 (best 0.61000), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250618_165810/epoch=1-step=10.ckpt' as top 3
Epoch 1, global step 14: 'val_accuracy' reached 0.64000 (best 0.64000), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250618_165810/epoch=1-step=14.ckpt' as top 3
Time limit reached. Elapsed time is 0:00:20. Signaling Trainer to stop.
Epoch 2, global step 16: 'val_accuracy' reached 0.71000 (best 0.71000), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250618_165810/epoch=2-step=16.ckpt' as top 3
Start to fuse 3 checkpoints via the greedy soup algorithm.
Using default `ModelCheckpoint`. Consider installing `litmodels` package to enable `LitModelCheckpoint` for automatic upload to the Lightning model registry.
Using default `ModelCheckpoint`. Consider installing `litmodels` package to enable `LitModelCheckpoint` for automatic upload to the Lightning model registry.
Using default `ModelCheckpoint`. Consider installing `litmodels` package to enable `LitModelCheckpoint` for automatic upload to the Lightning model registry.
AutoMM has created your model. 🎉🎉🎉

To load the model, use the code below:
    ```python
    from autogluon.multimodal import MultiModalPredictor
    predictor = MultiModalPredictor.load("/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250618_165810")
    ```

If you are not satisfied with the model, try to increase the training time, 
adjust the hyperparameters (https://auto.gluon.ai/stable/tutorials/multimodal/advanced_topics/customization.html),
or post issues on GitHub (https://github.com/autogluon/autogluon/issues).
<autogluon.multimodal.predictor.MultiModalPredictor at 0x7f74d9cdb0e0>

Then we can evaluate the predictor on the test data.

scores = predictor.evaluate(test_data, metrics=["roc_auc"])
scores
Using default `ModelCheckpoint`. Consider installing `litmodels` package to enable `LitModelCheckpoint` for automatic upload to the Lightning model registry.
{'roc_auc': np.float64(0.8355171760545593)}

High Quality

If you want to balance the prediction quality and training/inference speed, you can try the high_quality preset, which uses a larger model than medium_quality. Accordingly, we need to increase the time limit since larger models require more time to train.

from autogluon.multimodal import MultiModalPredictor

predictor = MultiModalPredictor(label='label', eval_metric='acc', presets="high_quality")
predictor.fit(
    train_data=train_data,
    time_limit=20, # seconds
)
No path specified. Models will be saved in: "AutogluonModels/ag-20250618_165836"
=================== System Info ===================
AutoGluon Version:  1.3.2b20250618
Python Version:     3.12.10
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count:          8
Pytorch Version:    2.6.0+cu124
CUDA Version:       12.4
GPU Count:          1
Memory Avail:       27.33 GB / 30.95 GB (88.3%)
Disk Space Avail:   163.42 GB / 255.99 GB (63.8%)
===================================================
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
	2 unique label values:  [np.int64(1), np.int64(0)]
	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])

AutoMM starts to create your model. ✨✨✨

To track the learning progress, you can open a terminal and launch Tensorboard:
    ```shell
    # Assume you have installed tensorboard
    tensorboard --logdir /home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250618_165836
    ```
Seed set to 0
GPU Count: 1
GPU Count to be Used: 1
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name              | Type                         | Params | Mode 
---------------------------------------------------------------------------
0 | model             | HFAutoModelForTextPrediction | 108 M  | train
1 | validation_metric | MulticlassAccuracy           | 0      | train
2 | loss_func         | CrossEntropyLoss             | 0      | train
---------------------------------------------------------------------------
108 M     Trainable params
0         Non-trainable params
108 M     Total params
435.573   Total estimated model params size (MB)
229       Modules in train mode
0         Modules in eval mode
Epoch 0, global step 3: 'val_accuracy' reached 0.55500 (best 0.55500), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250618_165836/epoch=0-step=3.ckpt' as top 3
Epoch 0, global step 7: 'val_accuracy' reached 0.59500 (best 0.59500), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250618_165836/epoch=0-step=7.ckpt' as top 3
Epoch 1, global step 10: 'val_accuracy' reached 0.63000 (best 0.63000), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250618_165836/epoch=1-step=10.ckpt' as top 3
Time limit reached. Elapsed time is 0:00:29. Signaling Trainer to stop.
Start to fuse 3 checkpoints via the greedy soup algorithm.
Using default `ModelCheckpoint`. Consider installing `litmodels` package to enable `LitModelCheckpoint` for automatic upload to the Lightning model registry.
Using default `ModelCheckpoint`. Consider installing `litmodels` package to enable `LitModelCheckpoint` for automatic upload to the Lightning model registry.
Using default `ModelCheckpoint`. Consider installing `litmodels` package to enable `LitModelCheckpoint` for automatic upload to the Lightning model registry.
AutoMM has created your model. 🎉🎉🎉

To load the model, use the code below:
    ```python
    from autogluon.multimodal import MultiModalPredictor
    predictor = MultiModalPredictor.load("/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250618_165836")
    ```

If you are not satisfied with the model, try to increase the training time, 
adjust the hyperparameters (https://auto.gluon.ai/stable/tutorials/multimodal/advanced_topics/customization.html),
or post issues on GitHub (https://github.com/autogluon/autogluon/issues).
<autogluon.multimodal.predictor.MultiModalPredictor at 0x7f74d917d9d0>

Although high_quality requires more training time than medium_quality, it also brings performance gains.

scores = predictor.evaluate(test_data, metrics=["roc_auc"])
scores
Using default `ModelCheckpoint`. Consider installing `litmodels` package to enable `LitModelCheckpoint` for automatic upload to the Lightning model registry.
{'roc_auc': np.float64(0.7893407426117707)}

Best Quality

If you want the best performance and don’t care about the training/inference cost, give it a try for the best_quality preset. High-end GPUs with large memory are preferred in this case. Compared to high_quality, it requires much longer training time.

from autogluon.multimodal import MultiModalPredictor

predictor = MultiModalPredictor(label='label', eval_metric='acc', presets="best_quality")
predictor.fit(train_data=train_data, time_limit=180)
No path specified. Models will be saved in: "AutogluonModels/ag-20250618_165919"
=================== System Info ===================
AutoGluon Version:  1.3.2b20250618
Python Version:     3.12.10
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count:          8
Pytorch Version:    2.6.0+cu124
CUDA Version:       12.4
GPU Count:          1
Memory Avail:       26.09 GB / 30.95 GB (84.3%)
Disk Space Avail:   163.01 GB / 255.99 GB (63.7%)
===================================================
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
	2 unique label values:  [np.int64(1), np.int64(0)]
	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])

AutoMM starts to create your model. ✨✨✨

To track the learning progress, you can open a terminal and launch Tensorboard:
    ```shell
    # Assume you have installed tensorboard
    tensorboard --logdir /home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250618_165919
    ```
Seed set to 0
GPU Count: 1
GPU Count to be Used: 1
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name              | Type                         | Params | Mode 
---------------------------------------------------------------------------
0 | model             | HFAutoModelForTextPrediction | 183 M  | train
1 | validation_metric | MulticlassAccuracy           | 0      | train
2 | loss_func         | CrossEntropyLoss             | 0      | train
---------------------------------------------------------------------------
183 M     Trainable params
0         Non-trainable params
183 M     Total params
735.332   Total estimated model params size (MB)
241       Modules in train mode
0         Modules in eval mode
Epoch 0, global step 3: 'val_accuracy' reached 0.43000 (best 0.43000), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250618_165919/epoch=0-step=3.ckpt' as top 3
Epoch 0, global step 7: 'val_accuracy' reached 0.56500 (best 0.56500), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250618_165919/epoch=0-step=7.ckpt' as top 3
Epoch 1, global step 10: 'val_accuracy' reached 0.57000 (best 0.57000), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250618_165919/epoch=1-step=10.ckpt' as top 3
Epoch 1, global step 14: 'val_accuracy' reached 0.67500 (best 0.67500), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250618_165919/epoch=1-step=14.ckpt' as top 3
Time limit reached. Elapsed time is 0:03:00. Signaling Trainer to stop.
Epoch 2, global step 15: 'val_accuracy' reached 0.68500 (best 0.68500), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250618_165919/epoch=2-step=15.ckpt' as top 3
Start to fuse 3 checkpoints via the greedy soup algorithm.
Using default `ModelCheckpoint`. Consider installing `litmodels` package to enable `LitModelCheckpoint` for automatic upload to the Lightning model registry.
Using default `ModelCheckpoint`. Consider installing `litmodels` package to enable `LitModelCheckpoint` for automatic upload to the Lightning model registry.
Using default `ModelCheckpoint`. Consider installing `litmodels` package to enable `LitModelCheckpoint` for automatic upload to the Lightning model registry.
AutoMM has created your model. 🎉🎉🎉

To load the model, use the code below:
    ```python
    from autogluon.multimodal import MultiModalPredictor
    predictor = MultiModalPredictor.load("/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250618_165919")
    ```

If you are not satisfied with the model, try to increase the training time, 
adjust the hyperparameters (https://auto.gluon.ai/stable/tutorials/multimodal/advanced_topics/customization.html),
or post issues on GitHub (https://github.com/autogluon/autogluon/issues).
<autogluon.multimodal.predictor.MultiModalPredictor at 0x7f760688e2d0>

We can see that best_quality achieves better performance than high_quality.

scores = predictor.evaluate(test_data, metrics=["roc_auc"])
scores
Using default `ModelCheckpoint`. Consider installing `litmodels` package to enable `LitModelCheckpoint` for automatic upload to the Lightning model registry.
{'roc_auc': np.float64(0.8132288246190115)}

HPO Presets

The above three presets all use the default hyperparameters, which might not be optimal for your tasks. Fortunately, we also support hyperparameter optimization (HPO) with simple presets. To perform HPO, you can add a postfix _hpo in the three presets, resulting in medium_quality_hpo, high_quality_hpo, and best_quality_hpo.

Display Presets

In case you want to see each preset’s inside details, we provide you with a util function to get the hyperparameter setups. For example, here are hyperparameters of preset high_quality.

import json
from autogluon.multimodal.utils.presets import get_presets

hyperparameters, hyperparameter_tune_kwargs = get_presets(problem_type="default", presets="high_quality")
print(f"hyperparameters: {json.dumps(hyperparameters, sort_keys=True, indent=4)}")
print(f"hyperparameter_tune_kwargs: {json.dumps(hyperparameter_tune_kwargs, sort_keys=True, indent=4)}")
hyperparameters: {
    "model.document_transformer.checkpoint_name": "microsoft/layoutlmv3-base",
    "model.hf_text.checkpoint_name": "google/electra-base-discriminator",
    "model.names": [
        "ft_transformer",
        "timm_image",
        "hf_text",
        "document_transformer",
        "fusion_mlp"
    ],
    "model.timm_image.checkpoint_name": "caformer_b36.sail_in22k_ft_in1k"
}
hyperparameter_tune_kwargs: {}

The HPO presets make several hyperparameters tunable such as model backbone, batch size, learning rate, max epoch, and optimizer type. Below are the details of preset high_quality_hpo.

import json
import yaml
from autogluon.multimodal.utils.presets import get_presets

hyperparameters, hyperparameter_tune_kwargs = get_presets(problem_type="default", presets="high_quality_hpo")
print(f"hyperparameters: {yaml.dump(hyperparameters, allow_unicode=True, default_flow_style=False)}")
print(f"hyperparameter_tune_kwargs: {json.dumps(hyperparameter_tune_kwargs, sort_keys=True, indent=4)}")
hyperparameters: env.batch_size: !!python/object:ray.tune.search.sample.Categorical
  categories:
  - 16
  - 32
  - 64
  - 128
  - 256
  sampler: !!python/object:ray.tune.search.sample._Uniform {}
env.per_gpu_batch_size: 2
model.document_transformer.checkpoint_name: microsoft/layoutlmv3-base
model.hf_text.checkpoint_name: !!python/object:ray.tune.search.sample.Categorical
  categories:
  - google/electra-base-discriminator
  - google/flan-t5-base
  - microsoft/deberta-v3-small
  - roberta-base
  - albert-xlarge-v2
  sampler: !!python/object:ray.tune.search.sample._Uniform {}
model.names:
- ft_transformer
- timm_image
- hf_text
- document_transformer
- fusion_mlp
model.timm_image.checkpoint_name: !!python/object:ray.tune.search.sample.Categorical
  categories:
  - swin_base_patch4_window7_224
  - convnext_base_in22ft1k
  - vit_base_patch16_clip_224.laion2b_ft_in12k_in1k
  - caformer_b36.sail_in22k_ft_in1k
  sampler: !!python/object:ray.tune.search.sample._Uniform {}
optim.lr: !!python/object:ray.tune.search.sample.Float
  lower: 1.0e-05
  sampler: !!python/object:ray.tune.search.sample._LogUniform
    base: 10
  upper: 0.01
optim.max_epochs: !!python/object:ray.tune.search.sample.Categorical
  categories:
  - 5
  - 6
  - 7
  - 8
  - 9
  - 10
  - 11
  - 12
  - 13
  - 14
  - 15
  - 16
  - 17
  - 18
  - 19
  - 20
  - 21
  - 22
  - 23
  - 24
  - 25
  - 26
  - 27
  - 28
  - 29
  - 30
  sampler: !!python/object:ray.tune.search.sample._Uniform {}
optim.optim_type: !!python/object:ray.tune.search.sample.Categorical
  categories:
  - adamw
  - sgd
  sampler: !!python/object:ray.tune.search.sample._Uniform {}

hyperparameter_tune_kwargs: {
    "num_trials": 512,
    "scheduler": "ASHA",
    "searcher": "bayes"
}

Other Examples

You may go to AutoMM Examples to explore other examples about AutoMM.

Customization

To learn how to customize AutoMM, please refer to Customize AutoMM.