Few Shot Learning with AutoMM
This tutorial introduces a simple but effective approach to few-shot classification. It leverages the high-quality features of foundation models: we extract a feature vector for each sample with a pretrained model and then train an SVM on those features. We demonstrate the effectiveness of this foundation-model-plus-SVM approach on one text classification dataset and one image classification dataset.
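To make the two-stage idea concrete, here is a minimal sketch. It is not AutoMM's actual implementation. It assumes the features have already been extracted by a frozen pretrained model and uses tiny hand-written 2-D vectors as a hypothetical stand-in for them. The classifier is a binary soft-margin linear SVM trained from scratch by sub-gradient descent, and the names `train_linear_svm` and `predict` are illustrative only.

```python
def train_linear_svm(features, targets, lr=0.1, lam=0.01, steps=200):
    """Fit a binary soft-margin linear SVM (targets are -1 or +1) by
    full-batch sub-gradient descent on the L2-regularized hinge loss."""
    n, dim = len(features), len(features[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for x, y in zip(features, targets):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:  # only margin violators contribute to the hinge term
                grad_w = [g - y * xj / n for g, xj in zip(grad_w, x)]
                grad_b -= y / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical stand-in for embeddings produced by a frozen pretrained model.
# Real embeddings would be high-dimensional, with one vector per sample.
train_features = [[2.0, 2.0], [3.0, 2.0], [2.0, 3.0],
                  [-2.0, -2.0], [-3.0, -2.0], [-2.0, -3.0]]
train_targets = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(train_features, train_targets)
predictions = [predict(w, b, x) for x in [[2.5, 2.5], [-2.5, -2.5]]]
print(predictions)
```

In a real few-shot setting the classifier would handle several classes, but the division of labor is the same: the pretrained model stays fixed and only the lightweight SVM is fit on the few labeled samples.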
Few Shot Text Classification
Prepare Text Data
We prepare all datasets as pd.DataFrame objects, as in many of our other tutorials.
For this tutorial, we use a small MLDoc dataset for demonstration.
It is a text classification dataset with 4 classes, and we downsampled the training data to 10 samples per class (i.e., 10 shots).
For more details about MLDoc, please see this link.
import pandas as pd
import os
from autogluon.core.utils.loaders import load_zip
download_dir = "./ag_automm_tutorial_fs_cls"
zip_file = "https://automl-mm-bench.s3.amazonaws.com/nlp_datasets/MLDoc-10shot-en.zip"
load_zip.unzip(zip_file, unzip_dir=download_dir)
dataset_path = download_dir  # train.csv and test.csv are read directly from download_dir
train_df = pd.read_csv(os.path.join(dataset_path, "train.csv"), names=["label", "text"])
test_df = pd.read_csv(os.path.join(dataset_path, "test.csv"), names=["label", "text"])
print(train_df)
print(test_df)
Downloading ./ag_automm_tutorial_fs_cls/file.zip from https://automl-mm-bench.s3.amazonaws.com/nlp_datasets/MLDoc-10shot-en.zip...
   label                                               text
0   GCAT  b'Secretary-General Kofi Annan expressed conce...
1   CCAT  b'The health of ABB Asea Brown Boveri AG\'s Po...
2   GCAT  b'Nepali Prime Minister Lokendra Bahadur Chand...
3   CCAT  b'Integ Inc said Thursday its net loss widened...
4   GCAT  b'These are the leading stories in the Skopje ...
5   ECAT  b'Fears of a slowdown in India\'s industrial g...
6   MCAT  b'The Australian Treasury will offer a total o...
7   CCAT  b'Malaysia\'s Suria Capital Holdings Bhd and M...
8   MCAT  b'The UK gilt repo market had a quiet session ...
9   CCAT  b"Commonwealth Edison Co's (ComEd) 794 megawat...
10  GCAT  b'Police arrested 47 people on Thursday in a c...
11  GCAT  b"Army troops in the Comoros island of Anjouan...
12  ECAT  b"The House Banking Committee is considering w...
13  GCAT  b'A possible international anti-drug centre in...
14  ECAT  b'Angela Knight, economic secretary to the Bri...
15  GCAT  b'Nearly 300 people were feared dead in floods...
16  MCAT  b'The Oslo stock index fell with other Europea...
17  ECAT  b'Morgan Keegan said it won $18.540 million of...
18  CCAT  b'Britons can bank on the phone, bank on the i...
19  CCAT  b"Standard Chartered Bank and Prudential Secur...
20  CCAT  b"United Water Resources Inc said it and Lyonn...
21  ECAT  b'Tanzania on Thursday unveiled its 1997/98 bu...
22  GCAT  b'U.S. President Bill Clinton will meet Prime ...
23  CCAT  b"Pacific Century Regional Developments Ltd sa...
24  MCAT  b'The Athens bourse ended 0.65 percent lower w...
25  ECAT  b'Sri Lanka broad money supply, or M2, is seen...
26  GCAT  b'Collated results of African Nations Cup prel...
27  GCAT  b'Philippine President Fidel Ramos said on Fri...
28  MCAT  b'Shanghai copper futures ended down on heavy ...
29  CCAT  b"Goldman Sachs & Co said on Monday that David...
30  ECAT  b'Maine\'s revenues were higher than forecast ...
31  CCAT  b'Thai animal feedmillers said on Monday they ...
32  MCAT  b"Worldwide trading volume in emerging markets...
33  ECAT  b'One week ended June 25 daily avgs-millions  ...
34  ECAT  b'Algeria\'s non-energy exports reached $688 m...
35  ECAT  b'U.S. seasonally adjusted retail sales rose 1...
36  MCAT  b'The Indonesian rupiah weakened against the d...
37  MCAT  b'Brazilian stocks ended slightly higher led b...
38  MCAT  b'The price of gold hung around the psychologi...
39  MCAT  b'The won closed stronger versus the dollar on...
     label                                               text
0     CCAT  b'RJR Nabisco Holdings Corp has prevailed over...
1     ECAT  b"Britain's economy grew 0.8 percent in the fo...
2     ECAT  b'Slovenia\'s state Institute of Macroeconomic...
3     CCAT  b"Belgium's second largest bank Credit Communa...
4     GCAT  b'The IRA ordered its guerrillas to observe a ...
...    ...                                                ...
3995  CCAT  b"A consortium comprising Itochu Corp and Hanj...
3996  ECAT  b"The volume of Hong Kong's domestic exports i...
3997  ECAT  b'The Danish finance ministry said on Tuesday ...
3998  GCAT  b'A court is to investigate charges that forme...
3999  MCAT  b"German consumers of feed grains, bread rye a...
[4000 rows x 2 columns]
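The 10-shot training set used here was prepared in advance. As a rough illustration of what downsampling to k shots means, the sketch below keeps k randomly chosen rows per label. The helper name `sample_k_shot` and the toy rows are hypothetical, and this is not necessarily how the tutorial's dataset was produced.

```python
import random
from collections import Counter, defaultdict


def sample_k_shot(rows, k, seed=0):
    """Keep at most k randomly chosen rows per label.

    rows is a list of (label, text) pairs."""
    by_label = defaultdict(list)
    for label, text in rows:
        by_label[label].append((label, text))
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    subset = []
    for label in sorted(by_label):
        group = by_label[label]
        subset.extend(rng.sample(group, min(k, len(group))))
    return subset


# Toy data: 4 labels with 25 placeholder documents each.
rows = [(label, f"document {i} about {label}")
        for label in ["CCAT", "ECAT", "GCAT", "MCAT"]
        for i in range(25)]

subset = sample_k_shot(rows, k=10)
counts = Counter(label for label, _ in subset)
print(len(subset), dict(counts))
```

With 4 classes and 10 shots, the subset has 40 rows, which matches the size of the training DataFrame printed above. With a DataFrame, `df.groupby("label").sample(n=10, random_state=0)` should give a similar result in recent pandas versions.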
Train a Few Shot Classifier
To perform few-shot classification, set problem_type to "few_shot_classification" when creating the predictor.
from autogluon.multimodal import MultiModalPredictor
predictor_fs_text = MultiModalPredictor(
    problem_type="few_shot_classification",
    label="label",  # column name of the label
    eval_metric="acc",
)
predictor_fs_text.fit(train_df)
scores = predictor_fs_text.evaluate(test_df, metrics=["acc", "f1_macro"])
print(scores)
{'acc': 0.83575, 'f1_macro': 0.8344679316932194}
/home/ci/autogluon/multimodal/src/autogluon/multimodal/data/templates.py:16: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources
No path specified. Models will be saved in: "AutogluonModels/ag-20250922_232018"
=================== System Info ===================
AutoGluon Version:  1.4.1b20250922
Python Version:     3.12.10
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count:          8
Pytorch Version:    2.7.1+cu126
CUDA Version:       12.6
GPU Count:          1
Memory Avail:       28.45 GB / 30.95 GB (91.9%)
Disk Space Avail:   173.82 GB / 255.99 GB (67.9%)
===================================================
AutoMM starts to create your model. ✨✨✨
To track the learning progress, you can open a terminal and launch Tensorboard:
    ```shell
    # Assume you have installed tensorboard
    tensorboard --logdir /home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232018
    ```
INFO: Seed set to 0
/home/ci/autogluon/multimodal/src/autogluon/multimodal/models/utils.py:1148: UserWarning: provided max length: 512 is smaller than sentence-transformers/all-mpnet-base-v2's default: 514
  warnings.warn(
GPU Count: 1
GPU Count to be Used: 1
INFO: 💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
AutoMM has created your model. 🎉🎉🎉
To load the model, use the code below:
    ```python
    from autogluon.multimodal import MultiModalPredictor
    predictor = MultiModalPredictor.load("/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232018")
    ```
If you are not satisfied with the model, try to increase the training time, 
adjust the hyperparameters (https://auto.gluon.ai/stable/tutorials/multimodal/advanced_topics/customization.html),
or post issues on GitHub (https://github.com/autogluon/autogluon/issues).
INFO: 💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
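The evaluate call above reports two metrics, acc and f1_macro. The following sketch shows how accuracy and macro-averaged F1 are conventionally defined, using a handful of made-up labels. It is a plain-Python illustration with hypothetical function names, not the code AutoGluon runs internally.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_macro(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Made-up labels for illustration only.
y_true = ["CCAT", "CCAT", "ECAT", "GCAT", "MCAT", "MCAT"]
y_pred = ["CCAT", "ECAT", "ECAT", "GCAT", "MCAT", "CCAT"]

acc = accuracy(y_true, y_pred)
f1 = f1_macro(y_true, y_pred)
print(acc, f1)
```

Because macro F1 weights every class equally, it drops noticeably when one class is predicted poorly, even if overall accuracy stays high.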
Compare to the Default Classifier
Let’s train a predictor with the default classification problem type and compare its performance with the few-shot result above.
from autogluon.multimodal import MultiModalPredictor
predictor_default_text = MultiModalPredictor(
    label="label",
    problem_type="classification",
    eval_metric="acc",
)
predictor_default_text.fit(train_data=train_df)
scores = predictor_default_text.evaluate(test_df, metrics=["acc", "f1_macro"])
print(scores)
{'acc': 0.3805, 'f1_macro': 0.2994888766363928}
No path specified. Models will be saved in: "AutogluonModels/ag-20250922_232113"
=================== System Info ===================
AutoGluon Version:  1.4.1b20250922
Python Version:     3.12.10
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count:          8
Pytorch Version:    2.7.1+cu126
CUDA Version:       12.6
GPU Count:          1
Memory Avail:       27.40 GB / 30.95 GB (88.5%)
Disk Space Avail:   172.59 GB / 255.99 GB (67.4%)
===================================================
AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == object).
	4 unique label values:  ['GCAT', 'CCAT', 'ECAT', 'MCAT']
	If 'multiclass' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])
AutoMM starts to create your model. ✨✨✨
To track the learning progress, you can open a terminal and launch Tensorboard:
    ```shell
    # Assume you have installed tensorboard
    tensorboard --logdir /home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232113
    ```
INFO: Seed set to 0
GPU Count: 1
GPU Count to be Used: 1
INFO: Using 16bit Automatic Mixed Precision (AMP)
INFO: GPU available: True (cuda), used: True
INFO: TPU available: False, using: 0 TPU cores
INFO: HPU available: False, using: 0 HPUs
INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/home/ci/opt/venv/lib/python3.12/site-packages/lightning/pytorch/utilities/model_summary/model_summary.py:231: Precision 16-mixed is not supported by the model summary.  Estimated model size in MB will not be accurate. Using 32 bits instead.
INFO: 
  | Name              | Type                         | Params | Mode 
---------------------------------------------------------------------------
0 | model             | HFAutoModelForTextPrediction | 108 M  | train
1 | validation_metric | MulticlassAccuracy           | 0      | train
2 | loss_func         | CrossEntropyLoss             | 0      | train
---------------------------------------------------------------------------
108 M     Trainable params
0         Non-trainable params
108 M     Total params
435.579   Total estimated model params size (MB)
229       Modules in train mode
0         Modules in eval mode
/home/ci/opt/venv/lib/python3.12/site-packages/lightning/pytorch/loops/fit_loop.py:310: The number of training batches (4) is smaller than the logging interval Trainer(log_every_n_steps=10). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
INFO: Epoch 0, global step 1: 'val_accuracy' reached 0.37500 (best 0.37500), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232113/epoch=0-step=1.ckpt' as top 3
INFO: Epoch 1, global step 2: 'val_accuracy' reached 0.50000 (best 0.50000), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232113/epoch=1-step=2.ckpt' as top 3
INFO: Epoch 2, global step 3: 'val_accuracy' reached 0.37500 (best 0.50000), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232113/epoch=2-step=3.ckpt' as top 3
INFO: Epoch 3, global step 4: 'val_accuracy' was not in top 3
INFO: Epoch 4, global step 5: 'val_accuracy' reached 0.50000 (best 0.50000), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232113/epoch=4-step=5.ckpt' as top 3
INFO: Epoch 5, global step 6: 'val_accuracy' reached 0.50000 (best 0.50000), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232113/epoch=5-step=6.ckpt' as top 3
INFO: Epoch 6, global step 7: 'val_accuracy' was not in top 3
Start to fuse 3 checkpoints via the greedy soup algorithm.
INFO: 💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
AutoMM has created your model. 🎉🎉🎉
To load the model, use the code below:
    ```python
    from autogluon.multimodal import MultiModalPredictor
    predictor = MultiModalPredictor.load("/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232113")
    ```
If you are not satisfied with the model, try to increase the training time, 
adjust the hyperparameters (https://auto.gluon.ai/stable/tutorials/multimodal/advanced_topics/customization.html),
or post issues on GitHub (https://github.com/autogluon/autogluon/issues).
INFO: 💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
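The two evaluation results printed in this tutorial can be compared directly. The snippet below only copies the numbers from the outputs above and computes the absolute gain per metric. No new measurements are involved, and the variable names are illustrative.

```python
# Scores copied from the two evaluate() outputs shown in this tutorial.
few_shot_scores = {"acc": 0.83575, "f1_macro": 0.8344679316932194}
default_scores = {"acc": 0.3805, "f1_macro": 0.2994888766363928}

# Absolute improvement of the few-shot predictor over the default one.
gains = {
    metric: few_shot_scores[metric] - default_scores[metric]
    for metric in few_shot_scores
}

for metric, gain in gains.items():
    print(f"{metric}: default={default_scores[metric]}, "
          f"few_shot={few_shot_scores[metric]}, gain={gain}")
```

On this 10-shot dataset the few-shot predictor is ahead by roughly 0.46 in accuracy and 0.53 in macro F1. One plausible reading, given the training log above, is that fine-tuning all 108 M parameters on 40 samples is much harder than fitting a small classifier on frozen features.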
Few Shot Image Classification
We also provide an example of using MultiModalPredictor for a few-shot image classification task.
Load Dataset
We use the Stanford Cars dataset for demonstration and have downsampled its training set to 8 samples per class. Stanford Cars is an image classification dataset with 196 classes. For more information about the dataset, please see here.
import os
from autogluon.core.utils.loaders import load_zip, load_s3
download_dir = "./ag_automm_tutorial_fs_cls/stanfordcars/"
zip_file = "https://automl-mm-bench.s3.amazonaws.com/vision_datasets/stanfordcars/stanfordcars.zip"
# The 8-shot training CSV and the test CSV are fetched separately with wget below.
train_csv = "https://automl-mm-bench.s3.amazonaws.com/vision_datasets/stanfordcars/train_8shot.csv"
test_csv = "https://automl-mm-bench.s3.amazonaws.com/vision_datasets/stanfordcars/test.csv"
load_zip.unzip(zip_file, unzip_dir=download_dir)
dataset_path = download_dir
Downloading ./ag_automm_tutorial_fs_cls/stanfordcars//file.zip from https://automl-mm-bench.s3.amazonaws.com/vision_datasets/stanfordcars/stanfordcars.zip...
Unzipping ./ag_automm_tutorial_fs_cls/stanfordcars//file.zip to ./ag_automm_tutorial_fs_cls/stanfordcars/
!wget https://automl-mm-bench.s3.amazonaws.com/vision_datasets/stanfordcars/train_8shot.csv -O ./ag_automm_tutorial_fs_cls/stanfordcars/train.csv
!wget https://automl-mm-bench.s3.amazonaws.com/vision_datasets/stanfordcars/test.csv -O ./ag_automm_tutorial_fs_cls/stanfordcars/test.csv
--2025-09-22 23:24:07--  https://automl-mm-bench.s3.amazonaws.com/vision_datasets/stanfordcars/train_8shot.csv
Resolving automl-mm-bench.s3.amazonaws.com (automl-mm-bench.s3.amazonaws.com)... 52.216.217.201, 16.15.218.18, 54.231.195.177, ...
Connecting to automl-mm-bench.s3.amazonaws.com (automl-mm-bench.s3.amazonaws.com)|52.216.217.201|:443... connected.
HTTP request sent, awaiting response...
200 OK
Length: 94879 (93K) [text/csv]
Saving to: ‘./ag_automm_tutorial_fs_cls/stanfordcars/train.csv’
./ag_automm_tutoria 100%[===================>]  92.66K  --.-KB/s    in 0.002s  
2025-09-22 23:24:07 (56.0 MB/s) - ‘./ag_automm_tutorial_fs_cls/stanfordcars/train.csv’ saved [94879/94879]
--2025-09-22 23:24:08--  https://automl-mm-bench.s3.amazonaws.com/vision_datasets/stanfordcars/test.csv
Resolving automl-mm-bench.s3.amazonaws.com (automl-mm-bench.s3.amazonaws.com)... 52.216.152.100, 16.15.217.214, 3.5.10.16, ...
Connecting to automl-mm-bench.s3.amazonaws.com (automl-mm-bench.s3.amazonaws.com)|52.216.152.100|:443... connected.
HTTP request sent, awaiting response...
200 OK
Length: 34472 (34K) [text/csv]
Saving to: ‘./ag_automm_tutorial_fs_cls/stanfordcars/test.csv’
./ag_automm_tutoria 100%[===================>]  33.66K  --.-KB/s    in 0.002s  
2025-09-22 23:24:08 (21.8 MB/s) - ‘./ag_automm_tutorial_fs_cls/stanfordcars/test.csv’ saved [34472/34472]
import pandas as pd
import os
# Bounding-box and annotation columns that are dropped, leaving ImageID and LabelName.
unused_columns = [
    "Source",
    "Confidence",
    "XMin",
    "XMax",
    "YMin",
    "YMax",
    "IsOccluded",
    "IsTruncated",
    "IsGroupOf",
    "IsDepiction",
    "IsInside",
]
train_df_raw = pd.read_csv(os.path.join(download_dir, "train.csv"))
train_df = train_df_raw.drop(columns=unused_columns)
# Prefix each image ID with the download directory to get a usable file path.
train_df["ImageID"] = download_dir + train_df["ImageID"].astype(str)
test_df_raw = pd.read_csv(os.path.join(download_dir, "test.csv"))
test_df = test_df_raw.drop(columns=unused_columns)
test_df["ImageID"] = download_dir + test_df["ImageID"].astype(str)
# Check that the first image path of each split exists on disk.
print(os.path.exists(train_df.iloc[0]["ImageID"]))
print(train_df)
print(os.path.exists(test_df.iloc[0]["ImageID"]))
print(test_df)
True
                                                ImageID  LabelName
0     ./ag_automm_tutorial_fs_cls/stanfordcars/train...        147
1     ./ag_automm_tutorial_fs_cls/stanfordcars/train...        120
2     ./ag_automm_tutorial_fs_cls/stanfordcars/train...        147
3     ./ag_automm_tutorial_fs_cls/stanfordcars/train...        167
4     ./ag_automm_tutorial_fs_cls/stanfordcars/train...         73
...                                                 ...        ...
1563  ./ag_automm_tutorial_fs_cls/stanfordcars/train...        116
1564  ./ag_automm_tutorial_fs_cls/stanfordcars/train...         76
1565  ./ag_automm_tutorial_fs_cls/stanfordcars/train...        148
1566  ./ag_automm_tutorial_fs_cls/stanfordcars/train...        189
1567  ./ag_automm_tutorial_fs_cls/stanfordcars/train...        183
[1568 rows x 2 columns]
True
                                               ImageID  LabelName
0    ./ag_automm_tutorial_fs_cls/stanfordcars/test/...          0
1    ./ag_automm_tutorial_fs_cls/stanfordcars/test/...          0
2    ./ag_automm_tutorial_fs_cls/stanfordcars/test/...          0
3    ./ag_automm_tutorial_fs_cls/stanfordcars/test/...          1
4    ./ag_automm_tutorial_fs_cls/stanfordcars/test/...          1
..                                                 ...        ...
583  ./ag_automm_tutorial_fs_cls/stanfordcars/test/...        194
584  ./ag_automm_tutorial_fs_cls/stanfordcars/test/...        194
585  ./ag_automm_tutorial_fs_cls/stanfordcars/test/...        195
586  ./ag_automm_tutorial_fs_cls/stanfordcars/test/...        195
587  ./ag_automm_tutorial_fs_cls/stanfordcars/test/...        195
[588 rows x 2 columns]
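Before training, it can be worth confirming that the downsampled training set really has the expected number of shots per class. The sketch below is only an illustration: `shots_per_class` is a hypothetical helper, not part of AutoGluon, and `toy_labels` is a synthetic stand-in for the `LabelName` column. It assumes 196 classes (labels 0 to 195) with 8 samples each, which matches the 1568 training rows shown above if the classes are balanced.

```python
from collections import Counter


def shots_per_class(labels):
    """Return (number of classes, fewest shots, most shots) for a list of labels."""
    counts = Counter(labels)
    return len(counts), min(counts.values()), max(counts.values())


# Synthetic stand-in for train_df["LabelName"]: 196 classes, 8 samples each.
# Assumption: the real 8-shot file is balanced; this list is not the real data.
toy_labels = [label for label in range(196) for _ in range(8)]

num_classes, min_shots, max_shots = shots_per_class(toy_labels)
print(len(toy_labels), num_classes, min_shots, max_shots)
```

On the real data, the same check would be `shots_per_class(train_df["LabelName"].tolist())`. If the minimum and maximum differ, the dataset is not balanced.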
Train a Few Shot Classifier¶
As in the text example, we initialize MultiModalPredictor with the problem type few_shot_classification.
from autogluon.multimodal import MultiModalPredictor
predictor_fs_image = MultiModalPredictor(
    problem_type="few_shot_classification",
    label="LabelName",  # column name of the label
    eval_metric="acc",
)
predictor_fs_image.fit(train_df)
scores = predictor_fs_image.evaluate(test_df, metrics=["acc", "f1_macro"])
print(scores)
{'acc': 0.7993197278911565, 'f1_macro': 0.7941690962099125}
No path specified. Models will be saved in: "AutogluonModels/ag-20250922_232408"
=================== System Info ===================
AutoGluon Version:  1.4.1b20250922
Python Version:     3.12.10
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count:          8
Pytorch Version:    2.7.1+cu126
CUDA Version:       12.6
GPU Count:          1
Memory Avail:       24.19 GB / 30.95 GB (78.2%)
Disk Space Avail:   168.19 GB / 255.99 GB (65.7%)
===================================================
AutoMM starts to create your model. ✨✨✨
To track the learning progress, you can open a terminal and launch Tensorboard:
    ```shell
    # Assume you have installed tensorboard
    tensorboard --logdir /home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232408
    ```
INFO: Seed set to 0
GPU Count: 1
GPU Count to be Used: 1
INFO: 💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
AutoMM has created your model. 🎉🎉🎉
To load the model, use the code below:
    ```python
    from autogluon.multimodal import MultiModalPredictor
    predictor = MultiModalPredictor.load("/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232408")
    ```
If you are not satisfied with the model, try to increase the training time, 
adjust the hyperparameters (https://auto.gluon.ai/stable/tutorials/multimodal/advanced_topics/customization.html),
or post issues on GitHub (https://github.com/autogluon/autogluon/issues).
INFO: 💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
Compare to the Default Classifier¶
We can also train a default image classifier and compare it with the few shot classifier.
from autogluon.multimodal import MultiModalPredictor
predictor_default_image = MultiModalPredictor(
    problem_type="classification",
    label="LabelName",  # column name of the label
    eval_metric="acc",
)
predictor_default_image.fit(train_data=train_df)
scores = predictor_default_image.evaluate(test_df, metrics=["acc", "f1_macro"])
print(scores)
{'acc': 0.5527210884353742, 'f1_macro': 0.5417897295448316}
No path specified. Models will be saved in: "AutogluonModels/ag-20250922_232544"
=================== System Info ===================
AutoGluon Version:  1.4.1b20250922
Python Version:     3.12.10
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count:          8
Pytorch Version:    2.7.1+cu126
CUDA Version:       12.6
GPU Count:          1
Memory Avail:       23.70 GB / 30.95 GB (76.6%)
Disk Space Avail:   160.97 GB / 255.99 GB (62.9%)
===================================================
AutoGluon infers your prediction problem is: 'regression' (because dtype of label-column == int and many unique label-values observed).
	Label info (max, min, mean, stddev): (195, 0, 97.5, 56.59764)
	If 'regression' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])
AutoMM starts to create your model. ✨✨✨
To track the learning progress, you can open a terminal and launch Tensorboard:
    ```shell
    # Assume you have installed tensorboard
    tensorboard --logdir /home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232544
    ```
INFO: Seed set to 0
GPU Count: 1
GPU Count to be Used: 1
INFO: Using 16bit Automatic Mixed Precision (AMP)
INFO: GPU available: True (cuda), used: True
INFO: TPU available: False, using: 0 TPU cores
INFO: HPU available: False, using: 0 HPUs
INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/home/ci/opt/venv/lib/python3.12/site-packages/lightning/pytorch/utilities/model_summary/model_summary.py:231: Precision 16-mixed is not supported by the model summary.  Estimated model size in MB will not be accurate. Using 32 bits instead.
INFO: 
  | Name              | Type                            | Params | Mode 
------------------------------------------------------------------------------
0 | model             | TimmAutoModelForImagePrediction | 96.3 M | train
1 | validation_metric | MulticlassAccuracy              | 0      | train
2 | loss_func         | CrossEntropyLoss                | 0      | train
------------------------------------------------------------------------------
96.3 M    Trainable params
0         Non-trainable params
96.3 M    Total params
385.132   Total estimated model params size (MB)
863       Modules in train mode
0         Modules in eval mode
INFO: Epoch 0, global step 4: 'val_accuracy' reached 0.00000 (best 0.00000), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232544/epoch=0-step=4.ckpt' as top 3
INFO: Epoch 0, global step 9: 'val_accuracy' reached 0.00318 (best 0.00318), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232544/epoch=0-step=9.ckpt' as top 3
INFO: Epoch 1, global step 14: 'val_accuracy' reached 0.01592 (best 0.01592), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232544/epoch=1-step=14.ckpt' as top 3
INFO: Epoch 1, global step 19: 'val_accuracy' reached 0.04777 (best 0.04777), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232544/epoch=1-step=19.ckpt' as top 3
INFO: Epoch 2, global step 24: 'val_accuracy' reached 0.10510 (best 0.10510), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232544/epoch=2-step=24.ckpt' as top 3
INFO: Epoch 2, global step 29: 'val_accuracy' reached 0.13694 (best 0.13694), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232544/epoch=2-step=29.ckpt' as top 3
INFO: Epoch 3, global step 34: 'val_accuracy' reached 0.13376 (best 0.13694), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232544/epoch=3-step=34.ckpt' as top 3
INFO: Epoch 3, global step 39: 'val_accuracy' reached 0.17197 (best 0.17197), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232544/epoch=3-step=39.ckpt' as top 3
INFO: Epoch 4, global step 44: 'val_accuracy' reached 0.27389 (best 0.27389), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232544/epoch=4-step=44.ckpt' as top 3
INFO: Epoch 4, global step 49: 'val_accuracy' reached 0.28981 (best 0.28981), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232544/epoch=4-step=49.ckpt' as top 3
INFO: Epoch 5, global step 54: 'val_accuracy' reached 0.30892 (best 0.30892), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232544/epoch=5-step=54.ckpt' as top 3
INFO: Epoch 5, global step 59: 'val_accuracy' reached 0.34076 (best 0.34076), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232544/epoch=5-step=59.ckpt' as top 3
INFO: Epoch 6, global step 64: 'val_accuracy' reached 0.38854 (best 0.38854), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232544/epoch=6-step=64.ckpt' as top 3
INFO: Epoch 6, global step 69: 'val_accuracy' reached 0.41083 (best 0.41083), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232544/epoch=6-step=69.ckpt' as top 3
INFO: Epoch 7, global step 74: 'val_accuracy' reached 0.43312 (best 0.43312), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232544/epoch=7-step=74.ckpt' as top 3
INFO: Epoch 7, global step 79: 'val_accuracy' reached 0.43949 (best 0.43949), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232544/epoch=7-step=79.ckpt' as top 3
INFO: Epoch 8, global step 84: 'val_accuracy' reached 0.45223 (best 0.45223), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232544/epoch=8-step=84.ckpt' as top 3
INFO: Epoch 8, global step 89: 'val_accuracy' reached 0.49363 (best 0.49363), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232544/epoch=8-step=89.ckpt' as top 3
INFO: Epoch 9, global step 94: 'val_accuracy' reached 0.51911 (best 0.51911), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232544/epoch=9-step=94.ckpt' as top 3
INFO: Epoch 9, global step 99: 'val_accuracy' reached 0.50318 (best 0.51911), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232544/epoch=9-step=99.ckpt' as top 3
INFO: Epoch 10, global step 104: 'val_accuracy' reached 0.51592 (best 0.51911), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232544/epoch=10-step=104.ckpt' as top 3
INFO: Epoch 10, global step 109: 'val_accuracy' reached 0.53185 (best 0.53185), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232544/epoch=10-step=109.ckpt' as top 3
INFO: Epoch 11, global step 114: 'val_accuracy' reached 0.52866 (best 0.53185), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232544/epoch=11-step=114.ckpt' as top 3
INFO: Epoch 11, global step 119: 'val_accuracy' reached 0.53185 (best 0.53185), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232544/epoch=11-step=119.ckpt' as top 3
INFO: Epoch 12, global step 124: 'val_accuracy' reached 0.53503 (best 0.53503), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232544/epoch=12-step=124.ckpt' as top 3
INFO: Epoch 12, global step 129: 'val_accuracy' was not in top 3
INFO: Epoch 13, global step 134: 'val_accuracy' reached 0.53822 (best 0.53822), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232544/epoch=13-step=134.ckpt' as top 3
INFO: Epoch 13, global step 139: 'val_accuracy' reached 0.53503 (best 0.53822), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232544/epoch=13-step=139.ckpt' as top 3
INFO: Epoch 14, global step 144: 'val_accuracy' reached 0.54459 (best 0.54459), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232544/epoch=14-step=144.ckpt' as top 3
INFO: Epoch 14, global step 149: 'val_accuracy' was not in top 3
INFO: Epoch 15, global step 154: 'val_accuracy' was not in top 3
INFO: Epoch 15, global step 159: 'val_accuracy' reached 0.54140 (best 0.54459), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232544/epoch=15-step=159.ckpt' as top 3
INFO: Epoch 16, global step 164: 'val_accuracy' reached 0.54777 (best 0.54777), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232544/epoch=16-step=164.ckpt' as top 3
INFO: Epoch 16, global step 169: 'val_accuracy' reached 0.54777 (best 0.54777), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232544/epoch=16-step=169.ckpt' as top 3
INFO: Epoch 17, global step 174: 'val_accuracy' was not in top 3
INFO: Epoch 17, global step 179: 'val_accuracy' was not in top 3
INFO: Epoch 18, global step 184: 'val_accuracy' was not in top 3
INFO: Epoch 18, global step 189: 'val_accuracy' was not in top 3
INFO: Epoch 19, global step 194: 'val_accuracy' was not in top 3
INFO: Epoch 19, global step 199: 'val_accuracy' was not in top 3
INFO: `Trainer.fit` stopped: `max_epochs=20` reached.
Start to fuse 3 checkpoints via the greedy soup algorithm.
INFO: 💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
AutoMM has created your model. 🎉🎉🎉
To load the model, use the code below:
    ```python
    from autogluon.multimodal import MultiModalPredictor
    predictor = MultiModalPredictor.load("/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250922_232544")
    ```
If you are not satisfied with the model, try to increase the training time, 
adjust the hyperparameters (https://auto.gluon.ai/stable/tutorials/multimodal/advanced_topics/customization.html),
or post issues on GitHub (https://github.com/autogluon/autogluon/issues).
INFO: 💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
As you can see, few_shot_classification also performs much better than the default classification on the image dataset: accuracy is about 0.80 versus 0.55, and macro F1 is about 0.79 versus 0.54.
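To make the comparison concrete, the snippet below computes the absolute gain for each metric. It is a minimal sketch: the two dictionaries are copied from the `evaluate` outputs of this particular run, and `metric_gains` is a hypothetical helper rather than an AutoGluon function. Numbers from another run or another dataset will differ.

```python
# Metrics copied from the two evaluate() outputs printed in this tutorial run.
few_shot_scores = {"acc": 0.7993197278911565, "f1_macro": 0.7941690962099125}
default_scores = {"acc": 0.5527210884353742, "f1_macro": 0.5417897295448316}


def metric_gains(candidate, baseline):
    """Absolute improvement of candidate over baseline for each shared metric."""
    return {
        name: candidate[name] - baseline[name]
        for name in candidate
        if name in baseline
    }


gains = metric_gains(few_shot_scores, default_scores)
for name, gain in gains.items():
    print(
        f"{name}: {default_scores[name]:.3f} -> "
        f"{few_shot_scores[name]:.3f} ({gain:+.3f})"
    )
```

In a live session, the same helper can take the dictionaries returned by `predictor_fs_image.evaluate(...)` and `predictor_default_image.evaluate(...)` directly.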
Customization¶
To learn how to customize AutoMM, please refer to Customize AutoMM.