AutoMM Detection - Fast Finetune on COCO Format Dataset

https://automl-mm-bench.s3.amazonaws.com/object_detection/example_image/pothole144_gt.jpg

Fig. 2 Pothole Dataset

In this section, our goal is to quickly finetune a pretrained model on the Pothole dataset in COCO format and then evaluate it. Pothole is a single-class detection dataset (the only class is pothole). It contains 665 images with bounding box annotations, is intended for building detection models, and can serve as a proof of concept/proof of value (POC/POV) for road maintenance. See AutoMM Detection - Prepare Pascal VOC Dataset for how to prepare the Pothole dataset.

To start, let’s import MultiModalPredictor:

from autogluon.multimodal import MultiModalPredictor

Make sure mmcv-full and mmdet are installed:

!pip install -U openmim  # provides the mim command used below
!mim install mmcv-full
!pip install mmdet

Also import some other packages that will be used in this tutorial:

import os
import time

from autogluon.core.utils.loaders import load_zip

We have the sample dataset ready in the cloud. Let’s download it:

zip_file = "https://automl-mm-bench.s3.amazonaws.com/object_detection/dataset/pothole.zip"
download_dir = "./pothole"

load_zip.unzip(zip_file, unzip_dir=download_dir)
data_dir = os.path.join(download_dir, "pothole")
train_path = os.path.join(data_dir, "Annotations", "usersplit_train_cocoformat.json")
val_path = os.path.join(data_dir, "Annotations", "usersplit_val_cocoformat.json")
test_path = os.path.join(data_dir, "Annotations", "usersplit_test_cocoformat.json")

Downloading ./pothole/file.zip from https://automl-mm-bench.s3.amazonaws.com/object_detection/dataset/pothole.zip...

100%|██████████| 351M/351M [00:06<00:00, 51.3MiB/s]

When you use a COCO-format dataset, the input is the JSON annotation file of a dataset split. In this example, usersplit_train_cocoformat.json, usersplit_val_cocoformat.json, and usersplit_test_cocoformat.json are the annotation files of the train, validation, and test splits, respectively.
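The sketch below makes the expected input concrete. It shows the COCO annotation layout, which has top-level images, annotations, and categories lists, and a helper that summarizes one split. The tiny in-memory dataset and the summarize_coco_split helper are hypothetical and are for illustration only. Real files such as usersplit_train_cocoformat.json contain many more entries and may carry extra fields.

```python
import json
from collections import Counter


def summarize_coco_split(coco: dict) -> dict:
    """Hypothetical helper: count images, boxes, and boxes per category name."""
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    boxes_per_class = Counter(id_to_name[ann["category_id"]] for ann in coco["annotations"])
    return {
        "num_images": len(coco["images"]),
        "num_boxes": len(coco["annotations"]),
        "boxes_per_class": dict(boxes_per_class),
    }


# A tiny, made-up COCO-format split; "bbox" is [x, y, width, height] in pixels.
toy_split = {
    "images": [
        {"id": 1, "file_name": "img_0001.jpg", "width": 640, "height": 480},
        {"id": 2, "file_name": "img_0002.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [100, 120, 80, 40], "area": 3200, "iscrowd": 0},
        {"id": 2, "image_id": 1, "category_id": 1, "bbox": [300, 200, 60, 30], "area": 1800, "iscrowd": 0},
        {"id": 3, "image_id": 2, "category_id": 1, "bbox": [50, 60, 120, 90], "area": 10800, "iscrowd": 0},
    ],
    "categories": [{"id": 1, "name": "pothole"}],
}

# Round-trip through JSON text, as if the dict had been read from an annotation file.
coco = json.loads(json.dumps(toy_split))
print(summarize_coco_split(coco))
# -> {'num_images': 2, 'num_boxes': 3, 'boxes_per_class': {'pothole': 3}}
```

To summarize a real split, load the file first, for example with `with open(train_path) as f: coco = json.load(f)`.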

We select YOLOv3 with a MobileNetV2 backbone and a 320x320 input resolution, pretrained on the COCO dataset. This lightweight setting makes the model fast to finetune and to run inference with, and easy to deploy. We also use all available GPUs (if any):

checkpoint_name = "yolov3_mobilenetv2_320_300e_coco"
num_gpus = -1  # use all GPUs
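One way to see why a 320x320 input keeps this model light is to count the raw box predictions per image. The sketch below assumes the standard YOLOv3 layout: three output scales with strides 32, 16, and 8, and three anchors per grid cell at each scale. The yolov3_num_predictions helper is hypothetical and only does the arithmetic.

```python
def yolov3_num_predictions(input_size: int, strides=(32, 16, 8), anchors_per_cell: int = 3) -> int:
    """Count raw box predictions for a square input (assumes the standard YOLOv3 heads)."""
    total = 0
    for stride in strides:
        grid = input_size // stride  # e.g. 320 // 32 = 10 cells per side
        total += grid * grid * anchors_per_cell
    return total


for size in (320, 416, 608):
    print(size, yolov3_num_predictions(size))
# -> 320 6300
# -> 416 10647
# -> 608 22743
```

At 320x320, the head scores 6300 candidate boxes per image, compared with 22743 at 608x608. The convolutional backbone's cost also grows roughly with the number of input pixels.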

We create the MultiModalPredictor with the selected checkpoint name and number of GPUs. We need to set problem_type to "object_detection" and also provide a sample_data_path so that the predictor can infer the categories of the dataset. Here we provide train_path, but any other split of this dataset works as well.

predictor = MultiModalPredictor(
    hyperparameters={
        "model.mmdet_image.checkpoint_name": checkpoint_name,
        "env.num_gpus": num_gpus,
    },
    problem_type="object_detection",
    sample_data_path=train_path,
)

processing yolov3_mobilenetv2_320_300e_coco...

Successfully downloaded yolov3_mobilenetv2_320_300e_coco_20210719_215349-d18dff72.pth to /home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune
Successfully dumped yolov3_mobilenetv2_320_300e_coco.py to /home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune
processing yolov3_mobilenetv2_320_300e_coco...
yolov3_mobilenetv2_320_300e_coco_20210719_215349-d18dff72.pth exists in /home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune
Successfully dumped yolov3_mobilenetv2_320_300e_coco.py to /home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune
load checkpoint from local path: yolov3_mobilenetv2_320_300e_coco_20210719_215349-d18dff72.pth
The model and loaded state dict do not match exactly

size mismatch for bbox_head.convs_pred.0.weight: copying a param with shape torch.Size([255, 96, 1, 1]) from checkpoint, the shape in current model is torch.Size([18, 96, 1, 1]).
size mismatch for bbox_head.convs_pred.0.bias: copying a param with shape torch.Size([255]) from checkpoint, the shape in current model is torch.Size([18]).
size mismatch for bbox_head.convs_pred.1.weight: copying a param with shape torch.Size([255, 96, 1, 1]) from checkpoint, the shape in current model is torch.Size([18, 96, 1, 1]).
size mismatch for bbox_head.convs_pred.1.bias: copying a param with shape torch.Size([255]) from checkpoint, the shape in current model is torch.Size([18]).
size mismatch for bbox_head.convs_pred.2.weight: copying a param with shape torch.Size([255, 96, 1, 1]) from checkpoint, the shape in current model is torch.Size([18, 96, 1, 1]).
size mismatch for bbox_head.convs_pred.2.bias: copying a param with shape torch.Size([255]) from checkpoint, the shape in current model is torch.Size([18]).
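These size-mismatch messages are expected when the number of classes changes. Only the final prediction convolutions of the detection head are affected. Assume the usual YOLOv3 head, in which each of the three anchors per grid cell predicts 4 box coordinates, 1 objectness score, and one score per class. The channel counts in the log then follow from the arithmetic below (yolov3_pred_channels is a hypothetical helper).

```python
def yolov3_pred_channels(num_classes: int, anchors_per_cell: int = 3) -> int:
    """Output channels of a YOLOv3 prediction conv: anchors * (4 box + 1 objectness + classes)."""
    return anchors_per_cell * (4 + 1 + num_classes)


print(yolov3_pred_channels(80))  # COCO checkpoint (80 classes): 255
print(yolov3_pred_channels(1))   # Pothole (single class): 18
```

Only these six head tensors are reported, which suggests that the remaining pretrained weights, including the MobileNetV2 backbone, still load from the checkpoint.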

We set the learning rate to 2e-4. By default, finetuning uses a two-stage learning rate, in which the model head gets a 100x larger learning rate than the rest of the model. Using a high learning rate only on the head layers makes the model converge faster during finetuning. It usually gives better performance as well, especially on small datasets with hundreds or thousands of images. We also set max_epochs to 30 for fast finetuning and the per-GPU batch size to 32. Finally, we measure the wall-clock time of the fit process to get a sense of its speed.

import time
start = time.time()
predictor.fit(
    train_path,
    hyperparameters={
        "optimization.learning_rate": 2e-4,  # two-stage lr by default: the detection head uses 100x this value
        "optimization.max_epochs": 30,
        "env.per_gpu_batch_size": 32,  # decrease it for larger models or if GPU memory runs out
    },
)
end = time.time()

loading annotations into memory...
Done (t=0.00s)
creating index...
index created!

Global seed set to 123
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name              | Type                             | Params
-----------------------------------------------------------------------
0 | model             | MMDetAutoModelForObjectDetection | 3.7 M
1 | validation_metric | MeanMetric                       | 0
-----------------------------------------------------------------------
3.7 M     Trainable params
0         Non-trainable params
3.7 M     Total params
14.675    Total estimated model params size (MB)
Epoch 0, global step 1: 'val_direct_loss' reached 52603.91797 (best 52603.91797), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230206_231255/epoch=0-step=1.ckpt' as top 1
Epoch 0, global step 3: 'val_direct_loss' reached 7221.30127 (best 7221.30127), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230206_231255/epoch=0-step=3.ckpt' as top 1
Epoch 1, global step 4: 'val_direct_loss' reached 2628.90527 (best 2628.90527), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230206_231255/epoch=1-step=4.ckpt' as top 1
Epoch 1, global step 6: 'val_direct_loss' reached 911.72858 (best 911.72858), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230206_231255/epoch=1-step=6.ckpt' as top 1
Epoch 2, global step 7: 'val_direct_loss' reached 763.92450 (best 763.92450), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230206_231255/epoch=2-step=7.ckpt' as top 1
Epoch 2, global step 9: 'val_direct_loss' reached 714.85297 (best 714.85297), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230206_231255/epoch=2-step=9.ckpt' as top 1
Epoch 3, global step 10: 'val_direct_loss' was not in top 1
Epoch 3, global step 12: 'val_direct_loss' reached 681.82672 (best 681.82672), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230206_231255/epoch=3-step=12.ckpt' as top 1
Epoch 4, global step 13: 'val_direct_loss' reached 596.27637 (best 596.27637), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230206_231255/epoch=4-step=13.ckpt' as top 1
Epoch 4, global step 15: 'val_direct_loss' was not in top 1
Epoch 5, global step 16: 'val_direct_loss' reached 591.26923 (best 591.26923), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230206_231255/epoch=5-step=16.ckpt' as top 1
Epoch 5, global step 18: 'val_direct_loss' reached 563.00592 (best 563.00592), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230206_231255/epoch=5-step=18.ckpt' as top 1
Epoch 6, global step 19: 'val_direct_loss' was not in top 1
Epoch 6, global step 21: 'val_direct_loss' was not in top 1
Epoch 7, global step 22: 'val_direct_loss' reached 545.98834 (best 545.98834), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230206_231255/epoch=7-step=22.ckpt' as top 1
Epoch 7, global step 24: 'val_direct_loss' was not in top 1
Epoch 8, global step 25: 'val_direct_loss' reached 474.08963 (best 474.08963), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230206_231255/epoch=8-step=25.ckpt' as top 1
Epoch 8, global step 27: 'val_direct_loss' was not in top 1
Epoch 9, global step 28: 'val_direct_loss' was not in top 1
Epoch 9, global step 30: 'val_direct_loss' was not in top 1
Epoch 10, global step 31: 'val_direct_loss' was not in top 1
Epoch 10, global step 33: 'val_direct_loss' was not in top 1
Epoch 11, global step 34: 'val_direct_loss' was not in top 1
Epoch 11, global step 36: 'val_direct_loss' reached 465.77646 (best 465.77646), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230206_231255/epoch=11-step=36.ckpt' as top 1
Epoch 12, global step 37: 'val_direct_loss' reached 413.23276 (best 413.23276), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230206_231255/epoch=12-step=37.ckpt' as top 1
Epoch 12, global step 39: 'val_direct_loss' was not in top 1
Epoch 13, global step 40: 'val_direct_loss' was not in top 1
Epoch 13, global step 42: 'val_direct_loss' was not in top 1
Epoch 14, global step 43: 'val_direct_loss' was not in top 1
Epoch 14, global step 45: 'val_direct_loss' was not in top 1
Epoch 15, global step 46: 'val_direct_loss' was not in top 1
Epoch 15, global step 48: 'val_direct_loss' was not in top 1
Epoch 16, global step 49: 'val_direct_loss' was not in top 1
Epoch 16, global step 51: 'val_direct_loss' was not in top 1
Epoch 17, global step 52: 'val_direct_loss' was not in top 1
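The two-stage learning rate described above amounts to putting the head parameters into their own optimizer group with a larger learning rate. Below is a minimal, framework-free sketch of that grouping. The build_param_groups helper, the "bbox_head." name prefix, and the toy parameter names are assumptions made for illustration. They are not AutoMM's actual implementation.

```python
def build_param_groups(param_names, base_lr=2e-4, head_prefix="bbox_head.", head_lr_mult=100):
    """Split parameters into a body group (base_lr) and a head group (base_lr * head_lr_mult).

    Hypothetical helper: AutoMM's real grouping logic may differ.
    """
    head = [name for name in param_names if name.startswith(head_prefix)]
    body = [name for name in param_names if not name.startswith(head_prefix)]
    return [
        {"params": body, "lr": base_lr},
        {"params": head, "lr": base_lr * head_lr_mult},
    ]


# Toy parameter names; the head names mirror the mismatch messages above.
names = [
    "backbone.conv1.weight",
    "neck.conv.weight",
    "bbox_head.convs_pred.0.weight",
    "bbox_head.convs_pred.0.bias",
]
for group in build_param_groups(names):
    print(f"lr={group['lr']:.0e}: {group['params']}")
```

With a real PyTorch model, you would iterate over model.named_parameters() and collect the tensors instead of the names. A list of dicts with "params" and "lr" keys is the per-parameter-group format that torch.optim optimizers accept.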

Print out the elapsed time, and we can see that finetuning is fast:

print("This finetuning takes %.2f seconds." % (end - start))

This finetuning takes 145.01 seconds.

To evaluate the model we just trained, run:

predictor.evaluate(test_path)

Global seed set to 123

loading annotations into memory...
Done (t=0.00s)
creating index...
index created!

WARNING:automm:A new predictor save path is created.This is to prevent you to overwrite previous predictor saved here.You could check current save path at predictor._save_path.If you still want to use this path, set resume=True

saving file at /home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230206_231522/object_detection_result_cache.json
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.02s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type bbox
DONE (t=0.42s).
Accumulating evaluation results...
DONE (t=0.05s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.162
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.426
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.088
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.031
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.159
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.288
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.140
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.280
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.326
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.196
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.327
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.431

{'map': 0.16155008768665033}

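The evaluation results are shown in the command-line output above. The first value is the mAP in the COCO standard (averaged over IoU thresholds 0.50:0.95), and the second is the mAP in the VOC standard (mAP50, at IoU = 0.50). evaluate also returns the COCO mAP as a dictionary. For more details about these metrics, see COCO’s evaluation guideline.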
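Both metrics rest on the intersection over union (IoU) between a predicted box and a ground-truth box. mAP50 counts a detection as correct when its IoU is at least 0.5. The COCO mAP averages AP over ten IoU thresholds, from 0.50 to 0.95 in steps of 0.05. The sketch below computes IoU for boxes in COCO's [x, y, width, height] format and checks one prediction against those thresholds. It is a simplified illustration, not the pycocotools implementation.

```python
def iou_xywh(box_a, box_b) -> float:
    """IoU of two boxes given as [x, y, width, height] (COCO bbox convention)."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds averaged by the COCO-style mAP (0.50:0.05:0.95).
coco_iou_thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]

ground_truth = [100, 100, 100, 100]
prediction = [120, 110, 100, 100]  # same size, shifted by (20, 10) pixels
iou = iou_xywh(ground_truth, prediction)
matched_at = [t for t in coco_iou_thresholds if iou >= t]
print(f"IoU = {iou:.4f}; counts as a match at thresholds {matched_at}")
# -> IoU = 0.5625; counts as a match at thresholds [0.5, 0.55]
```

A box like this one counts as correct for mAP50 but fails the stricter thresholds. This is one reason why the COCO mAP (0.162) is much lower than mAP50 (0.426) here.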

We can also get predictions on the test set:

pred = predictor.predict(test_path)

Global seed set to 123

loading annotations into memory...
Done (t=0.00s)
creating index...
index created!

Let’s also visualize the prediction result:

!pip install opencv-python

from autogluon.multimodal.utils import visualize_detection
conf_threshold = 0.25  # Specify a confidence threshold to filter out unwanted boxes
visualization_result_dir = "./"  # Use the current working directory as the result dir to save the visualized image
visualized = visualize_detection(
    pred=pred[12:13],  # visualize a single test image (the 13th one)
    detection_classes=predictor.get_predictor_classes(),
    conf_threshold=conf_threshold,
    visualization_result_dir=visualization_result_dir,
)
from PIL import Image
from IPython.display import display
# Reverse the channel axis (OpenCV-style BGR to RGB) before building a PIL image
img = Image.fromarray(visualized[0][:, :, ::-1], 'RGB')
display(img)

Visualized detection result (output_detection_fast_finetune_coco_631583_22_0.png)
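conf_threshold controls which boxes are drawn: boxes whose confidence score falls below it are filtered out. The sketch below shows that filtering on a hypothetical list of (class_name, x1, y1, x2, y2, score) tuples for one image. It assumes nothing about the exact structure of pred returned by predictor.predict, so inspect that yourself, for example with print(pred[12:13]). The filter_by_confidence helper and the inclusive cut-off are assumptions.

```python
from typing import List, Tuple

# Hypothetical detection record: (class_name, x1, y1, x2, y2, score).
Detection = Tuple[str, float, float, float, float, float]


def filter_by_confidence(detections: List[Detection], conf_threshold: float = 0.25) -> List[Detection]:
    """Keep detections whose score is at least conf_threshold (inclusive cut-off is an assumption)."""
    return [det for det in detections if det[-1] >= conf_threshold]


toy_detections = [
    ("pothole", 34.0, 50.0, 120.0, 98.0, 0.91),
    ("pothole", 200.0, 180.0, 260.0, 220.0, 0.40),
    ("pothole", 5.0, 5.0, 25.0, 20.0, 0.08),
]
kept = filter_by_confidence(toy_detections, conf_threshold=0.25)
print(f"kept {len(kept)} of {len(toy_detections)} boxes")
# -> kept 2 of 3 boxes
```

Raising the threshold trades recall for a cleaner picture: with conf_threshold=0.5, only the first toy box would remain.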

With this fast finetuning setting, we reached an mAP of 0.162 (mAP50 = 0.426) on a new dataset in about 145 seconds of finetuning. For how to finetune with higher performance, see AutoMM Detection - High Performance Finetune on COCO Format Dataset, where we finetuned a VFNet model for 5 hours and reached mAP = 0.450 and mAP50 = 0.718 on this dataset.

Other Examples

See AutoMM Examples to explore other AutoMM examples.

Customization

To learn how to customize AutoMM, please refer to Customize AutoMM.

Citation

@misc{redmon2018yolov3,
    title={YOLOv3: An Incremental Improvement},
    author={Joseph Redmon and Ali Farhadi},
    year={2018},
    eprint={1804.02767},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}