.. _sec_automm_detection_fast_ft_coco:

AutoMM Detection - Fast Finetune on COCO Format Dataset
=======================================================


.. figure:: https://automl-mm-bench.s3.amazonaws.com/object_detection/example_image/pothole144_gt.jpg
   :width: 500px

   Pothole Dataset


In this section, we quickly finetune and evaluate a pretrained model on
the `Pothole
dataset <https://www.kaggle.com/datasets/andrewmvd/pothole-detection>`__
in COCO format. Pothole is a single-class (``pothole``) detection
dataset containing 665 images with bounding box annotations. It is
intended for building detection models and can serve as a POC/POV for
road maintenance. See :ref:`sec_automm_detection_prepare_voc` for how to
prepare the Pothole dataset.

To start, let’s import MultiModalPredictor:

.. code:: python

    from autogluon.multimodal import MultiModalPredictor


Make sure ``mmcv-full`` and ``mmdet`` are installed:

.. code:: python

    !mim install mmcv-full
    !pip install mmdet


.. parsed-literal::
    :class: output

    Looking in links: https://download.openmmlab.com/mmcv/dist/cu117/torch1.13.0/index.html
    Requirement already satisfied: mmcv-full in /home/ci/opt/venv/lib/python3.8/site-packages (1.7.1)
    Requirement already satisfied: opencv-python>=3 in /home/ci/opt/venv/lib/python3.8/site-packages (from mmcv-full) (4.7.0.68)
    Requirement already satisfied: addict in /home/ci/opt/venv/lib/python3.8/site-packages (from mmcv-full) (2.4.0)
    Requirement already satisfied: numpy in /home/ci/opt/venv/lib/python3.8/site-packages (from mmcv-full) (1.23.5)
    Requirement already satisfied: packaging in /home/ci/opt/venv/lib/python3.8/site-packages (from mmcv-full) (23.0)
    Requirement already satisfied: Pillow in /home/ci/opt/venv/lib/python3.8/site-packages (from mmcv-full) (9.4.0)
    Requirement already satisfied: pyyaml in /home/ci/opt/venv/lib/python3.8/site-packages (from mmcv-full) (5.4.1)
    Requirement already satisfied: yapf in /home/ci/opt/venv/lib/python3.8/site-packages (from mmcv-full) (0.32.0)
    Requirement already satisfied: mmdet in /home/ci/opt/venv/lib/python3.8/site-packages (2.28.1)
    Requirement already satisfied: terminaltables in /home/ci/opt/venv/lib/python3.8/site-packages (from mmdet) (3.1.10)
    Requirement already satisfied: scipy in /home/ci/opt/venv/lib/python3.8/site-packages (from mmdet) (1.10.0)
    Requirement already satisfied: matplotlib in /home/ci/opt/venv/lib/python3.8/site-packages (from mmdet) (3.6.3)
    Requirement already satisfied: pycocotools in /home/ci/opt/venv/lib/python3.8/site-packages (from mmdet) (2.0.6)
    Requirement already satisfied: six in /home/ci/opt/venv/lib/python3.8/site-packages (from mmdet) (1.16.0)
    Requirement already satisfied: numpy in /home/ci/opt/venv/lib/python3.8/site-packages (from mmdet) (1.23.5)
    Requirement already satisfied: kiwisolver>=1.0.1 in /home/ci/opt/venv/lib/python3.8/site-packages (from matplotlib->mmdet) (1.4.4)
    Requirement already satisfied: packaging>=20.0 in /home/ci/opt/venv/lib/python3.8/site-packages (from matplotlib->mmdet) (23.0)
    Requirement already satisfied: pillow>=6.2.0 in /home/ci/opt/venv/lib/python3.8/site-packages (from matplotlib->mmdet) (9.4.0)
    Requirement already satisfied: pyparsing>=2.2.1 in /home/ci/opt/venv/lib/python3.8/site-packages (from matplotlib->mmdet) (3.0.9)
    Requirement already satisfied: cycler>=0.10 in /home/ci/opt/venv/lib/python3.8/site-packages (from matplotlib->mmdet) (0.11.0)
    Requirement already satisfied: fonttools>=4.22.0 in /home/ci/opt/venv/lib/python3.8/site-packages (from matplotlib->mmdet) (4.38.0)
    Requirement already satisfied: contourpy>=1.0.1 in /home/ci/opt/venv/lib/python3.8/site-packages (from matplotlib->mmdet) (1.0.7)
    Requirement already satisfied: python-dateutil>=2.7 in /home/ci/opt/venv/lib/python3.8/site-packages (from matplotlib->mmdet) (2.8.2)


Next, import the other packages used in this tutorial:

.. code:: python

    import os
    import time
    
    from autogluon.core.utils.loaders import load_zip

We have the sample dataset ready in the cloud. Let’s download it:

.. code:: python

    zip_file = "https://automl-mm-bench.s3.amazonaws.com/object_detection/dataset/pothole.zip"
    download_dir = "./pothole"
    
    load_zip.unzip(zip_file, unzip_dir=download_dir)
    data_dir = os.path.join(download_dir, "pothole")
    train_path = os.path.join(data_dir, "Annotations", "usersplit_train_cocoformat.json")
    val_path = os.path.join(data_dir, "Annotations", "usersplit_val_cocoformat.json")
    test_path = os.path.join(data_dir, "Annotations", "usersplit_test_cocoformat.json")


.. parsed-literal::
    :class: output

    Downloading ./pothole/file.zip from https://automl-mm-bench.s3.amazonaws.com/object_detection/dataset/pothole.zip...


.. parsed-literal::
    :class: output

    100%|██████████| 351M/351M [00:06<00:00, 50.8MiB/s]


When using a COCO-format dataset, the input is the JSON annotation file
of a dataset split. In this example,
``usersplit_train_cocoformat.json``, ``usersplit_val_cocoformat.json``,
and ``usersplit_test_cocoformat.json`` are the annotation files of the
train, validation, and test splits, respectively.
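
For orientation, a COCO-format annotation file is a JSON object whose
main keys are ``images``, ``annotations``, and ``categories``, with each
box stored as ``[x, y, width, height]``. The sketch below builds a tiny
hand-written example (the file names, sizes, and box values are made up
for illustration and are not taken from the Pothole files) and counts
the boxes per image:

```python
import json
from collections import Counter

# A tiny, made-up COCO-style annotation object (illustrative values only).
coco_json = """
{
  "images": [
    {"id": 1, "file_name": "example_1.jpg", "width": 400, "height": 300},
    {"id": 2, "file_name": "example_2.jpg", "width": 400, "height": 300}
  ],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 1, "bbox": [50, 60, 120, 80], "area": 9600, "iscrowd": 0},
    {"id": 11, "image_id": 1, "category_id": 1, "bbox": [200, 150, 40, 30], "area": 1200, "iscrowd": 0},
    {"id": 12, "image_id": 2, "category_id": 1, "bbox": [10, 20, 100, 50], "area": 5000, "iscrowd": 0}
  ],
  "categories": [{"id": 1, "name": "pothole"}]
}
"""

coco = json.loads(coco_json)

# Category names come from the "categories" list.
category_names = [c["name"] for c in coco["categories"]]

# Each annotation points at its image through "image_id".
boxes_per_image = Counter(a["image_id"] for a in coco["annotations"])

print("categories:", category_names)
print("boxes per image:", dict(boxes_per_image))
```

The same loading code could be pointed at ``train_path`` (with
``json.load`` on the opened file) to inspect the real annotation file.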

We select YOLOv3 with a MobileNetV2 backbone and a 320x320 input
resolution, pretrained on the COCO dataset. This setting is fast to
finetune and to run inference with, and easy to deploy. We also use all
available GPUs (if any):

.. code:: python

    checkpoint_name = "yolov3_mobilenetv2_320_300e_coco"
    num_gpus = -1  # use all GPUs

We create the MultiModalPredictor with the selected checkpoint name and
number of GPUs. We need to set ``problem_type`` to
``"object_detection"`` and provide a ``sample_data_path`` so the
predictor can infer the categories of the dataset. Here we provide
``train_path``, but any other split of this dataset works as well.

.. code:: python

    predictor = MultiModalPredictor(
        hyperparameters={
            "model.mmdet_image.checkpoint_name": checkpoint_name,
            "env.num_gpus": num_gpus,
        },
        problem_type="object_detection",
        sample_data_path=train_path,
    )


.. parsed-literal::
    :class: output

    processing yolov3_mobilenetv2_320_300e_coco...


.. parsed-literal::
    :class: output

    Successfully downloaded yolov3_mobilenetv2_320_300e_coco_20210719_215349-d18dff72.pth to /home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune
    Successfully dumped yolov3_mobilenetv2_320_300e_coco.py to /home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune
    processing yolov3_mobilenetv2_320_300e_coco...
    yolov3_mobilenetv2_320_300e_coco_20210719_215349-d18dff72.pth exists in /home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune
    Successfully dumped yolov3_mobilenetv2_320_300e_coco.py to /home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune
    load checkpoint from local path: yolov3_mobilenetv2_320_300e_coco_20210719_215349-d18dff72.pth
    The model and loaded state dict do not match exactly
    
    size mismatch for bbox_head.convs_pred.0.weight: copying a param with shape torch.Size([255, 96, 1, 1]) from checkpoint, the shape in current model is torch.Size([18, 96, 1, 1]).
    size mismatch for bbox_head.convs_pred.0.bias: copying a param with shape torch.Size([255]) from checkpoint, the shape in current model is torch.Size([18]).
    size mismatch for bbox_head.convs_pred.1.weight: copying a param with shape torch.Size([255, 96, 1, 1]) from checkpoint, the shape in current model is torch.Size([18, 96, 1, 1]).
    size mismatch for bbox_head.convs_pred.1.bias: copying a param with shape torch.Size([255]) from checkpoint, the shape in current model is torch.Size([18]).
    size mismatch for bbox_head.convs_pred.2.weight: copying a param with shape torch.Size([255, 96, 1, 1]) from checkpoint, the shape in current model is torch.Size([18, 96, 1, 1]).
    size mismatch for bbox_head.convs_pred.2.bias: copying a param with shape torch.Size([255]) from checkpoint, the shape in current model is torch.Size([18]).
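
The size-mismatch messages above concern only the prediction layers of
the detection head. In YOLOv3, each prediction convolution outputs
``num_anchors * (5 + num_classes)`` channels (4 box coordinates, 1
objectness score, and the class scores). Assuming the standard 3 anchors
per scale, the 80-class COCO checkpoint has 255 output channels, while a
single-class model has 18, which matches the shapes in the log. This
appears to be why these layers are not loaded from the checkpoint. A
quick check of that arithmetic:

```python
def yolo_pred_channels(num_classes, num_anchors=3):
    """Output channels of one YOLOv3 prediction convolution.

    Each anchor predicts 4 box coordinates, 1 objectness score, and one
    score per class. num_anchors=3 is the standard YOLOv3 setting and is
    an assumption here, not something read from the checkpoint config.
    """
    return num_anchors * (5 + num_classes)


coco_channels = yolo_pred_channels(num_classes=80)    # pretrained COCO head
pothole_channels = yolo_pred_channels(num_classes=1)  # single "pothole" class

print(coco_channels, pothole_channels)
```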


We set the learning rate to ``2e-4``. Note that finetuning uses a
two-stage learning rate by default, in which the model head gets a 100x
learning rate. Applying the high learning rate only to the head layers
makes the model converge faster during finetuning, and it usually gives
better performance as well, especially on small datasets with hundreds
or thousands of images. We also set the number of epochs to 30 for fast
finetuning and the per-GPU batch size to 32. Finally, we time the fit
process to get a sense of the speed.
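
As a rough illustration of the two-stage learning rate, the sketch below
assigns a 100x learning rate to head parameters and the base rate to
everything else. The ``bbox_head`` prefix is taken from the parameter
names in the log above, but the grouping rule itself and the helper name
``assign_learning_rates`` are assumptions for illustration, not AutoMM's
actual implementation:

```python
def assign_learning_rates(param_names, base_lr=2e-4, head_multiplier=100, head_prefix="bbox_head"):
    """Map each parameter name to a learning rate (hypothetical helper).

    Parameters whose name starts with `head_prefix` get
    base_lr * head_multiplier; all others get base_lr.
    """
    return {
        name: base_lr * head_multiplier if name.startswith(head_prefix) else base_lr
        for name in param_names
    }


# Illustrative parameter names; only the bbox_head name appears in the log.
names = [
    "backbone.conv1.weight",
    "neck.conv.weight",
    "bbox_head.convs_pred.0.weight",
]

learning_rates = assign_learning_rates(names)
for name, lr in learning_rates.items():
    print(f"{name}: lr={lr:g}")
```

In the fit call that follows, only the base rate
(``optimization.learning_rate``) is set explicitly.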

.. code:: python

    import time
    start = time.time()
    predictor.fit(
        train_path,
        hyperparameters={
            "optimization.learning_rate": 2e-4, # we use two stage and detection head has 100x lr
            "optimization.max_epochs": 30,
            "env.per_gpu_batch_size": 32,  # decrease it when model is large
        },
    )
    end = time.time()


.. parsed-literal::
    :class: output

    Using default root folder: ./pothole/pothole/Annotations/... Specify `root=...` if you feel it is wrong...
    Global seed set to 123
    No path specified. Models will be saved in: "AutogluonModels/ag-20230214_015606/"


.. parsed-literal::
    :class: output

    loading annotations into memory...
    Done (t=0.00s)
    creating index...
    index created!


.. parsed-literal::
    :class: output

    AutoMM starts to create your model. ✨
    
    - Model will be saved to "/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230214_015606".
    
    - Validation metric is "map".
    
    - To track the learning progress, you can open a terminal and launch Tensorboard:
        ```shell
        # Assume you have installed tensorboard
        tensorboard --logdir /home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230214_015606
        ```
    
    Enjoy your coffee, and let AutoMM do the job ☕☕☕ Learn more at https://auto.gluon.ai
    
    /home/ci/opt/venv/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:577: LightningDeprecationWarning: The Trainer argument `auto_select_gpus` has been deprecated in v1.9.0 and will be removed in v2.0.0. Please use the function `pytorch_lightning.accelerators.find_usable_cuda_devices` instead.
      rank_zero_deprecation(
    GPU available: True (cuda), used: True
    TPU available: False, using: 0 TPU cores
    IPU available: False, using: 0 IPUs
    HPU available: False, using: 0 HPUs
    `Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
    LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
    
      | Name              | Type                             | Params
    -----------------------------------------------------------------------
    0 | model             | MMDetAutoModelForObjectDetection | 3.7 M 
    1 | validation_metric | MeanAveragePrecision             | 0     
    -----------------------------------------------------------------------
    3.7 M     Trainable params
    0         Non-trainable params
    3.7 M     Total params
    14.675    Total estimated model params size (MB)
    Epoch 2, global step 9: 'val_map' reached 0.00517 (best 0.00517), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230214_015606/epoch=2-step=9.ckpt' as top 1
    Epoch 5, global step 18: 'val_map' reached 0.06065 (best 0.06065), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230214_015606/epoch=5-step=18.ckpt' as top 1
    Epoch 8, global step 27: 'val_map' reached 0.14553 (best 0.14553), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230214_015606/epoch=8-step=27.ckpt' as top 1
    Epoch 11, global step 36: 'val_map' reached 0.18481 (best 0.18481), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230214_015606/epoch=11-step=36.ckpt' as top 1
    Epoch 14, global step 45: 'val_map' reached 0.22442 (best 0.22442), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230214_015606/epoch=14-step=45.ckpt' as top 1
    Epoch 17, global step 54: 'val_map' reached 0.25855 (best 0.25855), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230214_015606/epoch=17-step=54.ckpt' as top 1
    Epoch 20, global step 63: 'val_map' reached 0.26429 (best 0.26429), saving model to '/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230214_015606/epoch=20-step=63.ckpt' as top 1
    Epoch 23, global step 72: 'val_map' was not in top 1
    Epoch 26, global step 81: 'val_map' was not in top 1
    Epoch 29, global step 90: 'val_map' was not in top 1
    `Trainer.fit` stopped: `max_epochs=30` reached.
    AutoMM has created your model 🎉🎉🎉
    
    - To load the model, use the code below:
        ```python
        from autogluon.multimodal import MultiModalPredictor
        predictor = MultiModalPredictor.load("/home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230214_015606")
        ```
    
    - You can open a terminal and launch Tensorboard to visualize the training log:
        ```shell
        # Assume you have installed tensorboard
        tensorboard --logdir /home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230214_015606
        ```
    
    - If you are not satisfied with the model, try to increase the training time, 
    adjust the hyperparameters (https://auto.gluon.ai/stable/tutorials/multimodal/advanced_topics/customization.html),
    or post issues on GitHub: https://github.com/autogluon/autogluon
    
    
Print the elapsed time to see how fast the finetuning is:

.. code:: python

    print("This finetuning takes %.2f seconds." % (end - start))


.. parsed-literal::
    :class: output

    This finetuning takes 158.85 seconds.


To evaluate the model we just trained, run:

.. code:: python

    predictor.evaluate(test_path)


.. parsed-literal::
    :class: output

    Using default root folder: ./pothole/pothole/Annotations/... Specify `root=...` if you feel it is wrong...


.. parsed-literal::
    :class: output

    loading annotations into memory...
    Done (t=0.00s)
    creating index...
    index created!


.. parsed-literal::
    :class: output

    /home/ci/opt/venv/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:577: LightningDeprecationWarning: The Trainer argument `auto_select_gpus` has been deprecated in v1.9.0 and will be removed in v2.0.0. Please use the function `pytorch_lightning.accelerators.find_usable_cuda_devices` instead.
      rank_zero_deprecation(
    A new predictor save path is created.This is to prevent you to overwrite previous predictor saved here.You could check current save path at predictor._save_path.If you still want to use this path, set resume=True
    No path specified. Models will be saved in: "AutogluonModels/ag-20230214_015846/"


.. parsed-literal::
    :class: output

    saving file at /home/ci/autogluon/docs/_build/eval/tutorials/multimodal/object_detection/finetune/AutogluonModels/ag-20230214_015846/object_detection_result_cache.json
    loading annotations into memory...
    Done (t=0.00s)
    creating index...
    index created!
    Loading and preparing results...
    DONE (t=0.01s)
    creating index...
    index created!
    Running per image evaluation...
    Evaluate annotation type *bbox*
    DONE (t=0.24s).
    Accumulating evaluation results...
    DONE (t=0.04s).
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.225
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.536
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.178
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.047
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.220
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.390
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.171
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.343
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.390
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.237
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.393
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.509


.. parsed-literal::
    :class: output

    {'map': 0.2252467580757856,
     'mean_average_precision': 0.2252467580757856,
     'map_50': 0.5361216854926647,
     'map_75': 0.17767789882203627,
     'map_small': 0.047494345058986594,
     'map_medium': 0.21996057096849003,
     'map_large': 0.390208121378859,
     'mar_1': 0.17138643067846607,
     'mar_10': 0.3427728613569322,
     'mar_100': 0.3902654867256637,
     'mar_small': 0.23661971830985915,
     'mar_medium': 0.3933701657458563,
     'mar_large': 0.5091954022988506}


The evaluation results are shown in the command-line output. The first
value is the mAP under the COCO standard (averaged over IoU thresholds
0.50:0.95), and the second is the mAP under the VOC standard (mAP50, at
IoU 0.50). In the returned dictionary these are ``map`` and ``map_50``.
For more details about these metrics, see `COCO’s evaluation
guideline <https://cocodataset.org/#detection-eval>`__.
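
Both metrics are built on the intersection over union (IoU) between a
predicted box and a ground-truth box: mAP50 counts a prediction as a
match when its IoU is at least 0.50, while COCO mAP averages over
thresholds from 0.50 to 0.95. A minimal IoU sketch, assuming boxes given
as ``[x1, y1, x2, y2]`` corners (the box values are made up):

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes in [x1, y1, x2, y2] format."""
    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


ground_truth = [0, 0, 10, 10]
prediction = [5, 0, 15, 10]  # shifted right by half the box width

iou = box_iou(ground_truth, prediction)  # overlap 50 / union 150
print(f"IoU = {iou:.3f}")
print("match at IoU threshold 0.50:", iou >= 0.50)
```

A prediction shifted by half a box width therefore already falls below
the 0.50 threshold, which is why localization quality matters for both
metrics.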

We can get the predictions on the test set:

.. code:: python

    pred = predictor.predict(test_path)


.. parsed-literal::
    :class: output

    Using default root folder: ./pothole/pothole/Annotations/... Specify `root=...` if you feel it is wrong...


.. parsed-literal::
    :class: output

    loading annotations into memory...
    Done (t=0.00s)
    creating index...
    index created!


.. parsed-literal::
    :class: output

    /home/ci/opt/venv/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:577: LightningDeprecationWarning: The Trainer argument `auto_select_gpus` has been deprecated in v1.9.0 and will be removed in v2.0.0. Please use the function `pytorch_lightning.accelerators.find_usable_cuda_devices` instead.
      rank_zero_deprecation(


Let’s also visualize the prediction result:

.. code:: python

    !pip install opencv-python


.. parsed-literal::
    :class: output

    Requirement already satisfied: opencv-python in /home/ci/opt/venv/lib/python3.8/site-packages (4.7.0.68)
    Requirement already satisfied: numpy>=1.17.0 in /home/ci/opt/venv/lib/python3.8/site-packages (from opencv-python) (1.23.5)


.. code:: python

    from IPython.display import display
    from PIL import Image

    from autogluon.multimodal.utils import visualize_detection

    conf_threshold = 0.25  # Specify a confidence threshold to filter out unwanted boxes
    visualization_result_dir = "./"  # Use the pwd as result dir to save the visualized image
    visualized = visualize_detection(
        pred=pred[12:13],  # visualize a single test image
        detection_classes=predictor.get_predictor_classes(),
        conf_threshold=conf_threshold,
        visualization_result_dir=visualization_result_dir,
    )
    # Reverse the channel order before handing the array to PIL
    img = Image.fromarray(visualized[0][:, :, ::-1], 'RGB')
    display(img)


.. parsed-literal::
    :class: output

    Saved visualizations to ./


.. figure:: output_detection_fast_finetune_coco_631583_22_1.png
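
The ``conf_threshold`` above controls which boxes are drawn. The same
idea can be applied to raw detections; the sketch below assumes a
hypothetical list of detections stored as dictionaries with ``bbox`` and
``score`` keys, which is not necessarily the structure of ``pred``
returned by the predictor:

```python
def filter_by_confidence(detections, conf_threshold=0.25):
    """Keep detections whose score is at least conf_threshold."""
    return [d for d in detections if d["score"] >= conf_threshold]


# Made-up detections for illustration only.
detections = [
    {"bbox": [12, 30, 80, 95], "score": 0.91},
    {"bbox": [100, 40, 160, 90], "score": 0.31},
    {"bbox": [5, 5, 20, 20], "score": 0.08},
]

kept = filter_by_confidence(detections, conf_threshold=0.25)
print(f"kept {len(kept)} of {len(detections)} boxes")
```

Raising the threshold trades recall for fewer false positives in the
visualization.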


Under this fast finetune setting, we reached a good mAP on a new dataset
in a few hundred seconds! To finetune for higher performance, see
:ref:`sec_automm_detection_high_ft_coco`, where we finetuned a VFNet
model for 5 hours and reached ``mAP = 0.450, mAP50 = 0.718`` on this
dataset.

Other Examples
~~~~~~~~~~~~~~

See `AutoMM
Examples <https://github.com/autogluon/autogluon/tree/master/examples/automm>`__
for other examples of AutoMM.

Customization
~~~~~~~~~~~~~

To learn how to customize AutoMM, please refer to
:ref:`sec_automm_customization`.

Citation
~~~~~~~~

::

   @misc{redmon2018yolov3,
       title={YOLOv3: An Incremental Improvement},
       author={Joseph Redmon and Ali Farhadi},
       year={2018},
       eprint={1804.02767},
       archivePrefix={arXiv},
       primaryClass={cs.CV}
   }