.. _sec_tabularprediction_text_multimodal:

Explore Models for Data Tables with Text and Categorical Features
=================================================================


This tutorial shows how to use AutoGluon on tabular data that mixes
text with categorical features. Such data is prevalent in real-world
applications. For example, a sentiment analysis model for users' tweets
can draw not only on the raw text of each tweet but also on other
features, such as the topic of the tweet and the user profile. In the
following, we investigate different ways to ensemble the
state-of-the-art (pretrained) language models in AutoGluon
TextPrediction with the other models used in AutoGluon's
TabularPredictor. For more details about the inner workings of the
neural network architecture used in AutoGluon TextPrediction, refer to
Section ":ref:`sec_textprediction_architecture`" in
:ref:`sec_textprediction_heterogeneous`.

.. code:: python

    %matplotlib inline
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    import pprint
    import random
    from autogluon.text import TextPrediction
    from autogluon.tabular import TabularPredictor
    import mxnet as mx
    
    # Fix the random seeds so that repeated runs are as reproducible as possible.
    np.random.seed(123)
    random.seed(123)
    mx.random.seed(123)

Product Sentiment Analysis Dataset
----------------------------------

We use the product sentiment analysis dataset from this `MachineHack
hackathon <https://www.machinehack.com/hackathons/product_sentiment_classification_weekend_hackathon_19/leaderboard>`__.
The goal of this task is to predict a user's sentiment towards a
product given a raw-text review and the product's type, e.g., Tablet,
Mobile, etc. We have split the original training data into 90% for
training and 10% for development.

.. code:: python

    !mkdir -p product_sentiment_machine_hack
    !wget https://autogluon-text-data.s3.amazonaws.com/multimodal_text/machine_hack_product_sentiment/train.csv -O product_sentiment_machine_hack/train.csv
    !wget https://autogluon-text-data.s3.amazonaws.com/multimodal_text/machine_hack_product_sentiment/dev.csv -O product_sentiment_machine_hack/dev.csv
    !wget https://autogluon-text-data.s3.amazonaws.com/multimodal_text/machine_hack_product_sentiment/test.csv -O product_sentiment_machine_hack/test.csv


.. parsed-literal::
    :class: output

    --2021-02-23 19:24:46--  https://autogluon-text-data.s3.amazonaws.com/multimodal_text/machine_hack_product_sentiment/train.csv
    Resolving autogluon-text-data.s3.amazonaws.com (autogluon-text-data.s3.amazonaws.com)... 52.216.101.195
    Connecting to autogluon-text-data.s3.amazonaws.com (autogluon-text-data.s3.amazonaws.com)|52.216.101.195|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 689486 (673K) [text/csv]
    Saving to: ‘product_sentiment_machine_hack/train.csv’
    
    product_sentiment_m 100%[===================>] 673.33K  2.09MB/s    in 0.3s    
    
    2021-02-23 19:24:46 (2.09 MB/s) - ‘product_sentiment_machine_hack/train.csv’ saved [689486/689486]
    
    --2021-02-23 19:24:47--  https://autogluon-text-data.s3.amazonaws.com/multimodal_text/machine_hack_product_sentiment/dev.csv
    Resolving autogluon-text-data.s3.amazonaws.com (autogluon-text-data.s3.amazonaws.com)... 52.216.101.195
    Connecting to autogluon-text-data.s3.amazonaws.com (autogluon-text-data.s3.amazonaws.com)|52.216.101.195|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 75517 (74K) [text/csv]
    Saving to: ‘product_sentiment_machine_hack/dev.csv’
    
    product_sentiment_m 100%[===================>]  73.75K  --.-KB/s    in 0.1s    
    
    2021-02-23 19:24:48 (508 KB/s) - ‘product_sentiment_machine_hack/dev.csv’ saved [75517/75517]
    
    --2021-02-23 19:24:48--  https://autogluon-text-data.s3.amazonaws.com/multimodal_text/machine_hack_product_sentiment/test.csv
    Resolving autogluon-text-data.s3.amazonaws.com (autogluon-text-data.s3.amazonaws.com)... 52.216.101.195
    Connecting to autogluon-text-data.s3.amazonaws.com (autogluon-text-data.s3.amazonaws.com)|52.216.101.195|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 312194 (305K) [text/csv]
    Saving to: ‘product_sentiment_machine_hack/test.csv’
    
    product_sentiment_m 100%[===================>] 304.88K  1.18MB/s    in 0.3s    
    
    2021-02-23 19:24:49 (1.18 MB/s) - ‘product_sentiment_machine_hack/test.csv’ saved [312194/312194]
    

.. code:: python

    feature_columns = ['Product_Description', 'Product_Type']
    label = 'Sentiment'
    
    train_df = pd.read_csv('product_sentiment_machine_hack/train.csv')
    dev_df = pd.read_csv('product_sentiment_machine_hack/dev.csv')
    test_df = pd.read_csv('product_sentiment_machine_hack/test.csv')
    
    train_df = train_df[feature_columns + [label]]
    dev_df = dev_df[feature_columns + [label]]
    test_df = test_df[feature_columns]
    print('Number of training samples:', len(train_df))
    print('Number of dev samples:', len(dev_df))
    print('Number of test samples:', len(test_df))


.. parsed-literal::
    :class: output

    Number of training samples: 5727
    Number of dev samples: 637
    Number of test samples: 2728


The dataset has two features: the user's review of the product and the
product's type, which is stored as an integer code. The label takes one
of four classes, and the train and dev sets were produced by stratified
sampling.
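
The exact procedure used to produce the split is not shown in this
tutorial. As a rough illustration of what stratified sampling means, the
following standard-library sketch splits a list of labels so that each
class contributes about the same fraction to the dev set. The function
name ``stratified_split`` and the toy labels are hypothetical and are
not part of AutoGluon or of the dataset.

.. code:: python

    import random
    from collections import defaultdict

    def stratified_split(labels, dev_frac=0.1, seed=123):
        """Return (train_idx, dev_idx) with roughly dev_frac of every class in dev."""
        rng = random.Random(seed)
        # Group row indices by their class label.
        by_class = defaultdict(list)
        for idx, y in enumerate(labels):
            by_class[y].append(idx)
        train_idx, dev_idx = [], []
        for y in sorted(by_class):
            idxs = by_class[y]
            rng.shuffle(idxs)
            # Take the same fraction from every class.
            n_dev = round(len(idxs) * dev_frac)
            dev_idx.extend(idxs[:n_dev])
            train_idx.extend(idxs[n_dev:])
        return sorted(train_idx), sorted(dev_idx)

    # Toy, imbalanced labels (hypothetical data, not the real dataset).
    toy_labels = [0] * 10 + [1] * 30 + [2] * 60
    train_idx, dev_idx = stratified_split(toy_labels)
    print(len(train_idx), len(dev_idx))

In practice, a library routine such as scikit-learn's
``train_test_split`` with its ``stratify`` argument serves the same
purpose.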

.. code:: python

    train_df


.. raw:: html

    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }
    
        .dataframe tbody tr th {
            vertical-align: top;
        }
    
        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>Product_Description</th>
          <th>Product_Type</th>
          <th>Sentiment</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>Just heard that Apple is opening a store in do...</td>
          <td>2</td>
          <td>3</td>
        </tr>
        <tr>
          <th>1</th>
          <td>Tristan H, apture: being fast &amp;amp; iterative ...</td>
          <td>9</td>
          <td>2</td>
        </tr>
        <tr>
          <th>2</th>
          <td>Hey, you lucky dogs at #SXSW with iPads -- che...</td>
          <td>6</td>
          <td>3</td>
        </tr>
        <tr>
          <th>3</th>
          <td>RT @mention THIS was the best thing I saw at #...</td>
          <td>9</td>
          <td>2</td>
        </tr>
        <tr>
          <th>4</th>
          <td>Apple is opening temp retail store in Austin t...</td>
          <td>2</td>
          <td>3</td>
        </tr>
        <tr>
          <th>...</th>
          <td>...</td>
          <td>...</td>
          <td>...</td>
        </tr>
        <tr>
          <th>5722</th>
          <td>RT @mention At #SXSW and want to win an iPad? ...</td>
          <td>9</td>
          <td>2</td>
        </tr>
        <tr>
          <th>5723</th>
          <td>RT @mention I mean, sliced bread is great. But...</td>
          <td>3</td>
          <td>3</td>
        </tr>
        <tr>
          <th>5724</th>
          <td>Apple cited as the opposite of crowdsourcing -...</td>
          <td>2</td>
          <td>1</td>
        </tr>
        <tr>
          <th>5725</th>
          <td>Good CNN article on why #SXSW is important to ...</td>
          <td>7</td>
          <td>3</td>
        </tr>
        <tr>
          <th>5726</th>
          <td>ÛÏ@mention Google to Launch Major New Social ...</td>
          <td>3</td>
          <td>3</td>
        </tr>
      </tbody>
    </table>
    <p>5727 rows × 3 columns</p>
    </div>


.. code:: python

    dev_df


.. raw:: html

    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }
    
        .dataframe tbody tr th {
            vertical-align: top;
        }
    
        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>Product_Description</th>
          <th>Product_Type</th>
          <th>Sentiment</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>Do it. RT @mention Come party w/ Google tonigh...</td>
          <td>3</td>
          <td>3</td>
        </tr>
        <tr>
          <th>1</th>
          <td>Line for iPads at #SXSW. Doesn't look too bad!...</td>
          <td>6</td>
          <td>3</td>
        </tr>
        <tr>
          <th>2</th>
          <td>First up: iPad Design Headaches (2 Tablets, Ca...</td>
          <td>6</td>
          <td>2</td>
        </tr>
        <tr>
          <th>3</th>
          <td>#SXSW: Mint Talks Mobile App Development Chall...</td>
          <td>9</td>
          <td>2</td>
        </tr>
        <tr>
          <th>4</th>
          <td>ÛÏ@mention Apple store downtown Austin open t...</td>
          <td>9</td>
          <td>2</td>
        </tr>
        <tr>
          <th>...</th>
          <td>...</td>
          <td>...</td>
          <td>...</td>
        </tr>
        <tr>
          <th>632</th>
          <td>Bet on a GoogleBuzz-like #fail. People don't c...</td>
          <td>9</td>
          <td>0</td>
        </tr>
        <tr>
          <th>633</th>
          <td>RT &amp;gt; @mention Guy gets tattoo at SXSW so he...</td>
          <td>9</td>
          <td>2</td>
        </tr>
        <tr>
          <th>634</th>
          <td>#austinites #sxsw and check it out on #iphone ...</td>
          <td>9</td>
          <td>2</td>
        </tr>
        <tr>
          <th>635</th>
          <td>New @mention for iPhone+Android.. No more serv...</td>
          <td>0</td>
          <td>3</td>
        </tr>
        <tr>
          <th>636</th>
          <td>Why isn't news industry spending more R&amp;amp;D?...</td>
          <td>9</td>
          <td>2</td>
        </tr>
      </tbody>
    </table>
    <p>637 rows × 3 columns</p>
    </div>


.. code:: python

    test_df


.. raw:: html

    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }
    
        .dataframe tbody tr th {
            vertical-align: top;
        }
    
        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>Product_Description</th>
          <th>Product_Type</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>RT @mention Going to #SXSW? The new iPhone gui...</td>
          <td>7</td>
        </tr>
        <tr>
          <th>1</th>
          <td>RT @mention 95% of iPhone and Droid apps have ...</td>
          <td>9</td>
        </tr>
        <tr>
          <th>2</th>
          <td>RT @mention Thank you to @mention for letting ...</td>
          <td>9</td>
        </tr>
        <tr>
          <th>3</th>
          <td>#Thanks @mention we're lovin' the @mention app...</td>
          <td>7</td>
        </tr>
        <tr>
          <th>4</th>
          <td>At #sxsw? @mention / @mention wanna buy you a ...</td>
          <td>9</td>
        </tr>
        <tr>
          <th>...</th>
          <td>...</td>
          <td>...</td>
        </tr>
        <tr>
          <th>2723</th>
          <td>RT @mention eww and LOL. RT @mention Just saw ...</td>
          <td>9</td>
        </tr>
        <tr>
          <th>2724</th>
          <td>Free 22 track #sxsw sampler album on iTunes. #...</td>
          <td>9</td>
        </tr>
        <tr>
          <th>2725</th>
          <td>Setting up for the Google #gsdm  #sxsw party. ...</td>
          <td>3</td>
        </tr>
        <tr>
          <th>2726</th>
          <td>RT @mention #SXSW Come see Bitbop in Austin #g...</td>
          <td>9</td>
        </tr>
        <tr>
          <th>2727</th>
          <td>So many Google products. isn't it time to  tra...</td>
          <td>5</td>
        </tr>
      </tbody>
    </table>
    <p>2728 rows × 2 columns</p>
    </div>


What happens if we ignore all the non-text features?
----------------------------------------------------

First, let's ignore all the non-text features and use the
TextPrediction model in AutoGluon to train a predictor on the text
column only. Internally, this uses the ELECTRA-small model as the
backbone. As the evaluation at the end of this section shows, the
resulting accuracy on the development set is only about 0.67.

.. code:: python

    predictor_text_only = TextPrediction.fit(train_df[['Product_Description', 'Sentiment']],
                                             label=label,
                                             time_limits=None,
                                             ngpus_per_trial=1,
                                             hyperparameters='default_no_hpo',
                                             eval_metric='accuracy',
                                             stopping_metric='accuracy',
                                             output_directory='ag_text_only')


.. parsed-literal::
    :class: output

    2021-02-23 19:24:49,386 - autogluon.text.text_prediction.text_prediction - INFO - All Logs will be saved to ag_text_only/ag_text_prediction.log
    All Logs will be saved to ag_text_only/ag_text_prediction.log
    2021-02-23 19:24:49,404 - autogluon.text.text_prediction.text_prediction - INFO - Train Dataset:
    Train Dataset:
    2021-02-23 19:24:49,405 - autogluon.text.text_prediction.text_prediction - INFO - Columns:
    
    - Text(
       name="Product_Description"
       #total/missing=4581/0
       length, min/avg/max=11/104.81707050862258/170
    )
    - Categorical(
       name="Sentiment"
       #total/missing=4581/0
       num_class (total/non_special)=4/4
       categories=[0, 1, 2, 3]
       freq=[87, 280, 2721, 1493]
    )
    
    
    2021-02-23 19:24:49,406 - autogluon.text.text_prediction.text_prediction - INFO - Tuning Dataset:
    Tuning Dataset:
    2021-02-23 19:24:49,407 - autogluon.text.text_prediction.text_prediction - INFO - Columns:
    
    - Text(
       name="Product_Description"
       #total/missing=1146/0
       length, min/avg/max=29/104.95986038394415/178
    )
    - Categorical(
       name="Sentiment"
       #total/missing=1146/0
       num_class (total/non_special)=4/4
       categories=[0, 1, 2, 3]
       freq=[13, 79, 667, 387]
    )
    
    
    WARNING: changing multiprocessing start method to forkserver
    2021-02-23 19:24:49,415 - autogluon.text.text_prediction.text_prediction - INFO - All Logs will be saved to ag_text_only/main.log
    All Logs will be saved to ag_text_only/main.log


.. parsed-literal::
    :class: output

      0%|          | 0/1 [00:00<?, ?it/s]


.. parsed-literal::
    :class: output

    2021-02-23 19:26:08,587 - autogluon.text.text_prediction.text_prediction - INFO - Results=
    Results=
    2021-02-23 19:26:08,589 - autogluon.text.text_prediction.text_prediction - INFO - Best_config={'search_space▁optimization.lr': 5e-05}
    Best_config={'search_space▁optimization.lr': 5e-05}


.. parsed-literal::
    :class: output

    (task:0)	2021-02-23 19:24:52,491 - root - INFO - All Logs will be saved to /var/lib/jenkins/workspace/workspace/autogluon-tutorial-tabular-v3/docs/_build/eval/tutorials/tabular_prediction/ag_text_only/task0/training.log
    2021-02-23 19:24:52,491 - root - INFO - learning:
      early_stopping_patience: 10
      log_metrics: auto
      stop_metric: auto
      valid_ratio: 0.15
    misc:
      exp_dir: /var/lib/jenkins/workspace/workspace/autogluon-tutorial-tabular-v3/docs/_build/eval/tutorials/tabular_prediction/ag_text_only/task0
      seed: 123
    model:
      backbone:
        name: google_electra_small
      network:
        agg_net:
          activation: tanh
          agg_type: concat
          data_dropout: False
          dropout: 0.1
          feature_proj_num_layers: -1
          initializer:
            bias: ['zeros']
            weight: ['xavier', 'uniform', 'avg', 3.0]
          mid_units: 256
          norm_eps: 1e-05
          normalization: layer_norm
          out_proj_num_layers: 0
        categorical_net:
          activation: leaky
          data_dropout: False
          dropout: 0.1
          emb_units: 32
          initializer:
            bias: ['zeros']
            embed: ['xavier', 'gaussian', 'in', 1.0]
            weight: ['xavier', 'uniform', 'avg', 3.0]
          mid_units: 64
          norm_eps: 1e-05
          normalization: layer_norm
          num_layers: 1
        feature_units: -1
        initializer:
          bias: ['zeros']
          weight: ['truncnorm', 0, 0.02]
        numerical_net:
          activation: leaky
          data_dropout: False
          dropout: 0.1
          initializer:
            bias: ['zeros']
            weight: ['xavier', 'uniform', 'avg', 3.0]
          input_centering: False
          mid_units: 128
          norm_eps: 1e-05
          normalization: layer_norm
          num_layers: 1
        text_net:
          pool_type: cls
          use_segment_id: True
      preprocess:
        max_length: 128
        merge_text: True
    optimization:
      batch_size: 32
      begin_lr: 0.0
      final_lr: 0.0
      layerwise_lr_decay: 0.8
      log_frequency: 0.1
      lr: 5e-05
      lr_scheduler: triangular
      max_grad_norm: 1.0
      model_average: 5
      num_train_epochs: 4
      optimizer: adamw
      optimizer_params: [('beta1', 0.9), ('beta2', 0.999), ('epsilon', 1e-06), ('correct_bias', False)]
      per_device_batch_size: 16
      val_batch_size_mult: 2
      valid_frequency: 0.1
      warmup_portion: 0.1
      wd: 0.01
    version: 1
    2021-02-23 19:24:52,645 - root - INFO - Process training set...
    2021-02-23 19:24:56,192 - root - INFO - Done!
    2021-02-23 19:24:56,192 - root - INFO - Process dev set...
    2021-02-23 19:24:58,906 - root - INFO - Done!
    2021-02-23 19:25:04,337 - root - INFO - #Total Params/Fixed Params=13484036/0
    2021-02-23 19:25:04,352 - root - INFO - Using gradient accumulation. Global batch size = 32
    2021-02-23 19:25:06,689 - root - INFO - [Iter 15/572, Epoch 0] train loss=8.3855e-01, gnorm=1.2309e+01, lr=1.3158e-05, #samples processed=720, #sample per second=314.10
    2021-02-23 19:25:07,481 - root - INFO - [Iter 15/572, Epoch 0] valid accuracy=5.8028e-01, log_loss=1.1171e+00, accuracy=5.8028e-01, time spent=0.715s, total_time=0.05min
    2021-02-23 19:25:08,990 - root - INFO - [Iter 30/572, Epoch 0] train loss=6.4969e-01, gnorm=5.2289e+00, lr=2.6316e-05, #samples processed=720, #sample per second=312.86
    2021-02-23 19:25:09,823 - root - INFO - [Iter 30/572, Epoch 0] valid accuracy=5.8115e-01, log_loss=9.4872e-01, accuracy=5.8115e-01, time spent=0.695s, total_time=0.09min
    2021-02-23 19:25:11,273 - root - INFO - [Iter 45/572, Epoch 0] train loss=6.7008e-01, gnorm=7.3572e+00, lr=3.9474e-05, #samples processed=720, #sample per second=315.50
    2021-02-23 19:25:12,108 - root - INFO - [Iter 45/572, Epoch 0] valid accuracy=6.0384e-01, log_loss=8.8923e-01, accuracy=6.0384e-01, time spent=0.703s, total_time=0.13min
    2021-02-23 19:25:13,588 - root - INFO - [Iter 60/572, Epoch 0] train loss=6.6470e-01, gnorm=4.5690e+00, lr=4.9709e-05, #samples processed=720, #sample per second=311.03
    2021-02-23 19:25:14,416 - root - INFO - [Iter 60/572, Epoch 0] valid accuracy=6.2391e-01, log_loss=9.0283e-01, accuracy=6.2391e-01, time spent=0.696s, total_time=0.17min
    2021-02-23 19:25:15,839 - root - INFO - [Iter 75/572, Epoch 0] train loss=6.2934e-01, gnorm=4.4687e+00, lr=4.8252e-05, #samples processed=720, #sample per second=319.82
    2021-02-23 19:25:16,550 - root - INFO - [Iter 75/572, Epoch 0] valid accuracy=6.1257e-01, log_loss=8.9348e-01, accuracy=6.1257e-01, time spent=0.711s, total_time=0.20min
    2021-02-23 19:25:17,929 - root - INFO - [Iter 90/572, Epoch 0] train loss=6.5662e-01, gnorm=5.8075e+00, lr=4.6796e-05, #samples processed=720, #sample per second=344.50
    2021-02-23 19:25:18,641 - root - INFO - [Iter 90/572, Epoch 0] valid accuracy=6.1431e-01, log_loss=8.4047e-01, accuracy=6.1431e-01, time spent=0.711s, total_time=0.24min
    2021-02-23 19:25:20,112 - root - INFO - [Iter 105/572, Epoch 0] train loss=6.1765e-01, gnorm=4.4335e+00, lr=4.5340e-05, #samples processed=720, #sample per second=329.88
    2021-02-23 19:25:20,960 - root - INFO - [Iter 105/572, Epoch 0] valid accuracy=6.3264e-01, log_loss=8.3699e-01, accuracy=6.3264e-01, time spent=0.705s, total_time=0.28min
    2021-02-23 19:25:22,380 - root - INFO - [Iter 120/572, Epoch 0] train loss=5.8680e-01, gnorm=6.0388e+00, lr=4.3883e-05, #samples processed=720, #sample per second=317.53
    2021-02-23 19:25:23,089 - root - INFO - [Iter 120/572, Epoch 0] valid accuracy=6.1082e-01, log_loss=8.6694e-01, accuracy=6.1082e-01, time spent=0.709s, total_time=0.31min
    2021-02-23 19:25:24,448 - root - INFO - [Iter 135/572, Epoch 0] train loss=6.5235e-01, gnorm=5.0395e+00, lr=4.2427e-05, #samples processed=720, #sample per second=348.19
    2021-02-23 19:25:25,287 - root - INFO - [Iter 135/572, Epoch 0] valid accuracy=6.5096e-01, log_loss=8.1716e-01, accuracy=6.5096e-01, time spent=0.709s, total_time=0.35min
    2021-02-23 19:25:26,702 - root - INFO - [Iter 150/572, Epoch 1] train loss=6.0744e-01, gnorm=8.6245e+00, lr=4.0971e-05, #samples processed=698, #sample per second=309.71
    2021-02-23 19:25:27,411 - root - INFO - [Iter 150/572, Epoch 1] valid accuracy=6.4834e-01, log_loss=8.0773e-01, accuracy=6.4834e-01, time spent=0.709s, total_time=0.38min
    2021-02-23 19:25:28,826 - root - INFO - [Iter 165/572, Epoch 1] train loss=5.5537e-01, gnorm=9.9330e+00, lr=3.9515e-05, #samples processed=720, #sample per second=338.93
    2021-02-23 19:25:29,663 - root - INFO - [Iter 165/572, Epoch 1] valid accuracy=6.5445e-01, log_loss=8.1417e-01, accuracy=6.5445e-01, time spent=0.704s, total_time=0.42min
    2021-02-23 19:25:31,099 - root - INFO - [Iter 180/572, Epoch 1] train loss=5.6724e-01, gnorm=4.1054e+00, lr=3.8058e-05, #samples processed=720, #sample per second=316.83
    2021-02-23 19:25:31,810 - root - INFO - [Iter 180/572, Epoch 1] valid accuracy=6.5009e-01, log_loss=8.0914e-01, accuracy=6.5009e-01, time spent=0.711s, total_time=0.46min
    2021-02-23 19:25:33,226 - root - INFO - [Iter 195/572, Epoch 1] train loss=5.9127e-01, gnorm=1.0215e+01, lr=3.6602e-05, #samples processed=720, #sample per second=338.49
    2021-02-23 19:25:33,932 - root - INFO - [Iter 195/572, Epoch 1] valid accuracy=6.5271e-01, log_loss=7.9851e-01, accuracy=6.5271e-01, time spent=0.706s, total_time=0.49min
    2021-02-23 19:25:35,342 - root - INFO - [Iter 210/572, Epoch 1] train loss=5.3871e-01, gnorm=4.3828e+00, lr=3.5146e-05, #samples processed=720, #sample per second=340.33
    2021-02-23 19:25:36,061 - root - INFO - [Iter 210/572, Epoch 1] valid accuracy=6.3874e-01, log_loss=8.1257e-01, accuracy=6.3874e-01, time spent=0.718s, total_time=0.53min
    2021-02-23 19:25:37,499 - root - INFO - [Iter 225/572, Epoch 1] train loss=5.3807e-01, gnorm=5.2270e+00, lr=3.3689e-05, #samples processed=720, #sample per second=333.78
    2021-02-23 19:25:38,217 - root - INFO - [Iter 225/572, Epoch 1] valid accuracy=6.4660e-01, log_loss=7.9136e-01, accuracy=6.4660e-01, time spent=0.717s, total_time=0.56min
    2021-02-23 19:25:39,636 - root - INFO - [Iter 240/572, Epoch 1] train loss=5.8410e-01, gnorm=6.8311e+00, lr=3.2233e-05, #samples processed=720, #sample per second=336.94
    2021-02-23 19:25:40,466 - root - INFO - [Iter 240/572, Epoch 1] valid accuracy=6.6841e-01, log_loss=7.7052e-01, accuracy=6.6841e-01, time spent=0.700s, total_time=0.60min
    2021-02-23 19:25:41,844 - root - INFO - [Iter 255/572, Epoch 1] train loss=5.3738e-01, gnorm=5.7739e+00, lr=3.0777e-05, #samples processed=720, #sample per second=326.23
    2021-02-23 19:25:42,544 - root - INFO - [Iter 255/572, Epoch 1] valid accuracy=6.6754e-01, log_loss=7.8309e-01, accuracy=6.6754e-01, time spent=0.700s, total_time=0.64min
    2021-02-23 19:25:43,952 - root - INFO - [Iter 270/572, Epoch 1] train loss=5.0632e-01, gnorm=5.0476e+00, lr=2.9320e-05, #samples processed=720, #sample per second=341.52
    2021-02-23 19:25:44,676 - root - INFO - [Iter 270/572, Epoch 1] valid accuracy=6.5794e-01, log_loss=7.8638e-01, accuracy=6.5794e-01, time spent=0.724s, total_time=0.67min
    2021-02-23 19:25:46,088 - root - INFO - [Iter 285/572, Epoch 1] train loss=5.3142e-01, gnorm=5.6990e+00, lr=2.7864e-05, #samples processed=720, #sample per second=337.05
    2021-02-23 19:25:46,932 - root - INFO - [Iter 285/572, Epoch 1] valid accuracy=6.7016e-01, log_loss=7.8180e-01, accuracy=6.7016e-01, time spent=0.709s, total_time=0.71min
    2021-02-23 19:25:48,345 - root - INFO - [Iter 300/572, Epoch 2] train loss=5.3040e-01, gnorm=9.8988e+00, lr=2.6408e-05, #samples processed=709, #sample per second=314.15
    2021-02-23 19:25:49,061 - root - INFO - [Iter 300/572, Epoch 2] valid accuracy=6.6754e-01, log_loss=7.6951e-01, accuracy=6.6754e-01, time spent=0.715s, total_time=0.74min
    2021-02-23 19:25:50,485 - root - INFO - [Iter 315/572, Epoch 2] train loss=5.1332e-01, gnorm=1.2150e+01, lr=2.4951e-05, #samples processed=720, #sample per second=336.58
    2021-02-23 19:25:51,199 - root - INFO - [Iter 315/572, Epoch 2] valid accuracy=6.5620e-01, log_loss=7.7306e-01, accuracy=6.5620e-01, time spent=0.713s, total_time=0.78min
    2021-02-23 19:25:52,606 - root - INFO - [Iter 330/572, Epoch 2] train loss=5.3934e-01, gnorm=6.6843e+00, lr=2.3495e-05, #samples processed=720, #sample per second=339.58
    2021-02-23 19:25:53,322 - root - INFO - [Iter 330/572, Epoch 2] valid accuracy=6.4572e-01, log_loss=7.8320e-01, accuracy=6.4572e-01, time spent=0.716s, total_time=0.82min
    2021-02-23 19:25:54,722 - root - INFO - [Iter 345/572, Epoch 2] train loss=4.8354e-01, gnorm=6.3392e+00, lr=2.2039e-05, #samples processed=720, #sample per second=340.32
    2021-02-23 19:25:55,428 - root - INFO - [Iter 345/572, Epoch 2] valid accuracy=6.4834e-01, log_loss=8.2002e-01, accuracy=6.4834e-01, time spent=0.706s, total_time=0.85min
    2021-02-23 19:25:56,849 - root - INFO - [Iter 360/572, Epoch 2] train loss=5.3423e-01, gnorm=5.4433e+00, lr=2.0583e-05, #samples processed=720, #sample per second=338.53
    2021-02-23 19:25:57,560 - root - INFO - [Iter 360/572, Epoch 2] valid accuracy=6.5881e-01, log_loss=7.6452e-01, accuracy=6.5881e-01, time spent=0.711s, total_time=0.89min
    2021-02-23 19:25:58,981 - root - INFO - [Iter 375/572, Epoch 2] train loss=5.3697e-01, gnorm=6.2103e+00, lr=1.9126e-05, #samples processed=720, #sample per second=337.70
    2021-02-23 19:25:59,709 - root - INFO - [Iter 375/572, Epoch 2] valid accuracy=6.2042e-01, log_loss=8.2208e-01, accuracy=6.2042e-01, time spent=0.727s, total_time=0.92min
    2021-02-23 19:26:01,114 - root - INFO - [Iter 390/572, Epoch 2] train loss=5.7040e-01, gnorm=6.7707e+00, lr=1.7670e-05, #samples processed=720, #sample per second=337.59
    2021-02-23 19:26:01,836 - root - INFO - [Iter 390/572, Epoch 2] valid accuracy=6.5620e-01, log_loss=7.8493e-01, accuracy=6.5620e-01, time spent=0.722s, total_time=0.96min
    2021-02-23 19:26:03,272 - root - INFO - [Iter 405/572, Epoch 2] train loss=5.2676e-01, gnorm=8.0785e+00, lr=1.6214e-05, #samples processed=720, #sample per second=333.67
    2021-02-23 19:26:03,999 - root - INFO - [Iter 405/572, Epoch 2] valid accuracy=6.5096e-01, log_loss=7.7680e-01, accuracy=6.5096e-01, time spent=0.726s, total_time=0.99min
    2021-02-23 19:26:05,402 - root - INFO - [Iter 420/572, Epoch 2] train loss=5.1999e-01, gnorm=1.1199e+01, lr=1.4757e-05, #samples processed=720, #sample per second=338.01
    2021-02-23 19:26:06,132 - root - INFO - [Iter 420/572, Epoch 2] valid accuracy=6.6405e-01, log_loss=7.6699e-01, accuracy=6.6405e-01, time spent=0.730s, total_time=1.03min
    2021-02-23 19:26:07,538 - root - INFO - [Iter 435/572, Epoch 3] train loss=5.4638e-01, gnorm=9.2815e+00, lr=1.3301e-05, #samples processed=698, #sample per second=326.79
    2021-02-23 19:26:08,272 - root - INFO - [Iter 435/572, Epoch 3] valid accuracy=6.6143e-01, log_loss=7.6100e-01, accuracy=6.6143e-01, time spent=0.734s, total_time=1.06min
    2021-02-23 19:26:08,276 - root - INFO - Early stopping patience reached!


.. code:: python

    print(predictor_text_only.evaluate(dev_df[['Product_Description', 'Sentiment']], metrics='accuracy'))


.. parsed-literal::
    :class: output

    /var/lib/jenkins/workspace/workspace/autogluon-tutorial-tabular-v3/venv/lib/python3.8/site-packages/mxnet/gluon/block.py:995: UserWarning: The 3-th input to HybridBlock is not used by any computation. Is this intended?
      self._build_cache(*args)


.. parsed-literal::
    :class: output

    {'accuracy': 0.6671899529042387}
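
One way to put this accuracy in context is to compare it with a
majority-class baseline, that is, a model that always predicts the most
frequent sentiment. The sketch below uses the class frequencies reported
in the "Train Dataset" and "Tuning Dataset" summaries of the training
log above. It assumes that the dev set has a similar class distribution,
which is plausible for a stratified split but is not verified here.

.. code:: python

    # Class frequencies copied from the training log above
    # (categories 0, 1, 2, 3).
    train_freq = [87, 280, 2721, 1493]
    tuning_freq = [13, 79, 667, 387]

    def majority_baseline(freq):
        """Accuracy obtained by always predicting the most frequent class."""
        return max(freq) / sum(freq)

    print(f"train majority share:  {majority_baseline(train_freq):.4f}")
    print(f"tuning majority share: {majority_baseline(tuning_freq):.4f}")

Under that assumption, always predicting the majority class would
already be correct a little under 60% of the time, so the text-only
model improves on this trivial baseline by only a modest margin.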


Model 1: Baseline with N-Gram + TF-IDF
--------------------------------------

The first baseline calls AutoGluon's TabularPredictor directly.
TabularPredictor featurizes text columns with n-gram and TF-IDF based
features and models the text and categorical columns together.
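
To make the idea of n-gram and TF-IDF features concrete, here is a
minimal standard-library sketch. It is not AutoGluon's implementation:
the helper names ``ngrams`` and ``tfidf``, the whitespace tokenizer, the
smoothed IDF formula, and the three-sentence corpus are all assumptions
chosen for illustration.

.. code:: python

    import math
    from collections import Counter

    def ngrams(text, n_max=2):
        """Lowercase, split on whitespace, and return all 1..n_max-grams."""
        tokens = text.lower().split()
        return [" ".join(tokens[i:i + n])
                for n in range(1, n_max + 1)
                for i in range(len(tokens) - n + 1)]

    def tfidf(docs, n_max=2):
        """Return one {ngram: weight} dict per document."""
        counts = [Counter(ngrams(d, n_max)) for d in docs]
        n_docs = len(docs)
        # Document frequency: number of documents containing each n-gram.
        df = Counter(g for c in counts for g in c)
        # Smoothed inverse document frequency (one common variant).
        idf = {g: math.log((1 + n_docs) / (1 + df[g])) + 1 for g in df}
        # Weight = raw term count times inverse document frequency.
        return [{g: tf * idf[g] for g, tf in c.items()} for c in counts]

    # Hypothetical mini-corpus, not taken from the dataset.
    docs = ["great ipad app", "ipad line is long", "great great launch"]
    weights = tfidf(docs)
    print(sorted(weights[2].items(), key=lambda kv: -kv[1])[:3])

N-grams that are frequent within one document but rare across the
corpus receive the largest weights. TabularPredictor generates its text
features automatically, so the baseline itself needs only the single
call that follows.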

.. code:: python

    predictor_model1 = TabularPredictor(label=label, eval_metric='accuracy', path='model1').fit(train_df)


.. parsed-literal::
    :class: output

    Beginning AutoGluon training ...
    AutoGluon will save models to "model1/"
    AutoGluon Version:  0.1.0b20210223
    Train Data Rows:    5727
    Train Data Columns: 2
    Preprocessing data ...
    AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == int, but few unique label-values observed).
    	4 unique label values:  [3, 2, 1, 0]
    	If 'multiclass' is not the correct problem_type, please manually specify the problem_type argument in fit() (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
    NumExpr defaulting to 8 threads.
    Train Data Class Count: 4
    Using Feature Generators to preprocess the data ...
    Fitting AutoMLPipelineFeatureGenerator...
    	Available Memory:                    14382.9 MB
    	Train Data (Original)  Memory Usage: 1.0 MB (0.0% of available memory)
    	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
    	Stage 1 Generators:
    		Fitting AsTypeFeatureGenerator...
    	Stage 2 Generators:
    		Fitting FillNaFeatureGenerator...
    	Stage 3 Generators:
    		Fitting IdentityFeatureGenerator...
    		Fitting CategoryFeatureGenerator...
    			Fitting CategoryMemoryMinimizeFeatureGenerator...
    		Fitting TextSpecialFeatureGenerator...
    			Fitting BinnedFeatureGenerator...
    			Fitting DropDuplicatesFeatureGenerator...
    		Fitting TextNgramFeatureGenerator...
    			Fitting CountVectorizer for text features: ['Product_Description']
    			CountVectorizer fit with vocabulary size = 725
    		Warning: Due to memory constraints, ngram feature count is being reduced. Allocate more memory to maximize model quality.
    		Reducing Vectorizer vocab size from 725 to 354 to avoid OOM error
    	Stage 4 Generators:
    		Fitting DropUniqueFeatureGenerator...
    	Types of features in original data (raw dtype, special dtypes):
    		('int', [])          : 1 | ['Product_Type']
    		('object', ['text']) : 1 | ['Product_Description']
    	Types of features in processed data (raw dtype, special dtypes):
    		('int', [])                         :   1 | ['Product_Type']
    		('int', ['binned', 'text_special']) :  38 | ['Product_Description.char_count', 'Product_Description.word_count', 'Product_Description.capital_ratio', 'Product_Description.lower_ratio', 'Product_Description.digit_ratio', ...]
    		('int', ['text_ngram'])             : 355 | ['__nlp__.10', '__nlp__.11', '__nlp__.6th', '__nlp__.about', '__nlp__.all', ...]
    	2.1s = Fit runtime
    	2 features in original data used to generate 394 features in processed data.
    	Train Data (Processed) Memory Usage: 2.31 MB (0.0% of available memory)
    Data preprocessing and feature engineering runtime = 2.16s ...
    AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
    	To change this, specify the eval_metric argument of fit()
    Automatically generating train/validation split with holdout_frac=0.1, Train Rows: 5154, Val Rows: 573
    Fitting model: NeuralNetMXNet ...
    	0.8726	 = Validation accuracy score
    	4.1s	 = Training runtime
    	0.03s	 = Validation runtime
    Fitting model: NeuralNetFastAI ...
    /var/lib/jenkins/workspace/workspace/autogluon-tutorial-tabular-v3/venv/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 10010). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at  /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
      return torch._C._cuda_getDeviceCount() > 0


.. parsed-literal::
    :class: output

    	0.8482	 = Validation accuracy score
    	12.06s	 = Training runtime
    	0.36s	 = Validation runtime
    Fitting model: KNeighborsUnif ...
    	0.8534	 = Validation accuracy score
    	0.02s	 = Training runtime
    	0.02s	 = Validation runtime
    Fitting model: KNeighborsDist ...
    	0.8534	 = Validation accuracy score
    	0.02s	 = Training runtime
    	0.02s	 = Validation runtime
    Fitting model: RandomForestGini ...
    	0.8709	 = Validation accuracy score
    	1.03s	 = Training runtime
    	0.08s	 = Validation runtime
    Fitting model: RandomForestEntr ...
    	0.8709	 = Validation accuracy score
    	1.04s	 = Training runtime
    	0.08s	 = Validation runtime
    Fitting model: ExtraTreesGini ...
    	0.8464	 = Validation accuracy score
    	1.15s	 = Training runtime
    	0.08s	 = Validation runtime
    Fitting model: ExtraTreesEntr ...
    	0.8464	 = Validation accuracy score
    	1.15s	 = Training runtime
    	0.08s	 = Validation runtime
    Fitting model: LightGBM ...
    	0.8831	 = Validation accuracy score
    	1.08s	 = Training runtime
    	0.01s	 = Validation runtime
    Fitting model: LightGBMXT ...
    	0.8534	 = Validation accuracy score
    	1.19s	 = Training runtime
    	0.01s	 = Validation runtime
    Fitting model: CatBoost ...
    	0.8726	 = Validation accuracy score
    	1.04s	 = Training runtime
    	0.01s	 = Validation runtime
    Fitting model: XGBoost ...
    	0.8778	 = Validation accuracy score
    	1.84s	 = Training runtime
    	0.02s	 = Validation runtime
    Fitting model: LightGBMLarge ...
    	0.8813	 = Validation accuracy score
    	3.45s	 = Training runtime
    	0.01s	 = Validation runtime


.. parsed-literal::
    :class: output

    Fitting model: WeightedEnsemble_L1 ...
    	0.8883	 = Validation accuracy score
    	0.37s	 = Training runtime
    	0.0s	 = Validation runtime
    AutoGluon training complete, total runtime = 44.09s ...
    TabularPredictor saved. To load, use: TabularPredictor.load("model1/")


.. code:: python

    predictor_model1.leaderboard(dev_df, silent=True)



.. raw:: html

    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }
    
        .dataframe tbody tr th {
            vertical-align: top;
        }
    
        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>model</th>
          <th>score_test</th>
          <th>score_val</th>
          <th>pred_time_test</th>
          <th>pred_time_val</th>
          <th>fit_time</th>
          <th>pred_time_test_marginal</th>
          <th>pred_time_val_marginal</th>
          <th>fit_time_marginal</th>
          <th>stack_level</th>
          <th>can_infer</th>
          <th>fit_order</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>WeightedEnsemble_L1</td>
          <td>0.890110</td>
          <td>0.888307</td>
          <td>10.636745</td>
          <td>0.579389</td>
          <td>20.212959</td>
          <td>0.005916</td>
          <td>0.000465</td>
          <td>0.374030</td>
          <td>1</td>
          <td>True</td>
          <td>14</td>
        </tr>
        <tr>
          <th>1</th>
          <td>LightGBMLarge</td>
          <td>0.886970</td>
          <td>0.881326</td>
          <td>0.015931</td>
          <td>0.006296</td>
          <td>3.445040</td>
          <td>0.015931</td>
          <td>0.006296</td>
          <td>3.445040</td>
          <td>0</td>
          <td>True</td>
          <td>13</td>
        </tr>
        <tr>
          <th>2</th>
          <td>CatBoost</td>
          <td>0.886970</td>
          <td>0.872600</td>
          <td>0.018929</td>
          <td>0.006263</td>
          <td>1.039262</td>
          <td>0.018929</td>
          <td>0.006263</td>
          <td>1.039262</td>
          <td>0</td>
          <td>True</td>
          <td>11</td>
        </tr>
        <tr>
          <th>3</th>
          <td>RandomForestGini</td>
          <td>0.886970</td>
          <td>0.870855</td>
          <td>0.115808</td>
          <td>0.077118</td>
          <td>1.025226</td>
          <td>0.115808</td>
          <td>0.077118</td>
          <td>1.025226</td>
          <td>0</td>
          <td>True</td>
          <td>5</td>
        </tr>
        <tr>
          <th>4</th>
          <td>XGBoost</td>
          <td>0.885400</td>
          <td>0.877836</td>
          <td>0.079946</td>
          <td>0.019477</td>
          <td>1.838979</td>
          <td>0.079946</td>
          <td>0.019477</td>
          <td>1.838979</td>
          <td>0</td>
          <td>True</td>
          <td>12</td>
        </tr>
        <tr>
          <th>5</th>
          <td>RandomForestEntr</td>
          <td>0.885400</td>
          <td>0.870855</td>
          <td>0.118663</td>
          <td>0.075674</td>
          <td>1.039675</td>
          <td>0.118663</td>
          <td>0.075674</td>
          <td>1.039675</td>
          <td>0</td>
          <td>True</td>
          <td>6</td>
        </tr>
        <tr>
          <th>6</th>
          <td>KNeighborsUnif</td>
          <td>0.883830</td>
          <td>0.853403</td>
          <td>0.032106</td>
          <td>0.019376</td>
          <td>0.019685</td>
          <td>0.032106</td>
          <td>0.019376</td>
          <td>0.019685</td>
          <td>0</td>
          <td>True</td>
          <td>3</td>
        </tr>
        <tr>
          <th>7</th>
          <td>KNeighborsDist</td>
          <td>0.883830</td>
          <td>0.853403</td>
          <td>0.041204</td>
          <td>0.019170</td>
          <td>0.019528</td>
          <td>0.041204</td>
          <td>0.019170</td>
          <td>0.019528</td>
          <td>0</td>
          <td>True</td>
          <td>4</td>
        </tr>
        <tr>
          <th>8</th>
          <td>LightGBM</td>
          <td>0.882261</td>
          <td>0.883072</td>
          <td>0.011820</td>
          <td>0.006165</td>
          <td>1.076681</td>
          <td>0.011820</td>
          <td>0.006165</td>
          <td>1.076681</td>
          <td>0</td>
          <td>True</td>
          <td>9</td>
        </tr>
        <tr>
          <th>9</th>
          <td>NeuralNetMXNet</td>
          <td>0.877551</td>
          <td>0.872600</td>
          <td>0.046874</td>
          <td>0.034172</td>
          <td>4.098654</td>
          <td>0.046874</td>
          <td>0.034172</td>
          <td>4.098654</td>
          <td>0</td>
          <td>True</td>
          <td>1</td>
        </tr>
        <tr>
          <th>10</th>
          <td>LightGBMXT</td>
          <td>0.869702</td>
          <td>0.853403</td>
          <td>0.013507</td>
          <td>0.008016</td>
          <td>1.188565</td>
          <td>0.013507</td>
          <td>0.008016</td>
          <td>1.188565</td>
          <td>0</td>
          <td>True</td>
          <td>10</td>
        </tr>
        <tr>
          <th>11</th>
          <td>ExtraTreesEntr</td>
          <td>0.868132</td>
          <td>0.846422</td>
          <td>0.150189</td>
          <td>0.080109</td>
          <td>1.153447</td>
          <td>0.150189</td>
          <td>0.080109</td>
          <td>1.153447</td>
          <td>0</td>
          <td>True</td>
          <td>8</td>
        </tr>
        <tr>
          <th>12</th>
          <td>ExtraTreesGini</td>
          <td>0.866562</td>
          <td>0.846422</td>
          <td>0.175136</td>
          <td>0.079671</td>
          <td>1.148813</td>
          <td>0.175136</td>
          <td>0.079671</td>
          <td>1.148813</td>
          <td>0</td>
          <td>True</td>
          <td>7</td>
        </tr>
        <tr>
          <th>13</th>
          <td>NeuralNetFastAI</td>
          <td>0.854003</td>
          <td>0.848168</td>
          <td>10.244841</td>
          <td>0.364427</td>
          <td>12.060060</td>
          <td>10.244841</td>
          <td>0.364427</td>
          <td>12.060060</td>
          <td>0</td>
          <td>True</td>
          <td>2</td>
        </tr>
      </tbody>
    </table>
    </div>


We find that using the product type (a categorical column) is essential
for good performance on this task. The accuracy is much higher than
that of the model trained on the text column alone.

Model 2: Extract Text Embedding and Use Tabular Predictor
---------------------------------------------------------

Our second attempt at combining text with the other features is to use
the trained TextPrediction model to extract embeddings and then let
TabularPredictor build a predictor on top of them. The AutoGluon
TextPrediction model provides an ``extract_embedding()`` method (for
more details, see :ref:`sec_textprediction_extract_embedding`), so we
can build a two-stage model. In the first stage, we use the text-only
model to extract sentence embeddings. In the second stage, we train
TabularPredictor on the original columns together with these embeddings
to get the final model.
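
The data preparation for the second stage can be sketched on toy data as
follows. The table contents, the embedding values, and the ``emb_``
column prefix are made up for illustration. The point of the sketch is
that ``DataFrame.join`` aligns rows on the index, so resetting the index
first is a cheap safeguard, for example if the table was produced by a
shuffled split.

```python
import numpy as np
import pandas as pd

# Toy table standing in for the real data; its index is deliberately not 0..n-1.
table = pd.DataFrame(
    {
        "Product_Description": ["great tablet", "bad phone", "ok laptop"],
        "Product_Type": [2, 9, 6],
    },
    index=[10, 11, 12],
)

# Stand-in for the first-stage output: one embedding row per table row.
embeddings = np.arange(6, dtype=float).reshape(3, 2)

# Wrap the array in a DataFrame with string column names and a 0..n-1 index.
emb_df = pd.DataFrame(
    embeddings, columns=[f"emb_{i}" for i in range(embeddings.shape[1])]
)

# join() matches rows by index, so bring the table to the same 0..n-1 index first.
merged = table.reset_index(drop=True).join(emb_df)
print(merged)
```

Without the ``reset_index(drop=True)`` call, the toy table's index
(10, 11, 12) would not match the embedding index (0, 1, 2) and the joined
embedding columns would be filled with ``NaN``.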

.. code:: python

    train_sentence_embeddings = predictor_text_only.extract_embedding(train_df)
    dev_sentence_embeddings = predictor_text_only.extract_embedding(dev_df)
    print(train_sentence_embeddings)


.. parsed-literal::
    :class: output

    /var/lib/jenkins/workspace/workspace/autogluon-tutorial-tabular-v3/text/src/autogluon/text/text_prediction/dataset.py:321: SettingWithCopyWarning: 
    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead
    
    See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
      df[col_name] = df[col_name].fillna('').apply(str)
    /var/lib/jenkins/workspace/workspace/autogluon-tutorial-tabular-v3/venv/lib/python3.8/site-packages/mxnet/gluon/block.py:995: UserWarning: The 3-th input to HybridBlock is not used by any computation. Is this intended?
      self._build_cache(*args)


.. parsed-literal::
    :class: output

    [[-0.061683  0.789052 -0.614252 -0.383968 ... -0.202426  1.144868  0.039427 -0.13562 ]
     [-0.269277  0.177113 -0.197375 -0.172229 ... -0.584261  0.625235 -0.355088  0.211777]
     [-0.83451   0.264692 -0.687199  0.234191 ... -0.605356  0.332709  0.029832 -0.160492]
     [-0.271491  0.149054 -0.506492 -0.09476  ... -0.809414  0.29643   0.31992  -0.096194]
     ...
     [-0.054611 -0.060668 -0.49929  -0.170906 ... -0.284143  0.805138  0.430891 -0.191869]
     [-0.277809  0.150503 -0.625322 -0.241075 ... -0.525608  0.909708 -0.124487 -0.031551]
     [-0.64508  -0.07616  -0.567146  0.192171 ... -1.166889  0.589877  0.242167 -0.549045]
     [-0.837447  0.347837 -0.525436 -0.440289 ... -0.742066  0.565927 -0.054493 -0.411046]]


.. code:: python

    # join() aligns rows on the index, so train_df and dev_df must be indexed 0..n-1
    merged_train_data = train_df.join(pd.DataFrame(train_sentence_embeddings))
    merged_dev_data = dev_df.join(pd.DataFrame(dev_sentence_embeddings))
    print(merged_train_data)


.. parsed-literal::
    :class: output

                                        Product_Description  Product_Type  \
    0     Just heard that Apple is opening a store in do...             2   
    1     Tristan H, apture: being fast &amp; iterative ...             9   
    2     Hey, you lucky dogs at #SXSW with iPads -- che...             6   
    3     RT @mention THIS was the best thing I saw at #...             9   
    4     Apple is opening temp retail store in Austin t...             2   
    ...                                                 ...           ...   
    5722  RT @mention At #SXSW and want to win an iPad? ...             9   
    5723  RT @mention I mean, sliced bread is great. But...             3   
    5724  Apple cited as the opposite of crowdsourcing -...             2   
    5725  Good CNN article on why #SXSW is important to ...             7   
    5726  ÛÏ@mention Google to Launch Major New Social ...             3   
    
          Sentiment         0         1         2         3         4         5  \
    0             3 -0.061683  0.789052 -0.614252 -0.383968  0.794183 -0.581301   
    1             2 -0.269277  0.177113 -0.197375 -0.172229  0.547932 -0.265157   
    2             3 -0.834510  0.264692 -0.687199  0.234191  1.018778 -0.689753   
    3             2 -0.271491  0.149054 -0.506492 -0.094760  0.644704 -0.521082   
    4             3  0.180634  0.237529 -0.668062 -0.119891  0.387544 -0.314172   
    ...         ...       ...       ...       ...       ...       ...       ...   
    5722          2 -0.771102  0.284567 -0.285301 -0.168485  0.645094  0.036831   
    5723          3 -0.054611 -0.060668 -0.499290 -0.170906 -0.367915  0.331775   
    5724          1 -0.277809  0.150503 -0.625322 -0.241075  0.157916  0.060280   
    5725          3 -0.645080 -0.076160 -0.567146  0.192171  0.524227 -0.318997   
    5726          3 -0.837447  0.347837 -0.525436 -0.440289  0.857342 -0.283967   
    
                 6  ...       246       247       248       249       250  \
    0     0.919014  ... -0.357193  0.390834  0.833298 -0.115630  0.786055   
    1     1.170505  ... -0.212880  0.123253  0.668844 -0.826962  0.772176   
    2     1.244796  ... -0.300159  0.729561  0.551330 -0.400327  0.671096   
    3     1.293411  ... -0.233223  0.269830  0.657665 -0.285630  0.512562   
    4     1.199535  ... -0.490884  0.116289  0.888642 -0.219426  0.773025   
    ...        ...  ...       ...       ...       ...       ...       ...   
    5722  1.063322  ... -0.032328  0.582992  0.674876 -0.196406  0.172549   
    5723  0.314198  ... -0.639878 -0.178192  0.481809 -0.700696 -0.039856   
    5724  0.656375  ... -0.459748 -0.046848  0.820173 -0.563527  0.208965   
    5725  1.467807  ... -0.601973  0.303506  0.423291 -0.275483  0.578006   
    5726  1.174435  ... -0.239029  0.465101  0.134878 -0.096706  0.547499   
    
               251       252       253       254       255  
    0     0.243626 -0.202426  1.144868  0.039427 -0.135620  
    1    -0.053033 -0.584261  0.625235 -0.355088  0.211777  
    2     0.120588 -0.605356  0.332709  0.029832 -0.160492  
    3     0.320794 -0.809414  0.296430  0.319920 -0.096194  
    4     0.546610 -0.479898  1.102872  0.085935 -0.345065  
    ...        ...       ...       ...       ...       ...  
    5722  0.049309 -0.042409  0.503684 -0.348889 -0.508032  
    5723  1.003379 -0.284143  0.805138  0.430891 -0.191869  
    5724  0.631439 -0.525608  0.909708 -0.124487 -0.031551  
    5725  0.421409 -1.166889  0.589877  0.242167 -0.549045  
    5726  0.293577 -0.742066  0.565927 -0.054493 -0.411046  
    
    [5727 rows x 259 columns]


.. code:: python

    predictor_model2 = TabularPredictor(label=label, eval_metric='accuracy', path='model2').fit(merged_train_data)


.. parsed-literal::
    :class: output

    Beginning AutoGluon training ...
    AutoGluon will save models to "model2/"
    AutoGluon Version:  0.1.0b20210223
    Train Data Rows:    5727
    Train Data Columns: 258
    Preprocessing data ...
    AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == int, but few unique label-values observed).
    	4 unique label values:  [3, 2, 1, 0]
    	If 'multiclass' is not the correct problem_type, please manually specify the problem_type argument in fit() (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
    Train Data Class Count: 4
    Using Feature Generators to preprocess the data ...
    Fitting AutoMLPipelineFeatureGenerator...
    	Available Memory:                    13929.9 MB
    	Train Data (Original)  Memory Usage: 6.87 MB (0.0% of available memory)
    	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
    	Stage 1 Generators:
    		Fitting AsTypeFeatureGenerator...
    	Stage 2 Generators:
    		Fitting FillNaFeatureGenerator...
    	Stage 3 Generators:
    		Fitting IdentityFeatureGenerator...
    		Fitting CategoryFeatureGenerator...
    			Fitting CategoryMemoryMinimizeFeatureGenerator...
    		Fitting TextSpecialFeatureGenerator...
    			Fitting BinnedFeatureGenerator...
    			Fitting DropDuplicatesFeatureGenerator...
    		Fitting TextNgramFeatureGenerator...
    			Fitting CountVectorizer for text features: ['Product_Description']
    			CountVectorizer fit with vocabulary size = 725
    		Warning: Due to memory constraints, ngram feature count is being reduced. Allocate more memory to maximize model quality.
    		Reducing Vectorizer vocab size from 725 to 303 to avoid OOM error
    	Stage 4 Generators:
    		Fitting DropUniqueFeatureGenerator...
    	Types of features in original data (raw dtype, special dtypes):
    		('float', [])        : 256 | ['0', '1', '2', '3', '4', ...]
    		('int', [])          :   1 | ['Product_Type']
    		('object', ['text']) :   1 | ['Product_Description']
    	Types of features in processed data (raw dtype, special dtypes):
    		('float', [])                       : 256 | ['0', '1', '2', '3', '4', ...]
    		('int', [])                         :   1 | ['Product_Type']
    		('int', ['binned', 'text_special']) :  38 | ['Product_Description.char_count', 'Product_Description.word_count', 'Product_Description.capital_ratio', 'Product_Description.lower_ratio', 'Product_Description.digit_ratio', ...]
    		('int', ['text_ngram'])             : 304 | ['__nlp__.11', '__nlp__.6th', '__nlp__.about', '__nlp__.all', '__nlp__.amp', ...]
    	2.4s = Fit runtime
    	258 features in original data used to generate 599 features in processed data.
    	Train Data (Processed) Memory Usage: 7.89 MB (0.1% of available memory)
    Data preprocessing and feature engineering runtime = 2.48s ...
    AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
    	To change this, specify the eval_metric argument of fit()
    Automatically generating train/validation split with holdout_frac=0.1, Train Rows: 5154, Val Rows: 573
    Fitting model: NeuralNetMXNet ...
    	0.8918	 = Validation accuracy score
    	6.3s	 = Training runtime
    	0.04s	 = Validation runtime
    Fitting model: NeuralNetFastAI ...


.. parsed-literal::
    :class: output

    	0.8534	 = Validation accuracy score
    	18.82s	 = Training runtime
    	0.52s	 = Validation runtime
    Fitting model: KNeighborsUnif ...
    	0.8551	 = Validation accuracy score
    	0.02s	 = Training runtime
    	0.13s	 = Validation runtime
    Fitting model: KNeighborsDist ...
    	0.8586	 = Validation accuracy score
    	0.03s	 = Training runtime
    	0.12s	 = Validation runtime
    Fitting model: RandomForestGini ...
    	0.8691	 = Validation accuracy score
    	2.91s	 = Training runtime
    	0.08s	 = Validation runtime
    Fitting model: RandomForestEntr ...
    	0.8551	 = Validation accuracy score
    	5.16s	 = Training runtime
    	0.08s	 = Validation runtime
    Fitting model: ExtraTreesGini ...
    	0.8168	 = Validation accuracy score
    	1.25s	 = Training runtime
    	0.08s	 = Validation runtime
    Fitting model: ExtraTreesEntr ...
    	0.8028	 = Validation accuracy score
    	1.31s	 = Training runtime
    	0.08s	 = Validation runtime
    Fitting model: LightGBM ...
    	0.8935	 = Validation accuracy score
    	9.69s	 = Training runtime
    	0.01s	 = Validation runtime
    Fitting model: LightGBMXT ...
    	0.8726	 = Validation accuracy score
    	9.19s	 = Training runtime
    	0.01s	 = Validation runtime
    Fitting model: CatBoost ...
    	0.8883	 = Validation accuracy score
    	19.0s	 = Training runtime
    	0.01s	 = Validation runtime
    Fitting model: XGBoost ...
    	0.8935	 = Validation accuracy score
    	39.02s	 = Training runtime
    	0.05s	 = Validation runtime
    Fitting model: LightGBMLarge ...
    	0.8935	 = Validation accuracy score
    	57.6s	 = Training runtime
    	0.02s	 = Validation runtime


.. parsed-literal::
    :class: output

    Fitting model: WeightedEnsemble_L1 ...
    	0.8988	 = Validation accuracy score
    	0.38s	 = Training runtime
    	0.0s	 = Validation runtime
    AutoGluon training complete, total runtime = 186.44s ...
    TabularPredictor saved. To load, use: TabularPredictor.load("model2/")


.. code:: python

    predictor_model2.leaderboard(merged_dev_data, silent=True)



.. raw:: html

    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }
    
        .dataframe tbody tr th {
            vertical-align: top;
        }
    
        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>model</th>
          <th>score_test</th>
          <th>score_val</th>
          <th>pred_time_test</th>
          <th>pred_time_val</th>
          <th>fit_time</th>
          <th>pred_time_test_marginal</th>
          <th>pred_time_val_marginal</th>
          <th>fit_time_marginal</th>
          <th>stack_level</th>
          <th>can_infer</th>
          <th>fit_order</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>WeightedEnsemble_L1</td>
          <td>0.893250</td>
          <td>0.898778</td>
          <td>10.744249</td>
          <td>0.715136</td>
          <td>98.361377</td>
          <td>0.005639</td>
          <td>0.000464</td>
          <td>0.375998</td>
          <td>1</td>
          <td>True</td>
          <td>14</td>
        </tr>
        <tr>
          <th>1</th>
          <td>LightGBM</td>
          <td>0.891680</td>
          <td>0.893543</td>
          <td>0.029830</td>
          <td>0.007839</td>
          <td>9.685987</td>
          <td>0.029830</td>
          <td>0.007839</td>
          <td>9.685987</td>
          <td>0</td>
          <td>True</td>
          <td>9</td>
        </tr>
        <tr>
          <th>2</th>
          <td>CatBoost</td>
          <td>0.886970</td>
          <td>0.888307</td>
          <td>0.028248</td>
          <td>0.012147</td>
          <td>19.002439</td>
          <td>0.028248</td>
          <td>0.012147</td>
          <td>19.002439</td>
          <td>0</td>
          <td>True</td>
          <td>11</td>
        </tr>
        <tr>
          <th>3</th>
          <td>NeuralNetMXNet</td>
          <td>0.886970</td>
          <td>0.891798</td>
          <td>0.053362</td>
          <td>0.042966</td>
          <td>6.302544</td>
          <td>0.053362</td>
          <td>0.042966</td>
          <td>6.302544</td>
          <td>0</td>
          <td>True</td>
          <td>1</td>
        </tr>
        <tr>
          <th>4</th>
          <td>LightGBMLarge</td>
          <td>0.886970</td>
          <td>0.893543</td>
          <td>0.059007</td>
          <td>0.016558</td>
          <td>57.602369</td>
          <td>0.059007</td>
          <td>0.016558</td>
          <td>57.602369</td>
          <td>0</td>
          <td>True</td>
          <td>13</td>
        </tr>
        <tr>
          <th>5</th>
          <td>XGBoost</td>
          <td>0.886970</td>
          <td>0.893543</td>
          <td>0.160011</td>
          <td>0.054713</td>
          <td>39.018259</td>
          <td>0.160011</td>
          <td>0.054713</td>
          <td>39.018259</td>
          <td>0</td>
          <td>True</td>
          <td>12</td>
        </tr>
        <tr>
          <th>6</th>
          <td>LightGBMXT</td>
          <td>0.872841</td>
          <td>0.872600</td>
          <td>0.018458</td>
          <td>0.009431</td>
          <td>9.185145</td>
          <td>0.018458</td>
          <td>0.009431</td>
          <td>9.185145</td>
          <td>0</td>
          <td>True</td>
          <td>10</td>
        </tr>
        <tr>
          <th>7</th>
          <td>KNeighborsUnif</td>
          <td>0.855573</td>
          <td>0.855148</td>
          <td>0.153559</td>
          <td>0.128035</td>
          <td>0.021495</td>
          <td>0.153559</td>
          <td>0.128035</td>
          <td>0.021495</td>
          <td>0</td>
          <td>True</td>
          <td>3</td>
        </tr>
        <tr>
          <th>8</th>
          <td>KNeighborsDist</td>
          <td>0.854003</td>
          <td>0.858639</td>
          <td>0.123584</td>
          <td>0.121682</td>
          <td>0.030962</td>
          <td>0.123584</td>
          <td>0.121682</td>
          <td>0.030962</td>
          <td>0</td>
          <td>True</td>
          <td>4</td>
        </tr>
        <tr>
          <th>9</th>
          <td>NeuralNetFastAI</td>
          <td>0.850863</td>
          <td>0.853403</td>
          <td>10.367347</td>
          <td>0.518267</td>
          <td>18.815119</td>
          <td>10.367347</td>
          <td>0.518267</td>
          <td>18.815119</td>
          <td>0</td>
          <td>True</td>
          <td>2</td>
        </tr>
        <tr>
          <th>10</th>
          <td>RandomForestGini</td>
          <td>0.830455</td>
          <td>0.869110</td>
          <td>0.101238</td>
          <td>0.078978</td>
          <td>2.912344</td>
          <td>0.101238</td>
          <td>0.078978</td>
          <td>2.912344</td>
          <td>0</td>
          <td>True</td>
          <td>5</td>
        </tr>
        <tr>
          <th>11</th>
          <td>RandomForestEntr</td>
          <td>0.808477</td>
          <td>0.855148</td>
          <td>0.099813</td>
          <td>0.078739</td>
          <td>5.161032</td>
          <td>0.099813</td>
          <td>0.078739</td>
          <td>5.161032</td>
          <td>0</td>
          <td>True</td>
          <td>6</td>
        </tr>
        <tr>
          <th>12</th>
          <td>ExtraTreesGini</td>
          <td>0.800628</td>
          <td>0.816754</td>
          <td>0.150204</td>
          <td>0.082309</td>
          <td>1.251960</td>
          <td>0.150204</td>
          <td>0.082309</td>
          <td>1.251960</td>
          <td>0</td>
          <td>True</td>
          <td>7</td>
        </tr>
        <tr>
          <th>13</th>
          <td>ExtraTreesEntr</td>
          <td>0.778650</td>
          <td>0.802792</td>
          <td>0.126574</td>
          <td>0.082750</td>
          <td>1.306403</td>
          <td>0.126574</td>
          <td>0.082750</td>
          <td>1.306403</td>
          <td>0</td>
          <td>True</td>
          <td>8</td>
        </tr>
      </tbody>
    </table>
    </div>


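The performance is slightly better than that of the first model: the
weighted ensemble scores 0.8933 on the development set, compared with
0.8901 for Model 1.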

Model 3: Use the Neural Network in AutoGluon-Text in Tabular Weighted Ensemble
------------------------------------------------------------------------------

Another option is to include the neural network from AutoGluon-Text
directly as one of the candidate models of TabularPredictor. We can do
this by changing the ``hyperparameters`` argument. Note that for the
purpose of this tutorial we set the ``hyperparameters`` manually; we
will release some good pre-configurations soon.
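
The candidate models are combined by TabularPredictor's weighted
ensemble (the ``WeightedEnsemble_L1`` entry in the leaderboards above).
To make the idea concrete, the sketch below averages class
probabilities from three hypothetical models with fixed weights. The
probabilities, the weights, and the helper ``weighted_ensemble`` are
invented for illustration; AutoGluon fits its ensemble weights during
training, and this snippet does not reproduce that procedure.

```python
def weighted_ensemble(prob_lists, weights):
    """Return the weighted average of per-model class-probability vectors."""
    total = sum(weights)
    n_classes = len(prob_lists[0])
    return [
        sum(w * probs[k] for w, probs in zip(weights, prob_lists)) / total
        for k in range(n_classes)
    ]


# Hypothetical probabilities over 4 sentiment classes for a single row.
model_probs = [
    [0.10, 0.20, 0.60, 0.10],  # e.g. a gradient-boosted tree model
    [0.05, 0.15, 0.30, 0.50],  # e.g. a text neural network
    [0.10, 0.30, 0.40, 0.20],  # e.g. another tree model
]
weights = [0.5, 0.3, 0.2]  # made-up weights, not learned

combined = weighted_ensemble(model_probs, weights)

# The ensemble predicts the class with the highest combined probability.
predicted_class = max(range(len(combined)), key=combined.__getitem__)
print(combined, predicted_class)
```

In this toy case the text model alone would pick class 3, but the
combined probabilities favor class 2, which shows how a model with a
different view of the data can shift, or be outvoted by, the ensemble.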

.. code:: python

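    # Candidate models: LightGBM (default and extra-trees variant), CatBoost,
    # and the AutoGluon-Text neural network.
    tabular_multimodel_hparam_v1 = {
        'GBM': [{}, {'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}],
        'CAT': {},
        'TEXT_NN_V1': {},
    }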
    
    predictor_model3 = TabularPredictor(label=label, eval_metric='accuracy', path='model3').fit(
        train_df, hyperparameters=tabular_multimodel_hparam_v1
    )


.. parsed-literal::
    :class: output

    Beginning AutoGluon training ...
    AutoGluon will save models to "model3/"
    AutoGluon Version:  0.1.0b20210223
    Train Data Rows:    5727
    Train Data Columns: 2
    Preprocessing data ...
    AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == int, but few unique label-values observed).
    	4 unique label values:  [3, 2, 1, 0]
    	If 'multiclass' is not the correct problem_type, please manually specify the problem_type argument in fit() (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
    Train Data Class Count: 4
    Using Feature Generators to preprocess the data ...
    Fitting AutoMLPipelineFeatureGenerator...
    	Available Memory:                    13504.55 MB
    	Train Data (Original)  Memory Usage: 1.0 MB (0.0% of available memory)
    	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
    	Stage 1 Generators:
    		Fitting AsTypeFeatureGenerator...
    	Stage 2 Generators:
    		Fitting FillNaFeatureGenerator...
    	Stage 3 Generators:
    		Fitting IdentityFeatureGenerator...
    		Fitting IdentityFeatureGenerator...
    			Fitting RenameFeatureGenerator...
    		Fitting CategoryFeatureGenerator...
    			Fitting CategoryMemoryMinimizeFeatureGenerator...
    		Fitting TextSpecialFeatureGenerator...
    			Fitting BinnedFeatureGenerator...
    			Fitting DropDuplicatesFeatureGenerator...
    		Fitting TextNgramFeatureGenerator...
    			Fitting CountVectorizer for text features: ['Product_Description']
    			CountVectorizer fit with vocabulary size = 725
    		Warning: Due to memory constraints, ngram feature count is being reduced. Allocate more memory to maximize model quality.
    		Reducing Vectorizer vocab size from 725 to 271 to avoid OOM error
    	Stage 4 Generators:
    		Fitting DropUniqueFeatureGenerator...
    	Types of features in original data (raw dtype, special dtypes):
    		('int', [])          : 1 | ['Product_Type']
    		('object', ['text']) : 1 | ['Product_Description']
    	Types of features in processed data (raw dtype, special dtypes):
    		('int', [])                         :   1 | ['Product_Type']
    		('int', ['binned', 'text_special']) :  38 | ['Product_Description.char_count', 'Product_Description.word_count', 'Product_Description.capital_ratio', 'Product_Description.lower_ratio', 'Product_Description.digit_ratio', ...]
    		('int', ['text_ngram'])             : 272 | ['__nlp__.about', '__nlp__.all', '__nlp__.amp', '__nlp__.an', '__nlp__.an ipad', ...]
    		('object', ['text'])                :   1 | ['Product_Description_raw_text']
    	2.1s = Fit runtime
    	2 features in original data used to generate 312 features in processed data.
    	Train Data (Processed) Memory Usage: 2.8 MB (0.0% of available memory)
    Data preprocessing and feature engineering runtime = 2.13s ...
    AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
    	To change this, specify the eval_metric argument of fit()
    Automatically generating train/validation split with holdout_frac=0.1, Train Rows: 5154, Val Rows: 573
    Fitting model: LightGBM ...
    	0.8796	 = Validation accuracy score
    	1.02s	 = Training runtime
    	0.01s	 = Validation runtime
    Fitting model: LightGBMXT ...
    	0.8586	 = Validation accuracy score
    	1.22s	 = Training runtime
    	0.01s	 = Validation runtime
    Fitting model: CatBoost ...
    	0.8726	 = Validation accuracy score
    	0.95s	 = Training runtime
    	0.02s	 = Validation runtime
    Fitting model: TextNeuralNetV1 ...
    All Logs will be saved to model3/models/TextNeuralNetV1/TextNeuralNetV1/main.log
    Starting Hyperparameter Tuning ... (num_trials=1)


.. parsed-literal::
    :class: output

    /var/lib/jenkins/workspace/workspace/autogluon-tutorial-tabular-v3/venv/lib/python3.8/site-packages/mxnet/gluon/block.py:995: UserWarning: The 3-th input to HybridBlock is not used by any computation. Is this intended?
      self._build_cache(*args)
    	0.8918	 = Validation accuracy score
    	90.19s	 = Training runtime
    	0.65s	 = Validation runtime
    Fitting model: WeightedEnsemble_L1 ...
    	0.8935	 = Validation accuracy score
    	0.15s	 = Training runtime
    	0.0s	 = Validation runtime
    AutoGluon training complete, total runtime = 97.4s ...
    TabularPredictor saved. To load, use: TabularPredictor.load("model3/")


.. code:: python

    predictor_model3.leaderboard(dev_df, silent=True)


.. raw:: html

    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }
    
        .dataframe tbody tr th {
            vertical-align: top;
        }
    
        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>model</th>
          <th>score_test</th>
          <th>score_val</th>
          <th>pred_time_test</th>
          <th>pred_time_val</th>
          <th>fit_time</th>
          <th>pred_time_test_marginal</th>
          <th>pred_time_val_marginal</th>
          <th>fit_time_marginal</th>
          <th>stack_level</th>
          <th>can_infer</th>
          <th>fit_order</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>WeightedEnsemble_L1</td>
          <td>0.896389</td>
          <td>0.893543</td>
          <td>0.999798</td>
          <td>0.672808</td>
          <td>92.509537</td>
          <td>0.009143</td>
          <td>0.000531</td>
          <td>0.149596</td>
          <td>1</td>
          <td>True</td>
          <td>5</td>
        </tr>
        <tr>
          <th>1</th>
          <td>TextNeuralNetV1</td>
          <td>0.888540</td>
          <td>0.891798</td>
          <td>0.960964</td>
          <td>0.649250</td>
          <td>90.191949</td>
          <td>0.960964</td>
          <td>0.649250</td>
          <td>90.191949</td>
          <td>0</td>
          <td>True</td>
          <td>4</td>
        </tr>
        <tr>
          <th>2</th>
          <td>LightGBM</td>
          <td>0.886970</td>
          <td>0.879581</td>
          <td>0.008237</td>
          <td>0.006089</td>
          <td>1.020931</td>
          <td>0.008237</td>
          <td>0.006089</td>
          <td>1.020931</td>
          <td>0</td>
          <td>True</td>
          <td>1</td>
        </tr>
        <tr>
          <th>3</th>
          <td>CatBoost</td>
          <td>0.886970</td>
          <td>0.872600</td>
          <td>0.017194</td>
          <td>0.015831</td>
          <td>0.948771</td>
          <td>0.017194</td>
          <td>0.015831</td>
          <td>0.948771</td>
          <td>0</td>
          <td>True</td>
          <td>3</td>
        </tr>
        <tr>
          <th>4</th>
          <td>LightGBMXT</td>
          <td>0.868132</td>
          <td>0.858639</td>
          <td>0.012497</td>
          <td>0.007196</td>
          <td>1.219221</td>
          <td>0.012497</td>
          <td>0.007196</td>
          <td>1.219221</td>
          <td>0</td>
          <td>True</td>
          <td>2</td>
        </tr>
      </tbody>
    </table>
    </div>


Model 4: K-Fold Bagging and Stack Ensemble
------------------------------------------

A more advanced strategy is to combine 5-fold bagging with stack
ensembling by setting ``num_bag_folds=5`` and ``num_stack_levels=1``.
This is expected to improve the final performance, at the cost of a
longer training time.
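
To make the idea concrete, here is a minimal NumPy sketch of the
out-of-fold mechanism behind k-fold bagging and stacking. It is not
AutoGluon's implementation: the nearest-centroid base learner, the
synthetic data, and the helper names (``kfold_indices``,
``fit_centroids``, ``predict_scores``, ``out_of_fold_scores``) are
assumptions made purely for illustration.

.. code:: python

    import numpy as np

    def kfold_indices(n_rows, n_folds, seed=0):
        """Shuffle row indices and split them into n_folds nearly equal folds."""
        rng = np.random.default_rng(seed)
        return np.array_split(rng.permutation(n_rows), n_folds)

    def fit_centroids(X, y, n_classes):
        """Toy base learner (hypothetical): one mean vector per class."""
        return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

    def predict_scores(centroids, X):
        """Score each class by the negative squared distance to its centroid."""
        diffs = X[:, None, :] - centroids[None, :, :]
        return -(diffs ** 2).sum(axis=2)

    def out_of_fold_scores(X, y, n_classes, n_folds=5, seed=0):
        """Score every row with the one fold model that did not train on it."""
        oof = np.zeros((len(X), n_classes))
        for holdout in kfold_indices(len(X), n_folds, seed):
            train_mask = np.ones(len(X), dtype=bool)
            train_mask[holdout] = False
            centroids = fit_centroids(X[train_mask], y[train_mask], n_classes)
            oof[holdout] = predict_scores(centroids, X[holdout])
        return oof

    # Synthetic two-class data, made up for illustration only.
    rng = np.random.default_rng(123)
    X = np.vstack([rng.normal(0.0, 1.0, (50, 3)), rng.normal(3.0, 1.0, (50, 3))])
    y = np.repeat([0, 1], 50)

    oof = out_of_fold_scores(X, y, n_classes=2)
    # A stacker at the next level can be trained on the original features
    # plus these out-of-fold scores.
    stack_features = np.hstack([X, oof])
    print(stack_features.shape)

Because every row is scored by a model that never saw it during
training, the stacked features do not leak the training labels into the
next level.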

.. code:: python

    predictor_model4 = TabularPredictor(label=label, eval_metric='accuracy', path='model4').fit(
        train_df, hyperparameters=tabular_multimodel_hparam_v1, num_bag_folds=5, num_stack_levels=1
    )


.. parsed-literal::
    :class: output

    Beginning AutoGluon training ...
    AutoGluon will save models to "model4/"
    AutoGluon Version:  0.1.0b20210223
    Train Data Rows:    5727
    Train Data Columns: 2
    Preprocessing data ...
    AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == int, but few unique label-values observed).
    	4 unique label values:  [3, 2, 1, 0]
    	If 'multiclass' is not the correct problem_type, please manually specify the problem_type argument in fit() (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
    Train Data Class Count: 4
    Using Feature Generators to preprocess the data ...
    Fitting AutoMLPipelineFeatureGenerator...
    	Available Memory:                    13421.11 MB
    	Train Data (Original)  Memory Usage: 1.0 MB (0.0% of available memory)
    	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
    	Stage 1 Generators:
    		Fitting AsTypeFeatureGenerator...
    	Stage 2 Generators:
    		Fitting FillNaFeatureGenerator...
    	Stage 3 Generators:
    		Fitting IdentityFeatureGenerator...
    		Fitting IdentityFeatureGenerator...
    			Fitting RenameFeatureGenerator...
    		Fitting CategoryFeatureGenerator...
    			Fitting CategoryMemoryMinimizeFeatureGenerator...
    		Fitting TextSpecialFeatureGenerator...
    			Fitting BinnedFeatureGenerator...
    			Fitting DropDuplicatesFeatureGenerator...
    		Fitting TextNgramFeatureGenerator...
    			Fitting CountVectorizer for text features: ['Product_Description']
    			CountVectorizer fit with vocabulary size = 725
    		Warning: Due to memory constraints, ngram feature count is being reduced. Allocate more memory to maximize model quality.
    		Reducing Vectorizer vocab size from 725 to 265 to avoid OOM error
    	Stage 4 Generators:
    		Fitting DropUniqueFeatureGenerator...
    	Types of features in original data (raw dtype, special dtypes):
    		('int', [])          : 1 | ['Product_Type']
    		('object', ['text']) : 1 | ['Product_Description']
    	Types of features in processed data (raw dtype, special dtypes):
    		('int', [])                         :   1 | ['Product_Type']
    		('int', ['binned', 'text_special']) :  38 | ['Product_Description.char_count', 'Product_Description.word_count', 'Product_Description.capital_ratio', 'Product_Description.lower_ratio', 'Product_Description.digit_ratio', ...]
    		('int', ['text_ngram'])             : 266 | ['__nlp__.about', '__nlp__.all', '__nlp__.amp', '__nlp__.an', '__nlp__.an ipad', ...]
    		('object', ['text'])                :   1 | ['Product_Description_raw_text']
    	2.2s = Fit runtime
    	2 features in original data used to generate 306 features in processed data.
    	Train Data (Processed) Memory Usage: 2.76 MB (0.0% of available memory)
    Data preprocessing and feature engineering runtime = 2.19s ...
    AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
    	To change this, specify the eval_metric argument of fit()
    Fitting model: LightGBM_BAG_L0 ...
    	0.8797	 = Validation accuracy score
    	8.12s	 = Training runtime
    	0.07s	 = Validation runtime
    Fitting model: LightGBMXT_BAG_L0 ...
    	0.8598	 = Validation accuracy score
    	7.38s	 = Training runtime
    	0.07s	 = Validation runtime
    Fitting model: CatBoost_BAG_L0 ...
    	0.8745	 = Validation accuracy score
    	4.73s	 = Training runtime
    	0.08s	 = Validation runtime
    Fitting model: TextNeuralNetV1_BAG_L0 ...
    All Logs will be saved to model4/models/TextNeuralNetV1_BAG_L0/S1F1/S1F1/main.log
    Starting Hyperparameter Tuning ... (num_trials=1)


    All Logs will be saved to model4/models/TextNeuralNetV1_BAG_L0/S1F2/S1F2/main.log
    Starting Hyperparameter Tuning ... (num_trials=1)


    All Logs will be saved to model4/models/TextNeuralNetV1_BAG_L0/S1F3/S1F3/main.log
    Starting Hyperparameter Tuning ... (num_trials=1)


    All Logs will be saved to model4/models/TextNeuralNetV1_BAG_L0/S1F4/S1F4/main.log
    Starting Hyperparameter Tuning ... (num_trials=1)


    All Logs will be saved to model4/models/TextNeuralNetV1_BAG_L0/S1F5/S1F5/main.log
    Starting Hyperparameter Tuning ... (num_trials=1)


    	0.883	 = Validation accuracy score
    	503.53s	 = Training runtime
    	35.01s	 = Validation runtime
    Fitting model: WeightedEnsemble_L1 ...
    	0.883	 = Validation accuracy score
    	0.46s	 = Training runtime
    	0.0s	 = Validation runtime
    Fitting model: LightGBM_BAG_L1 ...
    	0.8846	 = Validation accuracy score
    	10.14s	 = Training runtime
    	0.04s	 = Validation runtime
    Fitting model: LightGBMXT_BAG_L1 ...
    	0.8809	 = Validation accuracy score
    	9.59s	 = Training runtime
    	0.07s	 = Validation runtime
    Fitting model: CatBoost_BAG_L1 ...
    	0.8841	 = Validation accuracy score
    	11.96s	 = Training runtime
    	0.09s	 = Validation runtime
    Fitting model: WeightedEnsemble_L2 ...
    	0.8846	 = Validation accuracy score
    	0.4s	 = Training runtime
    	0.0s	 = Validation runtime
    AutoGluon training complete, total runtime = 594.87s ...
    TabularPredictor saved. To load, use: TabularPredictor.load("model4/")


.. code:: python

    predictor_model4.leaderboard(dev_df, silent=True)


.. raw:: html

    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }
    
        .dataframe tbody tr th {
            vertical-align: top;
        }
    
        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>model</th>
          <th>score_test</th>
          <th>score_val</th>
          <th>pred_time_test</th>
          <th>pred_time_val</th>
          <th>fit_time</th>
          <th>pred_time_test_marginal</th>
          <th>pred_time_val_marginal</th>
          <th>fit_time_marginal</th>
          <th>stack_level</th>
          <th>can_infer</th>
          <th>fit_order</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>TextNeuralNetV1_BAG_L0</td>
          <td>0.897959</td>
          <td>0.883010</td>
          <td>4.781237</td>
          <td>35.009925</td>
          <td>503.525820</td>
          <td>4.781237</td>
          <td>35.009925</td>
          <td>503.525820</td>
          <td>0</td>
          <td>True</td>
          <td>4</td>
        </tr>
        <tr>
          <th>1</th>
          <td>WeightedEnsemble_L1</td>
          <td>0.897959</td>
          <td>0.883010</td>
          <td>4.783215</td>
          <td>35.010866</td>
          <td>503.988360</td>
          <td>0.001978</td>
          <td>0.000941</td>
          <td>0.462540</td>
          <td>1</td>
          <td>True</td>
          <td>5</td>
        </tr>
        <tr>
          <th>2</th>
          <td>CatBoost_BAG_L1</td>
          <td>0.896389</td>
          <td>0.884058</td>
          <td>5.075670</td>
          <td>35.320508</td>
          <td>535.708009</td>
          <td>0.046778</td>
          <td>0.094063</td>
          <td>11.955566</td>
          <td>1</td>
          <td>True</td>
          <td>8</td>
        </tr>
        <tr>
          <th>3</th>
          <td>LightGBM_BAG_L1</td>
          <td>0.893250</td>
          <td>0.884582</td>
          <td>5.080883</td>
          <td>35.270539</td>
          <td>533.896461</td>
          <td>0.051992</td>
          <td>0.044093</td>
          <td>10.144018</td>
          <td>1</td>
          <td>True</td>
          <td>6</td>
        </tr>
        <tr>
          <th>4</th>
          <td>WeightedEnsemble_L2</td>
          <td>0.893250</td>
          <td>0.884582</td>
          <td>5.082560</td>
          <td>35.271442</td>
          <td>534.292905</td>
          <td>0.001677</td>
          <td>0.000904</td>
          <td>0.396444</td>
          <td>2</td>
          <td>True</td>
          <td>9</td>
        </tr>
        <tr>
          <th>5</th>
          <td>LightGBMXT_BAG_L1</td>
          <td>0.893250</td>
          <td>0.880915</td>
          <td>5.109002</td>
          <td>35.294523</td>
          <td>533.344140</td>
          <td>0.080111</td>
          <td>0.068078</td>
          <td>9.591697</td>
          <td>1</td>
          <td>True</td>
          <td>7</td>
        </tr>
        <tr>
          <th>6</th>
          <td>CatBoost_BAG_L0</td>
          <td>0.886970</td>
          <td>0.874454</td>
          <td>0.042751</td>
          <td>0.083205</td>
          <td>4.729065</td>
          <td>0.042751</td>
          <td>0.083205</td>
          <td>4.729065</td>
          <td>0</td>
          <td>True</td>
          <td>3</td>
        </tr>
        <tr>
          <th>7</th>
          <td>LightGBM_BAG_L0</td>
          <td>0.886970</td>
          <td>0.879693</td>
          <td>0.118651</td>
          <td>0.068003</td>
          <td>8.121579</td>
          <td>0.118651</td>
          <td>0.068003</td>
          <td>8.121579</td>
          <td>0</td>
          <td>True</td>
          <td>1</td>
        </tr>
        <tr>
          <th>8</th>
          <td>LightGBMXT_BAG_L0</td>
          <td>0.877551</td>
          <td>0.859787</td>
          <td>0.086252</td>
          <td>0.065313</td>
          <td>7.375978</td>
          <td>0.086252</td>
          <td>0.065313</td>
          <td>7.375978</td>
          <td>0</td>
          <td>True</td>
          <td>2</td>
        </tr>
      </tbody>
    </table>
    </div>
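
The ``fit_time`` and ``fit_time_marginal`` columns can be read together.
The numbers in the table above are consistent with ``fit_time`` being
cumulative: for each bagged model at stack level 1, ``fit_time`` matches
its own ``fit_time_marginal`` plus the marginal times of the four
level-0 models it builds on. The check below uses values copied from the
table. It is a sanity test of this reading of the table, not a statement
about AutoGluon internals.

.. code:: python

    import pandas as pd

    # Values copied from the Model 4 leaderboard (as displayed, 6 decimals).
    # The two weighted ensembles are left out to keep the check short.
    leaderboard = pd.DataFrame(
        [
            ("TextNeuralNetV1_BAG_L0", 503.525820, 503.525820, 0),
            ("CatBoost_BAG_L0", 4.729065, 4.729065, 0),
            ("LightGBM_BAG_L0", 8.121579, 8.121579, 0),
            ("LightGBMXT_BAG_L0", 7.375978, 7.375978, 0),
            ("CatBoost_BAG_L1", 535.708009, 11.955566, 1),
            ("LightGBM_BAG_L1", 533.896461, 10.144018, 1),
            ("LightGBMXT_BAG_L1", 533.344140, 9.591697, 1),
        ],
        columns=["model", "fit_time", "fit_time_marginal", "stack_level"],
    ).set_index("model")

    # Total time spent on the level-0 models that feed the stack.
    is_level0 = leaderboard["stack_level"] == 0
    base_time = leaderboard.loc[is_level0, "fit_time_marginal"].sum()

    # Rebuild each level-1 fit_time as base time + the model's own marginal time.
    level1 = leaderboard[leaderboard["stack_level"] == 1]
    rebuilt = base_time + level1["fit_time_marginal"]
    max_gap = (rebuilt - level1["fit_time"]).abs().max()
    print(f"level-0 total: {base_time:.3f}s, largest mismatch: {max_gap:.6f}s")

Under this reading, almost all of the cost of every stacked model comes
from the bagged ``TextNeuralNetV1_BAG_L0`` model.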


Model 5: Multimodal Embedding + TabularPredictor
------------------------------------------------

Since the neural network in TextPrediction can directly handle
multimodal data, we can also fit a model with TextPrediction first and
then use it as an embedding extractor. This can be viewed as an improved
version of Model 2.
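
For the network to act as an embedding extractor, the embeddings have to
end up as columns of the table that is passed to ``TabularPredictor``.
The following is a minimal sketch of that step. It assumes that
``extract_embedding`` returns a 2-D array with one row per input row.
The toy data, the embedding width of 4, and the helper
``append_embeddings`` with its ``prefix`` argument are hypothetical and
not part of AutoGluon.

.. code:: python

    import numpy as np
    import pandas as pd

    # Stand-ins for train_df and for the array returned by extract_embedding();
    # the values and the embedding width are made up for illustration.
    toy_df = pd.DataFrame({
        "Product_Description": ["great tablet", "screen cracked", "battery is fine"],
        "Product_Type": [9, 6, 2],
        "Sentiment": [3, 1, 2],
    })
    toy_embeddings = np.arange(12, dtype=float).reshape(3, 4)

    def append_embeddings(df, embeddings, prefix="emb_"):
        """Return a copy of df with one numeric column per embedding dimension."""
        emb_df = pd.DataFrame(
            embeddings,
            index=df.index,  # reuse df's index so rows line up in concat
            columns=[f"{prefix}{i}" for i in range(embeddings.shape[1])],
        )
        return pd.concat([df, emb_df], axis=1)

    merged = append_embeddings(toy_df, toy_embeddings)
    print(merged.columns.tolist())

Reusing the index of ``df`` matters when the DataFrame does not have a
default ``0..n-1`` index, for example after a train/dev split. Without
it, the concatenation would align on mismatched labels and introduce
missing values.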

.. code:: python

    predictor_text_multimodal = TextPrediction.fit(train_df,
                                                   label=label,
                                                   time_limits=None,
                                                   eval_metric='accuracy',
                                                   stopping_metric='accuracy',
                                                   hyperparameters='default_no_hpo',
                                                   output_directory='predictor_text_multimodal')
    
    train_sentence_multimodal_embeddings = predictor_text_multimodal.extract_embedding(train_df)
    dev_sentence_multimodal_embeddings = predictor_text_multimodal.extract_embedding(dev_df)
    
    predictor_model5 = TabularPredictor(label=label, eval_metric='accuracy', path='model5').fit(train_df)


.. parsed-literal::
    :class: output

    2021-02-23 19:42:14,695 - autogluon.text.text_prediction.text_prediction - INFO - All Logs will be saved to predictor_text_multimodal/ag_text_prediction.log
    2021-02-23 19:42:14,715 - autogluon.text.text_prediction.text_prediction - INFO - Train Dataset:
    2021-02-23 19:42:14,716 - autogluon.text.text_prediction.text_prediction - INFO - Columns:
    
    - Text(
       name="Product_Description"
       #total/missing=4581/0
       length, min/avg/max=11/104.7640253219821/178
    )
    - Categorical(
       name="Product_Type"
       #total/missing=4581/0
       num_class (total/non_special)=10/10
       categories=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
       freq=[38, 43, 336, 218, 15, 152, 474, 233, 142, 2930]
    )
    - Categorical(
       name="Sentiment"
       #total/missing=4581/0
       num_class (total/non_special)=4/4
       categories=[0, 1, 2, 3]
       freq=[75, 276, 2717, 1513]
    )
    
    
    2021-02-23 19:42:14,717 - autogluon.text.text_prediction.text_prediction - INFO - Tuning Dataset:
    2021-02-23 19:42:14,718 - autogluon.text.text_prediction.text_prediction - INFO - Columns:
    
    - Text(
       name="Product_Description"
       #total/missing=1146/0
       length, min/avg/max=32/105.1719022687609/159
    )
    - Categorical(
       name="Product_Type"
       #total/missing=1146/0
       num_class (total/non_special)=10/10
       categories=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
       freq=[8, 13, 75, 55, 2, 42, 123, 62, 36, 730]
    )
    - Categorical(
       name="Sentiment"
       #total/missing=1146/0
       num_class (total/non_special)=4/4
       categories=[0, 1, 2, 3]
       freq=[25, 83, 671, 367]
    )
    
    
    Label columns=['Sentiment'], Feature columns=['Product_Description', 'Product_Type'], Problem types=['classification'], Label shapes=[4]
    Eval Metric=accuracy, Stop Metric=accuracy, Log Metrics=['acc', 'log_loss', 'accuracy']
    2021-02-23 19:42:14,722 - autogluon.text.text_prediction.text_prediction - INFO - All Logs will be saved to predictor_text_multimodal/main.log
    Starting Hyperparameter Tuning ... (num_trials=1)


.. parsed-literal::
    :class: output

    2021-02-23 19:43:58,181 - autogluon.text.text_prediction.text_prediction - INFO - Results=
    2021-02-23 19:43:58,183 - autogluon.text.text_prediction.text_prediction - INFO - Best_config={'search_space▁optimization.lr': 5e-05}


.. parsed-literal::
    :class: output

    (task:7)	2021-02-23 19:42:16,860 - root - INFO - All Logs will be saved to /var/lib/jenkins/workspace/workspace/autogluon-tutorial-tabular-v3/docs/_build/eval/tutorials/tabular_prediction/predictor_text_multimodal/task7/training.log
    2021-02-23 19:42:16,860 - root - INFO - learning:
      early_stopping_patience: 10
      log_metrics: auto
      stop_metric: auto
      valid_ratio: 0.15
    misc:
      exp_dir: /var/lib/jenkins/workspace/workspace/autogluon-tutorial-tabular-v3/docs/_build/eval/tutorials/tabular_prediction/predictor_text_multimodal/task7
      seed: 123
    model:
      backbone:
        name: google_electra_small
      network:
        agg_net:
          activation: tanh
          agg_type: concat
          data_dropout: False
          dropout: 0.1
          feature_proj_num_layers: -1
          initializer:
            bias: ['zeros']
            weight: ['xavier', 'uniform', 'avg', 3.0]
          mid_units: 256
          norm_eps: 1e-05
          normalization: layer_norm
          out_proj_num_layers: 0
        categorical_net:
          activation: leaky
          data_dropout: False
          dropout: 0.1
          emb_units: 32
          initializer:
            bias: ['zeros']
            embed: ['xavier', 'gaussian', 'in', 1.0]
            weight: ['xavier', 'uniform', 'avg', 3.0]
          mid_units: 64
          norm_eps: 1e-05
          normalization: layer_norm
          num_layers: 1
        feature_units: -1
        initializer:
          bias: ['zeros']
          weight: ['truncnorm', 0, 0.02]
        numerical_net:
          activation: leaky
          data_dropout: False
          dropout: 0.1
          initializer:
            bias: ['zeros']
            weight: ['xavier', 'uniform', 'avg', 3.0]
          input_centering: False
          mid_units: 128
          norm_eps: 1e-05
          normalization: layer_norm
          num_layers: 1
        text_net:
          pool_type: cls
          use_segment_id: True
      preprocess:
        max_length: 128
        merge_text: True
    optimization:
      batch_size: 32
      begin_lr: 0.0
      final_lr: 0.0
      layerwise_lr_decay: 0.8
      log_frequency: 0.1
      lr: 5e-05
      lr_scheduler: triangular
      max_grad_norm: 1.0
      model_average: 5
      num_train_epochs: 4
      optimizer: adamw
      optimizer_params: [('beta1', 0.9), ('beta2', 0.999), ('epsilon', 1e-06), ('correct_bias', False)]
      per_device_batch_size: 16
      val_batch_size_mult: 2
      valid_frequency: 0.1
      warmup_portion: 0.1
      wd: 0.01
    version: 1
    2021-02-23 19:42:17,008 - root - INFO - Process training set...
    2021-02-23 19:42:20,561 - root - INFO - Done!
    2021-02-23 19:42:20,563 - root - INFO - Process dev set...
    2021-02-23 19:42:23,413 - root - INFO - Done!
    2021-02-23 19:42:28,728 - root - INFO - #Total Params/Fixed Params=13504196/0
    2021-02-23 19:42:28,744 - root - INFO - Using gradient accumulation. Global batch size = 32
    2021-02-23 19:42:31,227 - root - INFO - [Iter 15/572, Epoch 0] train loss=7.6880e-01, gnorm=4.5836e+00, lr=1.3158e-05, #samples processed=720, #sample per second=294.59
    2021-02-23 19:42:32,048 - root - INFO - [Iter 15/572, Epoch 0] valid accuracy=6.9110e-01, log_loss=8.6215e-01, accuracy=6.9110e-01, time spent=0.740s, total_time=0.05min
    2021-02-23 19:42:33,539 - root - INFO - [Iter 30/572, Epoch 0] train loss=5.2239e-01, gnorm=4.9510e+00, lr=2.6316e-05, #samples processed=720, #sample per second=311.42
    2021-02-23 19:42:34,447 - root - INFO - [Iter 30/572, Epoch 0] valid accuracy=8.2373e-01, log_loss=6.7741e-01, accuracy=8.2373e-01, time spent=0.736s, total_time=0.09min
    2021-02-23 19:42:35,948 - root - INFO - [Iter 45/572, Epoch 0] train loss=4.0803e-01, gnorm=2.5185e+00, lr=3.9474e-05, #samples processed=720, #sample per second=298.88
    2021-02-23 19:42:36,801 - root - INFO - [Iter 45/572, Epoch 0] valid accuracy=8.5689e-01, log_loss=5.3321e-01, accuracy=8.5689e-01, time spent=0.734s, total_time=0.13min
    2021-02-23 19:42:38,372 - root - INFO - [Iter 60/572, Epoch 0] train loss=3.7348e-01, gnorm=3.0502e+00, lr=4.9709e-05, #samples processed=720, #sample per second=297.09
    2021-02-23 19:42:39,249 - root - INFO - [Iter 60/572, Epoch 0] valid accuracy=8.5689e-01, log_loss=4.9822e-01, accuracy=8.5689e-01, time spent=0.740s, total_time=0.17min
    2021-02-23 19:42:40,683 - root - INFO - [Iter 75/572, Epoch 0] train loss=3.2895e-01, gnorm=3.5754e+00, lr=4.8252e-05, #samples processed=720, #sample per second=311.56
    2021-02-23 19:42:41,602 - root - INFO - [Iter 75/572, Epoch 0] valid accuracy=8.6126e-01, log_loss=5.3280e-01, accuracy=8.6126e-01, time spent=0.744s, total_time=0.21min
    2021-02-23 19:42:43,136 - root - INFO - [Iter 90/572, Epoch 0] train loss=2.7651e-01, gnorm=2.3192e+00, lr=4.6796e-05, #samples processed=720, #sample per second=293.54
    2021-02-23 19:42:43,881 - root - INFO - [Iter 90/572, Epoch 0] valid accuracy=8.5777e-01, log_loss=4.9850e-01, accuracy=8.5777e-01, time spent=0.744s, total_time=0.25min
    2021-02-23 19:42:45,342 - root - INFO - [Iter 105/572, Epoch 0] train loss=3.2964e-01, gnorm=2.4756e+00, lr=4.5340e-05, #samples processed=720, #sample per second=326.43
    2021-02-23 19:42:46,231 - root - INFO - [Iter 105/572, Epoch 0] valid accuracy=8.6126e-01, log_loss=4.8262e-01, accuracy=8.6126e-01, time spent=0.748s, total_time=0.29min
    2021-02-23 19:42:47,723 - root - INFO - [Iter 120/572, Epoch 0] train loss=2.7775e-01, gnorm=3.3851e+00, lr=4.3883e-05, #samples processed=720, #sample per second=302.40
    2021-02-23 19:42:48,593 - root - INFO - [Iter 120/572, Epoch 0] valid accuracy=8.6126e-01, log_loss=4.9458e-01, accuracy=8.6126e-01, time spent=0.747s, total_time=0.33min
    2021-02-23 19:42:50,058 - root - INFO - [Iter 135/572, Epoch 0] train loss=3.5077e-01, gnorm=5.2572e+00, lr=4.2427e-05, #samples processed=720, #sample per second=308.41
    2021-02-23 19:42:50,945 - root - INFO - [Iter 135/572, Epoch 0] valid accuracy=8.6126e-01, log_loss=5.0231e-01, accuracy=8.6126e-01, time spent=0.746s, total_time=0.37min
    2021-02-23 19:42:52,411 - root - INFO - [Iter 150/572, Epoch 1] train loss=3.0280e-01, gnorm=2.9394e+00, lr=4.0971e-05, #samples processed=698, #sample per second=296.61
    2021-02-23 19:42:53,289 - root - INFO - [Iter 150/572, Epoch 1] valid accuracy=8.6126e-01, log_loss=4.6663e-01, accuracy=8.6126e-01, time spent=0.741s, total_time=0.41min
    2021-02-23 19:42:54,739 - root - INFO - [Iter 165/572, Epoch 1] train loss=3.3440e-01, gnorm=1.9468e+00, lr=3.9515e-05, #samples processed=720, #sample per second=309.34
    2021-02-23 19:42:55,618 - root - INFO - [Iter 165/572, Epoch 1] valid accuracy=8.6126e-01, log_loss=4.7071e-01, accuracy=8.6126e-01, time spent=0.743s, total_time=0.45min
    2021-02-23 19:42:57,103 - root - INFO - [Iter 180/572, Epoch 1] train loss=3.1203e-01, gnorm=2.5760e+00, lr=3.8058e-05, #samples processed=720, #sample per second=304.62
    2021-02-23 19:42:58,016 - root - INFO - [Iter 180/572, Epoch 1] valid accuracy=8.6475e-01, log_loss=4.5359e-01, accuracy=8.6475e-01, time spent=0.770s, total_time=0.49min
    2021-02-23 19:42:59,492 - root - INFO - [Iter 195/572, Epoch 1] train loss=3.1515e-01, gnorm=3.4280e+00, lr=3.6602e-05, #samples processed=720, #sample per second=301.42
    2021-02-23 19:43:00,239 - root - INFO - [Iter 195/572, Epoch 1] valid accuracy=8.6300e-01, log_loss=4.7073e-01, accuracy=8.6300e-01, time spent=0.747s, total_time=0.52min
    2021-02-23 19:43:01,687 - root - INFO - [Iter 210/572, Epoch 1] train loss=3.1073e-01, gnorm=3.3016e+00, lr=3.5146e-05, #samples processed=720, #sample per second=328.04
    2021-02-23 19:43:02,578 - root - INFO - [Iter 210/572, Epoch 1] valid accuracy=8.6562e-01, log_loss=4.5557e-01, accuracy=8.6562e-01, time spent=0.757s, total_time=0.56min
    2021-02-23 19:43:04,062 - root - INFO - [Iter 225/572, Epoch 1] train loss=2.7208e-01, gnorm=4.7893e+00, lr=3.3689e-05, #samples processed=720, #sample per second=303.15
    2021-02-23 19:43:04,950 - root - INFO - [Iter 225/572, Epoch 1] valid accuracy=8.6562e-01, log_loss=4.6826e-01, accuracy=8.6562e-01, time spent=0.750s, total_time=0.60min
    2021-02-23 19:43:06,403 - root - INFO - [Iter 240/572, Epoch 1] train loss=2.8616e-01, gnorm=2.7858e+00, lr=3.2233e-05, #samples processed=720, #sample per second=307.56
    2021-02-23 19:43:07,296 - root - INFO - [Iter 240/572, Epoch 1] valid accuracy=8.6736e-01, log_loss=4.4909e-01, accuracy=8.6736e-01, time spent=0.755s, total_time=0.64min
    2021-02-23 19:43:08,790 - root - INFO - [Iter 255/572, Epoch 1] train loss=2.3186e-01, gnorm=1.3332e+00, lr=3.0777e-05, #samples processed=720, #sample per second=301.64
    2021-02-23 19:43:09,552 - root - INFO - [Iter 255/572, Epoch 1] valid accuracy=8.6126e-01, log_loss=5.0038e-01, accuracy=8.6126e-01, time spent=0.761s, total_time=0.68min
    2021-02-23 19:43:11,041 - root - INFO - [Iter 270/572, Epoch 1] train loss=2.5154e-01, gnorm=3.5720e+00, lr=2.9320e-05, #samples processed=720, #sample per second=320.01
    2021-02-23 19:43:11,801 - root - INFO - [Iter 270/572, Epoch 1] valid accuracy=8.6649e-01, log_loss=4.6431e-01, accuracy=8.6649e-01, time spent=0.761s, total_time=0.72min
    2021-02-23 19:43:13,268 - root - INFO - [Iter 285/572, Epoch 1] train loss=2.8520e-01, gnorm=3.8546e+00, lr=2.7864e-05, #samples processed=720, #sample per second=323.30
    2021-02-23 19:43:14,027 - root - INFO - [Iter 285/572, Epoch 1] valid accuracy=8.6562e-01, log_loss=4.6998e-01, accuracy=8.6562e-01, time spent=0.758s, total_time=0.75min
    2021-02-23 19:43:15,491 - root - INFO - [Iter 300/572, Epoch 2] train loss=3.3013e-01, gnorm=3.5853e+00, lr=2.6408e-05, #samples processed=709, #sample per second=318.87
    2021-02-23 19:43:16,396 - root - INFO - [Iter 300/572, Epoch 2] valid accuracy=8.6911e-01, log_loss=4.4177e-01, accuracy=8.6911e-01, time spent=0.759s, total_time=0.79min
    2021-02-23 19:43:17,867 - root - INFO - [Iter 315/572, Epoch 2] train loss=2.7419e-01, gnorm=3.3656e+00, lr=2.4951e-05, #samples processed=720, #sample per second=303.16
    2021-02-23 19:43:18,608 - root - INFO - [Iter 315/572, Epoch 2] valid accuracy=8.6300e-01, log_loss=4.9422e-01, accuracy=8.6300e-01, time spent=0.741s, total_time=0.83min
    2021-02-23 19:43:20,083 - root - INFO - [Iter 330/572, Epoch 2] train loss=2.5470e-01, gnorm=2.7021e+00, lr=2.3495e-05, #samples processed=720, #sample per second=324.87
    2021-02-23 19:43:20,839 - root - INFO - [Iter 330/572, Epoch 2] valid accuracy=8.6824e-01, log_loss=4.5256e-01, accuracy=8.6824e-01, time spent=0.756s, total_time=0.87min
    2021-02-23 19:43:22,290 - root - INFO - [Iter 345/572, Epoch 2] train loss=2.8035e-01, gnorm=3.6649e+00, lr=2.2039e-05, #samples processed=720, #sample per second=326.20
    2021-02-23 19:43:23,049 - root - INFO - [Iter 345/572, Epoch 2] valid accuracy=8.6736e-01, log_loss=4.4729e-01, accuracy=8.6736e-01, time spent=0.758s, total_time=0.90min
    2021-02-23 19:43:24,557 - root - INFO - [Iter 360/572, Epoch 2] train loss=2.5612e-01, gnorm=3.1314e+00, lr=2.0583e-05, #samples processed=720, #sample per second=317.63
    2021-02-23 19:43:25,339 - root - INFO - [Iter 360/572, Epoch 2] valid accuracy=8.6475e-01, log_loss=4.6779e-01, accuracy=8.6475e-01, time spent=0.781s, total_time=0.94min
    2021-02-23 19:43:26,821 - root - INFO - [Iter 375/572, Epoch 2] train loss=2.2449e-01, gnorm=5.5644e+00, lr=1.9126e-05, #samples processed=720, #sample per second=318.10
    2021-02-23 19:43:27,578 - root - INFO - [Iter 375/572, Epoch 2] valid accuracy=8.6824e-01, log_loss=4.4857e-01, accuracy=8.6824e-01, time spent=0.757s, total_time=0.98min
    2021-02-23 19:43:29,074 - root - INFO - [Iter 390/572, Epoch 2] train loss=2.8870e-01, gnorm=4.0543e+00, lr=1.7670e-05, #samples processed=720, #sample per second=319.61
    2021-02-23 19:43:29,829 - root - INFO - [Iter 390/572, Epoch 2] valid accuracy=8.6649e-01, log_loss=4.6483e-01, accuracy=8.6649e-01, time spent=0.755s, total_time=1.02min
    2021-02-23 19:43:31,234 - root - INFO - [Iter 405/572, Epoch 2] train loss=2.4080e-01, gnorm=5.0446e+00, lr=1.6214e-05, #samples processed=720, #sample per second=333.41
    2021-02-23 19:43:32,136 - root - INFO - [Iter 405/572, Epoch 2] valid accuracy=8.6911e-01, log_loss=4.3994e-01, accuracy=8.6911e-01, time spent=0.764s, total_time=1.06min
    2021-02-23 19:43:33,652 - root - INFO - [Iter 420/572, Epoch 2] train loss=2.6520e-01, gnorm=4.7820e+00, lr=1.4757e-05, #samples processed=720, #sample per second=297.77
    2021-02-23 19:43:34,531 - root - INFO - [Iter 420/572, Epoch 2] valid accuracy=8.6911e-01, log_loss=4.4517e-01, accuracy=8.6911e-01, time spent=0.747s, total_time=1.10min
    2021-02-23 19:43:35,988 - root - INFO - [Iter 435/572, Epoch 3] train loss=2.7997e-01, gnorm=2.2674e+00, lr=1.3301e-05, #samples processed=698, #sample per second=298.79
    2021-02-23 19:43:36,768 - root - INFO - [Iter 435/572, Epoch 3] valid accuracy=8.6824e-01, log_loss=4.5767e-01, accuracy=8.6824e-01, time spent=0.780s, total_time=1.13min
    2021-02-23 19:43:38,196 - root - INFO - [Iter 450/572, Epoch 3] train loss=2.6275e-01, gnorm=1.0901e+01, lr=1.1845e-05, #samples processed=720, #sample per second=326.10
    2021-02-23 19:43:38,967 - root - INFO - [Iter 450/572, Epoch 3] valid accuracy=8.6736e-01, log_loss=4.6367e-01, accuracy=8.6736e-01, time spent=0.771s, total_time=1.17min
    2021-02-23 19:43:40,466 - root - INFO - [Iter 465/572, Epoch 3] train loss=2.9477e-01, gnorm=6.8845e+00, lr=1.0388e-05, #samples processed=720, #sample per second=317.20
    2021-02-23 19:43:41,352 - root - INFO - [Iter 465/572, Epoch 3] valid accuracy=8.6998e-01, log_loss=4.3869e-01, accuracy=8.6998e-01, time spent=0.754s, total_time=1.21min
    2021-02-23 19:43:42,808 - root - INFO - [Iter 480/572, Epoch 3] train loss=2.5355e-01, gnorm=2.8429e+00, lr=8.9320e-06, #samples processed=720, #sample per second=307.44
    2021-02-23 19:43:43,567 - root - INFO - [Iter 480/572, Epoch 3] valid accuracy=8.6911e-01, log_loss=4.5120e-01, accuracy=8.6911e-01, time spent=0.758s, total_time=1.25min
    2021-02-23 19:43:45,077 - root - INFO - [Iter 495/572, Epoch 3] train loss=2.4979e-01, gnorm=4.2404e+00, lr=7.4757e-06, #samples processed=720, #sample per second=317.39
    2021-02-23 19:43:45,833 - root - INFO - [Iter 495/572, Epoch 3] valid accuracy=8.6824e-01, log_loss=4.4716e-01, accuracy=8.6824e-01, time spent=0.756s, total_time=1.28min
    2021-02-23 19:43:47,312 - root - INFO - [Iter 510/572, Epoch 3] train loss=2.6619e-01, gnorm=3.3240e+00, lr=6.0194e-06, #samples processed=720, #sample per second=322.11
    2021-02-23 19:43:48,061 - root - INFO - [Iter 510/572, Epoch 3] valid accuracy=8.6911e-01, log_loss=4.4639e-01, accuracy=8.6911e-01, time spent=0.748s, total_time=1.32min
    2021-02-23 19:43:49,506 - root - INFO - [Iter 525/572, Epoch 3] train loss=2.0779e-01, gnorm=2.5354e+00, lr=4.5631e-06, #samples processed=720, #sample per second=328.20
    2021-02-23 19:43:50,264 - root - INFO - [Iter 525/572, Epoch 3] valid accuracy=8.6911e-01, log_loss=4.5519e-01, accuracy=8.6911e-01, time spent=0.758s, total_time=1.36min
    2021-02-23 19:43:51,690 - root - INFO - [Iter 540/572, Epoch 3] train loss=2.6498e-01, gnorm=3.6586e+00, lr=3.1068e-06, #samples processed=720, #sample per second=329.82
    2021-02-23 19:43:52,433 - root - INFO - [Iter 540/572, Epoch 3] valid accuracy=8.6911e-01, log_loss=4.5044e-01, accuracy=8.6911e-01, time spent=0.743s, total_time=1.39min
    2021-02-23 19:43:53,926 - root - INFO - [Iter 555/572, Epoch 3] train loss=2.9774e-01, gnorm=4.7774e+00, lr=1.6505e-06, #samples processed=720, #sample per second=321.93
    2021-02-23 19:43:54,681 - root - INFO - [Iter 555/572, Epoch 3] valid accuracy=8.6824e-01, log_loss=4.4645e-01, accuracy=8.6824e-01, time spent=0.754s, total_time=1.43min
    2021-02-23 19:43:56,197 - root - INFO - [Iter 570/572, Epoch 3] train loss=2.0356e-01, gnorm=2.9091e+00, lr=1.9417e-07, #samples processed=720, #sample per second=317.11
    2021-02-23 19:43:56,947 - root - INFO - [Iter 570/572, Epoch 3] valid accuracy=8.6824e-01, log_loss=4.4582e-01, accuracy=8.6824e-01, time spent=0.749s, total_time=1.47min
    2021-02-23 19:43:57,899 - root - INFO - [Iter 572/572, Epoch 3] valid accuracy=8.6824e-01, log_loss=4.4583e-01, accuracy=8.6824e-01, time spent=0.760s, total_time=1.49min


.. parsed-literal::
    :class: output

    /var/lib/jenkins/workspace/workspace/autogluon-tutorial-tabular-v3/text/src/autogluon/text/text_prediction/dataset.py:321: SettingWithCopyWarning: 
    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead
    
    See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
      df[col_name] = df[col_name].fillna('').apply(str)
    /var/lib/jenkins/workspace/workspace/autogluon-tutorial-tabular-v3/venv/lib/python3.8/site-packages/mxnet/gluon/block.py:995: UserWarning: The 3-th input to HybridBlock is not used by any computation. Is this intended?
      self._build_cache(*args)
    Beginning AutoGluon training ...
    AutoGluon will save models to "model5/"
    AutoGluon Version:  0.1.0b20210223
    Train Data Rows:    5727
    Train Data Columns: 2
    Preprocessing data ...
    AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == int, but few unique label-values observed).
    	4 unique label values:  [3, 2, 1, 0]
    	If 'multiclass' is not the correct problem_type, please manually specify the problem_type argument in fit() (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
    Train Data Class Count: 4
    Using Feature Generators to preprocess the data ...
    Fitting AutoMLPipelineFeatureGenerator...
    	Available Memory:                    13323.89 MB
    	Train Data (Original)  Memory Usage: 1.0 MB (0.0% of available memory)
    	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
    	Stage 1 Generators:
    		Fitting AsTypeFeatureGenerator...
    	Stage 2 Generators:
    		Fitting FillNaFeatureGenerator...
    	Stage 3 Generators:
    		Fitting IdentityFeatureGenerator...
    		Fitting CategoryFeatureGenerator...
    			Fitting CategoryMemoryMinimizeFeatureGenerator...
    		Fitting TextSpecialFeatureGenerator...
    			Fitting BinnedFeatureGenerator...
    			Fitting DropDuplicatesFeatureGenerator...
    		Fitting TextNgramFeatureGenerator...
    			Fitting CountVectorizer for text features: ['Product_Description']
    			CountVectorizer fit with vocabulary size = 725
    		Warning: Due to memory constraints, ngram feature count is being reduced. Allocate more memory to maximize model quality.
    		Reducing Vectorizer vocab size from 725 to 259 to avoid OOM error
    	Stage 4 Generators:
    		Fitting DropUniqueFeatureGenerator...
    	Types of features in original data (raw dtype, special dtypes):
    		('int', [])          : 1 | ['Product_Type']
    		('object', ['text']) : 1 | ['Product_Description']
    	Types of features in processed data (raw dtype, special dtypes):
    		('int', [])                         :   1 | ['Product_Type']
    		('int', ['binned', 'text_special']) :  38 | ['Product_Description.char_count', 'Product_Description.word_count', 'Product_Description.capital_ratio', 'Product_Description.lower_ratio', 'Product_Description.digit_ratio', ...]
    		('int', ['text_ngram'])             : 260 | ['__nlp__.about', '__nlp__.all', '__nlp__.amp', '__nlp__.an', '__nlp__.an ipad', ...]
    	2.1s = Fit runtime
    	2 features in original data used to generate 299 features in processed data.
    	Train Data (Processed) Memory Usage: 1.77 MB (0.0% of available memory)
    Data preprocessing and feature engineering runtime = 2.17s ...
    AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
    	To change this, specify the eval_metric argument of fit()
    Automatically generating train/validation split with holdout_frac=0.1, Train Rows: 5154, Val Rows: 573
    Fitting model: NeuralNetMXNet ...
    	0.8743	 = Validation accuracy score
    	5.12s	 = Training runtime
    	0.04s	 = Validation runtime
    Fitting model: NeuralNetFastAI ...


.. parsed-literal::
    :class: output

    	0.8499	 = Validation accuracy score
    	17.51s	 = Training runtime
    	0.28s	 = Validation runtime
    Fitting model: KNeighborsUnif ...
    	0.8534	 = Validation accuracy score
    	0.02s	 = Training runtime
    	0.02s	 = Validation runtime
    Fitting model: KNeighborsDist ...
    	0.8534	 = Validation accuracy score
    	0.02s	 = Training runtime
    	0.02s	 = Validation runtime
    Fitting model: RandomForestGini ...
    	0.8796	 = Validation accuracy score
    	0.96s	 = Training runtime
    	0.08s	 = Validation runtime
    Fitting model: RandomForestEntr ...
    	0.8761	 = Validation accuracy score
    	1.0s	 = Training runtime
    	0.08s	 = Validation runtime
    Fitting model: ExtraTreesGini ...
    	0.8534	 = Validation accuracy score
    	1.06s	 = Training runtime
    	0.08s	 = Validation runtime
    Fitting model: ExtraTreesEntr ...
    	0.8499	 = Validation accuracy score
    	1.08s	 = Training runtime
    	0.08s	 = Validation runtime
    Fitting model: LightGBM ...
    	0.8778	 = Validation accuracy score
    	1.31s	 = Training runtime
    	0.01s	 = Validation runtime
    Fitting model: LightGBMXT ...
    	0.8586	 = Validation accuracy score
    	1.28s	 = Training runtime
    	0.01s	 = Validation runtime
    Fitting model: CatBoost ...
    	0.8726	 = Validation accuracy score
    	0.9s	 = Training runtime
    	0.01s	 = Validation runtime
    Fitting model: XGBoost ...
    	0.8778	 = Validation accuracy score
    	2.42s	 = Training runtime
    	0.01s	 = Validation runtime
    Fitting model: LightGBMLarge ...
    	0.8848	 = Validation accuracy score
    	4.02s	 = Training runtime
    	0.01s	 = Validation runtime


.. parsed-literal::
    :class: output

    Fitting model: WeightedEnsemble_L1 ...
    	0.8901	 = Validation accuracy score
    	0.4s	 = Training runtime
    	0.0s	 = Validation runtime
    AutoGluon training complete, total runtime = 51.48s ...
    TabularPredictor saved. To load, use: TabularPredictor.load("model5/")


.. code:: python

    predictor_model5.leaderboard(dev_df.join(pd.DataFrame(dev_sentence_multimodal_embeddings)), silent=True)



.. raw:: html

    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }
    
        .dataframe tbody tr th {
            vertical-align: top;
        }
    
        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>model</th>
          <th>score_test</th>
          <th>score_val</th>
          <th>pred_time_test</th>
          <th>pred_time_val</th>
          <th>fit_time</th>
          <th>pred_time_test_marginal</th>
          <th>pred_time_val_marginal</th>
          <th>fit_time_marginal</th>
          <th>stack_level</th>
          <th>can_infer</th>
          <th>fit_order</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>CatBoost</td>
          <td>0.886970</td>
          <td>0.872600</td>
          <td>0.014327</td>
          <td>0.005457</td>
          <td>0.895682</td>
          <td>0.014327</td>
          <td>0.005457</td>
          <td>0.895682</td>
          <td>0</td>
          <td>True</td>
          <td>11</td>
        </tr>
        <tr>
          <th>1</th>
          <td>RandomForestGini</td>
          <td>0.886970</td>
          <td>0.879581</td>
          <td>0.120903</td>
          <td>0.079546</td>
          <td>0.959055</td>
          <td>0.120903</td>
          <td>0.079546</td>
          <td>0.959055</td>
          <td>0</td>
          <td>True</td>
          <td>5</td>
        </tr>
        <tr>
          <th>2</th>
          <td>WeightedEnsemble_L1</td>
          <td>0.886970</td>
          <td>0.890052</td>
          <td>10.760290</td>
          <td>0.648267</td>
          <td>29.524343</td>
          <td>0.006635</td>
          <td>0.000556</td>
          <td>0.402977</td>
          <td>1</td>
          <td>True</td>
          <td>14</td>
        </tr>
        <tr>
          <th>3</th>
          <td>LightGBM</td>
          <td>0.885400</td>
          <td>0.877836</td>
          <td>0.012522</td>
          <td>0.006084</td>
          <td>1.307731</td>
          <td>0.012522</td>
          <td>0.006084</td>
          <td>1.307731</td>
          <td>0</td>
          <td>True</td>
          <td>9</td>
        </tr>
        <tr>
          <th>4</th>
          <td>NeuralNetMXNet</td>
          <td>0.885400</td>
          <td>0.874346</td>
          <td>0.043936</td>
          <td>0.037668</td>
          <td>5.120788</td>
          <td>0.043936</td>
          <td>0.037668</td>
          <td>5.120788</td>
          <td>0</td>
          <td>True</td>
          <td>1</td>
        </tr>
        <tr>
          <th>5</th>
          <td>LightGBMLarge</td>
          <td>0.883830</td>
          <td>0.884817</td>
          <td>0.028177</td>
          <td>0.007567</td>
          <td>4.015435</td>
          <td>0.028177</td>
          <td>0.007567</td>
          <td>4.015435</td>
          <td>0</td>
          <td>True</td>
          <td>13</td>
        </tr>
        <tr>
          <th>6</th>
          <td>KNeighborsUnif</td>
          <td>0.883830</td>
          <td>0.853403</td>
          <td>0.031159</td>
          <td>0.019027</td>
          <td>0.017875</td>
          <td>0.031159</td>
          <td>0.019027</td>
          <td>0.017875</td>
          <td>0</td>
          <td>True</td>
          <td>3</td>
        </tr>
        <tr>
          <th>7</th>
          <td>KNeighborsDist</td>
          <td>0.883830</td>
          <td>0.853403</td>
          <td>0.040395</td>
          <td>0.017965</td>
          <td>0.017411</td>
          <td>0.040395</td>
          <td>0.017965</td>
          <td>0.017411</td>
          <td>0</td>
          <td>True</td>
          <td>4</td>
        </tr>
        <tr>
          <th>8</th>
          <td>XGBoost</td>
          <td>0.883830</td>
          <td>0.877836</td>
          <td>0.113737</td>
          <td>0.011080</td>
          <td>2.421276</td>
          <td>0.113737</td>
          <td>0.011080</td>
          <td>2.421276</td>
          <td>0</td>
          <td>True</td>
          <td>12</td>
        </tr>
        <tr>
          <th>9</th>
          <td>RandomForestEntr</td>
          <td>0.883830</td>
          <td>0.876091</td>
          <td>0.119312</td>
          <td>0.080413</td>
          <td>0.995966</td>
          <td>0.119312</td>
          <td>0.080413</td>
          <td>0.995966</td>
          <td>0</td>
          <td>True</td>
          <td>6</td>
        </tr>
        <tr>
          <th>10</th>
          <td>ExtraTreesGini</td>
          <td>0.875981</td>
          <td>0.853403</td>
          <td>0.177299</td>
          <td>0.080179</td>
          <td>1.055089</td>
          <td>0.177299</td>
          <td>0.080179</td>
          <td>1.055089</td>
          <td>0</td>
          <td>True</td>
          <td>7</td>
        </tr>
        <tr>
          <th>11</th>
          <td>NeuralNetFastAI</td>
          <td>0.874411</td>
          <td>0.849913</td>
          <td>10.059017</td>
          <td>0.280836</td>
          <td>17.511071</td>
          <td>10.059017</td>
          <td>0.280836</td>
          <td>17.511071</td>
          <td>0</td>
          <td>True</td>
          <td>2</td>
        </tr>
        <tr>
          <th>12</th>
          <td>LightGBMXT</td>
          <td>0.871272</td>
          <td>0.858639</td>
          <td>0.012264</td>
          <td>0.006685</td>
          <td>1.283650</td>
          <td>0.012264</td>
          <td>0.006685</td>
          <td>1.283650</td>
          <td>0</td>
          <td>True</td>
          <td>10</td>
        </tr>
        <tr>
          <th>13</th>
          <td>ExtraTreesEntr</td>
          <td>0.869702</td>
          <td>0.849913</td>
          <td>0.169439</td>
          <td>0.082980</td>
          <td>1.080275</td>
          <td>0.169439</td>
          <td>0.082980</td>
          <td>1.080275</td>
          <td>0</td>
          <td>True</td>
          <td>8</td>
        </tr>
      </tbody>
    </table>
    </div>
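
The ``leaderboard`` call above attaches the embedding matrix to ``dev_df``
with ``DataFrame.join``, which aligns rows by index label rather than by
position. The following is a minimal sketch with synthetic data (the frame,
the embedding array, and all values are made up for illustration and are
not taken from the tutorial) showing why it can be safer to build the
embedding frame with the same index as the table it is joined to.

.. code:: python

    import numpy as np
    import pandas as pd

    # Hypothetical stand-ins for a data split and its sentence embeddings.
    df = pd.DataFrame(
        {"Product_Type": [9, 6, 2], "Sentiment": [2, 1, 3]},
        index=[10, 11, 12],  # a non-default index, e.g. left over from a split
    )
    embeddings = np.arange(12, dtype=np.float32).reshape(3, 4)

    # A default RangeIndex (0, 1, 2) does not match df's index (10, 11, 12),
    # so every joined embedding value is NaN.
    misaligned = df.join(pd.DataFrame(embeddings))

    # Reusing df's index keeps each embedding row with its own example.
    aligned = df.join(pd.DataFrame(embeddings, index=df.index))

    print(misaligned.isna().sum().sum())
    print(aligned.isna().sum().sum())

If the data frame already has a default ``RangeIndex``, both forms give the
same result.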


Model 6: Use a larger backbone
------------------------------

Next, we switch to a larger backbone: ELECTRA-base. Performance improves
with the larger backbone, but training takes longer and inference is
more expensive.

.. code:: python

    from autogluon.text.text_prediction.text_prediction import ag_text_prediction_params

    # Preset for the ELECTRA-base backbone without hyperparameter search (per its name).
    text_nn_params = ag_text_prediction_params.create('default_electra_base_no_hpo')

    # Train LightGBM (default and extra-trees variants), CatBoost,
    # and the multimodal text network in one TabularPredictor.
    tabular_multimodel_hparam_v2 = {
        'GBM': [{}, {'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}],
        'CAT': {},
        'TEXT_NN_V1': text_nn_params,
    }

    predictor_model6 = TabularPredictor(label=label, eval_metric='accuracy', path='model6').fit(
        train_df, hyperparameters=tabular_multimodel_hparam_v2
    )


.. parsed-literal::
    :class: output

    Beginning AutoGluon training ...
    AutoGluon will save models to "model6/"
    AutoGluon Version:  0.1.0b20210223
    Train Data Rows:    5727
    Train Data Columns: 2
    Preprocessing data ...
    AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == int, but few unique label-values observed).
    	4 unique label values:  [3, 2, 1, 0]
    	If 'multiclass' is not the correct problem_type, please manually specify the problem_type argument in fit() (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
    Train Data Class Count: 4
    Using Feature Generators to preprocess the data ...
    Fitting AutoMLPipelineFeatureGenerator...
    	Available Memory:                    13221.79 MB
    	Train Data (Original)  Memory Usage: 1.0 MB (0.0% of available memory)
    	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
    	Stage 1 Generators:
    		Fitting AsTypeFeatureGenerator...
    	Stage 2 Generators:
    		Fitting FillNaFeatureGenerator...
    	Stage 3 Generators:
    		Fitting IdentityFeatureGenerator...
    		Fitting IdentityFeatureGenerator...
    			Fitting RenameFeatureGenerator...
    		Fitting CategoryFeatureGenerator...
    			Fitting CategoryMemoryMinimizeFeatureGenerator...
    		Fitting TextSpecialFeatureGenerator...
    			Fitting BinnedFeatureGenerator...
    			Fitting DropDuplicatesFeatureGenerator...
    		Fitting TextNgramFeatureGenerator...
    			Fitting CountVectorizer for text features: ['Product_Description']
    			CountVectorizer fit with vocabulary size = 725
    		Warning: Due to memory constraints, ngram feature count is being reduced. Allocate more memory to maximize model quality.
    		Reducing Vectorizer vocab size from 725 to 256 to avoid OOM error
    	Stage 4 Generators:
    		Fitting DropUniqueFeatureGenerator...
    	Types of features in original data (raw dtype, special dtypes):
    		('int', [])          : 1 | ['Product_Type']
    		('object', ['text']) : 1 | ['Product_Description']
    	Types of features in processed data (raw dtype, special dtypes):
    		('int', [])                         :   1 | ['Product_Type']
    		('int', ['binned', 'text_special']) :  38 | ['Product_Description.char_count', 'Product_Description.word_count', 'Product_Description.capital_ratio', 'Product_Description.lower_ratio', 'Product_Description.digit_ratio', ...]
    		('int', ['text_ngram'])             : 257 | ['__nlp__.about', '__nlp__.all', '__nlp__.amp', '__nlp__.an', '__nlp__.an ipad', ...]
    		('object', ['text'])                :   1 | ['Product_Description_raw_text']
    	2.2s = Fit runtime
    	2 features in original data used to generate 297 features in processed data.
    	Train Data (Processed) Memory Usage: 2.71 MB (0.0% of available memory)
    Data preprocessing and feature engineering runtime = 2.21s ...
    AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
    	To change this, specify the eval_metric argument of fit()
    Automatically generating train/validation split with holdout_frac=0.1, Train Rows: 5154, Val Rows: 573
    Fitting model: LightGBM ...
    	0.8778	 = Validation accuracy score
    	1.16s	 = Training runtime
    	0.01s	 = Validation runtime
    Fitting model: LightGBMXT ...
    	0.8569	 = Validation accuracy score
    	1.37s	 = Training runtime
    	0.02s	 = Validation runtime
    Fitting model: CatBoost ...
    	0.8726	 = Validation accuracy score
    	0.9s	 = Training runtime
    	0.02s	 = Validation runtime
    Fitting model: TextNeuralNetV1 ...
    All Logs will be saved to model6/models/TextNeuralNetV1/TextNeuralNetV1/main.log
    Starting Hyperparameter Tuning ... (num_trials=1)


.. parsed-literal::
    :class: output

      0%|          | 0/1 [00:00<?, ?it/s]


.. parsed-literal::
    :class: output

    /var/lib/jenkins/workspace/workspace/autogluon-tutorial-tabular-v3/venv/lib/python3.8/site-packages/mxnet/gluon/block.py:995: UserWarning: The 3-th input to HybridBlock is not used by any computation. Is this intended?
      self._build_cache(*args)
    	0.9162	 = Validation accuracy score
    	319.17s	 = Training runtime
    	2.26s	 = Validation runtime
    /var/lib/jenkins/workspace/workspace/autogluon-tutorial-tabular-v3/venv/lib/python3.8/site-packages/mxnet/gluon/block.py:995: UserWarning: The 3-th input to HybridBlock is not used by any computation. Is this intended?
      self._build_cache(*args)
    Fitting model: WeightedEnsemble_L1 ...
    	0.9162	 = Validation accuracy score
    	0.14s	 = Training runtime
    	0.0s	 = Validation runtime
    AutoGluon training complete, total runtime = 331.01s ...
    TabularPredictor saved. To load, use: TabularPredictor.load("model6/")


.. code:: python

    predictor_model6.leaderboard(dev_df, silent=True)


.. parsed-literal::
    :class: output

    /var/lib/jenkins/workspace/workspace/autogluon-tutorial-tabular-v3/venv/lib/python3.8/site-packages/mxnet/gluon/block.py:995: UserWarning: The 3-th input to HybridBlock is not used by any computation. Is this intended?
      self._build_cache(*args)


.. raw:: html

    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }
    
        .dataframe tbody tr th {
            vertical-align: top;
        }
    
        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>model</th>
          <th>score_test</th>
          <th>score_val</th>
          <th>pred_time_test</th>
          <th>pred_time_val</th>
          <th>fit_time</th>
          <th>pred_time_test_marginal</th>
          <th>pred_time_val_marginal</th>
          <th>fit_time_marginal</th>
          <th>stack_level</th>
          <th>can_infer</th>
          <th>fit_order</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>TextNeuralNetV1</td>
          <td>0.899529</td>
          <td>0.916230</td>
          <td>3.329292</td>
          <td>2.263031</td>
          <td>319.174685</td>
          <td>3.329292</td>
          <td>2.263031</td>
          <td>319.174685</td>
          <td>0</td>
          <td>True</td>
          <td>4</td>
        </tr>
        <tr>
          <th>1</th>
          <td>WeightedEnsemble_L1</td>
          <td>0.899529</td>
          <td>0.916230</td>
          <td>3.337787</td>
          <td>2.263554</td>
          <td>319.317756</td>
          <td>0.008495</td>
          <td>0.000524</td>
          <td>0.143071</td>
          <td>1</td>
          <td>True</td>
          <td>5</td>
        </tr>
        <tr>
          <th>2</th>
          <td>CatBoost</td>
          <td>0.886970</td>
          <td>0.872600</td>
          <td>0.015280</td>
          <td>0.015155</td>
          <td>0.900599</td>
          <td>0.015280</td>
          <td>0.015155</td>
          <td>0.900599</td>
          <td>0</td>
          <td>True</td>
          <td>3</td>
        </tr>
        <tr>
          <th>3</th>
          <td>LightGBM</td>
          <td>0.885400</td>
          <td>0.877836</td>
          <td>0.007218</td>
          <td>0.006019</td>
          <td>1.159134</td>
          <td>0.007218</td>
          <td>0.006019</td>
          <td>1.159134</td>
          <td>0</td>
          <td>True</td>
          <td>1</td>
        </tr>
        <tr>
          <th>4</th>
          <td>LightGBMXT</td>
          <td>0.868132</td>
          <td>0.856894</td>
          <td>0.013989</td>
          <td>0.017886</td>
          <td>1.365870</td>
          <td>0.013989</td>
          <td>0.017886</td>
          <td>1.365870</td>
          <td>0</td>
          <td>True</td>
          <td>2</td>
        </tr>
      </tbody>
    </table>
    </div>
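
To put the trade-off in numbers, the sketch below recomputes it from the
``score_test`` and ``fit_time`` values of the base models in the
leaderboard above. The values are copied by hand from that table, and the
variable names are illustrative only.

.. code:: python

    import pandas as pd

    # Values copied from the Model 6 leaderboard above (base models only).
    leaderboard = pd.DataFrame(
        {
            "model": ["TextNeuralNetV1", "CatBoost", "LightGBM", "LightGBMXT"],
            "score_test": [0.899529, 0.886970, 0.885400, 0.868132],
            "fit_time": [319.174685, 0.900599, 1.159134, 1.365870],
        }
    ).set_index("model")

    text_nn = leaderboard.loc["TextNeuralNetV1"]
    tree_models = leaderboard.drop(index="TextNeuralNetV1")
    # Pick the tree model with the highest accuracy on the dev set.
    best_tree = tree_models.loc[tree_models["score_test"].idxmax()]

    accuracy_gain = text_nn["score_test"] - best_tree["score_test"]
    fit_time_ratio = text_nn["fit_time"] / best_tree["fit_time"]

    print(f"Best tree model: {best_tree.name}")
    print(f"Accuracy gain of the text network: {accuracy_gain:.4f}")
    print(f"Fit time ratio: {fit_time_ratio:.0f}x")

On this run, the text network gains about 1.3 percentage points of dev
accuracy over the best tree model, at roughly 350 times the fit time.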


Major Takeaways
---------------

After performing these comparisons, we have the following takeaways:

-  The multimodal text neural network structure used in TextPrediction
   is a good fit for tabular data with text and categorical features.

-  K-fold bagging / stacking and weighted ensembling are helpful.

-  A larger backbone improves performance. This aligns with observations
   in recent papers, e.g., `Scaling Laws for Autoregressive Generative
   Modeling <https://arxiv.org/abs/2010.14701>`__.
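
As a rough illustration of the weighted-ensemble idea in the second
takeaway, the sketch below averages class probabilities from two
hypothetical models with fixed weights. The probabilities and weights are
invented for illustration, and this is not how AutoGluon's
``WeightedEnsemble`` chooses its weights; it only shows how weighted
predictions are combined.

.. code:: python

    import numpy as np

    # Hypothetical class probabilities for 3 examples and 4 classes.
    probs_text_nn = np.array([
        [0.10, 0.20, 0.60, 0.10],
        [0.05, 0.15, 0.30, 0.50],
        [0.40, 0.30, 0.20, 0.10],
    ])
    probs_gbm = np.array([
        [0.20, 0.50, 0.20, 0.10],
        [0.10, 0.10, 0.20, 0.60],
        [0.10, 0.60, 0.20, 0.10],
    ])

    # Assumed weights: non-negative and summing to 1.
    weights = np.array([0.7, 0.3])

    # Shape (models, examples, classes).
    stacked = np.stack([probs_text_nn, probs_gbm])
    # Weighted sum over the model axis gives shape (examples, classes).
    ensemble_probs = np.tensordot(weights, stacked, axes=1)
    ensemble_pred = ensemble_probs.argmax(axis=1)

    print(ensemble_probs)
    print(ensemble_pred)

In the third example the two models disagree, and the weighted average
settles on the class that has reasonable support from both.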