.. _sec_textquick:
Text Classification - Quick Start
=================================
Note: ``TextClassification`` is in preview mode and is not feature
complete. While the tutorial described below is functional, using
``TextClassification`` on custom datasets is not yet supported. As an
alternative, text data can be passed to ``TabularPrediction`` in tabular
format, which supports text features.
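
For example, a minimal sketch of that workaround (the file name
``train.csv`` and the ``label`` column are placeholders for your own
data, not part of this tutorial):

.. code:: python

    from autogluon import TabularPrediction as tabular_task

    # Load a tabular dataset that contains a free-text column and a label column.
    # 'train.csv' and the 'label' column name are illustrative placeholders.
    train_data = tabular_task.Dataset(file_path='train.csv')

    # Fit tabular models; text columns are converted to features by
    # TabularPrediction's default feature generation.
    tabular_predictor = tabular_task.fit(train_data=train_data, label='label')
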
We adopt the task of Text Classification as a running example to
illustrate basic usage of AutoGluon’s NLP capability.
The AutoGluon Text functionality depends on the
`GluonNLP <https://gluon-nlp.mxnet.io/>`__ package. Thus, in order to
use AutoGluon-Text, you will need to install GluonNLP via
``pip install gluonnlp==0.8.1``.
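
To confirm that the expected version is installed, a quick check
(nothing here is specific to AutoGluon):

.. code:: python

    import gluonnlp

    # AutoGluon-Text in preview mode is pinned to GluonNLP 0.8.1
    print(gluonnlp.__version__)
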
In this tutorial, we use sentiment analysis as a text classification
example. We will load sentences and their corresponding labels
(sentiment) into AutoGluon and use this data to obtain a neural network
that can classify new sentences. Unlike traditional machine learning,
where we manually define the neural network and specify the
hyperparameters used for training, a single call to AutoGluon's ``fit``
function automatically trains many models under different
hyperparameter configurations and returns the best model.
We begin by specifying ``TextClassification`` as our task of interest:
.. code:: python
import autogluon as ag
from autogluon import TextClassification as task
Create AutoGluon Dataset
------------------------
We are using a subset of the Stanford Sentiment Treebank
(`SST <https://nlp.stanford.edu/sentiment/>`__). The original dataset
consists of sentences from movie reviews together with human annotations
of their sentiment. The task is to classify whether a given sentence has
positive or negative sentiment (binary classification).
.. code:: python
dataset = task.Dataset(name='ToySST')
The above call loads the SST subset together with its predefined
train/validation/test split.
Use AutoGluon to fit Models
---------------------------
Now, we want to obtain a neural network classifier using AutoGluon. In
the default configuration, rather than attempting to train complex
models from scratch using our data, AutoGluon fine-tunes neural networks
that have already been pretrained on large-scale text datasets such as
Wikicorpus. Although our dataset involves entirely different text,
lower-level language features captured in the representations of the
pretrained network (such as word and phrase representations) are likely
to remain useful for our own text dataset.
While we primarily stick with default configurations in this Beginner
tutorial, the Advanced tutorial covers various options that you can
specify for greater control over the training process. With just a
single call to AutoGluon's ``fit`` function, AutoGluon will train many
models with different hyperparameter configurations and return the best
model.
However, neural network training can be quite time-consuming. To ensure
quick runtimes, we tell AutoGluon to obey strict limits: ``epochs``
specifies how much computational effort can be devoted to training any
single network, while ``time_limits`` (in seconds) specifies how much
time ``fit`` has to return a model (more precisely, new training runs
are started only as long as ``time_limits`` has not been reached). For
demo purposes, we specify only small values for ``epochs`` and
``time_limits``:
.. code:: python
predictor = task.fit(dataset, epochs=1, time_limits=30)
.. parsed-literal::
:class: output
`TextClassification` is in preview mode.Please feel free to request new features in issues if it is not covered in the current implementation. If your dataset is in tabular format, you could also try out our `TabularPrediction` module.
scheduler_options: Key 'training_history_callback_delta_secs': Imputing default value 60
scheduler_options: Key 'delay_get_config': Imputing default value True
Starting Experiments
Num of Finished Tasks is 0
Time out (secs) is 30
.. parsed-literal::
:class: output
scheduler: FIFOScheduler(
DistributedResourceManager{
(Remote: Remote REMOTE_ID: 0,
, Resource: NodeResourceManager(8 CPUs, 1 GPUs))
})
.. parsed-literal::
:class: output
/var/lib/jenkins/miniconda3/envs/autogluon_docs/lib/python3.7/site-packages/mxnet/optimizer/optimizer.py:167: UserWarning: WARNING: New optimizer gluonnlp.optimizer.lamb.LAMB is overriding existing optimizer mxnet.optimizer.optimizer.LAMB
Optimizer.opt_registry[name].__name__))
Using gradient accumulation. Effective batch size = batch_size * accumulate = 32
/var/lib/jenkins/miniconda3/envs/autogluon_docs/lib/python3.7/site-packages/gluonnlp/data/sampler.py:354: UserWarning: Some buckets are empty and will be removed. Unused bucket keys=[59]
str(unused_bucket_keys))
100%|██████████| 26/26 [00:14<00:00, 1.78it/s]
100%|██████████| 1/1 [00:00<00:00, 5.28it/s]
validation metrics:accuracy:0.6250
/var/lib/jenkins/miniconda3/envs/autogluon_docs/lib/python3.7/site-packages/mxnet/optimizer/optimizer.py:167: UserWarning: WARNING: New optimizer gluonnlp.optimizer.lamb.LAMB is overriding existing optimizer mxnet.optimizer.optimizer.LAMB
Optimizer.opt_registry[name].__name__))
Using gradient accumulation. Effective batch size = batch_size * accumulate = 32
/var/lib/jenkins/miniconda3/envs/autogluon_docs/lib/python3.7/site-packages/gluonnlp/data/sampler.py:354: UserWarning: Some buckets are empty and will be removed. Unused bucket keys=[59]
str(unused_bucket_keys))
100%|██████████| 26/26 [00:14<00:00, 1.78it/s]
100%|██████████| 1/1 [00:00<00:00, 6.19it/s]
validation metrics:accuracy:0.7500
/var/lib/jenkins/miniconda3/envs/autogluon_docs/lib/python3.7/site-packages/mxnet/optimizer/optimizer.py:167: UserWarning: WARNING: New optimizer gluonnlp.optimizer.lamb.LAMB is overriding existing optimizer mxnet.optimizer.optimizer.LAMB
Optimizer.opt_registry[name].__name__))
Using gradient accumulation. Effective batch size = batch_size * accumulate = 32
/var/lib/jenkins/miniconda3/envs/autogluon_docs/lib/python3.7/site-packages/gluonnlp/data/sampler.py:354: UserWarning: Some buckets are empty and will be removed. Unused bucket keys=[59]
str(unused_bucket_keys))
100%|██████████| 26/26 [00:14<00:00, 1.76it/s]
100%|██████████| 1/1 [00:00<00:00, 5.86it/s]
validation metrics:accuracy:0.6250
/var/lib/jenkins/miniconda3/envs/autogluon_docs/lib/python3.7/site-packages/mxnet/optimizer/optimizer.py:167: UserWarning: WARNING: New optimizer gluonnlp.optimizer.lamb.LAMB is overriding existing optimizer mxnet.optimizer.optimizer.LAMB
Optimizer.opt_registry[name].__name__))
Using gradient accumulation. Effective batch size = batch_size * accumulate = 32
/var/lib/jenkins/miniconda3/envs/autogluon_docs/lib/python3.7/site-packages/gluonnlp/data/sampler.py:354: UserWarning: Some buckets are empty and will be removed. Unused bucket keys=[59]
str(unused_bucket_keys))
100%|██████████| 26/26 [00:14<00:00, 1.74it/s]
100%|██████████| 1/1 [00:00<00:00, 6.07it/s]
100%|██████████| 1/1 [00:00<00:00, 6.03it/s]
validation metrics:accuracy:0.7500
/var/lib/jenkins/miniconda3/envs/autogluon_docs/lib/python3.7/site-packages/mxnet/optimizer/optimizer.py:167: UserWarning: WARNING: New optimizer gluonnlp.optimizer.lamb.LAMB is overriding existing optimizer mxnet.optimizer.optimizer.LAMB
Optimizer.opt_registry[name].__name__))
/var/lib/jenkins/miniconda3/envs/autogluon_docs/lib/python3.7/site-packages/mxnet/optimizer/optimizer.py:167: UserWarning: WARNING: New optimizer gluonnlp.optimizer.lamb.LAMB is overriding existing optimizer mxnet.optimizer.optimizer.LAMB
Optimizer.opt_registry[name].__name__))
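
The small limits above are only for demonstration; for a more thorough
search you would simply raise them. A sketch using only the parameters
already shown in this tutorial (the specific values are illustrative):

.. code:: python

    # Allow more epochs per trial and a longer overall search budget.
    # Larger budgets generally yield better models but take longer to run.
    predictor = task.fit(dataset, epochs=5, time_limits=10 * 60)
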
Within ``fit``, the model with the best hyperparameter configuration is
selected based on its validation accuracy after being trained on the
data in the training split.
The best Top-1 accuracy achieved on the validation set is:
.. code:: python
print('Top-1 val acc: %.3f' % predictor.results['best_reward'])
.. parsed-literal::
:class: output
Top-1 val acc: 0.750
Within ``fit``, this model is also finally fitted on our entire dataset
(i.e., merging training+validation) using the same optimal
hyperparameter configuration. The resulting model is considered the
final model to be applied to classify new text.
We now ``evaluate`` the final model produced by ``fit`` on the held-out
test split of the dataset:
.. code:: python
test_acc = predictor.evaluate(dataset)
print('Top-1 test acc: %.3f' % test_acc)
.. parsed-literal::
:class: output
Top-1 test acc: 0.750
Given an example sentence, we can easily use the final model to
``predict`` its label:
.. code:: python
sentence = 'I feel this is awesome!'
ind = predictor.predict(sentence)
print('The input sentence sentiment is classified as [%d].' % ind.asscalar())
.. parsed-literal::
:class: output
The input sentence sentiment is classified as [1].
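
The same call works sentence by sentence; for instance, a small loop
over a few example sentences (the sentences are illustrative, and we
assume ``predict`` accepts a single string as above):

.. code:: python

    sentences = ['This movie was a waste of time.',
                 'A wonderful, heartfelt story.']
    for s in sentences:
        ind = predictor.predict(s)
        print('%r -> class [%d]' % (s, ind.asscalar()))
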
The ``results`` attribute of the predictor returned by ``fit`` contains
summaries describing various aspects of the training process. For
example, we can inspect the best hyperparameter configuration
corresponding to the final model which achieved the above results:
.. code:: python
print('The best configuration is:')
print(predictor.results['best_config'])
.. parsed-literal::
:class: output
The best configuration is:
{'lr': 4.442672673345054e-05, 'net.choice': 0, 'pretrained_dataset.choice': 0}
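
Beyond ``best_reward`` and ``best_config``, other summary entries are
available; assuming ``results`` behaves like a dictionary (as the
indexing above suggests), you can list them directly:

.. code:: python

    # Print every summary key stored in the results dictionary.
    for key in predictor.results:
        print(key)
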