.. _sec_textquick:

Text Classification - Quick Start
=================================

.. note::

   ``TextClassification`` is in preview mode and is not feature complete. While the tutorial described below is functional, using ``TextClassification`` on custom datasets is not yet supported. As an alternative, text data can be passed to ``TabularPrediction`` in tabular format, which has text feature support (a minimal sketch of this route appears at the end of this tutorial).

We adopt the task of text classification as a running example to illustrate basic usage of AutoGluon's NLP capability. The AutoGluon Text functionality depends on the `GluonNLP <https://gluon-nlp.mxnet.io/>`__ package, so in order to use AutoGluon-Text you will need to install GluonNLP via ``pip install gluonnlp==0.8.1``.

In this tutorial, we use sentiment analysis as a text classification example. We will load sentences and their corresponding labels (sentiments) into AutoGluon and use this data to obtain a neural network that can classify new sentences. Unlike traditional machine learning, where we must manually define the neural network and specify the hyperparameters of the training process, with just a single call to AutoGluon's ``fit`` function AutoGluon automatically trains many models under thousands of different hyperparameter configurations and then returns the best one.

We begin by specifying ``TextClassification`` as our task of interest:

.. code:: python

    import autogluon as ag
    from autogluon import TextClassification as task

Create AutoGluon Dataset
------------------------

We use a subset of the Stanford Sentiment Treebank (`SST <https://nlp.stanford.edu/sentiment/>`__). The original dataset consists of sentences from movie reviews together with human annotations of their sentiment. The task is to classify whether a given sentence expresses positive or negative sentiment (binary classification).

.. code:: python

    dataset = task.Dataset(name='ToySST')

The above call gives us the proper train/validation/test split of the SST dataset.

Use AutoGluon to fit Models
---------------------------

Now we want to obtain a neural network classifier using AutoGluon. In its default configuration, rather than attempting to train complex models from scratch on our data, AutoGluon fine-tunes neural networks that have already been pretrained on a large-scale text dataset such as Wikicorpus. Although our dataset involves entirely different text, lower-level language features captured in the representations of the pretrained network (such as word- and phrase-level patterns) are likely to remain useful for our own text dataset.

While we primarily stick with default configurations in this Beginner tutorial, the Advanced tutorial covers various options that you can specify for greater control over the training process.

A single call to AutoGluon's ``fit`` function trains many models with different hyperparameter configurations and returns the best one. However, neural network training can be quite time-costly. To ensure quick runtimes, we tell AutoGluon to obey strict limits: ``epochs`` specifies how much computational effort can be devoted to training any single network, while ``time_limits`` (in seconds) specifies how much time ``fit`` has to return a model (more precisely, new training runs are started only as long as ``time_limits`` has not been reached). For demo purposes, we specify only small values for ``time_limits`` and ``epochs``:

.. code:: python

    predictor = task.fit(dataset, epochs=1, time_limits=30)
.. parsed-literal::
    :class: output

    `TextClassification` is in preview mode. Please feel free to request new features in issues if it is not covered in the current implementation. If your dataset is in tabular format, you could also try out our `TabularPrediction` module.
    scheduler_options: Key 'training_history_callback_delta_secs': Imputing default value 60
    scheduler_options: Key 'delay_get_config': Imputing default value True
    Starting Experiments
    Num of Finished Tasks is 0
    Time out (secs) is 30

.. parsed-literal::
    :class: output

    scheduler: FIFOScheduler(
    DistributedResourceManager{
    (Remote: Remote REMOTE_ID: 0,
        , Resource: NodeResourceManager(8 CPUs, 1 GPUs))
    })

.. parsed-literal::
    :class: output

    /var/lib/jenkins/miniconda3/envs/autogluon_docs/lib/python3.7/site-packages/mxnet/optimizer/optimizer.py:167: UserWarning: WARNING: New optimizer gluonnlp.optimizer.lamb.LAMB is overriding existing optimizer mxnet.optimizer.optimizer.LAMB
      Optimizer.opt_registry[name].__name__))
    Using gradient accumulation. Effective batch size = batch_size * accumulate = 32
    100%|██████████| 27/27 [00:14<00:00,  1.83it/s]
    100%|██████████| 1/1 [00:00<00:00,  5.97it/s]
    validation metrics:accuracy:0.3750
    /var/lib/jenkins/miniconda3/envs/autogluon_docs/lib/python3.7/site-packages/mxnet/optimizer/optimizer.py:167: UserWarning: WARNING: New optimizer gluonnlp.optimizer.lamb.LAMB is overriding existing optimizer mxnet.optimizer.optimizer.LAMB
      Optimizer.opt_registry[name].__name__))
    Using gradient accumulation. Effective batch size = batch_size * accumulate = 32
    0%|          | 0/1 [00:00
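Within ``fit``, the model with the best hyperparameter configuration is selected based on its validation accuracy (reported in the output above). To sanity-check the returned predictor, we can measure its accuracy on the held-out test split. A minimal sketch, assuming the ``evaluate`` method of this preview release accepts the same ``Dataset`` object and returns a single accuracy value:

.. code:: python

    # Hedged sketch: `evaluate` is assumed to score the predictor on the
    # test split bundled with the Dataset and to return its accuracy.
    test_acc = predictor.evaluate(dataset)
    print('Top-1 test accuracy: %.3f' % test_acc)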
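The trained predictor can also classify new sentences directly. A minimal sketch, assuming ``predict`` accepts a raw string and returns the predicted class index as an MXNet array (the example sentence is illustrative):

.. code:: python

    # Hedged sketch: classify a new, unseen sentence with the fitted predictor.
    sentence = 'I feel this is awesome!'
    ind = predictor.predict(sentence)
    # `asscalar()` converts the 1-element MXNet NDArray to a Python number.
    print('The input sentence sentiment is classified as [%d].' % ind.asscalar())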
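Finally, as the note at the top of this tutorial mentions, custom text datasets are not yet supported by ``TextClassification``; the suggested workaround is to pass text data to ``TabularPrediction``. A minimal sketch of that route, assuming the preview-era ``Dataset(df=...)`` and ``fit(train_data=..., label=...)`` signatures; the DataFrame contents and column names are illustrative only:

.. code:: python

    import pandas as pd
    from autogluon import TabularPrediction as tabular_task

    # Illustrative toy data; a real dataset would need many more rows.
    train_df = pd.DataFrame({
        'sentence': ['an utterly charming film', 'a dull, lifeless mess'],
        'label': [1, 0],
    })

    # `TabularPrediction` treats the 'sentence' column as a text feature
    # and learns to predict the 'label' column.
    train_data = tabular_task.Dataset(df=train_df)
    tabular_predictor = tabular_task.fit(train_data=train_data, label='label')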