Text Prediction - Customized Hyperparameter Search¶
This tutorial teaches you how to control the hyperparameter tuning process in TextPrediction by specifying:
A custom search space of candidate hyperparameter values to consider.
Which hyperparameter optimization (HPO) algorithm should be used to search through this space.
import numpy as np
import warnings
warnings.filterwarnings('ignore')
np.random.seed(123)
Paraphrase Identification¶
We consider a Paraphrase Identification task for illustration. Given a pair of sentences, the goal is to predict whether or not one sentence is a restatement of the other (a binary classification task). Here we train models on the Microsoft Research Paraphrase Corpus dataset.
from autogluon.core.utils.loaders import load_pd
train_data = load_pd.load('https://autogluon-text.s3-accelerate.amazonaws.com/glue/mrpc/train.parquet')
dev_data = load_pd.load('https://autogluon-text.s3-accelerate.amazonaws.com/glue/mrpc/dev.parquet')
train_data.head(10)
| | sentence1 | sentence2 | label |
|---|---|---|---|
| 0 | Amrozi accused his brother , whom he called " ... | Referring to him as only " the witness " , Amr... | 1 |
| 1 | Yucaipa owned Dominick 's before selling the c... | Yucaipa bought Dominick 's in 1995 for $ 693 m... | 0 |
| 2 | They had published an advertisement on the Int... | On June 10 , the ship 's owners had published ... | 1 |
| 3 | Around 0335 GMT , Tab shares were up 19 cents ... | Tab shares jumped 20 cents , or 4.6 % , to set... | 0 |
| 4 | The stock rose $ 2.11 , or about 11 percent , ... | PG & E Corp. shares jumped $ 1.63 or 8 percent... | 1 |
| 5 | Revenue in the first quarter of the year dropp... | With the scandal hanging over Stewart 's compa... | 1 |
| 6 | The Nasdaq had a weekly gain of 17.27 , or 1.2... | The tech-laced Nasdaq Composite .IXIC rallied ... | 0 |
| 7 | The DVD-CCA then appealed to the state Supreme... | The DVD CCA appealed that decision to the U.S.... | 1 |
| 8 | That compared with $ 35.18 million , or 24 cen... | Earnings were affected by a non-recurring $ 8 ... | 0 |
| 9 | Shares of Genentech , a much larger company wi... | Shares of Xoma fell 16 percent in early trade ... | 0 |
from autogluon_contrib_nlp.data.tokenizers import MosesTokenizer
tokenizer = MosesTokenizer('en') # just used to display sentences
row_index = 2
print('Paraphrase example:')
print('Sentence1: ', tokenizer.decode(train_data['sentence1'][row_index].split()))
print('Sentence2: ', tokenizer.decode(train_data['sentence2'][row_index].split()))
print('Label: ', train_data['label'][row_index])
row_index = 3
print('\nNot Paraphrase example:')
print('Sentence1:', tokenizer.decode(train_data['sentence1'][row_index].split()))
print('Sentence2:', tokenizer.decode(train_data['sentence2'][row_index].split()))
print('Label:', train_data['label'][row_index])
Paraphrase example:
Sentence1: They had published an advertisement on the Internet on June 10, offering the cargo for sale, he added.
Sentence2: On June 10, the ship's owners had published an advertisement on the Internet, offering the explosives for sale.
Label: 1
Not Paraphrase example:
Sentence1: Around 0335 GMT, Tab shares were up 19 cents, or 4.4%, at A $4.56, having earlier set a record high of A $4.57.
Sentence2: Tab shares jumped 20 cents, or 4.6%, to set a record closing high at A $4.57.
Label: 0
Perform HPO over a Customized Search Space with Random Search¶
To control which hyperparameter values are considered during fit(), we specify the hyperparameters argument. Rather than fixing a particular value for a hyperparameter, we can specify a space of values to search over via ag.space. We can also specify which HPO algorithm to use for the search via search_strategy (a simple random search is specified below; a minimal illustration of random sampling follows the list). In this example, we search for good values of the following hyperparameters:
warmup proportion
learning rate
dropout before the first task-specific layer
layer-wise learning rate decay
number of task-specific layers
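For intuition, here is a minimal standalone sketch (plain Python, not AutoGluon's internal code) of what random search does with a space like the one declared below: each trial independently draws one value per hyperparameter from its declared range. The names and samplers here are simplified stand-ins for the ag.space objects.

import random
random.seed(123)

# Hypothetical stand-ins for the ag.space declarations used below: each entry
# maps a hyperparameter name to a sampler over its allowed values.
search_space = {
    'agg_net.num_layers': lambda: random.randint(0, 3),            # ~ ag.space.Int(0, 3)
    'agg_net.data_dropout': lambda: random.choice([False, True]),  # ~ ag.space.Categorical(False, True)
    'optimization.warmup_portion': lambda: random.uniform(0.1, 0.2),
    'optimization.layerwise_lr_decay': lambda: random.uniform(0.8, 1.0),
    'optimization.lr': lambda: random.uniform(1e-5, 1e-4),
}

# Random search: every trial is an independent draw from the full space.
for trial in range(3):
    config = {name: sample() for name, sample in search_space.items()}
    print('Trial {}: {}'.format(trial, config))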
import autogluon.core as ag
from autogluon.text import TextPrediction as task

hyperparameters = {
    'models': {
        'BertForTextPredictionBasic': {
            'search_space': {
                'model.network.agg_net.num_layers': ag.space.Int(0, 3),
                'model.network.agg_net.data_dropout': ag.space.Categorical(False, True),
                'optimization.num_train_epochs': 4,
                'optimization.warmup_portion': ag.space.Real(0.1, 0.2),
                'optimization.layerwise_lr_decay': ag.space.Real(0.8, 1.0),
                'optimization.lr': ag.space.Real(1E-5, 1E-4)
            }
        },
    },
    'hpo_params': {
        'search_strategy': 'random'  # perform HPO via simple random search
    }
}
We can now call fit() with hyperparameter tuning over our custom search space. Below, num_trials controls the maximum number of different hyperparameter configurations for which AutoGluon will train models (5 models are trained under different hyperparameter configurations in this case). To achieve good performance in your applications, you should use larger values of num_trials, which may identify superior hyperparameter values but will require longer runtimes.
predictor_mrpc = task.fit(train_data,
                          label='label',
                          hyperparameters=hyperparameters,
                          num_trials=5,  # increase this to achieve good performance in your applications
                          time_limits=60 * 6,
                          ngpus_per_trial=1,
                          seed=123,
                          output_directory='./ag_mrpc_random_search')
2020-11-30 15:50:05,591 - root - INFO - All Logs will be saved to ./ag_mrpc_random_search/ag_text_prediction.log
2020-11-30 15:50:05,608 - root - INFO - Train Dataset:
2020-11-30 15:50:05,608 - root - INFO - Columns:
- Text(
name="sentence1"
#total/missing=2934/0
length, min/avg/max=38/118.12/220
)
- Text(
name="sentence2"
#total/missing=2934/0
length, min/avg/max=42/118.65/215
)
- Categorical(
name="label"
#total/missing=2934/0
num_class (total/non_special)=2/2
categories=[0, 1]
freq=[962, 1972]
)
2020-11-30 15:50:05,608 - root - INFO - Tuning Dataset:
2020-11-30 15:50:05,609 - root - INFO - Columns:
- Text(
name="sentence1"
#total/missing=734/0
length, min/avg/max=38/119.92/226
)
- Text(
name="sentence2"
#total/missing=734/0
length, min/avg/max=51/119.23/210
)
- Categorical(
name="label"
#total/missing=734/0
num_class (total/non_special)=2/2
categories=[0, 1]
freq=[232, 502]
)
2020-11-30 15:50:05,609 - root - INFO - Label columns=['label'], Feature columns=['sentence1', 'sentence2'], Problem types=['classification'], Label shapes=[2]
2020-11-30 15:50:05,610 - root - INFO - Eval Metric=acc, Stop Metric=acc, Log Metrics=['f1', 'mcc', 'auc', 'acc', 'nll']
100%|██████████| 368/368 [01:16<00:00, 2.49it/s]
84%|████████▍ | 309/368 [01:05<00:12, 4.71it/s]
92%|█████████▏| 339/368 [01:12<00:06, 4.70it/s]
100%|██████████| 368/368 [01:17<00:00, 4.77it/s]
73%|███████▎ | 269/368 [00:57<00:21, 4.70it/s]
We can now evaluate the tuned model's performance on the held-out development data.
dev_score = predictor_mrpc.evaluate(dev_data, metrics=['acc', 'f1'])
print('Best Config = {}'.format(predictor_mrpc.results['best_config']))
print('Total Time = {}s'.format(predictor_mrpc.results['total_time']))
print('Accuracy = {:.2f}%'.format(dev_score['acc'] * 100))
print('F1 = {:.2f}%'.format(dev_score['f1'] * 100))
Best Config = {'search_space▁model.network.agg_net.data_dropout▁choice': 1, 'search_space▁model.network.agg_net.num_layers': 2, 'search_space▁optimization.layerwise_lr_decay': 0.912355300099337, 'search_space▁optimization.lr': 7.14729072876085e-05, 'search_space▁optimization.warmup_portion': 0.19366799436501417}
Total Time = 387.745667219162s
Accuracy = 79.66%
F1 = 85.66%
We can also use the trained model to predict whether new sentence pairs are paraphrases of each other.
sentence1 = 'It is simple to solve NLP problems with AutoGluon.'
sentence2 = 'With AutoGluon, it is easy to solve NLP problems.'
sentence3 = 'AutoGluon gives you a very bad user experience for solving NLP problems.'
prediction1 = predictor_mrpc.predict({'sentence1': [sentence1], 'sentence2': [sentence2]})
prediction1_prob = predictor_mrpc.predict_proba({'sentence1': [sentence1], 'sentence2': [sentence2]})
print('A = "{}"'.format(sentence1))
print('B = "{}"'.format(sentence2))
print('Prediction = "{}"'.format(prediction1[0] == 1))
print('Prob = "{}"'.format(prediction1_prob[0]))
print('')
prediction2 = predictor_mrpc.predict({'sentence1': [sentence1], 'sentence2': [sentence3]})
prediction2_prob = predictor_mrpc.predict_proba({'sentence1': [sentence1], 'sentence2': [sentence3]})
print('A = "{}"'.format(sentence1))
print('B = "{}"'.format(sentence3))
print('Prediction = "{}"'.format(prediction2[0] == 1))
print('Prob = "{}"'.format(prediction2_prob[0]))
A = "It is simple to solve NLP problems with AutoGluon."
B = "With AutoGluon, it is easy to solve NLP problems."
Prediction = "True"
Prob = "[0.00844783 0.99155223]"
A = "It is simple to solve NLP problems with AutoGluon."
B = "AutoGluon gives you a very bad user experience for solving NLP problems."
Prediction = "False"
Prob = "[0.745728 0.25427192]"
Use Bayesian Optimization¶
Instead of random search, we can perform HPO via Bayesian optimization, which fits a surrogate model to the trials completed so far and uses it to propose promising new configurations. Here we set search_strategy to bayesopt.
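For intuition, the following self-contained sketch (a toy illustration, not AutoGluon's implementation) runs Bayesian optimization on a hypothetical 1-D objective: a Gaussian-process surrogate is fit to the trials evaluated so far, and each new trial is the candidate maximizing expected improvement over the best observed value. The objective function and all names are made up for illustration.

import numpy as np
from math import erf, sqrt

rng = np.random.RandomState(123)

def objective(x):
    # Hypothetical stand-in for "dev accuracy as a function of one hyperparameter".
    return float(np.exp(-(x - 0.65) ** 2 / 0.05) + 0.01 * rng.randn())

def rbf(a, b, ls=0.15):
    # Squared-exponential kernel between two 1-D point sets.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_posterior(X, y, Xq, noise=1e-4):
    # Standard Gaussian-process regression: posterior mean/std at query points Xq.
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xq)
    mu = Ks.T @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mu, sigma, best):
    z = (mu - best) / sigma
    pdf = np.exp(-0.5 * z ** 2) / sqrt(2 * np.pi)
    cdf = np.array([0.5 * (1 + erf(v / sqrt(2))) for v in z])
    return (mu - best) * cdf + sigma * pdf

X = rng.uniform(0, 1, size=3)   # a few random initial trials
y = np.array([objective(x) for x in X])
grid = np.linspace(0, 1, 200)   # candidate configurations
for _ in range(5):              # each iteration = one new trial
    mu, sigma = gp_posterior(X, y, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sigma, y.max()))]
    X, y = np.append(X, x_next), np.append(y, objective(x_next))
print('Best x found: {:.3f} (value {:.3f})'.format(X[np.argmax(y)], y.max()))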
hyperparameters['hpo_params'] = {
    'search_strategy': 'bayesopt'
}

predictor_mrpc_bo = task.fit(train_data, label='label',
                             hyperparameters=hyperparameters,
                             time_limits=60 * 6,
                             num_trials=5,  # increase this to get good performance in your applications
                             ngpus_per_trial=1, seed=123,
                             output_directory='./ag_mrpc_custom_space_fifo_bo')
2020-11-30 15:56:42,525 - root - INFO - All Logs will be saved to ./ag_mrpc_custom_space_fifo_bo/ag_text_prediction.log
2020-11-30 15:56:42,543 - root - INFO - Train Dataset:
2020-11-30 15:56:42,543 - root - INFO - Columns:
- Text(
name="sentence1"
#total/missing=2934/0
length, min/avg/max=38/118.34/220
)
- Text(
name="sentence2"
#total/missing=2934/0
length, min/avg/max=43/118.83/215
)
- Categorical(
name="label"
#total/missing=2934/0
num_class (total/non_special)=2/2
categories=[0, 1]
freq=[949, 1985]
)
2020-11-30 15:56:42,544 - root - INFO - Tuning Dataset:
2020-11-30 15:56:42,544 - root - INFO - Columns:
- Text(
name="sentence1"
#total/missing=734/0
length, min/avg/max=44/119.03/226
)
- Text(
name="sentence2"
#total/missing=734/0
length, min/avg/max=42/118.54/210
)
- Categorical(
name="label"
#total/missing=734/0
num_class (total/non_special)=2/2
categories=[0, 1]
freq=[245, 489]
)
2020-11-30 15:56:42,545 - root - INFO - Label columns=['label'], Feature columns=['sentence1', 'sentence2'], Problem types=['classification'], Label shapes=[2]
2020-11-30 15:56:42,545 - root - INFO - Eval Metric=acc, Stop Metric=acc, Log Metrics=['f1', 'mcc', 'auc', 'acc', 'nll']
92%|█████████▏| 339/368 [01:11<00:06, 4.77it/s]
100%|██████████| 368/368 [01:17<00:00, 4.74it/s]
100%|██████████| 368/368 [01:15<00:00, 4.85it/s]
89%|████████▉ | 329/368 [01:08<00:08, 4.80it/s]
73%|███████▎ | 269/368 [00:56<00:20, 4.75it/s]
dev_score = predictor_mrpc_bo.evaluate(dev_data, metrics=['acc', 'f1'])
print('Best Config = {}'.format(predictor_mrpc_bo.results['best_config']))
print('Total Time = {}s'.format(predictor_mrpc_bo.results['total_time']))
print('Accuracy = {:.2f}%'.format(dev_score['acc'] * 100))
print('F1 = {:.2f}%'.format(dev_score['f1'] * 100))
Best Config = {'search_space▁model.network.agg_net.data_dropout▁choice': 1, 'search_space▁model.network.agg_net.num_layers': 0, 'search_space▁optimization.layerwise_lr_decay': 0.8964584420622498, 'search_space▁optimization.lr': 4.356400129415584e-05, 'search_space▁optimization.warmup_portion': 0.17596939190261784}
Total Time = 388.13313579559326s
Accuracy = 81.62%
F1 = 86.87%
predictions = predictor_mrpc_bo.predict(dev_data)
prediction1 = predictor_mrpc_bo.predict({'sentence1': [sentence1], 'sentence2': [sentence2]})
prediction1_prob = predictor_mrpc_bo.predict_proba({'sentence1': [sentence1], 'sentence2': [sentence2]})
print('A = "{}"'.format(sentence1))
print('B = "{}"'.format(sentence2))
print('Prediction = "{}"'.format(prediction1[0] == 1))
print('Prob = "{}"'.format(prediction1_prob[0]))
print('')
prediction2 = predictor_mrpc_bo.predict({'sentence1': [sentence1], 'sentence2': [sentence3]})
prediction2_prob = predictor_mrpc_bo.predict_proba({'sentence1': [sentence1], 'sentence2': [sentence3]})
print('A = "{}"'.format(sentence1))
print('B = "{}"'.format(sentence3))
print('Prediction = "{}"'.format(prediction2[0] == 1))
print('Prob = "{}"'.format(prediction2_prob[0]))
A = "It is simple to solve NLP problems with AutoGluon."
B = "With AutoGluon, it is easy to solve NLP problems."
Prediction = "True"
Prob = "[0.00999034 0.9900096 ]"
A = "It is simple to solve NLP problems with AutoGluon."
B = "AutoGluon gives you a very bad user experience for solving NLP problems."
Prediction = "True"
Prob = "[0.48341402 0.516586 ]"
Use Hyperband¶
Alternatively, we can use the Hyperband algorithm for HPO. Hyperband tries multiple hyperparameter configurations and early-stops training under poor-performing configurations, freeing compute resources to explore new configurations instead. It may identify good hyperparameter values more quickly than other search strategies. A simplified sketch of its core subroutine, successive halving, follows.
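Here is a toy model of one successive-halving bracket (not AutoGluon's scheduler): evaluate many configurations on a small epoch budget, keep the best fraction, and re-train the survivors with a larger budget. The train_for function and its quality field are hypothetical stand-ins for actual training.

import random
random.seed(123)

def train_for(config, epochs):
    # Hypothetical stand-in for "validation accuracy after `epochs` epochs":
    # better configs (higher quality) improve faster as training proceeds.
    return config['quality'] * (1 - 0.5 ** (epochs / 4.0)) + random.gauss(0, 0.01)

# One successive-halving bracket: repeatedly keep the top 1/eta of the
# configurations and multiply the epoch budget of the survivors by eta.
configs = [{'id': i, 'quality': random.uniform(0.6, 0.9)} for i in range(9)]
budget, eta = 1, 3
while len(configs) > 1:
    scores = {c['id']: train_for(c, budget) for c in configs}
    configs.sort(key=lambda c: scores[c['id']], reverse=True)
    configs = configs[:max(1, len(configs) // eta)]  # early-stop the rest
    print('budget={} epochs -> survivors: {}'.format(budget, [c['id'] for c in configs]))
    budget *= eta
print('Selected config:', configs[0])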
scheduler_options = {'max_t': 40}  # maximum number of epochs for training the neural network

hyperparameters['hpo_params'] = {
    'search_strategy': 'hyperband',
    'scheduler_options': scheduler_options
}
predictor_mrpc_hyperband = task.fit(train_data, label='label',
                                    hyperparameters=hyperparameters,
                                    time_limits=60 * 6, ngpus_per_trial=1, seed=123,
                                    output_directory='./ag_mrpc_custom_space_hyperband')
2020-11-30 16:03:28,662 - root - INFO - All Logs will be saved to ./ag_mrpc_custom_space_hyperband/ag_text_prediction.log
2020-11-30 16:03:28,680 - root - INFO - Train Dataset:
2020-11-30 16:03:28,681 - root - INFO - Columns:
- Text(
name="sentence1"
#total/missing=2934/0
length, min/avg/max=38/118.51/226
)
- Text(
name="sentence2"
#total/missing=2934/0
length, min/avg/max=42/118.93/215
)
- Categorical(
name="label"
#total/missing=2934/0
num_class (total/non_special)=2/2
categories=[0, 1]
freq=[951, 1983]
)
2020-11-30 16:03:28,682 - root - INFO - Tuning Dataset:
2020-11-30 16:03:28,682 - root - INFO - Columns:
- Text(
name="sentence1"
#total/missing=734/0
length, min/avg/max=40/118.38/219
)
- Text(
name="sentence2"
#total/missing=734/0
length, min/avg/max=46/118.14/207
)
- Categorical(
name="label"
#total/missing=734/0
num_class (total/non_special)=2/2
categories=[0, 1]
freq=[243, 491]
)
2020-11-30 16:03:28,683 - root - INFO - Label columns=['label'], Feature columns=['sentence1', 'sentence2'], Problem types=['classification'], Label shapes=[2]
2020-11-30 16:03:28,683 - root - INFO - Eval Metric=acc, Stop Metric=acc, Log Metrics=['f1', 'mcc', 'auc', 'acc', 'nll']
100%|██████████| 368/368 [01:17<00:00, 4.75it/s]
79%|███████▊ | 289/368 [01:01<00:16, 4.72it/s]
27%|██▋ | 99/368 [00:21<00:59, 4.52it/s]
27%|██▋ | 99/368 [00:22<01:01, 4.36it/s]
dev_score = predictor_mrpc_hyperband.evaluate(dev_data, metrics=['acc', 'f1'])
print('Best Config = {}'.format(predictor_mrpc_hyperband.results['best_config']))
print('Total Time = {}s'.format(predictor_mrpc_hyperband.results['total_time']))
print('Accuracy = {:.2f}%'.format(dev_score['acc'] * 100))
print('F1 = {:.2f}%'.format(dev_score['f1'] * 100))
Best Config = {'search_space▁model.network.agg_net.data_dropout▁choice': 0, 'search_space▁model.network.agg_net.num_layers': 2, 'search_space▁optimization.layerwise_lr_decay': 0.9, 'search_space▁optimization.lr': 5.5e-05, 'search_space▁optimization.warmup_portion': 0.15}
Total Time = 213.23019814491272s
Accuracy = 85.29%
F1 = 89.58%
predictions = predictor_mrpc_hyperband.predict(dev_data)
prediction1 = predictor_mrpc_hyperband.predict({'sentence1': [sentence1], 'sentence2': [sentence2]})
prediction1_prob = predictor_mrpc_hyperband.predict_proba({'sentence1': [sentence1], 'sentence2': [sentence2]})
print('A = "{}"'.format(sentence1))
print('B = "{}"'.format(sentence2))
print('Prediction = "{}"'.format(prediction1[0] == 1))
print('Prob = "{}"'.format(prediction1_prob[0]))
print('')
prediction2 = predictor_mrpc_hyperband.predict({'sentence1': [sentence1], 'sentence2': [sentence3]})
prediction2_prob = predictor_mrpc_hyperband.predict_proba({'sentence1': [sentence1], 'sentence2': [sentence3]})
print('A = "{}"'.format(sentence1))
print('B = "{}"'.format(sentence3))
print('Prediction = "{}"'.format(prediction2[0] == 1))
print('Prob = "{}"'.format(prediction2_prob[0]))
A = "It is simple to solve NLP problems with AutoGluon."
B = "With AutoGluon, it is easy to solve NLP problems."
Prediction = "True"
Prob = "[0.01037684 0.9896231 ]"
A = "It is simple to solve NLP problems with AutoGluon."
B = "AutoGluon gives you a very bad user experience for solving NLP problems."
Prediction = "False"
Prob = "[0.72828925 0.27171075]"
Use Hyperband together with Bayesian Optimization¶
Finally, we can combine Hyperband with Bayesian optimization (the bayesopt_hyperband strategy): trials are scheduled and early-stopped as in Hyperband, but new configurations are proposed by a Bayesian-optimization searcher rather than sampled at random.
scheduler_options = {'max_t': 40}
hyperparameters['hpo_params'] = {
    'search_strategy': 'bayesopt_hyperband',
    'scheduler_options': scheduler_options
}

predictor_mrpc_bohb = task.fit(
    train_data, label='label',
    hyperparameters=hyperparameters,
    time_limits=60 * 6, ngpus_per_trial=1, seed=123,
    output_directory='./ag_mrpc_custom_space_bohb')
2020-11-30 16:07:19,893 - root - INFO - All Logs will be saved to ./ag_mrpc_custom_space_bohb/ag_text_prediction.log
2020-11-30 16:07:19,910 - root - INFO - Train Dataset:
2020-11-30 16:07:19,910 - root - INFO - Columns:
- Text(
name="sentence1"
#total/missing=2934/0
length, min/avg/max=38/118.40/226
)
- Text(
name="sentence2"
#total/missing=2934/0
length, min/avg/max=42/118.91/210
)
- Categorical(
name="label"
#total/missing=2934/0
num_class (total/non_special)=2/2
categories=[0, 1]
freq=[957, 1977]
)
2020-11-30 16:07:19,911 - root - INFO - Tuning Dataset:
2020-11-30 16:07:19,911 - root - INFO - Columns:
- Text(
name="sentence1"
#total/missing=734/0
length, min/avg/max=44/118.81/219
)
- Text(
name="sentence2"
#total/missing=734/0
length, min/avg/max=46/118.19/215
)
- Categorical(
name="label"
#total/missing=734/0
num_class (total/non_special)=2/2
categories=[0, 1]
freq=[237, 497]
)
2020-11-30 16:07:19,912 - root - INFO - Label columns=['label'], Feature columns=['sentence1', 'sentence2'], Problem types=['classification'], Label shapes=[2]
2020-11-30 16:07:19,912 - root - INFO - Eval Metric=acc, Stop Metric=acc, Log Metrics=['f1', 'mcc', 'auc', 'acc', 'nll']
100%|██████████| 368/368 [01:18<00:00, 4.69it/s]
100%|██████████| 368/368 [01:17<00:00, 4.72it/s]
27%|██▋ | 99/368 [00:22<01:00, 4.45it/s]
27%|██▋ | 99/368 [00:22<01:01, 4.37it/s]
dev_score = predictor_mrpc_bohb.evaluate(dev_data, metrics=['acc', 'f1'])
print('Best Config = {}'.format(predictor_mrpc_bohb.results['best_config']))
print('Total Time = {}s'.format(predictor_mrpc_bohb.results['total_time']))
print('Accuracy = {:.2f}%'.format(dev_score['acc'] * 100))
print('F1 = {:.2f}%'.format(dev_score['f1'] * 100))
Best Config = {'search_space▁model.network.agg_net.data_dropout▁choice': 0, 'search_space▁model.network.agg_net.num_layers': 2, 'search_space▁optimization.layerwise_lr_decay': 0.9, 'search_space▁optimization.lr': 5.5e-05, 'search_space▁optimization.warmup_portion': 0.15}
Total Time = 231.7855942249298s
Accuracy = 84.07%
F1 = 88.89%
predictions = predictor_mrpc_bohb.predict(dev_data)
prediction1 = predictor_mrpc_bohb.predict({'sentence1': [sentence1], 'sentence2': [sentence2]})
prediction1_prob = predictor_mrpc_bohb.predict_proba({'sentence1': [sentence1], 'sentence2': [sentence2]})
print('A = "{}"'.format(sentence1))
print('B = "{}"'.format(sentence2))
print('Prediction = "{}"'.format(prediction1[0] == 1))
print('Prob = "{}"'.format(prediction1_prob[0]))
print('')
prediction2 = predictor_mrpc_bohb.predict({'sentence1': [sentence1], 'sentence2': [sentence3]})
prediction2_prob = predictor_mrpc_bohb.predict_proba({'sentence1': [sentence1], 'sentence2': [sentence3]})
print('A = "{}"'.format(sentence1))
print('B = "{}"'.format(sentence3))
print('Prediction = "{}"'.format(prediction2[0] == 1))
print('Prob = "{}"'.format(prediction2_prob[0]))
A = "It is simple to solve NLP problems with AutoGluon."
B = "With AutoGluon, it is easy to solve NLP problems."
Prediction = "True"
Prob = "[0.00511682 0.9948832 ]"
A = "It is simple to solve NLP problems with AutoGluon."
B = "AutoGluon gives you a very bad user experience for solving NLP problems."
Prediction = "False"
Prob = "[0.71323025 0.28676972]"