Text Prediction - Customized Hyperparameter Search
This tutorial shows how to control the hyperparameter tuning process in TextPrediction by specifying:

- A custom search space of candidate hyperparameter values to consider.
- Which hyperparameter optimization (HPO) algorithm should be used to search through this space.
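To make these two ingredients concrete before turning to the AutoGluon API, here is a minimal plain-Python sketch (an illustration of the idea, not AutoGluon code): a search space maps each hyperparameter to its candidate values, and random search simply draws each hyperparameter independently for every trial. The names and ranges below merely mirror the kinds of hyperparameters tuned later in this tutorial.

```python
import random

# Plain-Python illustration (NOT the AutoGluon API): a search space maps
# each hyperparameter name to its candidate values.
search_space = {
    "lr": ("real", 1e-5, 1e-4),              # continuous range
    "warmup_portion": ("real", 0.1, 0.2),
    "mid_units": ("int", 32, 128),           # integer range (inclusive)
    "data_dropout": ("categorical", [False, True]),
}

def sample_config(space, rng):
    """Random search: draw one configuration uniformly at random."""
    config = {}
    for name, (kind, *args) in space.items():
        if kind == "real":
            config[name] = rng.uniform(args[0], args[1])
        elif kind == "int":
            config[name] = rng.randint(args[0], args[1])
        else:  # categorical: pick one of the listed choices
            config[name] = rng.choice(args[0])
    return config

rng = random.Random(123)
trials = [sample_config(search_space, rng) for _ in range(3)]
```

Each element of `trials` is one candidate configuration; an HPO loop would train one model per configuration and keep the best.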
import numpy as np
import warnings
warnings.filterwarnings('ignore')
np.random.seed(123)
Paraphrase Identification

We consider a Paraphrase Identification task for illustration: given a pair of sentences, the goal is to predict whether one sentence is a restatement of the other (a binary classification task). Here we train models on the Microsoft Research Paraphrase Corpus (MRPC) dataset. For a quick demonstration, we subsample the training data and use only 800 examples.
from autogluon.core.utils.loaders import load_pd
train_data = load_pd.load('https://autogluon-text.s3-accelerate.amazonaws.com/glue/mrpc/train.parquet')
dev_data = load_pd.load('https://autogluon-text.s3-accelerate.amazonaws.com/glue/mrpc/dev.parquet')
rand_idx = np.random.permutation(np.arange(len(train_data)))[:800]
train_data = train_data.iloc[rand_idx]
train_data.reset_index(inplace=True, drop=True)
train_data.head(10)
|   | sentence1 | sentence2 | label |
|---|---|---|---|
| 0 | Altria shares fell 2.2 percent or 96 cents to ... | Its shares fell $ 9.61 to $ 50.26 , ranking as... | 1 |
| 1 | One of the Commission 's publicly stated goals... | The Commission has publicly said one of its go... | 1 |
| 2 | " I don 't think my brain is going to go dead ... | In a conference call yesterday , he said , " I... | 1 |
| 3 | " This will put a severe crimp in our reserves... | " This is going to put a severe crimp in our r... | 1 |
| 4 | The Dow Jones industrials climbed more than 14... | The Dow Jones industrials briefly surpassed th... | 1 |
| 5 | Massachusetts is one of 12 states that does no... | Massachusetts is one of 12 states without the ... | 1 |
| 6 | Mr. Geoghan had been living in a protective cu... | He had been in protective custody since being ... | 1 |
| 7 | Since December 2002 , Evans has been the vice ... | Evans is also the vice-chairman of the Federal... | 1 |
| 8 | Business groups and border cities have raised ... | Business groups and border cities have raised ... | 1 |
| 9 | A member of the chart-topping collective So So... | A member of the rap group So Solid Crew threw ... | 1 |
from autogluon_contrib_nlp.data.tokenizers import MosesTokenizer
tokenizer = MosesTokenizer('en') # just used to display sentences
row_index = 2
print('Paraphrase example:')
print('Sentence1: ', tokenizer.decode(train_data['sentence1'][row_index].split()))
print('Sentence2: ', tokenizer.decode(train_data['sentence2'][row_index].split()))
print('Label: ', train_data['label'][row_index])
row_index = 3
print('\nNot Paraphrase example:')
print('Sentence1:', tokenizer.decode(train_data['sentence1'][row_index].split()))
print('Sentence2:', tokenizer.decode(train_data['sentence2'][row_index].split()))
print('Label:', train_data['label'][row_index])
Paraphrase example:
Sentence1: "I don't think my brain is going to go dead this afternoon or next week," he said.
Sentence2: In a conference call yesterday, he said, "I don't think that my brain is going to go dead this afternoon or next week."
Label: 1
Not Paraphrase example:
Sentence1: "This will put a severe crimp in our reserves," O'Keefe said Friday during a roundtable discussion with reporters at NASA headquarters.
Sentence2: "This is going to put a severe crimp in our reserves," O'Keefe said during a breakfast with reporters.
Label: 1
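The MosesTokenizer above is used only for display: its decode joins a list of tokens back into readable text, re-attaching punctuation and contractions. As a rough, deliberately naive stdlib sketch of what such detokenization does (not the actual Moses rules, which also handle quotes, currency, and language-specific cases):

```python
import re

def naive_detokenize(tokens):
    """Join tokens with spaces, then re-attach common punctuation.

    A simplified stand-in for MosesTokenizer.decode; real Moses
    detokenization covers many more cases than these two rules.
    """
    text = " ".join(tokens)
    text = re.sub(r" ([.,!?;:%])", r"\1", text)  # "world !" -> "world!"
    text = re.sub(r" ('\w)", r"\1", text)        # "don 't" -> "don't"
    return text

print(naive_detokenize(["I", "don", "'t", "think", "so", "."]))
# prints: I don't think so.
```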
Perform HPO over a Customized Search Space with Random Search

To control which hyperparameter values are considered during fit(), we specify the hyperparameters argument. Rather than specifying a single fixed value for a hyperparameter, we can specify a space of values to search over via ag.space. We can also specify which HPO algorithm to use for the search via search_strategy (a simple random search is specified below). In this example, we search for good values of the following hyperparameters:

- warmup proportion
- learning rate
- dropout before the first task-specific layer
- layer-wise learning rate decay
- number of task-specific layers
import autogluon.core as ag
from autogluon.text import TextPrediction as task
hyperparameters = {
    'models': {
        'BertForTextPredictionBasic': {
            'search_space': {
                'model.network.agg_net.mid_units': ag.space.Int(32, 128),
                'model.network.agg_net.data_dropout': ag.space.Categorical(False, True),
                'optimization.num_train_epochs': 4,
                'optimization.warmup_portion': ag.space.Real(0.1, 0.2),
                'optimization.layerwise_lr_decay': ag.space.Real(0.8, 1.0),
                'optimization.lr': ag.space.Real(1E-5, 1E-4)
            }
        },
    },
    'hpo_params': {
        'search_strategy': 'random'  # perform HPO via simple random search
    }
}
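One searched hyperparameter worth unpacking is optimization.layerwise_lr_decay: the layer closest to the output trains at the base learning rate, and each layer below it has its rate multiplied by the decay factor, so the early, more general-purpose backbone layers change more slowly during fine-tuning. A small illustrative sketch of this scheme (our own simplification, not AutoGluon's internal implementation):

```python
def layerwise_learning_rates(base_lr, decay, num_layers):
    """Return per-layer learning rates, from bottom (layer 0) to top.

    The top layer trains at base_lr; each layer below it is scaled by
    another factor of `decay`, so decay=1.0 disables the effect.
    """
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]

rates = layerwise_learning_rates(base_lr=5.5e-05, decay=0.9, num_layers=4)
```

With the decay drawn from ag.space.Real(0.8, 1.0) as above, a sampled value of 1.0 means all layers share the base learning rate, while 0.8 shrinks the bottom layers' rates substantially.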
We can now call fit() with hyperparameter tuning over our custom search space. Below, num_trials controls the maximum number of different hyperparameter configurations for which AutoGluon will train models (2 models are trained under different hyperparameter configurations in this case). To achieve good performance in your applications, you should use larger values of num_trials, which may identify superior hyperparameter values but will require longer runtimes.
predictor_mrpc = task.fit(train_data,
                          label='label',
                          hyperparameters=hyperparameters,
                          num_trials=2,  # increase this to achieve good performance in your applications
                          time_limits=60 * 2,
                          ngpus_per_trial=1,
                          seed=123,
                          output_directory='./ag_mrpc_random_search')
2021-02-23 19:27:23,233 - autogluon.text.text_prediction.text_prediction - INFO - All Logs will be saved to ./ag_mrpc_random_search/ag_text_prediction.log
2021-02-23 19:27:23,248 - autogluon.text.text_prediction.text_prediction - INFO - Train Dataset:
2021-02-23 19:27:23,249 - autogluon.text.text_prediction.text_prediction - INFO - Columns:
- Text(
name="sentence1"
#total/missing=640/0
length, min/avg/max=44/116.909375/200
)
- Text(
name="sentence2"
#total/missing=640/0
length, min/avg/max=42/117.7421875/210
)
- Categorical(
name="label"
#total/missing=640/0
num_class (total/non_special)=2/2
categories=[0, 1]
freq=[208, 432]
)
2021-02-23 19:27:23,250 - autogluon.text.text_prediction.text_prediction - INFO - Tuning Dataset:
2021-02-23 19:27:23,251 - autogluon.text.text_prediction.text_prediction - INFO - Columns:
- Text(
name="sentence1"
#total/missing=160/0
length, min/avg/max=54/119.2625/195
)
- Text(
name="sentence2"
#total/missing=160/0
length, min/avg/max=50/118.3125/199
)
- Categorical(
name="label"
#total/missing=160/0
num_class (total/non_special)=2/2
categories=[0, 1]
freq=[60, 100]
)
WARNING:autogluon.core.utils.multiprocessing_utils:WARNING: changing multiprocessing start method to forkserver
2021-02-23 19:27:23,258 - autogluon.text.text_prediction.text_prediction - INFO - All Logs will be saved to ./ag_mrpc_random_search/main.log
(task:0) 2021-02-23 19:27:27,069 - root - INFO - All Logs will be saved to /var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/ag_mrpc_random_search/task0/training.log
2021-02-23 19:27:27,069 - root - INFO - learning:
early_stopping_patience: 10
log_metrics: auto
stop_metric: auto
valid_ratio: 0.15
misc:
exp_dir: /var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/ag_mrpc_random_search/task0
seed: 123
model:
backbone:
name: google_electra_small
network:
agg_net:
activation: tanh
agg_type: concat
data_dropout: False
dropout: 0.1
feature_proj_num_layers: -1
initializer:
bias: ['zeros']
weight: ['xavier', 'uniform', 'avg', 3.0]
mid_units: 80
norm_eps: 1e-05
normalization: layer_norm
out_proj_num_layers: 0
categorical_net:
activation: leaky
data_dropout: False
dropout: 0.1
emb_units: 32
initializer:
bias: ['zeros']
embed: ['xavier', 'gaussian', 'in', 1.0]
weight: ['xavier', 'uniform', 'avg', 3.0]
mid_units: 64
norm_eps: 1e-05
normalization: layer_norm
num_layers: 1
feature_units: -1
initializer:
bias: ['zeros']
weight: ['truncnorm', 0, 0.02]
numerical_net:
activation: leaky
data_dropout: False
dropout: 0.1
initializer:
bias: ['zeros']
weight: ['xavier', 'uniform', 'avg', 3.0]
input_centering: False
mid_units: 128
norm_eps: 1e-05
normalization: layer_norm
num_layers: 1
text_net:
pool_type: cls
use_segment_id: True
preprocess:
max_length: 128
merge_text: True
optimization:
batch_size: 32
begin_lr: 0.0
final_lr: 0.0
layerwise_lr_decay: 0.9
log_frequency: 0.1
lr: 5.5e-05
lr_scheduler: triangular
max_grad_norm: 1.0
model_average: 5
num_train_epochs: 4
optimizer: adamw
optimizer_params: [('beta1', 0.9), ('beta2', 0.999), ('epsilon', 1e-06), ('correct_bias', False)]
per_device_batch_size: 16
val_batch_size_mult: 2
valid_frequency: 0.1
warmup_portion: 0.15
wd: 0.01
version: 1
2021-02-23 19:27:27,221 - root - INFO - Process training set...
2021-02-23 19:27:27,483 - root - INFO - Done!
2021-02-23 19:27:27,483 - root - INFO - Process dev set...
2021-02-23 19:27:27,555 - root - INFO - Done!
2021-02-23 19:27:32,839 - root - INFO - #Total Params/Fixed Params=13483522/0
2021-02-23 19:27:32,855 - root - INFO - Using gradient accumulation. Global batch size = 32
2021-02-23 19:27:33,756 - root - INFO - [Iter 2/80, Epoch 0] train loss=4.8817e-01, gnorm=6.7327e+00, lr=9.1667e-06, #samples processed=96, #sample per second=111.22
2021-02-23 19:27:34,029 - root - INFO - [Iter 2/80, Epoch 0] valid f1=7.6448e-01, mcc=-6.1430e-02, roc_auc=4.5100e-01, accuracy=6.1875e-01, log_loss=7.4747e-01, time spent=0.190s, total_time=0.02min
2021-02-23 19:27:34,376 - root - INFO - [Iter 4/80, Epoch 0] train loss=4.5925e-01, gnorm=5.9951e+00, lr=1.8333e-05, #samples processed=96, #sample per second=154.96
2021-02-23 19:27:34,550 - root - INFO - [Iter 4/80, Epoch 0] valid f1=7.4194e-01, mcc=-2.4507e-02, roc_auc=5.0567e-01, accuracy=6.0000e-01, log_loss=6.9279e-01, time spent=0.174s, total_time=0.03min
2021-02-23 19:27:34,854 - root - INFO - [Iter 6/80, Epoch 0] train loss=4.7326e-01, gnorm=9.1130e+00, lr=2.7500e-05, #samples processed=96, #sample per second=200.84
2021-02-23 19:27:35,168 - root - INFO - [Iter 6/80, Epoch 0] valid f1=7.6562e-01, mcc=4.1345e-02, roc_auc=6.0150e-01, accuracy=6.2500e-01, log_loss=6.9605e-01, time spent=0.173s, total_time=0.04min
2021-02-23 19:27:35,488 - root - INFO - [Iter 8/80, Epoch 0] train loss=5.3341e-01, gnorm=7.2195e+00, lr=3.6667e-05, #samples processed=96, #sample per second=151.48
2021-02-23 19:27:35,830 - root - INFO - [Iter 8/80, Epoch 0] valid f1=7.6923e-01, mcc=2.1872e-01, roc_auc=6.6150e-01, accuracy=6.6250e-01, log_loss=6.3399e-01, time spent=0.175s, total_time=0.05min
2021-02-23 19:27:36,123 - root - INFO - [Iter 10/80, Epoch 0] train loss=3.7414e-01, gnorm=8.9824e+00, lr=4.5833e-05, #samples processed=96, #sample per second=151.28
2021-02-23 19:27:36,423 - root - INFO - [Iter 10/80, Epoch 0] valid f1=7.7570e-01, mcc=3.3516e-01, roc_auc=7.0300e-01, accuracy=7.0000e-01, log_loss=6.0978e-01, time spent=0.177s, total_time=0.06min
2021-02-23 19:27:36,651 - root - INFO - [Iter 12/80, Epoch 0] train loss=4.2719e-01, gnorm=6.7945e+00, lr=5.5000e-05, #samples processed=96, #sample per second=181.74
2021-02-23 19:27:36,981 - root - INFO - [Iter 12/80, Epoch 0] valid f1=7.9070e-01, mcc=3.7687e-01, roc_auc=7.2683e-01, accuracy=7.1875e-01, log_loss=5.9225e-01, time spent=0.176s, total_time=0.07min
2021-02-23 19:27:37,317 - root - INFO - [Iter 14/80, Epoch 0] train loss=4.0578e-01, gnorm=4.5154e+00, lr=5.3382e-05, #samples processed=96, #sample per second=144.19
2021-02-23 19:27:37,491 - root - INFO - [Iter 14/80, Epoch 0] valid f1=7.7291e-01, mcc=1.4708e-01, roc_auc=7.5417e-01, accuracy=6.4375e-01, log_loss=7.2812e-01, time spent=0.174s, total_time=0.08min
2021-02-23 19:27:37,739 - root - INFO - [Iter 16/80, Epoch 0] train loss=2.9886e-01, gnorm=5.1645e+00, lr=5.1765e-05, #samples processed=96, #sample per second=227.27
2021-02-23 19:27:37,913 - root - INFO - [Iter 16/80, Epoch 0] valid f1=7.9654e-01, mcc=3.3932e-01, roc_auc=7.6250e-01, accuracy=7.0625e-01, log_loss=6.3610e-01, time spent=0.174s, total_time=0.08min
2021-02-23 19:27:38,192 - root - INFO - [Iter 18/80, Epoch 0] train loss=3.3882e-01, gnorm=1.4434e+01, lr=5.0147e-05, #samples processed=96, #sample per second=212.22
2021-02-23 19:27:38,505 - root - INFO - [Iter 18/80, Epoch 0] valid f1=8.0365e-01, mcc=4.0292e-01, roc_auc=7.6300e-01, accuracy=7.3125e-01, log_loss=5.8838e-01, time spent=0.174s, total_time=0.09min
2021-02-23 19:27:38,835 - root - INFO - [Iter 20/80, Epoch 0] train loss=3.7979e-01, gnorm=7.5863e+00, lr=4.8529e-05, #samples processed=96, #sample per second=149.42
2021-02-23 19:27:39,010 - root - INFO - [Iter 20/80, Epoch 0] valid f1=7.2222e-01, mcc=3.8730e-01, roc_auc=7.5767e-01, accuracy=6.8750e-01, log_loss=6.3206e-01, time spent=0.175s, total_time=0.10min
2021-02-23 19:27:39,262 - root - INFO - [Iter 22/80, Epoch 1] train loss=4.2116e-01, gnorm=8.4340e+00, lr=4.6912e-05, #samples processed=96, #sample per second=224.67
2021-02-23 19:27:39,438 - root - INFO - [Iter 22/80, Epoch 1] valid f1=7.8505e-01, mcc=3.6368e-01, roc_auc=7.6600e-01, accuracy=7.1250e-01, log_loss=5.7421e-01, time spent=0.175s, total_time=0.11min
2021-02-23 19:27:39,682 - root - INFO - [Iter 24/80, Epoch 1] train loss=2.8027e-01, gnorm=3.4579e+00, lr=4.5294e-05, #samples processed=96, #sample per second=228.73
2021-02-23 19:27:39,856 - root - INFO - [Iter 24/80, Epoch 1] valid f1=7.7043e-01, mcc=8.3280e-02, roc_auc=7.8583e-01, accuracy=6.3125e-01, log_loss=8.6803e-01, time spent=0.174s, total_time=0.12min
2021-02-23 19:27:40,098 - root - INFO - [Iter 26/80, Epoch 1] train loss=5.0928e-01, gnorm=6.5005e+00, lr=4.3676e-05, #samples processed=96, #sample per second=231.00
2021-02-23 19:27:40,275 - root - INFO - [Iter 26/80, Epoch 1] valid f1=7.7519e-01, mcc=1.4525e-01, roc_auc=7.9150e-01, accuracy=6.3750e-01, log_loss=8.9179e-01, time spent=0.177s, total_time=0.12min
2021-02-23 19:27:40,588 - root - INFO - [Iter 28/80, Epoch 1] train loss=5.3130e-01, gnorm=1.0276e+01, lr=4.2059e-05, #samples processed=96, #sample per second=195.98
2021-02-23 19:27:40,765 - root - INFO - [Iter 28/80, Epoch 1] valid f1=7.9149e-01, mcc=3.0667e-01, roc_auc=7.8533e-01, accuracy=6.9375e-01, log_loss=5.8727e-01, time spent=0.177s, total_time=0.13min
2021-02-23 19:27:41,015 - root - INFO - [Iter 30/80, Epoch 1] train loss=2.6975e-01, gnorm=4.3352e+00, lr=4.0441e-05, #samples processed=96, #sample per second=224.77
2021-02-23 19:27:41,325 - root - INFO - [Iter 30/80, Epoch 1] valid f1=7.8571e-01, mcc=4.4799e-01, roc_auc=7.7850e-01, accuracy=7.3750e-01, log_loss=5.4732e-01, time spent=0.175s, total_time=0.14min
2021-02-23 19:27:41,576 - root - INFO - [Iter 32/80, Epoch 1] train loss=3.5379e-01, gnorm=3.4492e+00, lr=3.8824e-05, #samples processed=96, #sample per second=171.09
2021-02-23 19:27:41,754 - root - INFO - [Iter 32/80, Epoch 1] valid f1=7.6503e-01, mcc=4.6831e-01, roc_auc=7.8350e-01, accuracy=7.3125e-01, log_loss=5.6594e-01, time spent=0.177s, total_time=0.15min
2021-02-23 19:27:41,998 - root - INFO - [Iter 34/80, Epoch 1] train loss=3.2732e-01, gnorm=3.2378e+00, lr=3.7206e-05, #samples processed=96, #sample per second=227.60
2021-02-23 19:27:42,308 - root - INFO - [Iter 34/80, Epoch 1] valid f1=8.0383e-01, mcc=4.3980e-01, roc_auc=7.9400e-01, accuracy=7.4375e-01, log_loss=5.2334e-01, time spent=0.174s, total_time=0.16min
2021-02-23 19:27:42,542 - root - INFO - [Iter 36/80, Epoch 1] train loss=3.6090e-01, gnorm=5.8467e+00, lr=3.5588e-05, #samples processed=96, #sample per second=176.62
2021-02-23 19:27:42,719 - root - INFO - [Iter 36/80, Epoch 1] valid f1=7.9464e-01, mcc=3.5553e-01, roc_auc=8.0350e-01, accuracy=7.1250e-01, log_loss=5.4078e-01, time spent=0.177s, total_time=0.16min
2021-02-23 19:27:42,975 - root - INFO - [Iter 38/80, Epoch 1] train loss=3.3077e-01, gnorm=9.6598e+00, lr=3.3971e-05, #samples processed=96, #sample per second=221.85
2021-02-23 19:27:43,153 - root - INFO - [Iter 38/80, Epoch 1] valid f1=7.9821e-01, mcc=3.7126e-01, roc_auc=7.9967e-01, accuracy=7.1875e-01, log_loss=5.4245e-01, time spent=0.178s, total_time=0.17min
2021-02-23 19:27:43,391 - root - INFO - [Iter 40/80, Epoch 1] train loss=2.4521e-01, gnorm=3.6837e+00, lr=3.2353e-05, #samples processed=96, #sample per second=230.84
2021-02-23 19:27:43,699 - root - INFO - [Iter 40/80, Epoch 1] valid f1=8.0751e-01, mcc=4.3578e-01, roc_auc=7.9317e-01, accuracy=7.4375e-01, log_loss=5.3365e-01, time spent=0.177s, total_time=0.18min
2021-02-23 19:27:43,939 - root - INFO - [Iter 42/80, Epoch 2] train loss=3.1737e-01, gnorm=5.5844e+00, lr=3.0735e-05, #samples processed=96, #sample per second=175.04
2021-02-23 19:27:44,115 - root - INFO - [Iter 42/80, Epoch 2] valid f1=8.0180e-01, mcc=3.8680e-01, roc_auc=7.9483e-01, accuracy=7.2500e-01, log_loss=5.5859e-01, time spent=0.175s, total_time=0.19min
2021-02-23 19:27:44,379 - root - INFO - [Iter 44/80, Epoch 2] train loss=1.9825e-01, gnorm=3.3499e+00, lr=2.9118e-05, #samples processed=96, #sample per second=218.36
2021-02-23 19:27:44,557 - root - INFO - [Iter 44/80, Epoch 2] valid f1=8.1223e-01, mcc=4.0422e-01, roc_auc=7.9467e-01, accuracy=7.3125e-01, log_loss=6.2161e-01, time spent=0.177s, total_time=0.19min
2021-02-23 19:27:44,798 - root - INFO - [Iter 46/80, Epoch 2] train loss=4.0163e-01, gnorm=1.0859e+01, lr=2.7500e-05, #samples processed=96, #sample per second=229.36
2021-02-23 19:27:44,974 - root - INFO - [Iter 46/80, Epoch 2] valid f1=8.0702e-01, mcc=3.8730e-01, roc_auc=7.8900e-01, accuracy=7.2500e-01, log_loss=6.2400e-01, time spent=0.176s, total_time=0.20min
2021-02-23 19:27:45,216 - root - INFO - [Iter 48/80, Epoch 2] train loss=2.9999e-01, gnorm=3.8524e+00, lr=2.5882e-05, #samples processed=96, #sample per second=229.87
2021-02-23 19:27:45,393 - root - INFO - [Iter 48/80, Epoch 2] valid f1=7.9812e-01, mcc=4.0744e-01, roc_auc=7.8017e-01, accuracy=7.3125e-01, log_loss=5.8893e-01, time spent=0.177s, total_time=0.21min
2021-02-23 19:27:45,639 - root - INFO - [Iter 50/80, Epoch 2] train loss=2.8221e-01, gnorm=6.4566e+00, lr=2.4265e-05, #samples processed=96, #sample per second=226.98
2021-02-23 19:27:45,816 - root - INFO - [Iter 50/80, Epoch 2] valid f1=7.5000e-01, mcc=3.7867e-01, roc_auc=7.7350e-01, accuracy=7.0000e-01, log_loss=6.0462e-01, time spent=0.177s, total_time=0.22min
2021-02-23 19:27:46,066 - root - INFO - [Iter 52/80, Epoch 2] train loss=3.4392e-01, gnorm=5.9336e+00, lr=2.2647e-05, #samples processed=96, #sample per second=224.76
2021-02-23 19:27:46,242 - root - INFO - [Iter 52/80, Epoch 2] valid f1=7.6768e-01, mcc=3.9087e-01, roc_auc=7.7567e-01, accuracy=7.1250e-01, log_loss=5.9934e-01, time spent=0.176s, total_time=0.22min
2021-02-23 19:27:46,492 - root - INFO - [Iter 54/80, Epoch 2] train loss=2.4564e-01, gnorm=3.1173e+00, lr=2.1029e-05, #samples processed=96, #sample per second=225.48
2021-02-23 19:27:46,801 - root - INFO - [Iter 54/80, Epoch 2] valid f1=8.1308e-01, mcc=4.4926e-01, roc_auc=7.8350e-01, accuracy=7.5000e-01, log_loss=5.9745e-01, time spent=0.178s, total_time=0.23min
2021-02-23 19:27:47,055 - root - INFO - [Iter 56/80, Epoch 2] train loss=1.6644e-01, gnorm=3.4446e+00, lr=1.9412e-05, #samples processed=96, #sample per second=170.70
2021-02-23 19:27:47,232 - root - INFO - [Iter 56/80, Epoch 2] valid f1=8.1385e-01, mcc=4.0634e-01, roc_auc=7.9100e-01, accuracy=7.3125e-01, log_loss=6.4647e-01, time spent=0.177s, total_time=0.24min
2021-02-23 19:27:47,475 - root - INFO - [Iter 58/80, Epoch 2] train loss=4.0639e-01, gnorm=8.7185e+00, lr=1.7794e-05, #samples processed=96, #sample per second=228.21
2021-02-23 19:27:47,656 - root - INFO - [Iter 58/80, Epoch 2] valid f1=8.1385e-01, mcc=4.0634e-01, roc_auc=7.9400e-01, accuracy=7.3125e-01, log_loss=6.7208e-01, time spent=0.180s, total_time=0.25min
2021-02-23 19:27:47,908 - root - INFO - [Iter 60/80, Epoch 2] train loss=3.1296e-01, gnorm=1.0041e+01, lr=1.6176e-05, #samples processed=96, #sample per second=221.95
2021-02-23 19:27:48,088 - root - INFO - [Iter 60/80, Epoch 2] valid f1=8.1385e-01, mcc=4.0634e-01, roc_auc=7.9533e-01, accuracy=7.3125e-01, log_loss=6.3212e-01, time spent=0.180s, total_time=0.25min
2021-02-23 19:27:48,334 - root - INFO - [Iter 62/80, Epoch 3] train loss=3.2886e-01, gnorm=6.3907e+00, lr=1.4559e-05, #samples processed=96, #sample per second=225.58
2021-02-23 19:27:48,648 - root - INFO - [Iter 62/80, Epoch 3] valid f1=8.2243e-01, mcc=4.7778e-01, roc_auc=7.9250e-01, accuracy=7.6250e-01, log_loss=5.7437e-01, time spent=0.179s, total_time=0.26min
2021-02-23 19:27:48,903 - root - INFO - [Iter 64/80, Epoch 3] train loss=2.4223e-01, gnorm=4.1302e+00, lr=1.2941e-05, #samples processed=96, #sample per second=168.79
2021-02-23 19:27:49,228 - root - INFO - [Iter 64/80, Epoch 3] valid f1=8.1000e-01, mcc=4.9333e-01, roc_auc=7.8950e-01, accuracy=7.6250e-01, log_loss=5.6212e-01, time spent=0.178s, total_time=0.27min
2021-02-23 19:27:49,476 - root - INFO - [Iter 66/80, Epoch 3] train loss=2.5574e-01, gnorm=1.0088e+01, lr=1.1324e-05, #samples processed=96, #sample per second=167.51
2021-02-23 19:27:49,651 - root - INFO - [Iter 66/80, Epoch 3] valid f1=7.8571e-01, mcc=4.4799e-01, roc_auc=7.8900e-01, accuracy=7.3750e-01, log_loss=5.6690e-01, time spent=0.174s, total_time=0.28min
2021-02-23 19:27:49,908 - root - INFO - [Iter 68/80, Epoch 3] train loss=2.3437e-01, gnorm=6.6369e+00, lr=9.7059e-06, #samples processed=96, #sample per second=222.49
2021-02-23 19:27:50,258 - root - INFO - [Iter 68/80, Epoch 3] valid f1=8.1000e-01, mcc=4.9333e-01, roc_auc=7.9050e-01, accuracy=7.6250e-01, log_loss=5.6044e-01, time spent=0.176s, total_time=0.29min
2021-02-23 19:27:50,501 - root - INFO - [Iter 70/80, Epoch 3] train loss=3.6536e-01, gnorm=5.3816e+00, lr=8.0882e-06, #samples processed=96, #sample per second=161.87
2021-02-23 19:27:50,793 - root - INFO - [Iter 70/80, Epoch 3] valid f1=8.2126e-01, mcc=4.9716e-01, roc_auc=7.9367e-01, accuracy=7.6875e-01, log_loss=5.5796e-01, time spent=0.178s, total_time=0.30min
2021-02-23 19:27:51,033 - root - INFO - [Iter 72/80, Epoch 3] train loss=1.8926e-01, gnorm=3.4774e+00, lr=6.4706e-06, #samples processed=96, #sample per second=180.28
2021-02-23 19:27:51,212 - root - INFO - [Iter 72/80, Epoch 3] valid f1=8.1481e-01, mcc=4.4815e-01, roc_auc=7.9633e-01, accuracy=7.5000e-01, log_loss=5.6824e-01, time spent=0.178s, total_time=0.31min
2021-02-23 19:27:51,455 - root - INFO - [Iter 74/80, Epoch 3] train loss=2.3578e-01, gnorm=4.0792e+00, lr=4.8529e-06, #samples processed=96, #sample per second=227.86
2021-02-23 19:27:51,633 - root - INFO - [Iter 74/80, Epoch 3] valid f1=8.1818e-01, mcc=4.4721e-01, roc_auc=7.9800e-01, accuracy=7.5000e-01, log_loss=5.7957e-01, time spent=0.178s, total_time=0.31min
2021-02-23 19:27:51,890 - root - INFO - [Iter 76/80, Epoch 3] train loss=2.6687e-01, gnorm=3.0414e+00, lr=3.2353e-06, #samples processed=96, #sample per second=220.89
2021-02-23 19:27:52,067 - root - INFO - [Iter 76/80, Epoch 3] valid f1=8.2353e-01, mcc=4.6231e-01, roc_auc=7.9867e-01, accuracy=7.5625e-01, log_loss=5.8215e-01, time spent=0.177s, total_time=0.32min
2021-02-23 19:27:52,312 - root - INFO - [Iter 78/80, Epoch 3] train loss=1.6882e-01, gnorm=3.0694e+00, lr=1.6176e-06, #samples processed=96, #sample per second=227.34
2021-02-23 19:27:52,490 - root - INFO - [Iter 78/80, Epoch 3] valid f1=8.2353e-01, mcc=4.6231e-01, roc_auc=7.9783e-01, accuracy=7.5625e-01, log_loss=5.8319e-01, time spent=0.178s, total_time=0.33min
2021-02-23 19:27:52,739 - root - INFO - [Iter 80/80, Epoch 3] train loss=1.3949e-01, gnorm=4.2179e+00, lr=0.0000e+00, #samples processed=96, #sample per second=225.14
2021-02-23 19:27:52,916 - root - INFO - [Iter 80/80, Epoch 3] valid f1=8.2353e-01, mcc=4.6231e-01, roc_auc=7.9783e-01, accuracy=7.5625e-01, log_loss=5.8304e-01, time spent=0.177s, total_time=0.33min
2021-02-23 19:28:12,411 - autogluon.text.text_prediction.text_prediction - INFO - Results=
2021-02-23 19:28:12,414 - autogluon.text.text_prediction.text_prediction - INFO - Best_config={'search_space▁model.network.agg_net.data_dropout▁choice': 0, 'search_space▁model.network.agg_net.mid_units': 80, 'search_space▁optimization.layerwise_lr_decay': 0.9, 'search_space▁optimization.lr': 5.5e-05, 'search_space▁optimization.warmup_portion': 0.15}
(task:1) 2021-02-23 19:27:55,793 - root - INFO - All Logs will be saved to /var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/ag_mrpc_random_search/task1/training.log
2021-02-23 19:27:55,794 - root - INFO - learning:
early_stopping_patience: 10
log_metrics: auto
stop_metric: auto
valid_ratio: 0.15
misc:
exp_dir: /var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/ag_mrpc_random_search/task1
seed: 123
model:
backbone:
name: google_electra_small
network:
agg_net:
activation: tanh
agg_type: concat
data_dropout: False
dropout: 0.1
feature_proj_num_layers: -1
initializer:
bias: ['zeros']
weight: ['xavier', 'uniform', 'avg', 3.0]
mid_units: 93
norm_eps: 1e-05
normalization: layer_norm
out_proj_num_layers: 0
categorical_net:
activation: leaky
data_dropout: False
dropout: 0.1
emb_units: 32
initializer:
bias: ['zeros']
embed: ['xavier', 'gaussian', 'in', 1.0]
weight: ['xavier', 'uniform', 'avg', 3.0]
mid_units: 64
norm_eps: 1e-05
normalization: layer_norm
num_layers: 1
feature_units: -1
initializer:
bias: ['zeros']
weight: ['truncnorm', 0, 0.02]
numerical_net:
activation: leaky
data_dropout: False
dropout: 0.1
initializer:
bias: ['zeros']
weight: ['xavier', 'uniform', 'avg', 3.0]
input_centering: False
mid_units: 128
norm_eps: 1e-05
normalization: layer_norm
num_layers: 1
text_net:
pool_type: cls
use_segment_id: True
preprocess:
max_length: 128
merge_text: True
optimization:
batch_size: 32
begin_lr: 0.0
final_lr: 0.0
layerwise_lr_decay: 0.9705516632779506
log_frequency: 0.1
lr: 7.967709362655271e-05
lr_scheduler: triangular
max_grad_norm: 1.0
model_average: 5
num_train_epochs: 4
optimizer: adamw
optimizer_params: [('beta1', 0.9), ('beta2', 0.999), ('epsilon', 1e-06), ('correct_bias', False)]
per_device_batch_size: 16
val_batch_size_mult: 2
valid_frequency: 0.1
warmup_portion: 0.19737539140923366
wd: 0.01
version: 1
2021-02-23 19:27:55,943 - root - INFO - Process training set...
2021-02-23 19:27:56,209 - root - INFO - Done!
2021-02-23 19:27:56,209 - root - INFO - Process dev set...
2021-02-23 19:27:56,272 - root - INFO - Done!
2021-02-23 19:28:01,645 - root - INFO - #Total Params/Fixed Params=13483522/0
2021-02-23 19:28:01,661 - root - INFO - Using gradient accumulation. Global batch size = 32
2021-02-23 19:28:02,576 - root - INFO - [Iter 2/80, Epoch 0] train loss=4.8651e-01, gnorm=6.4874e+00, lr=1.0624e-05, #samples processed=96, #sample per second=109.89
2021-02-23 19:28:02,853 - root - INFO - [Iter 2/80, Epoch 0] valid f1=7.5969e-01, mcc=-8.7149e-02, roc_auc=4.5417e-01, accuracy=6.1250e-01, log_loss=7.2693e-01, time spent=0.188s, total_time=0.02min
2021-02-23 19:28:03,207 - root - INFO - [Iter 4/80, Epoch 0] train loss=4.6394e-01, gnorm=7.5454e+00, lr=2.1247e-05, #samples processed=96, #sample per second=152.30
2021-02-23 19:28:03,382 - root - INFO - [Iter 4/80, Epoch 0] valid f1=7.4286e-01, mcc=1.6609e-02, roc_auc=5.4100e-01, accuracy=6.0625e-01, log_loss=6.8001e-01, time spent=0.174s, total_time=0.03min
2021-02-23 19:28:03,703 - root - INFO - [Iter 6/80, Epoch 0] train loss=4.5454e-01, gnorm=6.7836e+00, lr=3.1871e-05, #samples processed=96, #sample per second=193.34
2021-02-23 19:28:04,009 - root - INFO - [Iter 6/80, Epoch 0] valid f1=7.6923e-01, mcc=0.0000e+00, roc_auc=6.4967e-01, accuracy=6.2500e-01, log_loss=7.8367e-01, time spent=0.176s, total_time=0.04min
2021-02-23 19:28:04,332 - root - INFO - [Iter 8/80, Epoch 0] train loss=6.0453e-01, gnorm=9.9783e+00, lr=4.2494e-05, #samples processed=96, #sample per second=152.74
2021-02-23 19:28:04,659 - root - INFO - [Iter 8/80, Epoch 0] valid f1=7.6543e-01, mcc=1.5187e-01, roc_auc=6.9700e-01, accuracy=6.4375e-01, log_loss=6.4704e-01, time spent=0.177s, total_time=0.05min
2021-02-23 19:28:04,949 - root - INFO - [Iter 10/80, Epoch 0] train loss=3.5790e-01, gnorm=8.4106e+00, lr=5.3118e-05, #samples processed=96, #sample per second=155.63
2021-02-23 19:28:05,266 - root - INFO - [Iter 10/80, Epoch 0] valid f1=7.4000e-01, mcc=3.0667e-01, roc_auc=7.2417e-01, accuracy=6.7500e-01, log_loss=6.0291e-01, time spent=0.178s, total_time=0.06min
2021-02-23 19:28:05,502 - root - INFO - [Iter 12/80, Epoch 0] train loss=4.5583e-01, gnorm=7.1815e+00, lr=6.3742e-05, #samples processed=96, #sample per second=173.78
2021-02-23 19:28:05,854 - root - INFO - [Iter 12/80, Epoch 0] valid f1=7.8341e-01, mcc=3.4582e-01, roc_auc=7.5767e-01, accuracy=7.0625e-01, log_loss=5.6450e-01, time spent=0.178s, total_time=0.07min
2021-02-23 19:28:06,216 - root - INFO - [Iter 14/80, Epoch 0] train loss=4.2757e-01, gnorm=6.3245e+00, lr=7.4365e-05, #samples processed=96, #sample per second=134.39
2021-02-23 19:28:06,393 - root - INFO - [Iter 14/80, Epoch 0] valid f1=7.7043e-01, mcc=8.3280e-02, roc_auc=7.9217e-01, accuracy=6.3125e-01, log_loss=8.5800e-01, time spent=0.176s, total_time=0.08min
2021-02-23 19:28:06,638 - root - INFO - [Iter 16/80, Epoch 0] train loss=3.2936e-01, gnorm=3.5680e+00, lr=7.8451e-05, #samples processed=96, #sample per second=227.95
2021-02-23 19:28:06,954 - root - INFO - [Iter 16/80, Epoch 0] valid f1=8.0531e-01, mcc=3.8659e-01, roc_auc=7.8367e-01, accuracy=7.2500e-01, log_loss=5.4979e-01, time spent=0.180s, total_time=0.09min
2021-02-23 19:28:07,229 - root - INFO - [Iter 18/80, Epoch 0] train loss=3.6221e-01, gnorm=8.1735e+00, lr=7.6000e-05, #samples processed=96, #sample per second=162.39
2021-02-23 19:28:07,582 - root - INFO - [Iter 18/80, Epoch 0] valid f1=8.1553e-01, mcc=4.8461e-01, roc_auc=7.8233e-01, accuracy=7.6250e-01, log_loss=5.4563e-01, time spent=0.177s, total_time=0.10min
2021-02-23 19:28:07,935 - root - INFO - [Iter 20/80, Epoch 0] train loss=3.5750e-01, gnorm=5.1890e+00, lr=7.3548e-05, #samples processed=96, #sample per second=135.92
2021-02-23 19:28:08,114 - root - INFO - [Iter 20/80, Epoch 0] valid f1=8.1250e-01, mcc=4.1737e-01, roc_auc=7.8767e-01, accuracy=7.3750e-01, log_loss=5.8954e-01, time spent=0.178s, total_time=0.11min
2021-02-23 19:28:08,368 - root - INFO - [Iter 22/80, Epoch 1] train loss=4.1607e-01, gnorm=4.1247e+00, lr=7.1096e-05, #samples processed=96, #sample per second=221.91
2021-02-23 19:28:08,545 - root - INFO - [Iter 22/80, Epoch 1] valid f1=8.0976e-01, mcc=4.7227e-01, roc_auc=7.8683e-01, accuracy=7.5625e-01, log_loss=5.5142e-01, time spent=0.177s, total_time=0.11min
2021-02-23 19:28:08,799 - root - INFO - [Iter 24/80, Epoch 1] train loss=3.2410e-01, gnorm=7.0990e+00, lr=6.8645e-05, #samples processed=96, #sample per second=222.69
2021-02-23 19:28:08,975 - root - INFO - [Iter 24/80, Epoch 1] valid f1=7.9675e-01, mcc=3.0840e-01, roc_auc=7.9750e-01, accuracy=6.8750e-01, log_loss=6.9531e-01, time spent=0.176s, total_time=0.12min
2021-02-23 19:28:09,218 - root - INFO - [Iter 26/80, Epoch 1] train loss=3.9328e-01, gnorm=3.9558e+00, lr=6.6193e-05, #samples processed=96, #sample per second=229.29
2021-02-23 19:28:09,396 - root - INFO - [Iter 26/80, Epoch 1] valid f1=7.9032e-01, mcc=2.6958e-01, roc_auc=8.0450e-01, accuracy=6.7500e-01, log_loss=6.4710e-01, time spent=0.177s, total_time=0.13min
2021-02-23 19:28:09,702 - root - INFO - [Iter 28/80, Epoch 1] train loss=4.6539e-01, gnorm=6.7899e+00, lr=6.3742e-05, #samples processed=96, #sample per second=198.63
2021-02-23 19:28:09,880 - root - INFO - [Iter 28/80, Epoch 1] valid f1=8.1860e-01, mcc=4.6301e-01, roc_auc=7.5983e-01, accuracy=7.5625e-01, log_loss=5.7829e-01, time spent=0.178s, total_time=0.14min
2021-02-23 19:28:10,139 - root - INFO - [Iter 30/80, Epoch 1] train loss=3.6052e-01, gnorm=5.6333e+00, lr=6.1290e-05, #samples processed=96, #sample per second=219.44
2021-02-23 19:28:10,317 - root - INFO - [Iter 30/80, Epoch 1] valid f1=7.7291e-01, mcc=1.4708e-01, roc_auc=7.7400e-01, accuracy=6.4375e-01, log_loss=5.8634e-01, time spent=0.178s, total_time=0.14min
2021-02-23 19:28:10,578 - root - INFO - [Iter 32/80, Epoch 1] train loss=3.5156e-01, gnorm=5.8901e+00, lr=5.8838e-05, #samples processed=96, #sample per second=218.72
2021-02-23 19:28:10,757 - root - INFO - [Iter 32/80, Epoch 1] valid f1=8.1106e-01, mcc=4.3319e-01, roc_auc=7.8267e-01, accuracy=7.4375e-01, log_loss=5.5320e-01, time spent=0.179s, total_time=0.15min
2021-02-23 19:28:11,021 - root - INFO - [Iter 34/80, Epoch 1] train loss=2.8000e-01, gnorm=6.7179e+00, lr=5.6387e-05, #samples processed=96, #sample per second=216.95
2021-02-23 19:28:11,199 - root - INFO - [Iter 34/80, Epoch 1] valid f1=8.0349e-01, mcc=3.7155e-01, roc_auc=7.7917e-01, accuracy=7.1875e-01, log_loss=5.7475e-01, time spent=0.178s, total_time=0.16min
2021-02-23 19:28:11,441 - root - INFO - [Iter 36/80, Epoch 1] train loss=3.4089e-01, gnorm=5.3717e+00, lr=5.3935e-05, #samples processed=96, #sample per second=228.52
2021-02-23 19:28:11,617 - root - INFO - [Iter 36/80, Epoch 1] valid f1=7.3514e-01, mcc=3.8482e-01, roc_auc=7.6000e-01, accuracy=6.9375e-01, log_loss=5.9669e-01, time spent=0.175s, total_time=0.17min
2021-02-23 19:28:11,867 - root - INFO - [Iter 38/80, Epoch 1] train loss=3.8594e-01, gnorm=7.0408e+00, lr=5.1484e-05, #samples processed=96, #sample per second=225.48
2021-02-23 19:28:12,042 - root - INFO - [Iter 38/80, Epoch 1] valid f1=6.9364e-01, mcc=3.7259e-01, roc_auc=7.7083e-01, accuracy=6.6875e-01, log_loss=6.3087e-01, time spent=0.175s, total_time=0.17min
2021-02-23 19:28:12,045 - root - INFO - Early stopping patience reached!
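The "Early stopping patience reached!" message means the validation metric failed to improve for `early_stopping_patience` consecutive validation rounds (10 in the configs shown in this tutorial). A minimal sketch of that stopping rule, for illustration only (not AutoGluon's internal implementation):

```python
def patience_reached(valid_scores, patience):
    """Return True once the best validation score is `patience` or more rounds old."""
    if not valid_scores:
        return False
    best_idx = max(range(len(valid_scores)), key=valid_scores.__getitem__)
    return len(valid_scores) - 1 - best_idx >= patience

# The metric improves early, then plateaus: training stops after 10 stale rounds.
scores = [0.70, 0.76, 0.81] + [0.79] * 10
print(patience_reached(scores, patience=10))  # True
```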
We can again evaluate our model’s performance on separate test data.
dev_score = predictor_mrpc.evaluate(dev_data, metrics=['acc', 'f1'])
print('Best Config = {}'.format(predictor_mrpc.results['best_config']))
print('Total Time = {}s'.format(predictor_mrpc.results['total_time']))
print('Accuracy = {:.2f}%'.format(dev_score['acc'] * 100))
print('F1 = {:.2f}%'.format(dev_score['f1'] * 100))
Best Config = {'search_space▁model.network.agg_net.data_dropout▁choice': 0, 'search_space▁model.network.agg_net.mid_units': 80, 'search_space▁optimization.layerwise_lr_decay': 0.9, 'search_space▁optimization.lr': 5.5e-05, 'search_space▁optimization.warmup_portion': 0.15}
Total Time = 49.173131227493286s
Accuracy = 76.23%
F1 = 83.25%
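For reference, both reported metrics follow directly from confusion-matrix counts. A minimal sketch with made-up counts (these are illustrative, not the actual dev-set predictions):

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Binary accuracy and F1 from confusion-matrix counts (positive class = paraphrase)."""
    acc = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return acc, f1

# Hypothetical counts chosen only to illustrate the formulas.
acc, f1 = accuracy_and_f1(tp=240, fp=60, fn=36, tn=68)
print('Accuracy = {:.2f}%, F1 = {:.2f}%'.format(acc * 100, f1 * 100))
```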
We can also use the model to predict whether new sentence pairs are paraphrases of each other.
sentence1 = 'It is simple to solve NLP problems with AutoGluon.'
sentence2 = 'With AutoGluon, it is easy to solve NLP problems.'
sentence3 = 'AutoGluon gives you a very bad user experience for solving NLP problems.'
prediction1 = predictor_mrpc.predict({'sentence1': [sentence1], 'sentence2': [sentence2]})
prediction1_prob = predictor_mrpc.predict_proba({'sentence1': [sentence1], 'sentence2': [sentence2]})
print('A = "{}"'.format(sentence1))
print('B = "{}"'.format(sentence2))
print('Prediction = "{}"'.format(prediction1[0] == 1))
print('Prob = "{}"'.format(prediction1_prob[0]))
print('')
prediction2 = predictor_mrpc.predict({'sentence1': [sentence1], 'sentence2': [sentence3]})
prediction2_prob = predictor_mrpc.predict_proba({'sentence1': [sentence1], 'sentence2': [sentence3]})
print('A = "{}"'.format(sentence1))
print('B = "{}"'.format(sentence3))
print('Prediction = "{}"'.format(prediction2[0] == 1))
print('Prob = "{}"'.format(prediction2_prob[0]))
A = "It is simple to solve NLP problems with AutoGluon."
B = "With AutoGluon, it is easy to solve NLP problems."
Prediction = "True"
Prob = "[0.03616907 0.96383095]"
A = "It is simple to solve NLP problems with AutoGluon."
B = "AutoGluon gives you a very bad user experience for solving NLP problems."
Prediction = "False"
Prob = "[0.6190837 0.38091624]"
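`predict_proba` returns one probability per class (index 0: not a paraphrase, index 1: paraphrase), and the predicted label corresponds to the higher-probability class. A small illustration using the probability vectors printed above:

```python
import numpy as np

# First pair: the probability of class 1 (paraphrase) dominates.
proba = np.array([0.03616907, 0.96383095])
print(int(np.argmax(proba)) == 1)  # True

# Second pair: class 0 (not a paraphrase) has the higher probability.
proba = np.array([0.6190837, 0.38091624])
print(int(np.argmax(proba)) == 1)  # False
```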
Use Bayesian Optimization¶
Instead of random search, we can perform HPO via Bayesian optimization. Here we specify bayesopt as the search strategy.
hyperparameters['hpo_params'] = {
'search_strategy': 'bayesopt'
}
predictor_mrpc_bo = task.fit(train_data, label='label',
hyperparameters=hyperparameters,
time_limits=60 * 2,
num_trials=2, # increase this to get good performance in your applications
ngpus_per_trial=1, seed=123,
output_directory='./ag_mrpc_custom_space_fifo_bo')
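The difference from random search is that each new trial is proposed using the results of earlier trials rather than drawn independently. Real Bayesian optimization fits a surrogate model of score versus configuration and picks the next point via an acquisition function; the toy sketch below (not AutoGluon's actual bayesopt searcher) only captures the "use past trials to guide the next one" aspect, on a hypothetical 1-D learning-rate space:

```python
import random

def toy_objective(lr):
    """Hypothetical stand-in for validation score as a function of the learning rate."""
    return -(lr - 5e-5) ** 2

def toy_search(n_init=3, n_iters=5, lo=1e-5, hi=1e-4, seed=123):
    random.seed(seed)
    history = []
    # Warm up with random configurations, exactly like random search.
    for _ in range(n_init):
        lr = random.uniform(lo, hi)
        history.append((toy_objective(lr), lr))
    # Then exploit history: propose candidates near the best config so far.
    for _ in range(n_iters):
        best_lr = max(history)[1]
        lr = min(hi, max(lo, random.gauss(best_lr, (hi - lo) * 0.1)))
        history.append((toy_objective(lr), lr))
    return max(history)[1]

print(toy_search())  # best learning rate found by the toy search
```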
2021-02-23 19:28:18,743 - autogluon.text.text_prediction.text_prediction - INFO - All Logs will be saved to ./ag_mrpc_custom_space_fifo_bo/ag_text_prediction.log
2021-02-23 19:28:18,757 - autogluon.text.text_prediction.text_prediction - INFO - Train Dataset:
2021-02-23 19:28:18,758 - autogluon.text.text_prediction.text_prediction - INFO - Columns:
- Text(
name="sentence1"
#total/missing=640/0
length, min/avg/max=44/115.9390625/200
)
- Text(
name="sentence2"
#total/missing=640/0
length, min/avg/max=42/116.503125/210
)
- Categorical(
name="label"
#total/missing=640/0
num_class (total/non_special)=2/2
categories=[0, 1]
freq=[216, 424]
)
2021-02-23 19:28:18,759 - autogluon.text.text_prediction.text_prediction - INFO - Tuning Dataset:
2021-02-23 19:28:18,760 - autogluon.text.text_prediction.text_prediction - INFO - Columns:
- Text(
name="sentence1"
#total/missing=160/0
length, min/avg/max=54/123.14375/195
)
- Text(
name="sentence2"
#total/missing=160/0
length, min/avg/max=59/123.26875/208
)
- Categorical(
name="label"
#total/missing=160/0
num_class (total/non_special)=2/2
categories=[0, 1]
freq=[52, 108]
)
2021-02-23 19:28:18,762 - autogluon.text.text_prediction.text_prediction - INFO - All Logs will be saved to ./ag_mrpc_custom_space_fifo_bo/main.log
(task:2) 2021-02-23 19:28:20,954 - root - INFO - All Logs will be saved to /var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/ag_mrpc_custom_space_fifo_bo/task2/training.log
2021-02-23 19:28:20,955 - root - INFO - learning:
early_stopping_patience: 10
log_metrics: auto
stop_metric: auto
valid_ratio: 0.15
misc:
exp_dir: /var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/ag_mrpc_custom_space_fifo_bo/task2
seed: 123
model:
backbone:
name: google_electra_small
network:
agg_net:
activation: tanh
agg_type: concat
data_dropout: False
dropout: 0.1
feature_proj_num_layers: -1
initializer:
bias: ['zeros']
weight: ['xavier', 'uniform', 'avg', 3.0]
mid_units: 80
norm_eps: 1e-05
normalization: layer_norm
out_proj_num_layers: 0
categorical_net:
activation: leaky
data_dropout: False
dropout: 0.1
emb_units: 32
initializer:
bias: ['zeros']
embed: ['xavier', 'gaussian', 'in', 1.0]
weight: ['xavier', 'uniform', 'avg', 3.0]
mid_units: 64
norm_eps: 1e-05
normalization: layer_norm
num_layers: 1
feature_units: -1
initializer:
bias: ['zeros']
weight: ['truncnorm', 0, 0.02]
numerical_net:
activation: leaky
data_dropout: False
dropout: 0.1
initializer:
bias: ['zeros']
weight: ['xavier', 'uniform', 'avg', 3.0]
input_centering: False
mid_units: 128
norm_eps: 1e-05
normalization: layer_norm
num_layers: 1
text_net:
pool_type: cls
use_segment_id: True
preprocess:
max_length: 128
merge_text: True
optimization:
batch_size: 32
begin_lr: 0.0
final_lr: 0.0
layerwise_lr_decay: 0.9
log_frequency: 0.1
lr: 5.5e-05
lr_scheduler: triangular
max_grad_norm: 1.0
model_average: 5
num_train_epochs: 4
optimizer: adamw
optimizer_params: [('beta1', 0.9), ('beta2', 0.999), ('epsilon', 1e-06), ('correct_bias', False)]
per_device_batch_size: 16
val_batch_size_mult: 2
valid_frequency: 0.1
warmup_portion: 0.15
wd: 0.01
version: 1
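In this config, lr_scheduler: triangular with warmup_portion: 0.15 means the learning rate ramps linearly from begin_lr (0) up to lr over the first 15% of the 80 updates, then decays linearly back toward final_lr (0). A minimal sketch of the schedule, which reproduces the lr values printed in the log for this trial:

```python
def triangular_lr(iteration, total_iters=80, peak_lr=5.5e-5, warmup_portion=0.15):
    """Linear warmup to peak_lr, then linear decay back to zero."""
    warmup_iters = warmup_portion * total_iters  # 12 updates in this run
    if iteration <= warmup_iters:
        return peak_lr * iteration / warmup_iters
    return peak_lr * (total_iters - iteration) / (total_iters - warmup_iters)

print('{:.4e}'.format(triangular_lr(2)))   # 9.1667e-06, matching Iter 2
print('{:.4e}'.format(triangular_lr(12)))  # 5.5000e-05, the peak at Iter 12
print('{:.4e}'.format(triangular_lr(14)))  # 5.3382e-05, matching Iter 14
```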
2021-02-23 19:28:21,121 - root - INFO - Process training set...
2021-02-23 19:28:21,393 - root - INFO - Done!
2021-02-23 19:28:21,394 - root - INFO - Process dev set...
2021-02-23 19:28:21,459 - root - INFO - Done!
2021-02-23 19:28:26,852 - root - INFO - #Total Params/Fixed Params=13483522/0
2021-02-23 19:28:26,869 - root - INFO - Using gradient accumulation. Global batch size = 32
2021-02-23 19:28:27,790 - root - INFO - [Iter 2/80, Epoch 0] train loss=4.9825e-01, gnorm=9.3682e+00, lr=9.1667e-06, #samples processed=96, #sample per second=109.21
2021-02-23 19:28:28,051 - root - INFO - [Iter 2/80, Epoch 0] valid f1=8.0597e-01, mcc=0.0000e+00, roc_auc=4.2717e-01, accuracy=6.7500e-01, log_loss=6.9478e-01, time spent=0.182s, total_time=0.02min
2021-02-23 19:28:28,414 - root - INFO - [Iter 4/80, Epoch 0] train loss=5.1952e-01, gnorm=4.9405e+00, lr=1.8333e-05, #samples processed=96, #sample per second=154.02
2021-02-23 19:28:28,583 - root - INFO - [Iter 4/80, Epoch 0] valid f1=7.6000e-01, mcc=-3.5896e-02, roc_auc=5.1015e-01, accuracy=6.2500e-01, log_loss=6.5581e-01, time spent=0.169s, total_time=0.03min
2021-02-23 19:28:28,947 - root - INFO - [Iter 6/80, Epoch 0] train loss=4.4682e-01, gnorm=5.1538e+00, lr=2.7500e-05, #samples processed=96, #sample per second=179.96
2021-02-23 19:28:29,118 - root - INFO - [Iter 6/80, Epoch 0] valid f1=7.0423e-01, mcc=1.1589e-01, roc_auc=6.3159e-01, accuracy=6.0625e-01, log_loss=6.3528e-01, time spent=0.170s, total_time=0.04min
2021-02-23 19:28:29,419 - root - INFO - [Iter 8/80, Epoch 0] train loss=4.5215e-01, gnorm=4.2853e+00, lr=3.6667e-05, #samples processed=96, #sample per second=203.65
2021-02-23 19:28:29,709 - root - INFO - [Iter 8/80, Epoch 0] valid f1=8.0934e-01, mcc=1.8062e-01, roc_auc=6.9854e-01, accuracy=6.9375e-01, log_loss=5.8346e-01, time spent=0.168s, total_time=0.05min
2021-02-23 19:28:30,004 - root - INFO - [Iter 10/80, Epoch 0] train loss=3.5321e-01, gnorm=6.0354e+00, lr=4.5833e-05, #samples processed=96, #sample per second=164.11
2021-02-23 19:28:30,334 - root - INFO - [Iter 10/80, Epoch 0] valid f1=8.0162e-01, mcc=2.0450e-01, roc_auc=7.3771e-01, accuracy=6.9375e-01, log_loss=5.6618e-01, time spent=0.169s, total_time=0.06min
2021-02-23 19:28:30,635 - root - INFO - [Iter 12/80, Epoch 0] train loss=3.7397e-01, gnorm=3.7483e+00, lr=5.5000e-05, #samples processed=96, #sample per second=152.19
2021-02-23 19:28:30,922 - root - INFO - [Iter 12/80, Epoch 0] valid f1=8.0717e-01, mcc=3.6728e-01, roc_auc=7.4501e-01, accuracy=7.3125e-01, log_loss=5.6052e-01, time spent=0.171s, total_time=0.07min
2021-02-23 19:28:31,174 - root - INFO - [Iter 14/80, Epoch 0] train loss=3.4767e-01, gnorm=7.7745e+00, lr=5.3382e-05, #samples processed=96, #sample per second=178.28
2021-02-23 19:28:31,345 - root - INFO - [Iter 14/80, Epoch 0] valid f1=7.3892e-01, mcc=2.9547e-01, roc_auc=7.5107e-01, accuracy=6.6875e-01, log_loss=6.0647e-01, time spent=0.171s, total_time=0.07min
2021-02-23 19:28:31,589 - root - INFO - [Iter 16/80, Epoch 0] train loss=4.7838e-01, gnorm=5.1344e+00, lr=5.1765e-05, #samples processed=96, #sample per second=231.09
2021-02-23 19:28:31,761 - root - INFO - [Iter 16/80, Epoch 0] valid f1=7.9051e-01, mcc=9.7283e-02, roc_auc=7.5873e-01, accuracy=6.6875e-01, log_loss=6.2947e-01, time spent=0.171s, total_time=0.08min
2021-02-23 19:28:32,075 - root - INFO - [Iter 18/80, Epoch 0] train loss=3.9948e-01, gnorm=8.8831e+00, lr=5.0147e-05, #samples processed=96, #sample per second=197.52
2021-02-23 19:28:32,245 - root - INFO - [Iter 18/80, Epoch 0] valid f1=8.0934e-01, mcc=1.8062e-01, roc_auc=7.6941e-01, accuracy=6.9375e-01, log_loss=6.2007e-01, time spent=0.168s, total_time=0.09min
2021-02-23 19:28:32,490 - root - INFO - [Iter 20/80, Epoch 0] train loss=4.1535e-01, gnorm=8.1681e+00, lr=4.8529e-05, #samples processed=96, #sample per second=232.27
2021-02-23 19:28:32,662 - root - INFO - [Iter 20/80, Epoch 0] valid f1=7.6555e-01, mcc=3.2706e-01, roc_auc=7.7172e-01, accuracy=6.9375e-01, log_loss=5.8448e-01, time spent=0.171s, total_time=0.10min
2021-02-23 19:28:32,921 - root - INFO - [Iter 22/80, Epoch 1] train loss=3.2063e-01, gnorm=8.2923e+00, lr=4.6912e-05, #samples processed=96, #sample per second=223.03
2021-02-23 19:28:33,090 - root - INFO - [Iter 22/80, Epoch 1] valid f1=7.1579e-01, mcc=3.3771e-01, roc_auc=7.7778e-01, accuracy=6.6250e-01, log_loss=6.4837e-01, time spent=0.169s, total_time=0.10min
2021-02-23 19:28:33,380 - root - INFO - [Iter 24/80, Epoch 1] train loss=4.5408e-01, gnorm=3.3139e+00, lr=4.5294e-05, #samples processed=96, #sample per second=208.83
2021-02-23 19:28:33,684 - root - INFO - [Iter 24/80, Epoch 1] valid f1=8.1081e-01, mcc=3.8476e-01, roc_auc=7.8419e-01, accuracy=7.3750e-01, log_loss=5.3649e-01, time spent=0.171s, total_time=0.11min
2021-02-23 19:28:33,968 - root - INFO - [Iter 26/80, Epoch 1] train loss=4.1108e-01, gnorm=5.1668e+00, lr=4.3676e-05, #samples processed=96, #sample per second=163.52
2021-02-23 19:28:34,285 - root - INFO - [Iter 26/80, Epoch 1] valid f1=8.1739e-01, mcc=3.6531e-01, roc_auc=7.9078e-01, accuracy=7.3750e-01, log_loss=5.2384e-01, time spent=0.171s, total_time=0.12min
2021-02-23 19:28:34,532 - root - INFO - [Iter 28/80, Epoch 1] train loss=2.9167e-01, gnorm=3.1397e+00, lr=4.2059e-05, #samples processed=96, #sample per second=170.27
2021-02-23 19:28:34,704 - root - INFO - [Iter 28/80, Epoch 1] valid f1=8.1385e-01, mcc=3.4734e-01, roc_auc=7.9238e-01, accuracy=7.3125e-01, log_loss=5.3014e-01, time spent=0.172s, total_time=0.13min
2021-02-23 19:28:34,939 - root - INFO - [Iter 30/80, Epoch 1] train loss=3.6416e-01, gnorm=2.9714e+00, lr=4.0441e-05, #samples processed=96, #sample per second=235.95
2021-02-23 19:28:35,240 - root - INFO - [Iter 30/80, Epoch 1] valid f1=8.1250e-01, mcc=3.7954e-01, roc_auc=7.9309e-01, accuracy=7.3750e-01, log_loss=5.3422e-01, time spent=0.170s, total_time=0.14min
2021-02-23 19:28:35,472 - root - INFO - [Iter 32/80, Epoch 1] train loss=3.8094e-01, gnorm=8.6718e+00, lr=3.8824e-05, #samples processed=96, #sample per second=179.92
2021-02-23 19:28:35,643 - root - INFO - [Iter 32/80, Epoch 1] valid f1=7.6847e-01, mcc=3.7698e-01, roc_auc=7.9131e-01, accuracy=7.0625e-01, log_loss=5.9106e-01, time spent=0.171s, total_time=0.15min
2021-02-23 19:28:35,910 - root - INFO - [Iter 34/80, Epoch 1] train loss=4.4406e-01, gnorm=5.3567e+00, lr=3.7206e-05, #samples processed=96, #sample per second=219.69
2021-02-23 19:28:36,230 - root - INFO - [Iter 34/80, Epoch 1] valid f1=8.1106e-01, mcc=4.1307e-01, roc_auc=7.9630e-01, accuracy=7.4375e-01, log_loss=5.4847e-01, time spent=0.170s, total_time=0.16min
2021-02-23 19:28:36,477 - root - INFO - [Iter 36/80, Epoch 1] train loss=3.2083e-01, gnorm=6.7313e+00, lr=3.5588e-05, #samples processed=96, #sample per second=169.21
2021-02-23 19:28:36,649 - root - INFO - [Iter 36/80, Epoch 1] valid f1=8.1739e-01, mcc=3.6531e-01, roc_auc=8.0146e-01, accuracy=7.3750e-01, log_loss=5.3995e-01, time spent=0.171s, total_time=0.16min
2021-02-23 19:28:36,899 - root - INFO - [Iter 38/80, Epoch 1] train loss=2.6656e-01, gnorm=4.9067e+00, lr=3.3971e-05, #samples processed=96, #sample per second=227.59
2021-02-23 19:28:37,071 - root - INFO - [Iter 38/80, Epoch 1] valid f1=8.2114e-01, mcc=3.0418e-01, roc_auc=8.0467e-01, accuracy=7.2500e-01, log_loss=5.8405e-01, time spent=0.172s, total_time=0.17min
2021-02-23 19:28:37,314 - root - INFO - [Iter 40/80, Epoch 1] train loss=3.6528e-01, gnorm=5.5076e+00, lr=3.2353e-05, #samples processed=96, #sample per second=231.66
2021-02-23 19:28:37,600 - root - INFO - [Iter 40/80, Epoch 1] valid f1=8.2251e-01, mcc=3.7899e-01, roc_auc=8.0538e-01, accuracy=7.4375e-01, log_loss=5.4628e-01, time spent=0.170s, total_time=0.18min
2021-02-23 19:28:37,840 - root - INFO - [Iter 42/80, Epoch 2] train loss=3.4586e-01, gnorm=3.6367e+00, lr=3.0735e-05, #samples processed=96, #sample per second=182.41
2021-02-23 19:28:38,154 - root - INFO - [Iter 42/80, Epoch 2] valid f1=8.1279e-01, mcc=4.0747e-01, roc_auc=8.0271e-01, accuracy=7.4375e-01, log_loss=5.3476e-01, time spent=0.176s, total_time=0.19min
2021-02-23 19:28:38,405 - root - INFO - [Iter 44/80, Epoch 2] train loss=2.9936e-01, gnorm=3.8109e+00, lr=2.9118e-05, #samples processed=96, #sample per second=169.94
2021-02-23 19:28:38,577 - root - INFO - [Iter 44/80, Epoch 2] valid f1=7.9812e-01, mcc=3.9684e-01, roc_auc=8.0573e-01, accuracy=7.3125e-01, log_loss=5.3807e-01, time spent=0.171s, total_time=0.19min
2021-02-23 19:28:38,821 - root - INFO - [Iter 46/80, Epoch 2] train loss=2.8047e-01, gnorm=2.7548e+00, lr=2.7500e-05, #samples processed=96, #sample per second=230.71
2021-02-23 19:28:39,127 - root - INFO - [Iter 46/80, Epoch 2] valid f1=8.1279e-01, mcc=4.0747e-01, roc_auc=8.1072e-01, accuracy=7.4375e-01, log_loss=5.2923e-01, time spent=0.173s, total_time=0.20min
2021-02-23 19:28:39,375 - root - INFO - [Iter 48/80, Epoch 2] train loss=2.7852e-01, gnorm=4.4091e+00, lr=2.5882e-05, #samples processed=96, #sample per second=173.47
2021-02-23 19:28:39,546 - root - INFO - [Iter 48/80, Epoch 2] valid f1=8.1416e-01, mcc=3.7455e-01, roc_auc=8.1535e-01, accuracy=7.3750e-01, log_loss=5.2629e-01, time spent=0.171s, total_time=0.21min
2021-02-23 19:28:39,798 - root - INFO - [Iter 50/80, Epoch 2] train loss=3.3480e-01, gnorm=2.8557e+00, lr=2.4265e-05, #samples processed=96, #sample per second=227.02
2021-02-23 19:28:40,112 - root - INFO - [Iter 50/80, Epoch 2] valid f1=8.1938e-01, mcc=3.8743e-01, roc_auc=8.1873e-01, accuracy=7.4375e-01, log_loss=5.3077e-01, time spent=0.170s, total_time=0.22min
2021-02-23 19:28:40,358 - root - INFO - [Iter 52/80, Epoch 2] train loss=3.3085e-01, gnorm=6.6906e+00, lr=2.2647e-05, #samples processed=96, #sample per second=171.44
2021-02-23 19:28:40,528 - root - INFO - [Iter 52/80, Epoch 2] valid f1=8.0717e-01, mcc=3.6728e-01, roc_auc=8.1891e-01, accuracy=7.3125e-01, log_loss=5.2353e-01, time spent=0.169s, total_time=0.23min
2021-02-23 19:28:40,804 - root - INFO - [Iter 54/80, Epoch 2] train loss=2.2954e-01, gnorm=5.2841e+00, lr=2.1029e-05, #samples processed=96, #sample per second=215.19
2021-02-23 19:28:41,146 - root - INFO - [Iter 54/80, Epoch 2] valid f1=8.1818e-01, mcc=4.1931e-01, roc_auc=8.1909e-01, accuracy=7.5000e-01, log_loss=5.2415e-01, time spent=0.171s, total_time=0.24min
2021-02-23 19:28:41,401 - root - INFO - [Iter 56/80, Epoch 2] train loss=3.5338e-01, gnorm=8.0082e+00, lr=1.9412e-05, #samples processed=96, #sample per second=161.06
2021-02-23 19:28:41,573 - root - INFO - [Iter 56/80, Epoch 2] valid f1=8.0717e-01, mcc=3.6728e-01, roc_auc=8.1980e-01, accuracy=7.3125e-01, log_loss=5.2250e-01, time spent=0.172s, total_time=0.24min
2021-02-23 19:28:41,825 - root - INFO - [Iter 58/80, Epoch 2] train loss=4.0803e-01, gnorm=6.6504e+00, lr=1.7794e-05, #samples processed=96, #sample per second=226.23
2021-02-23 19:28:41,995 - root - INFO - [Iter 58/80, Epoch 2] valid f1=8.0717e-01, mcc=3.6728e-01, roc_auc=8.1962e-01, accuracy=7.3125e-01, log_loss=5.2174e-01, time spent=0.169s, total_time=0.25min
2021-02-23 19:28:42,250 - root - INFO - [Iter 60/80, Epoch 2] train loss=1.9458e-01, gnorm=4.1155e+00, lr=1.6176e-05, #samples processed=96, #sample per second=226.09
2021-02-23 19:28:42,420 - root - INFO - [Iter 60/80, Epoch 2] valid f1=8.0717e-01, mcc=3.6728e-01, roc_auc=8.1962e-01, accuracy=7.3125e-01, log_loss=5.2363e-01, time spent=0.170s, total_time=0.26min
2021-02-23 19:28:42,678 - root - INFO - [Iter 62/80, Epoch 3] train loss=2.9717e-01, gnorm=4.2407e+00, lr=1.4559e-05, #samples processed=96, #sample per second=224.38
2021-02-23 19:28:42,849 - root - INFO - [Iter 62/80, Epoch 3] valid f1=8.0556e-01, mcc=4.0171e-01, roc_auc=8.1873e-01, accuracy=7.3750e-01, log_loss=5.2369e-01, time spent=0.171s, total_time=0.27min
2021-02-23 19:28:43,093 - root - INFO - [Iter 64/80, Epoch 3] train loss=2.3936e-01, gnorm=9.4930e+00, lr=1.2941e-05, #samples processed=96, #sample per second=231.44
2021-02-23 19:28:43,267 - root - INFO - [Iter 64/80, Epoch 3] valid f1=8.0751e-01, mcc=4.2494e-01, roc_auc=8.1749e-01, accuracy=7.4375e-01, log_loss=5.3018e-01, time spent=0.174s, total_time=0.27min
2021-02-23 19:28:43,527 - root - INFO - [Iter 66/80, Epoch 3] train loss=2.4479e-01, gnorm=3.0960e+00, lr=1.1324e-05, #samples processed=96, #sample per second=221.18
2021-02-23 19:28:43,703 - root - INFO - [Iter 66/80, Epoch 3] valid f1=8.0930e-01, mcc=4.1889e-01, roc_auc=8.1838e-01, accuracy=7.4375e-01, log_loss=5.2763e-01, time spent=0.176s, total_time=0.28min
2021-02-23 19:28:43,968 - root - INFO - [Iter 68/80, Epoch 3] train loss=2.5821e-01, gnorm=6.5340e+00, lr=9.7059e-06, #samples processed=96, #sample per second=217.67
2021-02-23 19:28:44,141 - root - INFO - [Iter 68/80, Epoch 3] valid f1=8.0556e-01, mcc=4.0171e-01, roc_auc=8.1784e-01, accuracy=7.3750e-01, log_loss=5.2673e-01, time spent=0.172s, total_time=0.29min
2021-02-23 19:28:44,393 - root - INFO - [Iter 70/80, Epoch 3] train loss=2.8625e-01, gnorm=3.9592e+00, lr=8.0882e-06, #samples processed=96, #sample per second=226.15
2021-02-23 19:28:44,568 - root - INFO - [Iter 70/80, Epoch 3] valid f1=8.1448e-01, mcc=4.0210e-01, roc_auc=8.1820e-01, accuracy=7.4375e-01, log_loss=5.2745e-01, time spent=0.175s, total_time=0.29min
2021-02-23 19:28:44,819 - root - INFO - [Iter 72/80, Epoch 3] train loss=2.6015e-01, gnorm=3.6656e+00, lr=6.4706e-06, #samples processed=96, #sample per second=225.33
2021-02-23 19:28:44,991 - root - INFO - [Iter 72/80, Epoch 3] valid f1=8.0909e-01, mcc=3.9019e-01, roc_auc=8.1713e-01, accuracy=7.3750e-01, log_loss=5.2817e-01, time spent=0.172s, total_time=0.30min
2021-02-23 19:28:45,240 - root - INFO - [Iter 74/80, Epoch 3] train loss=2.7299e-01, gnorm=9.0514e+00, lr=4.8529e-06, #samples processed=96, #sample per second=228.24
2021-02-23 19:28:45,410 - root - INFO - [Iter 74/80, Epoch 3] valid f1=8.0734e-01, mcc=3.9585e-01, roc_auc=8.1624e-01, accuracy=7.3750e-01, log_loss=5.2847e-01, time spent=0.170s, total_time=0.31min
2021-02-23 19:28:45,412 - root - INFO - Early stopping patience reached!
2021-02-23 19:29:15,829 - autogluon.text.text_prediction.text_prediction - INFO - Results=
2021-02-23 19:29:15,833 - autogluon.text.text_prediction.text_prediction - INFO - Best_config={'search_space▁model.network.agg_net.data_dropout▁choice': 1, 'search_space▁model.network.agg_net.mid_units': 52, 'search_space▁optimization.layerwise_lr_decay': 0.8964584420622498, 'search_space▁optimization.lr': 4.356400129415584e-05, 'search_space▁optimization.warmup_portion': 0.17596939190261784}
(task:3) 2021-02-23 19:28:48,533 - root - INFO - All Logs will be saved to /var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/ag_mrpc_custom_space_fifo_bo/task3/training.log
2021-02-23 19:28:48,533 - root - INFO - learning:
early_stopping_patience: 10
log_metrics: auto
stop_metric: auto
valid_ratio: 0.15
misc:
exp_dir: /var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/ag_mrpc_custom_space_fifo_bo/task3
seed: 123
model:
backbone:
name: google_electra_small
network:
agg_net:
activation: tanh
agg_type: concat
data_dropout: True
dropout: 0.1
feature_proj_num_layers: -1
initializer:
bias: ['zeros']
weight: ['xavier', 'uniform', 'avg', 3.0]
mid_units: 52
norm_eps: 1e-05
normalization: layer_norm
out_proj_num_layers: 0
categorical_net:
activation: leaky
data_dropout: False
dropout: 0.1
emb_units: 32
initializer:
bias: ['zeros']
embed: ['xavier', 'gaussian', 'in', 1.0]
weight: ['xavier', 'uniform', 'avg', 3.0]
mid_units: 64
norm_eps: 1e-05
normalization: layer_norm
num_layers: 1
feature_units: -1
initializer:
bias: ['zeros']
weight: ['truncnorm', 0, 0.02]
numerical_net:
activation: leaky
data_dropout: False
dropout: 0.1
initializer:
bias: ['zeros']
weight: ['xavier', 'uniform', 'avg', 3.0]
input_centering: False
mid_units: 128
norm_eps: 1e-05
normalization: layer_norm
num_layers: 1
text_net:
pool_type: cls
use_segment_id: True
preprocess:
max_length: 128
merge_text: True
optimization:
batch_size: 32
begin_lr: 0.0
final_lr: 0.0
layerwise_lr_decay: 0.8964584420622498
log_frequency: 0.1
lr: 4.356400129415584e-05
lr_scheduler: triangular
max_grad_norm: 1.0
model_average: 5
num_train_epochs: 4
optimizer: adamw
optimizer_params: [('beta1', 0.9), ('beta2', 0.999), ('epsilon', 1e-06), ('correct_bias', False)]
per_device_batch_size: 16
val_batch_size_mult: 2
valid_frequency: 0.1
warmup_portion: 0.17596939190261784
wd: 0.01
version: 1
2021-02-23 19:28:48,681 - root - INFO - Process training set...
2021-02-23 19:28:48,969 - root - INFO - Done!
2021-02-23 19:28:48,969 - root - INFO - Process dev set...
2021-02-23 19:28:49,032 - root - INFO - Done!
2021-02-23 19:28:54,419 - root - INFO - #Total Params/Fixed Params=13483522/0
2021-02-23 19:28:54,434 - root - INFO - Using gradient accumulation. Global batch size = 32
2021-02-23 19:28:55,367 - root - INFO - [Iter 2/80, Epoch 0] train loss=4.9887e-01, gnorm=9.3989e+00, lr=6.2234e-06, #samples processed=96, #sample per second=107.68
2021-02-23 19:28:55,630 - root - INFO - [Iter 2/80, Epoch 0] valid f1=8.0597e-01, mcc=0.0000e+00, roc_auc=4.2094e-01, accuracy=6.7500e-01, log_loss=7.0384e-01, time spent=0.185s, total_time=0.02min
2021-02-23 19:28:55,987 - root - INFO - [Iter 4/80, Epoch 0] train loss=5.2641e-01, gnorm=4.8122e+00, lr=1.2447e-05, #samples processed=96, #sample per second=154.79
2021-02-23 19:28:56,162 - root - INFO - [Iter 4/80, Epoch 0] valid f1=7.8161e-01, mcc=-8.3181e-02, roc_auc=4.7276e-01, accuracy=6.4375e-01, log_loss=6.5702e-01, time spent=0.174s, total_time=0.03min
2021-02-23 19:28:56,534 - root - INFO - [Iter 6/80, Epoch 0] train loss=4.5132e-01, gnorm=5.0655e+00, lr=1.8670e-05, #samples processed=96, #sample per second=175.78
2021-02-23 19:28:56,708 - root - INFO - [Iter 6/80, Epoch 0] valid f1=6.6667e-01, mcc=5.9756e-02, roc_auc=5.7550e-01, accuracy=5.6875e-01, log_loss=6.6964e-01, time spent=0.173s, total_time=0.04min
2021-02-23 19:28:57,005 - root - INFO - [Iter 8/80, Epoch 0] train loss=4.8558e-01, gnorm=6.2225e+00, lr=2.4894e-05, #samples processed=96, #sample per second=203.84
2021-02-23 19:28:57,176 - root - INFO - [Iter 8/80, Epoch 0] valid f1=7.6271e-01, mcc=1.2010e-01, roc_auc=6.6222e-01, accuracy=6.5000e-01, log_loss=5.9754e-01, time spent=0.170s, total_time=0.05min
2021-02-23 19:28:57,478 - root - INFO - [Iter 10/80, Epoch 0] train loss=3.7074e-01, gnorm=4.7998e+00, lr=3.1117e-05, #samples processed=96, #sample per second=203.14
2021-02-23 19:28:57,785 - root - INFO - [Iter 10/80, Epoch 0] valid f1=8.0934e-01, mcc=1.8062e-01, roc_auc=7.1261e-01, accuracy=6.9375e-01, log_loss=5.9406e-01, time spent=0.170s, total_time=0.06min
2021-02-23 19:28:58,072 - root - INFO - [Iter 12/80, Epoch 0] train loss=3.8591e-01, gnorm=6.6273e+00, lr=3.7341e-05, #samples processed=96, #sample per second=161.44
2021-02-23 19:28:58,406 - root - INFO - [Iter 12/80, Epoch 0] valid f1=8.1102e-01, mcc=2.1015e-01, roc_auc=7.3166e-01, accuracy=7.0000e-01, log_loss=5.8493e-01, time spent=0.172s, total_time=0.07min
2021-02-23 19:28:58,657 - root - INFO - [Iter 14/80, Epoch 0] train loss=3.5634e-01, gnorm=3.7480e+00, lr=4.3564e-05, #samples processed=96, #sample per second=164.20
2021-02-23 19:28:58,966 - root - INFO - [Iter 14/80, Epoch 0] valid f1=7.8302e-01, mcc=3.5810e-01, roc_auc=7.4074e-01, accuracy=7.1250e-01, log_loss=5.7794e-01, time spent=0.172s, total_time=0.07min
2021-02-23 19:28:59,208 - root - INFO - [Iter 16/80, Epoch 0] train loss=4.3559e-01, gnorm=5.5241e+00, lr=4.2244e-05, #samples processed=96, #sample per second=174.44
2021-02-23 19:28:59,549 - root - INFO - [Iter 16/80, Epoch 0] valid f1=8.0508e-01, mcc=2.8690e-01, roc_auc=7.5053e-01, accuracy=7.1250e-01, log_loss=5.8084e-01, time spent=0.175s, total_time=0.08min
2021-02-23 19:28:59,884 - root - INFO - [Iter 18/80, Epoch 0] train loss=3.8332e-01, gnorm=7.9770e+00, lr=4.0924e-05, #samples processed=96, #sample per second=141.97
2021-02-23 19:29:00,059 - root - INFO - [Iter 18/80, Epoch 0] valid f1=8.0315e-01, mcc=1.6292e-01, roc_auc=7.6211e-01, accuracy=6.8750e-01, log_loss=6.2931e-01, time spent=0.174s, total_time=0.09min
2021-02-23 19:29:00,313 - root - INFO - [Iter 20/80, Epoch 0] train loss=4.3933e-01, gnorm=1.1573e+01, lr=3.9604e-05, #samples processed=96, #sample per second=223.98
2021-02-23 19:29:00,653 - root - INFO - [Iter 20/80, Epoch 0] valid f1=8.0349e-01, mcc=3.2090e-01, roc_auc=7.6727e-01, accuracy=7.1875e-01, log_loss=5.5580e-01, time spent=0.172s, total_time=0.10min
2021-02-23 19:29:00,902 - root - INFO - [Iter 22/80, Epoch 1] train loss=2.7659e-01, gnorm=5.0155e+00, lr=3.8284e-05, #samples processed=96, #sample per second=162.94
2021-02-23 19:29:01,077 - root - INFO - [Iter 22/80, Epoch 1] valid f1=7.6098e-01, mcc=3.4208e-01, roc_auc=7.6834e-01, accuracy=6.9375e-01, log_loss=5.9289e-01, time spent=0.174s, total_time=0.11min
2021-02-23 19:29:01,366 - root - INFO - [Iter 24/80, Epoch 1] train loss=4.5134e-01, gnorm=4.2386e+00, lr=3.6963e-05, #samples processed=96, #sample per second=207.25
2021-02-23 19:29:01,708 - root - INFO - [Iter 24/80, Epoch 1] valid f1=7.8873e-01, mcc=3.6875e-01, roc_auc=7.7439e-01, accuracy=7.1875e-01, log_loss=5.6384e-01, time spent=0.171s, total_time=0.12min
2021-02-23 19:29:02,009 - root - INFO - [Iter 26/80, Epoch 1] train loss=3.9664e-01, gnorm=4.1588e+00, lr=3.5643e-05, #samples processed=96, #sample per second=149.41
2021-02-23 19:29:02,316 - root - INFO - [Iter 26/80, Epoch 1] valid f1=8.0176e-01, mcc=3.2629e-01, roc_auc=7.8579e-01, accuracy=7.1875e-01, log_loss=5.3537e-01, time spent=0.173s, total_time=0.13min
2021-02-23 19:29:02,558 - root - INFO - [Iter 28/80, Epoch 1] train loss=3.0989e-01, gnorm=3.6874e+00, lr=3.4323e-05, #samples processed=96, #sample per second=174.88
2021-02-23 19:29:02,845 - root - INFO - [Iter 28/80, Epoch 1] valid f1=8.2591e-01, mcc=3.2306e-01, roc_auc=7.9416e-01, accuracy=7.3125e-01, log_loss=5.5632e-01, time spent=0.171s, total_time=0.14min
2021-02-23 19:29:03,084 - root - INFO - [Iter 30/80, Epoch 1] train loss=3.9963e-01, gnorm=5.4321e+00, lr=3.3003e-05, #samples processed=96, #sample per second=182.57
2021-02-23 19:29:03,389 - root - INFO - [Iter 30/80, Epoch 1] valid f1=8.2927e-01, mcc=3.4292e-01, roc_auc=7.9861e-01, accuracy=7.3750e-01, log_loss=5.5059e-01, time spent=0.170s, total_time=0.15min
2021-02-23 19:29:03,620 - root - INFO - [Iter 32/80, Epoch 1] train loss=3.9360e-01, gnorm=4.1036e+00, lr=3.1683e-05, #samples processed=96, #sample per second=179.18
2021-02-23 19:29:03,910 - root - INFO - [Iter 32/80, Epoch 1] valid f1=8.2143e-01, mcc=4.0942e-01, roc_auc=7.9790e-01, accuracy=7.5000e-01, log_loss=5.1994e-01, time spent=0.171s, total_time=0.16min
2021-02-23 19:29:04,179 - root - INFO - [Iter 34/80, Epoch 1] train loss=4.0065e-01, gnorm=4.5816e+00, lr=3.0363e-05, #samples processed=96, #sample per second=171.78
2021-02-23 19:29:04,350 - root - INFO - [Iter 34/80, Epoch 1] valid f1=7.7358e-01, mcc=3.3012e-01, roc_auc=7.9825e-01, accuracy=7.0000e-01, log_loss=5.3487e-01, time spent=0.171s, total_time=0.16min
2021-02-23 19:29:04,602 - root - INFO - [Iter 36/80, Epoch 1] train loss=3.1762e-01, gnorm=3.3184e+00, lr=2.9043e-05, #samples processed=96, #sample per second=227.01
2021-02-23 19:29:04,776 - root - INFO - [Iter 36/80, Epoch 1] valid f1=8.0184e-01, mcc=3.8443e-01, roc_auc=7.9932e-01, accuracy=7.3125e-01, log_loss=5.2471e-01, time spent=0.173s, total_time=0.17min
2021-02-23 19:29:05,024 - root - INFO - [Iter 38/80, Epoch 1] train loss=3.0898e-01, gnorm=5.0123e+00, lr=2.7723e-05, #samples processed=96, #sample per second=228.00
2021-02-23 19:29:05,197 - root - INFO - [Iter 38/80, Epoch 1] valid f1=8.0519e-01, mcc=3.1569e-01, roc_auc=8.0039e-01, accuracy=7.1875e-01, log_loss=5.1552e-01, time spent=0.173s, total_time=0.18min
2021-02-23 19:29:05,445 - root - INFO - [Iter 40/80, Epoch 1] train loss=3.4842e-01, gnorm=3.9619e+00, lr=2.6402e-05, #samples processed=96, #sample per second=227.78
2021-02-23 19:29:05,618 - root - INFO - [Iter 40/80, Epoch 1] valid f1=8.0172e-01, mcc=2.9719e-01, roc_auc=8.0199e-01, accuracy=7.1250e-01, log_loss=5.1741e-01, time spent=0.172s, total_time=0.19min
2021-02-23 19:29:05,875 - root - INFO - [Iter 42/80, Epoch 2] train loss=3.4928e-01, gnorm=3.6998e+00, lr=2.5082e-05, #samples processed=96, #sample per second=223.34
2021-02-23 19:29:06,228 - root - INFO - [Iter 42/80, Epoch 2] valid f1=8.2883e-01, mcc=4.4372e-01, roc_auc=8.0395e-01, accuracy=7.6250e-01, log_loss=5.1524e-01, time spent=0.171s, total_time=0.20min
2021-02-23 19:29:06,478 - root - INFO - [Iter 44/80, Epoch 2] train loss=2.7810e-01, gnorm=3.0554e+00, lr=2.3762e-05, #samples processed=96, #sample per second=159.41
2021-02-23 19:29:06,651 - root - INFO - [Iter 44/80, Epoch 2] valid f1=8.2511e-01, mcc=4.2664e-01, roc_auc=8.0823e-01, accuracy=7.5625e-01, log_loss=5.1521e-01, time spent=0.173s, total_time=0.20min
2021-02-23 19:29:06,894 - root - INFO - [Iter 46/80, Epoch 2] train loss=2.6657e-01, gnorm=3.7362e+00, lr=2.2442e-05, #samples processed=96, #sample per second=230.58
2021-02-23 19:29:07,068 - root - INFO - [Iter 46/80, Epoch 2] valid f1=8.2028e-01, mcc=4.4171e-01, roc_auc=8.1001e-01, accuracy=7.5625e-01, log_loss=5.2239e-01, time spent=0.174s, total_time=0.21min
2021-02-23 19:29:07,324 - root - INFO - [Iter 48/80, Epoch 2] train loss=2.9092e-01, gnorm=7.5654e+00, lr=2.1122e-05, #samples processed=96, #sample per second=223.71
2021-02-23 19:29:07,637 - root - INFO - [Iter 48/80, Epoch 2] valid f1=8.2569e-01, mcc=4.5343e-01, roc_auc=8.1143e-01, accuracy=7.6250e-01, log_loss=5.2126e-01, time spent=0.173s, total_time=0.22min
2021-02-23 19:29:07,887 - root - INFO - [Iter 50/80, Epoch 2] train loss=3.3064e-01, gnorm=3.8873e+00, lr=1.9802e-05, #samples processed=96, #sample per second=170.31
2021-02-23 19:29:08,200 - root - INFO - [Iter 50/80, Epoch 2] valid f1=8.2727e-01, mcc=4.4843e-01, roc_auc=8.1392e-01, accuracy=7.6250e-01, log_loss=5.2052e-01, time spent=0.172s, total_time=0.23min
2021-02-23 19:29:08,445 - root - INFO - [Iter 52/80, Epoch 2] train loss=3.1184e-01, gnorm=4.1648e+00, lr=1.8482e-05, #samples processed=96, #sample per second=172.36
2021-02-23 19:29:08,754 - root - INFO - [Iter 52/80, Epoch 2] valid f1=8.2569e-01, mcc=4.5343e-01, roc_auc=8.1464e-01, accuracy=7.6250e-01, log_loss=5.2218e-01, time spent=0.171s, total_time=0.24min
2021-02-23 19:29:09,036 - root - INFO - [Iter 54/80, Epoch 2] train loss=2.5854e-01, gnorm=6.2130e+00, lr=1.7162e-05, #samples processed=96, #sample per second=162.37
2021-02-23 19:29:09,335 - root - INFO - [Iter 54/80, Epoch 2] valid f1=8.3105e-01, mcc=4.6537e-01, roc_auc=8.1660e-01, accuracy=7.6875e-01, log_loss=5.2243e-01, time spent=0.171s, total_time=0.25min
2021-02-23 19:29:09,590 - root - INFO - [Iter 56/80, Epoch 2] train loss=3.4897e-01, gnorm=8.9861e+00, lr=1.5841e-05, #samples processed=96, #sample per second=173.45
2021-02-23 19:29:09,760 - root - INFO - [Iter 56/80, Epoch 2] valid f1=8.2511e-01, mcc=4.2664e-01, roc_auc=8.1838e-01, accuracy=7.5625e-01, log_loss=5.2442e-01, time spent=0.170s, total_time=0.25min
2021-02-23 19:29:10,031 - root - INFO - [Iter 58/80, Epoch 2] train loss=4.1001e-01, gnorm=8.7218e+00, lr=1.4521e-05, #samples processed=96, #sample per second=217.62
2021-02-23 19:29:10,203 - root - INFO - [Iter 58/80, Epoch 2] valid f1=8.1416e-01, mcc=3.7455e-01, roc_auc=8.1731e-01, accuracy=7.3750e-01, log_loss=5.2858e-01, time spent=0.172s, total_time=0.26min
2021-02-23 19:29:10,464 - root - INFO - [Iter 60/80, Epoch 2] train loss=1.9524e-01, gnorm=6.4126e+00, lr=1.3201e-05, #samples processed=96, #sample per second=221.79
2021-02-23 19:29:10,641 - root - INFO - [Iter 60/80, Epoch 2] valid f1=8.2143e-01, mcc=4.0942e-01, roc_auc=8.1855e-01, accuracy=7.5000e-01, log_loss=5.2740e-01, time spent=0.176s, total_time=0.27min
2021-02-23 19:29:10,905 - root - INFO - [Iter 62/80, Epoch 3] train loss=2.6149e-01, gnorm=3.8227e+00, lr=1.1881e-05, #samples processed=96, #sample per second=217.50
2021-02-23 19:29:11,079 - root - INFO - [Iter 62/80, Epoch 3] valid f1=8.2569e-01, mcc=4.5343e-01, roc_auc=8.1766e-01, accuracy=7.6250e-01, log_loss=5.2213e-01, time spent=0.173s, total_time=0.28min
2021-02-23 19:29:11,326 - root - INFO - [Iter 64/80, Epoch 3] train loss=2.3328e-01, gnorm=9.4966e+00, lr=1.0561e-05, #samples processed=96, #sample per second=228.44
2021-02-23 19:29:11,500 - root - INFO - [Iter 64/80, Epoch 3] valid f1=8.2407e-01, mcc=4.5869e-01, roc_auc=8.1624e-01, accuracy=7.6250e-01, log_loss=5.2556e-01, time spent=0.174s, total_time=0.28min
2021-02-23 19:29:11,774 - root - INFO - [Iter 66/80, Epoch 3] train loss=2.2598e-01, gnorm=3.4263e+00, lr=9.2408e-06, #samples processed=96, #sample per second=214.43
2021-02-23 19:29:11,947 - root - INFO - [Iter 66/80, Epoch 3] valid f1=8.2407e-01, mcc=4.5869e-01, roc_auc=8.1588e-01, accuracy=7.6250e-01, log_loss=5.2522e-01, time spent=0.173s, total_time=0.29min
2021-02-23 19:29:12,206 - root - INFO - [Iter 68/80, Epoch 3] train loss=2.5200e-01, gnorm=6.4434e+00, lr=7.9207e-06, #samples processed=96, #sample per second=222.30
2021-02-23 19:29:12,517 - root - INFO - [Iter 68/80, Epoch 3] valid f1=8.2949e-01, mcc=4.7034e-01, roc_auc=8.1517e-01, accuracy=7.6875e-01, log_loss=5.2576e-01, time spent=0.173s, total_time=0.30min
2021-02-23 19:29:12,757 - root - INFO - [Iter 70/80, Epoch 3] train loss=2.8592e-01, gnorm=3.4654e+00, lr=6.6006e-06, #samples processed=96, #sample per second=174.10
2021-02-23 19:29:13,053 - root - INFO - [Iter 70/80, Epoch 3] valid f1=8.2949e-01, mcc=4.7034e-01, roc_auc=8.1446e-01, accuracy=7.6875e-01, log_loss=5.2593e-01, time spent=0.172s, total_time=0.31min
2021-02-23 19:29:13,311 - root - INFO - [Iter 72/80, Epoch 3] train loss=2.5687e-01, gnorm=3.5543e+00, lr=5.2805e-06, #samples processed=96, #sample per second=173.39
2021-02-23 19:29:13,659 - root - INFO - [Iter 72/80, Epoch 3] valid f1=8.2949e-01, mcc=4.7034e-01, roc_auc=8.1357e-01, accuracy=7.6875e-01, log_loss=5.2686e-01, time spent=0.176s, total_time=0.32min
2021-02-23 19:29:13,918 - root - INFO - [Iter 74/80, Epoch 3] train loss=2.6742e-01, gnorm=7.0542e+00, lr=3.9604e-06, #samples processed=96, #sample per second=158.38
2021-02-23 19:29:14,214 - root - INFO - [Iter 74/80, Epoch 3] valid f1=8.2949e-01, mcc=4.7034e-01, roc_auc=8.1339e-01, accuracy=7.6875e-01, log_loss=5.2716e-01, time spent=0.175s, total_time=0.33min
2021-02-23 19:29:14,460 - root - INFO - [Iter 76/80, Epoch 3] train loss=2.2228e-01, gnorm=5.8448e+00, lr=2.6402e-06, #samples processed=96, #sample per second=177.01
2021-02-23 19:29:14,633 - root - INFO - [Iter 76/80, Epoch 3] valid f1=8.2407e-01, mcc=4.5869e-01, roc_auc=8.1286e-01, accuracy=7.6250e-01, log_loss=5.2776e-01, time spent=0.172s, total_time=0.34min
2021-02-23 19:29:14,878 - root - INFO - [Iter 78/80, Epoch 3] train loss=2.0582e-01, gnorm=3.6147e+00, lr=1.3201e-06, #samples processed=96, #sample per second=229.69
2021-02-23 19:29:15,052 - root - INFO - [Iter 78/80, Epoch 3] valid f1=8.2407e-01, mcc=4.5869e-01, roc_auc=8.1268e-01, accuracy=7.6250e-01, log_loss=5.2767e-01, time spent=0.173s, total_time=0.34min
2021-02-23 19:29:15,323 - root - INFO - [Iter 80/80, Epoch 3] train loss=3.2737e-01, gnorm=4.7786e+00, lr=0.0000e+00, #samples processed=96, #sample per second=216.41
2021-02-23 19:29:15,497 - root - INFO - [Iter 80/80, Epoch 3] valid f1=8.2407e-01, mcc=4.5869e-01, roc_auc=8.1268e-01, accuracy=7.6250e-01, log_loss=5.2766e-01, time spent=0.174s, total_time=0.35min
dev_score = predictor_mrpc_bo.evaluate(dev_data, metrics=['acc', 'f1'])
print('Best Config = {}'.format(predictor_mrpc_bo.results['best_config']))
print('Total Time = {}s'.format(predictor_mrpc_bo.results['total_time']))
print('Accuracy = {:.2f}%'.format(dev_score['acc'] * 100))
print('F1 = {:.2f}%'.format(dev_score['f1'] * 100))
Best Config = {'search_space▁model.network.agg_net.data_dropout▁choice': 1, 'search_space▁model.network.agg_net.mid_units': 52, 'search_space▁optimization.layerwise_lr_decay': 0.8964584420622498, 'search_space▁optimization.lr': 4.356400129415584e-05, 'search_space▁optimization.warmup_portion': 0.17596939190261784}
Total Time = 57.08823013305664s
Accuracy = 76.72%
F1 = 83.70%
predictions = predictor_mrpc_bo.predict(dev_data)
prediction1 = predictor_mrpc_bo.predict({'sentence1': [sentence1], 'sentence2': [sentence2]})
prediction1_prob = predictor_mrpc_bo.predict_proba({'sentence1': [sentence1], 'sentence2': [sentence2]})
print('A = "{}"'.format(sentence1))
print('B = "{}"'.format(sentence2))
print('Prediction = "{}"'.format(prediction1[0] == 1))
print('Prob = "{}"'.format(prediction1_prob[0]))
print('')
prediction2 = predictor_mrpc_bo.predict({'sentence1': [sentence1], 'sentence2': [sentence3]})
prediction2_prob = predictor_mrpc_bo.predict_proba({'sentence1': [sentence1], 'sentence2': [sentence3]})
print('A = "{}"'.format(sentence1))
print('B = "{}"'.format(sentence3))
print('Prediction = "{}"'.format(prediction2[0] == 1))
print('Prob = "{}"'.format(prediction2_prob[0]))
A = "It is simple to solve NLP problems with AutoGluon."
B = "With AutoGluon, it is easy to solve NLP problems."
Prediction = "True"
Prob = "[0.01867648 0.98132354]"
A = "It is simple to solve NLP problems with AutoGluon."
B = "AutoGluon gives you a very bad user experience for solving NLP problems."
Prediction = "True"
Prob = "[0.3887104 0.61128956]"
Use Hyperband¶
Alternatively, we can use the Hyperband algorithm for HPO. Hyperband tries multiple hyperparameter configurations simultaneously and early-stops training under poorly performing configurations, freeing compute resources to explore new configurations. It can often identify good hyperparameter values more quickly than other search strategies.
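To build intuition for what Hyperband does under the hood, here is a minimal, self-contained sketch of the successive-halving idea it is built on: train all candidate configurations with a small budget, keep only the best fraction, and repeat with a larger budget. This is an illustration only; the function and parameter names below are not part of the AutoGluon API.

```python
# Illustrative sketch of successive halving (the core of Hyperband).
# Not the AutoGluon implementation; names here are hypothetical.
import random

def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Repeatedly evaluate all surviving configs, then keep the top 1/eta."""
    budget = min_budget
    while len(configs) > 1:
        # Score every surviving configuration at the current budget.
        scores = {cfg: evaluate(cfg, budget) for cfg in configs}
        # Keep only the best-performing 1/eta fraction.
        k = max(1, len(configs) // eta)
        configs = sorted(configs, key=scores.get, reverse=True)[:k]
        budget *= eta  # give the survivors a larger training budget
    return configs[0]

# Toy example: each float stands in for a config's true quality, and
# "training" longer (larger budget) makes the noisy score more accurate,
# so poor configs tend to be dropped after only a small budget.
random.seed(0)
best = successive_halving(
    configs=[0.2, 0.5, 0.9, 0.1, 0.7],
    evaluate=lambda cfg, b: cfg + random.gauss(0, 0.3 / b),
)
print(best)
```

In the real Hyperband algorithm, several such successive-halving brackets are run with different trade-offs between the number of configurations and the budget per configuration.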
scheduler_options = {'max_t': 40} # Maximum number of epochs for training the neural network
hyperparameters['hpo_params'] = {
'search_strategy': 'hyperband',
'scheduler_options': scheduler_options
}
predictor_mrpc_hyperband = task.fit(train_data, label='label',
hyperparameters=hyperparameters,
time_limits=60 * 2, ngpus_per_trial=1, seed=123,
output_directory='./ag_mrpc_custom_space_hyperband')
2021-02-23 19:29:17,426 - autogluon.text.text_prediction.text_prediction - INFO - All Logs will be saved to ./ag_mrpc_custom_space_hyperband/ag_text_prediction.log
2021-02-23 19:29:17,440 - autogluon.text.text_prediction.text_prediction - INFO - Train Dataset:
2021-02-23 19:29:17,441 - autogluon.text.text_prediction.text_prediction - INFO - Columns:
- Text(
name="sentence1"
#total/missing=640/0
length, min/avg/max=46/116.8921875/197
)
- Text(
name="sentence2"
#total/missing=640/0
length, min/avg/max=42/117.6984375/210
)
- Categorical(
name="label"
#total/missing=640/0
num_class (total/non_special)=2/2
categories=[0, 1]
freq=[215, 425]
)
2021-02-23 19:29:17,442 - autogluon.text.text_prediction.text_prediction - INFO - Tuning Dataset:
2021-02-23 19:29:17,443 - autogluon.text.text_prediction.text_prediction - INFO - Columns:
- Text(
name="sentence1"
#total/missing=160/0
length, min/avg/max=44/119.33125/200
)
- Text(
name="sentence2"
#total/missing=160/0
length, min/avg/max=50/118.4875/207
)
- Categorical(
name="label"
#total/missing=160/0
num_class (total/non_special)=2/2
categories=[0, 1]
freq=[53, 107]
)
2021-02-23 19:29:17,446 - autogluon.text.text_prediction.text_prediction - INFO - All Logs will be saved to ./ag_mrpc_custom_space_hyperband/main.log
(task:4) 2021-02-23 19:29:20,001 - root - INFO - All Logs will be saved to /var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/ag_mrpc_custom_space_hyperband/task4/training.log
2021-02-23 19:29:20,001 - root - INFO - learning:
early_stopping_patience: 10
log_metrics: auto
stop_metric: auto
valid_ratio: 0.15
misc:
exp_dir: /var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/ag_mrpc_custom_space_hyperband/task4
seed: 123
model:
backbone:
name: google_electra_small
network:
agg_net:
activation: tanh
agg_type: concat
data_dropout: False
dropout: 0.1
feature_proj_num_layers: -1
initializer:
bias: ['zeros']
weight: ['xavier', 'uniform', 'avg', 3.0]
mid_units: 80
norm_eps: 1e-05
normalization: layer_norm
out_proj_num_layers: 0
categorical_net:
activation: leaky
data_dropout: False
dropout: 0.1
emb_units: 32
initializer:
bias: ['zeros']
embed: ['xavier', 'gaussian', 'in', 1.0]
weight: ['xavier', 'uniform', 'avg', 3.0]
mid_units: 64
norm_eps: 1e-05
normalization: layer_norm
num_layers: 1
feature_units: -1
initializer:
bias: ['zeros']
weight: ['truncnorm', 0, 0.02]
numerical_net:
activation: leaky
data_dropout: False
dropout: 0.1
initializer:
bias: ['zeros']
weight: ['xavier', 'uniform', 'avg', 3.0]
input_centering: False
mid_units: 128
norm_eps: 1e-05
normalization: layer_norm
num_layers: 1
text_net:
pool_type: cls
use_segment_id: True
preprocess:
max_length: 128
merge_text: True
optimization:
batch_size: 32
begin_lr: 0.0
final_lr: 0.0
layerwise_lr_decay: 0.9
log_frequency: 0.1
lr: 5.5e-05
lr_scheduler: triangular
max_grad_norm: 1.0
model_average: 5
num_train_epochs: 4
optimizer: adamw
optimizer_params: [('beta1', 0.9), ('beta2', 0.999), ('epsilon', 1e-06), ('correct_bias', False)]
per_device_batch_size: 16
val_batch_size_mult: 2
valid_frequency: 0.1
warmup_portion: 0.15
wd: 0.01
version: 1
2021-02-23 19:29:20,150 - root - INFO - Process training set...
2021-02-23 19:29:20,419 - root - INFO - Done!
2021-02-23 19:29:20,419 - root - INFO - Process dev set...
2021-02-23 19:29:20,483 - root - INFO - Done!
2021-02-23 19:29:25,695 - root - INFO - #Total Params/Fixed Params=13483522/0
2021-02-23 19:29:25,711 - root - INFO - Using gradient accumulation. Global batch size = 32
2021-02-23 19:29:26,637 - root - INFO - [Iter 2/80, Epoch 0] train loss=4.3547e-01, gnorm=4.9765e+00, lr=9.1667e-06, #samples processed=96, #sample per second=107.95
2021-02-23 19:29:26,922 - root - INFO - [Iter 2/80, Epoch 0] valid f1=8.0150e-01, mcc=0.0000e+00, roc_auc=4.7858e-01, accuracy=6.6875e-01, log_loss=6.8746e-01, time spent=0.203s, total_time=0.02min
2021-02-23 19:29:27,267 - root - INFO - [Iter 4/80, Epoch 0] train loss=5.1031e-01, gnorm=6.0039e+00, lr=1.8333e-05, #samples processed=96, #sample per second=152.41
2021-02-23 19:29:27,451 - root - INFO - [Iter 4/80, Epoch 0] valid f1=7.8927e-01, mcc=8.7370e-04, roc_auc=5.7644e-01, accuracy=6.5625e-01, log_loss=6.2832e-01, time spent=0.183s, total_time=0.03min
2021-02-23 19:29:27,798 - root - INFO - [Iter 6/80, Epoch 0] train loss=4.2857e-01, gnorm=6.2404e+00, lr=2.7500e-05, #samples processed=96, #sample per second=180.93
2021-02-23 19:29:27,981 - root - INFO - [Iter 6/80, Epoch 0] valid f1=7.8049e-01, mcc=1.1970e-01, roc_auc=7.1240e-01, accuracy=6.6250e-01, log_loss=5.8070e-01, time spent=0.182s, total_time=0.04min
2021-02-23 19:29:28,253 - root - INFO - [Iter 8/80, Epoch 0] train loss=3.8460e-01, gnorm=6.2930e+00, lr=3.6667e-05, #samples processed=96, #sample per second=211.17
2021-02-23 19:29:28,583 - root - INFO - [Iter 8/80, Epoch 0] valid f1=8.0755e-01, mcc=1.5986e-01, roc_auc=7.8328e-01, accuracy=6.8125e-01, log_loss=6.0428e-01, time spent=0.183s, total_time=0.05min
2021-02-23 19:29:28,893 - root - INFO - [Iter 10/80, Epoch 0] train loss=4.2980e-01, gnorm=3.6170e+00, lr=4.5833e-05, #samples processed=96, #sample per second=149.92
2021-02-23 19:29:29,213 - root - INFO - [Iter 10/80, Epoch 0] valid f1=8.0755e-01, mcc=1.5986e-01, roc_auc=8.1132e-01, accuracy=6.8125e-01, log_loss=6.4549e-01, time spent=0.181s, total_time=0.06min
2021-02-23 19:29:29,465 - root - INFO - [Iter 12/80, Epoch 0] train loss=3.6511e-01, gnorm=2.9147e+00, lr=5.5000e-05, #samples processed=96, #sample per second=167.84
2021-02-23 19:29:29,769 - root - INFO - [Iter 12/80, Epoch 0] valid f1=8.2540e-01, mcc=3.2033e-01, roc_auc=8.1714e-01, accuracy=7.2500e-01, log_loss=5.1673e-01, time spent=0.182s, total_time=0.07min
2021-02-23 19:29:30,098 - root - INFO - [Iter 14/80, Epoch 0] train loss=3.8532e-01, gnorm=4.6245e+00, lr=5.3382e-05, #samples processed=96, #sample per second=151.86
2021-02-23 19:29:30,453 - root - INFO - [Iter 14/80, Epoch 0] valid f1=8.3333e-01, mcc=3.9200e-01, roc_auc=8.1855e-01, accuracy=7.5000e-01, log_loss=4.9029e-01, time spent=0.180s, total_time=0.08min
2021-02-23 19:29:30,744 - root - INFO - [Iter 16/80, Epoch 0] train loss=4.0137e-01, gnorm=7.5817e+00, lr=5.1765e-05, #samples processed=96, #sample per second=148.66
2021-02-23 19:29:31,039 - root - INFO - [Iter 16/80, Epoch 0] valid f1=8.3843e-01, mcc=4.4974e-01, roc_auc=8.2014e-01, accuracy=7.6875e-01, log_loss=4.7919e-01, time spent=0.178s, total_time=0.09min
2021-02-23 19:29:31,374 - root - INFO - [Iter 18/80, Epoch 0] train loss=4.1220e-01, gnorm=4.6400e+00, lr=5.0147e-05, #samples processed=96, #sample per second=152.25
2021-02-23 19:29:31,559 - root - INFO - [Iter 18/80, Epoch 0] valid f1=8.2969e-01, mcc=4.1853e-01, roc_auc=8.2155e-01, accuracy=7.5625e-01, log_loss=4.8127e-01, time spent=0.184s, total_time=0.10min
2021-02-23 19:29:31,849 - root - INFO - [Iter 20/80, Epoch 0] train loss=3.9729e-01, gnorm=3.3208e+00, lr=4.8529e-05, #samples processed=96, #sample per second=202.43
2021-02-23 19:29:32,028 - root - INFO - [Iter 20/80, Epoch 0] valid f1=7.5789e-01, mcc=4.3835e-01, roc_auc=8.2349e-01, accuracy=7.1250e-01, log_loss=5.3774e-01, time spent=0.179s, total_time=0.10min
2021-02-23 19:29:32,289 - root - INFO - [Iter 22/80, Epoch 1] train loss=3.1556e-01, gnorm=8.4473e+00, lr=4.6912e-05, #samples processed=96, #sample per second=218.21
2021-02-23 19:29:32,471 - root - INFO - [Iter 22/80, Epoch 1] valid f1=8.2008e-01, mcc=3.3987e-01, roc_auc=8.2084e-01, accuracy=7.3125e-01, log_loss=4.9304e-01, time spent=0.182s, total_time=0.11min
2021-02-23 19:29:32,728 - root - INFO - [Iter 24/80, Epoch 1] train loss=3.7162e-01, gnorm=2.9947e+00, lr=4.5294e-05, #samples processed=96, #sample per second=218.54
2021-02-23 19:29:32,912 - root - INFO - [Iter 24/80, Epoch 1] valid f1=8.2449e-01, mcc=3.3596e-01, roc_auc=8.2560e-01, accuracy=7.3125e-01, log_loss=4.9650e-01, time spent=0.184s, total_time=0.12min
2021-02-23 19:29:33,155 - root - INFO - [Iter 26/80, Epoch 1] train loss=3.4470e-01, gnorm=5.4141e+00, lr=4.3676e-05, #samples processed=96, #sample per second=225.33
2021-02-23 19:29:33,334 - root - INFO - [Iter 26/80, Epoch 1] valid f1=8.2819e-01, mcc=4.2167e-01, roc_auc=8.2878e-01, accuracy=7.5625e-01, log_loss=4.7930e-01, time spent=0.179s, total_time=0.13min
2021-02-23 19:29:33,591 - root - INFO - [Iter 28/80, Epoch 1] train loss=3.1994e-01, gnorm=5.6954e+00, lr=4.2059e-05, #samples processed=96, #sample per second=219.97
2021-02-23 19:29:33,774 - root - INFO - [Iter 28/80, Epoch 1] valid f1=8.2301e-01, mcc=4.0817e-01, roc_auc=8.4006e-01, accuracy=7.5000e-01, log_loss=4.6406e-01, time spent=0.183s, total_time=0.13min
2021-02-23 19:29:34,014 - root - INFO - [Iter 30/80, Epoch 1] train loss=4.1321e-01, gnorm=3.8550e+00, lr=4.0441e-05, #samples processed=96, #sample per second=227.05
2021-02-23 19:29:34,196 - root - INFO - [Iter 30/80, Epoch 1] valid f1=8.4462e-01, mcc=4.2936e-01, roc_auc=8.4817e-01, accuracy=7.5625e-01, log_loss=5.2434e-01, time spent=0.181s, total_time=0.14min
2021-02-23 19:29:34,439 - root - INFO - [Iter 32/80, Epoch 1] train loss=4.4192e-01, gnorm=1.0692e+01, lr=3.8824e-05, #samples processed=96, #sample per second=226.02
2021-02-23 19:29:34,803 - root - INFO - [Iter 32/80, Epoch 1] valid f1=8.4898e-01, mcc=4.5164e-01, roc_auc=8.4923e-01, accuracy=7.6875e-01, log_loss=4.9600e-01, time spent=0.180s, total_time=0.15min
2021-02-23 19:29:35,052 - root - INFO - [Iter 34/80, Epoch 1] train loss=3.1296e-01, gnorm=2.3973e+00, lr=3.7206e-05, #samples processed=96, #sample per second=156.74
2021-02-23 19:29:35,234 - root - INFO - [Iter 34/80, Epoch 1] valid f1=8.2819e-01, mcc=4.2167e-01, roc_auc=8.4359e-01, accuracy=7.5625e-01, log_loss=4.5439e-01, time spent=0.182s, total_time=0.16min
2021-02-23 19:29:35,556 - root - INFO - [Iter 36/80, Epoch 1] train loss=3.3151e-01, gnorm=2.5929e+00, lr=3.5588e-05, #samples processed=96, #sample per second=190.49
2021-02-23 19:29:35,736 - root - INFO - [Iter 36/80, Epoch 1] valid f1=8.3186e-01, mcc=4.3858e-01, roc_auc=8.4218e-01, accuracy=7.6250e-01, log_loss=4.5604e-01, time spent=0.180s, total_time=0.17min
2021-02-23 19:29:35,987 - root - INFO - [Iter 38/80, Epoch 1] train loss=3.5329e-01, gnorm=2.8889e+00, lr=3.3971e-05, #samples processed=96, #sample per second=222.99
2021-02-23 19:29:36,296 - root - INFO - [Iter 38/80, Epoch 1] valid f1=8.4685e-01, mcc=5.0486e-01, roc_auc=8.4183e-01, accuracy=7.8750e-01, log_loss=4.5638e-01, time spent=0.183s, total_time=0.18min
2021-02-23 19:29:36,542 - root - INFO - [Iter 40/80, Epoch 1] train loss=3.5500e-01, gnorm=3.0678e+00, lr=3.2353e-05, #samples processed=96, #sample per second=172.84
2021-02-23 19:29:36,723 - root - INFO - [Iter 40/80, Epoch 1] valid f1=8.1731e-01, mcc=4.8046e-01, roc_auc=8.3953e-01, accuracy=7.6250e-01, log_loss=4.7714e-01, time spent=0.181s, total_time=0.18min
2021-02-23 19:29:36,978 - root - INFO - [Iter 42/80, Epoch 2] train loss=2.9320e-01, gnorm=3.2170e+00, lr=3.0735e-05, #samples processed=96, #sample per second=220.38
2021-02-23 19:29:37,158 - root - INFO - [Iter 42/80, Epoch 2] valid f1=8.3178e-01, mcc=4.9215e-01, roc_auc=8.4165e-01, accuracy=7.7500e-01, log_loss=4.6688e-01, time spent=0.180s, total_time=0.19min
2021-02-23 19:29:37,412 - root - INFO - [Iter 44/80, Epoch 2] train loss=2.3462e-01, gnorm=2.9665e+00, lr=2.9118e-05, #samples processed=96, #sample per second=221.32
2021-02-23 19:29:37,595 - root - INFO - [Iter 44/80, Epoch 2] valid f1=8.4018e-01, mcc=4.9551e-01, roc_auc=8.4200e-01, accuracy=7.8125e-01, log_loss=4.5733e-01, time spent=0.183s, total_time=0.20min
2021-02-23 19:29:37,847 - root - INFO - [Iter 46/80, Epoch 2] train loss=2.8292e-01, gnorm=8.6002e+00, lr=2.7500e-05, #samples processed=96, #sample per second=220.76
2021-02-23 19:29:38,026 - root - INFO - [Iter 46/80, Epoch 2] valid f1=8.4348e-01, mcc=4.6435e-01, roc_auc=8.4447e-01, accuracy=7.7500e-01, log_loss=4.6617e-01, time spent=0.179s, total_time=0.20min
2021-02-23 19:29:38,266 - root - INFO - [Iter 48/80, Epoch 2] train loss=3.8576e-01, gnorm=1.5435e+01, lr=2.5882e-05, #samples processed=96, #sample per second=229.38
2021-02-23 19:29:38,446 - root - INFO - [Iter 48/80, Epoch 2] valid f1=8.3486e-01, mcc=4.8309e-01, roc_auc=8.4482e-01, accuracy=7.7500e-01, log_loss=4.5216e-01, time spent=0.180s, total_time=0.21min
2021-02-23 19:29:38,705 - root - INFO - [Iter 50/80, Epoch 2] train loss=3.0190e-01, gnorm=6.3876e+00, lr=2.4265e-05, #samples processed=96, #sample per second=218.75
2021-02-23 19:29:38,888 - root - INFO - [Iter 50/80, Epoch 2] valid f1=8.3412e-01, mcc=5.1366e-01, roc_auc=8.4430e-01, accuracy=7.8125e-01, log_loss=4.5926e-01, time spent=0.183s, total_time=0.22min
2021-02-23 19:29:39,143 - root - INFO - [Iter 52/80, Epoch 2] train loss=3.5359e-01, gnorm=8.5257e+00, lr=2.2647e-05, #samples processed=96, #sample per second=218.93
2021-02-23 19:29:39,324 - root - INFO - [Iter 52/80, Epoch 2] valid f1=8.4018e-01, mcc=4.9551e-01, roc_auc=8.4659e-01, accuracy=7.8125e-01, log_loss=4.4922e-01, time spent=0.181s, total_time=0.23min
2021-02-23 19:29:39,562 - root - INFO - [Iter 54/80, Epoch 2] train loss=2.8777e-01, gnorm=3.6168e+00, lr=2.1029e-05, #samples processed=96, #sample per second=229.22
2021-02-23 19:29:39,743 - root - INFO - [Iter 54/80, Epoch 2] valid f1=8.4255e-01, mcc=4.4485e-01, roc_auc=8.4959e-01, accuracy=7.6875e-01, log_loss=4.7122e-01, time spent=0.181s, total_time=0.23min
2021-02-23 19:29:40,006 - root - INFO - [Iter 56/80, Epoch 2] train loss=4.2096e-01, gnorm=9.4590e+00, lr=1.9412e-05, #samples processed=96, #sample per second=216.45
2021-02-23 19:29:40,186 - root - INFO - [Iter 56/80, Epoch 2] valid f1=8.3471e-01, mcc=3.9201e-01, roc_auc=8.5241e-01, accuracy=7.5000e-01, log_loss=4.8570e-01, time spent=0.180s, total_time=0.24min
2021-02-23 19:29:40,451 - root - INFO - [Iter 58/80, Epoch 2] train loss=2.8379e-01, gnorm=2.8715e+00, lr=1.7794e-05, #samples processed=96, #sample per second=215.74
2021-02-23 19:29:40,634 - root - INFO - [Iter 58/80, Epoch 2] valid f1=8.4874e-01, mcc=4.6170e-01, roc_auc=8.5329e-01, accuracy=7.7500e-01, log_loss=4.6999e-01, time spent=0.183s, total_time=0.25min
2021-02-23 19:29:40,638 - root - INFO - Early stopping patience reached!
(task:5) 2021-02-23 19:29:43,358 - root - INFO - All Logs will be saved to /var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/ag_mrpc_custom_space_hyperband/task5/training.log
2021-02-23 19:29:43,358 - root - INFO - learning:
early_stopping_patience: 10
log_metrics: auto
stop_metric: auto
valid_ratio: 0.15
misc:
exp_dir: /var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/ag_mrpc_custom_space_hyperband/task5
seed: 123
model:
backbone:
name: google_electra_small
network:
agg_net:
activation: tanh
agg_type: concat
data_dropout: False
dropout: 0.1
feature_proj_num_layers: -1
initializer:
bias: ['zeros']
weight: ['xavier', 'uniform', 'avg', 3.0]
mid_units: 93
norm_eps: 1e-05
normalization: layer_norm
out_proj_num_layers: 0
categorical_net:
activation: leaky
data_dropout: False
dropout: 0.1
emb_units: 32
initializer:
bias: ['zeros']
embed: ['xavier', 'gaussian', 'in', 1.0]
weight: ['xavier', 'uniform', 'avg', 3.0]
mid_units: 64
norm_eps: 1e-05
normalization: layer_norm
num_layers: 1
feature_units: -1
initializer:
bias: ['zeros']
weight: ['truncnorm', 0, 0.02]
numerical_net:
activation: leaky
data_dropout: False
dropout: 0.1
initializer:
bias: ['zeros']
weight: ['xavier', 'uniform', 'avg', 3.0]
input_centering: False
mid_units: 128
norm_eps: 1e-05
normalization: layer_norm
num_layers: 1
text_net:
pool_type: cls
use_segment_id: True
preprocess:
max_length: 128
merge_text: True
optimization:
batch_size: 32
begin_lr: 0.0
final_lr: 0.0
layerwise_lr_decay: 0.9705516632779506
log_frequency: 0.1
lr: 7.967709362655271e-05
lr_scheduler: triangular
max_grad_norm: 1.0
model_average: 5
num_train_epochs: 4
optimizer: adamw
optimizer_params: [('beta1', 0.9), ('beta2', 0.999), ('epsilon', 1e-06), ('correct_bias', False)]
per_device_batch_size: 16
val_batch_size_mult: 2
valid_frequency: 0.1
warmup_portion: 0.19737539140923366
wd: 0.01
version: 1
2021-02-23 19:29:43,516 - root - INFO - Process training set...
2021-02-23 19:29:43,776 - root - INFO - Done!
2021-02-23 19:29:43,776 - root - INFO - Process dev set...
2021-02-23 19:29:43,839 - root - INFO - Done!
2021-02-23 19:29:49,098 - root - INFO - #Total Params/Fixed Params=13483522/0
2021-02-23 19:29:49,125 - root - INFO - Using gradient accumulation. Global batch size = 32
2021-02-23 19:29:50,095 - root - INFO - [Iter 2/80, Epoch 0] train loss=4.3567e-01, gnorm=5.0479e+00, lr=1.0624e-05, #samples processed=96, #sample per second=103.90
2021-02-23 19:29:50,370 - root - INFO - [Iter 2/80, Epoch 0] valid f1=8.0150e-01, mcc=0.0000e+00, roc_auc=4.9162e-01, accuracy=6.6875e-01, log_loss=6.7458e-01, time spent=0.191s, total_time=0.02min
2021-02-23 19:29:50,715 - root - INFO - [Iter 4/80, Epoch 0] train loss=5.0404e-01, gnorm=5.5539e+00, lr=2.1247e-05, #samples processed=96, #sample per second=154.82
2021-02-23 19:29:50,897 - root - INFO - [Iter 4/80, Epoch 0] valid f1=7.8906e-01, mcc=7.1177e-02, roc_auc=6.3781e-01, accuracy=6.6250e-01, log_loss=6.0975e-01, time spent=0.182s, total_time=0.03min
2021-02-23 19:29:51,231 - root - INFO - [Iter 6/80, Epoch 0] train loss=4.1086e-01, gnorm=6.1494e+00, lr=3.1871e-05, #samples processed=96, #sample per second=186.14
2021-02-23 19:29:51,532 - root - INFO - [Iter 6/80, Epoch 0] valid f1=8.0934e-01, mcc=2.0229e-01, roc_auc=7.5965e-01, accuracy=6.9375e-01, log_loss=5.4910e-01, time spent=0.178s, total_time=0.04min
2021-02-23 19:29:51,811 - root - INFO - [Iter 8/80, Epoch 0] train loss=3.5924e-01, gnorm=3.7307e+00, lr=4.2494e-05, #samples processed=96, #sample per second=165.39
2021-02-23 19:29:51,992 - root - INFO - [Iter 8/80, Epoch 0] valid f1=8.0755e-01, mcc=1.5986e-01, roc_auc=8.0797e-01, accuracy=6.8125e-01, log_loss=6.5244e-01, time spent=0.180s, total_time=0.05min
2021-02-23 19:29:52,321 - root - INFO - [Iter 10/80, Epoch 0] train loss=4.5007e-01, gnorm=3.6055e+00, lr=5.3118e-05, #samples processed=96, #sample per second=188.26
2021-02-23 19:29:52,680 - root - INFO - [Iter 10/80, Epoch 0] valid f1=8.1569e-01, mcc=2.5334e-01, roc_auc=8.2208e-01, accuracy=7.0625e-01, log_loss=5.5606e-01, time spent=0.179s, total_time=0.06min
2021-02-23 19:29:52,922 - root - INFO - [Iter 12/80, Epoch 0] train loss=3.8468e-01, gnorm=8.9216e+00, lr=6.3742e-05, #samples processed=96, #sample per second=159.76
2021-02-23 19:29:53,222 - root - INFO - [Iter 12/80, Epoch 0] valid f1=8.2727e-01, mcc=4.4989e-01, roc_auc=8.2437e-01, accuracy=7.6250e-01, log_loss=4.7454e-01, time spent=0.181s, total_time=0.07min
2021-02-23 19:29:53,545 - root - INFO - [Iter 14/80, Epoch 0] train loss=3.6837e-01, gnorm=4.4309e+00, lr=7.4365e-05, #samples processed=96, #sample per second=154.13
2021-02-23 19:29:53,724 - root - INFO - [Iter 14/80, Epoch 0] valid f1=8.1061e-01, mcc=1.9641e-01, roc_auc=8.2455e-01, accuracy=6.8750e-01, log_loss=6.7877e-01, time spent=0.178s, total_time=0.08min
2021-02-23 19:29:54,007 - root - INFO - [Iter 16/80, Epoch 0] train loss=4.1462e-01, gnorm=2.7626e+00, lr=7.8451e-05, #samples processed=96, #sample per second=208.23
2021-02-23 19:29:54,187 - root - INFO - [Iter 16/80, Epoch 0] valid f1=8.1250e-01, mcc=2.2862e-01, roc_auc=8.2490e-01, accuracy=7.0000e-01, log_loss=5.3734e-01, time spent=0.180s, total_time=0.08min
2021-02-23 19:29:54,511 - root - INFO - [Iter 18/80, Epoch 0] train loss=3.9732e-01, gnorm=5.3100e+00, lr=7.6000e-05, #samples processed=96, #sample per second=190.30
2021-02-23 19:29:54,692 - root - INFO - [Iter 18/80, Epoch 0] valid f1=7.5393e-01, mcc=4.2081e-01, roc_auc=7.9968e-01, accuracy=7.0625e-01, log_loss=5.7128e-01, time spent=0.181s, total_time=0.09min
2021-02-23 19:29:54,967 - root - INFO - [Iter 20/80, Epoch 0] train loss=4.0383e-01, gnorm=3.3730e+00, lr=7.3548e-05, #samples processed=96, #sample per second=210.58
2021-02-23 19:29:55,150 - root - INFO - [Iter 20/80, Epoch 0] valid f1=8.1452e-01, mcc=2.7529e-01, roc_auc=7.9580e-01, accuracy=7.1250e-01, log_loss=5.2285e-01, time spent=0.182s, total_time=0.10min
2021-02-23 19:29:55,410 - root - INFO - [Iter 22/80, Epoch 1] train loss=3.1305e-01, gnorm=6.7776e+00, lr=7.1096e-05, #samples processed=96, #sample per second=216.90
2021-02-23 19:29:55,592 - root - INFO - [Iter 22/80, Epoch 1] valid f1=8.2906e-01, mcc=3.9609e-01, roc_auc=8.2120e-01, accuracy=7.5000e-01, log_loss=4.9324e-01, time spent=0.182s, total_time=0.11min
2021-02-23 19:29:55,844 - root - INFO - [Iter 24/80, Epoch 1] train loss=3.6631e-01, gnorm=3.1125e+00, lr=6.8645e-05, #samples processed=96, #sample per second=221.26
2021-02-23 19:29:56,025 - root - INFO - [Iter 24/80, Epoch 1] valid f1=8.2927e-01, mcc=3.5565e-01, roc_auc=8.4553e-01, accuracy=7.3750e-01, log_loss=4.9195e-01, time spent=0.181s, total_time=0.11min
2021-02-23 19:29:56,278 - root - INFO - [Iter 26/80, Epoch 1] train loss=3.5676e-01, gnorm=4.7436e+00, lr=6.6193e-05, #samples processed=96, #sample per second=221.47
2021-02-23 19:29:56,621 - root - INFO - [Iter 26/80, Epoch 1] valid f1=8.6555e-01, mcc=5.3064e-01, roc_auc=8.4782e-01, accuracy=8.0000e-01, log_loss=4.7292e-01, time spent=0.181s, total_time=0.12min
2021-02-23 19:29:56,880 - root - INFO - [Iter 28/80, Epoch 1] train loss=2.8513e-01, gnorm=4.4106e+00, lr=6.3742e-05, #samples processed=96, #sample per second=159.54
2021-02-23 19:29:57,062 - root - INFO - [Iter 28/80, Epoch 1] valid f1=8.5470e-01, mcc=4.9455e-01, roc_auc=8.4482e-01, accuracy=7.8750e-01, log_loss=4.6802e-01, time spent=0.183s, total_time=0.13min
2021-02-23 19:29:57,300 - root - INFO - [Iter 30/80, Epoch 1] train loss=4.1381e-01, gnorm=6.2963e+00, lr=6.1290e-05, #samples processed=96, #sample per second=228.31
2021-02-23 19:29:57,479 - root - INFO - [Iter 30/80, Epoch 1] valid f1=8.4337e-01, mcc=4.2183e-01, roc_auc=8.4588e-01, accuracy=7.5625e-01, log_loss=5.6043e-01, time spent=0.179s, total_time=0.14min
2021-02-23 19:29:57,730 - root - INFO - [Iter 32/80, Epoch 1] train loss=4.6173e-01, gnorm=5.6386e+00, lr=5.8838e-05, #samples processed=96, #sample per second=223.30
2021-02-23 19:29:57,911 - root - INFO - [Iter 32/80, Epoch 1] valid f1=7.0000e-01, mcc=3.7808e-01, roc_auc=8.3759e-01, accuracy=6.6250e-01, log_loss=5.6964e-01, time spent=0.180s, total_time=0.15min
2021-02-23 19:29:58,165 - root - INFO - [Iter 34/80, Epoch 1] train loss=5.4206e-01, gnorm=1.6676e+01, lr=5.6387e-05, #samples processed=96, #sample per second=220.76
2021-02-23 19:29:58,349 - root - INFO - [Iter 34/80, Epoch 1] valid f1=6.8966e-01, mcc=4.0896e-01, roc_auc=8.4500e-01, accuracy=6.6250e-01, log_loss=5.7221e-01, time spent=0.184s, total_time=0.15min
2021-02-23 19:29:58,658 - root - INFO - [Iter 36/80, Epoch 1] train loss=4.5525e-01, gnorm=5.6364e+00, lr=5.3935e-05, #samples processed=96, #sample per second=194.85
2021-02-23 19:29:58,839 - root - INFO - [Iter 36/80, Epoch 1] valid f1=8.0608e-01, mcc=1.4247e-01, roc_auc=8.4835e-01, accuracy=6.8125e-01, log_loss=5.6085e-01, time spent=0.180s, total_time=0.16min
2021-02-23 19:29:59,083 - root - INFO - [Iter 38/80, Epoch 1] train loss=4.7857e-01, gnorm=8.8923e+00, lr=5.1484e-05, #samples processed=96, #sample per second=225.86
2021-02-23 19:29:59,263 - root - INFO - [Iter 38/80, Epoch 1] valid f1=8.0150e-01, mcc=0.0000e+00, roc_auc=8.3812e-01, accuracy=6.6875e-01, log_loss=7.0589e-01, time spent=0.179s, total_time=0.17min
2021-02-23 19:29:59,512 - root - INFO - [Iter 40/80, Epoch 1] train loss=4.9274e-01, gnorm=7.2955e+00, lr=4.9032e-05, #samples processed=96, #sample per second=223.84
2021-02-23 19:29:59,692 - root - INFO - [Iter 40/80, Epoch 1] valid f1=8.4898e-01, mcc=4.5164e-01, roc_auc=8.4959e-01, accuracy=7.6875e-01, log_loss=4.8941e-01, time spent=0.179s, total_time=0.18min
2021-02-23 19:29:59,948 - root - INFO - [Iter 42/80, Epoch 2] train loss=3.2521e-01, gnorm=3.2260e+00, lr=4.6580e-05, #samples processed=96, #sample per second=220.34
2021-02-23 19:30:00,130 - root - INFO - [Iter 42/80, Epoch 2] valid f1=6.0377e-01, mcc=3.7495e-01, roc_auc=8.2613e-01, accuracy=6.0625e-01, log_loss=6.2655e-01, time spent=0.181s, total_time=0.18min
2021-02-23 19:30:00,380 - root - INFO - [Iter 44/80, Epoch 2] train loss=4.0624e-01, gnorm=1.3343e+01, lr=4.4129e-05, #samples processed=96, #sample per second=222.42
2021-02-23 19:30:00,561 - root - INFO - [Iter 44/80, Epoch 2] valid f1=6.7066e-01, mcc=4.3544e-01, roc_auc=8.3089e-01, accuracy=6.5625e-01, log_loss=6.0322e-01, time spent=0.180s, total_time=0.19min
2021-02-23 19:30:00,808 - root - INFO - [Iter 46/80, Epoch 2] train loss=3.3165e-01, gnorm=3.4014e+00, lr=4.1677e-05, #samples processed=96, #sample per second=224.27
2021-02-23 19:30:00,988 - root - INFO - [Iter 46/80, Epoch 2] valid f1=8.3333e-01, mcc=5.4715e-01, roc_auc=8.3583e-01, accuracy=7.8750e-01, log_loss=5.0189e-01, time spent=0.179s, total_time=0.20min
2021-02-23 19:30:00,990 - root - INFO - Early stopping patience reached!
2021-02-23 19:30:30,995 - autogluon.text.text_prediction.text_prediction - INFO - Results=
2021-02-23 19:30:30,999 - autogluon.text.text_prediction.text_prediction - INFO - Best_config={'search_space▁model.network.agg_net.data_dropout▁choice': 0, 'search_space▁model.network.agg_net.mid_units': 93, 'search_space▁optimization.layerwise_lr_decay': 0.9705516632779506, 'search_space▁optimization.lr': 7.967709362655271e-05, 'search_space▁optimization.warmup_portion': 0.19737539140923366}
(task:6) 2021-02-23 19:30:04,250 - root - INFO - All Logs will be saved to /var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/ag_mrpc_custom_space_hyperband/task6/training.log
2021-02-23 19:30:04,250 - root - INFO - learning:
early_stopping_patience: 10
log_metrics: auto
stop_metric: auto
valid_ratio: 0.15
misc:
exp_dir: /var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/ag_mrpc_custom_space_hyperband/task6
seed: 123
model:
backbone:
name: google_electra_small
network:
agg_net:
activation: tanh
agg_type: concat
data_dropout: True
dropout: 0.1
feature_proj_num_layers: -1
initializer:
bias: ['zeros']
weight: ['xavier', 'uniform', 'avg', 3.0]
mid_units: 98
norm_eps: 1e-05
normalization: layer_norm
out_proj_num_layers: 0
categorical_net:
activation: leaky
data_dropout: False
dropout: 0.1
emb_units: 32
initializer:
bias: ['zeros']
embed: ['xavier', 'gaussian', 'in', 1.0]
weight: ['xavier', 'uniform', 'avg', 3.0]
mid_units: 64
norm_eps: 1e-05
normalization: layer_norm
num_layers: 1
feature_units: -1
initializer:
bias: ['zeros']
weight: ['truncnorm', 0, 0.02]
numerical_net:
activation: leaky
data_dropout: False
dropout: 0.1
initializer:
bias: ['zeros']
weight: ['xavier', 'uniform', 'avg', 3.0]
input_centering: False
mid_units: 128
norm_eps: 1e-05
normalization: layer_norm
num_layers: 1
text_net:
pool_type: cls
use_segment_id: True
preprocess:
max_length: 128
merge_text: True
optimization:
batch_size: 32
begin_lr: 0.0
final_lr: 0.0
layerwise_lr_decay: 0.912355300099337
log_frequency: 0.1
lr: 7.14729072876085e-05
lr_scheduler: triangular
max_grad_norm: 1.0
model_average: 5
num_train_epochs: 4
optimizer: adamw
optimizer_params: [('beta1', 0.9), ('beta2', 0.999), ('epsilon', 1e-06), ('correct_bias', False)]
per_device_batch_size: 16
val_batch_size_mult: 2
valid_frequency: 0.1
warmup_portion: 0.19366799436501417
wd: 0.01
version: 1
2021-02-23 19:30:04,396 - root - INFO - Process training set...
2021-02-23 19:30:04,670 - root - INFO - Done!
2021-02-23 19:30:04,670 - root - INFO - Process dev set...
2021-02-23 19:30:04,733 - root - INFO - Done!
2021-02-23 19:30:09,994 - root - INFO - #Total Params/Fixed Params=13483522/0
2021-02-23 19:30:10,011 - root - INFO - Using gradient accumulation. Global batch size = 32
2021-02-23 19:30:10,972 - root - INFO - [Iter 2/80, Epoch 0] train loss=4.3549e-01, gnorm=4.9864e+00, lr=9.5297e-06, #samples processed=96, #sample per second=103.96
2021-02-23 19:30:11,256 - root - INFO - [Iter 2/80, Epoch 0] valid f1=8.0150e-01, mcc=0.0000e+00, roc_auc=4.8069e-01, accuracy=6.6875e-01, log_loss=6.8502e-01, time spent=0.202s, total_time=0.02min
2021-02-23 19:30:11,611 - root - INFO - [Iter 4/80, Epoch 0] train loss=5.0901e-01, gnorm=5.9010e+00, lr=1.9059e-05, #samples processed=96, #sample per second=150.25
2021-02-23 19:30:11,792 - root - INFO - [Iter 4/80, Epoch 0] valid f1=7.8462e-01, mcc=-2.0694e-02, roc_auc=5.8596e-01, accuracy=6.5000e-01, log_loss=6.2470e-01, time spent=0.180s, total_time=0.03min
2021-02-23 19:30:12,177 - root - INFO - [Iter 6/80, Epoch 0] train loss=4.2576e-01, gnorm=6.3123e+00, lr=2.8589e-05, #samples processed=96, #sample per second=169.65
2021-02-23 19:30:12,511 - root - INFO - [Iter 6/80, Epoch 0] valid f1=7.9032e-01, mcc=1.5214e-01, roc_auc=7.2104e-01, accuracy=6.7500e-01, log_loss=5.7363e-01, time spent=0.182s, total_time=0.04min
2021-02-23 19:30:12,791 - root - INFO - [Iter 8/80, Epoch 0] train loss=3.7816e-01, gnorm=5.6389e+00, lr=3.8119e-05, #samples processed=96, #sample per second=156.35
2021-02-23 19:30:13,097 - root - INFO - [Iter 8/80, Epoch 0] valid f1=8.0755e-01, mcc=1.5986e-01, roc_auc=7.9122e-01, accuracy=6.8125e-01, log_loss=6.2159e-01, time spent=0.179s, total_time=0.05min
2021-02-23 19:30:13,411 - root - INFO - [Iter 10/80, Epoch 0] train loss=4.3765e-01, gnorm=3.6367e+00, lr=4.7649e-05, #samples processed=96, #sample per second=155.04
2021-02-23 19:30:13,722 - root - INFO - [Iter 10/80, Epoch 0] valid f1=8.0755e-01, mcc=1.5986e-01, roc_auc=8.1520e-01, accuracy=6.8125e-01, log_loss=6.4259e-01, time spent=0.182s, total_time=0.06min
2021-02-23 19:30:13,966 - root - INFO - [Iter 12/80, Epoch 0] train loss=3.6668e-01, gnorm=2.8876e+00, lr=5.7178e-05, #samples processed=96, #sample per second=172.92
2021-02-23 19:30:14,268 - root - INFO - [Iter 12/80, Epoch 0] valid f1=8.3401e-01, mcc=3.7643e-01, roc_auc=8.2084e-01, accuracy=7.4375e-01, log_loss=5.0659e-01, time spent=0.181s, total_time=0.07min
2021-02-23 19:30:14,595 - root - INFO - [Iter 14/80, Epoch 0] train loss=3.7909e-01, gnorm=4.3793e+00, lr=6.6708e-05, #samples processed=96, #sample per second=152.77
2021-02-23 19:30:14,775 - root - INFO - [Iter 14/80, Epoch 0] valid f1=8.3200e-01, mcc=3.6063e-01, roc_auc=8.1908e-01, accuracy=7.3750e-01, log_loss=5.1345e-01, time spent=0.180s, total_time=0.08min
2021-02-23 19:30:15,066 - root - INFO - [Iter 16/80, Epoch 0] train loss=4.0017e-01, gnorm=7.3508e+00, lr=7.0373e-05, #samples processed=96, #sample per second=203.61
2021-02-23 19:30:15,249 - root - INFO - [Iter 16/80, Epoch 0] valid f1=8.2449e-01, mcc=3.3596e-01, roc_auc=8.1996e-01, accuracy=7.3125e-01, log_loss=4.9105e-01, time spent=0.181s, total_time=0.09min
2021-02-23 19:30:15,575 - root - INFO - [Iter 18/80, Epoch 0] train loss=4.0456e-01, gnorm=4.4914e+00, lr=6.8174e-05, #samples processed=96, #sample per second=189.07
2021-02-23 19:30:15,890 - root - INFO - [Iter 18/80, Epoch 0] valid f1=8.0952e-01, mcc=4.4696e-01, roc_auc=8.1961e-01, accuracy=7.5000e-01, log_loss=5.1007e-01, time spent=0.182s, total_time=0.10min
2021-02-23 19:30:16,183 - root - INFO - [Iter 20/80, Epoch 0] train loss=3.9422e-01, gnorm=4.2783e+00, lr=6.5975e-05, #samples processed=96, #sample per second=158.00
2021-02-23 19:30:16,507 - root - INFO - [Iter 20/80, Epoch 0] valid f1=8.2791e-01, mcc=4.7560e-01, roc_auc=8.1150e-01, accuracy=7.6875e-01, log_loss=5.1753e-01, time spent=0.182s, total_time=0.11min
2021-02-23 19:30:16,760 - root - INFO - [Iter 22/80, Epoch 1] train loss=2.8193e-01, gnorm=2.5907e+00, lr=6.3776e-05, #samples processed=96, #sample per second=166.30
2021-02-23 19:30:16,940 - root - INFO - [Iter 22/80, Epoch 1] valid f1=8.1061e-01, mcc=1.9641e-01, roc_auc=8.1502e-01, accuracy=6.8750e-01, log_loss=6.6950e-01, time spent=0.180s, total_time=0.11min
2021-02-23 19:30:17,193 - root - INFO - [Iter 24/80, Epoch 1] train loss=4.8727e-01, gnorm=8.6890e+00, lr=6.1577e-05, #samples processed=96, #sample per second=221.65
2021-02-23 19:30:17,379 - root - INFO - [Iter 24/80, Epoch 1] valid f1=8.1226e-01, mcc=2.1056e-01, roc_auc=8.2790e-01, accuracy=6.9375e-01, log_loss=6.2069e-01, time spent=0.185s, total_time=0.12min
2021-02-23 19:30:17,624 - root - INFO - [Iter 26/80, Epoch 1] train loss=3.8881e-01, gnorm=4.6038e+00, lr=5.9377e-05, #samples processed=96, #sample per second=222.74
2021-02-23 19:30:17,806 - root - INFO - [Iter 26/80, Epoch 1] valid f1=8.2192e-01, mcc=4.3756e-01, roc_auc=8.3442e-01, accuracy=7.5625e-01, log_loss=4.7729e-01, time spent=0.181s, total_time=0.13min
2021-02-23 19:30:18,063 - root - INFO - [Iter 28/80, Epoch 1] train loss=3.4256e-01, gnorm=9.3340e+00, lr=5.7178e-05, #samples processed=96, #sample per second=219.19
2021-02-23 19:30:18,376 - root - INFO - [Iter 28/80, Epoch 1] valid f1=8.2524e-01, mcc=5.1383e-01, roc_auc=8.4959e-01, accuracy=7.7500e-01, log_loss=4.7259e-01, time spent=0.181s, total_time=0.14min
2021-02-23 19:30:18,612 - root - INFO - [Iter 30/80, Epoch 1] train loss=4.3167e-01, gnorm=4.8754e+00, lr=5.4979e-05, #samples processed=96, #sample per second=174.88
2021-02-23 19:30:18,795 - root - INFO - [Iter 30/80, Epoch 1] valid f1=8.2540e-01, mcc=3.2033e-01, roc_auc=8.5805e-01, accuracy=7.2500e-01, log_loss=5.3567e-01, time spent=0.183s, total_time=0.15min
2021-02-23 19:30:19,033 - root - INFO - [Iter 32/80, Epoch 1] train loss=4.7593e-01, gnorm=1.1158e+01, lr=5.2780e-05, #samples processed=96, #sample per second=227.80
2021-02-23 19:30:19,217 - root - INFO - [Iter 32/80, Epoch 1] valid f1=8.3004e-01, mcc=3.4600e-01, roc_auc=8.5928e-01, accuracy=7.3125e-01, log_loss=5.4812e-01, time spent=0.184s, total_time=0.15min
2021-02-23 19:30:19,476 - root - INFO - [Iter 34/80, Epoch 1] train loss=3.2492e-01, gnorm=2.3086e+00, lr=5.0581e-05, #samples processed=96, #sample per second=216.71
2021-02-23 19:30:19,657 - root - INFO - [Iter 34/80, Epoch 1] valid f1=8.3700e-01, mcc=4.5234e-01, roc_auc=8.5382e-01, accuracy=7.6875e-01, log_loss=4.4575e-01, time spent=0.180s, total_time=0.16min
2021-02-23 19:30:19,974 - root - INFO - [Iter 36/80, Epoch 1] train loss=3.2427e-01, gnorm=2.4882e+00, lr=4.8382e-05, #samples processed=96, #sample per second=192.91
2021-02-23 19:30:20,155 - root - INFO - [Iter 36/80, Epoch 1] valid f1=8.3761e-01, mcc=4.2891e-01, roc_auc=8.5241e-01, accuracy=7.6250e-01, log_loss=4.5220e-01, time spent=0.180s, total_time=0.17min
2021-02-23 19:30:20,395 - root - INFO - [Iter 38/80, Epoch 1] train loss=3.5534e-01, gnorm=3.1772e+00, lr=4.6182e-05, #samples processed=96, #sample per second=228.35
2021-02-23 19:30:20,711 - root - INFO - [Iter 38/80, Epoch 1] valid f1=8.4483e-01, mcc=4.6276e-01, roc_auc=8.5135e-01, accuracy=7.7500e-01, log_loss=4.5442e-01, time spent=0.183s, total_time=0.18min
2021-02-23 19:30:20,951 - root - INFO - [Iter 40/80, Epoch 1] train loss=3.4663e-01, gnorm=2.8449e+00, lr=4.3983e-05, #samples processed=96, #sample per second=172.64
2021-02-23 19:30:21,134 - root - INFO - [Iter 40/80, Epoch 1] valid f1=8.1159e-01, mcc=4.6973e-01, roc_auc=8.4659e-01, accuracy=7.5625e-01, log_loss=4.6692e-01, time spent=0.183s, total_time=0.18min
2021-02-23 19:30:21,386 - root - INFO - [Iter 42/80, Epoch 2] train loss=2.9411e-01, gnorm=2.7856e+00, lr=4.1784e-05, #samples processed=96, #sample per second=220.77
2021-02-23 19:30:21,704 - root - INFO - [Iter 42/80, Epoch 2] valid f1=8.3178e-01, mcc=4.9215e-01, roc_auc=8.4641e-01, accuracy=7.7500e-01, log_loss=4.5581e-01, time spent=0.183s, total_time=0.19min
2021-02-23 19:30:21,953 - root - INFO - [Iter 44/80, Epoch 2] train loss=2.4472e-01, gnorm=3.9305e+00, lr=3.9585e-05, #samples processed=96, #sample per second=169.30
2021-02-23 19:30:22,262 - root - INFO - [Iter 44/80, Epoch 2] valid f1=8.4615e-01, mcc=4.6173e-01, roc_auc=8.4588e-01, accuracy=7.7500e-01, log_loss=4.6457e-01, time spent=0.186s, total_time=0.20min
2021-02-23 19:30:22,503 - root - INFO - [Iter 46/80, Epoch 2] train loss=2.7678e-01, gnorm=8.0375e+00, lr=3.7386e-05, #samples processed=96, #sample per second=174.61
2021-02-23 19:30:22,829 - root - INFO - [Iter 46/80, Epoch 2] valid f1=8.4615e-01, mcc=4.6173e-01, roc_auc=8.4835e-01, accuracy=7.7500e-01, log_loss=4.6801e-01, time spent=0.183s, total_time=0.21min
2021-02-23 19:30:23,075 - root - INFO - [Iter 48/80, Epoch 2] train loss=3.4094e-01, gnorm=1.2487e+01, lr=3.5187e-05, #samples processed=96, #sample per second=167.82
2021-02-23 19:30:23,374 - root - INFO - [Iter 48/80, Epoch 2] valid f1=8.2857e-01, mcc=5.0241e-01, roc_auc=8.4800e-01, accuracy=7.7500e-01, log_loss=4.5729e-01, time spent=0.181s, total_time=0.22min
2021-02-23 19:30:23,627 - root - INFO - [Iter 50/80, Epoch 2] train loss=3.0676e-01, gnorm=9.1641e+00, lr=3.2987e-05, #samples processed=96, #sample per second=174.01
2021-02-23 19:30:23,809 - root - INFO - [Iter 50/80, Epoch 2] valid f1=8.0597e-01, mcc=4.8925e-01, roc_auc=8.4835e-01, accuracy=7.5625e-01, log_loss=4.7884e-01, time spent=0.182s, total_time=0.23min
2021-02-23 19:30:24,060 - root - INFO - [Iter 52/80, Epoch 2] train loss=3.6823e-01, gnorm=1.0581e+01, lr=3.0788e-05, #samples processed=96, #sample per second=222.08
2021-02-23 19:30:24,416 - root - INFO - [Iter 52/80, Epoch 2] valid f1=8.4932e-01, mcc=5.2449e-01, roc_auc=8.5153e-01, accuracy=7.9375e-01, log_loss=4.4868e-01, time spent=0.185s, total_time=0.24min
2021-02-23 19:30:24,658 - root - INFO - [Iter 54/80, Epoch 2] train loss=2.7763e-01, gnorm=4.5301e+00, lr=2.8589e-05, #samples processed=96, #sample per second=160.49
2021-02-23 19:30:24,840 - root - INFO - [Iter 54/80, Epoch 2] valid f1=8.2988e-01, mcc=3.7391e-01, roc_auc=8.5382e-01, accuracy=7.4375e-01, log_loss=5.1045e-01, time spent=0.181s, total_time=0.25min
2021-02-23 19:30:25,102 - root - INFO - [Iter 56/80, Epoch 2] train loss=4.2491e-01, gnorm=1.1061e+01, lr=2.6390e-05, #samples processed=96, #sample per second=216.47
2021-02-23 19:30:25,284 - root - INFO - [Iter 56/80, Epoch 2] valid f1=8.4082e-01, mcc=4.1308e-01, roc_auc=8.5558e-01, accuracy=7.5625e-01, log_loss=5.2989e-01, time spent=0.182s, total_time=0.25min
2021-02-23 19:30:25,538 - root - INFO - [Iter 58/80, Epoch 2] train loss=3.0231e-01, gnorm=2.2808e+00, lr=2.4191e-05, #samples processed=96, #sample per second=219.80
2021-02-23 19:30:25,720 - root - INFO - [Iter 58/80, Epoch 2] valid f1=8.4746e-01, mcc=4.6134e-01, roc_auc=8.5382e-01, accuracy=7.7500e-01, log_loss=4.8411e-01, time spent=0.182s, total_time=0.26min
2021-02-23 19:30:25,970 - root - INFO - [Iter 60/80, Epoch 2] train loss=2.5804e-01, gnorm=4.9796e+00, lr=2.1992e-05, #samples processed=96, #sample per second=222.38
2021-02-23 19:30:26,154 - root - INFO - [Iter 60/80, Epoch 2] valid f1=8.4685e-01, mcc=5.0486e-01, roc_auc=8.5523e-01, accuracy=7.8750e-01, log_loss=4.4668e-01, time spent=0.183s, total_time=0.27min
2021-02-23 19:30:26,407 - root - INFO - [Iter 62/80, Epoch 3] train loss=2.9414e-01, gnorm=3.6621e+00, lr=1.9792e-05, #samples processed=96, #sample per second=220.07
2021-02-23 19:30:26,592 - root - INFO - [Iter 62/80, Epoch 3] valid f1=8.1340e-01, mcc=4.6373e-01, roc_auc=8.5399e-01, accuracy=7.5625e-01, log_loss=4.4706e-01, time spent=0.185s, total_time=0.28min
2021-02-23 19:30:26,841 - root - INFO - [Iter 64/80, Epoch 3] train loss=3.2791e-01, gnorm=3.5324e+00, lr=1.7593e-05, #samples processed=96, #sample per second=221.32
2021-02-23 19:30:27,022 - root - INFO - [Iter 64/80, Epoch 3] valid f1=8.0203e-01, mcc=5.0358e-01, roc_auc=8.5311e-01, accuracy=7.5625e-01, log_loss=4.7863e-01, time spent=0.182s, total_time=0.28min
2021-02-23 19:30:27,269 - root - INFO - [Iter 66/80, Epoch 3] train loss=2.8469e-01, gnorm=6.8909e+00, lr=1.5394e-05, #samples processed=96, #sample per second=224.35
2021-02-23 19:30:27,450 - root - INFO - [Iter 66/80, Epoch 3] valid f1=8.2000e-01, mcc=5.3311e-01, roc_auc=8.5346e-01, accuracy=7.7500e-01, log_loss=4.7214e-01, time spent=0.181s, total_time=0.29min
2021-02-23 19:30:27,713 - root - INFO - [Iter 68/80, Epoch 3] train loss=2.1974e-01, gnorm=6.6917e+00, lr=1.3195e-05, #samples processed=96, #sample per second=215.99
2021-02-23 19:30:27,901 - root - INFO - [Iter 68/80, Epoch 3] valid f1=8.2857e-01, mcc=5.0241e-01, roc_auc=8.5329e-01, accuracy=7.7500e-01, log_loss=4.4595e-01, time spent=0.187s, total_time=0.30min
2021-02-23 19:30:28,146 - root - INFO - [Iter 70/80, Epoch 3] train loss=1.9593e-01, gnorm=6.4887e+00, lr=1.0996e-05, #samples processed=96, #sample per second=221.91
2021-02-23 19:30:28,466 - root - INFO - [Iter 70/80, Epoch 3] valid f1=8.5590e-01, mcc=5.1215e-01, roc_auc=8.5276e-01, accuracy=7.9375e-01, log_loss=4.5058e-01, time spent=0.188s, total_time=0.31min
2021-02-23 19:30:28,726 - root - INFO - [Iter 72/80, Epoch 3] train loss=1.8270e-01, gnorm=2.9262e+00, lr=8.7967e-06, #samples processed=96, #sample per second=165.69
2021-02-23 19:30:28,909 - root - INFO - [Iter 72/80, Epoch 3] valid f1=8.5232e-01, mcc=4.7843e-01, roc_auc=8.5382e-01, accuracy=7.8125e-01, log_loss=4.6848e-01, time spent=0.183s, total_time=0.31min
2021-02-23 19:30:29,166 - root - INFO - [Iter 74/80, Epoch 3] train loss=2.7357e-01, gnorm=4.5726e+00, lr=6.5975e-06, #samples processed=96, #sample per second=218.11
2021-02-23 19:30:29,348 - root - INFO - [Iter 74/80, Epoch 3] valid f1=8.5593e-01, mcc=4.9494e-01, roc_auc=8.5417e-01, accuracy=7.8750e-01, log_loss=4.6853e-01, time spent=0.181s, total_time=0.32min
2021-02-23 19:30:29,601 - root - INFO - [Iter 76/80, Epoch 3] train loss=1.9237e-01, gnorm=2.7009e+00, lr=4.3983e-06, #samples processed=96, #sample per second=220.93
2021-02-23 19:30:29,785 - root - INFO - [Iter 76/80, Epoch 3] valid f1=8.5232e-01, mcc=4.7843e-01, roc_auc=8.5435e-01, accuracy=7.8125e-01, log_loss=4.7141e-01, time spent=0.184s, total_time=0.33min
2021-02-23 19:30:30,040 - root - INFO - [Iter 78/80, Epoch 3] train loss=2.3753e-01, gnorm=6.5637e+00, lr=2.1992e-06, #samples processed=96, #sample per second=218.70
2021-02-23 19:30:30,223 - root - INFO - [Iter 78/80, Epoch 3] valid f1=8.5593e-01, mcc=4.9494e-01, roc_auc=8.5435e-01, accuracy=7.8750e-01, log_loss=4.6976e-01, time spent=0.183s, total_time=0.34min
2021-02-23 19:30:30,465 - root - INFO - [Iter 80/80, Epoch 3] train loss=2.4713e-01, gnorm=3.4608e+00, lr=0.0000e+00, #samples processed=96, #sample per second=225.82
2021-02-23 19:30:30,648 - root - INFO - [Iter 80/80, Epoch 3] valid f1=8.5593e-01, mcc=4.9494e-01, roc_auc=8.5435e-01, accuracy=7.8750e-01, log_loss=4.6848e-01, time spent=0.183s, total_time=0.34min
dev_score = predictor_mrpc_hyperband.evaluate(dev_data, metrics=['acc', 'f1'])
print('Best Config = {}'.format(predictor_mrpc_hyperband.results['best_config']))
print('Total Time = {}s'.format(predictor_mrpc_hyperband.results['total_time']))
print('Accuracy = {:.2f}%'.format(dev_score['acc'] * 100))
print('F1 = {:.2f}%'.format(dev_score['f1'] * 100))
Best Config = {'search_space▁model.network.agg_net.data_dropout▁choice': 0, 'search_space▁model.network.agg_net.mid_units': 93, 'search_space▁optimization.layerwise_lr_decay': 0.9705516632779506, 'search_space▁optimization.lr': 7.967709362655271e-05, 'search_space▁optimization.warmup_portion': 0.19737539140923366}
Total Time = 73.5722336769104s
Accuracy = 78.43%
F1 = 85.81%
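The keys in the reported Best Config carry a search-space encoding: each key is prefixed with `search_space` plus the U+2581 separator character, and categorical hyperparameters get a trailing `▁choice` suffix (whose value appears to be an index into the declared candidate list rather than the candidate itself). A small hypothetical helper (not part of AutoGluon) can strip this encoding back to plain config paths:

```python
# Hypothetical helper (not an AutoGluon API): strip the search-space
# encoding from the keys reported in Best Config.  U+2581 is the
# "lower one eighth block" character used as a separator.
SEP = '\u2581'

def decode_key(key: str) -> str:
    prefix = 'search_space' + SEP
    suffix = SEP + 'choice'
    if key.startswith(prefix):
        key = key[len(prefix):]
    if key.endswith(suffix):
        key = key[:-len(suffix)]
    return key

best_config = {
    'search_space\u2581model.network.agg_net.data_dropout\u2581choice': 0,
    'search_space\u2581optimization.lr': 7.967709362655271e-05,
}
decoded = {decode_key(k): v for k, v in best_config.items()}
print(decoded)
# keys become 'model.network.agg_net.data_dropout' and 'optimization.lr'
```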
predictions = predictor_mrpc_hyperband.predict(dev_data)
prediction1 = predictor_mrpc_hyperband.predict({'sentence1': [sentence1], 'sentence2': [sentence2]})
prediction1_prob = predictor_mrpc_hyperband.predict_proba({'sentence1': [sentence1], 'sentence2': [sentence2]})
print('A = "{}"'.format(sentence1))
print('B = "{}"'.format(sentence2))
print('Prediction = "{}"'.format(prediction1[0] == 1))
print('Prob = "{}"'.format(prediction1_prob[0]))
print('')
prediction2 = predictor_mrpc_hyperband.predict({'sentence1': [sentence1], 'sentence2': [sentence3]})
prediction2_prob = predictor_mrpc_hyperband.predict_proba({'sentence1': [sentence1], 'sentence2': [sentence3]})
print('A = "{}"'.format(sentence1))
print('B = "{}"'.format(sentence3))
print('Prediction = "{}"'.format(prediction2[0] == 1))
print('Prob = "{}"'.format(prediction2_prob[0]))
A = "It is simple to solve NLP problems with AutoGluon."
B = "With AutoGluon, it is easy to solve NLP problems."
Prediction = "True"
Prob = "[0.03501643 0.9649836 ]"
A = "It is simple to solve NLP problems with AutoGluon."
B = "AutoGluon gives you a very bad user experience for solving NLP problems."
Prediction = "False"
Prob = "[0.5099241 0.49007583]"
Use Hyperband together with Bayesian Optimization¶
Finally, we can combine Hyperband's early-stopping schedule with Bayesian Optimization's model-based candidate proposals (the BOHB strategy) by setting the search strategy to 'bayesopt_hyperband'.
scheduler_options = {'max_t': 40}  # maximum training budget any single trial may receive
hyperparameters['hpo_params'] = {
    'search_strategy': 'bayesopt_hyperband',  # Bayesian Optimization + Hyperband (BOHB)
    'scheduler_options': scheduler_options
}
predictor_mrpc_bohb = task.fit(
train_data, label='label',
hyperparameters=hyperparameters,
time_limits=60 * 2, ngpus_per_trial=1, seed=123,
output_directory='./ag_mrpc_custom_space_bohb')
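To build intuition for what the Hyperband side of this strategy does, here is a minimal, self-contained sketch (pure Python, no AutoGluon) of successive halving: all candidate configurations are evaluated with a small budget, only the top fraction survives to the next, larger budget, and so on until one remains. `mock_validation_score` is a hypothetical stand-in for a partial training run; the real bayesopt_hyperband scheduler additionally proposes new candidates from a Bayesian surrogate model instead of scoring a fixed random pool.

```python
def successive_halving(configs, evaluate, max_t=40, eta=2):
    """Hyperband-style budget allocation: score every config at a small
    budget t, keep the best 1/eta fraction, multiply t by eta, repeat."""
    rounds = (len(configs) - 1).bit_length()      # halvings needed to reach 1
    t = max(1, max_t // (eta ** rounds))          # initial (small) budget
    survivors = list(configs)
    while len(survivors) > 1 and t <= max_t:
        scores = {c: evaluate(c, t) for c in survivors}
        survivors.sort(key=lambda c: scores[c], reverse=True)
        survivors = survivors[: max(1, len(survivors) // eta)]
        t *= eta
    return survivors[0]

def mock_validation_score(lr, t):
    # Hypothetical objective: peaks near lr = 5e-5, improves with budget t.
    return -((lr - 5e-5) ** 2) * 1e8 + 0.01 * t

candidates = [1e-5, 3e-5, 5e-5, 7e-5, 1e-4, 3e-4, 5e-4, 1e-3]
best_lr = successive_halving(candidates, mock_validation_score, max_t=40)
print(best_lr)  # 5e-05
```

Because poor learning rates are discarded after only a fraction of the full budget, most of the compute is spent refining the few promising candidates.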
2021-02-23 19:30:32,585 - autogluon.text.text_prediction.text_prediction - INFO - All Logs will be saved to ./ag_mrpc_custom_space_bohb/ag_text_prediction.log
2021-02-23 19:30:32,599 - autogluon.text.text_prediction.text_prediction - INFO - Train Dataset:
2021-02-23 19:30:32,600 - autogluon.text.text_prediction.text_prediction - INFO - Columns:
- Text(
name="sentence1"
#total/missing=640/0
length, min/avg/max=44/117.4984375/200
)
- Text(
name="sentence2"
#total/missing=640/0
length, min/avg/max=46/117.54375/210
)
- Categorical(
name="label"
#total/missing=640/0
num_class (total/non_special)=2/2
categories=[0, 1]
freq=[206, 434]
)
2021-02-23 19:30:32,601 - autogluon.text.text_prediction.text_prediction - INFO - Tuning Dataset:
2021-02-23 19:30:32,602 - autogluon.text.text_prediction.text_prediction - INFO - Columns:
- Text(
name="sentence1"
#total/missing=160/0
length, min/avg/max=51/116.90625/193
)
- Text(
name="sentence2"
#total/missing=160/0
length, min/avg/max=42/119.10625/208
)
- Categorical(
name="label"
#total/missing=160/0
num_class (total/non_special)=2/2
categories=[0, 1]
freq=[62, 98]
)
2021-02-23 19:30:32,604 - autogluon.text.text_prediction.text_prediction - INFO - All Logs will be saved to ./ag_mrpc_custom_space_bohb/main.log
(task:7) 2021-02-23 19:30:35,462 - root - INFO - All Logs will be saved to /var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/ag_mrpc_custom_space_bohb/task7/training.log
2021-02-23 19:30:35,462 - root - INFO - learning:
early_stopping_patience: 10
log_metrics: auto
stop_metric: auto
valid_ratio: 0.15
misc:
exp_dir: /var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/ag_mrpc_custom_space_bohb/task7
seed: 123
model:
backbone:
name: google_electra_small
network:
agg_net:
activation: tanh
agg_type: concat
data_dropout: False
dropout: 0.1
feature_proj_num_layers: -1
initializer:
bias: ['zeros']
weight: ['xavier', 'uniform', 'avg', 3.0]
mid_units: 80
norm_eps: 1e-05
normalization: layer_norm
out_proj_num_layers: 0
categorical_net:
activation: leaky
data_dropout: False
dropout: 0.1
emb_units: 32
initializer:
bias: ['zeros']
embed: ['xavier', 'gaussian', 'in', 1.0]
weight: ['xavier', 'uniform', 'avg', 3.0]
mid_units: 64
norm_eps: 1e-05
normalization: layer_norm
num_layers: 1
feature_units: -1
initializer:
bias: ['zeros']
weight: ['truncnorm', 0, 0.02]
numerical_net:
activation: leaky
data_dropout: False
dropout: 0.1
initializer:
bias: ['zeros']
weight: ['xavier', 'uniform', 'avg', 3.0]
input_centering: False
mid_units: 128
norm_eps: 1e-05
normalization: layer_norm
num_layers: 1
text_net:
pool_type: cls
use_segment_id: True
preprocess:
max_length: 128
merge_text: True
optimization:
batch_size: 32
begin_lr: 0.0
final_lr: 0.0
layerwise_lr_decay: 0.9
log_frequency: 0.1
lr: 5.5e-05
lr_scheduler: triangular
max_grad_norm: 1.0
model_average: 5
num_train_epochs: 4
optimizer: adamw
optimizer_params: [('beta1', 0.9), ('beta2', 0.999), ('epsilon', 1e-06), ('correct_bias', False)]
per_device_batch_size: 16
val_batch_size_mult: 2
valid_frequency: 0.1
warmup_portion: 0.15
wd: 0.01
version: 1
2021-02-23 19:30:35,617 - root - INFO - Process training set...
2021-02-23 19:30:35,885 - root - INFO - Done!
2021-02-23 19:30:35,885 - root - INFO - Process dev set...
2021-02-23 19:30:35,948 - root - INFO - Done!
2021-02-23 19:30:41,192 - root - INFO - #Total Params/Fixed Params=13483522/0
2021-02-23 19:30:41,208 - root - INFO - Using gradient accumulation. Global batch size = 32
2021-02-23 19:30:42,105 - root - INFO - [Iter 2/80, Epoch 0] train loss=4.6791e-01, gnorm=7.9308e+00, lr=9.1667e-06, #samples processed=96, #sample per second=111.64
2021-02-23 19:30:42,362 - root - INFO - [Iter 2/80, Epoch 0] valid f1=7.5486e-01, mcc=-6.3079e-02, roc_auc=4.8025e-01, accuracy=6.0625e-01, log_loss=7.8599e-01, time spent=0.171s, total_time=0.02min
2021-02-23 19:30:42,688 - root - INFO - [Iter 4/80, Epoch 0] train loss=5.8375e-01, gnorm=6.6881e+00, lr=1.8333e-05, #samples processed=96, #sample per second=164.86
2021-02-23 19:30:42,848 - root - INFO - [Iter 4/80, Epoch 0] valid f1=7.3810e-01, mcc=-8.9473e-02, roc_auc=5.4345e-01, accuracy=5.8750e-01, log_loss=6.9470e-01, time spent=0.159s, total_time=0.03min
2021-02-23 19:30:43,202 - root - INFO - [Iter 6/80, Epoch 0] train loss=4.3886e-01, gnorm=5.8876e+00, lr=2.7500e-05, #samples processed=96, #sample per second=186.86
2021-02-23 19:30:43,507 - root - INFO - [Iter 6/80, Epoch 0] valid f1=7.5294e-01, mcc=-1.5369e-02, roc_auc=6.5471e-01, accuracy=6.0625e-01, log_loss=6.8356e-01, time spent=0.164s, total_time=0.04min
2021-02-23 19:30:43,894 - root - INFO - [Iter 8/80, Epoch 0] train loss=4.2837e-01, gnorm=4.1448e+00, lr=3.6667e-05, #samples processed=96, #sample per second=138.79
2021-02-23 19:30:44,228 - root - INFO - [Iter 8/80, Epoch 0] valid f1=7.5200e-01, mcc=5.2977e-02, roc_auc=7.2646e-01, accuracy=6.1250e-01, log_loss=6.3593e-01, time spent=0.165s, total_time=0.05min
2021-02-23 19:30:44,512 - root - INFO - [Iter 10/80, Epoch 0] train loss=4.4732e-01, gnorm=3.9817e+00, lr=4.5833e-05, #samples processed=96, #sample per second=155.18
2021-02-23 19:30:44,797 - root - INFO - [Iter 10/80, Epoch 0] valid f1=7.8431e-01, mcc=4.0900e-01, roc_auc=7.6893e-01, accuracy=7.2500e-01, log_loss=5.5903e-01, time spent=0.161s, total_time=0.06min
2021-02-23 19:30:45,042 - root - INFO - [Iter 12/80, Epoch 0] train loss=4.6483e-01, gnorm=1.0117e+01, lr=5.5000e-05, #samples processed=96, #sample per second=181.36
2021-02-23 19:30:45,204 - root - INFO - [Iter 12/80, Epoch 0] valid f1=7.0115e-01, mcc=3.7122e-01, roc_auc=7.8275e-01, accuracy=6.7500e-01, log_loss=5.8711e-01, time spent=0.162s, total_time=0.07min
2021-02-23 19:30:45,526 - root - INFO - [Iter 14/80, Epoch 0] train loss=4.6429e-01, gnorm=7.9917e+00, lr=5.3382e-05, #samples processed=96, #sample per second=198.27
2021-02-23 19:30:45,692 - root - INFO - [Iter 14/80, Epoch 0] valid f1=7.9476e-01, mcc=3.5842e-01, roc_auc=7.9444e-01, accuracy=7.0625e-01, log_loss=5.9082e-01, time spent=0.166s, total_time=0.07min
2021-02-23 19:30:45,971 - root - INFO - [Iter 16/80, Epoch 0] train loss=3.7742e-01, gnorm=4.0525e+00, lr=5.1765e-05, #samples processed=96, #sample per second=215.98
2021-02-23 19:30:46,133 - root - INFO - [Iter 16/80, Epoch 0] valid f1=8.0508e-01, mcc=3.9022e-01, roc_auc=8.0135e-01, accuracy=7.1250e-01, log_loss=6.5584e-01, time spent=0.162s, total_time=0.08min
2021-02-23 19:30:46,372 - root - INFO - [Iter 18/80, Epoch 0] train loss=2.5084e-01, gnorm=3.2974e+00, lr=5.0147e-05, #samples processed=96, #sample per second=239.68
2021-02-23 19:30:46,533 - root - INFO - [Iter 18/80, Epoch 0] valid f1=8.0687e-01, mcc=3.9970e-01, roc_auc=8.0431e-01, accuracy=7.1875e-01, log_loss=6.3652e-01, time spent=0.161s, total_time=0.09min
2021-02-23 19:30:46,843 - root - INFO - [Iter 20/80, Epoch 0] train loss=3.6108e-01, gnorm=3.3778e+00, lr=4.8529e-05, #samples processed=96, #sample per second=203.63
2021-02-23 19:30:47,011 - root - INFO - [Iter 20/80, Epoch 0] valid f1=7.8924e-01, mcc=3.5494e-01, roc_auc=8.0859e-01, accuracy=7.0625e-01, log_loss=5.6816e-01, time spent=0.168s, total_time=0.10min
2021-02-23 19:30:47,267 - root - INFO - [Iter 22/80, Epoch 1] train loss=3.6218e-01, gnorm=3.1874e+00, lr=4.6912e-05, #samples processed=96, #sample per second=226.42
2021-02-23 19:30:47,606 - root - INFO - [Iter 22/80, Epoch 1] valid f1=7.8571e-01, mcc=4.4700e-01, roc_auc=8.0843e-01, accuracy=7.3750e-01, log_loss=5.3263e-01, time spent=0.163s, total_time=0.11min
2021-02-23 19:30:47,893 - root - INFO - [Iter 24/80, Epoch 1] train loss=3.8519e-01, gnorm=2.8437e+00, lr=4.5294e-05, #samples processed=96, #sample per second=153.56
2021-02-23 19:30:48,208 - root - INFO - [Iter 24/80, Epoch 1] valid f1=8.0976e-01, mcc=4.7598e-01, roc_auc=8.0908e-01, accuracy=7.5625e-01, log_loss=5.4140e-01, time spent=0.162s, total_time=0.12min
2021-02-23 19:30:48,496 - root - INFO - [Iter 26/80, Epoch 1] train loss=3.4764e-01, gnorm=3.1183e+00, lr=4.3676e-05, #samples processed=96, #sample per second=159.14
2021-02-23 19:30:48,661 - root - INFO - [Iter 26/80, Epoch 1] valid f1=8.0687e-01, mcc=3.9970e-01, roc_auc=8.1254e-01, accuracy=7.1875e-01, log_loss=6.4566e-01, time spent=0.164s, total_time=0.12min
2021-02-23 19:30:48,920 - root - INFO - [Iter 28/80, Epoch 1] train loss=2.9671e-01, gnorm=3.1238e+00, lr=4.2059e-05, #samples processed=96, #sample per second=226.43
2021-02-23 19:30:49,082 - root - INFO - [Iter 28/80, Epoch 1] valid f1=7.9661e-01, mcc=3.5297e-01, roc_auc=8.1715e-01, accuracy=7.0000e-01, log_loss=6.7966e-01, time spent=0.162s, total_time=0.13min
2021-02-23 19:30:49,341 - root - INFO - [Iter 30/80, Epoch 1] train loss=3.6157e-01, gnorm=5.7556e+00, lr=4.0441e-05, #samples processed=96, #sample per second=228.06
2021-02-23 19:30:49,502 - root - INFO - [Iter 30/80, Epoch 1] valid f1=8.1308e-01, mcc=4.5826e-01, roc_auc=8.1501e-01, accuracy=7.5000e-01, log_loss=5.6756e-01, time spent=0.161s, total_time=0.14min
2021-02-23 19:30:49,761 - root - INFO - [Iter 32/80, Epoch 1] train loss=2.8307e-01, gnorm=6.0553e+00, lr=3.8824e-05, #samples processed=96, #sample per second=228.65
2021-02-23 19:30:49,924 - root - INFO - [Iter 32/80, Epoch 1] valid f1=8.0000e-01, mcc=4.6769e-01, roc_auc=8.1386e-01, accuracy=7.5000e-01, log_loss=5.3241e-01, time spent=0.162s, total_time=0.14min
2021-02-23 19:30:50,176 - root - INFO - [Iter 34/80, Epoch 1] train loss=3.3902e-01, gnorm=4.8646e+00, lr=3.7206e-05, #samples processed=96, #sample per second=231.65
2021-02-23 19:30:50,336 - root - INFO - [Iter 34/80, Epoch 1] valid f1=7.9024e-01, mcc=4.2146e-01, roc_auc=8.1419e-01, accuracy=7.3125e-01, log_loss=5.4198e-01, time spent=0.160s, total_time=0.15min
2021-02-23 19:30:50,594 - root - INFO - [Iter 36/80, Epoch 1] train loss=3.8521e-01, gnorm=7.6779e+00, lr=3.5588e-05, #samples processed=96, #sample per second=229.94
2021-02-23 19:30:50,890 - root - INFO - [Iter 36/80, Epoch 1] valid f1=8.0402e-01, mcc=4.8228e-01, roc_auc=8.1583e-01, accuracy=7.5625e-01, log_loss=5.2568e-01, time spent=0.161s, total_time=0.16min
2021-02-23 19:30:51,132 - root - INFO - [Iter 38/80, Epoch 1] train loss=3.0653e-01, gnorm=3.5811e+00, lr=3.3971e-05, #samples processed=96, #sample per second=178.26
2021-02-23 19:30:51,437 - root - INFO - [Iter 38/80, Epoch 1] valid f1=8.2192e-01, mcc=4.7472e-01, roc_auc=8.2011e-01, accuracy=7.5625e-01, log_loss=5.7234e-01, time spent=0.164s, total_time=0.17min
2021-02-23 19:30:51,751 - root - INFO - [Iter 40/80, Epoch 1] train loss=3.5284e-01, gnorm=4.2739e+00, lr=3.2353e-05, #samples processed=96, #sample per second=155.12
2021-02-23 19:30:52,065 - root - INFO - [Iter 40/80, Epoch 1] valid f1=8.2028e-01, mcc=4.7349e-01, roc_auc=8.2126e-01, accuracy=7.5625e-01, log_loss=5.6622e-01, time spent=0.162s, total_time=0.18min
2021-02-23 19:30:52,303 - root - INFO - [Iter 42/80, Epoch 2] train loss=3.6525e-01, gnorm=4.3552e+00, lr=3.0735e-05, #samples processed=96, #sample per second=174.01
2021-02-23 19:30:52,467 - root - INFO - [Iter 42/80, Epoch 2] valid f1=7.9024e-01, mcc=4.2146e-01, roc_auc=8.2061e-01, accuracy=7.3125e-01, log_loss=5.1452e-01, time spent=0.163s, total_time=0.19min
2021-02-23 19:30:52,719 - root - INFO - [Iter 44/80, Epoch 2] train loss=3.0358e-01, gnorm=4.1425e+00, lr=2.9118e-05, #samples processed=96, #sample per second=230.94
2021-02-23 19:30:52,882 - root - INFO - [Iter 44/80, Epoch 2] valid f1=7.8947e-01, mcc=4.8400e-01, roc_auc=8.2225e-01, accuracy=7.5000e-01, log_loss=5.0908e-01, time spent=0.163s, total_time=0.19min
2021-02-23 19:30:53,151 - root - INFO - [Iter 46/80, Epoch 2] train loss=3.4133e-01, gnorm=1.0405e+01, lr=2.7500e-05, #samples processed=96, #sample per second=222.28
2021-02-23 19:30:53,312 - root - INFO - [Iter 46/80, Epoch 2] valid f1=7.8392e-01, mcc=4.2910e-01, roc_auc=8.2472e-01, accuracy=7.3125e-01, log_loss=5.0056e-01, time spent=0.161s, total_time=0.20min
2021-02-23 19:30:53,554 - root - INFO - [Iter 48/80, Epoch 2] train loss=2.2456e-01, gnorm=3.0549e+00, lr=2.5882e-05, #samples processed=96, #sample per second=238.34
2021-02-23 19:30:53,726 - root - INFO - [Iter 48/80, Epoch 2] valid f1=8.1081e-01, mcc=4.3164e-01, roc_auc=8.2686e-01, accuracy=7.3750e-01, log_loss=5.4388e-01, time spent=0.172s, total_time=0.21min
2021-02-23 19:30:53,978 - root - INFO - [Iter 50/80, Epoch 2] train loss=3.0032e-01, gnorm=6.4737e+00, lr=2.4265e-05, #samples processed=96, #sample per second=226.53
2021-02-23 19:30:54,139 - root - INFO - [Iter 50/80, Epoch 2] valid f1=8.0357e-01, mcc=4.0220e-01, roc_auc=8.2900e-01, accuracy=7.2500e-01, log_loss=5.6919e-01, time spent=0.160s, total_time=0.21min
2021-02-23 19:30:54,407 - root - INFO - [Iter 52/80, Epoch 2] train loss=2.8373e-01, gnorm=6.1021e+00, lr=2.2647e-05, #samples processed=96, #sample per second=223.72
2021-02-23 19:30:54,568 - root - INFO - [Iter 52/80, Epoch 2] valid f1=7.8873e-01, mcc=3.8699e-01, roc_auc=8.2900e-01, accuracy=7.1875e-01, log_loss=5.2485e-01, time spent=0.160s, total_time=0.22min
2021-02-23 19:30:54,822 - root - INFO - [Iter 54/80, Epoch 2] train loss=2.2877e-01, gnorm=4.8045e+00, lr=2.1029e-05, #samples processed=96, #sample per second=231.42
2021-02-23 19:30:54,986 - root - INFO - [Iter 54/80, Epoch 2] valid f1=7.9426e-01, mcc=4.1783e-01, roc_auc=8.2982e-01, accuracy=7.3125e-01, log_loss=5.2284e-01, time spent=0.163s, total_time=0.23min
2021-02-23 19:30:55,245 - root - INFO - [Iter 56/80, Epoch 2] train loss=1.9754e-01, gnorm=3.5169e+00, lr=1.9412e-05, #samples processed=96, #sample per second=227.21
2021-02-23 19:30:55,412 - root - INFO - [Iter 56/80, Epoch 2] valid f1=7.8873e-01, mcc=3.8699e-01, roc_auc=8.3196e-01, accuracy=7.1875e-01, log_loss=5.4789e-01, time spent=0.167s, total_time=0.24min
2021-02-23 19:30:55,697 - root - INFO - [Iter 58/80, Epoch 2] train loss=2.9065e-01, gnorm=8.5100e+00, lr=1.7794e-05, #samples processed=96, #sample per second=212.43
2021-02-23 19:30:55,860 - root - INFO - [Iter 58/80, Epoch 2] valid f1=7.9439e-01, mcc=4.0080e-01, roc_auc=8.3246e-01, accuracy=7.2500e-01, log_loss=5.6454e-01, time spent=0.163s, total_time=0.24min
2021-02-23 19:30:56,109 - root - INFO - [Iter 60/80, Epoch 2] train loss=2.3498e-01, gnorm=5.4026e+00, lr=1.6176e-05, #samples processed=96, #sample per second=232.80
2021-02-23 19:30:56,271 - root - INFO - [Iter 60/80, Epoch 2] valid f1=7.9808e-01, mcc=4.3246e-01, roc_auc=8.3361e-01, accuracy=7.3750e-01, log_loss=5.5528e-01, time spent=0.162s, total_time=0.25min
2021-02-23 19:30:56,274 - root - INFO - Early stopping patience reached!
(task:8) 2021-02-23 19:30:58,762 - root - INFO - All Logs will be saved to /var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/ag_mrpc_custom_space_bohb/task8/training.log
2021-02-23 19:30:58,762 - root - INFO - learning:
early_stopping_patience: 10
log_metrics: auto
stop_metric: auto
valid_ratio: 0.15
misc:
exp_dir: /var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/ag_mrpc_custom_space_bohb/task8
seed: 123
model:
backbone:
name: google_electra_small
network:
agg_net:
activation: tanh
agg_type: concat
data_dropout: True
dropout: 0.1
feature_proj_num_layers: -1
initializer:
bias: ['zeros']
weight: ['xavier', 'uniform', 'avg', 3.0]
mid_units: 52
norm_eps: 1e-05
normalization: layer_norm
out_proj_num_layers: 0
categorical_net:
activation: leaky
data_dropout: False
dropout: 0.1
emb_units: 32
initializer:
bias: ['zeros']
embed: ['xavier', 'gaussian', 'in', 1.0]
weight: ['xavier', 'uniform', 'avg', 3.0]
mid_units: 64
norm_eps: 1e-05
normalization: layer_norm
num_layers: 1
feature_units: -1
initializer:
bias: ['zeros']
weight: ['truncnorm', 0, 0.02]
numerical_net:
activation: leaky
data_dropout: False
dropout: 0.1
initializer:
bias: ['zeros']
weight: ['xavier', 'uniform', 'avg', 3.0]
input_centering: False
mid_units: 128
norm_eps: 1e-05
normalization: layer_norm
num_layers: 1
text_net:
pool_type: cls
use_segment_id: True
preprocess:
max_length: 128
merge_text: True
optimization:
batch_size: 32
begin_lr: 0.0
final_lr: 0.0
layerwise_lr_decay: 0.8964584420622498
log_frequency: 0.1
lr: 4.356400129415584e-05
lr_scheduler: triangular
max_grad_norm: 1.0
model_average: 5
num_train_epochs: 4
optimizer: adamw
optimizer_params: [('beta1', 0.9), ('beta2', 0.999), ('epsilon', 1e-06), ('correct_bias', False)]
per_device_batch_size: 16
val_batch_size_mult: 2
valid_frequency: 0.1
warmup_portion: 0.17596939190261784
wd: 0.01
version: 1
2021-02-23 19:30:58,941 - root - INFO - Process training set...
2021-02-23 19:30:59,219 - root - INFO - Done!
2021-02-23 19:30:59,219 - root - INFO - Process dev set...
2021-02-23 19:30:59,297 - root - INFO - Done!
2021-02-23 19:31:04,638 - root - INFO - #Total Params/Fixed Params=13483522/0
2021-02-23 19:31:04,658 - root - INFO - Using gradient accumulation. Global batch size = 32
2021-02-23 19:31:05,590 - root - INFO - [Iter 2/80, Epoch 0] train loss=4.6804e-01, gnorm=7.9807e+00, lr=6.2234e-06, #samples processed=96, #sample per second=108.03
2021-02-23 19:31:05,835 - root - INFO - [Iter 2/80, Epoch 0] valid f1=7.5486e-01, mcc=-6.3079e-02, roc_auc=4.7301e-01, accuracy=6.0625e-01, log_loss=7.9844e-01, time spent=0.181s, total_time=0.02min
2021-02-23 19:31:06,168 - root - INFO - [Iter 4/80, Epoch 0] train loss=5.8572e-01, gnorm=5.8066e+00, lr=1.2447e-05, #samples processed=96, #sample per second=166.07
2021-02-23 19:31:06,501 - root - INFO - [Iter 4/80, Epoch 0] valid f1=7.5781e-01, mcc=2.5981e-02, roc_auc=5.1810e-01, accuracy=6.1250e-01, log_loss=7.1942e-01, time spent=0.163s, total_time=0.03min
2021-02-23 19:31:06,869 - root - INFO - [Iter 6/80, Epoch 0] train loss=4.3720e-01, gnorm=5.8239e+00, lr=1.8670e-05, #samples processed=96, #sample per second=136.97
2021-02-23 19:31:07,031 - root - INFO - [Iter 6/80, Epoch 0] valid f1=7.5294e-01, mcc=-1.5369e-02, roc_auc=6.1438e-01, accuracy=6.0625e-01, log_loss=6.8994e-01, time spent=0.161s, total_time=0.04min
2021-02-23 19:31:07,410 - root - INFO - [Iter 8/80, Epoch 0] train loss=4.3505e-01, gnorm=4.4181e+00, lr=2.4894e-05, #samples processed=96, #sample per second=177.60
2021-02-23 19:31:07,572 - root - INFO - [Iter 8/80, Epoch 0] valid f1=7.4900e-01, mcc=1.8032e-02, roc_auc=7.0095e-01, accuracy=6.0625e-01, log_loss=6.4799e-01, time spent=0.161s, total_time=0.05min
2021-02-23 19:31:07,852 - root - INFO - [Iter 10/80, Epoch 0] train loss=4.5925e-01, gnorm=4.7369e+00, lr=3.1117e-05, #samples processed=96, #sample per second=217.17
2021-02-23 19:31:08,150 - root - INFO - [Iter 10/80, Epoch 0] valid f1=7.6271e-01, mcc=2.0396e-01, roc_auc=7.4967e-01, accuracy=6.5000e-01, log_loss=5.9263e-01, time spent=0.162s, total_time=0.06min
2021-02-23 19:31:08,402 - root - INFO - [Iter 12/80, Epoch 0] train loss=4.5596e-01, gnorm=6.0247e+00, lr=3.7341e-05, #samples processed=96, #sample per second=174.75
2021-02-23 19:31:08,695 - root - INFO - [Iter 12/80, Epoch 0] valid f1=7.7720e-01, mcc=4.3916e-01, roc_auc=7.7189e-01, accuracy=7.3125e-01, log_loss=5.5971e-01, time spent=0.162s, total_time=0.07min
2021-02-23 19:31:09,011 - root - INFO - [Iter 14/80, Epoch 0] train loss=4.2789e-01, gnorm=6.7716e+00, lr=4.3564e-05, #samples processed=96, #sample per second=157.70
2021-02-23 19:31:09,173 - root - INFO - [Iter 14/80, Epoch 0] valid f1=7.8603e-01, mcc=3.2512e-01, roc_auc=7.8226e-01, accuracy=6.9375e-01, log_loss=5.8778e-01, time spent=0.162s, total_time=0.07min
2021-02-23 19:31:09,451 - root - INFO - [Iter 16/80, Epoch 0] train loss=3.6579e-01, gnorm=3.5455e+00, lr=4.2244e-05, #samples processed=96, #sample per second=217.81
2021-02-23 19:31:09,613 - root - INFO - [Iter 16/80, Epoch 0] valid f1=7.9828e-01, mcc=3.6437e-01, roc_auc=7.8703e-01, accuracy=7.0625e-01, log_loss=6.2841e-01, time spent=0.161s, total_time=0.08min
2021-02-23 19:31:09,858 - root - INFO - [Iter 18/80, Epoch 0] train loss=2.5388e-01, gnorm=3.8592e+00, lr=4.0924e-05, #samples processed=96, #sample per second=236.24
2021-02-23 19:31:10,019 - root - INFO - [Iter 18/80, Epoch 0] valid f1=8.0169e-01, mcc=3.7470e-01, roc_auc=7.9493e-01, accuracy=7.0625e-01, log_loss=6.8931e-01, time spent=0.160s, total_time=0.09min
2021-02-23 19:31:10,327 - root - INFO - [Iter 20/80, Epoch 0] train loss=3.7424e-01, gnorm=4.6357e+00, lr=3.9604e-05, #samples processed=96, #sample per second=204.68
2021-02-23 19:31:10,487 - root - INFO - [Iter 20/80, Epoch 0] valid f1=7.9295e-01, mcc=3.5664e-01, roc_auc=7.9493e-01, accuracy=7.0625e-01, log_loss=6.2082e-01, time spent=0.159s, total_time=0.10min
2021-02-23 19:31:10,746 - root - INFO - [Iter 22/80, Epoch 1] train loss=3.9150e-01, gnorm=3.0151e+00, lr=3.8284e-05, #samples processed=96, #sample per second=229.19
2021-02-23 19:31:11,045 - root - INFO - [Iter 22/80, Epoch 1] valid f1=7.8974e-01, mcc=4.6181e-01, roc_auc=7.9674e-01, accuracy=7.4375e-01, log_loss=5.4699e-01, time spent=0.161s, total_time=0.11min
2021-02-23 19:31:11,360 - root - INFO - [Iter 24/80, Epoch 1] train loss=3.9338e-01, gnorm=4.5685e+00, lr=3.6963e-05, #samples processed=96, #sample per second=156.42
2021-02-23 19:31:11,523 - root - INFO - [Iter 24/80, Epoch 1] valid f1=7.8571e-01, mcc=4.4700e-01, roc_auc=7.9921e-01, accuracy=7.3750e-01, log_loss=5.4336e-01, time spent=0.162s, total_time=0.11min
2021-02-23 19:31:11,829 - root - INFO - [Iter 26/80, Epoch 1] train loss=3.6700e-01, gnorm=5.1609e+00, lr=3.5643e-05, #samples processed=96, #sample per second=204.69
2021-02-23 19:31:11,992 - root - INFO - [Iter 26/80, Epoch 1] valid f1=8.0000e-01, mcc=3.8722e-01, roc_auc=8.0300e-01, accuracy=7.1875e-01, log_loss=5.9639e-01, time spent=0.162s, total_time=0.12min
2021-02-23 19:31:12,247 - root - INFO - [Iter 28/80, Epoch 1] train loss=3.0115e-01, gnorm=3.0306e+00, lr=3.4323e-05, #samples processed=96, #sample per second=229.85
2021-02-23 19:31:12,408 - root - INFO - [Iter 28/80, Epoch 1] valid f1=7.9832e-01, mcc=3.5882e-01, roc_auc=8.0941e-01, accuracy=7.0000e-01, log_loss=7.6219e-01, time spent=0.161s, total_time=0.13min
2021-02-23 19:31:12,661 - root - INFO - [Iter 30/80, Epoch 1] train loss=4.2346e-01, gnorm=9.6354e+00, lr=3.3003e-05, #samples processed=96, #sample per second=231.96
2021-02-23 19:31:12,824 - root - INFO - [Iter 30/80, Epoch 1] valid f1=7.9832e-01, mcc=3.5882e-01, roc_auc=8.1139e-01, accuracy=7.0000e-01, log_loss=7.6749e-01, time spent=0.162s, total_time=0.14min
2021-02-23 19:31:13,071 - root - INFO - [Iter 32/80, Epoch 1] train loss=3.0103e-01, gnorm=4.1343e+00, lr=3.1683e-05, #samples processed=96, #sample per second=234.26
2021-02-23 19:31:13,236 - root - INFO - [Iter 32/80, Epoch 1] valid f1=8.0172e-01, mcc=3.7992e-01, roc_auc=8.1271e-01, accuracy=7.1250e-01, log_loss=6.1906e-01, time spent=0.164s, total_time=0.14min
2021-02-23 19:31:13,481 - root - INFO - [Iter 34/80, Epoch 1] train loss=3.5056e-01, gnorm=6.0376e+00, lr=3.0363e-05, #samples processed=96, #sample per second=234.96
2021-02-23 19:31:13,779 - root - INFO - [Iter 34/80, Epoch 1] valid f1=7.9602e-01, mcc=4.5307e-01, roc_auc=8.1221e-01, accuracy=7.4375e-01, log_loss=5.2452e-01, time spent=0.161s, total_time=0.15min
2021-02-23 19:31:14,023 - root - INFO - [Iter 36/80, Epoch 1] train loss=4.2409e-01, gnorm=1.0668e+01, lr=2.9043e-05, #samples processed=96, #sample per second=177.13
2021-02-23 19:31:14,186 - root - INFO - [Iter 36/80, Epoch 1] valid f1=7.5000e-01, mcc=4.6776e-01, roc_auc=8.1139e-01, accuracy=7.2500e-01, log_loss=5.7789e-01, time spent=0.163s, total_time=0.16min
2021-02-23 19:31:14,433 - root - INFO - [Iter 38/80, Epoch 1] train loss=4.6442e-01, gnorm=1.2905e+01, lr=2.7723e-05, #samples processed=96, #sample per second=234.19
2021-02-23 19:31:14,596 - root - INFO - [Iter 38/80, Epoch 1] valid f1=7.4286e-01, mcc=4.5799e-01, roc_auc=8.1435e-01, accuracy=7.1875e-01, log_loss=5.7712e-01, time spent=0.163s, total_time=0.16min
2021-02-23 19:31:14,894 - root - INFO - [Iter 40/80, Epoch 1] train loss=4.0352e-01, gnorm=9.4132e+00, lr=2.6402e-05, #samples processed=96, #sample per second=208.23
2021-02-23 19:31:15,226 - root - INFO - [Iter 40/80, Epoch 1] valid f1=7.8534e-01, mcc=4.6904e-01, roc_auc=8.1764e-01, accuracy=7.4375e-01, log_loss=5.1643e-01, time spent=0.163s, total_time=0.18min
2021-02-23 19:31:15,461 - root - INFO - [Iter 42/80, Epoch 2] train loss=3.6710e-01, gnorm=3.4534e+00, lr=2.5082e-05, #samples processed=96, #sample per second=169.46
2021-02-23 19:31:15,623 - root - INFO - [Iter 42/80, Epoch 2] valid f1=8.0734e-01, mcc=4.2959e-01, roc_auc=8.2488e-01, accuracy=7.3750e-01, log_loss=5.2542e-01, time spent=0.162s, total_time=0.18min
2021-02-23 19:31:15,868 - root - INFO - [Iter 44/80, Epoch 2] train loss=3.7621e-01, gnorm=5.3643e+00, lr=2.3762e-05, #samples processed=96, #sample per second=235.56
2021-02-23 19:31:16,031 - root - INFO - [Iter 44/80, Epoch 2] valid f1=8.0000e-01, mcc=3.7646e-01, roc_auc=8.2637e-01, accuracy=7.1250e-01, log_loss=5.5621e-01, time spent=0.162s, total_time=0.19min
2021-02-23 19:31:16,290 - root - INFO - [Iter 46/80, Epoch 2] train loss=2.5297e-01, gnorm=2.5919e+00, lr=2.2442e-05, #samples processed=96, #sample per second=227.66
2021-02-23 19:31:16,453 - root - INFO - [Iter 46/80, Epoch 2] valid f1=8.0342e-01, mcc=3.8443e-01, roc_auc=8.2571e-01, accuracy=7.1250e-01, log_loss=5.8966e-01, time spent=0.163s, total_time=0.20min
2021-02-23 19:31:16,708 - root - INFO - [Iter 48/80, Epoch 2] train loss=2.6823e-01, gnorm=4.6158e+00, lr=2.1122e-05, #samples processed=96, #sample per second=229.97
2021-02-23 19:31:16,872 - root - INFO - [Iter 48/80, Epoch 2] valid f1=8.1034e-01, mcc=4.1470e-01, roc_auc=8.2488e-01, accuracy=7.2500e-01, log_loss=5.7433e-01, time spent=0.164s, total_time=0.20min
2021-02-23 19:31:17,121 - root - INFO - [Iter 50/80, Epoch 2] train loss=3.3379e-01, gnorm=4.0063e+00, lr=1.9802e-05, #samples processed=96, #sample per second=232.14
2021-02-23 19:31:17,283 - root - INFO - [Iter 50/80, Epoch 2] valid f1=8.0189e-01, mcc=4.3014e-01, roc_auc=8.2126e-01, accuracy=7.3750e-01, log_loss=5.2297e-01, time spent=0.161s, total_time=0.21min
2021-02-23 19:31:17,549 - root - INFO - [Iter 52/80, Epoch 2] train loss=3.0559e-01, gnorm=3.1957e+00, lr=1.8482e-05, #samples processed=96, #sample per second=224.86
2021-02-23 19:31:17,709 - root - INFO - [Iter 52/80, Epoch 2] valid f1=7.8607e-01, mcc=4.2628e-01, roc_auc=8.1847e-01, accuracy=7.3125e-01, log_loss=5.0727e-01, time spent=0.160s, total_time=0.22min
2021-02-23 19:31:17,964 - root - INFO - [Iter 54/80, Epoch 2] train loss=2.8045e-01, gnorm=6.9815e+00, lr=1.7162e-05, #samples processed=96, #sample per second=231.32
2021-02-23 19:31:18,129 - root - INFO - [Iter 54/80, Epoch 2] valid f1=7.8571e-01, mcc=4.4700e-01, roc_auc=8.2110e-01, accuracy=7.3750e-01, log_loss=5.1177e-01, time spent=0.165s, total_time=0.22min
2021-02-23 19:31:18,387 - root - INFO - [Iter 56/80, Epoch 2] train loss=2.4837e-01, gnorm=5.7700e+00, lr=1.5841e-05, #samples processed=96, #sample per second=227.11
2021-02-23 19:31:18,550 - root - INFO - [Iter 56/80, Epoch 2] valid f1=7.8818e-01, mcc=4.2373e-01, roc_auc=8.2324e-01, accuracy=7.3125e-01, log_loss=5.1264e-01, time spent=0.163s, total_time=0.23min
2021-02-23 19:31:18,814 - root - INFO - [Iter 58/80, Epoch 2] train loss=3.1888e-01, gnorm=5.8266e+00, lr=1.4521e-05, #samples processed=96, #sample per second=224.63
2021-02-23 19:31:19,113 - root - INFO - [Iter 58/80, Epoch 2] valid f1=8.0000e-01, mcc=4.4872e-01, roc_auc=8.2439e-01, accuracy=7.4375e-01, log_loss=5.2592e-01, time spent=0.164s, total_time=0.24min
2021-02-23 19:31:19,357 - root - INFO - [Iter 60/80, Epoch 2] train loss=2.7697e-01, gnorm=4.9545e+00, lr=1.3201e-05, #samples processed=96, #sample per second=177.01
2021-02-23 19:31:19,650 - root - INFO - [Iter 60/80, Epoch 2] valid f1=8.0193e-01, mcc=4.4703e-01, roc_auc=8.2538e-01, accuracy=7.4375e-01, log_loss=5.3835e-01, time spent=0.163s, total_time=0.25min
2021-02-23 19:31:19,906 - root - INFO - [Iter 62/80, Epoch 3] train loss=2.2896e-01, gnorm=3.8794e+00, lr=1.1881e-05, #samples processed=96, #sample per second=174.90
2021-02-23 19:31:20,210 - root - INFO - [Iter 62/80, Epoch 3] valid f1=8.0000e-01, mcc=4.4872e-01, roc_auc=8.2637e-01, accuracy=7.4375e-01, log_loss=5.3690e-01, time spent=0.164s, total_time=0.26min
2021-02-23 19:31:20,461 - root - INFO - [Iter 64/80, Epoch 3] train loss=3.2718e-01, gnorm=5.0634e+00, lr=1.0561e-05, #samples processed=96, #sample per second=172.84
2021-02-23 19:31:20,623 - root - INFO - [Iter 64/80, Epoch 3] valid f1=7.9208e-01, mcc=4.3842e-01, roc_auc=8.2554e-01, accuracy=7.3750e-01, log_loss=5.3336e-01, time spent=0.162s, total_time=0.27min
2021-02-23 19:31:20,883 - root - INFO - [Iter 66/80, Epoch 3] train loss=2.8560e-01, gnorm=3.2448e+00, lr=9.2408e-06, #samples processed=96, #sample per second=227.83
2021-02-23 19:31:21,046 - root - INFO - [Iter 66/80, Epoch 3] valid f1=7.9208e-01, mcc=4.3842e-01, roc_auc=8.2571e-01, accuracy=7.3750e-01, log_loss=5.3455e-01, time spent=0.163s, total_time=0.27min
2021-02-23 19:31:21,307 - root - INFO - [Iter 68/80, Epoch 3] train loss=2.3243e-01, gnorm=5.0325e+00, lr=7.9207e-06, #samples processed=96, #sample per second=226.27
2021-02-23 19:31:21,472 - root - INFO - [Iter 68/80, Epoch 3] valid f1=7.9208e-01, mcc=4.3842e-01, roc_auc=8.2620e-01, accuracy=7.3750e-01, log_loss=5.3661e-01, time spent=0.164s, total_time=0.28min
2021-02-23 19:31:21,723 - root - INFO - [Iter 70/80, Epoch 3] train loss=2.3944e-01, gnorm=4.7250e+00, lr=6.6006e-06, #samples processed=96, #sample per second=231.17
2021-02-23 19:31:21,888 - root - INFO - [Iter 70/80, Epoch 3] valid f1=7.9024e-01, mcc=4.2146e-01, roc_auc=8.2653e-01, accuracy=7.3125e-01, log_loss=5.4175e-01, time spent=0.165s, total_time=0.29min
2021-02-23 19:31:22,144 - root - INFO - [Iter 72/80, Epoch 3] train loss=2.4984e-01, gnorm=4.7902e+00, lr=5.2805e-06, #samples processed=96, #sample per second=227.89
2021-02-23 19:31:22,306 - root - INFO - [Iter 72/80, Epoch 3] valid f1=7.9227e-01, mcc=4.1949e-01, roc_auc=8.2637e-01, accuracy=7.3125e-01, log_loss=5.4586e-01, time spent=0.162s, total_time=0.29min
2021-02-23 19:31:22,555 - root - INFO - [Iter 74/80, Epoch 3] train loss=2.8986e-01, gnorm=3.0545e+00, lr=3.9604e-06, #samples processed=96, #sample per second=233.69
2021-02-23 19:31:22,722 - root - INFO - [Iter 74/80, Epoch 3] valid f1=7.9227e-01, mcc=4.1949e-01, roc_auc=8.2637e-01, accuracy=7.3125e-01, log_loss=5.4943e-01, time spent=0.166s, total_time=0.30min
2021-02-23 19:31:22,969 - root - INFO - [Iter 76/80, Epoch 3] train loss=3.1683e-01, gnorm=4.0859e+00, lr=2.6402e-06, #samples processed=96, #sample per second=231.85
2021-02-23 19:31:23,134 - root - INFO - [Iter 76/80, Epoch 3] valid f1=7.9227e-01, mcc=4.1949e-01, roc_auc=8.2719e-01, accuracy=7.3125e-01, log_loss=5.5389e-01, time spent=0.165s, total_time=0.31min
2021-02-23 19:31:23,398 - root - INFO - [Iter 78/80, Epoch 3] train loss=2.4451e-01, gnorm=3.9167e+00, lr=1.3201e-06, #samples processed=96, #sample per second=224.12
2021-02-23 19:31:23,560 - root - INFO - [Iter 78/80, Epoch 3] valid f1=7.9227e-01, mcc=4.1949e-01, roc_auc=8.2768e-01, accuracy=7.3125e-01, log_loss=5.5662e-01, time spent=0.162s, total_time=0.31min
2021-02-23 19:31:23,799 - root - INFO - [Iter 80/80, Epoch 3] train loss=3.0569e-01, gnorm=5.7299e+00, lr=0.0000e+00, #samples processed=96, #sample per second=239.44
2021-02-23 19:31:23,962 - root - INFO - [Iter 80/80, Epoch 3] valid f1=7.9227e-01, mcc=4.1949e-01, roc_auc=8.2768e-01, accuracy=7.3125e-01, log_loss=5.5729e-01, time spent=0.163s, total_time=0.32min
2021-02-23 19:31:46,032 - autogluon.text.text_prediction.text_prediction - INFO - Results=
2021-02-23 19:31:46,037 - autogluon.text.text_prediction.text_prediction - INFO - Best_config={'search_space▁model.network.agg_net.data_dropout▁choice': 1, 'search_space▁model.network.agg_net.mid_units': 35, 'search_space▁optimization.layerwise_lr_decay': 0.9738439468526894, 'search_space▁optimization.lr': 6.0942557927534055e-05, 'search_space▁optimization.warmup_portion': 0.15224771640154977}
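The `Best_config` reported above is the highest-scoring point drawn from the user-defined search space (aggregation-layer width and dropout, learning rate, layer-wise LR decay, warmup portion). As a rough, library-agnostic sketch of what a search algorithm does with such a space — hypothetical names, not AutoGluon's actual API — a random-search loop simply samples each dimension and returns a candidate configuration:

```python
import random

# Hypothetical sketch (not AutoGluon's API): a search space shaped like the
# one tuned above. Each entry maps a hyperparameter name to a sampler.
search_space = {
    "agg_net.data_dropout": lambda: random.choice([False, True]),
    "agg_net.mid_units": lambda: random.randint(32, 128),
    "optimization.lr": lambda: 10 ** random.uniform(-5, -4),  # log-uniform draw
    "optimization.layerwise_lr_decay": lambda: random.uniform(0.8, 1.0),
    "optimization.warmup_portion": lambda: random.uniform(0.1, 0.2),
}

def sample_config(space, seed=None):
    """Draw one candidate configuration from the search space."""
    if seed is not None:
        random.seed(seed)
    return {name: draw() for name, draw in space.items()}

config = sample_config(search_space, seed=123)
```

In the real tutorial the space is declared with AutoGluon search-space objects and explored by the chosen HPO algorithm (e.g. BOHB), which proposes configurations more cleverly than uniform random draws, but each trial's YAML dump above is exactly one such sampled configuration.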
(task:9) 2021-02-23 19:31:27,109 - root - INFO - All Logs will be saved to /var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/ag_mrpc_custom_space_bohb/task9/training.log
2021-02-23 19:31:27,109 - root - INFO - learning:
early_stopping_patience: 10
log_metrics: auto
stop_metric: auto
valid_ratio: 0.15
misc:
exp_dir: /var/lib/jenkins/workspace/workspace/autogluon-tutorial-text-v3/docs/_build/eval/tutorials/text_prediction/ag_mrpc_custom_space_bohb/task9
seed: 123
model:
backbone:
name: google_electra_small
network:
agg_net:
activation: tanh
agg_type: concat
data_dropout: True
dropout: 0.1
feature_proj_num_layers: -1
initializer:
bias: ['zeros']
weight: ['xavier', 'uniform', 'avg', 3.0]
mid_units: 35
norm_eps: 1e-05
normalization: layer_norm
out_proj_num_layers: 0
categorical_net:
activation: leaky
data_dropout: False
dropout: 0.1
emb_units: 32
initializer:
bias: ['zeros']
embed: ['xavier', 'gaussian', 'in', 1.0]
weight: ['xavier', 'uniform', 'avg', 3.0]
mid_units: 64
norm_eps: 1e-05
normalization: layer_norm
num_layers: 1
feature_units: -1
initializer:
bias: ['zeros']
weight: ['truncnorm', 0, 0.02]
numerical_net:
activation: leaky
data_dropout: False
dropout: 0.1
initializer:
bias: ['zeros']
weight: ['xavier', 'uniform', 'avg', 3.0]
input_centering: False
mid_units: 128
norm_eps: 1e-05
normalization: layer_norm
num_layers: 1
text_net:
pool_type: cls
use_segment_id: True
preprocess:
max_length: 128
merge_text: True
optimization:
batch_size: 32
begin_lr: 0.0
final_lr: 0.0
layerwise_lr_decay: 0.9738439468526894
log_frequency: 0.1
lr: 6.0942557927534055e-05
lr_scheduler: triangular
max_grad_norm: 1.0
model_average: 5
num_train_epochs: 4
optimizer: adamw
optimizer_params: [('beta1', 0.9), ('beta2', 0.999), ('epsilon', 1e-06), ('correct_bias', False)]
per_device_batch_size: 16
val_batch_size_mult: 2
valid_frequency: 0.1
warmup_portion: 0.15224771640154977
wd: 0.01
version: 1
2021-02-23 19:31:27,255 - root - INFO - Process training set...
2021-02-23 19:31:27,517 - root - INFO - Done!
2021-02-23 19:31:27,517 - root - INFO - Process dev set...
2021-02-23 19:31:27,581 - root - INFO - Done!
2021-02-23 19:31:32,847 - root - INFO - #Total Params/Fixed Params=13483522/0
2021-02-23 19:31:32,863 - root - INFO - Using gradient accumulation. Global batch size = 32
2021-02-23 19:31:33,789 - root - INFO - [Iter 2/80, Epoch 0] train loss=4.6767e-01, gnorm=7.8515e+00, lr=1.0157e-05, #samples processed=96, #sample per second=108.65
2021-02-23 19:31:34,034 - root - INFO - [Iter 2/80, Epoch 0] valid f1=7.5486e-01, mcc=-6.3079e-02, roc_auc=4.8585e-01, accuracy=6.0625e-01, log_loss=7.6924e-01, time spent=0.169s, total_time=0.02min
2021-02-23 19:31:34,379 - root - INFO - [Iter 4/80, Epoch 0] train loss=5.8420e-01, gnorm=8.2817e+00, lr=2.0314e-05, #samples processed=96, #sample per second=162.90
2021-02-23 19:31:34,542 - root - INFO - [Iter 4/80, Epoch 0] valid f1=6.9748e-01, mcc=-1.0668e-01, roc_auc=5.7456e-01, accuracy=5.5000e-01, log_loss=6.7545e-01, time spent=0.163s, total_time=0.03min
2021-02-23 19:31:34,897 - root - INFO - [Iter 6/80, Epoch 0] train loss=4.4306e-01, gnorm=5.4854e+00, lr=3.0471e-05, #samples processed=96, #sample per second=185.23
2021-02-23 19:31:35,200 - root - INFO - [Iter 6/80, Epoch 0] valid f1=7.5486e-01, mcc=-6.3079e-02, roc_auc=6.9305e-01, accuracy=6.0625e-01, log_loss=6.9877e-01, time spent=0.163s, total_time=0.04min
2021-02-23 19:31:35,584 - root - INFO - [Iter 8/80, Epoch 0] train loss=4.3820e-01, gnorm=5.5528e+00, lr=4.0628e-05, #samples processed=96, #sample per second=139.79
2021-02-23 19:31:35,913 - root - INFO - [Iter 8/80, Epoch 0] valid f1=7.6860e-01, mcc=2.0526e-01, roc_auc=7.5181e-01, accuracy=6.5000e-01, log_loss=6.1022e-01, time spent=0.163s, total_time=0.05min
2021-02-23 19:31:36,199 - root - INFO - [Iter 10/80, Epoch 0] train loss=4.3169e-01, gnorm=3.9696e+00, lr=5.0785e-05, #samples processed=96, #sample per second=156.17
2021-02-23 19:31:36,496 - root - INFO - [Iter 10/80, Epoch 0] valid f1=7.5532e-01, mcc=4.1054e-01, roc_auc=7.8111e-01, accuracy=7.1250e-01, log_loss=5.6111e-01, time spent=0.164s, total_time=0.06min
2021-02-23 19:31:36,762 - root - INFO - [Iter 12/80, Epoch 0] train loss=4.3639e-01, gnorm=7.6728e+00, lr=6.0943e-05, #samples processed=96, #sample per second=170.44
2021-02-23 19:31:37,085 - root - INFO - [Iter 12/80, Epoch 0] valid f1=8.0909e-01, mcc=4.3034e-01, roc_auc=8.0349e-01, accuracy=7.3750e-01, log_loss=5.3263e-01, time spent=0.160s, total_time=0.07min
2021-02-23 19:31:37,395 - root - INFO - [Iter 14/80, Epoch 0] train loss=3.8425e-01, gnorm=2.8631e+00, lr=5.9150e-05, #samples processed=96, #sample per second=151.67
2021-02-23 19:31:37,558 - root - INFO - [Iter 14/80, Epoch 0] valid f1=7.8543e-01, mcc=2.9090e-01, roc_auc=8.1880e-01, accuracy=6.6875e-01, log_loss=7.5351e-01, time spent=0.162s, total_time=0.08min
2021-02-23 19:31:37,853 - root - INFO - [Iter 16/80, Epoch 0] train loss=4.6195e-01, gnorm=3.7787e+00, lr=5.7358e-05, #samples processed=96, #sample per second=209.80
2021-02-23 19:31:38,016 - root - INFO - [Iter 16/80, Epoch 0] valid f1=7.9828e-01, mcc=3.6437e-01, roc_auc=8.1633e-01, accuracy=7.0625e-01, log_loss=5.6123e-01, time spent=0.163s, total_time=0.09min
2021-02-23 19:31:38,256 - root - INFO - [Iter 18/80, Epoch 0] train loss=3.1475e-01, gnorm=7.4730e+00, lr=5.5565e-05, #samples processed=96, #sample per second=238.22
2021-02-23 19:31:38,419 - root - INFO - [Iter 18/80, Epoch 0] valid f1=8.0870e-01, mcc=4.1022e-01, roc_auc=8.1238e-01, accuracy=7.2500e-01, log_loss=5.3643e-01, time spent=0.162s, total_time=0.09min
2021-02-23 19:31:38,728 - root - INFO - [Iter 20/80, Epoch 0] train loss=3.9714e-01, gnorm=3.4761e+00, lr=5.3773e-05, #samples processed=96, #sample per second=203.73
2021-02-23 19:31:38,890 - root - INFO - [Iter 20/80, Epoch 0] valid f1=8.0508e-01, mcc=3.9022e-01, roc_auc=8.2242e-01, accuracy=7.1250e-01, log_loss=6.1066e-01, time spent=0.162s, total_time=0.10min
2021-02-23 19:31:39,153 - root - INFO - [Iter 22/80, Epoch 1] train loss=3.9160e-01, gnorm=3.3331e+00, lr=5.1980e-05, #samples processed=96, #sample per second=225.88
2021-02-23 19:31:39,432 - root - INFO - [Iter 22/80, Epoch 1] valid f1=8.1188e-01, mcc=4.9221e-01, roc_auc=8.1797e-01, accuracy=7.6250e-01, log_loss=5.3141e-01, time spent=0.162s, total_time=0.11min
2021-02-23 19:31:39,726 - root - INFO - [Iter 24/80, Epoch 1] train loss=4.1469e-01, gnorm=2.9006e+00, lr=5.0188e-05, #samples processed=96, #sample per second=167.65
2021-02-23 19:31:40,035 - root - INFO - [Iter 24/80, Epoch 1] valid f1=8.2075e-01, mcc=4.8683e-01, roc_auc=8.1600e-01, accuracy=7.6250e-01, log_loss=5.6190e-01, time spent=0.166s, total_time=0.12min
2021-02-23 19:31:40,330 - root - INFO - [Iter 26/80, Epoch 1] train loss=3.4090e-01, gnorm=3.3052e+00, lr=4.8396e-05, #samples processed=96, #sample per second=158.93
2021-02-23 19:31:40,496 - root - INFO - [Iter 26/80, Epoch 1] valid f1=7.9325e-01, mcc=3.3671e-01, roc_auc=8.1863e-01, accuracy=6.9375e-01, log_loss=7.1003e-01, time spent=0.166s, total_time=0.13min
2021-02-23 19:31:40,765 - root - INFO - [Iter 28/80, Epoch 1] train loss=2.8015e-01, gnorm=3.1893e+00, lr=4.6603e-05, #samples processed=96, #sample per second=220.58
2021-02-23 19:31:40,929 - root - INFO - [Iter 28/80, Epoch 1] valid f1=8.0172e-01, mcc=3.7992e-01, roc_auc=8.1649e-01, accuracy=7.1250e-01, log_loss=6.8628e-01, time spent=0.163s, total_time=0.13min
2021-02-23 19:31:41,187 - root - INFO - [Iter 30/80, Epoch 1] train loss=3.6034e-01, gnorm=7.0928e+00, lr=4.4811e-05, #samples processed=96, #sample per second=227.83
2021-02-23 19:31:41,503 - root - INFO - [Iter 30/80, Epoch 1] valid f1=8.3333e-01, mcc=5.4465e-01, roc_auc=8.1666e-01, accuracy=7.8750e-01, log_loss=5.7218e-01, time spent=0.164s, total_time=0.14min
2021-02-23 19:31:41,748 - root - INFO - [Iter 32/80, Epoch 1] train loss=2.8978e-01, gnorm=7.5893e+00, lr=4.3018e-05, #samples processed=96, #sample per second=171.14
2021-02-23 19:31:41,914 - root - INFO - [Iter 32/80, Epoch 1] valid f1=7.6800e-01, mcc=1.7070e-01, roc_auc=7.9987e-01, accuracy=6.3750e-01, log_loss=9.5487e-01, time spent=0.166s, total_time=0.15min
2021-02-23 19:31:42,154 - root - INFO - [Iter 34/80, Epoch 1] train loss=5.7251e-01, gnorm=2.1974e+01, lr=4.1226e-05, #samples processed=96, #sample per second=236.37
2021-02-23 19:31:42,319 - root - INFO - [Iter 34/80, Epoch 1] valid f1=7.6800e-01, mcc=1.7070e-01, roc_auc=7.9707e-01, accuracy=6.3750e-01, log_loss=9.6916e-01, time spent=0.165s, total_time=0.16min
2021-02-23 19:31:42,564 - root - INFO - [Iter 36/80, Epoch 1] train loss=6.0730e-01, gnorm=5.3172e+00, lr=3.9433e-05, #samples processed=96, #sample per second=234.39
2021-02-23 19:31:42,729 - root - INFO - [Iter 36/80, Epoch 1] valid f1=8.2407e-01, mcc=4.8765e-01, roc_auc=8.0579e-01, accuracy=7.6250e-01, log_loss=5.7205e-01, time spent=0.165s, total_time=0.16min
2021-02-23 19:31:42,985 - root - INFO - [Iter 38/80, Epoch 1] train loss=3.2658e-01, gnorm=3.3674e+00, lr=3.7641e-05, #samples processed=96, #sample per second=228.23
2021-02-23 19:31:43,150 - root - INFO - [Iter 38/80, Epoch 1] valid f1=8.1106e-01, mcc=4.4410e-01, roc_auc=8.2143e-01, accuracy=7.4375e-01, log_loss=5.4215e-01, time spent=0.165s, total_time=0.17min
2021-02-23 19:31:43,447 - root - INFO - [Iter 40/80, Epoch 1] train loss=3.7708e-01, gnorm=6.4115e+00, lr=3.5849e-05, #samples processed=96, #sample per second=207.89
2021-02-23 19:31:43,611 - root - INFO - [Iter 40/80, Epoch 1] valid f1=8.0583e-01, mcc=4.6153e-01, roc_auc=8.2587e-01, accuracy=7.5000e-01, log_loss=5.1500e-01, time spent=0.164s, total_time=0.18min
2021-02-23 19:31:43,856 - root - INFO - [Iter 42/80, Epoch 2] train loss=3.8652e-01, gnorm=4.4555e+00, lr=3.4056e-05, #samples processed=96, #sample per second=234.35
2021-02-23 19:31:44,020 - root - INFO - [Iter 42/80, Epoch 2] valid f1=8.1385e-01, mcc=4.2945e-01, roc_auc=8.2785e-01, accuracy=7.3125e-01, log_loss=5.7447e-01, time spent=0.163s, total_time=0.19min
2021-02-23 19:31:44,276 - root - INFO - [Iter 44/80, Epoch 2] train loss=3.5389e-01, gnorm=3.3103e+00, lr=3.2264e-05, #samples processed=96, #sample per second=228.60
2021-02-23 19:31:44,440 - root - INFO - [Iter 44/80, Epoch 2] valid f1=8.0930e-01, mcc=4.4385e-01, roc_auc=8.2258e-01, accuracy=7.4375e-01, log_loss=5.2430e-01, time spent=0.163s, total_time=0.19min
2021-02-23 19:31:44,703 - root - INFO - [Iter 46/80, Epoch 2] train loss=2.9589e-01, gnorm=7.1420e+00, lr=3.0471e-05, #samples processed=96, #sample per second=225.01
2021-02-23 19:31:44,867 - root - INFO - [Iter 46/80, Epoch 2] valid f1=8.1223e-01, mcc=4.2502e-01, roc_auc=8.1929e-01, accuracy=7.3125e-01, log_loss=5.8554e-01, time spent=0.164s, total_time=0.20min
2021-02-23 19:31:45,103 - root - INFO - [Iter 48/80, Epoch 2] train loss=2.2526e-01, gnorm=2.7700e+00, lr=2.8679e-05, #samples processed=96, #sample per second=239.97
2021-02-23 19:31:45,268 - root - INFO - [Iter 48/80, Epoch 2] valid f1=8.0672e-01, mcc=3.9761e-01, roc_auc=8.2192e-01, accuracy=7.1250e-01, log_loss=6.6485e-01, time spent=0.164s, total_time=0.21min
2021-02-23 19:31:45,523 - root - INFO - [Iter 50/80, Epoch 2] train loss=3.5919e-01, gnorm=4.8537e+00, lr=2.6886e-05, #samples processed=96, #sample per second=228.86
2021-02-23 19:31:45,688 - root - INFO - [Iter 50/80, Epoch 2] valid f1=8.1106e-01, mcc=4.4410e-01, roc_auc=8.2061e-01, accuracy=7.4375e-01, log_loss=5.7935e-01, time spent=0.165s, total_time=0.21min
2021-02-23 19:31:45,692 - root - INFO - Early stopping patience reached!
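The `lr` column in the log above follows the `triangular` scheduler from the optimization config: the learning rate ramps linearly from `begin_lr` up to the peak `lr` over the first `warmup_portion` of the 80 iterations, then decays linearly toward `final_lr`. A minimal sketch of this schedule (the flooring of the warmup step count is an assumption, but it reproduces the logged values):

```python
def triangular_lr(step, total_steps, peak_lr, warmup_portion,
                  begin_lr=0.0, final_lr=0.0):
    """Triangular schedule: linear warmup to peak_lr, then linear decay."""
    warmup_steps = int(total_steps * warmup_portion)  # assumed floored: 80 * 0.1522... -> 12
    if step <= warmup_steps:
        # linear warmup from begin_lr to peak_lr
        return begin_lr + (peak_lr - begin_lr) * step / warmup_steps
    # linear decay from peak_lr back down to final_lr
    return final_lr + (peak_lr - final_lr) * (total_steps - step) / (total_steps - warmup_steps)

# values from the optimization config above
peak, warmup = 6.0942557927534055e-05, 0.15224771640154977
print(triangular_lr(2, 80, peak, warmup))   # ~1.0157e-05, matching the Iter 2 log line
print(triangular_lr(14, 80, peak, warmup))  # ~5.9150e-05, matching the Iter 14 log line
```

Note that the peak is reached at iteration 12 (matching the `Iter 12/80` log line, where `lr=6.0943e-05`), after which the decay phase begins.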
dev_score = predictor_mrpc_bohb.evaluate(dev_data, metrics=['acc', 'f1'])
print('Best Config = {}'.format(predictor_mrpc_bohb.results['best_config']))
print('Total Time = {}s'.format(predictor_mrpc_bohb.results['total_time']))
print('Accuracy = {:.2f}%'.format(dev_score['acc'] * 100))
print('F1 = {:.2f}%'.format(dev_score['f1'] * 100))
Best Config = {'search_space▁model.network.agg_net.data_dropout▁choice': 1, 'search_space▁model.network.agg_net.mid_units': 35, 'search_space▁optimization.layerwise_lr_decay': 0.9738439468526894, 'search_space▁optimization.lr': 6.0942557927534055e-05, 'search_space▁optimization.warmup_portion': 0.15224771640154977}
Total Time = 73.45021677017212s
Accuracy = 73.53%
F1 = 81.18%
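The keys in `results['best_config']` carry a `search_space▁` prefix, with `▁` separating the prefix from the dotted hyperparameter path. If you prefer the plain hyperparameter names, a small dict comprehension strips the prefix (a convenience sketch, not part of the TextPrediction API):

```python
# best_config as printed above
best_config = {
    'search_space▁model.network.agg_net.data_dropout▁choice': 1,
    'search_space▁model.network.agg_net.mid_units': 35,
    'search_space▁optimization.layerwise_lr_decay': 0.9738439468526894,
    'search_space▁optimization.lr': 6.0942557927534055e-05,
    'search_space▁optimization.warmup_portion': 0.15224771640154977,
}

# drop the 'search_space' prefix, keeping the dotted hyperparameter path
# (the '▁choice' suffix on categorical entries is left untouched)
clean = {k.split('▁', 1)[1]: v for k, v in best_config.items()}
print(clean['optimization.lr'])  # 6.0942557927534055e-05
```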
predictions = predictor_mrpc_bohb.predict(dev_data)
prediction1 = predictor_mrpc_bohb.predict({'sentence1': [sentence1], 'sentence2': [sentence2]})
prediction1_prob = predictor_mrpc_bohb.predict_proba({'sentence1': [sentence1], 'sentence2': [sentence2]})
print('A = "{}"'.format(sentence1))
print('B = "{}"'.format(sentence2))
print('Prediction = "{}"'.format(prediction1[0] == 1))
print('Prob = "{}"'.format(prediction1_prob[0]))
print('')
prediction2 = predictor_mrpc_bohb.predict({'sentence1': [sentence1], 'sentence2': [sentence3]})
prediction2_prob = predictor_mrpc_bohb.predict_proba({'sentence1': [sentence1], 'sentence2': [sentence3]})
print('A = "{}"'.format(sentence1))
print('B = "{}"'.format(sentence3))
print('Prediction = "{}"'.format(prediction2[0] == 1))
print('Prob = "{}"'.format(prediction2_prob[0]))
A = "It is simple to solve NLP problems with AutoGluon."
B = "With AutoGluon, it is easy to solve NLP problems."
Prediction = "True"
Prob = "[0.01103185 0.9889682 ]"
A = "It is simple to solve NLP problems with AutoGluon."
B = "AutoGluon gives you a very bad user experience for solving NLP problems."
Prediction = "True"
Prob = "[0.133422 0.86657804]"
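`predict_proba` returns one probability per class in label order, so the second entry of `Prob` is the probability of the positive (paraphrase) class, and the predicted label is the argmax. A minimal sketch using the probabilities printed for the second sentence pair:

```python
import numpy as np

# class probabilities [P(not paraphrase), P(paraphrase)] from the second example above
prob = np.array([0.133422, 0.86657804])

pred_label = int(np.argmax(prob))     # 1 -> predicted "paraphrase"
confidence = float(prob[pred_label])  # ~0.867
print(pred_label, confidence)
```

Note that the second pair is clearly not a paraphrase, yet the model still predicts `True` with fairly high probability; a model fine-tuned on only 800 subsampled examples should be expected to make such mistakes.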