optimization
- class LinearDecayWithWarmup(learning_rate, total_steps, warmup, last_epoch=-1, verbose=False)[source]
Bases: `LambdaDecay`
Creates a learning rate scheduler which increases the learning rate linearly from 0 to the given `learning_rate`; after this warmup period, the learning rate is decreased linearly from the base learning rate to 0.
- Parameters:
learning_rate (float) – The base learning rate. It is a python float number.
total_steps (int) – The number of training steps.
warmup (int or float) – If int, it means the number of steps for warmup. If float, it means the proportion of warmup in total training steps.
last_epoch (int, optional) – The index of last epoch. It can be set to restart training. If None, it means initial learning rate. Defaults to -1.
verbose (bool, optional) – If True, prints a message to stdout for each update. Defaults to False.
Examples
```python
from paddlenlp.transformers import LinearDecayWithWarmup

lr, warmup_steps, max_steps = 0.1, 100, 1000
lr_scheduler = LinearDecayWithWarmup(lr, max_steps, warmup_steps)
```
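For intuition, the schedule the description implies can be sketched as a plain multiplier function. This is an illustrative approximation only, not the exact `LambdaDecay` factor PaddleNLP uses internally:

```python
# Illustrative sketch of the implied multiplier (assumption, not the library's
# exact internal function): linear warmup from 0 to 1, then linear decay to 0.
def linear_warmup_linear_decay(step, total_steps, warmup_steps):
    if step < warmup_steps:
        return step / max(1, warmup_steps)  # warmup: 0 -> 1
    # decay: 1 -> 0 over the remaining steps
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# With lr=0.1, max_steps=1000, warmup_steps=100 as above, the learning rate
# at step 550 would be 0.1 * linear_warmup_linear_decay(550, 1000, 100) == 0.05.
```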
- class ConstScheduleWithWarmup(learning_rate, warmup, total_steps=None, last_epoch=-1, verbose=False)[source]
Bases: `LambdaDecay`
Creates a learning rate scheduler which increases the learning rate linearly from 0 to the given `learning_rate` during the warmup period and keeps the learning rate constant after that.
- Parameters:
learning_rate (float) – The base learning rate. It is a python float number.
warmup (int or float) – If int, it means the number of steps for warmup. If float, it means the proportion of warmup in total training steps.
total_steps (int, optional) – The number of training steps. If `warmup` is a float number, `total_steps` must be provided. Defaults to None.
last_epoch (int, optional) – The index of last epoch. It can be set to restart training. If None, it means initial learning rate. Defaults to -1.
Examples
```python
from paddlenlp.transformers import ConstScheduleWithWarmup

lr, warmup_steps = 0.1, 100
lr_scheduler = ConstScheduleWithWarmup(lr, warmup_steps)
```
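As noted in the parameter description, passing `warmup` as a float proportion requires `total_steps`. A small sketch of that call, with values chosen only for illustration:

```python
from paddlenlp.transformers import ConstScheduleWithWarmup

# warmup given as a proportion: 0.1 * 1000 = 100 warmup steps, after which the
# learning rate stays constant at 0.1 (values are illustrative only).
lr_scheduler = ConstScheduleWithWarmup(0.1, warmup=0.1, total_steps=1000)
```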
- class CosineDecayWithWarmup(learning_rate, total_steps, warmup, with_hard_restarts=False, num_cycles=None, last_epoch=-1, verbose=False)[source]
Bases: `LambdaDecay`
Creates a learning rate scheduler which increases the learning rate linearly from 0 to the given `learning_rate`; after this warmup period, the learning rate is decreased following the values of the cosine function. If `with_hard_restarts` is True, the cosine function may have several hard restarts.
- Parameters:
learning_rate (float) – The base learning rate. It is a python float number.
total_steps (int) – The number of training steps.
warmup (int or float) – If int, it means the number of steps for warmup. If float, it means the proportion of warmup in total training steps.
with_hard_restarts (bool) – Whether cosine function has several hard restarts. Defaults to False.
num_cycles (int or float, optional) – If `with_hard_restarts` is False, it means the number of waves in the cosine schedule and should be a float number, defaulting to 0.5. If `with_hard_restarts` is True, it means the number of hard restarts to use and should be an integer, defaulting to 1. Defaults to None.
last_epoch (int, optional) – The index of last epoch. It can be set to restart training. If None, it means initial learning rate. Defaults to -1.
Examples
```python
from paddlenlp.transformers import CosineDecayWithWarmup

lr, warmup_steps, max_steps = 0.1, 100, 1000
lr_scheduler = CosineDecayWithWarmup(lr, max_steps, warmup_steps)
```
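The post-warmup cosine factor described above can be sketched as follows. This assumes the common half-wave form (`num_cycles=0.5`, no hard restarts) and may differ from the library's exact internal function:

```python
import math

# Sketch of the assumed cosine multiplier (not PaddleNLP's exact code):
# linear warmup, then a cosine curve from 1 down to 0 over the remaining steps.
def cosine_warmup_decay(step, total_steps, warmup_steps, num_cycles=0.5):
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * 2.0 * num_cycles * progress)))
```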
- class PolyDecayWithWarmup(learning_rate, total_steps, warmup, lr_end=1e-07, power=1.0, last_epoch=-1, verbose=False)[source]
Bases: `LambdaDecay`
Creates a learning rate scheduler which increases the learning rate linearly from 0 to the given `learning_rate`; after this warmup period, the learning rate is decreased as a polynomial decay from the base learning rate to the end learning rate `lr_end`.
- Parameters:
learning_rate (float) – The base learning rate. It is a python float number.
total_steps (int) – The number of training steps.
warmup (int or float) – If int, it means the number of steps for warmup. If float, it means the proportion of warmup in total training steps.
lr_end (float, optional) – The end learning rate. Defaults to 1e-7.
power (float, optional) – Power factor. Defaults to 1.0.
last_epoch (int, optional) – The index of last epoch. It can be set to restart training. If None, it means initial learning rate. Defaults to -1.
Examples
```python
from paddlenlp.transformers import PolyDecayWithWarmup

lr, lr_end, warmup_steps, max_steps = 0.1, 1e-6, 100, 1000
lr_scheduler = PolyDecayWithWarmup(lr, max_steps, warmup_steps, lr_end)
```
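These schedulers derive from Paddle's LR scheduler classes, so a typical way to wire one into a training loop looks roughly like the hedged sketch below; the model, data, and step count are placeholders for illustration only:

```python
import paddle
from paddlenlp.transformers import PolyDecayWithWarmup

# Hedged usage sketch: the scheduler is passed as the optimizer's learning_rate
# and stepped once per iteration. Model and data here are toy placeholders.
lr_scheduler = PolyDecayWithWarmup(0.1, total_steps=1000, warmup=100, lr_end=1e-6, power=2.0)
model = paddle.nn.Linear(8, 2)
optimizer = paddle.optimizer.AdamW(learning_rate=lr_scheduler, parameters=model.parameters())

for step in range(1000):
    x = paddle.randn([4, 8])
    loss = model(x).mean()  # toy forward pass for illustration
    loss.backward()
    optimizer.step()
    lr_scheduler.step()
    optimizer.clear_grad()
```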
- class CosineAnnealingWithWarmupDecay(max_lr, min_lr, warmup_step, decay_step, last_epoch=-1, verbose=False)[source]
Bases: `LRScheduler`
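The entry above carries no description, so the construction sketch below is inferred purely from the signature; the argument meanings are assumed from their names (linear warmup to `max_lr` over `warmup_step` steps, then cosine annealing towards `min_lr` over `decay_step` steps) and are not documented here:

```python
from paddlenlp.transformers import CosineAnnealingWithWarmupDecay

# Hedged construction sketch based only on the signature; the exact schedule
# semantics are assumed from the parameter names, not documented above.
lr_scheduler = CosineAnnealingWithWarmupDecay(
    max_lr=0.001, min_lr=1e-05, warmup_step=100, decay_step=1000
)
```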