optimization
- class LinearDecayWithWarmup(learning_rate, total_steps, warmup, last_epoch=-1, verbose=False) [source]
Bases: LambdaDecay
Creates a learning rate scheduler, which increases the learning rate linearly from 0 to the given learning_rate during the warmup period, and then decreases it linearly from the base learning rate to 0.
Parameters:
learning_rate (float) -- The base learning rate. It is a python float number.
total_steps (int) -- The number of training steps.
warmup (int or float) -- If int, it means the number of steps for warmup. If float, it means the proportion of warmup in total training steps.
last_epoch (int, optional) -- The index of last epoch. It can be set to restart training. If None, it means initial learning rate. Defaults to -1.
verbose (bool, optional) -- If True, prints a message to stdout for each update. Defaults to False.
Examples
from paddlenlp.transformers import LinearDecayWithWarmup
lr, warmup_steps, max_steps = 0.1, 100, 1000
lr_scheduler = LinearDecayWithWarmup(lr, max_steps, warmup_steps)
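In practice the scheduler is passed to a Paddle optimizer and stepped once per optimizer update. The following is a minimal sketch, not part of this API reference: paddle.nn.Linear stands in for a real model, and warmup is given as a proportion (0.1, i.e. 10% of total_steps) rather than a step count.

import paddle
from paddlenlp.transformers import LinearDecayWithWarmup

model = paddle.nn.Linear(10, 2)  # placeholder for a real model
max_steps = 1000
lr_scheduler = LinearDecayWithWarmup(0.1, max_steps, warmup=0.1)
optimizer = paddle.optimizer.AdamW(learning_rate=lr_scheduler,
                                   parameters=model.parameters())

for step in range(max_steps):
    # ... forward and backward passes would go here ...
    optimizer.step()
    lr_scheduler.step()  # advance the schedule once per optimizer step
    optimizer.clear_grad()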
- class ConstScheduleWithWarmup(learning_rate, warmup, total_steps=None, last_epoch=-1, verbose=False) [source]
Bases: LambdaDecay
Creates a learning rate scheduler, which increases the learning rate linearly from 0 to the given learning_rate during the warmup period and keeps the learning rate constant after that.
Parameters:
learning_rate (float) -- The base learning rate. It is a python float number.
warmup (int or float) -- If int, it means the number of steps for warmup. If float, it means the proportion of warmup in total training steps.
total_steps (int, optional) -- The number of training steps. If warmup is a float number, total_steps must be provided. Defaults to None.
last_epoch (int, optional) -- The index of last epoch. It can be set to restart training. If None, it means initial learning rate. Defaults to -1.
Examples
from paddlenlp.transformers import ConstScheduleWithWarmup
lr, warmup_steps = 0.1, 100
lr_scheduler = ConstScheduleWithWarmup(lr, warmup_steps)
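As the parameter list above notes, total_steps is only required when warmup is given as a proportion; a brief sketch of both call forms with arbitrary values:

from paddlenlp.transformers import ConstScheduleWithWarmup

# warmup given as a step count: total_steps may be omitted
lr_scheduler = ConstScheduleWithWarmup(0.1, warmup=100)
# warmup given as a proportion of training: total_steps must be provided
lr_scheduler = ConstScheduleWithWarmup(0.1, warmup=0.1, total_steps=1000)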
- class CosineDecayWithWarmup(learning_rate, total_steps, warmup, with_hard_restarts=False, num_cycles=None, last_epoch=-1, verbose=False) [source]
Bases: LambdaDecay
Creates a learning rate scheduler, which increases the learning rate linearly from 0 to the given learning_rate during the warmup period, and then decreases it following the values of the cosine function. If with_hard_restarts is True, the cosine function can have several hard restarts.
Parameters:
learning_rate (float) -- The base learning rate. It is a python float number.
total_steps (int) -- The number of training steps.
warmup (int or float) -- If int, it means the number of steps for warmup. If float, it means the proportion of warmup in total training steps.
with_hard_restarts (bool) -- Whether cosine function has several hard restarts. Defaults to False.
num_cycles (int or float, optional) -- If with_hard_restarts is False, it means the number of waves in the cosine scheduler; it should be an integer and defaults to 1. If with_hard_restarts is True, it means the number of hard restarts to use; it should be a float and defaults to 0.5. Defaults to None.
last_epoch (int, optional) -- The index of last epoch. It can be set to restart training. If None, it means initial learning rate. Defaults to -1.
Examples
from paddlenlp.transformers import CosineDecayWithWarmup
lr, warmup_steps, max_steps = 0.1, 100, 1000
lr_scheduler = CosineDecayWithWarmup(lr, max_steps, warmup_steps)
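A sketch of the hard-restart variant with arbitrary values; num_cycles is left at its default, as described in the parameter list above:

from paddlenlp.transformers import CosineDecayWithWarmup

lr, warmup_steps, max_steps = 0.1, 100, 1000
# plain cosine decay after the warmup phase
lr_scheduler = CosineDecayWithWarmup(lr, max_steps, warmup_steps)
# cosine decay with hard restarts after the warmup phase
lr_scheduler = CosineDecayWithWarmup(lr, max_steps, warmup_steps,
                                     with_hard_restarts=True)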
- class PolyDecayWithWarmup(learning_rate, total_steps, warmup, lr_end=1e-07, power=1.0, last_epoch=-1, verbose=False) [source]
Bases: LambdaDecay
Creates a learning rate scheduler, which increases the learning rate linearly from 0 to the given learning_rate during the warmup period, and then decreases it as a polynomial decay from the base learning rate to the end learning rate lr_end.
Parameters:
learning_rate (float) -- The base learning rate. It is a python float number.
total_steps (int) -- The number of training steps.
warmup (int or float) -- If int, it means the number of steps for warmup. If float, it means the proportion of warmup in total training steps.
lr_end (float, optional) -- The end learning rate. Defaults to 1e-7.
power (float, optional) -- Power factor. Defaults to 1.0.
last_epoch (int, optional) -- The index of last epoch. It can be set to restart training. If None, it means initial learning rate. Defaults to -1.
Examples
from paddlenlp.transformers import PolyDecayWithWarmup
lr, lr_end, warmup_steps, max_steps = 0.1, 1e-6, 100, 1000
lr_scheduler = PolyDecayWithWarmup(lr, max_steps, warmup_steps, lr_end)
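The power argument controls the shape of the decay: power=1.0 (the default) reduces to a linear decay towards lr_end, while other values give the corresponding polynomial curve. A sketch with arbitrary values:

from paddlenlp.transformers import PolyDecayWithWarmup

lr, lr_end, warmup_steps, max_steps = 0.1, 1e-6, 100, 1000
# power=1.0 (default): linear decay from lr to lr_end after warmup
lr_scheduler = PolyDecayWithWarmup(lr, max_steps, warmup_steps, lr_end)
# power=2.0: quadratic decay towards lr_end
lr_scheduler = PolyDecayWithWarmup(lr, max_steps, warmup_steps, lr_end,
                                   power=2.0)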
- class CosineAnnealingWithWarmupDecay(max_lr, min_lr, warmup_step, decay_step, last_epoch=-1, verbose=False) [source]
Bases: LRScheduler