optimization

class LinearDecayWithWarmup(learning_rate, total_steps, warmup, last_epoch=-1, verbose=False)[source]

Bases: paddle.optimizer.lr.LambdaDecay

Creates a learning rate scheduler that increases the learning rate linearly from 0 to the given learning_rate during the warmup period, then decreases it linearly from the base learning rate to 0.

Parameters
  • learning_rate (float) -- The base learning rate, as a Python float.

  • total_steps (int) -- The number of training steps.

  • warmup (int or float) -- If an int, the number of warmup steps. If a float, the proportion of the total training steps used for warmup.

  • last_epoch (int, optional) -- The index of the last epoch. It can be set to restart training. Defaults to -1, which means starting from the initial learning rate.

  • verbose (bool, optional) -- If True, prints a message to stdout for each update. Defaults to False.

Examples

from paddlenlp.transformers import LinearDecayWithWarmup
lr, warmup_steps, max_steps = 0.1, 100, 1000
lr_scheduler = LinearDecayWithWarmup(lr, max_steps, warmup_steps)
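
The shape described for LinearDecayWithWarmup can be sketched in plain Python as a multiplier on the base learning rate. This is an illustration under stated assumptions, not the library's code: the helper names `resolve_warmup_steps` and `linear_warmup_decay_factor` are hypothetical, and rounding a float `warmup` down to a whole step count is an assumption.

```python
import math


def resolve_warmup_steps(warmup, total_steps):
    """Turn `warmup` into a step count (assumed rule: a float is a proportion)."""
    if isinstance(warmup, int):
        return warmup
    return int(math.floor(warmup * total_steps))


def linear_warmup_decay_factor(step, total_steps, warmup):
    """Multiplier applied to the base learning rate at `step` (illustrative only)."""
    warmup_steps = resolve_warmup_steps(warmup, total_steps)
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 towards 1.
        return step / max(1, warmup_steps)
    # Decay: fall linearly from 1 at the end of warmup to 0 at total_steps.
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


base_lr = 0.1
for step in (0, 50, 100, 550, 1000):
    print(step, base_lr * linear_warmup_decay_factor(step, 1000, 100))
```

Under these assumptions, passing `warmup=0.1` with `total_steps=1000` yields the same curve as `warmup=100`.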
class ConstScheduleWithWarmup(learning_rate, warmup, total_steps=None, last_epoch=-1, verbose=False)[source]

Bases: paddle.optimizer.lr.LambdaDecay

Creates a learning rate scheduler that increases the learning rate linearly from 0 to the given learning_rate during the warmup period and keeps it constant afterwards.

Parameters
  • learning_rate (float) -- The base learning rate, as a Python float.

  • warmup (int or float) -- If an int, the number of warmup steps. If a float, the proportion of the total training steps used for warmup.

  • total_steps (int, optional) -- The number of training steps. Must be provided when warmup is a float. Defaults to None.

  • last_epoch (int, optional) -- The index of the last epoch. It can be set to restart training. Defaults to -1, which means starting from the initial learning rate.

Examples

from paddlenlp.transformers import ConstScheduleWithWarmup
lr, warmup_steps = 0.1, 100
lr_scheduler = ConstScheduleWithWarmup(lr, warmup_steps)
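
The warmup-then-constant behaviour of ConstScheduleWithWarmup, and the reason `total_steps` is needed for a float `warmup`, can be sketched without the library. The names `resolve_warmup_steps` and `const_warmup_factor` are hypothetical, and raising `ValueError` for a missing `total_steps` is an assumption about how such a check might look.

```python
import math


def resolve_warmup_steps(warmup, total_steps=None):
    """Turn `warmup` into a step count; a proportion needs `total_steps`."""
    if isinstance(warmup, int):
        return warmup
    if total_steps is None:
        # A proportion cannot be converted to steps without the total.
        raise ValueError("total_steps is required when warmup is a float")
    return int(math.floor(warmup * total_steps))


def const_warmup_factor(step, warmup_steps):
    """Multiplier on the base learning rate at `step` (illustrative only)."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 towards 1.
        return step / max(1, warmup_steps)
    # After warmup the base learning rate is used unchanged.
    return 1.0


base_lr = 0.1
warmup_steps = resolve_warmup_steps(100)
for step in (0, 25, 100, 5000):
    print(step, base_lr * const_warmup_factor(step, warmup_steps))
```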
class CosineDecayWithWarmup(learning_rate, total_steps, warmup, with_hard_restarts=False, num_cycles=None, last_epoch=-1, verbose=False)[source]

Bases: paddle.optimizer.lr.LambdaDecay

Creates a learning rate scheduler that increases the learning rate linearly from 0 to the given learning_rate during the warmup period, then decreases it following the values of the cosine function. If with_hard_restarts is True, the cosine function can have several hard restarts.

Parameters
  • learning_rate (float) -- The base learning rate, as a Python float.

  • total_steps (int) -- The number of training steps.

  • warmup (int or float) -- If an int, the number of warmup steps. If a float, the proportion of the total training steps used for warmup.

  • with_hard_restarts (bool, optional) -- Whether the cosine function has several hard restarts. Defaults to False.

  • num_cycles (int or float, optional) -- If with_hard_restarts is False, the number of waves in the cosine schedule; it should be an integer and defaults to 1. If with_hard_restarts is True, the number of hard restarts to use; it should be a float and defaults to 0.5. Defaults to None, which selects the default for the chosen mode.

  • last_epoch (int, optional) -- The index of the last epoch. It can be set to restart training. Defaults to -1, which means starting from the initial learning rate.

Examples

from paddlenlp.transformers import CosineDecayWithWarmup
lr, warmup_steps, max_steps = 0.1, 100, 1000
lr_scheduler = CosineDecayWithWarmup(lr, max_steps, warmup_steps)
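
To illustrate what cosine decay with and without hard restarts looks like, the following plain-Python sketch computes a multiplier on the base learning rate. The formulas follow the common cosine-with-warmup pattern and are an assumption, not a copy of the class's implementation; both function names are hypothetical, and `num_cycles` is passed explicitly so that nothing is implied about the real defaults.

```python
import math


def cosine_warmup_factor(step, total_steps, warmup_steps, num_cycles):
    """Cosine decay without restarts (illustrative only)."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 towards 1.
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # num_cycles counts full cosine waves over the decay phase.
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))


def cosine_hard_restart_factor(step, total_steps, warmup_steps, num_cycles):
    """Cosine decay that jumps back to the peak at each restart (illustrative only)."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # The modulo restarts the cosine from its peak at the start of every cycle.
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0))))


base_lr = 0.1
for step in (100, 325, 550, 775, 1000):
    smooth = base_lr * cosine_warmup_factor(step, 1000, 100, num_cycles=0.5)
    restart = base_lr * cosine_hard_restart_factor(step, 1000, 100, num_cycles=2)
    print(step, smooth, restart)
```

In this sketch, `num_cycles=0.5` in the first function gives a half wave, so the rate falls monotonically to 0, while `num_cycles=2` in the second function resets the rate to its peak halfway through the decay phase.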
class PolyDecayWithWarmup(learning_rate, total_steps, warmup, lr_end=1e-07, power=1.0, last_epoch=-1, verbose=False)[source]

Bases: paddle.optimizer.lr.LambdaDecay

Creates a learning rate scheduler that increases the learning rate linearly from 0 to the given learning_rate during the warmup period, then decreases it by polynomial decay from the base learning rate to the end learning rate lr_end.

Parameters
  • learning_rate (float) -- The base learning rate, as a Python float.

  • total_steps (int) -- The number of training steps.

  • warmup (int or float) -- If an int, the number of warmup steps. If a float, the proportion of the total training steps used for warmup.

  • lr_end (float, optional) -- The end learning rate. Defaults to 1e-7.

  • power (float, optional) -- The exponent of the polynomial decay. Defaults to 1.0.

  • last_epoch (int, optional) -- The index of the last epoch. It can be set to restart training. Defaults to -1, which means starting from the initial learning rate.

Examples

from paddlenlp.transformers import PolyDecayWithWarmup
lr, lr_end, warmup_steps, max_steps = 0.1, 1e-6, 100, 1000
lr_scheduler = PolyDecayWithWarmup(lr, max_steps, warmup_steps, lr_end)
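
The polynomial decay from the base learning rate to `lr_end` can be sketched in plain Python as follows. This is a minimal illustration assuming the usual polynomial-decay formula; the function name `poly_warmup_lr` is hypothetical, and holding the rate at `lr_end` once `total_steps` is reached is an assumption.

```python
def poly_warmup_lr(step, base_lr, total_steps, warmup_steps, lr_end=1e-7, power=1.0):
    """Learning rate at `step` for warmup followed by polynomial decay (illustrative only)."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 towards base_lr.
        return base_lr * step / max(1, warmup_steps)
    if step >= total_steps:
        # Assumed behaviour: stay at the end learning rate after training ends.
        return lr_end
    # Fraction of the decay phase still remaining, from 1 down to 0.
    remaining = 1.0 - (step - warmup_steps) / (total_steps - warmup_steps)
    return (base_lr - lr_end) * remaining ** power + lr_end


for step in (0, 50, 100, 550, 1000):
    linear = poly_warmup_lr(step, 0.1, 1000, 100, lr_end=1e-6, power=1.0)
    quadratic = poly_warmup_lr(step, 0.1, 1000, 100, lr_end=1e-6, power=2.0)
    print(step, linear, quadratic)
```

With `power=1.0` the decay is a straight line; in this sketch a larger `power` makes the rate drop faster early in the decay phase.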
class CosineAnnealingWithWarmupDecay(max_lr, min_lr, warmup_step, decay_step, last_epoch=0, verbose=False)[source]

Bases: paddle.optimizer.lr.LRScheduler

get_lr()[source]

Subclasses that override LRScheduler (the base class) must provide their own implementation of get_lr().

Otherwise, a NotImplementedError exception is raised.
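
The contract described for get_lr() can be illustrated with a small stand-in that does not depend on Paddle. Everything here is hypothetical: `SchedulerBase` only mimics the idea of a base class whose get_lr() must be overridden, and the warmup-then-cosine-annealing formula in `WarmupCosineAnnealing` is inferred from the class name and constructor arguments above rather than taken from the source. The sketch assumes `decay_step` is greater than `warmup_step`.

```python
import math


class SchedulerBase:
    """Hypothetical stand-in for a scheduler base class; not Paddle's LRScheduler."""

    def __init__(self, last_epoch=0):
        self.last_epoch = last_epoch

    def get_lr(self):
        # Subclasses are expected to override this method.
        raise NotImplementedError

    def step(self):
        # Advance one step and report the new learning rate.
        self.last_epoch += 1
        return self.get_lr()


class WarmupCosineAnnealing(SchedulerBase):
    """Assumed shape: linear warmup to max_lr, then cosine annealing to min_lr."""

    def __init__(self, max_lr, min_lr, warmup_step, decay_step, last_epoch=0):
        super().__init__(last_epoch)
        self.max_lr = max_lr
        self.min_lr = min_lr
        self.warmup_step = warmup_step
        self.decay_step = decay_step

    def get_lr(self):
        step = self.last_epoch
        if self.warmup_step > 0 and step <= self.warmup_step:
            # Warmup: ramp linearly from 0 to max_lr.
            return self.max_lr * step / self.warmup_step
        if step > self.decay_step:
            # Past the decay phase: hold the minimum rate.
            return self.min_lr
        ratio = (step - self.warmup_step) / (self.decay_step - self.warmup_step)
        coeff = 0.5 * (math.cos(math.pi * ratio) + 1.0)
        return self.min_lr + coeff * (self.max_lr - self.min_lr)


scheduler = WarmupCosineAnnealing(max_lr=0.1, min_lr=0.01, warmup_step=10, decay_step=110)
for _ in range(3):
    print(scheduler.step())
```

Calling get_lr() on the stand-in base class directly raises NotImplementedError, which mirrors the behaviour the documentation describes for a subclass that does not override it.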