optimization#

class LinearDecayWithWarmup(learning_rate, total_steps, warmup, last_epoch=-1, verbose=False)[source]#

Bases: LambdaDecay

Creates a learning rate scheduler that increases the learning rate linearly from 0 to the given learning_rate during the warmup period, then decreases it linearly from the base learning rate to 0.

Parameters:
  • learning_rate (float) – The base learning rate. It is a python float number.

  • total_steps (int) – The number of training steps.

  • warmup (int or float) – If int, it means the number of steps for warmup. If float, it means the proportion of warmup in total training steps.

  • last_epoch (int, optional) – The index of the last epoch. It can be set to restart training. Defaults to -1, which means starting from the initial learning rate.

  • verbose (bool, optional) – If True, prints a message to stdout for each update. Defaults to False.

Examples

from paddlenlp.transformers import LinearDecayWithWarmup
lr, warmup_steps, max_steps = 0.1, 100, 1000
lr_scheduler = LinearDecayWithWarmup(lr, max_steps, warmup_steps)
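A minimal sketch of driving the scheduler from a training loop. The model, optimizer and loop body below are illustrative assumptions, not part of this class; only the scheduler calls follow the API above.

import paddle
from paddlenlp.transformers import LinearDecayWithWarmup

lr, warmup_steps, max_steps = 0.1, 100, 1000
lr_scheduler = LinearDecayWithWarmup(lr, max_steps, warmup_steps)
# A placeholder model; pass the scheduler as the optimizer's learning rate.
model = paddle.nn.Linear(10, 10)
optimizer = paddle.optimizer.AdamW(learning_rate=lr_scheduler, parameters=model.parameters())
for step in range(max_steps):
    # ... forward, loss.backward(), optimizer.step(), optimizer.clear_grad() ...
    lr_scheduler.step()  # advance the schedule once per optimization step
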
class ConstScheduleWithWarmup(learning_rate, warmup, total_steps=None, last_epoch=-1, verbose=False)[source]#

Bases: LambdaDecay

Creates a learning rate scheduler that increases the learning rate linearly from 0 to the given learning_rate during the warmup period and keeps the learning rate constant after that.

Parameters:
  • learning_rate (float) – The base learning rate. It is a python float number.

  • warmup (int or float) – If int, it means the number of steps for warmup. If float, it means the proportion of warmup in total training steps.

  • total_steps (int, optional) – The number of training steps. If warmup is a float number, total_steps must be provided. Defaults to None.

  • last_epoch (int, optional) – The index of the last epoch. It can be set to restart training. Defaults to -1, which means starting from the initial learning rate.

  • verbose (bool, optional) – If True, prints a message to stdout for each update. Defaults to False.

Examples

from paddlenlp.transformers import ConstScheduleWithWarmup
lr, warmup_steps = 0.1, 100
lr_scheduler = ConstScheduleWithWarmup(lr, warmup_steps)
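When warmup is given as a float, it is treated as a proportion of total_steps, which must then be provided; a small sketch with illustrative values:

from paddlenlp.transformers import ConstScheduleWithWarmup

lr, max_steps = 0.1, 1000
# 10% of the 1000 training steps (100 steps) are used for warmup.
lr_scheduler = ConstScheduleWithWarmup(lr, warmup=0.1, total_steps=max_steps)
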
class CosineDecayWithWarmup(learning_rate, total_steps, warmup, with_hard_restarts=False, num_cycles=None, last_epoch=-1, verbose=False)[source]#

Bases: LambdaDecay

Creates a learning rate scheduler that increases the learning rate linearly from 0 to the given learning_rate during the warmup period, then decreases it following the values of the cosine function. If with_hard_restarts is True, the cosine function can have several hard restarts.

Parameters:
  • learning_rate (float) – The base learning rate. It is a python float number.

  • total_steps (int) – The number of training steps.

  • warmup (int or float) – If int, it means the number of steps for warmup. If float, it means the proportion of warmup in total training steps.

  • with_hard_restarts (bool) – Whether cosine function has several hard restarts. Defaults to False.

  • num_cycles (int or float, optional) – If with_hard_restarts is False, it means the number of waves in the cosine schedule; it should be a float number and defaults to 0.5. If with_hard_restarts is True, it means the number of hard restarts to use; it should be an integer and defaults to 1. Defaults to None.

  • last_epoch (int, optional) – The index of the last epoch. It can be set to restart training. Defaults to -1, which means starting from the initial learning rate.

  • verbose (bool, optional) – If True, prints a message to stdout for each update. Defaults to False.

Examples

from paddlenlp.transformers import CosineDecayWithWarmup
lr, warmup_steps, max_steps = 0.1, 100, 1000
lr_scheduler = CosineDecayWithWarmup(lr, max_steps, warmup_steps)
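A sketch of the hard-restart variant; here num_cycles sets the number of restarts, and the values are illustrative:

from paddlenlp.transformers import CosineDecayWithWarmup

lr, warmup_steps, max_steps = 0.1, 100, 1000
# Cosine decay with two hard restarts after the warmup period.
lr_scheduler = CosineDecayWithWarmup(lr, max_steps, warmup_steps,
                                     with_hard_restarts=True, num_cycles=2)
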
class PolyDecayWithWarmup(learning_rate, total_steps, warmup, lr_end=1e-07, power=1.0, last_epoch=-1, verbose=False)[source]#

Bases: LambdaDecay

Creates a learning rate scheduler that increases the learning rate linearly from 0 to the given learning_rate during the warmup period, then decreases it as a polynomial decay from the base learning rate to the end learning rate lr_end.

Parameters:
  • learning_rate (float) – The base learning rate. It is a python float number.

  • total_steps (int) – The number of training steps.

  • warmup (int or float) – If int, it means the number of steps for warmup. If float, it means the proportion of warmup in total training steps.

  • lr_end (float, optional) – The end learning rate. Defaults to 1e-7.

  • power (float, optional) – Power factor. Defaults to 1.0.

  • last_epoch (int, optional) – The index of the last epoch. It can be set to restart training. Defaults to -1, which means starting from the initial learning rate.

  • verbose (bool, optional) – If True, prints a message to stdout for each update. Defaults to False.

Examples

from paddlenlp.transformers import PolyDecayWithWarmup
lr, lr_end, warmup_steps, max_steps = 0.1, 1e-6, 100, 1000
lr_scheduler = PolyDecayWithWarmup(lr, max_steps, warmup_steps, lr_end)
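With power left at 1.0 the post-warmup decay is linear; a sketch of a quadratic decay using illustrative values:

from paddlenlp.transformers import PolyDecayWithWarmup

lr, lr_end, warmup_steps, max_steps = 0.1, 1e-6, 100, 1000
# power=2.0 makes the post-warmup decay quadratic instead of linear.
lr_scheduler = PolyDecayWithWarmup(lr, max_steps, warmup_steps, lr_end=lr_end, power=2.0)
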
class CosineAnnealingWithWarmupDecay(max_lr, min_lr, warmup_step, decay_step, last_epoch=-1, verbose=False)[source]#

Bases: LRScheduler

get_lr()[source]#

Subclasses that override LRScheduler (the base class) should provide a custom implementation of get_lr().

Otherwise, a NotImplementedError exception will be thrown.
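The class carries no further documentation above; based on the signature, a minimal construction sketch, assuming the class is exported from paddlenlp.transformers like the schedulers above and that the arguments behave as their names suggest (warm up to max_lr over warmup_step steps, then cosine-anneal toward min_lr over decay_step steps):

from paddlenlp.transformers import CosineAnnealingWithWarmupDecay

# Argument values are illustrative only.
lr_scheduler = CosineAnnealingWithWarmupDecay(max_lr=0.001, min_lr=1e-5,
                                              warmup_step=100, decay_step=1000)
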

class LinearAnnealingWithWarmupDecay(max_lr, min_lr, warmup_step, decay_step, last_epoch=-1, verbose=False)[source]#

Bases: LRScheduler

get_lr()[source]#

Subclasses that override LRScheduler (the base class) should provide a custom implementation of get_lr().

Otherwise, a NotImplementedError exception will be thrown.
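As with the cosine variant, a minimal construction sketch based on the signature, assuming the class is exported from paddlenlp.transformers and that the learning rate warms up to max_lr over warmup_step steps and then decays linearly toward min_lr over decay_step steps:

from paddlenlp.transformers import LinearAnnealingWithWarmupDecay

# Argument values are illustrative only.
lr_scheduler = LinearAnnealingWithWarmupDecay(max_lr=0.001, min_lr=1e-5,
                                              warmup_step=100, decay_step=1000)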