decoder#

class InferTransformerDecoder(decoder, n_head, size_per_head, decoder_lib=None, use_fp16_decoder=False, use_batch_major_op_cache=False)[source]#

FasterTransformer decoder block.

Parameters:
  • decoder (TransformerDecoder) – Transformer decoder block.

  • n_head (int) – The number of heads used in multi-head attention.

  • size_per_head (int) – The size of each head used in multi-head attention.

  • decoder_lib (str) – The path to the decoder library. Defaults to None.

  • use_fp16_decoder (bool) – Whether to use fp16 for the decoder. Defaults to False.

  • use_batch_major_op_cache (bool) – Whether to use the batch-major layout for the self-attention cache of the custom op. Defaults to False.
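
Example (a minimal construction sketch; the import path is an assumption and may differ across PaddleNLP versions). It wraps a standard paddle.nn.TransformerDecoder so the custom FasterTransformer op can reuse its weights:

    import paddle.nn as nn
    from paddlenlp.ops import InferTransformerDecoder  # assumed import location

    d_model, n_head = 512, 8

    # A standard Paddle decoder whose trained weights the custom op reuses.
    decoder_layer = nn.TransformerDecoderLayer(d_model, n_head, dim_feedforward=2048)
    decoder = nn.TransformerDecoder(decoder_layer, num_layers=6)

    infer_decoder = InferTransformerDecoder(
        decoder=decoder,
        n_head=n_head,
        size_per_head=d_model // n_head,
        decoder_lib=None,          # optional path to the compiled decoder library
        use_fp16_decoder=False)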

forward(from_tensor, memory_tensor, mem_seq_len, self_cache_key, self_cache_value, mem_cache, step, memory_hidden_dim, is_fuse_qkv)[source]#

Performs a single-step decoding computation with the FasterTransformer decoder op.

Parameters:
  • from_tensor (Tensor) – The input of the current decoding step.

  • memory_tensor (Tensor) – The output of the encoder, used as the memory of cross-attention.

  • mem_seq_len (Tensor) – The lengths of the source (memory) sequences.

  • self_cache_key (Tensor) – The cached keys of decoder self-attention from previous steps.

  • self_cache_value (Tensor) – The cached values of decoder self-attention from previous steps.

  • mem_cache (Tensor) – The cached projected keys and values of the encoder output used by cross-attention.

  • step (int) – The current decoding step.

  • memory_hidden_dim (int) – The hidden size of the encoder output.

  • is_fuse_qkv (bool) – Whether the query, key and value projections are fused.

class FasterDecoder(src_vocab_size, trg_vocab_size, max_length, num_encoder_layers, num_decoder_layers, n_head, d_model, d_inner_hid, dropout, weight_sharing, bos_id=0, eos_id=1, max_out_len=256, decoder_lib=None, use_fp16_decoder=False, use_batch_major_op_cache=False)[source]#

FasterTransformer decoder for auto-regressive generation.

Parameters:
  • src_vocab_size (int) – The size of source vocabulary.

  • trg_vocab_size (int) – The size of target vocabulary.

  • max_length (int) – The maximum length of input sequences.

  • num_encoder_layers (int) – The number of sub-layers to be stacked in the encoder.

  • num_decoder_layers (int) – The number of sub-layers to be stacked in the decoder.

  • n_head (int) – The number of heads used in multi-head attention.

  • d_model (int) – The dimension for word embeddings, which is also the last dimension of the input and output of multi-head attention, position-wise feed-forward networks, encoder and decoder.

  • d_inner_hid (int) – Size of the hidden layer in position-wise feed-forward networks.

  • dropout (float) – The dropout rate used in pre-process, activation and attention.

  • weight_sharing (bool) – Whether to use weight sharing.

  • bos_id (int, optional) – The start token id, which is also used as the padding id. Defaults to 0.

  • eos_id (int, optional) – The end token id. Defaults to 1.

  • max_out_len (int, optional) – The maximum output length. Defaults to 256.

  • decoder_lib (str) – The path to the decoder library. Defaults to None.

  • use_fp16_decoder (bool) – Whether to use fp16 for the decoder. Defaults to False.

  • use_batch_major_op_cache (bool) – Whether to use the batch-major layout for the self-attention cache of the custom op. Defaults to False.
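
Example (a construction sketch using the constructor arguments listed above; the import path is an assumption and the hyperparameters are illustrative base-Transformer values):

    from paddlenlp.ops import FasterDecoder  # assumed import location

    model = FasterDecoder(
        src_vocab_size=30000,
        trg_vocab_size=30000,
        max_length=256,
        num_encoder_layers=6,
        num_decoder_layers=6,
        n_head=8,
        d_model=512,
        d_inner_hid=2048,
        dropout=0.1,
        weight_sharing=True,
        bos_id=0,
        eos_id=1,
        max_out_len=256,
        decoder_lib=None,          # optional path to the compiled decoder library
        use_fp16_decoder=False)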

forward(src_word)[source]#

Performs auto-regressive generation given the source sequences.

Parameters:
  • src_word (Tensor) – The ids of the source sequence words. It is a tensor with shape [batch_size, source_sequence_length].
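
Example (a hedged usage sketch: src_word holds source token ids padded with bos_id, and the model runs encoding followed by step-by-step FasterTransformer decoding; the exact return format depends on the PaddleNLP version, so it is only named generically here):

    import paddle

    # Two source sentences, padded with bos_id (0) to shape [batch_size, source_length].
    src_word = paddle.to_tensor([[5, 24, 318, 7, 1, 0],
                                 [9, 76, 2, 54, 13, 1]], dtype="int64")

    model.eval()
    with paddle.no_grad():
        outputs = model(src_word)  # generated ids for each source sequence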