encoder

infer_transformer_encoder(input, q_weight, q_bias, k_weight, k_bias, v_weight, v_bias, attn_out_weight, attn_out_bias, attn_mask, norm1_weight, norm1_bias, norm2_weight, norm2_bias, ffn_inter_weight, ffn_inter_bias, ffn_out_weight, ffn_out_bias, n_head, size_per_head, n_layer=12, use_gelu=True, remove_padding=False, int8_mode=0, layer_idx=0, allow_gemm_test=False, use_trt_kernel=False, normalize_before=False)[source]

Fusion Encoder API integrating Encoder inference in FasterTransformer. It accepts the weights and biases of a TransformerEncoder, along with some other parameters, for inference.
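The signature above does not spell out the expected tensor shapes. As a rough sketch (the shapes below are assumptions based on the usual multi-head attention layout, in which d_model = n_head * size_per_head and the FFN commonly expands by 4x; the real op may expect a different layout), the per-layer weight tensors might be prepared like this:

```python
import numpy as np

# Hypothetical shapes for one encoder layer, assuming the standard
# multi-head attention layout: d_model = n_head * size_per_head.
n_head, size_per_head = 12, 64
d_model = n_head * size_per_head   # 768
ffn_hidden = 4 * d_model           # 3072, the common FFN expansion factor

# Attention projection weights/biases (assumed square projections).
q_weight = np.zeros((d_model, d_model), dtype=np.float32)
q_bias = np.zeros((d_model,), dtype=np.float32)

# Feed-forward weights (assumed d_model -> ffn_hidden -> d_model).
ffn_inter_weight = np.zeros((d_model, ffn_hidden), dtype=np.float32)
ffn_out_weight = np.zeros((ffn_hidden, d_model), dtype=np.float32)
```

In practice these tensors would come from a trained paddle.nn.TransformerEncoderLayer rather than being constructed by hand.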

encoder_layer_forward(self, src, src_mask, cache=None, sequence_id_offset=None, trt_seq_len=None)[source]

Redefines the forward function of paddle.nn.TransformerEncoderLayer to integrate FasterTransformer for inference.

The original forward function is not replaced until enable_faster_encoder is called on an object of its base class. After replacement, objects of paddle.nn.TransformerEncoderLayer keep the same member variables as before.

After inference, disable_faster_encoder can be called to restore the original forward functions of paddle.nn.TransformerEncoder and paddle.nn.TransformerEncoderLayer.

Parameters
  • src (Tensor) -- The input of the Transformer encoder layer. It is a tensor with shape [batch_size, sequence_length, d_model]. The data type should be float32 or float64.

  • src_mask (Tensor, optional) -- A tensor used in multi-head attention to prevent attention to some unwanted positions, usually the paddings or the subsequent positions. It is a tensor with shape [batch_size, 1, 1, sequence_length]. When the data type is bool, the unwanted positions have False values and the others have True values. When the data type is int, the unwanted positions have 0 values and the others have 1 values. When the data type is float, the unwanted positions have -INF values and the others have 0 values. It can be None if nothing needs to be prevented from being attended to. Defaults to None.

Returns

A tensor with the same shape and data type as src, representing the output of the Transformer encoder layer. If cache is not None, a tuple is returned instead: besides the encoder layer output, it includes the new cache, which is the same as the input cache argument except that the incremental_cache in it has an incremented length. See paddle.nn.MultiHeadAttention.gen_cache and paddle.nn.MultiHeadAttention.forward for more details.

Return type

src (Tensor|tuple)
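The three src_mask conventions described above (bool, int, and float) encode the same information; a minimal numpy sketch, independent of Paddle, converting a boolean padding mask into the other two forms:

```python
import numpy as np

# Two sequences of length 5; the second has 2 padding positions.
# True marks positions that MAY be attended to, per the docstring.
bool_mask = np.array([[True, True, True, True, True],
                      [True, True, True, False, False]])

# int convention: unwanted positions are 0, the others are 1.
int_mask = bool_mask.astype(np.int64)

# float convention: unwanted positions are -INF, the others are 0,
# so adding the mask to attention scores suppresses those positions.
float_mask = np.where(bool_mask, 0.0, -np.inf)

# Reshape to the documented [batch_size, 1, 1, sequence_length] so the
# mask broadcasts over attention heads and query positions.
float_mask = float_mask.reshape(2, 1, 1, 5)
```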

encoder_forward(self, src, src_mask=None, cache=None)[source]

Redefines the forward function of paddle.nn.TransformerEncoder to integrate FasterTransformer for inference.

The original forward function is not replaced until enable_faster_encoder is called on an object of its base class. After replacement, objects of paddle.nn.TransformerEncoder keep the same member variables as before.

After inference, disable_faster_encoder can be called to restore the original forward functions of paddle.nn.TransformerEncoder and paddle.nn.TransformerEncoderLayer.

Parameters
  • src (Tensor) -- The input of the Transformer encoder. It is a tensor with shape [batch_size, sequence_length, d_model]. The data type should be float32 or float64.

  • src_mask (Tensor, optional) -- A tensor used in multi-head attention to prevent attention to some unwanted positions, usually the paddings or the subsequent positions. It is a tensor with shape [batch_size, 1, 1, sequence_length]. When the data type is bool, the unwanted positions have False values and the others have True values. When the data type is int, the unwanted positions have 0 values and the others have 1 values. When the data type is float, the unwanted positions have -INF values and the others have 0 values. It can be None if nothing needs to be prevented from being attended to. Defaults to None.

Returns

A tensor with the same shape and data type as src, representing the output of the Transformer encoder. If cache is not None, a tuple is returned instead: besides the encoder output, it includes the new cache, which is the same as the input cache argument except that the incremental_cache in it has an incremented length. See paddle.nn.MultiHeadAttention.gen_cache and paddle.nn.MultiHeadAttention.forward for more details.

Return type

output (Tensor|tuple)

enable_faster_encoder(self, need_build=True, use_fp16=False)[source]

Compiles the fusion encoder operator integrated with FasterTransformer using JIT (Just-In-Time) compilation, and replaces the forward functions of the paddle.nn.TransformerEncoder and paddle.nn.TransformerEncoderLayer objects contained in self, so that inference runs on FasterTransformer.

Examples

from paddlenlp.ops import enable_faster_encoder, disable_faster_encoder

model.eval()
model = enable_faster_encoder(model)   # replace forward with the fused version
enc_out = model(src, src_mask)         # inference now runs on FasterTransformer
model = disable_faster_encoder(model)  # restore the original forward

disable_faster_encoder(self)[source]

Restores the original forward functions of the paddle.nn.TransformerEncoder and paddle.nn.TransformerEncoderLayer objects contained in self.

Examples

from paddlenlp.ops import enable_faster_encoder, disable_faster_encoder

model.eval()
model = enable_faster_encoder(model)   # replace forward with the fused version
enc_out = model(src, src_mask)         # inference now runs on FasterTransformer
model = disable_faster_encoder(model)  # restore the original forward

convert_to_fp16(transformer_encoder)[source]

Converts the parameters of a paddle.nn.TransformerEncoder from float32 to float16.

Parameters

transformer_encoder (object, paddle.nn.TransformerEncoder) -- The object whose parameters are converted to float16 in place; it must be an instance of paddle.nn.TransformerEncoder.
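The conversion is, conceptually, a dtype cast applied in place to every parameter of the encoder. A minimal numpy sketch of the idea (an illustration of the concept only, not the Paddle implementation):

```python
import numpy as np

# A toy "parameter dict" standing in for an encoder's float32 weights.
params = {
    "q_weight": np.random.randn(4, 4).astype(np.float32),
    "q_bias": np.zeros(4, dtype=np.float32),
}

# Cast every parameter to float16, mirroring what convert_to_fp16
# does to a paddle.nn.TransformerEncoder's parameters in place.
for name in params:
    params[name] = params[name].astype(np.float16)
```

Halving the parameter width this way reduces memory traffic, which is what makes use_fp16=True in enable_faster_encoder attractive for inference.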