encoder#

infer_transformer_encoder(input, attn_mask, q_weight, q_bias, k_weight, k_bias, v_weight, v_bias, attn_out_weight, attn_out_bias, norm1_weight, norm1_bias, norm2_weight, norm2_bias, ffn_inter_weight, ffn_inter_bias, ffn_out_weight, ffn_out_bias, n_head, size_per_head, n_layer=12, use_gelu=True, remove_padding=False, int8_mode=0, layer_idx=0, allow_gemm_test=False, use_trt_kernel=False, normalize_before=False)[source]#

Fusion encoder API integrating encoder inference in FastGeneration. It accepts the weights and biases of a TransformerEncoder, along with some other parameters, for inference.
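
The expected shapes of the weight and bias arguments are not spelled out above. The sketch below shows the standard Transformer layout they presumably follow (d_model = n_head * size_per_head, with a 4x FFN expansion as a common, but not required, choice); these shapes are an assumption for illustration, not taken from the FastGeneration source.

```python
import numpy as np

# Assumed standard Transformer shapes (illustrative only):
n_head, size_per_head = 12, 64
d_model = n_head * size_per_head       # 768
ffn_dim = 4 * d_model                  # 3072; 4x expansion is typical, not required

q_weight = np.zeros((d_model, d_model), dtype="float32")   # likewise k/v/attn_out
q_bias = np.zeros((d_model,), dtype="float32")
norm1_weight = np.ones((d_model,), dtype="float32")        # LayerNorm scale
ffn_inter_weight = np.zeros((d_model, ffn_dim), dtype="float32")
ffn_out_weight = np.zeros((ffn_dim, d_model), dtype="float32")
```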

encoder_layer_forward(self, src, src_mask, cache=None, sequence_id_offset=None, trt_seq_len=None)[source]#

Redefines the forward function of paddle.nn.TransformerEncoderLayer to integrate FastGeneration for inference.

The original forward function is not replaced until enable_fast_encoder is called on objects of its base class. After the replacement, objects of paddle.nn.TransformerEncoderLayer still have the same member variables as before.

After inference, disable_fast_encoder can be called to restore the forward functions of paddle.nn.TransformerEncoder and paddle.nn.TransformerEncoderLayer.

Parameters:
  • src (Tensor) -- The input of the Transformer encoder layer. It is a tensor with shape [batch_size, sequence_length, d_model]. The data type should be float32 or float64.

  • src_mask (Tensor, optional) -- A tensor used in multi-head attention to prevent attention to some unwanted positions, usually the paddings or the subsequent positions. It is a tensor with shape [batch_size, 1, 1, sequence_length]. When the data type is bool, the unwanted positions have False values and the others have True values. When the data type is int, the unwanted positions have 0 values and the others have 1 values. When the data type is float, the unwanted positions have -INF values and the others have 0 values. It can be None when attention does not need to be prevented for any position. Defaults to None.

Returns:

It is a tensor that has the same shape and data type as src, representing the output of the Transformer encoder layer. If cache is not None, it is a tuple that, in addition to the encoder layer output, includes the new cache, which is the same as the input cache argument except that its incremental_cache has an incremented length. See paddle.nn.MultiHeadAttention.gen_cache and paddle.nn.MultiHeadAttention.forward for more details.

Return type:

src (Tensor|tuple)
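
For reference, the computation that the fused forward replaces can be sketched in plain numpy. This is an illustrative post-norm (normalize_before=False) encoder layer with ReLU activation, not the FastGeneration kernel itself, and the parameter-dict keys are hypothetical names chosen for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, gamma, beta, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps) * gamma + beta

def encoder_layer(src, src_mask, p, n_head):
    # src: [batch, seq_len, d_model]; p: dict of parameters (hypothetical keys).
    b, s, d = src.shape
    hd = d // n_head
    def split(x):  # [b, s, d] -> [b, n_head, s, hd]
        return x.reshape(b, s, n_head, hd).transpose(0, 2, 1, 3)
    q = split(src @ p["q_w"] + p["q_b"])
    k = split(src @ p["k_w"] + p["k_b"])
    v = split(src @ p["v_w"] + p["v_b"])
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(hd)
    if src_mask is not None:
        scores = scores + src_mask  # float mask: 0.0 keeps, -INF drops a position
    ctx = softmax(scores) @ v
    ctx = ctx.transpose(0, 2, 1, 3).reshape(b, s, d)
    attn_out = ctx @ p["out_w"] + p["out_b"]
    x = layer_norm(src + attn_out, p["n1_g"], p["n1_b"])         # post-norm
    ffn = np.maximum(x @ p["i_w"] + p["i_b"], 0.0)               # ReLU; use_gelu differs
    return layer_norm(x + ffn @ p["o_w"] + p["o_b"], p["n2_g"], p["n2_b"])
```

The output has the same shape as src, matching the return description above.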

encoder_forward(self, src, src_mask=None, cache=None)[source]#

Redefines the forward function of paddle.nn.TransformerEncoder to integrate FastGeneration for inference.

The original forward function is not replaced until enable_fast_encoder is called on objects of its base class. After the replacement, objects of paddle.nn.TransformerEncoder still have the same member variables as before.

After inference, disable_fast_encoder can be called to restore the forward functions of paddle.nn.TransformerEncoder and paddle.nn.TransformerEncoderLayer.

Parameters:
  • src (Tensor) -- The input of the Transformer encoder. It is a tensor with shape [batch_size, sequence_length, d_model]. The data type should be float32 or float16.

  • src_mask (Tensor, optional) -- A tensor used in multi-head attention to prevent attention to some unwanted positions, usually the paddings or the subsequent positions. It is a tensor with shape [batch_size, 1, 1, sequence_length]. The data type must be float; the unwanted positions have -INF (or other non-zero) values, and the wanted positions must be 0.0.

Returns:

It is a tensor that has the same shape and data type as src, representing the output of the Transformer encoder. If cache is not None, it is a tuple that, in addition to the encoder output, includes the new cache, which is the same as the input cache argument except that its incremental_cache has an incremented length. See paddle.nn.MultiHeadAttention.gen_cache and paddle.nn.MultiHeadAttention.forward for more details.

Return type:

output (Tensor|tuple)
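
Since encoder_forward accepts only the float form of the mask, a padding mask can be built as below. The helper is a minimal sketch assumed for illustration (it is not part of paddlenlp): 0.0 marks positions attention may reach, -INF marks padded positions.

```python
import numpy as np

def build_src_mask(lengths, max_len):
    # lengths: valid sequence length per batch item; padding fills up to max_len.
    valid = np.arange(max_len)[None, :] < np.asarray(lengths)[:, None]
    mask = np.where(valid, 0.0, -np.inf).astype("float32")
    # Shape [batch_size, 1, 1, sequence_length], broadcastable over heads/queries.
    return mask[:, None, None, :]
```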

enable_fast_encoder(self, use_fp16=False, encoder_lib=None)[source]#

Compiles the fusion encoder operator integrated with FastGeneration using JIT (Just-In-Time) compilation, and replaces the forward function of the paddle.nn.TransformerEncoder and paddle.nn.TransformerEncoderLayer objects inherited from self to support inference using FastGeneration.

Example

from paddlenlp.ops import enable_fast_encoder, disable_fast_encoder

model.eval()
model = enable_fast_encoder(model)
enc_out = model(src, src_mask)
model = disable_fast_encoder(model)

disable_fast_encoder(self)[source]#

Restores the original forward functions of the paddle.nn.TransformerEncoder and paddle.nn.TransformerEncoderLayer objects inherited from self.

Example

from paddlenlp.ops import enable_fast_encoder, disable_fast_encoder

model.eval()
model = enable_fast_encoder(model)
enc_out = model(src, src_mask)
model = disable_fast_encoder(model)

convert_to_fp16(transformer_encoder)[source]#

Converts a paddle.nn.TransformerEncoder's parameters from float32 to float16.

Parameters:

transformer_encoder (object, paddle.nn.TransformerEncoder) -- The object whose parameters are converted to float16 in place; it must be an instance of paddle.nn.TransformerEncoder.
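
The float32-to-float16 cast that convert_to_fp16 performs on encoder parameters has precision consequences worth knowing. The snippet below illustrates them with a plain numpy cast (numpy stands in for paddle here; it is not the paddlenlp implementation).

```python
import numpy as np

w32 = np.array([0.1, 1e-8, 70000.0], dtype="float32")
w16 = w32.astype("float16")
# float16 offers ~3 decimal digits of precision and overflows above ~65504:
# 0.1 is rounded slightly, 1e-8 underflows to 0, and 70000.0 becomes inf.
```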