modeling
class OPTModel(vocab_size: int, hidden_size: int = 768, word_embed_proj_dim: int = 768, num_hidden_layers: int = 12, num_attention_heads: int = 12, intermediate_size: int = 3072, hidden_act: str = 'relu', hidden_dropout_prob: float = 0.1, attention_probs_dropout_prob: float = 0.1, max_position_embeddings: int = 512, type_vocab_size: int = 16, initializer_range: float = 0.02, pad_token_id: int = 0, eos_token_id: int = 7, bos_token_id: int = 0, eol_token_id: int = 3, normalize_before: bool = True, **kwargs)

Bases: paddlenlp.transformers.opt.modeling.OPTPretrainedModel

The bare OPT Model transformer outputting raw hidden-states.

This model inherits from PretrainedModel. Refer to the superclass documentation for the generic methods.

This model is also a Paddle paddle.nn.Layer subclass. Use it as a regular Paddle Layer and refer to the Paddle documentation for all matters related to general usage and behavior.
Parameters
- vocab_size (int) -- Vocabulary size of inputs_ids in OPTModel. Also the vocabulary size of the token embedding matrix. Defines the number of different tokens that can be represented by the inputs_ids passed when calling OPTModel.
- hidden_size (int, optional) -- Dimensionality of the embedding layer and decoder layers. Defaults to 768.
- num_hidden_layers (int, optional) -- Number of hidden layers in the Transformer decoder. Defaults to 12.
- num_attention_heads (int, optional) -- Number of attention heads for each attention layer in the Transformer decoder. Defaults to 12.
- intermediate_size (int, optional) -- Dimensionality of the feed-forward (ff) layer in the decoder. Input tensors to ff layers are first projected from hidden_size to intermediate_size, and then projected back to hidden_size. Typically intermediate_size is larger than hidden_size. Defaults to 3072.
- hidden_act (str, optional) -- The non-linear activation function in the feed-forward layer. "gelu", "relu" and any other Paddle-supported activation functions are supported. Defaults to "relu".
- hidden_dropout_prob (float, optional) -- The dropout probability for all fully connected layers in the embeddings and decoder. Defaults to 0.1.
- attention_probs_dropout_prob (float, optional) -- The dropout probability used in MultiHeadAttention in all decoder layers to drop some attention targets. Defaults to 0.1.
- max_position_embeddings (int, optional) -- The maximum value of the dimensionality of position encoding, which dictates the maximum supported length of an input sequence. Defaults to 512.
- type_vocab_size (int, optional) -- The vocabulary size of token_type_ids. Defaults to 16.
  Note: Please do not use type_vocab_size; it will be obsolete in the future.
- initializer_range (float, optional) -- The standard deviation of the normal initializer. Defaults to 0.02.
  Note: A normal_initializer initializes weight matrices as normal distributions. See OPTPretrainedModel._init_weights() for how weights are initialized in OPTModel.
- pad_token_id (int, optional) -- The index of the padding token in the token vocabulary. Defaults to 0.
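The keyword arguments above can be passed directly to the constructor to build a randomly initialized model. A minimal sketch (not part of the original reference; the parameter values shown here are illustrative only):

import paddle
from paddlenlp.transformers import OPTModel

# Randomly initialized OPTModel; vocab_size is required, the remaining
# arguments fall back to the defaults documented above if omitted.
model = OPTModel(
    vocab_size=50272,        # illustrative value
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    intermediate_size=3072,
    hidden_act='relu',
    max_position_embeddings=512,
)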
forward(input_ids, position_ids=None, attention_mask=None, use_cache=False, cache=None)

The OPTModel forward method, overrides the __call__() special method.

Parameters
- input_ids (Tensor) -- Indices of input sequence tokens in the vocabulary. They are numerical representations of tokens that build the input sequence. Its data type should be int64 and it has a shape of [batch_size, sequence_length].
- position_ids (Tensor, optional) -- Indices of positions of each input sequence token in the position embeddings. Selected in the range [0, max_position_embeddings - 1]. Shape is (batch_size, num_tokens) and dtype is int64. Defaults to None.
- attention_mask (Tensor, optional) -- Mask used in self-attention to avoid performing attention on some unwanted positions, usually the subsequent positions. It is a tensor with shape broadcast to [batch_size, num_attention_heads, sequence_length, sequence_length]. For example, its shape can be [batch_size, sequence_length], [batch_size, sequence_length, sequence_length] or [batch_size, num_attention_heads, sequence_length, sequence_length]. Its data type should be float32. The masked tokens have -1e9 values, and the unmasked tokens have 0 values. Defaults to None, which means no positions are prevented from being attended to.
- use_cache (bool, optional) -- Whether or not to use cache. Defaults to False. If set to True, key value states will be returned and can be used to speed up decoding.
- cache (list, optional) -- It is a list, and each element in the list is a tuple (incremental_cache, static_cache). See TransformerDecoder.gen_cache for more details. It is only used for inference and should be None for training. Defaults to None.
Returns
Returns tensor encoder_output, which is the output at the last layer of the model. Its data type should be float32 and it has a shape of [batch_size, sequence_length, hidden_size].

Return type
Tensor

Example

import paddle
from paddlenlp.transformers import OPTModel, GPTTokenizer

tokenizer = GPTTokenizer.from_pretrained('facebook/opt-125m')
model = OPTModel.from_pretrained('facebook/opt-125m')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!", return_token_type_ids=False)
inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}
output = model(**inputs)
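The attention_mask parameter described above is an additive float mask: 0 for positions that may be attended to and a large negative value for positions to be ignored. A minimal sketch (not from the original reference) of building such a mask in the [batch_size, sequence_length] form and passing it to forward; the masked position is chosen arbitrarily for illustration:

import paddle
from paddlenlp.transformers import OPTModel, GPTTokenizer

tokenizer = GPTTokenizer.from_pretrained('facebook/opt-125m')
model = OPTModel.from_pretrained('facebook/opt-125m')

inputs = tokenizer("Masking example", return_token_type_ids=False)
input_ids = paddle.to_tensor([inputs['input_ids']])
seq_len = input_ids.shape[1]

# Additive mask of shape [batch_size, sequence_length]: 0 keeps a position,
# -1e9 masks it out. Here the last position is masked purely as an example.
mask_keep = paddle.zeros([1, seq_len - 1], dtype='float32')
mask_drop = paddle.full([1, 1], -1e9, dtype='float32')
attention_mask = paddle.concat([mask_keep, mask_drop], axis=1)

output = model(input_ids=input_ids, attention_mask=attention_mask)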
class OPTPretrainedModel(*args, **kwargs)

Bases: paddlenlp.transformers.model_utils.PretrainedModel

An abstract class for pretrained OPT models. It provides OPT related model_config_file, resource_files_names, pretrained_resource_files_map, pretrained_init_configuration and base_model_prefix for downloading and loading pretrained models. See PretrainedModel for more details.

base_model_class
    alias of paddlenlp.transformers.opt.modeling.OPTModel
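The downloading, loading and saving helpers come from the PretrainedModel superclass rather than from this module. A brief usage sketch (not part of the original reference; the local directory path is illustrative):

from paddlenlp.transformers import OPTModel

# Inherited from PretrainedModel: load pretrained weights by name,
# then save and reload them from a local directory.
model = OPTModel.from_pretrained('facebook/opt-125m')
model.save_pretrained('./opt-125m-local')   # illustrative path
model = OPTModel.from_pretrained('./opt-125m-local')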
class OPTForCausalLM(opt: paddlenlp.transformers.opt.modeling.OPTModel)

Bases: paddlenlp.transformers.opt.modeling.OPTPretrainedModel

The OPT Model with a language modeling head on top.

forward(input_ids, position_ids=None, attention_mask=None, use_cache=False, cache=None)

Parameters
The arguments are the same as those of OPTModel.forward(); see that method for the description of input_ids, position_ids, attention_mask, use_cache and cache.
Returns
Returns tensor logits or tuple (logits, cached_kvs). If use_cache is True, tuple (logits, cached_kvs) will be returned. Otherwise, tensor logits will be returned. logits is the output of the OPT model. cached_kvs is the cache output of the OPT model if use_cache is True.

Return type
Tensor or tuple

Example

import paddle
from paddlenlp.transformers import OPTForCausalLM, GPTTokenizer

tokenizer = GPTTokenizer.from_pretrained('facebook/opt-125m')
model = OPTForCausalLM.from_pretrained('facebook/opt-125m')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
inputs = {k: paddle.to_tensor([v]) for (k, v) in inputs.items()}

output_ids, score = model.generate(input_ids=inputs['input_ids'])
print(tokenizer.batch_decode(output_ids[0]))
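The Returns description above can also be exercised by calling forward directly. A minimal sketch (not from the original reference), assuming that passing use_cache=True with cache=None lets the model create the cache internally, as described for OPTModel.forward:

import paddle
from paddlenlp.transformers import OPTForCausalLM, GPTTokenizer

tokenizer = GPTTokenizer.from_pretrained('facebook/opt-125m')
model = OPTForCausalLM.from_pretrained('facebook/opt-125m')

inputs = tokenizer("Welcome to use PaddlePaddle and PaddleNLP!")
input_ids = paddle.to_tensor([inputs['input_ids']])

# With use_cache=True the documented return value is the tuple
# (logits, cached_kvs); cached_kvs can be fed back as `cache` on the
# next step to speed up decoding.
logits, cached_kvs = model(input_ids=input_ids, use_cache=True, cache=None)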