ALBERT Model Summary

The table below summarizes the pretrained weights of the ALBERT models currently supported by PaddleNLP. For details of each model, please refer to the corresponding links.

| Pretrained Weight | Language | Details of the model |
|---|---|---|
| albert-base-v1 | English | 12 repeating layers, 128 embedding, 768-hidden, 12-heads, 11M parameters. ALBERT base model |
| albert-large-v1 | English | 24 repeating layers, 128 embedding, 1024-hidden, 16-heads, 17M parameters. ALBERT large model |
| albert-xlarge-v1 | English | 24 repeating layers, 128 embedding, 2048-hidden, 16-heads, 58M parameters. ALBERT xlarge model |
| albert-xxlarge-v1 | English | 12 repeating layers, 128 embedding, 4096-hidden, 64-heads, 223M parameters. ALBERT xxlarge model |
| albert-base-v2 | English | 12 repeating layers, 128 embedding, 768-hidden, 12-heads, 11M parameters. ALBERT base model (version 2) |
| albert-large-v2 | English | 24 repeating layers, 128 embedding, 1024-hidden, 16-heads, 17M parameters. ALBERT large model (version 2) |
| albert-xlarge-v2 | English | 24 repeating layers, 128 embedding, 2048-hidden, 16-heads, 58M parameters. ALBERT xlarge model (version 2) |
| albert-xxlarge-v2 | English | 12 repeating layers, 128 embedding, 4096-hidden, 64-heads, 223M parameters. ALBERT xxlarge model (version 2) |
| albert-chinese-tiny | Chinese | 4 repeating layers, 128 embedding, 312-hidden, 12-heads, 4M parameters. ALBERT tiny model (Chinese) |
| albert-chinese-small | Chinese | 6 repeating layers, 128 embedding, 384-hidden, 12-heads, _M parameters. ALBERT small model (Chinese) |
| albert-chinese-base | Chinese | 12 repeating layers, 128 embedding, 768-hidden, 12-heads, 12M parameters. ALBERT base model (Chinese) |
| albert-chinese-large | Chinese | 24 repeating layers, 128 embedding, 1024-hidden, 16-heads, 18M parameters. ALBERT large model (Chinese) |
| albert-chinese-xlarge | Chinese | 24 repeating layers, 128 embedding, 2048-hidden, 16-heads, 60M parameters. ALBERT xlarge model (Chinese) |
| albert-chinese-xxlarge | Chinese | 12 repeating layers, 128 embedding, 4096-hidden, 16-heads, 235M parameters. ALBERT xxlarge model (Chinese) |
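
Any of the weights above can be loaded by name through PaddleNLP's `from_pretrained` interface. The snippet below is a minimal sketch assuming the standard `AlbertModel` and `AlbertTokenizer` classes in `paddlenlp.transformers`; the first call downloads and caches the selected pretrained weight.

```python
import paddle
from paddlenlp.transformers import AlbertModel, AlbertTokenizer

# Pick any pretrained weight name from the table above,
# e.g. "albert-chinese-tiny" for one of the Chinese weights.
MODEL_NAME = "albert-base-v1"

tokenizer = AlbertTokenizer.from_pretrained(MODEL_NAME)
model = AlbertModel.from_pretrained(MODEL_NAME)
model.eval()

# Tokenize a sentence and add a batch dimension to the resulting id lists.
inputs = tokenizer("Welcome to use PaddleNLP!")
inputs = {k: paddle.to_tensor([v]) for k, v in inputs.items()}

# Forward pass; for albert-base-v1 the hidden size is 768,
# matching the "768-hidden" entry in the table.
outputs = model(**inputs)
```

The same pattern works for task-specific heads (for example, `AlbertForSequenceClassification.from_pretrained(MODEL_NAME)`), with the weight name chosen according to the language and model size required.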