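ALBERT Model Summary
The table below summarizes the pretrained weights for the ALBERT models that PaddleNLP currently supports. For the details of each model, see the corresponding link.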
| Pretrained Weight | Language | Details of the model |
|---|---|---|
| | English | 12 repeating layers, 128 embedding, 768-hidden, 12-heads, 11M parameters. ALBERT base model |
| | English | 24 repeating layers, 128 embedding, 1024-hidden, 16-heads, 17M parameters. ALBERT large model |
| | English | 24 repeating layers, 128 embedding, 2048-hidden, 16-heads, 58M parameters. ALBERT xlarge model |
| | English | 12 repeating layers, 128 embedding, 4096-hidden, 64-heads, 223M parameters. ALBERT xxlarge model |
| | English | 12 repeating layers, 128 embedding, 768-hidden, 12-heads, 11M parameters. ALBERT base model (version2) |
| | English | 24 repeating layers, 128 embedding, 1024-hidden, 16-heads, 17M parameters. ALBERT large model (version2) |
| | English | 24 repeating layers, 128 embedding, 2048-hidden, 16-heads, 58M parameters. ALBERT xlarge model (version2) |
| | English | 12 repeating layers, 128 embedding, 4096-hidden, 64-heads, 223M parameters. ALBERT xxlarge model (version2) |
| | Chinese | 4 repeating layers, 128 embedding, 312-hidden, 12-heads, 4M parameters. ALBERT tiny model (Chinese) |
| | Chinese | 6 repeating layers, 128 embedding, 384-hidden, 12-heads, _M parameters. ALBERT small model (Chinese) |
| | Chinese | 12 repeating layers, 128 embedding, 768-hidden, 12-heads, 12M parameters. ALBERT base model (Chinese) |
| | Chinese | 24 repeating layers, 128 embedding, 1024-hidden, 16-heads, 18M parameters. ALBERT large model (Chinese) |
| | Chinese | 24 repeating layers, 128 embedding, 2048-hidden, 16-heads, 60M parameters. ALBERT xlarge model (Chinese) |
| | Chinese | 12 repeating layers, 128 embedding, 4096-hidden, 16-heads, 235M parameters. ALBERT xxlarge model (Chinese) |
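
The parameter counts in the table stay small because the "repeating layers" share one set of weights and the embedding tables live in the 128-dimensional embedding space. The sketch below is a rough estimate of that arithmetic, not PaddleNLP code. The helper name `albert_param_count` is hypothetical, and the vocabulary size (30000), maximum position count (512), two token types, and a feed-forward width of 4 x hidden are all assumptions that the table does not state.

```python
def albert_param_count(hidden, embedding=128, vocab=30000,
                       max_positions=512, type_vocab=2, intermediate=None):
    """Estimate the parameter count of an ALBERT encoder plus pooler.

    All defaults are assumptions for illustration, not values from the table.
    """
    if intermediate is None:
        intermediate = 4 * hidden  # assumed feed-forward width

    # Factorized embeddings: word, position and token-type tables use the
    # small embedding size, followed by one projection up to the hidden size.
    embeddings = (vocab + max_positions + type_vocab) * embedding
    embeddings += 2 * embedding                 # LayerNorm weight and bias
    embeddings += embedding * hidden + hidden   # projection to hidden size

    # One transformer layer. It is reused at every depth, so it is counted
    # once no matter how many repeating layers the model has.
    attention = 4 * (hidden * hidden + hidden)  # query, key, value, output
    feed_forward = (hidden * intermediate + intermediate
                    + intermediate * hidden + hidden)
    layer_norms = 2 * (2 * hidden)              # two LayerNorms per layer
    shared_layer = attention + feed_forward + layer_norms

    pooler = hidden * hidden + hidden
    return embeddings + shared_layer + pooler


if __name__ == "__main__":
    for name, hidden in [("base", 768), ("large", 1024),
                         ("xlarge", 2048), ("xxlarge", 4096)]:
        print(f"{name}: {albert_param_count(hidden) / 1e6:.1f}M")
```

Under these assumptions the estimates come out at roughly 11.7M, 17.7M, 58.7M and 222.6M, each within about one million of the English rows in the table. Neither the layer count nor the head count appears in the formula, which is why the 12-layer base model and the 12-layer xxlarge model differ only through their hidden size. The Chinese rows likely use a different vocabulary, so this sketch has not been checked against them.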