Aria

README (from Hugging Face)

language:
- en
library_name: transformers
license: apache-2.0
pipeline_tag: image-text-to-text
tags:
- multimodal
- aria
base_model:
- rhymes-ai/Aria-Base-64K

Aria Model Card

[Dec 1, 2024] We have released the base models for Aria (Aria-Base-8K and Aria-Base-64K), which come with native multimodal pre-training, for research purposes and continued training.

Key features

SoTA Multimodal Native Performance: Aria achieves strong performance on a wide range of multimodal, language, and coding tasks, and is particularly strong at video and document understanding.
Lightweight and Fast: Aria is a mixture-of-experts (MoE) model with 3.9B activated parameters per token. It efficiently encodes visual input of variable sizes and aspect ratios.
Long Multimodal Context Window: Aria supports multimodal input of up to 64K tokens. It can caption a 256-frame video in 10 seconds.
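The "3.9B activated parameters per token" figure comes from the mixture-of-experts design: a router sends each token to only a few experts, so only part of the total weights runs per token. The sketch below illustrates generic top-k expert routing under that assumption. It is not Aria's actual router. The expert count, the value of k, the toy scaling "experts", and the helper names (`top_k_route`, `moe_forward`, `activated_params`) are all hypothetical.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalise their gate weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    gates = softmax([router_logits[i] for i in ranked])
    return list(zip(ranked, gates))


def moe_forward(x: Vector, experts: List[Expert],
                router_logits: List[float], k: int) -> Vector:
    """Run only the selected experts and mix their outputs by gate weight."""
    out = [0.0] * len(x)
    for idx, gate in top_k_route(router_logits, k):
        y = experts[idx](x)  # unselected experts are never evaluated
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out


def activated_params(shared: int, per_expert: int, k: int) -> int:
    """Parameters touched per token: shared weights plus k experts (hypothetical sizes)."""
    return shared + k * per_expert


# Hypothetical toy experts: expert i scales its input by (i + 1).
demo_experts: List[Expert] = [
    (lambda v, s=s: [s * vi for vi in v]) for s in range(1, 9)
]
demo_logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]
```

For example, `top_k_route(demo_logits, 2)` selects experts 4 and 1, so only 2 of the 8 toy experts run for that token. In the same spirit, `activated_params(shared=10, per_expert=100, k=2)` counts 210 parameters per token even though all eight experts together hold 800. These numbers are illustrative only and do not reflect Aria's real configuration.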

🔗 Try Aria! · 📖 Blog · 📌 Paper · ⭐ GitHub · 🟣 Discord

