Aria#


README(From Huggingface)#


language:

  • en library_name: transformers license: apache-2.0 pipeline_tag: image-text-to-text tags:

  • multimodal

  • aria base_model:

  • rhymes-ai/Aria-Base-64K


Aria Model Card#

[Dec 1, 2024] We have released the base models (with native multimodal pre-training) for Aria (Aria-Base-8K and Aria-Base-64K) for research purposes and continue training.

Key features#

  • SoTA Multimodal Native Performance: Aria achieves strong performance on a wide range of multimodal, language, and coding tasks. It is superior in video and document understanding.

  • Lightweight and Fast: Aria is a mixture-of-expert model with 3.9B activated parameters per token. It efficently encodes visual input of variable sizes and aspect ratios.

  • Long Multimodal Context Window: Aria supports multimodal input of up to 64K tokens. It can caption a 256-frame video in 10 seconds.

🔗 Try Aria! · 📖 Blog · 📌 Paper · ⭐ GitHub · 🟣 Discord