Jamba (language model)

Jamba
Developer(s)	AI21 Labs
Initial release	28 March 2024
Type	Large language model; Generative pre-trained transformer; Mamba (deep learning architecture); Mixture of experts; Foundation model
License	Apache 2.0 License

Jamba is an open-weights large language model (LLM) developed by AI21 Labs.^[1]^[2] It is based on Mamba and uses a novel hybrid architecture that combines state space model (SSM) layers with transformer layers.^[3]^[1]^[4] The model has 52 billion parameters and uses a mixture-of-experts (MoE) architecture in which 12 billion parameters are active per token.^[2]^[1] Jamba supports a context window of up to 256K tokens, of which up to 140K tokens fit on a single 80 GB GPU, and at the time of its release it was the largest Mamba-variant LLM created.^[2]^[3]
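To illustrate the state space side of the hybrid architecture, the following Python sketch runs a minimal discrete linear state space recurrence with a diagonal state: h_t = a * h_{t-1} + b * x_t (elementwise) and y_t = c · h_t. It is a simplification and not Mamba or Jamba code. Mamba makes the discretized parameters depend on the input (a "selective" SSM) and computes the recurrence with a hardware-aware parallel scan, and both of those features are omitted here. The function name ssm_scan and all values are hypothetical.

```python
from typing import List, Sequence


def ssm_scan(xs: Sequence[float], a: Sequence[float],
             b: Sequence[float], c: Sequence[float]) -> List[float]:
    """Run a diagonal discrete linear state space model over a scalar input sequence.

    Illustrative sketch only (not Mamba's selective SSM). For each step t and state index i:
        h_t[i] = a[i] * h_{t-1}[i] + b[i] * x_t
        y_t    = sum_i c[i] * h_t[i]
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("a, b and c must have the same length (the state size)")
    h = [0.0] * len(a)  # fixed-size hidden state, independent of sequence length
    ys: List[float] = []
    for x in xs:
        # Update every state dimension from its previous value and the new input.
        h = [a_i * h_i + b_i * x for a_i, h_i, b_i in zip(a, h, b)]
        # Read out a single output value from the current state.
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return ys
```

For example, ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0]) returns [1.0, 0.5, 0.25]: the input impulse decays by the factor a at each step. The state h keeps the same size however long the sequence is, whereas the key-value cache of a transformer's attention layers grows with the number of tokens.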

Jamba performs well on key measures such as throughput and efficiency, and it matches or outperforms other state-of-the-art models in its class on a wide range of benchmarks while supporting a substantially longer context window, which enables use cases that require long inputs.^[1]^[2] The model is released with open weights under the Apache 2.0 license.^[5]^[4]

The company announced plans to release an instruction-tuned beta version on the AI21 Platform.^[6]

Characteristics

Context window size: 256K tokens^[6]
Parameters: 52 billion^[6]
Architecture: Hybrid Mamba (SSM) Transformer using Mixture of Experts (MoE)^[6]
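The difference between total and active parameters in a mixture-of-experts model can be sketched in Python. This is a generic top-k softmax router, not AI21 Labs' implementation; the function names, the toy experts, and all parameter counts are hypothetical and do not describe Jamba's actual layer configuration.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over router logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-probability experts and renormalize their gate weights."""
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x: float, experts: Sequence[Callable[[float], float]],
                logits: Sequence[float], k: int) -> float:
    """Weighted sum of the selected experts' outputs; unselected experts are never called."""
    return sum(w * experts[i](x) for i, w in route_top_k(logits, k))


def param_counts(shared: int, per_expert: int, n_experts: int, k: int) -> Tuple[int, int]:
    """Return (total, active-per-token) parameter counts for a hypothetical toy MoE model."""
    total = shared + n_experts * per_expert
    active = shared + k * per_expert
    return total, active


if __name__ == "__main__":
    experts = [lambda x, s=s: s * x for s in range(4)]  # toy experts: scale input by 0..3
    logits = [0.1, 2.0, 0.3, 1.5]                        # hypothetical router scores
    print(route_top_k(logits, k=2))                       # experts 1 and 3 are chosen
    print(moe_forward(1.0, experts, logits, k=2))
    print(param_counts(shared=4, per_expert=2, n_experts=16, k=2))  # (36, 8)
```

With 16 experts and 2 selected per token, this toy model holds 36 units of parameters but uses only 8 for any given token. The same principle lets Jamba have 52 billion parameters in total while only 12 billion are active per token.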

References

[1]

[2]

[3]

[4]

[5]

[6]
