---
license: apache-2.0
datasets:
- cerebras/SlimPajama-627B
language:
- en
---

This model accompanies the paper [MoM: Linear Sequence Modeling with Mixture-of-Memories](https://arxiv.org/abs/2502.13685) and uses the architecture from [Gated Linear Attention Transformers with Hardware-Efficient Training](https://arxiv.org/abs/2312.06635).

The model was trained on a 15B-token sample of SlimPajama-627B.

Due to changes in the MLP layer structure in the latest version of `fla`, these weights cannot be loaded with it. Use the pinned version at [fla @ 8346a33](https://github.com/fla-org/flash-linear-attention/tree/8346a33792558d8e3eb206fe18404de037e11d9c) instead.
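
Below is a minimal loading sketch. It assumes the pinned `fla` commit is installed (e.g. via `pip install git+https://github.com/fla-org/flash-linear-attention.git@8346a33792558d8e3eb206fe18404de037e11d9c`) and that importing `fla` registers the GLA model classes with `transformers`; the `repo_id` is a placeholder for this model's actual Hub path.

```python
# Minimal loading sketch; assumes the pinned fla commit is installed and that
# importing fla registers its model classes with transformers.
import fla  # noqa: F401

from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: replace with this model's actual Hugging Face Hub repo id.
repo_id = "path/to/this-gla-model"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Quick generation smoke test to confirm the weights loaded correctly.
inputs = tokenizer("Linear attention models scale", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```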