Checkpoints for the main experiments in "Forgetting Transformer: Softmax Attention with a Forget Gate" (https://arxiv.org/abs/2503.02130).
All four are 760M-parameter text-generation models trained on LongCrawl64 for 48B tokens:

- zhixuan-lin/fox-pro-760m-longcrawl64-48b
- zhixuan-lin/transformer-pro-760m-longcrawl64-48b
- zhixuan-lin/fox-llama-760m-longcrawl64-48b
- zhixuan-lin/transformer-llama-760m-longcrawl64-48b