Article • Welcome FalconMamba: The first strong attention-free 7B model • By JingweiZuo and 5 others • Aug 12, 2024 • 112
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory Paper • 2405.08707 • Published May 14, 2024 • 33