Article ScreenSuite - The most comprehensive evaluation suite for GUI Agents! 1 day ago • 18
Comma v0.1 Artifacts Collection A collection of artifacts related to Comma v0.1—a 7B parameter LLM trained on public domain and openly licensed text • 3 items • Updated 1 day ago • 3
Common Pile v0.1 Filtered Data Collection An LLM pre-training dataset produced by filtering and deduplicating the raw text collected in the Common Pile v0.1 • 31 items • Updated 1 day ago • 7
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text Paper • 2506.05209 • Published 1 day ago • 25
Common Pile v0.1 Collection All resources related to Common Pile v0.1, an 8TB dataset of public domain and openly licensed text • 4 items • Updated 1 day ago • 13
Article Announcing the Common Pile and Comma v0.1 By common-pile • about 19 hours ago • 12
GRMR V3 Models Collection An improved set of models for grammar correction. (Chat template should work, no "responding as an LLM" anymore, that kind of stuff). • 6 items • Updated 3 days ago • 9
Teuken-7B-v0.4 Collection OpenGPT-X Teuken 7B models trained on 4 trillion tokens • 4 items • Updated Dec 6, 2024 • 4
Pinecone Inference Collection Models available in the Inference API • 2 items • Updated Sep 9, 2024 • 2
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory Paper • 2504.19413 • Published Apr 28 • 17
Shifting AI Efficiency From Model-Centric to Data-Centric Compression Paper • 2505.19147 • Published 13 days ago • 144
Emerging Properties in Unified Multimodal Pretraining Paper • 2505.14683 • Published 18 days ago • 129
Article 🥬 LettuceDetect Goes Multilingual: Fine-tuning EuroBERT on Synthetic Translations By adaamko and 1 other • 19 days ago • 9
RAGTruth LLM Translations Collection This collection includes our translated training data that we've used to create multilingual hallucination detection models. • 8 items • Updated 20 days ago • 3