Common Pile v0.1 Raw Data Collection 8TB of public domain and openly licensed text • 30 items • Updated 1 day ago • 5
Common Pile v0.1 Filtered Data Collection An LLM pre-training dataset produced by filtering and deduplicating the raw text collected in the Common Pile v0.1 • 31 items • Updated 1 day ago • 7
Article ScreenSuite - The most comprehensive evaluation suite for GUI Agents! 1 day ago • 18
Article Explore, Build, and Innovate AI Reasoning with NVIDIA’s Open Models and Recipes By nvidia and 2 others • 3 days ago • 16
Article Holo1: New family of GUI automation VLMs powering GUI agent Surfer-H By Hcompany and 1 other • 4 days ago • 60
BioReason Collection BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model • 3 items • Updated 5 days ago • 9
ConTEB training datasets Collection Training data for the InSeNT method. • 3 items • Updated 5 days ago • 1
ConTEB evaluation datasets Collection Evaluation datasets of the ConTEB benchmark. Use "test" split where available, otherwise "validation", otherwise "train". • 8 items • Updated 5 days ago • 1
Article *Context Is Gold to Find the Gold Passage*: Evaluating and Training Contextual Document Embeddings By manu and 1 other • 5 days ago • 23
Comma v0.1 Artifacts Collection A collection of artifacts related to Comma v0.1—a 7B parameter LLM trained on public domain and openly licensed text • 3 items • Updated 1 day ago • 3
Article Interactive Tools for machine learning, deep learning, and math By Suzana • 12 days ago • 40
Article Tiny Agents in Python: a MCP-powered agent in ~70 lines of code By celinah and 3 others • 15 days ago • 122
Changelog Xet is now the default storage option for new users and organizations 15 days ago • 58
Article NVIDIA Cosmos Now Available On Hugging Face For Physical AI Reasoning By PranjaliJoshi and 1 other • 19 days ago • 24
LightLab: Controlling Light Sources in Images with Diffusion Models Paper • 2505.09608 • Published 24 days ago • 31
SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing Paper • 2505.02370 • Published May 5 • 14
Article The Transformers Library: standardizing model definitions By lysandre and 3 others • 23 days ago • 112
Article Highlights from the First ICLR 2025 Watermarking Workshop By hadyelsahar and 4 others • 24 days ago • 11