The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text Paper • 2506.05209 • Published 1 day ago • 23
Common Pile v0.1 Collection All resources related to Common Pile v0.1, an 8TB dataset of public domain and openly licensed text • 4 items • Updated 1 day ago • 13
Article FineWeb2-C: Help Build Better Language Models in Your Language By davanstrien and 5 others • Dec 23, 2024 • 20
nvidia/Nemotron-Research-Reasoning-Qwen-1.5B Text Generation • Updated 2 days ago • 2.14k • 116
👩‍💻 OlympicCoder Collection Reasoning datasets and models for competitive coding • 4 items • Updated 25 days ago • 17
🧠 Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 24 items • Updated 19 days ago • 148
Article Interactive Tools for machine learning, deep learning, and math By Suzana • 12 days ago • 40
Article The Transformers Library: standardizing model definitions By lysandre and 3 others • 23 days ago • 112