Announcing the Common Pile and Comma v0.1
We are a group of researchers working together to collect and curate openly licensed and public domain data for training large language models. So far, we have released:
If you're interested in contributing, please open an issue on GitHub!