Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training Paper • 2506.01732 • Published Jun 2
NLP for Economics 1.2 Collection NLP tools for sentiment analysis and relevance detection • 4 items • Updated Mar 25
Crowdsourced Phrase-Based Tokenization for Low-Resourced Neural Machine Translation: The Case of Fon Language Paper • 2103.08052 • Published Mar 14, 2021