ModernGBERT: German-only 1B Encoder Model Trained from Scratch
Abstract
ModernGBERT, a family of German encoders trained from scratch, and LLäMmlein2Vec, encoders derived from German decoders via LLM2Vec, are introduced; ModernGBERT 1B surpasses prior state-of-the-art German encoders in both performance and parameter efficiency across various NLP tasks.
Despite the prominence of decoder-only language models, encoders remain crucial for resource-constrained applications. We introduce ModernGBERT (134M, 1B), a fully transparent family of German encoder models trained from scratch, incorporating architectural innovations from ModernBERT. To evaluate the practical trade-offs of training encoders from scratch, we also present LLäMmlein2Vec (120M, 1B, 7B), a family of encoders derived from German decoder-only models via LLM2Vec. We benchmark all models on natural language understanding, text embedding, and long-context reasoning tasks, enabling a controlled comparison between dedicated encoders and converted decoders. Our results show that ModernGBERT 1B outperforms prior state-of-the-art German encoders as well as encoders adapted via LLM2Vec, with regard to performance and parameter-efficiency. All models, training data, checkpoints and code are publicly available, advancing the German NLP ecosystem with transparent, high-performance encoder models.
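Since the paper positions these models for text-embedding use, here is a minimal sketch of extracting sentence embeddings from the released encoder with Hugging Face transformers. The repository id `LSX-UniWue/ModernGBERT_1B` is assumed from the public release and may differ; mean pooling is a common default rather than the paper's exact evaluation setup, and a transformers version with ModernBERT support (>= 4.48) is required.

```python
# Minimal sketch: sentence embeddings from a German encoder via transformers.
# Assumptions: the repo id "LSX-UniWue/ModernGBERT_1B" matches the release,
# and mean pooling is used (a common default, not necessarily the paper's setup).
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "LSX-UniWue/ModernGBERT_1B"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

sentences = [
    "Encoder bleiben für ressourcenschonende Anwendungen wichtig.",
    "ModernGBERT ist ein deutsches Encoder-Modell.",
]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq_len, hidden_dim)

# Mean-pool over non-padding tokens to obtain one vector per sentence.
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)

# Cosine similarity between the two sentence embeddings.
sim = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(f"cosine similarity: {sim.item():.3f}")
```

The same pooling recipe applies to the smaller 134M variant; only the repository id changes.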
Community
Our New and Modern German Encoder Model - Entirely trained from scratch 🚀
This is an automated message from the Librarian Bot. The following papers, similar to this one, were recommended by the Semantic Scholar API:
- ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance (2025)
- llm-jp-modernbert: A ModernBERT Model Trained on a Large-Scale Japanese Corpus with Long Context Length (2025)
- From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models (2025)
- Leveraging Decoder Architectures for Learned Sparse Retrieval (2025)
- Encoder-Decoder Gemma: Improving the Quality-Efficiency Trade-Off via Adaptation (2025)
- CMLFormer: A Dual Decoder Transformer with Switching Point Learning for Code-Mixed Language Modeling (2025)
- Llama-3-Nanda-10B-Chat: An Open Generative Large Language Model for Hindi (2025)