Stefan Schweter PRO

stefan-it

AI & ML interests

Flair Library 💕, NER & PoS Tagging, LM Pretraining (mostly encoder-only & encoder-decoder), Historical Language Models, German Language Models

Recent Activity

commented on a paper about 10 hours ago

The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text

upvoted a paper about 10 hours ago

The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text

upvoted a paper about 10 hours ago

Static Word Embeddings for Sentence Semantic Representation

stefan-it's activity

upvoted 2 papers about 10 hours ago

The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text

Paper • 2506.05209 • Published 1 day ago • 16

Static Word Embeddings for Sentence Semantic Representation

Paper • 2506.04624 • Published 1 day ago • 2

upvoted 2 papers 2 days ago

Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training

Paper • 2506.01732 • Published 4 days ago • 1

XToM: Exploring the Multilingual Theory of Mind for Large Language Models

Paper • 2506.02461 • Published 3 days ago • 1

upvoted an article 3 days ago

No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL

Article • By … and 5 others • 4 days ago • 37

upvoted 3 papers 3 days ago

EmoBench-UA: A Benchmark Dataset for Emotion Detection in Ukrainian

Paper • 2505.23297 • Published 8 days ago • 1

LLM in the Loop: Creating the PARADEHATE Dataset for Hate Speech Detoxification

Paper • 2506.01484 • Published 4 days ago • 4

Novel Benchmark for NER in the Wastewater and Stormwater Domain

Paper • 2506.01938 • Published 4 days ago • 1

upvoted a collection 4 days ago

Common Pile v0.1

All resources related to Common Pile v0.1, an 8TB dataset of public domain and openly licensed text • 4 items • Updated about 11 hours ago • 10

upvoted a collection 10 days ago

ModernGBERT

3 items • Updated 10 days ago • 4

upvoted a paper 10 days ago

ModernGBERT: German-only 1B Encoder Model Trained from Scratch

Paper • 2505.13136 • Published 18 days ago • 21

upvoted 3 papers 11 days ago

On Relation-Specific Neurons in Large Language Models

Paper • 2502.17355 • Published Feb 24 • 9

Understanding Gated Neurons in Transformers from Their Input-Output Functionality

Paper • 2505.17936 • Published 14 days ago • 1

Language Mixing in Reasoning Language Models: Patterns, Impact, and Internal Causes

Paper • 2505.14815 • Published 17 days ago • 1

upvoted a paper 14 days ago

Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models

Paper • 2505.16538 • Published 15 days ago • 2

upvoted a paper 15 days ago

Tracing Multilingual Factual Knowledge Acquisition in Pretraining

Paper • 2505.14824 • Published 17 days ago • 4

upvoted a paper 16 days ago

Probing BERT for German Compound Semantics

Paper • 2505.14130 • Published 17 days ago • 1

upvoted a collection 17 days ago

Gemma 3n Preview

2 items • Updated 7 days ago • 110

upvoted a paper 23 days ago

Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Paper • 2505.03335 • Published May 6 • 168

upvoted a paper 27 days ago

When Bad Data Leads to Good Models

Paper • 2505.04741 • Published 30 days ago • 1