Fine-tune of the Qwen-2.5-7B model on a dump of DTF posts and comments.
Nikita Sushko
chameleon-lizard
AI & ML interests
NLP, Multilingual Models, Multiagent Systems
Recent Activity
Upvoted a paper about 10 hours ago: Diagonal Batching Unlocks Parallelism in Recurrent Memory Transformers for Long Contexts
Upvoted a paper 9 days ago: Exploring the Latent Capacity of LLMs for One-Step Text Generation
Upvoted a paper 11 days ago: Quartet: Native FP4 Training Can Be Optimal for Large Language Models