LLM course @ HSE and VK LLM
A collection of SmolLM-135M models fine-tuned with DPO, PPO, and Reward Modeling to enhance human-like expressiveness
Daniil Tsesarev
tsessk
AI & ML interests
transformers
Organizations
None yet
Models (14)

tsessk/SmolLM2-FT-Summarization-Aligned • Text Generation • 2B • Updated • 3
tsessk/SmolLM2-FT-Summarization • 2B • Updated • 3
tsessk/smollm-sft-xsum • 0.1B • Updated • 4
tsessk/Qwen2-0.5B-TLDR • Updated
tsessk/qwen2-0.5b-tldr-lora • Updated
tsessk/llm-course-hw2-dpo • Text Generation • 0.1B • Updated • 4
tsessk/llm-course-hw2-reward-model • Text Classification • 0.1B • Updated • 2
tsessk/llm-course-hw2-ppo • Text Generation • 0.1B • Updated • 4
tsessk/content • Text Classification • 0.1B • Updated • 2
tsessk/llm-course-hw1 • 0.1B • Updated • 2
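
The checkpoints above are ordinary Hugging Face model repositories, so they can be pulled directly with `transformers`. Below is a minimal sketch of loading the DPO-aligned SmolLM checkpoint and sampling a completion; the prompt and generation settings are illustrative assumptions, not the course's evaluation setup, and it assumes the repo holds full model weights (unlike `tsessk/qwen2-0.5b-tldr-lora`, which is a LoRA adapter).

```python
# Minimal sketch: load one of the 0.1B DPO-aligned checkpoints listed above
# and sample a completion. Prompt and sampling parameters are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsessk/llm-course-hw2-dpo"  # or tsessk/llm-course-hw2-ppo, etc.

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Tell me about your day."  # illustrative prompt, not from the course material
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```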
Datasets (6)
tsessk/yetanother_tldr • Viewer • Updated • 130k • 3
tsessk/tldr-17-truncated-tokenized • Viewer • Updated • 130k • 2
tsessk/tldr-17-t-512 • Viewer • Updated • 3.09M • 3
tsessk/tldr-17-ChatML-tokenized-truncated • Viewer • Updated • 130k • 3
tsessk/tldr-17-ChatML • Viewer • Updated • 3.85M • 16 • 1
tsessk/tldr-17-chat • Viewer • Updated • 3.85M • 7
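
The TL;DR datasets above can likewise be pulled with the `datasets` library. A minimal sketch for inspecting one of them, assuming a standard `train` split (the split and column layout are assumptions about the repo, not documented here):

```python
# Minimal sketch: stream one of the TL;DR datasets listed above and peek at it.
from datasets import load_dataset

ds = load_dataset("tsessk/tldr-17-ChatML", split="train")  # split name is an assumption
print(ds)      # row count and column names
print(ds[0])   # first example
```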