Sainbayar Sukhbaatar

sainbar

https://tesatory.github.io/

AI & ML interests

None yet

Recent Activity

authored a paper 1 day ago

Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge

authored a paper 1 day ago

Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces

authored a paper 1 day ago

Augmenting Self-attention with Persistent Memory

Organizations

None yet

sainbar's activity

authored 7 papers 1 day ago

Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge

Paper • 2407.19594 • Published Jul 28, 2024 • 21

Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces

Paper • 2410.09918 • Published Oct 13, 2024 • 3

upvoted 3 papers 2 days ago

Thinking LLMs: General Instruction Following with Thought Generation

Paper • 2410.10630 • Published Oct 14, 2024 • 21

Multi-Token Attention

Paper • 2504.00927 • Published Apr 1 • 52

Self-Challenging Language Model Agents

Paper • 2506.01716 • Published 4 days ago • 8

authored a paper 2 months ago

Multi-Token Attention

Paper • 2504.00927 • Published Apr 1 • 52

authored a paper 3 months ago

SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks

Paper • 2503.15478 • Published Mar 19 • 11

authored a paper 4 months ago

Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback

Paper • 2501.10799 • Published Jan 18 • 15

authored a paper 6 months ago

Training Large Language Models to Reason in a Continuous Latent Space

Paper • 2412.06769 • Published Dec 9, 2024 • 86

authored a paper 7 months ago

Adaptive Decoding via Latent Preference Optimization

Paper • 2411.09661 • Published Nov 14, 2024 • 10

authored a paper 8 months ago

Thinking LLMs: General Instruction Following with Thought Generation

Paper • 2410.10630 • Published Oct 14, 2024 • 21

upvoted a paper 10 months ago

Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge

Paper • 2407.19594 • Published Jul 28, 2024 • 21

commented on a paper 10 months ago

Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge

Paper • 2407.19594 • Published Jul 28, 2024 • 21

authored 2 papers 10 months ago

Large Language Model Programs

Paper • 2305.05364 • Published May 9, 2023 • 2

End-To-End Memory Networks

Paper • 1503.08895 • Published Mar 31, 2015