- The Ultra-Scale Playbook
  🌌 The ultimate guide to training LLMs on large GPU clusters • 3.08k
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
  Paper • 2504.02587 • Published • 32
- RLHF Workflow: From Reward Modeling to Online RLHF
  Paper • 2405.07863 • Published • 72
- microsoft/Magma-8B
  Image-Text-to-Text • 9B • Updated • 12.1k • 405
Collections including paper arxiv:2405.07863
- Benchmarking Agentic Workflow Generation
  Paper • 2410.07869 • Published • 28
- GenAgent: Build Collaborative AI Systems with Automated Workflow Generation -- Case Studies on ComfyUI
  Paper • 2409.01392 • Published • 9
- HDFlow: Enhancing LLM Complex Problem-Solving with Hybrid Thinking and Dynamic Workflows
  Paper • 2409.17433 • Published • 9
- FlowMind: Automatic Workflow Generation with LLMs
  Paper • 2404.13050 • Published • 35
- RLHFlow/ArmoRM-Llama3-8B-v0.1
  Text Classification • 8B • Updated • 11.6k • 179
- RLHFlow/pair-preference-model-LLaMA3-8B
  Text Generation • 8B • Updated • 1.66k • 38
- sfairXC/FsfairX-LLaMA3-RM-v0.1
  Text Classification • 8B • Updated • 2.35k • 60
- RLHF Workflow: From Reward Modeling to Online RLHF
  Paper • 2405.07863 • Published • 72
- mDPO: Conditional Preference Optimization for Multimodal Large Language Models
  Paper • 2406.11839 • Published • 40
- Pandora: Towards General World Model with Natural Language Actions and Video States
  Paper • 2406.09455 • Published • 15
- WPO: Enhancing RLHF with Weighted Preference Optimization
  Paper • 2406.11827 • Published • 15
- In-Context Editing: Learning Knowledge from Self-Induced Distributions
  Paper • 2406.11194 • Published • 15
- KTO: Model Alignment as Prospect Theoretic Optimization
  Paper • 2402.01306 • Published • 17
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 63
- SimPO: Simple Preference Optimization with a Reference-Free Reward
  Paper • 2405.14734 • Published • 11
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
  Paper • 2408.06266 • Published • 10
- RLHF Workflow: From Reward Modeling to Online RLHF
  Paper • 2405.07863 • Published • 72
- Understanding and Diagnosing Deep Reinforcement Learning
  Paper • 2406.16979 • Published • 9
- Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
  Paper • 2404.03715 • Published • 62
- Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
  Paper • 2407.00617 • Published • 7
- RLHF Workflow: From Reward Modeling to Online RLHF
  Paper • 2405.07863 • Published • 72
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 131
- Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
  Paper • 2405.15574 • Published • 56
- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 90
- RLHFlow/prompt-collection-v0.1
  Viewer • Updated • 179k • 37 • 9
- RLHFlow/pair-preference-model-LLaMA3-8B
  Text Generation • 8B • Updated • 1.66k • 38
- sfairXC/FsfairX-LLaMA3-RM-v0.1
  Text Classification • 8B • Updated • 2.35k • 60
- RLHFlow/SFT-OpenHermes-2.5-Standard
  Viewer • Updated • 1M • 7 • 3