FLAME: Factuality-Aware Alignment for Large Language Models
Paper
•
2405.01525
•
Published
•
29
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale
Synthetic Data
Paper
•
2405.14333
•
Published
•
41
Transformers Can Do Arithmetic with the Right Embeddings
Paper
•
2405.17399
•
Published
•
54
EasyAnimate: A High-Performance Long Video Generation Method based on
Transformer Architecture
Paper
•
2405.18991
•
Published
•
12
The Prompt Report: A Systematic Survey of Prompting Techniques
Paper
•
2406.06608
•
Published
•
64
Autoregressive Model Beats Diffusion: Llama for Scalable Image
Generation
Paper
•
2406.06525
•
Published
•
71
Transformers meet Neural Algorithmic Reasoners
Paper
•
2406.09308
•
Published
•
45
Self-MoE: Towards Compositional Large Language Models with
Self-Specialized Experts
Paper
•
2406.12034
•
Published
•
16
A Closer Look into Mixture-of-Experts in Large Language Models
Paper
•
2406.18219
•
Published
•
16
DiffusionPDE: Generative PDE-Solving Under Partial Observation
Paper
•
2406.17763
•
Published
•
25
MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data
Paper
•
2406.18790
•
Published
•
35
Controlling Space and Time with Diffusion Models
Paper
•
2407.07860
•
Published
•
17
Lookback Lens: Detecting and Mitigating Contextual Hallucinations in
Large Language Models Using Only Attention Maps
Paper
•
2407.07071
•
Published
•
12
Open-FinLLMs: Open Multimodal Large Language Models for Financial
Applications
Paper
•
2408.11878
•
Published
•
61
Leveraging Open Knowledge for Advancing Task Expertise in Large Language
Models
Paper
•
2408.15915
•
Published
•
20
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with
100+ NLP Researchers
Paper
•
2409.04109
•
Published
•
49
Training Language Models to Self-Correct via Reinforcement Learning
Paper
•
2409.12917
•
Published
•
139
Scaling Smart: Accelerating Large Language Model Pre-training with Small
Model Initialization
Paper
•
2409.12903
•
Published
•
23
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of
Experts
Paper
•
2409.16040
•
Published
•
14
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
Paper
•
2409.20566
•
Published
•
57
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free
Paper
•
2410.10814
•
Published
•
52
"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM
Quantization
Paper
•
2411.02355
•
Published
•
52
POINTS1.5: Building a Vision-Language Model towards Real World
Applications
Paper
•
2412.08443
•
Published
•
39
Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity
Visual Descriptions
Paper
•
2412.08737
•
Published
•
54
Multimodal Latent Language Modeling with Next-Token Diffusion
Paper
•
2412.08635
•
Published
•
46
Apollo: An Exploration of Video Understanding in Large Multimodal Models
Paper
•
2412.10360
•
Published
•
146
RetroLLM: Empowering Large Language Models to Retrieve Fine-grained
Evidence within Generation
Paper
•
2412.11919
•
Published
•
37
Smaller Language Models Are Better Instruction Evolvers
Paper
•
2412.11231
•
Published
•
29
Learned Compression for Compressed Learning
Paper
•
2412.09405
•
Published
•
13
Paper
•
2412.13501
•
Published
•
29
RobustFT: Robust Supervised Fine-tuning for Large Language Models under
Noisy Response
Paper
•
2412.14922
•
Published
•
89
YuLan-Mini: An Open Data-efficient Language Model
Paper
•
2412.17743
•
Published
•
67
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive
Survey
Paper
•
2412.18619
•
Published
•
59
Task Preference Optimization: Improving Multimodal Large Language Models
with Vision Task Alignment
Paper
•
2412.19326
•
Published
•
18
LUSIFER: Language Universal Space Integration for Enhanced Multilingual
Embeddings with Large Language Models
Paper
•
2501.00874
•
Published
•
13
Personalized Graph-Based Retrieval for Large Language Models
Paper
•
2501.02157
•
Published
•
32
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language
Models
Paper
•
2501.03262
•
Published
•
99
Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video
Generation Control
Paper
•
2501.03847
•
Published
•
23
LLM4SR: A Survey on Large Language Models for Scientific Research
Paper
•
2501.04306
•
Published
•
37
Search-o1: Agentic Search-Enhanced Large Reasoning Models
Paper
•
2501.05366
•
Published
•
101
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction
Paper
•
2501.06282
•
Published
•
52
Transformer^2: Self-adaptive LLMs
Paper
•
2501.06252
•
Published
•
55
ChemAgent: Self-updating Library in Large Language Models Improves
Chemical Reasoning
Paper
•
2501.06590
•
Published
•
11
deepseek-ai/DeepSeek-V3
Text Generation
•
Updated
•
2.29M
•
3.86k
Learnings from Scaling Visual Tokenizers for Reconstruction and
Generation
Paper
•
2501.09755
•
Published
•
37
RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation
Paper
•
2501.08617
•
Published
•
10
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with
Large Language Models
Paper
•
2501.09686
•
Published
•
41
CityDreamer4D: Compositional Generative Model of Unbounded 4D Cities
Paper
•
2501.08983
•
Published
•
20
Evolving Deeper LLM Thinking
Paper
•
2501.09891
•
Published
•
115
HiFi-SR: A Unified Generative Transformer-Convolutional Adversarial
Network for High-Fidelity Speech Super-Resolution
Paper
•
2501.10045
•
Published
•
9
Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D
Assets Generation
Paper
•
2501.12202
•
Published
•
47
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video
Understanding
Paper
•
2501.13106
•
Published
•
91
Autonomy-of-Experts Models
Paper
•
2501.13074
•
Published
•
45
Critique Fine-Tuning: Learning to Critique is More Effective than
Learning to Imitate
Paper
•
2501.17703
•
Published
•
59
Optimizing Large Language Model Training Using FP4 Quantization
Paper
•
2501.17116
•
Published
•
38
WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in
Post-Training
Paper
•
2501.18511
•
Published
•
20
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute
in Linear Diffusion Transformer
Paper
•
2501.18427
•
Published
•
20
Towards General-Purpose Model-Free Reinforcement Learning
Paper
•
2501.16142
•
Published
•
30
Reward-Guided Speculative Decoding for Efficient LLM Reasoning
Paper
•
2501.19324
•
Published
•
40
The Curse of Depth in Large Language Models
Paper
•
2502.05795
•
Published
•
40
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time
Scaling
Paper
•
2502.06703
•
Published
•
153
ARR: Question Answering with Large Language Models via Analyzing,
Retrieving, and Reasoning
Paper
•
2502.04689
•
Published
•
7
Generating Symbolic World Models via Test-time Scaling of Large Language
Models
Paper
•
2502.04728
•
Published
•
19
MetaChain: A Fully-Automated and Zero-Code Framework for LLM Agents
Paper
•
2502.05957
•
Published
•
16
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth
Approach
Paper
•
2502.05171
•
Published
•
142
Scaling Pre-training to One Hundred Billion Data for Vision Language
Models
Paper
•
2502.07617
•
Published
•
29
LLMs Can Easily Learn to Reason from Demonstrations Structure, not
content, is what matters!
Paper
•
2502.07374
•
Published
•
41
Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon
Paper
•
2502.07445
•
Published
•
11
Next Block Prediction: Video Generation via Semi-Autoregressive Modeling
Paper
•
2502.07737
•
Published
•
9
CODESIM: Multi-Agent Code Generation and Problem Solving through
Simulation-Driven Planning and Debugging
Paper
•
2502.05664
•
Published
•
23
LLM Pretraining with Continuous Concepts
Paper
•
2502.08524
•
Published
•
29
Retrieval-augmented Large Language Models for Financial Time Series
Forecasting
Paper
•
2502.05878
•
Published
•
41
Hephaestus: Improving Fundamental Agent Capabilities of Large Language
Models through Continual Pre-Training
Paper
•
2502.06589
•
Published
•
18
Training Language Models for Social Deduction with Multi-Agent
Reinforcement Learning
Paper
•
2502.06060
•
Published
•
37
SelfCite: Self-Supervised Alignment for Context Attribution in Large
Language Models
Paper
•
2502.09604
•
Published
•
36
Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM
Multi-Agent Systems
Paper
•
2502.11098
•
Published
•
13
Large Language Diffusion Models
Paper
•
2502.09992
•
Published
•
119
Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising
Trajectory Sharpening
Paper
•
2502.12146
•
Published
•
16
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning
in Diffusion Models
Paper
•
2502.10458
•
Published
•
36
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
Paper
•
2502.11775
•
Published
•
8
Intuitive physics understanding emerges from self-supervised pretraining
on natural videos
Paper
•
2502.11831
•
Published
•
19
FLAG-Trader: Fusion LLM-Agent with Gradient-based Reinforcement Learning
for Financial Trading
Paper
•
2502.11433
•
Published
•
36
Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o
Under Data Scarsity
Paper
•
2502.11901
•
Published
•
6
LongPO: Long Context Self-Evolution of Large Language Models through
Short-to-Long Preference Optimization
Paper
•
2502.13922
•
Published
•
28
NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule
Generation
Paper
•
2502.12638
•
Published
•
8
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song
Generation
Paper
•
2502.13128
•
Published
•
42
Craw4LLM: Efficient Web Crawling for LLM Pretraining
Paper
•
2502.13347
•
Published
•
28
Train Small, Infer Large: Memory-Efficient LoRA Training for Large
Language Models
Paper
•
2502.13533
•
Published
•
11
Is That Your Final Answer? Test-Time Scaling Improves Selective Question
Answering
Paper
•
2502.13962
•
Published
•
29
SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question
Answering?
Paper
•
2502.13233
•
Published
•
15
S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement
Learning
Paper
•
2502.12853
•
Published
•
29
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?
Paper
•
2502.14502
•
Published
•
91
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement
Learning
Paper
•
2502.14768
•
Published
•
48
RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers
Paper
•
2502.14377
•
Published
•
12
InterFeedback: Unveiling Interactive Intelligence of Large Multimodal
Models via Human Feedback
Paper
•
2502.15027
•
Published
•
7
SurveyX: Academic Survey Automation via Large Language Models
Paper
•
2502.14776
•
Published
•
100
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and
Mixture-of-Experts Optimization Alignment
Paper
•
2502.16894
•
Published
•
31
DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks
Paper
•
2502.17157
•
Published
•
53
Rank1: Test-Time Compute for Reranking in Information Retrieval
Paper
•
2502.18418
•
Published
•
27
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language
Models
Paper
•
2502.16614
•
Published
•
27
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language
Models via Mixture-of-LoRAs
Paper
•
2503.01743
•
Published
•
87
Qwen/QwQ-32B
Text Generation
•
Updated
•
309k
•
2.77k
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive
Cognitive-Inspired Sketching
Paper
•
2503.05179
•
Published
•
46
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with
Reinforcing Learning
Paper
•
2503.05379
•
Published
•
37
R1-Searcher: Incentivizing the Search Capability in LLMs via
Reinforcement Learning
Paper
•
2503.05592
•
Published
•
27
AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM
Paper
•
2503.04504
•
Published
•
3
Effective and Efficient Masked Image Generation Models
Paper
•
2503.07197
•
Published
•
11
TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos
via Diffusion Models
Paper
•
2503.05638
•
Published
•
19
Words or Vision: Do Vision-Language Models Have Blind Faith in Text?
Paper
•
2503.02199
•
Published
•
8
Self-Taught Self-Correction for Small Language Models
Paper
•
2503.08681
•
Published
•
15
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model
for Visual Generation and Editing
Paper
•
2503.10639
•
Published
•
50
Transformers without Normalization
Paper
•
2503.10622
•
Published
•
164
Autoregressive Image Generation with Randomized Parallel Decoding
Paper
•
2503.10568
•
Published
•
8
Silent Branding Attack: Trigger-free Data Poisoning Attack on
Text-to-Image Diffusion Models
Paper
•
2503.09669
•
Published
•
36
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large
Language Models
Paper
•
2503.10437
•
Published
•
32
Learning from Failures in Multi-Attempt Reinforcement Learning
Paper
•
2503.04808
•
Published
•
18
R1-VL: Learning to Reason with Multimodal Large Language Models via
Step-wise Group Relative Policy Optimization
Paper
•
2503.12937
•
Published
•
29
API Agents vs. GUI Agents: Divergence and Convergence
Paper
•
2503.11069
•
Published
•
37
Being-0: A Humanoid Robotic Agent with Vision-Language Models and
Modular Skills
Paper
•
2503.12533
•
Published
•
66
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper
•
2503.14476
•
Published
•
128
Personalize Anything for Free with Diffusion Transformer
Paper
•
2503.12590
•
Published
•
44
DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement
Learning
Paper
•
2503.15265
•
Published
•
47
Fin-R1: A Large Language Model for Financial Reasoning through
Reinforcement Learning
Paper
•
2503.16252
•
Published
•
27
Stop Overthinking: A Survey on Efficient Reasoning for Large Language
Models
Paper
•
2503.16419
•
Published
•
74
Why Do Multi-Agent LLM Systems Fail?
Paper
•
2503.13657
•
Published
•
47
Reinforcement Learning for Reasoning in Small LLMs: What Works and What
Doesn't
Paper
•
2503.16219
•
Published
•
51
Expert Race: A Flexible Routing Strategy for Scaling Diffusion
Transformer with Mixture of Experts
Paper
•
2503.16057
•
Published
•
14
ELTEX: A Framework for Domain-Driven Synthetic Data Generation
Paper
•
2503.15055
•
Published
•
6
Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language
Models
Paper
•
2503.16257
•
Published
•
24
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning
via Iterative Self-Improvement
Paper
•
2503.17352
•
Published
•
23
MAPS: A Multi-Agent Framework Based on Big Seven Personality and
Socratic Guidance for Multimodal Scientific Problem Solving
Paper
•
2503.16905
•
Published
•
54
Modifying Large Language Model Post-Training for Diverse Creative
Writing
Paper
•
2503.17126
•
Published
•
36
I Have Covered All the Bases Here: Interpreting Reasoning Features in
Large Language Models via Sparse Autoencoders
Paper
•
2503.18878
•
Published
•
118
Open Deep Search: Democratizing Search with Open-source Reasoning Agents
Paper
•
2503.20201
•
Published
•
47
ReSearch: Learning to Reason with Search for LLMs via Reinforcement
Learning
Paper
•
2503.19470
•
Published
•
18
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement
Learning
Paper
•
2503.21620
•
Published
•
62
LeX-Art: Rethinking Text Generation via Scalable High-Quality Data
Synthesis
Paper
•
2503.21749
•
Published
•
26
Qwen2.5-Omni Technical Report
Paper
•
2503.20215
•
Published
•
156
ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation
Paper
•
2503.22194
•
Published
•
24
What, How, Where, and How Well? A Survey on Test-Time Scaling in Large
Language Models
Paper
•
2503.24235
•
Published
•
54
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement
Learning on the Base Model
Paper
•
2503.24290
•
Published
•
62
Exploring the Effect of Reinforcement Learning on Video Understanding:
Insights from SEED-Bench-R1
Paper
•
2503.24376
•
Published
•
38
Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal
LLMs on Academic Resources
Paper
•
2504.00595
•
Published
•
36
ScholarCopilot: Training Large Language Models for Academic Writing with
Accurate Citations
Paper
•
2504.00824
•
Published
•
43
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via
Iterative Instruction Tuning and Reinforcement Learning
Paper
•
2504.02949
•
Published
•
21
APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated
Agent-Human Interplay
Paper
•
2504.03601
•
Published
•
16
Tuning-Free Image Editing with Fidelity and Editability via Unified
Latent Diffusion Model
Paper
•
2504.05594
•
Published
•
12
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement
Fine-Tuning
Paper
•
2504.06958
•
Published
•
11
V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric
Capabilities in Multimodal Large Language Models
Paper
•
2504.06148
•
Published
•
13
DDT: Decoupled Diffusion Transformer
Paper
•
2504.05741
•
Published
•
75
A Unified Agentic Framework for Evaluating Conditional Image Generation
Paper
•
2504.07046
•
Published
•
30
HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned
Guidance
Paper
•
2504.06232
•
Published
•
14
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning
Paper
•
2504.07128
•
Published
•
85
InternVL3: Exploring Advanced Training and Test-Time Recipes for
Open-Source Multimodal Models
Paper
•
2504.10479
•
Published
•
268
Have we unified image generation and understanding yet? An empirical
study of GPT-4o's image generation ability
Paper
•
2504.08003
•
Published
•
49
CoRAG: Collaborative Retrieval-Augmented Generation
Paper
•
2504.01883
•
Published
•
10
How new data permeates LLM knowledge and how to dilute it
Paper
•
2504.09522
•
Published
•
8
SQL-R1: Training Natural Language to SQL Reasoning Model By
Reinforcement Learning
Paper
•
2504.08600
•
Published
•
29
InteractVLM: 3D Interaction Reasoning from 2D Foundational Models
Paper
•
2504.05303
•
Published
•
5
ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on
Transformer Encoder Models Performance
Paper
•
2504.08716
•
Published
•
10
Genius: A Generalizable and Purely Unsupervised Self-Training Framework
For Advanced Reasoning
Paper
•
2504.08672
•
Published
•
55
Efficient Generative Model Training via Embedded Representation Warmup
Paper
•
2504.10188
•
Published
•
12
Iterative Self-Training for Code Generation via Reinforced Re-Ranking
Paper
•
2504.09643
•
Published
•
34
Vidi: Large Multimodal Models for Video Understanding and Editing
Paper
•
2504.15681
•
Published
•
15
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making
Abilities
Paper
•
2504.16078
•
Published
•
20
CheXWorld: Exploring Image World Modeling for Radiograph Representation
Learning
Paper
•
2504.13820
•
Published
•
17
WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World
Model-based LLM Agents
Paper
•
2504.15785
•
Published
•
19
Can Large Language Models Help Multimodal Language Analysis? MMLA: A
Comprehensive Benchmark
Paper
•
2504.16427
•
Published
•
17
WebThinker: Empowering Large Reasoning Models with Deep Research
Capability
Paper
•
2504.21776
•
Published
•
56
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level
and Token-level CoT
Paper
•
2505.00703
•
Published
•
42
Self-Generated In-Context Examples Improve LLM Agents for Sequential
Decision-Making Tasks
Paper
•
2505.00234
•
Published
•
26
Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG
Evaluation Prompts
Paper
•
2504.21117
•
Published
•
25
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive
Streaming Speech Synthesis
Paper
•
2505.02625
•
Published
•
22
Grokking in the Wild: Data Augmentation for Real-World Multi-Hop
Reasoning with Transformers
Paper
•
2504.20752
•
Published
•
91
Unified Multimodal Chain-of-Thought Reward Model through Reinforcement
Fine-Tuning
Paper
•
2505.03318
•
Published
•
92
Improving Editability in Image Generation with Layer-wise Memory
Paper
•
2505.01079
•
Published
•
28
Think on your Feet: Adaptive Thinking via Reinforcement Learning for
Social Agents
Paper
•
2505.02156
•
Published
•
17
An Empirical Study of Qwen3 Quantization
Paper
•
2505.02214
•
Published
•
23
Unified Multimodal Understanding and Generation Models: Advances,
Challenges, and Opportunities
Paper
•
2505.02567
•
Published
•
74
A Survey on Inference Engines for Large Language Models: Perspectives on
Optimization and Efficiency
Paper
•
2505.01658
•
Published
•
35
Knowledge Augmented Complex Problem Solving with Large Language Models:
A Survey
Paper
•
2505.03418
•
Published
•
8
Multi-Agent System for Comprehensive Soccer Understanding
Paper
•
2505.03735
•
Published
•
22
PrimitiveAnything: Human-Crafted 3D Primitive Assembly Generation with
Auto-Regressive Transformer
Paper
•
2505.04622
•
Published
•
26
Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in
Large Language Models
Paper
•
2505.02847
•
Published
•
27
Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM
Reasoners With Verifiers
Paper
•
2505.04842
•
Published
•
12
Sailing AI by the Stars: A Survey of Learning from Rewards in
Post-Training and Test-Time Scaling of Large Language Models
Paper
•
2505.02686
•
Published
•
15
MiMo: Unlocking the Reasoning Potential of Language Model -- From
Pretraining to Posttraining
Paper
•
2505.07608
•
Published
•
78
StreamBridge: Turning Your Offline Video Large Language Model into a
Proactive Streaming Assistant
Paper
•
2505.05467
•
Published
•
13
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture,
Training and Dataset
Paper
•
2505.09568
•
Published
•
90
Exploring the Deep Fusion of Large Language Models and Diffusion
Transformers for Text-to-Image Synthesis
Paper
•
2505.10046
•
Published
•
9
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large
Reasoning Models
Paper
•
2505.10554
•
Published
•
118
OpenThinkIMG: Learning to Think with Images via Visual Tool
Reinforcement Learning
Paper
•
2505.08617
•
Published
•
41
AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via
Reinforcement Learning
Paper
•
2505.11896
•
Published
•
57
Chain-of-Model Learning for Language Model
Paper
•
2505.11820
•
Published
•
116
Quartet: Native FP4 Training Can Be Optimal for Large Language Models
Paper
•
2505.14669
•
Published
•
73
s3: You Don't Need That Much Data to Train a Search Agent via RL
Paper
•
2505.14146
•
Published
•
16
Synthetic Data RL: Task Definition Is All You Need
Paper
•
2505.17063
•
Published
•
10
ComposeAnything: Composite Object Priors for Text-to-Image Generation
Paper
•
2505.24086
•
Published
•
4