Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware Paper • 2505.09601 • Published 23 days ago • 5
Efficacy of Language Model Self-Play in Non-Zero-Sum Games Paper • 2406.18872 • Published Jun 27, 2024
Understanding Game-Playing Agents with Natural Language Annotations Paper • 2204.07531 • Published Apr 15, 2022
Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection Paper • 2503.12271 • Published Mar 15 • 9
EmbedLLM: Learning Compact Representations of Large Language Models Paper • 2410.02223 • Published Oct 3, 2024 • 3
PokerBench: Training Large Language Models to become Professional Poker Players Paper • 2501.08328 • Published Jan 14 • 18
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows Paper • 2412.01169 • Published Dec 2, 2024 • 13
DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing Paper • 2410.12189 • Published Oct 16, 2024 • 1
Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences Paper • 2404.12272 • Published Apr 18, 2024 • 1
Deep Multimodal Fusion for Surgical Feedback Classification Paper • 2312.03231 • Published Dec 6, 2023
Pose-Aware Self-Supervised Learning with Viewpoint Trajectory Regularization Paper • 2403.14973 • Published Mar 22, 2024
Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition Paper • 2403.19822 • Published Mar 28, 2024
ALOHa: A New Measure for Hallucination in Captioning Models Paper • 2404.02904 • Published Apr 3, 2024
Virtual Personas for Language Models via an Anthology of Backstories Paper • 2407.06576 • Published Jul 9, 2024
Visual Haystacks: Answering Harder Questions About Sets of Images Paper • 2407.13766 • Published Jul 18, 2024 • 2
Post • 🚨 Launching The Visual Haystacks (VHs) Benchmark: the first "visual-centric" Needle-In-A-Haystack (NIAH) benchmark to assess LMMs' capability in long-context visual retrieval and reasoning. Check it out! tsunghanwu/visual_haystacks https://visual-haystacks.github.io/ https://arxiv.org/abs/2407.13766 https://github.com/visual-haystacks/vhs_benchmark
The Wisdom of Hindsight Makes Language Models Better Instruction Followers Paper • 2302.05206 • Published Feb 10, 2023
Stylus: Automatic Adapter Selection for Diffusion Models Paper • 2404.18928 • Published Apr 29, 2024 • 15