Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks • arXiv:2505.20047 • Published May 26, 2025
This Time is Different: An Observability Perspective on Time Series Foundation Models • arXiv:2505.14766 • Published May 20, 2025
A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods • arXiv:2502.01618 • Published Feb 3, 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning • arXiv:2501.12948 • Published Jan 22, 2025
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization • arXiv:2412.17739 • Published Dec 23, 2024
Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models • arXiv:2412.18605 • Published Dec 24, 2024
Toto: Time Series Optimized Transformer for Observability • arXiv:2407.07874 • Published Jul 10, 2024
The Unreasonable Ineffectiveness of the Deeper Layers • arXiv:2403.17887 • Published Mar 26, 2024
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits • arXiv:2402.17764 • Published Feb 27, 2024
Common 7B Language Models Already Possess Strong Math Capabilities • arXiv:2403.04706 • Published Mar 7, 2024
DocLLM: A layout-aware generative language model for multimodal document understanding • arXiv:2401.00908 • Published Dec 31, 2023
Improving Text Embeddings with Large Language Models • arXiv:2401.00368 • Published Dec 31, 2023
Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models • arXiv:2401.00788 • Published Jan 1, 2024
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws • arXiv:2401.00448 • Published Dec 31, 2023
WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation • arXiv:2312.14187 • Published Dec 20, 2023
LLM in a flash: Efficient Large Language Model Inference with Limited Memory • arXiv:2312.11514 • Published Dec 12, 2023