AlphaAdam:Asynchronous Masked Optimization with Dynamic Alpha for Selective Updates Paper • 2501.18094 • Published Jan 30 • 1
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam Paper • 2502.17055 • Published Feb 24 • 19
Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models Paper • 2508.00819 • Published 19 days ago • 62
Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs Paper • 2503.05139 • Published Mar 7 • 4
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning Paper • 2507.16784 • Published 29 days ago • 116
Singular Value Decomposition on Kronecker Adaptation for Large Language Model Paper • 2506.15251 • Published Jun 18 • 1
RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA Optimization Paper • 2507.12142 • Published Jul 16 • 36
TC-Light: Temporally Consistent Relighting for Dynamic Long Videos Paper • 2506.18904 • Published Jun 23 • 10
Energy-Based Transformers are Scalable Learners and Thinkers Paper • 2507.02092 • Published Jul 2 • 60
RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale Paper • 2505.03005 • Published May 5 • 35
OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation Paper • 2506.07977 • Published Jun 9 • 41