- A New Federated Learning Framework Against Gradient Inversion Attacks
  Paper • 2412.07187 • Published • 3
- Selective Aggregation for Low-Rank Adaptation in Federated Learning
  Paper • 2410.01463 • Published • 19
- Exploring Federated Pruning for Large Language Models
  Paper • 2505.13547 • Published • 13
- It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization
  Paper • 2504.13173 • Published • 19

Collections including paper arxiv:2501.00663

- Internal Consistency and Self-Feedback in Large Language Models: A Survey
  Paper • 2407.14507 • Published • 47
- Large Language Models are Zero-Shot Reasoners
  Paper • 2205.11916 • Published • 1
- Let's Verify Step by Step
  Paper • 2305.20050 • Published • 10
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
  Paper • 2201.11903 • Published • 13

- Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
  Paper • 2412.06531 • Published • 73
- Memory Gym: Towards Endless Tasks to Benchmark Memory Capabilities of Agents
  Paper • 2309.17207 • Published
- Titans: Learning to Memorize at Test Time
  Paper • 2501.00663 • Published • 24
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
  Paper • 2502.11089 • Published • 159

- LM2: Large Memory Models
  Paper • 2502.06049 • Published • 30
- Titans: Learning to Memorize at Test Time
  Paper • 2501.00663 • Published • 24
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
  Paper • 2501.17161 • Published • 122
- You Do Not Fully Utilize Transformer's Representation Capacity
  Paper • 2502.09245 • Published • 38

- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
  Paper • 2405.04434 • Published • 21
- Titans: Learning to Memorize at Test Time
  Paper • 2501.00663 • Published • 24
- Transformer^2: Self-adaptive LLMs
  Paper • 2501.06252 • Published • 55
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
  Paper • 2502.11089 • Published • 159

- Let's Verify Step by Step
  Paper • 2305.20050 • Published • 10
- LLM Critics Help Catch LLM Bugs
  Paper • 2407.00215 • Published
- Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
  Paper • 2407.21787 • Published • 13
- Generative Verifiers: Reward Modeling as Next-Token Prediction
  Paper • 2408.15240 • Published • 13

- STaR: Bootstrapping Reasoning With Reasoning
  Paper • 2203.14465 • Published • 8
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 55
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
  Paper • 2405.04434 • Published • 21
- Prompt Cache: Modular Attention Reuse for Low-Latency Inference
  Paper • 2311.04934 • Published • 34