AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO
Paper • 2502.14669 • Published • 14
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
Paper • 2503.05592 • Published • 27
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper • 2412.16145 • Published • 39
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement
Paper • 2503.17352 • Published • 24
Abhranil Chandra
abhranil14
AI & ML interests
Reinforcement Learning, Deep Unsupervised Learning, NLP and Bayesian Deep Learning
Recent Activity
Updated a model 3 days ago: abhranil14/gemma2_2B_FF_gemini_flash_gold_7114_batch256_lr10e-6_warmup0.1_max_tokens_2048
Published a model 3 days ago: abhranil14/gemma2_2B_FF_gemini_flash_gold_7114_batch256_lr10e-6_warmup0.1_max_tokens_2048
Updated a model 3 days ago: abhranil14/llama_FF_gemini_flash_gold_7114_batch256_lr10e-6_warmup0.1_max_tokens_1024