Multimodal Benchmarking IR

university

Recent Activity

zhangysk

authored 2 papers about 11 hours ago

FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction

Paper • 2508.11987 • Published 4 days ago • 1

Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

Paper • 2508.13167 • Published 14 days ago • 62

zhangysk

authored 5 papers 8 days ago

Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving

Paper • 2507.23726 • Published 20 days ago • 107

Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference

Paper • 2508.02193 • Published 16 days ago • 126

zhangysk

authored 5 papers 29 days ago

Multilingual Multimodal Software Developer for Code Generation

Paper • 2507.08719 • Published Jul 11

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation

Paper • 2505.14552 • Published May 20

First Return, Entropy-Eliciting Explore

Paper • 2507.07017 • Published Jul 9 • 23

A Systematic Analysis of Hybrid Linear Attention

Paper • 2507.06457 • Published Jul 8 • 22

A Survey on Latent Reasoning

Paper • 2507.06203 • Published Jul 8 • 88

zhangysk

authored 4 papers about 1 month ago

SciDA: Scientific Dynamic Assessor of LLMs

Paper • 2506.12909 • Published Jun 15

OAgents: An Empirical Study of Building Effective Agents

Paper • 2506.15741 • Published Jun 17 • 35

Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving

Paper • 2507.06229 • Published Jul 8 • 73

CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization

Paper • 2507.06181 • Published Jul 8 • 41

zhangysk

authored 4 papers 2 months ago

Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning

Paper • 2504.13914 • Published Apr 10 • 4

VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation

Paper • 2505.14640 • Published May 20 • 15

ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding

Paper • 2505.23922 • Published May 29

P2P: Automated Paper-to-Poster Generation and Fine-Grained Benchmark

Paper • 2505.17104 • Published May 21

Team members 5
