-
CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging
Paper • 2502.05664 • Published • 24 -
AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation
Paper • 2312.13010 • Published • 6 -
HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale
Paper • 2409.16299 • Published • 12 -
Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications of Agentic AI
Paper • 2505.19443 • Published • 15
Collections
Discover the best community collections!
Collections including paper arxiv:2502.05664
-
PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models
Paper • 2502.01584 • Published • 10 -
CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging
Paper • 2502.05664 • Published • 24 -
Craw4LLM: Efficient Web Crawling for LLM Pretraining
Paper • 2502.13347 • Published • 29 -
Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark
Paper • 2504.16427 • Published • 18
-
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
Paper • 2411.02337 • Published • 38 -
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
Paper • 2411.04996 • Published • 52 -
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
Paper • 2411.03562 • Published • 69 -
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
Paper • 2410.08815 • Published • 49
-
CodeEditorBench: Evaluating Code Editing Capability of Large Language Models
Paper • 2404.03543 • Published • 18 -
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
Paper • 2406.11931 • Published • 66 -
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents
Paper • 2407.18901 • Published • 35 -
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents
Paper • 2408.07060 • Published • 43
-
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
Paper • 2402.14658 • Published • 84 -
meta-llama/CodeLlama-7b-Instruct-hf
Text Generation • 7B • Updated • 3.7k • 52 -
Qwen/Qwen2.5-Coder-32B-Instruct
Text Generation • 33B • Updated • 73.6k • • 1.92k -
huihui-ai/Qwen2.5-Coder-32B-Instruct-abliterated
Text Generation • 33B • Updated • 920 • 32
-
Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise
Paper • 2501.08331 • Published • 20 -
MangaNinja: Line Art Colorization with Precise Reference Following
Paper • 2501.08332 • Published • 61 -
GameFactory: Creating New Games with Generative Interactive Videos
Paper • 2501.08325 • Published • 68 -
DiffuEraser: A Diffusion Model for Video Inpainting
Paper • 2501.10018 • Published • 15
-
FLAME: Factuality-Aware Alignment for Large Language Models
Paper • 2405.01525 • Published • 29 -
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
Paper • 2405.14333 • Published • 42 -
Transformers Can Do Arithmetic with the Right Embeddings
Paper • 2405.17399 • Published • 55 -
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture
Paper • 2405.18991 • Published • 12
-
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
Paper • 2412.14161 • Published • 52 -
Training Software Engineering Agents and Verifiers with SWE-Gym
Paper • 2412.21139 • Published • 24 -
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
Paper • 2412.19723 • Published • 88 -
AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation
Paper • 2408.00764 • Published • 1
-
LoRA+: Efficient Low Rank Adaptation of Large Models
Paper • 2402.12354 • Published • 6 -
The FinBen: An Holistic Financial Benchmark for Large Language Models
Paper • 2402.12659 • Published • 22 -
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Paper • 2402.13249 • Published • 13 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 70
-
CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging
Paper • 2502.05664 • Published • 24 -
AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation
Paper • 2312.13010 • Published • 6 -
HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale
Paper • 2409.16299 • Published • 12 -
Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications of Agentic AI
Paper • 2505.19443 • Published • 15
-
PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models
Paper • 2502.01584 • Published • 10 -
CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging
Paper • 2502.05664 • Published • 24 -
Craw4LLM: Efficient Web Crawling for LLM Pretraining
Paper • 2502.13347 • Published • 29 -
Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark
Paper • 2504.16427 • Published • 18
-
Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise
Paper • 2501.08331 • Published • 20 -
MangaNinja: Line Art Colorization with Precise Reference Following
Paper • 2501.08332 • Published • 61 -
GameFactory: Creating New Games with Generative Interactive Videos
Paper • 2501.08325 • Published • 68 -
DiffuEraser: A Diffusion Model for Video Inpainting
Paper • 2501.10018 • Published • 15
-
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
Paper • 2411.02337 • Published • 38 -
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
Paper • 2411.04996 • Published • 52 -
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
Paper • 2411.03562 • Published • 69 -
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
Paper • 2410.08815 • Published • 49
-
FLAME: Factuality-Aware Alignment for Large Language Models
Paper • 2405.01525 • Published • 29 -
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
Paper • 2405.14333 • Published • 42 -
Transformers Can Do Arithmetic with the Right Embeddings
Paper • 2405.17399 • Published • 55 -
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture
Paper • 2405.18991 • Published • 12
-
CodeEditorBench: Evaluating Code Editing Capability of Large Language Models
Paper • 2404.03543 • Published • 18 -
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
Paper • 2406.11931 • Published • 66 -
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents
Paper • 2407.18901 • Published • 35 -
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents
Paper • 2408.07060 • Published • 43
-
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
Paper • 2412.14161 • Published • 52 -
Training Software Engineering Agents and Verifiers with SWE-Gym
Paper • 2412.21139 • Published • 24 -
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
Paper • 2412.19723 • Published • 88 -
AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation
Paper • 2408.00764 • Published • 1
-
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
Paper • 2402.14658 • Published • 84 -
meta-llama/CodeLlama-7b-Instruct-hf
Text Generation • 7B • Updated • 3.7k • 52 -
Qwen/Qwen2.5-Coder-32B-Instruct
Text Generation • 33B • Updated • 73.6k • • 1.92k -
huihui-ai/Qwen2.5-Coder-32B-Instruct-abliterated
Text Generation • 33B • Updated • 920 • 32
-
LoRA+: Efficient Low Rank Adaptation of Large Models
Paper • 2402.12354 • Published • 6 -
The FinBen: An Holistic Financial Benchmark for Large Language Models
Paper • 2402.12659 • Published • 22 -
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Paper • 2402.13249 • Published • 13 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 70