NeuralKartMocker

AI & ML interests

Gen AI, GAN, LLMs, NLP, Gen Music

NeuralKartMocker's activity

upvoted 9 papers 29 days ago

Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving

Paper • 2505.04528 • Published about 1 month ago • 11

Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models

Paper • 2505.03821 • Published May 3 • 24

HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation

Paper • 2505.04512 • Published about 1 month ago • 35

Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play

Paper • 2505.02707 • Published May 5 • 82

Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Paper • 2505.03335 • Published May 6 • 169

Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities

Paper • 2505.02567 • Published May 5 • 74

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

Paper • 2312.14238 • Published Dec 21, 2023 • 20

Augmenting CLIP with Improved Visio-Linguistic Reasoning

Paper • 2307.09233 • Published Jul 18, 2023 • 9

Analysis of the Evolution of Advanced Transformer-Based Language Models: Experiments on Opinion Mining

Paper • 2308.03235 • Published Aug 7, 2023 • 2

upvoted 11 papers about 1 month ago

Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning

Paper • 2505.03318 • Published May 6 • 92

RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale

Paper • 2505.03005 • Published May 5 • 31

Multi-Agent System for Comprehensive Soccer Understanding

Paper • 2505.03735 • Published May 6 • 22

T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT

Paper • 2505.00703 • Published May 1 • 42

A Robust Deep Networks based Multi-Object MultiCamera Tracking System for City Scale Traffic

Paper • 2505.00534 • Published May 1 • 2

Spatial Speech Translation: Translating Across Space With Binaural Hearables

Paper • 2504.18715 • Published Apr 25 • 7

LLMs for Engineering: Teaching Models to Design High Powered Rockets

Paper • 2504.19394 • Published Apr 27 • 13

AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization

Paper • 2504.21659 • Published Apr 30 • 12