title: Rex-Thinker Demo
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.1
app_file: demo/app.py
pinned: false
license: apache-2.0

Rex-Thinker Demo

This is a demo application for Rex-Thinker-GRPO, a visual reasoning model for referring expression comprehension. The demo pairs the model with GroundingDINO, which detects candidate objects for the model to reason over.

Features

Object Detection: Uses GroundingDINO to detect objects based on category names
Referring Expression Comprehension: Identifies specific objects based on detailed descriptions
Interactive Web Interface: Gradio interface that streams the model's output as it is generated
Visual Reasoning: Shows the model's thinking process with detailed explanations

How to Use

Upload an Image: Click on "Input Image" to upload your image
Set Object Category: Enter the general category of objects you want to detect (e.g., "person", "car", "dog")
Enter Referring Expression: Provide a detailed description of the specific object you want to identify (e.g., "person wearing red shirt and black hat")
Adjust Visualization Settings: Modify draw width and font size for better visualization
Run the Model: Click "Run with Streaming" to see the results

Examples

The demo includes several pre-loaded examples:

Tomato detection
Helmet identification
Person in vehicle
Text recognition on clothing
Pet detection

Technical Details

Base Model: Rex-Thinker-GRPO-7B
Object Detection: GroundingDINO with a Swin-T backbone
Framework: Gradio for web interface
Inference: Supports streaming text generation
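
Streaming text generation can be pictured as a Python generator: Gradio treats a generator function as a streaming handler and updates the output on every `yield`. The sketch below is a minimal standard-library illustration of that pattern, not the code in demo/app.py. The names `fake_token_source` and `stream_reply` are hypothetical, and the hard-coded token list stands in for the model's incremental decoder.

```python
from typing import Iterable, Iterator


def fake_token_source() -> Iterator[str]:
    """Hypothetical stand-in for the model's incremental decoder."""
    for token in ["The ", "person ", "on ", "the ", "left."]:
        yield token


def stream_reply(tokens: Iterable[str]) -> Iterator[str]:
    """Yield the text accumulated so far after each new token.

    A UI that re-renders on every yielded value shows the reply
    growing token by token.
    """
    text = ""
    for token in tokens:
        text += token
        yield text


if __name__ == "__main__":
    for partial in stream_reply(fake_token_source()):
        print(partial)
```

Yielding the full accumulated text, rather than only the newest token, lets the receiving component simply replace its contents on each update.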

Model Information

Rex-Thinker-GRPO is a multimodal reasoning model that:

Uses GroundingDINO to propose candidate object locations
Applies visual reasoning to identify specific objects based on referring expressions
Provides detailed explanations of its reasoning process
Outputs precise bounding box coordinates for detected objects
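
The four steps above can be sketched as a two-stage pipeline. The block below is a hedged illustration using only the standard library: the detector and the reasoning model are replaced by stubs with hard-coded results, every function name is hypothetical, and the reply format (a `<think>` section followed by an `<answer>` section holding a JSON list of `bbox_2d` entries) is an assumption made for the example rather than something this README specifies.

```python
import json
import re
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


def propose_candidates(category: str) -> List[Box]:
    """Stub for the detector stage (GroundingDINO in the demo)."""
    return [(10, 20, 110, 220), (150, 30, 260, 240)]


def build_prompt(category: str, expression: str, boxes: List[Box]) -> str:
    """Assumed prompt layout: list the candidate boxes, then the query."""
    hints = "\n".join(
        f"{category} {i + 1}: {list(box)}" for i, box in enumerate(boxes)
    )
    return f"Candidates:\n{hints}\nFind: {expression}"


def fake_model_reply(prompt: str) -> str:
    """Stub for the reasoning model; returns the assumed tagged format."""
    return (
        "<think>Person 2 wears a red shirt and a black hat.</think>"
        '<answer>[{"bbox_2d": [150, 30, 260, 240]}]</answer>'
    )


def parse_reply(reply: str) -> Tuple[str, List[Box]]:
    """Split a reply into its reasoning text and predicted boxes."""
    think = re.search(r"<think>(.*?)</think>", reply, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", reply, re.DOTALL)
    reasoning = think.group(1).strip() if think else ""
    boxes: List[Box] = []
    if answer:
        for item in json.loads(answer.group(1)):
            x0, y0, x1, y1 = item["bbox_2d"]
            boxes.append((x0, y0, x1, y1))
    return reasoning, boxes


def run_pipeline(category: str, expression: str) -> Tuple[str, List[Box]]:
    """Chain the stages: propose, prompt, reason, parse."""
    candidates = propose_candidates(category)
    prompt = build_prompt(category, expression, candidates)
    return parse_reply(fake_model_reply(prompt))
```

Calling `run_pipeline("person", "person wearing red shirt and black hat")` returns the stubbed reasoning text together with the one selected box. In a real deployment the two stubs would be replaced by calls to the detector and the model.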

For more information, visit the original repository.