metadata
title: Rex-Thinker Demo
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.1
app_file: demo/app.py
pinned: false
license: apache-2.0

Rex-Thinker Demo

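This is a demo application for Rex-Thinker-GRPO, a visual reasoning model for referring expression comprehension. The demo uses GroundingDINO to detect candidate objects of a given category, and the model then reasons over those candidates to identify the ones that match a description.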

Features

  • Object Detection: Uses GroundingDINO to detect objects based on category names
  • Referring Expression Comprehension: Identifies specific objects based on detailed descriptions
  • Interactive Web Interface: Easy-to-use Gradio interface with real-time streaming
  • Visual Reasoning: Shows the model's thinking process with detailed explanations

How to Use

  1. Upload an Image: Click on "Input Image" to upload your image
  2. Set Object Category: Enter the general category of objects you want to detect (e.g., "person", "car", "dog")
  3. Enter Referring Expression: Provide a detailed description of the specific object you want to identify (e.g., "person wearing red shirt and black hat")
  4. Adjust Visualization Settings: Modify draw width and font size for better visualization
  5. Run the Model: Click "Run with Streaming" to see the results

Examples

The demo includes several pre-loaded examples:

  • Tomato detection
  • Helmet identification
  • Person in vehicle
  • Text recognition on clothing
  • Pet detection

Technical Details

  • Base Model: Rex-Thinker-GRPO-7B
  • Object Detection: GroundingDINO with a Swin-T backbone
  • Framework: Gradio for the web interface
  • Inference: Supports streaming text generation
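The streaming behaviour listed above can be sketched with a plain Python generator. Gradio treats a generator function used as an event handler as a streaming handler and refreshes the output on every `yield`. The function name `stream_text` and the token list below are hypothetical stand-ins for illustration, not the demo's actual code.

```python
from typing import Iterable, Iterator


def stream_text(tokens: Iterable[str]) -> Iterator[str]:
    """Yield the accumulated text after each new token.

    Hypothetical helper: in a Gradio app, a generator like this can serve
    as an event handler so the output is updated on every yield.
    """
    text = ""
    for token in tokens:
        text += token
        yield text


if __name__ == "__main__":
    # Stand-in tokens; a real model would produce these incrementally.
    for partial in stream_text(["The ", "person ", "on ", "the ", "left"]):
        print(partial)
```

Yielding the full accumulated string, rather than only the newest token, keeps each update self-contained, so the interface can simply replace the displayed text.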

Model Information

Rex-Thinker-GRPO is a multimodal reasoning model that:

  1. Uses GroundingDINO to propose candidate object locations
  2. Applies visual reasoning to identify specific objects based on referring expressions
  3. Provides detailed explanations of its reasoning process
  4. Outputs precise bounding box coordinates for detected objects
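The four steps above can be sketched as a two-stage pipeline. This is a toy illustration under stated assumptions, not the model's implementation: `Candidate`, `propose_candidates`, and `select_referred` are hypothetical names, the candidate data is hard-coded in place of GroundingDINO output, the `(x0, y0, x1, y1)` box layout is assumed, and a simple predicate stands in for the model's reasoning.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # assumed layout: (x0, y0, x1, y1) in pixels


@dataclass(frozen=True)
class Candidate:
    """A candidate box plus a text stand-in for what the box contains."""
    box: Box
    description: str


# Hard-coded toy data; the real demo gets candidates from GroundingDINO.
_TOY_DETECTIONS: Dict[str, List[Candidate]] = {
    "person": [
        Candidate((10, 20, 110, 220), "person wearing red shirt and black hat"),
        Candidate((150, 30, 240, 210), "person wearing blue jacket"),
    ],
}


def propose_candidates(category: str) -> List[Candidate]:
    """Stage 1 stand-in: return candidate boxes for an object category."""
    return list(_TOY_DETECTIONS.get(category, []))


def select_referred(
    candidates: List[Candidate],
    matches: Callable[[Candidate], bool],
) -> Tuple[List[str], List[Box]]:
    """Stage 2 stand-in: check every candidate and record the verdict."""
    reasoning: List[str] = []
    boxes: List[Box] = []
    for index, candidate in enumerate(candidates, start=1):
        ok = matches(candidate)
        verdict = "match" if ok else "no match"
        reasoning.append(f"Candidate {index}: {candidate.description} -> {verdict}")
        if ok:
            boxes.append(candidate.box)
    return reasoning, boxes


if __name__ == "__main__":
    found = propose_candidates("person")
    steps, selected = select_referred(found, lambda c: "red shirt" in c.description)
    print("\n".join(steps))
    print(selected)
```

Keeping candidate proposal separate from selection mirrors the split described above: the detector only needs a category name, while the referring expression is used solely in the second stage.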

For more information, visit the original repository.