Spaces:

AhmedHAnwar
/

Gradio_image_code


AhmedHAnwar committed 27 days ago

Commit

6e9b93f


1 Parent(s): 6cfced7

Update README.md


README.md +90 -9


@@ -1,12 +1,93 @@
 ---
-title: Gradio Image Code
-emoji: 🌖
-colorFrom: pink
-colorTo: yellow
-sdk: gradio
-sdk_version: 5.32.1
-app_file: app.py
-pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# 🧠 Qwen + DeepSeek Gradio App
+A Gradio web app that demonstrates:
+- **Image Captioning** using [Qwen-VL-Chat-Int4](https://huggingface.co/Qwen/Qwen-VL-Chat-Int4)
+- **Code Generation** using [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)
+This app has been tested on **Kaggle notebooks** with the **T4 x2** GPU accelerator, where it runs efficiently.
+> ⚠️ **Note:** Colab is not recommended for this project because downloading the `Qwen-VL-Chat-Int4` model takes a long time and often fails. Kaggle is faster and more stable.
+---
+## 🚀 Features
+- 🖼️ Vision-Language tab: upload an image and a custom prompt → get a short description
+- 💻 Code Generator tab: write a prompt → get streaming code output
+- Adjustable decoding parameters: temperature, top-p, max_new_tokens
 ---
+## 🧩 Installation
+```bash
+pip install transformers
+pip install gradio
+pip install transformers_stream_generator optimum auto-gptq
+```
+Ensure your runtime has a GPU (e.g., a Kaggle notebook with T4 x2, or a local CUDA environment).
 ---
+## 📦 Model Details
+### 1. Qwen-VL-Chat-Int4 (Image-to-Text)
+- Used for concise image descriptions.
+- Streaming output with `TextIteratorStreamer`.
+- Prompt format:
+```
+<|system|>
+You are a helpful assistant that describes images very concisely...
+<|end|>
+<|user|>
+Describe the image...
+<|end|>
+<|assistant|>
+```
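The template above can be assembled with a small helper. This is a minimal sketch rather than the app's actual code: `build_prompt` and its argument names are hypothetical, and the tag layout is simply copied from the block above.

```python
def build_prompt(system_text: str, user_text: str) -> str:
    """Assemble the tagged prompt, ending with the assistant tag.

    Hypothetical helper: the tag layout mirrors the template in this README.
    """
    parts = [
        "<|system|>", system_text, "<|end|>",
        "<|user|>", user_text, "<|end|>",
        "<|assistant|>",  # marks the start of the model's turn
    ]
    return "\n".join(parts) + "\n"


prompt = build_prompt(
    "You are a helpful assistant that describes images very concisely.",
    "Describe the image in one sentence.",
)
```

Ending the string with the assistant tag is the design point: the model is asked to continue from the start of its own turn.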
+#### 🔧 Prompt Engineering Insight
+- Without the `<|assistant|>` tag, the model sometimes overwrites or fails to complete its response.
+- Adding `<|assistant|>` clearly marks the model’s turn, which reduces hallucinations.
+- **Temperature is capped at ~1.0** because higher values (e.g., 1.2+) produce creative but false outputs.
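One way to enforce the temperature cap is to clamp the slider value before it reaches generation. The sketch below is an assumption about how this could look, not the app's code: `generation_kwargs` and the `max_temperature` default are hypothetical, while the dictionary keys are the usual sampling arguments accepted by `generate` in `transformers`.

```python
def generation_kwargs(temperature: float, top_p: float, max_new_tokens: int,
                      max_temperature: float = 1.0) -> dict:
    """Build sampling arguments with the temperature capped at max_temperature."""
    return {
        "do_sample": True,  # sampling must be enabled for temperature/top_p to apply
        "temperature": min(float(temperature), max_temperature),
        "top_p": float(top_p),
        "max_new_tokens": int(max_new_tokens),
    }


# A request for 1.4 is silently reduced to the cap.
kwargs = generation_kwargs(temperature=1.4, top_p=0.9, max_new_tokens=64)
```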
+### 2. DeepSeek-R1-Distill-Qwen-1.5B (Text-to-Code)
+- Generates Python or other code from natural language prompts.
+- Uses chat-based prompting with:
+  - a `<think>...</think>` block for reasoning.
+  - the final answer kept separate from the reasoning to improve clarity.
+#### 🔧 Prompt Engineering Insight
+- The first version used no system prompt, which led to vague reasoning.
+- Adding a system prompt gave the model clearer guidance.
+- Separating "thinking" from the "final answer" made the output more relevant.
+- Future improvement: split thinking and answer into **separate UI tabs**.
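Separating the reasoning from the final answer could be done with plain string handling. The following is a minimal sketch under assumptions: `split_reasoning` is a hypothetical helper, and it assumes the reasoning is closed by a `</think>` tag, with the opening tag possibly missing from the generated text.

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from model output containing a think block."""
    head, sep, tail = text.partition("</think>")
    if not sep:
        # No closing tag yet: either the model is still reasoning,
        # or there is no think block at all.
        if "<think>" in text:
            return text.split("<think>", 1)[1].strip(), ""
        return "", text.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


reasoning, answer = split_reasoning(
    "<think>Slicing with a step of -1 reverses a string.</think>\nprint(s[::-1])"
)
```

Returning two strings makes it straightforward to route each one to its own output component, such as the separate tabs proposed above.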
+## 🖼️ Usage: Image Description Tab
+- Upload an image.
+- Write a natural-language prompt (e.g., "What is in this picture?").
+- Adjust:
+  - `Temperature`: higher values give more creative output; keep it at or below ~1.0 for stability.
+  - `Top-p`: controls sampling diversity.
+  - `Max new tokens`: maximum length of the generated description, in tokens.
+- Click **Generate** → streaming description appears.
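Streaming output in Gradio generally works by making the handler a generator that yields the text accumulated so far, and a `TextIteratorStreamer` can be iterated to receive decoded chunks as they arrive. The sketch below shows only the accumulation step, with a plain list standing in for the streamer; `accumulate_stream` is a hypothetical name.

```python
from typing import Iterable, Iterator


def accumulate_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield the growing text after each new chunk, as a streaming UI expects."""
    text = ""
    for chunk in chunks:
        text += chunk
        yield text  # each yield replaces the previously displayed text


# A list of strings stands in for the streamer here.
partials = list(accumulate_stream(["A cat ", "sitting on ", "a sofa."]))
```

In the app itself, the chunks would presumably come from a `TextIteratorStreamer` passed to `model.generate`, with generation running in a background thread so the handler can keep yielding.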
+## 💻 Usage: Code Generation Tab
+- Write a programming task (e.g., "Write Python code to reverse a string.").
+- Adjust the generation settings as in the image tab.
+- The generated code is streamed to the output as it is produced.
+- Generation may stop early if the prompt is vague; clarify the prompt to improve results.
+## 🧠 Future Work
+- Add a **separate tab** for model “thinking” (`<think>...</think>`) versus final code.
+- Optional logging for input-output pairs to track hallucinations or failures.
+- Add Markdown rendering for image descriptions.