AhmedHAnwar committed on
Commit
6e9b93f
·
verified ·
1 Parent(s): 6cfced7

Update README.md

  1. README.md +90 -9
README.md CHANGED
@@ -1,12 +1,93 @@
1
  ---
2
- title: Gradio Image Code
3
- emoji: 🌖
4
- colorFrom: pink
5
- colorTo: yellow
6
- sdk: gradio
7
- sdk_version: 5.32.1
8
- app_file: app.py
9
- pinned: false
 
 
10
  ---
11
 
12
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
+ # 🧠 Qwen + DeepSeek Gradio App
2
+
3
+ A Gradio web app that demonstrates:
4
+ - **Image Captioning** using [Qwen-VL-Chat-Int4](https://huggingface.co/Qwen/Qwen-VL-Chat-Int4)
5
+ - **Code Generation** using [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)
6
+
7
+ This app is tested and runs efficiently on **Kaggle notebooks** with **T4 x2 GPU accelerators**.
8
+
9
+ > ⚠️ **Note:** Colab is not recommended for this project because downloading the `Qwen-VL-Chat-Int4` model takes a long time and often fails. Kaggle is faster and more stable.
10
+
11
+ ---
12
+ ## 🚀 Features
13
+
14
+ - 🖼️ Vision-Language tab: Upload an image + custom prompt → generate short description
15
+ - 💻 Code Generator tab: Write a prompt → get streaming code output
16
+ - Adjustable decoding parameters: temperature, top-p, max_new_tokens
17
+
18
  ---
19
+
20
+ ## 🧩 Installation
21
+ ```bash
22
+ pip install transformers
23
+ pip install gradio
24
+ pip install transformers_stream_generator optimum auto-gptq
25
+ ```
26
+
27
+ Ensure your runtime has a GPU (e.g., a Kaggle notebook or a local CUDA environment).
28
+
29
  ---
30
 
31
+ ## 📦 Model Details
32
+
33
+ ### 1. Qwen-VL-Chat-Int4 (Image-to-Text)
34
+
35
+ - Used for concise image descriptions.
36
+ - Streaming output with `TextIteratorStreamer`.
37
+ - Prompt format:
38
+
39
+ ```
40
+ <|system|>
41
+ You are a helpful assistant that describes images very concisely...
42
+ <|end|>
43
+ <|user|>
44
+ Describe the image...
45
+ <|end|>
46
+ <|assistant|>
47
+ ```
48
+
49
+ #### 🔧 Prompt Engineering Insight
50
+
51
+ - Without the `<|assistant|>` tag, the model sometimes overwrites or fails to complete properly.
52
+ - Adding `<|assistant|>` clearly indicates the model’s turn, reducing hallucinations.
53
+ - **Temperature is capped at ~1.0** because higher values (e.g., 1.2+) produce creative but inaccurate outputs.
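
The prompt layout above, including the trailing `<|assistant|>` tag, can be sketched as a small helper. This is only an illustration: `build_prompt` and its parameter names are hypothetical and not taken from `app.py`; the tags themselves are the ones shown in the prompt format.

```python
def build_prompt(system_text: str, user_text: str) -> str:
    """Assemble the tagged prompt (hypothetical helper, not from app.py).

    The string always ends with the assistant tag so the model sees
    that it is its turn to answer.
    """
    return (
        "<|system|>\n"
        f"{system_text}\n"
        "<|end|>\n"
        "<|user|>\n"
        f"{user_text}\n"
        "<|end|>\n"
        "<|assistant|>\n"
    )


if __name__ == "__main__":
    print(build_prompt(
        "You are a helpful assistant that describes images very concisely.",
        "What is in this picture?",
    ))
```

Keeping the template in one function makes it harder to drop the final tag by accident when the system or user text changes.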
54
+
55
+ ### 2. DeepSeek-R1-Distill-Qwen-1.5B (Text-to-Code)
56
+
57
+ - Generates Python or other code from natural language prompts.
58
+ - Uses chat-based prompting with:
59
+   - `<think>...</think>` block for reasoning.
60
+   - Final answer separated to improve clarity.
61
+
62
+ #### 🔧 Prompt Engineering Insight
63
+
64
+ - Initially used no system prompt → vague reasoning.
65
+ - Adding a system prompt improved guidance.
66
+ - Separating "thinking" and "final answer" boosted relevance.
67
+ - Future improvement: split thinking and answer into **separate UI tabs**.
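
One possible way to separate the `<think>...</think>` reasoning from the final answer is sketched below. `split_thinking` is a hypothetical helper, not the app's actual code; it assumes at most one reasoning block and tolerates a missing opening tag, since some chat templates may place `<think>` in the prompt rather than in the output.

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Return (thinking, answer) from raw model output (hypothetical helper).

    Assumes at most one <think>...</think> block at the start of the text.
    """
    head, sep, tail = text.partition("</think>")
    if not sep:
        # No closing tag yet: either the model is still reasoning
        # (streaming in progress) or there is no reasoning block at all.
        if "<think>" in text:
            return text.replace("<think>", "", 1).strip(), ""
        return "", text.strip()
    thinking = head.replace("<think>", "", 1).strip()
    return thinking, tail.strip()


if __name__ == "__main__":
    thinking, answer = split_thinking("<think>Use slicing.</think>\nprint(s[::-1])")
    print("THINKING:", thinking)
    print("ANSWER:", answer)
```

Because the function also handles partially streamed text, the two return values could feed two separate output components, which is the direction the future-improvement bullet points to.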
68
+
69
+
70
+ ## 🖼️ Usage: Image Description Tab
71
+
72
+ - Upload an image.
73
+ - Write a natural prompt (e.g., "What is in this picture?")
74
+ - Adjust:
75
+   - `Temperature`: Higher values give more creative output; keep it at or below ~1.0 for stability.
76
+   - `Top-p`: Controls sampling diversity.
77
+   - `Max new tokens`: Maximum number of tokens to generate.
78
+ - Click **Generate** → streaming description appears.
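
As a rough sketch of how the three settings above might be turned into generation arguments with the temperature cap applied: `build_generation_kwargs` is a hypothetical helper, and the lower bounds (`0.1` for temperature, `0.05` for top-p) are assumptions chosen only to keep sampling valid. The dictionary keys match the keyword arguments accepted by `generate()` in `transformers`.

```python
MAX_TEMPERATURE = 1.0  # cap from the "~1.0" guidance in this README
MIN_TEMPERATURE = 0.1  # assumed floor; sampling needs a positive temperature
MIN_TOP_P = 0.05       # assumed floor for top-p


def build_generation_kwargs(temperature: float, top_p: float, max_new_tokens: int) -> dict:
    """Clamp UI values and return kwargs for model.generate() (hypothetical helper)."""
    temperature = min(max(float(temperature), MIN_TEMPERATURE), MAX_TEMPERATURE)
    top_p = min(max(float(top_p), MIN_TOP_P), 1.0)
    return {
        "do_sample": True,
        "temperature": temperature,
        "top_p": top_p,
        "max_new_tokens": max(1, int(max_new_tokens)),
    }


if __name__ == "__main__":
    # A slider value above the cap is pulled back to 1.0.
    print(build_generation_kwargs(1.5, 0.9, 64))
```

Clamping in code rather than only in the slider range keeps the cap in force even if the UI limits are changed later.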
79
+
80
+
81
+ ## 💻 Usage: Code Generation Tab
82
+
83
+ - Write a programming task (e.g., "Write Python code to reverse a string.")
84
+ - Adjust generation settings as above.
85
+ - Streaming output displays generated code.
86
+ - Generation may stop early if the prompt is vague → make the prompt more specific to improve results.
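
Streaming output in Gradio is typically done with a generator handler: each yielded value replaces the previous content of the output component, so the handler yields the text accumulated so far. The sketch below shows only that accumulation step. `stream_text` is a hypothetical name, and the chunk source is assumed to be something like a `TextIteratorStreamer`; a plain list stands in for it here.

```python
from typing import Iterable, Iterator


def stream_text(chunks: Iterable[str]) -> Iterator[str]:
    """Yield the cumulative text after each new chunk (hypothetical helper)."""
    text = ""
    for chunk in chunks:
        text += chunk
        yield text  # each yield would replace the output box contents


if __name__ == "__main__":
    # A list of strings stands in for the model's token stream.
    for partial in stream_text(["def rev(s):", "\n    ", "return s[::-1]"]):
        print(repr(partial))
```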
87
+
88
+
89
+ ## 🧠 Future Work
90
+
91
+ - Add a **separate tab** for model “thinking” (`<think>...</think>`) versus final code.
92
+ - Optional logging of input-output pairs to track hallucinations or failures.
93
+ - Add Markdown rendering for image descriptions.