Commit bec574b by ashish-soni08 · verified · 1 Parent(s): 00dc2e5

Update README.md

Files changed (1)
  1. README.md +98 -17
README.md CHANGED
@@ -7,31 +7,112 @@ sdk: gradio
  sdk_version: 5.31.0
  app_file: app.py
  pinned: false
- license: afl-3.0
  ---

- # Image Captioning App

- This application provides a simple interface to generate captions for images using a pre-trained model from Hugging Face's Transformers library.

- ## Features

- - **Image Captioning**: Automatically generate descriptive captions for uploaded images.
- - **User-Friendly Interface**: Built using Gradio for an easy-to-use web interface.

- ## Model

- - **Model Used**: [Salesforce/blip-image-captioning-base](https://huggingface.co/Salesforce/blip-image-captioning-base)
- - **Framework**: Hugging Face Transformers

- ## Software Packages

- - **Gradio**: Used to create the web interface.
- - **Transformers**: Used for model inference.
- - **Spaces**: Utilized for GPU acceleration during model execution.

- ## How to Use

- 1. Upload an image using the "Upload image" button.
- 2. The app will automatically generate and display a caption for the image.
- 3. The generated caption will appear in the textbox labeled "Caption".

  sdk_version: 5.31.0
  app_file: app.py
  pinned: false
+ license: apache-2.0
  ---

+ # Image Captioning App 🖼️📝

+ A web-based image captioning tool that automatically generates descriptive captions for uploaded images using state-of-the-art computer vision models. Built with Gradio and deployed on Hugging Face Spaces.

+ ![Demo Screenshot](image-captioning-logo.png)

+ ## 🚀 Live Demo

+ Try the app: [Image-Captioning](https://huggingface.co/spaces/ashish-soni08/Image-Captioning)

+ ## ✨ Features

+ - **Automatic Caption Generation**: Upload any image and get descriptive captions instantly
+ - **Visual Understanding**: AI model analyzes objects, scenes, and activities in images
+ - **Clean Interface**: Intuitive web UI built with Gradio for seamless image uploads
+ - **Responsive Design**: Works on desktop and mobile devices

+ ## 🛠️ Technology Stack

+ - **Backend**: Python, Hugging Face Transformers
+ - **Frontend**: Gradio
+ - **Model**: [Salesforce/blip-image-captioning-base](https://huggingface.co/Salesforce/blip-image-captioning-base)
+ - **Deployment**: Hugging Face Spaces

+ ## 🏃‍♂️ Quick Start
+
+ ### Prerequisites
+
+ ```bash
+ Python 3.8+
+ pip
+ ```
+
+ ### Installation
+
+ 1. Clone the repository:
+ ```bash
+ git clone https://github.com/Ashish-Soni08/image-captioning-app.git
+ cd image-captioning-app
+ ```
+
+ 2. Install dependencies:
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ 3. Run the application:
+ ```bash
+ python app.py
+ ```
+
+ 4. Open your browser and navigate to `http://localhost:7860`
+
+ ## 📋 Usage
+
+ 1. **Upload Image**: Click the "Upload image" button and select an image from your device
+ 2. **Generate Caption**: The app automatically processes the image and generates a caption
+ 3. **View Results**: The descriptive caption appears in the output textbox
+
+ ### Example
+
+ **Input Image:**
+ ```
+ [A photo of a golden retriever playing in a park]
+ ```
+
+ **Generated Caption:**
+ ```
+ "A golden retriever dog playing with a ball in a grassy park on a sunny day"
+ ```
+
+ ## 🧠 Model Information
+
+ This app uses **Salesforce/blip-image-captioning-base**, a vision-language model for image captioning:
+
+ - **Architecture**: BLIP with ViT-Base backbone
+ - **Model Size**: ~990MB (PyTorch model file)
+ - **Training Data**: COCO dataset with bootstrapped captions from web data
+ - **Capabilities**: Both conditional and unconditional image captioning
+ - **Performance**: State-of-the-art results on image captioning benchmarks (+2.8% CIDEr improvement)
+
+ ## 📁 Project Structure
+
+ ```
+ image-captioning-app/
+ ├── app.py              # Main Gradio application
+ ├── requirements.txt    # Python dependencies
+ ├── README.md           # Project documentation
+ └── example_images/     # Sample images for testing
+ ```
+
+ ## 📄 License
+
+ This project is licensed under the Apache License 2.0
+
+ ## 🙏 Acknowledgments
+
+ - [Hugging Face](https://huggingface.co/) for the Transformers library and model hosting
+ - [Gradio](https://gradio.app/) for the web interface framework
+ - [Salesforce Research](https://github.com/salesforce/BLIP) for the BLIP model
+
+ ## 📞 Contact
+
+ Ashish Soni - ashish.soni2091@gmail.com
+
+ Project Link: [github](https://github.com/Ashish-Soni08/image-captioning-app)
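
The Model Information entry added in this commit says the checkpoint supports both conditional and unconditional captioning. A minimal sketch of the two modes follows, using the `BlipProcessor` and `BlipForConditionalGeneration` classes from Transformers. The helper names `load_blip` and `caption_image`, the lazy import, and the `max_new_tokens` default are assumptions made for illustration. The repository's `app.py` is not part of this diff and may be structured differently.

```python
from typing import Any, Optional, Tuple

# Checkpoint named in the README's Technology Stack section.
MODEL_ID = "Salesforce/blip-image-captioning-base"


def load_blip(model_id: str = MODEL_ID) -> Tuple[Any, Any]:
    """Load the BLIP processor and model (hypothetical helper).

    Weights are downloaded from the Hugging Face Hub on first use.
    """
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForConditionalGeneration.from_pretrained(model_id)
    return processor, model


def caption_image(
    image: Any,
    processor: Any,
    model: Any,
    prompt: Optional[str] = None,
    max_new_tokens: int = 30,  # assumed default, not taken from the README
) -> str:
    """Return a caption for a PIL image (hypothetical helper).

    With prompt=None the caption is unconditional. With a prompt such as
    "a photography of", the model continues that text (conditional captioning).
    """
    if prompt is None:
        inputs = processor(images=image, return_tensors="pt")
    else:
        inputs = processor(images=image, text=prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True).strip()
```

Under these assumptions, one would call `load_blip()` once at startup and pass a `PIL.Image` to `caption_image`. In a Gradio app, a wrapper around that function could be handed to `gr.Interface` with `inputs=gr.Image(type="pil")` and `outputs="text"`.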