---
title: Image Captioning
emoji: πŸ–ΌοΈπŸ“
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: 5.31.0
app_file: app.py
pinned: false
license: apache-2.0
---

# Image Captioning App πŸ–ΌοΈπŸ“

A web-based image captioning tool that automatically generates descriptive captions for uploaded images using state-of-the-art computer vision models. Built with Gradio and deployed on Hugging Face Spaces.

![Demo Screenshot](image-captioning-logo.png)

## πŸš€ Live Demo

Try the app: [Image-Captioning](https://huggingface.co/spaces/ashish-soni08/Image-Captioning)

## ✨ Features

- **Automatic Caption Generation**: Upload any image and get descriptive captions instantly
- **Visual Understanding**: AI model analyzes objects, scenes, and activities in images
- **Clean Interface**: Intuitive web UI built with Gradio for seamless image uploads
- **Responsive Design**: Works on desktop and mobile devices

## πŸ› οΈ Technology Stack

- **Backend**: Python, Hugging Face Transformers
- **Frontend**: Gradio
- **Model**: [Salesforce/blip-image-captioning-base](https://huggingface.co/Salesforce/blip-image-captioning-base)
- **Deployment**: Hugging Face Spaces

## πŸƒβ€β™‚οΈ Quick Start

### Prerequisites

```bash
Python 3.8+
pip
```

### Installation

1. Clone the repository:
```bash
git clone https://github.com/Ashish-Soni08/image-captioning-app.git
cd image-captioning-app
```

2. Install dependencies:
```bash
pip install -r requirements.txt
```

3. Run the application:
```bash
python app.py
```

4. Open your browser and navigate to `http://localhost:7860`

## πŸ“‹ Usage

1. **Upload Image**: Click the "Upload image" button and select an image from your device
2. **Generate Caption**: The app automatically processes the image and generates a caption
3. **View Results**: The descriptive caption appears in the output textbox

### Example

**Input Image:**
```
[A photo of a golden retriever playing in a park]
```

**Generated Caption:**
```
"A golden retriever dog playing with a ball in a grassy park on a sunny day"
```

## 🧠 Model Information

This app uses **Salesforce/blip-image-captioning-base**, a vision-language model for image captioning:

- **Architecture**: BLIP with ViT-Base backbone
- **Model Size**: ~990MB (PyTorch model file)
- **Training Data**: COCO dataset with bootstrapped captions from web data
- **Capabilities**: Both conditional and unconditional image captioning
- **Performance**: State-of-the-art results at release on image captioning benchmarks (+2.8% CIDEr on COCO reported in the BLIP paper)
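The two captioning modes listed above can be exercised by calling the model directly rather than through a pipeline. A minimal sketch, assuming a sample image exists at a hypothetical path; passing a text prompt to the processor switches from unconditional to conditional captioning:

```python
# Sketch of direct BLIP inference (not the app's code), showing
# unconditional vs. prompt-conditioned captioning.
from transformers import BlipProcessor, BlipForConditionalGeneration

MODEL_ID = "Salesforce/blip-image-captioning-base"


def generate_caption(image, prompt=None, max_new_tokens=30):
    """Caption a PIL image; pass `prompt` to steer the caption."""
    processor = BlipProcessor.from_pretrained(MODEL_ID)
    model = BlipForConditionalGeneration.from_pretrained(MODEL_ID)
    if prompt is None:
        inputs = processor(image, return_tensors="pt")          # unconditional
    else:
        inputs = processor(image, prompt, return_tensors="pt")  # conditional
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(out[0], skip_special_tokens=True)


if __name__ == "__main__":
    from PIL import Image

    img = Image.open("example_images/sample.jpg")  # hypothetical sample path
    print(generate_caption(img))                       # free-form caption
    print(generate_caption(img, prompt="a photo of"))  # prompt-steered caption
```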

## πŸ“ Project Structure

```
image-captioning-app/
β”œβ”€β”€ app.py                 # Main Gradio application
β”œβ”€β”€ requirements.txt       # Python dependencies
β”œβ”€β”€ README.md             # Project documentation
└── example_images/        # Sample images for testing
```

## πŸ“„ License

This project is licensed under the Apache License 2.0.

## πŸ™ Acknowledgments

- [Hugging Face](https://huggingface.co/) for the Transformers library and model hosting
- [Gradio](https://gradio.app/) for the web interface framework
- [Salesforce Research](https://github.com/salesforce/BLIP) for the BLIP model

## πŸ“ž Contact

Ashish Soni - ashish.soni2091@gmail.com

Project Link: [GitHub](https://github.com/Ashish-Soni08/image-captioning-app)