---
title: Image Captioning
emoji: πŸ–ΌοΈπŸ“
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: 5.31.0
app_file: app.py
pinned: false
license: apache-2.0
---
# Image Captioning App πŸ–ΌοΈπŸ“
A web-based image captioning tool that automatically generates descriptive captions for uploaded images using a state-of-the-art vision-language model. Built with Gradio and deployed on Hugging Face Spaces.
![Demo Screenshot](image-captioning-logo.png)
## πŸš€ Live Demo
Try the app: [Image-Captioning](https://huggingface.co/spaces/ashish-soni08/Image-Captioning)
## ✨ Features
- **Automatic Caption Generation**: Upload any image and get descriptive captions instantly
- **Visual Understanding**: AI model analyzes objects, scenes, and activities in images
- **Clean Interface**: Intuitive web UI built with Gradio for seamless image uploads
- **Responsive Design**: Works on desktop and mobile devices
## πŸ› οΈ Technology Stack
- **Backend**: Python, Hugging Face Transformers
- **Frontend**: Gradio
- **Model**: [Salesforce/blip-image-captioning-base](https://huggingface.co/Salesforce/blip-image-captioning-base)
- **Deployment**: Hugging Face Spaces
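
To illustrate how these pieces fit together, here is a minimal sketch that loads the BLIP model through the Transformers `image-to-text` pipeline and wraps it in a Gradio interface. It is a simplified approximation, not necessarily identical to the actual `app.py`; the component labels are assumptions.

```python
import gradio as gr
from transformers import pipeline

# Load the BLIP captioning model once at startup
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

def generate_caption(image):
    # The pipeline returns a list like [{"generated_text": "..."}]
    return captioner(image)[0]["generated_text"]

demo = gr.Interface(
    fn=generate_caption,
    inputs=gr.Image(type="pil", label="Upload image"),
    outputs=gr.Textbox(label="Caption"),
    title="Image Captioning",
)

if __name__ == "__main__":
    demo.launch()
```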
## πŸƒβ€β™‚οΈ Quick Start
### Prerequisites
```bash
Python 3.8+
pip
```
### Installation
1. Clone the repository:
```bash
git clone https://github.com/Ashish-Soni08/image-captioning-app.git
cd image-captioning-app
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Run the application:
```bash
python app.py
```
4. Open your browser and navigate to `http://localhost:7860`
## πŸ“‹ Usage
1. **Upload Image**: Click the "Upload image" button and select an image from your device
2. **Generate Caption**: The app automatically processes the image and generates a caption
3. **View Results**: The descriptive caption appears in the output textbox
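
Beyond the web UI, the hosted Space can also be queried programmatically. The snippet below is a rough sketch using the `gradio_client` package; it assumes a recent client version and the default `/predict` endpoint, and `local_photo.jpg` is a hypothetical file path.

```python
from gradio_client import Client, handle_file

# Connect to the public Space and request a caption for a local image
client = Client("ashish-soni08/Image-Captioning")
caption = client.predict(
    handle_file("local_photo.jpg"),  # hypothetical path to an image on disk
    api_name="/predict",             # assumed default endpoint name
)
print(caption)
```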
### Example
**Input Image:**
```
[A photo of a golden retriever playing in a park]
```
**Generated Caption:**
```
"A golden retriever dog playing with a ball in a grassy park on a sunny day"
```
## 🧠 Model Information
This app uses **Salesforce/blip-image-captioning-base**, a vision-language model for image captioning:
- **Architecture**: BLIP with ViT-Base backbone
- **Model Size**: ~990MB (PyTorch model file)
- **Training Data**: Pre-trained on web image-text pairs with bootstrapped (CapFilt) captions, then fine-tuned on the COCO captioning dataset
- **Capabilities**: Both conditional and unconditional image captioning (see the sketch below)
- **Performance**: State-of-the-art results on image captioning benchmarks at release (+2.8% CIDEr improvement)
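
For reference, a minimal sketch of using the model directly with Transformers, covering both captioning modes (the image path and text prompt are illustrative placeholders):

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("example_images/dog.jpg").convert("RGB")  # hypothetical sample image

# Conditional captioning: the model completes a caption from a text prompt
inputs = processor(image, "a photo of", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))

# Unconditional captioning: the model describes the image from scratch
inputs = processor(image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```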
## πŸ“ Project Structure
```
image-captioning-app/
β”œβ”€β”€ app.py # Main Gradio application
β”œβ”€β”€ requirements.txt # Python dependencies
β”œβ”€β”€ README.md # Project documentation
└── example_images/ # Sample images for testing
```
## πŸ“„ License
This project is licensed under the Apache License 2.0.
## πŸ™ Acknowledgments
- [Hugging Face](https://huggingface.co/) for the Transformers library and model hosting
- [Gradio](https://gradio.app/) for the web interface framework
- [Salesforce Research](https://github.com/salesforce/BLIP) for the BLIP model
## πŸ“ž Contact
Ashish Soni - ashish.soni2091@gmail.com

Project Link: [GitHub](https://github.com/Ashish-Soni08/image-captioning-app)