---
title: Image Captioning
emoji: 🖼️📝
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: 5.31.0
app_file: app.py
pinned: false
license: apache-2.0
---
# Image Captioning App 🖼️📝

A web-based image captioning tool that automatically generates descriptive captions for uploaded images using the BLIP vision-language model. Built with Gradio and deployed on Hugging Face Spaces.
## 🚀 Live Demo

Try the app: [Image-Captioning](https://huggingface.co/spaces/ashish-soni08/Image-Captioning)
## ✨ Features

- **Automatic Caption Generation**: Upload any image and get a descriptive caption instantly
- **Visual Understanding**: The model analyzes objects, scenes, and activities in images
- **Clean Interface**: Intuitive web UI built with Gradio for seamless image uploads
- **Responsive Design**: Works on desktop and mobile devices
## 🛠️ Technology Stack

- **Backend**: Python, Hugging Face Transformers
- **Frontend**: Gradio
- **Model**: [Salesforce/blip-image-captioning-base](https://huggingface.co/Salesforce/blip-image-captioning-base)
- **Deployment**: Hugging Face Spaces
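
For reference, a minimal sketch of how these pieces fit together in `app.py` (function names and UI labels here are illustrative, not necessarily the ones used in the repo):

```python
import gradio as gr
from transformers import pipeline

# Load the BLIP captioning model once at startup
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

def caption_image(image):
    # The pipeline returns a list like [{"generated_text": "..."}]
    return captioner(image)[0]["generated_text"]

demo = gr.Interface(
    fn=caption_image,
    inputs=gr.Image(type="pil", label="Upload image"),
    outputs=gr.Textbox(label="Generated caption"),
    title="Image Captioning",
)

if __name__ == "__main__":
    demo.launch()
```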
## 🏃‍♂️ Quick Start

### Prerequisites

```bash
Python 3.8+
pip
```
### Installation

1. Clone the repository:
```bash
git clone https://github.com/Ashish-Soni08/image-captioning-app.git
cd image-captioning-app
```
2. Install dependencies (a sample of the expected contents of `requirements.txt` is sketched after these steps):
```bash
pip install -r requirements.txt
```
3. Run the application:
```bash
python app.py
```
4. Open your browser and navigate to `http://localhost:7860`
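
The `requirements.txt` in the repository is authoritative; given the stack above, a minimal set of dependencies would look roughly like this (the version pins are assumptions, not taken from the repo):

```
gradio>=5.31.0
transformers
torch
Pillow
```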
## 📝 Usage

1. **Upload Image**: Click the "Upload image" button and select an image from your device
2. **Generate Caption**: The app automatically processes the image and generates a caption
3. **View Results**: The descriptive caption appears in the output textbox
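
You can also query the deployed Space programmatically with `gradio_client`. A short sketch, assuming the Space exposes Gradio's default `/predict` endpoint:

```python
from gradio_client import Client, handle_file

# Connect to the hosted Space and request a caption for a local image
client = Client("ashish-soni08/Image-Captioning")
caption = client.predict(handle_file("path/to/image.jpg"), api_name="/predict")
print(caption)
```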
### Example

**Input Image:**
```
[A photo of a golden retriever playing in a park]
```
**Generated Caption:**
```
"A golden retriever dog playing with a ball in a grassy park on a sunny day"
```
## 🧠 Model Information

This app uses **Salesforce/blip-image-captioning-base**, a vision-language model for image captioning:

- **Architecture**: BLIP with a ViT-Base backbone
- **Model Size**: ~990MB (PyTorch model file)
- **Training Data**: COCO dataset, with bootstrapped captions from web data
- **Capabilities**: Both conditional and unconditional image captioning (see the sketch below)
- **Performance**: State-of-the-art results on image captioning benchmarks (+2.8% CIDEr, per the BLIP paper)
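
The model card demonstrates both captioning modes via the Transformers API; a condensed version (the image path is a placeholder):

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("example_images/dog.jpg").convert("RGB")  # placeholder path

# Conditional captioning: a text prompt steers the caption
inputs = processor(image, "a photography of", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))

# Unconditional captioning: the model describes the image freely
inputs = processor(image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```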
## 📁 Project Structure

```
image-captioning-app/
├── app.py               # Main Gradio application
├── requirements.txt     # Python dependencies
├── README.md            # Project documentation
└── example_images/      # Sample images for testing
```
## 📄 License

This project is licensed under the Apache License 2.0.
## 🙏 Acknowledgments

- [Hugging Face](https://huggingface.co/) for the Transformers library and model hosting
- [Gradio](https://gradio.app/) for the web interface framework
- [Salesforce Research](https://github.com/salesforce/BLIP) for the BLIP model
## 📞 Contact

Ashish Soni - ashish.soni2091@gmail.com

Project Link: [GitHub](https://github.com/Ashish-Soni08/image-captioning-app)