---
title: Image Captioning
emoji: 🖼️📝
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: 5.31.0
app_file: app.py
pinned: false
license: apache-2.0
---
# Image Captioning App 🖼️📝

A web-based image captioning tool that automatically generates descriptive captions for uploaded images using state-of-the-art computer vision models. Built with Gradio and deployed on Hugging Face Spaces.
## 🚀 Live Demo

Try the app: Image-Captioning
## ✨ Features
- **Automatic Caption Generation**: Upload any image and get a descriptive caption instantly
- **Visual Understanding**: The AI model analyzes objects, scenes, and activities in images
- **Clean Interface**: Intuitive web UI built with Gradio for seamless image uploads
- **Responsive Design**: Works on desktop and mobile devices
## 🛠️ Technology Stack
- Backend: Python, Hugging Face Transformers
- Frontend: Gradio
- Model: Salesforce/blip-image-captioning-base
- Deployment: Hugging Face Spaces
## 🏃‍♂️ Quick Start

### Prerequisites

- Python 3.8+
- pip

### Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/Ashish-Soni08/image-captioning-app.git
   cd image-captioning-app
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Run the application:

   ```bash
   python app.py
   ```

4. Open your browser and navigate to `http://localhost:7860`
## 📝 Usage

1. **Upload Image**: Click the "Upload image" button and select an image from your device
2. **Generate Caption**: The app automatically processes the image and generates a caption
3. **View Results**: The descriptive caption appears in the output textbox
### Example

**Input Image:**

[A photo of a golden retriever playing in a park]

**Generated Caption:**

> "A golden retriever dog playing with a ball in a grassy park on a sunny day"
## 🧠 Model Information
This app uses Salesforce/blip-image-captioning-base, a vision-language model for image captioning:
- Architecture: BLIP with ViT-Base backbone
- Model Size: ~990MB (PyTorch model file)
- Training Data: COCO dataset with bootstrapped captions from web data
- Capabilities: Both conditional and unconditional image captioning
- Performance: State-of-the-art results on image captioning benchmarks (+2.8% CIDEr improvement)
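The conditional/unconditional distinction can be shown with the Transformers BLIP classes directly: passing no text yields a free-form caption, while a text prompt steers the output. A minimal sketch (the image path is hypothetical):

```python
from functools import lru_cache
from typing import Optional

from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

MODEL_ID = "Salesforce/blip-image-captioning-base"


@lru_cache(maxsize=1)
def _load():
    # Downloads the model (~990 MB) on first call, then reuses it
    return (BlipProcessor.from_pretrained(MODEL_ID),
            BlipForConditionalGeneration.from_pretrained(MODEL_ID))


def caption(image: Image.Image, prompt: Optional[str] = None) -> str:
    """Unconditional caption when prompt is None; prompt-steered otherwise."""
    processor, model = _load()
    inputs = processor(images=image, text=prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=30)
    return processor.decode(out[0], skip_special_tokens=True)


# img = Image.open("example_images/dog.jpg")   # hypothetical sample path
# caption(img)                       -> free-form description
# caption(img, "a photography of")   -> caption continuing the prompt
```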
## 📁 Project Structure

```
image-captioning-app/
├── app.py              # Main Gradio application
├── requirements.txt    # Python dependencies
├── README.md           # Project documentation
└── example_images/     # Sample images for testing
```
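The dependency list is small. A plausible `requirements.txt`, assuming the app needs only Gradio and Transformers with a PyTorch backend (the repository's actual file may pin versions):

```text
gradio
transformers
torch
Pillow
```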
## 📄 License

This project is licensed under the Apache License 2.0.
## 🙏 Acknowledgments
- Hugging Face for the Transformers library and model hosting
- Gradio for the web interface framework
- Salesforce Research for the BLIP model
## 📞 Contact

Ashish Soni - ashish.soni2091@gmail.com

Project Link: GitHub