metadata

title: Image Captioning
emoji: 🖼️📝
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: 5.31.0
app_file: app.py
pinned: false
license: apache-2.0

Image Captioning App 🖼️📝

A web-based image captioning tool that generates descriptive captions for uploaded images using the BLIP vision-language model (Salesforce/blip-image-captioning-base). Built with Gradio and deployed on Hugging Face Spaces.

🚀 Live Demo

Try the app: Image-Captioning

✨ Features

Automatic Caption Generation: Upload any image and get descriptive captions instantly
Visual Understanding: AI model analyzes objects, scenes, and activities in images
Clean Interface: Intuitive web UI built with Gradio for seamless image uploads
Responsive Design: Works on desktop and mobile devices

🛠️ Technology Stack

Backend: Python, Hugging Face Transformers
Frontend: Gradio
Model: Salesforce/blip-image-captioning-base
Deployment: Hugging Face Spaces

🏃‍♂️ Quick Start

Prerequisites

Python 3.10+ (Gradio 5.x, the SDK version pinned above, does not support older Python versions)
pip

Installation

Clone the repository:

git clone https://github.com/Ashish-Soni08/image-captioning-app.git
cd image-captioning-app

Install dependencies:

pip install -r requirements.txt

Run the application:

python app.py

Open your browser and navigate to http://localhost:7860
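
If the page does not load, it can help to confirm that something is actually listening on the port. The snippet below is a minimal sketch using only the standard library. The helper name `is_port_open` is illustrative and not part of this project, and port 7860 is assumed from the URL above (it is Gradio's default, but app.py may override it).

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 7860, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_port_open()` from a second terminal while `python app.py` is running should return `True` once the server has finished starting; loading the model on first launch can take a while.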

📋 Usage

Upload Image: Click the "Upload image" button and select an image from your device
Generate Caption: The app automatically processes the image and generates a caption
View Results: The descriptive caption appears in the output textbox
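
The upload-then-caption flow described above could be wired up in Gradio roughly as follows. This is a sketch under assumptions, not the contents of the actual app.py: the function names `caption_or_hint` and `build_demo`, the hint message, and the use of `live=True` to trigger captioning on upload are all illustrative choices. `caption_fn` stands in for whatever function calls the model.

```python
def caption_or_hint(image, caption_fn):
    """Gradio handler: return a caption, or a hint when no image is loaded.

    `caption_fn` is any callable that maps a PIL image to a string
    (hypothetical here; the real app would call the BLIP model).
    """
    if image is None:  # Gradio passes None when the image input is cleared
        return "Upload an image to generate a caption."
    return caption_fn(image).strip()


def build_demo(caption_fn):
    """Build an upload-and-caption interface (assumed layout, not the actual app.py)."""
    import gradio as gr  # imported lazily so the handler above has no dependencies

    return gr.Interface(
        fn=lambda image: caption_or_hint(image, caption_fn),
        inputs=gr.Image(type="pil", label="Upload image"),
        outputs=gr.Textbox(label="Caption"),
        title="Image Captioning",
        live=True,  # rerun whenever the input image changes
    )
```

To try the layout without downloading a model, a placeholder works: `build_demo(lambda image: f"image of size {image.size}").launch()`.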

Example

Input Image:

[A photo of a golden retriever playing in a park]

Generated Caption:

"A golden retriever dog playing with a ball in a grassy park on a sunny day"

🧠 Model Information

This app uses Salesforce/blip-image-captioning-base, a vision-language model for image captioning:

Architecture: BLIP with ViT-Base backbone
Model Size: ~990MB (PyTorch model file)
Training Data: Pretrained on web image-text pairs with bootstrapped (generated and filtered) captions; this checkpoint is trained for captioning on the COCO dataset
Capabilities: Both conditional and unconditional image captioning
Performance: The BLIP paper reports state-of-the-art image captioning results at the time of publication, including a +2.8% CIDEr improvement over the previous best
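
The difference between conditional and unconditional captioning can be sketched with the Transformers BLIP classes. This follows the usage pattern from the model card; whether app.py does exactly this is an assumption, and the names `load_blip` and `generate_caption` as well as the `max_new_tokens` default are illustrative.

```python
MODEL_ID = "Salesforce/blip-image-captioning-base"


def load_blip(model_id: str = MODEL_ID):
    """Load the processor and model from the Hugging Face Hub (downloads on first use)."""
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForConditionalGeneration.from_pretrained(model_id)
    return processor, model


def generate_caption(image, processor, model, prompt=None, max_new_tokens=30):
    """Caption a PIL image; pass `prompt` for conditional captioning."""
    if prompt is None:
        # Unconditional: the model describes the image from scratch
        inputs = processor(images=image, return_tensors="pt")
    else:
        # Conditional: the caption continues the given text prefix
        inputs = processor(images=image, text=prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True)


# Example usage (the file name is a placeholder):
#   from PIL import Image
#   processor, model = load_blip()
#   image = Image.open("photo.jpg").convert("RGB")
#   generate_caption(image, processor, model)                       # unconditional
#   generate_caption(image, processor, model, prompt="a photo of")  # conditional
```

Passing the processor and model in as arguments keeps the model loaded once at startup rather than on every request.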

📁 Project Structure

image-captioning-app/
├── app.py                 # Main Gradio application
├── requirements.txt       # Python dependencies
├── README.md             # Project documentation
└── example_images/        # Sample images for testing
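
One way the example_images/ folder could be used is to collect its files and offer them as clickable samples in the UI. The helper below is a sketch; its name, the accepted file extensions, and the idea of passing the result to Gradio's `examples` argument are assumptions rather than a description of the actual app.py.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def list_example_images(folder="example_images"):
    """Return sorted paths of image files in `folder`, or [] if the folder is missing."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        str(path)
        for path in root.iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )
```

The returned list could then be passed as `examples=list_example_images()` when constructing a `gr.Interface` with a single image input.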

📄 License

This project is licensed under the Apache License 2.0.

🙏 Acknowledgments

Hugging Face for the Transformers library and model hosting
Gradio for the web interface framework
Salesforce Research for the BLIP model

📞 Contact

Ashish Soni - ashish.soni2091@gmail.com

Project Link: https://github.com/Ashish-Soni08/image-captioning-app