title: Image Captioning
emoji: 🖼️📝
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: 5.31.0
app_file: app.py
pinned: false
license: apache-2.0

Image Captioning App 🖼️📝

A web-based image captioning tool that generates descriptive captions for uploaded images using the Salesforce/blip-image-captioning-base vision-language model. Built with Gradio and deployed on Hugging Face Spaces.

Demo Screenshot

🚀 Live Demo

Try the app: Image-Captioning

✨ Features

  • Automatic Caption Generation: Upload any image and get descriptive captions instantly
  • Visual Understanding: AI model analyzes objects, scenes, and activities in images
  • Clean Interface: Intuitive web UI built with Gradio for seamless image uploads
  • Responsive Design: Works on desktop and mobile devices

🛠️ Technology Stack

🏃‍♂️ Quick Start

Prerequisites

Python 3.10+ (required by Gradio 5.x, the SDK version declared for this Space)
pip

Installation

  1. Clone the repository:
     git clone https://github.com/Ashish-Soni08/image-captioning-app.git
     cd image-captioning-app
  2. Install dependencies:
     pip install -r requirements.txt
  3. Run the application:
     python app.py
  4. Open your browser and navigate to http://localhost:7860

📋 Usage

  1. Upload Image: Click the "Upload image" button and select an image from your device
  2. Generate Caption: The app automatically processes the image and generates a caption
  3. View Results: The descriptive caption appears in the output textbox
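
The flow above could be wired up in Gradio roughly as follows. This is a minimal sketch, not the repository's actual app.py: `make_handler`, `build_demo`, and the `caption_fn` parameter are hypothetical names, and `live=True` is an assumption about how the automatic captioning in step 2 is triggered.

```python
def make_handler(caption_fn):
    """Wrap a captioning function so that a cleared upload yields an empty caption."""
    def handler(image):
        # Gradio passes None when no image is present (e.g. after clearing the upload).
        if image is None:
            return ""
        return caption_fn(image)
    return handler


def build_demo(caption_fn):
    """Build an image-in, text-out interface (hypothetical layout)."""
    # Imported lazily so this module can be imported without Gradio installed.
    import gradio as gr

    return gr.Interface(
        fn=make_handler(caption_fn),
        inputs=gr.Image(type="pil", label="Upload image"),
        outputs=gr.Textbox(label="Caption"),
        title="Image Captioning",
        live=True,  # assumption: re-run whenever the input image changes
    )


# Usage, given any function that maps a PIL image to a string:
#   build_demo(my_caption_fn).launch()  # Gradio serves on port 7860 by default
```

Keeping the handler separate from the interface makes it possible to check the empty-input behaviour without starting the web server.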

Example

Input Image:

[A photo of a golden retriever playing in a park]

Generated Caption:

"A golden retriever dog playing with a ball in a grassy park on a sunny day"

🧠 Model Information

This app uses Salesforce/blip-image-captioning-base, a vision-language model for image captioning:

  • Architecture: BLIP with ViT-Base backbone
  • Model Size: ~990 MB (PyTorch model file)
  • Training Data: COCO captions, combined with web image-text data whose noisy captions were bootstrapped (regenerated and filtered)
  • Capabilities: Both conditional (prompted with a text prefix) and unconditional image captioning
  • Performance: The BLIP authors report state-of-the-art image captioning results at the time of publication, including a +2.8% gain in CIDEr
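
The two captioning modes listed above can be sketched with the Transformers BLIP classes. This is an illustrative sketch, not necessarily how this app calls the model: `load_blip`, `caption_image`, the `backend` parameter, and the `max_new_tokens=30` default are assumptions introduced here.

```python
from functools import lru_cache

MODEL_ID = "Salesforce/blip-image-captioning-base"


@lru_cache(maxsize=1)
def load_blip():
    """Load processor and model once; the first call downloads the weights."""
    # Imported lazily so importing this module does not require transformers.
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(MODEL_ID)
    model = BlipForConditionalGeneration.from_pretrained(MODEL_ID)
    return processor, model


def caption_image(image, prompt=None, max_new_tokens=30, backend=None):
    """Caption a PIL image; passing a text prompt selects conditional captioning."""
    # `backend` is a hypothetical hook: a (processor, model) pair, e.g. for tests.
    processor, model = backend or load_blip()
    image = image.convert("RGB")
    if prompt:
        # Conditional: the caption continues the given prefix, e.g. "a photography of".
        inputs = processor(image, prompt, return_tensors="pt")
    else:
        # Unconditional: the caption is generated from the image alone.
        inputs = processor(image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True).strip()
```

With Pillow installed, `caption_image(Image.open("photo.jpg"))` would give an unconditional caption, and adding `prompt="a photography of"` a conditional one; `photo.jpg` is a placeholder path.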

📁 Project Structure

image-captioning-app/
├── app.py                 # Main Gradio application
├── requirements.txt       # Python dependencies
├── README.md              # Project documentation
└── example_images/        # Sample images for testing
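
As a rough illustration of how the sample images in example_images/ might be used for a quick smoke test, the sketch below walks the folder and captions each file. The helper names, the `caption_fn` parameter, and the set of file extensions are assumptions; the actual contents of the folder are not specified here.

```python
from pathlib import Path

# Assumed extensions; adjust to whatever the folder actually contains.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def list_example_images(folder="example_images"):
    """Return the image files in `folder`, sorted by path."""
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


def caption_examples(caption_fn, folder="example_images"):
    """Map each sample image's file name to the caption from `caption_fn`."""
    # Imported lazily so listing files does not require Pillow.
    from PIL import Image

    captions = {}
    for path in list_example_images(folder):
        with Image.open(path) as image:
            captions[path.name] = caption_fn(image)
    return captions
```

Here `caption_fn` stands for any callable that takes a PIL image and returns a string.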

📄 License

This project is licensed under the Apache License 2.0.

🙏 Acknowledgments

📞 Contact

Ashish Soni - ashish.soni2091@gmail.com

Project Link: https://github.com/Ashish-Soni08/image-captioning-app