---
title: Image Captioning
emoji: 🖼️📝
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: 5.31.0
app_file: app.py
pinned: false
license: apache-2.0
---
# Image Captioning App 🖼️📝

A web-based image captioning tool that automatically generates descriptive captions for uploaded images using the BLIP vision-language model. Built with Gradio and deployed on Hugging Face Spaces.
## 🚀 Live Demo

Try the app: [Image-Captioning](https://huggingface.co/spaces/ashish-soni08/Image-Captioning)
## ✨ Features

- **Automatic Caption Generation**: Upload any image and get a descriptive caption instantly
- **Visual Understanding**: The model analyzes objects, scenes, and activities in images
- **Clean Interface**: Intuitive web UI built with Gradio for seamless image uploads
- **Responsive Design**: Works on desktop and mobile devices
## 🛠️ Technology Stack

- **Backend**: Python, Hugging Face Transformers
- **Frontend**: Gradio
- **Model**: [Salesforce/blip-image-captioning-base](https://huggingface.co/Salesforce/blip-image-captioning-base)
- **Deployment**: Hugging Face Spaces
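
For reference, a minimal sketch of how these pieces fit together in `app.py` (function names and UI labels here are illustrative, not necessarily the ones used in the repo):

```python
import gradio as gr
from transformers import pipeline

# Load the BLIP captioning model once at startup
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

def caption_image(image):
    # The pipeline returns a list like [{"generated_text": "..."}]
    return captioner(image)[0]["generated_text"]

demo = gr.Interface(
    fn=caption_image,
    inputs=gr.Image(type="pil", label="Upload image"),
    outputs=gr.Textbox(label="Generated caption"),
    title="Image Captioning",
)

if __name__ == "__main__":
    demo.launch()
```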
## 🏃‍♂️ Quick Start

### Prerequisites

```bash
Python 3.8+
pip
```
### Installation

1. Clone the repository:
```bash
git clone https://github.com/Ashish-Soni08/image-captioning-app.git
cd image-captioning-app
```
2. Install dependencies (a sample of the expected contents of `requirements.txt` is sketched after these steps):
```bash
pip install -r requirements.txt
```
3. Run the application:
```bash
python app.py
```
4. Open your browser and navigate to `http://localhost:7860`
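
The `requirements.txt` in the repository is authoritative; given the stack above, a minimal set of dependencies would look roughly like this (the version pins are assumptions, not taken from the repo):

```
gradio>=5.31.0
transformers
torch
Pillow
```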
## 📝 Usage

1. **Upload Image**: Click the "Upload image" button and select an image from your device
2. **Generate Caption**: The app automatically processes the image and generates a caption
3. **View Results**: The descriptive caption appears in the output textbox
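
You can also query the deployed Space programmatically with `gradio_client`. A short sketch, assuming the Space exposes Gradio's default `/predict` endpoint:

```python
from gradio_client import Client, handle_file

# Connect to the hosted Space and request a caption for a local image
client = Client("ashish-soni08/Image-Captioning")
caption = client.predict(handle_file("path/to/image.jpg"), api_name="/predict")
print(caption)
```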
### Example

**Input Image:**
```
[A photo of a golden retriever playing in a park]
```
**Generated Caption:**
```
"A golden retriever dog playing with a ball in a grassy park on a sunny day"
```
## 🧠 Model Information

This app uses **Salesforce/blip-image-captioning-base**, a vision-language model for image captioning:

- **Architecture**: BLIP with a ViT-Base backbone
- **Model Size**: ~990MB (PyTorch model file)
- **Training Data**: COCO dataset, with bootstrapped captions from web data
- **Capabilities**: Both conditional and unconditional image captioning (see the sketch below)
- **Performance**: State-of-the-art results on image captioning benchmarks (+2.8% CIDEr, per the BLIP paper)
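
The model card demonstrates both captioning modes via the Transformers API; a condensed version (the image path is a placeholder):

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("example_images/dog.jpg").convert("RGB")  # placeholder path

# Conditional captioning: a text prompt steers the caption
inputs = processor(image, "a photography of", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))

# Unconditional captioning: the model describes the image freely
inputs = processor(image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```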
## 📁 Project Structure

```
image-captioning-app/
├── app.py               # Main Gradio application
├── requirements.txt     # Python dependencies
├── README.md            # Project documentation
└── example_images/      # Sample images for testing
```
## 📄 License

This project is licensed under the Apache License 2.0.
## 🙏 Acknowledgments

- [Hugging Face](https://huggingface.co/) for the Transformers library and model hosting
- [Gradio](https://gradio.app/) for the web interface framework
- [Salesforce Research](https://github.com/salesforce/BLIP) for the BLIP model
## 📞 Contact

Ashish Soni - ashish.soni2091@gmail.com

Project Link: [GitHub](https://github.com/Ashish-Soni08/image-captioning-app)