MoJaff committed on Feb 26

Commit 5c9a65a (verified) · 1 Parent(s): 4454fff

Update README.md

Files changed (1)

README.md +113 -1

@@ -2,12 +2,124 @@
 title: Mustalhim AI
 emoji: 👁
 colorFrom: indigo
-colorTo: gray
+colorTo: blue
 sdk: gradio
 sdk_version: 5.18.0
 app_file: app.py
 pinned: false
 license: apache-2.0
 ---
+# Mustalhim: Image to Audio Story Generator
+![Gradio Interface](https://img.shields.io/badge/Interface-Gradio-ff69b4)
+![Hugging Face Spaces](https://img.shields.io/badge/Deploy-Hugging%20Face%20Spaces-blue)
+![Python](https://img.shields.io/badge/Language-Python-green)
+**Mustalhim** (مستلهم), meaning "inspired" in Arabic, is an AI-powered application that turns an uploaded image into a narrated audio story. It chains three stages: image captioning, story generation, and text-to-speech synthesis.
+## Features
+- **Image Captioning**: Generates a descriptive caption for an uploaded image using the `Salesforce/blip-image-captioning-large` model.
+- **Story Generation**: Creates a long story inspired by the image caption using the `ALLaM-AI/ALLaM-7B-Instruct-preview` model.
+- **Text-to-Speech**: Converts the generated story into an audio file using the `kokoro` library.
+- **Gradio Interface**: Provides an easy-to-use web interface for uploading images and listening to the generated audio.
+## How It Works
+1. **Image Captioning**: The app uses a pre-trained image captioning model to generate a textual description of the uploaded image.
+2. **Story Generation**: The caption is passed to a text-generation model, which creates a long, creative story inspired by the caption.
+3. **Text-to-Speech**: The generated story is converted into an audio file using a text-to-speech library.
+4. **Output**: The app returns the audio file, which can be played directly in the interface.
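
The four steps listed under "How It Works" amount to a linear pipeline. The sketch below is not taken from `app.py`; it only illustrates one way to compose the stages, assuming each stage is passed in as a plain callable. The names `image_to_audio_story`, `StoryResult`, and the three `fake_*` stand-ins are hypothetical, and the stand-ins replace the real models so the flow can be followed without downloading anything.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class StoryResult:
    """Everything the pipeline produced, kept for display or debugging."""
    caption: str
    story: str
    audio_path: str


def image_to_audio_story(
    image: Any,
    caption_image: Callable[[Any], str],
    write_story: Callable[[str], str],
    synthesize: Callable[[str], str],
) -> StoryResult:
    """Run captioning, story generation, and speech synthesis in order."""
    caption = caption_image(image).strip()
    if not caption:
        raise ValueError("captioning stage returned an empty caption")

    story = write_story(caption).strip()
    if not story:
        raise ValueError("story stage returned an empty story")

    # The synthesizer is expected to write an audio file and return its path.
    audio_path = synthesize(story)
    return StoryResult(caption=caption, story=story, audio_path=audio_path)


# Hypothetical stand-ins for the three models named in the README.
def fake_caption(image: Any) -> str:
    return "a cat sleeping on a windowsill"


def fake_story(caption: str) -> str:
    return f"Once upon a time, there was {caption}."


def fake_tts(story: str) -> str:
    return "story.wav"


result = image_to_audio_story(object(), fake_caption, fake_story, fake_tts)
print(result.caption)
print(result.story)
print(result.audio_path)
```

Passing the stages in as arguments keeps the control flow testable on its own; in a real app each stand-in would wrap the corresponding model call.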
+## Demo
+You can try the app live on Hugging Face Spaces:
+[![Hugging Face Spaces](https://huggingface.co/spaces/MoJaff/Mustalhim_AI/badge)](https://huggingface.co/spaces/MoJaff/Mustalhim_AI)
+---
+## Files
+- `app.py`: The main application script that defines the Gradio interface and integrates the image captioning, story generation, and text-to-speech functionalities.
+- `requirements.txt`: Lists the Python dependencies required for the project.
+- `Dockerfile`: Defines the environment for deploying the app on Hugging Face Spaces.
+---
+## Requirements
+- Python 3.9+
+- Libraries:
+  - `gradio`
+  - `transformers`
+  - `torch`
+  - `soundfile`
+  - `kokoro`
+  - `sentencepiece`
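
A quick way to check an environment against the list above is to probe for each package before launching the app. This is a minimal sketch, not part of the repository: the helper `missing_packages` is hypothetical, and it assumes each package's import name matches the name in the list.

```python
import sys
from importlib.util import find_spec

# Package names copied from the Requirements list; import names are assumed identical.
REQUIRED = ["gradio", "transformers", "torch", "soundfile", "kokoro", "sentencepiece"]


def missing_packages(names):
    """Return the names that cannot be located on the current import path."""
    return [name for name in names if find_spec(name) is None]


def python_is_supported(minimum=(3, 9)):
    """Check the interpreter against the README's stated minimum version."""
    return sys.version_info >= minimum


print("Python OK:", python_is_supported())
print("Missing packages:", missing_packages(REQUIRED))
```

`find_spec` only locates a package without importing it, so the check stays fast even for heavy libraries such as `torch`.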
+---
+## Example Usage
+1. Upload an image using the Gradio interface.
+2. The app will generate a caption for the image.
+3. A story will be created based on the caption.
+4. The story will be converted into an audio file, which you can listen to directly in the app.
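
Step 4 depends on the synthesized speech being saved in a format the interface can play. The README lists `soundfile` for this; the sketch below swaps in Python's standard-library `wave` module so it runs without extra dependencies. The helper `write_wav`, the 24 kHz sample rate, and the sine tone standing in for speech are all assumptions made for illustration.

```python
import math
import struct
import tempfile
import wave
from pathlib import Path

SAMPLE_RATE = 24_000  # assumed rate; use whatever the TTS engine actually outputs


def write_wav(samples, path, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))  # guard against overflow
        frames += struct.pack("<h", int(clipped * 32767))
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return Path(path)


# Stand-in for synthesized speech: half a second of a 440 Hz tone.
tone = [
    0.3 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
    for n in range(SAMPLE_RATE // 2)
]
out_path = write_wav(tone, Path(tempfile.gettempdir()) / "mustalhim_demo.wav")
print(out_path)
```

The returned path is the kind of value an audio output component can be handed for playback.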
+---
+## Screenshots
+![App Screenshot](https://via.placeholder.com/600x400.png?text=App+Screenshot)
+*Example of the Mustalhim interface.*
+---
+## Contributing
+Contributions are welcome! If you'd like to improve this project, please follow these steps:
+1. Fork the repository.
+2. Create a new branch (`git checkout -b feature/YourFeature`).
+3. Commit your changes (`git commit -m 'Add some feature'`).
+4. Push to the branch (`git push origin feature/YourFeature`).
+5. Open a pull request.
+---
+## License
+This project is licensed under the Apache License 2.0, matching the `license: apache-2.0` field in the Space metadata. See the [LICENSE](LICENSE) file for details.
+---
+## Acknowledgments
+- [Hugging Face](https://huggingface.co) for providing the models and deployment platform.
+- [Gradio](https://gradio.app) for the easy-to-use interface.
+- [Salesforce](https://salesforce.com) for the `blip-image-captioning-large` model.
+- [ALLaM-AI](https://huggingface.co/ALLaM-AI) for the `ALLaM-7B-Instruct-preview` model.
+---
+## About the Name
+**Mustalhim** (مستلهم) is an Arabic word meaning "inspired." This project is inspired by the power of AI to transform images into creative and engaging stories, bridging the gap between visual and auditory storytelling.
+---
+## Contact
+For questions or feedback, feel free to reach out:
+- **Name**: Mohammad Alkhatim
+- **Email**: your-email@example.com
+- **GitHub**: [MoJaff](https://github.com/MoJaff)
+- **LinkedIn**: [Mohammad Alkhatim](https://www.linkedin.com/in/mohammad-alkhatim-9b1770266/)
+- **Hugging Face**: [MoJaff](https://huggingface.co/MoJaff)
+---
+Experience the magic of **Mustalhim** and let your images inspire stories! 🚀
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference