---
title: Mustalhim AI
emoji: 👁
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.18.0
app_file: app.py
pinned: false
license: apache-2.0
---

# Mustalhim: Image to Audio Story Generator

**Mustalhim** (مستلهم), meaning "inspired" in Arabic, is an AI-powered application that transforms images into captivating audio stories. It uses state-of-the-art models for image captioning, story generation, and text-to-speech synthesis to create an immersive experience.

## Features

- **Image Captioning**: Generates a descriptive caption for an uploaded image using the `Salesforce/blip-image-captioning-large` model.
- **Story Generation**: Creates a long, engaging story inspired by the image caption using the `ALLaM-7B-Instruct-preview` model.
- **Text-to-Speech**: Converts the generated story into an audio file using the `kokoro` library.
- **Gradio Interface**: Provides an easy-to-use web interface for uploading images and listening to the generated audio.

## How It Works

1. **Image Captioning**: The app uses a pre-trained image captioning model to generate a textual description of the uploaded image.
2. **Story Generation**: The caption is passed to a text-generation model, which creates a long, creative story inspired by the caption.
3. **Text-to-Speech**: The generated story is converted into an audio file using a text-to-speech library.
4. **Output**: The app returns the audio file, which can be played directly in the interface.

## Demo

You can try the app live on Hugging Face Spaces:

[![Hugging Face Spaces](https://huggingface.co/spaces/MoJaff/Mustalhim_AI/badge)](https://huggingface.co/spaces/MoJaff/Mustalhim_AI)

---

## Files

- `app.py`: The main application script that defines the Gradio interface and integrates the image captioning, story generation, and text-to-speech functionalities.
- `requirements.txt`: Lists the Python dependencies required for the project.
- `Dockerfile`: Defines the environment for deploying the app on Hugging Face Spaces.
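
The three stages described under "How It Works" can be sketched as a small composition of callables. This is a minimal sketch, not the code in `app.py`: the names `build_prompt`, `make_story_pipeline`, and `load_captioner` and the prompt wording are assumptions made for illustration, and the story and text-to-speech stages are passed in as plain functions rather than tied to a specific API.

```python
from typing import Any, Callable


def build_prompt(caption: str) -> str:
    """Turn an image caption into a story prompt (wording is illustrative only)."""
    return f"Write a long, engaging story inspired by this scene: {caption.strip()}"


def make_story_pipeline(
    caption_fn: Callable[[Any], str],
    story_fn: Callable[[str], str],
    tts_fn: Callable[[str], Any],
) -> Callable[[Any], Any]:
    """Chain the stages: image -> caption -> story -> audio."""

    def run(image: Any) -> Any:
        caption = caption_fn(image)              # step 1: image captioning
        story = story_fn(build_prompt(caption))  # step 2: story generation
        return tts_fn(story)                     # step 3: text-to-speech

    return run


def load_captioner() -> Callable[[Any], str]:
    """Hypothetical helper: wrap the BLIP captioning model named in Features.

    The import is deferred so the rest of this sketch runs without
    `transformers` installed.
    """
    from transformers import pipeline

    captioner = pipeline(
        "image-to-text", model="Salesforce/blip-image-captioning-large"
    )
    # The image-to-text pipeline returns a list of dicts with "generated_text".
    return lambda image: captioner(image)[0]["generated_text"]
```

Keeping each stage behind a plain function makes it possible to swap models or to exercise the flow with stubs, for example `make_story_pipeline(lambda img: "a cat", str.upper, len)`, before loading any large model.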

---

## Requirements

- Python 3.9+
- Libraries:
  - `gradio`
  - `transformers`
  - `torch`
  - `soundfile`
  - `kokoro`
  - `sentencepiece`

---

## Example Usage

1. Upload an image using the Gradio interface.
2. The app will generate a caption for the image.
3. A story will be created based on the caption.
4. The story will be converted into an audio file, which you can listen to directly in the app.

---

## Screenshots

![App Screenshot](https://via.placeholder.com/600x400.png?text=App+Screenshot)

*Example of the Mustalhim interface.*

---

## Contributing

Contributions are welcome! If you'd like to improve this project, please follow these steps:

1. Fork the repository.
2. Create a new branch (`git checkout -b feature/YourFeature`).
3. Commit your changes (`git commit -m 'Add some feature'`).
4. Push to the branch (`git push origin feature/YourFeature`).
5. Open a pull request.

---

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

---

## Acknowledgments

- [Hugging Face](https://huggingface.co) for providing the models and deployment platform.
- [Gradio](https://gradio.app) for the easy-to-use interface.
- [Salesforce](https://salesforce.com) for the `blip-image-captioning-large` model.
- [ALLaM-AI](https://huggingface.co/ALLaM-AI) for the `ALLaM-7B-Instruct-preview` model.

---

## About the Name

**Mustalhim** (مستلهم) is an Arabic word meaning "inspired." This project is inspired by the power of AI to transform images into creative and engaging stories, bridging the gap between visual and auditory storytelling.

---

## Contact

For questions or feedback, feel free to reach out:

- **Name**: Mohammad Alkhatim
- **Email**: your-email@example.com
- **GitHub**: [MoJaff](https://github.com/MoJaff)
- **LinkedIn**: [Mohammad Alkhatim](https://www.linkedin.com/in/mohammad-alkhatim-9b1770266/)
- **Hugging Face**: [MoJaff](https://huggingface.co/MoJaff)

---

Experience the magic of **Mustalhim** and let your images inspire stories!

🚀 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference