title: Mustalhim AI
emoji: ๐Ÿ‘
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.18.0
app_file: app.py
pinned: false
license: apache-2.0

Mustalhim: Image to Audio Story Generator

Mustalhim (مستلهم), meaning "inspired" in Arabic, is an AI-powered application that transforms images into captivating audio stories. It uses state-of-the-art models for image captioning, story generation, and text-to-speech synthesis to create an immersive experience.

Features

  • Image Captioning: Generates a descriptive caption for an uploaded image using the Salesforce/blip-image-captioning-large model.
  • Story Generation: Creates a long, engaging story inspired by the image caption using the ALLaM-AI/ALLaM-7B-Instruct-preview model.
  • Text-to-Speech: Converts the generated story into an audio file using the kokoro library.
  • Gradio Interface: Provides an easy-to-use web interface for uploading images and listening to the generated audio.

How It Works

  1. Image Captioning: The app uses a pre-trained image captioning model to generate a textual description of the uploaded image.
  2. Story Generation: The caption is passed to a text-generation model, which creates a long, creative story inspired by the caption.
  3. Text-to-Speech: The generated story is converted into an audio file using a text-to-speech library.
  4. Output: The app returns the audio file, which can be played directly in the interface.
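
The four steps above can be sketched as a small orchestration function. This is a minimal sketch under stated assumptions, not the code in app.py: the function names (`build_story_prompt`, `image_to_audio_story`, `load_caption_and_story_stages`), the prompt wording, and the `max_new_tokens` value are hypothetical. Each stage is passed in as a plain callable so the flow can be exercised without downloading any model. The text-to-speech stage is left as a parameter because this README does not say which kokoro voice or language settings the app uses.

```python
from typing import Any, Callable, Tuple


def build_story_prompt(caption: str) -> str:
    """Turn an image caption into an instruction for the story model.

    The wording is hypothetical; app.py may phrase its prompt differently.
    """
    return f"Write a long, engaging story inspired by this scene: {caption.strip()}"


def image_to_audio_story(
    image: Any,
    caption_image: Callable[[Any], str],
    generate_story: Callable[[str], str],
    synthesize_speech: Callable[[str], str],
) -> Tuple[str, str, str]:
    """Run caption -> story -> speech and return (caption, story, audio_path)."""
    caption = caption_image(image)                        # step 1: image captioning
    story = generate_story(build_story_prompt(caption))   # step 2: story generation
    audio_path = synthesize_speech(story)                 # step 3: text-to-speech
    return caption, story, audio_path                     # step 4: output


def load_caption_and_story_stages(max_new_tokens: int = 512):
    """Build the first two stages with transformers pipelines.

    Assumptions: the "image-to-text" task as available in transformers 4.x,
    and generation settings chosen for illustration only. The import is lazy
    so the orchestration above stays usable without the models installed.
    """
    from transformers import pipeline

    captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-large")
    writer = pipeline("text-generation", model="ALLaM-AI/ALLaM-7B-Instruct-preview")

    def caption_image(image: Any) -> str:
        # The captioning pipeline returns a list of dicts with "generated_text".
        return captioner(image)[0]["generated_text"]

    def generate_story(prompt: str) -> str:
        # return_full_text=False keeps only the continuation, not the prompt.
        result = writer(prompt, max_new_tokens=max_new_tokens, return_full_text=False)
        return result[0]["generated_text"]

    return caption_image, generate_story
```

Keeping the stages as injected callables means any one of them can be swapped (for example, a different captioning model) without touching the orchestration.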

Demo

You can try the app live on Hugging Face Spaces.


Files

  • app.py: The main application script that defines the Gradio interface and integrates the image captioning, story generation, and text-to-speech functionalities.
  • requirements.txt: Lists the Python dependencies required for the project.
  • Dockerfile: Defines the environment for deploying the app on Hugging Face Spaces.

Requirements

  • Python 3.9+
  • Libraries:
    • gradio
    • transformers
    • torch
    • soundfile
    • kokoro
    • sentencepiece
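
A quick way to confirm an environment meets the requirements above is a small check script. This is a sketch only: the helper names (`python_is_supported`, `missing_packages`) are hypothetical, and it assumes each library's import name matches the name listed here. It checks only whether each package can be found, not whether its version is compatible.

```python
import sys
from importlib.util import find_spec

# Package names as listed in the Requirements section.
REQUIRED_PACKAGES = ["gradio", "transformers", "torch", "soundfile", "kokoro", "sentencepiece"]


def python_is_supported(version_info=sys.version_info, minimum=(3, 9)) -> bool:
    """Return True when the interpreter is at least the minimum (major, minor)."""
    return tuple(version_info[:2]) >= minimum


def missing_packages(packages=REQUIRED_PACKAGES, is_installed=None):
    """Return the packages that cannot be located in the current environment."""
    if is_installed is None:
        # find_spec returns None for a top-level module that is not installed.
        is_installed = lambda name: find_spec(name) is not None
    return [name for name in packages if not is_installed(name)]


if __name__ == "__main__":
    if not python_is_supported():
        print("Python 3.9 or newer is required.")
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```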

Example Usage

  1. Upload an image using the Gradio interface.
  2. The app will generate a caption for the image.
  3. A story will be created based on the caption.
  4. The story will be converted into an audio file, which you can listen to directly in the app.
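
The upload-and-listen flow above could be wired into Gradio roughly as follows. This is a hedged sketch rather than the contents of app.py: `make_handler` and `build_interface` are hypothetical names, the component labels are invented, and `run_pipeline` stands for any function that takes the uploaded image and returns the path of the generated audio file.

```python
from typing import Any, Callable, Optional


def make_handler(run_pipeline: Callable[[Any], str]) -> Callable[[Any], Optional[str]]:
    """Wrap a pipeline function so an empty upload returns no audio instead of failing."""

    def handler(image: Any) -> Optional[str]:
        if image is None:  # Gradio passes None when no image was uploaded
            return None
        return run_pipeline(image)  # expected to return a path to an audio file

    return handler


def build_interface(handler: Callable[[Any], Optional[str]]):
    """Create the web interface: one image input, one audio output."""
    import gradio as gr  # imported lazily so the handler can be used without Gradio

    return gr.Interface(
        fn=handler,
        inputs=gr.Image(type="pil", label="Upload an image"),
        outputs=gr.Audio(type="filepath", label="Generated story"),
        title="Mustalhim: Image to Audio Story Generator",
    )
```

With a real pipeline function in hand, `build_interface(make_handler(run_pipeline)).launch()` would start the app locally.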

Screenshots

Figure: App screenshot showing an example of the Mustalhim interface.


Contributing

Contributions are welcome! If you'd like to improve this project, please follow these steps:

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature/YourFeature).
  3. Commit your changes (git commit -m 'Add some feature').
  4. Push to the branch (git push origin feature/YourFeature).
  5. Open a pull request.

License

This project is licensed under the Apache License 2.0 (license: apache-2.0 in the Space metadata). See the LICENSE file for details.


Acknowledgments

  • Hugging Face for providing the models and deployment platform.
  • Gradio for the easy-to-use interface.
  • Salesforce for the blip-image-captioning-large model.
  • ALLaM-AI for the ALLaM-7B-Instruct-preview model.

About the Name

Mustalhim (مستلهم) is an Arabic word meaning "inspired." This project is inspired by the power of AI to transform images into creative and engaging stories, bridging the gap between visual and auditory storytelling.


Contact

For questions or feedback, feel free to reach out:


Experience the magic of Mustalhim and let your images inspire stories! 🚀

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference