MoJaff committed on
Commit 5c9a65a · verified · 1 Parent(s): 4454fff

Update README.md

Files changed (1)
  1. README.md +113 -1
README.md CHANGED
@@ -2,12 +2,124 @@
  title: Mustalhim AI
  emoji: ๐Ÿ‘
  colorFrom: indigo
- colorTo: gray
+ colorTo: blue
  sdk: gradio
  sdk_version: 5.18.0
  app_file: app.py
  pinned: false
  license: apache-2.0
  ---
+ # Mustalhim: Image to Audio Story Generator
+
+ ![Gradio Interface](https://img.shields.io/badge/Interface-Gradio-ff69b4)
+ ![Hugging Face Spaces](https://img.shields.io/badge/Deploy-Hugging%20Face%20Spaces-blue)
+ ![Python](https://img.shields.io/badge/Language-Python-green)
+
+ **Mustalhim** (مستلهم), meaning "inspired" in Arabic, is an AI-powered application that turns images into narrated audio stories. It chains three pre-trained models, one each for image captioning, story generation, and text-to-speech synthesis, so that a single uploaded picture becomes a spoken story.
+
+ ## Features
+
+ - **Image Captioning**: Generates a descriptive caption for an uploaded image using the `Salesforce/blip-image-captioning-large` model.
+ - **Story Generation**: Creates a long, engaging story inspired by the image caption using the `ALLaM-7B-Instruct-preview` model.
+ - **Text-to-Speech**: Converts the generated story into an audio file using the `kokoro` library.
+ - **Gradio Interface**: Provides an easy-to-use web interface for uploading images and listening to the generated audio.
+
+ ## How It Works
+
+ 1. **Image Captioning**: The app runs the pre-trained `Salesforce/blip-image-captioning-large` model to generate a textual description of the uploaded image.
+ 2. **Story Generation**: The caption is passed to the `ALLaM-7B-Instruct-preview` text-generation model, which writes a long, creative story inspired by it.
+ 3. **Text-to-Speech**: The `kokoro` text-to-speech library converts the generated story into an audio file.
+ 4. **Output**: The app returns the audio file, which can be played directly in the interface.
+
+ ## Demo
+
+ You can try the app live on Hugging Face Spaces:
+ [![Hugging Face Spaces](https://huggingface.co/spaces/MoJaff/Mustalhim_AI/badge)](https://huggingface.co/spaces/MoJaff/Mustalhim_AI)
+
+ ---
+
+ ## Files
+
+ - `app.py`: The main application script that defines the Gradio interface and integrates the image captioning, story generation, and text-to-speech functionalities.
+ - `requirements.txt`: Lists the Python dependencies required for the project.
+ - `Dockerfile`: Defines the environment for deploying the app on Hugging Face Spaces.
+
+ ---
+
+ ## Requirements
+
+ - Python 3.9+
+ - Libraries:
+   - `gradio`
+   - `transformers`
+   - `torch`
+   - `soundfile`
+   - `kokoro`
+   - `sentencepiece`
+
+ ---
+
+ ## Example Usage
+
+ 1. Upload an image using the Gradio interface.
+ 2. The app will generate a caption for the image.
+ 3. A story will be created based on the caption.
+ 4. The story will be converted into an audio file, which you can listen to directly in the app.
+
+ ---
+
+ ## Screenshots
+
+ ![App Screenshot](https://via.placeholder.com/600x400.png?text=App+Screenshot)
+ *Example of the Mustalhim interface.*
+
+ ---
+
+ ## Contributing
+
+ Contributions are welcome! If you'd like to improve this project, please follow these steps:
+
+ 1. Fork the repository.
+ 2. Create a new branch (`git checkout -b feature/YourFeature`).
+ 3. Commit your changes (`git commit -m 'Add some feature'`).
+ 4. Push to the branch (`git push origin feature/YourFeature`).
+ 5. Open a pull request.
+
+ ---
+
+ ## License
+
+ This project is licensed under the Apache License 2.0. See the [LICENSE](LICENSE) file for details.
+
+ ---
+
+ ## Acknowledgments
+
+ - [Hugging Face](https://huggingface.co) for providing the models and deployment platform.
+ - [Gradio](https://gradio.app) for the easy-to-use interface.
+ - [Salesforce](https://salesforce.com) for the `blip-image-captioning-large` model.
+ - [ALLaM-AI](https://huggingface.co/ALLaM-AI) for the `ALLaM-7B-Instruct-preview` model.
+
+ ---
+
+ ## About the Name
+
+ **Mustalhim** (مستلهم) is an Arabic word meaning "inspired." This project is inspired by the power of AI to transform images into creative and engaging stories, bridging the gap between visual and auditory storytelling.
+
+ ---
+
+ ## Contact
+
+ For questions or feedback, feel free to reach out:
+ - **Name**: Mohammad Alkhatim
+ - **Email**: your-email@example.com
+ - **GitHub**: [MoJaff](https://github.com/MoJaff)
+ - **LinkedIn**: [Mohammad Alkhatim](https://www.linkedin.com/in/mohammad-alkhatim-9b1770266/)
+ - **Hugging Face**: [MoJaff](https://huggingface.co/MoJaff)
+
+ ---
+
+ Experience the magic of **Mustalhim** and let your images inspire stories! 🚀
+
 
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
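The "How It Works" steps in the README describe a three-stage flow: caption, then story, then speech. The following is a minimal sketch of that flow as injectable stages, under stated assumptions; it is not the code in `app.py`, which this commit does not show. The stage type names, the `image_to_audio_story` and `write_wav` helpers, and the 24 kHz sample rate are all hypothetical. Stub stages stand in for the BLIP captioner, the ALLaM model, and `kokoro` so the example runs offline, and audio is written with Python's standard `wave` module instead of `soundfile`.

```python
import math
import struct
import tempfile
import wave
from pathlib import Path
from typing import Any, Callable, List, Sequence, Tuple

# Hypothetical stage signatures (illustrative names, not taken from app.py).
CaptionFn = Callable[[Any], str]             # image -> caption text
StoryFn = Callable[[str], str]               # caption -> story text
SpeechFn = Callable[[str], Sequence[float]]  # story -> mono samples in [-1.0, 1.0]

SAMPLE_RATE = 24_000  # assumed TTS output rate in Hz; match your engine


def write_wav(samples: Sequence[float], path: Path,
              sample_rate: int = SAMPLE_RATE) -> Path:
    """Write mono float samples to a 16-bit PCM WAV file (stdlib only)."""
    # Clip to [-1.0, 1.0] so the int16 conversion cannot overflow.
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    frames = b"".join(struct.pack("<h", int(s * 32767)) for s in clipped)
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
    return path


def image_to_audio_story(image: Any, caption: CaptionFn, story: StoryFn,
                         speak: SpeechFn, out_path: Path) -> Tuple[str, str, Path]:
    """Run caption -> story -> speech and return (caption, story, WAV path)."""
    caption_text = caption(image)       # step 1: image captioning
    story_text = story(caption_text)    # step 2: story generation
    samples = speak(story_text)         # step 3: text-to-speech
    return caption_text, story_text, write_wav(samples, out_path)  # step 4: output


if __name__ == "__main__":
    # Stub stages standing in for the real models so the sketch runs offline.
    def fake_caption(_image: Any) -> str:
        return "a cat sitting on a windowsill"

    def fake_story(cap: str) -> str:
        return f"Once upon a time, there was {cap}."

    def fake_speech(text: str) -> List[float]:
        # 0.1 s of a quiet 440 Hz tone per word instead of real speech.
        n = int(0.1 * SAMPLE_RATE) * len(text.split())
        return [0.3 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE) for i in range(n)]

    out = Path(tempfile.gettempdir()) / "mustalhim_demo.wav"
    cap, tale, wav_path = image_to_audio_story(None, fake_caption, fake_story,
                                               fake_speech, out)
    print(cap)
    print(tale)
    print(wav_path)
```

In a real deployment, `caption` might wrap a `transformers` image-to-text pipeline around `Salesforce/blip-image-captioning-large`, `story` might prompt the ALLaM model, and `speak` might collect the audio chunks produced by `kokoro`. Keeping the stages injectable lets each one be tested or swapped independently of the Gradio interface.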