---
title: SmartManuals-AI
emoji: 🧠
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.30.0
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - RAG
  - LLM
  - Chroma
  - Gradio
  - OCR
  - HuggingFace
  - Word
  - SemanticSearch
  - SmartManualsAI
---
# SmartManuals-AI for Hugging Face Spaces

SmartManuals-AI is a local-first document QA system that uses **retrieval-augmented generation (RAG)**, **OCR**, and **semantic embeddings** to answer technical questions from equipment manuals, service guides, and parts catalogs.

This app is optimized for Hugging Face Spaces and **requires no user upload**: just preload your manuals in a `Manuals/` folder.
---

## 🧠 Features

- Ask **natural-language questions** against your own manuals
- Supports both **PDF** and **Word (.docx)** documents
- Uses `sentence-transformers` for semantic search
- Indexes chunks in **ChromaDB** (stored locally)
- Generates answers via Hugging Face models (default: **Meta LLaMA 3.1 8B Instruct**)
- Clean **Gradio interface** for querying

---
## Folder Structure

```
SmartManuals-AI/
├── app.py              # Main Hugging Face app
├── Manuals/            # Place your PDF and DOCX manuals here
│   ├── OM_Treadmill.pdf
│   └── Parts_Bike.docx
├── chroma_store/       # Vector database (auto-generated)
├── requirements.txt    # Dependencies
└── README.md           # This file
```
---

## Usage on Hugging Face Spaces

### Environment Variable

Add this secret in your Space settings:

| Name       | Value                   |
|------------|-------------------------|
| `HF_TOKEN` | Your Hugging Face token |

> **Note**: You must accept model licenses on [Hugging Face Hub](https://huggingface.co/meta-llama) before using gated models like `Llama-3.1-8B-Instruct`.
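On Spaces, secrets are exposed to the running app as ordinary environment variables, so the token can be read with a plain `os.environ` lookup. A minimal sketch (the helper name `get_hf_token` is illustrative, not part of the app code):

```python
import os


def get_hf_token() -> str:
    """Read the Hugging Face token from the HF_TOKEN Space secret."""
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Add it as a secret in your Space "
            "settings so the app can download gated models."
        )
    return token
```

The returned token can then be handed to `huggingface_hub` or `transformers` when downloading gated model weights.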
---

### Uploading Manuals

- Upload your **PDF and Word documents** directly to the `Manuals/` folder in your Space repository.
- No need for file uploads via the interface.

---
### How It Works

- On app startup:
  - Text is extracted from **PDFs (with OCR fallback)** and `.docx` Word files
  - Sentences are cleaned, chunked, and embedded with `all-MiniLM-L6-v2`
  - Chunks are stored in a local **ChromaDB vector database**
- At query time:
  - Your question is embedded and semantically compared against chunks
  - The most relevant chunks are passed to the LLM
  - The **LLM (LLaMA 3.1)** generates a focused answer from context only
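The query-time steps above amount to a nearest-neighbour search over embeddings. The toy sketch below illustrates the idea with a bag-of-words stand-in for `all-MiniLM-L6-v2` and plain cosine similarity in place of ChromaDB; all helper names here are illustrative only, not the app's actual functions:

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in for all-MiniLM-L6-v2: a sparse bag-of-words vector.
    # The real app produces dense sentence embeddings instead.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank chunks by similarity to the query and keep the best k,
    # mirroring what a ChromaDB query does over stored embeddings.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "To lubricate the treadmill belt, apply silicone under the deck.",
    "The console displays speed, incline, and heart rate.",
    "Replace the drive belt if it shows fraying or cracks.",
]
best = top_k("how do I lubricate the belt", chunks, k=1)
# best[0] is the lubrication chunk, the closest match to the query
```

The retrieved chunks are then pasted into the LLM prompt as context, which is what keeps the answer grounded in the manuals rather than the model's general knowledge.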
---
## Default Model

- This app uses **`meta-llama/Llama-3.1-8B-Instruct`**
- More models are supported behind the scenes (e.g., Mistral, Gemma)
- **No need to manually pick** models, doc types, or categories

---
## Supported File Types

- ✅ PDF (`.pdf`) with OCR fallback using Tesseract
- ✅ Word documents (`.docx`)
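The OCR fallback boils down to a simple heuristic: if a PDF page's text layer yields little or no text (typical of scanned pages), re-extract it with OCR. A sketch of that decision logic; `needs_ocr`, the threshold, and the OCR call (e.g. `pytesseract.image_to_string`) are assumptions for illustration, not taken from the app code:

```python
def needs_ocr(page_text: str, min_chars: int = 30) -> bool:
    """Scanned pages yield little or no extractable text layer."""
    return len(page_text.strip()) < min_chars


def extract_page(text_layer: str, ocr_fn) -> str:
    # ocr_fn stands in for an OCR call on the rendered page image,
    # such as pytesseract.image_to_string(page_image).
    return ocr_fn() if needs_ocr(text_layer) else text_layer
```

Keeping the threshold low avoids running Tesseract on pages that already have a usable text layer, since OCR is by far the slowest step of ingestion.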
---
## Local Development

Clone and run locally:

```bash
git clone https://github.com/damoojeje/SmartManuals-AI.git
cd SmartManuals-AI
pip install -r requirements.txt
python app.py
```

> Place your manuals inside the `Manuals/` directory before running.

---
## Created By

**Damilare Eniolabi**
Email: [damilareeniolabi@gmail.com](mailto:damilareeniolabi@gmail.com)
GitHub: [@damoojeje](https://github.com/damoojeje)

---

## Tags

`RAG` · `LLM` · `Gradio` · `ChromaDB` · `OCR` · `SemanticSearch` · `PDF` · `Word` · `SmartManualsAI` · `EquipmentQA`