Update README.md

# SmartManuals-AI for Hugging Face Spaces

SmartManuals-AI is a local-first document QA system that uses RAG (retrieval-augmented generation), OCR, and embedding search to answer technical questions from PDFs **and Word documents**.

---

## Features

- **Ask natural-language questions** about your manuals
- Handles both **PDFs** and **Word `.docx`** files
- Uses **semantic search** with `sentence-transformers`
- Uses ChromaDB for fast local vector indexing
- Answers generated by **Meta Llama 3.1 8B Instruct** (default)
- Gradio dashboard for interaction

---

## Folder Structure
```
SmartManuals-AI/
├── app.py               # Hugging Face Spaces main app
├── Manuals/             # Upload your PDF and Word manuals here
│   ├── OM_Treadmill.pdf
│   └── Parts_Bike.docx
├── chroma_store/        # ChromaDB vector DB (auto-generated)
├── requirements.txt     # Dependencies
└── README.md            # This file
```

---

## Usage in Hugging Face Spaces

### Environment Variables
Add your Hugging Face token as a secret:

- `HF_TOKEN`: Your Hugging Face access token (required for gated models)
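
Spaces exposes secrets to the running app as environment variables, so the token can be read at startup. The sketch below is a minimal illustration, not the code in `app.py`; the helper name `get_hf_token` and the error message are assumptions made for this example.

```python
import os


def get_hf_token(env=os.environ):
    """Return the Hugging Face token from the environment, or fail early.

    `get_hf_token` is a hypothetical helper, not part of app.py.
    """
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        # Failing at startup is easier to debug than a 401 during model download.
        raise RuntimeError(
            "HF_TOKEN is not set; add it as a secret in the Space settings."
        )
    return token
```

The returned value can then be handed to whatever loads the gated model, for example as the `token` argument accepted by recent `transformers` `from_pretrained` calls.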

### Upload Your Files
Put all your manuals (PDF and Word `.docx`) into the `Manuals/` folder.

### App Behavior
- On startup:
  - Extracts text (with OCR fallback) from PDFs
  - Extracts clean text from Word documents
  - Chunks and embeds content into ChromaDB
- During inference:
  - Retrieves semantically relevant chunks
  - Sends them to Llama 3.1 Instruct for answer generation
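
The chunk, retrieve, and generate flow above can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not the logic in `app.py`: a bag-of-words counter stands in for `sentence-transformers` embeddings, an in-memory list stands in for ChromaDB, and all function names, the chunk sizes, and the prompt wording are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, size=40, overlap=10):
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real sentence vector)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    query = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Assemble retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

In a real pipeline, `embed` would be replaced by a sentence-embedding model and `retrieve` by a vector-store query; the overlap between chunks keeps a sentence that straddles a boundary retrievable from at least one chunk.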

### No User Upload
This app is **designed to work without file uploads**. All processing is done on preloaded files in the `Manuals/` directory.

---

## Default Model
- Uses **`meta-llama/Llama-3.1-8B-Instruct`**
- All question answering is **fully automatic**
- Users are **not required to pick a model, doc type, or filter**; the system decides based on the question and the indexed content.

---

## Supported File Types
- `.pdf` (with OCR for scanned pages)
- `.docx` (via `python-docx`)
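
As a rough sketch of how the two supported types could be picked up from the `Manuals/` folder, the snippet below filters files by extension. The name `find_manuals`, the case-insensitive match, and the non-recursive scan are assumptions for illustration, not confirmed behavior of `app.py`.

```python
from pathlib import Path

# Extensions listed in this README; anything else is skipped.
SUPPORTED_EXTENSIONS = {".pdf", ".docx"}


def find_manuals(folder="Manuals"):
    """Return supported files directly inside `folder`, sorted by path.

    Hypothetical helper: returns an empty list if the folder does not exist.
    """
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        path for path in root.iterdir()
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Each returned path could then be routed by its suffix to the PDF or Word extraction step.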

---

## Local Development
Install dependencies:
```bash
pip install -r requirements.txt
```
Run locally:
```bash
python app.py
```

---

## Project by: [Damilare Eniolabi](mailto:damilareeniolabi@gmail.com)
GitHub: [@damoojeje](https://github.com/damoojeje)

---

## Tags
`RAG` `LLM` `Chroma` `OCR` `PDF` `Word` `Gradio` `HuggingFace` `SmartManualsAI`