---
title: Web Scraper & Q&A Chatbot with RAG
emoji: 🏃
colorFrom: blue
colorTo: yellow
sdk: streamlit
sdk_version: 1.43.1
app_file: app.py
pinned: false
short_description: AI web-crawling chatbot using RAG
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference


---


## 繁體中文說明

這是一個結合「網頁爬蟲」與「RAG(檢索增強生成)」的 AI 對話機器人專案。  
- 你可以輸入任意網址,系統會自動爬取該網頁(可設定多層遞迴與同網域限制),將內容分段並向量化存入本地資料庫。
- 之後可直接用中文或英文提問,系統會根據爬取內容檢索最相關段落,並用 Gemini LLM 生成回覆。
- 支援中文語意檢索,適合知識管理、網站摘要、FAQ 應用。

### 安裝與執行
1. 安裝依賴:`pip install -r requirements.txt`
2. 複製 `example.env` 為 `.env` 並填入你的 Gemini API 金鑰
3. 執行:`streamlit run app.py`

---

## English Description

This project combines a web scraper with a RAG (retrieval-augmented generation) AI chatbot.
- Enter any URL and the system crawls that page (recursion depth and a same-domain restriction are configurable), splits the content into chunks, vectorizes them, and stores them in a local database.
- You can then ask questions in Chinese or English. The system retrieves the passages most relevant to the question from the crawled content and generates an answer with the Gemini LLM.
- Supports Chinese semantic search; suitable for knowledge management, website summarization, and FAQ applications.
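
The crawl, chunk, and retrieve steps described above can be sketched roughly as follows. This is a minimal illustration, not the code in `app.py`: the function names (`crawl`, `chunk_text`, `embed`, `retrieve`) and the `fetch_links` callable are hypothetical, and a toy character-bigram vector stands in for whatever embedding model and vector database the project actually uses. The Gemini generation step is omitted.

```python
from collections import Counter, deque
from math import sqrt
from urllib.parse import urljoin, urlparse


def crawl(start_url, fetch_links, max_depth=1, same_domain=True):
    """Breadth-first crawl; returns the visited URLs in visit order.

    fetch_links(url) is a hypothetical callable that returns the hrefs on a page.
    """
    root_host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # do not follow links beyond the configured depth
        for href in fetch_links(url):
            # Resolve relative links and drop #fragments
            link = urljoin(url, href).split("#")[0]
            if same_domain and urlparse(link).netloc != root_host:
                continue  # same-domain restriction
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


def chunk_text(text, size=200, overlap=50):
    """Split text into fixed-size character chunks that overlap."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text):
    """Toy stand-in for an embedding model: counts of character bigrams.

    Character bigrams need no word segmentation, so they also work on Chinese.
    """
    text = text.lower()
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, chunks, top_k=3):
    """Return the top_k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]
```

Passing the page fetcher in as a callable keeps the traversal logic independent of the HTTP and HTML-parsing libraries, so the depth and same-domain rules can be exercised without network access. In a real pipeline the retrieved chunks would be placed into the prompt sent to the LLM.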

### Installation & Usage
1. Install dependencies: `pip install -r requirements.txt`
2. Copy `example.env` to `.env` and fill in your Gemini API key
3. Run: `streamlit run app.py`
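
For step 2, the snippet below is one way an app could read the key from `.env` at startup. It is a sketch under assumptions: the variable name `GEMINI_API_KEY` and the helpers `load_env` and `get_api_key` are hypothetical, so check `example.env` for the name the project actually expects. It uses only the standard library; the project itself may rely on a package such as `python-dotenv` instead.

```python
import os
from pathlib import Path


def load_env(path=".env"):
    """Parse KEY=VALUE lines into a dict, skipping blank lines and # comments."""
    values = {}
    env_file = Path(path)
    if not env_file.exists():
        return values
    for line in env_file.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Trim whitespace and optional surrounding quotes
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_api_key(path=".env", name="GEMINI_API_KEY"):
    """Return the API key; `name` is an assumed variable name."""
    # A real environment variable takes precedence over the file
    key = os.environ.get(name) or load_env(path).get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; copy example.env to .env and fill it in")
    return key
```

Failing early with a clear message when the key is missing is usually easier to debug than an authentication error surfacing later from the LLM client.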