Commit 2754790 by mafzaal · 1 Parent(s): dba85e7

Enhance project structure and documentation

- Update Dockerfile to include additional application files.
- Revise README.md for improved clarity and detail on application usage and technology stack.
- Expand chainlit.md with usage instructions and examples.
- Implement main.py as the command-line entry point for running the application and updating the vector database.
- Create .env.example for environment variable configuration.
- Add comprehensive bug report and feature request templates.
- Establish Python CI workflow for automated testing and linting.
- Develop CONTRIBUTING.md to guide new contributors.
- Include LICENSE and SECURITY.md for legal and security guidelines.

.env.example ADDED
@@ -0,0 +1,25 @@
+# TheDataGuy Chat Configuration
+# Copy this file to .env and fill in your values
+
+# OpenAI API Key - Required for LLM and embeddings
+OPENAI_API_KEY=your_openai_api_key_here
+
+# Vector Store Configuration
+VECTOR_STORAGE_PATH=./db/vector_store_tdg
+QDRANT_COLLECTION=thedataguy_documents
+
+# Model Configuration
+EMBEDDING_MODEL=Snowflake/snowflake-arctic-embed-l
+LLM_MODEL=gpt-4o-mini
+LLM_TEMPERATURE=0
+
+# For evaluation and synthetic data generation (optional)
+SDG_LLM_MODEL=gpt-4.1
+EVAL_LLM_MODEL=gpt-4.1
+
+# Blog Configuration
+DATA_DIR=data/
+BLOG_BASE_URL=https://thedataguy.pro/blog/
+
+# Search Configuration
+MAX_SEARCH_RESULTS=5
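The variables in `.env.example` could be read into typed settings along the following lines. This is only a sketch: `Settings` and `load_settings` are hypothetical names introduced for illustration, and the project's actual `config.py` is not part of the text shown here and may work differently. The fallback values simply mirror the example file.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables listed in .env.example."""
    openai_api_key: str
    vector_storage_path: str
    qdrant_collection: str
    embedding_model: str
    llm_model: str
    llm_temperature: float
    sdg_llm_model: str
    eval_llm_model: str
    data_dir: str
    blog_base_url: str
    max_search_results: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping, falling back to the example defaults."""
    return Settings(
        openai_api_key=env.get("OPENAI_API_KEY", ""),
        vector_storage_path=env.get("VECTOR_STORAGE_PATH", "./db/vector_store_tdg"),
        qdrant_collection=env.get("QDRANT_COLLECTION", "thedataguy_documents"),
        embedding_model=env.get("EMBEDDING_MODEL", "Snowflake/snowflake-arctic-embed-l"),
        llm_model=env.get("LLM_MODEL", "gpt-4o-mini"),
        # Numeric values arrive as strings and are converted explicitly.
        llm_temperature=float(env.get("LLM_TEMPERATURE", "0")),
        sdg_llm_model=env.get("SDG_LLM_MODEL", "gpt-4.1"),
        eval_llm_model=env.get("EVAL_LLM_MODEL", "gpt-4.1"),
        data_dir=env.get("DATA_DIR", "data/"),
        blog_base_url=env.get("BLOG_BASE_URL", "https://thedataguy.pro/blog/"),
        max_search_results=int(env.get("MAX_SEARCH_RESULTS", "5")),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to exercise in tests without touching the real environment.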
.github/ISSUE_TEMPLATE/bug_report.md ADDED
@@ -0,0 +1,37 @@
+---
+name: Bug Report
+about: Create a report to help us improve
+title: '[BUG] '
+labels: bug
+assignees: ''
+---
+
+## Bug Description
+A clear and concise description of what the bug is.
+
+## Steps to Reproduce
+1. Go to '...'
+2. Click on '....'
+3. Scroll down to '....'
+4. See error
+
+## Expected Behavior
+A clear and concise description of what you expected to happen.
+
+## Actual Behavior
+What actually happened instead.
+
+## Screenshots
+If applicable, add screenshots to help explain your problem.
+
+## Environment
+- OS: [e.g. Windows, macOS, Linux]
+- Browser: [e.g. Chrome, Safari, Firefox]
+- Version: [e.g. 1.0.0]
+- Python Version: [e.g. 3.13.0]
+
+## Additional Context
+Add any other context about the problem here, such as:
+- Error messages or logs
+- Relevant configuration details
+- Any recent changes that might have caused the issue
.github/ISSUE_TEMPLATE/feature_request.md ADDED
@@ -0,0 +1,25 @@
+---
+name: Feature Request
+about: Suggest an idea for this project
+title: '[FEATURE] '
+labels: enhancement
+assignees: ''
+---
+
+## Feature Description
+A clear and concise description of the feature you'd like to see implemented.
+
+## Use Case
+Describe the context and use case for this feature. How would it benefit the project and its users?
+
+## Proposed Solution
+If you have ideas about how to implement this feature, describe them here.
+
+## Alternatives Considered
+Have you considered any alternative solutions or features? If so, please describe them.
+
+## Additional Context
+Add any other context, screenshots, or mockups about the feature request here.
+
+## Impact
+How would this feature impact the current functionality? Would it require any changes to existing features?
.github/workflows/python-ci.yml ADDED
@@ -0,0 +1,45 @@
+name: Python CI
+
+on:
+  push:
+    branches: [ main ]
+  pull_request:
+    branches: [ main ]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ['3.13']
+
+    steps:
+    - uses: actions/checkout@v3
+
+    - name: Set up Python ${{ matrix.python-version }}
+      uses: actions/setup-python@v4
+      with:
+        python-version: ${{ matrix.python-version }}
+
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        pip install uv
+        uv init
+        uv sync
+
+    - name: Lint with flake8
+      run: |
+        uv pip install flake8
+        # stop the build if there are Python syntax errors or undefined names
+        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
+        # exit-zero treats all errors as warnings
+        flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
+
+    - name: Check if vector store can be built
+      run: |
+        python py-src/pipeline.py --ci --output-dir ./artifacts
+      env:
+        OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+        VECTOR_STORAGE_PATH: ./db/vector_store_ci
+        EMBEDDING_MODEL: Snowflake/snowflake-arctic-embed-l
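The last workflow step calls `py-src/pipeline.py` with `--ci` and `--output-dir`, and CONTRIBUTING.md uses `--force-recreate`. The parser that accepts these flags is not shown in this commit, so the following is only a guess at its shape: `build_parser` and `parse_args` are hypothetical names, and the `./artifacts` default and the help texts are assumptions chosen to match the workflow invocation.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser for the three flags referenced in this commit."""
    parser = argparse.ArgumentParser(
        description="Build the vector store (illustrative sketch)"
    )
    # --ci: assumed to switch the pipeline into a non-interactive CI mode.
    parser.add_argument("--ci", action="store_true", help="Run in CI mode")
    # --output-dir: the default below is an assumption, not taken from the project.
    parser.add_argument("--output-dir", default="./artifacts",
                        help="Directory for build artifacts")
    # --force-recreate: assumed to rebuild the store even if one already exists.
    parser.add_argument("--force-recreate", action="store_true",
                        help="Recreate the vector store from scratch")
    return parser


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv[1:] when argv is None)."""
    return build_parser().parse_args(argv)
```

With this sketch, `parse_args(["--ci", "--output-dir", "./artifacts"])` yields a namespace whose `ci` attribute is `True` and whose `force_recreate` attribute is `False`.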
CONTRIBUTING.md ADDED
@@ -0,0 +1,130 @@
+# Contributing to TheDataGuy Chat
+
+Thank you for your interest in contributing to the TheDataGuy Chat project! This document provides guidelines and instructions for contributing to this repository.
+
+## Project Overview
+
+TheDataGuy Chat is a Q&A chatbot powered by the content from [TheDataGuy blog](https://thedataguy.pro/blog/). It uses RAG (Retrieval Augmented Generation) to provide informative answers about topics such as RAGAS, RAG evaluation, building research agents, metric-driven development, and data science best practices.
+
+## Development Environment Setup
+
+### Prerequisites
+
+- Python 3.13 or higher
+- [uv](https://github.com/astral-sh/uv) for Python package management
+- Docker (optional, for containerized development)
+- OpenAI API key
+
+### Local Setup
+
+1. Clone the repository:
+   ```bash
+   git clone https://github.com/mafzaal/lets-talk.git
+   cd lets-talk
+   ```
+
+2. Create a `.env` file with the necessary environment variables:
+   ```
+   OPENAI_API_KEY=your_openai_api_key
+   VECTOR_STORAGE_PATH=./db/vector_store_tdg
+   LLM_MODEL=gpt-4o-mini
+   EMBEDDING_MODEL=Snowflake/snowflake-arctic-embed-l
+   ```
+
+3. Install dependencies:
+   ```bash
+   uv init && uv sync
+   ```
+
+4. Build the vector store:
+   ```bash
+   ./scripts/build-vector-store.sh
+   ```
+
+5. Run the application:
+   ```bash
+   chainlit run py-src/app.py --host 0.0.0.0 --port 7860
+   ```
+
+### Using Docker
+
+1. Build the Docker image:
+   ```bash
+   docker build -t lets-talk .
+   ```
+
+2. Run the container:
+   ```bash
+   docker run -p 7860:7860 --env-file ./.env lets-talk
+   ```
+
+## Project Structure
+
+```
+lets-talk/
+├── data/                  # Raw blog post content
+├── py-src/                # Python source code
+│   ├── lets_talk/         # Core application modules
+│   │   ├── agent.py       # Agent implementation
+│   │   ├── config.py      # Configuration settings
+│   │   ├── models.py      # Data models
+│   │   ├── prompts.py     # LLM prompt templates
+│   │   ├── rag.py         # RAG implementation
+│   │   ├── rss_tool.py    # RSS feed integration
+│   │   ├── tools.py       # Tool implementations
+│   │   └── utils/         # Utility functions
+│   ├── app.py             # Main application entry point
+│   ├── pipeline.py        # Data processing pipeline
+│   └── notebooks/         # Jupyter notebooks for analysis
+├── db/                    # Vector database storage
+├── evals/                 # Evaluation datasets and results
+└── scripts/               # Utility scripts
+```
+
+## Adding New Blog Posts
+
+When new blog posts are published on TheDataGuy.pro, follow these steps to add them to the chat application:
+
+1. Add the markdown content to the `data/` directory in a new folder named after the post slug
+2. Run the vector store update script:
+   ```bash
+   python py-src/pipeline.py --force-recreate
+   ```
+
+## Workflow
+
+1. **Fork** the repository on GitHub
+2. **Clone** your fork to your local machine
+3. Create a new **branch** for your feature or bug fix
+4. Make your changes
+5. Run the tests to ensure everything works
+6. **Commit** your changes with clear, descriptive commit messages
+7. **Push** your branch to your fork on GitHub
+8. Submit a **Pull Request** to the main repository
+
+## Code Style
+
+- Follow PEP 8 style guidelines for Python code
+- Use meaningful variable and function names
+- Add docstrings to all functions and classes
+- Include type hints where appropriate
+
+## Testing
+
+- Write tests for new features and bug fixes
+- Ensure all tests pass before submitting a Pull Request
+- Use the Ragas evaluation framework to test RAG performance
+
+## Documentation
+
+- Update relevant documentation when making changes
+- Add docstrings to all functions, classes, and modules
+- Keep the README and other documentation up to date
+
+## License
+
+By contributing to this project, you agree that your contributions will be licensed under the same license as the project (MIT License).
+
+## Contact
+
+If you have any questions or need further clarification, please reach out to the project maintainer at [contact form](https://thedataguy.pro/contact/).
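CONTRIBUTING.md asks contributors to place each post in a `data/` subfolder named after the post slug. As a rough illustration of how such folders could be mapped back to blog URLs, here is a small sketch. `post_urls` is a hypothetical helper, and the trailing slash on each generated URL is an assumption; the real pipeline may derive URLs differently.

```python
from pathlib import Path
from typing import Dict


def post_urls(data_dir: str,
              base_url: str = "https://thedataguy.pro/blog/") -> Dict[str, str]:
    """Map each post folder (named after its slug) to an assumed blog URL."""
    root = Path(data_dir)
    # Normalise the base so exactly one slash separates base and slug.
    base = base_url.rstrip("/") + "/"
    return {
        folder.name: f"{base}{folder.name}/"
        for folder in sorted(root.iterdir())
        if folder.is_dir()  # loose files in data/ are ignored
    }
```

Listing the folders this way before running `python py-src/pipeline.py --force-recreate` is one quick way to confirm that a new post sits where the pipeline is expected to look.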
Dockerfile CHANGED
@@ -26,11 +26,13 @@ RUN uv sync
 
 # Copy the app to the container
 COPY --chown=user ./py-src/ $HOME/app
-
+COPY --chown=user ./.chainlit/ $HOME/app
+COPY --chown=user ./chainlit.md $HOME/app
 
 #TODO: Fix this to download
 #copy posts to container
 COPY --chown=user ./data/ $HOME/app/data
+
 # Expose the port
 EXPOSE 7860
 
LICENSE ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2025 Muhammad Afzaal
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
README.md CHANGED
@@ -9,7 +9,7 @@ pinned: false
 
 # Welcome to TheDataGuy Chat! 👋
 
-This is a Q&A chatbot powered by TheDataGuy blog posts. Ask questions about topics covered in the blog, such as:
+This is a Q&A chatbot powered by [TheDataGuy blog](https://thedataguy.pro/blog/) blog posts. Ask questions about topics covered in the blog, such as:
 
 - RAGAS and RAG evaluation
 - Building research agents
@@ -21,15 +21,80 @@ This is a Q&A chatbot powered by TheDataGuy blog posts. Ask questions about topi
 Under the hood, this application uses:
 
 1. **Snowflake Arctic Embeddings**: To convert text into vector representations
+   - Base model: `Snowflake/snowflake-arctic-embed-l`
+   - Fine-tuned model: `mafzaal/thedataguy_arctic_ft` (custom-tuned using blog-specific query-context pairs)
+
 2. **Qdrant Vector Database**: To store and search for similar content
+   - Efficiently indexes blog post content for fast semantic search
+   - Supports real-time updates when new blog posts are published
+
 3. **GPT-4o-mini**: To generate helpful responses based on retrieved content
+   - Primary model: OpenAI `gpt-4o-mini` for production inference
+   - Evaluation model: OpenAI `gpt-4.1` for complex tasks including synthetic data generation and evaluation
+
 4. **LangChain**: For building the RAG workflow
+   - Orchestrates the retrieval and generation components
+   - Provides flexible components for LLM application development
+   - Structured for easy maintenance and future enhancements
+
 5. **Chainlit**: For the chat interface
+   - Offers an interactive UI with message threading
+   - Supports file uploads and custom components
+
+## Technology Stack
+
+### Core Components
+- **Vector Database**: Qdrant (stores embeddings via `pipeline.py`)
+- **Embedding Model**: Snowflake Arctic Embeddings
+- **LLM**: OpenAI GPT-4o-mini
+- **Framework**: LangChain + Chainlit
+- **Development Language**: Python 3.13
+
+### Advanced Features
+- **Evaluation**: Ragas metrics for evaluating RAG performance:
+  - Faithfulness
+  - Context Relevancy
+  - Answer Relevancy
+  - Topic Adherence
+- **Synthetic Data Generation**: For training and testing
+- **Vector Store Updates**: Automated pipeline to update when new blog content is published
+- **Fine-tuned Embeddings**: Custom embeddings tuned for technical content
+
+## Project Structure
 
-## Sources
+```
+lets-talk/
+├── data/                  # Raw blog post content
+├── py-src/                # Python source code
+│   ├── lets_talk/         # Core application modules
+│   │   ├── agent.py       # Agent implementation
+│   │   ├── config.py      # Configuration settings
+│   │   ├── models.py      # Data models
+│   │   ├── prompts.py     # LLM prompt templates
+│   │   ├── rag.py         # RAG implementation
+│   │   ├── rss_tool.py    # RSS feed integration
+│   │   └── tools.py       # Tool implementations
+│   ├── app.py             # Main application entry point
+│   └── pipeline.py        # Data processing pipeline
+├── db/                    # Vector database storage
+├── evals/                 # Evaluation datasets and results
+└── notebooks/             # Jupyter notebooks for analysis
+```
+
+## Environment Setup
+
+The application requires the following environment variables:
 
-All answers are generated based on content from [TheDataGuy blog](https://thedataguy.pro/blog/). Sources are shown for each response so you can read more about the topic.
+```
+OPENAI_API_KEY=your_openai_api_key
+VECTOR_STORAGE_PATH=./db/vector_store_tdg
+LLM_MODEL=gpt-4o-mini
+EMBEDDING_MODEL=Snowflake/snowflake-arctic-embed-l
+```
 
+## Running Locally
+
+### Using Docker
 
 ```bash
 docker build -t lets-talk .
@@ -38,4 +103,47 @@ docker run -p 7860:7860 \
 lets-talk
 ```
 
+### Using Python
+
+```bash
+# Install dependencies
+uv init && uv sync
+
+# Run the application
+chainlit run py-src/app.py --host 0.0.0.0 --port 7860
+```
+
+## Deployment
+
+The application is designed to be deployed on:
+
+- **Development**: Hugging Face Spaces ([Live Demo](https://huggingface.co/spaces/mafzaal/lets_talk))
+- **Production**: Azure Container Apps (planned)
+
+## Evaluation
+
+This project includes extensive evaluation capabilities using the Ragas framework:
+
+- **Synthetic Data Generation**: For creating test datasets
+- **Metric Evaluation**: Measuring faithfulness, relevance, and more
+- **Fine-tuning Analysis**: Comparing different embedding models
+
+## Future Enhancements
+
+- **Agentic Reasoning**: Adding more sophisticated agent capabilities
+- **Web UI Integration**: Custom Svelte component for the blog
+- **CI/CD**: GitHub Actions workflow for automated deployment
+- **Monitoring**: LangSmith integration for observability
+
+## License
+
+This project is available under the MIT License.
+
+## Acknowledgements
+
+- [TheDataGuy blog](https://thedataguy.pro/blog/) for the content
+- [Ragas](https://docs.ragas.io/) for evaluation framework
+- [LangChain](https://python.langchain.com/docs/get_started/introduction.html) for RAG components
+- [Chainlit](https://docs.chainlit.io/) for the chat interface
+
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
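The README describes retrieval as converting text to vectors and searching for similar content. The ranking idea behind that step can be shown with a toy, dependency-free sketch. This is not the project's code: it does not call Qdrant or the Arctic embedding model, `cosine` and `top_k` are hypothetical helpers, and the hand-written vectors in the usage note stand in for real embeddings. The parameter `k` plays the role that `MAX_SEARCH_RESULTS` has in `.env.example`.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float],
          docs: Dict[str, Sequence[float]],
          k: int = 5) -> List[Tuple[str, float]]:
    """Return the k documents most similar to the query, best first."""
    scored = [(name, cosine(query, vec)) for name, vec in docs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

For example, with `docs = {"a": [1, 0], "b": [0, 1], "c": [1, 1]}` and the query `[1, 0]`, `top_k(query, docs, k=2)` ranks `"a"` first and `"c"` second. A vector database performs the same kind of ranking with an index, so it does not need to score every document.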
SECURITY.md ADDED
@@ -0,0 +1,35 @@
+# Security Policy
+
+## Supported Versions
+
+Use this section to tell people about which versions of your project are currently being supported with security updates.
+
+| Version | Supported          |
+| ------- | ------------------ |
+| 0.1.x   | :white_check_mark: |
+
+## Reporting a Vulnerability
+
+We take the security of TheDataGuy Chat seriously. If you believe you've found a security vulnerability, please follow these steps:
+
+1. **Do not** disclose the vulnerability publicly
+2. **Do not** create a public GitHub issue for the vulnerability
+3. Email your findings to [contact form](https://thedataguy.pro/contact/)
+
+Please include the following in your report:
+
+- A description of the vulnerability
+- Steps to reproduce the issue
+- Potential impact of the vulnerability
+- Any potential solutions you've identified
+
+## What to Expect
+
+When you report a vulnerability:
+
+- You'll receive acknowledgment of your report within 48 hours
+- We'll investigate and provide an estimated timeline for a fix
+- We'll keep you updated as we work on resolving the issue
+- Once fixed, we'll publicly acknowledge your responsible disclosure (unless you prefer to remain anonymous)
+
+Thank you for helping to keep TheDataGuy Chat and its users safe!
chainlit.md CHANGED
@@ -1,6 +1,34 @@
-# Let's Talk
+# Welcome to TheDataGuy Chat! 👋
 
-`Let's Talk` is chat app based on contents from [TheDataGuy](https://thedataguy.pro)'s blog posts.
+## About
 
-More information at [Let's Talk](https://github.com/mafzaal/lets-talk)
+This chat application allows you to ask questions about topics covered in [TheDataGuy](https://thedataguy.pro)'s blog, including:
+
+- **RAGAS**: Evaluation frameworks for LLM applications
+- **Research Agents**: Building and evaluating AI agents
+- **Metric-Driven Development**: Data-centric approaches to development
+- **RAG Systems**: Retrieval Augmented Generation techniques
+- **Data Science Best Practices**: Strategies for effective data work
+
+## How To Use
+
+1. **Ask a question** related to any topic covered in the blog
+2. The system will **search for relevant content** from the blog posts
+3. You'll receive an **informative response** with links to the original articles
+
+## Examples
+
+Try asking questions like:
+- "What is RAGAS and how does it help evaluate LLM applications?"
+- "How can I build a research agent with RSS feed support?"
+- "What are the key principles of metric-driven development?"
+- "How do I evaluate RAG systems effectively?"
+
+## Under The Hood
+
+This application uses Snowflake Arctic Embeddings, Qdrant Vector Database, LangChain, and GPT-4o-mini to provide accurate and helpful responses based on blog content.
+
+For more details, check out the [GitHub repository](https://github.com/mafzaal/lets-talk).
+
+Happy chatting! 💬
 
main.py CHANGED
@@ -1,9 +1,60 @@
 
 
 
+#!/usr/bin/env python3
+"""
+TheDataGuy Chat - Main Entry Point
+
+This script serves as the main entry point for the TheDataGuy Chat application.
+It provides a command-line interface to run the app and update the vector database.
+"""
+
+import os
+import sys
+import argparse
+from dotenv import load_dotenv
+
+# Load environment variables from .env file
+load_dotenv()
+
 def main():
-    """Main function to update blog data"""
-    print("=== Blog Data Update ===")
+    """Main function to run the application or update blog data"""
+    parser = argparse.ArgumentParser(description="TheDataGuy Chat - RAG-powered blog assistant")
+
+    # Define commands
+    subparsers = parser.add_subparsers(dest="command", help="Command to run")
+
+    # Run app command
+    run_parser = subparsers.add_parser("run", help="Run the chat application")
+    run_parser.add_argument("--host", default="0.0.0.0", help="Host to bind to")
+    run_parser.add_argument("--port", type=int, default=7860, help="Port to bind to")
+
+    # Update vector store command
+    update_parser = subparsers.add_parser("update", help="Update the vector database")
+    update_parser.add_argument("--force", action="store_true", help="Force recreation of the vector store")
+
+    # Parse arguments
+    args = parser.parse_args()
+
+    # Handle commands
+    if args.command == "run":
+        # Import here to avoid circular imports
+        import chainlit as cl
+        os.system(f"chainlit run py-src/app.py --host {args.host} --port {args.port}")
+
+    elif args.command == "update":
+        # Import here to avoid loading heavy dependencies if not needed
+        from py_src.pipeline import create_vector_database
+        force_flag = "--force-recreate" if args.force else ""
+        print(f"Updating vector database (force={args.force})")
+        create_vector_database(force_recreate=args.force)
+
+    else:
+        # Show help if no command provided
+        parser.print_help()
+        return 1
+
+    return 0
 
 if __name__ == "__main__":
-    main()
+    sys.exit(main())
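The `run` branch in `main.py` interpolates `--host` and `--port` into a shell string passed to `os.system`. One possible alternative, sketched here under assumptions, is to build an argument list and hand it to `subprocess.run`, which avoids shell parsing of the host value and exposes the child's exit code. `build_chainlit_command` and `run_app` are hypothetical helpers, not part of this commit; the `chainlit run py-src/app.py --host ... --port ...` invocation itself is the one the commit already uses.

```python
import subprocess
from typing import List


def build_chainlit_command(host: str = "0.0.0.0",
                           port: int = 7860,
                           app_path: str = "py-src/app.py") -> List[str]:
    """Return the Chainlit invocation as an argument list (no shell involved)."""
    return ["chainlit", "run", app_path, "--host", host, "--port", str(port)]


def run_app(host: str = "0.0.0.0", port: int = 7860) -> int:
    """Launch Chainlit and return its exit code instead of discarding it."""
    # check=False: a non-zero exit is reported through the return value.
    completed = subprocess.run(build_chainlit_command(host, port), check=False)
    return completed.returncode
```

In this arrangement the `run` branch could return `run_app(args.host, args.port)`, so that `sys.exit(main())` reflects whether Chainlit started and stopped cleanly.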