# Update Blog Data

This notebook demonstrates how to update the blog data and vector store when new blog posts are published. It uses the utility functions in the `lets_talk.utils.blog` module.

In [3]:
import sys
import os
from pathlib import Path
from dotenv import load_dotenv


# Add the project root to the Python path
package_root = os.path.abspath(os.path.join(os.getcwd(), "../"))
print(f"Adding package root to sys.path: {package_root}")
if package_root not in sys.path:
    sys.path.append(package_root)


Adding package root to sys.path: /home/mafzaal/source/lets-talk/py-src


In [4]:
notebook_dir = os.getcwd()
print(f"Current notebook directory: {notebook_dir}")
# Change the working directory to the project root
project_root = os.path.abspath(os.path.join(os.getcwd(), "../../"))
print(f"Project root: {project_root}")
os.chdir(project_root)

Current notebook directory: /home/mafzaal/source/lets-talk/py-src/notebooks
Project root: /home/mafzaal/source/lets-talk


## Update Blog Data Process

This process will:
1. Load the existing blog posts
2. Update each document's metadata
3. Split the documents into chunks
4. Create or update the vector embeddings

In [None]:
import lets_talk.utils.blog as blog_utils
docs = blog_utils.load_blog_posts()
docs = blog_utils.update_document_metadata(docs)


100%|██████████| 14/14 [00:00<00:00, 4617.46it/s]

Loaded 14 documents from data/





In [None]:
split_docs = blog_utils.split_documents(docs)

Split 14 documents into 162 chunks
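
The splitter behind `blog_utils.split_documents` is not shown in this notebook. As a rough illustration of why 14 posts turn into 162 chunks, here is a minimal sketch of fixed-size character chunking with overlap. The function name `split_text` and the `chunk_size` and `chunk_overlap` values are assumptions made for this example, not the settings the library uses.

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Illustrative fixed-size chunking with overlap (not the library's implementation)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    # Each new chunk starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


chunks = split_text("a" * 2500, chunk_size=1000, chunk_overlap=200)
print([len(c) for c in chunks])  # [1000, 1000, 900]
```

Overlapping chunks repeat a little text at each boundary, so a sentence that straddles two chunks is still retrievable in one piece.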


In [8]:
split_docs[0]

Document(metadata={'source': 'data/introduction-to-ragas/index.md', 'url': 'https://thedataguy.pro/blog/introduction-to-ragas/', 'post_slug': 'introduction-to-ragas', 'post_title': '"Part 1: Introduction to Ragas: The Essential Evaluation Framework for LLM Applications"', 'content_length': 6994}, page_content='---\ntitle: "Part 1: Introduction to Ragas: The Essential Evaluation Framework for LLM Applications"\ndate: 2025-04-26T18:00:00-06:00\nlayout: blog\ndescription: "Explore the essential evaluation framework for LLM applications with Ragas. Learn how to assess performance, ensure accuracy, and improve reliability in Retrieval-Augmented Generation systems."\ncategories: ["AI", "RAG", "Evaluation","Ragas"]\ncoverImage: "https://images.unsplash.com/photo-1593642634367-d91a135587b5?q=80&w=1770&auto=format&fit=crop&ixlib=rb-4.0.3"\nreadingTime: 7\npublished: true\n---\n\nAs Large Language Models (LLMs) become fundamental components of modern applications, effectively evaluating their pe
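
Judging from the output above, the `url` and `post_slug` fields appear to follow from the `source` path, and `content_length` looks like a character count. The sketch below reproduces that mapping. `derive_metadata` and `BASE_URL` are hypothetical names inferred from the sample output, not taken from `lets_talk.utils.blog`, and `post_title` is left out because it is not derivable from the path.

```python
from pathlib import PurePosixPath

# Assumption: base URL inferred from the sample output, not read from the library.
BASE_URL = "https://thedataguy.pro/blog"


def derive_metadata(source: str, content: str, base_url: str = BASE_URL) -> dict:
    """Hypothetical helper: rebuild the path-derived metadata fields."""
    # The slug is the name of the directory that contains index.md.
    slug = PurePosixPath(source).parent.name
    return {
        "source": source,
        "url": f"{base_url.rstrip('/')}/{slug}/",
        "post_slug": slug,
        "content_length": len(content),
    }


meta = derive_metadata("data/introduction-to-ragas/index.md", "example body")
print(meta["url"])  # https://thedataguy.pro/blog/introduction-to-ragas/
```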

In [10]:
# Assign only to vector_store so the blog_utils module is not overwritten
vector_store = blog_utils.create_vector_store(split_docs, './db/vector_store_5')

## Testing the Vector Store

Let's test the vector store with a few queries to make sure it's working correctly.

In [11]:
# Create a retriever from the vector store
retriever = vector_store.as_retriever(search_kwargs={"k": 3})

# Test queries
test_queries = [
    "What is RAGAS?",
    "How to build research agents?",
    "What is metric driven development?",
    "Who is TheDataGuy?",
]

for query in test_queries:
    print(f"\nQuery: {query}")
    docs = retriever.invoke(query)
    print(f"Retrieved {len(docs)} documents:")
    for i, doc in enumerate(docs):
        title = doc.metadata.get("post_title", "Unknown")
        url = doc.metadata.get("url", "No URL")
        print(f"{i+1}. {title} ({url})")


Query: What is RAGAS?
Retrieved 3 documents:
1. "Part 3: Evaluating RAG Systems with Ragas" (https://thedataguy.pro/blog/evaluating-rag-systems-with-ragas/)
2. "Part 1: Introduction to Ragas: The Essential Evaluation Framework for LLM Applications" (https://thedataguy.pro/blog/introduction-to-ragas/)
3. "Part 4: Generating Test Data with Ragas" (https://thedataguy.pro/blog/generating-test-data-with-ragas/)

Query: How to build research agents?
Retrieved 3 documents:
1. Building a Research Agent with RSS Feed Support (https://thedataguy.pro/blog/building-research-agent/)
2. "Part 1: Introduction to Ragas: The Essential Evaluation Framework for LLM Applications" (https://thedataguy.pro/blog/introduction-to-ragas/)
3. Building a Research Agent with RSS Feed Support (https://thedataguy.pro/blog/building-research-agent/)

Query: What is metric driven development?
Retrieved 3 documents:
1. "Metric-Driven Development: Make Smarter Decisions, Faster" (https://thedataguy.pro/blog/metric-driven
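
The second query returns the same post twice because the retriever ranks chunks, not posts. If a list of distinct posts is preferable, the results can be collapsed by URL. The sketch below is one way to do that. `Doc` is a stand-in class so the example runs without the vector store, and `unique_posts` is a hypothetical helper, not part of `blog_utils`.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a retrieved document; only the metadata dict is used here."""
    metadata: dict = field(default_factory=dict)


def unique_posts(docs):
    """Keep the first (highest-ranked) chunk for each post URL, preserving order."""
    seen = set()
    unique = []
    for doc in docs:
        url = doc.metadata.get("url", "No URL")
        if url not in seen:
            seen.add(url)
            unique.append(doc)
    return unique


results = [
    Doc({"url": "https://thedataguy.pro/blog/building-research-agent/"}),
    Doc({"url": "https://thedataguy.pro/blog/introduction-to-ragas/"}),
    Doc({"url": "https://thedataguy.pro/blog/building-research-agent/"}),
]
print(len(unique_posts(results)))  # 2
```

Because the helper only reads `doc.metadata`, it should also work on the objects returned by `retriever.invoke(query)`, assuming they expose the same `metadata` dictionary used in the loop above.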

In [12]:
vector_store.client.close()