# Update Blog Data

This notebook demonstrates how to update the blog data and vector store when new blog posts are published. It uses the utility functions from `utils_data_loading.ipynb`.

In [1]:
import sys
import os
from pathlib import Path
from dotenv import load_dotenv
import importlib.util


## Update Blog Data Process

This process will:
1. Load existing blog posts
2. Process and update metadata
3. Create or update vector embeddings
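Before running the steps above, it can help to see which posts are actually new. The following is a minimal sketch, not part of `blog_utils`: the helper name `find_new_posts` is hypothetical, and it assumes each post lives in its own folder with an `index.md` file and that you already have the set of indexed slugs from somewhere.

```python
from pathlib import Path


def find_new_posts(data_dir, indexed_slugs):
    """Return slugs found under data_dir that are not in indexed_slugs.

    Hypothetical helper. Assumes one folder per post, each holding an
    index.md file; folders without index.md are ignored.
    """
    on_disk = {p.parent.name for p in Path(data_dir).glob("*/index.md")}
    return sorted(on_disk - set(indexed_slugs))
```

For example, `find_new_posts("data", {"introduction-to-ragas"})` would list every other post folder under `data/`. Whether the vector store can then be updated incrementally, rather than rebuilt, depends on how `blog_utils.create_vector_store` is implemented.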

In [7]:
import blog_utils

docs = blog_utils.load_blog_posts()
docs = blog_utils.update_document_metadata(docs)




100%|██████████| 14/14 [00:00<00:00, 42.05it/s]

Loaded 14 documents from data/





In [None]:
docs[0]


Document(metadata={'source': 'data/introduction-to-ragas/index.md', 'url': 'https://thedataguy.pro/blog/introduction-to-ragas/', 'post_slug': 'introduction-to-ragas', 'post_title': 'Introduction To Ragas', 'content_length': 6071}, page_content='title: "Part 1: Introduction to Ragas: The Essential Evaluation Framework for LLM Applications" date: 2025-04-26T18:00:00-06:00 layout: blog description: "Explore the essential evaluation framework for LLM applications with Ragas. Learn how to assess performance, ensure accuracy, and improve reliability in Retrieval-Augmented Generation systems." categories: ["AI", "RAG", "Evaluation","Ragas"] coverImage: "https://images.unsplash.com/photo-1593642634367-d91a135587b5?q=80&w=1770&auto=format&fit=crop&ixlib=rb-4.0.3" readingTime: 7 published: true\n\nAs Large Language Models (LLMs) become fundamental components of modern applications, effectively evaluating their performance becomes increasingly critical. Whether you\'re building a question-answeri
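The metadata fields in the record above appear to follow from the source path. The sketch below shows one way such fields could be derived. It is an assumption about what `update_document_metadata` does, not its actual implementation; `derive_metadata` and `BASE_URL` are hypothetical names.

```python
from pathlib import Path

# Assumed base URL, taken from the 'url' field in the record above.
BASE_URL = "https://thedataguy.pro/blog/"


def derive_metadata(source, page_content):
    """Build a metadata dict shaped like the record above (hypothetical)."""
    slug = Path(source).parent.name  # folder name doubles as the post slug
    return {
        "source": source,
        "url": f"{BASE_URL}{slug}/",
        "post_slug": slug,
        # Title-case the slug: 'introduction-to-ragas' -> 'Introduction To Ragas'
        "post_title": slug.replace("-", " ").title(),
        "content_length": len(page_content),
    }
```

Deriving the title from the slug would explain why `post_title` reads "Introduction To Ragas" rather than the longer title in the post's front matter.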

In [11]:
vector_store = blog_utils.create_vector_store(docs, './db/vector_store_4')

## Testing the Vector Store

Let's test the vector store with a few queries to make sure it's working correctly.

In [12]:
# Create a retriever from the vector store
retriever = vector_store.as_retriever(search_kwargs={"k": 3})

# Test queries
test_queries = [
    "What is RAGAS?",
    "How to build research agents?",
    "What is metric driven development?",
    "Who is TheDataGuy?"
]

for query in test_queries:
    print(f"\nQuery: {query}")
    results = retriever.invoke(query)  # separate name so the loaded `docs` list is not overwritten
    print(f"Retrieved {len(results)} documents:")
    for i, doc in enumerate(results):
        title = doc.metadata.get("post_title", "Unknown")
        url = doc.metadata.get("url", "No URL")
        print(f"{i+1}. {title} ({url})")


Query: What is RAGAS?
Retrieved 3 documents:
1. Introduction To Ragas (https://thedataguy.pro/blog/introduction-to-ragas/)
2. Evaluating Rag Systems With Ragas (https://thedataguy.pro/blog/evaluating-rag-systems-with-ragas/)
3. Advanced Metrics And Customization With Ragas (https://thedataguy.pro/blog/advanced-metrics-and-customization-with-ragas/)

Query: How to build research agents?
Retrieved 3 documents:
1. Building Research Agent (https://thedataguy.pro/blog/building-research-agent/)
2. Advanced Metrics And Customization With Ragas (https://thedataguy.pro/blog/advanced-metrics-and-customization-with-ragas/)
3. Evaluating Rag Systems With Ragas (https://thedataguy.pro/blog/evaluating-rag-systems-with-ragas/)

Query: What is metric driven development?
Retrieved 3 documents:
1. Metric Driven Development (https://thedataguy.pro/blog/metric-driven-development/)
2. Advanced Metrics And Customization With Ragas (https://thedataguy.pro/blog/advanced-metrics-and-customization-with-ragas/)
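
Reading the printed results works for a quick look, but the same check can be automated. The sketch below is a minimal example under assumptions: `top_hit_slug` and `check_top_hits` are hypothetical helpers, and they only assume that the retriever has an `invoke(query)` method returning documents with a `post_slug` entry in their metadata, as in the record shown earlier.

```python
def top_hit_slug(retriever, query):
    """Return the post_slug of the first retrieved document, or None."""
    results = retriever.invoke(query)
    return results[0].metadata.get("post_slug") if results else None


def check_top_hits(retriever, expectations):
    """Return (query, expected, actual) for every query whose top hit differs."""
    failures = []
    for query, expected in expectations.items():
        actual = top_hit_slug(retriever, query)
        if actual != expected:
            failures.append((query, expected, actual))
    return failures


# Expected top hits, taken from the output printed above.
EXPECTED_TOP_HITS = {
    "What is RAGAS?": "introduction-to-ragas",
    "How to build research agents?": "building-research-agent",
    "What is metric driven development?": "metric-driven-development",
}
```

Calling `check_top_hits(retriever, EXPECTED_TOP_HITS)` should return an empty list when retrieval behaves as in the run above. Run it before closing the vector store client.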

In [13]:
vector_store.client.close()