File size: 12,307 Bytes
af85e91
 
 
 
 
 
 
 
 
 
 
 
 
 
9681c5d
af85e91
 
 
 
 
 
 
 
9681c5d
af85e91
 
 
9681c5d
 
af85e91
 
9681c5d
 
 
 
 
 
af85e91
 
 
 
9681c5d
 
af85e91
9681c5d
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
af85e91
9681c5d
af85e91
9681c5d
 
af85e91
9681c5d
af85e91
 
 
 
 
9681c5d
af85e91
9681c5d
 
 
 
 
 
 
 
 
 
 
 
af85e91
9681c5d
af85e91
 
 
 
9681c5d
 
af85e91
 
 
9681c5d
af85e91
 
 
 
 
 
 
 
 
 
 
 
 
 
9681c5d
af85e91
 
9681c5d
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
af85e91
 
9681c5d
af85e91
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
9681c5d
 
af85e91
 
 
9681c5d
af85e91
 
 
 
 
 
 
 
 
 
9681c5d
 
 
 
 
 
af85e91
9681c5d
 
af85e91
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "b1a955e7",
   "metadata": {},
   "source": [
    "# Update Blog Data\n",
    "\n",
    "This notebook demonstrates how to update the blog data and vector store when new blog posts are published. It uses the utility functions from `utils_data_loading.ipynb`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "6ec048b4",
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "import os\n",
    "from pathlib import Path\n",
    "from dotenv import load_dotenv\n",
    "import importlib.util\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cc19ab4c",
   "metadata": {},
   "source": [
    "## Update Blog Data Process\n",
    "\n",
    "This process will:\n",
    "1. Load existing blog posts\n",
    "2. Process and update metadata\n",
    "3. Create or update vector embeddings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "3d56f688",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 14/14 [00:00<00:00, 42.05it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loaded 14 documents from data/\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "import blog_utils\n",
    "\n",
    "docs = blog_utils.load_blog_posts()\n",
    "docs = blog_utils.update_document_metadata(docs)\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a14c70dc",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Document(metadata={'source': 'data/introduction-to-ragas/index.md', 'url': 'https://thedataguy.pro/blog/introduction-to-ragas/', 'post_slug': 'introduction-to-ragas', 'post_title': 'Introduction To Ragas', 'content_length': 6071}, page_content='title: \"Part 1: Introduction to Ragas: The Essential Evaluation Framework for LLM Applications\" date: 2025-04-26T18:00:00-06:00 layout: blog description: \"Explore the essential evaluation framework for LLM applications with Ragas. Learn how to assess performance, ensure accuracy, and improve reliability in Retrieval-Augmented Generation systems.\" categories: [\"AI\", \"RAG\", \"Evaluation\",\"Ragas\"] coverImage: \"https://images.unsplash.com/photo-1593642634367-d91a135587b5?q=80&w=1770&auto=format&fit=crop&ixlib=rb-4.0.3\" readingTime: 7 published: true\\n\\nAs Large Language Models (LLMs) become fundamental components of modern applications, effectively evaluating their performance becomes increasingly critical. Whether you\\'re building a question-answering system, a document retrieval tool, or a conversational agent, you need reliable metrics to assess how well your application performs. This is where Ragas steps in.\\n\\nWhat is Ragas?\\n\\nRagas is an open-source evaluation framework specifically designed for LLM applications, with particular strengths in Retrieval-Augmented Generation (RAG) systems. Unlike traditional NLP evaluation methods, Ragas provides specialized metrics that address the unique challenges of LLM-powered systems.\\n\\nAt its core, Ragas helps answer crucial questions: - Is my application retrieving the right information? - Are the responses factually accurate and consistent with the retrieved context? - Does the system appropriately address the user\\'s query? - How well does my application handle multi-turn conversations?\\n\\nWhy Evaluate LLM Applications?\\n\\nLLMs are powerful but imperfect. They can hallucinate facts, misinterpret queries, or generate convincing but incorrect responses. For applications where accuracy and reliability matter—like healthcare, finance, or education—proper evaluation is non-negotiable.\\n\\nEvaluation serves several key purposes: - Quality assurance: Identify and fix issues before they reach users - Performance tracking: Monitor how changes impact system performance - Benchmarking: Compare different approaches objectively - Continuous improvement: Build feedback loops to enhance your application\\n\\nKey Features of Ragas\\n\\n🎯 Specialized Metrics\\n\\nRagas offers both LLM-based and computational metrics tailored to evaluate different aspects of LLM applications:\\n\\nFaithfulness: Measures if the response is factually consistent with the retrieved context\\n\\nContext Relevancy: Evaluates if the retrieved information is relevant to the query\\n\\nAnswer Relevancy: Assesses if the response addresses the user\\'s question\\n\\nTopic Adherence: Gauges how well multi-turn conversations stay on topic\\n\\n🧪 Test Data Generation\\n\\nCreating high-quality test data is often a bottleneck in evaluation. Ragas helps you generate comprehensive test datasets automatically, saving time and ensuring thorough coverage.\\n\\n🔗 Seamless Integrations\\n\\nRagas works with popular LLM frameworks and tools: - LangChain - LlamaIndex - Haystack - OpenAI\\n\\nObservability platforms - Phoenix - LangSmith - Langfuse\\n\\n📊 Comprehensive Analysis\\n\\nBeyond simple scores, Ragas provides detailed insights into your application\\'s strengths and weaknesses, enabling targeted improvements.\\n\\nGetting Started with Ragas\\n\\nInstalling Ragas is straightforward:\\n\\nbash uv init && uv add ragas\\n\\nHere\\'s a simple example of evaluating a response using Ragas:\\n\\n```python from ragas.metrics import Faithfulness from ragas.evaluation import EvaluationDataset from ragas.dataset_schema import SingleTurnSample from langchain_openai import ChatOpenAI from ragas.llms import LangchainLLMWrapper from langchain_openai import ChatOpenAI\\n\\nInitialize the LLM, you are going to new OPENAI API key\\n\\nevaluator_llm = LangchainLLMWrapper(ChatOpenAI(model=\"gpt-4o\"))\\n\\nYour evaluation data\\n\\ntest_data = { \"user_input\": \"What is the capital of France?\", \"retrieved_contexts\": [\"Paris is the capital and most populous city of France.\"], \"response\": \"The capital of France is Paris.\" }\\n\\nCreate a sample\\n\\nsample = SingleTurnSample(**test_data) # Unpack the dictionary into the constructor\\n\\nCreate metric\\n\\nfaithfulness = Faithfulness(llm=evaluator_llm)\\n\\nCalculate the score\\n\\nresult = await faithfulness.single_turn_ascore(sample) print(f\"Faithfulness score: {result}\") ```\\n\\n💡 Try it yourself: Explore the hands-on notebook for this workflow: 01_Introduction_to_Ragas\\n\\nWhat\\'s Coming in This Blog Series\\n\\nThis introduction is just the beginning. In the upcoming posts, we\\'ll dive deeper into all aspects of evaluating LLM applications with Ragas:\\n\\nPart 2: Basic Evaluation Workflow We\\'ll explore each metric in detail, explaining when and how to use them effectively.\\n\\nPart 3: Evaluating RAG Systems Learn specialized techniques for evaluating retrieval-augmented generation systems, including context precision, recall, and relevance.\\n\\nPart 4: Test Data Generation Discover how to create high-quality test datasets that thoroughly exercise your application\\'s capabilities.\\n\\nPart 5: Advanced Evaluation Techniques Go beyond basic metrics with custom evaluations, multi-aspect analysis, and domain-specific assessments.\\n\\nPart 6: Evaluating AI Agents Learn how to evaluate complex AI agents that engage in multi-turn interactions, use tools, and work toward specific goals.\\n\\nPart 7: Integrations and Observability Connect Ragas with your existing tools and platforms for streamlined evaluation workflows.\\n\\nPart 8: Building Feedback Loops Learn how to implement feedback loops that drive continuous improvement in your LLM applications. Transform evaluation insights into concrete improvements for your LLM applications.\\n\\nConclusion\\n\\nIn a world increasingly powered by LLMs, robust evaluation is the difference between reliable applications and unpredictable ones. Ragas provides the tools you need to confidently assess and improve your LLM applications.\\n\\nReady to Elevate Your LLM Applications?\\n\\nStart exploring Ragas today by visiting the official documentation. Share your thoughts, challenges, or success stories. If you\\'re facing specific evaluation hurdles, don\\'t hesitate to reach out—we\\'d love to help!')"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "docs[0]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "72dd14b5",
   "metadata": {},
   "outputs": [],
   "source": [
    "vector_store = blog_utils = blog_utils.create_vector_store(docs,'./db/vector_store_4')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ad3b2dca",
   "metadata": {},
   "source": [
    "## Testing the Vector Store\n",
    "\n",
    "Let's test the vector store with a few queries to make sure it's working correctly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "8b552e6b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Query: What is RAGAS?\n",
      "Retrieved 3 documents:\n",
      "1. Introduction To Ragas (https://thedataguy.pro/blog/introduction-to-ragas/)\n",
      "2. Evaluating Rag Systems With Ragas (https://thedataguy.pro/blog/evaluating-rag-systems-with-ragas/)\n",
      "3. Advanced Metrics And Customization With Ragas (https://thedataguy.pro/blog/advanced-metrics-and-customization-with-ragas/)\n",
      "\n",
      "Query: How to build research agents?\n",
      "Retrieved 3 documents:\n",
      "1. Building Research Agent (https://thedataguy.pro/blog/building-research-agent/)\n",
      "2. Advanced Metrics And Customization With Ragas (https://thedataguy.pro/blog/advanced-metrics-and-customization-with-ragas/)\n",
      "3. Evaluating Rag Systems With Ragas (https://thedataguy.pro/blog/evaluating-rag-systems-with-ragas/)\n",
      "\n",
      "Query: What is metric driven development?\n",
      "Retrieved 3 documents:\n",
      "1. Metric Driven Development (https://thedataguy.pro/blog/metric-driven-development/)\n",
      "2. Advanced Metrics And Customization With Ragas (https://thedataguy.pro/blog/advanced-metrics-and-customization-with-ragas/)\n",
      "3. Coming Back To Ai Roots (https://thedataguy.pro/blog/coming-back-to-ai-roots/)\n",
      "\n",
      "Query: Who is TheDataGuy?\n",
      "Retrieved 3 documents:\n",
      "1. Advanced Metrics And Customization With Ragas (https://thedataguy.pro/blog/advanced-metrics-and-customization-with-ragas/)\n",
      "2. Langchain Experience Csharp Perspective (https://thedataguy.pro/blog/langchain-experience-csharp-perspective/)\n",
      "3. Evaluating Rag Systems With Ragas (https://thedataguy.pro/blog/evaluating-rag-systems-with-ragas/)\n"
     ]
    }
   ],
   "source": [
    "# Create a retriever from the vector store\n",
    "retriever = vector_store.as_retriever(search_kwargs={\"k\": 3})\n",
    "\n",
    "# Test queries\n",
    "test_queries = [\n",
    "    \"What is RAGAS?\",\n",
    "    \"How to build research agents?\",\n",
    "    \"What is metric driven development?\",\n",
    "    \"Who is TheDataGuy?\"\n",
    "]\n",
    "\n",
    "for query in test_queries:\n",
    "    print(f\"\\nQuery: {query}\")\n",
    "    docs = retriever.invoke(query)\n",
    "    print(f\"Retrieved {len(docs)} documents:\")\n",
    "    for i, doc in enumerate(docs):\n",
    "        title = doc.metadata.get(\"post_title\", \"Unknown\")\n",
    "        url = doc.metadata.get(\"url\", \"No URL\")\n",
    "        print(f\"{i+1}. {title} ({url})\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "4cdd6899",
   "metadata": {},
   "outputs": [],
   "source": [
    "vector_store.client.close()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}