Orensomekh committed on
Commit 1232fcb · verified · 1 Parent(s): 28c454e

Upload 9 files

.gitattributes CHANGED
@@ -35,3 +35,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
  LiveRAG_banner[[:space:]](1).jpg filter=lfs diff=lfs merge=lfs -text
37
  LiveRAG_banner.jpg filter=lfs diff=lfs merge=lfs -text
38
+ Operational_Instructions/DM_gen_proc_fig[[:space:]](1).png filter=lfs diff=lfs merge=lfs -text
39
+ Operational_Instructions/Generating_Diverse_Q%26A_Benchmarks_for_RAG_Evaluation_with_DataMorgana.pdf filter=lfs diff=lfs merge=lfs -text
Operational_Instructions/DM_API_usage_example (1).ipynb ADDED
@@ -0,0 +1,369 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {},
6
+ "source": [
7
+ "**DataMorgana** is a powerful tool for generating synthetic question-answering data, useful for both evaluating and training question-answering systems.\n",
8
+ "\n",
9
+ "If you're using DataMorgana for the first time, it's recommended to start with the [DataMorgana Sandbox](https://platform.ai71.ai/playground). The Sandbox provides an intuitive UI for generating individual question-answer pairs interactively.\n",
10
+ "\n",
11
+ "In this notebook, we'll explore how to use the DataMorgana API to generate large-scale synthetic question-answering data on FineWeb.\n",
12
+ "\n",
13
+ "For the full API documentation, refer to [this link](https://api.ai71.ai/redoc#tag/Synthetic-Conversations)."
14
+ ]
15
+ },
16
+ {
17
+ "cell_type": "code",
18
+ "execution_count": 11,
19
+ "metadata": {},
20
+ "outputs": [],
21
+ "source": [
22
+ "import json\n",
23
+ "import time\n",
24
+ "from typing import Dict, List\n",
25
+ "\n",
26
+ "import requests\n",
27
+ "\n",
28
+ "BASE_URL = \"https://api.ai71.ai/v1/\""
29
+ ]
30
+ },
31
+ {
32
+ "cell_type": "markdown",
33
+ "metadata": {},
34
+ "source": [
35
+ "First, ensure that you have an API key for the AI71 platform."
36
+ ]
37
+ },
38
+ {
39
+ "cell_type": "code",
40
+ "execution_count": 12,
41
+ "metadata": {},
42
+ "outputs": [],
43
+ "source": [
44
+ "API_KEY = # Your API key"
45
+ ]
46
+ },
47
+ {
48
+ "cell_type": "markdown",
49
+ "metadata": {},
50
+ "source": [
51
+ "The generation of the data is done using LLMs, which is costly. Therefore, you will have a limited amount of credits - each credit corresponds to a single generated question. \n",
52
+ "\n",
53
+ "You can use the `check_budget` endpoint to see the remaining credits for your organization."
54
+ ]
55
+ },
56
+ {
57
+ "cell_type": "code",
58
+ "execution_count": 13,
59
+ "metadata": {},
60
+ "outputs": [],
61
+ "source": [
62
+ "def check_budget():\n",
63
+ " resp = requests.get(\n",
64
+ " f\"{BASE_URL}check_budget\",\n",
65
+ " headers={\"Authorization\": f\"Bearer {API_KEY}\"},\n",
66
+ " )\n",
67
+ " resp.raise_for_status()\n",
68
+ " print(json.dumps(resp.json(), indent=4))"
69
+ ]
70
+ },
71
+ {
72
+ "cell_type": "code",
73
+ "execution_count": 14,
74
+ "metadata": {},
75
+ "outputs": [
76
+ {
77
+ "name": "stdout",
78
+ "output_type": "stream",
79
+ "text": [
80
+ "{\n",
81
+ " \"remaining_budget\": 9987\n",
82
+ "}\n"
83
+ ]
84
+ }
85
+ ],
86
+ "source": [
87
+ "check_budget()"
88
+ ]
89
+ },
90
+ {
91
+ "cell_type": "markdown",
92
+ "metadata": {},
93
+ "source": [
94
+ "Now, let's see how to generate questions using the `bulk_generation endpoint`.\n",
95
+ "\n",
96
+ "This endpoint accepts three arguments: `n_questions`, `question_categorizations`, and `user_categorizations`.\n",
97
+ "\n",
98
+ "Since the endpoint is **asynchronous**, it returns only a `request_id`. To retrieve the generated questions once they are ready, we need to use the `fetch_generation_results` endpoint with the corresponding `request_id`."
99
+ ]
100
+ },
101
+ {
102
+ "cell_type": "code",
103
+ "execution_count": 31,
104
+ "metadata": {},
105
+ "outputs": [],
106
+ "source": [
107
+ "def bulk_generate(n_questions: int, question_categorizations: List[Dict], user_categorizations: List[Dict]):\n",
108
+ " resp = requests.post(\n",
109
+ " f\"{BASE_URL}bulk_generation\",\n",
110
+ " headers={\"Authorization\": f\"Bearer {API_KEY}\"},\n",
111
+ " json={\n",
112
+ " \"n_questions\": n_questions,\n",
113
+ " \"question_categorizations\": question_categorizations,\n",
114
+ " \"user_categorizations\": user_categorizations\n",
115
+ " }\n",
116
+ " )\n",
117
+ " resp.raise_for_status()\n",
118
+ " request_id = resp.json()[\"request_id\"]\n",
119
+ " print(json.dumps(resp.json(), indent=4))\n",
120
+ "\n",
121
+ " result = wait_for_generation_to_finish(request_id)\n",
122
+ " return result\n",
123
+ "\n",
124
+ "\n",
125
+ "def wait_for_generation_to_finish(request_id: str):\n",
126
+ " while True:\n",
127
+ " resp = requests.get(\n",
128
+ " f\"{BASE_URL}fetch_generation_results\",\n",
129
+ " headers={\"Authorization\": f\"Bearer {API_KEY}\"},\n",
130
+ " params={\"request_id\": request_id},\n",
131
+ " )\n",
132
+ " resp.raise_for_status()\n",
133
+ " if resp.json()[\"status\"] == \"completed\":\n",
134
+ " print(json.dumps(resp.json(), indent=4))\n",
135
+ " return resp.json()\n",
136
+ " else:\n",
137
+ " print(\"Waiting for generation to finish...\")\n",
138
+ " time.sleep(5)"
139
+ ]
140
+ },
141
+ {
142
+ "cell_type": "markdown",
143
+ "metadata": {},
144
+ "source": [
145
+ "Let's call the `bulk_generation` endpoint. In this example, we will define two question categorizations and one user categorization. \n",
146
+ "\n",
147
+ "When defining categorizations, keep in mind: \n",
148
+ "\n",
149
+ "- You can create your own categorizations—these are just examples. \n",
150
+ "- Each categorization can include as many categories as you like, as long as their probabilities sum to 1. \n",
151
+ "- The **descriptions** of the categories are injected into the LLM prompt during question generation. To ensure high-quality outputs, it’s important to write them clearly and thoughtfully. \n",
152
+ "\n",
153
+ "For the competition, you’ll want to evaluate and train your system on a diverse set of questions, since you won’t know in advance what types of questions will appear in the test. Keep in mind that the categorizations used in this notebook are just examples and will not correspond to those used to generate the actual test set."
154
+ ]
155
+ },
156
+ {
157
+ "cell_type": "code",
158
+ "execution_count": 28,
159
+ "metadata": {},
160
+ "outputs": [],
161
+ "source": [
162
+ "question_length_categorization = {\n",
163
+ " \"categorization_name\": \"question_length\",\n",
164
+ " \"categories\": [\n",
165
+ " {\n",
166
+ " \"name\": \"short\",\n",
167
+ " \"description\": \"a short question with no more than 6 words.\",\n",
168
+ " \"probability\": 0.4\n",
169
+ " },\n",
170
+ " {\n",
171
+ " \"name\": \"long\",\n",
172
+ " \"description\": \"a long question with at least 7 words.\",\n",
173
+ " \"probability\": 0.6\n",
174
+ " }\n",
175
+ " ]\n",
176
+ "}\n",
177
+ "\n",
178
+ "question_formulation_categorization = {\n",
179
+ " \"categorization_name\": \"question_formulation\",\n",
180
+ " \"categories\": [\n",
181
+ " {\n",
182
+ " \"name\": \"natural\",\n",
183
+ " \"description\": \"phrased in the way people typically speak, reflecting everyday language use, without formal or artificial structure.\",\n",
184
+ " \"probability\": 0.8\n",
185
+ " },\n",
186
+ " {\n",
187
+ " \"name\": \"search query\",\n",
188
+ " \"description\": \"phrased as a typed web query for search engines (only keywords, without punctuation and without a natural-sounding structure).\",\n",
189
+ " \"probability\": 0.2\n",
190
+ " }\n",
191
+ " ]\n",
192
+ "}\n",
193
+ "\n",
194
+ "user_expertise_categorization = {\n",
195
+ " \"categorization_name\": \"user_expertise\",\n",
196
+ " \"categories\": [\n",
197
+ " {\n",
198
+ " \"name\": \"expert\",\n",
199
+ " \"description\": \"an expert of the subject discussed in the document, therefore he asks complex questions.\",\n",
200
+ " \"probability\": 0.8\n",
201
+ " },\n",
202
+ " {\n",
203
+ " \"name\": \"common person\",\n",
204
+ " \"description\": \"a common person who is not expert of the subject discussed in the document, therefore he asks basic questions.\",\n",
205
+ " \"probability\": 0.2\n",
206
+ " }\n",
207
+ " ]\n",
208
+ "}"
209
+ ]
210
+ },
211
+ {
212
+ "cell_type": "markdown",
213
+ "metadata": {},
214
+ "source": [
215
+ "For example, let's use these categorizations to generate 5 questions."
216
+ ]
217
+ },
218
+ {
219
+ "cell_type": "code",
220
+ "execution_count": 37,
221
+ "metadata": {},
222
+ "outputs": [
223
+ {
224
+ "name": "stdout",
225
+ "output_type": "stream",
226
+ "text": [
227
+ "{\n",
228
+ " \"request_id\": \"5d5f6002-2395-4ff9-95a2-83c37947f9ee\",\n",
229
+ " \"type\": \"async\"\n",
230
+ "}\n",
231
+ "Waiting for generation to finish...\n",
232
+ "Waiting for generation to finish...\n",
233
+ "Waiting for generation to finish...\n",
234
+ "Waiting for generation to finish...\n",
235
+ "Waiting for generation to finish...\n",
236
+ "Waiting for generation to finish...\n",
237
+ "Waiting for generation to finish...\n",
238
+ "Waiting for generation to finish...\n",
239
+ "{\n",
240
+ " \"status\": \"completed\",\n",
241
+ " \"file\": \"https://s3.amazonaws.com/data.aiir/data_morgana/web_api/results_id_d01f3d5e-9cef-4be8-a311-8b07d3a22c6f_user_id_ded43e4d-7723-49d3-ad51-6636cf5aefd2.jsonl?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIA2UC3AHBFWKQ2GES5%2F20250204%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20250204T091912Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEA4aCXVzLWVhc3QtMSJIMEYCIQDtXc99Td%2BZFZ5JRPTjV9GHEB7zKrsxhjodL5WmngqN7AIhAM7Cp7PiP%2FvHEfZ0LYKeps6T7nTAKmBZJqy2wNJmbfW%2FKrsFCCcQABoMNzMwMzM1Mjk1NTYzIgy2SkG3NP09ZldZskwqmAVLZLbPkvHnGbkiGqacmLPE5%2FpMud8ZCJUcKedwpv4uu0R5FhbxEhWGWg2pXp%2Fsgz5mYqQkowG%2BdId0PcrhDwW%2FLD0SCVmd0P6Bp3ha5FCNp4ssvl4q6Mozbfq7U%2FBeOAvrDJQg1Z3ofOp%2FxVS%2BgVkRleRwhS36cOWaTeDaAMyIYJNmEnYkkBQ%2FzRTwWkaw0uID2sCVgHRELPE1k9U99PaQ0xWizWX3HnoLjBIsILRSnDYhqm%2B8Klttf4keD093tqLl3U6Clcd0nCEsLpHlpK8ScytwM8RrMbMdiydDtXcM2FsgLQtUA5Ks28qjBxO47C7s8CR4Oop3TS%2BcpWacHDlYEjCX6pTyyu5hhcTTzpl4SkCZ%2FxmQSIKoPy0GRjudzpeAficf9dSmdoCWH3kkeLPa1j6rWpGzTtqldm3lHfWZKOj191blFSF4r2hy95y64hu8EomFf2r3vkQAXC2ZN4DtbFA5MgTRo37tr0nlu%2Bnfxer72dJA9V8SAInQYP9nnFZdJilgTNUdD2E4jdhVz7oZGtWpJhiuYOTDD7%2B4UezNf3idDgBdYZ8bbt2ENCcMKgvpq89TezsPr3BOPux2sh4Y9JvN0rqsvoYu28eVfrGJC6JNL14s9SUy7FnTAY8lCIvHTGjXlVG0nQsSiEJwRG56C4TY77zO4HsReMe9%2B6Rl7s3siB%2B8atqhPhwZrdypOUbmltv8uWjtxHDwuDnaM5yJLTbFKoOK6CpKv8EC16TjszENlAYTAkUlUdvVlseZx80cAVSa7mtQvEEclR9namYo3Wkv6UlKXSYK5ir%2BWavTIE1tnyNBtKp37aTZJr9nmQNund1L6G4bj855NzeFvga0RA58UEQ7dFw0gRysOP7mgU1zsPP%2FZFJzMKHThr0GOrABQuCeKIWKait%2F0Q1YIjaq1jmibSqUI7pLveuxH3Nl3VXeQorntgj3Ucq7GBnZ9Y5rGCrOkDVsepOWAP9piZEBIiwGo7Gp%2BZfmUg%2B0qf9lVGsKVeJtbvpJTFyhHLrPdGMllIg7aq4fqjnXUyyHHlplwphe3ezKYYHTcAbJINGt95Ed5K3zBSe0DqNdiY9bpVrveIbcWU829IJXp4r939aH6pNuwZ3jlFXMAw2%2FY4BDk6g%3D&X-Amz-Signature=6e135cdd7cf2751d2e9231d04ad9a049ef9f1ac8c4fc23a83c42043eca8f870b\",\n",
242
+ " \"credits\": 5\n",
243
+ "}\n"
244
+ ]
245
+ }
246
+ ],
247
+ "source": [
248
+ "results = bulk_generate(n_questions=5,\n",
249
+ " question_categorizations=[question_length_categorization, question_formulation_categorization],\n",
250
+ " user_categorizations=[user_expertise_categorization]\n",
251
+ " )"
252
+ ]
253
+ },
254
+ {
255
+ "cell_type": "markdown",
256
+ "metadata": {},
257
+ "source": [
258
+ "The API response includes a link to the file where the generated results are stored. \n",
259
+ "This link is valid for 12 hours. If you need to access the file after that time, you can call `fetch_generation_results` again with the original `request_id` to receive a new link.\n",
260
+ "\n",
261
+ "Let's retrieve the data from the file and print the results."
262
+ ]
263
+ },
264
+ {
265
+ "cell_type": "code",
266
+ "execution_count": 39,
267
+ "metadata": {},
268
+ "outputs": [],
269
+ "source": [
270
+ "response = requests.get(results[\"file\"])\n",
271
+ "qas = [json.loads(line) for line in response.text.splitlines()]"
272
+ ]
273
+ },
274
+ {
275
+ "cell_type": "code",
276
+ "execution_count": 41,
277
+ "metadata": {},
278
+ "outputs": [
279
+ {
280
+ "data": {
281
+ "text/plain": [
282
+ "{'question': 'How does ADMS handle network faults?',\n",
283
+ " 'answer': 'The system assists grid operators by suggesting remedial actions and restoration processes to isolate and restore affected network sections quickly after a fault. It also uses smart meter data for outage prediction, fault detection and clearance.',\n",
284
+ " 'context': [\"Siemens Smart Grid has introduced a comprehensive distribution management system specifically developed for enhancing and expanding smarter grids. Spectrum Power™ ADMS (Advanced Distribution Management System) combines SCADA (Supervisory Control and Data Acquisition), outage management, and fault and network analysis functions for the first time on a software platform within a consolidated user environment, the first of its kind in North America. This simplifies workflows and facilitates efficient data management. The system also allows network operators to not only control and monitor their distribution network more reliably, but also track and restore outages and carry out maintenance and repair work more efficiently.\\n“Siemens has developed its ADMS solution for North America to specifically address these challenges and achieve wide-spread build out of the smart grid. Utilities will now be able to make data more actionable, increasing operational efficiency for the utility itself and improving power reliability for end users.”\\nBy suggesting remedial actions and restoration processes, the system assists grid operators in isolating and restoring the affected network sections as quickly as possible after a fault. The system also leverages the intelligent use of smart meter data for outage prediction, fault detection and clearance, and for managing distributed energy sources making Spectrum Power ADMS a key component for enhancing and expanding smart grids.\\n“Energy systems worldwide are facing a growing range of challenges, especially in power distribution management,” said Dr. Jan Mrosik, CEO of Siemens’ Smart Grid Division. “Siemens has developed its ADMS solution for North America to specifically address these challenges and achieve wide-spread build out of the smart grid. Utilities will now be able to make data more actionable, increasing operational efficiency for the utility itself and improving power reliability for end users.”\\nSince it was developed for integration in a service-oriented architecture (SOA), the system can make use of services and data from other IT systems, such as network data from geographic information systems or load profiles from meter data management systems. In the same way, other IT systems can access services and data in the distribution network management system, like information for customer information systems about downtime in case of a malfunction, or work orders or switching jobs for the workforce management system. The SOA focused design concept allows Spectrum Power ADMS to be integrated efficiently in the user's IT environment, promoting business process optimization and work process automation.\\nThe system provides the grid operator with the right tools to comply with the requirements of international reliability standards such as NERC CIP (North American Electric Reliability Corporation – Critical Infrastructure Protection) or those of the US federal agency NIST (National Institute of Standards and Technology). 
Interoperability standards such as IEC 61970 (CIM, Common Information Model) and IEC 61968 (system interfaces for distribution networks) have also been taken into account in the development of Spectrum Power ADMS in order to facilitate the IT integration of the system in an existing infrastructure.\\nThe introduction of Siemens Spectrum Power™ ADMS solution is the culmination of many successfully deployed distribution management solutions as well as Siemens ability to integrate these solutions with the utility’s existing infrastructure. The fully integrated operational environment available from Siemens enables improved monitoring, control and maintenance for a reliable and efficient distribution network.\"],\n",
285
+ " 'question_categories': [{'categorization_name': 'question_length',\n",
286
+ " 'category_name': 'short'},\n",
287
+ " {'categorization_name': 'question_formulation', 'category_name': 'natural'}],\n",
288
+ " 'user_categories': [{'categorization_name': 'user_expertise',\n",
289
+ " 'category_name': 'expert'}],\n",
290
+ " 'document_ids': ['<urn:uuid:db2e6b90-78a6-418b-9c12-f19144174c74>']}"
291
+ ]
292
+ },
293
+ "execution_count": 41,
294
+ "metadata": {},
295
+ "output_type": "execute_result"
296
+ }
297
+ ],
298
+ "source": [
299
+ "qas[0]"
300
+ ]
301
+ },
302
+ {
303
+ "cell_type": "markdown",
304
+ "metadata": {},
305
+ "source": [
306
+ "Each generated result includes: \n",
307
+ "\n",
308
+ "- The generated **question** \n",
309
+ "- The generated **answer** \n",
310
+ "- The **context** (FineWeb documents) the question is based on \n",
311
+ "- The **IDs** of those documents \n",
312
+ "- The **question categories** used during generation \n",
313
+ "- The **user categories** used during generation "
314
+ ]
315
+ },
316
+ {
317
+ "cell_type": "markdown",
318
+ "metadata": {},
319
+ "source": [
320
+ "You can track all your requests using the get_all_requests endpoint."
321
+ ]
322
+ },
323
+ {
324
+ "cell_type": "code",
325
+ "execution_count": 42,
326
+ "metadata": {},
327
+ "outputs": [],
328
+ "source": [
329
+ "def get_all_requests():\n",
330
+ " resp = requests.get(\n",
331
+ " f\"{BASE_URL}get_all_requests\",\n",
332
+ " headers={\"Authorization\": f\"Bearer {API_KEY}\"},\n",
333
+ " )\n",
334
+ " resp.raise_for_status()\n",
335
+ " print(json.dumps(resp.json(), indent=4))"
336
+ ]
337
+ },
338
+ {
339
+ "cell_type": "code",
340
+ "execution_count": null,
341
+ "metadata": {},
342
+ "outputs": [],
343
+ "source": [
344
+ "get_all_requests()"
345
+ ]
346
+ }
347
+ ],
348
+ "metadata": {
349
+ "kernelspec": {
350
+ "display_name": ".venv",
351
+ "language": "python",
352
+ "name": "python3"
353
+ },
354
+ "language_info": {
355
+ "codemirror_mode": {
356
+ "name": "ipython",
357
+ "version": 3
358
+ },
359
+ "file_extension": ".py",
360
+ "mimetype": "text/x-python",
361
+ "name": "python",
362
+ "nbconvert_exporter": "python",
363
+ "pygments_lexer": "ipython3",
364
+ "version": "3.12.8"
365
+ }
366
+ },
367
+ "nbformat": 4,
368
+ "nbformat_minor": 2
369
+ }
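
For illustration, here is a minimal post-processing sketch (not part of the uploaded notebook) that uses the result fields listed above: it saves the parsed `qas` list to a local JSONL file and tallies how often each category appears, which is a quick way to check that a large batch matches the configured probabilities. The function name and file path are arbitrary examples.

```python
# A minimal sketch, assuming the `qas` list parsed in the notebook above.
import json
from collections import Counter

def save_and_tally(qas: list, path: str = "datamorgana_qas.jsonl") -> Counter:
    # Persist the raw results locally, so the time-limited S3 link is no longer needed.
    with open(path, "w") as f:
        for qa in qas:
            f.write(json.dumps(qa) + "\n")
    # Count (categorization, category) pairs across question and user categories.
    counts = Counter()
    for qa in qas:
        for cat in qa["question_categories"] + qa["user_categories"]:
            counts[(cat["categorization_name"], cat["category_name"])] += 1
    return counts

# Example: print(save_and_tally(qas).most_common())
```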
Operational_Instructions/DM_Overview (1).md ADDED
@@ -0,0 +1,49 @@
1
+ # DataMorgana Overview
2
+
3
+ DataMorgana is an innovative tool designed to generate diverse and customizable synthetic benchmarks for Retrieval-Augmented Generation (RAG) systems. Its key innovation lies in its ability to create highly varied question-answer pairs that more realistically represent how different types of users might interact with a system.
4
+
5
+ The tool operates through two main stages: **configuration** and **generation**.
6
+
7
+ ---
8
+ ## Configuration Stage
9
+ The configuration stage allows for the definition of detailed categorizations and associated categories for both questions and end-users, which provide high-level information on the expected traffic of the RAG application. A **categorization** is a list of mutually exclusive question or user categories along with their desired distribution within the generated benchmark.
10
+
11
+ For example, a question categorization might include **search queries vs. natural language questions**, while a user categorization might include **novice vs. expert users**. There can be as many categorizations of questions and users as needed, and they can be easily defined to address the specific requirements of the application scenario. For instance:
12
+ - In a **healthcare RAG application**, a user categorization could consist of **patient, doctor, and public health authority**.
13
+ - In a **RAG-based embassy chatbot**, a categorization might include **diplomat, student, worker, and tourist**.
14
+
15
+ ---
16
+
17
+ ## Generation Stage
18
+ At the generation stage, DataMorgana leverages state-of-the-art **LLMs** (e.g., Claude 3.5 Sonnet) to incrementally build a benchmark of Q&A pairs. Each pair is generated by following the procedure depicted in **Figure 1**.
19
+
20
+ ![Fig. 1](DM_gen_proc_fig.png)
21
+
22
+
23
+ <center><b>Fig. 1: DataMorgana Generation Stage</b> In the configuration, we provide an end-user categorization and two question categorizations, namely question formulations and question types.</center>
24
+
25
+ More specifically, the DataMorgana generation process follows these steps:
26
+
27
+ 1. **Category Selection:**
28
+ - It selects a **user/question category** for each categorization according to the probability distributions specified in the configuration file.
29
+ - These are automatically combined to create a unique prompt.
30
+
31
+ 2. **Document Selection:**
32
+ - It randomly selects **documents** from the target corpus and adds them to the prompt.
33
+
34
+ 3. **Question-Answer Generation:**
35
+ - The chosen **LLM** is invoked with the instantiated prompt to generate **𝑘 candidate question-answer pairs** about the selected documents.
36
+
37
+ 4. **Filtering and Verification:**
38
+ - A final filtering stage verifies that these candidate pairs:
39
+ - Adhere to the specified **categories**.
40
+ - Are **faithful** to the selected documents.
41
+ - Satisfy general constraints (e.g., be **context-free**).
42
+ - If multiple pairs satisfy the quality requirements, **one is sampled**.
43
+
44
+ ---
45
+
46
+ ## Key Advantages
47
+ The rich and easy-to-use configurability of DataMorgana allows for **fine-grained control** over question and user characteristics. Furthermore, by jointly using multiple categorizations, DataMorgana can achieve a **combinatorial number of possibilities** to define Q&A pairs. This leads to more **diverse benchmarks** compared to existing tools that typically use a predefined list of possible question types.
48
+
49
+ Further details about DataMorgana, as well as **experimental results demonstrating its superior diversity**, are available in this [paper](Generating_Diverse_Q&A_Benchmarks_for_RAG_Evaluation_with_DataMorgana.pdf).
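
For concreteness, here is a small sketch of what the category-selection step amounts to. It is an editorial illustration of the procedure described above, not DataMorgana's internal code; the dict layout follows the categorization examples in the API notebook.

```python
# Illustrative sketch of step 1 (Category Selection), assuming categorizations
# shaped like the examples in the API notebook (not DataMorgana's actual code).
import random

def select_categories(categorizations: list) -> list:
    # Draw one category per categorization according to the configured probabilities.
    selected = []
    for categorization in categorizations:
        categories = categorization["categories"]
        weights = [c["probability"] for c in categories]
        selected.append(random.choices(categories, weights=weights, k=1)[0])
    return selected

def prompt_fragment(selected: list) -> str:
    # The drawn category descriptions are what gets injected into the LLM prompt.
    return " ".join(c["description"] for c in selected)
```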
Operational_Instructions/DM_Sandbox.md ADDED
@@ -0,0 +1,30 @@
1
+ # DataMorgana Sandbox
2
+
3
+ 1. Log in to the [AI71 website](https://platform.ai71.ai/login)
4
+
5
+ 2. Click on "Sandbox" at the left rail
6
+
7
+ 3. Click on "DataMorgana" at the top rail to access the DataMorgana Sandbox
8
+
9
+ 4. Question/User Categorizations\
10
+ 4.1 Each Categorization includes a title\
11
+ 4.2 You may use the default Categorizations, edit them or add new ones\
12
+ 4.3 You can choose multiple Categorizations (e.g., choosing "Formulation" and "Premise" for Question Categorizations)\
13
+ 4.4 Each Categorization includes multiple Categories (e.g., "User expertise" includes two categories - "Expert" and "Novice")\
14
+ 4.4.1 Expand a Categorization by clicking the caret icon on its right-hand side to reveal its Categories
15
+
16
+ 5. Categories\
17
+ 5.1 Each Category includes a title and a description\
18
+ 5.2 You may use the default Categories, edit them or add new ones\
19
+ 5.3 Each Categorization allows only one selected Category (e.g., "Expert" or "Novice" under "User expertise" Categorization)
20
+
21
+ 6. Once you set the Question/User Categorizations and their Categories, enter a **single** FineWeb document ID (e.g., "\<urn:uuid:d69cbebc-133a-4ebe-9378-68235ec9f091\>") in the respective box\
22
+ 6.1 If you leave the document ID field empty, a random document ID will be assigned automatically
23
+
24
+ 7. Click the blue "Generate" button to generate a DataMorgana synthetic question-answer pair. The results (i.e., question, answer, doc, doc ID, and Question/User Categorizations information) will appear in the "Chat" box on the right\
25
+ 7.1 You may find question-answer examples [here](https://docs.google.com/spreadsheets/d/1LxZsX_ROe5ZiAvZzk7-G9_Wg8wOiOI4qUHK8ZIO9xvw/edit?usp=sharing)
26
+
27
+ 8. Your remaining question-answer pair budget is displayed by a counter at the top right of the screen
28
+
29
+
30
+
Operational_Instructions/DM_gen_proc_fig (1).png ADDED

Git LFS Details

  • SHA256: ff61f4ba0ff0f5f8bb971ef3b0c3607651e5309ebbc02108892eabb43b3079ad
  • Pointer size: 131 Bytes
  • Size of remote file: 236 kB
Operational_Instructions/Evaluation_Guidelines_for_LiveRAG (1).md ADDED
@@ -0,0 +1,43 @@
1
+ # Evaluation Guidelines
2
+
3
+ ## 1. Selected Metrics
4
+
5
+ ### 1.1 Relevance Metric
6
+ Combines elements of **equivalence** (semantic match with ground truth) and **relevance** (degree to which the answer directly addresses the question).
7
+
8
+ Graded on a four-point scale:
9
+ - **2:** Correct and relevant (no irrelevant information).
10
+ - **1:** Correct but contains irrelevant information.
11
+ - **0:** No answer provided (abstention).
12
+ - **-1:** Incorrect answer.
13
+
14
+ ### 1.2 Faithfulness Metric
15
+ Assesses whether the response is **grounded in the retrieved passages**.
16
+
17
+ Graded on a three-point scale:
18
+ - **1:** Full support. All answer parts are grounded.
19
+ - **0:** Partial support. Not all answer parts are grounded.
20
+ - **-1:** No support. No answer part is grounded.
21
+
22
+ ### 1.3 Combination of Metrics
23
+ Both **relevance** and **faithfulness** will contribute to the final evaluation score.
24
+
25
+ The specific formula for combining these metrics is not disclosed to participants but will prioritize correctness and grounding.
26
+
27
+
28
+ ## 2. Manual and Automated Evaluation
29
+
30
+ ### **2.1 First Stage:**
31
+ - Automated evaluation by the LLM **Claude 3.5 Sonnet**, using the **relevance** and **faithfulness** metrics to rank the participating teams.
32
+
33
+ ### **2.2 Final Stage:**
34
+ - **Manual evaluation** for the top-ranked submissions (e.g., **top 10 teams**) to determine winners.
35
+
36
+ ## 3. Other Notable Points
37
+ - A strict **length limit of 200 tokens** will be imposed to encourage concise answers.
38
+ - Participants will submit:
39
+ - **The answer**.
40
+ - **All supporting passages**.
41
+ - **The full prompt used for generation**.
42
+
43
+ These measures align the evaluation framework with the challenge's emphasis on **retrieval-augmented systems**.
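
The combination formula is not disclosed, but the two scales above are concrete enough to support local sanity checks. Below is a purely hypothetical aggregation (an equal-weight average after rescaling each grade to [0, 1]); it is not the official scoring and should not be used to predict leaderboard results.

```python
# Hypothetical local scorer: rescale each grade to [0, 1] and average them.
# The official combination formula is NOT disclosed; this is a sanity check only.
def rescale(value: int, lo: int, hi: int) -> float:
    return (value - lo) / (hi - lo)

def local_score(relevance: int, faithfulness: int) -> float:
    assert relevance in (-1, 0, 1, 2)   # four-point relevance scale above
    assert faithfulness in (-1, 0, 1)   # three-point faithfulness scale above
    return 0.5 * rescale(relevance, -1, 2) + 0.5 * rescale(faithfulness, -1, 1)
```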
Operational_Instructions/Generating_Diverse_Q%26A_Benchmarks_for_RAG_Evaluation_with_DataMorgana.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f06a7f64f01b2573d354422ce4ad068f00926433ae693e29ade1a33d722b794b
3
+ size 761950
Operational_Instructions/Indices_Usage_Examples_for_LiveRAG.ipynb ADDED
@@ -0,0 +1,288 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {},
6
+ "source": [
7
+ "# AWS Setup\n",
8
+ "\n",
9
+ "1. Get your AWS access key and secret key from AWS console\n",
10
+ " 1. Log in to the AWS Management Console (it is OK if you see some Access denied messages on the front page. The account is limited on purpose)\n",
11
+ " 2. Click on your name at the top-right coder and then \"Security Credentials\"\n",
12
+ " 3. Click on \"Access keys\" and create a new access key. Create an access key for the Command Line Interface (CLI).\n",
13
+ " 4. Download and save your access key and secret key\n",
14
+ "2. Install the AWS CLI tool (optional but recommended): https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html\n",
15
+ "3. Configure the AWS CLI tool to use your access key and secret key:\n",
16
+ " ```bash\n",
17
+ " aws configure --profile sigir-participant\n",
18
+ " # Use the following settings:\n",
19
+ " # AWS Access Key ID\n",
20
+ " # AWS Secret Access Key\n",
21
+ " # Default region name: us-east-1\n",
22
+ " ```\n",
23
+ "4. Test your setup by running the following command:\n",
24
+ " ```bash\n",
25
+ " # Should display your AWS account ID\n",
26
+ " aws sts get-caller-identity --profile sigir-participant\n",
27
+ "\n",
28
+ " # Make sure you are able to access the configuraiton service\n",
29
+ " aws ssm get-parameter --name /pinecone/ro_token --profile sigir-participant\n",
30
+ "\n",
31
+ " ```"
32
+ ]
33
+ },
34
+ {
35
+ "cell_type": "markdown",
36
+ "metadata": {},
37
+ "source": [
38
+ "# Python Setup\n",
39
+ "1. Install an up-to-date python version (we use 3.12, but any recent version should work)\n",
40
+ "2. Install the required python packages (see below). We recommend using a virtual environment, e.g. venv or conda or poetry or uv\n",
41
+ "3. Now you are ready to try the code samples below\n"
42
+ ]
43
+ },
44
+ {
45
+ "cell_type": "code",
46
+ "execution_count": null,
47
+ "metadata": {},
48
+ "outputs": [],
49
+ "source": [
50
+ "# Install dependencies\n",
51
+ "# The following package versions were tested using Python 3.12 and proved to work but it is possible to use\n",
52
+ "# different versions, consider them simply as examples.\n",
53
+ "\n",
54
+ "!pip install torch==2.5.1 transformers==4.45.2 boto3==1.35.88 pinecone==5.4.2 opensearch-py==2.8.0"
55
+ ]
56
+ },
57
+ {
58
+ "cell_type": "code",
59
+ "execution_count": 1,
60
+ "metadata": {},
61
+ "outputs": [],
62
+ "source": [
63
+ "# AWS utilities\n",
64
+ "import boto3\n",
65
+ "\n",
66
+ "AWS_PROFILE_NAME = \"sigir-participant\"\n",
67
+ "AWS_REGION_NAME = \"us-east-1\"\n",
68
+ "\n",
69
+ "def get_ssm_value(key: str, profile: str = AWS_PROFILE_NAME, region: str = AWS_REGION_NAME) -> str:\n",
70
+ " \"\"\"Get a cleartext value from AWS SSM.\"\"\"\n",
71
+ " session = boto3.Session(profile_name=profile, region_name=region)\n",
72
+ " ssm = session.client(\"ssm\")\n",
73
+ " return ssm.get_parameter(Name=key)[\"Parameter\"][\"Value\"]\n",
74
+ "\n",
75
+ "def get_ssm_secret(key: str, profile: str = AWS_PROFILE_NAME, region: str = AWS_REGION_NAME):\n",
76
+ " session = boto3.Session(profile_name=profile, region_name=region)\n",
77
+ " ssm = session.client(\"ssm\")\n",
78
+ " return ssm.get_parameter(Name=key, WithDecryption=True)[\"Parameter\"][\"Value\"]"
79
+ ]
80
+ },
81
+ {
82
+ "cell_type": "code",
83
+ "execution_count": null,
84
+ "metadata": {},
85
+ "outputs": [],
86
+ "source": [
87
+ "# Pinecone sample\n",
88
+ "\n",
89
+ "from typing import List, Literal, Tuple\n",
90
+ "from multiprocessing.pool import ThreadPool\n",
91
+ "import boto3\n",
92
+ "from pinecone import Pinecone\n",
93
+ "import torch\n",
94
+ "from functools import cache\n",
95
+ "from transformers import AutoModel, AutoTokenizer\n",
96
+ "\n",
97
+ "PINECONE_INDEX_NAME = \"fineweb10bt-512-0w-e5-base-v2\"\n",
98
+ "PINECONE_NAMESPACE=\"default\"\n",
99
+ "\n",
100
+ "@cache\n",
101
+ "def has_mps():\n",
102
+ " return torch.backends.mps.is_available()\n",
103
+ "\n",
104
+ "@cache\n",
105
+ "def has_cuda():\n",
106
+ " return torch.cuda.is_available()\n",
107
+ "\n",
108
+ "@cache\n",
109
+ "def get_tokenizer(model_name: str = \"intfloat/e5-base-v2\"):\n",
110
+ " tokenizer = AutoTokenizer.from_pretrained(model_name)\n",
111
+ " return tokenizer\n",
112
+ "\n",
113
+ "@cache\n",
114
+ "def get_model(model_name: str = \"intfloat/e5-base-v2\"):\n",
115
+ " model = AutoModel.from_pretrained(model_name, trust_remote_code=True)\n",
116
+ " if has_mps():\n",
117
+ " model = model.to(\"mps\")\n",
118
+ " elif has_cuda():\n",
119
+ " model = model.to(\"cuda\")\n",
120
+ " else:\n",
121
+ " model = model.to(\"cpu\")\n",
122
+ " return model\n",
123
+ "\n",
124
+ "def average_pool(last_hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:\n",
125
+ " last_hidden = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)\n",
126
+ " return last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]\n",
127
+ "\n",
128
+ "def embed_query(query: str,\n",
129
+ " query_prefix: str = \"query: \",\n",
130
+ " model_name: str = \"intfloat/e5-base-v2\",\n",
131
+ " pooling: Literal[\"cls\", \"avg\"] = \"avg\",\n",
132
+ " normalize: bool =True) -> list[float]:\n",
133
+ " return batch_embed_queries([query], query_prefix, model_name, pooling, normalize)[0]\n",
134
+ "\n",
135
+ "def batch_embed_queries(queries: List[str], query_prefix: str = \"query: \", model_name: str = \"intfloat/e5-base-v2\", pooling: Literal[\"cls\", \"avg\"] = \"avg\", normalize: bool =True) -> List[List[float]]:\n",
136
+ " with_prefixes = [\" \".join([query_prefix, query]) for query in queries]\n",
137
+ " tokenizer = get_tokenizer(model_name)\n",
138
+ " model = get_model(model_name)\n",
139
+ " with torch.no_grad():\n",
140
+ " encoded = tokenizer(with_prefixes, padding=True, return_tensors=\"pt\", truncation=\"longest_first\")\n",
141
+ " encoded = encoded.to(model.device)\n",
142
+ " model_out = model(**encoded)\n",
143
+ " match pooling:\n",
144
+ " case \"cls\":\n",
145
+ " embeddings = model_out.last_hidden_state[:, 0]\n",
146
+ " case \"avg\":\n",
147
+ " embeddings = average_pool(model_out.last_hidden_state, encoded[\"attention_mask\"])\n",
148
+ " if normalize:\n",
149
+ " embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)\n",
150
+ " return embeddings.tolist()\n",
151
+ "\n",
152
+ "@cache\n",
153
+ "def get_pinecone_index(index_name: str = PINECONE_INDEX_NAME):\n",
154
+ " pc = Pinecone(api_key=get_ssm_secret(\"/pinecone/ro_token\"))\n",
155
+ " index = pc.Index(name=index_name)\n",
156
+ " return index\n",
157
+ "\n",
158
+ "def query_pinecone(query: str, top_k: int = 10, namespace: str = PINECONE_NAMESPACE) -> dict:\n",
159
+ " index = get_pinecone_index()\n",
160
+ " results = index.query(\n",
161
+ " vector=embed_query(query),\n",
162
+ " top_k=top_k,\n",
163
+ " include_values=False,\n",
164
+ " namespace=namespace,\n",
165
+ " include_metadata=True\n",
166
+ " )\n",
167
+ "\n",
168
+ " return results\n",
169
+ "\n",
170
+ "def batch_query_pinecone(queries: list[str], top_k: int = 10, namespace: str = PINECONE_NAMESPACE, n_parallel: int = 10) -> list[dict]:\n",
171
+ " \"\"\"Batch query a Pinecone index and return the results.\n",
172
+ "\n",
173
+ " Internally uses a ThreadPool to parallelize the queries.\n",
174
+ " \"\"\"\n",
175
+ " index = get_pinecone_index()\n",
176
+ " embeds = batch_embed_queries(queries)\n",
177
+ " pool = ThreadPool(n_parallel)\n",
178
+ " results = pool.map(lambda x: index.query(vector=x, top_k=top_k, include_values=False, namespace=namespace, include_metadata=True), embeds)\n",
179
+ " return results\n",
180
+ "\n",
181
+ "def show_pinecone_results(results):\n",
182
+ " for match in results[\"matches\"]:\n",
183
+ " print(\"chunk:\", match[\"id\"], \"score:\", match[\"score\"])\n",
184
+ " print(match[\"metadata\"][\"text\"])\n",
185
+ " print()\n",
186
+ "\n",
187
+ "results = query_pinecone(\"What is a second brain?\")\n",
188
+ "show_pinecone_results(results)\n",
189
+ "\n",
190
+ "batch_results = batch_query_pinecone([\"What is a second brain?\", \"how does a brain work?\", \"Where is Paris?\"], top_k=2)\n",
191
+ "for results in batch_results:\n",
192
+ " show_pinecone_results(results)\n",
193
+ "\n"
194
+ ]
195
+ },
196
+ {
197
+ "cell_type": "code",
198
+ "execution_count": null,
199
+ "metadata": {},
200
+ "outputs": [],
201
+ "source": [
202
+ "# OpenSearch sample\n",
203
+ "from functools import cache\n",
204
+ "from opensearchpy import OpenSearch, AWSV4SignerAuth, RequestsHttpConnection\n",
205
+ "\n",
206
+ "OPENSEARCH_INDEX_NAME = \"fineweb10bt-512-0w-e5-base-v2\"\n",
207
+ "\n",
208
+ "@cache\n",
209
+ "def get_client(profile: str = AWS_PROFILE_NAME, region: str = AWS_REGION_NAME):\n",
210
+ " credentials = boto3.Session(profile_name=profile).get_credentials()\n",
211
+ " auth = AWSV4SignerAuth(credentials, region=region)\n",
212
+ " host_name = get_ssm_value(\"/opensearch/endpoint\", profile=profile, region=region)\n",
213
+ " aos_client = OpenSearch(\n",
214
+ " hosts=[{\"host\": host_name, \"port\": 443}],\n",
215
+ " http_auth=auth,\n",
216
+ " use_ssl=True,\n",
217
+ " verify_certs=True,\n",
218
+ " connection_class=RequestsHttpConnection,\n",
219
+ " )\n",
220
+ " return aos_client\n",
221
+ "\n",
222
+ "def query_opensearch(query: str, top_k: int = 10) -> dict:\n",
223
+ " \"\"\"Query an OpenSearch index and return the results.\"\"\"\n",
224
+ " client = get_client()\n",
225
+ " results = client.search(index=OPENSEARCH_INDEX_NAME, body={\"query\": {\"match\": {\"text\": query}}, \"size\": top_k})\n",
226
+ " return results\n",
227
+ "\n",
228
+ "def batch_query_opensearch(queries: list[str], top_k: int = 10, n_parallel: int = 10) -> list[dict]:\n",
229
+ " \"\"\"Sends a list of queries to OpenSearch and returns the results.",
230
+ " Configuration of Connection Timeout might be needed for serving large batches of queries",
231
+ "\"\"\"\n",
232
+ " client = get_client()\n",
233
+ " request = []\n",
234
+ " for query in queries:\n",
235
+ " req_head = {\"index\": OPENSEARCH_INDEX_NAME}\n",
236
+ " req_body = {\n",
237
+ " \"query\": {\n",
238
+ " \"multi_match\": {\n",
239
+ " \"query\": query,\n",
240
+ " \"fields\": [\"text\"],\n",
241
+ " }\n",
242
+ " },\n",
243
+ " \"size\": top_k,\n",
244
+ " }\n",
245
+ " request.extend([req_head, req_body])\n",
246
+ "\n",
247
+ " return client.msearch(body=request)\n",
248
+ "\n",
249
+ "\n",
250
+ "\n",
251
+ "def show_opensearch_results(results: dict):\n",
252
+ " for match in results[\"hits\"][\"hits\"]:\n",
253
+ " print(\"chunk:\", match[\"_id\"], \"score:\", match[\"_score\"])\n",
254
+ " print(match[\"_source\"][\"text\"])\n",
255
+ " print()\n",
256
+ "\n",
257
+ "results = query_opensearch(\"What is a second brain?\")\n",
258
+ "show_opensearch_results(results)",
259
+ "\n",
260
+ "batch_results = batch_query_opensearch([\"What is a second brain?\", \"how does a brain work?\", \"Where is Paris?\"], top_k=1)\n",
261
+ "\n",
262
+ "for results in batch_results['responses']:\n",
263
+ " show_opensearch_results(results)\n"
264
+ ]
265
+ }
266
+ ],
267
+ "metadata": {
268
+ "kernelspec": {
269
+ "display_name": ".venv",
270
+ "language": "python",
271
+ "name": "python3"
272
+ },
273
+ "language_info": {
274
+ "codemirror_mode": {
275
+ "name": "ipython",
276
+ "version": 3
277
+ },
278
+ "file_extension": ".py",
279
+ "mimetype": "text/x-python",
280
+ "name": "python",
281
+ "nbconvert_exporter": "python",
282
+ "pygments_lexer": "ipython3",
283
+ "version": "3.12.8"
284
+ }
285
+ },
286
+ "nbformat": 4,
287
+ "nbformat_minor": 2
288
+ }
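
The two samples return results in different shapes. As a small convenience (not part of the uploaded notebook), the helpers below normalize both responses into `(doc_id, score, text)` tuples so downstream RAG code can consume either index; the field names mirror those used by `show_pinecone_results` and `show_opensearch_results` above.

```python
# Editorial sketch: normalize Pinecone and OpenSearch responses into a common
# (doc_id, score, text) shape. Field access mirrors the show_*_results helpers.
from typing import List, Tuple

def pinecone_hits(results) -> List[Tuple[str, float, str]]:
    return [(m["id"], m["score"], m["metadata"]["text"]) for m in results["matches"]]

def opensearch_hits(results) -> List[Tuple[str, float, str]]:
    return [(h["_id"], h["_score"], h["_source"]["text"]) for h in results["hits"]["hits"]]
```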
Operational_Instructions/Pinecone_Documentation_Links.md ADDED
@@ -0,0 +1,17 @@
1
+ **Indices:**
2
+ - [Creating your first index](https://docs.pinecone.io/guides/indexes/create-an-index)
3
+ - [Understanding Index types](https://docs.pinecone.io/guides/indexes/understanding-indexes)
4
+ - [Managing Indices](https://docs.pinecone.io/guides/indexes/manage-indexes)
5
+
6
+ **Data:**
7
+ - [Upsert](https://docs.pinecone.io/guides/data/upsert-data)
8
+ - [Query](https://docs.pinecone.io/guides/data/query-data)
9
+ - [Fetch](https://docs.pinecone.io/guides/data/fetch-data)
10
+ - [Delete](https://docs.pinecone.io/guides/data/delete-data)
11
+ - [Update](https://docs.pinecone.io/guides/data/update-data)
12
+ - [Listing Vectors](https://docs.pinecone.io/guides/data/list-record-ids)
13
+ - [Understanding Metadata](https://docs.pinecone.io/guides/data/understanding-metadata)
14
+
15
+ **Advanced Topics:**
16
+ - [Pinecone Inference](https://docs.pinecone.io/guides/inference/understanding-inference)
17
+ - [Pinecone Assistant](https://docs.pinecone.io/guides/assistant/overview)
Operational_Instructions/Pinecone_for_LiveRAG.md ADDED
@@ -0,0 +1,27 @@
1
+ # Using Pinecone
2
+
3
+ ## Using the Pre-Built Index
4
+ We provide access to a pre-built Pinecone index, which you are welcome to use.
5
+ Follow the instructions [here](Indices_Usage_Examples_for_LiveRAG.ipynb) to use it at no cost.
6
+
7
+ ## Building Your Own Index
8
+ In addition to the pre-built Pinecone index, participants are encouraged to experiment with building their own indices.
9
+
10
+ In order to use the provided credits, you will have to open your personal Pinecone account and apply the credits.
11
+
12
+ ## Getting Credits
13
+ Pinecone provides $750 in credits per participating group.
14
+ To apply the credits to your account, you need to:
15
+
16
+ * Open a Pinecone account (or use your existing account)
17
+ * Add a payment method (upgrade to the Standard plan). You do not need to pay anything at this point.
18
+ * Send us (the organizers) your Pinecone organization ID
19
+ * We will work with Pinecone to apply the credits to your account.
20
+ * Now you can use your Pinecone account with the applied credits. Any usage beyond the credits will be charged to your payment method.
21
+
22
+ ## Deployment Recommendations
23
+ To optimize cost and performance:
24
+
25
+ * Use Pinecone Serverless Deployment – This simplifies deployment and reduces overhead.
26
+ * Minimize Data Transfer Costs – Deploy Pinecone in the same AWS region as your workloads to reduce network bandwidth expenses.
27
+ * Example: If your AWS instances are in us-east-1, deploy the Pinecone index in us-east-1 as well.
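
For example, creating a serverless index in the recommended region might look like the sketch below. The index name is a placeholder, and dimension 768 assumes intfloat/e5-base-v2 embeddings, as used by the pre-built index.

```python
# Minimal sketch of the recommendation above: a serverless index in us-east-1,
# i.e., the same AWS region as the workloads. Name and dimension are examples;
# 768 assumes intfloat/e5-base-v2 embeddings like the pre-built index.
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_API_KEY")  # personal account with the applied credits
pc.create_index(
    name="my-liverag-index",           # placeholder name
    dimension=768,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
```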