Orensomekh committed on
Commit 2b0f9f1 · verified · 1 Parent(s): 6a38b52

Upload LiveRAG_Benchmark.md

Files changed (1)
  1. Benchmark/LiveRAG_Benchmark.md +210 -0
Benchmark/LiveRAG_Benchmark.md ADDED
@@ -0,0 +1,210 @@
# LiveRAG Benchmark

## Description

This document describes the **LiveRAG benchmark**.

For more details regarding Q&A generation, see [1].

The LiveRAG benchmark includes **895 questions**:
- 500 questions from Session 1
- 500 questions from Session 2
- 105 questions common to both sessions (so 500 + 500 − 105 = 895 distinct questions)
---

## Benchmark Parquet File

### Benchmark Fields

| Field Name | Description | Type | Remarks |
|------------|-------------|------|---------|
| `Index` | Benchmark index | Integer `[0, 1, ..., 894]` | |
| `Question` | DataMorgana question | String | |
| `Answer` | DataMorgana ground-truth answer | String | |
| `Question_Answer_Type` | Number of supporting documents used for Q&A generation | String `["Single", "Double"]` | |
| `Supporting_Documents` | A list of supporting FineWeb-10BT documents (1 for single-doc Q&A, 2 for double-doc Q&A) | List of JSON objects | See the `document_json` schema and example below |
| `Answer_Claims` | A list of claims extracted from the answer, grouped into the categories direct, useful, and useless | JSON object | See the `claims_json` schema and example below |
| `DataMorgana_Config` | A JSON object with question and user categorizations | JSON object | See the `categorizations_json` schema and example below |
| `Falcon_Mirage_Question_Difficulty_Score` | Based on pure-LLM and RAG-system answer quality (lower means “harder”) | Integer `[0, 1, 2]` | |
| `Teams_Question_Difficulty_Score_Avg` | Average of the teams’ correctness scores (lower means “harder”) | Real `[-1, 2]` | A.k.a. SDL score |
| `Teams_Question_Difficulty_Score_Std` | Standard deviation (STD) of the teams’ correctness scores | Real `[0, 1.5]` | |
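
A minimal loading sketch with pandas follows. The file name `LiveRAG_Benchmark.parquet` is an assumption (adjust it to your local copy), and the JSON-typed columns may be stored either as parsed objects or as JSON strings, so they are parsed defensively.

```python
import json

import pandas as pd

# Assumed file name; adjust to wherever the benchmark parquet is stored.
df = pd.read_parquet("LiveRAG_Benchmark.parquet")

row = df.iloc[0]
print(row["Question"], "->", row["Answer"])

# JSON-typed columns may arrive as strings; parse only when needed.
def as_json(value):
    return json.loads(value) if isinstance(value, str) else value

claims = as_json(row["Answer_Claims"])                    # claims_json (see Appendix)
config = as_json(row["DataMorgana_Config"])               # categorizations_json
docs = [as_json(d) for d in row["Supporting_Documents"]]  # list of document_json
```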

---

## References

[1] D. Carmel et al., “The SIGIR 2025 LiveRAG Challenge Benchmark: Mastering the Questions’ Diversity and Difficulty Level.”

---

## Appendix

### `document_json`

#### Schema

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Document",
  "type": "object",
  "properties": {
    "content": {
      "type": "string",
      "description": "The full text content of the document."
    },
    "doc_id": {
      "type": "string",
      "description": "FineWeb-10BT document identifier."
    },
    "metadata": {
      "type": "object",
      "properties": {
        "topic": {
          "type": "string",
          "description": "High-level topic of the document."
        },
        "subtopic": {
          "type": "string",
          "description": "More specific subtopic related to the topic."
        }
      },
      "required": ["topic", "subtopic"],
      "additionalProperties": false
    }
  },
  "required": ["content", "doc_id", "metadata"],
  "additionalProperties": false
}
```

#### Example

```json
{
  "content": "this is the document content",
  "doc_id": "<urn:uuid:b5d19fcb-1711-4f9f-82cf-f81403382444>",
  "metadata": {
    "subtopic": "Fertigation methods",
    "topic": "Irrigation"
  }
}
```
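
To sanity-check records against this schema, here is a hedged sketch using the third-party `jsonschema` package (not part of the benchmark itself); the same pattern applies to `claims_json` and `categorizations_json` below.

```python
from jsonschema import validate  # pip install jsonschema

# The document_json schema from above, transcribed as a Python dict
# (descriptions omitted for brevity; they do not affect validation).
document_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Document",
    "type": "object",
    "properties": {
        "content": {"type": "string"},
        "doc_id": {"type": "string"},
        "metadata": {
            "type": "object",
            "properties": {
                "topic": {"type": "string"},
                "subtopic": {"type": "string"},
            },
            "required": ["topic", "subtopic"],
            "additionalProperties": False,
        },
    },
    "required": ["content", "doc_id", "metadata"],
    "additionalProperties": False,
}

doc = {
    "content": "this is the document content",
    "doc_id": "<urn:uuid:b5d19fcb-1711-4f9f-82cf-f81403382444>",
    "metadata": {"subtopic": "Fertigation methods", "topic": "Irrigation"},
}

validate(instance=doc, schema=document_schema)  # raises ValidationError on mismatch
```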

---

### `claims_json`

#### Schema

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "AnswerClaims",
  "type": "object",
  "properties": {
    "direct": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Direct statements answering the question"
    },
    "useful": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Statements that provide useful context or supporting information"
    },
    "useless": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Statements that are not useful for answering the question"
    }
  },
  "required": ["direct", "useful", "useless"],
  "additionalProperties": false
}
```

#### Example

```json
{
  "direct": ["direct claim"],
  "useful": ["useful claim 1", "useful claim 2."],
  "useless": []
}
```
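
One way such claims can be consumed is as a coverage signal over a system answer. The sketch below is illustrative only and is not the benchmark’s scoring method: it uses naive substring matching, whereas real evaluations typically use semantic matching.

```python
def claim_coverage(system_answer: str, claims: dict) -> float:
    """Fraction of 'direct' claims that appear verbatim in the answer (naive)."""
    direct = claims.get("direct", [])
    if not direct:
        return 0.0
    hits = sum(1 for c in direct if c.lower() in system_answer.lower())
    return hits / len(direct)

claims = {"direct": ["direct claim"], "useful": ["useful claim 1"], "useless": []}
print(claim_coverage("The answer restates the direct claim here.", claims))  # 1.0
```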

---

### `categorizations_json`

#### Schema

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "DataMorganaCategorizations",
  "type": "object",
  "properties": {
    "answer-control-categorization": {
      "type": "string",
      "description": "Describes how controlled or concise the expected answer is"
    },
    "answer-type-categorization": {
      "type": "string",
      "description": "Type of expected answer, such as yes/no or explanatory"
    },
    "formulation-categorization": {
      "type": "string",
      "description": "Describes the linguistic formulation of the question"
    },
    "linguistic-correctness-categorization": {
      "type": "string",
      "description": "Grammatical and syntactic correctness of the question"
    },
    "linguistic-variation-categorization": {
      "type": "string",
      "description": "Closeness of the question's wording to the supporting documents"
    },
    "politeness-categorization": {
      "type": "string",
      "description": "Politeness level of the question"
    },
    "premise-categorization": {
      "type": "string",
      "description": "Whether the question assumes a premise or not"
    },
    "user-categorization": {
      "type": "string",
      "description": "Categorization of the user (e.g., expert, novice)"
    }
  },
  "required": [
    "answer-control-categorization",
    "answer-type-categorization",
    "formulation-categorization",
    "linguistic-correctness-categorization",
    "linguistic-variation-categorization",
    "politeness-categorization",
    "premise-categorization",
    "user-categorization"
  ],
  "additionalProperties": false
}
```

#### Example

```json
{
  "answer-control-categorization": "concise-answer",
  "answer-type-categorization": "yes/no",
  "formulation-categorization": "verbose and natural",
  "linguistic-correctness-categorization": "correct",
  "linguistic-variation-categorization": "distant from documents",
  "politeness-categorization": "neutral",
  "premise-categorization": "without premise",
  "user-categorization": "novice"
}
```
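
These categorizations make it easy to slice the benchmark. A hedged sketch, reusing the assumed file name from the loading example above, that keeps only the yes/no questions:

```python
import json

import pandas as pd

df = pd.read_parquet("LiveRAG_Benchmark.parquet")  # assumed file name

# DataMorgana_Config may be stored as a JSON string; parse only when needed.
def get_config(value):
    return json.loads(value) if isinstance(value, str) else value

yes_no = df[df["DataMorgana_Config"].map(
    lambda v: get_config(v).get("answer-type-categorization") == "yes/no"
)]
print(len(yes_no), "yes/no questions")
```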