Spaces: LiveRAG/Challenge

Orensomekh committed on Jun 28

Commit fbb230b (verified) · 1 Parent(s): 3e671f8

Delete Benchmark/LiveRAG_Benchmark.md

Files changed (1)

Benchmark/LiveRAG_Benchmark.md +0 -210

Benchmark/LiveRAG_Benchmark.md DELETED

@@ -1,210 +0,0 @@
-# LiveRAG Benchmark
-## Description
-This document describes the **LiveRAG benchmark**.
-For details on Q&A generation, see [1].
-The LiveRAG benchmark includes **895 unique questions**:
-- 500 questions from Session 1
-- 500 questions from Session 2
-- 105 of these questions are common to both sessions (500 + 500 - 105 = 895)
----
-### Benchmark parquet file
-### Benchmark Fields
-| Field Name                          | Description                                                                                         | Type                          | Remarks                                          |
-|------------------------------------|-----------------------------------------------------------------------------------------------------|-------------------------------|--------------------------------------------------|
-| `Index`                            | Benchmark index                                                                                     | Integer `[0, 1, ..., 894]`    |                                                  |
-| `Question`                         | DataMorgana question                                                                                 | String                        |                                                  |
-| `Answer`                           | DataMorgana ground truth answer                                                                     | String                        |                                                  |
-| `Question_Answer_Type`            | Number of supporting documents for Q&A generation                                                   | String `["Single", "Double"]` |                                                  |
-| `Supporting_Documents`            | A list of supporting FineWeb-10BT documents (1 for single-doc Q&A, or 2 for double-doc Q&A)         | List of JSON objects          | See `document_json` schema and example below     |
-| `Answer_Claims`                   | A list of claims extracted from the answer for categories: direct, useful, and useless              | JSON object                   | See `claims_json` schema and example below       |
-| `DataMorgana_Config`             | A JSON object with question and user categorizations                                                | JSON object                   | See `categorizations_json` schema and example    |
-| `Falcon_Mirage_Question_Difficulty_Score` | Based on pure LLM and RAG system answer quality (lower means “harder”)                     | Integer `[0, 1, 2]`           |                                                  |
-| `Teams_Question_Difficulty_Score_Avg` | Teams’ correctness average score (lower means “harder”)                                       | Real `[-1 : 2]`               | Aka SDL score                                    |
-| `Teams_Question_Difficulty_Score_Std` | Teams’ correctness score standard deviation (STD)                                              | Real `[0 : 1.5]`              |                                                  |
----
-## References
-[1] D. Carmel et al., “*The SIGIR 2025 LiveRAG Challenge Benchmark: Mastering the Questions’ Diversity and Difficulty Level*”
----
-## Appendix
-### `document_json`
-#### Schema
-```json
-{
-  "$schema": "http://json-schema.org/draft-07/schema#",
-  "title": "Document",
-  "type": "object",
-  "properties": {
-    "content": {
-      "type": "string",
-      "description": "The full text content of the document."
-    },
-    "doc_id": {
-      "type": "string",
-      "description": "FineWeb-10BT document identifier."
-    },
-    "metadata": {
-      "type": "object",
-      "properties": {
-        "topic": {
-          "type": "string",
-          "description": "High-level topic of the document."
-        },
-        "subtopic": {
-          "type": "string",
-          "description": "More specific subtopic related to the topic."
-        }
-      },
-      "required": ["topic", "subtopic"],
-      "additionalProperties": false
-    }
-  },
-  "required": ["content", "doc_id", "metadata"],
-  "additionalProperties": false
-}
-```
-#### Example
-```json
-{
-  "content": "this is the document content",
-  "doc_id": "<urn:uuid:b5d19fcb-1711-4f9f-82cf-f81403382444>",
-  "metadata": {
-    "subtopic": "Fertigation methods",
-    "topic": "Irrigation"
-  }
-}
-```
----
-### `claims_json`
-#### Schema
-```json
-{
-  "$schema": "http://json-schema.org/draft-07/schema#",
-  "title": "AnswerClaims",
-  "type": "object",
-  "properties": {
-    "direct": {
-      "type": "array",
-      "items": { "type": "string" },
-      "description": "Direct statements answering the question"
-    },
-    "useful": {
-      "type": "array",
-      "items": { "type": "string" },
-      "description": "Statements that provide useful context or supporting information"
-    },
-    "useless": {
-      "type": "array",
-      "items": { "type": "string" },
-      "description": "Statements that are not useful for answering the question"
-    }
-  },
-  "required": ["direct", "useful", "useless"],
-  "additionalProperties": false
-}
-```
-#### Example
-```json
-{
-  "direct": ["direct claim"],
-  "useful": ["useful claim 1", "useful claim 2"],
-  "useless": []
-}
-```
----
-### `categorizations_json`
-#### Schema
-```json
-{
-  "$schema": "http://json-schema.org/draft-07/schema#",
-  "title": "DataMorganaCategorizations",
-  "type": "object",
-  "properties": {
-    "answer-control-categorization": {
-      "type": "string",
-      "description": "Describes how controlled or concise the answer is"
-    },
-    "answer-type-categorization": {
-      "type": "string",
-      "description": "Type of answer, such as yes/no or explanatory"
-    },
-    "formulation-categorization": {
-      "type": "string",
-      "description": "Describes the linguistic formulation of the answer"
-    },
-    "linguistic-correctness-categorization": {
-      "type": "string",
-      "description": "Grammatical and syntactic correctness"
-    },
-    "linguistic-variation-categorization": {
-      "type": "string",
-      "description": "Closeness or distance from the supporting documents"
-    },
-    "politeness-categorization": {
-      "type": "string",
-      "description": "Politeness level of the answer"
-    },
-    "premise-categorization": {
-      "type": "string",
-      "description": "Whether the answer assumes a premise or not"
-    },
-    "user-categorization": {
-      "type": "string",
-      "description": "Categorization of the user (e.g., expert, novice)"
-    }
-  },
-  "required": [
-    "answer-control-categorization",
-    "answer-type-categorization",
-    "formulation-categorization",
-    "linguistic-correctness-categorization",
-    "linguistic-variation-categorization",
-    "politeness-categorization",
-    "premise-categorization",
-    "user-categorization"
-  ],
-  "additionalProperties": false
-}
-```
-#### Example
-```json
-{
-  "answer-control-categorization": "concise-answer",
-  "answer-type-categorization": "yes/no",
-  "formulation-categorization": "verbose and natural",
-  "linguistic-correctness-categorization": "correct",
-  "linguistic-variation-categorization": "distant from documents",
-  "politeness-categorization": "neutral",
-  "premise-categorization": "without premise",
-  "user-categorization": "novice"
-}
-```
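
The schemas in the deleted file can be checked without extra dependencies. The following is a minimal sketch, assuming records have already been parsed into Python dictionaries; the helper names (`is_valid_document`, `is_valid_claims`, `unique_question_count`) are hypothetical and are not part of the LiveRAG repository. It hand-rolls the checks implied by the `document_json` and `claims_json` schemas (required keys, no additional properties, string-typed values) and reproduces the question-count arithmetic from the description.

```python
# Hypothetical helpers (not from the LiveRAG repository): hand-rolled checks
# that mirror the `document_json` and `claims_json` schemas shown in the diff.


def is_valid_document(doc):
    """Return True if `doc` has the shape required by `document_json`."""
    # All three top-level keys are required and no others are allowed.
    if not isinstance(doc, dict) or set(doc) != {"content", "doc_id", "metadata"}:
        return False
    meta = doc["metadata"]
    # `metadata` must hold exactly `topic` and `subtopic`.
    if not isinstance(meta, dict) or set(meta) != {"topic", "subtopic"}:
        return False
    strings = (doc["content"], doc["doc_id"], meta["topic"], meta["subtopic"])
    return all(isinstance(value, str) for value in strings)


def is_valid_claims(claims):
    """Return True if `claims` has the shape required by `claims_json`."""
    if not isinstance(claims, dict) or set(claims) != {"direct", "useful", "useless"}:
        return False
    # Every category must be a list of strings (an empty list is fine).
    return all(
        isinstance(items, list) and all(isinstance(claim, str) for claim in items)
        for items in claims.values()
    )


def unique_question_count(session_1, session_2, common):
    """Count questions across both sessions, counting shared ones once."""
    return session_1 + session_2 - common


# Example records copied from the appendix of the deleted file.
example_document = {
    "content": "this is the document content",
    "doc_id": "<urn:uuid:b5d19fcb-1711-4f9f-82cf-f81403382444>",
    "metadata": {"subtopic": "Fertigation methods", "topic": "Irrigation"},
}
example_claims = {
    "direct": ["direct claim"],
    "useful": ["useful claim 1", "useful claim 2"],
    "useless": [],
}

print(is_valid_document(example_document))    # True
print(is_valid_claims(example_claims))        # True
print(unique_question_count(500, 500, 105))   # 895
```

A full JSON Schema validator would cover more cases (for example, nested error reporting); the exact-key-set comparison here is only a compact stand-in for `required` combined with `"additionalProperties": false`.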