Orensomekh committed on
Commit fbb230b · verified · 1 Parent(s): 3e671f8

Delete Benchmark/LiveRAG_Benchmark.md

Files changed (1):
  1. Benchmark/LiveRAG_Benchmark.md +0 -210
Benchmark/LiveRAG_Benchmark.md DELETED
# LiveRAG Benchmark

## Description

This document describes the **LiveRAG benchmark**. For details on how the Q&A pairs were generated, see [1].

The LiveRAG benchmark includes **895 questions**:
- 500 questions from Session 1
- 500 questions from Session 2
- 105 questions shared between the two sessions (500 + 500 − 105 = 895 unique questions)

---
### Benchmark Parquet File

The benchmark is released as a parquet file; each row is one question, with the fields listed below.

### Benchmark Fields
| Field Name | Description | Type | Remarks |
|------------|-------------|------|---------|
| `Index` | Benchmark index | Integer `[0, 1, ..., 894]` | |
| `Question` | DataMorgana question | String | |
| `Answer` | DataMorgana ground-truth answer | String | |
| `Question_Answer_Type` | Number of supporting documents used for Q&A generation | String `["Single", "Double"]` | |
| `Supporting_Documents` | A list of supporting FineWeb-10BT documents (1 for single-doc Q&A, 2 for double-doc Q&A) | List of JSON objects | See the `document_json` schema and example below |
| `Answer_Claims` | A list of claims extracted from the answer, grouped into the categories direct, useful, and useless | JSON object | See the `claims_json` schema and example below |
| `DataMorgana_Config` | A JSON object with the question and user categorizations | JSON object | See the `categorizations_json` schema and example below |
| `Falcon_Mirage_Question_Difficulty_Score` | Based on pure-LLM and RAG-system answer quality (lower means “harder”) | Integer `[0, 1, 2]` | |
| `Teams_Question_Difficulty_Score_Avg` | Average of the teams’ correctness scores (lower means “harder”) | Real `[-1, 2]` | Also known as the SDL score |
| `Teams_Question_Difficulty_Score_Std` | Standard deviation of the teams’ correctness scores | Real `[0, 1.5]` | |
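As a sketch of how a benchmark row might be consumed, the snippet below decodes the nested fields with the standard library. It assumes the nested fields (e.g. `Answer_Claims`) are stored as JSON-encoded strings in the parquet file; that storage detail and the sample values are our assumptions, not part of the benchmark definition.

```python
import json

# Hypothetical benchmark row; field names follow the table above,
# values are invented for illustration.
sample_row = {
    "Index": 0,
    "Question": "What is fertigation?",
    "Answer": "Fertigation applies fertilizer through irrigation water.",
    "Question_Answer_Type": "Single",
    # Assumed to be a JSON-encoded string, as nested objects often are in parquet.
    "Answer_Claims": json.dumps({
        "direct": ["Fertigation applies fertilizer through irrigation water."],
        "useful": [],
        "useless": [],
    }),
}

# Decode the nested JSON field back into a Python dict.
claims = json.loads(sample_row["Answer_Claims"])
print(len(claims["direct"]))  # 1
```

In practice the rows would come from a dataframe library's parquet reader rather than a hand-built dict, but the per-field decoding step is the same.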
---

## References

[1] D. Carmel et al., “*The SIGIR 2025 LiveRAG Challenge Benchmark: Mastering the Questions’ Diversity and Difficulty Level*.”

---

## Appendix
### `document_json`

#### Schema

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Document",
  "type": "object",
  "properties": {
    "content": {
      "type": "string",
      "description": "The full text content of the document."
    },
    "doc_id": {
      "type": "string",
      "description": "FineWeb-10BT document identifier."
    },
    "metadata": {
      "type": "object",
      "properties": {
        "topic": {
          "type": "string",
          "description": "High-level topic of the document."
        },
        "subtopic": {
          "type": "string",
          "description": "More specific subtopic related to the topic."
        }
      },
      "required": ["topic", "subtopic"],
      "additionalProperties": false
    }
  },
  "required": ["content", "doc_id", "metadata"],
  "additionalProperties": false
}
```
#### Example

```json
{
  "content": "this is the document content",
  "doc_id": "<urn:uuid:b5d19fcb-1711-4f9f-82cf-f81403382444>",
  "metadata": {
    "subtopic": "Fertigation methods",
    "topic": "Irrigation"
  }
}
```
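A stdlib-only sketch of checking that a supporting document conforms to the schema's required keys and types (a real validator such as the `jsonschema` package would enforce the full draft-07 schema; this helper is ours, not part of the benchmark):

```python
def check_document(doc: dict) -> bool:
    """Check required keys, additionalProperties: false, and string types."""
    # required + additionalProperties: false => exactly these three keys.
    if set(doc) != {"content", "doc_id", "metadata"}:
        return False
    if not (isinstance(doc["content"], str) and isinstance(doc["doc_id"], str)):
        return False
    meta = doc["metadata"]
    return (isinstance(meta, dict)
            and set(meta) == {"topic", "subtopic"}
            and all(isinstance(v, str) for v in meta.values()))

example = {
    "content": "this is the document content",
    "doc_id": "<urn:uuid:b5d19fcb-1711-4f9f-82cf-f81403382444>",
    "metadata": {"subtopic": "Fertigation methods", "topic": "Irrigation"},
}
print(check_document(example))  # True
```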
---

### `claims_json`

#### Schema

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "AnswerClaims",
  "type": "object",
  "properties": {
    "direct": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Direct statements answering the question"
    },
    "useful": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Statements that provide useful context or supporting information"
    },
    "useless": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Statements that are not useful for answering the question"
    }
  },
  "required": ["direct", "useful", "useless"],
  "additionalProperties": false
}
```
#### Example

```json
{
  "direct": ["direct claim"],
  "useful": ["useful claim 1", "useful claim 2."],
  "useless": []
}
```
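As a hypothetical illustration of how `Answer_Claims` might be used to score a system answer, the sketch below counts which direct and useful claims a candidate answer covers. Real claim-coverage evaluation would typically use an LLM judge rather than substring matching; the `covered` helper and the candidate answer are our inventions.

```python
claims = {
    "direct": ["direct claim"],
    "useful": ["useful claim 1", "useful claim 2."],
    "useless": [],
}

def covered(claim_list, answer):
    """Count claims that appear verbatim (case-insensitive) in the answer."""
    return sum(c.lower().rstrip(".") in answer.lower() for c in claim_list)

answer = "This answer contains the direct claim and useful claim 1."
print(covered(claims["direct"], answer), covered(claims["useful"], answer))  # 1 1
```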
---

### `categorizations_json`

#### Schema

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "DataMorganaCategorizations",
  "type": "object",
  "properties": {
    "answer-control-categorization": {
      "type": "string",
      "description": "Describes how controlled or concise the answer is"
    },
    "answer-type-categorization": {
      "type": "string",
      "description": "Type of answer, such as yes/no or explanatory"
    },
    "formulation-categorization": {
      "type": "string",
      "description": "Describes the linguistic formulation of the question"
    },
    "linguistic-correctness-categorization": {
      "type": "string",
      "description": "Grammatical and syntactic correctness of the question"
    },
    "linguistic-variation-categorization": {
      "type": "string",
      "description": "Closeness or distance from the supporting documents"
    },
    "politeness-categorization": {
      "type": "string",
      "description": "Politeness level of the question"
    },
    "premise-categorization": {
      "type": "string",
      "description": "Whether the question assumes a premise or not"
    },
    "user-categorization": {
      "type": "string",
      "description": "Categorization of the user (e.g., expert, novice)"
    }
  },
  "required": [
    "answer-control-categorization",
    "answer-type-categorization",
    "formulation-categorization",
    "linguistic-correctness-categorization",
    "linguistic-variation-categorization",
    "politeness-categorization",
    "premise-categorization",
    "user-categorization"
  ],
  "additionalProperties": false
}
```
#### Example

```json
{
  "answer-control-categorization": "concise-answer",
  "answer-type-categorization": "yes/no",
  "formulation-categorization": "verbose and natural",
  "linguistic-correctness-categorization": "correct",
  "linguistic-variation-categorization": "distant from documents",
  "politeness-categorization": "neutral",
  "premise-categorization": "without premise",
  "user-categorization": "novice"
}
```
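A stdlib-only sketch of verifying that a `DataMorgana_Config` object carries exactly the eight categorization keys required by the schema above (the helper is ours, not part of the benchmark release):

```python
# The eight keys listed in the schema's "required" array; with
# additionalProperties: false, a valid object has exactly these keys.
REQUIRED = {
    "answer-control-categorization", "answer-type-categorization",
    "formulation-categorization", "linguistic-correctness-categorization",
    "linguistic-variation-categorization", "politeness-categorization",
    "premise-categorization", "user-categorization",
}

def check_config(cfg: dict) -> bool:
    return set(cfg) == REQUIRED and all(isinstance(v, str) for v in cfg.values())

example = {
    "answer-control-categorization": "concise-answer",
    "answer-type-categorization": "yes/no",
    "formulation-categorization": "verbose and natural",
    "linguistic-correctness-categorization": "correct",
    "linguistic-variation-categorization": "distant from documents",
    "politeness-categorization": "neutral",
    "premise-categorization": "without premise",
    "user-categorization": "novice",
}
print(check_config(example))  # True
```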