Orensomekh committed on
Commit 2b0f9f1 · verified · 1 Parent(s): 6a38b52

Upload LiveRAG_Benchmark.md

Files changed (1)
  1. Benchmark/LiveRAG_Benchmark.md +210 -0
Benchmark/LiveRAG_Benchmark.md ADDED
@@ -0,0 +1,210 @@
# LiveRAG Benchmark

## Description

This document describes the **LiveRAG benchmark**.

For more details regarding Q&A generation, see [1].

The LiveRAG benchmark includes **895 questions**:
- 500 questions from Session 1
- 500 questions from Session 2
- 105 questions common to both sessions (so 500 + 500 − 105 = 895 distinct questions)
---

## Benchmark Parquet File

### Benchmark Fields

| Field Name | Description | Type | Remarks |
|------------|-------------|------|---------|
| `Index` | Benchmark index | Integer `[0, 1, ..., 894]` | |
| `Question` | DataMorgana question | String | |
| `Answer` | DataMorgana ground-truth answer | String | |
| `Question_Answer_Type` | Number of supporting documents used for Q&A generation | String `["Single", "Double"]` | |
| `Supporting_Documents` | A list of supporting FineWeb-10BT documents (1 for single-doc Q&A, 2 for double-doc Q&A) | List of JSON objects | See the `document_json` schema and example below |
| `Answer_Claims` | A list of claims extracted from the answer, grouped into the categories direct, useful, and useless | JSON object | See the `claims_json` schema and example below |
| `DataMorgana_Config` | A JSON object with question and user categorizations | JSON object | See the `categorizations_json` schema and example below |
| `Falcon_Mirage_Question_Difficulty_Score` | Based on pure-LLM and RAG-system answer quality (lower means “harder”) | Integer `[0, 1, 2]` | |
| `Teams_Question_Difficulty_Score_Avg` | Average of the teams’ correctness scores (lower means “harder”) | Real `[-1, 2]` | A.k.a. SDL score |
| `Teams_Question_Difficulty_Score_Std` | Standard deviation (STD) of the teams’ correctness scores | Real `[0, 1.5]` | |
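
A minimal loading sketch with pandas follows. The file name `LiveRAG_Benchmark.parquet` is an assumption (adjust it to your local copy), and the JSON-typed columns may be stored either as parsed objects or as JSON strings, so they are parsed defensively.

```python
import json

import pandas as pd

# Assumed file name; adjust to wherever the benchmark parquet is stored.
df = pd.read_parquet("LiveRAG_Benchmark.parquet")

row = df.iloc[0]
print(row["Question"], "->", row["Answer"])

# JSON-typed columns may arrive as strings; parse only when needed.
def as_json(value):
    return json.loads(value) if isinstance(value, str) else value

claims = as_json(row["Answer_Claims"])                    # claims_json (see Appendix)
config = as_json(row["DataMorgana_Config"])               # categorizations_json
docs = [as_json(d) for d in row["Supporting_Documents"]]  # list of document_json
```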

---

## References

[1] D. Carmel et al., “The SIGIR 2025 LiveRAG Challenge Benchmark: Mastering the Questions’ Diversity and Difficulty Level.”

---

## Appendix

### `document_json`

#### Schema

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Document",
  "type": "object",
  "properties": {
    "content": {
      "type": "string",
      "description": "The full text content of the document."
    },
    "doc_id": {
      "type": "string",
      "description": "FineWeb-10BT document identifier."
    },
    "metadata": {
      "type": "object",
      "properties": {
        "topic": {
          "type": "string",
          "description": "High-level topic of the document."
        },
        "subtopic": {
          "type": "string",
          "description": "More specific subtopic related to the topic."
        }
      },
      "required": ["topic", "subtopic"],
      "additionalProperties": false
    }
  },
  "required": ["content", "doc_id", "metadata"],
  "additionalProperties": false
}
```

#### Example

```json
{
  "content": "this is the document content",
  "doc_id": "<urn:uuid:b5d19fcb-1711-4f9f-82cf-f81403382444>",
  "metadata": {
    "subtopic": "Fertigation methods",
    "topic": "Irrigation"
  }
}
```
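
To sanity-check records against this schema, here is a hedged sketch using the third-party `jsonschema` package (not part of the benchmark itself); the same pattern applies to `claims_json` and `categorizations_json` below.

```python
from jsonschema import validate  # pip install jsonschema

# The document_json schema from above, transcribed as a Python dict
# (descriptions omitted for brevity; they do not affect validation).
document_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Document",
    "type": "object",
    "properties": {
        "content": {"type": "string"},
        "doc_id": {"type": "string"},
        "metadata": {
            "type": "object",
            "properties": {
                "topic": {"type": "string"},
                "subtopic": {"type": "string"},
            },
            "required": ["topic", "subtopic"],
            "additionalProperties": False,
        },
    },
    "required": ["content", "doc_id", "metadata"],
    "additionalProperties": False,
}

doc = {
    "content": "this is the document content",
    "doc_id": "<urn:uuid:b5d19fcb-1711-4f9f-82cf-f81403382444>",
    "metadata": {"subtopic": "Fertigation methods", "topic": "Irrigation"},
}

validate(instance=doc, schema=document_schema)  # raises ValidationError on mismatch
```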

---

### `claims_json`

#### Schema

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "AnswerClaims",
  "type": "object",
  "properties": {
    "direct": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Direct statements answering the question"
    },
    "useful": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Statements that provide useful context or supporting information"
    },
    "useless": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Statements that are not useful for answering the question"
    }
  },
  "required": ["direct", "useful", "useless"],
  "additionalProperties": false
}
```

#### Example

```json
{
  "direct": ["direct claim"],
  "useful": ["useful claim 1", "useful claim 2."],
  "useless": []
}
```
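
One way such claims can be consumed is as a coverage signal over a system answer. The sketch below is illustrative only and is not the benchmark’s scoring method: it uses naive substring matching, whereas real evaluations typically use semantic matching.

```python
def claim_coverage(system_answer: str, claims: dict) -> float:
    """Fraction of 'direct' claims that appear verbatim in the answer (naive)."""
    direct = claims.get("direct", [])
    if not direct:
        return 0.0
    hits = sum(1 for c in direct if c.lower() in system_answer.lower())
    return hits / len(direct)

claims = {"direct": ["direct claim"], "useful": ["useful claim 1"], "useless": []}
print(claim_coverage("The answer restates the direct claim here.", claims))  # 1.0
```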

---

### `categorizations_json`

#### Schema

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "DataMorganaCategorizations",
  "type": "object",
  "properties": {
    "answer-control-categorization": {
      "type": "string",
      "description": "Describes how controlled or concise the expected answer is"
    },
    "answer-type-categorization": {
      "type": "string",
      "description": "Type of expected answer, such as yes/no or explanatory"
    },
    "formulation-categorization": {
      "type": "string",
      "description": "Describes the linguistic formulation of the question"
    },
    "linguistic-correctness-categorization": {
      "type": "string",
      "description": "Grammatical and syntactic correctness of the question"
    },
    "linguistic-variation-categorization": {
      "type": "string",
      "description": "Closeness of the question's wording to the supporting documents"
    },
    "politeness-categorization": {
      "type": "string",
      "description": "Politeness level of the question"
    },
    "premise-categorization": {
      "type": "string",
      "description": "Whether the question assumes a premise or not"
    },
    "user-categorization": {
      "type": "string",
      "description": "Categorization of the user (e.g., expert, novice)"
    }
  },
  "required": [
    "answer-control-categorization",
    "answer-type-categorization",
    "formulation-categorization",
    "linguistic-correctness-categorization",
    "linguistic-variation-categorization",
    "politeness-categorization",
    "premise-categorization",
    "user-categorization"
  ],
  "additionalProperties": false
}
```

#### Example

```json
{
  "answer-control-categorization": "concise-answer",
  "answer-type-categorization": "yes/no",
  "formulation-categorization": "verbose and natural",
  "linguistic-correctness-categorization": "correct",
  "linguistic-variation-categorization": "distant from documents",
  "politeness-categorization": "neutral",
  "premise-categorization": "without premise",
  "user-categorization": "novice"
}
```
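
These categorizations make it easy to slice the benchmark. A hedged sketch, reusing the assumed file name from the loading example above, that keeps only the yes/no questions:

```python
import json

import pandas as pd

df = pd.read_parquet("LiveRAG_Benchmark.parquet")  # assumed file name

# DataMorgana_Config may be stored as a JSON string; parse only when needed.
def get_config(value):
    return json.loads(value) if isinstance(value, str) else value

yes_no = df[df["DataMorgana_Config"].map(
    lambda v: get_config(v).get("answer-type-categorization") == "yes/no"
)]
print(len(yes_no), "yes/no questions")
```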