lbourdois committed
Commit 6f99ee1 · verified · 1 Parent(s): bc9cd3a

Improve language tag
Hi! As the model is multilingual, this PR adds languages other than English to the language tag to improve discoverability. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13 languages.

Files changed (1)
  1. README.md +315 -383
README.md CHANGED
@@ -1,384 +1,316 @@
- ---
- library_name: transformers
- license: apache-2.0
- language:
- - en
- - zh
- - es
- - de
- - ar
- - ru
- - ja
- - ko
- - hi
- - sk
- - vi
- - tr
- - fi
- - id
- - fa
- - 'no'
- - th
- - sv
- - pt
- - da
- - bn
- - te
- - ro
- - it
- - fr
- - nl
- - sw
- - pl
- - hu
- - cs
- - el
- - uk
- - mr
- - ta
- - tl
- - bg
- - lt
- - ur
- - he
- - gu
- - kn
- - am
- - kk
- - hr
- - uz
- - jv
- - ca
- - az
- - ms
- - sr
- - sl
- - yo
- - lv
- - is
- - ha
- - ka
- - et
- - bs
- - hy
- - ml
- - pa
- - mt
- - km
- - sq
- - or
- - as
- - my
- - mn
- - af
- - be
- - ga
- - mk
- - cy
- - gl
- - ceb
- - la
- - yi
- - lb
- - tg
- - gd
- - ne
- - ps
- - eu
- - ky
- - ku
- - si
- - ht
- - eo
- - lo
- - fy
- - sd
- - mg
- - so
- - ckb
- - su
- - nn
- datasets:
- - lightblue/reranker_continuous_filt_max7_train
- base_model:
- - Qwen/Qwen2.5-0.5B-Instruct
- pipeline_tag: text-generation
- tags:
- - reranker
- widget:
- - text: "<<<Query>>>\nHow many languages has LB-Reranker been trained on?\n\n\n<<<Context>>>\nLB-Reranker has been trained on more than 95 languages."
-   example_title: Positive example (7/7)
- - text: "<<<Query>>>\nHow many languages has LB-Reranker been trained on?\n\n\n<<<Context>>>\nAA-Reranker is applicable to a broad range of use cases."
-   example_title: Negative example (2/7)
-
- ---
-
+ ---
+ library_name: transformers
+ license: apache-2.0
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ datasets:
+ - lightblue/reranker_continuous_filt_max7_train
+ base_model:
+ - Qwen/Qwen2.5-0.5B-Instruct
+ pipeline_tag: text-generation
+ tags:
+ - reranker
+ widget:
+ - text: '<<<Query>>>
+
+     How many languages has LB-Reranker been trained on?
+
+
+
+     <<<Context>>>
+
+     LB-Reranker has been trained on more than 95 languages.'
+   example_title: Positive example (7/7)
+ - text: '<<<Query>>>
+
+     How many languages has LB-Reranker been trained on?
+
+
+
+     <<<Context>>>
+
+     AA-Reranker is applicable to a broad range of use cases.'
+   example_title: Negative example (2/7)
+ ---

# LB Reranker v1.0

<div style="width: 100%; height: 160px;
    display: flex; align-items: center;
    justify-content: center;
    border: 8px solid black;
    font-size: 120px; font-weight: bold;
    text-align: center;
    color: #438db8;
    font-family: 'Helvetica Neue', sans-serif;">
  LBR
</div>

The LB Reranker has been trained to determine the relatedness of a given query to a piece of text, allowing it to be used as a ranker or reranker in various retrieval-based tasks.

This model is fine-tuned from a [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) checkpoint and was trained for roughly 5.5 hours on an 8 x L20 instance ([ecs.gn8is-8x.32xlarge](https://www.alibabacloud.com/help/en/ecs/user-guide/gpu-accelerated-compute-optimized-and-vgpu-accelerated-instance-families-1)) on [Alibaba Cloud](https://www.alibabacloud.com/).

The training data for this model can be found at [lightblue/reranker_continuous_filt_max7_train](https://huggingface.co/datasets/lightblue/reranker_continuous_filt_max7_train), and the code for generating this data as well as for training the model can be found on [our GitHub repo](https://github.com/lightblue-tech/lb-reranker).
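
To see what the training examples look like, the dataset can be inspected with the `datasets` library. This snippet is a sketch for illustration and is not from the original card; the `train` split name is an assumption.

```python
# Peek at the training data (sketch; the "train" split name is assumed).
from datasets import load_dataset

ds = load_dataset("lightblue/reranker_continuous_filt_max7_train", split="train")
print(ds[0])  # a single query-context example with its 1-7 relatedness label
```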

Trained on data in over 95 languages, this model is applicable to a broad range of use cases.

This model has three main benefits over comparable rerankers:
1. It has shown slightly higher performance on evaluation benchmarks.
2. It has been trained on more languages than any previous model.
3. It is a simple causal LM trained to output a string between "1" and "7".

This last point means that this model can be used natively with many widely available inference packages, including vLLM and LMDeploy.
This in turn allows our reranker to benefit from improvements to inference as and when these packages release them.
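
Because it is a standard causal LM, the model can also be scored with plain Hugging Face transformers. The following is a minimal sketch, not taken from the original card; the prompt format and score tokens mirror the scripts in the "How to use" section below.

```python
# Sketch: score one query-context pair with plain transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "lightblue/lb-reranker-v1.0"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

messages = [
    {"role": "system", "content": "Given a query and a piece of text, output a score of 1-7 based on how related the query is to the text. 1 means least related and 7 is most related."},
    {"role": "user", "content": "<<<Query>>>\nHow many languages has LB-Reranker been trained on?\n\n<<<Context>>>\nLB-Reranker has been trained on more than 95 languages."},
]
input_ids = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

with torch.no_grad():
    next_token_logits = model(input_ids).logits[0, -1]  # logits for the score token
probs = next_token_logits.softmax(dim=-1)

score_ids = [tok.encode(str(i))[0] for i in range(1, 8)]  # token ids of "1".."7"
expected = sum(i * probs[tid].item() for i, tid in enumerate(score_ids, start=1))
print(expected)  # continuous relatedness score in roughly [1, 7]
```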

Update: We have also found that this model works pretty well as a code snippet reranker too (P@1 of 96%)! See our [Colab](https://colab.research.google.com/drive/1ABL1xaarekLIlVJKbniYhXgYu6ZNwfBm?usp=sharing) for more details.

# How to use

The model was trained to expect an input such as:

```
<<<Query>>>
{your_query_here}

<<<Context>>>
{your_context_here}
```

and to output a single number between 1 and 7 as a string.

To produce a continuous score that can be used for reranking query-context pairs (i.e. one with few ties), we calculate the expectation value of the score over the probabilities the model assigns to the tokens "1" through "7".
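
As a toy illustration (probabilities invented for the example), the expectation value is simply the probability-weighted sum of the seven scores:

```python
import numpy as np

# Made-up probabilities for the score tokens "1".."7"
probs = np.array([0.01, 0.01, 0.02, 0.05, 0.10, 0.31, 0.50])
expected_score = (probs * np.arange(1, 8)).sum()
print(expected_score)  # 6.15, a continuous relatedness score
```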

We include scripts for doing this in vLLM, LMDeploy, and the OpenAI client (hosted for free on Hugging Face):

<ul>
<li><b>vLLM</b>

Install [vLLM](https://github.com/vllm-project/vllm/) using `pip install vllm`.

<details open>
<summary>Show vLLM code</summary>

```python
from vllm import LLM, SamplingParams
import numpy as np

def make_reranker_input(t, q):
    return f"<<<Query>>>\n{q}\n\n<<<Context>>>\n{t}"

def make_reranker_inference_conversation(context, question):
    system_message = "Given a query and a piece of text, output a score of 1-7 based on how related the query is to the text. 1 means least related and 7 is most related."

    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": make_reranker_input(context, question)},
    ]

def get_prob(logprob_dict, tok_id):
    # Falls back to 0 if a score token is not among the returned logprobs
    return np.exp(logprob_dict[tok_id].logprob) if tok_id in logprob_dict.keys() else 0

llm = LLM("lightblue/lb-reranker-v1.0")
sampling_params = SamplingParams(temperature=0.0, logprobs=14, max_tokens=1)
tok = llm.llm_engine.tokenizer.tokenizer
idx_tokens = [tok.encode(str(i))[0] for i in range(1, 8)]  # token ids of "1".."7"

query_texts = [
    ("What is the scientific name of apples?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
    ("What is the Chinese word for 'apple'?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
    ("What is the square root of 999?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
]

chats = [make_reranker_inference_conversation(c, q) for q, c in query_texts]
responses = llm.chat(chats, sampling_params)
probs = np.array([[get_prob(r.outputs[0].logprobs[0], y) for y in idx_tokens] for r in responses])

# Expectation value of the score: sum over i of i * p(i)
N = probs.shape[1]
M = probs.shape[0]
idxs = np.tile(np.arange(1, N + 1), M).reshape(M, N)

expected_vals = (probs * idxs).sum(axis=1)
print(expected_vals)
# [6.66570732 1.86686378 1.01102923]
```

</details></li>
<li><b>LMDeploy</b>

Install [LMDeploy](https://github.com/InternLM/lmdeploy) using `pip install lmdeploy`.

<details>
<summary>Show LMDeploy code</summary>

```python
# Un-comment this if running in a Jupyter notebook, Colab etc.
# import nest_asyncio
# nest_asyncio.apply()

from lmdeploy import GenerationConfig, ChatTemplateConfig, pipeline
import numpy as np

def make_reranker_input(t, q):
    return f"<<<Query>>>\n{q}\n\n<<<Context>>>\n{t}"

def make_reranker_inference_conversation(context, question):
    system_message = "Given a query and a piece of text, output a score of 1-7 based on how related the query is to the text. 1 means least related and 7 is most related."

    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": make_reranker_input(context, question)},
    ]

def get_prob(logprob_dict, tok_id):
    return np.exp(logprob_dict[tok_id]) if tok_id in logprob_dict.keys() else 0

pipe = pipeline(
    "lightblue/lb-reranker-v1.0",
    chat_template_config=ChatTemplateConfig(
        model_name='qwen2d5',
        capability='chat'
    )
)
tok = pipe.tokenizer.model
idx_tokens = [tok.encode(str(i))[0] for i in range(1, 8)]  # token ids of "1".."7"

query_texts = [
    ("What is the scientific name of apples?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
    ("What is the Chinese word for 'apple'?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
    ("What is the square root of 999?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
]

chats = [make_reranker_inference_conversation(c, q) for q, c in query_texts]
responses = pipe(
    chats,
    gen_config=GenerationConfig(temperature=1.0, logprobs=14, max_new_tokens=1, do_sample=True)
)
probs = np.array([[get_prob(r.logprobs[0], y) for y in idx_tokens] for r in responses])

N = probs.shape[1]
M = probs.shape[0]
idxs = np.tile(np.arange(1, N + 1), M).reshape(M, N)

expected_vals = (probs * idxs).sum(axis=1)
print(expected_vals)
# [6.66415229 1.84342025 1.01133205]
```

</details></li>
<li><b>OpenAI (hosted on Hugging Face)</b>

Install [openai](https://github.com/openai/openai-python) using `pip install openai`.

<details>
<summary>Show OpenAI + Hugging Face Inference code</summary>

```python
from openai import OpenAI
import numpy as np
from multiprocessing import Pool
from tqdm.auto import tqdm

client = OpenAI(
    base_url="https://api-inference.huggingface.co/v1/",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"  # Change this to an access token from https://huggingface.co/settings/tokens
)

def make_reranker_input(t, q):
    return f"<<<Query>>>\n{q}\n\n<<<Context>>>\n{t}"

def make_reranker_inference_conversation(context, question):
    system_message = "Given a query and a piece of text, output a score of 1-7 based on how related the query is to the text. 1 means least related and 7 is most related."

    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": make_reranker_input(context, question)},
    ]

def get_reranker_score(context_question_tuple):
    question, context = context_question_tuple

    messages = make_reranker_inference_conversation(context, question)

    completion = client.chat.completions.create(
        model="lightblue/lb-reranker-0.5B-v1.0",
        messages=messages,
        max_tokens=1,
        temperature=0.0,
        logprobs=True,
        top_logprobs=5,  # The maximum the OpenAI API currently allows; if this limit is ever raised, increase it to at least 7.
    )

    logprobs = completion.choices[0].logprobs.content[0].top_logprobs

    # At temperature 0.0 the top tokens are all score digits in practice.
    calculated_score = sum([int(x.token) * np.exp(x.logprob) for x in logprobs])

    return calculated_score

query_texts = [
    ("What is the scientific name of apples?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
    ("What is the Chinese word for 'apple'?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
    ("What is the square root of 999?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
]

with Pool(processes=16) as p:  # Allows for parallel processing
    expected_vals = list(tqdm(p.imap(get_reranker_score, query_texts), total=len(query_texts)))

print(expected_vals)
# [6.64866580, 1.85144404, 1.010719508]
```

</details></li>
</ul>
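
Whichever backend you use, the resulting scores can be used directly for reranking: sort the candidate contexts for a query by descending expected value. A hypothetical follow-on sketch (not from the original card), reusing the `query_texts` and `expected_vals` variables from the scripts above:

```python
# Rerank contexts by descending expected score (illustrative sketch).
contexts = [context for _, context in query_texts]
ranked = sorted(zip(contexts, expected_vals), key=lambda pair: pair[1], reverse=True)
for context, score in ranked:
    print(f"{score:.2f}  {context[:60]}...")
```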

# Evaluation

We perform an evaluation on 9 datasets from the [BEIR benchmark](https://github.com/beir-cellar/beir) that none of the evaluated models have been trained on (to our knowledge):

* ArguAna
* DBpedia-Entity
* FiQA
* NFCorpus
* SciDocs
* SciFact
* TREC-COVID-v2
* ViHealthQA
* Webis-Touché2020

We evaluate on a subset of all queries (the first 250) to save evaluation time.

We find that our model performs similarly to or better than many of the state-of-the-art reranker models in our evaluation, without compromising on inference speed.

We make our evaluation code and results available [on our GitHub](https://github.com/lightblue-tech/lb-reranker/blob/main/run_bier.ipynb).

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/xkNzCABFUmU7UmDXUduiz.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/P-XCA3TGHqDSX8k6c4hCE.png)

As the plots show, this reranker attains higher IR evaluation metrics than the two baseline rerankers we include at every cutoff except @1.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/puhhWseBOcIyOEdW4L-B0.png)

We also show that our model is, on average, faster than the BGE reranker v2.

# License

We share this model under the Apache 2.0 license.

# Developed by

<a href="https://www.lightblue-tech.com">
<img src="https://www.lightblue-tech.com/wp-content/uploads/2023/08/color_%E6%A8%AA%E5%9E%8B-1536x469.png" alt="Lightblue technology logo" width="400"/>
</a>

This model was trained by Peter Devine ([ptrdvn](https://huggingface.co/ptrdvn)) for Lightblue.