greatakela committed
Commit 88b237c · verified · 1 Parent(s): 594d973

Add new SentenceTransformer model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "word_embedding_dimension": 768,
+ "pooling_mode_cls_token": false,
+ "pooling_mode_mean_tokens": true,
+ "pooling_mode_max_tokens": false,
+ "pooling_mode_mean_sqrt_len_tokens": false,
+ "pooling_mode_weightedmean_tokens": false,
+ "pooling_mode_lasttoken": false,
+ "include_prompt": true
+ }
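
This pooling configuration enables only mean pooling (`pooling_mode_mean_tokens: true`), so the sentence embedding is the masked average of the token embeddings. A minimal sketch of that computation, assuming a PyTorch batch of token embeddings and an attention mask (the function name is illustrative, not part of this repo):

```python
import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # token_embeddings: (batch, seq_len, 768); attention_mask: (batch, seq_len), 1 for real tokens
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(dim=1)   # sum of non-padding token vectors
    counts = mask.sum(dim=1).clamp(min=1e-9)        # number of non-padding tokens
    return summed / counts                          # (batch, 768) sentence embeddings
```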
README.md ADDED
@@ -0,0 +1,429 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:4893
+ - loss:TripletLoss
+ base_model: microsoft/deberta-base
+ widget:
+ - source_sentence: Perfect working condition. Then what you say leads obviously to
+ one alternative. The source of radiation is not from our universe. Nor in our
+ universe, Captain. It came from outside. Outside? Yes, that would explain a lot.
+ Another universe, perhaps in another dimension, occupying the same space at the
+ same time. The possible existence of a parallel universe has been scientifically
+ conceded, Captain.[SEP]All right. What would happen if another universe, say a
+ minus universe, came into contact with a positive universe such as ours?
+ sentences:
+ - ' How''s your leg? You seem to be favoring your left side.'
+ - Unquestionably a warp. A distortion of physical laws on an immense scale.
+ - Queen to queen's level three.
+ - source_sentence: The transporter refuses to function, even at maximum power. But
+ all the circuits test out. It appears to be the same energy block that's jamming
+ our communications. I cannot pinpoint a source. Captain, there's something over
+ there in the trees. Metal alloy like the planetary shell. It might tell us something.
+ There's an inscription, several languages.[SEP]The Keeper's dead.
+ sentences:
+ - ' How much heat are you taking from the parents?'
+ - This vault was constructed about a half a million years ago. About the same time
+ the planet surface was destroyed, if our sensor readings are accurate.
+ - An astute medical observation, Doctor, if we can believe this information. Tricorder
+ readings indicate there is a body interred here.
+ - source_sentence: Welcome home, Jim. I had a whole universe to myself after the Defiant
+ was thrown out. There was absolutely no one else in it. I must say I prefer a
+ crowded universe much better. How did you two get along without me? Oh, we managed.
+ Mister Spock gave the orders, and I found the answers. Good. No problems between
+ you? None worth reporting, Captain.[SEP]Try me.
+ sentences:
+ - Only such minor disturbances as are inevitable when humans are involved.
+ - ' Harder than the right?'
+ - Good. Report to Sickbay, Mister Sulu.
+ - source_sentence: Too bad, Captain. Maybe I can't go home, but neither can you. You're
+ as much a prisoner in time as I am. Recommendation for his disposition, dear?
+ Maintenance note. My recording computer has a serious malfunction. Recommend it
+ either be corrected or scrapped. Compute. Computed. Bridge to Captain Kirk.[SEP]Kirk
+ here.
+ sentences:
+ - Have some new information regarding Captain Christopher. Important I see you both
+ immediately.
+ - Several times, Captain. I do not wish to surrender hope, but the facts remain
+ unchangeable.
+ - ' [almost imitating an orgasm] Ohhh, yes! Get a head CT, draw a blood culture,
+ run a chem panel and get a complete blood count.'
+ - source_sentence: That's paradise? We have no need or want, Captain. It's a true
+ Eden, Jim. There's belonging and love. No wants. No needs. We weren't meant for
+ that. None of us. Man stagnates if he has no ambition, no desire to be more than
+ he is. We have what we need.[SEP]Except a challenge.
+ sentences:
+ - Sir?
+ - ' Happy Valentine''s Day.'
+ - You don't understand, Jim, but you'll come around sooner or later. Join us. Please.
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy
+ model-index:
+ - name: SentenceTransformer based on microsoft/deberta-base
+ results:
+ - task:
+ type: triplet
+ name: Triplet
+ dataset:
+ name: evaluator enc
+ type: evaluator_enc
+ metrics:
+ - type: cosine_accuracy
+ value: 0.9991825222969055
+ name: Cosine Accuracy
+ - task:
+ type: triplet
+ name: Triplet
+ dataset:
+ name: evaluator val
+ type: evaluator_val
+ metrics:
+ - type: cosine_accuracy
+ value: 0.9814814925193787
+ name: Cosine Accuracy
+ ---
+
+ # SentenceTransformer based on microsoft/deberta-base
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [microsoft/deberta-base](https://huggingface.co/microsoft/deberta-base). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [microsoft/deberta-base](https://huggingface.co/microsoft/deberta-base) <!-- at revision 0d1b43ccf21b5acd9f4e5f7b077fa698f05cf195 -->
+ - **Maximum Sequence Length:** 128 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+ (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: DebertaModel
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ )
+ ```
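+
+ The same two-module stack can be rebuilt by hand with the `sentence_transformers.models` API. The snippet below is a minimal sketch that reconstructs the architecture from the base checkpoint; it does not load the finetuned weights (for those, load `greatakela/gnlp_hw1_encoder_1` as shown in the Usage section):
+
+ ```python
+ from sentence_transformers import SentenceTransformer, models
+
+ # DeBERTa encoder, truncating inputs at 128 tokens
+ word_embedding_model = models.Transformer("microsoft/deberta-base", max_seq_length=128)
+ # Mean pooling over the 768-dimensional token embeddings
+ pooling_model = models.Pooling(
+     word_embedding_model.get_word_embedding_dimension(),
+     pooling_mode="mean",
+ )
+ model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
+ ```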
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("greatakela/gnlp_hw1_encoder_1")
+ # Run inference
+ sentences = [
+ "That's paradise? We have no need or want, Captain. It's a true Eden, Jim. There's belonging and love. No wants. No needs. We weren't meant for that. None of us. Man stagnates if he has no ambition, no desire to be more than he is. We have what we need.[SEP]Except a challenge.",
+ "You don't understand, Jim, but you'll come around sooner or later. Join us. Please.",
+ " Happy Valentine's Day.",
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
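+
+ The card lists semantic search among the intended uses; below is a minimal sketch of ranking candidate replies against a dialogue context with this model. The query and candidates are illustrative only, not drawn from the training data:
+
+ ```python
+ from sentence_transformers import SentenceTransformer, util
+
+ model = SentenceTransformer("greatakela/gnlp_hw1_encoder_1")
+
+ query = "Report our status to Starfleet Command."  # hypothetical dialogue context
+ candidates = [
+     "Aye, sir. Opening a channel now.",
+     "I fail to see the logic in that course of action.",
+     "Happy Valentine's Day.",
+ ]
+
+ query_emb = model.encode(query, convert_to_tensor=True)
+ cand_emb = model.encode(candidates, convert_to_tensor=True)
+
+ # Rank candidates by cosine similarity to the query
+ hits = util.semantic_search(query_emb, cand_emb, top_k=len(candidates))[0]
+ for hit in hits:
+     print(round(hit["score"], 3), candidates[hit["corpus_id"]])
+ ```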
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Triplet
+
+ * Datasets: `evaluator_enc` and `evaluator_val`
+ * Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)
+
+ | Metric | evaluator_enc | evaluator_val |
+ |:--------------------|:--------------|:--------------|
+ | **cosine_accuracy** | **0.9992** | **0.9815** |
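+
+ The same kind of check can be rerun on your own held-out triplets with `TripletEvaluator`. A minimal sketch, with a single illustrative triplet standing in for a real evaluation set:
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.evaluation import TripletEvaluator
+
+ model = SentenceTransformer("greatakela/gnlp_hw1_encoder_1")
+
+ # Each anchor should land closer to its positive than to its negative.
+ evaluator = TripletEvaluator(
+     anchors=["Status report, Mister Spock.[SEP]All systems functioning normally."],
+     positives=["Confirmed, Captain. All stations report ready."],
+     negatives=[" Get a head CT and run a chem panel."],
+     name="evaluator_val",
+ )
+ print(evaluator(model))  # e.g. {'evaluator_val_cosine_accuracy': ...}
+ ```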
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 4,893 training samples
+ * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>sentence_2</code>
+ * Approximate statistics based on the first 1000 samples:
+ | | sentence_0 | sentence_1 | sentence_2 |
+ |:--------|:-----------|:-----------|:-----------|
+ | type | string | string | string |
+ | details | <ul><li>min: 2 tokens</li><li>mean: 83.32 tokens</li><li>max: 128 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 18.63 tokens</li><li>max: 128 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 18.98 tokens</li><li>max: 128 tokens</li></ul> |
+ * Samples:
+ | sentence_0 | sentence_1 | sentence_2 |
+ |:-----------|:-----------|:-----------|
+ | <code>Don't know yet. Engineering. No casualties, Captain, but trouble aplenty with the engines. Every dilithium crystal connection's smashed in the warp engine circuitry. We're trying to bypass them now. What about main circuits? Well, you have to see it to believe it, sir. Those big crystals in there have come apart. Each of them unpeeling like the rind of an orange. Analysis, Spock.[SEP]Our only hope now is rewiring impulse. But there are a thousand broken connections.</code> | <code>Captain, this is quite unprecedented. Notice the fracturing is spiro-form, similar to long chain molecules.</code> | <code> No signs of drug use or acetaminophen poisoning in his tox screen. Maybe the water was contaminated.</code> |
+ | <code>Behold. That is most significant. An instinct new to the essence of her being is generating. Compassion for another is becoming part of her functioning life system. She is afraid. She's saving herself. She does not yet have the instinct to save her people. We have failed?[SEP]No. No, not yet.</code> | <code>Captain, Dr. McCoy's life is not solely dependent on Gem. The Vians too must be capable of saving his life.</code> | <code> Not right now. She's already on a respirator. The maParkne is breathing for her. I can do whatEver I want to her lungs. If you're playing catch in the living room and you break your mother's vase you might as well keep playing catch. The vase is already broken.</code> |
+ | <code>He was aware of what might happen when he went. I should never have let him go. You had no choice, Captain. You could not have stopped him. How can you ignore that? A Vulcan would not cry out so.[SEP]Whether he's a Vulcan or not, he's in agony.</code> | <code>I am not insensitive to it, Captain.</code> | <code> What about something vascular, polyarteritis nodosa.</code> |
+ * Loss: [<code>TripletLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#tripletloss) with these parameters:
+ ```json
+ {
+ "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
+ "triplet_margin": 5
+ }
+ ```
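+
+ A sketch of how a loss with these parameters would be set up for finetuning with the sentence-transformers training API. The tiny inline dataset is illustrative only; the actual 4,893 training triplets are not part of this repository:
+
+ ```python
+ from datasets import Dataset
+ from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
+ from sentence_transformers.losses import TripletLoss, TripletDistanceMetric
+
+ model = SentenceTransformer("microsoft/deberta-base")
+
+ # Columns follow the card: sentence_0 (anchor), sentence_1 (positive), sentence_2 (negative)
+ train_dataset = Dataset.from_dict({
+     "sentence_0": ["dialogue context[SEP]last utterance"],
+     "sentence_1": ["the reply that actually followed"],
+     "sentence_2": ["an unrelated line from another show"],
+ })
+
+ # Euclidean distance with a margin of 5, as listed above
+ loss = TripletLoss(model, distance_metric=TripletDistanceMetric.EUCLIDEAN, triplet_margin=5)
+
+ trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
+ trainer.train()
+ ```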
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `multi_dataset_batch_sampler`: round_robin
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 8
+ - `per_device_eval_batch_size`: 8
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1
+ - `num_train_epochs`: 3
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: round_robin
+
+ </details>
+
+ ### Training Logs
+ | Epoch | Step | Training Loss | evaluator_enc_cosine_accuracy | evaluator_val_cosine_accuracy |
+ |:------:|:----:|:-------------:|:-----------------------------:|:-----------------------------:|
+ | -1 | -1 | - | 0.6203 | - |
+ | 0.4902 | 300 | - | 0.9789 | - |
+ | 0.8170 | 500 | 0.8516 | - | - |
+ | 0.9804 | 600 | - | 0.9931 | - |
+ | 1.0 | 612 | - | 0.9937 | - |
+ | 1.4706 | 900 | - | 0.9955 | - |
+ | 1.6340 | 1000 | 0.1586 | - | - |
+ | 1.9608 | 1200 | - | 0.9982 | - |
+ | 2.0 | 1224 | - | 0.9992 | - |
+ | 2.4510 | 1500 | 0.0644 | 0.9992 | - |
+ | 2.9412 | 1800 | - | 0.9992 | - |
+ | 3.0 | 1836 | - | 0.9992 | - |
+ | -1 | -1 | - | - | 0.9815 |
+
+
+ ### Framework Versions
+ - Python: 3.11.11
+ - Sentence Transformers: 3.4.1
+ - Transformers: 4.48.3
+ - PyTorch: 2.5.1+cu124
+ - Accelerate: 1.3.0
+ - Datasets: 3.3.2
+ - Tokenizers: 0.21.0
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+ author = "Reimers, Nils and Gurevych, Iryna",
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+ month = "11",
+ year = "2019",
+ publisher = "Association for Computational Linguistics",
+ url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### TripletLoss
+ ```bibtex
+ @misc{hermans2017defense,
+ title={In Defense of the Triplet Loss for Person Re-Identification},
+ author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
+ year={2017},
+ eprint={1703.07737},
+ archivePrefix={arXiv},
+ primaryClass={cs.CV}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,33 @@
+ {
+ "_name_or_path": "microsoft/deberta-base",
+ "architectures": [
+ "DebertaModel"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "layer_norm_eps": 1e-07,
+ "legacy": true,
+ "max_position_embeddings": 512,
+ "max_relative_positions": -1,
+ "model_type": "deberta",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "pad_token_id": 0,
+ "pooler_dropout": 0,
+ "pooler_hidden_act": "gelu",
+ "pooler_hidden_size": 768,
+ "pos_att_type": [
+ "c2p",
+ "p2c"
+ ],
+ "position_biased_input": false,
+ "relative_attention": true,
+ "torch_dtype": "float32",
+ "transformers_version": "4.48.3",
+ "type_vocab_size": 0,
+ "vocab_size": 50265
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "__version__": {
+ "sentence_transformers": "3.4.1",
+ "transformers": "4.48.3",
+ "pytorch": "2.5.1+cu124"
+ },
+ "prompts": {},
+ "default_prompt_name": null,
+ "similarity_fn_name": "cosine"
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8183b3fd54c073e9a380efb364c4500297686d41cb95e8c2146e9caabd2f3385
+ size 554429144
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Transformer"
+ },
+ {
+ "idx": 1,
+ "name": "1",
+ "path": "1_Pooling",
+ "type": "sentence_transformers.models.Pooling"
+ }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 128,
+ "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+ "bos_token": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "cls_token": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "[MASK]",
+ "lstrip": true,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,60 @@
+ {
+ "add_bos_token": false,
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "50264": {
+ "content": "[MASK]",
+ "lstrip": true,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "[CLS]",
+ "clean_up_tokenization_spaces": false,
+ "cls_token": "[CLS]",
+ "do_lower_case": false,
+ "eos_token": "[SEP]",
+ "errors": "replace",
+ "extra_special_tokens": {},
+ "mask_token": "[MASK]",
+ "model_max_length": 128,
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "tokenizer_class": "DebertaTokenizer",
+ "unk_token": "[UNK]",
+ "vocab_type": "gpt2"
+ }
vocab.json ADDED
The diff for this file is too large to render. See raw diff