leoozy committed on
Commit 696f6dc · 1 Parent(s): 17731a1

Update README.md

Files changed (1)
  1. README.md +10 -31
README.md CHANGED
@@ -55,7 +55,7 @@ Users (both direct and downstream) should be made aware of the risks, biases and
 ## Training Data
 
 The model creators note in the [GitHub Repository](https://github.com/SJTU-LIT/SynCSE/blob/main/README.md):
-> We train
+> We use the 26.2k generated synthetic data to train SynCSE-partial-RoBERTa-base.
 
 
 ## Training Procedure
@@ -75,15 +75,6 @@ More information needed
 # Evaluation
 
 
-## Testing Data, Factors & Metrics
-
-### Testing Data
-
-The model creators note in the [associated paper](https://arxiv.org/pdf/2104.08821.pdf):
-> Our evaluation code for sentence embeddings is based on a modified version of [SentEval](https://github.com/facebookresearch/SentEval). It evaluates sentence embeddings on semantic textual similarity (STS) tasks and downstream transfer tasks.
-
-For STS tasks, our evaluation takes the "all" setting and reports Spearman's correlation. See the [associated paper](https://arxiv.org/pdf/2104.08821.pdf) (Appendix B) for evaluation details.
-
 
 
 ### Factors
@@ -108,16 +99,6 @@ More information needed
 
 
 
-# Environmental Impact
-
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
-- **Hardware Type:** Nvidia 3090 GPUs with CUDA 11
-- **Hours used:** More information needed
-- **Cloud Provider:** More information needed
-- **Compute Region:** More information needed
-- **Carbon Emitted:** More information needed
-
 # Technical Specifications [optional]
 
 ## Model Architecture and Objective
@@ -143,11 +124,11 @@ More information needed.
 **BibTeX:**
 
 ```bibtex
-@inproceedings{gao2021simcse,
-  title={{SimCSE}: Simple Contrastive Learning of Sentence Embeddings},
-  author={Gao, Tianyu and Yao, Xingcheng and Chen, Danqi},
-  booktitle={Empirical Methods in Natural Language Processing (EMNLP)},
-  year={2021}
+@article{zhang2023contrastive,
+  title={Contrastive Learning of Sentence Embeddings from Scratch},
+  author={Zhang, Junlei and Lan, Zhenzhong and He, Junxian},
+  journal={arXiv preprint arXiv:2305.15077},
+  year={2023}
 }
 ```
@@ -159,13 +140,11 @@ More information needed
 More information needed
 
 
-# Model Card Authors [optional]
-
-Princeton NLP group in collaboration with Ezi Ozoani and the Hugging Face team.
+
 
 # Model Card Contact
 
-If you have any questions related to the code or the paper, feel free to email Tianyu (`tianyug@cs.princeton.edu`) and Xingcheng (`yxc18@mails.tsinghua.edu.cn`). If you encounter any problems when using the code, or want to report a bug, you can open an issue. Please try to specify the problem with details so we can help you better and quicker!
+If you have any questions related to the code or the paper, feel free to email Junlei (`zhangjunlei@westlake.edu.cn`). If you encounter any problems when using the code, or want to report a bug, you can open an issue. Please try to specify the problem with details so we can help you better and quicker!
 
 
 
@@ -179,9 +158,9 @@ Use the code below to get started with the model.
 ```python
 from transformers import AutoTokenizer, AutoModel
 
-tokenizer = AutoTokenizer.from_pretrained("princeton-nlp/sup-simcse-bert-large-uncased")
+tokenizer = AutoTokenizer.from_pretrained("sjtu-lit/SynCSE-partial-RoBERTa-base")
 
-model = AutoModel.from_pretrained("princeton-nlp/sup-simcse-bert-large-uncased")
+model = AutoModel.from_pretrained("sjtu-lit/SynCSE-partial-RoBERTa-base")
 
 ```
 </details>
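The quickstart in the diff above only loads the tokenizer and model. As a minimal sketch of what typically happens next with a sentence-embedding model, the following shows mask-aware mean pooling over token embeddings followed by cosine similarity. Note this is illustrative only: it uses random NumPy arrays in place of real `model(**inputs).last_hidden_state` outputs, and mean pooling is an assumed strategy — the card does not state which pooling SynCSE uses.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over the sequence, ignoring padding positions."""
    mask = attention_mask[:, None].astype(token_embeddings.dtype)  # (seq_len, 1)
    return (token_embeddings * mask).sum(axis=0) / mask.sum()

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-ins for two sentences' token embeddings: 4 tokens, hidden size 8.
rng = np.random.default_rng(0)
tokens_a = rng.normal(size=(4, 8))
tokens_b = rng.normal(size=(4, 8))
mask = np.array([1, 1, 1, 0])  # last position is padding and is excluded

emb_a = mean_pool(tokens_a, mask)
emb_b = mean_pool(tokens_b, mask)

print(round(cosine_similarity(emb_a, emb_a), 6))  # a vector compared to itself -> 1.0
print(cosine_similarity(emb_a, emb_b))            # some value in [-1, 1]
```

With the real model, `tokens_a` would come from `model(**tokenizer("A sentence", return_tensors="pt")).last_hidden_state`, and the resulting similarities are what the STS evaluation correlates with human judgments.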