matthieumeeus97 committed
Commit d49a2c6 · verified · 1 Parent(s): 4e2f65a

Update README.md

Files changed (1)
  1. README.md +25 -25
README.md CHANGED
@@ -12,20 +12,34 @@ license: llama2
  <em>A Llama-2/3-based family of Dutch language models</em>
  </div>
 
- ## Model Details
 
  ChocoLlama is a family of open LLMs specifically adapted to Dutch, contributing to the state-of-the-art of Dutch open LLMs in their weight class.
 
  We provide 6 variants (of which 3 base and 3 instruction-tuned models):
- - **ChocoLlama-2-7B-base**: A language-adapted version of Meta's Llama-2-7b, fine-tuned on a Dutch dataset of 104GB (XXX tokens) using LoRa.
- - **ChocoLlama-2-7B-instruct**: An instruction-tuned version of ChocoLlama-2-7B-base, fine-tuned on a collection of Dutch translations of instruction-tuning datasets, using SFT followed by DPO.
- - **ChocoLlama-2-7B-tokentrans-base**: A language-adapted version of Meta's Llama-2-7b, using a Dutch RoBERTa-based tokenizer. The token embeddings of this model were reinitialized using the token translation algorithm proposed by [Remy et al.](https://arxiv.org/pdf/2310.03477). The model was subsequently fine-tuned on the same Dutch dataset as ChocoLlama-2-7B-base, again using LoRa.
- - **ChocoLlama-2-7B-tokentrans-instruct**: An instruction-tuned version of ChocoLlama-2-7B-tokentrans-base, fine-tuned on the same dataset as ChocoLlama-2-7B-instruct, again using SFT followed by DPO.
- - **Llama-3-ChocoLlama-8B-base**: A language-adapted version of Meta's Llama-3-8B, fine-tuned on the same Dutch dataset as ChocoLlama-2-7B-base, again using LoRa.
- - **Llama-3-ChocoLlama-8B-instruct**: An instruction-tuned version of Llama-3-ChocoLlama-8B-base, fine-tuned on the same dataset as ChocoLlama-2-7B-instruct, again using SFT followed by DPO.
-
 
- As far as we are aware, Llama-3-ChocoLlama-8B-instruct sets a new state-of-the-art for Dutch open models in its weight class.
 
  ### Model Description
 
@@ -50,11 +64,8 @@ Since this is a base model, we do not recommend using it for your use-cases dire
 
  ### Downstream Use
 
- Since this model is a base model, it can easily be adapted to specific use-cases that require Dutch language understanding and generation. We expect this model to be particularly useful for use-cases in the domains which were explicitly covered in our dataset, e.g. the analysis and/or generation of:
- - Dutch job descriptions
- - Dutch corporate filings
- - Dutch legislation
-
 
  ### Out-of-Scope Use
 
@@ -70,17 +81,6 @@ However we did not explicitly conduct any additional filtering of this dataset w
 
  We recommend fine-tuning this model on your curated data to maximally avoid undesirable outputs.
 
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- ```
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- tokenizer = AutoTokenizer.from_pretrained('ChocoLlama/ChocoLlama-2-7B-base')
- model = AutoModelForCausalLM.from_pretrained('ChocoLlama/ChocoLlama-2-7B-base')
- ```
-
  ## Training Details
 
  ### Training Data
 
  <em>A Llama-2/3-based family of Dutch language models</em>
  </div>
 
+ ## How to Get Started with the Model
+
+ Here we present **ChocoLlama-2-7B-base**, a language-adapted version of Meta's Llama-2-7b, fine-tuned on a Dutch dataset of 104GB using LoRa.
+ Note that this is a base model, not optimized for conversational behavior.
+ If this is desired for your use-case, we recommend fine-tuning this model on your own Dutch data or using the instruction-tuned version of this model, [ChocoLlama-2-7B-instruct](https://huggingface.co/ChocoLlama/ChocoLlama-2-7B-instruct).
+
+ Use the code below to get started with the model.
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained('ChocoLlama/ChocoLlama-2-7B-base')
+ model = AutoModelForCausalLM.from_pretrained('ChocoLlama/ChocoLlama-2-7B-base')
+ ```
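+
+ As a quick sanity check, you can then generate a short Dutch continuation. This is a minimal sketch: the prompt and sampling settings below are illustrative choices, not tuned recommendations.
+
+ ```python
+ # Generate a short continuation for a Dutch prompt (illustrative settings).
+ inputs = tokenizer('Vandaag ga ik naar de markt om', return_tensors='pt')
+ outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```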
+
+ ## ChocoLlama: Model Details
 
  ChocoLlama is a family of open LLMs specifically adapted to Dutch, contributing to the state-of-the-art of Dutch open LLMs in their weight class.
 
  We provide 6 variants (of which 3 base and 3 instruction-tuned models):
+ - **ChocoLlama-2-7B-base** ([link](https://huggingface.co/ChocoLlama/ChocoLlama-2-7B-base)): A language-adapted version of Meta's Llama-2-7b, fine-tuned on a Dutch dataset of 104GB using LoRa.
+ - **ChocoLlama-2-7B-instruct** ([link](https://huggingface.co/ChocoLlama/ChocoLlama-2-7B-instruct)): An instruction-tuned version of ChocoLlama-2-7B-base, fine-tuned on a collection of Dutch translations of instruction-tuning datasets, using SFT followed by DPO.
+ - **ChocoLlama-2-7B-tokentrans-base** ([link](https://huggingface.co/ChocoLlama/ChocoLlama-2-7B-tokentrans-base)): A language-adapted version of Meta's Llama-2-7b, using a Dutch RoBERTa-based tokenizer. The token embeddings of this model were reinitialized using the token translation algorithm proposed by [Remy et al.](https://arxiv.org/pdf/2310.03477). The model was subsequently fine-tuned on the same Dutch dataset as ChocoLlama-2-7B-base, again using LoRa.
+ - **ChocoLlama-2-7B-tokentrans-instruct** ([link](https://huggingface.co/ChocoLlama/ChocoLlama-2-7B-tokentrans-instruct)): An instruction-tuned version of ChocoLlama-2-7B-tokentrans-base, fine-tuned on the same dataset as ChocoLlama-2-7B-instruct, again using SFT followed by DPO.
+ - **Llama-3-ChocoLlama-8B-base** ([link](https://huggingface.co/ChocoLlama/Llama-3-ChocoLlama-8B-base)): A language-adapted version of Meta's Llama-3-8B, fine-tuned on the same Dutch dataset as ChocoLlama-2-7B-base, again using LoRa.
+ - **Llama-3-ChocoLlama-8B-instruct** ([link](https://huggingface.co/ChocoLlama/Llama-3-ChocoLlama-8B-instruct)): An instruction-tuned version of Llama-3-ChocoLlama-8B-base, fine-tuned on the same dataset as ChocoLlama-2-7B-instruct, again using SFT followed by DPO.
 
+ For benchmark results for all models, including comparisons with their base models and other Dutch LLMs, we refer to our paper [here](some_url).
 
  ### Model Description
 
  ### Downstream Use
 
+ Since this model is a base model, it can easily be adapted to specific use-cases that require Dutch language understanding and generation.
+ We expect this model to be particularly useful for use-cases in the domains which were explicitly covered in our dataset, e.g. the analysis and/or generation of Dutch job descriptions, corporate filings and legislation.
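+
+ If you want to adapt the model to such a domain yourself, one option is LoRa-style fine-tuning with the peft library. The sketch below is purely illustrative: the corpus file, LoRa rank, target modules and training arguments are placeholder assumptions, not the configuration used to train the ChocoLlama models.
+
+ ```python
+ from datasets import load_dataset
+ from peft import LoraConfig, get_peft_model
+ from transformers import (AutoModelForCausalLM, AutoTokenizer, DataCollatorForLanguageModeling,
+                           Trainer, TrainingArguments)
+
+ tokenizer = AutoTokenizer.from_pretrained('ChocoLlama/ChocoLlama-2-7B-base')
+ tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
+ model = AutoModelForCausalLM.from_pretrained('ChocoLlama/ChocoLlama-2-7B-base')
+
+ # Wrap the base model with LoRA adapters; rank and target modules are illustrative choices.
+ model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, target_modules=['q_proj', 'v_proj'], task_type='CAUSAL_LM'))
+
+ # 'your_dutch_corpus.txt' is a placeholder for your own domain-specific Dutch text.
+ dataset = load_dataset('text', data_files={'train': 'your_dutch_corpus.txt'})['train']
+ dataset = dataset.map(lambda x: tokenizer(x['text'], truncation=True, max_length=512), remove_columns=['text'])
+
+ trainer = Trainer(
+     model=model,
+     args=TrainingArguments(output_dir='chocollama-lora', per_device_train_batch_size=1, num_train_epochs=1),
+     train_dataset=dataset,
+     data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
+ )
+ trainer.train()
+ ```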
 
 
 
 
  ### Out-of-Scope Use
 
  We recommend fine-tuning this model on your curated data to maximally avoid undesirable outputs.
 
  ## Training Details
 
  ### Training Data