Update README.md

README.md CHANGED

@@ -12,20 +12,34 @@ license: llama2
<em>A Llama-2/3-based family of Dutch language models</em>
</div>

-## Model
+## How to Get Started with the Model
+
+We here present **ChocoLlama-2-7B-base**, a language-adapted version of Meta's Llama-2-7b, fine-tuned on a Dutch dataset of 104GB using LoRA.
+Note that this is a base model, not optimized for conversational behavior.
+If this is desired for your use-case, we recommend fine-tuning this model on your own Dutch data or using the instruction-tuned version of this model, [ChocoLlama-2-7B-instruct](https://huggingface.co/ChocoLlama/ChocoLlama-2-7B-instruct).
+
+Use the code below to get started with the model.
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+tokenizer = AutoTokenizer.from_pretrained('ChocoLlama/ChocoLlama-2-7B-base')
+model = AutoModelForCausalLM.from_pretrained('ChocoLlama/ChocoLlama-2-7B-base')
+```
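+
+As a quick follow-up to the snippet above, Dutch text can then be generated with the standard `generate` API. This is a minimal usage sketch; the prompt and sampling parameters are illustrative, not tuned recommendations.
+
+```python
+import torch
+
+# 'De hoofdstad van België is' means 'The capital of Belgium is'.
+inputs = tokenizer('De hoofdstad van België is', return_tensors='pt')
+with torch.no_grad():
+    outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```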
+
+## ChocoLlama: Model Details

ChocoLlama is a family of open LLMs specifically adapted to Dutch, contributing to the state of the art of Dutch open LLMs in their weight class.

We provide 6 variants (of which 3 base and 3 instruction-tuned models):
-- **ChocoLlama-2-7B-base
-- **ChocoLlama-2-7B-instruct
-- **ChocoLlama-2-7B-tokentrans-base
-- **ChocoLlama-2-7B-tokentrans-instruct
-- **Llama-3-ChocoLlama-8B-base
-- **Llama-3-ChocoLlama-instruct
+- **ChocoLlama-2-7B-base** ([link](https://huggingface.co/ChocoLlama/ChocoLlama-2-7B-base)): A language-adapted version of Meta's Llama-2-7b, fine-tuned on a Dutch dataset of 104GB using LoRA.
+- **ChocoLlama-2-7B-instruct** ([link](https://huggingface.co/ChocoLlama/ChocoLlama-2-7B-instruct)): An instruction-tuned version of ChocoLlama-2-7B-base, fine-tuned on a collection of Dutch translations of instruction-tuning datasets, using SFT followed by DPO.
+- **ChocoLlama-2-7B-tokentrans-base** ([link](https://huggingface.co/ChocoLlama/ChocoLlama-2-7B-tokentrans-base)): A language-adapted version of Meta's Llama-2-7b, using a Dutch RoBERTa-based tokenizer. The token embeddings of this model were re-initialized using the token translation algorithm proposed by [Remy et al.](https://arxiv.org/pdf/2310.03477); a simplified sketch of this step is shown after this list. The model was subsequently fine-tuned on the same Dutch dataset as ChocoLlama-2-7B-base, again using LoRA.
+- **ChocoLlama-2-7B-tokentrans-instruct** ([link](https://huggingface.co/ChocoLlama/ChocoLlama-2-7B-tokentrans-instruct)): An instruction-tuned version of ChocoLlama-2-7B-tokentrans-base, fine-tuned on the same dataset as ChocoLlama-2-7B-instruct, again using SFT followed by DPO.
+- **Llama-3-ChocoLlama-8B-base** ([link](https://huggingface.co/ChocoLlama/Llama-3-ChocoLlama-8B-base)): A language-adapted version of Meta's Llama-3-8B, fine-tuned on the same Dutch dataset as ChocoLlama-2-7B-base, again using LoRA.
+- **Llama-3-ChocoLlama-8B-instruct** ([link](https://huggingface.co/ChocoLlama/Llama-3-ChocoLlama-8B-instruct)): An instruction-tuned version of Llama-3-ChocoLlama-8B-base, fine-tuned on the same dataset as ChocoLlama-2-7B-instruct, again using SFT followed by DPO.
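+
+As referenced in the list above, the token translation step can be illustrated with a deliberately simplified sketch: each token of the new Dutch tokenizer is re-initialized as the mean of the old-tokenizer embeddings of its decoded string. This is only a rough approximation for illustration, not the exact algorithm of [Remy et al.](https://arxiv.org/pdf/2310.03477), and the RobBERT tokenizer name below is an assumption made for the sake of a runnable example.
+
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+old_tok = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-hf')
+new_tok = AutoTokenizer.from_pretrained('pdelobelle/robbert-v2-dutch-base')  # assumed Dutch RoBERTa tokenizer
+model = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-2-7b-hf')
+
+old_emb = model.get_input_embeddings().weight.data
+new_emb = torch.zeros(len(new_tok), old_emb.size(1), dtype=old_emb.dtype)
+for new_id in range(len(new_tok)):
+    # Decode the new token back to a string, re-tokenize it with the old
+    # tokenizer, and average the matching old embedding vectors.
+    old_ids = old_tok.encode(new_tok.decode([new_id]), add_special_tokens=False)
+    if old_ids:
+        new_emb[new_id] = old_emb[old_ids].mean(dim=0)
+
+model.resize_token_embeddings(len(new_tok))  # also resizes the output embeddings
+model.get_input_embeddings().weight.data.copy_(new_emb)
+```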

+For benchmark results for all models, including comparisons to their base models and other Dutch LLMs, we refer to our paper [here](some_url).

### Model Description

@@ -50,11 +64,8 @@ Since this is a base model, we do not recommend using it for your use-cases dire

### Downstream Use

-Since this model is a base model, it can easily be adapted to specific use-cases that required Dutch language understanding and generation.
-- Dutch job descriptions
-- Dutch corporate filings
-- Dutch legislation
+Since this model is a base model, it can easily be adapted to specific use-cases that require Dutch language understanding and generation.
+We expect this model to be particularly useful for use-cases in the domains explicitly covered in our dataset, e.g. the analysis and/or generation of Dutch job descriptions, corporate filings and legislation; a minimal adaptation sketch follows below.
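+
+As an illustration of such adaptation, the sketch below wraps the model with LoRA adapters using the `peft` library. The hyperparameters are illustrative placeholders, not the settings used to train ChocoLlama.
+
+```python
+from peft import LoraConfig, get_peft_model
+from transformers import AutoModelForCausalLM
+
+model = AutoModelForCausalLM.from_pretrained('ChocoLlama/ChocoLlama-2-7B-base')
+
+# Add trainable low-rank adapters to the attention projections;
+# the base model weights stay frozen.
+lora_config = LoraConfig(
+    r=16,
+    lora_alpha=32,
+    lora_dropout=0.05,
+    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj'],
+    task_type='CAUSAL_LM',
+)
+model = get_peft_model(model, lora_config)
+model.print_trainable_parameters()  # only the adapter weights are trainable
+# Then train with your preferred trainer on your curated Dutch data.
+```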

### Out-of-Scope Use

@@ -70,17 +81,6 @@ However we did not explicitly conduct any additional filtering of this dataset w

We recommend fine-tuning this model on your curated data to maximally avoid undesirable outputs.

-## How to Get Started with the Model
-
-Use the code below to get started with the model.
-
-```
-from transformers import AutoModelForCausalLM, AutoTokenizer
-
-tokenizer = AutoTokenizer.from_pretrained('ChocoLlama/ChocoLlama-2-7B-base')
-model = AutoModelForCausalLM.from_pretrained('ChocoLlama/ChocoLlama-2-7B-base')
-```
-
## Training Details

### Training Data