matthieumeeus97 committed
Commit d49a2c6 · verified · 1 Parent(s): 4e2f65a

Update README.md

Files changed (1)
  1. README.md +25 -25
README.md CHANGED
@@ -12,20 +12,34 @@ license: llama2
  <em>A Llama-2/3-based family of Dutch language models</em>
  </div>
 
- ## Model Details
 
  ChocoLlama is a family of open LLMs specifically adapted to Dutch, contributing to the state-of-the-art of Dutch open LLMs in their weight class.
 
  We provide 6 variants (of which 3 base and 3 instruction-tuned models):
- - **ChocoLlama-2-7B-base**: A language-adapted version of Meta's Llama-2-7b, fine-tuned on a Dutch dataset of 104GB (XXX tokens) using LoRa.
- - **ChocoLlama-2-7B-instruct**: An instruction-tuned version of ChocoLlama-2-7B-base, fine-tuned on a collection of Dutch translations of instruction-tuning datasets, using SFT followed by DPO.
- - **ChocoLlama-2-7B-tokentrans-base**: A language-adapted version of Meta's Llama-2-7b, using a Dutch RoBERTa-based tokenizer. The token embeddings of this model were reinitialized using the token translation algorithm proposed by [Remy et al.](https://arxiv.org/pdf/2310.03477). The model was subsequently fine-tuned on the same Dutch dataset as ChocoLlama-2-7B-base, again using LoRa.
- - **ChocoLlama-2-7B-tokentrans-instruct**: An instruction-tuned version of ChocoLlama-2-7B-tokentrans-base, fine-tuned on the same dataset as ChocoLlama-2-7B-instruct, again using SFT followed by DPO.
- - **Llama-3-ChocoLlama-8B-base**: A language-adapted version of Meta's Llama-3-8B, fine-tuned on the same Dutch dataset as ChocoLlama-2-7B-base, again using LoRa.
- - **Llama-3-ChocoLlama-8B-instruct**: An instruction-tuned version of Llama-3-ChocoLlama-8B-base, fine-tuned on the same dataset as ChocoLlama-2-7B-instruct, again using SFT followed by DPO.
-
 
- As far as we are aware, Llama-3-ChocoLlama-8B-instruct sets a new state-of-the-art for Dutch open models in its weight class.
 
  ### Model Description
 
@@ -50,11 +64,8 @@ Since this is a base model, we do not recommend using it for your use-cases dire
 
  ### Downstream Use
 
- Since this model is a base model, it can easily be adapted to specific use-cases that require Dutch language understanding and generation. We expect this model to be particularly useful for use-cases in the domains which were explicitly covered in our dataset, e.g. the analysis and/or generation of:
- - Dutch job descriptions
- - Dutch corporate filings
- - Dutch legislation
-
 
  ### Out-of-Scope Use
 
@@ -70,17 +81,6 @@ However we did not explicitly conduct any additional filtering of this dataset w
 
  We recommend fine-tuning this model on your curated data to maximally avoid undesirable outputs.
 
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- ```
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- tokenizer = AutoTokenizer.from_pretrained('ChocoLlama/ChocoLlama-2-7B-base')
- model = AutoModelForCausalLM.from_pretrained('ChocoLlama/ChocoLlama-2-7B-base')
- ```
-
  ## Training Details
 
  ### Training Data
 
  <em>A Llama-2/3-based family of Dutch language models</em>
  </div>
 
+ ## How to Get Started with the Model
+
+ Here we present **ChocoLlama-2-7B-base**, a language-adapted version of Meta's Llama-2-7b, fine-tuned on a Dutch dataset of 104GB using LoRa.
+ Note that this is a base model, not optimized for conversational behavior.
+ If this is desired for your use-case, we recommend fine-tuning this model on your own Dutch data or using the instruction-tuned version of this model, [ChocoLlama-2-7B-instruct](https://huggingface.co/ChocoLlama/ChocoLlama-2-7B-instruct).
+
+ Use the code below to get started with the model.
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained('ChocoLlama/ChocoLlama-2-7B-base')
+ model = AutoModelForCausalLM.from_pretrained('ChocoLlama/ChocoLlama-2-7B-base')
+ ```
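+
+ As a quick sanity check, you can then generate a short Dutch continuation. This is a minimal sketch: the prompt and sampling settings below are illustrative choices, not tuned recommendations.
+
+ ```python
+ # Generate a short continuation for a Dutch prompt (illustrative settings).
+ inputs = tokenizer('Vandaag ga ik naar de markt om', return_tensors='pt')
+ outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```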
+
+ ## ChocoLlama: Model Details
 
  ChocoLlama is a family of open LLMs specifically adapted to Dutch, contributing to the state-of-the-art of Dutch open LLMs in their weight class.
 
  We provide 6 variants (of which 3 base and 3 instruction-tuned models):
+ - **ChocoLlama-2-7B-base** ([link](https://huggingface.co/ChocoLlama/ChocoLlama-2-7B-base)): A language-adapted version of Meta's Llama-2-7b, fine-tuned on a Dutch dataset of 104GB using LoRa.
+ - **ChocoLlama-2-7B-instruct** ([link](https://huggingface.co/ChocoLlama/ChocoLlama-2-7B-instruct)): An instruction-tuned version of ChocoLlama-2-7B-base, fine-tuned on a collection of Dutch translations of instruction-tuning datasets, using SFT followed by DPO.
+ - **ChocoLlama-2-7B-tokentrans-base** ([link](https://huggingface.co/ChocoLlama/ChocoLlama-2-7B-tokentrans-base)): A language-adapted version of Meta's Llama-2-7b, using a Dutch RoBERTa-based tokenizer. The token embeddings of this model were reinitialized using the token translation algorithm proposed by [Remy et al.](https://arxiv.org/pdf/2310.03477). The model was subsequently fine-tuned on the same Dutch dataset as ChocoLlama-2-7B-base, again using LoRa.
+ - **ChocoLlama-2-7B-tokentrans-instruct** ([link](https://huggingface.co/ChocoLlama/ChocoLlama-2-7B-tokentrans-instruct)): An instruction-tuned version of ChocoLlama-2-7B-tokentrans-base, fine-tuned on the same dataset as ChocoLlama-2-7B-instruct, again using SFT followed by DPO.
+ - **Llama-3-ChocoLlama-8B-base** ([link](https://huggingface.co/ChocoLlama/Llama-3-ChocoLlama-8B-base)): A language-adapted version of Meta's Llama-3-8B, fine-tuned on the same Dutch dataset as ChocoLlama-2-7B-base, again using LoRa.
+ - **Llama-3-ChocoLlama-8B-instruct** ([link](https://huggingface.co/ChocoLlama/Llama-3-ChocoLlama-8B-instruct)): An instruction-tuned version of Llama-3-ChocoLlama-8B-base, fine-tuned on the same dataset as ChocoLlama-2-7B-instruct, again using SFT followed by DPO.
 
+ For benchmark results for all models, including comparisons with their base models and other Dutch LLMs, we refer to our paper [here](some_url).
 
  ### Model Description
 
  ### Downstream Use
 
+ Since this model is a base model, it can easily be adapted to specific use-cases that require Dutch language understanding and generation.
+ We expect this model to be particularly useful for use-cases in the domains which were explicitly covered in our dataset, e.g. the analysis and/or generation of Dutch job descriptions, corporate filings and legislation.
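+
+ If you want to adapt the model to such a domain yourself, one option is LoRa-style fine-tuning with the peft library. The sketch below is purely illustrative: the corpus file, LoRa rank, target modules and training arguments are placeholder assumptions, not the configuration used to train the ChocoLlama models.
+
+ ```python
+ from datasets import load_dataset
+ from peft import LoraConfig, get_peft_model
+ from transformers import (AutoModelForCausalLM, AutoTokenizer, DataCollatorForLanguageModeling,
+                           Trainer, TrainingArguments)
+
+ tokenizer = AutoTokenizer.from_pretrained('ChocoLlama/ChocoLlama-2-7B-base')
+ tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
+ model = AutoModelForCausalLM.from_pretrained('ChocoLlama/ChocoLlama-2-7B-base')
+
+ # Wrap the base model with LoRA adapters; rank and target modules are illustrative choices.
+ model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, target_modules=['q_proj', 'v_proj'], task_type='CAUSAL_LM'))
+
+ # 'your_dutch_corpus.txt' is a placeholder for your own domain-specific Dutch text.
+ dataset = load_dataset('text', data_files={'train': 'your_dutch_corpus.txt'})['train']
+ dataset = dataset.map(lambda x: tokenizer(x['text'], truncation=True, max_length=512), remove_columns=['text'])
+
+ trainer = Trainer(
+     model=model,
+     args=TrainingArguments(output_dir='chocollama-lora', per_device_train_batch_size=1, num_train_epochs=1),
+     train_dataset=dataset,
+     data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
+ )
+ trainer.train()
+ ```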
 
 
 
 
  ### Out-of-Scope Use
 
  We recommend fine-tuning this model on your curated data to maximally avoid undesirable outputs.
 
  ## Training Details
 
  ### Training Data