Base model
Hi, nice work.
I am wondering which base model was used for the fine-tuning, e.g. Llama-7B? By the way, are you planning to open-source the fine-tuning code?
Thanks for pointing it out, @IICurious. I added the base model: ModernBERT (which really rocks for English). The reason Llama is mentioned is that the model was fine-tuned on synthetic data provided by Llama:
https://huggingface.co/datasets/ai4privacy/open-pii-masking-500k-ai4privacy
Their policy states that fine-tuned models need to include this mention (disclaimer: this is not a legal or financial statement).
Hope this clarifies. Sure, I can share the fine-tuning repo with you. Check it out on our GitHub account, and feel free to leave a star if it's helpful:
https://github.com/AI4Privacy (in the notebooks repo)
Also, to make the environment more sustainable, please feel free to join our Discord: https://discord.gg/FmzWshaaQT
Or, if you have enterprise use cases that we can assist with:
https://forms.gle/oDDYqQkyoTB93otHA / partnerships@ai4privacy.com