Dataset sample

#3
by raminh921 - opened

Hi,
Thanks for the thorough documentation.
Could you please share a small sample of the dataset, so we can see its format and learn how to prepare a similar dataset for other languages?
Best

Network for Advancing Modern ArabicNLP & AI org

Hi @raminh921,

We will be releasing the dataset soon. Which language are you looking to fine-tune for?

Omartificial-Intelligence-Space changed discussion status to closed

"This would be great. I want to try Persian handwriting.
It is interesting that NAMAA-Space/Qari-OCR-0.2.2.1-VL-2B-Instruct can recognize printed Persian documents with about 70% accuracy.
Do you think there are more considerations for working with handwriting images, such as page size, the number of samples, and variety of samples? Or is it better to have different text samples written by a small group of people or to crowdsource one text sample to a large number of people?"

Network for Advancing Modern ArabicNLP & AI org

I fine-tuned a Persian model a while back but didn't get a chance to evaluate it.

https://huggingface.co/oddadmix/Khanandeh-0.1-Persian-OCR-2B-Instruct

Feel free to check it out. Meanwhile, I will look for the dataset used for that Persian model and try to upload and share it.

Network for Advancing Modern ArabicNLP & AI org

@raminh921, please share your thoughts on the Persian model. I would love for someone who speaks the language to give their feedback.

Please refer to the discussion below:
https://huggingface.co/oddadmix/Khanandeh-0.1-Persian-OCR-2B-Instruct/discussions/3
