Dataset sample
Hi,
Thanks for the thorough documentation.
Could you please share a small sample of the dataset, so we can see its format and learn how to prepare a similar dataset for other languages?
Best
Hi @raminh921 ,
We will be releasing the dataset soon. Which language are you looking to fine-tune?
"This would be great. I want to try Persian handwriting.
It is interesting that NAMAA-Space/Qari-OCR-0.2.2.1-VL-2B-Instruct can detect Persian printed documents with about 70% accuracy.
Do you think there are additional considerations for working with handwriting images, such as page size, the number of samples, and the variety of samples? And is it better to have a small group of people write different text samples, or to crowdsource a single text sample to a large number of people?"
I fine-tuned a Persian model a while back but didn't get a chance to evaluate it:
https://huggingface.co/oddadmix/Khanandeh-0.1-Persian-OCR-2B-Instruct
Feel free to check it out. Meanwhile, I will look for the dataset used for that Persian model and try to upload and share it.
@raminh921, please share your thoughts on the Persian model. I would love for someone who speaks the language to give their feedback.
Please refer to the discussion below:
https://huggingface.co/oddadmix/Khanandeh-0.1-Persian-OCR-2B-Instruct/discussions/3