	---
	library_name: transformers
	language: fa
	tags:
	- persian
	- text-generation
	- qlora
	- 4-bit-quantization
	license: apache-2.0
	datasets:
	- mshojaei77/Persian_sft
	metrics:
	- bleu
	base_model:
	- google/gemma-3-4b-it
	---
	# Gemma 3-4B Persian (v0)


	![image/png](https://cdn-uploads.huggingface.co/production/uploads/6556b1bb85d43542fa1a8f91/YY9cRRv1u_kBORuiKlr98.png)

	`mshojaei77/gemma-3-4b-persian-v0` is a Persian-specialized model built on the Gemma 3 architecture. It was fine-tuned with QLoRA, which trains low-rank adapters on a 4-bit quantized base model, to keep memory requirements low while adapting the model to generate and understand Persian text. The model also retains the image input capability inherited from its base model.

	## Usage

	This model is compatible with both the Hugging Face Transformers library and Ollama.

	### Running with Ollama

	```bash
	ollama run hf.co/mshojaei77/gemma-3-4b-persian-v0:Q8_0
	```
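
Once the model has been pulled, it can also be queried programmatically. The sketch below assumes an Ollama server is running locally on its default port (11434) and uses Ollama's `/api/chat` endpoint with only the Python standard library. The helper names `build_chat_payload` and `chat` are illustrative and not part of any library.

```python
import json
import urllib.request

MODEL = "hf.co/mshojaei77/gemma-3-4b-persian-v0:Q8_0"
OLLAMA_URL = "http://localhost:11434/api/chat"  # assumed default local address


def build_chat_payload(prompt, model=MODEL):
    """Build the JSON body for a single-turn, non-streaming chat request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON object instead of a stream of chunks
    }


def chat(prompt, url=OLLAMA_URL):
    """Send the prompt to a running Ollama server and return the reply text."""
    body = json.dumps(build_chat_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"]
```

With the server running, `chat("توماس جفرسون کیست؟")` should return the model's reply as a string. Setting `"stream": False` keeps the example short; a streaming client would read the response line by line instead.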

	### Running with Hugging Face Transformers

	1. Install Dependencies:

	```bash
	pip install git+https://github.com/huggingface/transformers@v4.49.0-Gemma-3 accelerate
	```

	2. Load Model and Tokenizer:

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	import torch

	model_id = "mshojaei77/gemma-3-4b-persian-v0"

	model = AutoModelForCausalLM.from_pretrained(
	    model_id,
	    device_map="auto",  # place weights on the available GPU(s), else CPU
	    torch_dtype=torch.bfloat16,  # alternatively, use torch.float16
	)
	tokenizer = AutoTokenizer.from_pretrained(model_id)

	# The prompt asks, in Persian, "Who is Thomas Jefferson?"
	messages = [
	    {
	        "role": "user",
	        "content": "توماس جفرسون کیست؟",
	    }
	]
	# return_dict=True yields input_ids and attention_mask, so the result
	# can be unpacked with ** when calling generate()
	inputs = tokenizer.apply_chat_template(
	    messages,
	    add_generation_prompt=True,
	    tokenize=True,
	    return_dict=True,
	    return_tensors="pt",
	).to(model.device)

	outputs = model.generate(**inputs, max_new_tokens=200)
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```
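
For orientation, `apply_chat_template` turns the `messages` list into a single prompt string before tokenizing. The function below is a simplified, hypothetical re-implementation of the Gemma turn format (`<start_of_turn>` and `<end_of_turn>` markers, with the assistant role named `model`). It ignores system prompts and image content, and the template shipped with the tokenizer remains the source of truth.

```python
def render_gemma_prompt(messages, add_generation_prompt=True):
    """Approximate the Gemma chat format for plain-text user/assistant turns."""
    parts = ["<bos>"]
    for message in messages:
        # Gemma labels assistant turns "model"; user turns keep their name.
        role = "model" if message["role"] == "assistant" else message["role"]
        content = message["content"].strip()
        parts.append(f"<start_of_turn>{role}\n{content}<end_of_turn>\n")
    if add_generation_prompt:
        # Open a model turn so generation continues as the assistant.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


messages = [{"role": "user", "content": "توماس جفرسون کیست؟"}]
print(render_gemma_prompt(messages))
```

Seeing the rendered string makes it clearer why `add_generation_prompt=True` matters: without the trailing open `model` turn, the model may continue the user's message instead of answering it.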

	## Training Data and Fine-Tuning

	### Training Dataset

	This model was fine-tuned using the [mshojaei77/Persian_sft](https://huggingface.co/datasets/mshojaei77/Persian_sft) dataset, which contains approximately 681,000 rows of Persian text focused on instruction-following and conversational interactions.

	### Fine-Tuning

	- Method: Supervised Fine-Tuning (SFT) using QLoRA (4-bit quantization)
	- Hardware: one T4 GPU
	- Software: Hugging Face Transformers, with `peft` for the LoRA adapters and `bitsandbytes` for 4-bit quantization
	- Trade-offs: Reduced memory footprint at the expense of some precision compared to full-precision models
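
To make the memory trade-off concrete, the back-of-the-envelope calculation below estimates weight storage at different precisions. It assumes a nominal 4 billion parameters (rounded from the model name, not an exact count) and counts weights only; activations, the KV cache, LoRA adapters, optimizer state, and quantization metadata are ignored.

```python
NOMINAL_PARAMS = 4_000_000_000  # assumption: rounded from the "4B" in the model name

BITS_PER_WEIGHT = {"float32": 32, "bfloat16": 16, "4-bit": 4}


def weight_memory_gb(num_params, bits_per_weight):
    """Return weight storage in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name:>8}: {weight_memory_gb(NOMINAL_PARAMS, bits):.1f} GB")
```

Under these assumptions the 4-bit weights occupy roughly a quarter of the bfloat16 footprint, which helps explain how fine-tuning could fit on a single T4 GPU.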

	## Evaluation

	Evaluation results will be added soon.

	## Usage Considerations and Limitations

	### Intended Use Cases

	- Question Answering: Responding accurately to Persian-language queries
	- Instruction Following: Interpreting and executing text-based instructions in Persian
	- Text Generation: Producing fluent, context-aware Persian content
	- Conversational AI: Integrating into chatbots and virtual assistants
	- Image Processing: Accepting image inputs, a capability inherited from the base model

	### Limitations

	- Quantization Impact: 4-bit quantization may reduce output precision and result in occasional incoherent responses.
	- Evaluation Scope: No evaluation results specific to this variant have been published yet.
	- Bias: The model might mirror biases present in both the original Gemma 3 data and the Persian_sft dataset.
	- Hallucination: As with all LLMs, there is a risk of generating plausible-sounding but inaccurate information.
	- Safety: The model has not undergone safety tuning, so extra caution is advised when deploying in sensitive contexts.
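
The precision loss behind the quantization limitation can be illustrated with a toy example. The sketch below applies uniform absmax quantization to a handful of made-up weights, mapping each to one of 15 signed integer levels and back. This is a simplification chosen for clarity: QLoRA setups typically use the NF4 data type with per-block scales, which is not reproduced here.

```python
def quantize_4bit(values):
    """Map floats to signed integers in [-7, 7] using a shared absmax scale."""
    scale = max(abs(v) for v in values) / 7 or 1.0  # avoid a zero scale
    return [round(v / scale) for v in values], scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


weights = [0.82, -0.31, 0.05, -0.77, 0.4]  # made-up example values
codes, scale = quantize_4bit(weights)
restored = dequantize(codes, scale)

for w, r in zip(weights, restored):
    print(f"{w:+.3f} -> {r:+.3f} (error {abs(w - r):.3f})")
```

Each restored value differs from the original by at most half the scale step. Small errors of this kind, accumulated across billions of weights, are one way 4-bit models can drift from their full-precision counterparts.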

	## Maintenance and Future Work

	This model is under active maintenance. Future updates may include:

	- Additional evaluation metrics and benchmarks
	- Enhanced safety tuning and bias mitigation strategies
	- Expanded documentation and usage examples
	- Incorporation of community feedback for iterative improvements

	For any queries, contributions, or issues, please contact me.