---
library_name: transformers
language: fa
tags:
- persian
- text-generation
- qlora
- 4-bit-quantization
license: apache-2.0
datasets:
- mshojaei77/Persian_sft
metrics:
- bleu
base_model:
- google/gemma-3-4b-it
---
# Gemma 3-4B Persian (v0)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6556b1bb85d43542fa1a8f91/YY9cRRv1u_kBORuiKlr98.png)
`mshojaei77/gemma-3-4b-persian-v0` is a Persian-specialized model built on Gemma 3 (`google/gemma-3-4b-it`). It was fine-tuned with QLoRA, which trains low-rank adapters on top of a 4-bit-quantized base model to reduce memory and compute requirements during training. The model generates and understands Persian text and retains the image input capability of its base model.
## Usage
This model is compatible with both the Hugging Face Transformers library and Ollama.
### Running with Ollama
```bash
ollama run hf.co/mshojaei77/gemma-3-4b-persian-v0:Q8_0
```
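Once the model has been pulled, it can also be queried from code through Ollama's local REST API. The following is a minimal sketch using only the Python standard library. It assumes an Ollama server is running on its default address (`http://localhost:11434`); `build_chat_request` and `chat` are hypothetical helper names introduced for this example.

```python
import json
import urllib.request

# Assumption: a local Ollama server listening on its default port.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"
MODEL_NAME = "hf.co/mshojaei77/gemma-3-4b-persian-v0:Q8_0"


def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build a non-streaming chat request containing a single user message."""
    payload = {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON object instead of a stream of chunks
    }
    return urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the prompt to the local server and return the assistant's reply."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["message"]["content"]
```

Calling `chat("توماس جفرسون کیست؟")` should return the model's answer as a string, provided the server is running and the model has already been downloaded.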
### Running with Hugging Face Transformers
1. **Install Dependencies:**
```bash
pip install git+https://github.com/huggingface/transformers@v4.49.0-Gemma-3 accelerate
```
2. **Load Model and Tokenizer:**
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "mshojaei77/gemma-3-4b-persian-v0"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # Places the weights on a GPU automatically when one is available
    torch_dtype=torch.bfloat16,  # Alternatively, use torch.float16
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {
        "role": "user",
        "content": "توماس جفرسون کیست؟",  # "Who is Thomas Jefferson?"
    }
]

# return_dict=True yields input_ids and attention_mask, so the result
# can be unpacked into generate() with **inputs.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
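The final `print` above decodes the whole sequence returned by `generate`, which for decoder-only models begins with the prompt tokens. If only the reply is wanted, the prompt can be sliced off before decoding. A minimal sketch follows; `strip_prompt` is a hypothetical helper written for this example, not part of Transformers, and the token ids are made-up placeholders.

```python
def strip_prompt(output_ids, prompt_length):
    """Return only the token ids that were generated after the prompt.

    Works on any sliceable sequence, including a 1-D torch tensor.
    """
    return output_ids[prompt_length:]


# Toy example: plain lists of made-up numbers stand in for token ids.
prompt_ids = [2, 105, 2364]
full_output = prompt_ids + [9876, 5432, 1]
reply_ids = strip_prompt(full_output, len(prompt_ids))
```

In the Transformers snippet this would correspond to `tokenizer.decode(strip_prompt(outputs[0], prompt_length), skip_special_tokens=True)`, where `prompt_length` is the number of tokens in the tokenized prompt.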
## Training Data and Fine-Tuning
### Training Dataset
This model was fine-tuned using the [mshojaei77/Persian_sft](https://huggingface.co/datasets/mshojaei77/Persian_sft) dataset, which contains approximately 681,000 rows of Persian text focused on instruction-following and conversational interactions.
### Fine-Tuning
- **Method:** Supervised Fine-Tuning (SFT) using QLoRA (4-bit quantization)
- **Hardware:** One NVIDIA T4 GPU
- **Software:** Hugging Face Transformers, with `peft` for QLoRA and `bitsandbytes` for quantization
- **Trade-offs:** Reduced memory footprint at the expense of some precision compared to full-precision models
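To make the setup above more concrete, here is a sketch of how a QLoRA run is commonly configured with `bitsandbytes` and `peft`, together with a rough estimate of the memory saved by 4-bit weights. Every hyperparameter below (rank, alpha, dropout, target modules, quantization type) is an assumption chosen for illustration; this card does not publish the values actually used. The nominal 4 billion parameter count is likewise an approximation.

```python
# Illustrative QLoRA settings. All values are assumptions made for this
# sketch, not the hyperparameters used to train this model.
QLORA_SETTINGS = {
    "quant_type": "nf4",   # 4-bit NormalFloat data type
    "double_quant": True,  # also quantize the quantization constants
    "lora_rank": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
}


def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


def build_qlora_configs(settings: dict = QLORA_SETTINGS):
    """Create the bitsandbytes and PEFT config objects.

    Imports are deferred so the settings and the memory estimate can be
    used without torch, transformers, or peft installed.
    """
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type=settings["quant_type"],
        bnb_4bit_use_double_quant=settings["double_quant"],
        bnb_4bit_compute_dtype=torch.float16,  # the T4 has no native bfloat16 support
    )
    lora_config = LoraConfig(
        r=settings["lora_rank"],
        lora_alpha=settings["lora_alpha"],
        lora_dropout=settings["lora_dropout"],
        target_modules=settings["target_modules"],
        task_type="CAUSAL_LM",
    )
    return bnb_config, lora_config


# Weights only; activations, adapters, and optimizer state need extra memory.
memory_4bit = weight_memory_gib(4e9, 4)
memory_16bit = weight_memory_gib(4e9, 16)
```

Under these assumptions the 4-bit weights take roughly 1.9 GiB, against about 7.5 GiB at 16-bit precision, which is what leaves room for adapters and activations on a single 16 GB T4. In a typical workflow, `bnb_config` would be passed as `quantization_config` to `from_pretrained` and `lora_config` to `peft.get_peft_model`.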
## Evaluation
Evaluation results are not yet available and will be added in a future update.
## Usage Considerations and Limitations
### Intended Use Cases
- **Question Answering:** Responding accurately to Persian language queries
- **Instruction Following:** Interpreting and executing text-based instructions in Persian
- **Text Generation:** Producing fluent, context-aware Persian content
- **Conversational AI:** Integrating into chatbots and virtual assistants
- **Image Input:** Accepting image inputs, a capability inherited from the base model
### Limitations
- **Quantization Impact:** 4-bit quantization may reduce output precision and occasionally produce incoherent responses.
- **Evaluation Scope:** No comprehensive evaluation specific to this variant has been published yet.
- **Bias:** The model might mirror biases present in both the original Gemma 3 data and the Persian_sft dataset.
- **Hallucination:** As with all LLMs, there is a risk of generating plausible-sounding but inaccurate information.
- **Safety:** The model has not undergone safety tuning, so extra caution is advised when deploying in sensitive contexts.
## Maintenance and Future Work
This model is under active maintenance. Future updates may include:
- Additional evaluation metrics and benchmarks
- Enhanced safety tuning and bias mitigation strategies
- Expanded documentation and usage examples
- Incorporation of community feedback for iterative improvements
For any queries, contributions, or issues, please contact me.