# Qwen3-0.6B Full-Precision + W8A8 Quantized MCQA Model

**Repository:** `Kikinoking/MNLP_M2_quantized_model`
This is a fine-tuned Qwen3-0.6B causal LM, trained on a concatenation of several MCQA datasets and then quantized to 8-bit weights and 8-bit activations (W8A8) in the compressed-tensors format. It is intended for multiple-choice QA and is evaluated with the LightEval EPFL MNLP suite.
## Model Details
- Base architecture: Qwen3 (0.6B parameters)
- Pretrained checkpoint: `Qwen/Qwen3-0.6B-Base`
- Fine-tuning data sources:
  - ScienceQA
  - QASC
  - OpenBookQA
  - MathQA
  - CommonsenseQA
  - MCQA prompts generated via ChatGPT (labeled `M1_chatgpt`)
- Dataset split: 95% train / 5% validation
- Tokenization: `AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B-Base")` (see the sketch after this list)
  - Left padding, EOS token used as `pad_token`
  - Sequence length capped at 2048 tokens
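A minimal sketch of the tokenizer setup listed above (left padding, EOS reused as padding token, 2048-token cap). The exact training-time preprocessing and prompt template are not published, so the prompt below is only illustrative:

```python
from transformers import AutoTokenizer

# Load the base tokenizer used for fine-tuning.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B-Base")

# Settings stated in the model card: left padding, EOS token reused as pad token.
tokenizer.padding_side = "left"
tokenizer.pad_token = tokenizer.eos_token

# Hypothetical MCQA prompt, truncated to the 2048-token cap used during fine-tuning.
prompt = (
    "Question: Which gas do plants absorb from the atmosphere?\n"
    "A. Oxygen\nB. Carbon dioxide\nC. Nitrogen\nD. Helium\n"
    "Answer:"
)
batch = tokenizer(prompt, truncation=True, max_length=2048, return_tensors="pt")
```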
## Quantization
- Method: `compressed-tensors` (`naive-quantized` format)
- Precision: 8-bit weights + 8-bit activations (W8A8)
- Layers kept in FP32: language modeling head
- Checkpoint: compatible with CPU and GPU inference
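The model card does not say which tool produced the checkpoint. One common route to a compressed-tensors W8A8 checkpoint is `llm-compressor`; the sketch below is a hypothetical recipe under that assumption (package names, arguments, and paths are guesses, not taken from this repository):

```python
# Hypothetical W8A8 recipe; the actual quantization pipeline for this checkpoint is not published.
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

recipe = QuantizationModifier(
    targets="Linear",   # quantize Linear layers...
    scheme="W8A8",      # ...to 8-bit weights and 8-bit activations
    ignore=["lm_head"], # keep the LM head in full precision, as stated above
)

oneshot(
    model="path/to/finetuned-qwen3-0.6b",  # hypothetical path to the fine-tuned FP model
    recipe=recipe,
    output_dir="MNLP_M2_quantized_model",
)
```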
## Evaluation
Evaluated with the LightEval EPFL MNLP suite on the MCQA task:

```bash
lighteval accelerate \
  --eval-mode lighteval \
  --save-details \
  --override-batch-size 8 \
  --custom-tasks community_tasks/mnlp_mcqa_evals.py \
  --output-dir out/lighteval_quant \
  model_configs/quantized_model.yaml \
  "community|mnlp_mcqa_evals|0|0"
```
Results:

- Accuracy: 0.30 ± 0.15
- Normalized accuracy: 0.30 ± 0.15
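For context, LightEval-style MCQA accuracy scores each answer option by the log-likelihood the model assigns to it after the question and picks the highest-scoring option. The following is an illustrative sketch of that idea in plain `transformers`, not the actual LightEval implementation; the prompt format is an assumption:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Kikinoking/MNLP_M2_quantized_model"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
model.eval()

# Hypothetical prompt format; the fine-tuning template is not documented.
question = "Question: Which gas do plants absorb from the atmosphere?\nAnswer:"
options = [" Oxygen", " Carbon dioxide", " Nitrogen", " Helium"]

def option_logprob(prompt: str, option: str) -> float:
    """Sum of log-probabilities the model assigns to the option tokens after the prompt.

    Assumes the prompt's tokenization is a prefix of the tokenization of prompt + option.
    """
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)     # predictions for tokens 1..N-1
    targets = full_ids[:, 1:]
    token_lp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Only count the tokens belonging to the option continuation.
    return token_lp[:, prompt_ids.shape[1] - 1 :].sum().item()

scores = [option_logprob(question, opt) for opt in options]
print("Predicted option:", options[scores.index(max(scores))])
```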
## How to Use
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained(
    "Kikinoking/MNLP_M2_quantized_model", trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    "Kikinoking/MNLP_M2_quantized_model",
    trust_remote_code=True,
    device_map="auto",
)
```
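A short usage example continuing the snippet above. The prompt layout is a guess, since the exact fine-tuning template is not documented:

```python
# Hypothetical MCQA prompt; adjust to match your task format.
prompt = (
    "Question: Which gas do plants absorb from the atmosphere?\n"
    "A. Oxygen\nB. Carbon dioxide\nC. Nitrogen\nD. Helium\n"
    "Answer:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=5, do_sample=False)

# Decode only the newly generated tokens (the predicted answer).
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```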
## Limitations

- Being a 0.6B-parameter model, it may struggle with very long or ambiguous queries.
- Quantization can introduce a slight drop in accuracy (~5–10%).

## License

CC BY-NC 4.0 (inherits from the base Qwen3 license)