TaiPhone: A Phone-Scale LLM Rooted in Taiwanese Knowledge

TaiPhone is a low-cost, lightweight language model built for Traditional Chinese, with a strong focus on Taiwanese language, culture, and context. Trained on just 0.7 billion carefully curated tokens and enhanced with chat-vector techniques (sketched below), TaiPhone outperforms similarly sized open-source LLaMA-tuned models at both the 1B and 3B scales. TaiPhone shows that, with the right data, effective and culturally aware models can be built at a fraction of the usual cost.
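
The exact merging recipe isn't published here, but "chat vector" enhancement usually means adding the weight delta between an instruct model and its base to the continually pretrained (CP) model. A minimal sketch of that arithmetic, assuming Hugging Face `transformers` and a placeholder path for the CP checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM

# Chat-vector arithmetic: chat_vector = instruct - base, then CP + chat_vector.
# "path/to/taiphone-cp" is a placeholder for the continually pretrained checkpoint.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B", torch_dtype=torch.bfloat16)
inst = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct", torch_dtype=torch.bfloat16)
cp = AutoModelForCausalLM.from_pretrained("path/to/taiphone-cp", torch_dtype=torch.bfloat16)

with torch.no_grad():
    for name, p_cp in cp.named_parameters():
        # Add the instruction-following delta to the CP weights.
        p_cp.add_(inst.get_parameter(name) - base.get_parameter(name))

cp.save_pretrained("taiphone-chat-vector")
```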

Model Information

  • Base model: https://huggingface.co/meta-llama/Llama-3.2-3B
  • Context length: 16k
  • Training details:
    • Number of tokens: 0.7B
    • Continual pretraining (CP) epochs: 2
    • Fine-tuning (FT) epochs: 3
    • CP learning rate: 5e-5 with a cosine scheduler
    • FT learning rate: 1e-5 with a cosine scheduler
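
For reference, a minimal sketch of how these hyperparameters could map onto Hugging Face `TrainingArguments`; the output directories are placeholders, and the actual training scripts aren't published here:

```python
from transformers import TrainingArguments

# Continual pretraining (CP): 2 epochs, lr 5e-5, cosine schedule.
cp_args = TrainingArguments(
    output_dir="taiphone-cp",    # placeholder path
    num_train_epochs=2,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    bf16=True,                   # consistent with the BF16 checkpoint
)

# Fine-tuning (FT): 3 epochs, lr 1e-5, cosine schedule.
ft_args = TrainingArguments(
    output_dir="taiphone-ft",    # placeholder path
    num_train_epochs=3,
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    bf16=True,
)
```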

Benchmark

MCQ Evaluation

  1. The model is prompted to answer each multiple-choice question in free form, without being constrained to a specific output format.
  2. A lightweight LLM (e.g., GPT-4.1-nano) then extracts the model's final selected option from its response.
  3. Accuracy is computed by comparing the extracted answers against the correct choices; a minimal sketch of this pipeline follows.
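
The sketch below assumes the OpenAI API for the extractor; the prompt wording and helper names are illustrative, not the card authors' actual harness:

```python
from openai import OpenAI

client = OpenAI()

EXTRACT_PROMPT = (
    "Below is a model's free-form answer to a multiple-choice question.\n"
    "Reply with only the letter of the option it finally selected (A, B, C, or D).\n\n"
    "{response}"
)

def extract_choice(response: str) -> str:
    """Step 2: a lightweight LLM pulls the selected option out of the free-form answer."""
    result = client.chat.completions.create(
        model="gpt-4.1-nano",
        messages=[{"role": "user", "content": EXTRACT_PROMPT.format(response=response)}],
    )
    return result.choices[0].message.content.strip().upper()[:1]

def mcq_accuracy(responses: list[str], answers: list[str]) -> float:
    """Step 3: accuracy of extracted letters against the gold choices."""
    extracted = [extract_choice(r) for r in responses]
    return sum(e == a for e, a in zip(extracted, answers)) / len(answers)
```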

Score Board

  • 1B Scale

| Model | TW-MCQ | MMLU-Redux |
| --- | --- | --- |
| LLaMA3.2-1B-Instruct | 0.305 | 0.403 |
| LLaMA3.2-1B-it-chinese-kyara | 0.360 | 0.405 |
| LLaMA3.2-TaiPhone-1B-Instruct-v0.1 (Ours) | 0.375 | 0.421 |

  • 3B Scale

| Model | TW-MCQ | MMLU-Redux |
| --- | --- | --- |
| LLaMA3.2-3B-Instruct | 0.442 | 0.569 |
| LLaMA3.2-3B-it-chinese-kyara | 0.462 | 0.405 |
| Llama-3.2-3B-F1-Instruct | 0.458 | 0.548 |
| LLaMA3.2-TaiPhone-3B-Instruct-v0.1 (Ours) | 0.502 | 0.578 |

MT-Bench-Zhtw

LLM as a Judge

  • Dataset source
  • Evaluation covers multiple aspects of conversational performance, scored by an LLM judge (see the sketch below).
  • While TaiPhone outperforms Llama-3.2-3B-Instruct, it lags behind other open-source LLMs in certain areas, likely because our current focus has been knowledge enhancement. We aim to improve its extraction and roleplay capabilities in the next release.
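
As an illustration only (the card doesn't name the judge model, and the prompt below is our own wording), a per-turn judging call might look like:

```python
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = (
    "You are an impartial judge. Rate the assistant's answer to the question "
    "below on a scale of 1 to 10 for helpfulness, relevance, accuracy, depth, "
    "and fluency. Reply with the rating only.\n\n"
    "[Question]\n{question}\n\n[Answer]\n{answer}"
)

def judge_score(question: str, answer: str, judge_model: str = "gpt-4.1") -> int:
    """Grade one turn with a strong LLM; MT-Bench averages these per category."""
    result = client.chat.completions.create(
        model=judge_model,  # placeholder: the judge model isn't specified in the card
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(question=question, answer=answer)}],
    )
    return int(result.choices[0].message.content.strip())
```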

Score Board

  • 3B Scale

| Model | Writing | Roleplay | Reasoning | Math | Coding | Extraction | STEM | Humanities |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Llama-3.2-3B-Instruct | 4.2 | 3.9 | 4.1 | 4.3 | 4.9 | 3.8 | 4.0 | 4.3 |
| Llama-3.2-3B-F1-Instruct | 5.5 | 6.9 | 4.2 | 3.9 | 3.8 | 4.7 | 5.2 | 7.6 |
| Llama-3.2-Kyara-3B-it | 5.7 | 7.2 | 4.8 | 6.3 | 5.2 | 5.3 | 5.9 | 7.5 |
| Llama-3.2-TaiPhone-3B-Instruct-v0.1 (Ours) | 5.5 | 5.8 | 4.9 | 5.0 | 5.0 | 3.8 | 4.5 | 7.3 |