We propose GrammarCoder, a grammar-based model built on a decoder-only architecture, which excels at auto-regressive tasks such as code generation, completion, and translation. To enhance its code generation ability, we apply continued pre-training and instruction tuning on existing code model weights (i.e., DeepSeek-Coder-1.3B-Base, Qwen2.5-1.5B-Base, and Qwen2.5-7B-Base), expanding its knowledge base.

Results

Compared with models trained under the same experimental setting, GrammarCoder achieves better performance across the benchmarks. The following table reports code generation accuracy against the baselines.

| Model | HumanEval | HumanEval+ | MBPP | MBPP+ |
|---|---|---|---|---|
| **Base Models** | | | | |
| DeepSeek-Coder-1.3B-Base | 34.8 | 28.7 | 56.7 | 47.9 |
| Qwen2.5-1.5B-Base | 37.2 | 32.9 | 60.2 | 49.6 |
| Qwen2.5-7B-Base | 57.9 | 50.6 | 74.9 | 62.9 |
| **Normal Token-Based CPT** | | | | |
| DeepSeek-Coder-1.3B-Base (CPT) | 43.9 | 39.6 | 61.4 | 51.3 |
| Qwen2.5-1.5B-Base (CPT) | 50.6 | 42.7 | 60.3 | 51.1 |
| Qwen2.5-7B-Base (CPT) | 68.9 | 65.2 | 81.5 | 69.8 |
| **Grammar-Based CPT** | | | | |
| GrammarCoder-1.3B-Base | 63.4 | 57.3 | 68.3 | 56.9 |
| GrammarCoder-1.5B-Base | 63.4 | 59.1 | 64.8 | 55.3 |
| GrammarCoder-7B-Base | 76.8 | 71.3 | 85.2 | 71.7 |

The model has been open-sourced; the weights and the corresponding tokenizer are available at HuggingFace-GrammarCoder.
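
Below is a minimal loading sketch using the Hugging Face transformers library. This is an assumed standard usage pattern, not an official snippet from the authors; the model id `qyliang/GrammarCoder-1.5B-Base` is taken from this repository, and `trust_remote_code=True` is included in case the grammar-based tokenizer ships custom code. Because the model operates on grammar-based tokens, raw generations may need post-processing into surface code; see the GitHub repository for the official pipeline.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "qyliang/GrammarCoder-1.5B-Base"

# Load the tokenizer and model weights from the Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype="auto"
)

# Generate a short completion for a code prompt.
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```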

Requirements

  • tree_sitter: 0.23.2
  • tree_sitter_python: 0.23.5
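
GrammarCoder's grammar-based representation relies on tree-sitter to parse source code into syntax trees, using the pinned versions above. The following is a minimal sketch (not taken from the official code) showing how these two packages parse a Python snippet and expose the grammar rule names of its nodes:

```python
import tree_sitter_python as tspython
from tree_sitter import Language, Parser

# Wrap the Python grammar and create a parser for it.
PY_LANGUAGE = Language(tspython.language())
parser = Parser(PY_LANGUAGE)

# Parse a small code snippet into a concrete syntax tree.
source = b"def add(a, b):\n    return a + b\n"
tree = parser.parse(source)

# Print the grammar rule name of each top-level node.
for child in tree.root_node.children:
    print(child.type)  # e.g. "function_definition"
```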

Evaluation

Please refer to the GitHub repository (https://github.com/LIANGQINGYUAN/GrammarCoder) for details on the evaluation.

Citation

@article{liang2025grammar,
  title={Grammar-Based Code Representation: Is It a Worthy Pursuit for LLMs?},
  author={Liang, Qingyuan and Zhang, Zhao and Sun, Zeyu and Lin, Zheng and Luo, Qi and Xiao, Yueyi and Chen, Yizhou and Zhang, Yuqun and Zhang, Haotian and Zhang, Lu and Chen, Bin and Xiong, Yingfei},
  journal={arXiv preprint arXiv:2503.05507},
  year={2025}
}