We propose GrammarCoder, a grammar-based model built on a decoder-only architecture, which excels at auto-regressive tasks such as code generation, completion, and translation. To enhance its code generation ability, we apply continued pre-training (CPT) and instruction tuning on existing code model weights (i.e., DeepSeek-Coder-1.3B-Base, Qwen2.5-1.5B-Base, and Qwen2.5-7B-Base), expanding its knowledge base.
Results
Under the same experimental settings, GrammarCoder achieves better performance than the corresponding baselines. The following table reports code generation accuracy compared with the baselines.
| Model | HumanEval | HumanEval+ | MBPP | MBPP+ |
|---|---|---|---|---|
| **Base Models** | | | | |
| DeepSeek-Coder-1.3B-Base | 34.8 | 28.7 | 56.7 | 47.9 |
| Qwen2.5-1.5B-Base | 37.2 | 32.9 | 60.2 | 49.6 |
| Qwen2.5-7B-Base | 57.9 | 50.6 | 74.9 | 62.9 |
| **Normal Token-Based CPT** | | | | |
| DeepSeek-Coder-1.3B-Base (CPT) | 43.9 | 39.6 | 61.4 | 51.3 |
| Qwen2.5-1.5B-Base (CPT) | 50.6 | 42.7 | 60.3 | 51.1 |
| Qwen2.5-7B-Base (CPT) | 68.9 | 65.2 | 81.5 | 69.8 |
| **Grammar-Based CPT** | | | | |
| GrammarCoder-1.3B-Base | 63.4 | 57.3 | 68.3 | 56.9 |
| GrammarCoder-1.5B-Base | 63.4 | 59.1 | 64.8 | 55.3 |
| GrammarCoder-7B-Base | 76.8 | 71.3 | 85.2 | 71.7 |
The model has been open-sourced; the model weights and the corresponding tokenizer are available at HuggingFace-GrammarCoder.
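A minimal usage sketch with the transformers library is shown below. The repository id is a placeholder (replace it with the GrammarCoder repository linked above), and the prompt and generation settings are illustrative only.

```python
# Minimal sketch: load GrammarCoder with Hugging Face transformers.
# The repository id below is a placeholder; replace it with the actual
# GrammarCoder repository referenced in this model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<hf-namespace>/GrammarCoder-1.3B-Base"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).eval()

# Illustrative prompt: complete a Python function.
prompt = 'def fibonacci(n):\n    """Return the n-th Fibonacci number."""\n'
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```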
Requirements
- tree_sitter: 0.23.2
- tree_sitter_python: 0.23.5
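The pinned tree-sitter packages are used to obtain grammar (syntax-tree) representations of Python code. As an illustration only, and not necessarily the exact preprocessing pipeline used for training, the sketch below parses a small Python function with these packages and prints its syntax tree.

```python
# Illustrative only: parse Python source into a tree-sitter syntax tree
# using the pinned tree_sitter / tree_sitter_python versions above.
import tree_sitter_python as tspython
from tree_sitter import Language, Parser

PY_LANGUAGE = Language(tspython.language())
parser = Parser(PY_LANGUAGE)

source = b"def add(a, b):\n    return a + b\n"
tree = parser.parse(source)

def walk(node, depth=0):
    """Print each node type, indented by its depth in the tree."""
    print("  " * depth + node.type)
    for child in node.children:
        walk(child, depth + 1)

walk(tree.root_node)
```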
Evaluation
Please refer to GitHub (https://github.com/LIANGQINGYUAN/GrammarCoder) for details on the evaluation.
Citation
```bibtex
@article{liang2025grammar,
  title={Grammar-Based Code Representation: Is It a Worthy Pursuit for LLMs?},
  author={Liang, Qingyuan and Zhang, Zhao and Sun, Zeyu and Lin, Zheng and Luo, Qi and Xiao, Yueyi and Chen, Yizhou and Zhang, Yuqun and Zhang, Haotian and Zhang, Lu and Chen, Bin and Xiong, Yingfei},
  journal={arXiv preprint arXiv:2503.05507},
  year={2025}
}
```