We propose GrammarCoder, a grammar-based model built on a decoder-only architecture, which excels at auto-regressive tasks such as code generation, completion, and translation. To enhance its code generation ability, we apply continued pre-training and instruction tuning on existing code model weights (i.e., DeepSeek-Coder-1.3B-Base, Qwen2.5-1.5B-Base, and Qwen2.5-7B-Base), expanding its knowledge base.

Results

Compared with models trained under the same experimental setting, GrammarCoder achieves better performance across the benchmarks. The following table reports code generation accuracy against the baselines.

| Model | HumanEval | HumanEval+ | MBPP | MBPP+ |
|---|---|---|---|---|
| **Base Models** | | | | |
| DeepSeek-Coder-1.3B-Base | 34.8 | 28.7 | 56.7 | 47.9 |
| Qwen2.5-1.5B-Base | 37.2 | 32.9 | 60.2 | 49.6 |
| Qwen2.5-7B-Base | 57.9 | 50.6 | 74.9 | 62.9 |
| **Normal Token-Based CPT** | | | | |
| DeepSeek-Coder-1.3B-Base (CPT) | 43.9 | 39.6 | 61.4 | 51.3 |
| Qwen2.5-1.5B-Base (CPT) | 50.6 | 42.7 | 60.3 | 51.1 |
| Qwen2.5-7B-Base (CPT) | 68.9 | 65.2 | 81.5 | 69.8 |
| **Grammar-Based CPT** | | | | |
| GrammarCoder-1.3B-Base | 63.4 | 57.3 | 68.3 | 56.9 |
| GrammarCoder-1.5B-Base | 63.4 | 59.1 | 64.8 | 55.3 |
| GrammarCoder-7B-Base | 76.8 | 71.3 | 85.2 | 71.7 |

The model has been open-sourced; the weights and the corresponding tokenizer are available at HuggingFace-GrammarCoder.
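
Below is a minimal loading sketch using the Hugging Face transformers library. This is an assumed standard usage pattern, not an official snippet from the authors; the model id `qyliang/GrammarCoder-1.5B-Base` is taken from this repository, and `trust_remote_code=True` is included in case the grammar-based tokenizer ships custom code. Because the model operates on grammar-based tokens, raw generations may need post-processing into surface code; see the GitHub repository for the official pipeline.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "qyliang/GrammarCoder-1.5B-Base"

# Load the tokenizer and model weights from the Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype="auto"
)

# Generate a short completion for a code prompt.
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```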

Requirements

  • tree_sitter: 0.23.2
  • tree_sitter_python: 0.23.5
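
GrammarCoder's grammar-based representation relies on tree-sitter to parse source code into syntax trees, using the pinned versions above. The following is a minimal sketch (not taken from the official code) showing how these two packages parse a Python snippet and expose the grammar rule names of its nodes:

```python
import tree_sitter_python as tspython
from tree_sitter import Language, Parser

# Wrap the Python grammar and create a parser for it.
PY_LANGUAGE = Language(tspython.language())
parser = Parser(PY_LANGUAGE)

# Parse a small code snippet into a concrete syntax tree.
source = b"def add(a, b):\n    return a + b\n"
tree = parser.parse(source)

# Print the grammar rule name of each top-level node.
for child in tree.root_node.children:
    print(child.type)  # e.g. "function_definition"
```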

Evaluation

Please refer to the GitHub repository (https://github.com/LIANGQINGYUAN/GrammarCoder) for details on the evaluation.

Citation

@article{liang2025grammar,
  title={Grammar-Based Code Representation: Is It a Worthy Pursuit for LLMs?},
  author={Liang, Qingyuan and Zhang, Zhao and Sun, Zeyu and Lin, Zheng and Luo, Qi and Xiao, Yueyi and Chen, Yizhou and Zhang, Yuqun and Zhang, Haotian and Zhang, Lu and Chen, Bin and Xiong, Yingfei},
  journal={arXiv preprint arXiv:2503.05507},
  year={2025}
}