qyliang committed on
Commit bdb9b50 · verified · 1 Parent(s): 879ddea

Update README.md

Files changed (1)
  1. README.md +31 -0
README.md CHANGED
@@ -2,6 +2,37 @@
  license: apache-2.0
  ---
 
+
+
+ We propose GrammarCoder, a grammar-based model built on a decoder-only architecture that excels at auto-regressive tasks such as code generation, completion, and translation. To strengthen its code generation ability, we apply continued pre-training and instruction tuning to existing code model weights (i.e., DeepSeek-Coder-1.3B-Base, Qwen2.5-1.5B-Base, and Qwen2.5-7B-Base), expanding its knowledge base.
+
+
+ # Results
+ Under the same experimental settings, GrammarCoder outperforms the corresponding token-based baselines on all four benchmarks. The following table reports code generation accuracy for GrammarCoder and the baselines.
+
+ | **Model** | **HumanEval** | **HumanEval+** | **MBPP** | **MBPP+** |
+ |------------------------------------|--------------|---------------|---------|---------|
+ | **Base Models** | | | | |
+ | DeepSeek-Coder-1.3B-Base | 34.8 | 28.7 | 56.7 | 47.9 |
+ | Qwen2.5-1.5B-Base | 37.2 | 32.9 | 60.2 | 49.6 |
+ | **Normal Token-Based CPT** | | | | |
+ | DeepSeek-Coder-1.3B-Base (CPT) | 43.9 | 39.6 | 61.4 | 51.3 |
+ | Qwen2.5-1.5B-Base (CPT) | 50.6 | 42.7 | 60.3 | 51.1 |
+ | Qwen2.5-7B-Base (CPT) | 68.9 | 65.2 | 81.5 | 69.8 |
+ | **Grammar-Based CPT** | | | | |
+ | **GrammarCoder-1.3B-Base** | 63.4 | 57.3 | 68.3 | 56.9 |
+ | **GrammarCoder-1.5B-Base** | 63.4 | 59.1 | 64.8 | 55.3 |
+ | **GrammarCoder-7B-Base** | **76.8** | **71.3** | **85.2** | **71.7** |
+
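To make the size of the gains concrete, the sketch below computes the absolute improvement of each GrammarCoder model over its token-based CPT counterpart, using only the numbers in the table above. Pairing models by parameter count (for example, GrammarCoder-1.3B-Base with DeepSeek-Coder-1.3B-Base (CPT)) is an assumption based on the base models named in the introduction; the dictionary layout and the `gains` helper are illustrative names, not part of the GrammarCoder repository.

```python
# Scores copied from the results table, in this benchmark order.
BENCHMARKS = ("HumanEval", "HumanEval+", "MBPP", "MBPP+")

# Token-based CPT baselines, keyed by model size (assumed pairing key).
TOKEN_CPT = {
    "1.3B": (43.9, 39.6, 61.4, 51.3),  # DeepSeek-Coder-1.3B-Base (CPT)
    "1.5B": (50.6, 42.7, 60.3, 51.1),  # Qwen2.5-1.5B-Base (CPT)
    "7B": (68.9, 65.2, 81.5, 69.8),    # Qwen2.5-7B-Base (CPT)
}

# Grammar-based CPT models (GrammarCoder-*-Base rows).
GRAMMAR_CPT = {
    "1.3B": (63.4, 57.3, 68.3, 56.9),
    "1.5B": (63.4, 59.1, 64.8, 55.3),
    "7B": (76.8, 71.3, 85.2, 71.7),
}


def gains(size):
    """Absolute gain of grammar-based over token-based CPT, per benchmark."""
    return {
        name: round(grammar - token, 1)
        for name, grammar, token in zip(
            BENCHMARKS, GRAMMAR_CPT[size], TOKEN_CPT[size]
        )
    }


if __name__ == "__main__":
    for size in GRAMMAR_CPT:
        print(size, gains(size))
```

Under this pairing every difference is positive, ranging from 1.9 points (7B, MBPP+) to 19.5 points (1.3B, HumanEval).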
+
+ The models are open source; the weights and the corresponding tokenizers are available in the Hugging Face collection [GrammarCoder](https://huggingface.co/collections/qyliang/grammarcoder-683fe8778270d31b08fe54a4).
+
+ # Requirements
+ - tree_sitter: 0.23.2
+ - tree_sitter_python: 0.23.5
+
+
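One way to confirm that an environment matches the pinned versions is sketched below, using only the standard library. The `REQUIRED` mapping and the `check_requirements` helper are hypothetical names introduced here, and the sketch assumes the installed distributions can be looked up under the names given in the list (on PyPI the packages are published as `tree-sitter` and `tree-sitter-python`, so the lookup name may need adjusting on older Python versions).

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the Requirements list above.
REQUIRED = {"tree_sitter": "0.23.2", "tree_sitter_python": "0.23.5"}


def check_requirements(required=REQUIRED):
    """Map each package to (installed_version_or_None, matches_pin)."""
    report = {}
    for name, pinned in required.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, installed == pinned)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_requirements().items():
        print(f"{name}: installed={installed}, pinned={REQUIRED[name]}, ok={ok}")
```

A package that is missing or installed at a different version is reported with `ok=False` rather than raising, so the check can run before the dependencies are installed.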
+ # Evaluation
  Please refer to GitHub (https://github.com/LIANGQINGYUAN/GrammarCoder) for details on the evaluation.
 
  ```