Update README.md
README.md
CHANGED
@@ -2,6 +2,37 @@
|
|
2 |
license: apache-2.0
|
3 |
---
|
4 |
5 |
Please refer to GitHub (https://github.com/LIANGQINGYUAN/GrammarCoder) for details on the evaluation.
|
6 |
|
7 |
```
|
|
|
2 |
license: apache-2.0
|
3 |
---
|
4 |
|
5 |
+
|
6 |
+
|
7 |
+
We propose GrammarCoder, a grammar-based model built on a decoder-only architecture, which excels at auto-regressive tasks such as code generation, completion, and translation. To enhance its code generation ability, we apply continued pre-training and instruction tuning to existing code model weights (i.e., DeepSeek-Coder-1.3B-Base, Qwen2.5-1.5B-Base, and Qwen2.5-7B-Base), expanding its knowledge base.
|
8 |
+
|
9 |
+
|
10 |
+
# Results
|
11 |
+
Under the same experimental settings, GrammarCoder outperforms the baseline models on HumanEval, HumanEval+, MBPP, and MBPP+. The following table compares its code generation accuracy with the baselines.
|
12 |
+
|
13 |
+
| **Model**                          | **HumanEval** | **HumanEval+** | **MBPP** | **MBPP+** |
|------------------------------------|---------------|----------------|----------|-----------|
| **Base Models**                    |               |                |          |           |
| DeepSeek-Coder-1.3B-Base           | 34.8          | 28.7           | 56.7     | 47.9      |
| Qwen2.5-1.5B-Base                  | 37.2          | 32.9           | 60.2     | 49.6      |
| **Normal Token-Based CPT**         |               |                |          |           |
| DeepSeek-Coder-1.3B-Base (CPT)     | 43.9          | 39.6           | 61.4     | 51.3      |
| Qwen2.5-1.5B-Base (CPT)            | 50.6          | 42.7           | 60.3     | 51.1      |
| Qwen2.5-7B-Base (CPT)              | 68.9          | 65.2           | 81.5     | 69.8      |
| **Grammar-Based CPT**              |               |                |          |           |
| **GrammarCoder-1.3B-Base**         | 63.4          | 57.3           | 68.3     | 56.9      |
| **GrammarCoder-1.5B-Base**         | 63.4          | 59.1           | 64.8     | 55.3      |
| **GrammarCoder-7B-Base**           | **76.8**      | **71.3**       | **85.2** | **71.7**  |
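
To make the comparison in the table concrete, the short sketch below computes the absolute gain (in points) of each grammar-based CPT model over the token-based CPT model of the same size. The scores are copied from the table. The dictionary names, the size keys, and the pairing of rows by model size are illustrative assumptions and are not part of the GrammarCoder repository.

```python
# Scores copied from the results table, in column order.
BENCHMARKS = ("HumanEval", "HumanEval+", "MBPP", "MBPP+")

# Normal token-based CPT rows, keyed by model size (keys are illustrative).
TOKEN_CPT = {
    "1.3B": (43.9, 39.6, 61.4, 51.3),  # DeepSeek-Coder-1.3B-Base (CPT)
    "1.5B": (50.6, 42.7, 60.3, 51.1),  # Qwen2.5-1.5B-Base (CPT)
    "7B": (68.9, 65.2, 81.5, 69.8),    # Qwen2.5-7B-Base (CPT)
}

# Grammar-based CPT rows (GrammarCoder-*-Base), keyed the same way.
GRAMMAR_CPT = {
    "1.3B": (63.4, 57.3, 68.3, 56.9),
    "1.5B": (63.4, 59.1, 64.8, 55.3),
    "7B": (76.8, 71.3, 85.2, 71.7),
}


def absolute_gains(baseline, candidate):
    """Return per-benchmark score differences (candidate - baseline) per size."""
    return {
        size: tuple(round(c - b, 1) for b, c in zip(baseline[size], candidate[size]))
        for size in baseline
    }


GAINS = absolute_gains(TOKEN_CPT, GRAMMAR_CPT)

if __name__ == "__main__":
    for size, deltas in GAINS.items():
        row = ", ".join(f"{name}: +{d}" for name, d in zip(BENCHMARKS, deltas))
        print(f"{size}: {row}")
```

Every difference is positive, and the gap narrows as model size grows: the largest gain is on HumanEval for the 1.3B pair, and the smallest is on MBPP+ for the 7B pair.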
|
26 |
+
|
27 |
+
|
28 |
+
The models have been open-sourced; the model weights and the corresponding tokenizer are available in the Hugging Face [GrammarCoder](https://huggingface.co/collections/qyliang/grammarcoder-683fe8778270d31b08fe54a4) collection.
|
29 |
+
|
30 |
+
# Requirements
|
31 |
+
- tree_sitter: 0.23.2
|
32 |
+
- tree_sitter_python: 0.23.5
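
As a convenience, the sketch below checks whether the installed packages match the versions pinned above, using only the Python standard library. It assumes the names in the list can be used as distribution names for lookup; `PINNED` and `find_mismatches` are hypothetical helpers written for this illustration and are not part of the GrammarCoder repository.

```python
from importlib import metadata

# Version pins copied from the Requirements list.
# Assumption: these names resolve as installed distribution names.
PINNED = {"tree_sitter": "0.23.2", "tree_sitter_python": "0.23.5"}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for every unsatisfied pin.

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(PINNED)
    if not problems:
        print("All pinned requirements are satisfied.")
    for name, (expected, found) in problems.items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

The lookup function is passed in as a parameter so the check can be exercised without the packages installed.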
|
33 |
+
|
34 |
+
|
35 |
+
# Evaluation
|
36 |
Please refer to GitHub (https://github.com/LIANGQINGYUAN/GrammarCoder) for details on the evaluation.
|
37 |
|
38 |
```
|