redmoe-ai-v1 committed on
Commit
21959ae
·
1 Parent(s): c75a7d9

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ figures/performance.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,129 @@
1
- ---
2
- license: mit
3
- ---
1
+ ---
2
+ license: mit
3
+ license_link: https://huggingface.co/rednote-hilab/dots.llm1.inst/blob/main/LICENSE
4
+ pipeline_tag: text-generation
5
+ base_model: rednote-hilab/dots.llm1.base
6
+ tags:
7
+ - chat
8
+ library_name: transformers
9
+ ---
10
+
11
+ # dots1
12
+
13
+
14
+ ## 1. Introduction
15
+
16
+
17
+ `dots.llm1` is a large-scale MoE model that activates 14B parameters out of a total of 142B parameters, delivering performance on par with state-of-the-art models while reducing training and inference costs.
18
+ Leveraging our meticulously crafted and efficient data processing pipeline, `dots.llm1` achieves performance comparable to Qwen2.5-72B after training on 11.2T high-quality tokens without synthetic data. To foster further research, we open-source an intermediate training checkpoint for every one trillion tokens trained, providing valuable insights into the learning dynamics of large language models.
19
+
20
+
21
+ <p align="center">
22
+ <img width="90%" src="./figures/performance.png">
23
+ </p>
24
+
25
+
26
+ ## 2. Model Summary
27
+
28
+ **This repo contains the base and instruction-tuned `dots.llm1` models**, which have the following features:
29
+
30
+ - Type: A 14B/142B MoE model trained on 11.2T tokens.
31
+ - Training Stage: Pretraining & Post-training
32
+ - Architecture: Multi-head Attention with QK-Norm in Attention Layer, fine-grained MoE utilizing top-6 out of 128 routed experts, plus 2 shared experts.
33
+ - Number of Layers: 62
34
+ - Number of Attention Heads: 32
35
+ - Context Length: 32,768 tokens
36
+ - License: MIT
37
+
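The routing pattern named in the architecture bullet (top-6 of 128 routed experts plus 2 shared experts) can be illustrated with a minimal, generic sketch. It assumes a softmax gate whose top-k weights are renormalized to sum to 1; the actual `dots.llm1` router may score and scale experts differently, and every name below is hypothetical.

```python
import math
import random

NUM_ROUTED = 128  # routed experts (from the feature list)
TOP_K = 6         # routed experts activated per token
NUM_SHARED = 2    # shared experts, always active


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def route(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k routed experts.

    Assumption: softmax scoring with renormalized top-k weights.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]


random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_ROUTED)]  # stand-in for router output
selected = route(logits)
active = len(selected) + NUM_SHARED
print(f"{active} of {NUM_ROUTED + NUM_SHARED} experts run for this token")
```

Only the selected routed experts and the shared experts run for a given token, which is why the activated parameter count is far below the total.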
38
+ For more details, please refer to our [report](dots1_tech_report.pdf).
39
+
40
+ ## 3. Example Usage
41
+
42
+ ### Model Downloads
43
+
44
+ <div align="center">
45
+
46
+ | **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Download Link** |
47
+ | :------------: | :------------: | :------------: | :------------: | :------------: |
48
+ | dots.llm1.base | 142B | 14B | 32K | [🤗 Hugging Face](https://huggingface.co/rednote-hilab/dots.llm1.base) |
49
+ | dots.llm1.inst | 142B | 14B | 32K | [🤗 Hugging Face](https://huggingface.co/rednote-hilab/dots.llm1.inst) |
50
+
51
+ </div>
52
+
53
+ ### Inference with Hugging Face
54
+
55
+ #### Text Completion
56
+
57
+ ```python
58
+ import torch
59
+ from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
60
+
61
+ model_name = "rednote-hilab/dots.llm1.base"
62
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
63
+
64
+ model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.bfloat16, attn_implementation="eager")
65
+ model.generation_config = GenerationConfig.from_pretrained(model_name)
66
+
67
+ text = "An attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is"
68
+ inputs = tokenizer(text, return_tensors="pt")
69
+ outputs = model.generate(**inputs.to(model.device), max_new_tokens=100)
70
+ result = tokenizer.decode(outputs[0], skip_special_tokens=True)
71
+ print(result)
72
+ ```
73
+
74
+ #### Chat Completion
75
+
76
+ ```python
77
+ import torch
78
+ from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
79
+
80
+ model_name = "rednote-hilab/dots.llm1.inst"
81
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
82
+
83
+ model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.bfloat16, attn_implementation="eager")
84
+ model.generation_config = GenerationConfig.from_pretrained(model_name)
85
+
86
+ messages = [
87
+ {"role": "user", "content": "Write a piece of quicksort code in C++"}
88
+ ]
89
+ input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
90
+ outputs = model.generate(input_tensor.to(model.device), max_new_tokens=200)
91
+
92
+ result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)
93
+ print(result)
94
+ ```
95
+
96
+
97
+ ### Inference with SGLang
98
+ [SGLang](https://github.com/sgl-project/sglang) is a fast serving framework for large language models and vision language models. It can launch a server that exposes an OpenAI-compatible API. `sglang>=***` is required. To start the server, run:
99
+
100
+ ```shell
101
+ python -m sglang.launch_server --model-path dots.llm1.inst --tp 8 --host 0.0.0.0 --port 8000
102
+ ```
103
+ An OpenAI-compatible API will be available at `http://localhost:8000/v1`.
104
+
105
+ ### Inference with vLLM
106
+ [vLLM](https://github.com/vllm-project/vllm) is a high-throughput and memory-efficient inference and serving engine for LLMs.
107
+ `vllm>=***` is recommended.
108
+
109
+ ```shell
110
+ vllm serve dots.llm1.inst --port 8000 --tensor-parallel-size 8
111
+ ```
112
+ An OpenAI-compatible API will be available at `http://localhost:8000/v1`.
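Once either server is running, the endpoint can be queried with any OpenAI-style client. The sketch below uses only the Python standard library and the standard `/chat/completions` route. It assumes the served model name equals the name passed at launch (`dots.llm1.inst`); adjust `MODEL` if your server reports a different name. The helper names are illustrative.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # endpoint from the launch commands above
MODEL = "dots.llm1.inst"  # assumption: served model name matches the launch argument


def build_chat_request(prompt, max_tokens=200):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt):
    """Send the prompt to the running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With a server listening on port 8000, `print(chat("Write a piece of quicksort code in C++"))` prints the model's reply.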
113
+
114
+ ## 4. Evaluation Results
115
+
116
+ Detailed evaluation results are reported in this [📑 report](dots1_tech_report.pdf).
117
+
118
+ ## Citation
119
+
120
+ If you find `dots.llm1` useful or want to use it in your projects, please cite our paper:
121
+
122
+ ```
123
+ @article{dots1,
124
+ title={dots.llm1 Technical Report},
125
+ author={rednote-hilab},
126
+ journal={arXiv preprint arXiv:TBD},
127
+ year={2025}
128
+ }
129
+ ```
config.json ADDED
@@ -0,0 +1,38 @@
1
+ {
2
+ "architectures": [
3
+ "Dotsl1ForCausalLM"
4
+ ],
5
+ "attention_bias": false,
6
+ "attention_dropout": 0.0,
7
+ "bos_token_id": null,
8
+ "eos_token_id": 151643,
9
+ "first_k_dense_replace": 1,
10
+ "hidden_act": "silu",
11
+ "hidden_size": 4096,
12
+ "initializer_range": 0.02,
13
+ "intermediate_size": 10944,
14
+ "max_position_embeddings": 32768,
15
+ "model_type": "dots1",
16
+ "moe_intermediate_size": 1408,
17
+ "moe_layer_freq": 1,
18
+ "n_routed_experts": 128,
19
+ "n_shared_experts": 2,
20
+ "norm_topk_prob": true,
21
+ "num_attention_heads": 32,
22
+ "num_experts_per_tok": 6,
23
+ "num_hidden_layers": 62,
24
+ "num_key_value_heads": 32,
25
+ "pretraining_tp": 1,
26
+ "rms_norm_eps": 1e-05,
27
+ "rope_scaling": null,
28
+ "rope_theta": 10000000,
29
+ "routed_scaling_factor": 2.5,
30
+ "sliding_window": null,
31
+ "scoring_func": "noaux_tc",
32
+ "tie_word_embeddings": false,
33
+ "torch_dtype": "bfloat16",
34
+ "transformers_version": "4.46.3",
35
+ "use_cache": true,
36
+ "use_sliding_window": false,
37
+ "vocab_size": 152064
38
+ }
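As a sanity check, the values in this `config.json` can be used to estimate the parameter counts quoted in the README. The sketch below is a back-of-the-envelope estimate, not the official accounting. It assumes bias-free gated MLPs with three projection matrices, attention projections of size `hidden_size × hidden_size`, and untied embeddings (per `tie_word_embeddings: false`), and it ignores the small router and normalization weights.

```python
# Values copied from config.json above.
cfg = {
    "hidden_size": 4096,
    "intermediate_size": 10944,
    "moe_intermediate_size": 1408,
    "num_hidden_layers": 62,
    "first_k_dense_replace": 1,
    "n_routed_experts": 128,
    "n_shared_experts": 2,
    "num_experts_per_tok": 6,
    "vocab_size": 152064,
}


def gated_mlp(hidden, inner):
    # Assumption: gate, up, and down projections, no biases.
    return 3 * hidden * inner


def estimate(cfg):
    """Return (total, active-per-token) parameter estimates."""
    h = cfg["hidden_size"]
    layers = cfg["num_hidden_layers"]
    moe_layers = layers - cfg["first_k_dense_replace"]
    attn = 4 * h * h * layers  # q, k, v, o projections (assumed h x h each)
    dense = gated_mlp(h, cfg["intermediate_size"]) * cfg["first_k_dense_replace"]
    expert = gated_mlp(h, cfg["moe_intermediate_size"])
    embed = 2 * cfg["vocab_size"] * h  # untied input and output embeddings
    # Parameters used by every token: attention, dense MLP, embeddings, shared experts.
    common = attn + dense + embed + cfg["n_shared_experts"] * expert * moe_layers
    total = common + cfg["n_routed_experts"] * expert * moe_layers
    active = common + cfg["num_experts_per_tok"] * expert * moe_layers
    return total, active


total, active = estimate(cfg)
print(f"total ~ {total / 1e9:.1f}B, active ~ {active / 1e9:.1f}B")
```

Under these assumptions the estimate comes to about 142.7B total and 14.0B active parameters, close to the 142B/14B figures stated in the README.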
generation_config.json ADDED
@@ -0,0 +1,6 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": null,
4
+ "eos_token_id": 151643,
5
+ "transformers_version": "4.46.3"
6
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,144 @@
1
+ {
2
+ "add_prefix_space": false,
3
+ "added_tokens_decoder": {
4
+ "151643": {
5
+ "content": "<|endoftext|>",
6
+ "lstrip": false,
7
+ "normalized": false,
8
+ "rstrip": false,
9
+ "single_word": false,
10
+ "special": true
11
+ },
12
+ "151644": {
13
+ "content": "<|im_start|>",
14
+ "lstrip": false,
15
+ "normalized": false,
16
+ "rstrip": false,
17
+ "single_word": false,
18
+ "special": true
19
+ },
20
+ "151645": {
21
+ "content": "<|im_end|>",
22
+ "lstrip": false,
23
+ "normalized": false,
24
+ "rstrip": false,
25
+ "single_word": false,
26
+ "special": true
27
+ },
28
+ "151646": {
29
+ "content": "<|userprompt|>",
30
+ "lstrip": false,
31
+ "normalized": false,
32
+ "rstrip": false,
33
+ "single_word": false,
34
+ "special": true
35
+ },
36
+ "151647": {
37
+ "content": "<|endofuserprompt|>",
38
+ "lstrip": false,
39
+ "normalized": false,
40
+ "rstrip": false,
41
+ "single_word": false,
42
+ "special": true
43
+ },
44
+ "151648": {
45
+ "content": "<|response|>",
46
+ "lstrip": false,
47
+ "normalized": false,
48
+ "rstrip": false,
49
+ "single_word": false,
50
+ "special": true
51
+ },
52
+ "151649": {
53
+ "content": "<|endofresponse|>",
54
+ "lstrip": false,
55
+ "normalized": false,
56
+ "rstrip": false,
57
+ "single_word": false,
58
+ "special": true
59
+ },
60
+ "151650": {
61
+ "content": "<|system|>",
62
+ "lstrip": false,
63
+ "normalized": false,
64
+ "rstrip": false,
65
+ "single_word": false,
66
+ "special": true
67
+ },
68
+ "151651": {
69
+ "content": "<|endofsystem|>",
70
+ "lstrip": false,
71
+ "normalized": false,
72
+ "rstrip": false,
73
+ "single_word": false,
74
+ "special": true
75
+ },
76
+ "151652": {
77
+ "content": "<|observation|>",
78
+ "lstrip": false,
79
+ "normalized": false,
80
+ "rstrip": false,
81
+ "single_word": false,
82
+ "special": true
83
+ },
84
+ "151653": {
85
+ "content": "<|endofobservation|>",
86
+ "lstrip": false,
87
+ "normalized": false,
88
+ "rstrip": false,
89
+ "single_word": false,
90
+ "special": true
91
+ },
92
+ "151654": {
93
+ "content": "<|execution|>",
94
+ "lstrip": false,
95
+ "normalized": false,
96
+ "rstrip": false,
97
+ "single_word": false,
98
+ "special": true
99
+ },
100
+ "151655": {
101
+ "content": "<|endofexecution|>",
102
+ "lstrip": false,
103
+ "normalized": false,
104
+ "rstrip": false,
105
+ "single_word": false,
106
+ "special": true
107
+ },
108
+ "151656": {
109
+ "content": "<|reject-unknown|>",
110
+ "lstrip": false,
111
+ "normalized": false,
112
+ "rstrip": false,
113
+ "single_word": false,
114
+ "special": true
115
+ },
116
+ "151657": {
117
+ "content": "<|sec-cot|>",
118
+ "lstrip": false,
119
+ "normalized": false,
120
+ "rstrip": false,
121
+ "single_word": false,
122
+ "special": true
123
+ },
124
+ "151658": {
125
+ "content": "<|sec-end-cot|>",
126
+ "lstrip": false,
127
+ "normalized": false,
128
+ "rstrip": false,
129
+ "single_word": false,
130
+ "special": true
131
+ }
132
+ },
133
+ "additional_special_tokens": ["<|im_start|>", "<|im_end|>", "<|userprompt|>", "<|endofuserprompt|>", "<|response|>", "<|endofresponse|>", "<|system|>", "<|endofsystem|>", "<|observation|>", "<|endofobservation|>", "<|execution|>", "<|endofexecution|>", "<|reject-unknown|>", "<|sec-cot|>", "<|sec-end-cot|>"],
134
+ "bos_token": null,
135
+ "chat_template": "{% if messages[0]['role'] == 'system' %}<|system|>{{ messages[0]['content'] }}<|endofsystem|>{% set start_idx = 1 %}{% else %}<|system|><|endofsystem|>{% set start_idx = 0 %}{% endif %}{% for idx in range(start_idx, messages|length) %}{% if messages[idx]['role'] == 'user' %}<|userprompt|>{{ messages[idx]['content'] }}<|endofuserprompt|>{% elif messages[idx]['role'] == 'assistant' %}<|response|>{{ messages[idx]['content'] }}<|endofresponse|>{% endif %}{% endfor %}{% if add_generation_prompt and messages[-1]['role'] == 'user' %}<|response|>{% endif %}",
136
+ "clean_up_tokenization_spaces": false,
137
+ "eos_token": "<|endoftext|>",
138
+ "errors": "replace",
139
+ "model_max_length": 32768,
140
+ "pad_token": "<|endoftext|>",
141
+ "split_special_tokens": false,
142
+ "tokenizer_class": "Qwen2Tokenizer",
143
+ "unk_token": null
144
+ }
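To make the `chat_template` above easier to read, here is a plain-Python mirror of its logic. It is an illustrative re-implementation for inspection only (the function name is hypothetical); real code should keep calling `tokenizer.apply_chat_template`, which renders the Jinja template itself.

```python
def render_prompt(messages, add_generation_prompt=True):
    """Mirror the Jinja chat_template from tokenizer_config.json in plain Python."""
    parts = []
    # A leading system message is wrapped; otherwise an empty system block is emitted.
    if messages[0]["role"] == "system":
        parts.append(f"<|system|>{messages[0]['content']}<|endofsystem|>")
        start = 1
    else:
        parts.append("<|system|><|endofsystem|>")
        start = 0
    # Only user and assistant turns are rendered; other roles are skipped.
    for msg in messages[start:]:
        if msg["role"] == "user":
            parts.append(f"<|userprompt|>{msg['content']}<|endofuserprompt|>")
        elif msg["role"] == "assistant":
            parts.append(f"<|response|>{msg['content']}<|endofresponse|>")
    # The generation prompt is added only when the last message is a user turn.
    if add_generation_prompt and messages[-1]["role"] == "user":
        parts.append("<|response|>")
    return "".join(parts)


prompt = render_prompt([{"role": "user", "content": "Hi"}])
print(prompt)
```

For a single user message, the rendered prompt is an empty system block, the wrapped user turn, and a trailing `<|response|>` tag that cues the model to answer.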
vocab.json ADDED
The diff for this file is too large to render. See raw diff