Upload folder using huggingface_hub


Files changed (9)

.gitattributes +1 -0
README.md +129 -3
config.json +38 -0
generation_config.json +6 -0
merges.txt +0 -0
model.safetensors.index.json +0 -0
tokenizer.json +0 -0
tokenizer_config.json +144 -0
vocab.json +0 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

+figures/performance.png filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -1,3 +1,129 @@
----
-license: mit
----

+---
+license: mit
+license_link: https://huggingface.co/rednote-hilab/dots.llm1.inst/blob/main/LICENSE
+pipeline_tag: text-generation
+base_model: rednote-hilab/dots.llm1.base
+tags:
+- chat
+library_name: transformers
+---
+# dots1
+## 1. Introduction
+`dots.llm1` is a large-scale MoE model that activates 14B parameters out of a total of 142B parameters, delivering performance on par with state-of-the-art models while reducing training and inference costs.
+Using an efficient, carefully designed data processing pipeline, `dots.llm1` achieves performance comparable to Qwen2.5-72B after training on 11.2T high-quality tokens, with no synthetic data. To foster further research, we open-source intermediate training checkpoints saved after every one trillion tokens, which offer insight into the learning dynamics of large language models.
+<p align="center">
+  <img width="90%" src="./figures/performance.png">
+</p>
+## 2. Model Summary
+**This repo contains the base and instruction-tuned `dots.llm1` models**, which have the following features:
+- Type: A 14B/142B MoE model trained on 11.2T tokens.
+- Training Stage: Pretraining & Post-training
+- Architecture: Multi-head Attention with QK-Norm in Attention Layer, fine-grained MoE utilizing top-6 out of 128 routed experts, plus 2 shared experts.
+- Number of Layers: 62
+- Number of Attention Heads: 32
+- Context Length: 32,768 tokens
+- License: MIT
+For more details, please refer to our [report](dots1_tech_report.pdf).
+## 3. Example Usage
+### Model Downloads
+<div align="center">
+| **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Download Link** |
+| :------------: | :------------: | :------------: | :------------: | :------------: |
+| dots.llm1.base | 142B | 14B | 32K   | [🤗 Hugging Face](https://huggingface.co/rednote-hilab/dots.llm1.base)   |
+| dots.llm1.inst  | 142B | 14B |  32K   | [🤗 Hugging Face](https://huggingface.co/rednote-hilab/dots.llm1.inst)   |
+</div>
+### Inference with Hugging Face
+#### Text Completion
+```python
+import torch
+from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
+model_name = "rednote-hilab/dots.llm1.base"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.bfloat16, attn_implementation="eager")
+model.generation_config = GenerationConfig.from_pretrained(model_name)
+text = "An attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is"
+inputs = tokenizer(text, return_tensors="pt")
+outputs = model.generate(**inputs.to(model.device), max_new_tokens=100)
+result = tokenizer.decode(outputs[0], skip_special_tokens=True)
+print(result)
+```
+#### Chat Completion
+```python
+import torch
+from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
+model_name = "rednote-hilab/dots.llm1.inst"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.bfloat16, attn_implementation="eager")
+model.generation_config = GenerationConfig.from_pretrained(model_name)
+messages = [
+    {"role": "user", "content": "Write a piece of quicksort code in C++"}
+]
+input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
+outputs = model.generate(input_tensor.to(model.device), max_new_tokens=200)
+result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)
+print(result)
+```
+### Inference with SGLang
+[SGLang](https://github.com/sgl-project/sglang) is a fast serving framework for large language models and vision language models. SGLang can launch a server that exposes an OpenAI-compatible API. `sglang>=***` is required. Starting the server takes a single command:
+```shell
+python -m sglang.launch_server --model-path dots.llm1.inst --tp 8 --host 0.0.0.0 --port 8000
+```
+An OpenAI-compatible API will be available at `http://localhost:8000/v1`.
+### Inference with vLLM
+[vLLM](https://github.com/vllm-project/vllm) is a high-throughput and memory-efficient inference and serving engine for LLMs.
+`vllm>=***` is recommended.
+```shell
+vllm serve dots.llm1.inst --port 8000 --tensor-parallel-size 8
+```
+An OpenAI-compatible API will be available at `http://localhost:8000/v1`.
+## 4. Evaluation Results
+Detailed evaluation results are reported in this [📑 report](dots1_tech_report.pdf).
+## Citation
+If you find `dots.llm1` useful or want to use it in your projects, please cite our paper:
+```
+@article{dots1,
+      title={dots.llm1 Technical Report},
+      author={rednote-hilab},
+      journal={arXiv preprint arXiv:TBD},
+      year={2025}
+}
+```
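
The SGLang and vLLM commands in the README both expose an OpenAI-compatible API at `http://localhost:8000/v1`. The following is a minimal client sketch using only the Python standard library. The helper names `build_chat_request` and `send` are hypothetical, and the default `model` value assumes the server registers the model under the path passed at launch (`dots.llm1.inst`), which may differ in a given deployment.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # taken from the README; adjust host/port as needed


def build_chat_request(prompt, model="dots.llm1.inst", base_url=BASE_URL, max_tokens=200):
    """Build a POST request for the OpenAI-style chat completions route.

    The default model name is an assumption: it mirrors the path given to the
    server launch command and may need to be changed for your deployment.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the text of the first returned choice."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server started with one of the commands above):
# print(send(build_chat_request("Write a piece of quicksort code in C++")))
```

Building the request separately from sending it keeps the payload easy to inspect without a live server.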

config.json ADDED

	@@ -0,0 +1,38 @@

+{
+  "architectures": [
+    "Dotsl1ForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": null,
+  "eos_token_id": 151643,
+  "first_k_dense_replace": 1,
+  "hidden_act": "silu",
+  "hidden_size": 4096,
+  "initializer_range": 0.02,
+  "intermediate_size": 10944,
+  "max_position_embeddings": 32768,
+  "model_type": "dots1",
+  "moe_intermediate_size": 1408,
+  "moe_layer_freq": 1,
+  "n_routed_experts": 128,
+  "n_shared_experts": 2,
+  "norm_topk_prob": true,
+  "num_attention_heads": 32,
+  "num_experts_per_tok": 6,
+  "num_hidden_layers": 62,
+  "num_key_value_heads": 32,
+  "pretraining_tp": 1,
+  "rms_norm_eps": 1e-05,
+  "rope_scaling": null,
+  "rope_theta": 10000000,
+  "routed_scaling_factor": 2.5,
+  "sliding_window": null,
+  "scoring_func": "noaux_tc",
+  "tie_word_embeddings": false,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.46.3",
+  "use_cache": true,
+  "use_sliding_window": false,
+  "vocab_size": 152064
+}
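
As a rough sanity check, the 142B total and 14B activated parameter figures in the README can be approximated from the values in this config. The sketch below rests on several assumptions that the diff does not confirm: the first `first_k_dense_replace` layers use a dense MLP and the rest are MoE layers, every MLP is a gated block with three weight matrices, the head dimension equals `hidden_size / num_attention_heads`, and normalization weights are small enough to ignore. The function name `count_params` is hypothetical.

```python
# Values copied from the config.json in this commit.
cfg = {
    "hidden_size": 4096,
    "intermediate_size": 10944,
    "moe_intermediate_size": 1408,
    "n_routed_experts": 128,
    "n_shared_experts": 2,
    "num_experts_per_tok": 6,
    "num_hidden_layers": 62,
    "first_k_dense_replace": 1,
    "vocab_size": 152064,
}


def count_params(cfg):
    """Return (total, activated) parameter estimates under the stated assumptions."""
    h = cfg["hidden_size"]
    attn = 4 * h * h                                # q, k, v, o projections, no bias
    expert = 3 * h * cfg["moe_intermediate_size"]   # gate, up, down (assumed gated MLP)
    dense_mlp = 3 * h * cfg["intermediate_size"]    # same layout for the dense layers
    router = h * cfg["n_routed_experts"]            # one routing logit per expert
    embeddings = 2 * cfg["vocab_size"] * h          # untied input and output embeddings

    n_dense = cfg["first_k_dense_replace"]
    n_moe = cfg["num_hidden_layers"] - n_dense
    all_experts = cfg["n_routed_experts"] + cfg["n_shared_experts"]
    used_experts = cfg["num_experts_per_tok"] + cfg["n_shared_experts"]

    shared = embeddings + n_dense * (attn + dense_mlp)
    total = shared + n_moe * (attn + router + all_experts * expert)
    active = shared + n_moe * (attn + router + used_experts * expert)
    return total, active


total, active = count_params(cfg)
print(f"total ~ {total / 1e9:.1f}B, activated ~ {active / 1e9:.1f}B")
# prints: total ~ 142.8B, activated ~ 14.0B
```

Under these assumptions the estimate lands close to the 142B and 14B figures quoted in the README, which suggests the assumed layer layout is at least plausible.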

generation_config.json ADDED

	@@ -0,0 +1,6 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": null,
+  "eos_token_id": 151643,
+  "transformers_version": "4.46.3"
+}

merges.txt ADDED

The diff for this file is too large to render. See raw diff

model.safetensors.index.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

	@@ -0,0 +1,144 @@

+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|userprompt|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|endofuserprompt|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|response|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|endofresponse|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|system|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|endofsystem|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|observation|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|endofobservation|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|execution|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|endofexecution|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|reject-unknown|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<|sec-cot|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151658": {
+      "content": "<|sec-end-cot|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": ["<|im_start|>", "<|im_end|>", "<|userprompt|>", "<|endofuserprompt|>", "<|response|>", "<|endofresponse|>", "<|system|>", "<|endofsystem|>", "<|observation|>", "<|endofobservation|>", "<|execution|>", "<|endofexecution|>", "<|reject-unknown|>", "<|sec-cot|>", "<|sec-end-cot|>"],
+  "bos_token": null,
+  "chat_template": "{% if messages[0]['role'] == 'system' %}<|system|>{{ messages[0]['content'] }}<|endofsystem|>{% set start_idx = 1 %}{% else %}<|system|><|endofsystem|>{% set start_idx = 0 %}{% endif %}{% for idx in range(start_idx, messages|length) %}{% if messages[idx]['role'] == 'user' %}<|userprompt|>{{ messages[idx]['content'] }}<|endofuserprompt|>{% elif messages[idx]['role'] == 'assistant' %}<|response|>{{ messages[idx]['content'] }}<|endofresponse|>{% endif %}{% endfor %}{% if add_generation_prompt and messages[-1]['role'] == 'user' %}<|response|>{% endif %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|endoftext|>",
+  "errors": "replace",
+  "model_max_length": 32768,
+  "pad_token": "<|endoftext|>",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}
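
The `chat_template` string in this file is dense, so the sketch below restates its logic in plain Python to show what prompt it produces. This is an illustrative mirror written from a reading of the template, not the tokenizer's own implementation. The function name `render_chat` is hypothetical, and real use should go through `tokenizer.apply_chat_template`.

```python
def render_chat(messages, add_generation_prompt=False):
    """Mirror the Jinja chat template from tokenizer_config.json in plain Python."""
    parts = []

    # A leading system message is wrapped in system tags; otherwise an empty
    # system block is emitted and every message is treated as a turn.
    if messages[0]["role"] == "system":
        parts.append(f"<|system|>{messages[0]['content']}<|endofsystem|>")
        start_idx = 1
    else:
        parts.append("<|system|><|endofsystem|>")
        start_idx = 0

    # User and assistant turns get their own tag pairs; the template has no
    # branch for other roles, so they are skipped here as well.
    for message in messages[start_idx:]:
        if message["role"] == "user":
            parts.append(f"<|userprompt|>{message['content']}<|endofuserprompt|>")
        elif message["role"] == "assistant":
            parts.append(f"<|response|>{message['content']}<|endofresponse|>")

    # The generation prompt is added only when the last message is from the user.
    if add_generation_prompt and messages[-1]["role"] == "user":
        parts.append("<|response|>")

    return "".join(parts)


print(render_chat([{"role": "user", "content": "Hi"}], add_generation_prompt=True))
# prints: <|system|><|endofsystem|><|userprompt|>Hi<|endofuserprompt|><|response|>
```

Note that an empty system block is always emitted when no system message is supplied, so prompts built by hand should include it to match the template.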

vocab.json ADDED

The diff for this file is too large to render. See raw diff