Feng Luo committed
Commit · e73d95b
Parent(s): 2464065
update example usage

Files changed:
- README.md +74 -1
- examples/__pycache__/prefixLLM.cpython-310.pyc +0 -0
- examples/__pycache__/template.cpython-310.pyc +0 -0
- examples/inference.py +20 -0
- examples/prefixLLM.py +150 -0
- examples/template.py +26 -0
README.md
CHANGED
@@ -1,3 +1,76 @@
---
license: apache-2.0
---

# AutoL2S-7B

This is the official model repository for **AutoL2S-7B**, a model fine-tuned for efficient reasoning, based on [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct/tree/main).

## 💡 Overview

AutoL2S automatically switches between short and long reasoning paths based on the complexity of the input. This repository contains:

- Model weights
- Configuration files
- The necessary scripts in the `examples/` directory

---

## 🧩 Dependencies

We recommend using the model with [vLLM](https://github.com/vllm-project/vllm).
The code has been tested with:

```
vLLM == 0.6.2
```
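
If vLLM is not already installed, the tested version can be installed from PyPI (a minimal sketch; a CUDA-enabled Python 3.10 environment is assumed):

```bash
pip install vllm==0.6.2
```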

---

## 🚀 How to Use

Run the inference example:

```bash
cd examples
python inference.py
```
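
`PrefixLLM` subclasses `vllm.LLM`, so standard vLLM engine arguments can be passed through when constructing it if your hardware requires different settings (a minimal sketch; the values shown are illustrative, not recommendations):

```python
from prefixLLM import PrefixLLM

llm = PrefixLLM(
    model="amandaa/AutoL2S-7b",
    tensor_parallel_size=1,       # number of GPUs to shard the model across
    gpu_memory_utilization=0.90,  # fraction of GPU memory vLLM may reserve
    max_model_len=32768,          # must cover the prompt plus max_tokens
)
```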

Alternatively, **download `examples/prefixLLM.py` and `examples/template.py` from this repository and place them in your working directory**, then run:

```python
from vllm import SamplingParams

from prefixLLM import PrefixLLM
from template import SYSTEM_PROMPT, SHORT_TRIGGER

llm = PrefixLLM(model="amandaa/AutoL2S-7b")
max_tokens, temp = 32768, 0.7
sampling_params_route = SamplingParams(
    max_tokens=max_tokens,
    temperature=temp,
    stop=["<specialLong>"],
    include_stop_str_in_output=True,
)
sampling_params_force_think = SamplingParams(max_tokens=max_tokens, temperature=temp)

question = "Convert the point $(0,3)$ in rectangular coordinates to polar coordinates. Enter your answer in the form $(r,\\theta),$ where $r > 0$ and $0 \\le \\theta < 2 \\pi.$"
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": question},
]
responses = llm.route_chat(
    messages=messages,
    sampling_params_route=sampling_params_route,
    sampling_params_force_think=sampling_params_force_think,
    use_tqdm=True,
    trigger_word=SHORT_TRIGGER,
)

print(SHORT_TRIGGER + responses[0].outputs[0].text)
```
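
`route_chat` runs two passes: it first generates with `sampling_params_route`, stopping at the `<specialLong>` marker, and any request whose output contains that marker near the beginning is re-generated with the `SHORT_TRIGGER` prefix inserted at the end of the prompt, using `sampling_params_force_think`. In both cases the answer is read with `SHORT_TRIGGER` prepended, as in the `print` call above, so the solution should end up wrapped in the `<|begin_of_solution|>` / `<|end_of_solution|>` markers from the system prompt. A minimal post-processing sketch that continues the example above (the `extract_solution` helper is illustrative, not part of this repository):

```python
import re

def extract_solution(text: str) -> str:
    """Return the text between the solution markers, or the full text if absent."""
    match = re.search(r"<\|begin_of_solution\|>(.*?)<\|end_of_solution\|>", text, re.DOTALL)
    return match.group(1).strip() if match else text.strip()

full_output = SHORT_TRIGGER + responses[0].outputs[0].text
print(extract_solution(full_output))
```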

---

## 🔍 Citation

If you use this model in your work, please consider citing:

```bibtex
@misc{autol2s2025,
  title   = {AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models},
  author  = {Luo, Feng* and Chuang, Yu-Neng* and Wang, Guanchu* and Le, Duy and Zhong, Shaochen and Liu, Hongyi and Yuan, Jiayi and Sui, Yang and Braverman, Vladimir and Chaudhary, Vipin and Hu, Xia},
  journal = {arXiv preprint},
  year    = {2025}
}
```
examples/__pycache__/prefixLLM.cpython-310.pyc
ADDED
Binary file (3.98 kB)

examples/__pycache__/template.cpython-310.pyc
ADDED
Binary file (2.01 kB)
examples/inference.py
ADDED
@@ -0,0 +1,20 @@
from vllm import SamplingParams

from prefixLLM import PrefixLLM
from template import SYSTEM_PROMPT, SHORT_TRIGGER


if __name__ == "__main__":
    llm = PrefixLLM(model="amandaa/AutoL2S-7b")
    max_tokens, temp = 32768, 0.7
    # First pass: stop generation as soon as the <specialLong> marker appears.
    sampling_params_route = SamplingParams(
        max_tokens=max_tokens,
        temperature=temp,
        stop=["<specialLong>"],
        include_stop_str_in_output=True,
    )
    # Second pass, used for requests whose first pass emitted <specialLong>.
    sampling_params_force_think = SamplingParams(max_tokens=max_tokens, temperature=temp)

    question = "Convert the point $(0,3)$ in rectangular coordinates to polar coordinates. Enter your answer in the form $(r,\\theta),$ where $r > 0$ and $0 \\le \\theta < 2 \\pi.$"
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]
    responses = llm.route_chat(
        messages=messages,
        sampling_params_route=sampling_params_route,
        sampling_params_force_think=sampling_params_force_think,
        use_tqdm=True,
        trigger_word=SHORT_TRIGGER,
    )

    print(SHORT_TRIGGER + responses[0].outputs[0].text)
examples/prefixLLM.py
ADDED
@@ -0,0 +1,150 @@
import copy
import re
from typing import Any, Dict, List, Optional, Union

from vllm import LLM, SamplingParams
from vllm.entrypoints.chat_utils import (
    ChatCompletionMessageParam,
    apply_hf_chat_template,
    apply_mistral_chat_template,
    parse_chat_messages,
)
from vllm.inputs import TextPrompt, TokensPrompt
from vllm.lora.request import LoRARequest
from vllm.outputs import RequestOutput
from vllm.transformers_utils.tokenizer import MistralTokenizer
from vllm.utils import is_list_of


_TAIL_WS_RE = re.compile(r"(?:\r?\n|\s)+$")


def needs_newline(text: str) -> bool:
    """Return True when *text* does NOT already end with whitespace/newline."""
    return _TAIL_WS_RE.search(text[-8:]) is None  # inspect last few chars


def add_prefix(prompt: str, prefix: str, eos_token: str) -> str:
    """Insert *prefix* before the first generated token.

    Keeps the EOS token at the very end if the template already appended it.
    """
    if prompt.endswith(eos_token):
        return prompt[:-len(eos_token)] + prefix + eos_token
    return prompt + prefix


class PrefixLLM(LLM):
    """vLLM ``LLM`` subclass that conditionally prepends *trigger_word*."""

    def route_chat(
        self,
        messages: Union[
            List[ChatCompletionMessageParam],
            List[List[ChatCompletionMessageParam]],
        ],
        sampling_params_route: Optional[Union[SamplingParams, List[SamplingParams]]] = None,
        sampling_params_force_think: Optional[Union[SamplingParams, List[SamplingParams]]] = None,
        use_tqdm: bool = True,
        lora_request: Optional[LoRARequest] = None,
        chat_template: Optional[str] = None,
        add_generation_prompt: bool = True,
        tools: Optional[List[Dict[str, Any]]] = None,
        *,
        trigger_word: Optional[str] = None,
    ) -> List[RequestOutput]:
        """Drop-in replacement for ``LLM.chat`` with one extra keyword.

        Parameters
        ----------
        trigger_word : str | None, default None
            The prefix to inject. If ``None``, a ``ValueError`` is raised.
        """
        tokenizer = self.get_tokenizer()
        model_config = self.llm_engine.get_model_config()
        eos_token = tokenizer.eos_token

        orig_prompts: List[Union[TokensPrompt, TextPrompt]] = []
        pref_prompts: List[Union[TokensPrompt, TextPrompt]] = []

        # Handle both a single conversation and a batch of conversations.
        list_of_messages: List[List[ChatCompletionMessageParam]]
        if is_list_of(messages, list):
            # messages is List[List[...]]
            list_of_messages = messages
        else:
            # messages is List[...]
            list_of_messages = [messages]

        for msgs in list_of_messages:
            # ---- render the chat template exactly once ----
            if isinstance(tokenizer, MistralTokenizer):
                prompt_data: Union[str, List[int]] = apply_mistral_chat_template(
                    tokenizer,
                    messages=msgs,
                    chat_template=chat_template,
                    add_generation_prompt=add_generation_prompt,
                    tools=tools,
                )
                mm_data = None  # the mistral util returns already embedded image tokens
            else:
                conversation, mm_data = parse_chat_messages(msgs, model_config, tokenizer)
                prompt_data = apply_hf_chat_template(
                    tokenizer,
                    conversation=conversation,
                    chat_template=chat_template,
                    add_generation_prompt=add_generation_prompt,
                    tools=tools,
                )

            if is_list_of(prompt_data, int):
                raise NotImplementedError  # token-id prompts are not supported here
            orig_prompt = TextPrompt(prompt=prompt_data)

            if trigger_word is None:
                raise ValueError("trigger_word must be provided when using force_think logic")

            # Build the prefixed variant of the same prompt.
            need_nl = needs_newline(prompt_data)
            prefix = trigger_word + ("\n" if need_nl else "")
            pref_txt = add_prefix(prompt_data, prefix, eos_token)
            pref_prompt = TextPrompt(prompt=pref_txt)

            if mm_data is not None:
                orig_prompt["multi_modal_data"] = mm_data
                pref_prompt["multi_modal_data"] = copy.deepcopy(mm_data)

            orig_prompts.append(orig_prompt)
            pref_prompts.append(pref_prompt)

        # First pass: generate from the unprefixed prompts, stopping at <specialLong>.
        results = self.generate(
            orig_prompts,
            sampling_params=sampling_params_route,
            use_tqdm=use_tqdm,
            lora_request=lora_request,
        )

        # Requests whose output contains <specialLong> near the start are re-generated
        # from the prefixed prompts with the force-think sampling parameters.
        need_force = [
            i for i, out in enumerate(results) if "<specialLong>" in out.outputs[0].text[:100]
        ]

        if len(need_force) == 0:
            return results  # early exit, nothing to redo

        prompts_force = [pref_prompts[i] for i in need_force]

        results_force = self.generate(
            prompts_force,
            sampling_params=sampling_params_force_think,
            use_tqdm=use_tqdm,
            lora_request=lora_request,
        )

        for idx, new_out in zip(need_force, results_force):
            results[idx] = new_out

        return results
examples/template.py
ADDED
@@ -0,0 +1,26 @@
# System prompt describing the two answer formats: either a direct short solution, or a long
# "Thought + Solution" trace wrapped in the <|begin_of_thought|> / <|begin_of_solution|> markers.
SYSTEM_PROMPT = "Your role as an assistant is to solve problems using one of two ways. \
First, think step-by-step and give the answer. The solution should be formatted as follows: \
<|begin_of_solution|> \
{precise and concise solution} \
<|end_of_solution|> \
Second, your output thoroughly exploring questions through a systematic long \
thinking process before providing the final precise and accurate solutions. This requires \
engaging in a comprehensive cycle of analysis, summarizing, exploration, reassessment, reflection, \
backtracing, and iteration to develop well-considered thinking process. \
Please structure your response into two main sections: Thought and Solution. \
In the Thought section, detail your reasoning process using the specified format: \
<|begin_of_thought|> {thought with steps separated with '\n\n'} \
<|end_of_thought|> \
Each step should include detailed considerations such as analisying questions, summarizing \
relevant findings, brainstorming new ideas, verifying the accuracy of the current steps, refining \
any errors, and revisiting previous steps. \
In the Solution section, based on various attempts, explorations, and reflections from the Thought \
section, systematically present the final solution that you deem correct. The solution should \
remain a logical, accurate, concise expression style and detail necessary step needed to reach the \
conclusion, formatted as follows: \
<|begin_of_solution|> \
{final formatted, precise, and clear solution} \
<|end_of_solution|> \
Now, try to solve the following question through the above guidelines:"

# Trigger prefix passed to PrefixLLM.route_chat as `trigger_word`; it is appended to the prompt for
# the second-pass generations and prepended to the generated text when printing the final output.
SHORT_TRIGGER = "<|begin_of_solution|>\n\nThis is a trigger to ensure the model’s upcoming output <short>."