Feng Luo committed
Commit e73d95b · 1 Parent(s): 2464065

update example usage
README.md CHANGED
---
license: apache-2.0
---

# AutoL2S-7B

This is the official model repository for **AutoL2S-7B**, a model fine-tuned from [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct/tree/main) for efficient reasoning.

## 💡 Overview

AutoL2S automatically switches between short and long reasoning paths based on input complexity; the routing mechanism is sketched after the list below. This repository contains:

- Model weights
- Configuration files
- Inference scripts in the `examples/` directory
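
Under the hood (see `examples/prefixLLM.py`), routing works in two passes: a first generation pass stops on the `<specialLong>` marker, and only the prompts whose first pass emits that marker near the start are re-generated with the trigger prefix (`SHORT_TRIGGER` from `examples/template.py`) injected into the assistant turn. A minimal sketch of the decision rule, simplified from `route_chat` rather than the exact implementation:

```python
def routes_to_second_pass(first_pass_text: str) -> bool:
    # In route_chat, a prompt is re-generated with the trigger prefix when the
    # <specialLong> marker appears near the start of its first-pass output.
    return "<specialLong>" in first_pass_text[:100]
```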

---

## 🧩 Dependencies

We recommend using the model with [vLLM](https://github.com/vllm-project/vllm).
The code has been tested with:

```
vLLM == 0.6.2
```
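
A quick way to confirm that your installed version matches the tested one (assuming vLLM is already installed):

```python
import vllm

# The repository reports testing against vLLM 0.6.2; other versions may work
# but are untested here.
print(vllm.__version__)
```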

---

## 🚀 How to Use

Run the inference example:

```bash
cd examples
python inference.py
```

Alternatively, **download `examples/prefixLLM.py` and `examples/template.py` from this repository and place them in your working directory**, then run:

```python
from vllm import SamplingParams
from prefixLLM import PrefixLLM
from template import SYSTEM_PROMPT, SHORT_TRIGGER

# Load the model through the routing-aware LLM wrapper.
llm = PrefixLLM(model="amandaa/AutoL2S-7b")
max_tokens, temp = 32768, 0.7
# First pass stops on the <specialLong> marker; prompts that emit it are
# re-generated with the trigger prefix injected (see prefixLLM.py).
sampling_params_route = SamplingParams(max_tokens=max_tokens, temperature=temp, stop=["<specialLong>"], include_stop_str_in_output=True)
sampling_params_force_think = SamplingParams(max_tokens=max_tokens, temperature=temp)

question = "Convert the point $(0,3)$ in rectangular coordinates to polar coordinates. Enter your answer in the form $(r,\\theta),$ where $r > 0$ and $0 \\le \\theta < 2 \\pi.$"
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": question}
]
responses = llm.route_chat(messages=messages, sampling_params_route=sampling_params_route, sampling_params_force_think=sampling_params_force_think, use_tqdm=True, trigger_word=SHORT_TRIGGER)

print(SHORT_TRIGGER + responses[0].outputs[0].text)
```
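
`route_chat` also accepts a batch of conversations (a list of message lists) and routes each one independently. A sketch of batched usage, reusing the objects from the example above (the questions here are purely illustrative):

```python
questions = [
    "What is 7 * 8?",
    "Prove that there are infinitely many prime numbers.",
]
batch_messages = [
    [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": q},
    ]
    for q in questions
]
batch_responses = llm.route_chat(
    messages=batch_messages,
    sampling_params_route=sampling_params_route,
    sampling_params_force_think=sampling_params_force_think,
    use_tqdm=True,
    trigger_word=SHORT_TRIGGER,
)
for response in batch_responses:
    print(SHORT_TRIGGER + response.outputs[0].text)
```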

---

## 🔍 Citation

If you use this model in your work, please consider citing:

```bibtex
@misc{autol2s2025,
  title   = {AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models},
  author  = {Luo, Feng* and Chuang, Yu-Neng* and Wang, Guanchu* and Le, Duy and Zhong, Shaochen and Liu, Hongyi and Yuan, Jiayi and Sui, Yang and Braverman, Vladimir and Chaudhary, Vipin and Hu, Xia},
  journal = {arXiv preprint},
  year    = {2025}
}
```
examples/__pycache__/prefixLLM.cpython-310.pyc ADDED
Binary file (3.98 kB)

examples/__pycache__/template.cpython-310.pyc ADDED
Binary file (2.01 kB)
 
examples/inference.py ADDED
from vllm import SamplingParams

from prefixLLM import PrefixLLM
from template import SYSTEM_PROMPT, SHORT_TRIGGER


if __name__ == "__main__":
    # Load the model through the routing-aware wrapper defined in prefixLLM.py.
    llm = PrefixLLM(model="amandaa/AutoL2S-7b")
    max_tokens, temp = 32768, 0.7
    # First pass stops on the <specialLong> marker; prompts that emit it are
    # re-generated with the trigger prefix injected.
    sampling_params_route = SamplingParams(max_tokens=max_tokens, temperature=temp, stop=["<specialLong>"], include_stop_str_in_output=True)
    sampling_params_force_think = SamplingParams(max_tokens=max_tokens, temperature=temp)

    question = "Convert the point $(0,3)$ in rectangular coordinates to polar coordinates. Enter your answer in the form $(r,\\theta),$ where $r > 0$ and $0 \\le \\theta < 2 \\pi.$"
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question}
    ]
    responses = llm.route_chat(messages=messages, sampling_params_route=sampling_params_route, sampling_params_force_think=sampling_params_force_think, use_tqdm=True, trigger_word=SHORT_TRIGGER)

    print(SHORT_TRIGGER + responses[0].outputs[0].text)

examples/prefixLLM.py ADDED
import copy
import re
from typing import Any, Dict, List, Optional, Sequence, Union

from vllm import LLM, SamplingParams
from vllm.entrypoints.chat_utils import (
    ChatCompletionMessageParam,
    apply_hf_chat_template,
    apply_mistral_chat_template,
    parse_chat_messages,
)
from vllm.inputs import PromptInputs, TextPrompt, TokensPrompt
from vllm.lora.request import LoRARequest
from vllm.outputs import RequestOutput
from vllm.transformers_utils.tokenizer import MistralTokenizer
from vllm.utils import is_list_of


_TAIL_WS_RE = re.compile(r"(?:\r?\n|\s)+$")


def needs_newline(text: str) -> bool:
    """Return True when *text* does NOT already end with whitespace/newline."""
    return _TAIL_WS_RE.search(text[-8:]) is None  # inspect last few chars


def add_prefix(prompt: str, prefix: str, eos_token: str) -> str:
    """Insert *prefix* before the first generated token.

    Keeps the EOS token at the very end if the template already appended it.
    """
    if prompt.endswith(eos_token):
        return prompt[:-len(eos_token)] + prefix + eos_token
    return prompt + prefix


class PrefixLLM(LLM):
    """vLLM `LLM` subclass that conditionally prepends *trigger_word*."""

    def route_chat(
        self,
        messages: Union[
            List[ChatCompletionMessageParam],
            List[List[ChatCompletionMessageParam]],
        ],
        sampling_params_route: Optional[Union[SamplingParams,
                                              List[SamplingParams]]] = None,
        sampling_params_force_think: Optional[Union[SamplingParams,
                                                    List[SamplingParams]]] = None,
        use_tqdm: bool = True,
        lora_request: Optional[LoRARequest] = None,
        chat_template: Optional[str] = None,
        add_generation_prompt: bool = True,
        tools: Optional[List[Dict[str, Any]]] = None,
        *,
        trigger_word: Optional[str] = None,
    ) -> List[RequestOutput]:
        """Drop-in replacement for `LLM.chat` with one extra keyword.

        Parameters
        ----------
        trigger_word : str
            The prefix to inject into prompts that are routed to the second
            (forced) generation pass. Required; a ``ValueError`` is raised if
            it is ``None``.
        """

        tokenizer = self.get_tokenizer()
        model_config = self.llm_engine.get_model_config()
        eos_token = tokenizer.eos_token

        orig_prompts: List[Union[TokensPrompt, TextPrompt]] = []
        pref_prompts: List[Union[TokensPrompt, TextPrompt]] = []
        mm_payloads: List[Optional[Dict[str, Any]]] = []

        list_of_messages: List[List[ChatCompletionMessageParam]]

        # Handle both a single conversation and a batch of conversations.
        if is_list_of(messages, list):
            # messages is List[List[...]]
            list_of_messages = messages
        else:
            # messages is List[...]
            list_of_messages = [messages]

        prompts: List[Union[TokensPrompt, TextPrompt]] = []

        for msgs in list_of_messages:
            # ---- render the chat template exactly once ----
            if isinstance(tokenizer, MistralTokenizer):
                prompt_data: Union[str, List[int]] = apply_mistral_chat_template(
                    tokenizer,
                    messages=msgs,
                    chat_template=chat_template,
                    add_generation_prompt=add_generation_prompt,
                    tools=tools,
                )
                mm_data = None  # the mistral util returns already embedded image tokens
            else:
                conversation, mm_data = parse_chat_messages(msgs, model_config, tokenizer)
                prompt_data = apply_hf_chat_template(
                    tokenizer,
                    conversation=conversation,
                    chat_template=chat_template,
                    add_generation_prompt=add_generation_prompt,
                    tools=tools,
                )

            if is_list_of(prompt_data, int):
                raise NotImplementedError
            else:
                orig_prompt = TextPrompt(prompt=prompt_data)

            if trigger_word is None:
                raise ValueError("trigger_word must be provided when using force_think logic")

            # Build the prefixed variant of the same prompt, used only if the
            # first pass routes this prompt to the second generation pass.
            need_nl = needs_newline(prompt_data)
            prefix = trigger_word + ("\n" if need_nl else "")
            pref_txt = add_prefix(prompt_data, prefix, eos_token)
            pref_prompt = TextPrompt(prompt=pref_txt)

            if mm_data is not None:
                orig_prompt["multi_modal_data"] = mm_data
                pref_prompt["multi_modal_data"] = copy.deepcopy(mm_data)

            orig_prompts.append(orig_prompt)
            pref_prompts.append(pref_prompt)

        # First pass: generate with the routing sampling params, which stop on
        # the <specialLong> marker.
        results = self.generate(
            orig_prompts,
            sampling_params=sampling_params_route,
            use_tqdm=use_tqdm,
            lora_request=lora_request,
        )

        # Prompts whose first-pass output contains the marker near the start
        # are re-generated with the trigger prefix injected.
        need_force = [i for i, out in enumerate(results) if "<specialLong>" in out.outputs[0].text[:100]]

        if len(need_force) == 0:
            return results  # early exit, nothing to redo

        prompts_force = [pref_prompts[i] for i in need_force]

        results_force = self.generate(
            prompts_force,
            sampling_params=sampling_params_force_think,
            use_tqdm=use_tqdm,
            lora_request=lora_request,
        )

        for idx, new_out in zip(need_force, results_force):
            results[idx] = new_out

        return results
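
For intuition, here is how the two module-level helpers splice a trigger into an already-rendered prompt. The EOS string and prompt below are toy values chosen for illustration; real prompts come from the chat template.

```python
from prefixLLM import add_prefix, needs_newline

# Toy illustration (not part of the module above).
eos = "<|im_end|>"                        # assumed EOS string, illustration only
prompt = "<|im_start|>assistant\n" + eos  # toy rendered prompt ending in EOS

prefix = "<trigger>" + ("\n" if needs_newline(prompt) else "")
print(add_prefix(prompt, prefix, eos))
# <|im_start|>assistant
# <trigger>
# <|im_end|>
```
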
examples/template.py ADDED
SYSTEM_PROMPT = "Your role as an assistant is to solve problems in one of two ways. \
First, think step-by-step and give the answer. The solution should be formatted as follows: \
<|begin_of_solution|> \
{precise and concise solution} \
<|end_of_solution|> \
Second, thoroughly explore questions through a systematic long \
thinking process before providing the final precise and accurate solutions. This requires \
engaging in a comprehensive cycle of analysis, summarizing, exploration, reassessment, reflection, \
backtracing, and iteration to develop a well-considered thinking process. \
Please structure your response into two main sections: Thought and Solution. \
In the Thought section, detail your reasoning process using the specified format: \
<|begin_of_thought|> {thought with steps separated with '\n\n'} \
<|end_of_thought|> \
Each step should include detailed considerations such as analyzing questions, summarizing \
relevant findings, brainstorming new ideas, verifying the accuracy of the current steps, refining \
any errors, and revisiting previous steps. \
In the Solution section, based on various attempts, explorations, and reflections from the Thought \
section, systematically present the final solution that you deem correct. The solution should \
remain a logical, accurate, concise expression style and detail the necessary steps needed to reach the \
conclusion, formatted as follows: \
<|begin_of_solution|> \
{final formatted, precise, and clear solution} \
<|end_of_solution|> \
Now, try to solve the following question through the above guidelines:"

SHORT_TRIGGER = "<|begin_of_solution|>\n\nThis is a trigger to ensure the model’s upcoming output <short>."
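
Both response formats defined by `SYSTEM_PROMPT` wrap the final answer in `<|begin_of_solution|> … <|end_of_solution|>` tags, so a small post-processing helper (not part of this repository, shown only as a sketch) can recover just the solution text:

```python
import re

# Hypothetical helper: pull the final solution out of a completion that
# follows the SYSTEM_PROMPT format defined above.
_SOLUTION_RE = re.compile(r"<\|begin_of_solution\|>(.*?)<\|end_of_solution\|>", re.DOTALL)


def extract_solution(text: str) -> str:
    match = _SOLUTION_RE.search(text)
    return match.group(1).strip() if match else text.strip()
```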