Rabbit3-Ko-4B is a Korean language model based on Qwen3-4B, fine-tuned on Korean and English datasets.

Benchmark results:
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---:|---|---:|---|---|---:|---|---:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.8400 | ± | 0.0101 |
| | | strict-match | 5 | exact_match | ↑ | 0.8378 | ± | 0.0102 |
| hrm8k | N/A | | | | | | | |
| - hrm8k_gsm8k | 1 | none | 0 | exact_match | ↑ | 0.8196 | ± | 0.0106 |
| - hrm8k_ksm | 1 | none | 0 | exact_match | ↑ | 0.0511 | ± | 0.0058 |
| - hrm8k_math | 1 | none | 0 | exact_match | ↑ | 0.5539 | ± | 0.0093 |
| - hrm8k_mmmlu | 1 | none | 0 | exact_match | ↑ | 0.5362 | ± | 0.0230 |
| - hrm8k_omni_math | 1 | none | 0 | exact_match | ↑ | 0.1812 | ± | 0.0088 |
| ifeval | 4 | none | 0 | inst_level_loose_acc | ↑ | 0.8753 | ± | N/A |
| | | none | 0 | inst_level_strict_acc | ↑ | 0.8609 | ± | N/A |
| | | none | 0 | prompt_level_loose_acc | ↑ | 0.8244 | ± | 0.0164 |
| | | none | 0 | prompt_level_strict_acc | ↑ | 0.8078 | ± | 0.0170 |
| Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---:|---|---:|---|---|---:|---|---:|
| haerae | 1 | none | 0 | acc | ↑ | 0.6654 | ± | 0.0140 |
| | | none | 0 | acc_norm | ↑ | 0.6654 | ± | 0.0140 |
| kobest | 1 | none | 0 | acc | ↑ | 0.7768 | ± | 0.0057 |
| | | none | 0 | acc_norm | ↑ | 0.5880 | ± | 0.0220 |
| | | none | 0 | f1 | ↑ | 0.7764 | ± | N/A |
| Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---:|---|---:|---|---|---:|---|---:|
| kmmlu_direct | 2 | none | 0 | exact_match | ↑ | 0.5212 | ± | 0.0026 |
| - kmmlu_direct_applied_science | 2 | none | 0 | exact_match | ↑ | 0.4997 | ± | 0.0046 |
| - kmmlu_direct_humss | 2 | none | 0 | exact_match | ↑ | 0.5365 | ± | 0.0068 |
| - kmmlu_direct_other | 2 | none | 0 | exact_match | ↑ | 0.5130 | ± | 0.0053 |
| - kmmlu_direct_stem | 2 | none | 0 | exact_match | ↑ | 0.5455 | ± | 0.0048 |
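
These tables follow the lm-evaluation-harness output format. Below is a minimal reproduction sketch using the harness's Python API; it is not the verified evaluation setup for this model, and it assumes the listed task names are registered in your installed harness version (hrm8k in particular may require a recent release). Per-task few-shot settings (e.g. 5-shot gsm8k) come from the task configs rather than anything this sketch enforces.

```python
# Minimal reproduction sketch with lm-evaluation-harness (pip install lm-eval).
# Assumes these task names exist in your installed harness version; this is
# not the authors' verified evaluation setup.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=CarrotAI/Rabbit3-Ko-4B,dtype=auto",
    tasks=["gsm8k", "hrm8k", "ifeval", "haerae", "kobest", "kmmlu_direct"],
    batch_size="auto",
)
print(results["results"])  # per-task metrics, as in the tables above
```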
The following snippet shows how to load the tokenizer and model, generate content, and split the thinking block from the final answer:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "CarrotAI/Rabbit3-Ko-4B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # switches between thinking and non-thinking modes; default is True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parse the thinking content: rindex of token 151668 (</think>)
try:
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    # no </think> token found; treat the whole output as content
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
```
For deployment, you can use sglang>=0.4.6.post1 or vllm>=0.8.5 to create an OpenAI-compatible API endpoint:
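
The commands below are a sketch adapted from the standard Qwen3 serving instructions and have not been verified against this checkpoint; the reasoning-parser flags in particular may differ across sglang/vllm versions.

```shell
# SGLang (assumed flags, per the standard Qwen3 instructions)
python -m sglang.launch_server --model-path CarrotAI/Rabbit3-Ko-4B --reasoning-parser qwen3

# vLLM (assumed flags; serves an OpenAI-compatible API on port 8000 by default)
vllm serve CarrotAI/Rabbit3-Ko-4B --enable-reasoning --reasoning-parser deepseek_r1
```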