
metadata

language:
  - en
library_name: transformers
pipeline_tag: text-generation
tags:
  - esper
  - esper-3
  - valiant
  - valiant-labs
  - qwen
  - qwen-3
  - qwen-3-8b
  - 8b
  - reasoning
  - code
  - code-instruct
  - python
  - javascript
  - dev-ops
  - jenkins
  - terraform
  - scripting
  - powershell
  - azure
  - aws
  - gcp
  - cloud
  - problem-solving
  - architect
  - engineer
  - developer
  - creative
  - analytical
  - expert
  - rationality
  - conversational
  - chat
  - instruct
base_model: Qwen/Qwen3-8B
datasets:
  - sequelbox/Titanium2.1-DeepSeek-R1
  - sequelbox/Tachibana2-DeepSeek-R1
  - sequelbox/Raiden-DeepSeek-R1
license: apache-2.0

Support our open-source dataset and model releases!

Esper 3: Qwen3-4B, Qwen3-8B, Qwen3-14B

Esper 3 is a coding, architecture, and DevOps reasoning specialist built on Qwen 3.

Finetuned on our DevOps and architecture reasoning data and our code reasoning data, both generated with DeepSeek R1!
Improved general and creative reasoning to supplement problem-solving and general chat performance.
Small model sizes allow running on local desktop and mobile devices, plus super-fast server inference!

Prompting Guide

Esper 3 uses the Qwen 3 prompt format.

Esper 3 is a reasoning finetune; we recommend enable_thinking=True for all chats.
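As a rough illustration of what the Qwen 3 prompt format looks like on the wire, the sketch below hand-builds a ChatML-style prompt for a single-turn chat. The helper name `render_chatml` is hypothetical and exists only for this example; the real chat template also handles system prompts, tool calls, and thinking blocks, so in practice rely on `tokenizer.apply_chat_template` rather than this approximation.

```python
# Hand-rolled approximation of the ChatML-style layout used by Qwen 3 chat templates.
# Illustration only: `render_chatml` is a hypothetical helper, not part of transformers.

def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML-style prompt string."""
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>ROLE ... <|im_end|> markers.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # An open assistant turn tells the model it is its turn to respond.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example_prompt = render_chatml(
    [{"role": "user", "content": "Write a Jenkinsfile that runs unit tests."}]
)
print(example_prompt)
```

Comparing this string with the output of `tokenizer.apply_chat_template(..., tokenize=False)` is a quick way to check that a custom serving stack is formatting prompts the way the model expects.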

Example inference script to get started:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ValiantLabs/Qwen3-8B-Esper3"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Write a Terraform configuration that uses the `aws_ami` data source to find the latest Amazon Linux 2 AMI. Then, provision an EC2 instance using this dynamically determined AMI ID."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
# keep only the newly generated tokens, dropping the prompt tokens
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

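# split the output into thinking content and final answer
try:
    # position just past the last occurrence of token id 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    # no </think> token found: treat the whole output as the final answer
    index = 0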

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
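When only decoded text is available (for example, output returned by an inference server rather than raw token ids), the same split can be approximated on the string itself. The sketch below is a minimal text-based variant, assuming the reasoning is delimited by literal `<think>` and `</think>` tags in the decoded output; `split_thinking` is a hypothetical helper name introduced for this example.

```python
# Text-based variant of the thinking/answer split shown in the script above.
# Assumption: the decoded output marks the reasoning with literal <think> ... </think> tags.

THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def split_thinking(decoded_text):
    """Return (thinking, answer), splitting at the last </think> tag."""
    head, separator, tail = decoded_text.rpartition(THINK_CLOSE)
    if not separator:
        # No closing tag found: treat everything as the final answer,
        # mirroring the `index = 0` fallback in the token-id version.
        return "", decoded_text.strip("\n")
    # Drop the opening tag (if present) and surrounding newlines.
    thinking = head.replace(THINK_OPEN, "", 1).strip("\n")
    return thinking, tail.strip("\n")


thinking, answer = split_thinking(
    "<think>\nUse the aws_ami data source.\n</think>\n\nHere is the configuration."
)
print("thinking content:", thinking)
print("content:", answer)
```

Splitting at the last closing tag, rather than the first, keeps the behavior consistent with the token-id approach if the tag text happens to appear inside the reasoning itself.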

Esper 3 is created by Valiant Labs.

Check out our HuggingFace page to see all of our models!

We care about open source. For everyone to use.