TD-Llama-OP
TL;DR
TD (ToolDial)-Llama-OP (Overall Performance) is the same model used for the Overall Performance task in the ToolDial paper. We encourage you to use this model to reproduce the results. Please refer to the Experiments section of our GitHub page to see how our evaluation was conducted.
[Model Summary]
- Trained with QLoRA quantization; the LoRA adapters are merged into the original weights.
- Trained for 1 epoch with the Adam-8bit optimizer, with learning rate 1e-5 and betas (0.9, 0.995); see the sketch below.
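For reference, here is a minimal sketch of how such a QLoRA training setup can be configured with `peft` and `bitsandbytes`. The optimizer settings come from the summary above; the LoRA rank, alpha, and target modules are illustrative assumptions, not values from the paper.
```python
import torch
import bitsandbytes as bnb
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization for QLoRA-style training.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=quant_config,
)
base = prepare_model_for_kbit_training(base)

# LoRA adapter config; r, lora_alpha, and target_modules are assumed values.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)

# Adam-8bit optimizer with the learning rate and betas stated above.
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-5, betas=(0.9, 0.995))
```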
[How to load the model]
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

device = "cuda:0"

# 4-bit NF4 quantization config, matching the QLoRA training setup.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# 1. Load the base model (we use llama3-8b-inst) with the given quantization config.
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=quant_config,
    device_map={"": device},
)
tokenizer = AutoTokenizer.from_pretrained("HOLILAB/td-llama-op")
tokenizer.pad_token_id = tokenizer.eos_token_id

# 2. Load the LoRA adapter with PeftModel.
model = PeftModel.from_pretrained(base_model, "HOLILAB/td-llama-op")
```
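Once loaded, a quick generation sanity check can be run as below. This is a minimal sketch assuming the standard Llama 3 chat template; the prompt is a placeholder, not the ToolDial dialogue format used in our experiments (see the GitHub page for that).
```python
# Simple generation sanity check; the prompt is a placeholder only.
messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)
outputs = model.generate(inputs, max_new_tokens=128, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```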