---
base_model: mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated
library_name: peft
tags:
- generated_from_trainer
model-index:
- name: Meta-Llama-3.1-8B-Instruct-abliterated-Sabresooth
  results: []
license: llama3.1
datasets:
- Sabresooth/Sabresooth_Train
---

[Meta-Llama-3.1-8B-Instruct-abliterated](https://huggingface.co/mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated) fine-tuned with LoRA on the [ICONN-1-BasicChat-Data-SuperLite](https://huggingface.co/datasets/Sabresooth/Sabresooth_Train) dataset, as requested by [@Enderchef](https://huggingface.co/Enderchef) in [model request discussion #920](https://huggingface.co/mradermacher/model_requests/discussions/920).

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)

axolotl version: `0.9.0`
```yaml
base_model: /dpool/Meta-Llama-3.1-8B-Instruct-abliterated
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false

datasets:
  - path: Sabresooth/Sabresooth_Train
    chat_template: llama3
    type:
      system_prompt: ""
      field_system: system
      field_instruction: input
      field_output: output
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./outputs/lora-out

adapter: lora
lora_model_dir:

sequence_len: 4096
sample_packing: false
pad_to_sequence_len: true

lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 8
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 0.00004

bf16: auto
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
resume_from_checkpoint:
logging_steps: 1
flash_attention: true

warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
weight_decay: 0.0
fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_limit_all_gathers: true
  fsdp_sync_module_states: true
  fsdp_offload_params: true
  fsdp_use_orig_params: false
  fsdp_cpu_ram_efficient_loading: true
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sharding_strategy: FULL_SHARD
special_tokens:
  pad_token: <|end_of_text|>
```
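
The LoRA settings above (`lora_r: 16`, `lora_alpha: 32`, `lora_target_linear: true`) fix both the adapter's size and how strongly it perturbs the base weights. The sketch below estimates the two, assuming the standard LoRA update W + (alpha / r) · B·A, that `lora_target_linear` adapts every linear projection inside each decoder layer but not `lm_head`, and the public Llama 3.1 8B dimensions (hidden size 4096, MLP size 14336, 32 layers, 32 attention heads, 8 key/value heads). None of those dimensions are stated in this card; they are assumptions taken from the base architecture.

```python
# Assumed Llama 3.1 8B dimensions (from the base architecture, not this card).
HIDDEN = 4096
INTERMEDIATE = 14336
NUM_LAYERS = 32
NUM_HEADS = 32
NUM_KV_HEADS = 8
HEAD_DIM = HIDDEN // NUM_HEADS  # 128

# From the axolotl config above.
LORA_R = 16
LORA_ALPHA = 32

# (in_features, out_features) of each linear layer in one decoder block.
# Assumption: lora_target_linear adapts all of these and skips lm_head.
LINEAR_SHAPES = {
    "q_proj": (HIDDEN, NUM_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, NUM_KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, NUM_KV_HEADS * HEAD_DIM),
    "o_proj": (NUM_HEADS * HEAD_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params_per_linear(in_features: int, out_features: int, r: int = LORA_R) -> int:
    # A has shape (r, in_features) and B has shape (out_features, r).
    return r * (in_features + out_features)


def total_lora_params(r: int = LORA_R) -> int:
    per_layer = sum(lora_params_per_linear(i, o, r) for i, o in LINEAR_SHAPES.values())
    return per_layer * NUM_LAYERS


def lora_scaling(alpha: int = LORA_ALPHA, r: int = LORA_R) -> float:
    # Plain LoRA scaling (not rsLoRA) applied to the B @ A update.
    return alpha / r


print(f"trainable LoRA params: {total_lora_params():,}")
print(f"update scaling alpha/r: {lora_scaling()}")
```

Under these assumptions the adapter holds roughly 41.9M trainable parameters and every low-rank update is multiplied by 2.0; the count PEFT actually reports will differ if the adapted module set differs from the one assumed here.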

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 4e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- total_eval_batch_size: 4
- optimizer: adamw_torch_fused (betas=(0.9, 0.999), epsilon=1e-08, no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 8.0
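
The batch-size and schedule entries above are related by simple arithmetic: the total train batch is the per-GPU batch times the accumulation steps times the number of GPUs, while evaluation skips accumulation. This sketch reproduces those totals and evaluates the learning rate at a given optimizer step, assuming the standard `transformers` cosine-with-warmup formula (linear warmup to the peak, then a half-cosine decay to zero). The 240 total steps used below are an assumption inferred from the results table (8 epochs of about 30 optimizer steps each), not a logged value.

```python
import math

# Values from the hyperparameter list above.
MICRO_BATCH_SIZE = 2  # train_batch_size / eval_batch_size per GPU
GRAD_ACCUM_STEPS = 4
NUM_DEVICES = 2
PEAK_LR = 4e-5
WARMUP_STEPS = 10


def total_train_batch_size() -> int:
    # Sequences contributing to one optimizer step across all GPUs.
    return MICRO_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def total_eval_batch_size() -> int:
    # Evaluation does not accumulate gradients.
    return MICRO_BATCH_SIZE * NUM_DEVICES


def cosine_lr(step: int, total_steps: int) -> float:
    """Assumed transformers-style cosine schedule with linear warmup."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


TOTAL_STEPS = 240  # assumption: 8 epochs x ~30 steps, inferred from the results table
print(total_train_batch_size(), total_eval_batch_size())  # 16 4
for step in (0, 5, 10, 125, 240):
    print(step, cosine_lr(step, TOTAL_STEPS))
```

With these numbers the rate ramps to 4e-05 by step 10, is back to half of the peak at step 125 (the midpoint of the decay phase), and reaches zero at the final step.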

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 3.4056        | 0.0336 | 1    | 4.5655          |
| 3.9338        | 0.2689 | 8    | 4.2118          |
| 1.4716        | 0.5378 | 16   | 2.0672          |
| 0.4684        | 0.8067 | 24   | 1.0214          |
| 0.0732        | 1.0672 | 32   | 0.4799          |
| 0.081         | 1.3361 | 40   | 0.0248          |
| 0.0064        | 1.6050 | 48   | 0.0024          |
| 0.0013        | 1.8739 | 56   | 0.0014          |
| 0.0004        | 2.1345 | 64   | 0.0003          |
| 0.0003        | 2.4034 | 72   | 0.0003          |
| 0.0002        | 2.6723 | 80   | 0.0005          |
| 0.0001        | 2.9412 | 88   | 0.0001          |
| 0.0001        | 3.2017 | 96   | 0.0001          |
| 0.0001        | 3.4706 | 104  | 0.0001          |
| 0.0002        | 3.7395 | 112  | 0.0001          |
| 0.0001        | 4.0    | 120  | 0.0001          |
| 0.0001        | 4.2689 | 128  | 0.0001          |
| 0.0001        | 4.5378 | 136  | 0.0001          |
| 0.0001        | 4.8067 | 144  | 0.0001          |
| 0.0001        | 5.0672 | 152  | 0.0001          |
| 0.0001        | 5.3361 | 160  | 0.0001          |
| 0.0001        | 5.6050 | 168  | 0.0001          |
| 0.0001        | 5.8739 | 176  | 0.0001          |
| 0.0001        | 6.1345 | 184  | 0.0001          |
| 0.0001        | 6.4034 | 192  | 0.0001          |
| 0.0           | 6.6723 | 200  | 0.0001          |
| 0.0           | 6.9412 | 208  | 0.0001          |
| 0.0001        | 7.2017 | 216  | 0.0001          |
| 0.0001        | 7.4706 | 224  | 0.0001          |
| 0.0001        | 7.7395 | 232  | 0.0001          |
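
The fractional Epoch column can be reconciled with the Step column. The sketch below assumes the epoch is logged as micro-batches seen divided by micro-batches per epoch (as the Hugging Face Trainer does) and that each GPU sees 119 micro-batches per epoch. That 119 is inferred from the table (step 1 gives 0.0336, which is 4/119) and is not stated anywhere in the card. With `gradient_accumulation_steps: 4`, the last optimizer step of each epoch then covers only 3 micro-batches, which gives 30 optimizer steps per epoch.

```python
import math

# Inferred from the logged Epoch column (step 1 -> 0.0336 ~= 4/119); an assumption.
MICRO_BATCHES_PER_EPOCH = 119
GRAD_ACCUM_STEPS = 4  # gradient_accumulation_steps from the config
NUM_EPOCHS = 8

# The final, partial accumulation window still triggers an optimizer step.
STEPS_PER_EPOCH = math.ceil(MICRO_BATCHES_PER_EPOCH / GRAD_ACCUM_STEPS)  # 30


def epoch_at(step: int) -> float:
    """Fractional epoch after `step` optimizer steps, rounded like the table."""
    full_epochs, steps_into_epoch = divmod(step, STEPS_PER_EPOCH)
    micro_batches = min(steps_into_epoch * GRAD_ACCUM_STEPS, MICRO_BATCHES_PER_EPOCH)
    return round(full_epochs + micro_batches / MICRO_BATCHES_PER_EPOCH, 4)


for step in (1, 8, 32, 120, 232):
    print(step, epoch_at(step))
print("total optimizer steps:", STEPS_PER_EPOCH * NUM_EPOCHS)
```

Under this assumption every Epoch/Step pair in the table is reproduced, and the full run would span 240 optimizer steps.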


### Framework versions

- PEFT 0.15.2
- Transformers 4.51.3
- Pytorch 2.7.0+cu128
- Datasets 3.5.0
- Tokenizers 0.21.1