
Table of Contents

  1. TL;DR
  2. Model Details
  3. Training Details
  4. Usage
  5. Evaluation
  6. Citation

TL;DR

Falcon-E is a series of powerful, universal and fine-tunable 1.58-bit language models.

Model Details

Model Description

  • Developed by: https://www.tii.ae
  • Model type: Causal decoder-only / Instruct version
  • Architecture: Pure transformer - 1.58-bit version
  • Language(s) (NLP): English
  • License: Falcon-LLM License

Training Details

For more details about the training protocol of this model, please refer to the Falcon-E technical blogpost: https://falcon-lm.github.io/blog/falcon-edge.

Usage

Currently, you can use this model either with the Hugging Face transformers library or with Microsoft's BitNet library. There are multiple ways to interact with the model depending on your target usage. Each model in the Falcon-E series comes in three variants: the BitNet model, the prequantized checkpoint for fine-tuning, and the bfloat16 version of the BitNet model, as sketched below.
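
For reference, the three variants map to Hugging Face revisions as follows. This is a minimal sketch summarizing the loading calls detailed in the sections below: the default branch holds the BitNet checkpoint, while the bfloat16 and prequantized variants are selected with the revision argument.

import torch
from transformers import AutoModelForCausalLM

model_id = "tiiuae/Falcon-E-3B-Instruct"

# Default branch: the 1.58-bit BitNet checkpoint, ready for inference
bitnet_model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# revision="bfloat16": the classic bfloat16 counterpart of the BitNet model
bf16_model = AutoModelForCausalLM.from_pretrained(model_id, revision="bfloat16", torch_dtype=torch.bfloat16)

# revision="prequantized": the checkpoint intended for fine-tuning with onebitllms
prequantized_model = AutoModelForCausalLM.from_pretrained(model_id, revision="prequantized", torch_dtype=torch.bfloat16)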

Inference

πŸ€— transformers

To perform inference with the BitNet checkpoint, run:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon-E-3B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
  model_id,
  torch_dtype=torch.bfloat16,
).to("cuda")

# Perform text generation
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Explain the Falcon-E model family in one paragraph.", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

If you would rather use the classic bfloat16 version of the model, run:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon-E-3B-Instruct"
revision = "bfloat16"

model = AutoModelForCausalLM.from_pretrained(
  model_id,
  torch_dtype=torch.bfloat16,
  revision=revision,
).to("cuda")

# Text generation then works exactly as in the BitNet example above
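
For the Instruct checkpoints, prompts are typically formatted with the tokenizer's chat template before generation. Below is a minimal sketch using the standard transformers apply_chat_template API; it assumes the model and tokenizer have been loaded as in the examples above and that the tokenizer ships a chat template.

messages = [{"role": "user", "content": "Explain 1.58-bit quantization in one sentence."}]

# Build the chat-formatted input ids and move them to the model's device
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))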

BitNet

git clone https://github.com/microsoft/BitNet && cd BitNet
pip install -r requirements.txt
python setup_env.py --hf-repo tiiuae/Falcon-E-3B-Instruct -q i2_s
python run_inference.py -m models/Falcon-E-3B-Instruct/ggml-model-i2_s.gguf -p "You are a helpful assistant" -cnv

Fine-tuning

To fine-tune the model, load the prequantized revision and use the onebitllms Python package:

import torch

from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer
from onebitllms import replace_linear_with_bitnet_linear, quantize_to_1bit

model_id = "tiiuae/Falcon-E-3B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id, revision="prequantized")
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    revision="prequantized",
)
# Replace the model's linear layers with BitNet linear layers before training
model = replace_linear_with_bitnet_linear(model)

trainer = SFTTrainer(
    model,
    ...  # pass your dataset and training arguments here
)

trainer.train()

# Quantize the fine-tuned checkpoint saved in `output_directory` to the final 1.58-bit format
quantize_to_1bit(output_directory)
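
Once quantized, the fine-tuned checkpoint can be loaded back for inference in the same way as the released BitNet model. A minimal sketch, assuming the quantized checkpoint was written to a hypothetical local directory ./falcon-e-3b-finetuned:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical path: wherever the quantized fine-tuned checkpoint was saved
checkpoint_dir = "./falcon-e-3b-finetuned"

tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint_dir,
    torch_dtype=torch.bfloat16,
).to("cuda")

inputs = tokenizer("Hello!", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))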

Evaluation

The following tables report our internal pipeline benchmarks.

Note: the evaluation results are normalized scores from the tasks of the former Hugging Face Open LLM Leaderboard v2.

For 1B scale models and below

| Model | # Params | Mem Footprint | IFEVAL | Math-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| Qwen-2.5-0.5B | 0.5B | 1GB | 16.27 | 3.93 | 0.0 | 2.08 | 6.95 | 10.06 | 6.55 |
| SmolLM2-360M | 0.36B | 720MB | 21.15 | 1.21 | 0.0 | 7.73 | 5.54 | 1.88 | 6.25 |
| Qwen-2.5-1.5B | 1.5B | 3.1GB | 26.74 | 9.14 | 16.66 | 5.27 | 20.61 | 4.7 | 13.85 |
| Llama-3.2-1B | 1.24B | 2.47GB | 14.78 | 1.21 | 4.37 | 2.56 | 2.26 | 0 | 4.2 |
| SmolLM2-1.7B | 1.7B | 3.4GB | 24.4 | 2.64 | 9.3 | 4.6 | 12.64 | 3.91 | 9.58 |
| Falcon-3-1B-Base | 1.5B | 3GB | 24.28 | 3.32 | 11.34 | 9.71 | 6.76 | 3.91 | 9.89 |
| Hymba-1.5B-Base | 1.5B | 3GB | 22.95 | 1.36 | 7.69 | 5.18 | 10.25 | 0.78 | 8.04 |
| Falcon-E-1B-Base | 1.8B | 635MB | 32.9 | 10.97 | 2.8 | 3.65 | 12.28 | 17.82 | 13.40 |

For 3B scale models

| Model | # Params | Mem Footprint | IFEVAL | Math-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| Falcon-3-3B-Base | 3B | 6.46GB | 15.74 | 11.78 | 21.58 | 6.27 | 18.09 | 6.26 | 15.74 |
| Qwen2.5-3B | 3B | 6.17GB | 26.9 | 14.8 | 24.3 | 11.76 | 24.48 | 6.38 | 18.1 |
| Falcon-E-3B-Base | 3B | 955MB | 36.67 | 13.45 | 8.67 | 4.14 | 19.83 | 27.16 | 18.32 |

Below are the results for instruction fine-tuned models:

For 1B scale models and below

| Model | # Params | Mem Footprint | IFEVAL | Math-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| Qwen-2.5-0.5B-Instruct | 500M | 1GB | 30.71 | 0 | 8.43 | 0.94 | 7.75 | 0 | 6.59 |
| SmolLM2-360M-Instruct | 360M | 720MB | 38.42 | 1.51 | 4.17 | 2.77 | 1.3 | 0.67 | 8.14 |
| Qwen-2.5-1.5B-Instruct | 1.5B | 3.1GB | 44.76 | 22.05 | 19.81 | 3.19 | 19.99 | 0.78 | 18.43 |
| SmolLM2-1.7B | 1.7B | 3.4GB | 53.68 | 5.82 | 10.92 | 4.1 | 11.71 | 0 | 15.02 |
| Falcon-3-1B-Instruct | 1.5B | 3GB | 55.57 | 6.34 | 12.96 | 10.56 | 9.32 | 2.24 | 16.16 |
| Hymba-1.5B-Instruct | 1.5B | 3GB | 60.09 | 2.72 | 4.59 | 1.05 | 11.56 | 5.515 | 14.19 |
| Falcon-E-1B-Instruct | 1.8B | 635MB | 54.35 | 9.12 | 16.5 | 2.51 | 19.42 | 9.64 | 18.59 |

For 3B scale models

| Model | # Params | Mem Footprint | IFEVAL | Math-Hard | GPQA | MuSR | BBH | MMLU-Pro | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| Falcon-3-3B-Instruct | 3B | 6.46GB | 69.77 | 25 | 26.29 | 11.13 | 22.28 | 5.15 | 26.6 |
| Qwen2.5-3B-Instruct | 3B | 6.17GB | 64.75 | 36.78 | 25.8 | 7.57 | 25.05 | 3.02 | 27.16 |
| Falcon-E-3B-Instruct | 3B | 955MB | 60.97 | 15.3 | 23.59 | 2.12 | 26.45 | 7.45 | 22.65 |

Useful links

  • Falcon-E technical blogpost: https://falcon-lm.github.io/blog/falcon-edge
  • BitNet inference framework: https://github.com/microsoft/BitNet
  • onebitllms Python package, used for fine-tuning the prequantized checkpoints

Citation

If the Falcon-E family of models was helpful to your work, feel free to cite it:

@misc{tiionebitllms,
    title = {Falcon-E, a series of powerful, universal and fine-tunable 1.58bit language models.},
    author = {Falcon-LLM Team},
    month = {April},
    url = {https://falcon-lm.github.io/blog/falcon-edge},
    year = {2025}
}