
Hugging Face implementation is very slow during prefill

by Jellyfish042 - opened

Why is that? Also, the generation results look very strange.

Technology Innovation Institute org

Hi @Jellyfish042
Thanks for the issue! I think you might be running the non-Mamba kernel path; we need to upstream a fix to HF transformers @DhiyaEddine.
Meanwhile, can you try running the model on GPU and make sure mamba-ssm and causal-conv1d are installed? `pip install mamba-ssm causal-conv1d`
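For reference, here is a minimal sketch (not from this thread) of how one might check that the optional Mamba kernel packages are importable and then run the model on GPU. The model ID below is a placeholder; substitute the actual Falcon-H1 checkpoint you are using.

```python
# Minimal sketch: verify the fused-kernel packages and run a short generation on GPU.
# Assumes a CUDA GPU and `pip install mamba-ssm causal-conv1d` as suggested above.
import importlib.util

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# If either package is missing, transformers falls back to the slow (non-kernel) path.
for pkg in ("mamba_ssm", "causal_conv1d"):
    if importlib.util.find_spec(pkg) is None:
        print(f"{pkg} not found -- the slow prefill path will be used")

model_id = "tiiuae/Falcon-H1-0.5B-Instruct"  # placeholder: use your checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to("cuda")

inputs = tokenizer("Hello, Falcon-H1!", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```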

Technology Innovation Institute org

Already on it!
