Hugging Face implementation is very slow during prefill
#2, opened by Jellyfish042
Why is that? The results also look very strange.
Hi @Jellyfish042, thanks for the issue! I think you might be running the non-Mamba-kernel path; we need to upstream a fix to HF transformers @DhiyaEddine
Meanwhile, can you try running the model on a GPU and make sure `mamba-ssm` and `causal-conv1d` are installed? `pip install mamba-ssm causal-conv1d`
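Before re-running, it may help to confirm that the environment can actually use the fast path. The sketch below assumes the fast path needs the `mamba_ssm` and `causal_conv1d` packages importable plus a CUDA GPU visible to PyTorch. The helper name `fast_path_status` is hypothetical, and the exact checks transformers performs to pick a kernel path may differ. It only inspects the environment and does not load the model.

```python
import importlib.util


def fast_path_status() -> dict:
    """Hypothetical helper: report whether fast-path prerequisites look available.

    Assumption: the fused Mamba kernels need `mamba_ssm`, `causal_conv1d`,
    and a CUDA device; transformers' own checks may be stricter or different.
    """
    # find_spec returns None (without raising) when a top-level package is absent.
    status = {
        name: importlib.util.find_spec(name) is not None
        for name in ("torch", "mamba_ssm", "causal_conv1d")
    }
    cuda = False
    if status["torch"]:
        import torch  # only imported if it is installed

        cuda = torch.cuda.is_available()
    status["cuda"] = cuda
    # All three conditions must hold for the kernel path to be usable at all.
    status["fast_path_possible"] = (
        status["mamba_ssm"] and status["causal_conv1d"] and cuda
    )
    return status


if __name__ == "__main__":
    for key, value in fast_path_status().items():
        print(f"{key}: {value}")
```

If `fast_path_possible` prints `False`, the model is likely falling back to the slower non-kernel implementation during prefill.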
Already on it!