
Hugging Face implementation is very slow during prefill

by Jellyfish042 - opened

Why is that? Also, the generation results look very strange.

Technology Innovation Institute org

Hi @Jellyfish042
Thanks for the issue! I think you might be running the non-Mamba kernel path; we need to upstream a fix to HF transformers @DhiyaEddine.
Meanwhile, can you try running the model on GPU and make sure mamba-ssm and causal-conv1d are installed? `pip install mamba-ssm causal-conv1d`
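For reference, here is a minimal sketch (not from this thread) of how one might check that the optional Mamba kernel packages are importable and then run the model on GPU. The model ID below is a placeholder; substitute the actual Falcon-H1 checkpoint you are using.

```python
# Minimal sketch: verify the fused-kernel packages and run a short generation on GPU.
# Assumes a CUDA GPU and `pip install mamba-ssm causal-conv1d` as suggested above.
import importlib.util

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# If either package is missing, transformers falls back to the slow (non-kernel) path.
for pkg in ("mamba_ssm", "causal_conv1d"):
    if importlib.util.find_spec(pkg) is None:
        print(f"{pkg} not found -- the slow prefill path will be used")

model_id = "tiiuae/Falcon-H1-0.5B-Instruct"  # placeholder: use your checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to("cuda")

inputs = tokenizer("Hello, Falcon-H1!", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```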

Technology Innovation Institute org

Already on it!
