llama_cpp_python: gguf_init_from_file_impl: failed to read tensor info

#14 opened by miscw
.../Python/ai_test $ python main.py
gguf_init_from_file_impl: tensor 'blk.0.ffn_down.weight' of type 36 (TYPE_IQ4_NL_4_4 REMOVED, use IQ4_NL with runtime repacking) has 6912 elements per row, not a multiple of block size (0)
gguf_init_from_file_impl: failed to read tensor info
llama_model_load: error loading model: llama_model_loader: failed to load model from /data/data/com.termux/files/home/.cache/huggingface/hub/models--microsoft--bitnet-b1.58-2B-4T-gguf/snapshots/0f9a32c738e25e05b399303a54e59c9826a35b36/./ggml-model-i2_s.gguf
llama_model_load_from_file_impl: failed to load model
Traceback (most recent call last):
  File "/storage/emulated/0/Python/ai_test/main.py", line 4, in <module>
    llm = Llama.from_pretrained(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/data/data/com.termux/files/usr/lib/python3.12/site-packages/llama_cpp/llama.py", line 2357, in from_pretrained
    return cls(
           ^^^^
  File "/data/data/com.termux/files/usr/lib/python3.12/site-packages/llama_cpp/llama.py", line 372, in __init__
    internals.LlamaModel(
  File "/data/data/com.termux/files/usr/lib/python3.12/site-packages/llama_cpp/_internals.py", line 56, in __init__
    raise ValueError(f"Failed to load model from file: {path_model}")
ValueError: Failed to load model from file: /data/data/com.termux/files/home/.cache/huggingface/hub/models--microsoft--bitnet-b1.58-2B-4T-gguf/snapshots/0f9a32c738e25e05b399303a54e59c9826a35b36/./ggml-model-i2_s.gguf
.../Python/ai_test $
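This is not a corrupted download. ggml-model-i2_s.gguf is quantized with I2_S, a format implemented in Microsoft's bitnet.cpp fork rather than in mainline llama.cpp (which llama-cpp-python wraps). The file tags its tensors with type id 36, and mainline ggml reserves that id for the removed TYPE_IQ4_NL_4_4 placeholder (block size 0), hence the confusing "use IQ4_NL with runtime repacking" message, which is a red herring here. You can confirm the tensor types without loading the model by reading only the GGUF metadata. A minimal sketch, assuming the gguf package from the llama.cpp project (pip install gguf); the path is the one from the traceback:

from gguf import GGUFReader  # pip install gguf

MODEL_PATH = (
    "/data/data/com.termux/files/home/.cache/huggingface/hub/"
    "models--microsoft--bitnet-b1.58-2B-4T-gguf/snapshots/"
    "0f9a32c738e25e05b399303a54e59c9826a35b36/ggml-model-i2_s.gguf"
)

try:
    reader = GGUFReader(MODEL_PATH)
    # Each entry is a ReaderTensor; tensor_type is the ggml quantization enum.
    for tensor in reader.tensors:
        print(tensor.name, tensor.tensor_type)
except ValueError as exc:
    # A tensor-type id that gguf-py does not recognize may raise while
    # parsing, which itself confirms the file is not a mainline GGUF.
    print(f"unsupported GGUF: {exc}")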

The Python snippet that triggers the error:

from llama_cpp import Llama

# Load the model from Hugging Face repo (auto-downloads .gguf)
llm = Llama.from_pretrained(
    repo_id="microsoft/bitnet-b1.58-2B-4T-gguf",
    filename="ggml-model-i2_s.gguf",
    n_ctx=512,
    verbose=False,
)

# ANSI styles
GREEN = "\033[92m"
MAGENTA = "\033[95m"
RESET = "\033[0m"

# Chat loop
print(f"{MAGENTA}BitNet 2B ChatBot Ready. Type 'exit' to quit.{RESET}")
while True:
    try:
        user_input = input(f"{GREEN}You > {RESET}").strip()
        if user_input.lower() in ("exit", "quit"):
            break

        output = llm(f"User: {user_input}\nAssistant:", max_tokens=200)
        reply = output["choices"][0]["text"].strip()
        print(f"{MAGENTA}Bot > {reply}{RESET}\n")

    except (KeyboardInterrupt, EOFError):  # Ctrl-C / Ctrl-D both exit cleanly
        break
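Two ways out: run the i2_s file with bitnet.cpp, the runtime the model card targets, or load a GGUF in a quantization that mainline llama.cpp does support (its ternary types TQ1_0/TQ2_0 were added for BitNet-style weights). A minimal sketch of the fallback approach, using only the from_pretrained arguments already shown above; the fallback repo_id and filename are placeholders for illustration, not files known to exist:

from llama_cpp import Llama

def load_model() -> Llama:
    try:
        # The i2_s quantization is bitnet.cpp-specific; mainline
        # llama-cpp-python rejects it with the ValueError shown above.
        return Llama.from_pretrained(
            repo_id="microsoft/bitnet-b1.58-2B-4T-gguf",
            filename="ggml-model-i2_s.gguf",
            n_ctx=512,
            verbose=False,
        )
    except ValueError:
        # PLACEHOLDER repo/filename: substitute a GGUF in a quantization
        # that mainline llama.cpp supports (e.g. Q4_K_M, TQ1_0, TQ2_0).
        return Llama.from_pretrained(
            repo_id="your-org/your-model-gguf",
            filename="model-Q4_K_M.gguf",
            n_ctx=512,
            verbose=False,
        )

llm = load_model()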
