llama_cpp_python: gguf_init_from_file_impl: failed to read tensor info
#14
opened by miscw
.../Python/ai_test $ python main.py
gguf_init_from_file_impl: tensor 'blk.0.ffn_down.weight' of type 36 (TYPE_IQ4_NL_4_4 REMOVED, use IQ4_NL with runtime repacking) has 6912 elements per row, not a multiple of block size (0)
gguf_init_from_file_impl: failed to read tensor info
llama_model_load: error loading model: llama_model_loader: failed to load model from /data/data/com.termux/files/home/.cache/huggingface/hub/models--microsoft--bitnet-b1.58-2B-4T-gguf/snapshots/0f9a32c738e25e05b399303a54e59c9826a35b36/./ggml-model-i2_s.gguf
llama_model_load_from_file_impl: failed to load model
Traceback (most recent call last):
  File "/storage/emulated/0/Python/ai_test/main.py", line 4, in <module>
    llm = Llama.from_pretrained(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/data/data/com.termux/files/usr/lib/python3.12/site-packages/llama_cpp/llama.py", line 2357, in from_pretrained
    return cls(
           ^^^^
  File "/data/data/com.termux/files/usr/lib/python3.12/site-packages/llama_cpp/llama.py", line 372, in __init__
    internals.LlamaModel(
  File "/data/data/com.termux/files/usr/lib/python3.12/site-packages/llama_cpp/_internals.py", line 56, in __init__
    raise ValueError(f"Failed to load model from file: {path_model}")
ValueError: Failed to load model from file: /data/data/com.termux/files/home/.cache/huggingface/hub/models--microsoft--bitnet-b1.58-2B-4T-gguf/snapshots/0f9a32c738e25e05b399303a54e59c9826a35b36/./ggml-model-i2_s.gguf
.../Python/ai_test $
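The key line is the first one: tensor blk.0.ffn_down.weight is stored as type 36, which the ggml bundled with this build maps to the removed TYPE_IQ4_NL_4_4 slot, so gguf_init_from_file_impl rejects the file before the model ever loads. A quick way to see which quantization types the file actually contains is to read the tensor metadata with the gguf Python package. This is only a sketch, assuming pip install gguf, and reuses the cached snapshot path from the traceback:

# Sketch: list the quantization types stored in the GGUF without loading
# the model. Assumes the `gguf` package (pip install gguf); the path below
# is the cached snapshot from the traceback above.
from collections import Counter

from gguf import GGUFReader

path = (
    "/data/data/com.termux/files/home/.cache/huggingface/hub/"
    "models--microsoft--bitnet-b1.58-2B-4T-gguf/snapshots/"
    "0f9a32c738e25e05b399303a54e59c9826a35b36/ggml-model-i2_s.gguf"
)

# GGUFReader may itself raise on a tensor type its enum does not know;
# that alone would confirm the file uses a quantization this build lacks.
reader = GGUFReader(path)

# Count tensors per quantization type; an unfamiliar or removed type here
# is what makes gguf_init_from_file_impl bail out.
counts = Counter(t.tensor_type.name for t in reader.tensors)
for type_name, n in sorted(counts.items()):
    print(f"{type_name}: {n} tensors")

For this particular file, the i2_s in the filename suggests the BitNet-specific quantization, which stock llama.cpp / llama-cpp-python builds presumably do not ship.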
Python snippet (main.py):
from llama_cpp import Llama

# Load the model from the Hugging Face repo (auto-downloads the .gguf)
llm = Llama.from_pretrained(
    repo_id="microsoft/bitnet-b1.58-2B-4T-gguf",
    filename="ggml-model-i2_s.gguf",
    n_ctx=512,
    verbose=False,
)

# ANSI styles
GREEN = "\033[92m"
MAGENTA = "\033[95m"
RESET = "\033[0m"

# Chat loop
print(f"{MAGENTA}BitNet 2B ChatBot Ready. Type 'exit' to quit.{RESET}")
while True:
    try:
        user_input = input(f"{GREEN}You > {RESET}").strip()
        if user_input.lower() in ["exit", "quit"]:
            break
        output = llm(f"User: {user_input}\nAssistant:", max_tokens=200)
        reply = output["choices"][0]["text"].strip()
        print(f"{MAGENTA}Bot > {reply}{RESET}\n")
    except KeyboardInterrupt:
        break
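Not related to the failure itself, but once a compatible build loads the model, the same loop can go through llama-cpp-python's create_chat_completion API, which applies the chat template stored in the GGUF instead of the hand-rolled User:/Assistant: prompt. A minimal sketch, reusing llm and the ANSI constants from the snippet above:

# Same chat loop via create_chat_completion, which formats the prompt with
# the chat template embedded in the GGUF (when one is present). Keeping the
# running history gives the model multi-turn context.
history = [{"role": "system", "content": "You are a helpful assistant."}]

while True:
    try:
        user_input = input(f"{GREEN}You > {RESET}").strip()
        if user_input.lower() in ["exit", "quit"]:
            break
        history.append({"role": "user", "content": user_input})
        output = llm.create_chat_completion(messages=history, max_tokens=200)
        reply = output["choices"][0]["message"]["content"].strip()
        history.append({"role": "assistant", "content": reply})
        print(f"{MAGENTA}Bot > {reply}{RESET}\n")
    except KeyboardInterrupt:
        break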