```bash
git clone -b llama4 https://github.com/kIshizaki-sci/AutoAWQ.git
pip install -U transformers
pip install -e ./AutoAWQ
```

In [1]:
from awq import AutoAWQForCausalLM
import torch
import transformers
from transformers import AutoProcessor

In [2]:
print('torch version : ', torch.__version__)
print('transformers version : ', transformers.__version__)

torch version :  2.4.1+cu124
transformers version :  4.51.3
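The version strings printed above can also be checked programmatically before loading the model. The sketch below is illustrative only: `version_tuple` and `MIN_TRANSFORMERS` are hypothetical names, and the minimum of 4.51.0 is an assumption about when Llama 4 support became available in `transformers`, not something stated in this notebook.

```python
import re


def version_tuple(version: str) -> tuple[int, ...]:
    """Parse the leading numeric release segment, ignoring suffixes such as '+cu124'."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


# Assumed minimum release for Llama 4 support (hypothetical threshold).
MIN_TRANSFORMERS = (4, 51, 0)

torch_version = version_tuple("2.4.1+cu124")    # string printed by In [2]
transformers_version = version_tuple("4.51.3")  # string printed by In [2]
transformers_ok = transformers_version >= MIN_TRANSFORMERS
print(torch_version, transformers_version, transformers_ok)
```

In a live session the literal strings would be replaced by `torch.__version__` and `transformers.__version__`.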


In [3]:
quant_path = 'kishizaki-sci/Llama-4-Scout-17B-16E-Instruct-AWQ'

In [4]:
%%time
model = AutoAWQForCausalLM.from_quantized(
    quant_path,
    torch_dtype=torch.float16,
    use_cache=True,
    device_map='auto',
)
processor = AutoProcessor.from_pretrained(quant_path)



Fetching 25 files:   0%|          | 0/25 [00:00<?, ?it/s]

Replacing MoE Block...:   0%|          | 0/48 [00:00<?, ?it/s]

Replacing layers...: 100%|██████████| 48/48 [00:08<00:00,  5.66it/s]


CPU times: user 1h 38min 56s, sys: 24min 23s, total: 2h 3min 19s
Wall time: 22min 8s


In [5]:
!nvidia-smi

Sat May  3 00:13:08 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.86.15              Driver Version: 570.86.15      CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA H100 NVL                On  |   00000000:C7:00.0 Off |                    0 |
| N/A   36C    P0             92W /  310W |   60518MiB /  95830MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
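The memory figure in the `nvidia-smi` table can be turned into a number for logging or comparison across runs. This is a minimal sketch, assuming the row keeps the `usedMiB / totalMiB` layout shown above; `parse_gpu_memory` is a hypothetical helper, not part of any library.

```python
import re


def parse_gpu_memory(line: str) -> tuple[int, int]:
    """Return (used_mib, total_mib) from an nvidia-smi memory-usage row."""
    match = re.search(r"(\d+)MiB\s*/\s*(\d+)MiB", line)
    if match is None:
        raise ValueError("no 'usedMiB / totalMiB' field found")
    return int(match.group(1)), int(match.group(2))


# Row copied from the nvidia-smi output above.
row = "| N/A   36C    P0             92W /  310W |   60518MiB /  95830MiB |      0%      Default |"
used_mib, total_mib = parse_gpu_memory(row)
used_fraction = used_mib / total_mib
print(f"{used_mib} MiB of {total_mib} MiB used ({used_fraction:.1%})")
```

For the run recorded here, the loaded model occupies roughly 63% of the single H100 NVL's memory, which leaves the remainder for activations and the KV cache during generation.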
                                                

In [6]:
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What does means the torsion free in the general relativit?"},
        ]
    },
]
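When several prompts are tried in a row, the nested message structure above can be wrapped in a small helper. This is only a sketch: `make_user_message` is a hypothetical name, and it covers just the single-turn, text-only case used in this notebook.

```python
def make_user_message(text: str) -> list[dict]:
    """Wrap a plain-text prompt in the single-turn chat structure used above."""
    return [
        {
            "role": "user",
            "content": [{"type": "text", "text": text}],
        }
    ]


# Illustrative prompt; the result has the same shape as `messages`.
example_messages = make_user_message("Hello")
print(example_messages)
```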

In [7]:
# Render the chat template, tokenize it, and move the tensors to the device of the wrapped model.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.model.device)

In [8]:
%%time
outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
)

CPU times: user 1min 43s, sys: 867 ms, total: 1min 44s
Wall time: 1min 44s
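The wall time reported by `%%time` can be converted into a throughput estimate. The sketch below assumes IPython's `1min 44s` style of formatting; `wall_time_to_seconds` and `tokens_per_second` are hypothetical helpers. The notebook does not record how many tokens were actually generated, so only an upper bound derived from `max_new_tokens=2048` is computed.

```python
import re


def wall_time_to_seconds(text: str) -> float:
    """Convert an IPython-style duration such as '1min 44s' to seconds."""
    units = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 1e-3}
    total = 0.0
    for value, unit in re.findall(r"(\d+(?:\.\d+)?)\s*(h|min|ms|s)\b", text):
        total += float(value) * units[unit]
    return total


def tokens_per_second(n_new_tokens: int, wall_seconds: float) -> float:
    """Average generation throughput over one generate() call."""
    return n_new_tokens / wall_seconds


gen_seconds = wall_time_to_seconds("1min 44s")
# At most max_new_tokens=2048 tokens were produced, so this is an upper bound.
upper_bound = tokens_per_second(2048, gen_seconds)
print(f"{gen_seconds:.0f} s, at most {upper_bound:.1f} tokens/s")
```

In a live session the real count is `outputs.shape[-1] - inputs["input_ids"].shape[-1]`, which can be passed in place of 2048 to get the actual rate.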


In [9]:
# Slice off the prompt tokens so that only the newly generated tokens are decoded.
response = processor.batch_decode(outputs[:, inputs["input_ids"].shape[-1]:])[0]
print(response)

A question from the realm of differential geometry and general relativity!

In general relativity, "torsion-free" refers to a property of a connection on a manifold, specifically in the context of Riemannian geometry.

**Torsion** is a measure of how much a connection "twists" or "turns" a vector as it is parallel-transported around a closed loop. In other words, it's a measure of how much the connection deviates from being "flat" or "Euclidean".

A **torsion-free connection**, also known as a **symmetric connection**, is a connection that has zero torsion. This means that when you parallel-transport a vector around a closed loop, it returns to its original orientation, without any twisting or turning.

In mathematical terms, a torsion-free connection satisfies the following condition:

$$\Gamma^i_{jk} = \Gamma^i_{kj}$$

where $\Gamma^i_{jk}$ are the Christoffel symbols of the second kind, which define the connection.

In general relativity, the Levi-Civita connection is a fundamental 