Models: 1,652

Active filters: compressed-tensors

nm-testing/Meta-Llama-3-8B-FP8-compressed-tensors-test

Text Generation • Updated Oct 9, 2024 • 4.66k

nm-testing/Meta-Llama-3-8B-FP8-compressed-tensors-test-bos

Text Generation • Updated Oct 9, 2024 • 15

nm-testing/TinyLlama-1.1B-compressed-tensors-kv-cache-scheme

Text Generation • Updated Oct 9, 2024 • 46.3k

nm-testing/Meta-Llama-3-8B-Instruct-W4A16-compressed-tensors-test

Text Generation • Updated Oct 9, 2024 • 17

RedHatAI/Phi-3-mini-128k-instruct-quantized.w8a16

Text Generation • Updated Oct 9, 2024 • 21

RedHatAI/Phi-3-medium-128k-instruct-quantized.w8a16

Text Generation • Updated Oct 9, 2024 • 20 • 2

nm-testing/Qwen2-0.5B-Instruct

Text Generation • Updated Oct 9, 2024 • 24

RedHatAI/Llama-2-7b-chat-quantized.w8a8

Text Generation • Updated Oct 9, 2024 • 1.02k • 1

RedHatAI/Meta-Llama-3-8B-Instruct-quantized.w8a8

Text Generation • Updated Oct 9, 2024 • 4.87k • 2

RedHatAI/Phi-3-mini-128k-instruct-quantized.w8a8

Text Generation • Updated Oct 9, 2024 • 94

RedHatAI/Phi-3-medium-128k-instruct-quantized.w8a8

Text Generation • Updated Oct 9, 2024 • 18 • 2

RedHatAI/Qwen2-1.5B-Instruct-quantized.w8a8

Text Generation • Updated Oct 9, 2024 • 939

nm-testing/Qwen2-1.5B-Instruct-W8A16-Channelwise

Text Generation • Updated Oct 9, 2024 • 16

RedHatAI/Phi-3-mini-128k-instruct-quantized.w4a16

Text Generation • Updated Oct 9, 2024 • 18 • 1

RedHatAI/Qwen2-0.5B-Instruct-quantized.w8a8

Text Generation • Updated Oct 9, 2024 • 708

RedHatAI/Phi-3-medium-128k-instruct-quantized.w4a16

Text Generation • Updated Oct 9, 2024 • 5.08k • 3

RedHatAI/Qwen2-7B-Instruct-quantized.w8a8

Text Generation • Updated Oct 9, 2024 • 74

nm-testing/DeepSeek-Coder-V2-Lite-Instruct-FP8

Text Generation • Updated Feb 13 • 2.5k

RedHatAI/Meta-Llama-3-70B-Instruct-quantized.w8a8

Text Generation • Updated Oct 9, 2024 • 12

RedHatAI/Qwen2-72B-Instruct-quantized.w8a8

Text Generation • Updated Oct 9, 2024 • 284 • 1

nm-testing/Meta-Llama-3-8B-Instruct-FP8-K-V

Text Generation • Updated Oct 9, 2024 • 23

nm-testing/Meta-Llama-3-8B-Instruct-W8A8-FP8-Channelwise-compressed-tensors

Text Generation • Updated Oct 9, 2024 • 749

nm-testing/Meta-Llama-3-8B-Instruct-Non-Uniform-compressed-tensors

Text Generation • Updated Oct 9, 2024 • 13

nm-testing/nonuniform

Text Generation • Updated Oct 9, 2024 • 14

nm-testing/Meta-Llama-3-8B-Instruct-nonuniform-test

Text Generation • Updated Oct 9, 2024 • 1.6k

RedHatAI/Mistral-7B-Instruct-v0.3-quantized.w8a8

Text Generation • Updated Oct 9, 2024 • 36 • 2

nm-testing/Meta-Llama-3-405B-Instruct-Up-Merge-fp8

Text Generation • Updated Oct 9, 2024 • 14 • 4

nm-testing/Meta-llama3-8b-Instruct-SmoothQuant-Fp8

Text Generation • Updated Oct 9, 2024 • 20

nm-testing/Meta-llama3-8b-Instruct-quant-FP8

Text Generation • Updated Oct 9, 2024 • 17

nm-testing/Llama-2-70b-chat-hf-W8A8-Dynamic-Per-Token

Text Generation • Updated Oct 9, 2024 • 11