Models from the inference-optimization org containing quantization bit info:

- meta-llama/Llama-3.1-8B-Instruct
- RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8-dynamic
- RedHatAI/Llama-3.1-8B-Instruct-NVFP4
- inference-optimization/Llama-3.1-8B-Instruct_5_bits_mode_hybrid
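
To enumerate these models programmatically, the sketch below uses the `huggingface_hub` client library to list everything under the org and pull the bit width out of model names such as `Llama-3.1-8B-Instruct_5_bits_mode_hybrid`. The regex is an illustrative assumption about that naming scheme, not a documented convention.

```python
import re
from huggingface_hub import list_models

# List all models published under the inference-optimization org.
for model in list_models(author="inference-optimization"):
    # Assumed naming scheme: "..._<N>_bits..." encodes the quantization bit width,
    # e.g. "Llama-3.1-8B-Instruct_5_bits_mode_hybrid" -> 5 bits.
    match = re.search(r"_(\d+)_bits", model.id)
    bits = int(match.group(1)) if match else None
    print(model.id, bits)
```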