
GGUF - Hugging Face
Hugging Face Hub supports all file formats, but has built-in features for the GGUF format, a binary format optimized for quick loading and saving of models, making it highly efficient for …
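As a quick illustration of that built-in support, here is a minimal sketch of fetching a single GGUF file from the Hub with huggingface_hub; the repo id and filename are illustrative placeholders, not taken from the result above.

```python
from huggingface_hub import hf_hub_download

# Download one .gguf file from a Hub repo into the local cache.
# Repo id and filename are hypothetical examples.
local_path = hf_hub_download(
    repo_id="TheBloke/Llama-2-7B-GGUF",
    filename="llama-2-7b.Q4_K_M.gguf",
)
print(local_path)  # path to the cached .gguf file on disk
```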
For those who don't know what different model formats (GGUF ... - Reddit
GGUF models can be executed solely on a CPU or partially/fully offloaded to a GPU. With K-quants, GGUF quantization can range from 2 bits to 8 bits. Previously, GPTQ served as a GPU-only …
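A minimal sketch of that CPU-vs-GPU offload choice using llama-cpp-python; the model path is a placeholder, and the layer count is just an example.

```python
from llama_cpp import Llama

# n_gpu_layers=0  -> run entirely on the CPU
# n_gpu_layers=20 -> offload the first 20 transformer layers to the GPU
# n_gpu_layers=-1 -> offload as many layers as possible
llm = Llama(model_path="model.Q4_K_M.gguf", n_gpu_layers=20)

out = llm("Q: What is GGUF? A:", max_tokens=64)
print(out["choices"][0]["text"])
```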
bartowski/DeepSeek-R1-GGUF - Hugging Face
Legacy format, similar performance to Q4_K_S but with improved tokens/watt on Apple silicon. Good quality, default size for most use cases, recommended. Slightly lower quality with more …
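Since these quant levels trade file size against quality, a back-of-the-envelope size estimate from bits per weight (bpw) can help pick one; the bpw figures below are approximate values commonly quoted for llama.cpp quant types, not exact on-disk numbers.

```python
# Rough file-size estimate: parameters * bits-per-weight / 8.
approx_bpw = {"Q2_K": 2.6, "Q4_K_S": 4.6, "Q4_K_M": 4.8, "Q8_0": 8.5}

params = 7e9  # a 7B-parameter model, for illustration
for quant, bpw in approx_bpw.items():
    gib = params * bpw / 8 / 1024**3
    print(f"{quant}: ~{gib:.1f} GiB")
```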
Quantizing LLM to GGML or GGUF Format: A Comprehensive Guide #4068 - GitHub
Nov 13, 2023 · Yes, I would like to know what main techniques are used for quantization in the GGML or GGUF format. For example, GPTQ quantizes values by calibrating against a dataset to minimize …
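For contrast with GPTQ's calibration step, the classic GGML/GGUF "type-0" quants use plain round-to-nearest block quantization with no calibration data at all. A minimal numpy sketch of that idea follows; it mimics the spirit of q4_0 but not the exact llama.cpp memory layout.

```python
import numpy as np

def quantize_q4_0_like(weights, block_size=32):
    """Round-to-nearest 4-bit block quantization: one scale per block,
    signed integer codes in [-8, 7], no calibration dataset needed.
    Illustrative sketch only, not the real llama.cpp kernel."""
    blocks = weights.reshape(-1, block_size)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0                      # avoid division by zero
    codes = np.clip(np.round(blocks / scales), -8, 7).astype(np.int8)
    return codes, scales

def dequantize(codes, scales):
    return (codes * scales).reshape(-1)

w = np.random.randn(64).astype(np.float32)
codes, scales = quantize_q4_0_like(w)
print("max abs error:", np.abs(dequantize(codes, scales) - w).max())
```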
Flux GGUF - Civitai
Mar 1, 2025 · Ideal for research or applications where output quality is paramount. Offers near-original quality with some performance gains. Good for applications requiring high accuracy …
Which Quantization Method is Right for You? (GPTQ vs. GGUF vs.
Nov 13, 2023 · GPTQ is a post-training quantization (PTQ) method for 4-bit quantization that focuses primarily on GPU inference and performance. The idea behind the method is that it …
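In the transformers ecosystem, GPTQ quantization of this kind can be driven through GPTQConfig, assuming the optimum/auto-gptq backend is installed; the model id below is a small illustrative choice, not one from the article.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "facebook/opt-125m"  # small example model
tokenizer = AutoTokenizer.from_pretrained(model_id)

# GPTQ calibrates against a dataset ("c4" here) to minimize layer-wise
# quantization error, which is the key difference from round-to-nearest.
config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=config, device_map="auto"
)
```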
Overview of GGUF quantization methods : r/LocalLLaMA - Reddit
Mar 9, 2024 · In case anyone finds it helpful, here is what I found and how I understand the current state. TL;DR: K-quants are not obsolete: depending on your HW, they may run faster …
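Because which quant is fastest really does depend on your hardware, a crude local benchmark settles it better than any rule of thumb; here is a sketch using llama-cpp-python, with placeholder paths for two quantizations of the same model (throughput is approximate, since generation may stop early at an end-of-sequence token).

```python
import time
from llama_cpp import Llama

for path in ["model.Q4_0.gguf", "model.Q4_K_M.gguf"]:  # placeholder files
    llm = Llama(model_path=path, n_gpu_layers=-1, verbose=False)
    t0 = time.perf_counter()
    llm("Once upon a time", max_tokens=128)
    dt = time.perf_counter() - t0
    print(f"{path}: ~{128 / dt:.1f} tokens/s (rough)")
```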
Explanation of the 14 Different Quantization Modes of GGML or GGUF - CSDN Blog
Feb 15, 2025 · They follow a specific naming convention: "q" + the number of bits used to store the weights (precision) + a specific variant. Below is a list of all possible quantization methods and their corresponding use cases, based on the descriptions in the model cards made by TheBloke, …
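That naming convention is regular enough to parse mechanically; a tiny self-contained sketch:

```python
import re

def parse_quant_name(name: str):
    """Split a quant name like Q4_K_M into (bits, variant)."""
    m = re.fullmatch(r"[qQ](\d+)(?:_(.+))?", name)
    if m is None:
        raise ValueError(f"not a quant name: {name!r}")
    return int(m.group(1)), m.group(2) or ""

for n in ["Q4_0", "Q4_K_M", "Q8_0", "Q2_K"]:
    print(n, "->", parse_quant_name(n))  # e.g. Q4_K_M -> (4, 'K_M')
```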
unsloth/DeepSeek-V3-GGUF - Hugging Face
See our collection for versions of DeepSeek V3 including bf16 and original formats. Q2_K_XS should run OK in ~40 GB of combined CPU RAM / GPU VRAM with automatic llama.cpp offloading. Do not …
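For a multi-file repo like this, it helps to download only the quantization you want; a sketch with snapshot_download, where the glob pattern is an assumption about how the Q2_K_XS shards are named.

```python
from huggingface_hub import snapshot_download

# Fetch only the Q2_K_XS files, skipping the other, larger quants.
snapshot_download(
    repo_id="unsloth/DeepSeek-V3-GGUF",
    allow_patterns=["*Q2_K_XS*"],  # assumed filename pattern
)
```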
The difference between quantization methods for the same bits
Jul 25, 2023 · Using GGML quantized models, let's say we are talking about 4-bit. I see a lot of versions suffixed with 0, 1, k_s, or k_m. I understand that the difference is in the way …
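Roughly, the 0-suffix ("type-0") variants store one scale per block while the 1-suffix ("type-1") variants store a scale plus a minimum, so blocks whose values sit far from zero are covered more tightly; the k_s/k_m K-quants go further with super-blocks and mixed bit widths, which the sketch below does not attempt. A minimal numpy comparison of the first two ideas:

```python
import numpy as np

def deq_q4_0_like(block):
    # scale-only ("type-0"): signed codes in [-8, 7], one scale per block
    scale = np.abs(block).max() / 7.0 or 1.0
    return np.clip(np.round(block / scale), -8, 7) * scale

def deq_q4_1_like(block):
    # scale-plus-min ("type-1"): unsigned codes in [0, 15] plus an offset,
    # so a block that never straddles zero wastes fewer codes
    lo = block.min()
    scale = (block.max() - lo) / 15.0 or 1.0
    return np.clip(np.round((block - lo) / scale), 0, 15) * scale + lo

block = np.random.randn(32).astype(np.float32) + 3.0  # values far from zero
for name, deq in [("q4_0-style", deq_q4_0_like), ("q4_1-style", deq_q4_1_like)]:
    print(name, "max error:", np.abs(deq(block) - block).max())
```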