Insights: ggml-org/llama.cpp

Overview
44 Releases published by 1 person
- b5949 published Jul 21, 2025
- b5950 published Jul 21, 2025
- b5952 published Jul 21, 2025
- b5953 published Jul 21, 2025
- b5954 published Jul 21, 2025
- b5956 published Jul 22, 2025
- b5957 published Jul 22, 2025
- b5958 published Jul 22, 2025
- b5959 published Jul 22, 2025
- b5960 published Jul 22, 2025
- b5961 published Jul 22, 2025
- b5962 published Jul 22, 2025
- b5963 published Jul 22, 2025
- b5965 published Jul 23, 2025
- b5966 published Jul 23, 2025
- b5967 published Jul 23, 2025
- b5968 published Jul 23, 2025
- b5970 published Jul 23, 2025
- b5972 published Jul 23, 2025
- b5973 published Jul 23, 2025
- b5975 published Jul 24, 2025
- b5976 published Jul 24, 2025
- b5978 published Jul 24, 2025
- b5979 published Jul 24, 2025
- b5980 published Jul 24, 2025
- b5981 published Jul 24, 2025
- b5984 published Jul 24, 2025
- b5985 published Jul 24, 2025
- b5986 published Jul 25, 2025
- b5987 published Jul 25, 2025
- b5988 published Jul 25, 2025
- b5989 published Jul 25, 2025
- b5990 published Jul 25, 2025
- b5992 published Jul 25, 2025
- b5993 published Jul 25, 2025
- b5994 published Jul 25, 2025
- b5995 published Jul 26, 2025
- b5996 published Jul 26, 2025
- b5997 published Jul 26, 2025
- b5998 published Jul 27, 2025
- b5999 published Jul 27, 2025
- b6000 published Jul 27, 2025
- b6001 published Jul 27, 2025
- b6002 published Jul 27, 2025
58 Pull requests merged by 39 people
- ops : update Metal (#14912) merged Jul 28, 2025
- sync : ggml (#14911) merged Jul 28, 2025
- quantize: update README.md (#14905) merged Jul 27, 2025
- Vulkan: add ops docs (#14900) merged Jul 27, 2025
- SYCL: add ops doc (#14901) merged Jul 27, 2025
- llama : clarify comment about pp and tg graphs [no ci] (#14895) merged Jul 27, 2025
- vulkan : add fp16 support for the conv_2d kernel (#14872) merged Jul 27, 2025
- vulkan: skip empty set_rows to avoid invalid API usage (#14860) merged Jul 27, 2025
- make rope_yarn_log_mul optional for deepseek2 (#14896) merged Jul 27, 2025
- Fix kq_scale for the attention layers of PLaMo2 (#14892) merged Jul 27, 2025
- Docs: add instructions in ops.md + simplify backend csv (#14889) merged Jul 27, 2025
- HIP: Enable Matrix cores for MMQ Kernels, Enable stream-K for CDNA 3 (#14624) merged Jul 26, 2025
- CANN: Implement GLU ops (#14884) merged Jul 26, 2025
- musa: fix build warnings (unused variable) (#14869) merged Jul 26, 2025
- ggml-cpu: disable GGML_NNPA by default due to instability (#14880) merged Jul 25, 2025
- metal: SSM_SCAN performance (#14743) merged Jul 25, 2025
- opencl: add fused rms_norm_mul (#14841) merged Jul 25, 2025
- docs: update HOWTO-add-model.md for ModelBase and new model classes (#14874) merged Jul 25, 2025
- Code health: Remove invalid portPos specifiers from graph dumping to dot files (#14838) merged Jul 25, 2025
- context : restore preemptive sched reset when LLAMA_SET_ROWS=0 (#14870) merged Jul 25, 2025
- mtmd : Fix 32-bit narrowing issue in export-lora and mtmd clip (#14503) merged Jul 25, 2025
- GGML: Check for null buffers in get/set/copy tensor RPC endpoints (#14868) merged Jul 25, 2025
- sched : fix multiple evaluations of the same graph with pipeline parallelism (#14855) merged Jul 25, 2025
- musa: upgrade musa sdk to rc4.2.0 (#14498) merged Jul 24, 2025
- sync : ggml (#14858) merged Jul 24, 2025
- context : perform output reorder lazily upon access after sync (#14853) merged Jul 24, 2025
- chat : fix kimi-k2 chat template (#14852) merged Jul 24, 2025
- sycl: unified semantics of block offset calculation (#14814) merged Jul 24, 2025
- fix: restore MiniCPM inference after Granite Four changes (#14850) merged Jul 24, 2025
- docs: add libcurl-dev install hint for Linux distros (#14801) merged Jul 24, 2025
- metal : fix fusion across different encoders (#14849) merged Jul 24, 2025
- sycl: fix undefined variable in work group size check (#14843) merged Jul 24, 2025
- convert: text-only support for GLM-4.1V-9B-Thinking (#14823) merged Jul 23, 2025
- CUDA: fix overflow in FA, tune performance (#14840) merged Jul 23, 2025
- CUDA: fix compilation with GGML_CUDA_F16 (#14837) merged Jul 23, 2025
- ci : correct label refactor->refactoring (#14832) merged Jul 23, 2025
- tests : add non-cont K,V FA tests (#14756) merged Jul 23, 2025
- CUDA: fix quantized KV cache + multiple sequences (#14822) merged Jul 23, 2025
- bug fix: handle saving/loading null layers in recurrent memory (#14675) merged Jul 23, 2025
- ggml: fix loongarch quantize_row_q8_1 error (#14827) merged Jul 23, 2025
- [CANN] weight format to nz for Ascend310P3 (#14407) merged Jul 23, 2025
- CUDA: add fused rms norm (#14800) merged Jul 23, 2025
- model card yaml tab->2xspace (#14819) merged Jul 22, 2025
- vulkan: fix rms_norm_mul to handle broadcasting dim0 (#14817) merged Jul 22, 2025
- llama : add model type detection for rwkv7 7B&14B (#14816) merged Jul 22, 2025
- imatrix: add option to display importance score statistics for a given imatrix file (#12718) merged Jul 22, 2025
- Mtmd: add a way to select device for vision encoder (#14236) merged Jul 22, 2025
- cuda : implement bf16 cpy ops and enable bf16 cont (#14763) merged Jul 22, 2025
- opencl: remove unreachable return (#14806) merged Jul 22, 2025
- server : allow setting --reverse-prompt arg (#14799) merged Jul 22, 2025
- cuda: remove linking to cublasLt (#14790) merged Jul 21, 2025
- opencl : fix im2col when KW!=KH (#14803) merged Jul 21, 2025
- OpenCL: add conv2d kernel (#14403) merged Jul 21, 2025
- sycl: Fix im2col (#14797) merged Jul 21, 2025
- kleidiai: add support for get_rows (#14676) merged Jul 21, 2025
- docs : fix backends table in README.md (#14796) merged Jul 21, 2025
- vulkan/cuda: Fix im2col when KW!=KH (#14789) merged Jul 21, 2025
- llama : fix --reverse-prompt crashing issue (#14794) merged Jul 21, 2025
25 Pull requests opened by 20 people
- opencl: tiled mul_mat with local memory for f16 and f32 (#14809) opened Jul 22, 2025
- convert : handle pre-quantized models (#14810) opened Jul 22, 2025
- feat(batched): Add functionality to upload benchmark test results (#14811) opened Jul 22, 2025
- sycl: refactor quantization to q8_1 (#14815) opened Jul 22, 2025
- graph : reduce splits for recurrent and hybrid models (#14825) opened Jul 23, 2025
- test-backend-ops: enables perf/eval testing of composite ops (#14833) opened Jul 23, 2025
- SvelteKit-based WebUI (#14839) opened Jul 23, 2025
- imatrix : use GGUF by default (#14842) opened Jul 24, 2025
- mtmd : add support for Voxtral (#14862) opened Jul 24, 2025
- Adding chat template support for Granite model (#14864) opened Jul 24, 2025
- Extend test case filtering (#14865) opened Jul 24, 2025
- Support intern-s1 (#14875) opened Jul 25, 2025
- model: add hunyuan dense (#14878) opened Jul 25, 2025
- GGML: Fix leak of backend buffer memory address in RPC (#14882) opened Jul 26, 2025
- SYCL: Add set_rows support for quantized types (#14883) opened Jul 26, 2025
- imatrix: calculate activation-based statistics for new format (GGUF) imatrices (#14891) opened Jul 26, 2025
- ggml-cpu : deduplicate scalar implementations (#14897) opened Jul 27, 2025
- Add support for SmallThinker model series (#14898) opened Jul 27, 2025
- Vulkan: Fix minor debug mode issues (#14899) opened Jul 27, 2025
- Vulkan: Add Integer Dot Product mul_mat_vec shader for legacy quants (#14903) opened Jul 27, 2025
- ggml : repack block_iq4_nlx8 (AVX) (#14904) opened Jul 27, 2025
- cuda : add softcap fusion (#14907) opened Jul 27, 2025
- opencl: fixed a typo (#14908) opened Jul 27, 2025
- opencl: add ops docs (#14910) opened Jul 28, 2025
- ops : update BLAS (#14914) opened Jul 28, 2025
41 Issues closed by 17 people
- Feature Request: Support EXAONE 4.0 (#14474) closed Jul 28, 2025
- Misc. bug: [SYCL] llama-cli built by Visual Studio 2022 is not working (#14086) closed Jul 28, 2025
- Research: mmap eviction (#14154) closed Jul 28, 2025
- prismatic-vlms to gguf? (#14159) closed Jul 28, 2025
- Eval bug: MultiGPU x MultiModels = 100% GPU (#14890) closed Jul 27, 2025
- Misc. bug: using --jinja flag with server and Qwen3 models removes thinking block, still works on llama-cli (#14894) closed Jul 27, 2025
- ggml_vulkan: RADV crash on ggml_set_rows due to zero size buffer (#14845) closed Jul 27, 2025
- Eval bug: gemma-3n-E4B-it-Q8_0.gguf is speaking nonsense (#14885) closed Jul 27, 2025
- Misc. bug: llama-server webui with --jinja flag does not show thinking when using reasoning models (#14007) closed Jul 27, 2025
- Vulkan Runner Frequent Crashing under workload (#14105) closed Jul 27, 2025
- Misc. bug: --cache-reuse no longer seems to be caching prompt prefixes (#14113) closed Jul 27, 2025
- Misc. bug: "llama_context_params::swa_full = true" causes very large RAM/VRAM usage (#14123) closed Jul 27, 2025
- Misc. bug: llama-server drops multi-part content for final assistant message (#14137) closed Jul 27, 2025
- Metrics should not include : in Prometheus metric names (#14150) closed Jul 27, 2025
- Feature Request: (webui) read data from /props endpoint and use it on the webui (#11717) closed Jul 26, 2025
- Support Hybrid Models (#12331) closed Jul 26, 2025
- Eval bug: s390x GGML_NNPA=ON Generates Gibberish Tokens at Different Thread Counts (#14877) closed Jul 25, 2025
- Eval bug: Generation speed loss after b5920 (#14876) closed Jul 25, 2025
- Performance regression with multiple GPUs in commit 01612b7 (#14863) closed Jul 25, 2025
- Eval bug: LLAMA_SET_ROWS=1 gibberish output with Dual GPU offload (#14795) closed Jul 25, 2025
- Eval bug: Unusual high RAM usage on Windows when running DeepSeek V3 Q2_K_XL/IQ2_XXS, on Hybrid CPU+GPU (#13978) closed Jul 25, 2025
- Eval bug: MiniCPM4 0.5B run failed (#14094) closed Jul 25, 2025
- Eval bug: Gemma3 decode and update_slots fail with parallel slots (#14097) closed Jul 25, 2025
- Eval bug: gemma3 generates infinite "and" output after commit bf9087f (#14835) closed Jul 24, 2025
- Misc. bug: llama-server --batch-size always set to 64 (#14046) closed Jul 24, 2025
- Misc. bug: Server tests /health race conditions (#14092) closed Jul 24, 2025
- Compile bug: convert.cu (#14834) closed Jul 23, 2025
- Misc. bug: CUDA docker image - libcurl: file too short (#14813) closed Jul 23, 2025
- Misc. bug: Failed to run `llama-server` when trying to recurrence the issue #14812 (#14829) closed Jul 23, 2025
- [How to serve lookahead decoding Qwen 3] (#14057) closed Jul 23, 2025
- Eval bug: Model produces gibberish or repeated output when using `-sm row` on CUDA (#14075) closed Jul 23, 2025
- Quantize bug: Ernie4.5 MoE 300B low-bit quantization crashes (#14788) closed Jul 22, 2025
- Feature Request: Built-in Token Probability Output for Inference API (#14611) closed Jul 22, 2025
- Feature Request: Direct FP8 conversion from convert_hf_to_gguf.py (#14762) closed Jul 22, 2025
- Eval bug: RWKV inference with llama-parallel gets wrong output with lmhead offloaded to GPU (#14211) closed Jul 22, 2025
- Feature Request: Support Kimi K2 (#14642) closed Jul 22, 2025
- Misc. bug: test-backend-ops: IM2COL test sometimes fail with when KW!=KH (#14777) closed Jul 21, 2025
17 Issues opened by 17 people
- Feature Request: Implement missing ops from backends (#14909) opened Jul 28, 2025
- Eval bug: Repeated sequences with gemma3 and image recognition (#14888) opened Jul 26, 2025
- Eval bug: No kernel named rms_norm_f32_sycl (#14887) opened Jul 26, 2025
- Speed regression with -fa and -ctk (#14881) opened Jul 25, 2025
- Eval bug: cline plugin for VS Code does not work with any GGUF (#14866) opened Jul 25, 2025
- Misc. bug: slow model loading to GPU when size > 64GB (Vulkan) (#14854) opened Jul 24, 2025
- Eval bug: Embedding output differs significantly between b4712 and b4713 (#14848) opened Jul 24, 2025
- Misc. bug: Regression in unified KV cache appears after `llama.cpp` release b5912 in b5913 (#14847) opened Jul 24, 2025
- Please implement phi-3-M3-coder (#14846) opened Jul 24, 2025
- Eval bug: failed to allocate compute pp buffers (#14836) opened Jul 23, 2025
- Misc. bug: llama-server issue on Windows when compiling from source code (#14826) opened Jul 23, 2025
- Eval bug: unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF:Q2_K_XL using HIP backend (AMD MI300X) outputs `GGGGG` (#14824) opened Jul 23, 2025
- Misc. bug: llama-server embedding endpoint returns vectors with just null values after a while (#14812) opened Jul 22, 2025
- Misc. bug: Server cpp no image_data being used (#14807) opened Jul 22, 2025
- ggml_vulkan: device Vulkan0 does not support 16-bit storage. (#14805) opened Jul 21, 2025
- Misc. bug: llama-quant "found point XXX not on grid: XXXX" (#14798) opened Jul 21, 2025
64 Unresolved conversations
Conversations sometimes continue on older items that are not yet closed. Below is a list of all issues and pull requests with unresolved conversations.
- Add LLaDA 8b Diffusion model (#14771) commented on Jul 28, 2025 • 7 new comments
- Fix MinicpmV model converter and clip to avoid using hardcode. (#14750) commented on Jul 28, 2025 • 4 new comments
- examples : predicted output for text generation (#14739) commented on Jul 24, 2025 • 3 new comments
- feat: Add optional prompt processing progress streaming (#14731) commented on Jul 28, 2025 • 3 new comments
- Improve Mistral models integration with llama.cpp (#14737) commented on Jul 25, 2025 • 1 new comment
- Feature request: Graphical GGUF viewer (#6715) commented on Jul 28, 2025 • 0 new comments
- Compile bug: gcc-12: error: unrecognized command-line option '-compress-mode=size' (#14260) commented on Jul 28, 2025 • 0 new comments
- Eval bug: Unexpected empty grammar stack after accepting piece: <unused32> (#14413) commented on Jul 28, 2025 • 0 new comments
- bug: GGML_ASSERT(backend_embd != nullptr) failed error at llama.cpp:14775 (#14418) commented on Jul 28, 2025 • 0 new comments
- Some models like gemma-3n crashes - rocBLAS error: Cannot read /opt/rocm-6.4.1/lib/llvm/bin/../../../lib/rocblas/library/TensileLibrary.dat: No such file or directory for GPU arch : gfx1036 (#14421) commented on Jul 28, 2025 • 0 new comments
- Misc. bug: (#14422) commented on Jul 28, 2025 • 0 new comments
- Compile bug: Looking for C++ include rocwmma/rocwmma.hpp - not found (#14538) commented on Jul 27, 2025 • 0 new comments
- Compile bug: loop not unrolled ROCm warnings (#14776) commented on Jul 27, 2025 • 0 new comments
- Eval bug: Gemma 3n on Vulkan on Ryzen APUs produces garbled output (#14525) commented on Jul 27, 2025 • 0 new comments
- Feature Request: per-chat prompt caching (#14470) commented on Jul 27, 2025 • 0 new comments
- mtmd: Any plan for mtmd to support video input and audio output? (#14295) commented on Jul 27, 2025 • 0 new comments
- Eval bug: Program crashes during long input inference when batch size is set to 16384 (#14325) commented on Jul 27, 2025 • 0 new comments
- Eval bug: Regression: Tool calls still returned in content field as JSON string instead of tool_calls array (#14697) commented on Jul 26, 2025 • 0 new comments
- Error while converting peft finetuned merged model to gguf (#12494) commented on Jul 26, 2025 • 0 new comments
- Introduce Graph Profiler (#9659) commented on Jul 25, 2025 • 0 new comments
- Overlap CUDA graph building and processing to minimize GPU idle time and improve tokens per seconds performance. (#11867) commented on Jul 25, 2025 • 0 new comments
- [WIP]backend: Integrating QNN (Qualcomm AI Engine Direct) as a dedicated backend for Qualcomm NPUs (#12063) commented on Jul 22, 2025 • 0 new comments
- Update llama-quant.cpp llama_tensor_get_type with DeepSeek friendly modifications (#12727) commented on Jul 21, 2025 • 0 new comments
- Introduce New Lookup-Table(LUT)-Based Matrix Multiplication Method (TMAC) (#13206) commented on Jul 25, 2025 • 0 new comments
- CUDA: update build CTK version to 12.8 (#13360) commented on Jul 27, 2025 • 0 new comments
- model : jina-embeddings-v3 support (#13693) commented on Jul 21, 2025 • 0 new comments
- finetune.cpp command-line arg (#13873) commented on Jul 23, 2025 • 0 new comments
- convert: add eagle2 draft arch (#13908) commented on Jul 28, 2025 • 0 new comments
- llama : support qwen3 rerank and embeddings (#14029) commented on Jul 23, 2025 • 0 new comments
- compare-commits.sh: support both llama-bench and test-backend-ops (#14392) commented on Jul 24, 2025 • 0 new comments
- mtmd : Support jinja in libmtmd (Only for QwenVL and Qwen Omni) (#14730) commented on Jul 22, 2025 • 0 new comments
- docs : mention apt installation method (#14766) commented on Jul 21, 2025 • 0 new comments
- Compile bug: Built target undefined reference std::filesystem (#14536) commented on Jul 21, 2025 • 0 new comments
- Eval bug: When I store the model on my hard drive, llama.cpp attempts to load it and then says it's warming it up with a blank run after which it crashes the terminal session. (#14297) commented on Jul 22, 2025 • 0 new comments
- Misc. bug: Gemma3 multimodal (or all VL models?): </think> tag in the image or PDF text breaks prompt processing (or token generation?) (#14143) commented on Jul 22, 2025 • 0 new comments
- Misc. bug: -sm row results in gibberish output on HIP (ROCm 6.3.3) (#13545) commented on Jul 22, 2025 • 0 new comments
- Feature Request: Add --upload to llama-bench (#14791) commented on Jul 22, 2025 • 0 new comments
- Eval bug: Nondeterministic output with ROCm backend despite zero temperature (#14727) commented on Jul 22, 2025 • 0 new comments
- Eval bug: CUDA error: operation not supported (#14692) commented on Jul 22, 2025 • 0 new comments
- [BUG] DeepSeek V3 weight_scale_inv tensor mapping not supported in converter (#14781) commented on Jul 22, 2025 • 0 new comments
- Eval bug: KV cache stopped working in b5554 version (#14071) commented on Jul 23, 2025 • 0 new comments
- server : add "token healing" support (#5765) commented on Jul 23, 2025 • 0 new comments
- Feature Request: Ability to pack multiple GGUFs into single one (#13028) commented on Jul 23, 2025 • 0 new comments
- Refactor: (clip.cpp) identify and regroup pre-processing strategies (#13077) commented on Jul 23, 2025 • 0 new comments
- deprecate llama_batch_get_one and llama_get_logits (#4491) commented on Jul 23, 2025 • 0 new comments
- Eval bug: ROCml -> ggml_cuda_compute_forward: MUL_MAT failed when running unsloth/Kimi K2 (#14787) commented on Jul 23, 2025 • 0 new comments
- Feature Request: add tool calling for deepseek-r1-0528 (#14557) commented on Jul 23, 2025 • 0 new comments
- Feature Request: Suggest to provide armv7l version to run on Raspberry Pi devices. (#14348) commented on Jul 24, 2025 • 0 new comments
- Misc. bug: LLAMA-SERVER is 40% slower than LLAMA-CLI when using identical parameters including -ot option for tensor offloading (#14201) commented on Jul 24, 2025 • 0 new comments
- Feature Request: Support GLM-4.1V-9B-Thinking (#14495) commented on Jul 24, 2025 • 0 new comments
- Regarding the build for 8060S (gfx1151): (#14734) commented on Jul 24, 2025 • 0 new comments
- Misc. bug: Failure to allocate buffer with ROCm 6.4 (#14178) commented on Jul 25, 2025 • 0 new comments
- Misc. bug: RUNPATH properties are not properly set (#13740) commented on Jul 25, 2025 • 0 new comments
- Eval bug: terminate called after throwing an instance of 'std::runtime_error' what(): Unexpected empty grammar stack after accepting piece: [control_36] (#13690) commented on Jul 25, 2025 • 0 new comments
- Misc. bug: DeepSeek-R1 0528 671b:Q4_K_XL think tags do not close sometimes (#14679) commented on Jul 25, 2025 • 0 new comments
- Feature Request: Add support for Kokoro TTS (#11050) commented on Jul 25, 2025 • 0 new comments
- Feature Request: Add TPU/Hardware Accelerator Support (e.g., Google Coral, Hailo) to llama.cpp (#11603) commented on Jul 25, 2025 • 0 new comments
- Misc. bug: missing messages in JSON export via llama-server web UI (#13552) commented on Jul 25, 2025 • 0 new comments
- Eval bug: Unable to run with Qwen3 model on rocm with gfx1100, but works on cpu (#14696) commented on Jul 25, 2025 • 0 new comments
- Feature Request: allow running llama with an idle (lowest) priority as well (#14382) commented on Jul 26, 2025 • 0 new comments
- Compile error for ggml_gemv_q4_K_8x8_q8_K on Intel x86_64 MacOS (AVX2) (#14372) commented on Jul 26, 2025 • 0 new comments
- Misc. bug: Inconsistent Gemma3 implementation in rope factor (#14367) commented on Jul 26, 2025 • 0 new comments
- Feature Request: Add support for moonshotai/Kimi-VL-A3B-Instruct (#14318) commented on Jul 26, 2025 • 0 new comments
- Eval bug: [CANN]AutoDL Ascend 910B instance running DeepSeek-r1 32B_Q8 error (#14291) commented on Jul 26, 2025 • 0 new comments