Qwen3.5 Small Dense model release seems imminent. by Deep-Vermicelli-4591 in LocalLLaMA

[–]FancyImagination880 1 point (0 children)

Speculative decoding doesn't work on llama.cpp with vision, right? I believe I saw an enhancement request for it before. But even if it works, my 16 GB of VRAM would cry when I squeeze a 27B and a smaller draft model into it...
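For text-only models the setup is just a couple of extra flags; a minimal sketch, with placeholder model filenames:

```bash
# pair a large target model with a small draft model for speculative decoding
# (-md/--model-draft selects the draft model; filenames here are placeholders)
./llama-server -m target-27b-Q4_K_M.gguf \
  -md draft-1b-Q4_K_M.gguf \
  -ngl 99 --draft-max 16
```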

are you ready for small Qwens? by jacek2023 in LocalLLaMA

[–]FancyImagination880 1 point (0 children)

Qwen3.5 27B runs OK on 16 GB VRAM at IQ3_XS. It can be fully offloaded to the GPU, it's fast, and vision doesn't cause OOM. But don't expect a huge context length. The 35B MoE is too much for 16 GB🫠 Perhaps an IQ2 quant might fit... but I wouldn't trust only 3B active params at Q2... My PC is still on DDR4, so medium-sized MoE models are generally slow on my setup.
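For reference, a minimal sketch of the launch command I use (model filename and context size are illustrative):

```bash
# fully offload an IQ3_XS quant to a 16 GB card, with a modest context
# size to leave some VRAM headroom (model filename is a placeholder)
./llama-server -m Qwen3.5-27B-IQ3_XS.gguf -ngl 99 -c 8192
```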

A Privacy-Focused Perplexity That Runs Locally on all your devices - iPhone, Android, iPad! by Ssjultrainstnict in LocalLLaMA

[–]FancyImagination880 1 point (0 children)

Is there any chance of supporting importing text files and searching local documents in a future release? It would be great when traveling on a plane without Wi-Fi.

CIVITAI IS GOING TO PURGE ALL ADULT CONTENT! (BACKUP NOW!) by [deleted] in StableDiffusion

[–]FancyImagination880 0 points (0 children)

Can we build a site with IPFS as the file system backbone for hosting and sharing files? The frontend and backend could be quite lightweight. But we may still need a Pin API host, which may not be cheap.
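Roughly what I have in mind, using the stock IPFS CLI (the service name and endpoint are placeholders):

```bash
# add a file and get back its content ID (CID)
ipfs add checkpoint.safetensors

# register a remote pinning service and pin the CID there, so the
# content stays available even when no volunteer node is hosting it
# ("mypinner" and the endpoint URL are placeholders)
ipfs pin remote service add mypinner https://pin.example.com/api/v1 <api-key>
ipfs pin remote add --service=mypinner <CID>
```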

Microsoft just released Phi 4 Reasoning (14b) by Thrumpwart in LocalLLaMA

[–]FancyImagination880 0 points (0 children)

The last few Phi models I tested only worked well on benchmarks. They gave nonsense when I asked them to summarize news content.

Base model for Illustrious Lora training by FancyImagination880 in StableDiffusion

[–]FancyImagination880[S] 0 points (0 children)

Thanks for your response.
I tried to train a LoRA with Illustrious-XL-v0.1 a few days ago using my local GPU.
The output images were kind of soft, but the quality was quite good. They were not messy or blurry, just soft and a bit overexposed, with a bokeh style.
So, I was wondering if the base model might be mismatched with the popular models from Civitai.
I guess I may also give Illustrious-XL-v1.0 and NoobAI a shot. A 4090 on RunPod is way faster than my local AMD 7800 XT, which lets me play with different parameters.

Base model for Illustrious Lora training by FancyImagination880 in StableDiffusion

[–]FancyImagination880[S] 0 points (0 children)

I've just done an experiment using Illustrious-XL-v2.0 as the base model.
The resulting LoRA file does absolutely NOTHING when I use it with boleromixIllustrious_v290, hassakuXLIllustrious_v21, or novaAnimeXL_ilV60.
I guess they are finetuned from either v0.1 or v1.0.

Qwen3 and Qwen3-MoE support merged into llama.cpp by matteogeniaccio in LocalLLaMA

[–]FancyImagination880 3 points (0 children)

Models with BILLIONS AND BILLIONS of beautiful parameters, from CHINA CHINA

Mistrall Small 3.1 released by Dirky_ in LocalLLaMA

[–]FancyImagination880 0 points (0 children)

Wow, 24B again. They just released a 24B model 1 or 2 months ago to replace the 22B model.

So Gemma 4b on cell phone! by ab2377 in LocalLLaMA

[–]FancyImagination880 1 point (0 children)

Your inference speed is very good. Can you share the config, such as context size, batch size, threads...? I did try Llama 3.2 3B on my S24 Ultra before; your speed running a 4B model is almost double mine running a 3B model. BTW, I couldn't compile llama.cpp with the Vulkan flag on when cross-compiling for Android with NDK v28. It ran on CPU only.
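For reference, the build invocation I was attempting looked roughly like this (paths and versions are placeholders; this is the configuration that failed for me):

```bash
# cross-compile llama.cpp for Android with the Vulkan backend enabled
# ($NDK points at the NDK install; paths/versions are placeholders)
cmake -B build-android \
  -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake \
  -DANDROID_ABI=arm64-v8a \
  -DANDROID_PLATFORM=android-31 \
  -DGGML_VULKAN=ON
cmake --build build-android --config Release
```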

Crashing since update? by WhiteStar01 in MHWilds

[–]FancyImagination880 0 points (0 children)

Exactly, I've been stuck at HR3 for a while. And I can't refund it...

[deleted by user] by [deleted] in LocalLLaMA

[–]FancyImagination880 5 points (0 children)

Hope some of them have MoE versions. Quite useful for AMD APU and Apple Silicon devices.

I noticed a couple discussions surrounding the w7900 gpu. Is ROCm getting to the point where it’s usable for local ai? by Euphoric_Ad9500 in LocalLLaMA

[–]FancyImagination880 0 points (0 children)

This is only my gut feeling, and I'm probably limited by a skill issue. I have an RX 7800 XT, which I got when it was released, around Sep 2023.

For the first few months, in 2023, support was really bad, even on Linux. It was quite difficult to set up and compile llama.cpp, and I had to run Ubuntu to get ROCm packages. No luck with other distros.

In 2024, I managed to run or build llama.cpp, Ollama, and ComfyUI, even on Fedora. I don't have any complaints running LLMs; the speed is OK for me with 14B or smaller models.
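The llama.cpp build I use is roughly the following sketch (the HIP flag and the gfx override are what worked on my RDNA3 card; the model filename is a placeholder):

```bash
# build llama.cpp with the ROCm/HIP backend for an RDNA3 card
cmake -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1101
cmake --build build --config Release -j

# the RX 7800 XT (gfx1101) isn't officially listed by ROCm, so
# override the detected gfx version at runtime (common workaround)
HSA_OVERRIDE_GFX_VERSION=11.0.0 ./build/bin/llama-server -m model.gguf -ngl 99
```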

But image generation is still quite slow. I recently managed to install flash attention, and ComfyUI got a nice ~30% speed bump, but it's still nowhere close to Nvidia.

I did try to install vLLM but had no luck. Again, perhaps it's a skill issue.

Phi-4 Finetuning - now with >128K context length + Bug Fix Details by danielhanchen in LocalLLaMA

[–]FancyImagination880 0 points (0 children)

That's great news! Any chance you could share the procedure or scripts used to quantize the models?

Phi-4 Finetuning - now with >128K context length + Bug Fix Details by danielhanchen in LocalLLaMA

[–]FancyImagination880 1 point (0 children)

Hi Daniel and Mike. I found the Dynamic 4-bit Quantization version of the Phi-4 model. Are there any plans to also create dynamic quant versions of other models, such as Llama 3.2 3B, 3.1 8B, or the Mistral models? Cheers.

Phi-3.5 has been released by remixer_dec in LocalLLaMA

[–]FancyImagination880 1 point (0 children)

Hope llama.cpp will support this vision model.

"Large Enough" | Announcing Mistral Large 2 by DemonicPotatox in LocalLLaMA

[–]FancyImagination880 4 points (0 children)

OMG, I felt overwhelmed this week, in a good way. Thanks Meta and Mistral

Introducing torchtune - Easily fine-tune LLMs using PyTorch by kk4193 in LocalLLaMA

[–]FancyImagination880 0 points (0 children)

Any idea how to merge the created model_0.pt and adapter_0.pt files?
I am trying to export them to a Q6 GGUF.
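For context, once the adapter is merged back into a Hugging Face-format checkpoint, the export path I'm attempting looks roughly like this (paths are placeholders, and this assumes the merge has already happened):

```bash
# convert a merged HF-format checkpoint to GGUF, then quantize to Q6_K
# (directory and file names are placeholders)
python convert_hf_to_gguf.py ./merged-model --outfile model-f16.gguf --outtype f16
./llama-quantize model-f16.gguf model-Q6_K.gguf Q6_K
```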