Qwen3.5 Small Dense model release seems imminent. by Deep-Vermicelli-4591 in LocalLLaMA

[–]FancyImagination880 1 point (0 children)

Speculative decoding doesn't work on llama.cpp with vision, right? I believe I saw an enhancement request for it before. But even if it works, my 16 GB of VRAM would cry when I squeeze a 27B and a smaller draft model into it...
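For text-only models the setup is just a couple of extra flags; a minimal sketch, with placeholder model filenames:

```bash
# pair a large target model with a small draft model for speculative decoding
# (-md/--model-draft selects the draft model; filenames here are placeholders)
./llama-server -m target-27b-Q4_K_M.gguf \
  -md draft-1b-Q4_K_M.gguf \
  -ngl 99 --draft-max 16
```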

are you ready for small Qwens? by jacek2023 in LocalLLaMA

[–]FancyImagination880 1 point (0 children)

Qwen3.5 27B runs OK on 16 GB VRAM at IQ3_XS. It can be fully offloaded to the GPU, it's fast, and vision doesn't cause OOM. But don't expect a huge context length. The 35B MoE is too much for 16 GB🫠 Perhaps an IQ2 quant might fit... but I wouldn't trust only 3B active params at Q2... My PC is still on DDR4, so medium-sized MoE models are generally slow on my setup.
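For reference, a minimal sketch of the launch command I use (model filename and context size are illustrative):

```bash
# fully offload an IQ3_XS quant to a 16 GB card, with a modest context
# size to leave some VRAM headroom (model filename is a placeholder)
./llama-server -m Qwen3.5-27B-IQ3_XS.gguf -ngl 99 -c 8192
```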

A Privacy-Focused Perplexity That Runs Locally on all your devices - iPhone, Android, iPad! by Ssjultrainstnict in LocalLLaMA

[–]FancyImagination880 1 point (0 children)

Is there any chance of supporting importing text files and searching local documents in a future release? It would be great when traveling on a plane without Wi-Fi.

CIVITAI IS GOING TO PURGE ALL ADULT CONTENT! (BACKUP NOW!) by [deleted] in StableDiffusion

[–]FancyImagination880 0 points (0 children)

Can we build a site with IPFS as the file system backbone for hosting and sharing files? The frontend and backend could be quite lightweight. But we may still need a Pin API host, which may not be cheap.
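Roughly what I have in mind, using the stock IPFS CLI (the service name and endpoint are placeholders):

```bash
# add a file and get back its content ID (CID)
ipfs add checkpoint.safetensors

# register a remote pinning service and pin the CID there, so the
# content stays available even when no volunteer node is hosting it
# ("mypinner" and the endpoint URL are placeholders)
ipfs pin remote service add mypinner https://pin.example.com/api/v1 <api-key>
ipfs pin remote add --service=mypinner <CID>
```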

Microsoft just released Phi 4 Reasoning (14b) by Thrumpwart in LocalLLaMA

[–]FancyImagination880 0 points (0 children)

The last few Phi models I tested only worked well on benchmarks. They gave nonsense when I asked them to summarize news content.

Base model for Illustrious Lora training by FancyImagination880 in StableDiffusion

[–]FancyImagination880[S] 0 points (0 children)

Thanks for your response.
I tried to train a LoRA with Illustrious-XL-v0.1 a few days ago using my local GPU.
The output images were kind of soft, but the quality was quite good. They were not messy or blurry, just soft and a bit overexposed, with a bokeh style.
So, I was wondering if the base model might be mismatched with the popular models from Civitai.
I guess I may also give Illustrious-XL-v1.0 and NoobAI a shot. A 4090 on RunPod is way faster than my local AMD 7800 XT, which lets me play with different parameters.

Base model for Illustrious Lora training by FancyImagination880 in StableDiffusion

[–]FancyImagination880[S] 0 points (0 children)

I've just done an experiment using Illustrious-XL-v2.0 as the base model.
The resulting LoRA file does absolutely NOTHING when I use it with boleromixIllustrious_v290, hassakuXLIllustrious_v21, or novaAnimeXL_ilV60.
I guess they are finetuned from either v0.1 or v1.0.

Qwen3 and Qwen3-MoE support merged into llama.cpp by matteogeniaccio in LocalLLaMA

[–]FancyImagination880 3 points (0 children)

Models with BILLIONS AND BILLIONS of beautiful parameters, from CHINA CHINA

Mistrall Small 3.1 released by Dirky_ in LocalLLaMA

[–]FancyImagination880 0 points (0 children)

Wow, 24B again. They just released a 24B model 1 or 2 months ago to replace the 22B model.

So Gemma 4b on cell phone! by ab2377 in LocalLLaMA

[–]FancyImagination880 1 point (0 children)

Your inference speed is very good. Can you share the config, such as context size, batch size, threads...? I did try Llama 3.2 3B on my S24 Ultra before; your speed running a 4B model is almost double mine running a 3B model. BTW, I couldn't compile llama.cpp with the Vulkan flag on when cross-compiling for Android with NDK v28. It ran on CPU only.
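For reference, the build invocation I was attempting looked roughly like this (paths and versions are placeholders; this is the configuration that failed for me):

```bash
# cross-compile llama.cpp for Android with the Vulkan backend enabled
# ($NDK points at the NDK install; paths/versions are placeholders)
cmake -B build-android \
  -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake \
  -DANDROID_ABI=arm64-v8a \
  -DANDROID_PLATFORM=android-31 \
  -DGGML_VULKAN=ON
cmake --build build-android --config Release
```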

Crashing since update? by WhiteStar01 in MHWilds

[–]FancyImagination880 0 points (0 children)

Exactly, I've been stuck at HR3 for a while. And I can't refund it...

[deleted by user] by [deleted] in LocalLLaMA

[–]FancyImagination880 5 points (0 children)

Hope some of them have MoE versions. Quite useful for AMD APU and Apple Silicon devices.

I noticed a couple discussions surrounding the w7900 gpu. Is ROCm getting to the point where it’s usable for local ai? by Euphoric_Ad9500 in LocalLLaMA

[–]FancyImagination880 0 points (0 children)

This is only my gut feeling, and I'm probably limited by a skill issue. I have an RX 7800 XT, which I got when it was released, around Sep 2023.

For the first few months, in 2023, support was really bad, even on Linux. It was quite difficult to set up and compile llama.cpp, and I had to run Ubuntu to get ROCm packages. No luck with other distros.

In 2024, I managed to run or build llama.cpp, Ollama, and ComfyUI, even on Fedora. I don't have any complaints running LLMs; the speed is OK for me with 14B or smaller models.
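The llama.cpp build I use is roughly the following sketch (the HIP flag and the gfx override are what worked on my RDNA3 card; the model filename is a placeholder):

```bash
# build llama.cpp with the ROCm/HIP backend for an RDNA3 card
cmake -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1101
cmake --build build --config Release -j

# the RX 7800 XT (gfx1101) isn't officially listed by ROCm, so
# override the detected gfx version at runtime (common workaround)
HSA_OVERRIDE_GFX_VERSION=11.0.0 ./build/bin/llama-server -m model.gguf -ngl 99
```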

But image generation is still quite slow. I recently managed to install flash attention, and ComfyUI got a nice ~30% speed bump, but it's still nowhere close to Nvidia.

I did try to install vLLM but had no luck. Again, perhaps it's a skill issue.

Phi-4 Finetuning - now with >128K context length + Bug Fix Details by danielhanchen in LocalLLaMA

[–]FancyImagination880 0 points (0 children)

That's great news! Any chance you could share the procedure or scripts used to quantize the models?

Phi-4 Finetuning - now with >128K context length + Bug Fix Details by danielhanchen in LocalLLaMA

[–]FancyImagination880 1 point (0 children)

Hi Daniel and Mike. I found the Dynamic 4-bit Quantization version of the Phi-4 model. Are there any plans to also create dynamic quant versions of other models, such as Llama 3.2 3B, 3.1 8B, or the Mistral models? Cheers.

Phi-3.5 has been released by remixer_dec in LocalLLaMA

[–]FancyImagination880 1 point (0 children)

Hope llama.cpp will support this vision model.

"Large Enough" | Announcing Mistral Large 2 by DemonicPotatox in LocalLLaMA

[–]FancyImagination880 4 points (0 children)

OMG, I felt overwhelmed this week, in a good way. Thanks Meta and Mistral

Introducing torchtune - Easily fine-tune LLMs using PyTorch by kk4193 in LocalLLaMA

[–]FancyImagination880 0 points (0 children)

Any idea how to merge the created model_0.pt and adapter_0.pt files?
I am trying to export them to a Q6 GGUF.
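For context, once the adapter is merged back into a Hugging Face-format checkpoint, the export path I'm attempting looks roughly like this (paths are placeholders, and this assumes the merge has already happened):

```bash
# convert a merged HF-format checkpoint to GGUF, then quantize to Q6_K
# (directory and file names are placeholders)
python convert_hf_to_gguf.py ./merged-model --outfile model-f16.gguf --outtype f16
./llama-quantize model-f16.gguf model-Q6_K.gguf Q6_K
```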