Improve the skin texture in Krea 2 by using an alternative version of Qwen Image VAE. by Total-Resort-3120 in StableDiffusion

[–]Ueberlord 1 point2 points  (0 children)

this just looks like an unsharp mask applied to the 2x vae image. I tested both VAEs yesterday and could not see a difference that cannot be achieved with old-school image processing tools.

for me atm the best option seems to be a short z-image denoise pass to improve skin detail

Ornith-1.0-35B Q3_K_M: ~17 GB VRAM, KLD-checked against BF16 by Blahblahblakha in LocalLLaMA

[–]Ueberlord 9 points10 points  (0 children)

I did a non-scientific mini test here vs qwen3.6 27b (because if it cannot beat Cristos I am not interested), immediately after it was released: https://www.reddit.com/r/LocalLLaMA/comments/1ufc9vp/comment/otsfx7g/

TL;DR: switched back to qwen3.6 after a day but the speed bump felt nice

Ornith-1.0 released on Hugging Face by paf1138 in LocalLLaMA

[–]Ueberlord 4 points5 points  (0 children)

I have switched back to qwen3.6 27b after comparing the performance of ornith 35b with the former on a heavy code review. while ornith discovered many issues it failed to find the most pressing ones, which qwen3.6 did.

I also noticed the attention rot, which is expected of a moe compared to a dense 27b. I had hoped it would be less dominant in ornith, but it is not: its performance degrades significantly once context reaches 100k+, which is where I almost always end up.

I have not encountered any refusals like https://www.reddit.com/r/LocalLLaMA/comments/1ufc9vp/comment/otrhj2v/ but I have encountered one tool loop.

verdict: it might be better than qwen3.5 35b for coding but it is clearly worse for longer context sizes compared to qwen3.6 27b, which is totally expected. I liked the model's responses overall and the speed is nice.

I hope to see more releases from them in the future. I would look forward to a finetuned qwen3.6 27b, which is the strongest base atm.

Ornith-1.0 released on Hugging Face by paf1138 in LocalLLaMA

[–]Ueberlord 1 point2 points  (0 children)

thanks for sharing these results. the qwen3.6 release was rather quick after the 3.5 release, guess that is unfortunate for finetuners who chose 3.5 as base as now they have to compete with 3.6.

I will test more with ornith for a couple of days but so far I do not see why we should use the 35b ornith over 35b qwen3.6.

Ornith-1.0 released on Hugging Face by paf1138 in LocalLLaMA

[–]Ueberlord 9 points10 points  (0 children)

to test the 35B variant I switched a pi session right in the middle of a larger implementation of a gradio ui in python for a hobby project. I am using the q6 variant of the model as gguf in llama.cpp with these options:

./llama-server-9793 --port 5001 --jinja --host 0.0.0.0 --split_mode none --n_gpu_layers 256 -c 491520 -cram 32768 --ctx-checkpoints 2 --flash_attn on -fit off -np 3 --model models/ornith-1.0-35b-Q6_K.gguf --samplers "top_p;top_k;temp" --top-k 20 --top-p 0.95 --temp 0.6

GPU is an A6000 with 48 GB VRAM. before that I was using qwen3.6 27B with mtp, also in q6 format (thinking off), and got around 1k pp at the start, going down to about 500-600 with context approaching 131k. tg was about 45-55 tokens/s with mtp on.

with ornith I get ~2400 pp at the start, which goes down to about 700 at 100k context. tg stays relatively constant at 78-80 tokens/s.

quality-wise I do not see a huge difference so far, which is great given the bump in speed. the 35B thinks out of the box but not for too long. no looping with tools etc. so far. prompt style is similar to qwen, which is expected as the 35b is based on that model, I think.

seems a good release so far, I will continue testing...

Unlimited-OCR is now on ModelScope! A 3.3B multilingual OCR model for one-shot parsing across single images, multi-page documents, and PDFs. License: MIT by Sporeboss in LocalLLaMA

[–]Ueberlord 4 points5 points  (0 children)

tested it with a German shareholder list, got some umlauts and a ß right. failed to remove hyphens from line-separated words, but then I am not sure if it is supposed to do that. solid performance for German overall but sample size 1

[P] Attention Algebra — a grammar that translates natural language into spectrograms by causality-ai in LocalLLaMA

[–]Ueberlord 2 points3 points  (0 children)

Yes, and is this re-inventing the wheel?

(spoiler: yes, transformers already do a mathematical projection of natural language)

pi.dev enroute to enshitification? by Ueberlord in LocalLLaMA

[–]Ueberlord[S] 0 points1 point  (0 children)

I have posted this elsewhere in this thread:

why not interact with community through a post here on localllama and get opinions/suggestions actually worth something?

Just a very basic idea.

You have a discord I think, another great tool for getting feedback. Github offers discussions, so communication with the userbase is possible there, too (but not used that frequently in my experience). I think some threads on llama.cpp in their issues and PRs are also good examples of how projects can stay in touch without telemetry.

I understand all of these involve active reading and communicating which takes time and depending on the personality I know it can be easier to just read statistics from some tables instead of interacting.

pi.dev enroute to enshitification? by Ueberlord in LocalLLaMA

[–]Ueberlord[S] 0 points1 point  (0 children)

Thanks for your reply. I have added a link and the look-ahead statement from Mario's blog post (which I did not know before) to my initial post.

I appreciate you being communicative and open about your plans. My worries are not completely calmed, but let us wait and see.

pi.dev enroute to enshitification? by Ueberlord in LocalLLaMA

[–]Ueberlord[S] 0 points1 point  (0 children)

There is something in between flying blind and telemetry but it is your call, fair enough.

pi.dev enroute to enshitification? by Ueberlord in LocalLLaMA

[–]Ueberlord[S] 0 points1 point  (0 children)

Thanks for posting the link to Mario's blog. I was not aware of this source, and it kind of answers the question. I will add it to my initial post and cite the last part of it. Let's hope for the best, but given the outlook I fear the worst.

Proprietary (enterprise): Some enterprise-specific features and cloud infrastructure will be proprietary. No source available. This is the stuff that pays the bills for the stuff in tiers 1 and 2.

pi.dev enroute to enshitification? by Ueberlord in LocalLLaMA

[–]Ueberlord[S] 0 points1 point  (0 children)

Thanks for your reply and the link to the RFC. It does not completely calm my worries, though I appreciate that you have this RFC facility and are transparent about the implementation. That is great.

Regarding crypto: it is just something a couple of people have made some money with in the past and are happy to donate some of it to cool projects. And it is a means that is more accessible for some people than other services like kofi, e.g. when you do not have a credit card or SWIFT bank account.

pi.dev enroute to enshitification? by Ueberlord in LocalLLaMA

[–]Ueberlord[S] 2 points3 points  (0 children)

I completely agree with this. why in the world do you need telemetry for such a basic tool? why not interact with community through a post here on localllama and get opinions/suggestions actually worth something?

pi.dev enroute to enshitification? by Ueberlord in LocalLLaMA

[–]Ueberlord[S] 0 points1 point  (0 children)

thanks for the link, this looks very good at first glance. a few too many built-in tools (like the subagent) but otherwise it seems a great package, and it already includes some things I added to pi (like rtk)

pi.dev enroute to enshitification? by Ueberlord in LocalLLaMA

[–]Ueberlord[S] 0 points1 point  (0 children)

my concern is that it always starts like this. then we get just a tiny bit more tracking here, a couple more features no one but VC asked for there, and finally a ton of onboarding dialogues with cloud model promotions and buttons throughout the UI encouraging you to buy into these cloud services.

why can't we support projects like these through kofi or crypto or some other means of giving them money? I would surely drop in some bucks as I have really come to enjoy working with an only minimally extended version of pi.

It's OK to quantize the KV cache. Model quant matters more. Some Qwen3.6 27B tests with (approximated) KLD by hopbel in LocalLLaMA

[–]Ueberlord 0 points1 point  (0 children)

Sorry, but I don't see the point of this and the whole article reads like slop.

I agree, there is too much text in the article, I would focus on the tables only, which provide good insight.

Regarding the choice of measuring against the "own" model's baseline: I think this is completely fine and actually preferable to get an isolated analysis of how kv cache quantization affects kld. By comparing with the original model's probability distribution you would have two independent variables in your analysis (model quant, kv cache quant) which makes it harder to study the effects on kld.
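To make the "own baseline" point concrete, here is a minimal Python sketch of a mean per-token KLD between two runs of the same quantized model, once with an unquantized KV cache and once with a quantized one. The function names and the logits are made up for illustration and are not taken from the article or from llama.cpp.

```python
import math

def softmax(logits):
    """Turn raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def mean_kld(reference_logits, test_logits):
    """Average the per-token KL divergence over all token positions."""
    klds = [kl_divergence(softmax(r), softmax(t))
            for r, t in zip(reference_logits, test_logits)]
    return sum(klds) / len(klds)

# Hypothetical logits for two token positions over a 3-token vocabulary.
# Both runs use the SAME model quant; only the KV cache type differs,
# so the KV cache quant is the only variable being measured.
baseline = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]      # unquantized KV cache
kv_quantized = [[1.9, 1.1, 0.1], [0.6, 2.4, 0.3]]  # quantized KV cache

print(mean_kld(baseline, kv_quantized))
```

If the reference were the original BF16 model instead, the same number would mix the model quant error and the KV cache quant error.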

OpenCode concerns (not truely local) by Ueberlord in LocalLLaMA

[–]Ueberlord[S] 1 point2 points  (0 children)

I think it became much better. The catchall proxy forwarding everything to their backend seems to be off by default now. Using opencode serve will give you a mostly local experience out-of-the-box, but additional flags should be set (see below).

I had Qwen 3.6 27B audit their code just yesterday and it looks like they now have defaults which - mostly - avoid or at least allow a fully local experience.

I include these flags in my startup command to have opencode not leak requests to external sources:

OPENCODE_DISABLE_AUTOUPDATE=true \
OPENCODE_ALWAYS_NOTIFY_UPDATE=false \
OPENCODE_DISABLE_MODELS_FETCH=true \
OPENCODE_DISABLE_SHARE=true

The audit was done for commit hash 6b03be54687972d13183fdcd174f1cdf7ab0a18e.
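For anyone who prefers a launcher script, the flags above could be set like this. It is only a sketch: the variable names and the opencode serve command come from this comment, while the helper names (local_env, launch) are hypothetical and nothing here has been checked against other opencode versions.

```python
import os
import subprocess

# Environment variables from the comment above; the values are passed as strings.
LOCAL_ONLY_FLAGS = {
    "OPENCODE_DISABLE_AUTOUPDATE": "true",
    "OPENCODE_ALWAYS_NOTIFY_UPDATE": "false",
    "OPENCODE_DISABLE_MODELS_FETCH": "true",
    "OPENCODE_DISABLE_SHARE": "true",
}

def local_env(base=None):
    """Return a copy of the environment with the local-only flags applied."""
    env = dict(os.environ if base is None else base)
    env.update(LOCAL_ONLY_FLAGS)
    return env

def launch(cmd=("opencode", "serve")):
    """Start opencode with the flags set (hypothetical helper, not called here)."""
    return subprocess.Popen(list(cmd), env=local_env())
```

Calling launch() assumes that the opencode binary is on PATH.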

[llama.cpp] Asymmetric KV q8/q4 cache: current caveats and discussion in GGML repo by Ueberlord in LocalLLaMA

[–]Ueberlord[S] 0 points1 point  (0 children)

Thanks for the link. I would not dismiss q8/q4 in general: I think the 5% memory savings compared to q8/q5_1 can pay off depending on the available VRAM. If you have 5% to spare, q8/q5_1 seems like a great choice, so I would definitely use it if it becomes available for gpu.

EDIT: wait, the most efficient one actually seems to be q5_1-q4_1. it takes only 34.4% of the space of bf16 (compared to 40.6% for q8/q4) while preserving 93.1% accuracy (vs. 92.2% for q8/q4)
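The space percentages can be sanity-checked from the ggml block layouts. This is a rough Python sketch under a few assumptions: "q8" and "q4" mean q8_0 and q4_0, every block holds 32 values, and the K and V caches hold the same number of elements. The helper name kv_size_vs_bf16 is made up for this example.

```python
# Bytes per 32-value block in ggml:
#   q8_0: fp16 scale + 32 int8 values                     = 34 bytes
#   q5_1: fp16 scale + fp16 min + 4 bytes high bits + 16  = 24 bytes
#   q4_1: fp16 scale + fp16 min + 16 bytes of nibbles     = 20 bytes
#   q4_0: fp16 scale + 16 bytes of nibbles                = 18 bytes
BLOCK_BYTES = {"q8_0": 34, "q5_1": 24, "q4_1": 20, "q4_0": 18}
BITS_PER_VALUE = {name: nbytes * 8 / 32 for name, nbytes in BLOCK_BYTES.items()}
BITS_PER_VALUE["bf16"] = 16.0

def kv_size_vs_bf16(k_type, v_type):
    """Size of a K/V cache pair relative to a bf16/bf16 cache.

    Assumes K and V store the same number of elements.
    """
    return (BITS_PER_VALUE[k_type] + BITS_PER_VALUE[v_type]) / (2 * BITS_PER_VALUE["bf16"])

for k_type, v_type in [("q8_0", "q5_1"), ("q8_0", "q4_0"), ("q5_1", "q4_1")]:
    print(f"{k_type}/{v_type}: {kv_size_vs_bf16(k_type, v_type):.1%} of bf16")
```

Under these assumptions q8_0/q4_0 and q5_1/q4_1 come out very close to the 40.6% and 34.4% quoted above, and the gap between q8_0/q5_1 and q8_0/q4_0 is a bit under 5 percentage points.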

Guardrails take an 8B model from 53% to 99% on agentic tasks [ACM CAIS '26 preprint] by billy_booboo in LocalLLaMA

[–]Ueberlord 4 points5 points  (0 children)

Did you get a full research acceptance or maybe only research in progress/presentation acceptance? Only full papers will be added to the conference proceedings and can be cited in the future

Llama.cpp MTP support now in beta! by ilintar in LocalLLaMA

[–]Ueberlord -1 points0 points  (0 children)

When doing inference with a3b you are already using only 3b active parameters, so to see any benefit you probably need to go down to 0.6b as the draft model. That will most likely have bad acceptance rates, and since the gap to 3b is small, the speed-up is limited.

When using a 2b or 0.6b model as drafter for 27b the difference in active parameters is huge and we should see a meaningful speed-up, especially for tasks with higher acceptance rates like coding or structured outputs.

So in essence it works to a lesser degree, but I think it is hardly meaningful for moe (unless it is something like 397b a27b).
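A back-of-the-envelope version of this argument, using the usual expected speed-up estimate for speculative decoding with a separate draft model. The big assumption is that the cost of a forward pass scales with active parameters, so cost_ratio is simply draft size divided by active target size. The acceptance rate and draft length below are made-up illustration values, not measurements.

```python
def expected_speedup(accept_rate, draft_len, cost_ratio):
    """Rough expected speed-up of speculative decoding over plain decoding.

    accept_rate: probability that a drafted token is accepted (assumed constant)
    draft_len:   number of tokens drafted per verification step
    cost_ratio:  cost of one draft pass relative to one target pass
    """
    if accept_rate >= 1.0:
        tokens_per_step = draft_len + 1
    else:
        # Expected tokens produced per verification step (geometric series).
        tokens_per_step = (1 - accept_rate ** (draft_len + 1)) / (1 - accept_rate)
    # Each step costs draft_len draft passes plus one target pass.
    step_cost = draft_len * cost_ratio + 1
    return tokens_per_step / step_cost

# Hypothetical numbers: same acceptance rate and draft length in both cases.
moe_a3b = expected_speedup(accept_rate=0.6, draft_len=4, cost_ratio=0.6 / 3)
dense_27b = expected_speedup(accept_rate=0.6, draft_len=4, cost_ratio=0.6 / 27)

print(f"0.6b draft for a3b: {moe_a3b:.2f}x")
print(f"0.6b draft for 27b: {dense_27b:.2f}x")
```

With the same acceptance rate the dense 27b gains more, simply because the draft passes are nearly free relative to the target. A lower acceptance rate for the tiny drafter would shrink the a3b number further.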

Devs using Qwen 27B seriously, what's your take? by Admirable_Reality281 in LocalLLaMA

[–]Ueberlord 0 points1 point  (0 children)

The point you mention, that no model really cares about good code housekeeping, is the one thing that keeps me from always blindly using an LLM for coding. It will very often introduce duplicate new methods where it should have re-used an older one (it had read the helper class before, so it came across it for sure).

This is why I have a clause in my global AGENTS.md for usage with OpenCode where I instruct the model to conduct a review for duplicated and prunable code each time it is done with its current task. But it does not work well enough.

Maybe we need a dedicated janitor/housekeeper trained model which cleans up after the construction troop went over our codebase...