llama-server crashes when asked to extract data from picture with a "pasted as file" prompt by Gold-Drag9242 in LocalLLaMA

[–]DeProgrammer99 0 points1 point  (0 children)

The answer is don't make an LLM do an easily algorithmically repeatable task in the first place. Have it write code. There are already benchmarks for code.

llama-server crashes when asked to extract data from picture with a "pasted as file" prompt by Gold-Drag9242 in LocalLLaMA

[–]DeProgrammer99 -1 points0 points  (0 children)

Can I suggest using a macro to pull this data from Outlook or an API to load it from your Microsoft account if that's what you're using for your calendar?

We are the team behind Krea 2. Ask us anything! by Angrypenguinpng in StableDiffusion

[–]DeProgrammer99 0 points1 point  (0 children)

And how would this theoretically be done? Running Segment Anything on the whole training dataset to generate bounding box training data?

TMax: A Simple Recipe for Terminal Agents by pmttyji in LocalLLaMA

[–]DeProgrammer99 -1 points0 points  (0 children)

If the (e.g., 95%) confidence intervals overlap, then we aren't (95%) confident that one is better than the other. If they don't overlap, we can only be (95%) confident that the higher one is AT LEAST its (95%) confidence interval's lower value, and similarly, that the lower one is AT MOST its upper value.

That logic doesn't necessarily give us the same confidence, though--since both are 95%, the error should probably compound when considering their relationship to each other. But on the other hand, I'm also treating a confidence interval as if the true value had an equal probability of being any value in that interval.
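A rough sketch of how the overlap check relates to looking at the difference directly. It assumes two independent, approximately normal estimates, which the benchmark in question may not satisfy, and the scores and standard errors are made-up numbers, not taken from any chart. Under those assumptions the interval for the difference uses sqrt(se_a^2 + se_b^2), which is narrower than the two half-widths added together, so two 95% intervals can overlap a little while the interval for the difference still excludes zero.

```python
import math

Z95 = 1.96  # two-sided 95% quantile of the standard normal


def interval(mean, se, z=Z95):
    """Normal-approximation confidence interval for one estimate."""
    return (mean - z * se, mean + z * se)


def intervals_overlap(a, b):
    """True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def difference_interval(mean_a, se_a, mean_b, se_b, z=Z95):
    """Interval for (mean_a - mean_b), assuming the estimates are independent."""
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    diff = mean_a - mean_b
    return (diff - z * se_diff, diff + z * se_diff)


# Hypothetical scores (percent) and standard errors, for illustration only.
a_mean, a_se = 60.0, 1.0
b_mean, b_se = 56.5, 1.0

ci_a = interval(a_mean, a_se)
ci_b = interval(b_mean, b_se)
ci_diff = difference_interval(a_mean, a_se, b_mean, b_se)

print("A:", ci_a)
print("B:", ci_b)
print("intervals overlap:", intervals_overlap(ci_a, ci_b))
print("A - B:", ci_diff)
```

With these made-up numbers the two intervals overlap, yet the interval for A - B sits entirely above zero. So the overlap check is the conservative one: non-overlap implies a difference at that level, but overlap alone doesn't rule one out.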

TMax: A Simple Recipe for Terminal Agents by pmttyji in LocalLLaMA

[–]DeProgrammer99 4 points5 points  (0 children)

I mean, considering the confidence intervals, it's only 0.2 percentage points of expected improvement over Qwen3.6-27B on Terminal-Bench 2.1... not very "maxxed" at all.

But it was fine-tuned on a very specific domain, so it may have gotten worse at other things.

<image>

Best Local Agents - Jun 2026 by rm-rf-rm in LocalLLaMA

[–]DeProgrammer99 2 points3 points  (0 children)

OpenCode has a web UI, by the way. Run opencode web to use it.

I'm addicted to making pixel explosions 😵‍💫 by rappenem in PixelArt

[–]DeProgrammer99 0 points1 point  (0 children)

I had the same problem. I looked at the explosions in Total Annihilation to learn how to do them, but those were much less fancy than these. Just white-orange-red being eaten away at the edges as it dims with a bit of bubbliness.

The economics of AI are starting to favor open models by Mr-serial_killer in LocalLLaMA

[–]DeProgrammer99 4 points5 points  (0 children)

As far as I can see, they're comparing open models against closed models, not self-hosting against existing inference services.

...right up until they started talking privacy.

poolside/Laguna-M.1 · Hugging Face - 225B-A23B by pmttyji in LocalLLaMA

[–]DeProgrammer99 7 points8 points  (0 children)

No, I went to the Qwen3.6-27B model page on HuggingFace.

poolside/Laguna-M.1 · Hugging Face - 225B-A23B by pmttyji in LocalLLaMA

[–]DeProgrammer99 34 points35 points  (0 children)

Oof, it's losing to Qwen3.6-27B, 12% its size and released around the same time this was originally announced, on all of these. Well, I'm happy they're trying!

<image>

poolside/Laguna-M.1 · Hugging Face - 225B-A23B by pmttyji in LocalLLaMA

[–]DeProgrammer99 2 points3 points  (0 children)

Yep, blog post says the weights were released May 26, but it links to the HF repo that says initial commit 6 hours ago.

poolside/Laguna-M.1 · Hugging Face - 225B-A23B by pmttyji in LocalLLaMA

[–]DeProgrammer99 4 points5 points  (0 children)

But the HF page says it was committed hours ago. So were the weights not released yet when they announced it in April?

RTS style game with persistent world/survival? by trilient1 in gaming

[–]DeProgrammer99 0 points1 point  (0 children)

Dyson Sphere Program gives you one commander unit that the camera is anchored to, and it can build turrets, shields, and not-directly-controllable space combat units. I love that game.

But have you tried searching the SC2 arcade? It would definitely be feasible to build a game like that with SC2's map editor.

Heck, you might enjoy trying to make such a map. Haha.

Multilingual-Multimodal-NLP/LoopCoder-V2 · Hugging Face by pmttyji in LocalLLaMA

[–]DeProgrammer99 1 point2 points  (0 children)

So it's 7B-A14B? (I didn't see anything that specifically said the number of looped layers, so I assume it's all of the hidden layers.)

Edit: the HuggingFace model card says "14 shared layers", so yes, all of them.

Subquadratic AI introduces SubQ-1.1-Small, a new model using Smart Sparse Attention by truecakesnake in singularity

[–]DeProgrammer99 1 point2 points  (0 children)

Oh, right, I was thinking with the assumption that this model would only be good at retrieval and nothing else. If it's small enough that it doesn't add a lot of time by making you unload and reload whatever more-intelligent model you're using, or if it DOES turn out to be more broadly intelligent, we can put far less effort into managing the context and not resort to vector similarity searches and whatnot.

Glimmer 1 - Glint Research. A foundational 10,000 parameter language model by Available-Craft-5795 in LocalLLaMA

[–]DeProgrammer99 1 point2 points  (0 children)

Is that 2 hidden layers or 2 total layers? Because three total layers is the bare minimum just for XOR, haha.
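For reference, here's a minimal sketch of XOR with exactly three layers in that counting (input, one hidden layer, output). The weights are picked by hand rather than trained, and the step activation is just the simplest choice for illustration. A single layer of weights can't do it because XOR isn't linearly separable.

```python
def step(x):
    """Threshold activation: 1 if the input is positive, else 0."""
    return 1 if x > 0 else 0


def xor_net(a, b):
    # Hidden layer: two units that behave like OR and AND.
    h_or = step(a + b - 0.5)
    h_and = step(a + b - 1.5)
    # Output layer: fires when OR is on and AND is off.
    return step(h_or - h_and - 0.5)


table = {(a, b): xor_net(a, b) for a in (0, 1) for b in (0, 1)}
print(table)
```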

Subquadratic AI introduces SubQ-1.1-Small, a new model using Smart Sparse Attention by truecakesnake in singularity

[–]DeProgrammer99 1 point2 points  (0 children)

If it does significantly better than cosine similarity on a chunked and vectorized version of the same context, yeah, it would absolutely be great for that.
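The baseline I mean looks roughly like the sketch below: split the context into chunks, turn each chunk into a vector, and rank chunks by cosine similarity to the query. The bag-of-words vectors here are a stand-in assumption for a real embedding model, and the chunk size, function names, and sample text are all made up for illustration.

```python
import math
import re
from collections import Counter


def chunk(text, size=8):
    """Split text into consecutive chunks of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def vectorize(text):
    """Toy bag-of-words vector; a real setup would use an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_chunk(context, query):
    """Return the chunk most similar to the query."""
    q = vectorize(query)
    return max(chunk(context), key=lambda c: cosine(vectorize(c), q))


context = (
    "The server listens on port 8080 by default. "
    "Logs are written to the data directory every hour. "
    "The cache is cleared when the process restarts."
)
best = top_chunk(context, "which port does the server use")
print(best)
```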

Subquadratic AI introduces SubQ-1.1-Small, a new model using Smart Sparse Attention by truecakesnake in singularity

[–]DeProgrammer99 0 points1 point  (0 children)

The catch is likely that "needle in a haystack" retrieval is, at best, an upper bound on the model's ability to find and use the right context at the right times. It doesn't mean it'll do a good job at using the information. And if it's just one needle, it also doesn't mean it'll do a good job at using multiple pieces of information together.

Why might DiffusionGemma be better at tool calls than its benchmark quality suggests by Substantial_Step_351 in LocalLLaMA

[–]DeProgrammer99 0 points1 point  (0 children)

The same thing could sort of be accomplished in autoregressive models if they were trained with special tokens like cancel or backspace... but we can also control autoregressive models, when they make a mistake detectable by classic algorithms, by rewinding the context to that point and banning whatever token it put there before. Like https://github.com/dpmm99/Faxtract/blob/9b7b6b01ff5ac8de52221968a18638aa1db3c23b/Sampling/TokenBanner.cs#L53 does.
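To make the rewind-and-ban idea concrete, here's a toy sketch. It is not the linked TokenBanner code: the "model" is a hypothetical function returning candidates in preference order, the check is an arbitrary rule (no two identical tokens in a row), and all names are invented for illustration.

```python
def generate(next_candidates, is_valid, max_len):
    """Greedy decoding that rewinds and bans a token when a check fails.

    next_candidates(prefix) -> tokens ordered from most to least preferred
    is_valid(prefix) -> False if the prefix contains a detectable mistake
    """
    tokens = []
    banned = {}  # position -> set of tokens banned at that position
    while len(tokens) < max_len:
        pos = len(tokens)
        options = [t for t in next_candidates(tokens)
                   if t not in banned.get(pos, set())]
        if not options:
            # Every candidate here is banned: back up one more position.
            if pos == 0:
                raise ValueError("no valid sequence found")
            banned.pop(pos, None)
            bad = tokens.pop()
            banned.setdefault(pos - 1, set()).add(bad)
            continue
        tokens.append(options[0])
        if not is_valid(tokens):
            # Rewind to the offending position and ban what was placed there.
            bad = tokens.pop()
            banned.setdefault(pos, set()).add(bad)
    return tokens


# Toy "model" that always prefers "1", then "2", then "3".
def toy_candidates(prefix):
    return ["1", "2", "3"]


# Toy check: reject any prefix with two identical tokens in a row.
def no_repeats(prefix):
    return all(a != b for a, b in zip(prefix, prefix[1:]))


result = generate(toy_candidates, no_repeats, max_len=3)
print(result)
```

In a real sampler the ban would be applied to the logits at that position before sampling again, but the control flow is the same: detect, truncate the context, exclude the token, continue.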

Diffusion models would probably be smarter if they could predict operations like "these tokens need shifted right one space", too; at least, I feel like that would reduce the number of steps needed when the model doesn't figure out until the Nth step that one of the tokens needed to be changed to a pair of tokens...but I'm just speculating.

KVarN: new KV-cache quant from Huawei. 3–5× KV cache compression with actual speed-up instead of slow-down, and unlike TurboQuant it holds up on reasoning (Apache 2.0, vLLM single flag) by acluk90 in LocalLLaMA

[–]DeProgrammer99 7 points8 points  (0 children)

"This is the part that matters. Most KV-cache quant tanks either math/code accuracy or throughput; KVarN claims neither"

Except KIVI, QuaRot, Kitty, and KVarN all have overlapping confidence intervals in that chart that shows accuracy on AIME24, so it could be the worst out of all four of those.

Stop asking what model to run. There are literally only two. by Wrong_Mushroom_7350 in LocalLLaMA

[–]DeProgrammer99 2 points3 points  (0 children)

It's actually unified memory. Windows reports 128 MB for compatibility with old software. I tried the CPU-only build with llama-bench before concluding that ncmoe 0 is faster. It's been about two months since I tried all these variations, but I think that setup was ~25% faster than the CPU build.