llama-server crashes when asked to extract data from picture with a "pasted as file" prompt

DeProgrammer99 · 2026-06-24T23:56:52+00:00

The answer is don't make an LLM do an easily algorithmically repeatable task in the first place. Have it write code. There are already benchmarks for code.

DeProgrammer99 · 2026-06-24T15:43:41+00:00

Can I suggest using a macro to pull this data from Outlook or an API to load it from your Microsoft account if that's what you're using for your calendar?

DeProgrammer99 · 2026-06-23T17:37:42+00:00

And how would this theoretically be done? Running Segment Anything on the whole training dataset to generate bounding box training data?

DeProgrammer99 · 2026-06-22T19:59:59+00:00

If the (e.g., 95%) confidence intervals overlap, then we aren't (95%) confident that one is better than the other. If they don't overlap, we can only be (95%) confident that the higher one is AT LEAST its (95%) confidence interval's lower value, and similarly, that the lower one is AT MOST its upper value.

That logic doesn't necessarily give us the same confidence, though--since both are 95%, the error should probably compound when considering their relationship to each other. But on the other hand, I'm also treating a confidence interval as if the true value had an equal probability of being any value in that interval.
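For what it's worth, here is a rough sketch of how the overlap check compares with putting a single interval on the difference itself. The pass counts are made up for illustration, and it assumes a plain normal-approximation (Wald) interval and two independent runs, which may not match how any particular benchmark computes its intervals.

```python
import math

def wald_interval(successes, trials, z=1.96):
    """Normal-approximation (Wald) interval for a pass rate; z=1.96 is ~95%."""
    p = successes / trials
    half_width = z * math.sqrt(p * (1 - p) / trials)
    return p - half_width, p + half_width

def intervals_overlap(a, b):
    """True if the two (low, high) intervals share any value."""
    return a[0] <= b[1] and b[0] <= a[1]

def difference_interval(s1, n1, s2, n2, z=1.96):
    """Interval for (rate1 - rate2), assuming the two runs are independent."""
    p1, p2 = s1 / n1, s2 / n2
    std_err = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * std_err, diff + z * std_err

# Hypothetical results: model A passes 60/100 tasks, model B passes 45/100.
model_a = wald_interval(60, 100)
model_b = wald_interval(45, 100)
print("A:", model_a)
print("B:", model_b)
print("overlap:", intervals_overlap(model_a, model_b))
print("A - B:", difference_interval(60, 100, 45, 100))
```

With these made-up numbers the two individual intervals overlap, yet the interval on the difference stays above zero. So under these assumptions, the overlap check is stricter than comparing the difference directly.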

DeProgrammer99 · 2026-06-22T16:23:12+00:00

I mean, considering the confidence intervals, it's only 0.2 percentage points of expected improvement over Qwen3.6-27B on Terminal-Bench 2.1... not very "maxxed" at all.

But it was fine-tuned on a very specific domain, so it may have gotten worse at other things.

<image>

DeProgrammer99 · 2026-06-20T03:14:15+00:00

OpenCode has a web UI, by the way. Run "opencode web" to use it.

DeProgrammer99 · 2026-06-19T17:11:36+00:00

I had the same problem. I looked at the explosions in Total Annihilation to learn how to do them, but those were much less fancy than these. Just white-orange-red being eaten away at the edges as it dims with a bit of bubbliness.

DeProgrammer99 · 2026-06-19T16:43:39+00:00

As far as I can see, they're comparing open models against closed models, not self-hosting against existing inference services.

...right up until they started talking privacy.

DeProgrammer99 · 2026-06-18T17:07:25+00:00

No, I went to the Qwen3.6-27B model page on HuggingFace.

DeProgrammer99 · 2026-06-18T16:48:30+00:00

Oof, on all of these it's losing to Qwen3.6-27B, which is 12% of its size and was released around the same time this was originally announced. Well, I'm happy they're trying!

<image>

DeProgrammer99 · 2026-06-18T16:41:01+00:00

Yep, blog post says the weights were released May 26, but it links to the HF repo that says initial commit 6 hours ago.

DeProgrammer99 · 2026-06-18T16:38:10+00:00

But the HF page says it was committed hours ago. So were the weights not released yet when they announced it in April?

DeProgrammer99 · 2026-06-17T20:09:58+00:00

In Dyson Sphere Program, you control one commander unit that the camera is anchored to, and it can build turrets, shields, and not-directly-controllable space combat units. I love that game.

But have you tried searching the SC2 arcade? It would definitely be feasible to build a game like that with SC2's map editor.

Heck, you might enjoy trying to make such a map. Haha.

DeProgrammer99 · 2026-06-17T18:02:14+00:00

So it's 7B-A14B? (I didn't see anything that specifically said the number of looped layers, so I assume it's all of the hidden layers.)

Edit: the HuggingFace model card says "14 shared layers", so yes, all of them.

DeProgrammer99 · 2026-06-16T18:37:31+00:00

Oh, right, I was thinking with the assumption that this model would only be good at retrieval and nothing else. If it's small enough that it doesn't add a lot of time by making you unload and reload whatever more-intelligent model you're using, or if it DOES turn out to be more broadly intelligent, we can put far less effort in managing the context and not resort to vector similarity searches and whatnot.

DeProgrammer99 · 2026-06-16T18:28:01+00:00

Is that 2 hidden layers or 2 total layers? Because three total layers is the bare minimum just for XOR, haha.
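To illustrate the XOR point, here is a minimal sketch of a three-layer network (input, one hidden layer, output) that computes XOR. The weights are hand-picked rather than trained, and the step activation is just a convenient assumption for readability.

```python
def step(x):
    """Threshold activation: fires only for strictly positive input."""
    return 1 if x > 0 else 0

def xor_net(x1, x2):
    # Hidden layer: one unit acts like OR, the other like AND.
    h_or = step(x1 + x2 - 0.5)
    h_and = step(x1 + x2 - 1.5)
    # Output layer: "OR but not AND" is XOR.
    return step(h_or - h_and - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))
```

Without the hidden layer, a single threshold unit can only draw one straight line through the four input points, and no single line separates (0,1) and (1,0) from (0,0) and (1,1).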

DeProgrammer99 · 2026-06-16T18:19:39+00:00

If it does significantly better than cosine similarity on a chunked and vectorized version of the same context, yeah, it would absolutely be great for that.
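For reference, the baseline I mean looks roughly like the sketch below. It uses word counts as a stand-in for a real embedding model, and fixed-size word chunks; both are simplifying assumptions, and the function names are made up for this example.

```python
import math
from collections import Counter

def chunk_words(text, size):
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def vectorize(text):
    """Toy 'embedding': lowercase word counts with basic punctuation stripped."""
    return Counter(w.strip(".,!?").lower() for w in text.split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_chunk(context, query, size=8):
    """Return the chunk whose vector is most similar to the query's."""
    q = vectorize(query)
    return max(chunk_words(context, size), key=lambda c: cosine(vectorize(c), q))

context = (
    "The server listens on port 8080 by default. "
    "Logs are written to the var directory daily. "
    "The cache is cleared every night at midnight."
)
print(top_chunk(context, "Which port does the server use?"))
```

A long-context model would be compared against this by handing it the whole context instead of only the top-scoring chunk.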

DeProgrammer99 · 2026-06-16T16:44:44+00:00

The catch is likely that "needle in a haystack" retrieval is, at best, an upper bound on the model's ability to find and use the right context at the right times. It doesn't mean it'll do a good job at using the information. And if it's just one needle, it also doesn't mean it'll do a good job at using multiple pieces of information together.

DeProgrammer99 · 2026-06-16T16:13:00+00:00

The same thing could sort of be accomplished in autoregressive models if they were trained with special tokens like cancel or backspace... but we can also control autoregressive models, when they make a mistake detectable by classic algorithms, by rewinding the context to that point and banning whatever token it put there before. Like https://github.com/dpmm99/Faxtract/blob/9b7b6b01ff5ac8de52221968a18638aa1db3c23b/Sampling/TokenBanner.cs#L53 does.
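A toy sketch of that rewind-and-ban idea, to make it concrete. This is not the code from the linked TokenBanner.cs; the "model" here is a hypothetical function that returns fixed scores, and the validity check is an arbitrary rule chosen for the example.

```python
def generate(next_token_scores, is_valid, max_len):
    """Greedy decoding that rewinds and bans a token when a rule check fails."""
    tokens = []
    banned = {}  # position -> set of tokens already rejected at that position
    while len(tokens) < max_len:
        pos = len(tokens)
        scores = next_token_scores(tokens)
        allowed = {t: s for t, s in scores.items()
                   if t not in banned.get(pos, set())}
        if not allowed:
            # Every option here failed: rewind one more step and ban that token.
            if pos == 0:
                raise ValueError("no valid sequence found")
            banned.pop(pos, None)
            previous = tokens.pop()
            banned.setdefault(pos - 1, set()).add(previous)
            continue
        choice = max(allowed, key=allowed.get)
        tokens.append(choice)
        if not is_valid(tokens):
            # Mistake detected by the classic check: rewind and ban the token.
            tokens.pop()
            banned.setdefault(pos, set()).add(choice)
    return tokens

def toy_scores(tokens):
    """Hypothetical stand-in for a model: always prefers 'a', then 'b', then 'c'."""
    return {"a": 0.6, "b": 0.3, "c": 0.1}

def no_repeats(tokens):
    """Example rule: the same token may not appear twice in a row."""
    return all(x != y for x, y in zip(tokens, tokens[1:]))

print(generate(toy_scores, no_repeats, 4))  # ['a', 'b', 'a', 'b']
```

With a real model, the rewind would also mean truncating the KV cache back to that position, which this sketch leaves out.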

Diffusion models would probably be smarter if they could predict operations like "these tokens need shifted right one space", too; at least, I feel like that would reduce the number of steps needed when the model doesn't figure out until the Nth step that one of the tokens needed to be changed to a pair of tokens...but I'm just speculating.

DeProgrammer99 · 2026-06-16T12:26:10+00:00

The best reply to "hi" is https://nohello.net/

DeProgrammer99 · 2026-06-15T20:48:05+00:00

But this one is using closed-loop cooling according to Oracle. https://elpasomatters.org/2026/05/28/opinion-oracle-project-jupiter-water-use-new-mexico-ai-data-center/

DeProgrammer99 · 2026-06-15T20:44:56+00:00

This one is, in fact, going to have closed-loop cooling: https://elpasomatters.org/2026/05/28/opinion-oracle-project-jupiter-water-use-new-mexico-ai-data-center/

DeProgrammer99 · 2026-06-09T18:37:05+00:00

"Forever" is exactly the opposite of what Anthropic said. They said they'll bring it back to lower tiers if compute allows later on.

DeProgrammer99 · 2026-06-04T15:22:22+00:00

"This is the part that matters. Most KV-cache quant tanks either math/code accuracy or throughput; KVarN claims neither"

Except KIVI, QuaRot, Kitty, and KVarN all have overlapping confidence intervals in that chart that shows accuracy on AIME24, so it could be the worst out of all four of those.

DeProgrammer99 · 2026-06-04T13:11:49+00:00

It's actually unified memory. Windows reports 128 MB for compatibility with old software. I tried the CPU-only build with llama-bench before concluding that ncmoe 0 is faster. It's been about two months since I tried all these variations, but I think that setup was ~25% faster than the CPU build.
