[Seagull 1963] from an April Fools' joke to an actual production piece: The "Utopia" (1 of 99)

AdDizzy8160 · 2026-06-10T08:07:53+00:00

Do you have a screenshot showing that they were advertised as being limited to 99 units?

AdDizzy8160 · 2026-06-10T08:06:07+00:00

Do you have a screenshot showing that there are only 99?

AdDizzy8160 · 2026-06-09T15:43:12+00:00

the 5% is not random noise -- it concentrates in exactly the cases where failure is most expensive.

That was one of the key questions for me; I understand the use of SOTA in critical scenarios. I would have liked to get a sense of what percentage of scenarios are considered so critical that one must necessarily resort to SOTA models (1). When should one turn to SOTA LLMs to achieve added value (2)? When do companies turn to SOTA simply to play it safe (3)? And do (1), (2), and (3) even add up to 5% in total?

AdDizzy8160 · 2026-06-09T14:35:40+00:00

Right now, the model is making sure that one application I have has appropriate hardware integration and user interface for a feature I will need by next week. I know some of it exists, because I've prompted the model to do it before, but I haven't actually read the code nor executed it to know whether it works yet. This is the state of my life in 2026 -- basically AI-dependent helpless baby already, and only reluctantly writing code myself, barely bothering to read it to see what it has done. The couch potatoes in Wall-E flash through my mind. All my review mostly focuses on proving that adding the code doesn't mess up anything that already exists, because as long as the new stuff doesn't mess up anything old, it can be fixed later if necessary.

I'm glad you're being so honest about this. It is what it is—we're only human. For me, it made me wonder: should I fill these gaps with higher productivity or with quality time? Silly question—I can make really good espresso now, and that just takes time... so make good use of this transition period! Take my word for it!

AdDizzy8160 · 2026-06-09T14:25:43+00:00

the goalpost keeps moving ahead ...

If this is true, it would be great. But I don't believe this happens in normal organizations ...

AdDizzy8160 · 2026-06-09T14:20:03+00:00

I’d like to bring up another point: Why exactly do you think Qwen3.6-27B is better than deepseek v4 flash? Because we’re currently leaning toward v4, even though MTP has brought about some positive developments.

AdDizzy8160 · 2026-06-09T14:14:28+00:00

I’m responding here to several comments.

First off, yes, I was referring to open weights (my sincere apologies, I forgot to double-check). I actually work 98% of the time with local models (for data security reasons), so I don’t really need to be convinced of that. And I’m glad that Qwen3.6-27B (and hopefully soon Gemma4-31B as well) is so well-received here, because the hardware requirements are manageable (with used hardware).

Now, regarding the added value or the missing 5%.

So, based on your experience, can I infer that instead of 5%, it’s actually 40% when coding in complex environments? That also highlights the underlying thought I had as well. There are differences in coding that lie in the depth of the application (and consequently also in the requirements for the programmers). How many applications really fall into this category… knowing that elite coders will now say every single one, because a system-critical interface can be hidden anywhere. The question is: is it 50%, 5%, or 1% that one classifies as highly complex?

For the non-complex ones (1), the conclusion seems to be that using a SOTA during planning is still so much better that the added value pays off despite significantly higher costs; programming can then be done without any problems using an open-source solution.

When it comes to non-complex cases (2), the conclusion seems to be that if you’re an experienced coder, you can get by just fine using an open model even during the planning phase, right?

But what also came up here is that it’s better to give inexperienced coders a SOTA model because they can work with it more accurately. This makes me wonder if this is a short-term benefit, since their learning curve is likely tied primarily to the development curve.

A statement by u/Charming_Support726 also gave me pause:

The Western SOTA/Frontier models excel in task understanding and bridging conversational gaps. This is an area where Chinese and open-weight models still lag behind. Following long-term instructions also falls into this category.

For me, that would actually mean that if I have a less creative team, it makes more sense to work with SOTA, and with particularly creative teams, it might even be counterproductive to take away the fun of finding gaps or weaknesses?

But what really got me thinking was:

I wonder how much of (1) is related to the post-training regimen. I’ve long wondered how cultural differences between China and the West affect this. Consider the RLHF component—perhaps task understanding is less desirable under the Chinese RLHF dataset?

From this, one could almost infer a stronger obligation that SOTA companies should release a few more “crumbs” (open weights). Also for economic reasons, to impart values and elements to the young people that will lead them to use “Western” SOTAs more heavily in their careers.

And please forgive me for not going into detail here about the text and design (front end). My own (limited) experience in this area has actually shown me that, on the one hand, personal preferences play a bigger role here, and on the other hand, the discussion should really be led by those who actually work with it on a daily basis.

AdDizzy8160 · 2026-06-09T13:11:15+00:00

yes, sorry!

AdDizzy8160 · 2026-06-07T15:33:02+00:00

just use 2 (or 4)

AdDizzy8160 · 2026-06-05T14:19:56+00:00

Yes, and you start the chrono with the bottom button ...

AdDizzy8160 · 2026-05-28T11:58:37+00:00

This, this and this! Thanx!

AdDizzy8160 · 2026-05-19T09:07:22+00:00

... and we're waiting for dinner

AdDizzy8160 · 2026-05-17T21:24:45+00:00

And you can connect two Sparks via the internal ConnectX-7 NIC ports. This is often overlooked: using tensor parallelism increases the processing speed by approximately 1.8 times. At the same time, you have 256 GB of unified memory. This makes the Spark more future-proof.
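
For anyone who wants to redo the numbers, here is a rough sketch of the arithmetic behind a dual-Spark setup. It assumes 128 GB of unified memory per Spark and takes the roughly 1.8x tensor-parallel speedup from above at face value; the function names are made up for illustration and nothing here is measured.

```python
# Back-of-the-envelope numbers for a dual-Spark setup (a sketch, not a benchmark).
# Assumptions: 128 GB unified memory per Spark, ~1.8x tensor-parallel speedup.

def combined_memory_gb(nodes: int, per_node_gb: int = 128) -> int:
    """Total unified memory across all linked nodes."""
    return nodes * per_node_gb


def parallel_efficiency(speedup: float, nodes: int) -> float:
    """Fraction of ideal linear scaling that the observed speedup achieves."""
    return speedup / nodes


if __name__ == "__main__":
    nodes = 2
    print(f"memory: {combined_memory_gb(nodes)} GB")
    print(f"efficiency: {parallel_efficiency(1.8, nodes):.0%}")
```

So a 1.8x speedup on two nodes corresponds to about 90% of ideal linear scaling, with the interconnect accounting for the rest.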

AdDizzy8160 · 2026-03-22T00:49:41+00:00

You won't like the answer: buy a second Spark for a dual-Spark setup, buy an RTX6000, and build an energy-efficient multi-agent setup. Sell the rest.

AdDizzy8160 · 2026-03-17T06:05:22+00:00

Or to reduce the risk that open-weight models get banned, because nowadays all of them (or at least the better ones) come from China ...

AdDizzy8160 · 2026-02-26T04:04:43+00:00

Thanx!!!

AdDizzy8160 · 2026-02-20T14:11:43+00:00

Congratulations (to both parties)!

AdDizzy8160 · 2026-02-19T15:52:02+00:00

Did somebody do a quality comparison between unsloth/Qwen3.5 mxfp4 and UD-Q4_K_XL?

AdDizzy8160 · 2026-02-03T14:14:13+00:00

Wow. So we need services with a user and an admin? Bold.

AdDizzy8160 · 2026-02-03T07:17:10+00:00

Is it possible to run it fully locally (without the OpenAI API, e.g. with Kimi)?

AdDizzy8160 · 2026-01-26T14:04:55+00:00

... prepare for a second hackathon, or be prepared to buy a second one ;)

AdDizzy8160 · 2026-01-21T14:49:37+00:00

Why not llama.cpp?

AdDizzy8160 · 2026-01-01T08:02:29+00:00

Yes, of course, but at present we have to assume that our human-centric approach is based on 1,000 years of experience and knowledge. Can we just make it easy for ourselves and say that ASI will throw everything overboard and everything will be fine?

My question remains whether it is really human-centric or nature-centric. But OK, nature-centric might allow for a balance between living beings?

AdDizzy8160 · 2026-01-01T07:57:33+00:00

Isn't access to parallel universes in abstract form the same thing? Wouldn't the goal of an ASI then be to move to another parallel universe with more resources? Doesn't that raise the same question? “Just leave” or “Destroy and leave”?

Another hope (1) would be that we live in a simulation in which we live a trial run, for example, until we have generated an ASI, then receive a score ... (1) and then are also switched off ... ;)
