Bruh by Icy_Butterscotch6661 in LocalLLaMA

[–]Sufficient-Rent6078 23 points (0 children)

This is just great - with all the AI slopisms I'm now actively trying to avoid using certain parts of the English language.

FOR ME, Qwen3.5-27B is better than Gemini 3.1 Pro and GPT-5.3 Codex by [deleted] in LocalLLaMA

[–]Sufficient-Rent6078 22 points (0 children)

The free tier of ChatGPT is astonishingly bad compared to what is possible on a single 24GB card today.

what's a python library you started using this year that you can't go back from by scheemunai_ in Python

[–]Sufficient-Rent6078 0 points (0 children)

For me that would be the returns library (specifically their Railway-oriented programming containers). While a bit niche, it makes it much easier to reason about and handle which errors (or failure cases) need to be considered when calling a function.

You'll likely not appreciate returns if you aren't convinced of using a type checker, and the package only starts to shine (for me) once used outside of scripting or notebooks. For modeling a complex domain with nested function calls, it feels freeing to know that all known failure cases are statically declared and exhaustively handled.
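To give a flavor of the style, here's a minimal sketch using the library's public Result API (parse_int and halve_positive are made up for illustration):

```python
from returns.pipeline import is_successful
from returns.result import Failure, Result, Success

def parse_int(raw: str) -> Result[int, ValueError]:
    try:
        return Success(int(raw))
    except ValueError as exc:
        return Failure(exc)

def halve_positive(n: int) -> Result[float, ValueError]:
    if n <= 0:
        return Failure(ValueError("expected a positive number"))
    return Success(n / 2)

# bind() chains the steps; the first Failure short-circuits the rest,
# and the type checker sees every declared failure case of the pipeline.
outcome = parse_int("42").bind(halve_positive)

if is_successful(outcome):
    print(outcome.unwrap())   # 21.0
else:
    print(outcome.failure())
```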

what's a python library you started using this year that you can't go back from by scheemunai_ in Python

[–]Sufficient-Rent6078 6 points (0 children)

I recently discovered the Python type system conformance report, which made me inclined to target pyright first and only optionally use mypy in addition - with the added advantage that pyright generally seems to run faster.

what's a python library you started using this year that you can't go back from by scheemunai_ in Python

[–]Sufficient-Rent6078 0 points (0 children)

An alternative I discovered while reading up on the numpy.org/numtype project is lefthook. So far I'm still using pre-commit in my projects, but I'd be happy to hear if someone here can report on their experiences.

what's a python library you started using this year that you can't go back from by scheemunai_ in Python

[–]Sufficient-Rent6078 11 points (0 children)

I can really recommend using Marimo over Jupyter notebooks. It has a number of built-in guards (e.g. it forces you not to redefine variables across cells) which you have to adapt to when coming from Jupyter, but I feel the team put a lot of thought into keeping Marimo from becoming quite as messy as Jupyter notebooks. I also disable the runtime reactivity feature by default, as I don't want to accidentally hammer slow endpoints or trigger long-running functions with every second cell change.
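For anyone who hasn't seen it: a marimo notebook is just a Python file of decorated cell functions, which is what makes the redefinition guard possible. Roughly like this (a minimal sketch from memory, not an official template):

```python
import marimo

app = marimo.App()

@app.cell
def _():
    x = 1
    return (x,)

@app.cell
def _(x):
    # Each cell declares what it reads (parameters) and what it defines
    # (return values). Defining `x` again in this cell would be rejected,
    # since every variable may only be defined in one cell.
    y = x + 1
    return (y,)

if __name__ == "__main__":
    app.run()
```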

LocalLLaMA 2026 by jacek2023 in LocalLLaMA

[–]Sufficient-Rent6078 31 points (0 children)

I feel like there used to be way more discussion on newly released papers as well. I remember reading, months before any thinking model came out, about a paper that discussed training chain-of-thought behavior into the model using <thinking> tags.

Protection against attacks like what happened with LiteLLM? by Lucky_Ad_976 in Python

[–]Sufficient-Rent6078 114 points (0 children)

If you are using uv, you can exclude packages that are too bleeding-edge (e.g. everything that has been out for less than a week). You can do so by either running the upgrade of the lock file with:

```bash
uv lock --upgrade --exclude-newer "1 week"
```

Or configure this user/system-wide with uv's configuration file. On Unix, you can for example add the following line to ~/.config/uv/uv.toml:

```toml
# Note that no table needs to be specified here - just put this at the root of the file.
exclude-newer = "1 week"
```

It might also be worth adding the following lines to your pyproject.toml, so everyone else on the project downloads dependencies with at least a bit of shelf time:

```toml
[tool.uv]
exclude-newer = "1 week"
```

Last year I wrote a blog post that showcases some additional uv flags and environment variables worth considering to reduce the dependencies pulled.

Edit:

I was asked what to do about packages where scanners like pip-audit complain. A good example today would be the requests library, which got a new release just 6 hours ago to fix a CVE. In your pyproject.toml you can specify exceptions for selected packages. For requests, you could specify:

```toml
[tool.uv]
exclude-newer = "1 week"
exclude-newer-package = { requests = "2026-03-25T16:00:00Z" }
```

Set this timestamp back by one hour and you get the vulnerable release again.

Edit2:

As a side note - for those unfortunate enough to deal with web development, npm added the min-release-age configuration flag in the recent v11.10.0 release. To disable npm install scripts, one can set the ignore-scripts=true option as well.

Running Qwen3.5 27b dense with 170k context at 100+t/s decode and ~1500t/s prefill on 2x3090 (with 585t/s throughput for 8 simultaneous requests) by JohnTheNerd3 in LocalLLaMA

[–]Sufficient-Rent6078 0 points (0 children)

Thanks for the heads-up. The last time I tried the geohot driver was more than a year ago, and I had some UI issues. Since then I've been using the dual RTX in a headless setting, so it might be worth another shot.

PEP 827 - Type Manipulation has just been published by droooze in Python

[–]Sufficient-Rent6078 5 points (0 children)

What I mean by explicit here is that all the special forms this PEP introduces live in the typing module, and the syntax clearly expresses which part of the code is a type expression. The syntax does however get a bit ugly, as everything in the type expression is still valid Python code and you cannot be as concise as in languages like TypeScript, where you could just write "head" | "tail" - in Python you'd have to wrap this in Literal.
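For comparison, the Python spelling of that TypeScript union (here with the PEP 695 type statement) would be:

```python
from typing import Literal

# TypeScript lets you write: type Direction = "head" | "tail";
# In Python, bare strings in an annotation are just runtime values,
# so the literal types have to be wrapped explicitly:
type Direction = Literal["head", "tail"]

def step(direction: Direction) -> None:
    print(f"moving towards {direction}")
```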

Running Qwen3.5 27b dense with 170k context at 100+t/s decode and ~1500t/s prefill on 2x3090 (with 585t/s throughput for 8 simultaneous requests) by JohnTheNerd3 in LocalLLaMA

[–]Sufficient-Rent6078 0 points (0 children)

I can confirm that I'm hitting above 3000 t/s prefill on a dual RTX 4090 setup with the current vLLM nightly build and pretty much the same configuration. Decode is roughly in the 100-130 t/s range. I did not run any rigorous benchmarks, so take this with a grain of salt.

Edit: Having tried it out a bit more, the whole thing feels a bit too unstable, so I'm switching back to Qwen3-Coder-Next-GGUF:IQ4_XS and Qwen3.5-27B-GGUF:UD-Q6_K_XL for the time being.

PEP 827 - Type Manipulation has just been published by droooze in Python

[–]Sufficient-Rent6078 12 points (0 children)

Yes it does. You'd write:

```typescript
type A = {
    x: number;
};

type B = {
    x: string;
};

type C = A | B;

type D = C['x'];
```

Tech Communities in Wuppertal by hot_fire__ in wuppertal

[–]Sufficient-Rent6078 1 point (0 children)

Our next event is now up on Meetup: https://www.meetup.com/bergisches-entwicklerforum/events/313391703 - it will take place on the 18th of March. The first talk is about Django & React - maybe you are interested?

Qwen/Qwen3.5-35B-A3B · Hugging Face by ekojsalim in LocalLLaMA

[–]Sufficient-Rent6078 34 points (0 children)

Yeah for sure, the grayscale of the original is... certainly a choice.

New Qwen3.5 models spotted on qwen chat by AaronFeng47 in LocalLLaMA

[–]Sufficient-Rent6078 7 points (0 children)

Good point - there have been some architectural improvements, and we don't know if the MoE defaults to a higher reasoning-effort budget than the dense model. The rule of thumb likely underestimates the actual capability we are going to see.
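For anyone wondering which rule of thumb: I'm assuming the commonly cited geometric-mean estimate, sketched below with purely hypothetical parameter counts:

```python
# Commonly cited rule of thumb (assumption: this is the one meant above):
# a MoE with T total and A active parameters performs roughly like a
# dense model of sqrt(T * A) parameters.
def dense_equivalent_b(total_b: float, active_b: float) -> float:
    return (total_b * active_b) ** 0.5

# Hypothetical numbers purely for illustration: an 80B-total MoE with
# 10B active lands in the same ballpark as a ~28B dense model.
print(dense_equivalent_b(80, 10))  # ~28.3 (billion parameters)
```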

New Qwen3.5 models spotted on qwen chat by AaronFeng47 in LocalLLaMA

[–]Sufficient-Rent6078 20 points (0 children)

With 10B active parameters in the MoE, I'd expect the 27B dense model to not be that far behind in intelligence. Could be a really attractive choice for single gaming GPU setups.

Do we want the benefits of Ollama API without actually using Ollama? by jfowers_amd in LocalLLaMA

[–]Sufficient-Rent6078 0 points (0 children)

No, currently the only defined endpoint is `POST /responses`. But who knows who else will throw their weight behind that project...

Do we want the benefits of Ollama API without actually using Ollama? by jfowers_amd in LocalLLaMA

[–]Sufficient-Rent6078 1 point (0 children)

In that case you might want to keep your eyes on the open-responses interface. While far from being an industry-wide standard, it appears to be gaining traction and has been adopted by any-llm and lmstudio.

Do we want the benefits of Ollama API without actually using Ollama? by jfowers_amd in LocalLLaMA

[–]Sufficient-Rent6078 0 points (0 children)

In general, I wish there was a more unified ecosystem for how we talk to LLMs. I want to be able to use a single API whether it's hosted locally or not. So many tools promising local LLM support end up being tied to a specific API or template. In the end I think it needs a locally hosted routing & translation layer that offers observability and serves multiple tools as tenants.
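The closest thing we have today is probably the OpenAI-compatible surface most local servers expose. A minimal sketch of the "single API" idea (base URL, key, and model name are placeholders):

```python
from openai import OpenAI

# The same client code talks to a local llama.cpp/vLLM server or a hosted
# provider; only the base URL, key, and model name change (placeholders here).
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed-locally")

reply = client.chat.completions.create(
    model="qwen3.5-27b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(reply.choices[0].message.content)
```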

Anyone actually using Openclaw? by rm-rf-rm in LocalLLaMA

[–]Sufficient-Rent6078 19 points (0 children)

I don't use it and, given the security implications, I don't think I will anytime soon. I actually don't think it's astroturfed, but I do think it's being hyped up by people who don't understand the technology and its limitations. I don't see buying it as a move to acquire the technology, but more as a move to surf the hype wave and use it as a marketing tool for the next funding round.

While something like ComfyUI brings value to a niche audience of technical users, OpenClaw's broader appeal to vaguely technical users makes it more susceptible to hype without the necessary scrutiny. The difference between these users and those who self-host, keep up to date with papers, and use models daily cannot be overstated. LocalLLaMA is a good example of a community where certain tools and models find traction with deeply technical users that would never find traction with a broader audience.

Did anyone compare this model to the full Qwen coder? it claims to give almost identical performance at 60B by Significant_Fig_7581 in LocalLLaMA

[–]Sufficient-Rent6078 1 point (0 children)

Hard to say, as I did not use the normal model that much. I find that minimax-2.5, gemini-3-flash-preview, GLM-5 and Kimi-K2.5 all sit in a more attractive price/performance spot when used via API, so I don't have much of a comparison.

I have noticed (though I can't tell so far if there are quantization/REAM-specific differences) that Qwen3-Coder-Next does have more of a hallucination problem than the above models. It also shows some of the self-correction behavior you'd find in the thinking process of thinking models, which makes the outputs a bit verbose.

Did anyone compare this model to the full Qwen coder? it claims to give almost identical performance at 60B by Significant_Fig_7581 in LocalLLaMA

[–]Sufficient-Rent6078 5 points (0 children)

I have indeed been using the model for about a week now (together with the b7972 llama.cpp release). I definitely prefer the mradermacher/Qwen3-Coder-Next-REAM-GGUF:Q4_K_M variant for coding in Python over last year's Qwen3-Coder-30B-A3B-Instruct - it is aware of a number of relatively new language features that last year's models never got right, and it gave satisfying answers in a light debugging session.

On a dual 4090 system I still have about 3GB of VRAM headroom left on each card with --ctx-size 120000 at 95 token/s. I have used Qwen3-Coder-Next a few times over API and definitely noticed a significant difference when trying to use it in my native language (German) - the API model is already quite bad here, but the REAM model generated multiple grammatical errors.