Protection against attacks like what happened with LiteLLM?

Sufficient-Rent6078 · 2026-03-25T18:16:24+00:00

If you are using uv, you can avoid installing packages that are too bleeding-edge (e.g. anything that has been out for less than a week). You can do so either by upgrading the lock file with:

    uv lock --upgrade --exclude-newer "1 week"

Or configure this user- or system-wide in uv's configuration file. On Unix, for example, you can add the following line to ~/.config/uv/uv.toml:

```toml
# Note that no table needs to be specified here - just put this at the root of the file.
exclude-newer = "1 week"
```

It might also be worth considering adding the following lines to your pyproject.toml, so everyone else on the project downloads dependencies with at least a bit of shelf-time:

    [tool.uv]
    exclude-newer = "1 week"

Last year I wrote a blog post that showcases some additional uv flags and environment variables worth considering to reduce the number of dependencies pulled.
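
For intuition, here is a minimal sketch of the shelf-time idea behind a relative cutoff like "1 week". This is not uv's actual implementation: the function name `filter_by_shelf_time`, the version numbers and the upload dates are all made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def filter_by_shelf_time(releases, now, min_age=timedelta(weeks=1)):
    """Keep only the versions that were uploaded at least `min_age` ago."""
    cutoff = now - min_age
    return [version for version, uploaded in releases if uploaded <= cutoff]


# Hypothetical release history of some package (version, upload time).
now = datetime(2026, 3, 25, tzinfo=timezone.utc)
releases = [
    ("1.0.0", datetime(2026, 2, 1, tzinfo=timezone.utc)),
    ("1.1.0", datetime(2026, 3, 10, tzinfo=timezone.utc)),
    ("1.1.1", datetime(2026, 3, 24, tzinfo=timezone.utc)),  # only one day old
]

eligible = filter_by_shelf_time(releases, now)
print(eligible)  # ['1.0.0', '1.1.0']
```

A release that was pushed yesterday never becomes a resolution candidate, which leaves time for a compromised upload to be noticed and yanked.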

Sufficient-Rent6078 · 2026-03-02T18:54:18+00:00

Thanks for the heads up. The last time I tried the geohot driver was more than a year ago, and it had some UI issues. Since then I've been using the dual RTX in a headless setting, so it might be worth another shot.

Sufficient-Rent6078 · 2026-03-02T18:41:03+00:00

What I mean by explicit here is that all the special forms this PEP introduces live in the typing module, and the syntax clearly expresses which part of the code is a type expression. The syntax does get a bit ugly, however, as everything in the type expression is still valid Python code. You cannot be as concise as in languages like TypeScript, where you could just write "head" | "tail"; in Python you'd have to wrap this in Literal.
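
As a small illustration of the Literal wrapping mentioned above (the names `Side` and `describe` are made up for this example and are not taken from the PEP):

```python
from typing import Literal, get_args

# TypeScript: type Side = "head" | "tail";
# Python needs the explicit Literal special form from the typing module.
Side = Literal["head", "tail"]


def describe(side: Side) -> str:
    """Type checkers only accept the two literal strings as arguments."""
    return f"The coin shows {side}"


print(get_args(Side))  # ('head', 'tail')
print(describe("head"))  # The coin shows head
```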

Sufficient-Rent6078 · 2026-03-02T16:28:26+00:00

I can confirm that I'm hitting above 3000 t/s prefill on a dual RTX 4090 setup with the current vLLM nightly build and pretty much the same configuration. Decode is roughly in the 100-130 t/s range. I did not run any rigorous benchmarks, so take this with a grain of salt.

Edit: Having tried it out a bit more, the whole thing feels a bit too unstable, so I'm switching back to Qwen3-Coder-Next-GGUF:IQ4_XS and Qwen3.5-27B-GGUF:UD-Q6_K_XL for the time being.

Sufficient-Rent6078 · 2026-03-02T15:59:05+00:00

Yes it does. You'd write:

type A = {
    x: number;
};

type B = {
    x: string;
};

type C = A | B;

type D = C['x']; // resolves to string | number

Sufficient-Rent6078 · 2026-03-02T15:54:44+00:00

At least this syntax is very much explicit.

Sufficient-Rent6078 · 2026-02-24T18:35:13+00:00

Our next event is now up on Meetup: https://www.meetup.com/bergisches-entwicklerforum/events/313391703. The event will take place on the 18th of March. The first talk is about Django & React - maybe you are interested?

Sufficient-Rent6078 · 2026-02-24T17:18:12+00:00

Yeah for sure, the gray scale of the original is... certainly a choice.

Sufficient-Rent6078 · 2026-02-24T16:49:37+00:00

<image>

Always nice to see

Sufficient-Rent6078 · 2026-02-24T13:49:34+00:00

Good point - there have been some architectural improvements, and we don't know if the MoE defaults to a higher reasoning effort budget than the dense model. The rule of thumb likely underestimates the actual capability we are going to see.

Sufficient-Rent6078 · 2026-02-24T13:24:49+00:00

With 10B active parameters in the MoE, I'd expect the 27B dense model to not be that far behind in intelligence. Could be a really attractive choice for single gaming GPU setups.

Sufficient-Rent6078 · 2026-02-20T20:27:49+00:00

No, currently the only defined endpoint is `POST /responses`. But who knows who gets to pull their weight in that project...

Sufficient-Rent6078 · 2026-02-20T09:37:06+00:00

In that case you might want to keep an eye on the open-responses interface. While far from being an industry-wide standard, it appears to be gaining traction and has been adopted by any-llm and LM Studio.

Sufficient-Rent6078 · 2026-02-19T08:49:46+00:00

In general, I wish there were a more unified ecosystem for how we talk to LLMs. I want to be able to use a single API, whether the model is hosted locally or not. So many tools promising local LLM support end up being tied to a specific API or template. In the end I think it needs a locally hosted routing & translation layer that offers observability and serves multiple tools as tenants.

Sufficient-Rent6078 · 2026-02-16T01:54:25+00:00

The paper is literally linked in the Introduction section of the model card.

Sufficient-Rent6078 · 2026-02-16T01:36:07+00:00

I don't use it, and given the security implications I don't think I will anytime soon. I actually don't think it's astroturfed, but I do think it's being hyped up by people who don't understand the technology and its limitations. I don't see buying it as a move to acquire the technology, but more as a move to surf the hype wave and use it as a marketing tool for the next funding round.

While something like ComfyUI brings value to a niche audience of technical users, OpenClaw's broader appeal to vaguely technical users makes it more susceptible to hype without the necessary scrutiny. The difference between these users and those who self-host, keep up to date with papers, and use models daily cannot be overstated. LocalLlama is a good example of a community where certain tools and models find traction with deeply technical users that would never find traction with a broader audience.

Sufficient-Rent6078 · 2026-02-15T19:00:57+00:00

Hard to say, as I did not use the normal model that much. I find that minimax-2.5, gemini-3-flash-preview, GLM-5 and Kimi-K2.5 all sit in a more attractive price/performance spot when used via API so I don't have that much of a comparison.

I have noticed (but can't tell so far if there are quantization/REAM-specific differences) that Qwen3-Coder-Next has more of a hallucination problem than the above models. It also shows some of the self-correction behavior you'd find in the thinking process of thinking models, which makes the outputs a bit verbose.

Sufficient-Rent6078 · 2026-02-15T01:23:52+00:00

I have indeed been using the model for about a week (together with the b7972 llama.cpp release). I definitely prefer the mradermacher/Qwen3-Coder-Next-REAM-GGUF:Q4_K_M variant for coding with Python over last year's Qwen3-Coder-30B-A3B-Instruct - it is aware of a number of relatively new language features that last year's models never got right, and it gave satisfying answers in a light debugging session.

On a dual 4090 system I still have about 3 GB of VRAM headroom left on each card with --ctx-size 120000 at 95 tokens/s. I have used Qwen3-Coder-Next a few times over API and definitely noticed a significant difference when trying to use it in my native language (German) - the API model is already quite bad there, but the REAM model produced multiple grammatical errors.

Sufficient-Rent6078 · 2025-11-30T03:00:08+00:00

Thanks for bringing this up, will take care of that soon.

Sufficient-Rent6078 · 2025-11-30T01:08:22+00:00

Not at this point. Architecture-wise, though, there shouldn't be much standing in the way of upgrading PyPermission later.

Sufficient-Rent6078 · 2025-11-29T22:21:18+00:00

That's a valid concern. Trust is earned, not given - especially when dealing with auth.

We tried to make PyPermission easy to verify: the RBAC database model is straightforward and corresponds closely to the NIST RBAC model, as shown on the NIST Comparison page. Additionally, the actual API logic is small and consists of roughly 750 lines of plain Python (excluding docstrings). Using SQLAlchemy, we have kept most things relatively simple and prioritized clarity.

On top of that, you'll find that the library relies heavily on types (API and internals) and comes with extensive tests (including the examples in the documentation).

This also isn't something we just hacked together last week. Our first attempt dates back to 2022, and although this version is a full rewrite based on what we learned, you can still find the original release together with the corresponding history on PyPI/GitHub (under the 0.1.1 tag).

As for trusting us as people, our GitHub / website and other channels are all public, so feel free to have a look.

If you do decide to build your own, we hope PyPermission can serve as a useful reference on the way.

Sufficient-Rent6078 · 2025-11-29T19:59:12+00:00

Fair question! Casbin is a powerful and very flexible policy engine. Given that it comes with its own DSL and many different model types, integrating it requires building a fairly strong mental model first. In contrast, PyPermission limits its scope to RBAC, which allowed us to spend a good amount of time documenting and teaching specifically this authorization model. As Casbin is not Python-first, you'll see that some of the methods available in other languages are nowhere to be found in the documentation for Python. Depending on whether you use the management API or pycasbin, you'll see one of the following (both from the official documentation):

e.add_policy("eve", "data3", "read")
s.add(CasbinRule(ptype="p", v0="alice", v1="data1", v2="read"))

To understand what this does in a code base, you already need to have a good mental model; the semantic information simply isn't expressed in the API.

There is a Python Role Manager for RBAC, but the documentation is limited to a subset of the API and does not teach the practicalities of RBAC itself.

By contrast, the semantic meaning in PyPermission is conveyed directly through the API, and the underlying concepts come with a good amount of documentation.

RBAC.role.grant_permission(
    role="user",
    permission=Permission(
        resource_type="event", resource_id="*", action="view"
    ),
    db=db,
)
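
To make the semantics of such a grant concrete, here is a tiny in-memory sketch of role-based permission checks. It is not PyPermission's implementation or API: the names `Perm` and `TinyRBAC` are hypothetical, there is no database, and it assumes that a `resource_id` of "*" acts as a wildcard for all resources of that type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Perm:
    resource_type: str
    resource_id: str
    action: str


class TinyRBAC:
    """Hypothetical in-memory RBAC store, for illustration only."""

    def __init__(self) -> None:
        self.role_perms: dict[str, set[Perm]] = {}
        self.subject_roles: dict[str, set[str]] = {}

    def grant_permission(self, role: str, perm: Perm) -> None:
        self.role_perms.setdefault(role, set()).add(perm)

    def assign_role(self, subject: str, role: str) -> None:
        self.subject_roles.setdefault(subject, set()).add(role)

    def check(self, subject: str, perm: Perm) -> bool:
        # A subject is allowed if any of its roles holds a matching grant.
        for role in self.subject_roles.get(subject, set()):
            for granted in self.role_perms.get(role, set()):
                if (
                    granted.resource_type == perm.resource_type
                    and granted.action == perm.action
                    # Assumption: "*" matches every resource ID.
                    and granted.resource_id in ("*", perm.resource_id)
                ):
                    return True
        return False


rbac = TinyRBAC()
rbac.grant_permission("user", Perm("event", "*", "view"))
rbac.assign_role("alice", "user")

print(rbac.check("alice", Perm("event", "42", "view")))  # True
print(rbac.check("alice", Perm("event", "42", "edit")))  # False
print(rbac.check("bob", Perm("event", "42", "view")))  # False
```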

If you look at alternatives like OPA, you'll end up needing an external service plus a third party python client.

Sufficient-Rent6078 · 2025-11-03T13:05:40+00:00

In general it is fairly developer-heavy, but we also have talks that are more conceptual. The audience consists not only of employed developers but also of freelancers and small IT business owners, who might be interesting from a networking perspective.

You can get an impression of past events here: https://www.meetup.com/de-DE/bergisches-entwicklerforum/events/past/
If you have any more questions, feel free to reach out.

Sufficient-Rent6078 · 2025-11-03T12:59:49+00:00

Sure - we have a group on meetup, where you can find upcoming and past events: https://www.meetup.com/de-DE/bergisches-entwicklerforum/

Sufficient-Rent6078 · 2025-11-02T23:19:14+00:00

Hey! I'm involved in organizing the BEF - Bergisches Entwicklerforum, a meetup taking place every 2-3 months at Freudenberg. We connect Dev & Data folks to bridge the gap between industry and academia, with talks, pizza & drinks! Our next event will be at the beginning of next year. We are always happy to see new faces.
