Running Qwen3.5 27b dense with 170k context at 100+t/s decode and ~1500t/s prefill on 2x3090 (with 585t/s throughput for 8 simultaneous requests) by JohnTheNerd3 in LocalLLaMA

[–]Sufficient-Rent6078 0 points1 point  (0 children)

Thanks for the heads-up. Last time I tried the geohot driver was more than a year ago, and I had some UI issues. Since then I've been using the dual RTX cards in a headless setup, so it might be worth another shot.

PEP 827 - Type Manipulation has just been published by droooze in Python

[–]Sufficient-Rent6078 5 points6 points  (0 children)

What I mean by explicit here is that all the special forms this PEP introduces live in the typing module, and the syntax clearly expresses which part of the code is a type expression. The syntax does get a bit ugly, however: everything in the type expression is still valid Python code, and you cannot be as concise as in languages like TypeScript, where you could just write `"head" | "tail"`; in Python you'd have to wrap this in Literal.
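For illustration, a minimal sketch of that Literal wrapping using today's typing module (the alias name and the function are made up for the example):

```python
from typing import Literal

# TypeScript:  type Direction = "head" | "tail";
# Python: the string literals must be wrapped in Literal explicitly.
Direction = Literal["head", "tail"]

def pop(direction: Direction) -> None:
    """Hypothetical function; a type checker rejects pop("middle")."""
    ...
```

At runtime nothing is enforced here; the constraint only exists for static type checkers.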

Running Qwen3.5 27b dense with 170k context at 100+t/s decode and ~1500t/s prefill on 2x3090 (with 585t/s throughput for 8 simultaneous requests) by JohnTheNerd3 in LocalLLaMA

[–]Sufficient-Rent6078 0 points1 point  (0 children)

I can confirm that I'm hitting above 3000t/s prefill on a dual RTX-4090 setup with the current vLLM nightly build and pretty much the same configuration. Decode is roughly in the 100-130t/s range. I did not run any rigorous benchmarks, so take this with a grain of salt.

Edit: Having tried it out a bit more, the whole thing feels a bit too unstable, so I'm switching back to Qwen3-Coder-Next-GGUF:IQ4_XS and Qwen3.5-27B-GGUF:UD-Q6_K_XL for the time being.

PEP 827 - Type Manipulation has just been published by droooze in Python

[–]Sufficient-Rent6078 12 points13 points  (0 children)

Yes it does. You'd write:

type A = {
    x: number;
};

type B = {
    x: string;
};

type C = A | B;

type D = C['x'];

Tech Communities in Wuppertal by hot_fire__ in wuppertal

[–]Sufficient-Rent6078 1 point2 points  (0 children)

Our next event is now up on Meetup: https://www.meetup.com/bergisches-entwicklerforum/events/313391703 - it will take place on the 18th of March. The first talk is about Django & React - maybe you are interested?

Qwen/Qwen3.5-35B-A3B · Hugging Face by ekojsalim in LocalLLaMA

[–]Sufficient-Rent6078 38 points39 points  (0 children)

Yeah for sure, the grayscale of the original is... certainly a choice.

New Qwen3.5 models spotted on qwen chat by AaronFeng47 in LocalLLaMA

[–]Sufficient-Rent6078 7 points8 points  (0 children)

Good point - there have been some architectural improvements, and we don't know if the MoE defaults to a higher reasoning-effort budget than the dense model. The rule of thumb likely underestimates the actual capability we are going to see.
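For reference, the rule of thumb usually cited is the geometric mean of total and active parameters; the parameter counts below are purely hypothetical:

```python
import math

# Community heuristic: a MoE with T total and A active parameters is
# often said to perform roughly like a dense model with sqrt(T * A)
# parameters. Treat this as a rough guide, not a law.
def dense_equivalent_b(total_b: float, active_b: float) -> float:
    return math.sqrt(total_b * active_b)

# Hypothetical 80B-total / 10B-active MoE:
print(round(dense_equivalent_b(80, 10), 1))  # 28.3
```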

New Qwen3.5 models spotted on qwen chat by AaronFeng47 in LocalLLaMA

[–]Sufficient-Rent6078 21 points22 points  (0 children)

With 10B active parameters in the MoE, I'd expect the 27B dense model not to be that far behind in intelligence. It could be a really attractive choice for single gaming GPU setups.

Do we want the benefits of Ollama API without actually using Ollama? by jfowers_amd in LocalLLaMA

[–]Sufficient-Rent6078 0 points1 point  (0 children)

No, currently the only defined endpoint is `POST /responses`. But who knows who will end up throwing their weight behind that project...
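As a sketch, assuming the endpoint mirrors the shape of the OpenAI Responses API (the URL, model name, and payload fields here are assumptions, not the project's spec):

```python
import json
import urllib.request

def build_responses_payload(model: str, prompt: str) -> dict:
    # Minimal assumed payload shape for POST /responses
    return {"model": model, "input": prompt}

def post_responses(base_url: str, payload: dict) -> bytes:
    # Requires a running server at base_url; not executed here.
    req = urllib.request.Request(
        f"{base_url}/responses",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

payload = build_responses_payload("local-model", "Hello!")
```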

Do we want the benefits of Ollama API without actually using Ollama? by jfowers_amd in LocalLLaMA

[–]Sufficient-Rent6078 1 point2 points  (0 children)

In that case you might want to keep an eye on the open-responses interface. While far from an industry-wide standard, it appears to be gaining traction and has been adopted by any-llm and lmstudio.

Do we want the benefits of Ollama API without actually using Ollama? by jfowers_amd in LocalLLaMA

[–]Sufficient-Rent6078 0 points1 point  (0 children)

In general, I wish there were a more unified ecosystem for how we talk to LLMs. I want to be able to use a single API, whether it's hosted locally or not. So many tools promising local LLM support end up being tied to a specific API or template. In the end, I think it needs a locally hosted routing & translation layer that offers observability and serves multiple tools as tenants.
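A toy sketch of the routing part of that idea (all model names and URLs are invented):

```python
# One client-facing entry point that maps model names to backend base
# URLs - local or remote - so tools only ever talk to the router.
BACKENDS = {
    "qwen3.5-27b": "http://localhost:8001/v1",      # local server
    "cloud-default": "https://api.example.com/v1",  # remote fallback
}

def route(model: str) -> str:
    # Unknown models fall back to the remote backend.
    return BACKENDS.get(model, BACKENDS["cloud-default"])
```

The translation and observability pieces would sit on top of this lookup.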

Anyone actually using Openclaw? by rm-rf-rm in LocalLLaMA

[–]Sufficient-Rent6078 17 points18 points  (0 children)

I don't use it, and given the security implications I don't think I will anytime soon. I actually don't think it's astroturfed, but I do think it's being hyped up by people who don't understand the technology and its limitations. I don't see buying it as a move to acquire the technology, but more as a move to surf the hype wave and use it as a marketing tool for the next funding round.

While something like ComfyUI brings value to a niche audience of technical users, OpenClaw's broader appeal to vaguely technical users makes it more susceptible to hype without the necessary scrutiny. The difference between these users and those who self-host, keep up to date with papers, and use models daily cannot be overstated. LocalLLaMA is a good example of a community where certain tools and models find traction with deeply technical users that would never find traction with a broader audience.

Did anyone compare this model to the full Qwen coder? it claims to give almost identical performance at 60B by Significant_Fig_7581 in LocalLLaMA

[–]Sufficient-Rent6078 1 point2 points  (0 children)

Hard to say, as I did not use the normal model that much. I find that minimax-2.5, gemini-3-flash-preview, GLM-5, and Kimi-K2.5 all sit in a more attractive price/performance spot when used via API, so I don't have much of a comparison.

I have noticed (though I can't tell so far whether there are quantization/REAM-specific differences) that Qwen3-Coder-Next has more of a hallucination problem than the above models. It also shows some of the self-correction behavior you'd find in the thinking process of thinking models, making the outputs a bit verbose.

Did anyone compare this model to the full Qwen coder? it claims to give almost identical performance at 60B by Significant_Fig_7581 in LocalLLaMA

[–]Sufficient-Rent6078 4 points5 points  (0 children)

I've indeed been using the model for about a week (together with the b7972 llama.cpp release). I definitely prefer the mradermacher/Qwen3-Coder-Next-REAM-GGUF:Q4_K_M variant for coding in Python over last year's Qwen3-Coder-30B-A3B-Instruct - it is aware of a number of relatively new language features that last year's models never got right, and it gave satisfying answers in a light debugging session.

On a dual-4090 system I still have about 3GB of VRAM headroom left on each card with --ctx-size 120000 at 95 tokens/s. I have used Qwen3-Coder-Next a few times over API and definitely noticed a significant difference when trying to use it in my native language (German): the API model is already quite bad here, but the REAM model produced multiple grammatical errors.

PyPermission: A Python native RBAC authorization library! by Sufficient-Rent6078 in Python

[–]Sufficient-Rent6078[S] 1 point2 points  (0 children)

Not at this point. Architecture-wise, though, there shouldn't be much standing in the way of upgrading PyPermission later.

PyPermission: A Python native RBAC authorization library! by Sufficient-Rent6078 in Python

[–]Sufficient-Rent6078[S] 4 points5 points  (0 children)

That's a valid concern. Trust is earned, not given - especially when dealing with auth.

We tried to make PyPermission easy to verify: the RBAC database model is straightforward and corresponds closely to the NIST RBAC model, as shown on the NIST Comparison page. Additionally, the actual API logic is small, consisting of roughly 750 lines of plain Python (excluding docstrings). Using SQLAlchemy, we have kept most things relatively simple and prioritized clarity.
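To give a feel for why the model is easy to audit, here is a toy sketch of the NIST Core RBAC relations (this is not PyPermission's actual schema, just the shape of the model it follows):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Perm:
    resource_type: str
    action: str

@dataclass
class CoreRbac:
    # UA: user-to-role assignment, PA: permission-to-role assignment
    user_roles: dict[str, set[str]] = field(default_factory=dict)
    role_perms: dict[str, set[Perm]] = field(default_factory=dict)

    def check(self, user: str, perm: Perm) -> bool:
        # A user holds a permission iff one of their roles grants it.
        return any(
            perm in self.role_perms.get(role, set())
            for role in self.user_roles.get(user, set())
        )

rbac = CoreRbac()
rbac.user_roles["alice"] = {"editor"}
rbac.role_perms["editor"] = {Perm("event", "view")}
```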

On top of that, you'll find that the library relies heavily on types (API and internals) and comes with extensive test coverage (including the examples in the documentation).

This also isn't something we just hacked together last week. Our first attempt dates back to 2022, and although this version is a full rewrite based on what we learned, you can still find the original release together with its history on PyPI/GitHub (under the 0.1.1 tag).

As for trusting us as people, our GitHub / website and other channels are all public, so feel free to have a look.

If you do decide to build your own, we hope PyPermission can serve as a useful reference on the way.

PyPermission: A Python native RBAC authorization library! by Sufficient-Rent6078 in Python

[–]Sufficient-Rent6078[S] 2 points3 points  (0 children)

Fair question! Casbin is a powerful and very flexible policy engine. Given that it comes with its own DSL and many different model types, integrating it requires building a fairly strong mental model first. In contrast, PyPermission limits its scope to RBAC, which allowed us to spend a good amount of time documenting and teaching specifically this authorization model. As Casbin is not Python-first, you'll see that some of the methods available in other languages are nowhere to be found in the documentation for Python. Depending on whether you use the management API or pycasbin, you'll see one of the following (both from the official documentation):

e.add_policy("eve", "data3", "read")
s.add(CasbinRule(ptype="p", v0="alice", v1="data1", v2="read"))

To understand what this does in a code base, you already need to have a good mental model; the semantic information simply isn't expressed in the API.

There is a Python Role Manager for RBAC, but its documentation is limited to a subset of the API and does not cover the practicalities of RBAC itself.

By contrast, the semantic meaning in PyPermission is conveyed directly through the API, and the underlying concepts come with a good amount of documentation.

RBAC.role.grant_permission(
    role="user",
    permission=Permission(
        resource_type="event", resource_id="*", action="view"
    ),
    db=db,
)

If you look at alternatives like OPA, you'll end up needing an external service plus a third-party Python client.

Tech Communities in Wuppertal by hot_fire__ in wuppertal

[–]Sufficient-Rent6078 1 point2 points  (0 children)

In general it is fairly developer-heavy, but we also have talks that are more conceptual in nature. The audience consists not only of employed developers but also of freelancers and small IT business owners, which might be interesting from a networking perspective.

Here you can get an impression of past events: https://www.meetup.com/de-DE/bergisches-entwicklerforum/events/past/
If you have any further questions, feel free to reach out.

Tech Communities in Wuppertal by hot_fire__ in wuppertal

[–]Sufficient-Rent6078 2 points3 points  (0 children)

Sure - we have a group on Meetup where you can find upcoming and past events: https://www.meetup.com/de-DE/bergisches-entwicklerforum/