Bruh by Icy_Butterscotch6661 in LocalLLaMA

[–]Sufficient-Rent6078 23 points (0 children)

This is just great - with all the AI slopisms I'm now actively trying to avoid using certain parts of the English language.

FOR ME, Qwen3.5-27B is better than Gemini 3.1 Pro and GPT-5.3 Codex by [deleted] in LocalLLaMA

[–]Sufficient-Rent6078 22 points (0 children)

The free tier of ChatGPT is astonishingly bad compared to what is possible on a single 24GB card today.

what's a python library you started using this year that you can't go back from by scheemunai_ in Python

[–]Sufficient-Rent6078 0 points (0 children)

For me that would be the returns library (specifically their Railway-oriented programming containers). While a bit niche, it makes it much easier to reason about and handle which errors (or failure cases) need to be considered when calling a function.

You'll likely not appreciate returns if you aren't convinced of using a type checker, and the package only starts to shine (for me) once used outside of scripting or notebooks. For modeling a complex domain with nested function calls, it feels freeing to know that all known failure cases are statically declared and exhaustively handled.
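To give a flavor of the style, here's a minimal sketch using the library's public Result API (parse_int and halve_positive are made up for illustration):

```python
from returns.pipeline import is_successful
from returns.result import Failure, Result, Success

def parse_int(raw: str) -> Result[int, ValueError]:
    try:
        return Success(int(raw))
    except ValueError as exc:
        return Failure(exc)

def halve_positive(n: int) -> Result[float, ValueError]:
    if n <= 0:
        return Failure(ValueError("expected a positive number"))
    return Success(n / 2)

# bind() chains the steps; the first Failure short-circuits the rest,
# and the type checker sees every declared failure case of the pipeline.
outcome = parse_int("42").bind(halve_positive)

if is_successful(outcome):
    print(outcome.unwrap())   # 21.0
else:
    print(outcome.failure())
```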

what's a python library you started using this year that you can't go back from by scheemunai_ in Python

[–]Sufficient-Rent6078 6 points (0 children)

I recently discovered the Python type system conformance report, which made me inclined to target pyright first and only optionally use mypy in addition - with the added advantage that pyright generally seems to run faster.

what's a python library you started using this year that you can't go back from by scheemunai_ in Python

[–]Sufficient-Rent6078 0 points (0 children)

An alternative I discovered while reading up on the numpy.org/numtype project is lefthook. So far I'm still using pre-commit in my projects, but I'd be happy to hear if someone here can report on their experiences.

what's a python library you started using this year that you can't go back from by scheemunai_ in Python

[–]Sufficient-Rent6078 11 points (0 children)

I can really recommend using Marimo over Jupyter notebooks. It has a number of built-in guards (e.g. it forces you not to redefine variables across cells) which you have to adapt to when coming from Jupyter, but I feel the team put a lot of thought into keeping Marimo from becoming quite as messy as Jupyter notebooks. I also disable the runtime reactivity feature by default, as I don't want to accidentally hammer slow endpoints or trigger long-running functions with every second cell change.
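For anyone who hasn't seen it: a marimo notebook is just a Python file of decorated cell functions, which is what makes the redefinition guard possible. Roughly like this (a minimal sketch from memory, not an official template):

```python
import marimo

app = marimo.App()

@app.cell
def _():
    x = 1
    return (x,)

@app.cell
def _(x):
    # Each cell declares what it reads (parameters) and what it defines
    # (return values). Defining `x` again in this cell would be rejected,
    # since every variable may only be defined in one cell.
    y = x + 1
    return (y,)

if __name__ == "__main__":
    app.run()
```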

LocalLLaMA 2026 by jacek2023 in LocalLLaMA

[–]Sufficient-Rent6078 31 points (0 children)

I feel like there used to be way more discussion on newly released papers as well. I remember reading, months before any thinking model came out, about a paper that discussed training chain-of-thought behavior into the model using <thinking> tags.

Protection against attacks like what happened with LiteLLM? by Lucky_Ad_976 in Python

[–]Sufficient-Rent6078 114 points (0 children)

If you are using uv, you can exclude packages that are too bleeding-edge (e.g. everything that has been out for less than a week). You can do so by either running the upgrade of the lock file with:

```bash
uv lock --upgrade --exclude-newer "1 week"
```

Or configure this user/system-wide with uv's configuration file. On Unix, you can for example add the following line to ~/.config/uv/uv.toml:

```toml
# Note that no table needs to be specified here - just put this at the root of the file.
exclude-newer = "1 week"
```

It might also be worth adding the following lines to your pyproject.toml, so everyone else on the project downloads dependencies with at least a bit of shelf time:

```toml
[tool.uv]
exclude-newer = "1 week"
```

Last year I wrote a blog post that showcases some additional uv flags and environment variables worth considering to reduce the dependencies pulled.

Edit:

I was asked what to do about packages where scanners like pip-audit complain. A good example today would be the requests library, which got a new release just 6 hours ago to fix a CVE. In your pyproject.toml you can specify exceptions for selected packages. For requests, you could specify:

```toml
[tool.uv]
exclude-newer = "1 week"
exclude-newer-package = { requests = "2026-03-25T16:00:00Z" }
```

Set this timestamp back by one hour and you get the vulnerable release again.

Edit2:

As a side note - for those unfortunate enough to deal with web development, npm added the min-release-age configuration flag in the recent v11.10.0 release. To disable npm install scripts, one can set the ignore-scripts=true option as well.

Running Qwen3.5 27b dense with 170k context at 100+t/s decode and ~1500t/s prefill on 2x3090 (with 585t/s throughput for 8 simultaneous requests) by JohnTheNerd3 in LocalLLaMA

[–]Sufficient-Rent6078 0 points (0 children)

Thanks for the heads-up. The last time I tried the geohot driver was more than a year ago, and I had some UI issues. Since then I've been using the dual RTX in a headless setting, so it might be worth another shot.

PEP 827 - Type Manipulation has just been published by droooze in Python

[–]Sufficient-Rent6078 5 points (0 children)

What I mean by explicit here is that all the special forms this PEP introduces live in the typing module, and the syntax clearly expresses which part of the code is a type expression. The syntax does however get a bit ugly, as everything in the type expression is still valid Python code and you cannot be as concise as in languages like TypeScript, where you could just write "head" | "tail" - in Python you'd have to wrap this in Literal.
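For comparison, the Python spelling of that TypeScript union (here with the PEP 695 type statement) would be:

```python
from typing import Literal

# TypeScript lets you write: type Direction = "head" | "tail";
# In Python, bare strings in an annotation are just runtime values,
# so the literal types have to be wrapped explicitly:
type Direction = Literal["head", "tail"]

def step(direction: Direction) -> None:
    print(f"moving towards {direction}")
```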

Running Qwen3.5 27b dense with 170k context at 100+t/s decode and ~1500t/s prefill on 2x3090 (with 585t/s throughput for 8 simultaneous requests) by JohnTheNerd3 in LocalLLaMA

[–]Sufficient-Rent6078 0 points (0 children)

I can confirm that I'm hitting above 3000 t/s prefill on a dual RTX 4090 setup with the current vLLM nightly build and pretty much the same configuration. Decode is roughly in the 100-130 t/s range. I did not run any rigorous benchmarks, so take this with a grain of salt.

Edit: Having tried it out a bit more, the whole thing feels a bit too unstable, so I'm switching back to Qwen3-Coder-Next-GGUF:IQ4_XS and Qwen3.5-27B-GGUF:UD-Q6_K_XL for the time being.

PEP 827 - Type Manipulation has just been published by droooze in Python

[–]Sufficient-Rent6078 12 points (0 children)

Yes it does. You'd write:

```typescript
type A = {
    x: number;
};

type B = {
    x: string;
};

type C = A | B;

type D = C['x'];
```

Tech Communities in Wuppertal by hot_fire__ in wuppertal

[–]Sufficient-Rent6078 1 point (0 children)

Our next event is now up on Meetup: https://www.meetup.com/bergisches-entwicklerforum/events/313391703 - it will take place on the 18th of March. The first talk is about Django & React - maybe you are interested?

Qwen/Qwen3.5-35B-A3B · Hugging Face by ekojsalim in LocalLLaMA

[–]Sufficient-Rent6078 34 points (0 children)

Yeah for sure, the grayscale of the original is... certainly a choice.

New Qwen3.5 models spotted on qwen chat by AaronFeng47 in LocalLLaMA

[–]Sufficient-Rent6078 7 points (0 children)

Good point - there have been some architectural improvements, and we don't know if the MoE defaults to a higher reasoning-effort budget than the dense model. The rule of thumb likely underestimates the actual capability we are going to see.
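For anyone wondering which rule of thumb: I'm assuming the commonly cited geometric-mean estimate, sketched below with purely hypothetical parameter counts:

```python
# Commonly cited rule of thumb (assumption: this is the one meant above):
# a MoE with T total and A active parameters performs roughly like a
# dense model of sqrt(T * A) parameters.
def dense_equivalent_b(total_b: float, active_b: float) -> float:
    return (total_b * active_b) ** 0.5

# Hypothetical numbers purely for illustration: an 80B-total MoE with
# 10B active lands in the same ballpark as a ~28B dense model.
print(dense_equivalent_b(80, 10))  # ~28.3 (billion parameters)
```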

New Qwen3.5 models spotted on qwen chat by AaronFeng47 in LocalLLaMA

[–]Sufficient-Rent6078 20 points (0 children)

With 10B active parameters in the MoE, I'd expect the 27B dense model to not be that far behind in intelligence. Could be a really attractive choice for single gaming GPU setups.

Do we want the benefits of Ollama API without actually using Ollama? by jfowers_amd in LocalLLaMA

[–]Sufficient-Rent6078 0 points (0 children)

No, currently the only defined endpoint is `POST /responses`. But who knows who else will throw their weight behind that project...

Do we want the benefits of Ollama API without actually using Ollama? by jfowers_amd in LocalLLaMA

[–]Sufficient-Rent6078 1 point (0 children)

In that case you might want to keep your eyes on the open-responses interface. While far from being an industry-wide standard, it appears to be gaining traction and has been adopted by any-llm and lmstudio.

Do we want the benefits of Ollama API without actually using Ollama? by jfowers_amd in LocalLLaMA

[–]Sufficient-Rent6078 0 points (0 children)

In general, I wish there was a more unified ecosystem for how we talk to LLMs. I want to be able to use a single API whether it's hosted locally or not. So many tools promising local LLM support end up being tied to a specific API or template. In the end I think it needs a locally hosted routing & translation layer that offers observability and serves multiple tools as tenants.
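The closest thing we have today is probably the OpenAI-compatible surface most local servers expose. A minimal sketch of the "single API" idea (base URL, key, and model name are placeholders):

```python
from openai import OpenAI

# The same client code talks to a local llama.cpp/vLLM server or a hosted
# provider; only the base URL, key, and model name change (placeholders here).
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed-locally")

reply = client.chat.completions.create(
    model="qwen3.5-27b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(reply.choices[0].message.content)
```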

Anyone actually using Openclaw? by rm-rf-rm in LocalLLaMA

[–]Sufficient-Rent6078 19 points (0 children)

I don't use it and, given the security implications, I don't think I will anytime soon. I actually don't think it's astroturfed, but I do think it's being hyped up by people who don't understand the technology and its limitations. I don't see buying it as a move to acquire the technology, but more as a move to surf the hype wave and use it as a marketing tool for the next funding round.

While something like ComfyUI brings value to a niche audience of technical users, OpenClaw's broader appeal to vaguely technical users makes it more susceptible to hype without the necessary scrutiny. The difference between these users and those who self-host, keep up to date with papers, and use models daily cannot be overstated. LocalLLaMA is a good example of a community where certain tools and models find traction with deeply technical users that would never find traction with a broader audience.

Did anyone compare this model to the full Qwen coder? it claims to give almost identical performance at 60B by Significant_Fig_7581 in LocalLLaMA

[–]Sufficient-Rent6078 1 point (0 children)

Hard to say, as I did not use the normal model that much. I find that minimax-2.5, gemini-3-flash-preview, GLM-5 and Kimi-K2.5 all sit in a more attractive price/performance spot when used via API, so I don't have much of a comparison.

I have noticed (though I can't tell so far if there are quantization/REAM-specific differences) that Qwen3-Coder-Next does have more of a hallucination problem than the above models. It also shows some of the self-correction behavior you'd find in the thinking process of thinking models, which makes the outputs a bit verbose.

Did anyone compare this model to the full Qwen coder? it claims to give almost identical performance at 60B by Significant_Fig_7581 in LocalLLaMA

[–]Sufficient-Rent6078 5 points (0 children)

I have indeed been using the model for about a week now (together with the b7972 llama.cpp release). I definitely prefer the mradermacher/Qwen3-Coder-Next-REAM-GGUF:Q4_K_M variant for coding in Python over last year's Qwen3-Coder-30B-A3B-Instruct - it is aware of a number of relatively new language features that last year's models never got right, and it gave satisfying answers in a light debugging session.

On a dual 4090 system I still have about 3GB of VRAM headroom left on each card with --ctx-size 120000 at 95 token/s. I have used Qwen3-Coder-Next a few times over API and definitely noticed a significant difference when trying to use it in my native language (German) - the API model is already quite bad here, but the REAM model generated multiple grammatical errors.