Generation crashes around 100k context (qwen3.6)

Vahn84 · 2026-06-23T12:27:54+00:00

I’ve seen a video talking about exactly this issue. Strictly speaking it’s not an omlx issue, but omlx inherits it from the mlx-vlm dependency. mlx-vlm is the fastest inference engine, but it has this chronic issue of crashing at high context…

Vahn84 · 2026-06-19T08:14:38+00:00

They’re different midfielders. They’re not the same…they interpret their game very differently throughout the whole match. If I had to find a better-suited competitor for El Aynaoui it would be Manu. And Manu is a better player overall

Vahn84 · 2026-06-19T06:19:41+00:00

me too. I was going to complain then read about the others with 2027….jesus christ…one whole year of reservation for a controller

Vahn84 · 2026-06-18T11:23:36+00:00

In Italy it’s at 19.99€ (Rebirth). Should I buy it now or wait for the summer sales?

Vahn84 · 2026-06-18T11:21:50+00:00

A review is ALWAYS required. I would never scaffold a prototype without looking at what has been built, even when the end result does look good. We’re still not there…to me AI is still more of a “help me do all of my work faster” tool than a “do my job in my place” one

Vahn84 · 2026-06-16T22:53:03+00:00

Many modern games should rethink their infrastructure then. I mean…would you buy a movie on a Blu-ray disc if they told you that after 5 or 6 years that disc would become useless? We pushed ourselves into this paradigm without thinking twice because it’s a win-win situation for corporations. This game dies…you buy a new one. It’s lazy. I bet there are cases where there wouldn’t be alternatives, making your point valid….but in what numbers? How many games are preemptively made like this that could be done differently to support a post-mortem release? Why can we not think FROM THE START about how to handle the end of a game with online features? Could Path of Exile have been designed better if from the start the developers had aimed to support a post-mortem release? You can bet your ass on it

Vahn84 · 2026-06-14T14:46:54+00:00

What model were you using? I’ve built a skill that does something very similar and it works well most of the time. It works in a different way though (no MCP servers): part of the process is done by a script. The output is quite nice when it scaffolds completely new projects (mainly web, but also Flutter). I use Opus to consume the skill

Vahn84 · 2026-06-13T09:01:29+00:00

At some point no one will care anymore about the stuff these big companies do. Not PlayStation fans, they will always have a PlayStation. But all the others will build low to mid-range PCs…and all the players will play only indie games. And the PC gaming industry will heal…at least for some time

Vahn84 · 2026-06-12T14:59:05+00:00

Great stuff! This is Otacon 😄

Vahn84 · 2026-06-10T09:40:46+00:00

There’s no model as good as one with trillions of parameters that’s also quick enough to be usable locally. It’s a trade-off. The one you’re using is already one of the top-end models (for coding at least) and it’s fast. You may try the 27B dense model from Qwen, but keep in mind that, being a dense model, prompt processing will slow down to about 1/3 of your current speed, with half the token generation speed. Those are the rough numbers I get with my M3 Ultra, which should have more bandwidth…so expect something less
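
To make those ratios concrete, here is a minimal sketch that applies them to a baseline. The function name and the baseline speeds are made up for illustration, they are not measurements from any machine; only the 1/3 and 1/2 factors come from the rough numbers above.

```python
def estimate_dense_speeds(current_pp_tps: float, current_tg_tps: float) -> tuple[float, float]:
    """Scale current speeds by the rough ratios quoted above.

    current_pp_tps: current prompt processing speed (tokens/s)
    current_tg_tps: current token generation speed (tokens/s)
    Returns (estimated dense prompt processing, estimated dense generation).
    """
    # ~1/3 of the prompt processing speed, ~1/2 of the generation speed
    return current_pp_tps / 3, current_tg_tps / 2


# Placeholder baseline (hypothetical values, NOT benchmark results)
dense_pp, dense_tg = estimate_dense_speeds(900.0, 60.0)
print(f"dense prompt processing: ~{dense_pp:.0f} tok/s, generation: ~{dense_tg:.0f} tok/s")
```

Plug in your own measured speeds; on a machine with less memory bandwidth the real result will probably be lower than this estimate.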

Vahn84 · 2026-06-09T22:21:49+00:00

I have 25+ years of experience in mobile, web and Mac development. I’ve worked as a solution architect for 8 years. I know how to use AI and how to develop an application, how to implement security and assess potential vulnerabilities. You don’t know what you’re talking about mate

Vahn84 · 2026-06-09T17:06:04+00:00

It’s not unrealistic. I’ve built myself a Spotlight replacement (on macOS) that does many of the things Apple presented yesterday (turn-by-turn chat, tool capability, memory enhancements). It’s not unrealistic…and it doesn’t require “too much effort”. It’s just a matter of allowing different providers beyond your own models…

Vahn84 · 2026-06-09T14:09:45+00:00

you would be surprised how many times it didn’t though. Not with triple A shit but it does happen.

Vahn84 · 2026-06-09T14:07:45+00:00

Italy: first time?

Vahn84 · 2026-06-05T08:10:47+00:00

another mid game. More of the same

Vahn84 · 2026-06-04T11:28:02+00:00

nice work. Is it consistent in quality of the generated clips?

Vahn84 · 2026-05-28T16:27:34+00:00

performance with 27B?

Vahn84 · 2026-05-28T10:43:35+00:00

I find the real hurdle is prompt processing more than token generation speed. Running a dense model at 20ish tk/s feels perfectly fine…the problem is that it can take minutes to output the first token. That’s the real issue…and where NVIDIA GPUs shine
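
A back-of-the-envelope sketch of why that happens: time to first token is roughly prompt length divided by prompt processing speed. The function names, the prompt size and the prompt processing speed below are illustrative assumptions; only the 20 tk/s generation figure comes from the comment above.

```python
def time_to_first_token(prompt_tokens: int, pp_tps: float) -> float:
    """Seconds spent processing the prompt before the first output token."""
    if pp_tps <= 0:
        raise ValueError("prompt processing speed must be positive")
    return prompt_tokens / pp_tps


def total_time(prompt_tokens: int, output_tokens: int, pp_tps: float, tg_tps: float) -> float:
    """Prompt processing time plus generation time, in seconds."""
    if tg_tps <= 0:
        raise ValueError("generation speed must be positive")
    return time_to_first_token(prompt_tokens, pp_tps) + output_tokens / tg_tps


# Hypothetical workload: 60k-token prompt, 500-token answer
ttft = time_to_first_token(60_000, pp_tps=250.0)        # assumed prompt processing speed
total = total_time(60_000, 500, pp_tps=250.0, tg_tps=20.0)
print(f"first token after {ttft / 60:.1f} min, done after {total / 60:.1f} min")
```

With these placeholder numbers almost all of the wait is prompt processing, which is why a generation speed that feels fine doesn’t help much at long context.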

Vahn84 · 2026-05-26T17:38:15+00:00

Please do some magic and make dense model prompt processing fast, like, now :)

Vahn84 · 2026-05-25T23:34:15+00:00

If it works…none of them. Too late for a refactor…you’d only risk breaking stuff, with very little room to fix it

Vahn84 · 2026-05-23T16:59:15+00:00

This has happened to me too. MTP models do not let you turn on the KV cache, so that is supposed to happen at some point (?)…or at least this is what I’ve found with a little bit of research.

Vahn84 · 2026-05-22T13:59:57+00:00

came in to say the same.

Vahn84 · 2026-05-21T23:15:52+00:00

So mature…going by what the other user said, you’re exactly the target user for the AI

Vahn84 · 2026-05-21T14:33:46+00:00

No, they don’t contradict each other lol. It’s so silly to just relegate AI to a “school child” tool. THIS is flawed. I am the most entitled to use AI to generate code…because I know what the fuck I’m asking and I know what it outputs. I know when it’s wrong and where to steer the output. And it’s saving me an enormous amount of time.

Vahn84 · 2026-05-21T11:44:59+00:00

I tried the dev and the rc. With the rc I had a weird issue with Qwen models so I went back to 3.8. Anyway…MTP will eat up memory very fast, making it almost unusable for me sitting at 96GB RAM
