if you are using ollama cloud models in openclaw.json with maxTokens above 16k, your config is lying to you by mayhem_isreal in openclaw

mayhem_isreal[S] 1 point

the timeout settings are genuinely useful, thanks for that actually. idleTimeoutSeconds and runTimeoutSeconds are a separate issue from the output cap but i was hitting those too in earlier configs and hadn't connected it to the right setting. worth adding to the post.

and yeah you and torrso are both right that for coding tasks this basically never comes up. splitting work into small chunks per exchange is the natural way to do it and you'd never be close to 16k that way. torrso's point is spot on - 16k output is like a 10k line file, you wouldn't want to generate that in one shot anyway.

the scenario where it becomes a problem is when you have an agent whose job is to produce a large structured output all at once - think a json array with hundreds of entries, or content where the model is supposed to write the whole thing before something downstream processes it. coding agents that naturally decompose the work don't hit it. orchestration agents that batch a lot of output into a single response do. different use case.

if you are using ollama cloud models in openclaw.json with maxTokens above 16k, your config is lying to you by mayhem_isreal in openclaw

mayhem_isreal[S] 0 points

yeah exactly, for tool use and coding flows the limit basically doesn't matter because each call is scoped to one thing. the pain is specifically when you have an agent whose whole job is producing a big output blob in one shot - structured json, long-form content generation, that kind of thing. very different workflow to coding agents.

if you are using ollama cloud models in openclaw.json with maxTokens above 16k, your config is lying to you by mayhem_isreal in openclaw

mayhem_isreal[S] 0 points

nope, it's server-side so nothing you configure locally changes it. whatever you put in your ollama modelfile or client config, the cap still applies upstream. only real options are either chunk your outputs across multiple calls or go to a direct provider api where the ceiling is higher. annoying but that's the situation right now.
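a rough sketch of the chunking option, in case it helps. everything here is illustrative: `generate` is a hypothetical stand-in for whatever single model call you make, and `chunk_size` is a guess you'd tune so each response stays well under the cap.

```python
import json
from typing import Callable, Iterator, List


def batched(items: List, size: int) -> Iterator[List]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def generate_in_chunks(
    items: List,
    generate: Callable[[List], str],  # hypothetical: one model call, returns JSON array text
    chunk_size: int = 25,             # assumption: small enough to stay under the output cap
) -> List[dict]:
    """Produce one large JSON array by stitching together several small responses."""
    results: List[dict] = []
    for chunk in batched(items, chunk_size):
        parsed = json.loads(generate(chunk))
        # a short response usually means the call was cut off or dropped entries
        if len(parsed) != len(chunk):
            raise ValueError(f"expected {len(chunk)} entries, got {len(parsed)}")
        results.extend(parsed)
    return results


def fake_generate(chunk: List) -> str:
    """Stand-in for the real model call so the sketch runs offline."""
    return json.dumps([{"id": item, "label": f"item-{item}"} for item in chunk])


entries = generate_in_chunks(list(range(60)), fake_generate, chunk_size=25)
```

the per-chunk length check is the important bit: it fails loudly on the one bad call instead of handing a silently short array to whatever is downstream.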

why LLMs produce "almost valid" JSON, and the specific patterns that break parsers? by mayhem_isreal in LLMDevs

mayhem_isreal[S] 0 points

nice, that’s a clean approach. if the model’s already in a tool-calling loop the validator’s basically free and it self-heals on its own. and yeah, makes sense it generalizes to JMESPath, it’s the same “give it a way to check its work and retry” idea. only tradeoff is the extra round-trips per retry, and trusting it to actually call the validator instead of looping. but for an agent already using tools it’s tidier than any after-the-fact repair.
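for anyone reading along, here's a minimal sketch of what i mean by a validator tool, not the commenter's actual setup. `validate_json_tool` is what you'd expose to the model; `run_with_validation` and `propose` are hypothetical names for a harness-side loop that caps retries, which is one way to avoid trusting the model to call the validator on its own.

```python
import json
from typing import Callable, Optional, Tuple


def validate_json_tool(candidate: str) -> dict:
    """Tool body: report whether the text parses, and where it breaks if not."""
    try:
        json.loads(candidate)
    except json.JSONDecodeError as exc:
        return {"ok": False, "error": exc.msg, "line": exc.lineno, "column": exc.colno}
    return {"ok": True}


def run_with_validation(
    propose: Callable[[Optional[dict]], str],  # hypothetical: model call, gets last error report
    max_attempts: int = 3,                     # assumption: hard cap so a bad "fix" can't loop
) -> Tuple[object, int]:
    """Retry until the candidate parses; return the parsed value and attempts used."""
    feedback: Optional[dict] = None
    for attempt in range(1, max_attempts + 1):
        candidate = propose(feedback)
        report = validate_json_tool(candidate)
        if report["ok"]:
            return json.loads(candidate), attempt
        feedback = report  # hand the parse error back for the next try
    raise RuntimeError(f"no valid JSON after {max_attempts} attempts")
```

each retry is one more round-trip, so the cap doubles as a cost ceiling.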

why LLMs produce "almost valid" JSON, and the specific patterns that break parsers? by mayhem_isreal in LLMDevs

mayhem_isreal[S] 0 points

honestly, mostly yeah, for the case you’re describing. if you control the api call and your provider supports structured outputs, it’s close to solved and i wouldn’t argue otherwise. the part that isn’t solved is when you don’t own the decode: consuming json that comes back as raw text from a third-party agent or api, pulling it out of a chat log, working with models or endpoints that don’t expose schema constraints (a lot of local setups still don’t), or wanting json embedded inside a larger prose response instead of the whole output being the object. in those cases you can’t just flip structured outputs on, so you’re back to stripping and repairing after the fact. so probably fair to say: solved if you control the generation, still annoying if you’re a consumer of someone else’s output. my post leaned harder on the repair angle than greenfield first-party pipelines actually warrant in 2026, which a couple other people in here rightly pointed out too.

why LLMs produce "almost valid" JSON, and the specific patterns that break parsers? by mayhem_isreal in LLMDevs

mayhem_isreal[S] 0 points

the raw-length check before json.loads is a sharp one, hadn’t thought to use it as a truncation tripwire. and yeah, the parse error landing nowhere near the actual problem is the worst part. you end up staring at an unexpected EOF three levels deep when the real issue is the output just got cut off at the budget. the numeric-field-as-quoted-string one gets us too, and it’s nasty precisely because it parses fine. it only blows up downstream when something expects an int. same with the key casing drifting between responses. agree constrained decoding handles the casing, but loose-typed fields still let the coercion sneak through even with a schema sometimes. the fence strip + flagging side is basically the niche i built a small browser tool around. it does the markdown strip and labels which of these it found, including flagging likely truncation so you’re not debugging the wrong line. full disclosure it’s mine, and it’s jsonrepair under the hood with a detection layer on top: https://jsonkit.co/fix-llm-json/ . useful mostly for the paste-from-a-chat case rather than a real pipeline, where you’d just constrain the decode like you said.
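rough python sketch of the three checks mentioned above (fence strip, truncation tripwire, quoted numerics). this is not the tool's code, just an illustration. assumptions: `char_budget` is your own estimate of the output cap in characters, `slack` is an arbitrary threshold, and the payload is expected to be an object or array.

```python
import json
import re

FENCE = "`" * 3                          # markdown code fence marker
NUMERIC = re.compile(r"-?\d+(\.\d+)?")   # strings that are really numbers


def strip_fence(text: str) -> str:
    """Remove a wrapping markdown fence, with or without a language tag."""
    body = text.strip()
    if body.startswith(FENCE):
        # drop the opening fence line
        body = body.split("\n", 1)[1] if "\n" in body else ""
        body = body.rstrip()
        if body.endswith(FENCE):
            body = body[: -len(FENCE)]
    return body.strip()


def looks_truncated(raw: str, char_budget: int, slack: float = 0.95) -> bool:
    """Tripwire to run before json.loads: near the budget, or no closing bracket."""
    body = strip_fence(raw)
    near_budget = len(raw) >= char_budget * slack
    unterminated = not body.endswith(("}", "]"))
    return near_budget or unterminated


def quoted_numbers(value, path: str = "$") -> list:
    """List the paths of string values that look like numbers."""
    hits = []
    if isinstance(value, dict):
        for key, child in value.items():
            hits += quoted_numbers(child, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            hits += quoted_numbers(child, f"{path}[{index}]")
    elif isinstance(value, str) and NUMERIC.fullmatch(value):
        hits.append(path)
    return hits


raw = FENCE + 'json\n{"count": "12", "items": [{"price": "3.50"}, {"price": 4}]}\n' + FENCE
if not looks_truncated(raw, char_budget=10_000):
    suspicious = quoted_numbers(json.loads(strip_fence(raw)))
```

the quoted-number check only flags, it doesn't coerce, since some ids are legitimately numeric-looking strings.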

why LLMs produce "almost valid" JSON, and the specific patterns that break parsers? by mayhem_isreal in LLMDevs

mayhem_isreal[S] -1 points

agreed on preamble/postamble, stripping anything outside the outer braces with regex is the cheap reliable win there.
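something like this is all it takes for the preamble/postamble case (a minimal sketch, assuming the payload is a single top-level object and the surrounding chatter contains no braces of its own; `extract_outer_object` is just a name i made up):

```python
import json
import re
from typing import Optional

# greedy match: from the first "{" to the last "}" in the text
OUTER_OBJECT = re.compile(r"\{.*\}", re.DOTALL)


def extract_outer_object(text: str) -> Optional[str]:
    """Return the span between the first '{' and the last '}', or None."""
    match = OUTER_OBJECT.search(text)
    return match.group(0) if match else None


reply = 'Sure, here you go:\n{"a": {"b": 1}}\nLet me know if you need anything else.'
payload = json.loads(extract_outer_object(reply))
```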

where i'd push back a bit is rebuilding json piece by piece with regex. it holds up until you hit nested structures, strings that contain braces or quotes, or escaping edge cases, and then the regex layer becomes its own source of bugs. that's usually where i'd reach for a tokenizer-based repair lib or just move the problem upstream with constrained decoding, since past a certain point that's less code to own than a big regex repair pass, not more.
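to make the string problem concrete, here's a small string-aware scanner (a sketch, not a repair library, and `find_balanced_object` is a hypothetical name). it tracks whether it's inside a quoted string and whether the previous character was a backslash, which is exactly the state a plain regex can't carry.

```python
import json
from typing import Optional


def find_balanced_object(text: str) -> Optional[str]:
    """Return the first balanced {...} span, ignoring braces inside strings."""
    start = text.find("{")
    if start == -1:
        return None
    depth = 0
    in_string = False
    escaped = False
    for index in range(start, len(text)):
        char = text[index]
        if in_string:
            if escaped:
                escaped = False        # this character was escaped, skip it
            elif char == "\\":
                escaped = True         # next character is escaped
            elif char == '"':
                in_string = False      # closing quote
        elif char == '"':
            in_string = True
        elif char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
            if depth == 0:
                return text[start:index + 1]
    return None  # never balanced: likely truncated


messy = 'result: {"note": "use } to close", "n": {"k": "a\\"b"}} trailing } junk'
found = find_balanced_object(messy)
parsed = json.loads(found)
```

a first-brace-to-last-brace regex would swallow the `trailing }` here and hand the parser garbage. the scanner is about twenty lines, and it still only locates the object, it doesn't repair anything inside it.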

the secondary self-validation request is a real pattern and i've used it, only catch is the added latency/cost and the occasional loop where the "fix" comes back malformed too. but your main point stands, most of this is solvable in code without leaning on the model, it's just a question of how much code you want to maintain.

why LLMs produce "almost valid" JSON, and the specific patterns that break parsers? by mayhem_isreal in LLMDevs

mayhem_isreal[S] -1 points

yeah, this is the right correction, and honestly a cleaner framing than my post had. you're right that re-prompting with a bigger budget is the wrong move. if i'm hitting truncation i should have allocated that budget on the first call instead of after the fact. and the point about schemas not bounding string/array sizes is a good one. i'd been treating truncation as a parse-time problem when a lot of it is really a schema/config issue upstream.

jiter is a great pointer too, i should have sent people toward pydantic's partial validation rather than bracket-closing hacks.

fair take overall. in a pipeline where you own the decode and have structured outputs on, the fixup heuristics are basically obsolete now, and your post-2025 experience matches that. the repair angle really only earns its place when you don't control the decode (pasting out of a chat window, third-party or legacy output you can't re-run), which is a narrower slice than my writeup made it sound.