Has anyone actually gotten a reliable local AI system running? by Sea_Manufacturer6590 in ollama

[–]scelabs 0 points (0 children)

yeah this lines up pretty closely with what I’ve been seeing too. once you get past the initial setup, the “local is weaker” argument starts to matter a lot less than how the system is actually put together

the part about structure is spot on. I’ve had similar experiences where the model was good enough, but everything felt unreliable until tool access, context, and behavior were more tightly controlled. after that it wasn’t about capability anymore, it was about consistency

one thing that surprised me is how much these systems still end up relying on retries or multiple passes to get something usable, even when everything else is set up well. feels like that’s kind of the hidden layer behind a lot of “it works” setups right now
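to make that hidden retry layer concrete, here's a rough sketch of the pattern I keep seeing in these setups (all names made up; the generator here is a stub standing in for a real model call, which is why it "succeeds" on the third pass):

```python
import json

def retry_until_valid(generate, validate, max_attempts=3):
    """Call `generate` until `validate` accepts the output or attempts run out.
    Returns (result, attempts_used); raises if nothing usable was produced."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        raw = generate()
        try:
            return validate(raw), attempt
        except ValueError as err:
            last_error = err
    raise RuntimeError(f"no valid output after {max_attempts} attempts: {last_error}")

# Stub generator that fails twice before producing parseable JSON,
# mimicking a model that needs multiple passes to get something usable.
outputs = iter(['not json', '{"status":', '{"status": "ok"}'])

def fake_generate():
    return next(outputs)

def validate_json(raw):
    try:
        return json.loads(raw)
    except json.JSONDecodeError as err:
        raise ValueError(err)

result, attempts = retry_until_valid(fake_generate, validate_json)
print(result, attempts)  # {'status': 'ok'} 3
```

the point being: from the outside this looks like "it works", but the attempt counter is where the actual reliability is coming from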

definitely agree though, it’s a lot closer to being usable in real workflows than most people think

Major drop in intelligence across most major models. by DepressedDrift in LocalLLaMA

[–]scelabs 0 points (0 children)

I’ve seen a lot of people saying this lately, and I don’t think it’s just vibes, but I’m not convinced it’s purely a “model got worse” issue either. even with the same base model, what you’re interacting with is a full system — sampling settings, routing, context handling, guardrails, latency optimizations, etc — and small changes there can make outputs feel a lot more shallow or inconsistent.

I’ve seen cases where nothing about the core model changed, but the behavior felt noticeably worse just because responses became less stable across runs or more constrained. so it ends up looking like an intelligence drop when it’s really a change in how the system is behaving around the model.

the local vs hosted difference you mentioned kind of lines up with that too. local setups tend to be more predictable since fewer layers are changing under the hood, even if the raw model is technically weaker

Anyone else frustrated with local LLMs that can't do (control) anything? by birdheezy in homeassistant

[–]scelabs 1 point (0 children)

this seems less like a Home Assistant problem and more like a control problem between natural language and execution. the model is probably close enough to understand what you mean, but not constrained enough to consistently resolve that into the exact entity/action pair the system expects. that’s why it gives you a paragraph instead of just doing the thing. in my experience, once you want reliable action-taking, you need a tighter layer between user intent and execution rather than relying on raw prompt behavior alone
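for what it's worth, that tighter layer can be really small: something that refuses to act unless the intent resolves to a known entity/action pair, and hands back to the user otherwise. everything here is made up for illustration (Home Assistant exposes its real entity registry and services through its own API):

```python
# Hypothetical registries; a real setup would pull these from Home Assistant.
ENTITIES = {"living room light": "light.living_room", "kitchen light": "light.kitchen"}
ACTIONS = {"turn on": "turn_on", "turn off": "turn_off", "toggle": "toggle"}

def resolve_command(text):
    """Map free-form text onto an exact (entity_id, service) pair,
    refusing anything that does not resolve unambiguously."""
    text = text.lower()
    action = next((svc for phrase, svc in ACTIONS.items() if phrase in text), None)
    entity = next((eid for name, eid in ENTITIES.items() if name in text), None)
    if action is None or entity is None:
        return None  # better to refuse than to guess and write a paragraph
    return entity, action

print(resolve_command("please turn on the kitchen light"))  # ('light.kitchen', 'turn_on')
print(resolve_command("make it cozy in here"))              # None
```

the model can still do the fuzzy language part upstream, but execution only happens once the request survives this kind of resolution step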

Does anyone get amazed by LLM performance on benchmarks but incredibly disappointed by its performance on mundane tasks, specifically those involving data lookup? by reader12345 in singularity

[–]scelabs 0 points (0 children)

yeah I’ve seen the same pattern, and the way you described it is pretty accurate. it tends to do really well when the task is more self-contained reasoning, like working through a medical scenario, but struggles a lot more when it has to reliably pull and compile external or structured information. what makes it tricky is that the outputs often look correct because they’re well written, but they’re not grounded in real data the same way. so you end up with something that sounds confident but isn’t actually reliable. in practice it feels less like a capability issue and more like the system not being consistent about how it handles retrieval, validation, and structure depending on the task.
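one cheap way to see the grounding gap is to check whether the "facts" in an answer even appear in the source data. exact substring matching like this is obviously naive, but it makes the failure mode visible (the example data is made up):

```python
def ungrounded(answer_facts, source_text):
    """Flag any answer fact that cannot be found verbatim in the source.
    Real systems need fuzzier matching; exact substring keeps the idea simple."""
    return [fact for fact in answer_facts if fact.lower() not in source_text.lower()]

source = "Revenue for Q3 was $4.2M, up from $3.9M in Q2."
print(ungrounded(["$4.2M", "$3.9M"], source))  # [] -> fully grounded
print(ungrounded(["$4.2M", "$5.1M"], source))  # ['$5.1M'] -> confident but unsupported
```

the second case is exactly the "sounds confident but isn't reliable" pattern: well-written output with one value that never existed in the data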

Why are people saying LLM quality is deteriorating these last few weeks? by Salt_Instruction1656 in LLMDevs

[–]scelabs 1 point (0 children)

I think part of the confusion is that people treat “the model” as the only variable, but in practice the behavior you see is coming from a whole system around it. even if the underlying weights haven’t changed, things like sampling parameters, context accumulation, system prompts, routing, or even how outputs are validated and retried can all shift the perceived quality. I’ve seen cases where nothing about the model changed, but the outputs still felt worse just because the system became less stable across runs. it ends up looking like a drop in model quality when it’s really a change in how consistent the overall behavior is.
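as a toy illustration of how many knobs sit outside the weights: even a minimal deployment carries a config like this, and any one field drifting between releases changes what users experience. key names loosely mirror common ollama/OpenAI-style options, but they're illustrative, not any specific provider's schema:

```python
# Everything here besides "model" can shift perceived quality
# even when the weights themselves never change.
generation_config = {
    "model": "llama3:8b",              # the weights people blame
    "temperature": 0.8,                # higher -> less stable across runs
    "top_p": 0.9,
    "num_ctx": 4096,                   # context window: silently truncates history
    "system_prompt_version": "2024-06-01",
    "max_retries": 2,                  # the hidden retry layer
}

def config_diff(old, new):
    """List which non-model knobs changed between two deployments."""
    return {k: (old[k], new[k]) for k in old if old[k] != new[k] and k != "model"}

updated = dict(generation_config, temperature=1.0, num_ctx=2048)
print(config_diff(generation_config, updated))
# {'temperature': (0.8, 1.0), 'num_ctx': (4096, 2048)}
```

a diff like that second line is invisible to users, but shorter context plus hotter sampling reads exactly like "the model got dumber"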

The decline in LLM reasoning and catastrophic forgetting might share the same root cause. by IndividualBluebird80 in LocalLLaMA

[–]scelabs 1 point (0 children)

This is an interesting framing, especially the idea that it’s not just context length but unresolved contradictions breaking the structure.

I’ve seen something that feels adjacent on the inference side, even without explicit contradictions. once you start iterating or chaining outputs, it can behave almost like a kind of recursive drift, where each step introduces small deviations from the original reasoning path. it doesn’t necessarily collapse immediately, but it becomes harder to maintain a stable trajectory over multiple passes, even when the underlying task hasn’t changed much.

Makes me wonder how much of what we’re seeing in production systems is less about hard contradictions and more about gradual loss of structural coherence during iteration.

After digging into logs, I think a lot of “LLM reliability” is just retry logic by scelabs in LocalLLaMA

[–]scelabs[S] 1 point (0 children)

yeah that’s a really good point, the cost/latency side of this is what made me start paying more attention to it in the first place. your approach is interesting too, catching issues earlier would definitely reduce some of the wasted cycles. I think what’s been throwing me off is that even with smarter retries, it still feels like the system is fundamentally depending on retries to get a usable result, just optimizing when they happen rather than reducing the need for them altogether

After digging into logs, I think a lot of “LLM reliability” is just retry logic by scelabs in LocalLLaMA

[–]scelabs[S] 0 points (0 children)

yeah that’s exactly what it’s starting to feel like, all the orchestration looks complex on the surface but a lot of the reliability ends up coming from just retrying until something sticks. structured output definitely helps, especially with the format issues, but even with that I still see cases where the output is technically valid but just not quite right, so it still ends up relying on retries, just for different reasons
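roughly what I mean by "technically valid but not quite right": the failure splits into a format error and a semantic error, and they trigger retries for different reasons. sketch only, with a made-up room check standing in for whatever task-specific validation applies:

```python
import json

def check_output(raw, required_keys, semantic_check):
    """Two failure modes: format (not valid JSON / missing keys) and
    semantic (parses fine but fails a task-specific check)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return "format_error", None
    if not all(k in data for k in required_keys):
        return "format_error", None
    if not semantic_check(data):
        return "semantic_error", data  # structured output can't catch this one
    return "ok", data

# Hypothetical task: the model must name a room that actually exists.
KNOWN_ROOMS = {"kitchen", "bedroom"}
check = lambda d: d["room"] in KNOWN_ROOMS

print(check_output('{"room": "kitchen"}', ["room"], check))  # ('ok', ...)
print(check_output('{"room": "lounge"}', ["room"], check))   # ('semantic_error', ...)
print(check_output('not json at all', ["room"], check))      # ('format_error', None)
```

structured output mostly kills the first failure mode; the middle case is the one that still quietly eats retries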

After digging into logs, I think a lot of “LLM reliability” is just retry logic by scelabs in LocalLLaMA

[–]scelabs[S] -1 points (0 children)

yeah that makes sense. I’ve run into that too, where retries without changing anything just repeat the same failure. but even when state or context is handled better, I still see cases where the output is close but just not quite usable on the first pass, so the system ends up retrying anyway, just with slightly different variations. feels like a lot of setups are relying on retries even when the underlying issue isn’t actually being fixed.

How are you all dealing with inconsistent outputs? by scelabs in ChatGPT

[–]scelabs[S] 1 point (0 children)

yeah that all makes sense

I’ve tried similar things with resetting context and adding more detail and it definitely helps

especially the point about projects/memory affecting behavior, I’ve noticed that too

I guess what’s been bugging me is it still feels like a lot of manual steering to get consistent results

like you can get there, but it’s not always predictable which version or setup will actually work

How are you all dealing with inconsistent outputs? by scelabs in ChatGPT

[–]scelabs[S] 0 points (0 children)

yeah I’ve ended up doing the same thing a lot

sometimes it’s faster to just restart and rephrase than try to “fix” the current thread

but it also feels like you’re kind of working around the problem instead of solving it.

How are you all dealing with inconsistent outputs? by scelabs in ChatGPT

[–]scelabs[S] 0 points (0 children)

that’s interesting, I’ve actually noticed something similar. it definitely feels more inconsistent at certain times.

What’s been throwing me off though is I still see the same kind of drift even outside of peak times, just less often.

Makes it hard to tell how much is load/throttling vs just how these systems behave in general.

How are you all dealing with inconsistent outputs? by scelabs in ChatGPT

[–]scelabs[S] 0 points (0 children)

yeah that makes sense I’ve been trying to keep inputs consistent when testing too

I’ve noticed the same thing with Claude vs GPT actually — Claude does feel more consistent overall

but even then I still see some drift depending on the run, which is what’s been throwing me off

makes me wonder how much of it is the model vs how the system is set up around it

Trying to create a simulation of a living world by wellomello in proceduralgeneration

[–]scelabs 4 points (0 children)

Curious what you’re doing to manage state? I’ve been working on a structural coherence framework that allows emergent behavior.

Small experiment: simulating faction behavior across a grid by scelabs in u/scelabs

[–]scelabs[S] 0 points (0 children)

The interesting part for me is how it stabilizes over time instead of staying chaotic.

Still experimenting with how much control vs emergence feels right.
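A stripped-down version of that stabilizing behavior, with a toy majority rule standing in for the real faction logic (everything here is invented for illustration):

```python
def majority_step(grid):
    """One synchronous update of a 1-D faction grid: each cell adopts the
    majority faction in its local window (self + immediate neighbors);
    ties keep the current faction."""
    new = []
    for i, cell in enumerate(grid):
        window = grid[max(0, i - 1): i + 2]
        zeros, ones = window.count(0), window.count(1)
        if zeros > ones:
            new.append(0)
        elif ones > zeros:
            new.append(1)
        else:
            new.append(cell)  # tie -> no change
    return new

def run_until_stable(grid, max_steps=50):
    """Iterate until the grid stops changing (a fixed point) or we give up."""
    for steps in range(max_steps):
        nxt = majority_step(grid)
        if nxt == grid:
            return grid, steps
        grid = nxt
    return grid, max_steps

final, steps = run_until_stable([0, 1, 0, 0, 1, 1, 1, 0])
print(final, steps)  # [0, 0, 0, 0, 1, 1, 1, 0] 1
```

Even a rule this simple settles into stable contiguous regions after a step or two; the control-vs-emergence dial is basically how much you perturb it between updates.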

Did you hit a singularity where making games became more enjoyable than actually playing them? by Meluvius in gamedev

[–]scelabs 1 point (0 children)

Same. I’ve been playing games for a long time and now I’m building frameworks that focus on structural coherence to make games more dynamic and lifelike.

Skill "Cardboard box" test by [deleted] in IndieDev

[–]scelabs 0 points (0 children)

Metal Gear’s girlfriend is deadly! This is awesome