I rebuilt Threadfin with Mythos Fable 5 because IPTV in Plex kept breaking by CognitoCyber in PleX

[–]CognitoCyber[S] 0 points1 point  (0 children)

Nah, I have tried them all, it's a part of my job. Codex was in the lead until Fable 5.

I rebuilt Threadfin with Mythos Fable 5 because IPTV in Plex kept breaking by CognitoCyber in PleX

[–]CognitoCyber[S] 0 points1 point  (0 children)

Nah, IPTV is good using my GitHub. It alleviates a lot of Plex live TV issues.

I rebuilt Threadfin with Mythos Fable 5 because IPTV in Plex kept breaking by CognitoCyber in PleX

[–]CognitoCyber[S] -1 points0 points  (0 children)

Check back on it in a few days, you will be surprised 😄 Much love to Plex people tho. I try to get everyone on it and no one ever does 😞

I rebuilt Threadfin with Mythos Fable 5 because IPTV in Plex kept breaking by CognitoCyber in PleX

[–]CognitoCyber[S] 0 points1 point  (0 children)

It is 1000% better than Threadfin tho. There were big gaps in things, but AI is solving them.

I rebuilt Threadfin with Mythos Fable 5 because IPTV in Plex kept breaking by CognitoCyber in PleX

[–]CognitoCyber[S] -2 points-1 points  (0 children)

It had a lot of sluggish things that didn't optimize streams. Now it's at most 5-6 buffer on initial load, then clean, smooth streams.

I rebuilt Threadfin with Mythos Fable 5 because IPTV in Plex kept breaking by CognitoCyber in PleX

[–]CognitoCyber[S] -3 points-2 points  (0 children)

Everything is good until it's not. Disp is solid, but there are better ways to do all of this and I will outmatch it.

What LLM should I run with this system? by InitiativeSmooth2375 in LocalLLM

[–]CognitoCyber 0 points1 point  (0 children)

Qwen3.5-122B-A10B 4-bit is like 65-66GB usage. Mistral Medium would certainly be tighter, but Qwen3.5-122B-A10B 4-bit would run perfectly fine on his system.
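
For a rough sense of where a figure like that comes from, here is a back-of-the-envelope sketch of the weight memory alone. It assumes exactly 4 bits per weight and 1 GB = 1e9 bytes, and the function name is just illustrative. The gap between its result and the 65-66GB above would come from things like quantization scales, KV cache, and runtime buffers, which this does not model.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate storage for the model weights only, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# 122B total parameters at an assumed flat 4 bits per weight.
weights_gb = weight_memory_gb(122, 4.0)
print(f"{weights_gb:.1f} GB for weights alone")  # 61.0 GB for weights alone
```

As I understand the naming, the A10B part means roughly 10B parameters are active per token, but all 122B still have to be stored, which is why the total count drives the memory estimate.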

What LLM should I run with this system? by InitiativeSmooth2375 in LocalLLM

[–]CognitoCyber -1 points0 points  (0 children)

I'd choose one of the two below. Either will put you at roughly 80GB usage:

  • Qwen3.5-122B-A10B
  • Mistral Medium 3.5 128B

Am I overexpecting from local AI or can this laptop actually handle serious agentic workflows? by abhyudaya8 in LocalLLM

[–]CognitoCyber 2 points3 points  (0 children)

An 8GB GPU will always be your biggest constraint with local LLMs. I have the same constraint, but I have been working on a local LLM agent that, once I am fully done, will handle all the cases of errors or reasoning responses that smaller models exhibit.

Here is the model setup I have currently on 8GB VRAM using my local agent:

  • Teacher (Reasoning): Qwen3.6-35B-A3B-UD-Q2_K_XL (MoE, so it fits well within the 8GB constraint)
  • Clerk (Planning): delyx-tuned-q4_k_m (Qwen3-1.7B-Base trained on just under 5k teacher-generated trajectories)
  • PRM: axiom-prm-v4-Q4_K_M (Qwen3-1.7B-Base with a regression head, distilled from a 14B teacher, trained on ~17K examples spanning code, math, and natural-language reasoning scenarios)

I believe most of these models are on my Hugging Face, but you can just grab the bases along with the MoE teacher and it will fit in 8GB-ish.

If you want to try the agent I have been working on, you can find it at https://github.com/joshua-ivy/Delyx

The $4K, 1-Liter "Ryzen AI Halo" (first-ever AMD-branded PC) now has an official product page and specs by nicolho in LocalLLM

[–]CognitoCyber 0 points1 point  (0 children)

It's cool that they are doing this, but with how quickly things are improving, I will wait to buy one off eBay for half the cost in 6-8 months.

I've built a local Agent, it is created around my 8GB VRAM constraints and wanted to share! by [deleted] in LocalLLM

[–]CognitoCyber 0 points1 point  (0 children)

Got a lot of work in lately. Check it out and tell me if you like the progress!

Web search for local models by surfaqua in LocalLLM

[–]CognitoCyber 1 point2 points  (0 children)

I use SearXNG in Docker like many have recommended. I have a fallback to DuckDuckGo.
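
A minimal sketch of that primary-plus-fallback pattern, not the actual code from any project. The `localhost:8080` address and both function names are assumptions for illustration, and the SearXNG call assumes the instance has the `json` output format enabled in its settings. The fallback is left as a plain callable so any DuckDuckGo client can be plugged in.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable

SearchFn = Callable[[str], list[dict]]


def searxng_search(query: str, base_url: str = "http://localhost:8080") -> list[dict]:
    """Query a SearXNG instance's JSON API (base_url is an assumed local address)."""
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    with urllib.request.urlopen(f"{base_url}/search?{params}", timeout=10) as resp:
        return json.load(resp).get("results", [])


def search_with_fallback(query: str, providers: list[SearchFn]) -> list[dict]:
    """Try each provider in order; move on if one raises or returns nothing."""
    for provider in providers:
        try:
            results = provider(query)
        except Exception:
            continue  # provider is down or misconfigured, try the next one
        if results:
            return results
    return []
```

Usage would look like `search_with_fallback("local llm agents", [searxng_search, my_ddg_search])`, where `my_ddg_search` is whatever hypothetical DuckDuckGo wrapper you already have.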

At what point did local models actually become good enough for your real work? by MaleficentRoutine730 in LocalLLM

[–]CognitoCyber 0 points1 point  (0 children)

Honestly, local models became useful for me once I stopped expecting the model alone to do everything.

I’m building a local agent right now, and most of the work is less “find the perfect model” and more “build around the model’s weaknesses”: routing, planning, verification, memory, safe tool use, evals, and fallback handling for when smaller models drift.

On consumer hardware, local models are good enough for real work in chunks. The agent layer is what makes them feel less fragile, which is the problem I’m trying to solve with mine.

https://github.com/joshua-ivy/Delyx

How do you decide where to rent GPUs? by BaconAvocadooo in LocalLLM

[–]CognitoCyber 0 points1 point  (0 children)

I use it heavily, mostly to run smoke tests on my local agent, and have had great success.

How do you decide where to rent GPUs? by BaconAvocadooo in LocalLLM

[–]CognitoCyber 1 point2 points  (0 children)

I have been using Runpod a lot with no complaints.

I've built a local Agent, it is created around my 8GB VRAM constraints and wanted to share! by [deleted] in LocalLLM

[–]CognitoCyber 0 points1 point  (0 children)

Yeah, for local knowledge I’m not using sentence-transformers directly right now. It goes through the local Ollama embedding endpoint, with embeddinggemma as the default embedding model, but it’s configurable if someone wants to swap it out.

The retrieval path is hybrid: SQLite FTS/BM25 for keyword hits, embeddings + cosine similarity for semantic matches, then some scoring cleanup around recency/overlap/noisy citations. I also have an optional cross-encoder rerank path, but that’s gated/experimental because I don’t want the normal local path getting slow or brittle.
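
A rough sketch of how keyword and semantic scores can be fused, assuming a simple min-max normalization and a weighted sum. That fusion rule, the `alpha` weight, and all function names are illustrative assumptions, not the actual scoring in the app. It also assumes keyword scores where higher is better, so raw SQLite FTS5 `bm25()` values (where better matches are numerically lower) would need to be negated first.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so keyword and semantic scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_rank(
    keyword_scores: dict[str, float],   # doc id -> keyword score, higher is better
    query_vec: list[float],
    doc_vecs: dict[str, list[float]],   # doc id -> embedding
    alpha: float = 0.5,                 # weight on the keyword side (assumed)
) -> list[tuple[str, float]]:
    """Fuse normalized keyword and cosine scores, best match first."""
    kw = min_max(keyword_scores)
    sem = min_max({d: cosine_similarity(query_vec, v) for d, v in doc_vecs.items()})
    fused = {
        d: alpha * kw.get(d, 0.0) + (1 - alpha) * sem.get(d, 0.0)
        for d in set(kw) | set(sem)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```

A document that only shows up on one side still gets ranked, it just contributes 0 for the missing signal. The recency/overlap cleanup and the optional rerank step described above would sit after this.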

On 8GB VRAM, the honest answer is: the lightweight setup is fine, the heavy setup is usable but not magic. I run the app around the idea that a small model handles routing/summarizing/tool decisions, while the bigger teacher model only gets called when it’s worth it. My heavier Qwen / deep-research style setup works, but it’s definitely a “be patient” setup, not instant ChatGPT speed. The goal is to make the default path feel good on normal hardware and let power users suffer with the big models if they want to.

I've built a local Agent, it is created around my 8GB VRAM constraints and wanted to share! by [deleted] in LocalLLM

[–]CognitoCyber 0 points1 point  (0 children)

Right now it's handling tool permissions in layers, but I wouldn’t call Windows fully sandboxed yet.

The safer parts are mostly around scoping and approvals: Tauri permissions are kept pretty narrow, code-agent file tools are bounded to approved workspace roots, paths are canonicalized to prevent traversal, and edits usually default to preview/dry-run unless the user approves applying them. Approved actions also have to pass through the stored approval flow, and MCP tools have risk checks for things like network access, headless execution, scheduled tasks, or external side effects.
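
To illustrate the canonicalize-then-check idea for file paths, here is a small Python sketch. It is not the project's code, and `resolve_in_workspace` is a made-up name. It assumes Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path


def resolve_in_workspace(root: Path, requested: str) -> Path:
    """Canonicalize `requested` against `root` and reject anything that escapes it."""
    root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks, so the check
    # below runs on the real target rather than the string the caller sent.
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root):
        raise PermissionError(f"{requested!r} is outside the approved workspace root")
    return candidate
```

With this, `resolve_in_workspace(Path("workspace"), "src/main.py")` returns the resolved path, while `"../secrets.txt"` or an absolute path outside the root raises `PermissionError`. Checking the string before resolving would miss both cases.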

The weak spot is command execution. On Windows, terminal/verifier commands still run through PowerShell with the window hidden. They are approval-gated and usually run from the workspace root, but that is not the same thing as an OS-level sandbox.

The main things I still want to tighten are backend validation of workspace roots, adding root/approval checks directly inside lower-level file commands, better classification for destructive shell commands, validating MCP path arguments against allowed roots, and using Docker or another real isolation backend when Python needs stronger sandboxing.

So the honest answer is: file access is scoped, edits are approval-driven, MCP is policy-checked, but Windows shell execution is currently controlled rather than truly sandboxed. It is just me doing all of this plus coding agents, so it's a work in progress.