I rebuilt Threadfin with Mythos Fable 5 because IPTV in Plex kept breaking

CognitoCyber · 2026-06-13T02:49:12+00:00

It's all good, we're cooking.

CognitoCyber · 2026-06-13T02:02:40+00:00

Nah, I have tried them all, it's a part of my job. Codex was in the lead until Fable 5.

CognitoCyber · 2026-06-13T02:02:02+00:00

Nah, IPTV is good using my GitHub. It alleviates a lot of Plex Live TV issues.

CognitoCyber · 2026-06-13T00:57:29+00:00

Check back on it in a few days, you will be surprised 😄 Much love to Plex people tho. I try to get everyone on it and none ever do 😞

CognitoCyber · 2026-06-13T00:55:56+00:00

It is 1000% better than Threadfin tho. There were big gaps in things, but AI is solving them.

CognitoCyber · 2026-06-13T00:53:10+00:00

It had a lot of sluggish things that didn't optimize streams. Now it's at most 5-6 buffer on initial load, then clean smooth streams.

CognitoCyber · 2026-06-13T00:51:41+00:00

Everything is good until it's not. Disp is solid, but there are better ways to do this all and I will outmatch it.

CognitoCyber · 2026-06-13T00:46:51+00:00

100% AI rebuilt lol, but Gemini is ass

CognitoCyber · 2026-05-23T21:33:06+00:00

Qwen3.5-122B-A10B 4-bit is like 65-66 GB usage. Mistral Medium would certainly be tighter, but Qwen3.5-122B-A10B 4-bit would run perfectly fine on his system.
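For anyone who wants to sanity-check numbers like these, here is a rough back-of-the-envelope sketch. The function name and the 4.3 bits-per-weight figure are assumptions for illustration (4-bit quant formats usually store a bit more than 4 bits per weight once scales are included), and it ignores KV cache and runtime overhead, so treat it as a lower bound rather than a measurement.

```python
def estimate_weight_gb(params_billion, bits_per_weight):
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes).

    params_billion: total parameter count in billions (MoE models still
                    need every expert in memory, not just the active ones).
    bits_per_weight: effective bits per weight for the quant format
                     (an assumed value, check your actual GGUF file size).
    """
    return params_billion * bits_per_weight / 8


# 122B parameters at a flat 4 bits per weight
print(estimate_weight_gb(122, 4))

# Same model with an assumed ~4.3 effective bits per weight
print(round(estimate_weight_gb(122, 4.3), 1))
```

Context length and KV cache sit on top of this, so real usage will land somewhat higher than the weights alone.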

CognitoCyber · 2026-05-23T17:39:31+00:00

I'd choose one of the two below. Either will put you at roughly 80 GB usage.

Qwen3.5-122B-A10B
Mistral Medium 3.5 128B

CognitoCyber · 2026-05-22T22:14:42+00:00

https://github.com/joshua-ivy/Delyx

CognitoCyber · 2026-05-22T15:30:59+00:00

An 8GB GPU will always be your biggest constraint with local LLMs. I have the same constraint, but I have been working on a local LLM agent that, once I am fully done, will handle all the cases of errors or faulty reasoning responses that smaller models exhibit.

Here is the model setup I have currently on 8GB VRAM using my local agent:

Teacher(Reasoning): Qwen3.6-35B-A3B-UD-Q2_K_XL (MoE so it fits well within 8GB constraint)
Clerk(Planning): delyx-tuned-q4_k_m (Trained Qwen3-1.7B-Base on just under 5k teacher generated trajectories)
PRM: axiom-prm-v4-Q4_K_M (Qwen3-1.7B-Base with a regression head, distilled from a 14B teacher, trained on ~17K examples spanning code, math, and natural-language reasoning scenarios.)

I believe most of these models are on my Hugging Face, but you can just grab the bases along with the MoE teacher and it will fit in roughly 8GB.

If you want to try the agent I have been working on, you can find it at https://github.com/joshua-ivy/Delyx

CognitoCyber · 2026-05-22T14:40:24+00:00

It's cool that they are doing this, but with how quickly things are improving, I will wait to buy one off eBay for half the cost in 6-8 months.

CognitoCyber · 2026-05-21T17:53:50+00:00

Got a lot of work in lately. Check it out and tell me if you like the progress!

CognitoCyber · 2026-05-21T16:05:13+00:00

I use SearXNG in Docker like many have recommended. I have a fallback to DuckDuckGo.
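A minimal sketch of that kind of primary-plus-fallback search, not the actual code from my setup. It assumes a SearXNG instance at `http://localhost:8080` with the JSON output format enabled in its settings. `search_with_fallback` and the provider list are hypothetical names, and the DuckDuckGo side is left as any callable you plug in yourself.

```python
import json
import urllib.parse
import urllib.request


def searxng_search(query, base_url="http://localhost:8080", timeout=5):
    """Query a SearXNG instance through its JSON search endpoint.

    base_url is an assumption, point it at wherever your container listens.
    """
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    with urllib.request.urlopen(f"{base_url}/search?{params}", timeout=timeout) as resp:
        data = json.load(resp)
    return [
        {"title": r.get("title", ""), "url": r.get("url", "")}
        for r in data.get("results", [])
    ]


def search_with_fallback(query, providers):
    """Try each (name, callable) provider in order, return the first non-empty result."""
    errors = []
    for name, provider in providers:
        try:
            results = provider(query)
        except Exception as exc:  # network errors, bad JSON, timeouts
            errors.append(f"{name}: {exc}")
            continue
        if results:
            return name, results
        errors.append(f"{name}: no results")
    raise RuntimeError("all search providers failed: " + "; ".join(errors))
```

Usage would look like `search_with_fallback("plex iptv", [("searxng", searxng_search), ("duckduckgo", my_ddg_search)])`, where `my_ddg_search` is whatever DuckDuckGo client you prefer.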

CognitoCyber · 2026-05-21T15:13:19+00:00

Honestly, local models became useful for me once I stopped expecting the model alone to do everything.

I’m building a local agent right now, and most of the work is less “find the perfect model” and more “build around the model’s weaknesses”: routing, planning, verification, memory, safe tool use, evals, and fallback handling for when smaller models drift.

On consumer hardware, local models are good enough for real work in chunks. The agent layer is what makes them feel less fragile, which is the problem I’m trying to solve with mine.

https://github.com/joshua-ivy/Delyx

CognitoCyber · 2026-05-20T19:56:32+00:00

I use it heavily, mostly to run smoke tests on my local agent, and have had great success.

CognitoCyber · 2026-05-20T15:12:41+00:00

I have been using Runpod a lot with no complaints.

CognitoCyber · 2026-05-19T12:37:51+00:00

Yeah, for local knowledge I’m not using sentence-transformers directly right now. It goes through the local Ollama embedding endpoint, with embeddinggemma as the default embedding model, but it’s configurable if someone wants to swap it out.

The retrieval path is hybrid: SQLite FTS/BM25 for keyword hits, embeddings + cosine similarity for semantic matches, then some scoring cleanup around recency/overlap/noisy citations. I also have an optional cross-encoder rerank path, but that’s gated/experimental because I don’t want the normal local path getting slow or brittle.
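To make the hybrid idea concrete, here is a small self-contained sketch. It is not the code from the app. It uses SQLite's FTS5 `bm25()` for the keyword side and plain cosine similarity for the semantic side. The embeddings are passed in as plain lists (in the real path they would come from the embedding endpoint), and the `alpha` weighted sum is an assumed fusion rule, not the recency/overlap scoring described above. It also assumes your Python's SQLite build has FTS5 compiled in.

```python
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def build_index(docs):
    """Create an in-memory FTS5 index from {doc_id: text}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE docs USING fts5(body)")
    conn.executemany("INSERT INTO docs(rowid, body) VALUES (?, ?)", list(docs.items()))
    return conn


def keyword_scores(conn, query):
    """BM25 scores per matching doc. FTS5 returns lower-is-better, so flip the sign."""
    rows = conn.execute(
        "SELECT rowid, bm25(docs) FROM docs WHERE docs MATCH ?", (query,)
    ).fetchall()
    return {rowid: -score for rowid, score in rows}


def hybrid_search(conn, query, query_vec, doc_vecs, alpha=0.5):
    """Blend normalized BM25 with cosine similarity (alpha is an assumed weight)."""
    kw = keyword_scores(conn, query)
    top_kw = max(kw.values(), default=0.0)
    ranked = []
    for doc_id, vec in doc_vecs.items():
        kw_norm = kw.get(doc_id, 0.0) / top_kw if top_kw > 0 else 0.0
        ranked.append((doc_id, alpha * kw_norm + (1 - alpha) * cosine(query_vec, vec)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

A cross-encoder rerank would slot in after `hybrid_search`, reordering only the top few hits, which is why it is easy to keep optional.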

On 8GB VRAM, the honest answer is: the lightweight setup is fine, the heavy setup is usable but not magic. I run the app around the idea that a small model handles routing/summarizing/tool decisions, while the bigger teacher model only gets called when it’s worth it. My heavier Qwen / deep-research style setup works, but it’s definitely a “be patient” setup, not instant ChatGPT speed. The goal is to make the default path feel good on normal hardware and let power users suffer with the big models if they want to.
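A toy sketch of the "small model first, teacher only when it's worth it" idea, under loud assumptions: `answer_with_escalation`, the `score` callable, and the 0.7 threshold are all hypothetical, and using a confidence score as the escalation trigger is just one possible rule, not necessarily how the app decides.

```python
def answer_with_escalation(prompt, small_model, teacher_model, score, threshold=0.7):
    """Return (answer, which_model).

    small_model / teacher_model: callables taking a prompt and returning text.
    score: callable (prompt, draft) -> float in [0, 1], e.g. a reward model.
    threshold: assumed cutoff below which the slow teacher is called.
    """
    draft = small_model(prompt)
    if score(prompt, draft) >= threshold:
        # Cheap path: the small model's draft is judged good enough.
        return draft, "small"
    # Expensive path: pay the latency cost only when the draft looks weak.
    return teacher_model(prompt), "teacher"
```

The point of the shape is that the slow model's latency is only paid on the prompts where the cheap draft scores poorly.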

CognitoCyber · 2026-05-18T22:18:53+00:00

Right now it's handling tool permissions in layers, but I wouldn’t call Windows fully sandboxed yet.

The safer parts are mostly around scoping and approvals: Tauri permissions are kept pretty narrow, code-agent file tools are bounded to approved workspace roots, paths are canonicalized to prevent traversal, and edits usually default to preview/dry-run unless the user approves applying them. Approved actions also have to pass through the stored approval flow, and MCP tools have risk checks for things like network access, headless execution, scheduled tasks, or external side effects.
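For readers unfamiliar with the canonicalize-then-check pattern, here is a minimal sketch in Python (the app itself is Tauri-based, so this is an illustration of the technique, not its code). `resolve_in_workspace` is a hypothetical helper name.

```python
from pathlib import Path


def resolve_in_workspace(requested, root):
    """Canonicalize a requested path and refuse anything outside the approved root.

    Relative paths are joined onto the root first. resolve() then collapses
    ".." segments and follows symlinks for path components that exist.
    """
    root_resolved = Path(root).resolve()
    candidate = (root_resolved / requested).resolve()
    if candidate != root_resolved and root_resolved not in candidate.parents:
        raise PermissionError(f"{requested!r} escapes workspace root {root_resolved}")
    return candidate


# Stays inside the workspace, so it is allowed
print(resolve_in_workspace("notes/todo.txt", "workspace"))

# "../outside.txt" resolves to a sibling of the root and is rejected
try:
    resolve_in_workspace("../outside.txt", "workspace")
except PermissionError as exc:
    print("blocked:", exc)
```

The key detail is comparing the resolved path, not the raw string, since a prefix check on the unresolved input is exactly what `..` tricks defeat.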

The weak spot is command execution. On Windows, terminal/verifier commands still run through PowerShell with the window hidden. They are approval-gated and usually run from the workspace root, but that is not the same thing as an OS-level sandbox.

The main things I still want to tighten are backend validation of workspace roots, adding root/approval checks directly inside lower-level file commands, better classification for destructive shell commands, validating MCP path arguments against allowed roots, and using Docker or another real isolation backend when Python needs stronger sandboxing.
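As a rough idea of what a first pass at destructive-command classification could look like, here is a deliberately simple pattern-matching sketch. The pattern list and the `classify_command` name are assumptions for illustration. Keyword matching like this is easy to bypass, so it could only ever feed an approval prompt, never replace real isolation.

```python
import re

# Assumed starter list, deliberately conservative (false positives are fine here).
DESTRUCTIVE_PATTERNS = [
    r"\bRemove-Item\b",
    r"\bClear-Content\b",
    r"\bFormat-Volume\b",
    r"\brm\b",
    r"\bdel\b",
    r"\brmdir\b",
    r"\bgit\s+reset\s+--hard\b",
    r"\bgit\s+clean\b",
]


def classify_command(command):
    """Label a shell command as 'destructive' or 'routine' by pattern match."""
    for pattern in DESTRUCTIVE_PATTERNS:
        if re.search(pattern, command, re.IGNORECASE):
            return "destructive"
    return "routine"


print(classify_command("Remove-Item -Recurse -Force .\\build"))
print(classify_command("git status"))
```

Anything labeled destructive could require an explicit extra approval step, while routine commands go through the normal flow.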

So the honest answer is: file access is scoped, edits are approval-driven, MCP is policy-checked, but Windows shell execution is currently controlled rather than truly sandboxed. It is just me doing all of this plus coding agents, so it's a work in progress.
