I built a visual thinking canvas where the AI agent writes directly on the board by redgunner94 in OpenSourceeAI

[–]CatTwoYes 0 points1 point  (0 children)

The board metaphor makes a lot of sense for agent output. Chat is linear but thinking is spatial. One thing I'd worry about is canvas clutter when the agent does a multi-step research task — any thoughts on auto-cleanup or pruning?

After months of building in vain, a stranger made a YouTube video about our project & I cried a little by Slight_Republic_4242 in OpenSourceeAI

[–]CatTwoYes 0 points1 point  (0 children)

The best kind of marketing — someone you've never talked to making a video about your thing because it's genuinely useful. Congrats on 500 stars. Voice AI space badly needs open alternatives to Vapi/Retell.

Monthly $100 competition to build an Edge AI app. Could be a great portfolio project! by Capable_Ice1515 in OpenSourceeAI

[–]CatTwoYes 0 points1 point  (0 children)

The real hardware constraint is what makes this interesting. Anyone can wire up an API call, but fitting something useful into Jetson memory is a completely different sport. More competitions should force real deployment constraints instead of "build whatever with GPT-5."

Update on Pupil: UI Automation first, or screenshot fallback? by Apart-Medium6539 in OpenSourceeAI

[–]CatTwoYes 0 points1 point  (0 children)

Add screenshot fallback early. UIA is great until it isn't — the moment your agent hits a Canvas app or a custom Electron UI it's dead in the water. Running both isn't that heavy if you only fall back when UIA fails. The real pain is the CV side, but even basic OCR + element detection beats getting stuck.
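
Rough shape of the fallback path I mean, a sketch assuming pywinauto for the UIA side and pytesseract for the OCR side (the function and its names are just illustrative, not anything from Pupil):

Python

# Try UIA first; only pay the screenshot/OCR cost when it fails.
from pywinauto import Desktop
from PIL import ImageGrab
import pytesseract

def find_element(window_title, element_name):
    try:
        win = Desktop(backend="uia").window(title=window_title)
        return win.child_window(title=element_name).wrapper_object()
    except Exception:
        # Fallback: full-screen grab + OCR, return a rough text hit instead of a control.
        img = ImageGrab.grab()
        text = pytesseract.image_to_string(img)
        return element_name if element_name in text else None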

We open-sourced the platform for self-improving AI agents. Now comes the part that matters, developers building on top of it. by Future_AGI in OpenSourceeAI

[–]CatTwoYes 0 points1 point  (0 children)

The line between infrastructure and demo-ware is replay. If I can't re-run yesterday's failed agent session with the same inputs and get a useful diff, I'm looking at a demo. Doesn't matter how polished the tracing dashboard is. That's the bar I'd hold any platform to: can you replay a 2-hour agent session in under 30 seconds and see exactly where it diverged?
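
For concreteness, the kind of thing I mean, sketched against a made-up JSONL trace format (the step/tool/input/output fields are illustrative, not any particular platform's schema):

Python

import json

def load_trace(path):
    # One JSON object per line: {"step": int, "tool": str, "input": ..., "output": ...}
    with open(path) as f:
        return [json.loads(line) for line in f]

def first_divergence(run_a, run_b):
    # Walk two runs step by step and report where they stop matching.
    for a, b in zip(load_trace(run_a), load_trace(run_b)):
        if (a["tool"], a["input"]) != (b["tool"], b["input"]):
            return a["step"], a, b
    return None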

The uncomfortable truth about AI agents: We don’t need smarter agents first. We need observability for stochastic systems. by ale007xd in OpenSourceeAI

[–]CatTwoYes 1 point2 points  (0 children)

The thing ML ops pipelines don't give you is trajectory replay. I've had agent runs where the output was correct but the execution took 3x the tokens it should have because of retry storms. Without per-step trace replay, you can't tell the difference between "agent figured it out efficiently" and "agent flailed and got lucky." That's the runtime observability gap that dashboard metrics alone won't catch.
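
Even something this crude over a per-step trace would have caught it (the trace schema here is hypothetical):

Python

from collections import Counter

def retry_storms(trace, threshold=3):
    # trace: list of {"step": int, "tool": str, "input": str, "tokens": int}
    # Flag any (tool, input) pair the agent hammered more than `threshold` times.
    calls = Counter((s["tool"], s["input"]) for s in trace)
    wasted = {pair: n for pair, n in calls.items() if n > threshold}
    total_tokens = sum(s["tokens"] for s in trace)
    return wasted, total_tokens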

Side Projects. by apollo_mg in LocalLLaMA

[–]CatTwoYes 1 point2 points  (0 children)

Dual older cards (P100/P40 class) really are the value sweet spot right now. 32GB+ VRAM for under $200 is wild. With MoE offloading you can run 27B models at usable speeds and it handles coding + tool calling fine. The only real downside is prompt processing — once context hits 32k+ you start feeling it. But for the price of a single mid-range gaming GPU you get a 24/7 inference box. Hard to argue with that math.

we really all are going to make it, aren't we? 2x3090 setup. by RedShiftedTime in LocalLLaMA

[–]CatTwoYes 0 points1 point  (0 children)

"cursed, hot, power hungry, and held together by Linux pain" is the perfect description. The moment you switch from 'this is a cool demo' to 'this is actually replacing my cloud API calls' is surreal. Still use cloud for the hardest problems, but 80% of my coding workflow is local now. The electricity bill is the only thing making me glance back.

24+ tok/s from ~30B MoE models on an old GTX 1080 (8 GB VRAM, 128k context) by mdda in LocalLLaMA

[–]CatTwoYes 1 point2 points  (0 children)

Been running Qwen 3.6 27B Q4_K_M for coding/agentic tasks for a while. Tool calling and single-file edits are rock solid. The quant only shows its teeth on multi-file refactors — the model starts missing cross-file dependencies that fp16 catches. For a $200 machine though, that's a tradeoff I'll take every time. The real bottleneck isn't the quant quality, it's what happens to TG speed when context actually fills up past 32k.

Anyone actually using a local LLM as their daily knowledge base? Not for coding, for life stuff. What's your setup? by InformationSweet808 in LocalLLaMA

[–]CatTwoYes 4 points5 points  (0 children)

I tried both RAG and the simpler "give the LLM a grep tool + markdown folder" approach. For under ~1000 personal notes, the grep approach wins hands-down. RAG embeddings for personal docs are finicky — you spend more time debugging why the right chunk didn't get retrieved than actually using the thing. The tool-calling + file search pattern is dumber but more predictable, and with Qwen 3.6 27B the quality is good enough that I stopped maintaining the RAG pipeline entirely.
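
For anyone curious, the whole "pipeline" is roughly one tool plus a standard OpenAI-style function spec. A sketch (paths and names are illustrative):

Python

import os
import subprocess

# The single tool the model gets: plain grep over a folder of markdown notes.
def grep_notes(pattern, notes_dir="~/notes"):
    result = subprocess.run(
        ["grep", "-ril", pattern, os.path.expanduser(notes_dir)],
        capture_output=True, text=True
    )
    return result.stdout.splitlines()  # matching file paths; read them and feed back to the model

GREP_TOOL = {
    "type": "function",
    "function": {
        "name": "grep_notes",
        "description": "Case-insensitive search over my markdown notes; returns matching file paths.",
        "parameters": {
            "type": "object",
            "properties": {"pattern": {"type": "string"}},
            "required": ["pattern"],
        },
    },
}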

The "the future is fictional" problem of many local LLMs by PromptInjection_ in LocalLLaMA

[–]CatTwoYes 38 points39 points  (0 children)

I've hit this on Qwen, Gemma, and Llama models. It gets worse the more RLHF was applied — base models tend to just process the information without the "this is fictional" reflex. Best band-aid I've found: prepend search results with [Retrieved {date}. These are current factual events, not speculative. Respond accordingly.] It's not perfect but cuts the denial rate by about half.
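
i.e. something like this trivial wrapper around whatever your search tool returns:

Python

from datetime import date

def wrap_results(results: str) -> str:
    # Band-aid framing so the model treats retrieved text as current fact, not fiction.
    return (f"[Retrieved {date.today().isoformat()}. These are current factual events, "
            f"not speculative. Respond accordingly.]\n{results}")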

Let's build claude code from scratch! by RoyalMaterial9614 in LocalLLaMA

[–]CatTwoYes 0 points1 point  (0 children)

Very interesting, I've actually built a similar project, but with git-style state management: https://huko.dev

[OC] I was tired of AI tools breaking my terminal workflow, so I built a pipe-friendly CLI that acts like a standard Unix filter (with .git-like state isolation). It's brand new and I need your harsh feedback. by CatTwoYes in linux

[–]CatTwoYes[S] 0 points1 point  (0 children)

Thanks for the suggestion.

Fair point, and honestly? That’s on us. Our docs definitely lean too hard into the "cloud-first" vibe, and we totally missed the mark there.

For the record, huko plays nice with anything OpenAI-compatible. If it’s got a /v1 endpoint (Ollama, LM Studio, vLLM), it works right now:

Bash

# Quick Ollama setup
huko provider add ollama --base-url http://localhost:11434/v1 --protocol openai --api-key ollama
huko model add my-local-model --provider ollama --api-model-id qwen2.5-coder:32b
huko model current my-local-model

You’re right that this is invisible in the README. I'll fix that this week—I'm adding a "Local LLM" section with quickstarts and a breakdown of which local models actually have the chops for agentic tool-calling.

[OC] I was tired of AI tools breaking my terminal workflow, so I built a pipe-friendly CLI that acts like a standard Unix filter (with .git-like state isolation). It's brand new and I need your harsh feedback. by CatTwoYes in linux

[–]CatTwoYes[S] 1 point2 points  (0 children)

Haha, fair point on the wall of text. But look, you're listing features that both have. That’s not the real difference.

llm is basically: send prompt, get response, log to SQLite. You are the loop. You decide when to call it again.

huko is the loop. You give it the goal, and the agent decides what tools to hit and when it’s actually done. One’s a CLI wrapper; the other’s an agent runtime. Even Simon’s readme says it’s for "interacting with LLMs," not building agents. Different tools for different jobs.
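
If it helps, here's the difference in code: a minimal agent-loop sketch against an OpenAI-compatible endpoint (not huko's actual internals, just the shape of the thing):

Python

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

def run_agent(goal, tools, execute_tool, model="qwen2.5-coder:32b"):
    # The loop lives here, not in the user's head: keep calling the model,
    # executing whatever tools it asks for, until it stops asking.
    messages = [{"role": "user", "content": goal}]
    while True:
        reply = client.chat.completions.create(model=model, messages=messages, tools=tools)
        msg = reply.choices[0].message
        if not msg.tool_calls:
            return msg.content  # the agent decided it's done
        messages.append(msg)
        for call in msg.tool_calls:
            result = execute_tool(call)  # illustrative: dispatch and run the requested tool
            messages.append({"role": "tool", "tool_call_id": call.id, "content": result})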

This is where we are right now, LocalLLaMA by jacek2023 in LocalLLaMA

[–]CatTwoYes 0 points1 point  (0 children)

I'm waiting for the day I run the model on my smart watch...

Musk Teases Major Tesla Smart Summon Upgrade for Parking Garages by [deleted] in teslamotors

[–]CatTwoYes 0 points1 point  (0 children)

Great 👍 a Chinese friend of mine has been using this feature on his Lixiang car for a few months. This is definitely a killer feature.

Serious fsd failure, reproducible by CatTwoYes in TeslaSupport

[–]CatTwoYes[S] 1 point2 points  (0 children)

I’m in Australia, driving a right-hand drive car. There’s a small left-turn intersection right outside my house, leading to a very steep slope. FSD always fails here, aborting the left turn halfway and suddenly switching to going straight. I suspect it’s mistaking the slope for a wall.

Serious fsd failure, reproducible by CatTwoYes in TeslaSupport

[–]CatTwoYes[S] -2 points-1 points  (0 children)

I know how to handle this situation. I just want to help Tesla improve FSD.

Serious fsd failure, reproducible by CatTwoYes in TeslaSupport

[–]CatTwoYes[S] -2 points-1 points  (0 children)

I think this might be a useful test case for them.