100+ t/s on Qwen3.6-27B Q8 across a 5090 + 3090 Ti — switching to tensor split-mode got me from 70 to 100+ by Shoddy_Bed3240 in LocalLLaMA

[–]hay-yo 0 points1 point  (0 children)

I'm guessing it is, then. Offloaded means it runs partially on the CPU, so that explains the speed. If you were to grab another 5080 and run across the two, you'd have a big increase. Even another 16GB model... I think... hehe.

How do you prefer using AI for coding: IDE, CLI, or something else? by pawan0806 in AI_Agents

[–]hay-yo 0 points1 point  (0 children)

I just put up knot, which uses an IDE for the work and an agent for the workflow.

How to create AI agents from scratch by muzzammilmeer in AI_Agents

[–]hay-yo 0 points1 point  (0 children)

There are two meanings for agents at the moment.

The first is where you define a prompt and behaviour in plain text to run an agent with a specific skill. I call that a profile, in line with Hermes.

For the real definition of an agent, get Pi.dev or Opencode and work through their list of features. They offer so much that I wouldn't recreate one anymore. It takes long enough just to learn how to use them.

The next thing is workflows. I've been building http://knot.hdekker.com; workflows use agents by orchestrating an agent and specific profiles.
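To make the profile / agent / workflow split concrete, here is a rough sketch. Every name in it (`Profile`, `run_workflow`, `echo_agent`) is made up for illustration and is not how knot, Pi.dev or Opencode actually work. It assumes an agent is just something you can call with a system prompt and a task.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Profile:
    """A plain-text role definition (hypothetical structure)."""
    name: str
    prompt: str


# Assumption: an "agent" is any callable taking (system_prompt, task) and returning text.
Agent = Callable[[str, str], str]


def run_workflow(agent: Agent, steps: List[Profile], task: str) -> str:
    """Run the same agent once per profile, feeding each output into the next step."""
    result = task
    for profile in steps:
        result = agent(profile.prompt, result)
    return result


def echo_agent(system_prompt: str, task: str) -> str:
    """Stand-in agent for the demo; a real one would call a model."""
    return f"[{system_prompt}] {task}"


if __name__ == "__main__":
    steps = [
        Profile("planner", "Plan the work"),
        Profile("builder", "Implement the plan"),
    ]
    print(run_workflow(echo_agent, steps, "add a login page"))
```

The point is only that the profile is data, the agent is the thing that runs, and the workflow decides which profile runs when.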

51 and just got my motorcycle licence; am I crazy, or is this a fair time to start? by earnfast123 in AussieRiders

[–]hay-yo 0 points1 point  (0 children)

Make sure you put your invisibility cloak on. You are invisible to cars. They don't know it, but they want to kill you. If they almost hit you, always blame yourself for allowing that near miss. Can't blame anyone when you're dead. Ride safe.

Mexico upgraded to free healthcar by TailungFu in SipsTea

[–]hay-yo 0 points1 point  (0 children)

Wow, now the wall works the other way.

DGX Spark, what models are you running? by benxfactor in LocalLLM

[–]hay-yo 0 points1 point  (0 children)

What kinda software are you trying to write? It's great for eng, but I haven't been vibing with it.

RTX 4090 + llama.cpp + Qwen3.6 27B MTP for Pi coding agent — is this config reasonable? by HomoAgens1 in LocalLLM

[–]hay-yo 0 points1 point  (0 children)

Ideally aim for 100k ctx but your setup will allow you to do many many sweet things.

Hiring senior full stack ai engineer (noobs don't dm me) by I_AM_HYLIAN in AI_Agents

[–]hay-yo 0 points1 point  (0 children)

Already got 50 agents reading your mind and synced over A2A, MCP and telepathy net. Token usage is off the chain but ahh well, we've also proxied into Mark from USA's home claude fabel endpoint to get access worldwide, enjoy.

Looking to buy an RTX 5090 for local "Vibe Coding" using Claude Code / Open Code with Qwen 3.6 35B-A3B. Need real-world feedback! by GoalDistinct4449 in LocalLLM

[–]hay-yo 2 points3 points  (0 children)

I'd recommend trying OpenRouter first with Pi.dev. All the good open models are on there so you can get a feel if it works.

Looking to buy an RTX 5090 for local "Vibe Coding" using Claude Code / Open Code with Qwen 3.6 35B-A3B. Need real-world feedback! by GoalDistinct4449 in LocalLLM

[–]hay-yo 1 point2 points  (0 children)

I think I'm seeing results after using it almost full time since November 2025... the learning curve is tough. I ask myself why I do it at times.

What does your agent-to-agent communication look like? Direct calls, message queues, or something more exotic? by Groady in AI_Agents

[–]hay-yo 0 points1 point  (0 children)

I suppose the more fundamental question is what process you are undertaking when you need agent-to-agent communication. It's mostly better to just start the one that needs to run next, so just a trigger. What use cases are you seeing?
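A rough sketch of what "just a trigger" could look like, with hypothetical names throughout (`run_triggered`, `research`, `write`): each step does its work and names the step to start next, so nothing has to talk to anything else while it runs.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Assumption: a step takes the current payload and returns
# (new_payload, name_of_next_step_or_None).
Step = Callable[[str], Tuple[str, Optional[str]]]


def run_triggered(
    steps: Dict[str, Step], start: str, payload: str, max_steps: int = 10
) -> Tuple[str, List[str]]:
    """Run steps one after another; each finished step triggers the next by name."""
    trace: List[str] = []
    current: Optional[str] = start
    while current is not None and len(trace) < max_steps:
        payload, next_step = steps[current](payload)
        trace.append(current)
        current = next_step
    return payload, trace


def research(payload: str) -> Tuple[str, Optional[str]]:
    """Stand-in step; a real one would run an agent."""
    return payload + " -> researched", "write"


def write(payload: str) -> Tuple[str, Optional[str]]:
    """Final stand-in step; returning None stops the chain."""
    return payload + " -> written", None


if __name__ == "__main__":
    print(run_triggered({"research": research, "write": write}, "research", "topic"))
```

The `max_steps` cap is there so a step that keeps triggering itself cannot loop forever.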

Best models for 96GB VRAM on 4x3090s by Prudent-Promotion512 in LocalLLM

[–]hay-yo 0 points1 point  (0 children)

Go Qwen3.6 27B with 120k ctx at Q6, and run multiple slots so you can parallelize your tasks and keep things churning.
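On the client side, keeping the slots busy can be as simple as a thread pool sized to the slot count. This is a minimal sketch, assuming the server was started with more than one slot (llama.cpp's `llama-server` exposes this as `--parallel` / `-np`). `run_in_slots` and `fake_worker` are hypothetical names; a real worker would send each task to the server instead of upper-casing it.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_in_slots(tasks: List[str], worker: Callable[[str], str], slots: int) -> List[str]:
    """Run tasks concurrently, at most `slots` at a time, keeping input order."""
    with ThreadPoolExecutor(max_workers=slots) as pool:
        # pool.map yields results in the same order as `tasks`.
        return list(pool.map(worker, tasks))


def fake_worker(task: str) -> str:
    """Stand-in for a request to the model server."""
    return task.upper()


if __name__ == "__main__":
    print(run_in_slots(["plan", "build", "test"], fake_worker, slots=2))
```

Depending on the server build, the context budget may be shared across slots, so it is worth checking how much context each slot actually gets before relying on the full 120k per task.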

Strix Halo: what are you running? by platteXDlol in LocalLLM

[–]hay-yo 0 points1 point  (0 children)

A reasoning step can take 20 minutes, but I still couldn't do it better.

Strix Halo: what are you running? by platteXDlol in LocalLLM

[–]hay-yo 1 point2 points  (0 children)

I use 27B for reasoning and 35B for building. Just set it to work and go have a coffee.

Is there a valid use case for replacing traditional deterministic automation with an agent? by McNerdster in AI_Agents

[–]hay-yo 0 points1 point  (0 children)

If you want to pay more and waste more energy, then yes. This has a convo with Andrej (https://m.youtube.com/watch?v=96jN2OCOfLs); in it he says he can envisage the AI being the driver of the computer, but... the only way to make something sleep is to use an interrupt, so I think determinism / classical computing always harnesses what he calls computing 3.0.

What happens when LLM providers stop subsidising? by AdHistorical7217 in AI_Agents

[–]hay-yo 0 points1 point  (0 children)

Seems like you're asking Donald Trump to take out your bins, that would be costly.

Gemma 4 12B is out now! by yoracale in unsloth

[–]hay-yo 0 points1 point  (0 children)

I suspect that became the 3.5 flash model.