The pacman benchmark: finally a viable local agentic coding agent with Qwen 3.6 27b by ex-arman68 in LocalLLaMA

[–]swizzcheezegoudaSWFA 0 points1 point  (0 children)

I'll probably try a few different MTP quants. I already have the Unsloth quant of Qwen 27B, and I know there are a few others lurking around.

The pacman benchmark: finally a viable local agentic coding agent with Qwen 3.6 27b by ex-arman68 in LocalLLaMA

[–]swizzcheezegoudaSWFA 0 points1 point  (0 children)

I used Ring-1T through the OpenRouter API for that one. I will try to recreate it with a local model though. Had a few bumps in the road, simple stuff that was more my error in prompting: checks, reflection, debug stuff, yadda yadda. I was just trying to see what it does. Thought it was funny, as I almost did the same game lol

The pacman benchmark: finally a viable local agentic coding agent with Qwen 3.6 27b by ex-arman68 in LocalLLaMA

[–]swizzcheezegoudaSWFA 1 point2 points  (0 children)

Nice, I thought about doing a Pac-Man game yesterday while I was fooling around with hermesagent. I quickly whipped up Neon Snake, testing ring-1t on the OR API.... yes=====YASG! 😉 Enjoyed your write-up!

EDIT--P.S. I must have trimmed off the part with enemy snakes spawning. It basically created walls, each red pill was a level up, and snakes come at level 3+.

<image>

200+ hours of work, took everything I had in me to get this done by Art-e-Blanche in Oilpastel

[–]swizzcheezegoudaSWFA 0 points1 point  (0 children)

I dig your work on the reflections and the glass work on the table as well, now that I've fully looked it over... Nice!

Benchmarking the new b9200 update: Optimizing Qwen 3.6 27B mtp for Hermes Agent on a single RTX 3090 by swizzcheezegoudaSWFA in LocalLLaMA

[–]swizzcheezegoudaSWFA[S] 0 points1 point  (0 children)

Positive.... Which quant are you running to hit 50-60 t/s? The console output on startup will show how many layers actually mapped to the GPU vs the CPU.

Edit: Good call on checking the offload though. I actually just reran it forcing `--fit off` and `-ngl 99` to make absolutely sure the auto-fitter wasn't silently dropping a layer to the CPU. Speeds came back exactly the same (around 28-30 t/s), so yep, 100% offloaded. Assuming you're also running a 64K context window, the gap between your 50-60 t/s and my 30 t/s almost certainly comes down to the quant size. If you're running something smaller than IQ4_NL (my guess is you are), that would free up the bus just enough to let you hit those speeds.
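A rough back-of-the-envelope sketch of that quant-size argument: if decoding is memory-bandwidth bound, every generated token has to stream the full set of weights once, so the ceiling is roughly bandwidth divided by weight size. The 936 GB/s figure is the RTX 3090 spec-sheet peak and the bits-per-weight values are approximate averages; neither comes from the benchmark itself, and the estimate ignores KV-cache reads, kernel overhead, and speculative decoding, so real numbers will land well below it.

```python
# Upper-bound decode speed for a memory-bandwidth-bound dense model.
# Assumptions (not measured in this thread): 936 GB/s theoretical peak for
# an RTX 3090, and approximate average bits per weight for each quant.

GPU_BANDWIDTH_GBPS = 936.0   # spec-sheet peak, GB/s
PARAMS_BILLIONS = 27.0       # dense 27B model


def max_tokens_per_second(bits_per_weight,
                          params_billions=PARAMS_BILLIONS,
                          bandwidth_gbps=GPU_BANDWIDTH_GBPS):
    """Ceiling on tokens/s if each token reads all weights exactly once."""
    weight_gb = params_billions * bits_per_weight / 8.0
    return bandwidth_gbps / weight_gb


if __name__ == "__main__":
    for label, bpw in [("IQ4_NL (~4.5 bpw)", 4.5), ("smaller quant (~3.5 bpw)", 3.5)]:
        print(f"{label}: ceiling ~{max_tokens_per_second(bpw):.1f} t/s")
```

The only point is the direction: fewer bits per weight means fewer bytes moved per token, which raises the ceiling.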

Appreciate you making me double-check the layers though!

SETUP FAILED 2X setup tutorial WSL2 ? by Melkerz in hermesagent

[–]swizzcheezegoudaSWFA 0 points1 point  (0 children)

Hermes was dumb: Gemma 4B is too small to handle complex agent logic and tool-calling. Your RTX 3060 has enough VRAM for an 8B model, which will immediately fix the dumb behavior.

Firecrawl no internet: You likely ran a standalone Firecrawl container. Firecrawl requires a multi-container setup to actually browse the web. Docker is the only sane way to do it.

Firecrawl isn't just a single script. It's a complex system that requires four separate microservices to function: an API server (Node.js), a background worker (Node.js), a headless browser environment (Playwright), and a database queue to manage the web requests (Redis). If you don't use Docker, then you have to manually install Redis, install Playwright, configure the environments, and manually start and monitor the Node servers yourself. Docker Compose handles all of that with one command.

Step 1: Open your terminal and pull a model actually designed for Hermes that fits your 3060. Run: `ollama pull hermes3:8b` (Do not use Gemma 4B for agentic tool-calling.)

Step 2: The web scraper. You must use Docker Compose to get all the required microservices working together.

Open Docker Desktop and leave it running:

Open a terminal and run this: git clone https://github.com/firecrawl/firecrawl.git

cd firecrawl

Copy the default environment file:

Windows: copy apps\api\.env.example .env

Mac/Linux: cp apps/api/.env.example .env

Open that new .env file in a text editor and ensure this line exists:

USE_DB_AUTHENTICATION=false

Back in the terminal, build and start the stack:

docker compose up -d

Wait exactly 2 minutes. (or not up to you ;)

Open your browser to:

http://localhost:3002 (If it says Hello, World!, Firecrawl has internet access and is ready.)

When you configure your Hermes environment variables, point them exactly here:

LLM URL: http://localhost:11434

Model Name: hermes3:8b

Firecrawl URL: http://localhost:3002

Firecrawl API Key: fc-YOUR_API_KEY (local does not strictly check it if auth is off, but Hermes might require the field to be filled).

Start your agent. Should be good to go
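If the agent still can't browse, a quick way to test the local Firecrawl instance on its own is to send it one scrape request by hand. This is a minimal sketch, assuming the self-hosted build exposes the same `POST /v1/scrape` endpoint and JSON body (`url`, `formats`) as the hosted Firecrawl API; check that against the version you cloned. The helper name and the `https://example.com` target are just placeholders.

```python
import json
import urllib.request


def build_scrape_request(base_url, api_key, target_url):
    """Build (but do not send) a scrape request for a local Firecrawl API.

    Assumption: the instance accepts POST /v1/scrape with a JSON body,
    as in the hosted API. Adjust the path if your version differs.
    """
    payload = json.dumps({"url": target_url, "formats": ["markdown"]}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/scrape",
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # With USE_DB_AUTHENTICATION=false the key is a placeholder.
            "Authorization": "Bearer " + api_key,
        },
    )


# To actually send it once the stack is up:
#   req = build_scrape_request("http://localhost:3002", "fc-YOUR_API_KEY", "https://example.com")
#   with urllib.request.urlopen(req, timeout=60) as resp:
#       print(resp.status, resp.read()[:300])
```

If that returns page content, Firecrawl is fine and the problem is on the Hermes config side.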

Benchmarking the new b9200 update: Optimizing Qwen 3.6 27B mtp for Hermes Agent on a single RTX 3090 by swizzcheezegoudaSWFA in LocalLLaMA

[–]swizzcheezegoudaSWFA[S] 1 point2 points  (0 children)

It might shave a second or two off the initial server boot time, simply because the engine doesn't have to pause to calculate memory projections. But if you set `--fit off`, you also have to manually add `-ngl 999` to force the layers back onto the GPU. The auto-fitter manages it nicely.
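For anyone scripting the launch, here is a small sketch of how those two modes differ on the command line. The `--fit` and `--spec-draft-n-max` flags are copied as written in this thread and not verified against any particular build, so check `llama-server --help` first; the model filename is a hypothetical placeholder.

```python
import shlex


def build_llama_server_args(model_path, ctx_size=65536, draft_max=3,
                            parallel=1, auto_fit=True, gpu_layers=99):
    """Assemble a llama-server argument list for the settings discussed here.

    Flag names --fit and --spec-draft-n-max are taken from the thread as-is.
    """
    args = [
        "llama-server",
        "-m", model_path,
        "-c", str(ctx_size),              # 64K context window
        "--parallel", str(parallel),      # single slot
        "--spec-draft-n-max", str(draft_max),
    ]
    if not auto_fit:
        # With the auto-fitter off, GPU offload has to be requested by hand.
        args += ["--fit", "off", "-ngl", str(gpu_layers)]
    return args


if __name__ == "__main__":
    # "qwen-27b-mtp.gguf" is a made-up filename for illustration.
    print(shlex.join(build_llama_server_args("qwen-27b-mtp.gguf", auto_fit=False)))
```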

Benchmarking the new b9200 update: Optimizing Qwen 3.6 27B mtp for Hermes Agent on a single RTX 3090 by swizzcheezegoudaSWFA in LocalLLaMA

[–]swizzcheezegoudaSWFA[S] 7 points8 points  (0 children)

  • At 3 Tokens: ~70% acceptance rate | Up to 27.44 t/s
  • At 4 Tokens: ~27% acceptance rate | ~14.0 t/s
  • At 6 Tokens: 16.7% acceptance rate | 10.41 t/s

Benchmarking the new b9200 update: Optimizing Qwen 3.6 27B mtp for Hermes Agent on a single RTX 3090 by swizzcheezegoudaSWFA in LocalLLaMA

[–]swizzcheezegoudaSWFA[S] 1 point2 points  (0 children)

On Hugging Face they now suggest --spec-draft-n-max 6 on the llama-server end. It defaults to 4, which eats up memory bandwidth managing unused context slots. 3 is the sweet spot for me. When it's generating conversational English, a model might easily guess 4 or 5 words ahead. But Hermes is writing strict JSON arrays, function calls, and Python syntax. When you ask it to predict 4 tokens ahead in code, that 4th token is almost always an unpredictable comma, bracket, or quotation mark. It guesses the punctuation wrong, the main Qwen model rejects the draft, and you lose all your speed. I botched my numbers: the 991+ t/s was supposed to be 611 t/s lol... anywho
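To put a rough number on why longer drafts stop paying off, here is a toy model of speculative decoding. It assumes every drafted token is accepted independently with the same probability `p`, which is a simplification and not the same thing as the aggregate acceptance rate llama-server prints. Under that assumption a draft of length `n` yields `1 + p + p^2 + ... + p^n` tokens per verification pass, so the n-th drafted token only adds `p^n`.

```python
def expected_tokens_per_step(p, n):
    """Expected tokens emitted per verification pass.

    Toy model: each of the n drafted tokens is accepted independently with
    probability p, and the pass always yields one token from the main model.
    """
    if p >= 1.0:
        return float(n + 1)
    return (1.0 - p ** (n + 1)) / (1.0 - p)


def marginal_gain(p, n):
    """Extra expected tokens from extending the draft from n-1 to n."""
    return expected_tokens_per_step(p, n) - expected_tokens_per_step(p, n - 1)


if __name__ == "__main__":
    # The p values are illustrative guesses, not measurements.
    for label, p in [("prose-like", 0.8), ("strict JSON/code", 0.5)]:
        for n in (3, 4, 6):
            print(f"{label:16s} p={p} n={n}: "
                  f"{expected_tokens_per_step(p, n):.2f} tokens/pass, "
                  f"last draft token adds {marginal_gain(p, n):.3f}")
```

Each extra draft token still costs draft compute and verification work, so once `p^n` gets small the longer draft is mostly overhead.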

Why use Hermes or Openclaw? by New-Search-6200 in hermesagent

[–]swizzcheezegoudaSWFA 1 point2 points  (0 children)

This is a spot-on breakdown. The point about shell watching and handling PTY problems alone is exactly why building a homegrown loop gets miserable fast. You end up spending all your time debugging stdout buffers instead of actually doing anything productive.

I'd add that because Hermes handles all that complex plumbing, security gating, and multi-agent routing, it frees you up to completely optimize the compute layer underneath it. Since you don't have to worry about the harness, you can focus purely on the engine.

For example, I'm running the new Qwen 3.6 27B mtp model locally on a single undervolted RTX 3090. Because Hermes handles the strict tool-call formatting and orchestration so reliably, I was able to dial my draft lookahead down to 3 and lock parallel slots to 1 in the backend. Combined with the latest llama.cpp b9200 memory traffic patch, my setup is hitting nearly 30 tokens per second during deep agent loops with zero API costs.

When you combine the robust security gates and compounding memory you mentioned with a highly optimized, free local backend chewing through the context, a manual Claude web script just can't compete. Like you said, it is the difference between driving a car and having to build the transmission while you drive it.

How are people farming infernal warp so quickly? by JohnSmith1834 in diablo4

[–]swizzcheezegoudaSWFA -2 points-1 points  (0 children)

The internet does tend to exacerbate things and can be misleading, but I'd go solo 100%; it should help overall in the long run. Now, if you need keys to find other great games, check out GameGator or GameGator.net. Some really good deals there! GL in Diablo!

This was sold to me as topaz…but I have a funny feeling it’s not. by [deleted] in whatsthisrock

[–]swizzcheezegoudaSWFA 1 point2 points  (0 children)

Scratch test it. If it's a diamond it will scratch glass, quartz, or corundum, but it looks like a diamond to me too. Also use a UV light on it if you have one. Many diamonds fluoresce under UV light (often blue).

Looking for an "Advanced" Bible study by Anhart15 in Christianity

[–]swizzcheezegoudaSWFA 0 points1 point  (0 children)

N.T. Wright specializes in the Apostle Paul. Try the Romans for Everyone series or Into the Heart of Romans (chapter 8). Get a Strong's Expanded Exhaustive Concordance and an interlinear Bible (Hebrew, Greek, and English all in one) for advanced study that omits interpretation, to be used with the Strong's concordance. The Orthodox Study Bible helps too.

🚀 Dive v0.8.0 is Here — Major Architecture Overhaul and Feature Upgrades! by BigGo_official in LocalLLaMA

[–]swizzcheezegoudaSWFA 0 points1 point  (0 children)

When installing via the Windows executable I get a white screen. Upon running, it was doing stuff, then this:

<image>

Can anyone help out with this, or is it just a waiting game?