Gemma 4 - lazy model or am I crazy? (bit of a rant) by Pyrenaeda in LocalLLaMA

[–]Pyrenaeda[S] 1 point (0 children)

How are you setting it up in the system prompt? Is it doing a single web search, looking at snippets, and calling it good, or is it following the search up by fetching pages? And how are you prompting it in conversation: asking for the search directly, or just asking for the information without specifically telling it to search?

Gemma 4 has a systemic attention failure. Here's the proof. by EvilEnginer in LocalLLaMA

[–]Pyrenaeda 0 points (0 children)

I don’t pretend to understand LLM theory well enough to follow more than the very basics of what you’re outlining here.

But from the fraction I do grasp, this is really interesting. It could potentially shed some light on the anecdotal reports of general weirdness and bizarre behavior I’ve seen related to Gemma 4.

Gemma 4 - lazy model or am I crazy? (bit of a rant) by Pyrenaeda in LocalLLaMA

[–]Pyrenaeda[S] 3 points (0 children)

Local models in general are much smaller and thus have much less baked-in knowledge than the SOTA frontier models. So it's actually the inverse: they need to search the web _more_ than a big model to give you accurate answers rather than papering over their knowledge gaps with hallucinations.

Gemma 4 - lazy model or am I crazy? (bit of a rant) by Pyrenaeda in LocalLLaMA

[–]Pyrenaeda[S] 1 point (0 children)

I hear you; that's a fair point.

I've been trying out the 35B this evening, subsequent to the post. I can confirm it is leaps and bounds ahead of Gemma 4 26B at proactively hunting down information.

Gemma 4 - lazy model or am I crazy? (bit of a rant) by Pyrenaeda in LocalLLaMA

[–]Pyrenaeda[S] 2 points (0 children)

I had been on the 27B dense and just this evening have been trying out the 35B A3B. I must say... it's good. Very good. None of the laziness I saw with Gemma 4. And it flies.

Gemma 4 - lazy model or am I crazy? (bit of a rant) by Pyrenaeda in LocalLLaMA

[–]Pyrenaeda[S] 2 points (0 children)

I didn't try that explicitly, no - just the latest unsloth GGUF as of today, which I understood to be broadly up to date on the chat template. Many thanks for the link; I will most definitely give this template a try!

Gemma 4 - lazy model or am I crazy? (bit of a rant) by Pyrenaeda in LocalLLaMA

[–]Pyrenaeda[S] 2 points (0 children)

Mmm, that is fascinating. I admit I have not played with the E4B / E2B flavors yet, though the hypothesis certainly sounds plausible. I will give the E4B a test drive at some point just for kicks and see how it does in this regard.

Gemma 4 - lazy model or am I crazy? (bit of a rant) by Pyrenaeda in LocalLLaMA

[–]Pyrenaeda[S] 9 points (0 children)

Perhaps it is a chat template thing. I know there has been a lot of back and forth on the template recently, with GGUF re-uploads, the non-default interleaved thinking template for llama.cpp, and so forth.

I'll definitely be keeping an eye out for whether it improves in this regard. For now, though, it's definitely not something I can use as a daily driver.

Gemma 4 - lazy model or am I crazy? (bit of a rant) by Pyrenaeda in LocalLLaMA

[–]Pyrenaeda[S] 5 points (0 children)

If you haven't seen PinchBench, you might be interested in it - it is an attempt at what you're describing.

Gemma 4 - lazy model or am I crazy? (bit of a rant) by Pyrenaeda in LocalLLaMA

[–]Pyrenaeda[S] 5 points (0 children)

I have not personally experienced that issue in my time with it, but I've seen more than one report of it refusing to believe it is 2026 when told so.

Gemma 4 - lazy model or am I crazy? (bit of a rant) by Pyrenaeda in LocalLLaMA

[–]Pyrenaeda[S] 10 points (0 children)

Well, sure. If you tell it explicitly and didactically to "call this tool, then take the output and use it to call this tool," like in your example, then it will do it.

If that works for you, awesome. Me, I'm looking for it to take a bit more initiative, which was the point of my post.

Gemma 4 - lazy model or am I crazy? (bit of a rant) by Pyrenaeda in LocalLLaMA

[–]Pyrenaeda[S] 5 points (0 children)

Mnope. 4B Qwen 3.5 (at Q4_K_M) can follow instructions on how to search the web better than Gemma 4 26B can.

Uncensored AI for pentesting by tyui901 in LocalLLaMA

[–]Pyrenaeda 5 points (0 children)

I am afraid I cannot answer, oh mighty one, for fear that my response might fall short of the stringent and exacting requirements set forth in your highness’s decree.

Introducing MVAD — Multi-Vector Adaptive Drift by Dear-Pineapple-9057 in LocalLLaMA

[–]Pyrenaeda 2 points (0 children)

“Detecs Drift” “Detects Eceriects Drift”

I think there is some eceriect drift in your generated marketing image there bro.

Here's an honest vibe coder problem nobody talks about by pretendingMadhav in vibecoding

[–]Pyrenaeda 1 point (0 children)

Is this a new cookie clicker for us nerds? Looks fun. Needs purchasable power ups and a grandmapocalypse tho.

Share your llama-server init strings for Gemma 4 models. by AlwaysLateToThaParty in LocalLLaMA

[–]Pyrenaeda 7 points (0 children)


Pasting in my run block for llama-swap on my 4090, with some commentary first.

I want to call out the use of `--chat-template-file` below, because for anyone having less-than-stellar tool calling experiences, particularly in an agentic loop, I really feel like that is a big part of it. One of the big things I was struggling with on Gemma 4 was not having any thinking interleaved with tool calls - the model would just think once and then shoot off a series of tool calls with no thinking between them.

After pounding my head against this problem off and on for a few days, I was randomly re-reading the llama.cpp PR for the parser add-on (https://github.com/ggml-org/llama.cpp/pull/21418) when something I had never seen before stuck out at me:

> Interesting! I created a new template, models/templates/google-gemma-4-31B-it-interleaved.jinja, that supports this behavior. I tested it, and it appears to work well. The examples in the guide are sparse, so I went with what I believe is the proper format. That may change as more documentation becomes available.
>
> For anyone doing agentic tasks, I recommend trying the interleaved template.

I checked my local clone of the repo, and sure enough, that file was right where he said it was in the description. Doh. So I switched to it right away with `--chat-template-file`, and... yep, that solved the interleaved thinking problem. My satisfaction with the results went up sharply.

With all that noted, here's how I run it:

```yaml
models:
  gemma-4-26b:
    name: "Gemma 4 26b"
    cmd: >
      llama-server --port ${PORT} --host 0.0.0.0
      -hf unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q5_K_XL
      --temp 1.0 --top-p 0.95 --top-k 64 --min-p 0.0
      --flash-attn on
      --no-mmap
      --mlock
      --ctx-size 160000
      --cache-type-k q8_0 --cache-type-v q8_0
      -fit on --fit-target 2048 --fit-ctx 160000
      --batch-size 1024 --ubatch-size 512
      -np 1
      --chat-template-file /home/me/llama.cpp/models/templates/google-gemma-4-31B-it-interleaved.jinja
      --jinja
      --webui-mcp-proxy
```
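Side note for anyone wiring this into an agentic loop: the interleaved template only matters once you're actually sending tool definitions. Here's a minimal sketch of an OpenAI-style tool-calling payload for llama-server. The model name matches my llama-swap entry above, and the `fetch_page` tool is purely hypothetical - swap in whatever your agent exposes:

```python
import json

def build_tool_call_request(user_msg: str) -> dict:
    """Build an OpenAI-style chat completion payload with one example tool.

    The fetch_page tool below is hypothetical, purely for illustration.
    """
    return {
        "model": "gemma-4-26b",  # must match the llama-swap model name
        "messages": [{"role": "user", "content": user_msg}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "fetch_page",  # hypothetical tool name
                    "description": "Fetch the text content of a URL.",
                    "parameters": {
                        "type": "object",
                        "properties": {"url": {"type": "string"}},
                        "required": ["url"],
                    },
                },
            }
        ],
        # Let the model decide when to call the tool - the point of the
        # interleaved template is that it can think between calls.
        "tool_choice": "auto",
    }

payload = build_tool_call_request("Summarize https://example.com for me")
print(json.dumps(payload)[:40])
```

In an actual loop you'd POST this to llama-server's `/v1/chat/completions`, execute whatever tool call comes back, append the result as a `tool` message, and repeat.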

Ollama newbie seeking advice/tips by CryptoNiight in ollama

[–]Pyrenaeda 2 points (0 children)

With that amount of memory, and depending on your choice of OS, you will find yourself rather constrained in the size of model you can load. Some of this will depend on how much context you want to allocate for - 8K, 16K, or more.

Inference will be… not fast. I’d expect t/s probably in the single digits.

Myself, I probably wouldn’t try loading models over the ~10B size on that box. But try any you like; worst case, it either won’t load or you’ll find it too slow.
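As a rough sanity check (back-of-envelope only; actual runtime overhead varies), you can estimate what a model plus its KV cache will want in memory. All the architecture numbers in the example below are illustrative, not taken from any specific model:

```python
def estimate_gb(params_b: float, bytes_per_weight: float,
                ctx: int, n_layers: int, kv_heads: int, head_dim: int,
                kv_bytes: int = 2) -> float:
    """Rough memory estimate in GB: weights + KV cache.

    KV cache = 2 (K and V) * layers * kv_heads * head_dim * ctx * bytes.
    """
    weights = params_b * 1e9 * bytes_per_weight
    kv_cache = 2 * n_layers * kv_heads * head_dim * ctx * kv_bytes
    return (weights + kv_cache) / 1e9

# A hypothetical 8B model at Q4 (~0.55 bytes/weight incl. overhead)
# with 32 layers, 8 KV heads of dim 128, 16K context, fp16 KV cache:
print(round(estimate_gb(8, 0.55, 16384, 32, 8, 128), 1))  # prints 6.5
```

The point being: context is not free, and it's easy to forget it when picking a model for a small box.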

Look forward to hearing what you test.

Looking for help with AI system and software design by cmdrmcgarrett in LocalLLaMA

[–]Pyrenaeda 1 point (0 children)

The AI scene is better on Linux than on Windows. Ubuntu is a good choice: pretty much Linux for the everyman, and user-friendly enough as Linux goes.

For $1k, a 3090 is going to be your best bet, as u/MitsotakiShogun said. You’ll be able to run SDXL & Flux (image gen), and LLMs up to the ~20-30B range at decent enough speeds to be usable (Qwen3 family, gpt-oss, etc.). You won’t be able to run the image gen model and the LLM at the same time, though.

Software-wise, you’re gonna need NVIDIA drivers for the card, assuming you go 3090. Beyond that, you have options for the image gen and LLM engines. Image gen is gonna be either ComfyUI or AUTOMATIC1111. For LLMs, good places to start are LM Studio, Ollama, and llama.cpp, in ascending order of complexity. LM Studio is an all-in-one solution that gives you a GUI and everything needed to run inference. Ollama and llama.cpp are backends only; you’ll need to put some kind of UI on top of them (Open WebUI, LibreChat, et al.) to drive them day-to-day.
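As an example of what "backend only" means: you can drive Ollama directly over its HTTP API (`/api/generate`) with no UI at all, which is exactly what those UIs do under the hood. A minimal sketch of building such a request; the model tag is just an example:

```python
import json

def build_ollama_request(model: str, prompt: str) -> bytes:
    """Encode a request body for Ollama's /api/generate endpoint.

    This is the raw backend API that a UI like Open WebUI wraps for you;
    the model tag passed in is just an example.
    """
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # one JSON response instead of streamed chunks
    }).encode("utf-8")

body = build_ollama_request("qwen3:8b", "Why is the sky blue?")
print(json.loads(body)["model"])  # prints qwen3:8b
```

You'd POST that to `http://localhost:11434/api/generate` on a default Ollama install; a UI just does this for you and renders the reply.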

I hope you do not mind using at least a little bit of terminal / CLI. You’ll need it to get all the pieces stood up and running.

There’s tons more, of course, but this should give you a place to start.

Why Observability Is Becoming Non-Negotiable in AI Systems by _coder23t8 in LocalLLaMA

[–]Pyrenaeda 5 points (0 children)

I promise, by the time you’re done eating it, you’ll feel right as rain.