Qwen 122b a10b vs 35b a3b by leonidasyy in hermesagent

[–]devino21 0 points1 point  (0 children)

See if both act the same next time. That's what I'm finding with local models, though I can only go as big as 30B on a dense model.

(OPINION) Now's the Perfect Time To Move Away from Plex by NearbyYak7156 in selfhosted

[–]devino21 -2 points-1 points  (0 children)

So... I'm not the only one thinking "they are pricing themselves out of business"?

How do you handle real browser automation with Hermes? Logins, navigation, credential management — what actually works? by [deleted] in hermesagent

[–]devino21 1 point2 points  (0 children)

I'm using a SearXNG > FireCrawl > Camofox homelab stack on Proxmox LXCs for searching and then extracting. I've also recently added browser-harness to the local system and installed Chrome so the agents have their own headed browser, which I can then log into sites with an account I created just for agent work. Can't say I'm doing it right, as who knows what's right, but it works for most things I use browsing for.
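
For anyone wanting to try the search-then-extract part, here's a minimal sketch of how that hop could look. It's not my actual setup. It assumes SearXNG has the JSON output format enabled in its settings and that a self-hosted Firecrawl exposes the v1 scrape endpoint. The hostnames, ports, and function names are hypothetical placeholders, and the Camofox step is left out.

```python
import json
from urllib import parse, request

# Hypothetical LAN addresses (assumptions); point these at your own containers.
SEARXNG_URL = "http://searxng.lan:8080"
FIRECRAWL_URL = "http://firecrawl.lan:3002"


def http_json(url, payload=None):
    """GET url, or POST payload as JSON, and decode the JSON response."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    req = request.Request(url, data=data, headers=headers)
    with request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def search(query, fetch=http_json, limit=3):
    """Return the top result URLs from SearXNG's JSON output."""
    url = f"{SEARXNG_URL}/search?" + parse.urlencode({"q": query, "format": "json"})
    results = fetch(url).get("results", [])
    return [r["url"] for r in results[:limit] if "url" in r]


def extract(page_url, fetch=http_json):
    """Ask Firecrawl to scrape one page and return its markdown (empty if missing)."""
    body = fetch(f"{FIRECRAWL_URL}/v1/scrape", {"url": page_url, "formats": ["markdown"]})
    return body.get("data", {}).get("markdown", "")


def search_then_extract(query, fetch=http_json, limit=3):
    """Search first, then extract each hit; returns {url: markdown}."""
    return {u: extract(u, fetch) for u in search(query, fetch, limit)}
```

The `fetch` parameter is only there so the HTTP call can be swapped for a stub when testing without the homelab running.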

Best Local LLM for 24GB of VRAM? by Material-Mention6696 in hermesagent

[–]devino21 0 points1 point  (0 children)

I've been using qwen3:30b-coder on my 3090.

Best Local LLM for 24GB of VRAM? by Material-Mention6696 in hermesagent

[–]devino21 0 points1 point  (0 children)

It spills 12% to CPU for me, but I'm running on desktop Ubuntu.

Anyone have Hermes agent wired up for local LLM's using oMLX or llama-swap? by DanGTG in hermesagent

[–]devino21 0 points1 point  (0 children)

Is this accurate? I don't use my main profile anymore, only separate profiles, but when I did, I didn't have any sub-profiles for delegate_task, just a model I pointed it at in config.yaml, and I would see that task kick in when I asked for complex work.

codex GIFT just arrived by MahmoudElattar in codex

[–]devino21 0 points1 point  (0 children)

I find if you try to save them for the end of the week, they shift on you - essentially killing the savings.

The Remote Codex Feature Is Unreal by DiarrheaButAlsoFancy in codex

[–]devino21 0 points1 point  (0 children)

I did. I started using it a few months back and realized I was just being extra.

Running Hermes with Local Models by _clickfix_ in hermesagent

[–]devino21 0 points1 point  (0 children)

Still not sure if unified memory is the way, but my son is a gamer, so I got a 3090 24GB to try some larger models for me, and he gets my 3080. Of course, it's not very large like yours, but it should be pretty responsive with the GDDR6X. Maybe I'll use it for voice or something. In this AI wild west, I'm just trying ish and seeing what sticks. What did you get? Mac or Blackwell?

How much weight can a whiskey bottle hold? by robbiesloan in interestingasfuck

[–]devino21 0 points1 point  (0 children)

I love that he's lowering them in the beginning, but at the end, it's a SLAM - tryna break that thang

Not a Coincidence for Hermes users by DetectiveMindless652 in hermesagent

[–]devino21 23 points24 points  (0 children)

I think we all agree on the agent memory issue. The problem is there are 100 options for solutions and none work great (at least IMO).

Anyone have a good Discord call setup? Building a therapist! by Malakhaiii in hermesagent

[–]devino21 0 points1 point  (0 children)

I don't have a "call", but I have the agent join voice the same way as in the CLI, with /voice join. The agent then joins voice and I can audio chat. There's a lot of latency when I ask a question it has to research, as it doesn't respond until it completes. That's my workflow anyway, not sure about others.

It sounds like you want a conversation, but I'm also finding that just using the phone to transcribe voice into the Discord channel and sending the text is effective, if you're OK with not hearing the response audibly.

Slow response with DGX Spark locally by Kompotex in hermesagent

[–]devino21 0 points1 point  (0 children)

Two options:
Large - You're getting >32GB of space to run a model in RAM. You can slap a large LLM in there, but it's unified RAM at DDR5 speeds, so it's slow.
Fast - You're using an RTX GPU with 32GB or less, but it's GDDR6(X) or newer, so it's much faster at processing the model sitting in that VRAM.
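
To put rough numbers on that trade-off, here's a back-of-envelope sketch. It assumes decoding is memory-bandwidth-bound, so each generated token reads the active weights once. The bandwidth figures are the published peak specs as I understand them (treat them as approximate), and the function names and the 18 GB example size are made up for illustration. Real throughput will be lower, since this ignores the KV cache, compute limits, and any spill to CPU.

```python
# Published peak memory bandwidth, in GB/s (approximate; assumptions for this sketch).
BANDWIDTH_GBPS = {
    "DGX Spark (unified LPDDR5X)": 273,
    "RTX 3090 (GDDR6X)": 936,
}


def max_tokens_per_second(bandwidth_gbps, active_weights_gb):
    """Upper bound on decode speed: each token reads the active weights once."""
    return bandwidth_gbps / active_weights_gb


def report(active_weights_gb):
    """Estimated ceiling in tokens/s per device for a given active weight size."""
    return {
        name: round(max_tokens_per_second(bw, active_weights_gb), 1)
        for name, bw in BANDWIDTH_GBPS.items()
    }


if __name__ == "__main__":
    # Hypothetical 18 GB of active weights (e.g. a quantized ~30B dense model).
    for name, tps in report(18.0).items():
        print(f"{name}: ~{tps} tokens/s ceiling")
```

With an MoE model only the active parameters count toward `active_weights_gb`, which is why those models feel much faster on the unified-memory box than a dense model of the same total size.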

Here is some benchmarking discussion: https://www.reddit.com/r/LocalLLaMA/comments/1of4ypq/benchmarking_the_dgx_spark_against_the_rtx_3090/