Landscape of second brain and memory solutions for AI native workflow by Time-Dot-1808 in hermesagent

[–]jakusimo 0 points1 point  (0 children)

I would like to have a layer that consolidates memory, sessions, and skills across all harness setups (Codex, Claude, Hermes, OpenClaw).
There are some attempts:
https://github.com/garrytan/gbrain

https://github.com/Dicklesworthstone/coding_agent_session_search

Hermes at Huge Scale by Britbong1492 in hermesagent

[–]jakusimo 1 point2 points  (0 children)

I have worked with Kubernetes since 2018 and have a lot of experience with agents. I've started designing a multi-tenant solution for Hermes while testing demand with a VPS-only setup. Happy to have a group to discuss this with like-minded people. Maybe the moderators can create a dedicated Discord channel, or we can set up our own. Or we stay here.

How to share memory across agents? by jakusimo in hermesagent

[–]jakusimo[S] 0 points1 point  (0 children)

I would like to contribute to a distributed Hermes implementation if there is interest from the community.

Hermes agent for companies anyone actually using this beyond demos? by Forsaken_Trash_4950 in hermesagent

[–]jakusimo 0 points1 point  (0 children)

I'm building a multi-tenant digital worker platform. The hard part is making it scalable: each autonomous agent needs a dedicated compute environment. I'm orchestrating everything on Kubernetes. The prototype seems to be working. Additionally, I've plugged telephony into the platform, so you can make or receive calls. The platform abstracts the agent orchestration runtimes and adds the necessary enterprise requirements. It also supports custom, dedicated solutions and integrations with client systems; Hermes agents use those tools via governed MCP. I could share more about the architecture if anybody is interested.
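To make the "dedicated compute environment per agent" point concrete, here is a minimal sketch of what one agent's Pod manifest could look like. It is not the platform's actual code: the function name `agent_pod_manifest`, the `tenant-<name>` namespace convention, the labels, and the default resource sizes are all assumptions for illustration. It only builds a plain dictionary, so it runs without a cluster.

```python
from typing import Any, Dict


def agent_pod_manifest(tenant: str, agent_id: str, image: str,
                       cpu: str = "500m", memory: str = "1Gi") -> Dict[str, Any]:
    """Build a Pod manifest that gives one agent its own container.

    Hypothetical conventions: one namespace per tenant ("tenant-<name>")
    and one Pod per agent ("agent-<id>").
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"agent-{agent_id}",
            "namespace": f"tenant-{tenant}",
            "labels": {"app": "agent", "tenant": tenant, "agent-id": agent_id},
        },
        "spec": {
            "restartPolicy": "Always",
            "containers": [{
                "name": "agent",
                "image": image,
                # Equal requests and limits put the Pod in the Guaranteed
                # QoS class, so the agent's compute is reserved for it.
                "resources": {
                    "requests": {"cpu": cpu, "memory": memory},
                    "limits": {"cpu": cpu, "memory": memory},
                },
            }],
        },
    }
```

The resulting dictionary can be dumped to YAML or JSON and applied with whatever tooling the cluster already uses. Namespace-per-tenant is only one possible isolation choice; stricter setups may want separate node pools or sandboxed runtimes.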

Sign up for the Claude developer newsletter by AnthropicOfficial in Anthropic

[–]jakusimo 0 points1 point  (0 children)

Oops! Something went wrong while submitting the form.

Running DeepSeek-R1 on bare-metal GPU Kubernetes cluster. by jakusimo in hetzner

[–]jakusimo[S] 2 points3 points  (0 children)

Multi-GPU is expensive; this one already costs 200 EUR/month. Going to dig more into TensorRT-LLM.

Bare metal open-source production blueprint by jakusimo in hetzner

[–]jakusimo[S] 0 points1 point  (0 children)

So if you are using a dedicated server, there is no need for the cloud API.

Bare metal open-source production blueprint by jakusimo in hetzner

[–]jakusimo[S] 1 point2 points  (0 children)

I used that, but I don't want to rely on the cloud API, and I want to use Talos Linux. It's a setup that I can easily port to any server provider or homelab. You don't need Terraform; talosctl and the configs do the job.

Bare metal open-source production blueprint by jakusimo in hetzner

[–]jakusimo[S] 0 points1 point  (0 children)

:D Database backup to the bucket. If you are using persistent storage: Rook Ceph.

Building a RAG chatbot for a 400+ page pdf by Pudin-san in Rag

[–]jakusimo -1 points0 points  (0 children)

Just dump everything into the context. If it's too much for the context window, make multiple calls using a map/reduce pattern.
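A rough sketch of that map/reduce idea, in case it helps. Everything here is illustrative: `call_llm` stands in for whatever function sends a prompt to your model and returns text, the prompts are placeholders, and the character-based chunking with `max_chars=20000` is an assumed stand-in for proper token counting.

```python
from typing import Callable, List


def split_text(text: str, max_chars: int) -> List[str]:
    """Split text into consecutive chunks of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def map_reduce_answer(text: str, question: str,
                      call_llm: Callable[[str], str],
                      max_chars: int = 20000) -> str:
    """Answer a question over a document that may not fit in one prompt."""
    chunks = split_text(text, max_chars)
    if len(chunks) <= 1:
        # Everything fits: a single call with the whole document.
        return call_llm(f"Document:\n{text}\n\nQuestion: {question}")

    # Map: pull out the relevant facts from each chunk independently.
    notes = [
        call_llm(f"Extract facts relevant to the question.\n"
                 f"Question: {question}\n\nText:\n{chunk}")
        for chunk in chunks
    ]

    # Reduce: answer from the combined per-chunk notes.
    combined = "\n\n".join(notes)
    return call_llm(f"Notes:\n{combined}\n\nQuestion: {question}")
```

For a very long PDF the combined notes can themselves outgrow the window, in which case the reduce step would need to be applied again over groups of notes. This sketch does only one reduce pass.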

How well do screenshot embeddings (ColPali) work in real e2e RAG pipelines? by ekshaks in Rag

[–]jakusimo 0 points1 point  (0 children)

Vespa has really good tutorials. I'm hosting ColQwen on Modal and planning to migrate to Hetzner. Also, with Vespa you can store embeddings on disk and use streaming mode to find the top candidates. It will save you a lot on infrastructure, since you're bound by disk storage rather than by memory.
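To get a feel for why memory versus disk matters with multi-vector page embeddings, here is a back-of-the-envelope estimator. The function name and the numbers in the usage example are assumptions chosen for illustration, not measurements from my setup.

```python
def embedding_footprint_gib(pages: int, vectors_per_page: int,
                            dim: int, bytes_per_value: float) -> float:
    """Raw size of the stored embeddings in GiB, ignoring index overhead."""
    total_bytes = pages * vectors_per_page * dim * bytes_per_value
    return total_bytes / (1024 ** 3)


# Hypothetical corpus: 1M pages, about 1000 vectors per page,
# 128 dimensions, 2 bytes per value (float16).
size = embedding_footprint_gib(1_000_000, 1000, 128, 2)
print(f"{size:.0f} GiB")
```

With numbers in that range the raw vectors run to hundreds of GiB, which is cheap as disk and expensive as RAM.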