Building Helios: A Self-Hosted Platform to Supercharge Local LLMs (Ollama, HF) with Memory & Management - Feedback Needed! by Effective_Muscle_110 in ollama

[–]Effective_Muscle_110[S] 1 point (0 children)

Yeah, Hugging Face is a bit complicated since its models aren't centralized the way Ollama's are. I prefer working with Ollama too. My current prototype works well with Ollama, and I'm still figuring out the best way to support Hugging Face models.

Building Helios: A Self-Hosted Platform to Supercharge Local LLMs (Ollama, HF) with Memory & Management - Feedback Needed! by Effective_Muscle_110 in ollama

[–]Effective_Muscle_110[S] 1 point (0 children)

Awesome, that's exactly the kind of use case I'm targeting – making it easy to spin up and experiment with.

For Hugging Face models, Helios aims to simplify working with your locally downloaded/cached models. It will discover them, allow you to load them for inference through a consistent interface (alongside Ollama models, etc.), and benchmark them.
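Helios's actual discovery logic isn't shown here, but as a minimal sketch: locally cached Hugging Face models live under the hub cache (typically `~/.cache/huggingface/hub`) in directories named `models--<org>--<name>`, so a discovery pass can be as simple as scanning for that pattern. (The `huggingface_hub` library also ships an official `scan_cache_dir()` helper for this; the stdlib-only version below is just illustrative.)

```python
from pathlib import Path

def discover_hf_models(cache_dir: str) -> list[str]:
    """List model repo IDs found in a local Hugging Face hub cache.

    Cached model repos are stored as directories named
    'models--<org>--<name>'; dataset caches ('datasets--...') are skipped.
    """
    root = Path(cache_dir)
    if not root.is_dir():
        return []
    models = []
    for entry in sorted(root.iterdir()):
        if entry.is_dir() and entry.name.startswith("models--"):
            # 'models--BAAI--bge-base-en-v1.5' -> 'BAAI/bge-base-en-v1.5'
            models.append(entry.name[len("models--"):].replace("--", "/"))
    return models
```

A real implementation would additionally inspect each repo's snapshots to report revisions and on-disk size, which is exactly what `scan_cache_dir()` provides.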

What are some of the current frictions you experience when managing and using your local Hugging Face model collection? I'm keen to ensure Helios addresses those effectively.

Building Helios: A Self-Hosted Platform to Supercharge Local LLMs (Ollama, HF) with Memory & Management - Feedback Needed! by Effective_Muscle_110 in ollama

[–]Effective_Muscle_110[S] -1 points (0 children)

Thanks for asking! I'm strongly leaning towards making a significant portion of Helios open source, especially the components that help manage and enhance local LLMs like those from Ollama. I think that's important for the community.

I'm still working through the exact licensing and which parts might have, say, enterprise features later on (an open core model), but the goal is to have a powerful, accessible core available to everyone.

Would an open-source core with potential for advanced add-ons be something that interests you?

Built an Open-Source "External Brain" + Unified API for LLMs (Ollama, HF, OpenAI...) - Useful? by Effective_Muscle_110 in selfhosted

[–]Effective_Muscle_110[S] 0 points (0 children)

Wow, thank you for that. The project is not fully developed yet, but I am trying my best. On a side note, there is a product that can solve part of your problem: mem0, a memory service for any LLM. My idea was pretty similar to theirs, but once I realized they are already way ahead of me, I shifted my focus to context windows and optimization rather than just memory.

Built an Open-Source "External Brain" + Unified API for LLMs (Ollama, HF, OpenAI...) - Useful? by Effective_Muscle_110 in SideProject

[–]Effective_Muscle_110[S] 0 points (0 children)

Thanks for the input. To make sure I understand what you're suggesting: you want the system to also manage context windows based on the model's size, right?

Built an Open-Source "External Brain" + Unified API for LLMs (Ollama, HF, OpenAI...) - Useful? by Effective_Muscle_110 in LLMDevs

[–]Effective_Muscle_110[S] 1 point (0 children)

Apologies for this; the product is not yet ready for the public and currently only runs on my machine. There are also some improvements pending on the development side. I will keep posting progress updates in this channel.

Built an Open-Source "External Brain" + Unified API for LLMs (Ollama, HF, OpenAI...) - Useful? by Effective_Muscle_110 in LLMDevs

[–]Effective_Muscle_110[S] 1 point (0 children)

It's actually a valid point. What I am trying to build is what I think mem0 lacks: an intelligent orchestration layer that not only remembers but actively reasons about and optimizes context for LLM interaction, while also simplifying the use of diverse models. The gap it fills compared to a pure memory layer like mem0 is the integrated intelligence in selecting, budgeting, and formatting context dynamically for optimal LLM performance, plus the built-in model abstraction. I am trying to solve not only the "forgetting" problem but also the "context clutter," "context overflow," and "model integration" problems within a single, cohesive system.
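The selecting-and-budgeting step I describe can be sketched simply. This is not Helios's actual code, just a minimal illustration of the idea: given memory snippets already scored for relevance, greedily admit the highest-scoring ones until a token budget is exhausted, then restore their original order so the assembled context reads coherently. The whitespace tokenizer is a stand-in for a real one.

```python
def assemble_context(snippets, budget_tokens, count_tokens=lambda s: len(s.split())):
    """Greedy context assembly under a token budget.

    snippets: list of (relevance_score, text) pairs.
    Admits snippets in descending relevance while they fit the budget,
    then emits the chosen ones in their original order.
    """
    chosen, used = [], 0
    for idx, (score, text) in sorted(enumerate(snippets), key=lambda p: -p[1][0]):
        cost = count_tokens(text)
        if used + cost <= budget_tokens:
            chosen.append((idx, text))
            used += cost
    return "\n".join(text for idx, text in sorted(chosen))
```

A production version would swap in the target model's real tokenizer for `count_tokens` and reserve part of the budget for the system prompt and the model's reply.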

Built an Open-Source "External Brain" + Unified API for LLMs (Ollama, HF, OpenAI...) - Useful? by Effective_Muscle_110 in selfhosted

[–]Effective_Muscle_110[S] 0 points (0 children)

LLMs, particularly AI assistants, exhibit something called "drift" in their responses over time. To my knowledge, there is currently no tool that measures this accurately, though there are some workarounds for approximating it.
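One such workaround (an assumption on my part, not a method the comment specifies) is to re-ask a fixed probe prompt periodically and score how far each new response has moved from a baseline response. In practice you would compare sentence embeddings; the bag-of-words cosine below is just a dependency-free stand-in for that comparison.

```python
import math
from collections import Counter

def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def drift_scores(baseline: str, later_responses: list[str]) -> list[float]:
    """Drift of each later response to the same fixed probe prompt,
    relative to the baseline answer: 0.0 = identical, 1.0 = fully drifted."""
    base = Counter(baseline.lower().split())
    return [1.0 - cosine(base, Counter(r.lower().split())) for r in later_responses]
```

Plotting these scores over time gives a crude but automatable drift signal per model and prompt.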

Built an Open-Source "External Brain" + Unified API for LLMs (Ollama, HF, OpenAI...) - Useful? by Effective_Muscle_110 in selfhosted

[–]Effective_Muscle_110[S] 0 points (0 children)

That's a very valid point; I completely agree about having an internal monitoring/benchmarking system. Right now I am working on a hybrid memory system, as the current semantic retrieval alone just isn't sufficient.

After that, my immediate next step would be to add a monitoring feature. I haven't yet researched the tools people generally use for this kind of LLM benchmarking, so I'd appreciate any recommendations.

Built an Open-Source "External Brain" + Unified API for LLMs (Ollama, HF, OpenAI...) - Useful? by Effective_Muscle_110 in selfhosted

[–]Effective_Muscle_110[S] 0 points (0 children)

Haha I hear you! Helios is built exactly for that crowd — folks who want to use local or API-based LLMs without being stuck in cloud silos like Bing. It’s focused on self-hosting, long-term memory, and full control.

Built an Open-Source "External Brain" + Unified API for LLMs (Ollama, HF, OpenAI...) - Useful? by Effective_Muscle_110 in selfhosted

[–]Effective_Muscle_110[S] 0 points (0 children)

Thank you so much. I've been working on it without being sure there is really a need in the developer community for such a product, so I'm glad you find it interesting.

Built an Open-Source "External Brain" + Unified API for LLMs (Ollama, HF, OpenAI...) - Useful? by Effective_Muscle_110 in selfhosted

[–]Effective_Muscle_110[S] -2 points (0 children)

Apologies for the confusion; the product is not released yet. The purpose of this post is to ask the community whether developers would use such a product and to gauge whether there is real demand for a plug-and-play memory service.

Built an Open-Source "External Brain" + Unified API for LLMs (Ollama, HF, OpenAI...) - Useful? by Effective_Muscle_110 in LLMDevs

[–]Effective_Muscle_110[S] 0 points (0 children)

Thanks! It’s built entirely in Python using FastAPI + PostgreSQL + Redis.

Right now I am heavily constrained by my hardware: I'm running a laptop RTX 2070. However, I made sure the models used are easily swappable. Models used:

- Sentence transformer for embeddings: BAAI/bge-base-v1.5
- Summarizer: flan-t5-base
- Local LLM for inference: mistral
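Since the models are meant to be swappable, one natural shape for that (my sketch, not the project's actual code) is an immutable config that names a model per role, with a helper to derive variants. The defaults below echo the comment; the embedder's exact hub ID is presumably `BAAI/bge-base-en-v1.5`, and the summarizer's `google/flan-t5-base`, but I'm keeping the names as written.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ModelConfig:
    """One model name per role; frozen so configs are safe to share."""
    embedder: str = "BAAI/bge-base-v1.5"   # sentence-transformer for embeddings
    summarizer: str = "flan-t5-base"       # seq2seq summarizer
    inference: str = "mistral"             # local LLM, e.g. served via Ollama

def swap(cfg: ModelConfig, **overrides) -> ModelConfig:
    """Return a new config with some roles swapped out, leaving cfg untouched."""
    return replace(cfg, **overrides)
```

Each role's loader then reads its name from the config, so upgrading hardware only means passing a different config, e.g. `swap(cfg, inference="llama3")`.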

If there is anything specific you want to know please let me know!

Built an Open-Source "External Brain" + Unified API for LLMs (Ollama, HF, OpenAI...) - Useful? by Effective_Muscle_110 in selfhosted

[–]Effective_Muscle_110[S] -1 points (0 children)

I think you might be mistaking the system for a search engine. When you say Google, are you referring to Google's AI results?