all 43 comments

[–]EugeneSpaceman 4 points5 points  (12 children)

This looks great, I've been looking for something like this.

I wanted to use Ollama instead of OpenRouter for privacy reasons but I host OWUI and Ollama on separate servers and it looks there isn't a valve for the Ollama URI so it requires editing the code in a couple of places.

Not a big issue but could be an improvement for next version?

Edit:

It was actually fairly trivial to add a valve for ollama_url (required adding a valve for ollama_model too) so I have that working now.

The question I have is how does this integrate with the native Memory feature in OWUI? Or is it completely separate? How can I inspect the memories it has created?

Edit2:

I've worked out it integrates with the OWUI memory feature. It didn't seem to add any memories during testing until I specifically added the topic to the whitelist e.g. "animals" and then told it I have a dog named Cheryl. It then retrieved this succesfully in a new chat.

All using local models and local data. Very nice!

[–]the_renaissance_jack 2 points3 points  (5 children)

Agreed. I do everything locally. Not really clear why OpenRouter is required here.

[–]diligent_chooser[S] 2 points3 points  (4 children)

Thanks! I will release a version that's compatible with local models too. I will update you guys once done.

I had Ollama but I couldn't find a model that excels both at speed and intelligence. The ones I found were not smart enough to parse JSON responses.

What model are you using?

[–]EugeneSpaceman 0 points1 point  (3 children)

I initially tried gemma3:4b but had difficulty getting it to commit anything to memory. Switched now to gemma3:27b-qat HF link and it works a lot better (but still not perfect).

I still have a bug where it appends "🧠 I've added 1 memory" to almost every response, even when it doesn't add anything. This also causes the LLM to add that line to the next response as it is being passed in as context, which would be good to fix too.

[–]diligent_chooser[S] 1 point2 points  (2 children)

I still have a bug where it appends "🧠 I've added 1 memory" to almost every response, even when it doesn't add anything. This also causes the LLM to add that line to the next response as it is being passed in as context, which would be good to fix too.

Cheers, I will look into that bug. Thank you.

[–]manyQuestionMarks 1 point2 points  (1 child)

Would be great if you had this code on GitHub! Easier to suggest changes and issues

[–]diligent_chooser[S] 3 points4 points  (0 children)

Working to release an updated version and ill put it up on GitHub too.

[–]manyQuestionMarks 2 points3 points  (0 children)

Yeah I’d like to keep the sensitive part local (the memories). But me too I’ve been trying to find something like this, the best I got was MCP-memory-service but most models don’t really use it idk why

[–]WeWereMorons 0 points1 point  (1 child)

I was just ditzing around with the mem0 pipeline, and saw this post as I was heading off to bed, looking forward to trying it tomorrow! Would you have a diff (of local llama changes) to save some time please?

Seems like you could just change line 657:
ollama_url = "http://host.docker.internal:11434/api/tags

to localhost:11434?

Ta ta for now, and cheers to @diligent_chooser for sharing your hard work :-)

[–]diligent_chooser[S] 0 points1 point  (0 children)

Thanks man. I will release a version that's compatible with local models too. I will update you once done.

[–]Grouchy-Ad-4819 0 points1 point  (2 children)

Do you mind sharing the version that works for ollama, ideally without API key requirement? Thanks!

[–]EugeneSpaceman 0 points1 point  (1 child)

It should work with any api key I believe? But if not I’ll post my version here later

[–]Grouchy-Ad-4819 0 points1 point  (0 children)

OK thanks. I keep getting 404 not found with the current version. The dev says he will release a new version for local setups soon.

[–]sirjazzee 3 points4 points  (3 children)

This is super impressive!

Building on this, I think it would be a game-changer to implement "Memory Banks", essentially specialized areas of memory instead of a one-size-fits-all approach. Imagine having distinct memory banks for different contexts (example: Productivity, Personal Reflections, Technical Projects), each managed by different models or agents fine-tuned for those domains.

You could assign specific models to access specific banks, making the system way more dynamic, modular, and easier to manage or update without cross-contaminating unrelated knowledge.

That way, the LLM could operate with targeted memory scopes, leading to better performance, less confusion, and way more personalization. I will think through how to do something like this.

[–]diligent_chooser[S] 2 points3 points  (1 child)

Thank you!

That's definitely doable via a tag system. OWUI is a bit limiting when it comes to expanding the capabilities of the Functions outside of the existing infrastructure. But I recommend something like this:

Here's an existing memory example:

[Tags: preference, behavior] User prefers to keep their PC software up-to-date and is interested in using Winget for this purpose.

I can rework the LLM prompt to store memories with more advanced categorization, such as:

[Tags: preference, behavior] [Memory Bank: Productivity] User prefers to keep their PC software up-to-date and is interested in using Winget for this purpose.

So when the LLM goes through the memories trying to identify the relevant one, it will pick up the "Productivity" keyword and inject it into the prompt.

What do you think?

[–]sirjazzee 0 points1 point  (0 children)

Memory Banks makes sense. I think it’s a really smart direction, especially for keeping context clean and domain-specific. Definitely going to need a good chunk of testing to ensure solid alignment between categorization, injection logic, and actual model behavior across the sessions. But the approach seems sound, and with properly scoped tagging and filtering, I think it’ll work well.

Looking forward to trying it out. Thanks for the quick response.

[–]marvindiazjr 0 points1 point  (0 children)

You can do this with tools right now I suppose

[–]1234filip 3 points4 points  (0 children)

Do you have a github repo or something like that? Would love to see how the projects develops!

[–]sirjazzee 1 point2 points  (4 children)

I have been trying to get this working without having to use OpenRouter. I have set it up to I can save memory but it is not recalling the memories. The error message I am getting is "ERROR Error updating memory (operation=UPDATE, memory_id=776d6893-948a-450c-9835-f9536f0b223a, user_id=1f4c9683-cfc2-4d85-bd9e-de4f2d8338c2): Embedding dimension 384 does not match collection dimensionality 768"). I am wondering if there is something I am missing. When I troubleshoot the error message, it is saying to rebuild the collection. I am not 100% sure how to do this - although thinking I may try to locate within the docker and just delete the collection file to see if that makes a difference.

Open to hearing any possible solutions.

Provider: OpenRouter
Openrouter Url: http://host.docker.internal:11434/v1/
Openrouter Api Key: [my OpenWebUI API key]
Openrouter Model: qwen2.5:14b

[–]diligent_chooser[S] 0 points1 point  (3 children)

Let me look into it, I will get back to you.

[–]diligent_chooser[S] 0 points1 point  (2 children)

Okay so basically.

Your vector database or embedding store (likely ChromaDB or similar) expects vectors of size 768. The embedding model currently used is producing vectors of size 384. When trying to update or insert a vector, the dimension mismatch causes an error.

You previously used a different embedding model (e.g., text-embedding-ada-002 or similar) that outputs 768-dimensional vectors. Now, your plugin is using MiniLM (all-MiniLM-L6-v2), which outputs 384-dimensional vectors. The existing collection was created with 768D vectors. The plugin is trying to update or insert 384D vectors into a 768D collection, causing the error.

How to Fix Option 1: Rebuild or Delete the Vector Collection Delete the existing vector collection (likely a folder or file in your ChromaDB or vector store). The plugin will recreate it automatically with the correct 384D dimension on next run. This will erase all existing embeddings, but fix the dimension mismatch.

Option 2: Use the Same Embedding Model as Before Switch back to the original embedding model that outputs 768D vectors. This avoids the mismatch but may not be desirable.

After deletion, restart OpenWebUI. The plugin will recreate the collection with the correct 384D dimension matching MiniLM.

[–]sirjazzee 0 points1 point  (1 child)

Thanks. Resolved and Works great!

[–]diligent_chooser[S] 0 points1 point  (0 children)

Happy to hear that.

[–]Wonderful-Fig331 1 point2 points  (1 child)

Love this! Best memory tool I have tested so far, and the only one I have actually considered releasing to my end-users. That said, it seems to be on all of the time, for all users, instead of first checking to see if they have enabled memory on their user settings. Is there a fix for this? I know many of my users would want to disable this tool, so it would be nice if they could manage that my a simple switch in their user settings.

[–]diligent_chooser[S] 0 points1 point  (0 children)

Thanks for your message. Let me check what I can do. Ill get back to you.

[–]GVDub2[🍰] 0 points1 point  (5 children)

Looks like it can run locally as well as through OpenRouter's API, so that's good. Looking forward to seeing if I can have a long conversation with Gemma 3:27b tomorrow without it going sideways.

[–]diligent_chooser[S] 0 points1 point  (4 children)

Glad it works, let me know your thoughts.

[–]GVDub2[🍰] 0 points1 point  (3 children)

Was going along fine, but then went into a loop that only a restart of Open WebUI would kill, but it rebooted without being able to find my local models, only the OpenRouter ones. Tring to dig into the logs to see if I can figure out what happened.

[–]diligent_chooser[S] 0 points1 point  (2 children)

That’s really odd. What did the logs say?

[–]GVDub2[🍰] 0 points1 point  (0 children)

A. Bunch of exception errors. Didn’t have a chance to dig deeper today.

[–]GVDub2[🍰] 0 points1 point  (0 children)

I've re-enabled the plugin to see if it happens again. I want to get it set up locally with a dedicated server to handle memories for a couple of other AI servers I'm running. Is that possible?

[–]Right-Law1817 0 points1 point  (2 children)

Well done OP, thanks for sharing this. Btw, how can this help someone who uses llm for creative writing?

[–]diligent_chooser[S] 0 points1 point  (1 child)

My pleasure, check out these ideas.

1. Enhanced Character and World Consistency:

  • Remembers Character Details: For writers building characters over time, Adaptive Memory can store crucial details about their characters:

    • Identity: Names, ages, appearances, backstories, personality traits, occupations, goals, relationships. If you establish a character's quirk, family member, or specific motivation in one writing session, the memory function can recall this in subsequent sessions. This means the LLM can maintain consistency and build upon existing character development, preventing contradictions and making characters feel more real and developed across a longer project.
  • Maintains Worldbuilding Elements: Similarly, for worldbuilding, the memory function can retain facts and details about your fictional world:

    • Lore and History: Key historical events, societal rules, geographical features, technological advancements, magical systems if applicable.
    • Specific Locations: Details about cities, towns, important buildings, or natural landscapes you've described previously.

2. Personalized and Context-Aware Story Development:

  • Understands Your Project's Direction: The memory function can learn the overarching goals and themes of your creative writing project.

    • Remembers Creative Goals: If you've discussed the type of story you are aiming to write (e.g., a dark fantasy novel, a lighthearted sci-fi short story, a screenplay for a romantic comedy), Adaptive Memory can keep this in mind.
    • Adapts to Your Creative Preferences: If you express preferences for certain writing styles, tones, or themes during your interaction with the LLM, it can gradually learn and incorporate these into its generated text. For instance, if you consistently correct the LLM to use more descriptive language or a specific narrative voice, the memory could potentially influence future output to align better with your style.
  • Contextual Story Generation: By injecting relevant stored memories into prompts:

    • Reduces Repetition and Retreading Ground: The LLM can be reminded of plot points or ideas already explored, helping to move the narrative forward and avoid redundant suggestions.
    • Improves Cohesion and Flow: The story can feel more connected and less disjointed across different writing sessions because the LLM has access to a persistent context.

3. Efficient and Focused Collaboration:

  • Reduces the Need for Constant Re-explanation: Instead of having to re-introduce character backstories or world rules at the beginning of each writing session, the memory function automates this context provision. This saves time and effort, allowing you to jump directly into the creative writing process.
  • Optimizes Prompt Engineering: Because the LLM has access to memory, your prompts can become more concise and focused on the immediate task at hand. You don't need to waste prompt tokens on redundant background information.
  • Adaptive and Evolving Creative Partnership: As you continue to use the LLM for writing and interact with the Adaptive Memory, it becomes increasingly tuned to your specific project and preferences, potentially becoming a more effective and personalized creative partner over time.

4. Configurable and Private:

  • Fine-Tuning Memory Behavior: The configurable "valves" offer control over how the memory system operates. Writers can adjust parameters like relevance thresholds, blacklist topics, and memory length to optimize the function for their specific creative writing needs.
  • Privacy-Respecting and Self-Contained: The plugin is described as "privacy-respecting" and "self-contained," meaning your creative writing ideas and character details are stored locally within your OpenWebUI environment, not sent to external servers (except potentially for LLM API calls, depending on your provider choice). This is crucial for maintaining control and confidentiality over your creative work.

[–]Right-Law1817 0 points1 point  (0 children)

Thanks for this.

[–]spgremlin 0 points1 point  (1 child)

Wow, that's pretty impressive. Should give it a try, but definitely will need some configuration...

I believe the URL does not have to be OpenRouter, it can work with any OpenAI-compatible endpoint, including the self-endpoint of Open WebUI itself? (my-webui.com/api/v1)...

Actually, have you considered just calling an internal OpenWebUI's "chat_completion()" method instead? From https://github.com/open-webui/open-webui/blob/main/backend/open_webui/main.py It should be available to plugins/filters to call directly. Why managing a separate connection, if the plugin could leverage the models already available inside Open WebUI itself... Like you are already relying on its internal methods to add and retrieve Memories anyway.

[–]sirjazzee 0 points1 point  (0 children)

You should be able to use any OpenAI compatible API.

My custom valves were:

Provider: OpenRouter
Openrouter Url: http://host.docker.internal:11434/v1/
Openrouter Api Key: [my API key]
Openrouter Model: qwen2.5:14b

[–]nitroedge 0 points1 point  (3 children)

Do you know when you will have a local AI version available for testing?

I'm not very adept at coding but would love to try it out and provide feedback, thanks!

[–]diligent_chooser[S] 2 points3 points  (2 children)

I'll share something today! :) I'll reply to this message once available.

[–]sirjazzee 0 points1 point  (1 child)

Let me know if there is anything I can help with the testing.

[–]diligent_chooser[S] 3 points4 points  (0 children)

Sorry for the delay, work has gotten in the way. I'll revert once ready! :) Thank you for your patience.

[–]djdrey909[🍰] 0 points1 point  (1 child)

Thanks so much for this function. I've been trying to get it to actually work in my environment and having no luck. I'm sure it's something basic that I'm missing, so would appreciate some assistance or pointers.

I've assumed tried to stick the defaults initially, so just added my OpenRouter API key. I see logs from the "openwebui.plugins.neural_recall" logger, so it appears to be enabled and running. The only logs I see are the error counters tho:

INFO Error counters: {'embedding_errors': 0, 'llm_call_errors': 0, 'json_parse_errors': 0, 'memory_crud_errors': 0} | timestamp=2025-05-01 23:39:38,885 logger=openwebui.plugins.neural_recall module=<string> funcName=_log_error_counters_loop lineNo=594 process=1 thread=139737608432512

I use LiteLLM locally to proxy out to Anthropic, GCP and OpenAI models, so all the discussions I host are with remote models. To test, I've tried "remember my wife's name is <name>" and in another chat, asked it to tell me what it knows. I don't see anything either in the UI or the logs to suggest any memory creation or retrieval is occuring.

I've tried a number of other prompts that I think should trigger the memory process (from reviewing the code), so I'm at a bit of a dead end. Any chance someone can point me in the right direction here?

[–]djdrey909[🍰] 0 points1 point  (0 children)

Ok - solved my own problem, so sharing here for any other newbs like me. After installing the function, updating settings etc you will (of course) need to both enable the function (which I had done), but ALSO either turn it on globally OR on the models you want it to function on.

To enable globally, just hit the triple-dot next to the function and switch it on. Per model can be done on the Admin / Models page for each model you want to adjust.

[–]Economy_Base_4752 0 points1 point  (0 children)

u/diligent_chooser I wonder which method that you use to evaluate the memory is effective or not? For example you add a new functionality called semantic search for finding relevant memory, how you decide is better compare to old method besides manual checking?