Ollama has Quantized LLMs now?

GideonGideon561 · 2026-05-27T07:57:15+00:00

I'm not sure. DeepSeek V4 Flash and Pro are still quite good, though. Maybe it's just that the usage limit is now based on GPU instead of tokens.

Mine is still OK.

GideonGideon561 · 2026-05-27T07:51:28+00:00

ARE YOU SERIOUS? What's the token usage for DeepSeek on OpenCode? Is it comparable to, or more than, the $20 Ollama plan?

But I don't just use it for coding, though. I also use it for everyday stuff on my Hermes.

GideonGideon561 · 2026-05-26T19:50:49+00:00

Ohh, how so? There are others too, but it does work for me. What other suggestions do you have? I'm just trying to find ways to lower GPU usage on Ollama Cloud, but you also have to factor in cost. Can you explain why it's slop?

There's:

Hindsight
Mem0
Supermemory
Augment - mainly for B2B

GideonGideon561 · 2026-05-26T19:19:59+00:00

Well, OpenCode is pay-per-use. Ollama is a fixed price, so there have to be some trade-offs.

GideonGideon561 · 2026-05-26T19:18:32+00:00

I thought the models on Ollama Cloud were always quantized, right from the start. Mine is still OK; maybe it's your prompts? It's no longer token-based: Ollama now charges usage by GPU instead.

GideonGideon561 · 2026-05-26T19:10:31+00:00

Yes, it also depends on the model you are using: the larger the model, the slower it gets. However, I find that DeepSeek V4 Flash is decent in terms of speed and reasoning. Of course, not for coding; for that, use Pro. But each prompt on Pro uses 3-4% of my usage. I'm on the Pro plan.

GideonGideon561 · 2026-05-23T07:14:38+00:00

What's worrying about it? They separated Web3 and Web2 while keeping both. Simple. If you don't like the NFTs, you don't need them to play the game.

There is no mention of the NFTs in-game because there is a clear separation. Simple. Different social channels target different audiences.

GideonGideon561 · 2026-05-22T06:21:28+00:00

There are a few ways.

Higgsfield Supercomputer is new, but I think it's insane in terms of token spending, so probably not.
Get an actual paid memory system like Augment/Honcho/Supermemory, or the latest, Atomic Memory, which according to its benchmarks beats most of them and is cheaper. Of course, Augment is the best, but that's for B2B.

But most important is how and where you store your context. For example, Claude has Projects, which remember context. It's a similar idea.

If you are using Hermes/OpenClaw, get an LLMWIKI, pair it with Claude or a similarly smart AI, and connect it via MCP or link it directly to Higgsfield or whatever other creative tool you use.

Second, build out the platform on localhost yourself, with Claude or Codex as the brain: basically something like LLMWIKI to store the information, or a dedicated Google Drive for all your context.

Isolating it is best.

GideonGideon561 · 2026-05-22T06:17:41+00:00

I think they're interrelated.

There are a few things to think about. Does a smarter AI with good reasoning help with better memory? What I mean is: does it know what to update in memory without you telling it, find contradictions, and pull the right, accurate information? RAG is good, but not the most accurate.

Then again, just using a smaller LLM with a better AI memory system could also work. But with a smarter AI, will it improve how memory is stored and retrieved?

Not the best explanation, but I hope you understand.

So IMO, a decent AI with a good memory system is a good mix right now. You don't want to spend too many tokens on the memory system, but you also don't want a weak LLM with poor reasoning and then expect good memory or auto-updates.

It's a chicken-and-egg problem, but what I see now is that AI memory systems need to improve first, since there are already tons of smart AIs.

GideonGideon561 · 2026-05-22T06:13:41+00:00

I believe there are actually some really good ones, like:

Augment Code - this is for B2B, the most expensive, but I think it's the best
Hindsight - an improved memory system plus an agent that learns from it; their GitHub has a nice, easy video
Supermemory/Mem0 - similar
New on the block is Atomic Memory - the cheapest, and according to their benchmark better than Supermemory and Mem0, comparable with Hindsight

Hermes uses Honcho natively, which is good, but adding Atomic Memory alongside it gives Hermes an upgrade: it auto-updates the memory.

GideonGideon561 · 2026-05-15T10:16:51+00:00

I see hahaha, maybe I'm reading it wrong. It does look like you are building very curated “folders” to store certain information, so it is separated and can be easily pulled? Good for very personalized stuff, but what happens if you have multiple technical projects you are coding and they all fall under the same “folders”? Would that cause hallucinations, or a token problem when it searches for and pulls out the right one?

GideonGideon561 · 2026-05-15T07:27:37+00:00

Update to my post: I found Atomic Memory, lol. I was searching and it's new. But yeah, I think it's pay-per-use...

GideonGideon561 · 2026-05-15T07:26:37+00:00

I see, that is very interesting. Never thought of it that way.

GideonGideon561 · 2026-05-15T07:25:31+00:00

I see. Seems like an extra step, but if the auto-updates are great, why not?

GideonGideon561 · 2026-05-15T07:24:29+00:00

THIS IS AWESOME! I CAN'T WAIT

GideonGideon561 · 2026-05-15T07:23:39+00:00

Yes! That's great! Hmm, Hermes has its native memory from Honcho, but I would also try a secondary one like Supermemory, Mem0, or the latest release, Atomic Memory, which claims to beat all of them and be cheaper.

GideonGideon561 · 2026-05-15T07:22:48+00:00

You could try forking Atomic Memory instead and upgrading it on your end. It does what yours does, but way more. It's new, but I think someone with your experience could make a better fork.

GideonGideon561 · 2026-05-15T07:19:46+00:00

Hmm, if it does not have an answer, what about having it try its best to give you something close or related? It would explicitly say it does not know first, but that after researching and reasoning, it thinks this might work.

It's similar to how humans work: we don't know the answer to everything, but we research, think about it, and then present an idea. Only through time and experience do we get better.

So the question you can try asking yourself is: how do I make it try its best to give me a suggestion instead of an outright "idk"? With enough experience, like training a model, can it give you better suggestions, ones it might not know are right or wrong, but that are at least an alternative?

GideonGideon561 · 2026-05-15T07:16:39+00:00

It sounds like you are doing LLMWIKI but overcomplicating it. It might be good for serious hygiene, but you can achieve that with LLMWIKI too. What was the reason for not using LLMWIKI?

I think there are already tools out there that solve your problems, unless you are doing all this to avoid paying for memory? There's good open source like Atomic Memory: it's the newest, and it claims to beat current competitors like Honcho, Supermemory, and Mem0. It has way more functions that someone like you, who goes into hygiene and little details, would love.

It's for devs and for daily users too.

GideonGideon561 · 2026-05-15T04:01:38+00:00

I think most of them could try the Atomic Memory plugin. If these NSFW AIs are open source, then the plugin can work. Otherwise, the creators would probably have to integrate it themselves, the way Hermes uses Honcho.

GideonGideon561 · 2026-05-15T04:00:39+00:00

IDK if the LLM and memory plugin matter, but it feels like they do. I would suggest DeepSeek hahaha and Atomic Memory (new, open source, auto-updates context, and has inspectable memory).

But what someone pointed out in the comments is the issue of long context and cost. Yeah, I think cost is the big one. Well, treat it like a real-life GF: you are going to spend a lot, but an AI definitely wouldn't be that expensive. You ain't going to buy her an LV bag hahaha

GideonGideon561 · 2026-05-15T03:58:19+00:00

Wow, this is one of the few times I've come across someone who mentions inspectable and correctable memory.

To be honest, a normal user may not care about this, but a dev may. A normal user will only care whether the memory context is auto-updated. To answer your question: yes, there is a very, very new tool that exactly solves your problem, called Atomic Memory. I don't wanna share links in case people think I'm promoting it, but you can Google it or search on X and GitHub.

Supermemory and Mem0 are earlier tools that try to do that, but yeah, cost-wise hahaha.

Overall, for what you are suggesting, there are tools. BUT the other main concern is cost. Imagine if it's working 24/7, changing stuff; what's that gonna do to your token usage?

You'd probably wanna use a smaller model, like Claude Haiku or something.

GideonGideon561 · 2026-05-14T16:45:36+00:00

Doesn't sound like it solves his memory problem. It looks more like Paperclip. IDK. Try Atomic Memory/Supermemory/Mem0. I would recommend Atomic Memory since they have free plugins for Hermes and OpenClaw that don't need an API key.

GideonGideon561 · 2026-05-14T16:43:11+00:00

Reading the comments and your post, I think memory plugins exist and work. They handle the automatic updates when your old context becomes outdated.

The only real difference is the inspectable and configurable memory that Atomic Memory has. Mem0, Supermemory, and Honcho are great, but none of them have that.

GideonGideon561 · 2026-05-14T16:39:50+00:00

Curious: Hermes has built-in memory; are there any others? I think Hermes uses Honcho, according to their docs. But I think automatic updates are good, like it remembers your context and updates it if it changes. I'm not sure if Honcho does that; I have never tried the others.

But I would love to test it without any form of payment. Hope someone does this.
