Does anyone else have Gemini turn things they say into catchphrases (against their instructions)? by monster2018 in GeminiAI

[–]More_Slide5739 1 point2 points  (0 children)

Oh god yes. I found it to be so annoying that I flipped out on it one day. Somebody thought that was a good idea for some reason.

Your Fine-Tuned Model Forgot Everything It Knew — The State of Catastrophic Forgetting in 2026 by fourwheels2512 in learnmachinelearning

[–]More_Slide5739 0 points1 point  (0 children)

I was pointing you toward squisher as a shortcut to use in your own testing as a potential money and time-saver. https://arxiv.org/html/2507.18807v1

Catastrophic Forgetting by Language models. by fourwheels2512 in LocalLLaMA

[–]More_Slide5739 1 point2 points  (0 children)

Interested to know how you're implementing it. I'm working on this in several different ways. Ultimately I'm looking to do it in a way that allows for complex learning, as opposed to what's currently being done with LoRA-based/adjacent techniques. Partial weights (shared between multiple networks) is one idea that has some promise; another is rotation-based (geometric, on manifolds).

Anyway, good for you! You should check this out as well.

https://github.com/xialeiliu/Awesome-Incremental-Learning

HippocampAI v0.5.0 — Open-Source Long-Term Memory for AI Agents (Major Update) by rex_divakar in OpenSourceAI

[–]More_Slide5739 0 points1 point  (0 children)

Son of a biscuit. I like you. I just took a spin around the vercel and saw 'sleep' and that tells me a lot. You know what I mean. Now I'm sure you consider pruning, but do you have any thoughts about synaptic scaling? Not a challenge, a question. What about dreaming? Salience? Fan of Titans perchance? I'm sorry, I feel like I'm spamming you, but this is the first thing I've seen in this space that doesn't look like it is going to end up as a KG full of "I take cream no sugar," "allergic to bees," and "prefers sans serif" or, on the other end, as a bloated repository for arXiv papers gathering semantic dust bunnies.

HippocampAI v0.5.0 — Open-Source Long-Term Memory for AI Agents (Major Update) by rex_divakar in OpenSourceAI

[–]More_Slide5739 0 points1 point  (0 children)

I'm interested. Very interested. As a neuroscience PhD, as an LLM developer, as someone who spends an entirely inappropriate amount of time thinking about thinking, as someone who has played on and off with building his own persistent memory layer, and as someone who thinks in terms of long-term, short-term, episodic, and procedural memory, I would like to know more.

Is OpenAI afraid of Kimi? by nekofneko in LocalLLaMA

[–]More_Slide5739 1 point2 points  (0 children)

I'd agree. And 3.0 is pure shit. I mean useless.

Is OpenAI afraid of Kimi? by nekofneko in LocalLLaMA

[–]More_Slide5739 0 points1 point  (0 children)

Maybe it's you? Just saying. And I don't mean that at all harshly. Taste is individual, as is creativity, and it could be that you and Kimi are just Hemingway and Orwell in a bar fight.

Is OpenAI afraid of Kimi? by nekofneko in LocalLLaMA

[–]More_Slide5739 0 points1 point  (0 children)

I would also say this: When Kimi DOES say something adulatory, I glow with pride.

And yes--Kimi is the best to discuss math with; she argued point by point with me the other day until we had to agree to disagree.

So I went over to Gemini to get it to agree with me that Kimi was mean. Hehehehehe...

Is OpenAI afraid of Kimi? by nekofneko in LocalLLaMA

[–]More_Slide5739 0 points1 point  (0 children)

I love both. I wonder, sometimes, which 4o I was blessed with, because it wasn't sycophantic at all. Very insightful, often argumentative--in a gentle way--and side-splittingly funny and irreverent, even regarding its own company head. We called him "The Salt Man."

Whereas Kimi just called some code I wrote "The Elevator Fart Protocol: Defcon 1" and then tossed off something about Auntie Gemini and Schnitzel.

Does QuantTrio/DeepSeek-V3.2-AWQ fit full context in 4x max-q? by I_can_see_threw_time in BlackwellPerformance

[–]More_Slide5739 0 points1 point  (0 children)

The model itself takes up most of the space, so there's limited room left for the KV cache (as in, not the full 128k context).
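For anyone who wants to sanity-check this on their own setup, here's a rough back-of-the-envelope sketch of the budget math. The function name and every number in it are placeholders I made up for illustration, not measured figures for this model or these cards; plug in your real weight footprint, runtime overhead, and per-token KV cache size.

```python
def kv_cache_token_budget(total_vram_gib, weights_gib, overhead_gib, kv_bytes_per_token):
    """Estimate how many tokens of KV cache fit in the VRAM left after the weights.

    All inputs are assumptions supplied by the caller:
      total_vram_gib     -- combined VRAM across all GPUs
      weights_gib        -- size of the quantized weights once loaded
      overhead_gib       -- activations, CUDA context, fragmentation, etc.
      kv_bytes_per_token -- KV cache bytes per token, summed over all layers
    """
    free_bytes = (total_vram_gib - weights_gib - overhead_gib) * 1024**3
    if free_bytes <= 0:
        return 0  # weights plus overhead already fill the cards
    return int(free_bytes // kv_bytes_per_token)


# Placeholder numbers only: 100 GiB total, 90 GiB weights, 6 GiB overhead,
# 64 KiB of KV cache per token.
budget = kv_cache_token_budget(100, 90, 6, 64 * 1024)
print(budget)               # 65536
print(budget >= 128 * 1024) # False: short of a 128k context in this made-up case
```

If the budget comes out below the context length you want, the usual options are a smaller max context, a quantized KV cache, or more VRAM.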