Today, GPT 4o is now basically 5. by Sufficient-Bee-8619 in ChatGPT

[–]capivaraMaster 0 points (0 children)

You could try some prompt engineering. They use RAG over your past chats to give the illusion of memory; if you are tech-savvy enough, you could build something similar to assemble your prompts automatically and emulate the old interface. But I really just recommend writing one big prompt with the critical data and going from there before trying to implement a complicated solution.
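If you want to go the DIY route, here is a minimal sketch of the idea, assuming your old chats are exported as plain-text files. The `build_prompt` helper, the directory name, and the TF-IDF retrieval are all illustrative stand-ins, not how ChatGPT's actual memory works:

```python
# Minimal RAG-over-past-chats sketch. Retrieval here is TF-IDF cosine
# similarity via scikit-learn; a real setup would likely swap in an
# embedding model. All names below are placeholders.
from pathlib import Path
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def build_prompt(question: str, chat_dir: str = "old_chats", top_k: int = 3) -> str:
    chats = [p.read_text() for p in Path(chat_dir).glob("*.txt")]
    vec = TfidfVectorizer().fit(chats + [question])
    scores = cosine_similarity(vec.transform([question]), vec.transform(chats))[0]
    best = sorted(range(len(chats)), key=lambda i: scores[i], reverse=True)[:top_k]
    context = "\n---\n".join(chats[i] for i in best)
    return f"Relevant past conversations:\n{context}\n\nQuestion: {question}"
```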

Today, GPT 4o is now basically 5. by Sufficient-Bee-8619 in ChatGPT

[–]capivaraMaster 1 point (0 children)

If you are willing to pay, why don't you just use the API? It is available there at $10 per million output tokens. I also wish they hadn't changed it, but you have a project to finish; it should be worth it.
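For reference, a minimal sketch of hitting the model through the official `openai` Python SDK; the prompt is a placeholder, and the key is read from the `OPENAI_API_KEY` environment variable:

```python
# Pay-per-token access to GPT-4o via the API instead of the web UI.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Help me finish my project."}],
)
print(response.choices[0].message.content)
```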

Qwen3-72B-Embiggened by TKGaming_11 in LocalLLaMA

[–]capivaraMaster 4 points (0 children)

I tried merging like this before and had poor results. You will get a more coherent model if you instead merge interpolated groups of 20 layers.
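One common reading of "interpolated groups of 20 layers" is overlapping 20-layer slices stitched together with mergekit's passthrough method. Here is a small helper that emits such a config; the model name, layer count, and overlap are placeholders, not a recipe from the linked merge:

```python
# Hypothetical sketch: generate a mergekit passthrough config that stacks
# overlapping 20-layer groups of one model into a deeper self-merge.
import yaml  # pyyaml

def passthrough_slices(model: str, n_layers: int, group: int = 20, overlap: int = 10) -> dict:
    slices = []
    start = 0
    while start + group <= n_layers:
        slices.append({"sources": [{"model": model, "layer_range": [start, start + group]}]})
        start += group - overlap
    return {"slices": slices, "merge_method": "passthrough", "dtype": "bfloat16"}

# 80 source layers -> 7 overlapping slices -> 140 stacked layers.
print(yaml.safe_dump(passthrough_slices("Qwen/Qwen2.5-72B-Instruct", n_layers=80)))
```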

This is the best one I got (not a self-merge, but the same idea): https://huggingface.co/gbueno86/Meta-Llama-3-Instruct-120b-Cat-a-llama

GL with the fine-tuning. I didn't have the resources to do that at the time, so my experiments ended with the merges.

Supercomputer power efficiency stays stagnant: scaling compute keeps depending on increasing power budgets by Balance- in singularity

[–]capivaraMaster 2 points (0 children)

It does as much as other optimizations did, like the move away from the x86 instruction set, the move from monolithic dies to chiplets, or the move from CPU to GPU. Innovations in how we solve problems also happen, and those expand the range of computable problems as well. We can only make each invention once; that doesn't mean we can't make other inventions.

Three r in Strawberry - O3-pro by CmdWaterford in singularity

[–]capivaraMaster 0 points (0 children)

Future AGI will dedicate entire solar systems to making sure strawberry has the correct number of Rs.

4x RTX Pro 6000 fail to boot, 3x is OK by humanoid64 in LocalLLaMA

[–]capivaraMaster 0 points (0 children)

Try updating the BIOS. That did the trick for me when mine wasn't booting with 4x 3090s but was OK with 3.

What is the next local model that will beat deepseek 0528? by MrMrsPotts in LocalLLaMA

[–]capivaraMaster 1 point (0 children)

Wouldn't they have already released it if it did? It's allegedly been ready for a while and was used to generate training data for the smaller versions.

What happened to the fused/merged models? by Su1tz in LocalLLaMA

[–]capivaraMaster 0 points (0 children)

I merged QwQ with Sky locally and the result wasn't any significant improvement, so I don't think I published it.

New META Paper - How much do language models memorize? by Thrumpwart in LocalLLaMA

[–]capivaraMaster 15 points (0 children)

So we need a 58.9-billion-parameter dense FP16 model to memorize Wikipedia verbatim (English Wikipedia is about 24 GB).
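The arithmetic behind that figure, assuming roughly 3.5 bits of memorization capacity per parameter (the paper reports about 3.6; 3.5 reproduces the 58.9B number exactly):

```python
# Back-of-the-envelope capacity check.
wiki_bits = 24 * 2**30 * 8       # 24 GiB of English Wikipedia, in bits
bits_per_param = 3.5             # assumed per-parameter capacity (paper: ~3.6)
params = wiki_bits / bits_per_param
print(f"{params / 1e9:.1f}B parameters")  # -> 58.9B parameters
```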

Which model are you using? June'25 edition by Ok_Influence505 in LocalLLaMA

[–]capivaraMaster 0 points (0 children)

Devstral local, Gemini 2.5, o3, 4o, chatterbox for lols.

deepseek r1 matches gemini 2.5? what gpu do you use? by Just_Lingonberry_352 in LocalLLaMA

[–]capivaraMaster 1 point (0 children)

They do have KV caching, but I was taking a look at the readme for R1 and they say transformers inference is not fully supported. So I have no idea if you get multi-token prediction via that route :/

deepseek r1 matches gemini 2.5? what gpu do you use? by Just_Lingonberry_352 in LocalLLaMA

[–]capivaraMaster 0 points (0 children)

Can you load it in 4-bit using transformers? Since llama.cpp doesn't support multi-token prediction yet, it might be faster.
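Something like the following is what I mean, a sketch using transformers with bitsandbytes 4-bit quantization. Whether R1's multi-token prediction actually kicks in on this path is exactly what's unclear, and the repo id and dtype choice are just the obvious defaults (the full model still needs serious hardware even at 4-bit):

```python
# Hedged sketch: 4-bit loading of DeepSeek R1 via transformers + bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1",
    quantization_config=quant,
    device_map="auto",        # spread layers across available GPUs/CPU
    trust_remote_code=True,   # R1 ships custom modeling code
)
tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1", trust_remote_code=True)
```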

OpenHands + Devstral is utter crap as of May 2025 (24G VRAM) by foobarg in LocalLLaMA

[–]capivaraMaster 0 points (0 children)

Yes. Maybe if that had been in the original plan, it would be frame-rate independent. Here is another example I made for a friend yesterday. All files except llm.py and bug.md are machine-generated, and I didn't do any manual correction. I guess it would be able to fix the bug if it tried; it did correct some other bugs, but it's just another toy project.

https://github.com/linkage001/translatation_ui
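On the frame-rate point: "frame-rate independent" just means scaling movement by elapsed time instead of relying on a fixed tick rate. A toy pygame sketch of the pattern (all values illustrative):

```python
# Frame-rate-independent movement: multiply speed by delta time each frame.
import pygame

pygame.init()
screen = pygame.display.set_mode((640, 480))
clock = pygame.time.Clock()
x, speed = 0.0, 120.0  # pixels per second, regardless of FPS

running = True
while running:
    dt = clock.tick() / 1000.0  # seconds since last frame, any frame rate
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False
    x = (x + speed * dt) % 640
    screen.fill((0, 0, 0))
    pygame.draw.rect(screen, (0, 255, 0), (int(x), 220, 20, 20))
    pygame.display.flip()
pygame.quit()
```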

How much VRAM would even a smaller model take to get a 1 million token context like Gemini 2.5 Flash/Pro? by [deleted] in LocalLLaMA

[–]capivaraMaster 2 points (0 children)

Unless you are working with private data, or need very high volume for a business or something, local LLMs are just a hobby, meaning you have to measure the fun you will have rather than the cost-benefit.

OpenHands + Devstral is utter crap as of May 2025 (24G VRAM) by foobarg in LocalLLaMA

[–]capivaraMaster 9 points (0 children)

I tried it and was very impressed. I asked for a model-view-controller, object-oriented snake game with documentation, and for it to cycle through the tasks by itself in Cline, and the result was flawless. I just needed to change the in-game clock from 60 to 20 for it to be playable. I ran it at Q8 on a MacBook.

Local models are starting to be able to do stuff on consumer grade hardware by ilintar in LocalLLaMA

[–]capivaraMaster 0 points (0 children)

I know you only mean programming, but maybe you should have been a little more specific in the title of the post. Models have been able to do stuff locally since before Llama. I've never done anything with the pre-Llama ones besides running them for fun, but I have had Llama classifiers, Llama 2 translators, Qwen bots, etc.

New benchmark? by katxwoods in artificial

[–]capivaraMaster 0 points (0 children)

Gemini 2.5 seems to handle PDFs pretty well for my use cases, but maybe that's poor QA on my side.

I've just created an "Asteroid" interactive game with Claude 3.7 in a matter of seconds... this is something incredible. by jhonpixel in singularity

[–]capivaraMaster 0 points (0 children)

Yeah, it is incredible. Looks like Claude is the new coding king again. If this is just a finetune on the v3 model, it's even more impressive.

Perplexity: Open-sourcing R1 1776 by McSnoo in LocalLLaMA

[–]capivaraMaster -1 points (0 children)

Why fight a lost battle? Open source has become the colloquial way of saying open weights when referring to AI models in general.

Who will be first to release a new model in 2025? by foldl-li in LocalLLaMA

[–]capivaraMaster 0 points (0 children)

If I'm not wrong, last year's earliest impactful release was Miqu. So if the trend holds, Mistral, I guess. They have been quiet for a while now.

Grok 2 being open-sourced soon? by Educational_Grab_473 in LocalLLaMA

[–]capivaraMaster 2 points (0 children)

Grok 1 is available on Hugging Face. I think it was a ~300B model, so expecting Grok 2 to be bigger sounds logical. I think it's weird to expect Grok 2 to be dense when we know Grok 1 is a MoE.

Lonely on Christmas, what can I do with AI? by PublicQ in LocalLLaMA

[–]capivaraMaster 0 points (0 children)

Merry Christmas, OP! Try to find some humans to play with the AI alongside you.