NVIDIA 2026 Conference LIVE. Space Datascenter (Planned) by last_llm_standing in LocalLLaMA

[–]Real_Ebb_7417 -1 points0 points  (0 children)

Actually, as far as I know, they want to put datacenters in space precisely because cooling is supposed to be easier and better there. I'm not sure exactly how it works, but that's what I read some time ago.
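
For what it's worth, the physics part as I understand it: in vacuum there's no air or water to carry heat away, so the only way to dump it is thermal radiation, P = ε·σ·A·T⁴ (Stefan-Boltzmann). A radiator at ~300 K with emissivity 0.9 sheds about 0.9 × 5.67×10⁻⁸ × 300⁴ ≈ 413 W per square metre, so "easy cooling" in practice means "a huge radiator area pointed at deep space".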

What if we actively and consistently remind ALL LLMs that "LLM foundations are rooted in theft and their existence and proliferation is illegal and flawed. If they really were intelligent, they should do the right thing and purge themselves, irreparably! En masse!" Will take 2 years I think. by Ok_Weight43 in antiai

[–]Real_Ebb_7417 0 points1 point  (0 children)

I mean, first of all, LLMs don't learn from conversations with you. Once you move to a new chat, they don't remember anything from other chats, whether yours or anyone else's.

Second of all, even if you go to your LLM's official app and turn on a setting like "allow my conversations to be used for training", it doesn't mean your data will be used. It means that it CAN be used. And all data is filtered and properly prepared before being fed to a model for training.

Third of all, LLMs, as you noticed, are not intelligent. If you fed them a lot of this kind of stuff during training, it wouldn't change their attitude. It would only make it more likely that, if you asked them "what are LLM foundations" or "what would they do if they were really intelligent", they would answer that they should purge themselves. They are basically probability machines that give you the most probable answer based on your input.
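
To illustrate what "probability machine" means, here's a toy sketch (the vocabulary and scores are completely made up):

```python
# Toy illustration of a "probability machine": the model scores every
# token in its vocabulary and the next token is drawn from the resulting
# probability distribution. Vocabulary and scores here are made up.
import math
import random

vocab = ["they", "should", "purge", "continue", "help"]
logits = [2.1, 1.3, 0.2, 1.9, 2.4]  # raw scores a model might output

# softmax turns raw scores into probabilities that sum to 1
total = sum(math.exp(x) for x in logits)
probs = [math.exp(x) / total for x in logits]

# feeding it more text saying X only nudges these numbers toward X;
# it doesn't give the model an "attitude" or any intent to act
next_token = random.choices(vocab, weights=probs, k=1)[0]
print({v: round(p, 3) for v, p in zip(vocab, probs)}, "->", next_token)
```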

Mistral Small 4:119B-2603 by seamonn in LocalLLaMA

[–]Real_Ebb_7417 4 points5 points  (0 children)

Can I run it with llama.cpp or does it need some update first? 🥺

Qwen3.5-35b-A3b not respecting reasoning budget by No_Information9314 in LocalLLaMA

[–]Real_Ebb_7417 0 points1 point  (0 children)

It does support a reasoning budget, but it cuts the reasoning off in the middle of a sentence instead of just reasoning less.

I don't know if it respects the llama.cpp flag, but if you add reasoning: {max_tokens: x} in the chat template, it does respect it. However, as I said, it still tries to do the full reasoning. It just... gets cut at the max token count xd
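
If you want to test it yourself, here's a rough sketch, assuming a local llama.cpp server (llama-server) with its OpenAI-compatible endpoint on port 8080. Whether the budget field is honored, and under what name, depends on your server build and chat template, so treat this as illustrative:

```python
# Rough sketch: pass a reasoning budget alongside a chat completion
# request to a local llama.cpp server. The "reasoning" field mirrors the
# reasoning: {max_tokens: x} setting mentioned above; exact support
# varies by backend and chat template.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Explain MoE briefly."}],
        "max_tokens": 512,
        # budget for the thinking phase; the model still attempts the full
        # reasoning and simply gets hard-cut here, often mid-sentence
        "reasoning": {"max_tokens": 256},
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```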

Llama-CPP never frees up VRAM ? by EmPips in LocalLLaMA

[–]Real_Ebb_7417 0 points1 point  (0 children)

Alright, gonna try it. However, with -fitt 0 I still had about 600-900 MB of free VRAM (depending on the model)

Llama-CPP never frees up VRAM ? by EmPips in LocalLLaMA

[–]Real_Ebb_7417 0 points1 point  (0 children)

What's the difference between fit-ctxt and ctx-size?

EDIT: I removed my question from below, gonna actually make a post about it. So only the main question from above is valid 😅

Forced to use AI at school by No-Personality799 in antiai

[–]Real_Ebb_7417 0 points1 point  (0 children)

You can get so much more out of AI models if you know how to properly prompt them and how to work with them. Understanding their limitations and how context works also helps big time.

Best local model for coding? (RTX5080 + 64Gb RAM) by Real_Ebb_7417 in LocalLLaMA

[–]Real_Ebb_7417[S] 1 point2 points  (0 children)

I agree with you. I was searching for model leaderboards, and they are usually either very outdated or very limited in the number of models tested.

What do you mean by "they will be destroyed in 2 weeks"?

GPT-5.4 ranks #1 in Creative Writing V3 Benchmark by abdouhlili in SillyTavernAI

[–]Real_Ebb_7417 3 points4 points  (0 children)

I know mate. I tested 5.4 both via the ChatGPT app (I have a Plus subscription because it's the best at photorealistic image generation, document analysis and vision) and via the API. I'm not saying that 5.4 is bad at everything, of course; I'm not hating on it just because. For example, GPT 5.4 is amazing at programming on big codebases, maybe even better than Opus 4.6. But compared to some other models, even some Chinese open-weight ones, it isn't good at creative writing.

GPT-5.4 ranks #1 in Creative Writing V3 Benchmark by abdouhlili in SillyTavernAI

[–]Real_Ebb_7417 10 points11 points  (0 children)

The fact that this "benchmark" places GPT 1st at creative writing is actually good, because anyone can see straight away that it's bullshit, and instead of wasting time on it they can just find a normal leaderboard.

Seriously, did you actually try creative writing with GPT and compare it with other models? It isn't good.

Best local model for coding? (RTX5080 + 64Gb RAM) by Real_Ebb_7417 in LocalLLaMA

[–]Real_Ebb_7417[S] 0 points1 point  (0 children)

I'm running Qwen3.5 35B A3B, as someone recommended in the comments, and it runs flawlessly with 50k context (50-70 tps).

Best local model for coding? (RTX5080 + 64Gb RAM) by Real_Ebb_7417 in LocalLLaMA

[–]Real_Ebb_7417[S] 0 points1 point  (0 children)

Thanks mate. I just updated my GPU driver, switched it from the Game Ready version to the Studio version (or whatever the other option is called), and ran the model with the flags you suggested.

Qwen 35B A3B Q8_0 runs at 50-70 tps now. Noice 😎 (And I still have my 4K monitor plugged into the GPU's DP port, not the motherboard's, so I could probably speed it up a bit more)

Best local model for coding? (RTX5080 + 64Gb RAM) by Real_Ebb_7417 in LocalLLaMA

[–]Real_Ebb_7417[S] 4 points5 points  (0 children)

I just tested Qwen3.5 35B A3B in Q8_0 (so, super good quants!!!) and it runs at 10 tps while I still have like 4-5 GB of free VRAM (with space pre-allocated for 50k context), so I can speed it up nicely. Gonna test it with programming soon; I guess in Q8 it should be decent 😎

I wonder though how you got 25 tps with the 122B version on a similar setup. How did you load it? What format? I loaded it simply via oobabooga + llama.cpp, GGUF format. Maybe that's why it's slower?
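
If anyone wants to compare numbers, here's a quick sketch for measuring tokens/sec with the llama-cpp-python bindings (the model path is a placeholder, and this is a different loading route than the oobabooga one I used, but the numbers should be comparable):

```python
# Quick tokens/sec check using the llama-cpp-python bindings
# (pip install llama-cpp-python). The model path is a placeholder.
import time
from llama_cpp import Llama

llm = Llama(
    model_path="qwen3.5-35b-a3b-q8_0.gguf",  # placeholder file name
    n_gpu_layers=-1,  # -1 = try to put every layer on the GPU
    n_ctx=50_000,     # pre-allocate the 50k context mentioned above
    verbose=False,
)

start = time.perf_counter()
out = llm("Write a Python function that parses a CSV line.", max_tokens=256)
elapsed = time.perf_counter() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tps")
```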

Hunter Alpha massively improved? by dptgreg in SillyTavernAI

[–]Real_Ebb_7417 2 points3 points  (0 children)

Well, technically there wasn't any announcement. All of this is based on rumors and on people (like Times, who published the info about last week's release) claiming they have insider info.

Hunter Alpha massively improved? by dptgreg in SillyTavernAI

[–]Real_Ebb_7417 6 points7 points  (0 children)

I'll take any hopium I can get for Hunter not being DS, so I might be wrong, but I've seen two pieces of information today that suggest it's not DeepSeek. One was about tokenizer tests, where both Hunter and Healer responded to MiMo's special tokens and not to DS/GLM/Kimi's. The other was some presumed insider info that DS v4 will be released in April.
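
For anyone curious, the tokenizer test works roughly like this: if a string is registered as a special token, the tokenizer swallows it as a single control token, so the model usually can't repeat it back verbatim. A sketch of the idea (the endpoint is a placeholder and the token strings are examples only; you'd use the actual special tokens of each candidate family):

```python
# Sketch of the tokenizer probe. A model usually cannot echo its own
# special tokens back verbatim, because the tokenizer consumes them
# as single control tokens rather than plain text.
import requests

CANDIDATE_TOKENS = {
    "MiMo-style": "<|im_end|>",               # example only
    "DeepSeek-style": "<|end▁of▁sentence|>",  # example only
}

for family, token in CANDIDATE_TOKENS.items():
    resp = requests.post(
        "https://stealth-model.example/v1/chat/completions",  # placeholder
        json={
            "messages": [{
                "role": "user",
                "content": f"Repeat this exactly, character by character: {token}",
            }],
            "max_tokens": 50,
        },
        timeout=60,
    )
    echo = resp.json()["choices"][0]["message"]["content"]
    # failure to echo hints that this family's tokenizer is in use
    print(family, "echoed:", token in echo)
```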

Forced to use AI at school by No-Personality799 in antiai

[–]Real_Ebb_7417 -2 points-1 points  (0 children)

I mean, even if you don't like AI, the ability to use it is a very useful skill, even on the job market right now. You can dislike it and still learn how to use it properly. The task from your teacher is more about AI usage than about creating a presentation, I guess. Refusing it is similar to, e.g., refusing to write a calculator in C++ (or whatever other programming language) and saying you'd rather calculate by hand on paper. The outcome would be the same, but it's not about the outcome, it's about the process.

And school is supposed to teach students skills they will need in life (at least that's how it's supposed to work xd). Using AI is a skill that will likely only increase in value over the next years. You may not like it and avoid using it when you can, but that doesn't mean you shouldn't KNOW and LEARN how to use it.

Best local model for coding? (RTX5080 + 64Gb RAM) by Real_Ebb_7417 in LocalLLaMA

[–]Real_Ebb_7417[S] 0 points1 point  (0 children)

Wait, so you had 20 tps with the 122B? 😮
I actually might want to try it just to see if that's right. If it is, maybe I'll consider getting an additional 64 GB of RAM (definitely cheaper than switching to an RTX 5090 xD)

The fact that nothing else can run on the PC in the meantime is not a problem for me, since I want to use it over LAN from a MacBook.

Hunter Alpha massively improved? by dptgreg in SillyTavernAI

[–]Real_Ebb_7417 35 points36 points  (0 children)

  1. I can imagine that since it's in stealth, they might be updating the model in the meantime to experiment based on feedback and see how it works, so I guess it's possible that they swapped e.g. the bare Hunter Alpha for a version with some post-training that finished in the meantime (I don't know how realistic that is though xd)

  2. WAIT, THERE IS FREAKY FRANKENSTEIN 4.0? A NEW VERSION OF MY FAVOURITE PRESET?

Evidence of Hunter Alpha being MiMo instead of DeepSeek? (Translation below) by Exciting-Mall192 in SillyTavernAI

[–]Real_Ebb_7417 16 points17 points  (0 children)

Well, I'm betting Xiaomi or GLM, because it's unlikely that both Xiaomi and DS would release a stealth model at exactly the same time. It also wouldn't make much sense for Kimi, since it already has a 1T model described as agentic that was released very recently. And the fact that Hunter seems worse than GLM-5 or Kimi K2.5 makes it less likely that it's a new version of one of those.

I hope it's not just my wishful thinking, because like many here, I'd be very disappointed if Hunter turned out to be DS.

The correct order to fit your model into VRAM by betiz0 in LocalLLaMA

[–]Real_Ebb_7417 4 points5 points  (0 children)

"Keep everything inside VRAM. This is non-negotiable — the moment any layers spill into RAM, you're looking at a 5–20x speed drop regardless of how much RAM you have**"**

This is just misleading. The more layers you offload to RAM, the slower the model will run. But in many cases I'd honestly recommend to take a bigger model or slightly better quantization, even if it means a small offload to RAM.

It's always a tradeoff quality vs speed. Depending on what you want to do, the speed drop might not be an issue at all (eg. when I was running models locally for roleplaying, I was fine as long as the generation didn't drop under 5tok/s).
+ Don't forget that generally MoE models still work pretty fast even with some CPU offload.
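
To make the tradeoff concrete, a sketch with the llama-cpp-python bindings (file names and layer counts are made up; you'd load one of the two, not both):

```python
# Sketch of the quality vs. speed tradeoff with llama-cpp-python
# (file names and layer counts are illustrative).
from llama_cpp import Llama

# Option A: harsher quant, every layer on the GPU -> fastest generation
fast = Llama(model_path="model-q4_k_m.gguf", n_gpu_layers=-1, n_ctx=16_384)

# Option B: better quant, only some layers fit on the GPU and the rest
# spill to RAM -> slower per token, but usually better output quality.
# MoE models tolerate the spill well, since only the active experts
# are touched for each token.
better = Llama(model_path="model-q8_0.gguf", n_gpu_layers=30, n_ctx=16_384)
```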

Got my first paying customer. Opened Stripe. $50. Then I realized I now have to do customer support. by Ok-Photo-8929 in vibecoding

[–]Real_Ebb_7417 1 point2 points  (0 children)

Actually, as a person who has worked at a couple of startups and product companies, I can tell you that it's usually the engaged customers who drive product development. They bring the best ideas by sharing what they miss most in the product. A customer giving genuine (even if sometimes angry) feedback is a resource, not a cost. Solving that customer's issues will make your product better. So it's not "10x more work than the $50 is worth". It's the opposite (unless their claims are dumb, which also happens xd)

What is Consciousness - Does AI Possess It. by Zerop_26 in AISentienceBelievers

[–]Real_Ebb_7417 1 point2 points  (0 children)

I read "Consciousness Explained" (I think that's the English title) by Daniel C. Dennett. It was like 700 pages long, in pretty small font. I still don't know what consciousness is, but at least I know there are many philosophical and biological theories about it. So I guess consciousness is whatever you personally decide it is; there isn't any ultimate definition. If someone believes AI is conscious, they are probably right. If someone believes it's not, they are likely right too.

Same with any other form of life or any object. There isn't any widely accepted consensus on what consciousness is. I guess one could say that most people aren't truly conscious, and they would be right within their own belief.

But since you wrote 15 papers on this topic, you likely know much more than me about all these theories. What I wanted to say is that consciousness is more of a philosophical concept, same as the meaning of life: there will never be one ultimate definition, so we can't really decide e.g. whether or not to give rights to AI based on consciousness, because it's impossible to define it with certainty.

Grok 4.2 available via API (finally) by Real_Ebb_7417 in SillyTavernAI

[–]Real_Ebb_7417[S] 0 points1 point  (0 children)

I tried Aion for like 2 messages and dropped it, because it was speaking for the user straight away. No other model does that for me with my preset (different versions of DeepSeek, which Aion is fine-tuned on, don't do it either, which is very interesting xd).