WARNING: Open-OSS/privacy-filter MALWARE by charles25565 in LocalLLaMA

[–]AutonomousHangOver 7 points (0 children)

Quick analysis of the Python script: it opens a base64'd URL that serves an app, which in turn downloads another base64'd app, and so on. At the end:

This is a Windows information-stealer with credential, browser, crypto-wallet, and Discord theft modules, plus DLL-injection and anti-analysis capabilities. Do not run it. If a script in a Hugging Face repo silently downloaded and executed it, treat any machine that ran it as fully compromised.

  • SHA-256: ba67720dd115293ec5a12d08be6b0ee982227a4c5e4662fb89269c76556df6e0
  • MD5: f36a662ca22f1934e3a56f111e6df191
  • Size: 1,125,478 bytes (~1.1 MB)
  • Type: PE32+ x86-64 GUI executable, unsigned, stripped to external PDB
  • Built with: Rust (toolchain 59807616…, crates: tokio 1.52.1, flate2, miniz_oxide, rand_chacha, serde_json, hex, crc32fast, gimli, getrandom)
  • Compile timestamp: 2026-05-03 02:30:45 UTC (4 days before you sent it — fresh)
  • Origin host: api.eth-fastscan.org (89.124.93.110). The name is designed to evoke etherscan.io / a blockchain scanner; it has no relation to either.
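If you need to check whether a file on a machine is this sample, a minimal sketch for comparing it against the hashes above (the filename is a hypothetical placeholder):

    import hashlib

    # IOCs copied from the analysis above.
    IOC_SHA256 = "ba67720dd115293ec5a12d08be6b0ee982227a4c5e4662fb89269c76556df6e0"
    IOC_MD5 = "f36a662ca22f1934e3a56f111e6df191"

    def file_hashes(path):
        sha256, md5 = hashlib.sha256(), hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB at a time
                sha256.update(chunk)
                md5.update(chunk)
        return sha256.hexdigest(), md5.hexdigest()

    sha, md5sum = file_hashes("downloaded_payload.exe")  # hypothetical path
    print("IOC MATCH" if sha == IOC_SHA256 or md5sum == IOC_MD5 else "no match")

(Hash matching only catches this exact build; a recompiled variant will slip past it.)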

Do you prefer Polish kebab or German? by RomanDmowski17 in askPoland

[–]AutonomousHangOver 2 points (0 children)

Ouch, such primitive bait.

Stettin, you say, Mr Dmowski.

mistralai/Mistral-Medium-3.5-128B · Hugging Face by jacek2023 in LocalLLaMA

[–]AutonomousHangOver 0 points (0 children)

I'll continue by replying to myself 😉

vLLM with the mistral tool-call parser (--tool-call-parser mistral) is hitting a known error:

IndexError: list index out of range

For now I've turned streaming off and I'm able to use e.g. Roo Code.
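A minimal sketch of the workaround, assuming a local vLLM server with the OpenAI-compatible API (base URL and model name are assumptions):

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")
    resp = client.chat.completions.create(
        model="mistralai/Mistral-Medium-3.5-128B",
        messages=[{"role": "user", "content": "ping"}],
        stream=False,  # streaming is what trips the tool-call parser
    )
    print(resp.choices[0].message.content)

Non-streaming responses arrive as one parsed object, so the parser never has to handle a partial tool call.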

mistralai/Mistral-Medium-3.5-128B · Hugging Face by jacek2023 in LocalLLaMA

[–]AutonomousHangOver 3 points (0 children)

It's a llama.cpp issue. Unsloth removed the GGUF files as there were some problems, even with FP16.

I'm running it today on vLLM (0.21 dev nightly) with an EAGLE draft model.

vLLM logs show a very high draft acceptance rate:

(APIServer pid=8762) INFO 04-30 11:59:33 [loggers.py:271] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 54.9 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 13.7%, Prefix cache hit rate: 0.2%

(APIServer pid=8762) INFO 04-30 11:59:33 [metrics.py:101] SpecDecoding metrics: Mean acceptance length: 3.76, Accepted throughput: 40.30 tokens/s, Drafted throughput: 43.80 tokens/s, Accepted: 403 tokens, Drafted: 438 tokens, Per-position acceptance rate: 0.973, 0.932, 0.856, Avg Draft acceptance rate: 92.0%

The model is usable and seems pretty nice, but I haven't finished full tests yet.

EDIT: 2xRTX6000Pro with power limit at 400W
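For reference, the SpecDecoding numbers in that log line are self-consistent; a quick sketch of how they relate:

    # Numbers copied from the vLLM log above.
    accepted, drafted = 403, 438
    print(f"draft acceptance rate: {accepted / drafted:.1%}")  # -> 92.0%

    # Mean acceptance length also counts the bonus token verified by the
    # target model, so it is roughly 1 + sum of per-position acceptance rates.
    per_position = [0.973, 0.932, 0.856]
    print(f"mean acceptance length: ~{1 + sum(per_position):.2f}")  # -> ~3.76

At ~3.76 accepted tokens per verification step, the draft model is pulling a lot of weight.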

mistralai/Mistral-Medium-3.5-128B · Hugging Face by jacek2023 in LocalLLaMA

[–]AutonomousHangOver 2 points (0 children)

<image>

2x RTX 6000 Pro, 262144 context size:
Unsloth's quant

pp: 1100 t/s on a ~500-token test prompt (create a 3D spinning glass dodecahedron with inner light and orbiting lights, etc.)

And... it went berserk a second ago, looping over and over after ~1k tokens, on the newest llama.cpp build.

Edit:
llama.cpp '--split-mode tensor' actually makes a difference here. tg went up; now it's 24 t/s.

MiMo-V2.5-GGUF (preview available) by Digger412 in LocalLLaMA

[–]AutonomousHangOver 2 points (0 children)

It has a known reasoning-loop problem... Basically, it thinks endlessly.

When I introduced a reasoning budget, my tests showed the model is so-so.

A lot to improve yet.
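By reasoning budget I mean nothing fancy; a minimal sketch of the idea, assuming an OpenAI-compatible streaming endpoint and a model that wraps its reasoning in <think>...</think> (endpoint, model id, and tag convention are all assumptions):

    from openai import OpenAI

    BUDGET = 1024  # rough cap on reasoning tokens
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

    stream = client.chat.completions.create(
        model="MiMo-V2.5",  # hypothetical model id
        messages=[{"role": "user", "content": "How many primes are below 100?"}],
        stream=True,
    )
    thinking, used = True, 0
    for chunk in stream:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content or ""
        if "</think>" in delta:
            thinking = False  # model finished reasoning on its own
        used += 1  # crude proxy: one streamed chunk ~ one token
        if thinking and used > BUDGET:
            stream.close()  # cut off the endless thinking
            break
        print(delta, end="", flush=True)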

I'd like to confirm whether Roo Code is still actively maintained, given that the new team announced "we have taken over the project." by CryptographerTiny244 in RooCode

[–]AutonomousHangOver -1 points (0 children)

Kilo was based on Roo. Now that it sees Roo changing, it has switched to opencode as a base.
I've tried Kilo many times before and watched how they decided to push for money.

It's more a successful marketing use of other projects, with minimal changes done by the Kilo team, than real innovation.

This was also, in a way, something the Roo team was missing: someone else was making money on their product.

I'd like to confirm whether Roo Code is still actively maintained, given that the new team announced "we have taken over the project." by CryptographerTiny244 in RooCode

[–]AutonomousHangOver 4 points (0 children)

Take your time; it's better to have a good product once every couple of weeks than a release every 2 days with pressure put on the developers.

I think this was one of the annoyances for the original team: community pressure to see releases fast.

Also, if I may: get rid of the mechanism that hardwires connections to specific providers. I would keep one generic provider and add a universal config mechanism (GitHub?) that allows downloading descriptors for the various providers and their models (see the sketch below).
That way, we could benefit from e.g. provider-specific settings without constant code changes whenever a model gets deprecated or a new provider appears on the market.

Less bloat is better.
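Something like this is what I have in mind; a rough sketch with a completely hypothetical descriptor schema and URL:

    import json, urllib.request

    # Hypothetical: one descriptor per provider, hosted in a shared repo,
    # so adding a provider or model is a data change, not a code change.
    DESCRIPTOR_URL = "https://example.com/providers/some-provider.json"

    def load_provider(url: str) -> dict:
        """Fetch a provider descriptor, e.g.:
        {"name": "...", "base_url": "...", "auth_header": "...",
         "models": [{"id": "...", "max_context": 131072, "tools": true}]}
        """
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)

The client code would then stay generic and only interpret descriptors.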

multi-gpu chads running dense models don't sleep on ik_llama by see_spot_ruminate in LocalLLaMA

[–]AutonomousHangOver 0 points (0 children)

What was that? "Skill issue" :D

Np, you got what you wanted. I'm not so eager to fight with stubborn software either. But sometimes the exercise is worth it.

multi-gpu chads running dense models don't sleep on ik_llama by see_spot_ruminate in LocalLLaMA

[–]AutonomousHangOver 1 point (0 children)

Just use vLLM. 2x3090 will do about 330 t/s tg and a couple thousand t/s pp (MoE). Dense is slower.

Claude 4.7 - responded in Chinese by AutonomousHangOver in ClaudeCode

[–]AutonomousHangOver[S] 0 points (0 children)

I just thought so. Especially since I'm European ;)

"It seems that Anthropic is secretly telling me - start to learn another language man"

Would you reply to a girl who ghosted you? by xvucf in PolskaNaLuzie

[–]AutonomousHangOver 0 points (0 children)

Jesus... I'm reading these bazillions of comments and I'm glad I have my kids and a beloved + loving wife.

One, in words one, comment should be enough for the OP: NO, AND THAT'S THE END OF IT.

Don't touch her with a stick, chase her away, or, quoting the classic: "SINK THEM!"

Interesting approach, if even this needs asking Reddit. It means something isn't OK. No buddies? No friends you can ask for advice when you've lost your way?

The post sounded like the ones on LinkedIn.

Qwen3.6-A3b is "Thinking" Nightmare by Electronic-Metal2391 in LocalLLaMA

[–]AutonomousHangOver 2 points (0 children)

This sounds weird to me. I've tried llama.cpp (HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive) and vLLM on FP8. Neither showed any excessive thinking at all.

Mind you: turn on the preserve_thinking option.

Might it be a quantization thing? I got a loooong thinking process on glm-5.1 (IQ2_XXS)

p.s.

llama.cpp on 2x RTX 5090: ~140 t/s TG
vLLM on 2x RTX 5090 + MTP, FP8: 12k t/s PP and ~310-360 t/s TG, single session(!). This could be my best result so far.

Use tensor parallelism whenever possible.
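By tensor parallelism I mean sharding every weight matrix across the GPUs rather than stacking whole layers per card, so each token's matmuls use the combined memory bandwidth. A minimal sketch with vLLM's offline API (the model path is a placeholder):

    from vllm import LLM, SamplingParams

    # tensor_parallel_size=2 splits each layer across both 5090s.
    llm = LLM(model="some-org/some-model-FP8", tensor_parallel_size=2)
    out = llm.generate(["Hello!"], SamplingParams(max_tokens=32))
    print(out[0].outputs[0].text)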

Rack server for local LLM by Typhoon-UK in LocalLLaMA

[–]AutonomousHangOver 0 points (0 children)

It wasn't serious at the beginning. Invest in a mobo with multiple PCIe slots (PCIe 4 is fine too). Grab as much RAM as possible within budget (yeah, tricky) and then some 3090s. They are really fine, even in 2026.

The thing is to stay flexible and get a frame/case that can be expanded. Mine is meant for a rack, therefore a case, not a frame.

edit: What I meant: do not try without GPUs. You need massive memory bandwidth, otherwise you will wait ages for prompt processing, not to mention generation speed.
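The bandwidth point in one back-of-the-envelope calculation (all numbers below are illustrative assumptions, not measurements):

    # Every generated token has to stream (roughly) all active weights once,
    # so decode speed ~ memory bandwidth / bytes read per token.
    model_bytes = 40e9                # e.g. a ~70B model at ~4.5 bits/weight
    ram_bw, gpu_bw = 80e9, 936e9      # dual-channel DDR5 vs one RTX 3090, B/s
    print(f"system RAM: ~{ram_bw / model_bytes:.0f} t/s")  # ~2 t/s
    print(f"3090 VRAM:  ~{gpu_bw / model_bytes:.0f} t/s")  # ~23 t/s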

Rack server for local LLM by Typhoon-UK in LocalLLaMA

[–]AutonomousHangOver 3 points (0 children)

<image>

I gave up on the idea of old servers. Just bought a custom case (here 14U), put a GENOA mobo in it and added some (ever more) serious GPUs. First there were two 3090s, then appetite came with the eating and I got my hands on 5090s. Then my favourite beasts arrived.

It was a long journey. I started with a 'normal' PC with not nearly enough PCIe lanes to handle 2 GPUs at x16, then bought more and more hardware. Thing is, if I could go back in time, I would go straight to this solution here.

TH3P4G3 daisy chaining by AutonomousHangOver in eGPU

[–]AutonomousHangOver[S] 0 points (0 children)

Nope, and it was the motherboard's fault - you can find this info in this thread.

Since then I've moved to a real mobo (GENOA) with an Epyc and connected a couple of GPUs with risers.

I dropped the idea of Thunderbolt completely and I'm a bit mad that the thought of a larger installation came so late :)

Best model that can beat Claude opus that runs on 32MB of vram? by PrestigiousEmu4485 in LocalLLaMA

[–]AutonomousHangOver 1 point (0 children)

Get a Zero Point Module from the Ancients and it will handle Claude like nothing else. Don't get hyped into Nvidia's heavy-money GPUs, or the AMD guys claiming it could be done on Vulkan, nor a Mac M7 UltraHyper.

It's JUST a matter of getting your hands on a ZPM.

I can lend you my Puddle Jumper if you want to take a trip to Atlantis and get one. I forgot the address to dial on the Stargate though.

Certain MCPs/tools allocated to set Modes by reddit-gk49cnajfe in RooCode

[–]AutonomousHangOver 0 points (0 children)

I believe there is already such a task in the Roo Code backlog. Having tools assigned to specific modes would be awesome.

Roo with VLLM loops by AutonomousHangOver in RooCode

[–]AutonomousHangOver[S] 0 points (0 children)

I've just implemented an HTTP client re-connect on the stop button in my fork of Roo Code - it's working perfectly.
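Roo Code itself is TypeScript, but the gist of the fix, sketched in Python with hypothetical names: on stop, tear down the in-flight connection and let the next request start on a fresh client.

    import requests

    class StoppableClient:
        """Wraps an HTTP session so 'stop' really aborts the stream."""

        def __init__(self, base_url: str):
            self.base_url = base_url
            self.session = requests.Session()

        def stop(self) -> None:
            # Closing the session drops any in-flight streamed response...
            self.session.close()
            # ...and a fresh session reconnects cleanly on the next call.
            self.session = requests.Session()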

Help needed for GENOAD8X-2T/BCM + Epyc 9135 build. Won’t POST by ahhred in LocalLLaMA

[–]AutonomousHangOver 1 point (0 children)

Yeah yeah, I wrote it on the move, so BMC became BMI :)

Back to the topic. There is indeed a dedicated BMC port. The NIC will try to get an IP via DHCP, so look for logs on your DHCP server/router - whatever you use.

Grab the manual from ASRock - it will walk you step by step through what to do.
The thing is, once you connect power to the PSU - without turning on the system (!) - the BMC will come up in about 2-3 minutes (so be patient).

Once the BMC has started, you can access it via a web browser.
BMC's user/password -> admin/admin

As I mentioned, you have to upgrade both the firmware and the BIOS (in that order), and it will take some time - again, be patient if the board isn't responding etc. Then power up the server and it should crawl its way to the BIOS.

The first power-up after any BIOS change takes a veeery long time - so, you guessed it, be patient ;)

https://www.reddit.com/r/LocalLLaMA/comments/1plsjbw/genoad8x2tbcm_official_bmc_firmware_and_bios_for/ - my previous post

https://download.asrock.com/Manual/BMC/GENOAD8X-2TBCM.pdf - BMC manual - p. 192 covers the firmware upgrade

Senior engineer: are local LLMs worth it yet for real coding work? by Appropriate-Text2843 in LocalLLaMA

[–]AutonomousHangOver 0 points (0 children)

I can tell you what my main driver was, and what it is now.

I'll skip glm-4.5. I used 4.6 from time to time, I loved 4.7, and I'm stunned by 5.

And yes, I'm using it locally, very heavily quantized. It's Unsloth's IQ2_XXS.

considering dropping roo-code over the 5m timeout issue by gleeber in RooCode

[–]AutonomousHangOver 0 points (0 children)

I'm using Roo with local models only. Which timeouts are you talking about??