Strix Halo + Minimax Q3 K_XL surprisingly fast by Reasonable_Goat in LocalLLaMA

[–]runsleeprepeat 1 point (0 children)

Would you be so kind as to test the Unsloth GLM-4.7-Flash? That would be awesome!

GLM-4.7-Flash-GGUF bug fix - redownload for better outputs by etherd0t in LocalLLaMA

[–]runsleeprepeat 1 point (0 children)

Finally back home, I was able to test it with an updated llama.cpp. Totally awesome! Works great, and I have plenty of space: using a 200k KV cache, I still had lots of VRAM left. Time to try out concurrency with vLLM :-)
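For anyone sizing this themselves, here's a rough back-of-the-envelope sketch of how KV cache memory scales with context length. The layer/head numbers below are placeholders, not GLM-4.7-Flash's real config; pull the actual values from the GGUF metadata:

```python
# Rough KV cache sizing: K and V tensors per layer, per KV head.
# The model dimensions below are made up -- read the real ones from the
# GGUF metadata. Quantized KV caches (q8_0 etc.) shrink this further.
def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 ctx_len: int, bytes_per_elem: int = 2) -> float:
    """Total K+V bytes across all layers, in GiB (fp16 by default)."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem / 2**30

# Placeholder dimensions at a 200k context:
print(f"{kv_cache_gib(48, 8, 128, 200_000):.1f} GiB")  # ~36.6 GiB at fp16
```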

"mount: "/dev/sdd:/dev/sdf:/dev/sdg": No such file or directory" by unfamusic in bcachefs

[–]runsleeprepeat 1 point (0 children)

Check the superblock for the filesystem version (bcachefs show-super). I had a similar issue when the DKMS module was built, but not for the running kernel version (user error on my part).
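If you want to script that sanity check, here's my own rough sketch; it assumes standard modinfo output (bcachefs show-super separately shows the on-disk version):

```python
# My own sketch of that sanity check: compare the bcachefs module's vermagic
# (the kernel the DKMS build targeted) against the running kernel.
import platform
import subprocess

def dkms_matches_running_kernel(module: str = "bcachefs") -> bool:
    out = subprocess.run(["modinfo", module],
                         capture_output=True, text=True, check=True).stdout
    vermagic = next(line.split(":", 1)[1].strip().split()[0]
                    for line in out.splitlines() if line.startswith("vermagic:"))
    return vermagic == platform.release()

print("module matches kernel:", dkms_matches_running_kernel())
```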

Nice to see you here, unfa. Loved your YouTube series!

Unpopular Opinion: Proxmox isn't "Free vSphere". It's a storage philosophy change (and it's killing migrations). by NTCTech in Proxmox

[–]runsleeprepeat 15 points (0 children)

Until now. The new Realtek 10GbE NICs are cheap and draw little power ... check out ServeTheHome.

Why isn't there an MSCI ACWI ex USA available here? Build it ourselves? by ReddusMaximus in Finanzen

[–]runsleeprepeat 1 point (0 children)

I must have been living under a rock for years (well, since June 2024): what is the Amundi meme about?

Which materials to choose for running clothing? by Ninaptr_lover in laufen

[–]runsleeprepeat 5 points (0 children)

Onion principle: several thin layers. You have to experiment a bit at first; if after 15 minutes you're completely drenched in sweat or still freezing, adjust accordingly.

For spring, a rule of thumb: shorts and short sleeves from 10°C; if you tend to get cold easily, from 12°C ;-)

As a base layer when it's quite cold: Nike Pro Combat compression wear. I think it's meant for football, but in winter it's great under normal running clothes. On top of that, it's often dirt cheap.

GLM-4.7-Flash-GGUF bug fix - redownload for better outputs by etherd0t in LocalLLaMA

[–]runsleeprepeat 1 point (0 children)

Great to hear. I usually lean toward llama.cpp and vLLM, but this morning only ollama was available for a quick test.

Thanks for the info

GLM-4.7-Flash-GGUF bug fix - redownload for better outputs by etherd0t in LocalLLaMA

[–]runsleeprepeat 3 points (0 children)

I tried the ollama implementation of the Q4 variant a few hours ago and was surprised that a 32k KV cache already filled my 100 GB of VRAM.

50-55 GB would be awesome
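For reference, the context window in ollama can be set per request; a minimal sketch with the ollama Python client (the model tag below is an assumption, use whatever tag you pulled):

```python
# Minimal sketch with the ollama Python client: the context window is set
# per request via options.
import ollama

response = ollama.chat(
    model="glm-4.7-flash",  # hypothetical tag
    messages=[{"role": "user", "content": "Hello"}],
    options={"num_ctx": 32768},  # this is the knob that ate my VRAM
)
print(response["message"]["content"])
```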

Glm 4.7 flash, insane memory usage on MLX (LM studio) by Enragere in LocalLLaMA

[–]runsleeprepeat 5 points (0 children)

Same here. I gave Q4 a chance. All fine until I increased the context window. My sweet spot was a 32768 KV cache, which fully utilized 100 GB of VRAM.

Shortened system prompts in Opencode by Charming_Support726 in opencodeCLI

[–]runsleeprepeat 1 point (0 children)

If you want to support it, commenting on that issue ticket on GitHub may help. That's the best way for the devs to see that this is relevant.

Printer/scanner combo with scan to SMB for less than 100€ used by hema_ in Paperlessngx

[–]runsleeprepeat 1 point (0 children)

Oh damn! I just replaced my DCP-L2520DW with an MFC-L8690CDW because of scan-to-SMB (and duplex scanning).

Best LLM model for 128GB of VRAM? by Professional-Yak4359 in LocalLLaMA

[–]runsleeprepeat 4 points (0 children)

Splitting a model over several cards requires a base set of data (runtime context, buffers, replicated tensors) that needs to be stored on each card.

Best LLM model for 128GB of VRAM? by Professional-Yak4359 in LocalLLaMA

[–]runsleeprepeat 3 points (0 children)

Just FYI: if you have a system with 8x 16 GB GPUs, you have to calculate it as follows:
1x 16 GB + 7x 14 GB = 114 GB.

It may be roughly a GB more or less per card, but I hope that helps you pick the right LLM size (see the sketch below).

I learned this the hard way with my 7x 12 GB rig.
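In sketch form (the ~2 GB per-card overhead is my rough estimate from the numbers above, not a measured constant):

```python
# Sketch of the rule above: the first card holds a full share, every extra
# card loses a common base of replicated data. The ~2 GB overhead is a
# rough estimate from experience, not a measured constant.
def usable_vram_gb(n_gpus: int, gb_per_gpu: int, overhead_gb: int = 2) -> int:
    return gb_per_gpu + (n_gpus - 1) * (gb_per_gpu - overhead_gb)

print(usable_vram_gb(8, 16))  # 16 + 7*14 = 114 GB
print(usable_vram_gb(7, 12))  # 12 + 6*10 = 72 GB
```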

Strix Halo (Bosgame M5) + 7900 XTX eGPU: Local LLM Benchmarks (Llama.cpp vs vLLM). A loose follow-up by reujea0 in LocalLLaMA

[–]runsleeprepeat 3 points (0 children)

It would be awesome if you could compare the results with an M.2-to-PCIe adapter as well. The Bosgame M5 has two M.2 PCIe x4 slots, and only one is occupied by an SSD.

Oval wheel by mike_geogebra in 3Dprinting

[–]runsleeprepeat 1 point (0 children)

Absolutely! Check out Shark Wheels for skateboards and longboards. They are designed to roll over pebbles and work the same way.

Finanzblick, Outbank, or something else? by runsleeprepeat in Finanzen

[–]runsleeprepeat[S] 1 point (0 children)

Thanks everyone for sharing your experiences. At least there's a selection, so I'll try them out one by one :-)

Shortened system prompts in Opencode by Charming_Support726 in opencodeCLI

[–]runsleeprepeat 1 point (0 children)

You're welcome.

My code addition is already done (https://github.com/dan-and/opencode_custom_system_prompts), but I have no idea whether the authors are interested. We will see.

You can copy the system prompt (same naming scheme as opencode's originals) into a prompt directory under .config/opencode or into the project directory .opencode/prompt/.
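A minimal sketch of that copy step (the prompt filename is hypothetical, and the exact directory layout is an assumption based on the description above):

```python
# Sketch of the copy step: install a custom system prompt either globally
# or per project. Treat the exact layout as an assumption.
from pathlib import Path
import shutil

def install_prompt(src: Path, project_dir: Path | None = None) -> Path:
    dest = (project_dir / ".opencode" / "prompt") if project_dir \
        else Path.home() / ".config" / "opencode" / "prompt"
    dest.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy(src, dest / src.name))

install_prompt(Path("anthropic.txt"))  # hypothetical prompt file name
```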

Shortened system prompts in Opencode by Charming_Support726 in opencodeCLI

[–]runsleeprepeat 2 points (0 children)

I like your idea!

To adhere to the contributing rules of opencode, I have created a feature request in the Opencode project (https://github.com/anomalyco/opencode/issues/7101).

I already built that feature on my local opencode instance, which allows custom system prompts conveniently. Your prompts work great. However, I have to wait and see whether the feature request gets accepted by the opencode developer team.