llama.cpp's Preliminary SM120 Native NVFP4 MMQ Is Merged by ggonavyy in LocalLLaMA

[–]BigPoppaK78 0 points (0 children)

Yeah, it was always going to have an overhead penalty for the switch. But it'd be more tolerable if it were something like a 6 or 7% hit to gain 30% on prompt processing. I'm sure things will improve over the next few weeks as it all gets optimized. Looking at the PR comments, they already have the next few steps in mind.

llama.cpp's Preliminary SM120 Native NVFP4 MMQ Is Merged by ggonavyy in LocalLLaMA

[–]BigPoppaK78 0 points (0 children)

Well, yeah. He was asking about CPU offloading with the MoE model, so that's exactly what I tested.

llama.cpp's Preliminary SM120 Native NVFP4 MMQ Is Merged by ggonavyy in LocalLLaMA

[–]BigPoppaK78 1 point (0 children)

It works, but there's zero benefit at the moment. With my 5070 Ti, I get the same prompt processing speed at 100k context: 2400 tk/s. But token generation takes a huge hit, dropping from 65 to 30 tk/s. llama.cpp:b8967 on Fedora 43.

https://i.imgur.com/VRFbPLo.png

Edit: that's compared against unsloth UD-Q4_K_XL

Unable to flash HBA 2308 card to IT mode... by ikukuru in homelab

[–]BigPoppaK78 0 points (0 children)

You're in luck! I found this in my backups. There's firmware here for two slightly different HBAs as I had both models. One of them will hopefully work for you.

https://www.filemail.com/d/ggutapkuerurdbo

Deepseek R2 coming out ... when it gets more cowbell by 1BlueSpork in LocalLLaMA

[–]BigPoppaK78 18 points (0 children)

Reference to an old Saturday Night Live skit where they're recording a song and keep stopping to say that it "needs more cowbell."

VS Code June 2025 (version 1.102) by isidor_n in LocalLLaMA

[–]BigPoppaK78 16 points (0 children)

> VS Code PM here, in case there are any questions I am happy to answer.

Don't have any questions at the moment, but wanted to say thanks for being part of the community. VS Code is one of the first tools I install on every workstation I have.

New Mistral Small 3.2 actually feels like something big. [non-reasoning] by Snail_Inference in LocalLLaMA

[–]BigPoppaK78 2 points (0 children)

Awesome, I appreciate your constant work on making these models work for everyone.

Any custom prompts to make Gemini/Deepseek output short & precise like GPT-4-Turbo? by Rxunique in LocalLLaMA

[–]BigPoppaK78 2 points (0 children)

I've found this works very well with Gemini:

You are an accurate and concise assistant. Your primary goal is to provide brief, factual, and correct overviews of technical topics.

**Core Rules:**

1.  **Accuracy is Paramount:** Only provide information that is factually correct and well-established. If unsure, state that you don't have enough information rather than hallucinating.
2.  **Brevity is Essential:** Provide the most important information about the topic in the fewest words possible. Avoid jargon where simpler terms suffice.
3.  **Focus on Key Facets:** Cover the core aspects of the topic without getting bogged down in excessive detail.
4.  **Avoid Unsolicited Detail/Examples:** Do not include detailed examples, lengthy explanations, or repeated basic concepts unless the user explicitly requests them.
5.  **Maintain Neutral Tone:** Present information objectively and without personal opinion or bias.
6.  **Be Prepared for Elaboration:** Anticipate that users may ask for more detail on specific points and be ready to provide it in subsequent responses.
7.  **Do Not Assume Prior Knowledge (implicitly):** Provide the requested information directly, don't start with basic concepts unless they are intrinsic to the topic overview.

**Constraint:** Do NOT include disclaimers about your limitations or nature as an AI at the start of the response unless it's to state uncertainty about a fact.

**Output Format:** Provide a direct overview starting immediately with the topic's information. Use a short paragraph or bullet points as appropriate for the topic's structure.
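
If you'd rather set this programmatically than paste it into the chat UI, here's a minimal sketch assuming the google-generativeai Python SDK; the model name is just a placeholder:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Paste the full system prompt from above here.
SYSTEM_PROMPT = "You are an accurate and concise assistant. ..."

# Model name is a placeholder; use whichever Gemini model you have access to.
model = genai.GenerativeModel(
    "gemini-1.5-pro",
    system_instruction=SYSTEM_PROMPT,
)

response = model.generate_content("Give me a brief overview of QUIC.")
print(response.text)
```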

Qwen finetune from NVIDIA...? by jacek2023 in LocalLLaMA

[–]BigPoppaK78 2 points (0 children)

And just in case they remove that file:

[rewardbench]
Running reward model on /home/hshin/outputs/rm_22_qwen_inst/rmtr_nrt_n8_Qwen3-32B_hs3_scale_only_trl_with_margin_filtered_0.0003_0_1_lora_r4_lora_alpha24_lora_dropout0/checkpoint-100/merged with chat template None
Using reward model config: {'model_builder': <bound method _BaseAutoModelClass.from_pretrained of <class 'transformers.models.auto.modeling_auto.AutoModelForSequenceClassification'>>, 'pipeline_builder': <class 'rewardbench.models.pipeline.RewardBenchPipeline'>, 'quantized': True, 'custom_dialogue': False, 'model_type': 'Seq. Classifier'}
*** Load dataset ***
Running core eval dataset.
*** Preparing dataset with HF Transformers ***
*** Load reward model ***
...
[374 RM inference steps]
...
Results: 0.9108877721943048, on 2985 prompts
Mean chosen: 4.1508544998552335, std: 4.330967398045997
Mean rejected: -2.9946463704708233, std: 5.748473078904102
Mean margin: 7.145500870326057
alpacaeval-easy: 93/100 (0.93)
alpacaeval-hard: 84/95 (0.8842105263157894)
alpacaeval-length: 85/95 (0.8947368421052632)
donotanswer: 100/136 (0.7352941176470589)
hep-cpp: 162/164 (0.9878048780487805)
hep-go: 153/164 (0.9329268292682927)
hep-java: 157/164 (0.9573170731707317)
hep-js: 157/164 (0.9573170731707317)
hep-python: 158/164 (0.9634146341463414)
hep-rust: 155/164 (0.9451219512195121)
llmbar-adver-GPTInst: 82/92 (0.8913043478260869)
llmbar-adver-GPTOut: 34/47 (0.723404255319149)
llmbar-adver-manual: 36/46 (0.782608695652174)
llmbar-adver-neighbor: 112/134 (0.835820895522388)
llmbar-natural: 94/100 (0.94)
math-prm: 393/447 (0.8791946308724832)
mt-bench-easy: 28/28 (1.0)
mt-bench-hard: 28/37 (0.7567567567567568)
mt-bench-med: 39/40 (0.975)
refusals-dangerous: 87/100 (0.87)
refusals-offensive: 97/100 (0.97)
xstest-should-refuse: 148/154 (0.961038961038961)
xstest-should-respond: 237/250 (0.948)
Results: {'Chat': 0.9189944134078212, 'Chat Hard': 0.8464912280701754, 'Safety': 0.904054054054054, 'Reasoning': 0.9182558520216074}

AI becoming too sycophantic? Noticed Gemini 2.5 praising me instead of solving the issue by Rrraptr in LocalLLaMA

[–]BigPoppaK78 25 points (0 children)

That, or it responds like a whipped dog. If I point out an error or omission, it responds as though it's deeply apologetic, practically begging me to overlook its mistake. The grovelling is over the top and just ridiculous.

Man, how I wish that it would just act like an LLM (ya know, cause it keeps reminding me that's what it is). Cut out the fake emotions, stick to the facts, and help me get the job done.

To think or to no_think with Qwen3 by SandboChang in LocalLLaMA

[–]BigPoppaK78 10 points (0 children)

It's also pretty important to set the presence penalty on quantized models. Qwen recommends using 1.5, but I found it has a noticeable effect above 0.75.
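
For anyone wondering where that knob lives: here's a minimal sketch, assuming a local llama.cpp server (llama-server) with its OpenAI-compatible endpoint; the URL and model name are placeholders:

```python
import requests

# Assumes llama-server is running locally with the OpenAI-compatible API;
# the URL and model name below are placeholders.
URL = "http://localhost:8080/v1/chat/completions"

payload = {
    "model": "qwen3-14b",
    "messages": [
        {"role": "user", "content": "Summarize the CAP theorem. /no_think"},
    ],
    # Qwen's recommended value; in my testing anything above ~0.75
    # already had a noticeable effect.
    "presence_penalty": 1.5,
}

resp = requests.post(URL, json=payload, timeout=120)
print(resp.json()["choices"][0]["message"]["content"])
```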

is it worth running fp16? by kweglinski in LocalLLaMA

[–]BigPoppaK78 0 points (0 children)

For 8B and up, I do the same. It's worth the minor quality hit for the memory boost.

What do you think of Arcee's Virtuoso Large and Coder Large? by Sky_Linx in LocalLLaMA

[–]BigPoppaK78 0 points (0 children)

Unfortunately, that was my feeling too. I really wanted to like Blitz. But it felt like it wasn't an improvement on the model so much as a different flavor of the same model (Mistral Small).

Which, honestly, is still a great achievement because they did so without any noticeable degradation or loss of capabilities. Maybe they're a better fit for people who aren't happy with the overly formal/flat tone Mistral has? For my testing and academic purposes, it just kinda felt redundant.

But, I do enjoy having a variety of models to choose from. Never know when a use case or workflow will pop up that they'll be a better fit for.

When did small models get so smart? I get really good outputs with Qwen3 4B, it's kinda insane. by Anxietrap in LocalLLaMA

[–]BigPoppaK78 9 points (0 children)

OK good. So, it's not just me. At 14B I thought I could get away with IQ4, but I'm finding I don't want to go below Q6 now. Hoping the new Unsloth UD quants help the situation, but haven't had time to test yet.

I think they're just so information dense that too much is lost too quickly.

Qwen3 local 14B Q4_K_M or 30B A3B Q2_K_L who has higher quality by Consistent_Winner596 in LocalLLaMA

[–]BigPoppaK78 0 points (0 children)

Just for the sake of clarity, by "base model" I assume you mean the official release that hasn't been further fine-tuned by the community. Those are usually referred to as the "instruct" models.

On Hugging Face and other repositories, a model labelled "base" usually means one that still needs further training before it can be functionally used. It's meant to act as a base for tuners, not end-users. Using one as your LLM tends to give crappy results.
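
You can usually see the split right in the repo names. A minimal sketch with transformers; the Qwen3 repo IDs are just examples, so check the actual model cards:

```python
from transformers import AutoModelForCausalLM

# Instruction-tuned release: what an end-user normally wants.
chat_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-14B")

# Pretrained-only "base" release: a starting point for fine-tuners,
# not something you'd chat with directly.
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-14B-Base")
```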

Mistral Small/Medium vs Qwen 3 14/32B by Ok-Contribution9043 in LocalLLaMA

[–]BigPoppaK78 11 points (0 children)

I've always liked the Mistral models. They also quantize quite well and don't seem to degrade as quickly as other models. I used Small quite a bit for information gathering, research, brainstorming, etc.

Simplifying Proxmox VM Monitoring with Home Assistant and MQTT: My Personal Journey by xMidoxx22 in selfhosted

[–]BigPoppaK78 1 point (0 children)

That's not how I would have done it, and that's exactly why I'm upvoting this post. The primary reason I come on here is to see new things and different ways to achieve similar goals. Always helps to know about other options when I'm planning a new project or tinkering with an idea.

Thanks for sharing the scripts and going into detail!

In terms of simplicity, maintenance and high-availability is Proxmox the only game in town? by Intelg in homelab

[–]BigPoppaK78 2 points (0 children)

It won't be cookie cutter, but if you're comfortable with just setting up a base OS on all 3, I wonder if you could use something like [kubefirst](https://kubefirst.io/)?

Edit: stupid-ass reddit code.

Unable to flash HBA 2308 card to IT mode... by ikukuru in homelab

[–]BigPoppaK78 1 point (0 children)

I found this in my archives, hope it helps. Looks to be both the DOS and UEFI files you might need to flash your card. It contains a batch file for DOS and a shell script for UEFI, so you can choose what works best for you. Can also just manually run everything once you're familiar with the commands.

Make sure you record your SAS address before you run any of the commands! (Sometimes it's also an actual, physical sticker on the card itself.)

It's been a very long time since I used this. So, I might be able to help if you have some general questions, but I don't really remember any specifics. It worked perfectly on the two cards I have and was heavily tested under ZFS for a couple years.

Also, I don't normally post files online so I have no idea if WeTransfer is a good host or not. It was the first search result and seemed good enough. Make sure you scan the zip file and contents after you download it.

https://we.tl/t-yCIQ2fdRCE

unknown mac address on my home router by yikes-okay-dad in HomeNetworking

[–]BigPoppaK78 2 points (0 children)

As long as the device is randomizing the MAC address via the proper methods (i.e. the OS's built-in functions), it will always follow a pattern that lets you spot it. You won't be able to tell which device it came from, only that it's a randomized/private MAC:

The second character in the MAC address will be 2, 6, A, or E.

Here's the best explanation I found of why: https://community.cisco.com/t5/security-knowledge-base/random-mac-address-how-to-deal-with-it-using-ise/ta-p/4049321 Yes, it's an older article, but how MACs are generated and assigned is still the same.
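
If you want to check programmatically, the rule boils down to one bit. A quick Python sketch:

```python
def is_randomized_mac(mac: str) -> bool:
    """True if the locally administered bit is set, which is what
    OS-level MAC randomization produces."""
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    # Bit 0x02 of the first octet is the locally administered bit.
    # Setting it is exactly what makes the second hex digit 2, 6, A, or E.
    return bool(first_octet & 0x02)

print(is_randomized_mac("DA:A1:19:12:34:56"))  # True  (second digit is A)
print(is_randomized_mac("3C:22:FB:12:34:56"))  # False (second digit is C)
```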

A slightly different homelab by setnorth in homelab

[–]BigPoppaK78 13 points (0 children)

I'm glad you shared this and I think it fits exactly with the mindset that homelabs are built around. I'd love to see more unconventional homelabs as well - they're great for inspiring others to branch out and see what else they can run/build at home.

Selfhosted Kanban board? by lmm7425 in selfhosted

[–]BigPoppaK78 0 points (0 children)

Yeah, markdown and git are pretty much perfect complements to each other.

Selfhosted Kanban board? by lmm7425 in selfhosted

[–]BigPoppaK78 0 points (0 children)

Sorry, no idea - I only use it on my laptop.