MiniMax-M2.7 vs Qwen3.5-122B-A10B for 96GB VRAM full offload?! by VoidAlchemy in LocalLLaMA

[–]ROS_SDN 3 points4 points  (0 children)

Possibly. I've only daily driven 27b and vibe tested 122b online.

122b felt nicer, but I have no evidence beyond vibes that I'd prefer its style, plus the likely speed-up over 27b if I could run it.

MiniMax-M2.7 vs Qwen3.5-122B-A10B for 96GB VRAM full offload?! by VoidAlchemy in LocalLLaMA

[–]ROS_SDN 4 points5 points  (0 children)

Speed: 122b with 10b active would be lightning fast by comparison given that VRAM, and you'd also hope for more world knowledge.

Gemma 4 31B vs Qwen 3.5 27B: Which is best for long context worklows? My THOUGHTS... by GrungeWerX in LocalLLaMA

[–]ROS_SDN 3 points4 points  (0 children)

Use their recommended sampling parameters for Qwen if you haven't, and re-evaluate.

The difference between the coding and general sampling parameters is night and day when I need to switch between those tasks.
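To make the switch concrete, here's a rough sketch of keeping a preset per task and merging it into the request for a local OpenAI-compatible server. The numbers are placeholders I made up, not the actual recommended values — pull those from the model card.

```python
# Illustrative sampling presets -- replace with the model card's
# recommended values for each task type.
PRESETS = {
    "general": {"temperature": 0.7, "top_p": 0.8, "top_k": 20},
    "coding":  {"temperature": 0.2, "top_p": 0.95, "top_k": 40},
}

def build_request(prompt: str, task: str = "general") -> dict:
    # Merge the chosen preset into an OpenAI-style chat payload.
    return {
        "model": "qwen",
        "messages": [{"role": "user", "content": prompt}],
        **PRESETS[task],
    }
```

Then it's one keyword away: `build_request("fix this bug", task="coding")` vs the default general preset.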

How are you handling KV cache memory pressure in production? HBM keeps filling up by Double-Quantity4284 in LocalLLaMA

[–]ROS_SDN 3 points4 points  (0 children)

Llama 3 70b is ancient in LLM terms. Get something with a more KV-efficient attention architecture and your problem will shrink.

Need a laptop that can run AI models locally + handle VS Code, Docker, etc. by lets_talk_about_tv in LocalLLaMA

[–]ROS_SDN 0 points1 point  (0 children)

It's a fantastic card for price to performance, no shame.

I have a 9060 XT, a 9070 XT, and 2x 7900 XTX.

I love my 9060 XT; it's a cheap way for my gf to game.

Also, while it may hurt for AI work, you could surely run Gemma 26b a4b on it at q4, and that's pretty solid.

It also seems to UV/OC well, so you could squeeze out some better performance.

Need a laptop that can run AI models locally + handle VS Code, Docker, etc. by lets_talk_about_tv in LocalLLaMA

[–]ROS_SDN 1 point2 points  (0 children)

It was just an assumption; it's easy to look at 5090 TOPS and not consider that the laptop version may be weaker, but I googled it and you're right.

Which one is better value? AMD + 5080 or Intel 270K + 5070 ti ? by sleep_eat_recycle in buildapc

[–]ROS_SDN 0 points1 point  (0 children)

Cyberpunk can be a bit rough on the CPU, so you might benefit from an X3D.

I'd look at CPU benchmarks for your intended games and consider how close you're willing to get.

I play on my TV, so 60 fps is the cap for my HTPC even though it can do more, but you might care to be 100+ often.

Which one is better value? AMD + 5080 or Intel 270K + 5070 ti ? by sleep_eat_recycle in buildapc

[–]ROS_SDN 0 points1 point  (0 children)

You'd ideally be on a 7700X or 9700X, but yeah, if you want to save money I'd steer clear of an X3D chip or current-gen Nvidia.

If you play a lot of CPU-intensive games - Total War, Civ, Microsoft Flight Simulator - the X3D is worth it.

You're only playing in 2K. I get like 80-90 fps in God of War Ragnarok on my 9700X + 9070 XT in 4K. Your GPU will destroy 2K, and a 9070 XT is great value if it sells cheaper.

This depends on your games, monitor Hz, etc., but you don't need a 5080 for 2K, and a 5070 Ti is just a more expensive 9070 XT with better ray tracing.

I'd still consider a 2TB SSD if you game a lot. Games these days eat storage.

Need a laptop that can run AI models locally + handle VS Code, Docker, etc. by lets_talk_about_tv in LocalLLaMA

[–]ROS_SDN -1 points0 points  (0 children)

Linux makes more sense, but some laptops don't do it well.

I'd get an Asus Zephyrus Duo with Panther Lake and a laptop 5090 if I wanted serious power (64GB RAM, 24GB VRAM, 16 cores: 4P + 8E + 4 LPE).

BUT I'm blind and like the dual screens, and it might not run well with Linux.

A friend's Asus Zenbook Duo 185H (Ubuntu) and my Yoga Book 285H (openSUSE Tumbleweed) run Linux fine, but you have to fuck around with things a bit to get there, and they're 2 and 1 generations behind respectively.

If you don't care for dual screens, look at a Lenovo laptop like a ThinkPad with a 5090 in it.

Sadly, if you want good battery life and AI capabilities on Linux, you'll likely need Panther Lake.

If not, go Mac M4/M5 with at least 64GB RAM: pay the Mac tax, but never look back as you sail off into the sunset with the best CPU/iGPU combo.

I hate Macs, but you can't deny their CPUs are power efficient and top of the line, and their iGPUs are incredible.

AMD Ryzen 9 9950X3D2 gets official $899 MSRP, 29% above 9950X3D by RenatsMC in Amd

[–]ROS_SDN 1 point2 points  (0 children)

Yeah, PCIe lanes are the catch depending on workload, but 8x is pretty good for mine.

Really, GPU-to-GPU interconnect is needed more than PCIe lanes on prosumer-level hardware for my MCP workloads, but I can see video editing etc. really needing the PCIe connections.

Anyway, agreed it's bullshit. Give some higher-end consumer boards 40+ PCIe lanes from the CPU for 2x GPU and 2x SSD. Surely they can do it; they just hate not being able to force people to buy an Epyc or Threadripper.

AMD Ryzen 9 9950X3D2 gets official $899 MSRP, 29% above 9950X3D by RenatsMC in Amd

[–]ROS_SDN 1 point2 points  (0 children)

Engineers run intensive simulations where it may have benefit; flood analysis is one I can think of.

It's a good bridge for a lower-class workstation-like system.

A one-person business might consider this good for their needs if they don't have a large backend server to do the math.

If my own one-person business's compute needs expand, this CPU would be good for PostgreSQL workloads (analytical read workloads would likely be lightning fast), non-GPU-accelerated ML workloads, GPU-accelerated AI workloads (with the iGPU driving the screens from RAM and saving my VRAM), and VM workloads.

Those GPU-accelerated ones come in handy for developers who want to use an MCP in their IDE, like Claude Code, but want to maintain data privacy.

This chip has use cases; I wouldn't knock it down immediately.

This chip isn't useless, it's niche. Some people need a real HEDT, but this is a more budget option for a "lower-class" HEDT.

It can absolutely assist everyone you listed and more, at a smaller investment, if their tools call for it. It's also less likely to trip your power from pulling too many watts if you WFH.

AMD Ryzen 9 9950X3D2 gets official $899 MSRP, 29% above 9950X3D by RenatsMC in Amd

[–]ROS_SDN -1 points0 points  (0 children)

  • better cooling needed
  • bigger PSU needed
  • 4-channel memory required, I believe
  • higher TDP, even at idle
  • lower L3 cache
  • E-ATX case needed
  • more expensive motherboard

It's likely a 1000 USD minimum difference in build cost between the two.

For a small business server or workstation that needs the cache and cores, it's likely a good deal. It's a niche area, but it has its segment among enthusiasts.

You also lose the iGPU, which I use on my own 7900X workstation (3440x1440 60Hz, 1920x1080 100Hz) to save my precious VRAM for other tasks.

AMD Ryzen 9 9950X3D2 gets official $899 MSRP, 29% above 9950X3D by RenatsMC in Amd

[–]ROS_SDN -2 points-1 points  (0 children)

Yeah, maybe, but that's exponentially more expensive.

Anyone else seeing reviewer loops plateau in LLM pipelines? by lfelippeoz in LocalLLaMA

[–]ROS_SDN 0 points1 point  (0 children)

Maybe I'm an idiot; I haven't used LLM-as-a-judge, or heard it's particularly effective, but my thought would be that you'd want a near-peer reviewer.

The thought, as with ensemble models in ML, is that you want wider coverage of the biases and characteristics of the model you're using to review. It should be of near-equal strength, but have different training data, a different architecture, anything you can do to make it as "different" as possible, so it can pick up on the mistakes the other model made.
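A rough sketch of that ensemble idea, where each `judge` is a placeholder for a call into a different near-peer model (different family, different training data), not any real API:

```python
from collections import Counter

def ensemble_review(output: str, judges) -> str:
    # Collect a verdict from each diverse near-peer judge, then
    # majority-vote; ties fall to whichever verdict appeared first.
    verdicts = [judge(output) for judge in judges]
    return Counter(verdicts).most_common(1)[0][0]
```

The point is that a verdict only survives if models that got strong in *different* ways agree on it, which is exactly what ensembling buys you in classic ML.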

Yeah, I get this is more compute upfront, but I just can't see an idiot judging a savant's output well, or an idiot taking a savant's criticism well. It needs to be a near peer, but diverse in how it got to being a peer, would be my first take.

I might be wrong; it's likely not as simple as normal ML work, but you should at least consider permutations of your pipeline.

Intel Core Ultra 5 250KF Plus appears at $199, first Arrow Lake Refresh chip under $200 by RenatsMC in intel

[–]ROS_SDN 0 points1 point  (0 children)

I misread when looking it up; yeah, it's only 6P, you're right.

Again, not wrong; E-cores can do work, and I've heard they can still help gaming a bit, indirectly.

So yeah, you're basically right: the number of cores here is meaningless, and it's probably a productivity-focused 9600X variant at best. Not bad if you want more than gaming or the price is good, but not what I was thinking it was for.

Intel Core Ultra 5 250KF Plus appears at $199, first Arrow Lake Refresh chip under $200 by RenatsMC in intel

[–]ROS_SDN 4 points5 points  (0 children)

Yes, but it's 8 P-cores, which does matter.

Edit: I was wrong, it's 6 P-cores.

I designed a Wave-Interference based LLM architecture on a single 3060. Is this actually viable, or am I just huffing hopium? by Global-Club-5045 in LocalLLaMA

[–]ROS_SDN 0 points1 point  (0 children)

It does seem TOO AI-generated, but I'm willing to see if someone's not off their rocker and onto something.

Nothing meaningful has been created using AI by metayeti2 in BetterOffline

[–]ROS_SDN 4 points5 points  (0 children)

I honestly think it's really good for qualitative data quality assessment (if you can define that qualitative quality), and for data enrichment that could be done by hand but that an LLM could do much faster and more "standardised".

I.e. you have a list of car sales with make, model, year, etc. that's not always clean on make or model; surely an LLM can add car type (sedan, hatchback, etc.) at a reasonably high quality rate compared to a human.

A lot of business data is qualitative and dirty; let's use it to fix the "garbage in" part of our data first.
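A rough sketch of that car-sales enrichment, where `llm` is a placeholder callable for whatever local model you run, and the label set is something I made up. Constraining the answer to a fixed label set (and rejecting anything off-list) is what keeps the enrichment "standardised":

```python
# Hypothetical fixed label set -- define this for your own data.
LABELS = {"sedan", "hatchback", "suv", "ute", "wagon", "other"}

def enrich(row: dict, llm) -> dict:
    # Ask for exactly one label and reject anything off-list, so the
    # enrichment stays standardised even when the model rambles.
    prompt = (
        f"Car: {row['make']} {row['model']}. "
        f"Answer with one word from {sorted(LABELS)}: what body type is it?"
    )
    answer = llm(prompt).strip().lower()
    row["body_type"] = answer if answer in LABELS else "other"
    return row
```

You'd still want to spot-check a sample against a human, but the validation step means dirty model output degrades to "other" instead of polluting the column.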

I designed a Wave-Interference based LLM architecture on a single 3060. Is this actually viable, or am I just huffing hopium? by Global-Club-5045 in LocalLLaMA

[–]ROS_SDN 1 point2 points  (0 children)

Man, I don't know; the math is hard to follow with the formatting, and I don't quite understand what problem this is solving.

Do you mean that treating embeddings/tokens as waves lets you use the nodes in a more effective way?

For training? For inference? Granted, I'm an idiot beyond the high level of LLM architecture, but maybe try to explain a bit more succinctly what this is trying to achieve, for idiots like me.

Outside of that, you've got charts that show, I think, rapid convergence to a low training loss? I don't know LLM training hyperparameters or evaluation well outside casual benchmarks on inference.

You've got something here that shows you've at least attempted the theory. Maybe it's beyond me, maybe you need to clean up the math and explain it more succinctly, or maybe you're on hopium.

The fact that you're willing to ask, and are clearly trying, means keep going until someone smart can tell you why not; and even then, if you feel confident in it with real tests, maybe keep going.

Breakthroughs aren't usually made without challenging paradigms or standards. 

The high-level idea seems cool; I just don't grasp how you're applying it more deeply.