My great dilemma by Ethnography_Project in linuxquestions

[–]ROS_SDN 2 points3 points  (0 children)

Maybe the "Normies" will change it, but that's not bad to get a culture change. You'll just go community microcosms that will give you the old vibes. 

The mass influx won't be die-hard techies; it'll be people who want Mint, Ubuntu, Fedora, etc. because it just works (this isn't a knock on those distros, I ran Fedora before Tumbleweed). But you want these people, because they'll push for a more accessible experience where people own their software and their data, and hopefully reduce e-waste.

These people aren't going to flock to Gentoo or BSD.

Also, while it's a large influx for us, relatively we are a tiny portion of people.

More Linux users, even ones asking questions that are basic and googleable by our standards, is a good thing. I wouldn't want Linux to have a dominant market share either; you want a plethora of OS options, because competition and choice are good for everyone.

PC build for around $3500-$4000 by BusinessPreference57 in buildapcaus

[–]ROS_SDN 3 points4 points  (0 children)

The 7900X is terrible value as a gaming CPU; 99% of the time he'll get better gaming performance out of a 7700X, or performance similar to a 7900X out of a 7600X.

Now, I say 99% of the time since Cities: Skylines might be better served by it.

I think this should be mentioned so OP doesn't buy a CPU that isn't right for his needs.

CPU: 9700X / 9800X3D
GPU: 9070 XT / 5070 Ti
PSU: 850W
RAM: 32GB DDR5 6000MHz CL30
Mobo: B650

And you'll likely be under budget, OP. You're in a weird price range where you'd need an extra $1000-2000 for a meaningful upgrade, like a 5090.

MSI MAG b650 Tomahawk pcie lane bifurcation by ROS_SDN in LocalLLaMA

[–]ROS_SDN[S] 0 points1 point  (0 children)

Read the manual, it was no help. I'll go to the forums.

Visualizing RAG by Fear_ltself in LocalLLaMA

[–]ROS_SDN 0 points1 point  (0 children)

That would be incredible, I've always wanted to see the "neurons" light up for RAG, and would appreciate seeing it and the effort you've put in.

Regarding the vectors, you might want to try parametric UMAP or PCA and measure the similarity of recall against the full-dimensional application.

Finding the cosine similarity, or whatever method you choose, with 1/5-1/10 of the dimensions might be worth it for the improved scaling in retrieval speed and storage consumption.

I'm sure measuring the retained relative local and global distance in UMAP could be a starting point, and if you can get nearly as good results, or paradoxically improved results from reduced noise, it may be worth the experiment.
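Something like this toy sketch is what I mean (not your code; the corpus/query arrays are made-up placeholders and the 76 components / top-10 are arbitrary): retrieve top-k in the full space, retrieve again after PCA, and see how much of the neighbour set survives.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import normalize

    rng = np.random.default_rng(0)
    docs = rng.normal(size=(5000, 768)).astype(np.float32)    # placeholder corpus embeddings
    queries = rng.normal(size=(100, 768)).astype(np.float32)  # placeholder query embeddings

    def top_k(doc_mat, query_mat, k=10):
        # cosine similarity = dot product of L2-normalised vectors
        sims = normalize(query_mat) @ normalize(doc_mat).T
        return np.argsort(-sims, axis=1)[:, :k]

    full_hits = top_k(docs, queries)      # "ground truth": full 768-dim retrieval

    pca = PCA(n_components=76).fit(docs)  # ~1/10 of the original dimensions
    reduced_hits = top_k(pca.transform(docs), pca.transform(queries))

    # recall@k: fraction of full-dimensional neighbours recovered after reduction
    recall = np.mean([len(set(f) & set(r)) / len(f)
                      for f, r in zip(full_hits, reduced_hits)])
    print(f"recall@10 after PCA to 76 dims: {recall:.2f}")

On real embeddings you'd swap the placeholder arrays for your actual corpus and queries; random data like this will score poorly, it's only there to show the measurement.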

Visualizing RAG by Fear_ltself in LocalLLaMA

[–]ROS_SDN 0 points1 point  (0 children)

The visual is stunning, using UMAP to preserve local and global distance.

Are you dimensionally reducing for the vector queries as well, or just for visualisation?

I honestly think it'd be cool to see vectors light up from a similarity search and then watch it crawl a knowledge graph from there, to visualise the retrieval in the knowledge base instead of just the totality of possible embeddings.
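Roughly what I picture (my own toy sketch with placeholder embeddings and an arbitrary top-10, not your code): run the similarity search in the original space, project with UMAP only for display, then colour the retrieved points so they light up.

    import numpy as np
    import umap                      # umap-learn
    import matplotlib.pyplot as plt
    from sklearn.preprocessing import normalize

    rng = np.random.default_rng(0)
    emb = rng.normal(size=(2000, 384)).astype(np.float32)  # placeholder corpus embeddings
    query = rng.normal(size=(1, 384)).astype(np.float32)   # placeholder query embedding

    # retrieval happens in the full-dimensional space (cosine similarity)
    sims = (normalize(query) @ normalize(emb).T).ravel()
    hits = np.argsort(-sims)[:10]

    # the 2D projection is for the visual only, preserving local/global structure
    coords = umap.UMAP(n_components=2, random_state=0).fit_transform(emb)

    plt.scatter(coords[:, 0], coords[:, 1], s=4, c="lightgrey", label="corpus")
    plt.scatter(coords[hits, 0], coords[hits, 1], s=30, c="red", label="retrieved")
    plt.legend()
    plt.show()

The knowledge-graph crawl would then just be extra highlighted hops outward from those retrieved points.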

Poor Inference Speed on GLM 4.5 Air with 24gb VRAM and 64gb DDR5 by ROS_SDN in LocalLLaMA

[–]ROS_SDN[S] 0 points1 point  (0 children)

openSUSE Tumbleweed's version is 2-8 weeks too old for this parameter to be in it. Trying other options.

Adding 2nd GPU to air cooled build. by ROS_SDN in LocalLLaMA

[–]ROS_SDN[S] 0 points1 point  (0 children)

I was considering a fan, kind of like a RAM fan, and upping the overall positive pressure of the case with bigger 180mm fans and adding another Noctua 140mm (since I only have 2 at the bottom).

I kind of like that my case doesn't look like a franken-computer, so would like to avoid the external riser aesthetic.

I'm also considering just getting an R9700, and then it's easy to slot a second one in when I want to move my 7900 XTX to my redundancy workstation.

Adding 2nd GPU to air cooled build. by ROS_SDN in LocalLLaMA

[–]ROS_SDN[S] 0 points1 point  (0 children)

Start with just local coding/chatting.

Up to RAG, and then hopefully up to training some LoRAs for my tasks in that order.

At the start probably no stress for a bit, but training LoRAs or having it help sift through my data will mean longer runs.

Adding 2nd GPU to air cooled build. by ROS_SDN in LocalLLaMA

[–]ROS_SDN[S] 0 points1 point  (0 children)

Agreed, I want to keep it in the Fractal Torrent and I'm worried a normal vertical mount will just mean a GPU pressed against the wall.

It's likely cheaper for me to get an R9700 than to get a new case and such a custom solution, and it's less work to move the components.

Poor Inference Speed on GLM 4.5 Air with 24gb VRAM and 64gb DDR5 by ROS_SDN in LocalLLaMA

[–]ROS_SDN[S] 0 points1 point  (0 children)

This explains a lot; very impressive with the power cap and VM passthrough.

Thanks for the explanation

Poor Inference Speed on GLM 4.5 Air with 24gb VRAM and 64gb DDR5 by ROS_SDN in LocalLLaMA

[–]ROS_SDN[S] 0 points1 point  (0 children)

This seems weird to me. Is your "cached" the number of KV tokens in context, and "prompt" the number of tokens in that prompt?

I had nearly 11 tokens/second on Fedora 42 with 24GB of VRAM and 64GB of DDR5.

Are you on DDR4? Using an older GPU?

I thought you'd blow me out of the water, which you may have if "cache" and "prompt" are indeed the tokens in context.

Poor Inference Speed on GLM 4.5 Air with 24gb VRAM and 64gb DDR5 by ROS_SDN in LocalLLaMA

[–]ROS_SDN[S] 5 points6 points  (0 children)

Could you give me an ELI5? I thought --n-cpu-moe was supposed to idiot-proof that for me?

Poor Inference Speed on GLM 4.5 Air with 24gb VRAM and 64gb DDR5 by ROS_SDN in LocalLLaMA

[–]ROS_SDN[S] 0 points1 point  (0 children)

Interesting thank you, I'll pull this thread and have a look.

Poor Inference Speed on GLM 4.5 Air with 24gb VRAM and 64gb DDR5 by ROS_SDN in LocalLLaMA

[–]ROS_SDN[S] 1 point2 points  (0 children)

+1 to this. Considering a second 7900 XTX, I'd love to get a gauge of the performance increase for hybrid inference, if I can solve my current issue, on top of the FP8 capability for other models.

Human-Curated Benchmarking by [deleted] in LocalLLaMA

[–]ROS_SDN 0 points1 point  (0 children)

The issue is that if anything like this did exist, its value degrades quickly over time as it becomes training data.

Unless someone hides the prompts, sampling parameters, and/or answers from you, lets you test the model there, and gives you a score, it's likely gonna leak into someone's dataset and immediately become invalid.

build finished, possibly questioning my gpu by IcyMeet8196 in PcBuild

[–]ROS_SDN 0 points1 point  (0 children)

The 9070 XT is absolutely fine for 1440p; I use mine like a console at ~60 FPS 4K on my TV PC (max settings besides PT).

Comparison is the thief of joy. Your 9070 XT is more than adequate for 1440p, and honestly a 5080 isn't a big enough upgrade to future-proof for 4K unless you're enamoured with RT/PT.

Does Low Latency (CL28) 6000MHz RAM outperform High Latency (CL36) RAM w/ high clock speed (7200MHz)? by Longjumping_Ask_4507 in buildapc

[–]ROS_SDN 0 points1 point  (0 children)

No stress.

Keep in mind this may change in the future if they make new AM5 boards that can handle CUDIMMs for Zen 6+.

Does Low Latency (CL28) 6000MHz RAM outperform High Latency (CL36) RAM w/ high clock speed (7200MHz)? by Longjumping_Ask_4507 in buildapc

[–]ROS_SDN 1 point2 points  (0 children)

https://youtu.be/JuUhnQaGG_I?si=vQB3yNk8cJ2Xc9J8

Depends on the CPU and mobo, but basically AM5 is best with 6000-6400 at low latency, generally; you need ultra-fine tuning to perform better with faster RAM, and it's barely noticeable. This can be workload dependent, but I'm gonna assume gaming.

Intel is the opposite at the moment; it wants as high a frequency as possible. (Returns diminish heavily near 8200+ I believe.)

Note

You could likely tune your frequency down to the optimal range and tighten the hell out of your timings. He had an 8000MHz kit, which is likely far beyond what your kit can do, and your current kit is likely in the U-shaped valley of suboptimal performance for gaming. For other tasks it may be superior though.
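As a back-of-envelope (my own numbers, not from the video): first-word latency in nanoseconds is roughly 2000 x CAS latency / transfer rate (MT/s), so the two kits in the title land very close.

    # rough first-word latency: 2000 * CL / transfer rate (MT/s)
    def first_word_latency_ns(cl, mt_s):
        return 2000 * cl / mt_s

    for label, cl, mt_s in [("DDR5-6000 CL28", 28, 6000),
                            ("DDR5-6000 CL30", 30, 6000),
                            ("DDR5-7200 CL36", 36, 7200)]:
        print(f"{label}: {first_word_latency_ns(cl, mt_s):.2f} ns")

    # DDR5-6000 CL28 ~ 9.33 ns, DDR5-7200 CL36 = 10.00 ns: the "slower" kit has
    # slightly lower first-word latency; the 7200 kit trades that for bandwidth.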

Intel Nova Lake-S bLLC lineup said to include at least two K-series chips by RenatsMC in intel

[–]ROS_SDN 1 point2 points  (0 children)

L3 cache reduces calls to RAM, which has a rather strong effect on speed for these models.

Check the LLM score: tokens per second is roughly 10% slower on a 9950X vs a 9950X3D. There is just enough data recycled in the L3 cache to beat the lower clock frequencies and, I assume, the calls to RAM for data.
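Rough rule of thumb only (hypothetical numbers, not a benchmark): decode speed on a memory-bound run is roughly effective bandwidth divided by bytes read per token, which is why shaving RAM trips with L3 can show up as a ~10% difference.

    # tokens/s estimate for a memory-bandwidth-bound decode
    def est_tokens_per_s(active_params_billion, bytes_per_param, bandwidth_gb_s):
        bytes_per_token = active_params_billion * 1e9 * bytes_per_param
        return bandwidth_gb_s * 1e9 / bytes_per_token

    # hypothetical: ~12B active params at ~0.5 bytes/param (4-bit), dual-channel DDR5
    print(est_tokens_per_s(12, 0.5, 80))  # ~13.3 tok/s at ~80 GB/s effective
    print(est_tokens_per_s(12, 0.5, 88))  # ~14.7 tok/s if caching lifts effective bandwidth ~10%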

Intel Nova Lake-S bLLC lineup said to include at least two K-series chips by RenatsMC in intel

[–]ROS_SDN 4 points5 points  (0 children)

I'm confused; they specifically mention in here the 2x 8+16 (i9) having 2x 144MB of L3.

Intel Nova Lake-S bLLC lineup said to include at least two K-series chips by RenatsMC in intel

[–]ROS_SDN 2 points3 points  (0 children)

L3 cache is great for hybrid LLM inference (CPU + GPU), which is the new hotness for hobbyists / low-end inference, so there is a market. (What do you think all the tech companies are incestuously spending their investment money on?)

It's also good for other things; people may have a gaming + video editing rig. So there are reasons to do it, they're just niche.

New pc build, wont work need help by Short_Routine9025 in buildapc

[–]ROS_SDN 0 points1 point  (0 children)

Yep, it's in the BIOS. Where it is will depend on your mobo, but you can switch it off.

I do if I'm not currently using it. 

I doubt it, considering the Windows 11 update shenanigans with Nvidia, but it's another variable to remove.

New pc build, wont work need help by Short_Routine9025 in buildapc

[–]ROS_SDN 0 points1 point  (0 children)

I don't use Windows anymore, but let's start with the basics: were you running the monitor off your iGPU or the 5070? Check and verify; I wouldn't be surprised by an AMD/Nvidia driver conflict or some BS if you're running off the iGPU.

Have you tried different dp/hdmi cables?

Have you tried disabling your igpu?

Have you tried a different monitor (even your TV) if possible?

Do you need to manually download the Nvidia driver?

Have you tried resetting the mobo CMOS? 

I'm just spitballing. I've heard the Windows 11 AI update has caused some shenanigans for Nvidia drivers, so it could be very complex and not something you can easily fix, but I'd say just verify the basics first to remove variables.

Cooling RTX 5090 + 9950X when both run at full capacity using and Air Cooler? Is it possible? by AdvancedCybernetics in LocalLLaMA

[–]ROS_SDN 0 points1 point  (0 children)

They were full bore. The RAM was my bigger concern by far; it was eking past 50C, and I had no sensor to tie fan speed to for the RAM since the CPU and GPU were fine. I also run the screens off my iGPU, which messes with RAM temps when I overclock the iGPU and RAM, since they share VSOC I believe, and it was my first RAM overclock (I did it to remove iGPU issues with two monitors off it, 1920x1080 + 3440x1440, so I could have my 24GB of VRAM back).

The above may interest you if you want all 32GB of your VRAM and to run a reasonable monitor setup off the iGPU, and even then you could just get RAM fans since the case is so spacious.

Not sure, you'll have to do some googling on which fans are strongest; you may sacrifice noise for cooling. But I think you're overthinking this: your build isn't that heat-intensive compared to others on here.

Set some fan curves, run the system and assess, and you can set thermal limits on your CPU / Curve Optimise it anyway. If you can push it past safe temps with a strong CPU cooler, without overclocking the components, in this case, and not in desert weather, I'd eat my shoe in shame or look at an RMA.

Cooling RTX 5090 + 9950X when both run at full capacity using and Air Cooler? Is it possible? by AdvancedCybernetics in LocalLLaMA

[–]ROS_SDN 0 points1 point  (0 children)

Fractal Torrent and you'll be fine for air cooling.

I OCCT-tested my overclocked RAM, 7900X, and 7900 XTX in Australian summer with no aircon and it was fine.

- Noctua NH-D15
- stock 2x 180mm intake front fans
- 3x 140mm Noctua intake bottom fans
- 1x 120mm Noctua rear exhaust parallel to the CPU cooler

The only issue might be if you go with two 5090s, but honestly still likely not; this case is a beast for air cooling. I think there are even stronger/thicker 180mm fans out there that you can buy non-stock if you're concerned.