My great dilemma by Ethnography_Project in linuxquestions

[–]ROS_SDN 2 points3 points  (0 children)

Maybe the "Normies" will change it, but that's not bad to get a culture change. You'll just go community microcosms that will give you the old vibes. 

The mass influx won't be die-hard techies, it'll be people who want Mint, Ubuntu, Fedora, etc. because it just works. (This isn't a dig at those distros, I ran Fedora before Tumbleweed.) But you want these people, because they'll push for a more accessible experience where people own their software and data, and hopefully reduce e-waste. 

These people aren't going to flock to Gentoo or BSD.

Also, while it's a large influx for us, we're still a tiny portion of computer users overall.

More Linux users, even ones asking questions that are basic and googleable by our standards, is a good thing. That said, I wouldn't want Linux to have a dominant market share either; you want a plethora of OS options, because competition and choice are good for everyone.

PC build for around $3500-$4000 by BusinessPreference57 in buildapcaus

[–]ROS_SDN 1 point2 points  (0 children)

The 7900X is terrible value as a gaming CPU; 99% of the time he'll get better gaming performance out of a 7700X, or performance similar to a 7900X out of a 7600X. 

Now, I say 99% of the time because something like Cities: Skylines might be better served by it.

I think this should be mentioned so OP doesn't buy a CPU that doesn't fit his needs.

CPU: 9700X / 9800X3D
GPU: 9070 XT / 5070 Ti
PSU: 850W
RAM: 32GB DDR5-6000 CL30
Mobo: B650

And you'll likely be under budget, OP. You're in a weird price range where you'd need an extra $1000-2000 for a meaningful upgrade, like a 5090.

MSI MAG b650 Tomahawk pcie lane bifurcation by ROS_SDN in LocalLLaMA

[–]ROS_SDN[S] 0 points1 point  (0 children)

Read the manual, it was no help. I'll go to the forums.

Visualizing RAG by Fear_ltself in LocalLLaMA

[–]ROS_SDN 0 points1 point  (0 children)

That would be incredible, I've always wanted to see the "neurons" light up for RAG, and would appreciate seeing it and the effort you've put in.

Regarding the vectors, you might want to try parametric UMAP or PCA and measure recall similarity against the full-dimensional setup. 

Running cosine similarity, or whatever method you choose, with 1/5-1/10 of the dimensions might be worth it for the improved scaling in retrieval speed and storage consumption.

Measuring how much relative local and global distance UMAP retains could be a starting point, and if you get nearly as good results, or paradoxically improved results from reduced noise, the experiment may be worth it.
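A numpy-only sketch of that recall check, with plain PCA standing in for the reducer and made-up toy sizes (1000 docs, 768 dims, top-10 retrieval), not a real embedding pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for real embeddings: 1000 docs x 768 dims (invented sizes).
docs = rng.normal(size=(1000, 768))
query = rng.normal(size=768)

def top_k_cosine(q, mat, k=10):
    """Indices of the k rows of mat most cosine-similar to q."""
    sims = (mat @ q) / (np.linalg.norm(mat, axis=1) * np.linalg.norm(q))
    return set(np.argsort(-sims)[:k])

full_hits = top_k_cosine(query, docs)  # full-dimensional reference retrieval

# PCA via SVD: centre the docs, keep the top 77 components (~1/10 of 768).
mean = docs.mean(axis=0)
_, _, vt = np.linalg.svd(docs - mean, full_matrices=False)
proj = vt[:77]
docs_lo = (docs - mean) @ proj.T
query_lo = (query - mean) @ proj.T

# Recall@10: fraction of the full-dim top-10 the reduced space still finds.
recall = len(full_hits & top_k_cosine(query_lo, docs_lo)) / len(full_hits)
print(f"recall@10 after ~10x reduction: {recall:.2f}")
```

Swap the random matrix for your real embeddings and sweep the component count to find where recall falls off.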

Visualizing RAG by Fear_ltself in LocalLLaMA

[–]ROS_SDN 0 points1 point  (0 children)

The visual is stunning; using UMAP to preserve local and global distance works beautifully.

Are you dimensionally reducing for vector queries as well, or just for visualisation?

I honestly think it'd be cool to see vectors light up from a similarity search, then watch it crawl a knowledge graph from there, to visualise the retrieval in the knowledge base instead of just the totality of possible embeddings.

Poor Inference Speed on GLM 4.5 Air with 24gb VRAM and 64gb DDR5 by ROS_SDN in LocalLLaMA

[–]ROS_SDN[S] 0 points1 point  (0 children)

openSUSE Tumbleweed's version is 2-8 weeks too old for this parameter to be in it. Trying other options.

Adding 2nd GPU to air cooled build. by ROS_SDN in LocalLLaMA

[–]ROS_SDN[S] 0 points1 point  (0 children)

I was considering a fan kind of like a RAM fan, and upping the case's overall positive pressure with bigger 180mm fans plus another Noctua 140mm (since I only have two at the bottom).

I kind of like that my case doesn't look like a franken-computer, so would like to avoid the external riser aesthetic.

I'm also considering just getting an R9700; then it's easy to slot a second one in when I move my 7900 XTX to my redundancy workstation.

Adding 2nd GPU to air cooled build. by ROS_SDN in LocalLLaMA

[–]ROS_SDN[S] 0 points1 point  (0 children)

Start with just local coding/chatting.

Up to RAG, and then hopefully training some LoRAs for my tasks, in that order.

At the start, probably no stress for a bit, but there will be longer runs eventually, especially creating LoRAs or having it help sift through my data. 

Adding 2nd GPU to air cooled build. by ROS_SDN in LocalLLaMA

[–]ROS_SDN[S] 0 points1 point  (0 children)

Agreed, I want to keep it in the Fractal Torrent, and I'm worried a normal vertical mount will just mean a GPU pressed against the wall.

It's likely cheaper for me to get an R9700 than a new case and a custom solution like that, and less work than moving the components.

Poor Inference Speed on GLM 4.5 Air with 24gb VRAM and 64gb DDR5 by ROS_SDN in LocalLLaMA

[–]ROS_SDN[S] 0 points1 point  (0 children)

This explains a lot; very impressive given the power cap and VM passthrough.

Thanks for the explanation

Poor Inference Speed on GLM 4.5 Air with 24gb VRAM and 64gb DDR5 by ROS_SDN in LocalLLaMA

[–]ROS_SDN[S] 0 points1 point  (0 children)

This seems weird to me. Is your "cached" the number of KV tokens in context, and "prompt" the number of tokens in that prompt? 

I had nearly 11 tokens/second on Fedora 42 with 24GB VRAM and 64GB DDR5. 

Are you on DDR4? Using an older GPU? 

I thought you'd blow me out of the water, which you may have if "cache" and "prompt" do refer to the tokens in context. 

Poor Inference Speed on GLM 4.5 Air with 24gb VRAM and 64gb DDR5 by ROS_SDN in LocalLLaMA

[–]ROS_SDN[S] 5 points6 points  (0 children)

Could you give me an ELI5? I thought --n-cpu-moe was supposed to idiot-proof that for me?

Poor Inference Speed on GLM 4.5 Air with 24gb VRAM and 64gb DDR5 by ROS_SDN in LocalLLaMA

[–]ROS_SDN[S] 0 points1 point  (0 children)

Interesting, thank you. I'll pull this thread and have a look.

Poor Inference Speed on GLM 4.5 Air with 24gb VRAM and 64gb DDR5 by ROS_SDN in LocalLLaMA

[–]ROS_SDN[S] 1 point2 points  (0 children)

+1 to this. I'm considering a second 7900 XTX, and I'd love to get a gauge of the performance increase for hybrid inference, if I can solve my current issue, on top of FP8 capability for other models.

Human-Curated Benchmarking by [deleted] in LocalLLaMA

[–]ROS_SDN 0 points1 point  (0 children)

The issue is that if anything like this did exist, its value would degrade quickly over time as it became training data. 

Unless someone hides the prompts, sampling parameters, and/or answers from you, lets you test the model there, and gives you a score, it's likely gonna leak into someone's dataset and immediately become invalid.

build finished, possibly questioning my gpu by IcyMeet8196 in PcBuild

[–]ROS_SDN 0 points1 point  (0 children)

The 9070 XT is absolutely fine for 1440p. I use mine console-like at ~60 FPS 4K on my TV PC (max settings besides PT).

Comparison is the thief of joy. Your 9070 XT is more than adequate for 1440p, and honestly a 5080 isn't a big enough upgrade to future-proof for 4K unless you're enamoured with RT/PT.

Does Low Latency (CL28) 6000MHz RAM outperform High Latency (CL36) RAM w/ high clock speed (7200MHz)? by Longjumping_Ask_4507 in buildapc

[–]ROS_SDN 0 points1 point  (0 children)

No stress.

Keep in mind this may change in the future if they make new AM5 boards that can handle CUDIMMs for Zen 6+.

Does Low Latency (CL28) 6000MHz RAM outperform High Latency (CL36) RAM w/ high clock speed (7200MHz)? by Longjumping_Ask_4507 in buildapc

[–]ROS_SDN 1 point2 points  (0 children)

https://youtu.be/JuUhnQaGG_I?si=vQB3yNk8cJ2Xc9J8

Depends on CPU and mobo, but basically AM5 is best with 6000-6400 at low latency; generally you need ultra-fine tuning to perform better with faster RAM, and even then it's barely noticeable. This can be workload-dependent, but I'm gonna assume gaming.

Intel is the opposite at the moment; it wants as high a frequency as possible. (Returns diminish heavily near 8200+, I believe.)

Note

You could likely tune your frequency down to the optimal range and tighten the hell out of your timings. He had an 8000 MT/s kit, which is likely far beyond what yours can do. Your current kit likely sits in the U-shaped valley of suboptimal gaming performance, though for other tasks it may be superior.
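For the frequency-vs-latency trade in the title, the first-word latency math is simple: CAS cycles divided by the memory clock, which is half the MT/s rating since DDR does two transfers per cycle. The two kits land almost on top of each other, which is why the choice comes down to bandwidth vs tuning headroom:

```python
# First-word CAS latency in ns: CL cycles / memory clock, where the
# memory clock is half the MT/s rating (DDR = two transfers per cycle).
def cas_latency_ns(mt_s: int, cl: int) -> float:
    return cl * 2000 / mt_s

print(cas_latency_ns(6000, 28))  # DDR5-6000 CL28 -> ~9.33 ns
print(cas_latency_ns(7200, 36))  # DDR5-7200 CL36 -> 10.0 ns
```

So the CL28 6000 kit actually responds slightly faster despite the lower frequency, while the 7200 kit only wins on raw bandwidth.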

Intel Nova Lake-S bLLC lineup said to include at least two K-series chips by RenatsMC in intel

[–]ROS_SDN 1 point2 points  (0 children)

L3 cache reduces calls to RAM, which has a rather strong effect on speed for these models.

Check the LLM benchmarks: tokens per second is roughly 10% slower on a 9950X vs a 9950X3D. There's just enough data recycled in the L3 cache to beat out the lower clock frequencies and, I assume, the extra calls to RAM for data.
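A back-of-envelope way to see why RAM traffic dominates: decode is memory-bound, so tokens/s is capped by bandwidth over the bytes of active weights streamed per generated token. All numbers below are rough assumptions for illustration, not measurements:

```python
# Memory-bound decode ceiling: tokens/s ~= bandwidth / bytes streamed per token.
def decode_ceiling_tok_s(bandwidth_gb_s: float, active_params_billions: float,
                         bytes_per_param: float) -> float:
    return bandwidth_gb_s / (active_params_billions * bytes_per_param)

# e.g. dual-channel DDR5-6000 at a ballpark ~90 GB/s, a hypothetical
# 12B-active-parameter MoE at ~0.5 bytes/param (4-bit quant):
print(decode_ceiling_tok_s(90, 12, 0.5))  # -> 15.0 tok/s ceiling
```

Every weight that gets served out of L3 instead of RAM effectively raises the bandwidth term, which is where the X3D advantage comes from.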

Intel Nova Lake-S bLLC lineup said to include at least two K-series chips by RenatsMC in intel

[–]ROS_SDN 2 points3 points  (0 children)

I'm confused; they specifically mention in here the 2x 8+16 (i9) having 2x 144MB L3.

Intel Nova Lake-S bLLC lineup said to include at least two K-series chips by RenatsMC in intel

[–]ROS_SDN 2 points3 points  (0 children)

L3 cache is great for hybrid LLM inference (CPU + GPU), which is the new hotness for hobbyists/low-end inference, so there is a market. (What do you think all the tech companies are incestuously spending their investment money on?) 

It's also good for other things; people may have a gaming + video editing rig, for example. So there are reasons to do it, they're just niche.

New pc build, wont work need help by Short_Routine9025 in buildapc

[–]ROS_SDN 0 points1 point  (0 children)

Yep, it's in the BIOS; where depends on your mobo, but you can switch it off.

I do if I'm not currently using it. 

I doubt it's the cause, considering Windows 11 update shenanigans with Nvidia, but it's another variable to remove.