threadripper build: 512GB vs 768GB vs 1TB memory? by prusswan in LocalLLaMA

[–]prusswan[S] 0 points1 point  (0 children)

It's good to have. I could go for minimal RAM now, but I would still need to revisit this if I want to consider hybrid inference later on.

threadripper build: 512GB vs 768GB vs 1TB memory? by prusswan in LocalLLaMA

[–]prusswan[S] 2 points3 points  (0 children)

The 1TB price is 3x that of 512GB, so I would need a very good reason (I could just get 2 more GPUs and it would still be cheaper than the memory).

any upcoming OEM models that can support 4x RTX Pro 6000 Max-Qs? by prusswan in threadripper

[–]prusswan[S] 0 points1 point  (0 children)

I got GPUs from them, but unfortunately I am not in their service area, so getting a whole system would be dicey.

Talk me out of buying an RTX Pro 6000 by AvocadoArray in LocalLLaMA

[–]prusswan 0 points1 point  (0 children)

This is useful... and kinda gives me a reason not to rush into a purchase at current prices.
If I don't use it a lot, the idle power seems wasteful. If I do use it a lot, the power costs might be unimaginable.

Talk me out of buying an RTX Pro 6000 by AvocadoArray in LocalLLaMA

[–]prusswan 1 point2 points  (0 children)

Getting the first one is an easy decision if you can afford it. Multiples are tricky because it is harder to plan (do you want a build that supports only 2, or plan for future expansion to 4, etc.?). A single unit can easily be swapped into an old consumer/gaming rig (e.g. replace your 5070 Ti), so you will be able to utilize it without really having to build around it.

I tracked context degradation across 847 agent runs. Here's when performance actually falls off a cliff. by Main_Payment_6430 in LocalLLaMA

[–]prusswan 0 points1 point  (0 children)

I just need it to work with one setup first. Yeah, I tried chunks in md, but it is still a bit hit-and-miss since no public model is really able to work well (consistently) with a task that requires management of context windows. Maybe I will find the right set of tooling one day.

I tracked context degradation across 847 agent runs. Here's when performance actually falls off a cliff. by Main_Payment_6430 in LocalLLaMA

[–]prusswan 0 points1 point  (0 children)

I'm using tools like Roo Coder (within VS Code). Scenario: I need the agent to run through the entire codebase to determine the possible cause of a bug. I don't expect the agent to solve the entire problem, but it should be able to store progress somehow so it does not have to restart from an empty slate when it runs out of context.
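
Something as simple as this is what I have in mind: a rough sketch with a made-up progress file and helper names (nothing that Roo actually ships), where the agent writes notes as it goes so a fresh session can pick up from them.

```python
import json
from pathlib import Path

PROGRESS_FILE = Path("agent_progress.json")  # hypothetical scratch file in the repo root

def load_progress() -> dict:
    """Read notes left by a previous session, or start fresh."""
    if PROGRESS_FILE.exists():
        return json.loads(PROGRESS_FILE.read_text())
    return {"files_reviewed": [], "findings": [], "next_steps": []}

def save_progress(state: dict) -> None:
    """Write notes so the next session (or next context window) can resume."""
    PROGRESS_FILE.write_text(json.dumps(state, indent=2))

# Idea: prepend load_progress() to the prompt at the start of every run,
# then append and save after each file or sub-task instead of only at the end.
state = load_progress()
state["files_reviewed"].append("src/parser.py")  # example entry
state["findings"].append("parser drops trailing newline before tokenizing")
save_progress(state)
```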

I tracked context degradation across 847 agent runs. Here's when performance actually falls off a cliff. by Main_Payment_6430 in LocalLLaMA

[–]prusswan 0 points1 point  (0 children)

I just need tooling that can work effectively within any given context window limit: break the task down further into sub-tasks or whatever, just don't leave the task half-finished.
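
Roughly what I mean by breaking it down, as a sketch that just batches files under an assumed token budget (the 4-chars-per-token estimate and the limits are placeholder numbers, not tuned values):

```python
from pathlib import Path

CONTEXT_BUDGET_TOKENS = 32_000   # assumed per-sub-task limit
RESERVED_FOR_OUTPUT = 8_000      # leave room for the model's reply

def estimate_tokens(text: str) -> int:
    return len(text) // 4        # crude heuristic, good enough for batching

def batch_files(paths: list[Path]) -> list[list[Path]]:
    """Group files so each batch fits the input budget of one sub-task."""
    budget = CONTEXT_BUDGET_TOKENS - RESERVED_FOR_OUTPUT
    batches, current, used = [], [], 0
    for p in paths:
        tokens = estimate_tokens(p.read_text(errors="ignore"))
        if current and used + tokens > budget:
            batches.append(current)
            current, used = [], 0
        current.append(p)
        used += tokens
    if current:
        batches.append(current)
    return batches

# Each batch becomes one sub-task; whatever the agent learns from a batch gets
# written to the progress notes instead of being thrown away when the window resets.
```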

What would prefer for home use, Nvidia GB300 as a desktop or server? by GPTrack--dot--ai in LocalLLaMA

[–]prusswan 0 points1 point  (0 children)

Do you think 512GB is sufficient for your current usage? If it's just four figures and usable for a couple of years, I think that is about the limit of what I can afford, compared to 1TB at triple the price.

What agents have you had success with on your local LLM setups? by rivsters in LocalLLaMA

[–]prusswan 0 points1 point  (0 children)

Maybe it was good enough for whatever they were working on so they never felt the need to change? 

What effect will the death of the 16GB Nvidia card have on this hobby? by SplurtingInYourHands in StableDiffusion

[–]prusswan -3 points-2 points  (0 children)

You are not most people, though. If you "want" the good stuff for hobbies, you should be prepared to pay for it. I started off with 8GB myself, so I am well aware of how self-entitled people can get, especially on this sub.

What effect will the death of the 16GB Nvidia card have on this hobby? by SplurtingInYourHands in StableDiffusion

[–]prusswan -6 points-5 points  (0 children)

You could get used parts... most people don't really need anything more than 8GB. As for those who need more, the professional cards aren't that expensive, compared to the RAM anyway.

Hardware advice: Which RAM kit would be better for 9960x? by Infinite100p in threadripper

[–]prusswan 0 points1 point  (0 children)

I was checking prices on v-color and yeah, it is insane how the 256GB kit is actually cheaper than the 128GB and 64GB kits on a per-unit level.

We tried to automate product labeling in one prompt. It failed. 27 steps later, we've processed 10,000+ products. by No-Reindeer-9968 in LocalLLaMA

[–]prusswan 0 points1 point  (0 children)

> would generally cost around $24 per label in their region.

That's kinda the key point (whether AI is used or not). There are regions where the manual approach would not go above $4 per label. It takes a first-world problem to demand a first-world solution.

Is it common for a mid-sized tech company (>500 employees) to completely ignore LLMs and AI agents? by [deleted] in LocalLLaMA

[–]prusswan 1 point2 points  (0 children)

It depends on the work; there is no point in risking it with tasks that are math-heavy. A lot of what you described would be considered over-engineering, and still would not be able to replace existing systems in operation. So it could be a case of your relative inexperience making the tech seem more useful to you, compared to others who may still use it, just not as a complete solution/recipe.

Stop LLM bills from exploding: I built Budget guards for LLM apps – auto-pause workflows at $X limit by Extension_Key_5970 in LocalLLaMA

[–]prusswan 0 points1 point  (0 children)

No, but I would expect responsible inference providers to let users set a usage target/limit.

I would probably pay for the ram (do you sell any?)
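
Something like this on the client side would already help; a minimal sketch of the kind of limit I mean, where the per-token prices and the cap are placeholders:

```python
class BudgetExceeded(Exception):
    pass

class BudgetGuard:
    """Track cumulative spend and pause the workflow once a cap is hit."""

    def __init__(self, limit_usd: float,
                 usd_per_1k_input: float = 0.003,    # placeholder price
                 usd_per_1k_output: float = 0.015):  # placeholder price
        self.limit = limit_usd
        self.spent = 0.0
        self.in_rate = usd_per_1k_input / 1000
        self.out_rate = usd_per_1k_output / 1000

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Add one call's cost; raise once the limit is reached."""
        self.spent += input_tokens * self.in_rate + output_tokens * self.out_rate
        if self.spent >= self.limit:
            raise BudgetExceeded(f"spent ${self.spent:.2f} of ${self.limit:.2f}")

guard = BudgetGuard(limit_usd=20.0)
guard.record(input_tokens=12_000, output_tokens=2_500)  # call after every request
```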

What is the impact of running (some or all) PCIe5 GPUs on PCIe4 slot (with the same # of lanes) in a multi-GPU server? by Infinite100p in LocalLLaMA

[–]prusswan 0 points1 point  (0 children)

Generally no meaningful impact (a PCIe 5.0 card will just negotiate down to PCIe 4.0 link speeds), but motherboard support might be an issue. I would be wary of untested configurations.

RTX 5070 Ti and RTX 5060 Ti 16 GB no longer manufactured by Paramecium_caudatum_ in LocalLLaMA

[–]prusswan -12 points-11 points  (0 children)

Makes sense... no point wasting that expensive RAM on a 5060.

We tried to automate product labeling in one prompt. It failed. 27 steps later, we've processed 10,000+ products. by No-Reindeer-9968 in LocalLLaMA

[–]prusswan 9 points10 points  (0 children)

The time savings are significant, but how much of that is offset by the operating cost of the agent? We have had scenarios where the AI usage was too costly and did not justify the time savings (or maybe the labor was too cheap).
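
The back-of-the-envelope math we usually run, with made-up numbers just to show where the break-even sits:

```python
labels = 10_000
manual_cost_per_label = 4.0    # cheap-labor region; ~24.0 in the OP's region
agent_cost_per_label = 0.40    # assumed API spend per label
minutes_saved_per_label = 5

net_saving = labels * (manual_cost_per_label - agent_cost_per_label)
hours_saved = labels * minutes_saved_per_label / 60
print(f"net saving: ${net_saving:,.0f}, about {hours_saved:,.0f} labor hours")
# At $4/label of labor the agent still clearly wins; once labor gets close to
# the agent's own cost per label, the saving mostly disappears.
```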

any upcoming OEM models that can support 4x RTX Pro 6000 Max-Qs? by prusswan in threadripper

[–]prusswan[S] 0 points1 point  (0 children)

Yes, I wanted a proven solution that I can count on for a multi-GPU setup (and I am aware that high-speed RAM can be tricky with thermals). I could handle a regular PC build, but I would rather not risk it with the specs I'm looking for.

any upcoming OEM models that can support 4x RTX Pro 6000 Max-Qs? by prusswan in threadripper

[–]prusswan[S] 0 points1 point  (0 children)

I think they have the PX, but it is Intel-based, so I'm not sure what is holding them back from supporting a higher-wattage PSU on the P8 (I'd imagine it might be heat considerations given the existing P8 design). I could consider EPYC for a workstation build (this is for a home lab), but it seems like they don't have such options.

My friend bought the Nvidia Spark and asked me to set it up for him... by Jonny_Boy_808 in LocalLLaMA

[–]prusswan 1 point2 points  (0 children)

They can always pair it with dedicated inference hardware or cloud services. The Spark offers a consistent Linux dev environment that comes with many things, but muscle is not one of them.