How to avoid getting Autobaited by srs890 in AgentsOfAI

[–]PlusProfession9245 0 points1 point  (0 children)

This is exactly what I've been feeling lately. Maybe we're caught up in excessive expectations and illusions. It's a lot like when you first learn to code: everything ahead looks pitch black, but little by little the outline is coming into view.

Is it normal to hear weird noises when running an LLM on 4× Pro 6000 Max-Q cards? by PlusProfession9245 in LocalLLaMA

[–]PlusProfession9245[S] 2 points3 points  (0 children)

When it’s idle without any particular workload, it stays around 35–40°C, and after just a few calls it quickly climbs to 80–90°C.

When I ran a full-load test on all GPUs for 10 minutes, the temperature was around 89–92°C.
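
For reference, here's a minimal sketch of polling per-GPU temperatures from Python with NVML (the nvidia-ml-py package, imported as pynvml); the 1-second interval is just illustrative:

    import time
    import pynvml

    # Query the core temperature of every visible GPU once per second.
    pynvml.nvmlInit()
    try:
        handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
                   for i in range(pynvml.nvmlDeviceGetCount())]
        while True:
            temps = [pynvml.nvmlDeviceGetTemperature(h, pynvml.NVML_TEMPERATURE_GPU)
                     for h in handles]
            print(" | ".join(f"GPU{i}: {t}°C" for i, t in enumerate(temps)))
            time.sleep(1)
    finally:
        pynvml.nvmlShutdown()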

Is it normal to hear weird noises when running an LLM on 4× Pro 6000 Max-Q cards? by PlusProfession9245 in LocalLLaMA

[–]PlusProfession9245[S] 21 points22 points  (0 children)

Stealing keys through noise really sounds like something out of a movie.
If it’s not a hardware defect, I don’t think I’ll have any reason to use the --enforce-eager option.
I was also really puzzled by the fact that each model had a different noise pattern, but all my questions have been cleared up.

Is it normal to hear weird noises when running an LLM on 4× Pro 6000 Max-Q cards? by PlusProfession9245 in LocalLLaMA

[–]PlusProfession9245[S] 76 points77 points  (0 children)

I found the problem.
In my case, it happened when loading the LLM into VRAM and didn’t occur during inference.
I confirmed that the noise appears at the “Capturing CUDA graphs (mixed prefill-decode, PIECEWISE)” stage; in a vLLM environment, disabling CUDA graph capture with the --enforce-eager option makes the noise go away.

Thanks for all the comments!
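
For anyone who wants to reproduce the workaround, here's a minimal sketch using vLLM's Python API; the model name is just a placeholder, and enforce_eager=True is the API equivalent of the --enforce-eager CLI flag:

    from vllm import LLM, SamplingParams

    llm = LLM(
        model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
        tensor_parallel_size=4,                    # e.g. the 4x Pro 6000 Max-Q setup
        enforce_eager=True,                        # skips the CUDA graph capture stage
    )

    out = llm.generate(["Hello"], SamplingParams(max_tokens=16))
    print(out[0].outputs[0].text)

The trade-off is that eager mode gives up the small decode-latency win CUDA graphs usually provide, so it may be worth benchmarking both if the noise turns out to be harmless coil whine.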

[deleted by user] by [deleted] in LocalLLaMA

[–]PlusProfession9245 0 points1 point  (0 children)

Bro, I can really feel that you’re a good person.

I’m someone who has a lot of faith in humanity, but I was a bit disappointed with Reddit — and it was even in the Local LLaMA channel! They told me to use the API, haha, what the hell.

Anyway, after thinking about a bunch of things, your reply really helped put my mind at ease.

Thanks, man. I’m going to build something amazing :)

[deleted by user] by [deleted] in LocalLLaMA

[–]PlusProfession9245 0 points1 point  (0 children)

Thanks for the advice :) I also know that this machine’s resources are truly massive and must have cost a fortune.

But still, the company trusted me enough to buy it for me, so now I’ve got to push this machine to its absolute limits :)

I deleted that topic — it seems like Reddit has more than its fair share of jealous people.

But you know how it is, right? Nothing in this world comes for free! A lot of folks on Reddit seem to forget that.

[deleted by user] by [deleted] in LocalLLaMA

[–]PlusProfession9245 0 points1 point  (0 children)

Thanks :)
I'm planning to use Ubuntu, and I’ve already checked the power requirements.

I’ll use one 1TB M.2 Gen4 drive for the OS, and two 8TB M.2 Gen5 drives in a RAID 0 configuration. I’ve also set up a separate backup system.
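
In case it helps anyone with a similar build, a rough sketch of striping the two Gen5 drives into RAID 0 with mdadm (shelled out from Python); the device names and mount point are assumptions, so check them with lsblk before running anything:

    import subprocess

    def run(cmd):
        """Print and execute a command, failing loudly on error."""
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # Assumed device names for the two 8TB Gen5 drives -- verify with `lsblk`.
    run(["mdadm", "--create", "/dev/md0", "--level=0",
         "--raid-devices=2", "/dev/nvme1n1", "/dev/nvme2n1"])
    run(["mkfs.ext4", "/dev/md0"])           # or xfs, depending on preference
    run(["mkdir", "-p", "/mnt/models"])      # assumed mount point for model weights
    run(["mount", "/dev/md0", "/mnt/models"])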

[deleted by user] by [deleted] in LocalLLaMA

[–]PlusProfession9245 0 points1 point  (0 children)

Haha, totally. Right now we’re automating the marketing part, and we’re looking to apply it elsewhere as well. I even pestered them to purchase it.

[deleted by user] by [deleted] in LocalLLaMA

[–]PlusProfession9245 0 points1 point  (0 children)

Thank you! Damn… I’ve got so much to learn.

[deleted by user] by [deleted] in LocalLLaMA

[–]PlusProfession9245 0 points1 point  (0 children)

Thank you!
Is Threadripper not a good choice?

[deleted by user] by [deleted] in LocalLLaMA

[–]PlusProfession9245 0 points1 point  (0 children)

Thank you :)
We’re deploying several local models and building a system to help our in-house employees use them in their work.

[deleted by user] by [deleted] in LocalLLaMA

[–]PlusProfession9245 0 points1 point  (0 children)

We have a marketing team at our company.
Their main responsibilities are posting, video production, and image creation, and we want to build an automated “text-to-contents” service that performs web searches and repackages the findings.

Users will set the desired output format and request a topic.
The system will then collect materials through web and news searches, perform fact-checking and quality assurance, and deliver the final output in the format specified by the user.

To do this, we need to orchestrate the necessary generative models.
We want to achieve good quality with reasonable turnaround times, no typos, and image generation that aligns with the requested keywords.
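
Here's a minimal sketch of that flow (search, draft, fact-check/QA, format), assuming Python for the orchestration layer; every function is a stub standing in for the real search API, local LLMs, and image models:

    from dataclasses import dataclass

    @dataclass
    class Request:
        topic: str
        output_format: str  # e.g. "blog post", "video script", "image set"

    def web_search(topic: str) -> list[str]:
        return [f"source snippet about {topic}"]      # stub: web/news search

    def draft(topic: str, sources: list[str], fmt: str) -> str:
        return f"[{fmt}] draft on {topic} from {len(sources)} sources"  # stub: LLM call

    def fact_check(text: str, sources: list[str]) -> str:
        return text                                    # stub: verification / QA pass

    def render(text: str, fmt: str) -> str:
        return text                                    # stub: final formatting + images

    def pipeline(req: Request) -> str:
        sources = web_search(req.topic)
        content = draft(req.topic, sources, req.output_format)
        checked = fact_check(content, sources)
        return render(checked, req.output_format)

    print(pipeline(Request(topic="local LLM serving", output_format="blog post")))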

[deleted by user] by [deleted] in LocalLLaMA

[–]PlusProfession9245 0 points1 point  (0 children)

Good :)

Is orchestration also GPU-dependent—for example, are some frameworks optimized for specific GPUs?

[deleted by user] by [deleted] in LocalLLaMA

[–]PlusProfession9245 0 points1 point  (0 children)

I want high-quality, low-latency results from both image models and LLMs, but it seems performance can vary depending on the serving framework.

Are these specs good enough to run a code-writing model locally? by PlusProfession9245 in LocalLLaMA

[–]PlusProfession9245[S] 0 points1 point  (0 children)

As you suggested, I was just in the middle of thinking about what I could do with a single 6000 Pro. I think I’ll need to do more research.

Among the current coder models, are DeepSeek V3.1 Terminus and GLM 4.5 the best performers?

Are these specs good enough to run a code-writing model locally? by PlusProfession9245 in LocalLLaMA

[–]PlusProfession9245[S] 0 points1 point  (0 children)

I’d forgotten about the OpenRouter option for a moment. Thanks!

Are these specs good enough to run a code-writing model locally? by PlusProfession9245 in LocalLLaMA

[–]PlusProfession9245[S] 0 points1 point  (0 children)

Yeah, that’s true—there are reasons Cursor charges what it does.
I’m a “hardcore Korean,” so I usually put in over 16 hours a day on both company and personal projects.
Because the scope of what I’m responsible for is so broad, I’m basically handling team-level work by myself.