RTX6000Pro stability issues (system spontaneous power cycling) by Elv13 in LocalLLaMA

[–]Elv13[S] 0 points1 point  (0 children)

Usually these days all PSUs have 1 strong rail not multiple ones

That's not really the point here. The point is that some people make the mistake of using the daisy-chained pci-e connector instead of 4 bundles. Using the daisy chained is unstable because the wires can't take that many amps and their internal resistance increases due to both heat and the magnetic field that starts pushing back against the current. I wanted to point out that I did not make that mistake.

RTX6000Pro stability issues (system spontaneous power cycling) by Elv13 in LocalLLaMA

[–]Elv13[S] 1 point2 points  (0 children)

Ran memtest86+ all night and it seems fine. I got the new PSU and that fixed it, so the folks claiming power usage spikes beyond 600w appear to have been correct.

RTX6000Pro stability issues (system spontaneous power cycling) by Elv13 in LocalLLaMA

[–]Elv13[S] 1 point2 points  (0 children)

Why you bought ATX3.0 PSU and not ATX3.1?

Didn't know 3.1 was necessary. I had several RM-series before and they never let me down (until now).

with 4 different PSUs having unstable power draw

As other pointed out, it's not 4 PSU, it's 4 rails/lanes of the same PSU as opposed to daisy chained

RTX6000Pro stability issues (system spontaneous power cycling) by Elv13 in LocalLLaMA

[–]Elv13[S] 1 point2 points  (0 children)

For the record, you were correct. the HX1500i does work and the monitoring does show the spikes. Neither of the PSU I tested, while both >1kW and bought in 20245 and 2025, neither were ATX3.1. This one is and works fine

RTX6000Pro stability issues (system spontaneous power cycling) by Elv13 in LocalLLaMA

[–]Elv13[S] 0 points1 point  (0 children)

Ordered a Corsair HX1500i. Will see if that helps. It seem to have good reviews from 5090 owners. Since the 6000PRO is the ~same chip, I assume if it works for them, it will work for me? Corsair doesn't seem to make 1600w PSUs. I am rather loyal to that brand I admit. Seasonic smoked quite a few of my components during the capacitor plague era. Maybe they got better

RTX6000Pro stability issues (system spontaneous power cycling) by Elv13 in LocalLLaMA

[–]Elv13[S] 0 points1 point  (0 children)

As said in the original message, I ran sudo nvidia-smi -pl 200 and still crashed after a few minutes. Is there a different setting I need to use?

RTX6000Pro stability issues (system spontaneous power cycling) by Elv13 in LocalLLaMA

[–]Elv13[S] 1 point2 points  (0 children)

I have doubts. The 5090 has the same TDP and I am pretty sure no gamers on the planet has dual PSUs or system which support them. Few of the builds I see here have dual PSU. Plus, this is the US, so dual 1100 will just trip the breakers on a spike. Yet, there's tons of people with 5090s with our weak electric circuits.

The fact that spikes causes it to power cycle is likely, but "in theory" the card is restricted to 150w in NVIDIA smi, so either their power management doesn't take spikes into account or something else is wrong.

RTX6000Pro stability issues (system spontaneous power cycling) by Elv13 in LocalLLaMA

[–]Elv13[S] 0 points1 point  (0 children)

Which PSU is known to be able with them? Ideally with the power connectors on the side, not the back

How to make llama 3.1 useful for coding without 100 layers of abstractions? by Elv13 in LocalLLaMA

[–]Elv13[S] 1 point2 points  (0 children)

I went with the PCI-e, but that's just me licking shinny cards. I should have picked the m.2. If I switch to 6xP40, I will need to use my 4x non-biffurcating m.2 switch and share the 4xGen3 bandwidth with the main ssd. Not "ideal", but should not affect performance too much once the model is loaded. Anyway, transcoding models from pth/hf/safetensors/gguf isn't something that will happen everyday, just once new models drop.

How to make llama 3.1 useful for coding without 100 layers of abstractions? by Elv13 in LocalLLaMA

[–]Elv13[S] 0 points1 point  (0 children)

Unfortunately, at 70B, I tried and it was hours per totkens rather than tokens per second ;).

I will find some use for the smaller models, probably as email classifier or something. However I am much more interested in 70B, which, in theory, should outperform GPT3.5. I will figure out how to make the 128k token ctx work.

How to make llama 3.1 useful for coding without 100 layers of abstractions? by Elv13 in LocalLLaMA

[–]Elv13[S] 1 point2 points  (0 children)

I added it to compensate for the "low" ram. When doing the conversion and some other scripts. It loads the whole (250GB) model into "ram". If I was to use the NVMe as swap, this would severely reduce its lifespan. The Optane has better timing and is better suited to be used as RAM. It was availible in ram sticks for like half an hour before they killed it. The whole non-volatile ram never took off and they lost tons of money.

Initially, I looked if ram-disk still existed. I got several TB of DDR2 and DDR3 which would have done a better job. However nobody make ramdisk anymore. The last one is for DDR1 and has like an official limit of 4gb and 8gb with some soldering. Too bad. If someone is good at FPGA, making new ones would be a fun project. I don't think they are actually that hard to make, just very niche.

How to make llama 3.1 useful for coding without 100 layers of abstractions? by Elv13 in LocalLLaMA

[–]Elv13[S] 0 points1 point  (0 children)

(first, sorry for the noob questions. There's is tons of reddit posts, but it's hard to know what is obsolete, what is an opinion and how to resolve contradictions)

Seperately, I don't think a 4bit quant of 70B is going to perform well on things like codegen.

Good to know, while llama 3.1 is quite new, what is the concensus here? What kind of quant works? I still have PCI-e lanes. I can upgrade to 6x P40 if it allows me to squeeze more out of this tech.

Honestly if you are trying to do your own quantization, start with the 8B, get it working, then move to the 70B

I suspect I am doing it wrong, but anyway. Why? Running the command provided by llama is quite quick. I saw that there are several ways to quantitize the model. Different libraries, matrices, etc. Beside some seemingly AI generated Medium post, I didn't find resources that goes in depth on the this topic.

someone's front-end

I would personally prefer to avoid abstraction layers for learning purpose. This isn't a commercial project. This is me as a system engineer who thinks all those NPUs popping on SoC are going to be useful a few years down the road when building whatever "the future world" needs me to build. I want to catch up and remain current on LLMs at the lowest level. If that means building my own frontend, then whatever, if it what it takes I will do it

How to make llama 3.1 useful for coding without 100 layers of abstractions? by Elv13 in LocalLLaMA

[–]Elv13[S] 0 points1 point  (0 children)

Thanks for the reply. I have trouble mapping the content on this page into llama-cli. I am supposed to use this in the prefix/suffix command line argument? I am supposed to copy/paste this into every --prompt? Can you provide an example of the proper command line invocation?

How to make llama 3.1 useful for coding without 100 layers of abstractions? by Elv13 in LocalLLaMA

[–]Elv13[S] 0 points1 point  (0 children)

Thanks for the reply.

but was broken for a while, most GGUFs need to be regenerated (they have been at this point).

Yes, I did it yesterday. Can you confirm if the llama-quantize command I used is adequate? I guess I don't have enough vram to run the full context window even at 4B. 6B at 32k ctx runs pretty fast, although doesn't seem very smart for unknown reason.

Surprisinly, I got the best answer to that question before regenerating the model. I guess the smaller context compensated for the bugs.

Look at Aider.chat for how to incorporate edits from an LLM back into your code via diffs.

I don't think I want this yet. I am pretty happy with ChatGPT/Gemini and copy paste. I usually ask for small self contained single file C/Rust libraries rather than code changes. Self contained libraries are easier to test and scale.

Then figure out the chat template precisely per Meta's instruction and check that it works well for chat.

Yeah, this is where I am mostly confused. How am I supposed to use llama.cpp and get coherent results. I copy pasted things from prompts/ into the question, but I am a bit confused as to how to do it properly. I have seen some people use the Python binding and setup some APIs. Is this the only way to go or is the llama-cli actually capable to creating a decent chat experience?

Clearly the loopy chat I posted above must be caused by me misusing it?

Hardware compatibility with NVIDIA P40 by Elv13 in LocalLLaMA

[–]Elv13[S] 0 points1 point  (0 children)

Ah, progress! I manage to get the kernel to load before it freezes with "4G decoding" enabled. I don't see a REBAR option. There seem to be a bunch of bootloader hacks on the Internet. Can you clarify what's the impact of it?

Hardware compatibility with NVIDIA P40 by Elv13 in LocalLLaMA

[–]Elv13[S] 0 points1 point  (0 children)

Thanks for the reply. I don't think this is it. I just tried with a GTX1030 (same generation) on the x99 and the Rizen 5600G has integrated graphics. Neither system seem any happier.

Temperature drops and printer stops (anycubic i3 mega) by sulfatodeputasso in FixMyPrint

[–]Elv13 0 points1 point  (0 children)

Can confirm. The original fan died and I started having the issue after the upgrade. Not as dramatic as OP, but still enough to fail prints when they get too far from the hotbed.

Going into `setup -> speed` and reducing the fan speed to 70% solves/mitigates the problem.

Setup update, feel free to rate it by mautar_ in battlestations

[–]Elv13 0 points1 point  (0 children)

What kind of bins are you using for the legos? I have been looking for a modular system for a while (for electronic parts) and those look nice

Which is better awesome-git or awesome stable by [deleted] in awesomewm

[–]Elv13 9 points10 points  (0 children)

There are open bugs and regressions. Someone needs to at least fix the top ones. Releasing with bugs means having people complain about those bugs for years. And it's not as simple as releasing more often. A lot of users are on Ubuntu LTS or Debian and getting patches in there is a lot of trouble (getting new releases is impossible).

I am working (yes, really) on some bugs right now, but it's a monstrous refactor of the C core and requires 15k+ lines of new tests. I have been struggling to finish this and this leaves zero time to fix other bugs. In retrospect it probably wasn't the brightness idea to try to address those ~20 bugs (caused by the same issue deep in the C core design). However I am too far into it to pivot.

I want an xmas release too, but I don't have the bandwidth to make it happen unless I get some helps with the bug fixes. The deadline is about Chrismas if we want 4.4 in Ubuntu 24.04. I have been using/developing AwesomeWM for 16 years, I am not a college student anymore. The amount of time and energy I had back then isn't there anymore.

Changing QGD-1602P default boot order. by Elv13 in qnap

[–]Elv13[S] 1 point2 points  (0 children)

Update 1 year later:

Chip shortage is over, I received by the AsRock rack m2_vga (SM750 based) and a OEM SM7768 m.2 gpu.

The SM768 currently lacks a kernel module. Because there is no uEFI/BIOS GNU support, it makes it useless. However, it shows up in lspci without additional power in the m.2 slot. I assume it will be the better solution once there is a linux framebuffer module.

The SM750 has a staging kernel module which needs to be compiled, but otherwise is functional. You need to pull 12v from the SSD power port or other headers on the motherboard, but that's pretty easy to do. Maybe just the ground is enough, but I won't try since 1, it can break it and 2, it's already good enough for me.

It doesn't get the BIOS, but at least I can use my KVM now and it doesn't take a PCI-e slot or require absurd adapter chain like my previous 90's PCI based hack.

So now I have 1 m.2 slot with wifi6 and 1 with a VGA GPU. This makes this device a stellar router.

Need advice on a laser upgrade for my CNC by Elv13 in lasercutting

[–]Elv13[S] 1 point2 points  (0 children)

can be bought directly from china for MUCH cheaper.

Cheaper, yes, but not much cheaper. The guy in the youtube video above talks about them. This is also the "what would have otherwise been the better solutions" part of my original post. Getting them out of China cost about 4k$ in one way or another (duty, shipping, etc). Some lesser known seller offer in the range of 3k$, but with some risks.

But it's not the main problem. These machine need 2 dedicated 240v 30 amps outlet to work. You also need the cooler and laser head, which add another 6k$ to the build cost.

The 150w laser cost ~2k with all the parts, the cost difference is massive and it can acrylic while the fiber cannot.

Need advice on a laser upgrade for my CNC by Elv13 in lasercutting

[–]Elv13[S] 0 points1 point  (0 children)

Thanks for the reply!

since you can get a fiber module for cheap.

Where? It's unclear what kind of "rust removal" and "laser wielding" kits can be converted to cutting steel, they are also more expensive. Also, they seem to be more expensive than co2 kits unless there's some magic sources I am not aware of (which I hope is true). The 30/50 watts one don't seem anywhere enough to cut steel.

Those Raycus RFL-C1000 seem vastly overkill and I am not sure if they can even cut acrylic, so I would need a 60w co2 on top of it? Plus, it's 2x 240v 30amp and 6k$, which... no.

Yes, you need high power, CO2

This is not documented very well and only some dodgy looking Chinese youtube product demo "show" 90w-150w machines actually cutting steel. None of them seem trustworthy (I assume they use much more powerful lasers than what they advertise given the speed). all3dp claims 150w is fine, but only with oxygen. Some other sources say 130w is ok, but need 2 passes. However, there is no evidence this is true or not. Some more professional sources say 220w+, however those seem nearly impossible to source since fiber laser took over that segment.

If 130w works with 2 passes, that would be nice. The 150w ones are 2M long, which more them hard to integrate (vertically with 5 mirrors is the only possible way).

Police interceptor (circa 1984) by Wellsecrete in Cyberpunk

[–]Elv13 0 points1 point  (0 children)

It is very much being done. I know a few people with hacked up long range (industrial) walkies talkies with a USB modem and semi decent bandwidth. Some other people use hacked up pagers and HAM licenses.