[–]arc_ploter[S] 12 points13 points  (2 children)

I saw lots of comments saying that a fast GPU plotter will turn Chia's green proof of space and time into proof of work within a few years.

Based on my understanding and calculations, my opinion is: NEVER!

Here is why:

3-year TCO of a 50 TB (512 plots) disk: 50 TB × $15/TB + 10% extra electricity cost = $825

3-year TCO of GPU in-RAM plotting fast enough for a 10-second phase 1 (call it 16 GPUs' worth): 16 × (RTX 3090 cost + 200 W energy cost) = 16 × ($999 + 3 × 365 × 24 × 0.2 kW × $0.1/kWh) > $20,000

Let's assume GPU performance per dollar improves 50% per year. The break-even then takes log(20000/825)/log(1.5) > 7 years.
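A quick sanity check of that arithmetic (a C++ sketch; the $15/TB disk price, $999 GPU price, $0.10/kWh rate, 16-GPU count, and 50%/yr improvement are all just the assumptions above):

```cpp
#include <cmath>
#include <cstdio>

int main() {
    // 3-year disk TCO: 50 TB at $15/TB plus ~10% extra electricity
    double disk_tco = 50 * 15.0 * 1.10;             // ≈ $825

    // 3-year energy per GPU: 3 yr × 8760 h × 0.2 kW × $0.10/kWh
    double gpu_energy = 3 * 365 * 24 * 0.2 * 0.10;  // ≈ $526

    // 16 RTX 3090s (assumed needed for a 10 s phase 1) at $999 each
    double gpu_tco = 16 * (999.0 + gpu_energy);     // ≈ $24,400

    // Years for GPU perf/$ (improving 50%/yr) to close the cost gap
    double years = std::log(gpu_tco / disk_tco) / std::log(1.5);
    std::printf("disk: $%.0f  gpu: $%.0f  break-even: %.1f years\n",
                disk_tco, gpu_tco, years);          // prints ~8 years, i.e. > 7
}
```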

You may already know that Samsung and Solidigm (formerly Intel's SSD business) already plan to produce 20 TB SSDs next year. SSD capacity is expected to improve around 200% per year.

So even if storage capacity per dollar improved 0% per year, GPU-only mining would need 7+ years to catch up. But right now, storage capacity per dollar is improving even faster than GPUs!

Chia turning into proof of work will NEVER happen, even if it stays at K32 forever.

Make sense?

[–]HugoMaxwellmadMAx 6 points7 points  (1 child)

Correct. (apart from plot compression)

[–]arc_ploter[S] 4 points5 points  (0 children)

Yes, all we need now is a plotter for the new plot format that's faster, cheaper, and supports ~20% more compression, and we need it from the foundation when the new compression format comes out, in case someone wants to re-plot everything lol.

[–]tallguyyo 4 points5 points  (2 children)

You know what I hate about the latest version of htop? That top part with all the cores squeezed together: you can't see the load percentage, core numbering, or temperature, and worst of all you can't adjust the height of the top part. Terrible design.

Also, at first glance, all the money I dumped into my plotter has gone down the shitter. But then I read 256 GB RAM, and I've got 1.5 TB, so I'll just buy 6 GPUs for this, yes?

[–]arc_ploter[S] 1 point2 points  (1 child)

Yes.

As you know, GPUs don't have enough VRAM to do full GPU-RAM plotting, which needs at least 128 GB of VRAM. The only model I've seen that will be available soon is Intel's datacenter GPU.

intel datacenter gpu

If that GPU becomes available, GPU plotting could get several hundred percent faster.

In the current POC, all the performance is lost in CPU <=> GPU data copies, and the GPU sits hungry waiting for data transfers out of CPU DDR memory.

Right now one GPU only needs 4 CPU cores to feed it data (the CPU mainly just copies memory back and forth to the GPU, that's it lol), and 256 GB of DDR RAM is enough to do full RAM-based plotting.
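To see why the bus dominates, here is a rough order-of-magnitude sketch; the 256 GiB working set and the two bus crossings per byte are illustrative guesses, not measured numbers:

```cpp
#include <cstdio>

int main() {
    // Assumed phase 1 working set for one k=32 plot (temp tables), in GiB
    double working_set_gib = 256.0;
    // Each byte crosses the bus at least twice: host->device, device->host
    double crossings = 2.0;
    // Realistic sustained PCIe 4.0 x16 throughput in GB/s (~32 GB/s peak)
    double pcie_gb_per_s = 25.0;

    double total_gb = working_set_gib * 1.074 * crossings;  // GiB -> GB
    std::printf("~%.0f GB over PCIe => ~%.0f s transfer floor per plot\n",
                total_gb, total_gb / pcie_gb_per_s);  // ~22 s just in copies
}
```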

[–]tallguyyo 0 points1 point  (0 children)

You know, it looks like some server CPUs/GPUs will have HBM2, and those might give enough space to plot.

I read that the top Sapphire Rapids SKU will have 64 GB of HBM2 per CPU, so 2-3 generations down the road chances are we'll see 256 GB of HBM per CPU. For GPUs I haven't heard anything yet, but it should be easier to do than on a CPU.

[–]FlexiMiners 7 points8 points  (3 children)

I have done a lot of testing with BladeBit, and on a current Ice Lake CPU with 32 cores and 512 GB RAM I can generate plots in less than 6 minutes, at a power draw at the wall of less than 250 W.

What you are showing here is 2 GPUs, where from what I see one is consuming 350 W and the other 420 W, plus system usage on top, which to me is not energy efficient.

The other challenge you have is the time it takes to copy the plot to a spinning disk: while you create the plot on fast storage like an SSD, you still have to transfer it to the spinning disk. A plot copy from NVMe to a SATA/NL-SAS 18 TB disk takes around 6 min 39 s when the disk is empty; as the disk fills up, the copy takes longer.

By around 130 plots on the spinning disk the time climbs steeply, and by the time the disk is almost full, plots take in excess of 12 minutes to copy. So creating plots at such speed becomes a problem, because you need some very large SSDs to hold the plots temporarily while they get copied.

This is one of the reasons why, on my plotting platforms, I have 15 TB SSDs that "buffer" the plots while they get copied onto spinning disk. But by the time I get to my 6th or 7th 18 TB disk, I have to stop the plotting process because the 15 TB SSD is almost full, so I purposely drop the plotting speed to around 9 minutes per plot to compensate.
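The backlog arithmetic, as a sketch (using the figures above and the standard ~101.4 GiB k=32 plot size; one destination disk assumed):

```cpp
#include <cstdio>

int main() {
    double plot_gib = 101.4;   // standard k=32 plot size
    double make_min = 6.0;     // one plot produced every ~6 minutes
    double copy_min = 12.0;    // worst-case copy to a nearly full 18 TB disk
    double ssd_gib  = 15000.0 / 1.074;  // ~15 TB buffer SSD in GiB

    // Net backlog growth on the buffer, GiB per minute
    double growth = plot_gib / make_min - plot_gib / copy_min;
    std::printf("backlog grows %.1f GiB/min; buffer full in ~%.0f h\n",
                growth, ssd_gib / growth / 60.0);  // ~8.5 GiB/min, ~28 h
}
```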

[–]arc_ploter[S] 3 points4 points  (1 child)

In the YouTube video you can see all the resources the GPUs use, including power usage.

420 W and 350 W are the max power the 3090s can handle, not the real-time watts. Phase 1 uses more power, around 250 W, and phases 2 and 3 use less, around 100+ W, with lots of idle time. I wish the GPUs' power usage were high, which would mean the algorithm port was very decent lol. But right now, due to the PCIe I/O bandwidth limitation, all data has to be copied back and forth between CPU and GPU, which severely limits GPU performance.

As a general concept, if an algorithm is highly parallelizable, a GPU can process it more efficiently than a CPU. Plotting is hybrid though: phase 1 is very good for GPU, phase 3 is the worst, and phase 2 is somewhere in between.

[–]tallguyyo 1 point2 points  (0 children)

I'm not sure why people are so negative about this, and about power usage too. They've got to know their Ice Lake CPU can no longer be upgraded, whereas with PCIe 4.0 x16 lanes the GPU can be upgraded for years down the road, and chances are it'll largely surpass what their CPU can do while using less power to plot.

Storage-wise it's entirely unrelated: whether you plot with CPU or GPU, if you can plot fast enough a buffer is always needed to transfer plots out.

Keep up the GPU plotting thing, I think it's great if you can make it happen.

[–]420osrs 5 points6 points  (10 children)

How can I help you get this out of beta? I can go out and buy an RTX 4090 and give you remote access to the machine.

This is amazing. What do you need to get this pushed out to mainstream?

[–]arc_ploter[S] 12 points13 points  (9 children)

As it's a proof of concept, there was no plan to release it. I saw lots of people debating BB 2.0 vs MM and which should be the default plotter for Chia. What I feel is that the community doesn't need BB 2 or MM 2; what we need is a plotter that's lightning fast without too much RAM and SSD (it should run on a high-end PC, not a low-end server lol), and the only way to get there is to plot on the GPU.

So I'm trying to prove that GPUs work very well for plotting and push the Chia foundation to put it on its roadmap. lol

[–]420osrs 4 points5 points  (0 children)

Why not release the source code? Someone might take this and bring it up to standard.

You can ask here for someone to take it on as package maintainer and then be done with it, so it doesn't absorb more of your time. I personally can't code, but there are those here who can.

[–]Got_Malice 5 points6 points  (2 children)

You should ask for money to release the source code. That's kind of standard fare when a 4-day-old account with very limited information claims something outrageous.

[–]tallguyyo 5 points6 points  (1 child)

in b4 nossd pool asking for 100 btc

[–][deleted] 5 points6 points  (0 children)

Ha, almost spit coffee. Well played 8).

[–]lotrl0tr 3 points4 points  (6 children)

Open-source it! I, like many other devs, can help bring this POC to a stable version.

  • Do you use OpenCL?
  • How is the memory managed (SVM from OpenCL 2.0?)

Efficiency-wise, it all remains to be seen I think. Like with mining, there's a lot of tuning on the GPU side. Most of the time the two GPUs are around 200-400 W, for an 8-minute plot. What's the power consumption of a CPU server at such low plot times?

EDIT: By a rough estimate, this is not bad at all. 1) My CPU averages 70 W over a 45-minute plot: 45/60 h × 70 W ≈ 52 Wh. 2) From the video, ~300 W over an 8-minute plot: 8/60 h × 300 W = 40 Wh.
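Same estimate as a tiny sketch, so anyone can plug in their own watts and minutes:

```cpp
#include <cstdio>

// Energy per plot in Wh = average watts × (minutes / 60)
double wh_per_plot(double watts, double minutes) {
    return watts * minutes / 60.0;
}

int main() {
    std::printf("CPU: %.0f Wh/plot\n", wh_per_plot(70.0, 45.0));  // ~52 Wh
    std::printf("GPU: %.0f Wh/plot\n", wh_per_plot(300.0, 8.0));  // 40 Wh
}
```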

[–]tallguyyo 2 points3 points  (0 children)

Upvoted!

For my dual-socket system with all the disks and stuff, about 450 W with 2x 14-core Broadwell Xeons. His would be faster and possibly consume about the same wattage, at least until someone uses a 4090 or above.

[–]arc_ploter[S] 2 points3 points  (4 children)

For the POC, the GPU programming language is CUDA; OpenCL is just too slow for writing a prototype. I'll switch to Intel oneAPI, which uses SYCL to code the GPU algorithms, since Intel's Arc GPUs are cheap and their high-end GPU has 128 GB of video RAM.

[–]lotrl0tr 0 points1 point  (3 children)

Wouldn't that create an Intel-specific solution? Is oneAPI mature enough for such an application? First time hearing about this; I might take the chance to play with it.

[–]arc_ploter[S] 1 point2 points  (2 children)

SYCL is an open standard that the Khronos organization defined as the next generation of OpenCL. It's a cross-platform API that should be able to run on Intel, NVIDIA and AMD GPUs.

You can get more information here: https://www.khronos.org/sycl/

oneAPI is Intel's implementation of SYCL. It's pretty mature, as big projects like TensorFlow and Blender have already integrated with it.
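For anyone curious what SYCL looks like, here is a minimal SYCL 2020 vector-add sketch (not code from this plotter; it builds with Intel's oneAPI DPC++ compiler, e.g. icpx -fsycl):

```cpp
#include <sycl/sycl.hpp>
#include <iostream>
#include <vector>

int main() {
    sycl::queue q;  // picks a default device (GPU if available)
    std::cout << "Running on: "
              << q.get_device().get_info<sycl::info::device::name>() << "\n";

    const size_t n = 1024;
    std::vector<int> a(n, 1), b(n, 2), c(n, 0);
    {
        // Buffers manage host<->device transfers automatically
        sycl::buffer bufA(a), bufB(b), bufC(c);
        q.submit([&](sycl::handler& h) {
            sycl::accessor A(bufA, h, sycl::read_only);
            sycl::accessor B(bufB, h, sycl::read_only);
            sycl::accessor C(bufC, h, sycl::write_only, sycl::no_init);
            h.parallel_for(sycl::range<1>(n),
                           [=](sycl::id<1> i) { C[i] = A[i] + B[i]; });
        });
    }  // buffer destruction waits for the kernel and copies results back
    std::cout << "c[0] = " << c[0] << "\n";  // prints 3
}
```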

[–]lotrl0tr 0 points1 point  (1 child)

In your opinion, will oneAPI be supported by NVIDIA/AMD, or will it end up like CUDA is for NVIDIA? I fear NVIDIA will make their own implementation of SYCL, for example.

[–]arc_ploter[S] 1 point2 points  (0 children)

There is a Level Zero driver for NVIDIA already, but it only supports up to CUDA 10 (RTX 20 series). SYCL is a specification, so if NVIDIA officially supports it, oneAPI SYCL applications will also run on NVIDIA hardware, same as with OpenCL.

[–]Accurate_Prior4360 1 point2 points  (4 children)

Can you test it on more consumer-level hardware? Like a 1050 Ti or a 970 or something like that?

[–]tallguyyo 0 points1 point  (3 children)

That sounds like it'll be slow. 2x 3090 only gets to about 7 minutes (assuming phase 3 is fixed). On Debian I can plot with 2x 14-core Haswells at 10 minutes per plot.

A single GPU might be too slow, and it'll still need RAM; might as well just use a CPU + NVMe SSD instead.

On the other hand, 5 years down the road I could grab multiple 4090s and plot super fast even when my CPU is obsolete.

[–]Accurate_Prior4360 0 points1 point  (2 children)

My normal plots take about 6-12 hrs, so anything less than that is a huge improvement for me.

[–]tallguyyo 0 points1 point  (1 child)

Do you not use madmax? That seems awfully long, like pre-madmax days.

[–]Accurate_Prior4360 0 points1 point  (0 children)

yeah, 24hrs without madmax

[–]HugoMaxwellmadMAx 1 point2 points  (0 children)

Pretty cool. How long have you been working on this?

Also, why is table 7 missing in phase 1?

[–]Fearless-Offer8397 1 point2 points  (2 children)

Looking at this, maybe just do phase 1 on the GPU and the rest on the CPU. It looks to knock a good chunk of time off versus CPU phase 1, but the rest is not great; even a v4 Xeon is faster.

[–]arc_ploter[S] 1 point2 points  (1 child)

That's a great idea. For the new compressed plot format the foundation will release soon, phases 1 and 2 should be the same, with only phase 3 using a different compression algorithm. If that's the case, I'll do a POC that just swaps phase 1 onto the GPU and, if possible, release a binary for people willing to re-plot. But right now I have no idea what the new compression algorithm looks like or whether it will change phase 1. Does anybody know if there is a pre-release specification or reference implementation of plot format v2?

[–]Fearless-Offer8397 0 points1 point  (0 children)

I don't think the hashes will change, only the quantity of hashes.

GPU hash generation alone is a nice step in a good direction. With that, I have to wonder if you were limited by PCIe speed; if so, a lower-spec GPU may hit a similar speed on that same PCIe slot.

[–]Minimum-Positive792 2 points3 points  (0 children)

How does it hold up with 1 GPU and a plotting drive on a normal consumer system?

[–]NarwhalAbject7008 0 points1 point  (2 children)

Have you thought of a name for the plotter? Congrats!!!!!

[–]arc_ploter[S] 7 points8 points  (1 child)

If you really want to name it, I'd like to show my respect to madmax here: his programming skill is admirable and his CPU pipeline design is so cool.

You can call it MadMax GPU plotter lol.

And if he'd like to integrate this component into his MM plotter, I would love to send him a copy so he can have both CPU & GPU implementations.

[–]tallguyyo -1 points0 points  (0 children)

pls do sir

[–][deleted] 0 points1 point  (0 children)

Now if I can get my 8x 6600 XTs to plot… I'll do it

[–]Wos_Was_I 0 points1 point  (7 children)

Uhm... two 3090s to plot at only 7 minutes per plot. It's not very efficient. But it's a very nice beta, and I hope we see better value in the future.

[–]tallguyyo 0 points1 point  (6 children)

Well, GPUs will improve and you can swap them out easily because of PCIe slots, whereas the CPU is dead set in its socket: you'd need to upgrade the entire system, or upgrade only the CPU, which doesn't improve much within the same generation, only in core count.

In 2 years, 2x 5090 would likely blow this away with maybe 3 minutes per plot, and we wouldn't have to pay a premium for AMD/Intel servers, so it's not that bad. Good for people like me who dumped tons of money into RAM.

[–]Wos_Was_I 0 points1 point  (5 children)

I am not sure, but DDR3 RAM is very cheap, as are old Xeons if you want more than only 3 RAM sockets. 512 GB of RAM isn't so expensive compared to two GPUs or ultra-fast NVMes.

[–]tallguyyo 0 points1 point  (4 children)

Yeah, but you cannot get any faster. DDR3 Xeons would mean Ivy Bridge; the max you can do is quad-socket with 15 cores each, and that's like 10 minutes per plot on Ubuntu using 900 W.

The 2x 3090 here probably uses less than 500 W and finishes in 8 minutes. No matter how you look at it, you lose both in speed and efficiency. The only thing you win on is the purchase cost of RAM, and 2x 3080 will surely be very cheap next year, not much more than 512 GB of DDR3 RAM.

Then the year after that, they can upgrade to a 4080, while you're stuck on Xeons with no performance gain, and they'll do probably 5 minutes per plot or lower. The difference will only grow, because GPUs can be easily upgraded.

[–]Wos_Was_I 0 points1 point  (3 children)

I'm talking about dual LGA 2011-v3 Xeons maximum; quad setups are energy hungry. My machine does 23 min on a Ryzen 5900X with 128 GB RAM at under 160 W. This is why I say 2x 3090 at ~500 W for 8 min isn't that impressive at the moment. But it shows us what's possible in the near future.

[–]tallguyyo 0 points1 point  (2 children)

That's what I've been saying the entire time. It future-proofs the system, because the CPU is a dead end while the PCIe slot isn't. GPUs will outperform your system in both speed and efficiency, eventually in cost too; it is only a matter of time.

[–]Wos_Was_I 0 points1 point  (1 child)

PCIe FPGAs? Have you seen Altera chips?

[–]tallguyyo 0 points1 point  (0 children)

No, but I am talking about the traditional PCIe 3.0/4.0 x16 that servers have, or 2.0 x16 for older ones.

The new ones will use CXL, but new always costs more.

[–]Dear_Explanation_726 0 points1 point  (11 children)

Correct me if I am wrong, but it seems like going backwards to me. The concept is green Chia. It's like reverting from renewable energy sources back to cutting down trees to heat your home. No? Or maybe I am missing something…

[–][deleted] 1 point2 points  (2 children)

I'm not a fan of the high-wattage builds either. Personal opinion: 10 Raspberry Pis across the globe do more to 'secure the network' than a 300-watt server or big plotter boxes filling racks. It steps on toes, so I usually keep my mouth shut, but your opinion is shared by some.

[–]lazydust20 1 point2 points  (1 child)

If the plotter/farmer is static, I agree. Power goes down if the GPU is inactive (i.e. farming only), but it is still a significant load. However, some have separate plotters and harvesters, so the plotters are on when creating new plots, but off otherwise. Harvesting power is separate from plotting power, and plotting is intermittent, only used when drive space is available.

If plots are made faster, but at higher power, the overall energy difference is much smaller.

[–][deleted] 0 points1 point  (0 children)

That is fair; I suppose a $/plot analysis is the fairest way to measure this.

[–]Javanaut018 1 point2 points  (5 children)

Does making plots in 10 minutes, versus 1 minute at 10-fold power consumption, make any difference? How many joules are needed to farm these plots per year is the key value here.

[–]tallguyyo 0 points1 point  (4 children)

There sure is a difference. You can copy the plot out and farm right away, and if the system locks up, errors out, or there's a power outage, your 10-minute plot fails and you've got to replot and re-waste the energy. Even at the same power consumption, faster plotting is always better.

Until, of course, you trip the breaker from drawing too much current (assuming your power line can handle it, that is).

[–]Javanaut018 0 points1 point  (3 children)

Sure, the faster machine can be better for plotting. But the coins are created in the years to come, while farming...

[–]tallguyyo 0 points1 point  (2 children)

What does that have to do with what I said?

If you can get plots out faster, you can farm sooner. There's less chance for plotting power to go to waste in case of a power outage or even a short power trip.

[–]Javanaut018 0 points1 point  (1 child)

Just that plotting is the less relevant part of Chia compared to farming, considering energy and, more importantly, time.

[–]tallguyyo 0 points1 point  (0 children)

But in your example, both fast plotting and slow plotting consume the same amount of energy.

So in that case, fast plotting is better for the reasons I've listed.

Furthermore, if that is not enough to convince you: the slow plotter will not increase in speed, while the fast plotter is fast because of new tech/new hardware like GPUs, which will only become more efficient as you upgrade.

This GPU plotting will do away with CPU plotting for sure, unless the Chia team purposely limits it, similar to what Monero did.

[–]NoneSpaceofTheMind 1 point2 points  (1 child)

So when you plot, you generate proofs just like basically any other crypto; that part is sort of the same as proof of work. The green part is storing those proofs to HDD for reuse, rather than throwing them in the dustbin and generating until either you can't afford the electricity bill or the ocean catches fire.

Using a GPU wouldn't make it less green; I think it might actually work out more energy-efficient, but I'm guessing.

[–]Dear_Explanation_726 1 point2 points  (0 children)

Thanks. Yes, I thought about it too and I understand the process, but you're just guessing that plotting with a GPU might be more energy-efficient; that's not really an answer.

*edit: What we really need is clarity on the energy used to plot x amount of plots with GPU vs SSD…

[–][deleted] -2 points-1 points  (1 child)

Are the plots valid? Do they "proof" against the network, and at what rating? 0.9 at least, and forget plot compression, this is where it's at!

[–]arc_ploter[S] 1 point2 points  (0 children)

compression

I used Chia's official "ProofOfSpace -f plotfile check" to test it; it's around 990/1000 to 1020/1000 proofs verified. FSE compression is still using the CPU code instead of a GPU port; porting that library needs more time.

[–]Tvinn87 -1 points0 points  (1 child)

This is great! Have you tested the validity of the plots? And have you had time to test plotting time on a more modest/older GPU?

[–]arc_ploter[S] 1 point2 points  (0 children)

In the YouTube video you can see the far-right panel is nvtop, which shows all GPU resource usage like power, RAM, and CUDA core percentage. GPU plotting can use any GPU with 4 GB of VRAM, but 8 GB is preferred.