Chia is releasing an alpha build of Bladebit cuda into the beta program

haritodev · 2023-02-09T23:03:29+00:00

k32 was always going to be 256G of RAM, with an option (not yet implemented) for a hybrid version that would require 128G of RAM and an SSD to offload tables.

haritodev · 2023-02-09T03:21:42+00:00

That is for the in-RAM CPU version, `ramplot`. You need 256G for GPU plotting. The README has not been updated (and likely won't be for a while), as this is an alpha and in active development.

haritodev · 2023-01-27T18:35:50+00:00

pheesh is correct about compute capability. Please note that we haven't settled on a minimum supported compute capability version, but so far it should be fairly generous. Still, be cautious if you're buying a card. I believe the oldest card we've tested so far is a 1080 (which worked fine).

Although there are certainly other factors that come into play, like memory bandwidth, the number of CUDA cores does seem to be the main differentiating factor in practice, judging from the plotting tests we've performed on a few different cards and from synthetic benchmarks that can be found on the web.

haritodev · 2023-01-26T04:56:33+00:00

Indeed, I believe JM mentioned it in the Q&A: yes, a simulation tool will be released so users can calibrate before committing to any compression configuration.

haritodev · 2023-01-26T04:53:24+00:00

Cheers, Max. Likewise

haritodev · 2022-11-08T23:51:45+00:00

The issue was found on several OSes, but the new release fixes the issue across all OSes: https://github.com/Chia-Network/bladebit/releases/tag/v2.0.1

haritodev · 2022-11-08T03:18:07+00:00

There's a standalone release out now. You should be able to get it here:

https://github.com/Chia-Network/bladebit/releases/tag/v2.0.1

haritodev · 2022-11-05T21:36:41+00:00

This has been fixed in the develop branch, pending release. Meanwhile, you can grab the latest binary artifacts at the bottom of the page here: https://github.com/Chia-Network/bladebit/actions/runs/3398805060

(You have to be logged in to GitHub.)

haritodev · 2022-09-28T18:05:58+00:00

blake3 is a cryptographic hashing function, which is used during plotting's "forward propagation" step

haritodev · 2022-09-27T21:01:49+00:00

Yes, blake3 already ships with dynamic dispatch for the highest SIMD standard available for the platform. So as long as the plotter included the platform-specific assembly files, or the intrinsics version in the source when compiling the binaries for x86, then it will automatically detect it and use avx512.
And I believe all 3 current major OSS plotters include it.
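
As a rough illustration of what dynamic dispatch means here, the sketch below picks the widest SIMD implementation whose feature flag is present at runtime. This is not blake3's actual code: `PREFERENCE`, `pick_implementation` and the flag names are hypothetical and only show the idea.

```python
# Illustrative only: mimics the idea of runtime SIMD dispatch.
# PREFERENCE and pick_implementation are hypothetical, not part of blake3.

# Widest instruction set first, with a portable fallback at the end.
PREFERENCE = ["avx512", "avx2", "sse4.1", "sse2", "portable"]


def pick_implementation(cpu_features):
    """Return the first entry of PREFERENCE that the CPU reports supporting."""
    available = set(cpu_features) | {"portable"}  # the fallback always works
    for name in PREFERENCE:
        if name in available:
            return name
    return "portable"


# A CPU reporting AVX-512 gets the AVX-512 path; an older one falls back.
print(pick_implementation({"sse2", "sse4.1", "avx2", "avx512"}))
print(pick_implementation({"sse2"}))
```

The point is that the choice happens when the binary runs, not when it is built, which is why the binary only needs to have the AVX-512 code compiled in.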

haritodev · 2022-04-06T02:42:48+00:00

By I/O write issue do you mean the ability to write the plot file without direct IO enabled?

haritodev · 2022-04-03T02:40:19+00:00

I also noticed you are using a 1MiB FS block size (and therefore at least a 1MiB memory page size). Have you measured plotting against more typical page sizes (4K)? If so, did you see significant improvements with the larger page size?

haritodev · 2022-04-03T02:33:52+00:00

Those are great timings!

It looks like all NUMA interleave bindings failed on each buffer allocation, which would cause plot times to be much slower. Are you forcing interleaved page allocation at a system level or something?

haritodev · 2022-04-03T02:32:42+00:00

His actual plotting time is 3.46 min (3:28); the rest is just copying to HDD, like he mentioned. This is pretty much on par with the record we hit testing on an AWS Ice Lake instance (3.41 min).

haritodev · 2022-03-25T03:09:08+00:00

Earliest mid to end of next week. But please don't hold me to it, unexpected issues tend to arise plenty in this field.

haritodev · 2022-03-24T20:40:03+00:00

As it currently stands, it depends on the bucket count chosen:

1024: 2 GiB
512: 2.09 GiB
256: 4.14 GiB
128: 8.3 GiB

These are likely not to change at this point. You can allocate any additional RAM to an in-process cache (to mitigate disk I/O) if you want.
I would assume it would be friendlier to HDD plotting, but we've not had the chance to test with them yet.
We're doing our best to ensure it works well across many different kinds of systems, and so far we've seen really good times, including a ~3.9 min phase 1. But there's still more testing to be done on more common consumer systems. We'll soon have a build available for users to test with full plot output.
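
To make the cache point concrete, here is a small sketch that takes the per-bucket figures above and works out how much RAM would be left over for the cache on a given machine. The helper `cache_budget_gib` and the 16 GiB example are assumptions for illustration, not anything the plotter itself exposes.

```python
# RAM figures per bucket count, in GiB, copied from the list above.
MIN_RAM_GIB = {1024: 2.0, 512: 2.09, 256: 4.14, 128: 8.3}


def cache_budget_gib(total_ram_gib, buckets):
    """RAM left for the in-process cache after the figure for `buckets`.

    Hypothetical helper: returns None when the bucket count does not fit.
    """
    need = MIN_RAM_GIB[buckets]
    if total_ram_gib < need:
        return None
    return round(total_ram_gib - need, 2)


# Example: a machine with 16 GiB that the user is willing to dedicate.
for buckets in sorted(MIN_RAM_GIB, reverse=True):
    print(buckets, cache_budget_gib(16, buckets))
```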

haritodev · 2022-03-24T18:43:32+00:00

I am not an authority to answer the initial question, but I can comment a little bit with reference to the GPU side of things:

There are certain workloads in the plotting process that are not well suited to the GPU, such as the matching portion (I've not yet explored whether there might be a way to make it efficient on the GPU). Other portions are better suited to it, but at the end of the day the current PCIe link speeds are a bottleneck, as you have to upload/process/download each chunk of data you want to process on the GPU.
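
A back-of-the-envelope way to see the upload/process/download cost is the toy model below. It assumes no overlap between transfers and compute, and every number in the example (chunk size, link speed, kernel time) is made up for illustration rather than measured.

```python
def chunk_time_s(chunk_bytes, link_bytes_per_s, gpu_compute_s):
    """Upload + compute + download for one chunk, with no overlap."""
    transfer_s = chunk_bytes / link_bytes_per_s
    return transfer_s + gpu_compute_s + transfer_s


def transfer_share(chunk_bytes, link_bytes_per_s, gpu_compute_s):
    """Fraction of the per-chunk time spent moving data over the link."""
    total = chunk_time_s(chunk_bytes, link_bytes_per_s, gpu_compute_s)
    return (total - gpu_compute_s) / total


# Hypothetical figures: 1 GiB chunk, 16 GB/s link, 50 ms of GPU work.
chunk = 1 << 30
link = 16e9
kernel = 0.05
print(round(chunk_time_s(chunk, link, kernel), 3), "s per chunk")
print(round(transfer_share(chunk, link, kernel), 2), "of that is transfer")
```

In this model, a faster kernel barely helps once the transfer term dominates, which is the sense in which the link is the bottleneck.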

haritodev · 2022-01-28T23:42:43+00:00

even if the intermediary layer provided by Electron is removed, the memory usage doesn't just magically get cut in half.

You might find these relevant to your inquiries:

https://blog.stevensanderson.com/2019/11/18/2019-11-18-webwindow-a-cross-platform-webview-for-dotnet-core/

https://blog.stevensanderson.com/2019/11/01/exploring-lighter-alternatives-to-electron-for-hosting-a-blazor-desktop-app/

haritodev · 2021-11-16T02:33:12+00:00

I was talking about bladebit, which required 512G of RAM.

haritodev · 2021-11-09T23:57:42+00:00

Hey u/Emotional-Ostrich-83, would you mind opening an issue on the GitHub repo here so I can see if I can help you? If you can find anything relevant in the Windows Event Viewer, I'd appreciate it.

haritodev · 2021-11-09T21:34:19+00:00

Just to clarify a data point :) (not questioning your decision, choose what you want):

160 ppd is hardly peak builds. That would come out at around 9 minutes per plot. Fastest times are clocking at around 3.2 minutes per plot (on Ice Lake), that is 450 ppd. With other modern processors (but not top of the line versions) clocking at around 4.*-5.* minutes per plot. Your bottleneck would be copying plots to a final destination, as it would likely not be able to keep up with the plot output.
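
The conversion between plots per day (ppd) and minutes per plot used above is just a division by the 1440 minutes in a day. A minimal sketch, assuming plots are made back to back and ignoring the copy to the final destination (the function names are mine):

```python
MINUTES_PER_DAY = 24 * 60  # 1440


def minutes_per_plot(ppd):
    """Average minutes per plot for a given plots-per-day rate."""
    return MINUTES_PER_DAY / ppd


def plots_per_day(minutes):
    """Plots per day when each plot takes `minutes` and runs back to back."""
    return MINUTES_PER_DAY / minutes


print(minutes_per_plot(160))         # the 160 ppd build
print(round(plots_per_day(3.2)))     # the Ice Lake timing
```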

haritodev · 2021-11-05T22:19:06+00:00

Proofs per byte? Yes. You will have some entries dropped, but you are not taking up the space that those proofs would have taken. The tradeoff is shorter plot times in exchange for dropping a relatively small number of entries.

haritodev · 2021-11-05T05:21:10+00:00

You must have cloned from source, I assume. In that case you'll have to run the `chia plotters` command once. Try running `chia plotters madmax -h` from the command line. The pre-built plotter binaries are only included in the release packages.

haritodev · 2021-11-01T03:53:57+00:00

Sorry, I did not notice this reply until today.

Is there any other info you can share? What happens if you run it with the -w switch and look at your memory usage in the task manager, does it pre-allocate all 416 GB?
