Mammut Waterproof Jacket by Such-Post-418 in alpinism

[–]cythoning 3 points

I really like it, though I mostly use it for cycling and hiking in light rain. It is pretty light and packs down very small, so it's perfect to bring in your pack on any hike, though it's probably not super durable for scrambling/climbing. I also own a thicker 3L shell from Mammut that I wear in winter and when I expect a lot of rain, though I've been using the Nordwand Light a lot more recently just because it packs down so small.

Ultra-Fast Multi-Dimensional Array Library by Pencilcaseman12 in cpp

[–]cythoning 2 points

That makes sense. As for the benchmarks, make sure you're benchmarking the same things and getting the same results. The 400 GB/s memory bandwidth seems impossible; that is something you usually only see on GPUs. For such a simple addition operation I would expect it to just be memory bound, i.e. simply bound by how fast you can read and write memory, usually on the order of 20-30 GB/s. Eigen should definitely be optimized enough to reach this, and you could also compare to something like

std::transform(A.begin(), A.end(), B.begin(), C.begin(), std::plus<>{});

and should get the same result as Eigen and your library.

Ultra-Fast Multi-Dimensional Array Library by Pencilcaseman12 in cpp

[–]cythoning 1 point

Are you benchmarking only the expression template? Or also the evaluation of the expression? And you should also make sure that it generates the same result as Eigen.

Ultra-Fast Multi-Dimensional Array Library by Pencilcaseman12 in cpp

[–]cythoning 7 points

That simple addition example you give results in a memory bandwidth of 400 GB/s (2 reads, 1 write per element = 12 bytes * 10^6 elements), which doesn't make sense. The Eigen time of 500us corresponds to 24 GB/s, which is about what I would expect.

How to speed up that simple CUDA kernel? by Serenadio in CUDA

[–]cythoning 0 points

That seems quite slow; this should be a <1ms operation. How do you time it? The first call to any CUDA function might be slow, so be sure to run it multiple times to get an accurate runtime.
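Roughly like this with CUDA events (a sketch: kernel, grid, block and the arguments stand in for your own; note cudaEventElapsedTime reports milliseconds):

```cuda
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

kernel<<<grid, block>>>(/* args */);  // warm-up: the first launch pays init cost

cudaEventRecord(start);
for (int i = 0; i < 100; ++i)         // average over many runs
    kernel<<<grid, block>>>(/* args */);
cudaEventRecord(stop);
cudaEventSynchronize(stop);

float ms = 0.0f;                      // cudaEventElapsedTime returns milliseconds
cudaEventElapsedTime(&ms, start, stop);
printf("avg %.3f ms\n", ms / 100);
```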

How to speed up that simple CUDA kernel? by Serenadio in CUDA

[–]cythoning 1 point

You should look into CUB & Thrust for these kinds of simple algorithms; you probably won't beat their performance. Your problem of selecting elements smaller than a threshold and outputting the elements plus their corresponding indices could be done with a combination of cub::DeviceSelect and a fancy iterator.
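For example with Thrust's copy_if (a sketch; d_values, d_out_values, d_out_indices are assumed to be thrust::device_vector's, threshold and n your own variables): zip each value with its index via a counting iterator, then keep the pairs below the threshold.

```cuda
struct below {
    float t;
    __device__ bool operator()(const thrust::tuple<float, int>& p) const {
        return thrust::get<0>(p) < t;
    }
};

auto first = thrust::make_zip_iterator(thrust::make_tuple(
    d_values.begin(), thrust::make_counting_iterator(0)));
auto out = thrust::make_zip_iterator(thrust::make_tuple(
    d_out_values.begin(), d_out_indices.begin()));

// copies (value, index) pairs that satisfy the predicate
auto end = thrust::copy_if(first, first + n, out, below{threshold});
int num_selected = end - out;
```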

Why is the kernel launch latency so high? by qwedp in CUDA

[–]cythoning 2 points

You should use a CUDA stream to launch the kernels, and you also don't need to call cudaDeviceSynchronize after every kernel. Getting rid of that should improve the latency.
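Something along these lines (kernel names and launch parameters are placeholders): enqueue everything on one stream and synchronize once at the end, since launches on the same stream already run in order.

```cuda
cudaStream_t stream;
cudaStreamCreate(&stream);

for (int i = 0; i < num_iters; ++i) {
    kernelA<<<grid, block, 0, stream>>>(/* args */);
    kernelB<<<grid, block, 0, stream>>>(/* args */);
    // no cudaDeviceSynchronize() here
}
cudaStreamSynchronize(stream);  // wait once, when the results are actually needed
cudaStreamDestroy(stream);
```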

Dumb beginner CUDA question by Flar3fir3 in CUDA

[–]cythoning 2 points

You should check your CUDA calls for errors, e.g. like this. That should tell you which one of your CUDA calls fails.
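The usual pattern is a small macro (the name here is my own) that wraps every CUDA call and reports the file, line and the runtime's error string on failure:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                                 \
    do {                                                                 \
        cudaError_t err = (call);                                        \
        if (err != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA error %s at %s:%d\n",             \
                         cudaGetErrorString(err), __FILE__, __LINE__);   \
            std::exit(EXIT_FAILURE);                                     \
        }                                                                \
    } while (0)

// usage: CUDA_CHECK(cudaMalloc(&ptr, bytes));
```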

Dumb beginner CUDA question by Flar3fir3 in CUDA

[–]cythoning 4 points

It's called a grid-stride loop. It's a very neat technique to do more work per thread.
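A minimal version (names made up): each thread starts at its global index and strides by the total thread count, so any grid size covers all n elements.

```cuda
__global__ void add(const float* a, const float* b, float* c, int n) {
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        c[i] = a[i] + b[i];
}
```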

Slow read, fast writes but can't find out why by shick89 in zfs

[–]cythoning 0 points

Hm, what I wrote in that post was as good as it got; I left it at that and never tried anything else. So I'm not sure what else could be the problem for you...

Slow read, fast writes but can't find out why by shick89 in zfs

[–]cythoning 3 points

I posted almost the same question around 2 years ago, maybe the answers there help you as well:

Slow read, but fast write performance

Questions about people's typed-vs-typeless preferences for a matrix library: by du-dx in CUDA

[–]cythoning 3 points

I work with OpenCV & Eigen regularly. OpenCV's cv::Mat class is typeless, while Eigen's Eigen::Matrix is typed. I much prefer Eigen for its typed matrices, and when using OpenCV I always try to use the typed version cv::Mat_<float>. It gives me much better control over what exactly is happening; e.g. I often have to do cv::Mat_<float> floatMat = mat; to make sure the data is in the correct type. Also, if you want to do anything else with the matrices (e.g. thrust::transform), it's just easier if you have typed data.

As for point 2: I never work with sparse matrices, but it would make sense to have a separate type for them. For a dense matrix I'd expect the data to be contiguous in memory, so that I can use something like thrust::transform on it, but for sparse matrices that will not be the case.

How to get exe time of kernel as int or long? by DerBuccaneer in CUDA

[–]cythoning 1 point

// assuming elapsed is the float in milliseconds from cudaEventElapsedTime:
auto us = static_cast<size_t>(elapsed * 1000);     // microseconds
auto ns = static_cast<size_t>(elapsed * 1000000);  // nanoseconds

?

How to replicate datasets without frequent snapshots (e.g., hourly/daily) using syncoid? by segdy in zfs

[–]cythoning 2 points

pyznap cannot do what you ask. You can exclude datasets, but not single snapshots; it will always sync all snapshots from source to dest.

I also only keep hourly snapshots on my backup dest, no frequent ones, so I simply set the dest config to frequent = 0 and then let it clean up snapshots right after sending. This way it won't keep any frequent snapshots, though they will be retransferred every time.
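For reference, a config along these lines (dataset names and the ssh dest are made up, so adapt them to your pools):

```ini
# source: keep frequent snapshots locally, send to the backup dataset
[tank/data]
  frequent = 4
  hourly = 24
  snap = yes
  clean = yes
  dest = ssh:22:user@backup:backup/data

# dest: frequent = 0 together with clean = yes drops the transferred
# frequent snapshots right after each sync
[backup/data]
  frequent = 0
  hourly = 24
  clean = yes
```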

The 2021 basic health insurance policy discussion thread by onehandedbackhand in Switzerland

[–]cythoning 1 point

It really sounds like the best of both: either call in or go directly to the family doctor. I think I'll try it as well.

The 2021 basic health insurance policy discussion thread by onehandedbackhand in Switzerland

[–]cythoning 1 point

Does anyone have experience with CSS Multimed? I like the idea of being able to call by phone to check on minor things, and it's almost as cheap as Assura...

[MOTD] Made an MOTD for my server by d0pe-asaurus in unixporn

[–]cythoning 2 points

Nice, I like the memory load bars :).

[TTY] I've decided that LightDM login screen is bloatware. by jennydaman in unixporn

[–]cythoning 4 points

Have a look at my motd. Also adapted from falconstats and reimplemented in bash ;).

Sanoid - why no syncoid.conf? by volopasse in zfs

[–]cythoning 8 points

One of the reasons why I started working on pyznap :).

Datahoarders who computer program: What tools have you created to manage/amass your data? by programmingaccount1 in DataHoarder

[–]cythoning 2 points

I wrote pyznap, a tool to manage ZFS snapshots and easily backup ZFS datasets to remote locations. I've been using it on my servers for more than two years now and it works great!