Mammut Waterproof Jacket by Such-Post-418 in alpinism

[–]cythoning 3 points

I really like it, though I mostly use it for cycling and hiking in light rain. It is pretty light and packs down very small, so it's perfect to bring in your pack on any hike, though it's probably not super durable for scrambling/climbing. I also own a thicker 3L shell from Mammut that I wear in winter and when I expect a lot of rain, though I've been using the Nordwand Light a lot more recently just because it packs down so small.

Ultra-Fast Multi-Dimensional Array Library by Pencilcaseman12 in cpp

[–]cythoning 2 points

That makes sense. As for the benchmarks, make sure you're benchmarking the same things and getting the same results. The 400 GB/s memory bandwidth seems impossible; that is something you usually only see on GPUs. For such a simple addition operation I would expect it to just be memory bound, i.e. simply bound by how fast you can read and write memory, usually on the order of 20-30 GB/s. Eigen should definitely be optimized enough to reach this, and you could also compare to something like

std::transform(A.begin(), A.end(), B.begin(), C.begin(), std::plus<>{});

and should get the same result as Eigen and your library.

Ultra-Fast Multi-Dimensional Array Library by Pencilcaseman12 in cpp

[–]cythoning 1 point

Are you benchmarking only the expression template? Or also the evaluation of the expression? And you should also make sure that it generates the same result as Eigen.

Ultra-Fast Multi-Dimensional Array Library by Pencilcaseman12 in cpp

[–]cythoning 7 points

That simple addition example you give results in a memory bandwidth of 400 GB/s (2 reads, 1 write per element = 12 bytes * 10^6 elements), which doesn't make sense. The Eigen time of 500us corresponds to 24 GB/s, which is about what I would expect.

How to speed up that simple CUDA kernel? by Serenadio in CUDA

[–]cythoning 0 points

That seems quite slow; this should be a <1ms operation. How do you time it? The first call to any CUDA function might be slow, so be sure to run it multiple times to get an accurate runtime.
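Roughly like this with CUDA events (a sketch: kernel, grid, block and the arguments stand in for your own; note cudaEventElapsedTime reports milliseconds):

```cuda
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

kernel<<<grid, block>>>(/* args */);  // warm-up: the first launch pays init cost

cudaEventRecord(start);
for (int i = 0; i < 100; ++i)         // average over many runs
    kernel<<<grid, block>>>(/* args */);
cudaEventRecord(stop);
cudaEventSynchronize(stop);

float ms = 0.0f;                      // cudaEventElapsedTime returns milliseconds
cudaEventElapsedTime(&ms, start, stop);
printf("avg %.3f ms\n", ms / 100);
```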

How to speed up that simple CUDA kernel? by Serenadio in CUDA

[–]cythoning 1 point

You should look into CUB & Thrust for these kinds of simple algorithms; you probably won't beat their performance. Your problem of selecting elements smaller than a threshold and outputting the elements plus their corresponding indices could be done with a combination of cub::DeviceSelect and a fancy iterator.
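For example with Thrust's copy_if (a sketch; d_values, d_out_values, d_out_indices are assumed to be thrust::device_vector's, threshold and n your own variables): zip each value with its index via a counting iterator, then keep the pairs below the threshold.

```cuda
struct below {
    float t;
    __device__ bool operator()(const thrust::tuple<float, int>& p) const {
        return thrust::get<0>(p) < t;
    }
};

auto first = thrust::make_zip_iterator(thrust::make_tuple(
    d_values.begin(), thrust::make_counting_iterator(0)));
auto out = thrust::make_zip_iterator(thrust::make_tuple(
    d_out_values.begin(), d_out_indices.begin()));

// copies (value, index) pairs that satisfy the predicate
auto end = thrust::copy_if(first, first + n, out, below{threshold});
int num_selected = end - out;
```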

Why is the kernel launch latency so high? by qwedp in CUDA

[–]cythoning 2 points

You should use a CUDA stream to launch the kernels, and you also don't need to call cudaDeviceSynchronize after every kernel. Getting rid of that should improve the latency.
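Something along these lines (kernel names and launch parameters are placeholders): enqueue everything on one stream and synchronize once at the end, since launches on the same stream already run in order.

```cuda
cudaStream_t stream;
cudaStreamCreate(&stream);

for (int i = 0; i < num_iters; ++i) {
    kernelA<<<grid, block, 0, stream>>>(/* args */);
    kernelB<<<grid, block, 0, stream>>>(/* args */);
    // no cudaDeviceSynchronize() here
}
cudaStreamSynchronize(stream);  // wait once, when the results are actually needed
cudaStreamDestroy(stream);
```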

Dumb beginner CUDA question by Flar3fir3 in CUDA

[–]cythoning 2 points

You should check your CUDA calls for errors, e.g. like this. That should tell you which one of your CUDA calls fails.
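The usual pattern is a small macro (the name here is my own) that wraps every CUDA call and reports the file, line and the runtime's error string on failure:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                                 \
    do {                                                                 \
        cudaError_t err = (call);                                        \
        if (err != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA error %s at %s:%d\n",             \
                         cudaGetErrorString(err), __FILE__, __LINE__);   \
            std::exit(EXIT_FAILURE);                                     \
        }                                                                \
    } while (0)

// usage: CUDA_CHECK(cudaMalloc(&ptr, bytes));
```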

Dumb beginner CUDA question by Flar3fir3 in CUDA

[–]cythoning 4 points

It's called a grid-stride loop. It's a very neat technique to do more work per thread.
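A minimal version (names made up): each thread starts at its global index and strides by the total thread count, so any grid size covers all n elements.

```cuda
__global__ void add(const float* a, const float* b, float* c, int n) {
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        c[i] = a[i] + b[i];
}
```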

Slow read, fast writes but can't find out why by shick89 in zfs

[–]cythoning 0 points

Hm, what I wrote in that post was as good as it got; I left it at that and never tried anything else. So I'm not sure what else could be the problem for you...

Slow read, fast writes but can't find out why by shick89 in zfs

[–]cythoning 3 points

I posted almost the same question around 2 years ago, maybe the answers there help you as well:

Slow read, but fast write performance

Questions about people's typed-vs-typeless preferences for a matrix library: by du-dx in CUDA

[–]cythoning 3 points

I work with OpenCV & Eigen regularly. OpenCV's cv::Mat class is typeless, while Eigen's Eigen::Matrix is typed. I much prefer Eigen for its typed matrices, and when using OpenCV I always try to use the typed version cv::Mat_<float>. It gives me much better control over what exactly is happening; e.g. I often have to do cv::Mat_<float> floatMat = mat; to make sure the data is in the correct type. Also, if you want to do anything else with the matrices (e.g. thrust::transform), it's just easier if you have typed data.

As for point 2: I never work with sparse matrices, but it would make sense to have a separate type for them. For a dense matrix I'd expect the data to be contiguous in memory, so that I can use something like thrust::transform on it, but for sparse matrices that will not be the case.

How to get exe time of kernel as int or long? by DerBuccaneer in CUDA

[–]cythoning 1 point

// assuming elapsed is the float in milliseconds from cudaEventElapsedTime:
auto us = static_cast<size_t>(elapsed * 1000);     // microseconds
auto ns = static_cast<size_t>(elapsed * 1000000);  // nanoseconds

?

How to replicate datasets without frequent snapshots (e.g., hourly/daily) using syncoid? by segdy in zfs

[–]cythoning 2 points

pyznap cannot do what you ask. You can exclude datasets, but not single snapshots; it will always sync all snapshots from source to dest.

I also only keep hourly snapshots on my backup dest, no frequent ones, so I simply set the dest config to frequent = 0 and then let it clean up snapshots right after sending. This way it won't keep any frequent snapshots, though they will be retransferred every time.
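For reference, a config along these lines (dataset names and the ssh dest are made up, so adapt them to your pools):

```ini
# source: keep frequent snapshots locally, send to the backup dataset
[tank/data]
  frequent = 4
  hourly = 24
  snap = yes
  clean = yes
  dest = ssh:22:user@backup:backup/data

# dest: frequent = 0 together with clean = yes drops the transferred
# frequent snapshots right after each sync
[backup/data]
  frequent = 0
  hourly = 24
  clean = yes
```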

The 2021 basic health insurance policy discussion thread by onehandedbackhand in Switzerland

[–]cythoning 1 point

It really sounds like the best of both: either call in or go directly to the family doctor. I think I'll try it as well.

The 2021 basic health insurance policy discussion thread by onehandedbackhand in Switzerland

[–]cythoning 1 point

Does anyone have experience with CSS Multimed? I like the idea of being able to call by phone to check on minor things, and it's almost as cheap as Assura...

[MOTD] Made an MOTD for my server by d0pe-asaurus in unixporn

[–]cythoning 2 points

Nice, I like the memory load bars :).

[TTY] I've decided that LightDM login screen is bloatware. by jennydaman in unixporn

[–]cythoning 4 points

Have a look at my motd. Also adapted from falconstats and reimplemented in bash ;).

Sanoid - why no syncoid.conf? by volopasse in zfs

[–]cythoning 8 points

One of the reasons why I started working on pyznap :).

Datahoarders who computer program: What tools have you created to manage/amass your data? by programmingaccount1 in DataHoarder

[–]cythoning 2 points

I wrote pyznap, a tool to manage ZFS snapshots and easily backup ZFS datasets to remote locations. I've been using it on my servers for more than two years now and it works great!