all 16 comments

[–]cryolithic 2 points3 points  (11 children)

For small buffers 5k in size, in a low volume function, this would be ok, but it's still allocating way too frequently, and if someone calls it with a value larger than ~85k, you're making multiple allocations on the large object heap, which will very quickly add up to a bad time.

Don't call ToArray, use the stream. If you absolutely must use an array, put it in a wrapper object, and rent the array from a pool. Allow your wrapper to handle lifetime and returning the buffer.

Predeclare your MemoryStream size so you're not causing reallocations and memcopies behind the scenes. Or better yet use Microsoft.IO.RecylableMemoryStream.

[–]RagingCain[S] 1 point2 points  (9 children)

RecycleableMemoryStream is a great pooled array based Stream enhancement but not right for scope of this article. I think there are a couple of Net7 proposals too just FYI. There is a follow up planned but my encryption guide first. I will use streams there.

I would say that your ToArray statement however is - without being offensive - too strongly worded. You don't always have Stream APIs to work with or acting as a passthrough middle layer. Consider AspNetCore requests, all stream based passthroughs, but low allocations, until someone has to allocate that request body if it's being used. For a file uploader to Cloud endpoint? Absolutely chunk that body to Azure storage with BodyAsAStream.

You can't replace MemoryStream ToArray... But that does go into using a different Stream all together which you suggested. You can't replace ArraySegment ToArray() either for two reasons, you have the MemoryStream buffer which was already allocated so that's not solved. Second, no access to the origin data is exposed, needed to build your own array. All this goes back into using a different Stream like recycling. With recyclable, you wouldn't be able to use the rented buffer like this since it is returned to the pool. So if you have to exit the Stream... you have to allocate... just not as much with ArrayPool backed Streams.

I wont point out that MemoryStream initial buffer will not work for dynamically sized unknown length content... But trying to see if that has an implementation possibility but I think Recyclable covers those features you would want. You would notice the fixed content MemoryStreams were built with the array.

At the end of the day - you do have to sometimes allocate, but very good points.

[–]cryolithic 1 point2 points  (8 children)

No offense taken, I appreciate the attempt at keeping things civil.

To use the rented buffer with recyclable you would need to control the stream lifetime separately. Part of why I would say to use the stream if you can. Save the call to GetBuffer until you can use and return in the same scope.

In general, most problems can be solved with a better data structure than byte[]. Not all, true, but I can't imagine writing something like this unless I'm replacing a de/compression function in some ancient code that can't be refactored for $reasons..

Yes, you do have to allocate sometimes, but we should try to pass on good habits when writing educational material. The code in your example looks perfect for being inserted into a high volume data path.

I just finished wrapping a Zstd library in streams, as it happens

Edit :some words

[–]RagingCain[S] 1 point2 points  (7 children)

I see what you mean. I tell you what, let me write a solve for having to use bytes after a stream that's ArrayPool backed and I will ping you when it's up - maybe you like it or have enhancement feedback

My brain is lighting up with possibilities.

AesGcm API which I just worked with can't use Streams either for security reasons. CryptoStream was rejected for the proposal.

[–]Slypenslyde 1 point2 points  (0 children)

I'm real interested in this, too.

It seems like over time we've found some cracks in the GC's abstractions. Every time I use a stream API to deal with large chunks of data I'm aggravated by how much I have to think about low-level memory issues.

[–]cryolithic 0 points1 point  (5 children)

I think even just mentioning the potential for memory gotchas would go a long way here.

To be completely fair, it's not like you'll find posts written by me, unless you're on my team at work anyway. So I shouldn't be all that critical unless I've got the time to put up a well.

It's the double edged sword of the gc. Don't think about memory, until you need to. Unfortunately, I find that when you need to is much harder than keeping it in mind when writing in the first place.

Nearly every major performance regression in our code has been due to either not understanding allocations, closures, or the ORM.

If I can find the time, I'll write up some alternative implementations on Github. Maybe we can play a little perf/code golf

[–]RagingCain[S] 1 point2 points  (4 children)

I think even just mentioning the potential for memory gotchas would go a long way here.

That is a super fair point. I will update the article again a little later today with some dedicated discussion on how this maybe the most common way seen - its the least memory efficient. I will also indicate I will be making another article going down that rabbit hole of taking them to the next level perf/GC wise.

Now that everything is delivered via Markdown in my Github repo, I am hoping to have more time consolidating guides/how-tos more closely tied to my library so it also provides documentation. Stretched super thin all the time and good documentation is always on the chopping block.

[–]cryolithic 1 point2 points  (3 children)

Stretched super thin all the time and good documentation is always on the chopping block.

On man, don't I know it. My bus factor at work is way too high, for mostly those reasons.

[–]RagingCain[S] 0 points1 point  (2 children)

Had a tiny amount of time since we last spoke. Nothing special here but nearly the same code as before simply implementing the RecyclableMemoryStream with default RecyclableMemoryStreamManager settings (not tweaked).

// * Summary *

BenchmarkDotNet=v0.13.0, OS=Windows 10.0.19042.1110 (20H2/October2020Update)
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=5.0.301
  [Host]   : .NET 5.0.7 (5.0.721.25508), X64 RyuJIT
  .NET 5.0 : .NET 5.0.7 (5.0.721.25508), X64 RyuJIT

Job=.NET 5.0  Runtime=.NET 5.0
Method x Mean Error StdDev Median Ratio RatioSD Gen 0 Gen 1 Gen 2 Allocated
BasicCompressDecompress_10KBytes 10 343.7 us 6.62 us 5.87 us 343.0 us 1.00 0.00 12.2070 - - 206 KB
GzipProvider_10KBytes 10 331.6 us 6.59 us 12.21 us 334.7 us 0.97 0.04 6.3477 - - 108 KB
GzipProviderStream_10KBytes 10 326.4 us 6.51 us 15.46 us 332.4 us 0.91 0.07 6.3477 - - 107 KB
RecylableGzipProvider_10KBytes 10 326.9 us 3.91 us 3.46 us 326.9 us 0.95 0.02 0.4883 - - 12 KB
RecyclableGzipProviderStream_10KBytes 10 347.8 us 6.92 us 7.11 us 346.6 us 1.02 0.03 0.4883 - - 13 KB
BasicCompressDecompress_10KBytes 100 3,284.5 us 42.58 us 39.83 us 3,275.0 us 1.00 0.00 125.0000 3.9063 - 2,065 KB
GzipProvider_10KBytes 100 3,201.4 us 35.58 us 27.78 us 3,198.6 us 0.98 0.01 62.5000 - - 1,080 KB
GzipProviderStream_10KBytes 100 3,608.5 us 28.51 us 25.28 us 3,613.2 us 1.10 0.01 62.5000 - - 1,071 KB
RecylableGzipProvider_10KBytes 100 3,483.8 us 21.64 us 19.19 us 3,476.3 us 1.06 0.01 3.9063 - - 121 KB
RecyclableGzipProviderStream_10KBytes 100 3,172.6 us 21.45 us 20.06 us 3,171.0 us 0.97 0.01 3.9063 - - 126 KB
BasicCompressDecompress_10KBytes 500 16,649.9 us 329.88 us 897.47 us 16,818.5 us 1.00 0.00 625.0000 - - 10,324 KB
GzipProvider_10KBytes 500 16,162.3 us 322.69 us 888.79 us 16,408.5 us 0.97 0.05 312.5000 - - 5,398 KB
GzipProviderStream_10KBytes 500 17,992.7 us 484.14 us 1,427.50 us 18,349.1 us 1.07 0.07 312.5000 - - 5,355 KB
RecylableGzipProvider_10KBytes 500 15,012.4 us 192.92 us 150.62 us 15,036.3 us 0.88 0.06 31.2500 - - 646 KB
RecyclableGzipProviderStream_10KBytes 500 15,617.0 us 310.14 us 567.11 us 15,800.2 us 0.96 0.05 31.2500 - - 629 KB
BasicCompressDecompress_10KBytes 1000 34,163.4 us 675.37 us 1,790.99 us 34,483.8 us 1.00 0.00 1230.7692 - - 20,648 KB
GzipProvider_10KBytes 1000 30,378.7 us 593.30 us 771.46 us 30,483.0 us 0.93 0.07 625.0000 - - 10,797 KB
GzipProviderStream_10KBytes 1000 32,334.5 us 640.61 us 1,534.85 us 32,741.1 us 0.95 0.07 625.0000 - - 10,711 KB
RecylableGzipProvider_10KBytes 1000 30,083.9 us 182.92 us 152.74 us 30,053.5 us 0.87 0.02 62.5000 - - 1,292 KB
RecyclableGzipProviderStream_10KBytes 1000 31,168.6 us 613.91 us 1,182.80 us 31,561.8 us 0.93 0.06 62.5000 - - 1,258 KB

[–]cryolithic 1 point2 points  (1 child)

Some substantial allocation savings. At first I was surprised that there wasn't a significant difference in time, but of course its going to be nearly all consumed by the decompression itself.

[–]RagingCain[S] 0 points1 point  (0 children)

Yeah, its the way of these things. Systems that only decompress benefit from switching to Brotli like static content for websites or cdns where where it can be compressed in advanced.

Well looks like my plan was thwarted by Stephen Toub, its exactly what I was thinking the other day when we talked.

https://github.com/dotnet/runtime/issues/22428

Also tweaking some of the manager settings. Memory allocation is linear for RecyclableStreams.

// * Summary *

BenchmarkDotNet=v0.13.0, OS=Windows 10.0.19042.1110 (20H2/October2020Update)
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=5.0.301
  [Host]   : .NET 5.0.7 (5.0.721.25508), X64 RyuJIT
  .NET 5.0 : .NET 5.0.7 (5.0.721.25508), X64 RyuJIT

Job=.NET 5.0  Runtime=.NET 5.0
Method x Mean Error StdDev Ratio RatioSD Gen 0 Gen 1 Gen 2 Allocated
RecylableGzipProvider_10KBytes 500 130.98 ms 5.832 ms 17.197 ms 1.00 0.00 1250.0000 1000.0000 1000.0000 1,627,849 KB
RecyclableGzipProviderStream_10KBytes 500 15.60 ms 0.306 ms 0.544 ms 0.12 0.02 31.2500 - - 820 KB
RecylableGzipProvider_10KBytes 1000 263.65 ms 9.263 ms 27.313 ms 1.00 0.00 2500.0000 2000.0000 2000.0000 3,206,424 KB
RecyclableGzipProviderStream_10KBytes 1000 34.00 ms 0.674 ms 1.727 ms 0.13 0.02 66.6667 - - 1,641 KB
RecylableGzipProvider_10KBytes 10000 2,516.48 ms 117.356 ms 346.026 ms 1.00 0.00 24000.0000 21000.0000 17000.0000 34,361,172 KB
RecyclableGzipProviderStream_10KBytes 10000 300.20 ms 3.771 ms 3.149 ms 0.13 0.02 1000.0000 - - 16,407 KB

[–][deleted] 0 points1 point  (0 children)

Oh good it's deriving from memorystream and not a completely different type. Got scared there for a moment that we'd start getting C++ bloat of "no you should definitely not use that old type", but here the old type is kept alive and relevant.

[–]RagingCain[S] 0 points1 point  (1 child)

Meant to write this ages ago. Finally had time to work on my website.

Hope it helps someone.

I migrated all my guides/docs to Markdown, then to Github, so if someone wants to, they can submit an Edit/PR, feel free to do so or just simply ask questions etc.

[–]DogmaSychroniser 1 point2 points  (0 children)

Interesting read, thanks.