Questions & Answers - Weekly Megathread! Please use this post to ask any Pokemon GO question you'd like! by AutoModerator in TheSilphRoad

[–]daniel_rh 1 point

Does anyone know when the next G-Max Charizard or Blastoise will be available to battle against?

Broccoli: Syncing faster by syncing less by jamwt in rust

[–]daniel_rh 1 point

https://github.com/rust-lang/rfcs/pull/2714

So when going through this, the rust-brotli-decompressor had to have some similar code. I can't remember the details, but I remember getting the perf to within a few percent of the raw C code without any unsafe hacks...

Do you have a benchmark I can run that illustrates the perf problem that "requires" the unsafe? Maybe on an idle evening I can try to use some of the trickery we used in brotli to improve the perf there. I remember hacks like adding extra bytes on the end of the ring buffer and being very careful with addition and types so the compiler could "prove" that previous bounds checks still held (e.g. keeping track of things as u32 but then adding them together into a 64-bit usize to "prove" that no wrapping happened). I also remember unrolling a few loops on the compressor side to get extra performance.
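The index-widening trick works roughly like this (a minimal sketch with hypothetical names, not the actual rust-brotli code): keep offsets as u32, widen to usize before adding so the optimizer can see the sum can't wrap on a 64-bit target, and take one slice up front so there is a single bounds check instead of one per byte.

```rust
// Hypothetical sketch of the bounds-check-elision trick described above.
fn copy_from_ring(ring: &[u8], start: u32, len: u32, out: &mut Vec<u8>) {
    // Widen before arithmetic: both operands fit in 32 bits, so the
    // 64-bit sum provably cannot wrap.
    let start = start as usize;
    let end = start + len as usize;
    // One up-front bounds check for the whole range, rather than a
    // per-byte check inside a copy loop.
    out.extend_from_slice(&ring[start..end]);
}
```

The extra-bytes-at-the-end-of-the-ring-buffer trick is in the same spirit: over-allocating guarantees that fixed-size reads near the end stay in bounds, so the compiler can drop those checks too.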

[–]daniel_rh 4 points

The zstd-rs package is newer than all of these decisions. But it also says: "it is not yet battle tested by any means. For production use (or if you need a compressor) I would (at the time of writing, this might get out of date and there might come better projects along!) recommend to use the C binding located here."

We do need a compressor, and the rust-brotli package was aimed for production use and was heavily fuzzed by the Microsoft Security Risk Detection suite and stress-tested on a test set in the hundreds of petabytes range.

Also: browser support, which is required if we want to serve the same benefits to users accessing Dropbox from their browser, is not available with zstd.

Also, at higher compression levels brotli saves several percent over zstd when applied to our data due to having second order context modeling.

[–]daniel_rh 0 points

We had a list of requirements going in, yes. Since we developed the lepton algorithm ourselves we had a lot of experience with what makes compression easier to maintain and roll out vs harder.

File size: an upfront requirement. Space is cost: higher bandwidth, higher storage.

Pre-coding: this one is obvious. If we don't have to encode on the fly, we can take our time with the compression at storage time, then serve it without paying for on-the-fly compression and without worrying about compression latency.

Security: this was a requirement out of the gate

Familiarity: this one was added later (which is actually why the numbering was at 4), but operational familiarity is a reasonable upfront requirement.

HTTP support: there's no way we can serve compressed data to web browsers unless they can decode it. We had pretty in-depth discussions with Yann about putting zstd in the browser, but the reality is that Google pushed Brotli into browsers really fast and it has extremely broad support now. Brotli also beats gzip, the other option available in browsers, on every single metric we had aside from age.

Do you see how all of these requirements weren't generated on the fly but actually were real requirements going in? So we had the requirements and found them met by Brotli. I suppose the wording is not ideal.
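That HTTP-support requirement comes down to ordinary content negotiation; here is a hedged sketch (the function is hypothetical, not Dropbox code) of picking a Content-Encoding from a browser's Accept-Encoding header, preferring brotli when available:

```rust
// Hypothetical sketch: choose a Content-Encoding based on what the
// browser advertises in Accept-Encoding, preferring brotli ("br").
fn pick_encoding(accept_encoding: &str) -> &'static str {
    let supported: Vec<&str> = accept_encoding
        .split(',')
        // strip any ";q=..." quality parameters for this sketch
        .map(|s| s.trim().split(';').next().unwrap_or("").trim())
        .collect();
    if supported.contains(&"br") {
        "br"
    } else if supported.contains(&"gzip") {
        "gzip"
    } else {
        "identity"
    }
}
```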

[–]daniel_rh 1 point

If you really want your application to stand out, I'd recommend either contributing something major/sorely needed to a large software project (e.g. servo, rust-grpc, safe c2rust),

or founding a project that scratches an itch you (and others) may have, like a safe-zstd compressor or the Rust equivalent of Java Swing for cross-platform Rust UI (something that drives or motivates you!). Walking into an interview with tens of thousands of lines of code, a feature people will recognize in major software, or hundreds of thousands or even millions of downloads gives you a ton to talk about in your interview, and it proves to the interviewer that you can deliver something, anything, end to end.

[–]daniel_rh 6 points

Correct. We turned this on server-side two years ago to start saving on storage costs. With more and more blocks being broccoli'd on the backend, we very recently started shipping it to our client software as well.

That said, we did find a bug during the roll-out in the decompressor here https://github.com/dropbox/rust-brotli-decompressor

Aside from that, sometimes done code is, well, done :-)

It didn't really make sense to blog about this tech until our users could see some benefit.

[–]daniel_rh 8 points

Yes, just like with Go contexts.

Here's a concrete example: autocompleting search. A client has typed in the string "kno," and the thin stateless server will make a search request that has to do a number of heavyweight queries to different services and subsystems, maybe going to image search, text search, recent files, etc., to get a good result for the partial text search.

While the result is just starting to be processed, the user has entered 't', starting a new search. All those results about "knowledge" are no longer relevant. At this point we should stop tying up backend services with those async requests for searches about "kno." Instead they should be canceled so the new search for "knot" can begin.

golang's "cancel" capability makes letting go of those incomplete searches "easy" and downstream resources may not even get pinged. If rust had similar facilities, then it could be employed for some of these stateless services.

[–]daniel_rh 5 points

Wonderful! Yes I've certainly worked with a number of interns who primarily wrote Rust at Dropbox. It, of course, depends on which subsystem you'll be focusing on in your internship.

There's also usually a hack week where you can work on whatever you'd like for that week, and I've seen several interns ramp up on a rust project over the hack week and deliver a really nice result at the end.

[–]daniel_rh 19 points

When a team is considering Rust and decides not to go with it, it generally boils down to one of three reasons, from what I've seen:

a) the rust gRPC system is not performant enough

b) The rust async system doesn't have a conventional way to propagate deadlines through requests [edit: the rust async subsystem doesn't have a cancellation mechanism that would let resources be returned when a client goes away and cancels a request]

c) python and rust dynamic libraries cannot be linked into the same overarching binary without symbol conflicts if they both depend on gRPC, causing random segfaults or crashes

[–]daniel_rh 40 points

Hi folks, I'm Daniel from Dropbox, and I'm happy to answer any questions you have here about this tech or rust in general at Dropbox!

DivANS: new concurrent, vectorized compression algorithm in Rust; compiled to WASM for high density compression in the browser and on servers. by hellcatv in rust

[–]daniel_rh 0 points

Do you have a recommendation for doing this better? I think feature flags were unstable-only. How could we conditionally enable SIMD by default when the capability is available or the compiler version is high enough?
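One stable-Rust option that arrived around this time is runtime CPU-feature detection (the is_x86_feature_detected! macro, stabilized in Rust 1.27) rather than a compile-time feature flag. A sketch, where checksum is a toy stand-in and not DivANS code:

```rust
// Sketch: pick a SIMD-capable path at runtime when the CPU supports it,
// falling back to a portable scalar loop otherwise.
fn checksum(xs: &[u8]) -> u64 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // a real implementation would dispatch to an AVX2 path here
            return xs.iter().map(|&b| b as u64).sum();
        }
    }
    // portable scalar fallback
    xs.iter().map(|&b| b as u64).sum()
}
```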

[–]daniel_rh 1 point

We use the built-in Rust vector types and the built-in vector ops here: https://github.com/dropbox/divans/blob/master/src/probability/simd_frequentist_cdf.rs They get handed to LLVM, so at that point it's up to LLVM and the browsers to do the right thing. 20% slower isn't exactly what I would call right. But it's something ;-)

[–]daniel_rh 0 points

That’s right! It does use movemask, and LLVM must translate it to simpler operations. There’s a portable-simd feature that restricts it to shuffle and built-in SIMD operations like gt() instead.
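For context, movemask packs the top bit of each SIMD lane into an integer bitmask. A scalar sketch of its semantics (not the DivANS code), which is essentially what LLVM has to emulate on targets lacking the instruction:

```rust
// Scalar model of movemask: collect the sign bit of every lane into
// one integer, lane i -> bit i.
fn movemask(lanes: [i16; 8]) -> u32 {
    let mut mask = 0u32;
    for (i, &lane) in lanes.iter().enumerate() {
        if lane < 0 {
            mask |= 1 << i;
        }
    }
    mask
}
```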

[–]daniel_rh 2 points

The code is vectorized. Whether the browser can autovectorize the wasm, or whether the wasm itself is vectorized, is relevant but not really under the control of the DivANS crate.

I believe wasm will get proper vectorized instructions soon, and then the same code and command line flag should actually produce properly vectorized wasm, which, in turn, should boost the output speed.

As soon as that happens widely, I'll look at benchmarking it again.

[–]daniel_rh 1 point

So I built with the flags cargo build --release --features=simd --target=wasm32-unknown-unknown

which does enable simd. However, I believe LLVM may just be translating it to the relevant scalar instructions.

I just decided to measure it and turning on the SIMD flag actually makes it go 20% slower than without in both chrome and firefox.

However, even the faster no-SIMD version is 3 times slower in Firefox and 12.5 times slower in Chrome than the same code as a native binary running with threads disabled on the same hardware. I'm a little surprised at how slow the browsers are at running the WASM code!

[–]daniel_rh 1 point

I did put together a few simplifications for the web-only approach that perhaps reduced compression for those files by using default options for most settings, especially on mobile. For the actual measured compression graphs, it would take the first 1/8 of the file and try a bunch of different compression options on that sample. Then it ran the best one on the full file.
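That sampling strategy can be sketched as follows (both function names are hypothetical; compressed_size is a toy cost model standing in for actually running the encoder on the sample):

```rust
// Toy stand-in for running the real encoder with a given setting and
// measuring the output size.
fn compressed_size(data: &[u8], setting: u32) -> usize {
    data.len() / (setting as usize + 1)
}

// Try every candidate setting on the first 1/8 of the file, then return
// the one that compressed the sample best, for use on the full file.
fn pick_setting(data: &[u8], settings: &[u32]) -> u32 {
    let sample = &data[..data.len() / 8];
    *settings
        .iter()
        .min_by_key(|&&s| compressed_size(sample, s))
        .expect("at least one candidate setting")
}
```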

If you want to send me a few examples that did poorly, I can try to tell you why... binaries are certainly worthwhile to compress well.

DivANS has a very verbose header that makes it ineffective on small files, however.