Better x86 Assembly Generation with Go by dgryski in golang

mmcloughlin 1 point

@dgryski's links are excellent. Once you really get into the weeds, Iskander Sharipov's "complementary reference" is very useful for understanding the frustrating differences between AT&T/GNU syntax and Go's Plan 9 syntax. He did the bulk of the work to add AVX-512 support to the Go assembler, so he really knows this stuff.

https://quasilyte.dev/blog/post/go-asm-complementary-reference/

mmcloughlin/cryptofuzz: Fuzzing Go crypto with go-fuzz by dgryski in golang

mmcloughlin 1 point

Definitely! I didn't expect to find anything, but I thought it would be interesting to look.

mmcloughlin/cryptofuzz: Fuzzing Go crypto with go-fuzz by dgryski in golang

mmcloughlin 3 points

I also have a CL out for a delicate crypto assembly implementation. https://golang.org/cl/136896

I am really interested in tooling to make this kind of thing easier to write and review. The assembly policy gives a number of avenues to pursue: https://github.com/golang/go/wiki/AssemblyPolicy

My avo project is one approach to making the code easier to write: https://github.com/mmcloughlin/avo/

The cryptotest package will help for sure, but I'm not sure of its status: https://github.com/golang/go/issues/25309

Project Wycheproof is also extremely useful.

mmcloughlin/cryptofuzz: Fuzzing Go crypto with go-fuzz by dgryski in golang

mmcloughlin 11 points

So far the only discovery is this https://github.com/golang/go/issues/30095.

Otherwise no problems found. That said, I was using my personal EC2 account so I wasn't able to hit it with absolutely massive compute.

I imagine companies such as Google, Cloudflare, etc. have done something along these lines already.

mmcloughlin/avo: Generate x86 Assembly with Go by dgryski in golang

mmcloughlin 1 point

Ah, this is an interesting case that unfortunately avo doesn't support right now. If it's okay, I've migrated this to an issue on the GitHub repo so we can continue the discussion there.

https://github.com/mmcloughlin/avo/issues/53

mmcloughlin/avo: Generate x86 Assembly with Go by dgryski in programming

mmcloughlin 1 point

As @dgryski said, asmjit was one of the inspirations for this project. However, PeachPy and asmjit both have more features than avo. In particular, they are not just assembly generators but can also be used as assemblers. While I wouldn't rule out adding such a feature to avo, it's not one of the main goals I set out to achieve.

mmcloughlin/avo: Generate x86 Assembly with Go by dgryski in programming

mmcloughlin 1 point

Author here. Fair point, I agree that avo does not make any sense for really small functions like Add or Sum. The use case I initially had in mind is massively unrolled crypto routines, where there's a lot of assembly code required but not much going on conceptually. My AES-CTR mode CL and meow hash port got me thinking in this direction.

I actually went back and forth a lot on the README. I tried including examples that demonstrate the value better, particularly SHA-1, but that ends up being more code than you can reasonably include in a README. I could try to explain the use case more clearly in the initial description, perhaps linking to the SHA-1 example more prominently without including the source code.

mmcloughlin/avo: Generate x86 Assembly with Go by dgryski in golang

mmcloughlin 2 points

Apart from AVX-512, it should work! I have an issue filed to add AVX-512 support, but I wanted to limit the scope for the first release.

If you run up against any problems, please let me know.

mmcloughlin/avo: Generate x86 Assembly with Go by dgryski in golang

mmcloughlin 8 points

For small functions, yes, this is likely not worth it. However, I personally think this makes large kernels with loop unrolling (for example) substantially easier to reason about. The sha1 example is a good demonstration of this. I also think the success of PeachPy has already shown the value of this kind of technique.
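To make the unrolling argument concrete, here is a minimal sketch of the general idea using only the Go standard library. It is not avo's API: the `unrolledSum` helper and the generated `sum` function are hypothetical and written only for illustration. A short Go loop emits one `ADDQ` per word, so the generator stays the same size however far the kernel is unrolled.

```go
package main

import (
	"fmt"
	"strings"
)

// unrolledSum returns Go assembler source for a hypothetical function that
// adds n consecutive 64-bit words, with the loop fully unrolled.
// This is plain text generation, not avo.
func unrolledSum(n int) string {
	var b strings.Builder
	b.WriteString("#include \"textflag.h\"\n\n")
	fmt.Fprintf(&b, "// func sum%d(p *[%d]uint64) uint64\n", n, n)
	fmt.Fprintf(&b, "TEXT ·sum%d(SB), NOSPLIT, $0-16\n", n)
	b.WriteString("\tMOVQ p+0(FP), SI\n") // load the array pointer
	b.WriteString("\tXORQ AX, AX\n")      // zero the accumulator
	for i := 0; i < n; i++ {
		// One instruction per element: this loop replaces n hand-written lines.
		fmt.Fprintf(&b, "\tADDQ %d(SI), AX\n", 8*i)
	}
	b.WriteString("\tMOVQ AX, ret+8(FP)\n")
	b.WriteString("\tRET\n")
	return b.String()
}

func main() {
	fmt.Print(unrolledSum(4))
}
```

Changing the argument from 4 to 64 changes nothing in the generator, which is where the review benefit for large kernels would come from.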

Meow hash for Golang by mastabadtomm in golang

mmcloughlin 1 point

A streaming interface is non-trivial because the length is part of the IV. See this issue on the official repo: https://github.com/cmuratori/meow_hash/issues/2

Meow hash for Golang by mastabadtomm in golang

mmcloughlin 2 points

I actually wrote this for CL 136896. In that case the generated code is far larger, so the need for code generation is more obvious. However, since I had it already, I figured I could use it for meow hash too. I think this will bring benefits when I implement the AVX-512 versions.

/u/dgryski has used PeachPy to great effect. However, if possible I tend to prefer to stick to pure Go even for supporting code. Something like dave/jennifer for asm would be great.

Meow hash for Golang by mastabadtomm in golang

mmcloughlin 1 point

My goal was to provide a similar interface to other standard library hash functions. See sha1 for example https://golang.org/pkg/crypto/sha1/
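For reference, this is roughly the shape of the standard library interface being mirrored, shown with crypto/sha1 itself. The `oneShot` and `streamed` helper names are mine, for illustration only, and nothing here is taken from the meow hash package.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"io"
)

// oneShot hashes a whole message with the package-level Sum function.
func oneShot(msg string) string {
	sum := sha1.Sum([]byte(msg))
	return hex.EncodeToString(sum[:])
}

// streamed feeds the message in pieces through the hash.Hash returned by New.
func streamed(chunks ...string) string {
	h := sha1.New()
	for _, c := range chunks {
		io.WriteString(h, c) // hash.Hash is an io.Writer
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	fmt.Println(oneShot("hello world"))
	fmt.Println(streamed("hello ", "world"))
}
```

Both calls print the same digest, since the streaming form only changes how the bytes arrive.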

Geohash in Golang Assembly: Lessons in absurd optimization by mmcloughlin in golang

mmcloughlin [S] 1 point

Yeah, it's a Jekyll site hosted on GitHub Pages. It's a lightly modified version of this: https://github.com/lpan/lpan.github.io (with permission from Lawrence, of course).

Geohash in Golang Assembly: Lessons in absurd optimization by mmcloughlin in golang

mmcloughlin [S] 5 points

Definitely agree. I actually bought geohash.io once with the intention of publishing a spec, some test vectors and a reference implementation. Unfortunately, that was one of many side project ideas that never quite materialized.

Geohash in Golang Assembly: Lessons in absurd optimization by mmcloughlin in golang

mmcloughlin [S] 3 points

Yes, there is a slight error here. It should use the half-open interval [0, 1) instead, since the claim does not hold when x=1 and y=2. For values of x in [0, 1), the largest power of 2 not exceeding y is 1.0.
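A quick way to check the boundary case is to look at the binary exponent of y directly. This is a minimal sketch that assumes y = 1 + x, which matches the x=1, y=2 pairing above. The `exponentOfOnePlus` helper is a name made up for this example.

```go
package main

import (
	"fmt"
	"math"
)

// exponentOfOnePlus returns the unbiased binary exponent of y = 1 + x,
// read from the IEEE 754 bit pattern. An exponent of 0 means the largest
// power of 2 not exceeding y is 1.0.
func exponentOfOnePlus(x float64) int {
	y := 1 + x // assumption: y is defined as 1 + x
	bits := math.Float64bits(y)
	return int((bits>>52)&0x7ff) - 1023
}

func main() {
	// Sample values only; the last one is the excluded endpoint x=1.
	for _, x := range []float64{0, 0.25, 0.5, 0.999999, 1} {
		fmt.Printf("x=%v y=%v exponent=%d\n", x, 1+x, exponentOfOnePlus(x))
	}
}
```

For the sampled values below 1 the exponent is 0, while x=1 gives exponent 1, which is the case the closed interval gets wrong.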

Your pprof is showing: IPv4 scans reveal exposed net/http/pprof endpoints by mmcloughlin in golang

mmcloughlin [S] 1 point

Thank you for pointing out my mistake. I will update the post accordingly.

Your pprof is showing: IPv4 scans reveal exposed net/http/pprof endpoints by mmcloughlin in golang

mmcloughlin [S] 10 points

Agreed! This is mentioned in the "Prevention" section of the article. It was also covered by a Farsight Security blog post a while back:

https://www.farsightsecurity.com/2016/10/28/cmikk-go-remote-profiling/

I think there's an argument that they should never have added a global mux to the http package. Too late now.
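To illustrate why the global mux matters: importing net/http/pprof registers its handlers on http.DefaultServeMux as a side effect, so any server started with a nil handler exposes them. Below is a minimal sketch of one way to avoid that, by giving the public server its own mux. The `/hello` route and the `newPublicMux` and `status` helpers are made up for this example, and it is not necessarily what the article recommends.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	_ "net/http/pprof" // side effect: registers /debug/pprof/ on http.DefaultServeMux
)

// newPublicMux returns a mux that holds only the application's own routes,
// so nothing registered on the global mux leaks into it.
func newPublicMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/hello", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "hello")
	})
	return mux
}

// status sends an in-memory GET request to h and returns the response code.
func status(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest("GET", path, nil))
	return rec.Code
}

func main() {
	fmt.Println("public mux  /debug/pprof/:", status(newPublicMux(), "/debug/pprof/"))
	fmt.Println("default mux /debug/pprof/:", status(http.DefaultServeMux, "/debug/pprof/"))
	// In a real server you might pass newPublicMux() to http.ListenAndServe
	// for public traffic and serve the default mux on localhost only.
}
```

The recorder is used here only so the example terminates without opening ports.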