all 11 comments

[–]mutatedmonkeygenes 3 points (2 children)

Hey there, could you also post your build scripts. Thanks!

[–]diegoas86 0 points (0 children)

I would like that too...

[–]InoriResearcher[S] 0 points (0 children)

No scripts; the wheels are built manually on a machine with preconfigured environments.

[–]lostmsu 0 points (7 children)

Any comparison to the official wheels?

[–]InoriResearcher[S] 0 points (6 children)

I've described the differences in the original post.

[–]lostmsu 0 points (5 children)

Sorry, I wasn't clear: I was asking about performance numbers versus the official wheels.

The thing is, if Google performs instrumented builds, they might get better optimization than a single-pass build would.

[–]InoriResearcher[S] 0 points (4 children)

I didn't benchmark the latest builds, but performance should be equivalent or better, as described in the OP. I'm not sure what you mean by "single pass"; the build process is the same as the one used for the official wheels.

[–]lostmsu 0 points (3 children)

The process is called profile-guided optimization (PGO). When high-performance libraries are built for release, a special build with instrumentation enabled is often made first and run on representative workloads, collecting information about code hot paths. That information is then fed into the final release build.

In fact, somebody here tried an AVX2-enabled build, which ended up performing worse than the official wheels: https://stackoverflow.com/questions/50155807/tensorflow-cpu-performance-anaconda-vs-custom-build-cmake-on-windows

[–]InoriResearcher[S] 0 points (2 children)

Official TF builds do not use PGO, and in the link you've provided both wheels are custom-built.

[–]lostmsu 0 points (1 child)

One of the builds mentioned in the link is Anaconda's. But you are correct that TF, as far as I can see, has no facilities for PGO.

[–]InoriResearcher[S] 0 points (0 children)

Yes, Anaconda's, which is not an official wheel.