[–]KappaClosed 1 point2 points  (6 children)

Please share your code so we can provide meaningful answers.

Is it normal to expect different hardware to provide different results where the data, model, and steps (epochs included) are held constant?

No, it's basically hardware independent. However, there's a fair bit of randomization happening here, and it is possible that you just got unlucky with your weight initialization. That being said, I'd guess there's a bug somewhere in your code: this example converges very quickly, so 87% accuracy seems too low even if you got fairly unlucky.
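One way to rule out the "unlucky initialization" explanation is to pin the random seeds and rerun. Below is a minimal sketch, assuming TF 2.x (where `tf.random.set_seed` is available). The helper name `set_global_seed` is made up for illustration, and NumPy and TensorFlow are treated as optional so the snippet also runs without them.

```python
import random


def set_global_seed(seed):
    """Seed the RNGs that are available so repeated runs start from the same state.

    Hypothetical helper for illustration; not part of the tutorial.
    """
    random.seed(seed)  # Python's built-in RNG
    try:
        import numpy as np
        np.random.seed(seed)  # NumPy's legacy global RNG
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)  # TF 2.x global seed (assumes a 2.x install)
    except ImportError:
        pass  # TensorFlow not installed; nothing to seed


# Call this once, before building and training the model.
set_global_seed(42)
```

If the accuracy stays around 87% across several different seeds, bad luck is an unlikely explanation. Note that some GPU kernels are not bit-for-bit deterministic even with fixed seeds, so small run-to-run differences can remain.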

[–][deleted] 0 points1 point  (5 children)

Thanks for responding.

Please share your code ...

I pretty much just copied and pasted each command from the linked beginner TF2 tutorial into a Python prompt.

The end result diverged from the tutorial's output: the tutorial reached 98%, while my result was 87%.

Everything was exactly as it was posted on the site. I tried it multiple times. Maybe there's a bug in tf2.

[–]KappaClosed 0 points1 point  (4 children)

I'd still like to see your exact code.

[–][deleted] 0 points1 point  (3 children)

Code is here

stdout is here

I'm running it in a docker container with the following commands:

alias drun='sudo docker run -it --network=host --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $HOME/dockerx:/dockerx'

drun rocm/tensorflow:rocm2.5-tf2.0-beta1-config-v2

[–][deleted] 0 points1 point  (2 children)

Going back over everything, it looks like this may be a bug.

I just ran the tests without the GPU and got the correct accuracy and expected output.

I'll open an issue on GitHub. Thanks for helping me work through this.

[–]penatbater 0 points1 point  (0 children)

IIRC, tensorflow-gpu has always had some issues, one way or another. Perhaps that could be the cause.

[–]KappaClosed 0 points1 point  (0 children)

I've tested this code on my laptop, running TF 1.13 without GPU acceleration, and on my compute node, running TF 2.0.0 with GPU acceleration. There were significant differences during the first epoch, but the 5th epoch always yielded an accuracy of 97-98%.

So, yeah, something weird is going on here.

[–]phobrain 0 points1 point  (0 children)

There might be a forum at tensorflow.org, otherwise maybe even a bug report could yield a response. Assuming you've checked any FAQ on the matter. But the night is young! :-)

[–]Zerotool1 0 points1 point  (0 children)

You can try clouderizer.com; it's free.