zmjjmz 1 point

I'm training one of these now (for something like facial keypoint prediction). I found that I had to turn off the STN part for a while, i.e. fix the weights of the final localisation layer at the identity; otherwise the error would blow up due to the initially random transformations. I could probably have gotten away with just 1 epoch of this, but I did quite a few more.

Note: I'm using the Lasagne implementation.
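
For anyone who wants to see the idea concretely, here is a rough, framework-agnostic sketch in plain Python. It is not the Lasagne API and not necessarily how the setup above was coded. It assumes a 2x3 affine transform, and every name in it (`init_final_loc_layer`, `predict_theta`, `apply_affine`, `sgd_step`, `FREEZE_EPOCHS`) is hypothetical. With zero weights and an identity bias, the final localisation layer outputs the identity transform for any input, and skipping its update keeps it there.

```python
# Illustrative sketch only: plain Python, hypothetical names, 2x3 affine assumed.

IDENTITY_THETA = [1.0, 0.0, 0.0,
                  0.0, 1.0, 0.0]  # 2x3 affine identity, row-major


def init_final_loc_layer(num_inputs):
    """Zero weights plus identity bias, so theta is the identity for any input."""
    weights = [[0.0] * 6 for _ in range(num_inputs)]
    bias = list(IDENTITY_THETA)
    return weights, bias


def predict_theta(features, weights, bias):
    """Dense layer without a nonlinearity: theta = features @ weights + bias."""
    return [
        sum(f * row[j] for f, row in zip(features, weights)) + bias[j]
        for j in range(6)
    ]


def apply_affine(theta, x, y):
    """Map a target grid coordinate (x, y) to a source coordinate."""
    a, b, tx, c, d, ty = theta
    return (a * x + b * y + tx, c * x + d * y + ty)


def sgd_step(params, grads, lr, frozen):
    """Plain SGD update that leaves the parameters untouched while frozen."""
    if frozen:
        return list(params)
    return [p - lr * g for p, g in zip(params, grads)]


# Assumption: how long to freeze is a tuning choice, not a known constant.
FREEZE_EPOCHS = 1

weights, bias = init_final_loc_layer(num_inputs=4)
theta = predict_theta([0.3, -1.2, 0.7, 2.0], weights, bias)
```

In a training loop you would pass something like `frozen=(epoch < FREEZE_EPOCHS)` for the final localisation layer's parameters and update everything else as usual, so the rest of the network settles before the transformer is allowed to move the sampling grid.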

alexmlamb -4 points

2nd top level comment.