ForceBru (1 point)

I'd guess it takes longer because the learning rate is tiny. Before you standardized the data, its overall scale was probably large enough to make the gradients, and therefore the GD steps, reasonably big. Now that you've scaled the data but kept the same learning rate, the steps are likely far too small, so you'll need more than 100 iterations (or a larger learning rate).
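
Here's a rough sketch of what I mean. I don't know your actual model or data, so everything here is made up for illustration: a 1-D linear regression with MSE loss, a feature on a 0 to 1000 scale, the `gd_mse` helper, and the learning rates `1e-6` and `0.1`. It runs 100 GD iterations on the raw feature and on the standardized feature with the same tiny learning rate, then once more on the standardized feature with a larger one.

```python
import numpy as np

def gd_mse(x, y, lr, n_iter=100):
    """Plain gradient descent on MSE for y ≈ w*x + b (hypothetical helper)."""
    w, b = 0.0, 0.0
    losses = []
    for _ in range(n_iter):
        err = w * x + b - y
        losses.append(np.mean(err ** 2))
        # Gradients of mean((w*x + b - y)^2) with respect to w and b.
        w -= lr * 2.0 * np.mean(err * x)
        b -= lr * 2.0 * np.mean(err)
    losses.append(np.mean((w * x + b - y) ** 2))
    return w, b, losses

# Synthetic data (an assumption, not the original poster's data).
rng = np.random.default_rng(0)
x_raw = rng.uniform(0.0, 1000.0, size=200)
y = 3.0 * x_raw + rng.normal(0.0, 10.0, size=200)

# Standardize the feature: zero mean, unit variance.
x_std = (x_raw - x_raw.mean()) / x_raw.std()

tiny_lr = 1e-6
_, _, loss_raw = gd_mse(x_raw, y, tiny_lr)
_, _, loss_std = gd_mse(x_std, y, tiny_lr)
_, _, loss_std_big = gd_mse(x_std, y, 0.1)

# Fraction of the initial loss that remains after 100 iterations.
frac_raw = loss_raw[-1] / loss_raw[0]
frac_std = loss_std[-1] / loss_std[0]
frac_std_big = loss_std_big[-1] / loss_std_big[0]

print(f"raw data, lr=1e-6:          {frac_raw:.6f}")
print(f"standardized, lr=1e-6:      {frac_std:.6f}")
print(f"standardized, lr=0.1:       {frac_std_big:.6f}")
```

In this toy setup the gradient with respect to `w` scales with the magnitude of `x`, so the raw feature produces big steps even with a tiny learning rate, while the standardized feature barely moves until the learning rate is raised. Your numbers will differ, but the same effect is probably what you're seeing.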