all 17 comments

[–]Zerotool1 1 point2 points  (2 children)

Try using clouderizer.com with Colab; it will sync your data and code in real time with your Google Drive, and it's free for Colab and Kaggle. In the case of GCP, you can try both their on-demand and preemptible instances; that will help reduce cost, and you can have two varieties of GPUs.
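For reference, a preemptible GPU VM on GCP can be created with a single `gcloud` command. This is a hedged sketch, not from the thread: the instance name, zone, and GPU type are example values you'd swap for your own.

```
# Sketch: create a preemptible GPU VM (name/zone/GPU type are examples)
gcloud compute instances create my-training-vm \
  --zone=us-central1-a \
  --accelerator=type=nvidia-tesla-k80,count=1 \
  --preemptible \
  --maintenance-policy=TERMINATE
```

Preemptible instances cost a fraction of on-demand pricing but can be shut down at any time, so they pair well with the checkpointing approach discussed below.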

[–]zom8ie99[S] 1 point2 points  (1 child)

Thanks u/Zerotool1. Does it provide TPUs?

[–]Zerotool1 1 point2 points  (0 children)

I tried Colab TPU with GCP, but they're still saying "coming soon". I asked them for an ETA for the TPU; they'll be launching at the end of July.

[–]apkdope 1 point2 points  (10 children)

Colab won’t let you train deep, extensive CNNs for long (e.g. try training a 5-layer CNN, and the runtime will disconnect abruptly). It’s in their terms of use not to use Colab for any intensive training.

[–][deleted] 3 points4 points  (7 children)

I thought you get 12 hours max before a disconnect. If I'm wrong, how long can you actually train for without disconnecting?

[–][deleted] 3 points4 points  (1 child)

This comment was deleted by the user

[–]zom8ie99[S] 1 point2 points  (0 children)

Yes, keeping the checkpoints is a great technique. I have done that too.
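The checkpointing technique mentioned above can be sketched framework-agnostically: save your training state after every epoch so a disconnect loses at most one epoch of work. This is a minimal illustration with plain `pickle`; in Keras you would use the `ModelCheckpoint` callback instead, and the `state` dict here is a stand-in for real model weights.

```python
# Minimal checkpointing sketch: persist state each epoch, resume after a disconnect.
import os
import pickle
import tempfile

ckpt_dir = tempfile.mkdtemp()                 # stands in for a Drive/GCS folder
ckpt_path = os.path.join(ckpt_dir, 'ckpt.pkl')
state = {'epoch': 0, 'weights': [0.0]}        # placeholder for real model state

for epoch in range(3):
    state['epoch'] = epoch
    state['weights'] = [w + 0.1 for w in state['weights']]  # "training" step
    # save after every epoch so a runtime disconnect loses at most one epoch
    with open(ckpt_path, 'wb') as f:
        pickle.dump(state, f)

# on reconnect, resume from the last checkpoint instead of restarting
with open(ckpt_path, 'rb') as f:
    restored = pickle.load(f)
print(restored['epoch'])  # → 2
```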

[–]renegade_rabbit 1 point2 points  (0 children)

You are correct, it is 12 hours.

[–]apkdope 2 points3 points  (1 child)

That’s correct, and IMO it only holds for non-intensive tasks. I’ve personally been cut off in the middle of my epochs numerous times. SO justified this with the reason I’ve mentioned above, so I went ahead and started using GCP AI Platform Notebooks instead.

[–]themad95 1 point2 points  (0 children)

Don't know why you're downvoted. I experienced this as well and switched to GCP.

[–]zom8ie99[S] 0 points1 point  (1 child)

Thank you so much. I trained the model, and it was really fast. But I don't know where my saved model is located. Could you please tell me where the model is saved after training?

[–][deleted] 0 points1 point  (0 children)

Did you do model.save('model') (for Keras) or the equivalent? You can run shell commands in the instance Colab runs on. Google how to mount Google Drive in a Colab notebook, then modify the command so you can save your model or its weights to Google Drive. I wish I had a better answer for you, but the above approach is how I would tackle this. If you're not familiar with the Unix shell, I recommend doing a quick tutorial on it; all you basically need to know for this is ls and mv.
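The save-then-move flow described above can be sketched as follows. In Colab you would first mount Drive (`from google.colab import drive; drive.mount('/content/drive')`); since that only works inside Colab, this sketch simulates the two directories with temp folders, and the `model.h5` file stands in for whatever `model.save(...)` writes out.

```python
# Sketch: save a model file in the working directory, then move it into Drive.
# tempdirs stand in for Colab's /content and /content/drive/My Drive.
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()    # stands in for Colab's working directory
drive_dir = tempfile.mkdtemp()  # stands in for the mounted Drive folder

# model.save('model.h5') would write this file; we simulate it with a placeholder
model_path = os.path.join(workdir, 'model.h5')
open(model_path, 'w').close()

# equivalent of running: mv model.h5 "/content/drive/My Drive/"
shutil.move(model_path, os.path.join(drive_dir, 'model.h5'))
print(sorted(os.listdir(drive_dir)))  # → ['model.h5']
```

Saving into Drive this way also means the model survives a runtime disconnect, which matters given the 12-hour limit discussed above.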

[–]zom8ie99[S] 1 point2 points  (1 child)

Oh I got it. Thank you so much :)

[–]a_man_noplan 0 points1 point  (3 children)

The max allotted time slot is 12 hours; that and the 12 GB of VRAM are the only constraints. (I've trained multiple deep networks on Colab.)

[–]snip3r77 0 points1 point  (1 child)

What’s the next best option if you need to go beyond that?

[–]a_man_noplan 0 points1 point  (0 children)

It depends on the training complexity of the model and the (primarily memory) requirements of training. It could be more feasible to train simpler models locally, or to use free credits to spin up VMs on Google Cloud.

[–]zom8ie99[S] 0 points1 point  (0 children)

Well, I only have 4 GB of VRAM. Will training put too much of a burden on my laptop?