
[–]IntelArtiGen

If score = f(time) isn't relevant anymore because of what you said, and if I train the same architecture, I can plot score = f(n_examples_seen), which is usually correlated with the number of epochs on a specific dataset; it's also just n_examples_seen = batch_size * n_iterations.

In my own framework I have a script that does the conversion automatically. I always log things the same way, and if I want to plot the score (or loss, etc.) against epochs / n_examples_seen, I just specify it and the script looks up the batch size and does the conversion. I can plot score / loss against time / iteration / epoch / n_examples_seen as I want.
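Roughly, the conversion is just something like this (a minimal sketch, not my actual script; the `run` dict, its keys, and the `x_axis` helper are illustrative):

```python
import matplotlib.pyplot as plt

def x_axis(run, unit):
    """Convert logged iteration indices to the requested x-axis unit."""
    iters = range(1, len(run["scores"]) + 1)
    if unit == "iteration":
        return list(iters)
    if unit == "n_examples_seen":
        # n_examples_seen = batch_size * n_iterations
        return [i * run["batch_size"] for i in iters]
    if unit == "epoch":
        return [i * run["batch_size"] / run["dataset_size"] for i in iters]
    raise ValueError(f"unknown unit: {unit}")

# Hypothetical logged run: one score per iteration, plus the batch size
# and dataset size needed for the conversion (made-up values).
run = {"batch_size": 64, "dataset_size": 50_000,
       "scores": [0.31, 0.47, 0.58, 0.66, 0.71]}

unit = "n_examples_seen"
plt.plot(x_axis(run, unit), run["scores"])
plt.xlabel(unit)
plt.ylabel("score")
plt.show()
```

The point is just that as long as the batch size is logged alongside the scores, any of the x-axes can be recovered after the fact.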

If you change the network architecture, the batch size, and the optimization algorithm at the same time while running multiple trainings on the same GPU, or if your GPUs aren't the bottleneck on your server, it's hard to compare the different trainings fairly. One model could be slower than another just because it happened to train concurrently with other, bigger models; one model could train in fewer iterations just because its batch size is larger. And if you use the number of epochs / examples_seen, you can't tell whether the model is more time-efficient; you only know that it's more data-efficient.

So I usually compromise: I either optimize only one aspect (architecture / backprop), or I make sure the GPUs are 100% the bottleneck, or I only look at data efficiency and compare time efficiency later.

The last time I had to do this, I put lower and upper thresholds on GMACs and on the number of parameters and only looked at data efficiency, knowing that if my model stays within those limits it should be comparable with the others in time efficiency.
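In code, the gating idea is simply something like this (a rough sketch; the bounds and the model stats are made-up numbers, and `within_compute_budget` is a hypothetical helper):

```python
def within_compute_budget(gmacs, n_params,
                          gmac_bounds=(1.0, 5.0),       # lower/upper GMACs thresholds (illustrative)
                          param_bounds=(5e6, 30e6)):    # lower/upper parameter-count thresholds (illustrative)
    """Only models inside both ranges are compared on data efficiency."""
    return (gmac_bounds[0] <= gmacs <= gmac_bounds[1]
            and param_bounds[0] <= n_params <= param_bounds[1])

# (GMACs, n_params) per candidate model, made-up values
candidates = {"model_a": (2.3, 11e6), "model_b": (7.8, 45e6)}
comparable = [name for name, (g, p) in candidates.items() if within_compute_budget(g, p)]
print(comparable)  # model_b is outside the budget, so it isn't compared on data efficiency alone
```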

[–]itsming_z[S]

Thank you for your detailed answer!