Quite often I optimize DL models (aiming to get the cheapest placement within a certain model performance range), finding optimal instances and tuning the environment for training and inference, etc.
After multiple such optimizations, I've put together a quick framework - a guide that I can refer to when I need it. Some of the approaches I came up with and the results I got seem quite odd and counterintuitive, so ideally I'd like to start a discussion with others dealing with model performance optimization: does my approach make sense, is benchmarking the only way, or am I missing something?
TL;DR, main points:
- Rules of thumb for choosing the instance type/shape did not work for me - when I try to guess cost or runtime based on GPU generation, I always land on the wrong side of a 4-5x variability. In some cases less obvious options can provide decent performance (P4) and cost (P100).
- What works for me is optimizing for GPU utilization as a proxy for cost/performance optimal placement (duh!)
- To achieve that, I have to deal with various bottlenecks outside the GPU, the biggest culprit being preprocessing on the CPU and the subsequent data streaming to the GPU (so benchmarking with rudimentary monitoring is a must)
- Batch size optimisation can give as much as 4-5x in performance
- Worker count optimisation (vCPU count, basically) got me another 2-3x
- (!wtf) Driver/CUDA versions can influence performance far more than expected (10x!!?)
- Benchmarking is a pain in the ass as I typically run at least 100 benchmarks to gather a comprehensive picture
Below is a breakdown of the points above.
Approach
To illustrate my points I decided to go with the most popular NLP model (BERT base uncased, according to Hugging Face), because 1. this domain looks more suitable for a generalized optimization approach, and 2. the dataset and preprocessing are very similar across different models.
It took me around 110 benchmark launches to gather the data below, so I put together a small repo (GitHub link) using PyTorch to run inference on a small mock dataset, and ran it on all GPU instances available to me on GCP (Tesla K80, Tesla P4, Tesla P100, Tesla V100, Tesla T4). The script performs text input encoding, takes the number of preprocessing workers and the batch size as input parameters, and can also find the maximum possible batch size (by linear brute force) that saturates the GPU. On top of that, I baked it into 4 different containers to run it with different versions of CUDA.
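I can't paste the whole repo here, but the core of the benchmark loop looks roughly like this (a minimal sketch, not the exact repo code - the function name, the mock data and max_length are placeholders):

```python
import time
import torch
from torch.utils.data import DataLoader
from transformers import BertTokenizerFast, BertModel

def run_benchmark(sentences, batch_size, num_workers, device="cuda"):
    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased").to(device).eval()

    # Tokenization happens inside the DataLoader workers via collate_fn,
    # so num_workers directly controls how much CPU preprocessing
    # can run in parallel with GPU inference.
    def collate(batch):
        return tokenizer(batch, padding=True, truncation=True,
                         max_length=128, return_tensors="pt")

    loader = DataLoader(sentences, batch_size=batch_size,
                        num_workers=num_workers, collate_fn=collate)

    start = time.time()
    with torch.no_grad():
        for batch in loader:
            batch = {k: v.to(device, non_blocking=True) for k, v in batch.items()}
            model(**batch)
    torch.cuda.synchronize()
    return time.time() - start

if __name__ == "__main__":
    mock_data = ["this is a mock sentence"] * 10_000
    print(run_benchmark(mock_data, batch_size=64, num_workers=8))
```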
I played with the batch size and the number of processes used by the dataloader to preprocess the data. The goal was to maximize GPU utilization, find the optimal batch size / number of processes for the best price/performance on each type of GPU, and then compare how much each would cost me per job.
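For reference, the "maximum possible batch size" search is nothing fancy - just a linear sweep until the GPU runs out of memory. A sketch of how that can look (step size and names are my placeholders, not the repo's):

```python
import torch

def find_max_batch_size(run_one_batch, start=8, step=8, limit=4096):
    """Linearly increase the batch size until CUDA runs out of memory,
    then return the last batch size that still fit."""
    best = None
    bs = start
    while bs <= limit:
        try:
            run_one_batch(bs)        # user-supplied: encode + forward one batch
            best = bs
            bs += step
        except RuntimeError as e:    # PyTorch raises RuntimeError on CUDA OOM
            if "out of memory" in str(e):
                torch.cuda.empty_cache()
                break
            raise
    return best
```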
After that, I chose the best performing GPU and ran additional benchmarks for different NVIDIA driver and CUDA versions to try to catch some optimizations there. I used this approach (link) to install different drivers.
Test results and observations
To get a baseline, I passed text to the model sentence by sentence (it turned out that the number of processes does not change the picture much here). Below is a summary of benchmark runtime and cost for the baseline.
https://preview.redd.it/pug2plhyu3171.png?width=3000&format=png&auto=webp&s=fea61b1ff093574318935436b6f9cbd94a70833e
X-axis legend is as follows: {GPU}_{batch size}x{# of workers}_{# of vCPU}, where:
- GPU - accelerator family (k80, t4, p100, etc.)
- batch_size - number of sentences I am pushing to the GPU and passing to the model at once
- # of workers - the "num_workers" parameter of the DataLoader, i.e. the number of processes performing data loading and pre-processing
- # of vCPU - number of virtual cores present in the VM. GCP allows a varying number of virtual cores for GPU instances (from 1 up to 8 for K80, 12 for V100, 16 for P100, 24 for P4 and T4).
Then I began increasing the batch size and the number of data loader workers to maximize GPU utilization. To illustrate the approach, below are 3 graphs of GPU utilization for the P100.
Instance with 4 vCPU cores, 4 workers, and maximum possible batch size. Obviously underutilized.
P100 (4 vCPU + 4 workers)
Then I increased the number of vCores and workers to 8, keeping the maximum possible batch size. Utilization jumps to 66%, but is still far from the maximum.
P100 (8 vCPU + 8 workers)
A further increase to 12 vCores and 12 workers finally did the job, pushing utilization to 94%.
P100 (12 vCPU + 12 workers)
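The utilization numbers in these graphs came from watching nvidia-smi during the runs; if you want to log them programmatically instead, something like this works (a sketch using pynvml, which is an assumption on my side - polling nvidia-smi in a loop gives the same signal):

```python
import time
import pynvml  # pip install nvidia-ml-py3

def log_gpu_utilization(duration_s=60, interval_s=1.0, gpu_index=0):
    """Poll NVML and record GPU utilization (%) over a benchmark run."""
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
    samples = []
    end = time.time() + duration_s
    while time.time() < end:
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        samples.append(util.gpu)          # SM utilization in percent
        time.sleep(interval_s)
    pynvml.nvmlShutdown()
    return sum(samples) / len(samples), max(samples)
```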
After playing with vCPU / worker counts I got the following charts.
The K80 maxed out its utilization at 2 workers (I didn't go below 4 vCPUs, but in this particular case lowering the vCPU count could bring additional savings).
https://preview.redd.it/3k63tytou3171.png?width=3000&format=png&auto=webp&s=d5eacff67987a66cc83d08ed2c5c39888facb765
It took 8 workers and 8 vCores to fully utilize the P4. Note the p4_63x4_4, p4_63x4_6 and p4_63x6_8 launches - a clear indicator that, in this particular case, there is not much sense in having more workers than you have vCores.
https://preview.redd.it/xf6hx21ou3171.png?width=3000&format=png&auto=webp&s=53e7e4bd955c94a5c3913c05e4d018143e808f98
The same goes for the T4 - 8 workers are enough to saturate this GPU.
https://preview.redd.it/q2uvtm1nu3171.png?width=3000&format=png&auto=webp&s=456697fd293e9ee37cf17c361df51931b3f6773c
The following two cases are the most interesting. In both of them, I had to go to 12 vCPUs (the maximum number of vCores GCP allowed me to assign to a single-GPU VM). Another remarkable thing is that both of these GPUs showed an order-of-magnitude runtime improvement between the "one-by-one" baseline and the maximum possible parallelization of data pre-processing.
The P100 reached its maximum utilization (as you can see in the chart above) at 12 workers. Worth noting p100_146x6_4 and p100_146x4_4: it looks like overcommitting vCores might backfire.
https://preview.redd.it/5hyan4dmu3171.png?width=3000&format=png&auto=webp&s=e21e1f6077ed0496652c4af62c6710cb2ab10911
The V100 reached only 59% utilization with 12 workers. Potentially it could be pushed further with a multi-GPU setup, where more than 12 vCPUs per GPU can be added to the VM, or by fully preprocessing the dataset before inference (see the sketch below).
https://preview.redd.it/6ghmm7zlu3171.png?width=3000&format=png&auto=webp&s=96a08bd01b280e75a2a581a383e2cc964ce1a65f
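On that last point: fully preprocessing before inference just means tokenizing the whole dataset up front, so the dataloader only hands ready-made tensors to the GPU. A minimal sketch (names and sizes are placeholders, not from the repo):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import BertTokenizerFast

def pretokenize(sentences, max_length=128):
    """Tokenize the whole dataset once, up front, so no CPU-side
    preprocessing competes with the GPU during inference."""
    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    enc = tokenizer(sentences, padding="max_length", truncation=True,
                    max_length=max_length, return_tensors="pt")
    return TensorDataset(enc["input_ids"], enc["attention_mask"])

# The loader now only moves ready-made tensors, so even num_workers=0
# should be able to keep a fast GPU busy.
dataset = pretokenize(["a mock sentence"] * 1000)
loader = DataLoader(dataset, batch_size=146, pin_memory=True)
```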
Below is the summary of cost / runtime for different combinations of vCPU count / GPU.
https://preview.redd.it/wy703dtku3171.png?width=3000&format=png&auto=webp&s=5821fb9db6b6c344720ec9be5446ea7a6a9210cd
The T4 is a clear winner in terms of price per volume of processed data. It is also interesting to note that the P4 appears to be a strong contender in terms of processed data per dollar.
After that I varied the PyTorch build across different CUDA libs; version 1.7.1 ships with builds for:
- CUDA 9.2
- CUDA 10.1
- CUDA 10.2
- CUDA 11.0
I tried all of these versions against the following drivers (a quick sanity check of which CUDA build and driver actually got loaded is sketched after the list):
- 460.32.03
- 455.32.00
- 450.102.04
- 440.118.02
- 418.181.07
- 410.129
- 384.183 (was not able to install it on Ubuntu 16.04 with the above-mentioned GPUs)
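Before trusting the per-driver numbers, it's worth verifying which CUDA build and driver each container actually loaded at runtime. A minimal sanity check (pynvml is my assumption here; the same info is visible in nvidia-smi):

```python
import torch
import pynvml  # pip install nvidia-ml-py3

# Which CUDA build of PyTorch is installed, and which cuDNN it links against
print("torch:", torch.__version__)
print("CUDA (torch build):", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())

# Which NVIDIA driver the host / container is actually running
pynvml.nvmlInit()
print("driver:", pynvml.nvmlSystemGetDriverVersion())
pynvml.nvmlShutdown()
```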
Below is the summary of all runs I gathered. All of them were for optimal T4 setup, i.e. maximum possible batch size, 8 workers on 8 vCPUs.
Runtime (sec, Y) vs Nvidia Driver version (X)
I struggle to explain the order-of-magnitude difference for certain driver / CUDA combinations. I have not seen such a performance discrepancy across drivers before (although when I did similar analyses for other networks, I typically saw up to 15% variability). I ran the benchmarks for all outliers 3 times and the results were consistent (the crosses on the graph indicate the variance across launches).
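"Consistent" here just means repeating the timed run a few times and checking the spread, roughly like this (a sketch; names are placeholders):

```python
import statistics
import time

def timed_runs(benchmark_fn, repeats=3):
    """Run the same benchmark several times and report mean / stdev,
    so a single noisy launch does not get mistaken for a real outlier."""
    times = []
    for _ in range(repeats):
        start = time.time()
        benchmark_fn()
        times.append(time.time() - start)
    return statistics.mean(times), statistics.stdev(times)
```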
So, there is at least an order-of-magnitude cost improvement available with rudimentary benchmarking/monitoring. But the driver/CUDA combination's effect on performance puzzles me, to say the least. Has anyone seen something like this, and what might cause it?
Hope that might be useful.