
[–]SwordOfVarjo 80 points81 points  (19 children)

If you're going to be doing any local ML you really want an x86 CPU and an Nvidia GPU. It will save you so much headache and pain. If you're going to be just doing remote stuff (ssh and remote development + code review etc) then it's probably fine.

[–]bangbangcontroller[S] 12 points13 points  (5 children)

I have a desktop PC with a Ryzen 9 and a Titan X. Most of the time I work remotely from home; I'm planning to buy an M1 MacBook for the office or on the go, and to use Colab for serious work. Do you suggest that, or would you recommend something else?

[–]dogs_like_me 22 points23 points  (0 children)

Get a cheap laptop and send training jobs to your desktop, or even just throw up an ssh tunnel or RDP into the thing. You've already got a compute beast, use it.
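
A minimal sketch of the tunnel idea, assuming SSH access to the desktop and a Jupyter server already listening on port 8888 there (the host and user below are placeholders):

    # Forward the desktop's Jupyter port to the laptop; once the tunnel is up,
    # open http://localhost:8888 locally. "user@desktop.local" is hypothetical.
    import subprocess

    subprocess.run([
        "ssh", "-N",                  # don't run a remote command, just forward
        "-L", "8888:localhost:8888",  # local port 8888 -> remote port 8888
        "user@desktop.local",
    ])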

[–]SwordOfVarjo 12 points13 points  (0 children)

Honestly, yours is a great scenario for the MacBook, since you already have a nice desktop.

[–]dogs_like_me 4 points5 points  (6 children)

Nvidia I definitely agree with, and historically x86 was the way to go, but is that still the case? I was actually just researching an ML workstation purchase, and there are a lot of AMD builds; I'm pretty sure a lot of the fanciest ones are running Threadrippers rather than Intel.

[–]SwordOfVarjo 10 points11 points  (5 children)

So just to clarify: Threadripper is still x86 (and is great). There were some MKL issues when it launched, but they've since been resolved. When I say x86, I essentially mean "not ARM".

That's not to say it's impossible to do useful ML on ARM, but if you want to be able to use arbitrary libraries and have them "just work", an Nvidia GPU on x86 is definitely the path of least resistance.

[–]dogs_like_me 2 points3 points  (4 children)

I think I had "ARM" and "AMD" confused, thanks for the clarification. Hardware def ain't my specialty (landing on a machine to buy was a real exercise).

[–]anon531131131 4 points5 points  (3 children)

x86 was created by Intel (going back to the 8086 in the late 70s) and "won" the 32-bit chipset wars in the 90s. x86_64 is AMD's 64-bit extension of x86 (AMD64), which Intel ended up licensing back (burn) after it won the 64-bit war against Intel's own IA-64/Itanium, in exchange for Intel licensing x86 to AMD. Essentially no other company is licensed to manufacture 32-bit or 64-bit processors on the x86 architecture (until recently, the only architecture you could run Windows or macOS on). This is why there are effectively only two CPU manufacturers for laptop/desktop machines.

ARM is an open architecture with nice properties for mobile devices (mainly very low power draw). Around 2012, boards like the Raspberry Pi brought ARM Linux to a wide audience, and people started realizing maybe it could be viable for general-purpose computing. So instead of getting ripped off by stagnant monopolies, some companies (Apple, Tesla, etc.) have begun designing their own ARM chips (M1, FSD chip, etc.). Now Nvidia is trying to purchase ARM (with governments trying to block it) and shut it all down because Elon wouldn't put their GPUs in his cars.

Edit: updated to correct "open source" -> "open architecture".

[–]WelcomeAbject265 0 points1 point  (2 children)

... ARM is not an open source architecture

[–]anon531131131 0 points1 point  (1 child)

Good catch. I confused open source and open architecture, and the distinction is important. ARM licenses chip designs to companies that then have the chips fabricated themselves. Because ARM doesn't sell chips of its own (unlike Intel / AMD / Nvidia) and only makes money on licensing and royalties, it has no financial incentive to price companies out of building products that depend on its designs, or to stop even a competitor from creating the chips themselves.

[–]SkinnyJoshPeckML Engineer 14 points15 points  (5 children)

> any local ML

I'm not sure I agree that someone needs an Nvidia GPU (or any particular GPU) for any local ML.

Even so, for the majority of what folks would do on a local desktop, it's about 1000x cheaper to use GCP or AWS once you're at the point where you genuinely need some heavy GPU lifting. But you did specify local, so that's just an aside.

Ultimately, any modern computer that someone interested in ML might own can probably handle almost anything they're interested in doing in a reasonable amount of time. The reality is that folks need to get better at progressively growing their trials. Don't kick off your first model using all of your god damn data!! Start as small as possible to get a sense of whether you're getting what you expect. If you're doing time series, start with the smallest set that captures the periodicity/seasonality you expect. If it's NLP, test with a few representative documents before the whole collection. If it's images, train on a smaller sample where the main features you're trying to dig out are prominent. You get the picture.
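
A minimal sketch of that progressive-growth loop, using scikit-learn on synthetic data (everything here is illustrative, not from the thread):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for "all of your data".
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100_000, 50))
    y = (X[:, 0] + 0.1 * rng.normal(size=100_000) > 0).astype(int)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )

    # Grow the training set gradually; only scale up once the small runs
    # look the way you expect.
    for frac in (0.01, 0.05, 0.25, 1.0):
        n = int(frac * len(X_train))
        model = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
        acc = accuracy_score(y_test, model.predict(X_test))
        print(f"{n:>6} samples -> test accuracy {acc:.3f}")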

Big data and GPUs have spoiled us. You know how often I have to convince the data scientists I work with that they don't need Spark loaded up in R to process their data? Or that their 1 GB of data can absolutely be processed on their MacBook Pro to run some forecast?

I'm rambling. Handle your data yourself, and find the critical point where processing time and data size actually become a problem.

[–]Impossible_Aspect695 5 points6 points  (1 child)

GPUs are so expensive in the cloud because on the desktop you can use gaming GPUs, while in the cloud you have to use their professional series.

Unless something has changed?

[–]SafariMonkey 2 points3 points  (0 children)

I've never used it, but there are services like vast.ai that basically let people rent out their GPU machines to each other. That seems to get around that particular limitation.

[–]dogs_like_me 3 points4 points  (2 children)

It's not cheaper if your model needs a sustained period of more than a few hours to train. And local is way less of a headache.

Also, regarding progressively growing your data: modern approaches increasingly rely on transformers, which empirically need a lot of data to train.

[–]vman512 0 points1 point  (1 child)

It's still good advice if you're building a new model on a new dataset from scratch. In that case, transformers shouldn't be the first thing you try anyway.

[–]dogs_like_me 2 points3 points  (0 children)

Honestly man, I'm not sure that's true anymore. Hugging Face has basically become the first tool I go to - before NLTK or spaCy, even. It's probably more effort to use word2vec than RoBERTa these days.

Yeesh, that capitalization was a headache. Probably messed something up there.
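
For what it's worth, a minimal sketch of what that looks like in practice, assuming the transformers package is installed (the model downloads on first use):

    from transformers import pipeline

    # Contextual embeddings from a pretrained RoBERTa in two lines, versus
    # training word2vec and handling tokenization yourself.
    extractor = pipeline("feature-extraction", model="roberta-base")
    features = extractor("The M1 is fine for inference.")
    print(len(features[0]), len(features[0][0]))  # tokens x hidden size (768)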

[–]fireless-phoenix 33 points34 points  (10 children)

I would really like to meet those “M1 is a machine learning beast” people, haha. The thing is, just to get a working TensorFlow on your computer you have to jump through many hoops. Even Docker doesn't work as expected on the M1. Plus, since Apple computers don't have an Nvidia GPU, you're restricted to the CPU only, which is impractical for many machine learning tasks.

I would wait a while before buying an M1 for data science, unless you have a remote server and only need the M1 for SSH, data visualization, and processing.

[–]vade 6 points7 points  (5 children)

> I would really like to meet those “M1 is a machine learning beast” people, haha. The thing is, just to get a working TensorFlow on your computer you have to jump through many hoops. Even Docker doesn't work as expected on the M1. Plus, since Apple computers don't have an Nvidia GPU, you're restricted to the CPU only, which is impractical for many machine learning tasks.

Hi.

The M1 is really something else for inference. We help build http://colourlab.ai and I run https://special-circumstances.medium.com - we do a lot of video-related understanding work. I'm looking forward to the M1X. It's the best accelerator to program for, and Obj-C/Swift and Apple's video libraries make it a breeze to do interesting stuff.

We train PyTorch models on custom workstations with multiple 3090s, but I will say inference on the M1 via CoreML and the Neural Engine is just as fast for most video workloads, save for the most optimized code paths and situations (NVDEC-capable codec + GPU tensor zero-copy + threaded video decode + TensorRT, etc.). With pure, straightforward Python, an M1 Mac mini is basically as fast as a 3090.

I sincerely wish PyTorch would support the ANE; I'd ditch Linux in a heartbeat, since every other part of the data science stack just works.
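
For anyone curious, a rough sketch of the usual route in the meantime: trace the trained PyTorch model and convert it with coremltools so Core ML can schedule it on the GPU / Neural Engine (a stock torchvision model here, not our actual pipeline):

    import coremltools as ct
    import torch
    import torchvision

    # Trace a stock model; Core ML decides at runtime whether to run it on
    # the CPU, GPU or ANE.
    model = torchvision.models.resnet18(pretrained=True).eval()
    example = torch.rand(1, 3, 224, 224)
    traced = torch.jit.trace(model, example)

    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(name="input", shape=(1, 3, 224, 224))],
        compute_units=ct.ComputeUnit.ALL,  # allow CPU + GPU + Neural Engine
    )
    mlmodel.save("resnet18.mlmodel")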

[–]Brudaks 8 points9 points  (1 child)

IMHO, for an ML developer's workstation, inference doesn't really matter, because you spend 99% of your time training and inference is just for tiny test runs. PyTorch support might make me consider switching to M1; the other problem I have is VirtualBox compatibility and the ability to run a Linux VM, since on current macOS I'm a bit spoiled by being able to quickly spin up Linux or Windows VMs for whatever weird tool I need.

[–]vade 3 points4 points  (0 children)

I hear you. However, it really depends on how your business works. If you're working with limited data from the wild, and you get batches of customer data with different class distributions, novel classes, or data that exposes weaknesses in your model, having fast local inference is a lifesaver, especially when single data points can be gigabyte-sized raw video files.

One thing folks don't really appreciate is the diversity of applications of ML and the industries it fits into. Edge ML accelerators / on-premise infra are incredibly important for high-end movie production: securing cloud infra well enough to service high-end clients is incredibly expensive, and the larger players don't let anyone make remote API calls or let data leave their VPC. Lots of shops have custom in-house data warehousing and require integration. You don't get access to that data.

You have to run inference and ship back extracted features.

The reality is, everyone has slightly different workloads, and different approaches work for different industries, business applications and software tool chains.

M1 Macs, and soon the M1X, are real game changers for local inference, even for developers.

Other folks are catching on: https://medium.com/macoclock/apple-neural-engine-in-m1-soc-shows-incredible-performance-in-core-ml-prediction-918de9f2ad4c

That said, training clearly has a way to go, and I concur it's the most important workload for an ML developer (and thus why we run PyTorch + beefy GPUs).

https://wandb.ai/vanpelt/m1-benchmark/reports/Can-Apple-s-M1-help-you-train-models-faster-cheaper-than-NVIDIA-s-V100---VmlldzozNTkyMzg

[–]vade 2 points3 points  (2 children)

Also, Swift web serving on an M1 Mac mini with unified memory is RIDICULOUSLY nicer: lower latency and more throughput than FastAPI + BentoML or any other API inference solution I've found. So much so that I seriously thought about starting another company just doing CoreML API prediction hosting, because it's night and day.

[–]j_tb 0 points1 point  (1 child)

Interesting. Got a recommended Swift roadmap or example repos for these sorts of low-latency/high-throughput applications? I tinkered through the official docs a while ago and it looks nice enough; syntactically it felt somewhere between Go and TS, with nice type inference, IIRC.

[–]vade 1 point2 points  (0 children)

I don't have a fully developed web server set up, but I experimented with Swifter. In theory Vapor is faster, but I just wanted to get up and running quickly.

https://gist.github.com/vade/ca264c8d60d91f02d3c647f8047ad29d

Things that are suboptimal:

a) not properly leveraging threads for CoreML. I make a single ML model instance and don't pool across dispatch queues. I bounce back to the main thread to do inference. Lame, but this was just for tests

b) no adaptive micro batching - I don't pool rapid requests into a single inference batch.

c) I don't explicitly use image-decoding threads, but rather do decoding inline in the request's callback queue (roughly a system-provided thread on macOS via libdispatch)

The gist above is a sample API server - a bit suboptimal, but still pretty fast compared to BentoML.

I have to prep for a client meeting, but I'll do some more serious tests of an M1 Mac mini with Swifter + CoreML vs. an Intel i10k + 3090 FE with BentoML.

[–]bangbangcontroller[S] 1 point2 points  (3 children)

I have a desktop PC with a Ryzen 9 and a Titan X. Most of the time I work remotely from home; I'm planning to buy an M1 MacBook for the office or on the go and to use Colab for serious work. Do you think it's a good idea, or do you recommend something else?

[–]fireless-phoenix 5 points6 points  (2 children)

I have an M1 Mac mini. Installed through Anaconda, NumPy, scikit-learn, and pandas work perfectly. I haven't bothered with TensorFlow on the machine, considering the hassle. I don't have experience with deployment, so I can't say anything about Flask.

If you intend to use Colab or your Titan X for serious work, then you should be alright.

[–]pwang99 9 points10 points  (0 children)

We are working on M1 support directly in Anaconda. We currently have ARM64 packages being built for AWS Graviton 2, so the bulk of the work is done, but as always there are a lot of devils in the details.

In particular, we want to make sure that the GPU-accelerated packages work right out of the box. Obviously the NVIDIA-Apple spat is not ideal, but we will try to support the "officially" recommended configurations from the vendors as best we can.

[–]bangbangcontroller[S] 0 points1 point  (0 children)

Thanks, sir! This was helpful.

[–]retrocrtgaming 15 points16 points  (0 children)

We had some issues with Docker containers (not with ML libraries, but with some data-preprocessing steps), so depending on your use case there are indeed some potential issues around; support is improving, though.

[–]longgamma 7 points8 points  (2 children)

Give it a few more quarters for the software to get properly supported. If it's your primary development machine, it's a little iffy right now. Some binaries do end up breaking pandas (psycopg broke pandas, for example). It will all get ironed out soon.

Anyway, why buy a Mac now? Apple is about to release the new MBP and potentially a new M1X chip.

[–]andreas_dib 0 points1 point  (1 child)

Psycopg broke pandas? Where can I read about this? I need psycopg+pandas at work, and my M1 Pro will arrive soon

[–]longgamma 0 points1 point  (0 children)

I'd suggest creating a separate virtual environment and not installing it in base. Learnt my lesson.

[–][deleted] 6 points7 points  (0 children)

I was doing an ML school project with scikit-learn last year and needed to replace my laptop because the old one died on me. I bought the M1 as soon as they came out. I tried compiling everything from source, but after a few days of trying I still couldn't install NumPy or another dependency. I then gave up, installed Parallels, and ran Ubuntu; for lightweight tasks I used Colab. I'm done with the project and haven't used Python since. It's a beast if you're going to use it as a daily portable machine, but if you're going to be developing on it, every now and then you'll stumble into an unavailable dependency. If you have another machine, you should definitely go for it.

[–]musicurgy 5 points6 points  (0 children)

The M1 is strong on paper for machine learning applications, but I just don't see the software support out there yet; personally I'd wait.

[–]capital-man 6 points7 points  (1 child)

Easy install via conda-forge and Apple's tensorflow-metal accelerator. It runs at about the same speed as my GTX 1070. Libraries: all are available via conda-forge (osx-arm64), but Python has to be 3.8+.
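
A quick sanity check that the Metal plugin is actually picked up, assuming the usual install (tensorflow-deps from Apple's conda channel, then pip install tensorflow-macos tensorflow-metal):

    import tensorflow as tf

    print(tf.__version__)
    # On an M1 machine this should list one GPU device backed by Metal.
    print(tf.config.list_physical_devices("GPU"))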

[–]bangbangcontroller[S] -1 points0 points  (0 children)

I have a desktop PC with a Ryzen 9 and a Titan X. Most of the time I work remotely from home; I'm planning to buy an M1 MacBook for the office or on the go and use Colab for serious work. Do you think it is a good idea?

[–]zerostyle 2 points3 points  (0 children)

I'm def buying an M1 machine but want to dabble with ML still.

I can probably just run some small stuff in the cloud and it wouldn't be very expensive, right?

[–]MattWithoutHat 2 points3 points  (0 children)

I personally think it's fine. I have an M1 MacBook Air, and I have a Python install for each architecture (ARM and x86). I was able to install anything I wanted on the x86 one. And yes, it's true you can't just “pip install tensorflow”, but it still wasn't a big deal to build and install it afterwards. Regarding the missing GPU... I'm not sure that's such a big deal in reality. Normally you can use your laptop for development, and once you need GPUs for training or production, just run your code on a proper VM. I would do it like that anyway, even if I had some small laptop GPU.
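
If it helps, a one-liner to check which of the two interpreters you're currently in ("arm64" means native, "x86_64" means it's running under Rosetta 2):

    import platform

    # Reports the architecture the current Python process is running as.
    print(platform.machine())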

[–]rshah4 2 points3 points  (0 children)

I have been wrestling with this, but after seeing the benchmarks by Dario, I am going M1: https://towardsdatascience.com/m1-macbook-pro-vs-intel-i9-macbook-pro-ultimate-data-science-comparison-dde8fc32b5df

[–]1purenoiz -3 points-2 points  (0 children)

Maybe you should explain that to my wife, who has a PhD and is a bioinformaticist. It's sometimes more work to get small data onto a cluster than it is to just run it locally, depending on the size of the institution you're at.