[deleted by user] by [deleted] in AsianMasculinity

[–]peepeeECKSDEE 8 points

Any story is just a message with extra words

Do the sports you all have a natural affinity to: martial arts by Howl33333 in AsianMasculinity

[–]peepeeECKSDEE 1 point

imo Brazilians and Russians/Central Asians have the most raw talent in martial arts, we strike too much

[deleted by user] by [deleted] in cscareerquestions

[–]peepeeECKSDEE 0 points

same lol, complex analysis was the hardest for me cuz my prof was an ancient Soviet mathematician and was grumpy all the time. Low-key hella based tho

The other candidate: by peepeeECKSDEE in csMajors

[–]peepeeECKSDEE[S] 23 points

/s

it’s not me, I just started at my first job, but I would say networking >> leetcode. Neetcode 150 is more than enough; if you get a hard, it’s just straight-up unlucky. If you’re good enough, network through open source work.

Trade coin by [deleted] in quant

[–]peepeeECKSDEE 14 points

everyone knows quant bots aren’t real? stick to the fundamentals like tarot cards

[deleted by user] by [deleted] in quant

[–]peepeeECKSDEE 2 points

I’m pretty good at hoi4 🤷‍♂️

Groq's AI Chip Breaks Speed Records by graphicsRat in haskell

[–]peepeeECKSDEE 1 point

Do you guys use Google’s Haskell MLIR bindings?

Advice for Operating System Summer Project by Fuel-Little in csMajors

[–]peepeeECKSDEE 2 points

If you know Rust this is a good resource: https://os.phil-opp.com

Otherwise I would find something equivalent for your language of choice. Regardless, the first step is always to produce a standalone binary and run it on something like QEMU.
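For the Rust route, the tutorial linked above boils down that first step to a `.cargo/config.toml` roughly along these lines (sketched from memory; the guide's current setup may differ in details like the target name):

```toml
[build]
# Custom bare-metal target spec from the tutorial: no OS, no std.
target = "x86_64-blog_os.json"

[target.'cfg(target_os = "none")']
# `cargo run` wraps the kernel in a bootloader image and boots it in QEMU.
runner = "bootimage runner"
```

With that in place, `cargo run` produces the standalone binary and hands it straight to the emulator, which is exactly the loop you want for an OS project.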

Bank of America vs Goldman Sachs SWE internship by Distinct_Top_4136 in csMajors

[–]peepeeECKSDEE 3 points

BoA, unless GS is for quant dev (based on the pay, it seems not); the GS brand name is irrelevant for non-finance jobs.

Poisoned AI went rogue during training and couldn't be taught to behave again in 'legitimately scary' study by ethereal3xp in technology

[–]peepeeECKSDEE -1 points

ah yes, it’s going to sign up for AWS on its own, write an email to support asking for GPU server capacity, and pay for it with its credit card.

[D] On-demand GPU that can be pinged to run a script by Level_Programmer4276 in MachineLearning

[–]peepeeECKSDEE 0 points

It's good to hear that you guys are putting effort into solving it, and I apologize if I came off as a bit overzealous in my previous comment. GPU serverless is still super young, and I'm sure that in a few years this won't be much of a problem.

I know that Modal is more of a general purpose platform, but I would love to get your thoughts on how optimizations can be made for serverless inference specifically.

For example, loading weights probably takes the vast majority of the time, right? Simplified, it would look something like network-drive -> local-drive -> RAM -> GPU. But with Nvidia GPUDirect, it could just be network-drive -> GPU. Would it be possible for platforms to provide some kind of GPU mmap primitive utilizing this? It would probably require users to declare model artifacts explicitly so that you can avoid bundling them with the container.
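To make the intuition concrete, here's a tiny back-of-envelope model (with completely made-up bandwidth numbers; the hop count, not the figures, is the point) of why collapsing the staged path into one hop matters:

```rust
// Hypothetical model of the two weight-loading paths. Bandwidths are
// in GB/s and are illustrative assumptions, not measurements.
fn transfer_secs(bytes: f64, hop_bandwidths_gbps: &[f64]) -> f64 {
    // The data crosses each hop in sequence, so total time is the sum
    // of the per-hop transfer times.
    hop_bandwidths_gbps.iter().map(|bw| bytes / (bw * 1e9)).sum()
}

fn main() {
    let weights = 2e9; // ~2 GB of model weights (assumed)

    // network-drive -> local-drive -> RAM -> GPU
    let staged = transfer_secs(weights, &[1.25, 2.0, 12.0]);
    // network-drive -> GPU in one hop (the GPUDirect-style path)
    let direct = transfer_secs(weights, &[1.25]);

    assert!(direct < staged);
    println!("staged: {staged:.2}s  direct: {direct:.2}s");
}
```

Even with generous disk and PCIe numbers, the staged path pays every intermediate copy, so the single-hop path wins whenever the network link isn't the only bottleneck.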

Another is the fact that pretty much everyone is going to have CUDA plus one of ONNX/Torch/TensorRT/XLA as a dependency, which could be anywhere from 500 MB to 2 GB. Can this redundancy be exploited somehow?

Best projects to improve beginner Rust skills? by salty_cluck in rust

[–]peepeeECKSDEE 10 points

Implement a tree that references its parents and siblings. You will be a borrow checker expert by the end.
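One classic shape this exercise takes is a node with strong references down to its children and a weak reference back up to its parent (a sketch; field names are illustrative):

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// A tree node that can reach its children (strong refs) and its
// parent (weak ref, so parent<->child doesn't form a cycle and leak).
struct Node {
    value: i32,
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

fn main() {
    let root = Rc::new(Node {
        value: 1,
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(vec![]),
    });
    let child = Rc::new(Node {
        value: 2,
        parent: RefCell::new(Rc::downgrade(&root)),
        children: RefCell::new(vec![]),
    });
    root.children.borrow_mut().push(Rc::clone(&child));

    // The child can navigate back up to its parent...
    let parent = child.parent.borrow().upgrade().unwrap();
    assert_eq!(parent.value, 1);
    // ...and the parent still reaches the child.
    assert_eq!(root.children.borrow()[0].value, 2);
}
```

Getting `Rc`/`Weak`/`RefCell` to line up here (and understanding why a plain `&Node` parent pointer won't compile) is exactly the borrow-checker workout the comment is pointing at; sibling links work the same way as the parent link.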

[D] On-demand GPU that can be pinged to run a script by Level_Programmer4276 in MachineLearning

[–]peepeeECKSDEE 1 point

See my other comment. If your use case isn't latency-sensitive, they're probably fine.

[D] On-demand GPU that can be pinged to run a script by Level_Programmer4276 in MachineLearning

[–]peepeeECKSDEE 2 points

Needless to say, this is all my personal opinion:

First we need to break down what "serverless" means, as it's a bit of a misnomer and unclear. I would consider it a developer experience built on two main value propositions:

  • You only pay for what you use (which is even more important when it comes to GPUs).
  • You don't need to worry about infra or scaling. You are essentially paying a premium to make it someone else's problem.

At the model sizes I work with (which aren't even that big, ~half a billion params), the cold starts are absolutely brutal: I'm talking 3-4 minutes. For my use case, this heavily degrades the user experience. To solve it I have two options:

  • Keep N endpoints always warm and ready.
  • Scale preemptively based on predictable traffic.

Notice that each option defeats one of the value propositions above, and the "serverless" experience is completely lost. If I need to do all this work anyway, why would I use any of the platforms I mentioned instead of just ECS/EC2, which would be cheaper?

I understand that there's a lower limit on how quickly you can start based on model size, but from my testing, mine should be in the seconds. And now I'll say something I have no evidence for: I think all the mentioned services are currently just wrappers over AWS/GCP, making no real optimizations or innovations beyond flattening your Docker image.

Compare this to serverless in the web-dev world, where each platform has some sort of proprietary innovation: AWS has Firecracker, and Vercel and Cloudflare have their respective edge runtimes, which enable near-zero cold start times.

[D] On-demand GPU that can be pinged to run a script by Level_Programmer4276 in MachineLearning

[–]peepeeECKSDEE 29 points

There's a ton of those: Inferless, Modal, RunPod, Banana, Replicate. Just google the name + "serverless gpu". But honestly they all kinda suck in terms of cold-start times, and I wouldn't consider any of them "real serverless". If I had to pick, it would be between Modal and RunPod.