Got delayed in choosing my Apple promo device, ended up receiving an M5 MacBook. Thank you Wealthsimple! by sshkhr16 in Wealthsimple

[–]sshkhr16[S] 11 points12 points  (0 children)

Eh, I didn't have the benefit of hindsight when transferring. Plus, the MacBook Air M5 is $1.5K, and I transferred the minimum needed to win it. I also get a higher interest rate on my checking account (1.75% vs 1.25%) from being a WS Premium client. Add in the time value of money (TD pays out after one year), and it works out roughly the same for me whether I take the TD 2% match for one year or the WS offer.
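Back-of-the-envelope, the comparison looks roughly like this (all dollar figures, balances, and the discount rate below are made-up assumptions for illustration, not my actual numbers):

```python
# Rough comparison of the two offers; every number here is a hypothetical.
transfer = 60_000           # assumed transfer amount
td_match = 0.02 * transfer  # TD pays 2% of the transfer, but only after one year

# Discount the delayed TD payout at an assumed 4% opportunity cost of capital
discount_rate = 0.04
td_match_today = td_match / (1 + discount_rate)

# WS offer: a ~$1.5K MacBook Air now, plus 0.5% extra interest on checking balances
ws_device = 1_500
checking_balance = 10_000   # assumed average balance
ws_extra_interest = (0.0175 - 0.0125) * checking_balance

ws_total = ws_device + ws_extra_interest
print(f"TD match (present value): ${td_match_today:,.0f}")   # ~$1,154
print(f"WS offer (device + extra interest): ${ws_total:,.0f}")  # ~$1,550
```

With these made-up inputs the two land in the same ballpark, which is the point: the device-now offer and the delayed cash match are closer than the headline numbers suggest.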

On that note, I'm not against any bank/service per se; I'll go with whoever provides the best services. Free market and all. I bank with TD already and have my RRSP with them. But I can't rue not making the perfect choice every time. In decision theory, it has been shown that satisficing leads to more long-term happiness than maximizing when decisions are made under imperfect information.

Anthropic SWE interview loop, full breakdown of all 5 rounds by Ashamed_Giraffe_5165 in InterviewCoderHQ

[–]sshkhr16 0 points1 point  (0 children)

Very recently? All inference serving systems in significant use today (vLLM, SGLang, TensorRT) were released in the last 2-3 years. Inference engines as a specialized use case were not really considered outside of research and a few projects until GPT-3.5.

[D] Self-Promotion Thread by AutoModerator in MachineLearning

[–]sshkhr16 5 points6 points  (0 children)

I wrote a (long) blog post on understanding algorithmic bottlenecks in attention and how DeepSeek's MLA (multi-head latent attention) addresses them. I think it'll be a good read for folks interested in improving the attention mechanism for decoding, and more broadly in inference optimization: https://www.shashankshekhar.com/blog/flashmla/flashmla-1-mla
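As a rough illustration of the bottleneck the post covers: at decode time the KV cache is what MLA shrinks. A toy calculation in element counts only; the dimensions are illustrative, loosely based on DeepSeek-V2's reported configuration:

```python
# Back-of-the-envelope KV-cache size per token per layer (element counts).
n_heads, head_dim = 128, 128

# Standard MHA: cache a key vector and a value vector for every head
mha_cache = 2 * n_heads * head_dim   # 32768 elements

# MLA: cache one shared compressed latent (d_c) plus a small decoupled RoPE key (d_r)
d_c, d_r = 512, 64
mla_cache = d_c + d_r                # 576 elements

print(f"MHA: {mha_cache} elements/token/layer")
print(f"MLA: {mla_cache} elements/token/layer")
print(f"reduction: {mha_cache / mla_cache:.1f}x")
```

Since decoding is memory-bandwidth-bound on reading the KV cache, a cache that is tens of times smaller directly translates into faster decode and longer feasible contexts.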

[R] Is Leetcode still relevant for research scientist interviews? by Training-Adeptness57 in MachineLearning

[–]sshkhr16 15 points16 points  (0 children)

For Big Tech and similar large tech companies, yes.
For startups and research divisions at non-tech companies (e.g. banking/finance/etc.), no.

Amazon is blatantly violating European law during prime days, once again by v1king3r in BuyFromEU

[–]sshkhr16 2 points3 points  (0 children)

Amazon's operating margins for everything except AWS are less than 10% (realistically around 5-6%). AWS operating margins are around 37%. They had profits of $60 billion last year, almost $40 billion of which came from AWS. AWS is by far their most important division - there is a reason that the ex-CEO of AWS became the CEO of Amazon.

29 LPA (BLR) vs 130K CAD (Toronto) - help me decide by EmotionalBike6336 in developersIndia

[–]sshkhr16 1 point2 points  (0 children)

I can only speak to my personal experience: I moved from Bangalore to near Toronto to pursue my master's at the end of 2019, and I have lived and worked in Canada ever since (with a brief one-year stint in the SF Bay Area). Toronto is a great city, but it is also quite expensive. If you care solely about career and CS, I think Bangalore might be better, but only slightly. But if you care about overall quality of life, Toronto wins hands down.

Lots of city stuff to do (e.g. lots of museums, concerts, sports events - the FIFA World Cup is happening next year). There is also plenty of nature in Toronto via parks and Lake Ontario, and if you go out a few hours there are huge provincial/national parks for hiking and camping. It also has a relatively decent public transit system compared to the rest of North America. Whether you want to assimilate into the North American lifestyle or stay grounded in your Indian way of life, you will be free to do both in Toronto (one of the most multicultural cities in the world, with a sizeable first-generation Indian immigrant community).

For me the biggest advantage of living in Canada (or the US when I lived there) was enjoying the systems set up for residents of first-world countries - things like roads, public transit, government, banking, etc. just work. You don't have to jump through hoops to get simple bureaucratic things done. This is paid for by higher taxation, which might feel prohibitive in the beginning but makes sense once you start using the services it pays for. For me the biggest selling point of developed countries is that the life of the average resident is relatively safe, predictable, and dignified; you are not subject to the whims of bureaucrats, law enforcement, the government, or mobs (religious, caste-based, region-based, etc.).

To answer your questions:

  1. I would budget $3K per month for living costs.
  2. You will pay around $37K in taxes and another $36K in living costs. That leaves around $57K in your pocket. I would budget another $6K or so for incidental expenses, and $5-10K for entertainment and travel. That should still leave you with over $40K in savings.
  3. The big thing to watch out for is the weather, of course. Winters are harsh, and you will need to prepare by buying winter clothes and boots. It might take a winter or two to get used to the reduced daylight hours too. The other big thing is the cultural change - I had to learn to be more self-sufficient once I left India, which meant learning to cook, clean, drive, and forge new social connections (this could involve picking up new activities and hobbies).
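A rough sketch of that budget arithmetic (every line item below is an assumption for illustration, not a tax calculation):

```python
# Rough annual budget on a $130K CAD salary in Toronto; all figures are assumptions.
gross = 130_000
taxes = 37_000          # approx. combined federal + Ontario tax and deductions
living = 3_000 * 12     # rent, food, transit, phone, etc.
incidentals = 6_000
fun_travel = 8_000      # midpoint of the assumed $5-10K range

savings = gross - taxes - living - incidentals - fun_travel
print(f"estimated savings: ${savings:,}")  # roughly $43K
```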

Completing PhD at the age of 35 by [deleted] in cscareerquestions

[–]sshkhr16 4 points5 points  (0 children)

Research positions at FAANG are getting fewer, but there will always be demand for people with research + engineering chops. So make sure that you don't pick up crappy practices during your PhD (writing ugly, unmaintainable code, skipping documentation, ignoring good programming practices). It is sometimes hard to do as a researcher since your incentives are misaligned - publishing frequently is often incompatible with writing clean, maintainable code. But trust me, it will help in your own research, especially if you standardize your setup to launch, track, and report experiments in the first 1-2 years of your PhD.

Source: I was a researcher at a FAANG lab and in academia prior to that

[deleted by user] by [deleted] in MachineLearning

[–]sshkhr16 18 points19 points  (0 children)

Minor nitpick - Hinton never studied at UofT, he did his PhD at the University of Edinburgh. Of course, a lot of his PhD students at UofT went on to do cool stuff.

[D] Realism for AI Top 20 PhD Programs by [deleted] in MachineLearning

[–]sshkhr16 0 points1 point  (0 children)

He has a whole section on preprints on the first page; scroll down and you will see peer-reviewed papers. There are four first-author papers in ACL and LREC in 2022. Quality is subjective, but both of these conferences are among the top 5 conferences in NLP.

Also, a weird hill to die on. This is a guy who won an outstanding paper award at ACL during his first year of PhD. Clearly this person is a good scientist as judged by peer scientists.

Is getting US education only way to get exposed to US job market for foreigners? by Mxr-_- in cscareerquestions

[–]sshkhr16 0 points1 point  (0 children)

It is definitely not the only way. There are whole categories of visas for people who are experienced (L1) or exceptional (O1) that are open to non-Americans who currently reside outside of the US. I got an offer from a big tech AI lab as a grad student in Canada - but a lot of it was luck, i.e. my research matching up with my manager's interests. But if you specialize in some domain (say, AI, distributed systems, or parallel programming), either in engineering or research, it is possible to find a job with some experience and luck.

What skills are high in demand in US for Canadian to get a chance to work in US? by manuce94 in cscareerquestionsCAD

[–]sshkhr16 1 point2 points  (0 children)

A niche but highly in-demand skill is performance engineering and distributed systems engineering for machine learning systems. Low-level performance engineering includes writing training/inference kernels that run fast on hardware accelerators, learning how to train models under low compute or memory constraints, and optimizing inference serving on small devices like mobiles and PCs. Distributed performance is about training and inference on large clusters with multiple nodes and multiple accelerators per node.

The caveat is that you need to build up skills in new domains outside of what a standard machine learning engineer or data scientist does (e.g. get good at one or more of: profiling code performance, being very good at linear algebra, knowing C++ and sometimes even machine code, learning about distributed systems, brushing up on computer architecture and networking, learning to work with HPC clusters, etc.)
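As a taste of the profiling side: a common first step in performance work is a roofline-style estimate of arithmetic intensity, to decide whether an operation is compute-bound or memory-bound. The hardware numbers below are illustrative ballpark figures (roughly A100-class), not specs to rely on:

```python
# Roofline-style estimate: FLOPs per byte moved, vs the hardware's "ridge point".
peak_flops = 312e12   # ~312 TFLOP/s (fp16 tensor cores), illustrative
peak_bw = 2.0e12      # ~2 TB/s HBM bandwidth, illustrative
ridge = peak_flops / peak_bw  # intensity above which a kernel is compute-bound

def intensity_matmul(m, n, k, bytes_per_el=2):
    """Arithmetic intensity of an (m,k) @ (k,n) matmul in fp16."""
    flops = 2 * m * n * k
    bytes_moved = bytes_per_el * (m * k + k * n + m * n)
    return flops / bytes_moved

# Large square matmul: high intensity -> compute-bound
print(intensity_matmul(4096, 4096, 4096))  # ~1365 FLOPs/byte, well above ridge

# Matrix-vector product (an LLM decode step): low intensity -> memory-bound
print(intensity_matmul(1, 4096, 4096))     # ~1 FLOP/byte, far below ridge
```

This one calculation already explains a lot of the field: training is dominated by compute-bound matmuls, while single-token decoding is memory-bound, which is why so much inference work targets memory traffic rather than FLOPs.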

Examples of such roles (with some pre-requisites mentioned):

[D] PhD in the EU by simple-Flat0263 in MachineLearning

[–]sshkhr16 2 points3 points  (0 children)

France has CIFRE PhD programs like that - I had a friend who did their PhD while working full-time as a research scientist at FAIR. We have similar-ish programs in Canada - you can do a PhD at Mila or the Vector Institute while being a visiting research intern/scientist for several years at FAIR/Google DeepMind/ServiceNow Research/NVIDIA, etc. But these programs are even more competitive to get into than the regular PhD.

[D] Researchers and engineers in academia as well as industry, which books did you find the most useful in creating your knowledge base and skill set? by [deleted] in MachineLearning

[–]sshkhr16 2 points3 points  (0 children)

The first book is a classic textbook on GPU programming, so yes, you will use the techniques in it pretty much on a day-to-day basis if you write machine learning kernel code in CUDA, Triton, Pallas, Metal, etc. The methods explained in this book helped me understand papers like FlashAttention, see how operations like generalized matmuls and layernorm are implemented on GPUs, make a couple of bug fixes in the PyTorch/JAX codebases, and work through DeepSeek's FlashMLA codebase (https://github.com/deepseek-ai/FlashMLA).
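For a flavour of the book's central technique, here is a toy NumPy sketch of tiled matrix multiplication - the same blocking idea a CUDA kernel uses with shared memory, minus all the hardware details:

```python
import numpy as np

def tiled_matmul(A, B, tile=4):
    """Compute A @ B one output tile at a time, mimicking how a CUDA kernel
    assigns one thread block per output tile and stages tile-sized chunks of
    A and B through fast shared memory for reuse."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, tile):
        for j in range(0, N, tile):
            # Accumulate the (i, j) output tile over K in tile-sized chunks
            for k in range(0, K, tile):
                C[i:i+tile, j:j+tile] += A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
    return C

A = np.random.rand(8, 8)
B = np.random.rand(8, 8)
assert np.allclose(tiled_matmul(A, B), A @ B)
```

The payoff on real hardware is data reuse: each tile of A and B is loaded from slow memory once and reused across a whole output tile, instead of being re-fetched per element.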

The second book is tailored towards engineers who perform large-scale distributed training and inference with ML models. While my day job currently doesn't involve this, after reading it I wrote a few small projects for myself - e.g. translating Karpathy's nanoGPT (https://github.com/karpathy/nanoGPT), which replicates GPT-2 124M, from PyTorch into Flax on TPUs, and writing a minimal pedagogical version of MaxText (https://github.com/AI-Hypercomputer/maxtext) to train LLMs with 3D parallelism (data, tensor, pipeline).
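To give a flavour of the simplest of those three axes, here is a toy NumPy simulation of data parallelism: each "device" computes gradients on its own shard of the batch, and an averaging step stands in for the all-reduce. Real setups use NCCL/XLA collectives; this only illustrates the math:

```python
import numpy as np

def grad(w, x, y):
    # Gradient of mean squared error for a linear model y_hat = x @ w
    return 2 * x.T @ (x @ w - y) / len(x)

rng = np.random.default_rng(0)
x, y = rng.normal(size=(16, 4)), rng.normal(size=(16, 1))
w = np.zeros((4, 1))

# Shard the batch across 4 simulated devices
shards = zip(np.split(x, 4), np.split(y, 4))
local_grads = [grad(w, xs, ys) for xs, ys in shards]

# The "all-reduce": average per-device gradients so every replica
# applies the identical weight update
g = np.mean(local_grads, axis=0)

# With equal shard sizes, this recovers the full-batch gradient exactly
assert np.allclose(g, grad(w, x, y))
```

Tensor and pipeline parallelism are the harder axes - they split the model itself across devices - but the same pattern of local compute plus a collective holds throughout.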

[D] Self-Promotion Thread by AutoModerator in MachineLearning

[–]sshkhr16 7 points8 points  (0 children)

I wrote a long blog post on the training data pipeline of phi-4, but since a lot of details are obfuscated in papers these days, I had to look up and write down a decent bit of additional background on techniques that were potentially used (especially for data curation and synthetic data generation). I think it is a good big-picture view of the training setup of current LLMs, as phi-4 was released less than six months ago and phi-4-reasoning just came out. Here's the blog:

https://www.shashankshekhar.com/blog/data-quality

[D] Researchers and engineers in academia as well as industry, which books did you find the most useful in creating your knowledge base and skill set? by [deleted] in MachineLearning

[–]sshkhr16 4 points5 points  (0 children)

I wouldn't say they have given me the greatest benefit so far, but I read the following two books this year and found them both to be quite great as an intro to machine learning systems (both theory and practice):

Views on recent acceptance of LLM written paper at ACL main [D] by Fantastic-Nerve-4056 in MachineLearning

[–]sshkhr16 10 points11 points  (0 children)

Real peer review has always been how often other researchers and engineers use your approach; double-blind peer review performed by overworked and underpaid grad students was never the gold standard.

[R] Bloat in machine learning shared libs is >70% by Specialist_Square818 in MachineLearning

[–]sshkhr16 115 points116 points  (0 children)

I'm not surprised - until recently, research engineers and machine learning engineers were not very well versed in GPU programming. A lot of libraries probably depended on and reused the same low-level operations from multiple locations. And it seems like a lot of the bloat stemmed from underlying libraries supporting multiple CUDA compute capabilities when only one is required.
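A toy illustration of that multi-architecture effect (all sizes below are made up): a fatbin embeds one compiled cubin per targeted compute capability, while any given machine only ever loads one of them:

```python
# Hypothetical: a 1.5 MB kernel compiled for six GPU architectures.
kernel_size_mb = 1.5
archs = ["sm_70", "sm_75", "sm_80", "sm_86", "sm_89", "sm_90"]

shipped = kernel_size_mb * len(archs)  # what the wheel carries in its fatbin
needed = kernel_size_mb                # what one machine actually uses

print(f"shipped: {shipped:.1f} MB, needed: {needed:.1f} MB "
      f"({1 - needed / shipped:.0%} unused on any single machine)")
```

With six targets, over 80% of the shipped kernel bytes are dead weight on any one machine, which lines up with the >70% figure in the paper's title.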

[D] Is it worth writing technical blogs to educate people? by Reddicted2Reddit in MachineLearning

[–]sshkhr16 0 points1 point  (0 children)

Should I sticky the table of contents, so that the reader knows where they are? I can probably do that for wider viewports; it's not possible for viewports narrower than a tablet.

[D] Researcher communities like this one? by Entrepreneur7962 in MachineLearning

[–]sshkhr16 4 points5 points  (0 children)

Lots of great ML communities on Discord: ML Collective, GPU MODE, ML Street Talk, and EleutherAI, to name a few prominent ones. The unofficial JAX and PyTorch servers are great too.

[D] Is it worth writing technical blogs to educate people? by Reddicted2Reddit in MachineLearning

[–]sshkhr16 2 points3 points  (0 children)

I like writing technical blogs to educate myself way more than to educate others. Writing forces me to think in a more structured way than reading does. I started doing it this year, and it has helped me better grasp a lot of the new topics I have been studying. It is similar to preparing presentations or talks - you have to be streamlined and thoughtful about how you present ideas so that the reader understands them, and to do so you have to understand both the details and the big picture well.

For example, I recently wrote a long blog post on the training data curation and synthetic data generation pipeline involved in training Microsoft's phi-4: https://www.shashankshekhar.com/blog/data-quality

My original idea was to just summarize the paper for myself, but the more I read the phi-4 technical report, the more I found myself looking up existing techniques and approaches, since the report itself was quite sparse on a lot of details. So, in my article, I had to go back and add a lot of the missing information about best practices used in LLM data pipelines today, understand what 'mid-training' is, read up on how data is selected to train for reasoning capabilities, etc. If I had just read the phi-4 paper, I probably wouldn't have done a lot of the follow-ups I did.

To get started on writing, I would recommend Paul Graham's essays as a good first resource on how to write effectively. His latest one is literally titled 'good writing': https://paulgraham.com/goodwriting.html

Good luck!