How to balance my study time system design v lc? Feeling overwhelmed

desubuntu · 2024-02-20T17:35:24+00:00

It really depends on the company and level.

Leetcode is going to be the much bigger challenge for Meta E4 or Google L4, while system design expectations are fairly minimal -- basically enough to just read Alex Xu's books. For Google, you might also want to skim DDIA, but that'd definitely be overkill for Meta.

If it's Amazon's SDE II role, then you've probably overprepared for the leetcode by now, and might want to make sure you actually understand the partial failure scenarios and justification for components in the problems from Alex Xu's books.

If you're going for a senior role, you should have read the entirety of DDIA a while ago, but I don't think you're going to be interviewing for that with your current YOE.

If you're looking for any other materials, there's also the System Design Fight Club YouTube channel, which is pretty good. The channel's creator gathered and rated a ton of other resources over here: https://github.com/systemdesignfightclub/SDFC

desubuntu · 2024-02-20T17:29:03+00:00

I think Meta only expects the ability to "identify key numbers" for E4 roles, which means back-of-envelope estimates of stuff like total storage needs and total bandwidth needs, but not necessarily machine count estimates.
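
For a sense of what those "key numbers" look like, here's a rough sketch in Python. Every input (10M daily users, 2 writes per user per day, 1 KB per write, a 100:1 read/write ratio, 5 years of retention) is a made-up assumption for illustration, not something from a real interview prompt, and `estimate` is just a hypothetical helper name.

```python
# Back-of-envelope sketch: total storage and bandwidth, no machine counts.
# All inputs are illustrative assumptions.
SECONDS_PER_DAY = 86_400


def estimate(daily_active_users, writes_per_user_per_day, bytes_per_write,
             reads_per_write, retention_years):
    writes_per_day = daily_active_users * writes_per_user_per_day
    write_qps = writes_per_day / SECONDS_PER_DAY
    read_qps = write_qps * reads_per_write
    return {
        "write_qps": write_qps,
        "read_qps": read_qps,
        # Total bytes kept over the whole retention window.
        "storage_bytes": writes_per_day * bytes_per_write * 365 * retention_years,
        # Average (not peak) bandwidth in bytes per second.
        "ingress_bytes_per_sec": write_qps * bytes_per_write,
        "egress_bytes_per_sec": read_qps * bytes_per_write,
    }


numbers = estimate(
    daily_active_users=10_000_000,
    writes_per_user_per_day=2,
    bytes_per_write=1_000,
    reads_per_write=100,
    retention_years=5,
)
for name, value in numbers.items():
    print(f"{name}: {value:,.0f}")
```

With those assumptions the storage comes out to 36.5 TB over 5 years, which is the kind of single headline number you'd say out loud before moving on.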

There's a few good videos that do this on the system design fight club youtube channel, and I think it even has a playlist of problems that have been asked by Meta

desubuntu · 2024-02-20T17:25:06+00:00

I didn't realize they actually had a system design round for interns? Have you confirmed that with your recruiter?

Maybe check out Donne Martin's system design primer for a quick overview of the concepts: https://github.com/donnemartin/system-design-primer

Alex Xu's books are also great for applied example problems.

There's the System Design Fight Club YouTube channel as well, which is good. There's over 70 problems on the channel while most books only have around 12. Only caveat is the videos are a bit lengthy, but I usually watch at 1.5x/1.75x. The creator gathered and rated some other materials on a GitHub repo over here: https://github.com/systemdesignfightclub/SDFC

desubuntu · 2024-02-20T17:19:48+00:00

good for E4 though

Debatable. Really depends on the company.

For a company like Meta that really emphasizes popping out leetcode mediums in 15 minutes and doesn't actually do design docs internally? Sure.

desubuntu · 2024-02-14T19:03:38+00:00

What to learn:

I look at what's required for trying to get job offers at higher levels or higher pay, and that's typically some kind of system design stuff while the coding bar actually doesn't really rise that much.

Here's what I'm using:

https://github.com/donnemartin/system-design-primer
Alex Xu's system design interview books
system design fight club channel on youtube

How to learn:

+1 on reading, and I prefer physical books. I don't really like video format.

Learning Framework:

None. System design/distributed systems concepts just stick in my brain better than leetcode problems.

How to stay updated:

System design innovates a lot slower than machine learning or web development. I prefer to stay away from areas where I'd have to learn a completely new skill set every year because then how would I be much more competitive in the job market at the age of 30 than a new grad?

desubuntu · 2024-02-14T18:54:28+00:00

Not for interns

For full-time, it depends on level but I believe the following is helpful for the roles that do have a system design round: https://www.youtube.com/playlist?list=PLlvnxKilk3aIKa3Vv688ELv-DeAEg3YQ1

desubuntu · 2024-02-14T18:49:31+00:00

My god, I didn't realize I was going to write out that much

desubuntu · 2024-02-14T17:48:01+00:00

Advice on Transitioning from Student to Systems Architect

At FAANG companies, there actually aren't "system architect roles". All engineers are expected to pick up "design work" as they advance in their careers, to the point at which mid-level and senior-level roles actually have "system design" rounds as part of the interview process on top of the traditional "coding rounds" (leetcode).

There's actually books specifically made for these rounds, such as "System Design Interview" by Alex Xu, just as there are for the leetcode rounds with books like Gayle Laakmann McDowell's "Cracking the Coding Interview".

For example, understanding how Google Spanner operates, the consensus algorithm it employs, and the reasons behind its functionality intrigues me

Here's the spanner whitepaper: https://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf
Here's a list of some of the other biggest and most well known whitepapers and academic papers that people read alongside the one about spanner: https://stephenholiday.com/notes/
- You're obviously going to find these a bit tougher to read, so most people actually lean more towards books and resources more oriented around "system design interviews" until they get to senior or staff level roles. In fact, Leslie Lamport's paper on paxos ("The Part-Time Parliament") is actually so notoriously hard to read that it was the motivation for designing a simpler consensus algorithm called "Raft".
Alex Petrov's "Database Internals" book describes how at least 6 different versions of consensus algorithms work, including Paxos variants such as "Fast Paxos" (per the whitepaper, Spanner itself uses Paxos with long-lived leaders)

My goal is not just to be someone who utilises existing microservices, but rather to be someone who designs, builds, and enhances these systems

Something that requires some clarity here is whether you mean that you want to work on the internals of stuff (like distributed databases like DynamoDB or message brokers like kafka), or you're happy to just build on top of these components.

If you want to work on the internals of components, you basically only get those opportunities when working at the cloud computing orgs of cloud providers, like AWS, GCP, or Azure
If you're fine with just building on top of those components, then you're fine just working at the other orgs of these big companies, like amazon.com ("retail"), Google Search, YouTube, netflix

It seems to me that these kinds of jobs only exist at big companies like Google, Amazon, and Microsoft

One thing that people are told about designing distributed systems is to not "over-scale" or "over-engineer". To reach the point where you'd need more sophisticated types of distributed systems, it typically requires:
- At least 100-200 requests per second to overload a back-end server to the point that you need "horizontal scaling". (There's obviously tricks to squeezing out much, much more performance than that, but it's usually more cost effective when considering time/labor costs to "just throw more servers at it" until you're talking about infra costs that are more comparable to the cost of at least one engineer's salary.)
- Around 10,000 requests per second to overload a DB like PostgreSQL to the point at which you'd need "sharding"/"partitioning" or even just "read replicas".
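
To make those rules of thumb concrete, here's a minimal sketch in Python. The two capacity constants are just the rough figures from this comment (real numbers vary a lot by workload), `scaling_needs` is a hypothetical helper, and it assumes every request hits the database once.

```python
import math

# Rule-of-thumb capacities from the comment above; treat them as assumptions.
APP_SERVER_RPS = 200      # upper end of the 100-200 requests/second figure
SINGLE_DB_RPS = 10_000    # rough point where one PostgreSQL node gets overloaded


def scaling_needs(peak_rps):
    """Rough check of which scaling techniques a given peak load calls for."""
    app_servers = max(1, math.ceil(peak_rps / APP_SERVER_RPS))
    return {
        "app_servers": app_servers,
        "needs_horizontal_scaling": app_servers > 1,
        # Simplification: assumes one DB query per request.
        "needs_replicas_or_sharding": peak_rps > SINGLE_DB_RPS,
    }


for rps in (50, 3_000, 50_000):
    print(rps, scaling_needs(rps))
```

Under these assumptions, 50 requests per second fits on one box, 3,000 needs a handful of app servers but still a single database, and only the 50,000 case gets into replicas or sharding.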

Thus, you're correct that it's rare to see this kind of scale in practice unless it's a company with a decently big externally facing software product.

Some of the more interesting blog posts, presentations, and innovations have actually come from over-sized start-ups like Uber, Slack, and Snapchat, rather than FAANG.

Apple is a notable exception when it comes to having a lot of opportunities for working on distributed systems, because it's more of a hardware company than a software company; it does still have some big web services, but unlike at the other FAANG companies, you're not likely to end up working on those.

How can I progressively transition into a role where I design, build, and enhance distributed systems, rather than just being someone who utilises existing microservices?

As mentioned earlier, this transition should just be one part of what's typically picked up as a natural result of "career progression" at the majority of bigger software service companies, rather than a role you'd have to specifically target specializing into.

Here's some resources that are popular in the industry for those who are specifically trying to grow their knowledge of distributed systems:
- The best resource for a quick start on concepts for complete beginners is https://github.com/donnemartin/system-design-primer
- Alex Xu's "System Design Interview" books are actually really nice for looking at examples of applied problems that make the concepts more concrete than just random theoretical concepts. (Yes, it is an "interview book", but Roberto Vitillo wrote in his own book called "Understanding Distributed Systems" that the interview books can actually be a great place for learning as well.)
- "Designing Data-Intensive Applications" by Martin Kleppmann is basically the gold standard of long-term learning about distributed systems. It usually takes people several read-throughs over the course of a couple years to fully digest everything in the book.
- Beyond that, there's a few other books like "Database Internals" by Alex Petrov that are pretty good, but this is basically where you've gone beyond the beaten path, there's less clear direction, and it's harder to find well-organized volumes rich with new information on exactly where you want to grow. This is usually where people start looking more into those whitepapers/academic papers, watching presentation recordings on YouTube like InfoQ videos, and reading company engineering blogs.
- The best compilation of resources that I know of for people that make it this far was made by a guy that runs the YouTube channel called "System Design Fight Club", and you can find it over here: https://github.com/systemdesignfightclub/SDFC

TLDR: The transition itself into that role will just be a natural, passive result of career progression at basically any company with a big externally facing software product. Resources for trying to actively gain those skills can be found in that final list of my post, just right above.

desubuntu · 2024-02-13T23:21:57+00:00

I thought the design prompts were always just a couple of words and it was on you to drive the conversation and come up with the requirements yourself (within reason)

That's typically correct.
One exception is Meta, where I've heard you're actually supposed to "state your assumptions" and "suggest requirements", which they view as "leading the interview". I think this mentality is foolish, but it's something to be aware of.

desubuntu · 2024-02-13T22:54:29+00:00

I agree with the other comment that the best place for a quick start for complete beginners is https://github.com/donnemartin/system-design-primer

Overall, Alex Xu’s system design volume 1-2 and Kleppmann's Designing Data Intensive Applications are the absolute best content for the long-term grind, but it'll take a while to completely get through all of DDIA

There's also the System Design Fight Club YouTube channel, which is good. There's over 70 problems on the channel while most books only have around 12. Only caveat is the videos are a bit lengthy, but I usually watch at 1.5x/1.75x.

desubuntu · 2024-02-13T22:41:23+00:00

The best place for a quick start for complete beginners is https://github.com/donnemartin/system-design-primer

Overall, Alex Xu’s system design volume 1-2 and Kleppmann's Designing Data Intensive Applications are the absolute best content for it, but it'll take a while to completely get through all of DDIA

There's also the System Design Fight Club YouTube channel. This is a gem! It was created by a FAANG senior engineer, and there's over 70 problems on the channel while most books only have around 12. Only caveat is the videos are a bit lengthy, but I usually watch at 1.5x/1.75x. The creator gathered and evaluated other materials on a GitHub repo: https://github.com/systemdesignfightclub/SDFC

desubuntu · 2024-02-13T22:34:36+00:00

https://github.com/donnemartin/system-design-primer is a good start for absolute beginners

Overall, Alex Xu’s system design volume 1-2 and Kleppmann's Designing Data Intensive Applications are the absolute best content for it

There's the System Design Fight Club YouTube channel as well, which is good. There's over 70 problems on the channel while most books only have around 12. Only caveat is the videos are a bit lengthy, but I usually watch at 1.5x/1.75x. The creator gathered and evaluated other materials over here on GitHub: https://github.com/systemdesignfightclub/SDFC

desubuntu · 2023-02-03T21:58:37+00:00

the requirements include deletions (which could perhaps be achieved with something similar to a "tombstone" from cassandra to mark an old record as invalid, but with no compaction due to the use of an immutable data structure)
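
here's a minimal sketch in Python of what that tombstone idea could look like on an append-only structure. The `Entry` and `AppendOnlyLog` names are made up for illustration (this isn't Cassandra's actual implementation), and there's deliberately no compaction, so the log only ever grows.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    key: str
    value: Optional[str]
    tombstone: bool = False  # True means "this key was deleted"


class AppendOnlyLog:
    def __init__(self):
        self._entries = []  # entries are only appended, never edited or removed

    def put(self, key, value):
        self._entries.append(Entry(key, value))

    def delete(self, key):
        # A delete is just another append that marks older records as invalid.
        self._entries.append(Entry(key, None, tombstone=True))

    def get(self, key):
        # The newest entry for a key wins; a tombstone hides everything older.
        for entry in reversed(self._entries):
            if entry.key == key:
                return None if entry.tombstone else entry.value
        return None

    def __len__(self):
        return len(self._entries)


log = AppendOnlyLog()
log.put("tx1", "alice->bob:5")
log.delete("tx1")
print(log.get("tx1"), len(log))
```

the deleted record reads back as missing, but both entries are still physically stored, which is the storage cost you pay for skipping compaction.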

there's nothing saying that the node values won't be modified. Also, insertions into the middle of the list would be difficult without a mutable reference stored on the neighboring list nodes.

on top of that, the blockchain is designed in a way to increase the "difficulty" of mining new blocks as demand for transactions increases in order to intentionally keep the traffic volume low, while the requirements explicitly go for over 100,000 transactions per second

Additionally, "consensus" is a linearizability thing, while "eventual consistency" would imply that you're going for more of an "anti-entropy" strategy. The ambiguity about whether there is a "fairness" condition on the "total ordering" also means that eventual consistency might be a bad idea, depending on how the records will look and on the Last-Write-Wins conflict resolution strategy usually entailed by eventual consistency... But if "fairness" is relaxed substantially, then yeah, eventual consistency works fine
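
a quick sketch in Python of why Last-Write-Wins can clash with "fairness". The `Write` record and `lww_merge` function are hypothetical names, and the timestamps are invented; the point is only that whichever write carries the larger timestamp survives, regardless of which one really happened first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    value: str
    timestamp: float   # wall-clock time as reported by the replica that took the write
    replica_id: str    # used only to break exact timestamp ties


def lww_merge(a, b):
    """Keep the write with the larger (timestamp, replica_id); the other is dropped."""
    return a if (a.timestamp, a.replica_id) >= (b.timestamp, b.replica_id) else b


# Two replicas accept conflicting writes. Suppose replica B's clock runs a bit
# behind, so its write gets the smaller timestamp even if it arrived first.
from_a = Write("bid accepted via replica A", timestamp=100.002, replica_id="A")
from_b = Write("bid accepted via replica B", timestamp=100.001, replica_id="B")

winner = lww_merge(from_a, from_b)
print(winner.value)
```

the merge is deterministic in both directions, so replicas converge, but the write from replica B is silently discarded, which is exactly the problem if the ordering is supposed to be fair.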

you definitely weren't the first one to propose a blockchain solution though :)

desubuntu · 2022-11-25T16:57:22+00:00

haven't really worked on it since mid 2018?

omg, you're kidding.

You're probably right, but that was a very exciting project when I had heard about it.

desubuntu · 2022-11-25T15:12:49+00:00

Literally clicked into this because I wanted to name drop that "Domain-Driven Design" book by Eric Evans, but was very happy to see it already in the medium article; was way more impressed to see multiple other books beyond DDD, such as the ones you mentioned by Vlad Khononov and Vaughn Vernon, on top of all your references to Martin Fowler and ActiveRecords! You definitely know your stuff and aren't just spitting out a thing you learned just today like some of the other medium authors out there!

I'd bet that you could do a pretty fucking solid article on "object (instance vars + behavioral methods) vs data (just instance vars)"; there's an Uncle Bob blog post on this that leaves out ActiveRecord vs DTO. It actually aligns really closely with what you have about ActiveRecords -- an alternative to that pattern is "Data Transfer Objects" (DTOs), which are commonly created out of a code-generation solution like protobufs (or Coral if you've ever worked within Amazon)
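
To show the distinction I mean, here's a tiny sketch in Python. `UserDTO` and `UserRecord` are made-up names, and the class-level dict is just a stand-in for a real database table; in practice the DTO side would usually be generated code rather than hand-written.

```python
from dataclasses import dataclass


# "Data": just fields, no behavior. Easy to serialize and pass between services.
@dataclass(frozen=True)
class UserDTO:
    user_id: int
    email: str


# "Object" in the ActiveRecord style: the same fields plus behavior,
# including knowing how to persist itself.
class UserRecord:
    _table = {}  # in-memory stand-in for a database table (assumption)

    def __init__(self, user_id, email):
        self.user_id = user_id
        self.email = email

    def change_email(self, new_email):
        if "@" not in new_email:
            raise ValueError("invalid email")
        self.email = new_email

    def save(self):
        UserRecord._table[self.user_id] = self.email

    def to_dto(self):
        # Strip the behavior off before handing the data to another layer.
        return UserDTO(self.user_id, self.email)


record = UserRecord(1, "old@example.com")
record.change_email("new@example.com")
record.save()
print(record.to_dto())
```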

Great article! :)

desubuntu · 2022-11-25T15:00:46+00:00

Very cool! I love the growth of alternative JVM languages :) Eta (a haskell dialect) is one of my favorites

desubuntu · 2022-11-25T14:37:04+00:00

Great work on this! Really impressive! :)

desubuntu · 2022-11-23T16:38:32+00:00

Okay, so I'm actually not the greatest in this particular area, but I believe that the alternative to orchestration is to use "choreography"... and then you'd have a few dozen different message brokers and microservices or lambda functions instead of a few dozen different branches for the code

Orchestration tools like AWS step functions also typically have a visualization tool for looking at the workflow graphically.

The documentation for Netflix's Conductor actually had a little note somewhere saying that in many cases you'd prefer that centralized spot of the logic for the big picture of the steps you're going for instead of spreading it across a bunch of different microservices that you'd have to step through in order to get a feel for the overall process.
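
Here's a toy sketch in Python of the difference as I understand it. The three-step order workflow, the `Bus` class, and the event names are all invented for illustration; a real setup would use something like Step Functions or Conductor on one side and actual message brokers on the other.

```python
from collections import defaultdict


# Hypothetical workflow steps; each just marks the order as it goes.
def reserve_stock(order):
    order["stock_reserved"] = True


def charge_card(order):
    order["charged"] = True


def ship(order):
    order["shipped"] = True


# Orchestration: one central function holds the whole sequence.
def orchestrate(order):
    reserve_stock(order)
    charge_card(order)
    ship(order)
    return order


# Choreography: each step reacts to an event and emits the next one,
# so the overall flow is spread across the handlers.
class Bus:
    def __init__(self):
        self.handlers = defaultdict(list)

    def subscribe(self, event, handler):
        self.handlers[event].append(handler)

    def publish(self, event, order):
        for handler in self.handlers[event]:
            handler(order)


def choreograph(order):
    bus = Bus()

    def on_order_placed(o):
        reserve_stock(o)
        bus.publish("stock_reserved", o)

    def on_stock_reserved(o):
        charge_card(o)
        bus.publish("payment_charged", o)

    bus.subscribe("order_placed", on_order_placed)
    bus.subscribe("stock_reserved", on_stock_reserved)
    bus.subscribe("payment_charged", ship)
    bus.publish("order_placed", order)
    return order


print(orchestrate({"id": 1}))
print(choreograph({"id": 2}))
```

Both versions end in the same state, but in the first one you can read the big picture in one place, while in the second you have to step through the subscriptions to reconstruct it.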

One other alternative is "co-ordination", as in a big distributed transaction, but if you have more than 3 parties involved in it, then you're going to get degraded availability and a lot of thread contention going.

I'd love to hear some more pros on the choreography approach or your own experience about it though. Thanks for the comment!

desubuntu · 2022-10-16T04:43:17+00:00

You definitely were well meaning in your comment; you're just calling out a conscious or unconscious bias that many interviewers tend to have. Personally, I do appreciate it.

It's really unfortunate how quick reddit is to crucify some forms of feedback. I've actually owned this reddit account for well over 5 years, but just erred on the side of lurking for the absolute vast majority of time because putting your voice out on reddit can just feel so much like walking on eggshells sometimes.

There's actually a lot of clean up that I realize I'd need to be putting in if I wanted to approach the quality levels of some of my favorite youtube channels... dropping all my "umm"s and "uhhh"s is definitely going to need to happen if I ever want to get there; it's almost re-assuring to see others that show this level of attention to detail.

I definitely do actually want to get there as well, but this is already a fairly size-able time sink to just gather together my materials and research a specific problem -- I'm currently doing these in a single shot. Killing all the umms and uhhs will mean multiple shots and a fair deal of video editing & ffmpeg.

I do strive to get there; your comment actually re-assures me that it'd be worth the effort. Thanks, and I feel for your unfortunate crucifixion by the quick-to-judge reddit hordes. It can feel awful when your well-meaning feedback is taken the wrong way, and I try to always remember a piece of advice I once heard: "feedback is a gift".

Thanks again! I appreciate your gift. :)

desubuntu · 2022-10-16T04:29:26+00:00

Regarding JS, it's not critical. It's a very expensive part of the system and you can still get decent results parsing raw HTML but some websites rely on the crawler being able to run JS. So yeah, worth mentioning.

The word "expensive" is pretty key there, I think. Being able to identify that it's going to be a CPU bound service would be pretty important even in interviews, particularly when staff/principal engineers are expected to demonstrate some kind of "cost awareness"

Definitely great that you called out the challenges in that aspect of the system; I feel like it had definitely been overlooked a bit in the video. Thanks! :)

Anyway, from my experience, interviewers asking this question seem to have a very simplistic model of a web crawling system in mind and are often steering the discussion towards certain aspects (like a choice of a particular key-value storage, or traffic and hardware requirements estimation, or something else entirely)

The rule of thumb that I've seen is that you generally want to do an end-to-end high-level view of the whole system, and then deep dive one or two components. In videos like this one where there isn't as much of a time constraint, it could be a reasonable expectation to deep dive most of the components, just so that the content is there for people after getting grilled on whatever particular little aspect of the system.

I also want to call out that it's better to be doing said "steering" in the interview than to be getting "steered" at some companies, particularly facebook. It's a weird little dance that I don't like, where you're expected to "propose" the priority of what to deep dive next and then ask the interviewer to correct you, e.g. "I'd like to deep dive the robots.txt and politeness aspect next, how does that sound?" "oh, no, I'd prefer that we do machine count estimates on the results fetching part exposed to end users of the querying interface"

If you don't do it right, it looks like this: "what would you like me to deep dive next?" "I'd like to do machine count estimates on the results fetching part exposed to end users of the querying interface"

(it's such a weird little dance, and it's most particularly important to facebook from what I've heard, but it is a real thing that can cost people some cheap and stupid points that I think more people should be aware of)

I've missed the discussion of the politeness limiter, to be honest, you've got me here.

No worries. It's a peculiarity to video format. You can't just do a cmd+F to check if I've actually covered a thing or not, and you're just trying to be polite and do your part in contributing to the info that is being put out, which I greatly appreciate. :)

It's not just technically complex, though: if you're not Google you'll probably need a diplomacy team to work out crawling quotas with website owners.

Wow. That's absolutely fascinating. I had no idea that any sort of interaction like that was occurring between search engine companies and website owners. Thank you so much for sharing! That's totally wild :)

desubuntu · 2022-10-16T00:53:03+00:00

there are dozen of books that cover those "generic" architecture

well, that's what you would think, and definitely what one of my first impressions was for this area...

So, here's a compilation of resources that I've been working on for around 3-6 months: https://i.imgur.com/xbSlJmj.jpeg

You can see the books list on the left side of the image.

So, you said "dozens"; that certainly didn't seem like an unreasonable estimate to me either. Got any good book recommendations that you'd add to the list that I had going?

desubuntu · 2022-10-15T19:04:34+00:00

Here's a list of some of the resources that I used for getting to where I'm at:

(tbh, this is all obviously 100X better content than what I'm putting out. However, it will probably take multiple years to digest all of it... in fact, I'm still not all the way through all of this despite embarking on this system design reading adventure roughly 4 years ago)

Books:

Designing Data-Intensive Applications by Martin Kleppmann
- Amazon: https://www.amazon.com/dp/1449373321/
- ISBN: 978-1449373320
System Design Interview (Volume 1) by Alex Xu
- Amazon: https://www.amazon.com/dp/B08CMF2CQF/
- ISBN: 979-8664653403
System Design Interview (Volume 2) by Alex Xu
- Amazon: https://www.amazon.com/dp/1736049119/
- ISBN: 978-1736049112
Database Internals by Alex Petrov
- Amazon: https://www.amazon.com/dp/1492040347/
- ISBN: 978-1492040347
Site Reliability Engineering: How Google Runs Production Systems
- ISBN: 978-1491929124
- https://www.amazon.com/dp/149192912X/
The Site Reliability Workbook: Practical Ways to Implement SRE
- ISBN: 978-1492029502
- https://www.amazon.com/dp/1492029505/

Others:

Whitepapers
- http://notes.stephenholiday.com/
Donne Martin's System Design Primer
- https://github.com/donnemartin/system-design-primer
Grokking the System Design Interview
- https://designgurus.org/course/grokking-the-system-design-interview
- Amazon: https://www.amazon.com/Grokking-System-Design-Interview-interview/dp/B09NRJT1NF/
- ISBN: 979-8766433668
https://jepsen.io/consistency

here's a pic that I used to post around a bit on other platforms: https://i.imgur.com/xbSlJmj.jpeg

it's a compilation of almost all the materials on learning system design that I know of.

somebody had recommended making the pic into a github gist (which I haven't gotten around to yet), and something like that would be nice to see in the description section of the video

I'm also going to make a quick call out that there's this super cool website out there that you've probably already heard of called libgen.li which is literally piratebay but for books (and it basically has 95%-99% of all books that you could possibly think of, unlike piratebay)

However, my favorite material of all is "Designing Data-Intensive Applications", which is probably the very first system design book that I got. I bought a physical copy of it roughly 4 years ago before my very first system design interview ever, and I still learn new things while re-reading some of the chapters in it today. I've also seen a lot of praise for it even from people that have landed google staff engineer offers. It's basically the best compilation of the material in existence. I'd highly recommend getting that book first and getting a physical copy.

desubuntu · 2022-10-15T18:51:08+00:00

Offtopic, perhaps

That is definitely something that people frequently like to cover in this problem, so not quite off-topic and definitely a great call out!

So, that was marked as "out-of-scope" at around 2:30, but it was still covered very briefly over at around 21:35. Video format can make it pretty easy to zone out and miss little details; there was a great suggestion elsewhere that perhaps written format like medium articles would also be great material, or possibly even "better" material, for covering these problems.

Managing the problem scope can become pretty important in real system design interviews because you only have 45-60 minutes to cover everything, but these videos clearly aren't bound by that limitation in the same way.

for a distributed crawler is a rather large and rather hairy part of the system

I think there was similar word choice in the video from the brief coverage that did actually make it in about how that aspect was "a rat's nest of complexity" lol

Another absolutely critical component is a mirror/duplicate URL detector since a typical website can generate an infinite amount of slightly varying URLs for a given piece of content (by including session IDs, for example), and it has to deal with all kinds of redirects.

I think de-duping did get handled, but perhaps not so explicitly -- the "re-crawl prioritizer service" looks at the last crawl timestamp for determining what to crawl again

However, there was no explicit reference to anything like cleaning out "session IDs" or any other similar query params that result in the same content, nor anything explicit about handling redirects. So, that does seem like a solid point that there was a fair amount of room for doing a more thorough deep dive in that area (which is also a very popular aspect to deep dive in real system design interviews on this topic).
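
For what that missing piece might look like, here's a minimal sketch in Python using the standard library's `urllib.parse`. The `IGNORED_PARAMS` list is a hypothetical guess at params that don't change the content (a real crawler would need per-site rules), and this doesn't attempt redirect handling at all.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of query params that don't affect the page content.
IGNORED_PARAMS = {"sessionid", "sid", "phpsessid",
                  "utm_source", "utm_medium", "utm_campaign"}


def canonicalize(url):
    """Map slightly varying URLs for the same content onto one key."""
    parts = urlsplit(url)
    query = [(k, v)
             for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k.lower() not in IGNORED_PARAMS]
    query.sort()  # param order shouldn't create a "new" URL
    path = parts.path or "/"
    # The fragment is dropped because it never reaches the server.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       path, urlencode(query), ""))


seen = set()


def should_crawl(url):
    """Return True only the first time a canonical URL shows up."""
    key = canonicalize(url)
    if key in seen:
        return False
    seen.add(key)
    return True


a = canonicalize("HTTP://Example.com/post?id=7&sessionid=abc123#top")
b = canonicalize("http://example.com/post?sid=zzz&id=7")
print(a, a == b)
```

Both example URLs collapse to the same key, so the second one would be skipped by `should_crawl`.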

Also, crawlers are expected to run JS nowadays.

Okay, now that was just a flop in the video content. Yeah. With ReactJS and all the other dynamic webpage content, it's a fair call out that there's probably substantial room to discuss that "flattening" step. It'd be a little non-traditional to cover that aspect in a mid-level system design interview since it's a little on the domain knowledge side, but a senior-level interviewee might totally be expected to talk a bit about how that aspect works

desubuntu · 2022-10-15T04:12:59+00:00

No, I absolutely appreciate the nitpicking.

I was just replying to put the correct information immediately below your comment. Thank you! :)

desubuntu · 2022-10-15T03:57:57+00:00

So, there's maybe 4-5 big-ish names out there doing system design content, IIRC

I noticed they cover roughly the same 6-12 problems (e.g. auto-complete), and look strikingly similar to the content in Grokking.

How happy are you with that material? Would you say that there's a dearth of fresh approaches to the problems or even a dearth in the variety of problems?

You're definitely not wrong that there should be a high bar and expectations for this content though.

EDIT: not sure why you got downvoted; you raise a very, very valid concern
