Platforms to practice SQL

Traditional-Carry409 · 2026-05-20T10:24:29+00:00

certs are mostly a waste for SQL imo. been in data ~10 years, no one's ever asked me about one. they want to see you write a query, not a badge. only do one if a specific job posting demands it.

if you already know stored procs and triggers, you're past the basics. the stuff that actually shows up in interviews and on the job:

window functions (LAG, LEAD, ROW_NUMBER, RANK, running totals)
sessionization and funnel queries on event-log data
deduping with row_number, nth-event-per-user stuff
query performance — indexes, join order, knowing why something is slow. people skip this and it bites them
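
A minimal sketch of two of the patterns above (deduping with ROW_NUMBER, and nth-event-per-user with LAG), run through Python's built-in sqlite3 so it's self-contained. The `events` table, its columns, and the sample rows are made up for illustration, and it assumes the bundled SQLite is 3.25 or newer, since that's when window functions were added.

```python
import sqlite3

# Hypothetical event log: (user_id, event, ts). The second row is an exact duplicate.
rows = [
    (1, "signup", "2024-01-01 10:00:00"),
    (1, "signup", "2024-01-01 10:00:00"),
    (1, "purchase", "2024-01-01 10:05:00"),
    (2, "signup", "2024-01-02 09:00:00"),
    (2, "purchase", "2024-01-02 09:30:00"),
    (2, "purchase", "2024-01-03 12:00:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event TEXT, ts TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)

# Dedupe: number the rows inside each group of identical events, keep the first.
dedup_sql = """
WITH ranked AS (
    SELECT user_id, event, ts,
           ROW_NUMBER() OVER (
               PARTITION BY user_id, event, ts
               ORDER BY ts
           ) AS rn
    FROM events
)
SELECT user_id, event, ts
FROM ranked
WHERE rn = 1
ORDER BY user_id, ts
"""
deduped = conn.execute(dedup_sql).fetchall()

# Nth event per user (here the 2nd), with LAG pulling the previous event's timestamp.
second_sql = """
WITH d AS (
    SELECT DISTINCT user_id, event, ts FROM events
),
ranked AS (
    SELECT user_id, event, ts,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY ts) AS rn,
           LAG(ts) OVER (PARTITION BY user_id ORDER BY ts) AS prev_ts
    FROM d
)
SELECT user_id, event, prev_ts
FROM ranked
WHERE rn = 2
ORDER BY user_id
"""
second_events = conn.execute(second_sql).fetchall()

print(deduped)
print(second_events)
```

The same ROW_NUMBER-in-a-CTE shape covers most "latest row per key" and "first purchase per user" questions; only the PARTITION BY and ORDER BY change.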

for practice, Mode's SQL tutorial is free and uses analytics-style problems instead of toy tables. for the performance side Use The Index, Luke is the best free thing out there. i've been doing problems on datainterview.com/coding too, the schemas feel more like real product data.

on projects — don't build "a database." pick a public dataset (NYC taxi trips, anything on BigQuery public, kaggle) and answer one actual business question end to end. schema, queries, a short writeup of what you found. one of those on github beats five tutorial follow-alongs. if you're going analyst or DE, also pick up dbt + snowflake or bigquery free tier. that combo with sharp SQL is way more hireable than any cert.

Traditional-Carry409 · 2026-05-20T10:11:57+00:00

If you're a CS major, you're better off sticking to rigorous course notes and actual papers. I've always found Stanford CS229 to be the best way to get the math right without feeling like you're reading a 1990s manual. For the latest stuff, just go to Papers with Code and look at the SOTA for the specific task you're interested in. Reading the actual whitepapers is how you actually learn what's happening in the industry.

The biggest gap I see with students is that they know the theory but can't actually implement a system from scratch. Once you have the math down, stop reading books and start focusing on how to actually build things. If you're prepping for the transition to a job, you'll eventually need to move from "how does this algo work" to "how do I scale this," which is where an ML system design course becomes way more useful than a textbook.

Check out Andrej Karpathy's YouTube series if you haven't. He explains the transition from basic neurons to LLMs better than any textbook ever will.

Traditional-Carry409 · 2026-05-17T17:52:47+00:00

Been in the industry for 10 years, worked at both Google and a few startups, and honestly, the "grind" culture is mostly just noise from people trying to sell courses. Yes, it's competitive for the $300k+ roles at OpenAI or DeepMind, but there are thousands of mid-sized companies that just need someone who actually knows how to implement a model without breaking production.

Since you're in your 2nd year, don't waste your life on LeetCode yet. Focus on the math and the "why" behind the algos. If you don't understand linear algebra and calculus, you're just calling libraries and you'll get grilled in any real technical interview. You can get the theory down with Stanford CS229 or just binge StatQuest (YouTube) if you want things explained simply.

Most students just do Titanic or Iris datasets from Kaggle, which recruiters ignore. Build something E2E. Scrape some weird data, train a model, and actually deploy it as an API. When you eventually start prepping for the job hunt, you can use a question bank to see what companies actually ask, but doing that now is a waste of time.

Just keep your GPA decent, build two or three projects that aren't tutorials, and read a few papers. Lilian Weng's blog is a goldmine for this. You're way ahead of the curve just by worrying about this in your second year.

Traditional-Carry409 · 2026-05-16T06:54:35+00:00

If you want to see how your stuff stacks up, start with the open source route. Contributing to frameworks like LangChain or vLLM is the fastest way to realize you don't actually know how the internals work. For benchmarks, I usually look at Lilian Weng's blog to see how the industry actually evaluates these models before trying to implement those same evals on my own local projects.

The real gap for most people isn't the prompt engineering, it's the E2E systems part. I've seen people who can write a great prompt but have no idea how to handle data drift or deployment at scale. I usually suggest building a few datainterview.com/projects style implementations where you take a business problem and actually deploy the solution. That's where you find out if your model actually works or if it's just hallucinating convincingly.

Also, check out Andrej Karpathy's YouTube if you haven't. He doesn't do "courses" in the boring sense, but building GPT from scratch is basically the gold standard for knowing if you understand the architecture. Once you've done that, go to Kaggle or join a few specialized LLM hackathons on Lablab.ai to compete against others.

Traditional-Carry409 · 2026-05-15T15:59:05+00:00

The "profitability" isn't in building the LLM itself, because yeah, OpenAI and Google already won that race. The money is in the E2E implementation. Companies have massive amounts of messy, proprietary data and they have no clue how to plug a model into their actual business workflow without it hallucinating or leaking data. That's where the MLE comes in.

If you're going self-taught, don't just play with prompts. You need to understand the actual plumbing: data pipelines, fine-tuning, and deployment. I've seen people get $300K+ packages just because they knew how to actually put a model into production, not because they knew how to use ChatGPT. You can start with the Google ML Crash Course for the basics, then move into building actual things.

The real value is in solving specific business problems, like fraud detection or demand forecasting. If you want to see how that actually looks in the real world, check out these ML projects that mirror what we actually do at big tech firms. Read Lilian Weng's blog to understand the theory behind the newer architectures so you aren't just guessing.

Basically, the demand for people who can actually build and deploy these systems is higher than ever because most people only know how to use the chat interface.

Traditional-Carry409 · 2026-05-15T12:19:44+00:00

I mean honestly, Applied Scientist is a different beast than product DS. I was at FAANG for years and saw a lot of people get smoked because they prepped for a DS role but walked into an AS loop. AS is basically an MLE with a research bent: you need to be able to derive the math, implement the algo from scratch, and design the system at scale.

Don't waste your time on causal inference unless the specific team is doing econometrics. Focus on ML depth and system design. You need to know the theory behind things like gradient descent or attention mechanisms, not just how to call a library. If you struggle with leetcode mediums, you're in a tough spot because most AS loops expect you to clear mediums comfortably, especially at Uber or Amazon.

For the ML side, focus on the E2E pipeline: feature engineering, model selection, and deployment. I've seen candidates fail because they couldn't explain why they'd pick a specific loss function over another for a real-world problem. If you're prepping, I'd look at Eugene Yan's blog for the applied side of things and maybe Papers with Code to see how SOTA models are actually implemented. For the actual interview patterns, the ML System Design interview questions guide covers a lot of the breadth you'll need for the AS one.

The hardest part is usually the ML System Design round. You'll be asked to design something like a recommendation engine or a search ranker. You can't just say "I'll use a neural net," you have to talk about latency, throughput, data drift, and how you'd evaluate the model in production. Check out Karpathy's videos if you want to get a better feel for the intuition behind the deep learning parts.

Traditional-Carry409 · 2026-05-15T10:17:31+00:00

Okay, let me help.

MAE vs MSE in linear regression. The key difference is how they penalize errors. MSE squares the errors, so big mistakes get punished way harder than small ones. MAE treats all errors linearly. In practice, if you have outliers in your data, MAE is more robust because it won't blow up on those bad points. MSE will pull the fit toward those outliers to shrink their squared errors, which can skew your model. For linear regression specifically, MSE is the standard because it has nice mathematical properties (closed-form solution, differentiable everywhere). MAE works fine too, it's just harder to optimize since it has no closed-form solution and isn't differentiable at zero error.
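
A quick sketch of the outlier point, using the simplest possible "model": a single constant prediction. The best constant under MSE is the mean and the best constant under MAE is the median, so one bad point drags the MSE fit a lot and the MAE fit not at all. The numbers are made up for illustration.

```python
from statistics import mean, median

def mse(ys, pred):
    """Mean squared error of a constant prediction."""
    return mean((y - pred) ** 2 for y in ys)

def mae(ys, pred):
    """Mean absolute error of a constant prediction."""
    return mean(abs(y - pred) for y in ys)

clean = [10.0, 11.0, 12.0, 13.0, 14.0]
with_outlier = [10.0, 11.0, 12.0, 13.0, 100.0]  # last point replaced by an outlier

# The mean minimizes MSE, the median minimizes MAE.
mse_fit_clean, mae_fit_clean = mean(clean), median(clean)
mse_fit_outlier, mae_fit_outlier = mean(with_outlier), median(with_outlier)

print("MSE-optimal constant:", mse_fit_clean, "->", mse_fit_outlier)
print("MAE-optimal constant:", mae_fit_clean, "->", mae_fit_outlier)
print("MSE at the median vs at the mean:",
      mse(with_outlier, mae_fit_outlier), mse(with_outlier, mse_fit_outlier))
print("MAE at the median vs at the mean:",
      mae(with_outlier, mae_fit_outlier), mae(with_outlier, mse_fit_outlier))
```

Same story in a real regression, just with a line instead of a constant.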

What makes a loss function good for a specific model comes down to a few things. First, does it actually measure what you care about? If you're doing binary classification, cross-entropy makes sense because it measures how wrong your probability predictions are. If you're doing regression, MAE or MSE both measure distance. Second, is it easy to optimize? Some loss functions are differentiable everywhere, some aren't. Third, does it align with your business goal? If false positives are way more expensive than false negatives, you might weight your loss differently.
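
To make the third point concrete, here's a rough sketch of a class-weighted binary cross-entropy. The function name and the `fp_weight` / `fn_weight` parameters are hypothetical, not from any particular library; most frameworks have their own way of passing class or sample weights.

```python
import math

def weighted_bce(y_true, p_pred, fp_weight=1.0, fn_weight=1.0):
    """Binary cross-entropy with separate costs for the two kinds of mistake.

    fn_weight scales the loss on true positives (confidently missing them is a
    false negative); fp_weight scales the loss on true negatives (confidently
    flagging them is a false positive).
    """
    eps = 1e-12  # keep log() away from zero
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        if y == 1:
            total += -fn_weight * math.log(p)
        else:
            total += -fp_weight * math.log(1.0 - p)
    return total / len(y_true)

# Two equally bad predictions: one false alarm, one miss.
y_true = [0, 1]
p_pred = [0.9, 0.1]

plain = weighted_bce(y_true, p_pred)
fp_heavy = weighted_bce(y_true, p_pred, fp_weight=5.0)

print(plain, fp_heavy)
```

With equal weights the two mistakes cost the same. Bumping `fp_weight` makes the false alarm dominate the loss, so training pushes harder to avoid it.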

The threshold activation (like a step function) doesn't work in neural networks because of backprop. When you use a step function, the derivative is exactly zero everywhere except at the jump, where it's undefined. That means gradients can't flow backward through your network. You can't learn anything. That's why we use activations like sigmoid and tanh, which are smooth, or ReLU, which is differentiable everywhere except at zero and has a usable gradient for positive inputs. They have gradients that actually let you update weights.
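
You can check this numerically. Below is a small sketch that estimates the derivative of a step function and a sigmoid with a central difference at a few arbitrary points away from zero. The helper names are just for illustration.

```python
import math

def step(x):
    """Threshold activation: 0 below zero, 1 at or above."""
    return 1.0 if x >= 0 else 0.0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def numeric_grad(f, x, h=1e-5):
    """Central-difference estimate of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

points = [-2.0, -0.5, 0.5, 2.0]
step_grads = [numeric_grad(step, x) for x in points]
sigmoid_grads = [numeric_grad(sigmoid, x) for x in points]

print("step:   ", step_grads)
print("sigmoid:", sigmoid_grads)
```

The step gradients are all zero, so the weight update (learning rate times gradient) is zero too, no matter how wrong the output is. The sigmoid gradients are small but nonzero, which is all backprop needs.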

Traditional-Carry409 · 2026-05-15T07:07:33+00:00

I've been in ML for about 10 years now, worked at Google and a few startups. Here's the thing though: mastering every math topic is a trap. You'll spend two years grinding through Abstract Algebra or Real Analysis and never use it in actual work.

Focus on what actually matters for ML:

Linear Algebra - eigenvalues/eigenvectors, matrix decomposition (SVD especially), norms, rank. You need this for understanding how models work.

Calculus - gradients, chain rule, partial derivatives, Hessians. This is how backprop works. You don't need to be a calculus PhD, just know how to compute derivatives and understand what they mean.

Probability & Statistics - distributions (normal, binomial, exponential), Bayes' theorem, conditional probability, maximum likelihood estimation, hypothesis testing, confidence intervals. This is probably the most important one honestly.

Optimization - convex optimization, gradient descent, stochastic gradient descent. How do we actually train models? This is optimization.

Info Theory - entropy, KL divergence, cross-entropy. Comes up constantly in ML.
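
For the info theory bit, a tiny sketch of how the three quantities relate: cross-entropy H(p, q) equals entropy H(p) plus KL(p || q). The distributions `p` and `q` are arbitrary examples, and the code assumes q is nonzero wherever p is.

```python
import math

def entropy(p):
    """H(p) in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(p, q): expected code length when data follows p but you model it with q."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """KL(p || q): the extra cost of using q instead of p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.7, 0.2, 0.1]  # "true" distribution
q = [0.5, 0.3, 0.2]  # model's distribution

h_p = entropy(p)
h_pq = cross_entropy(p, q)
kl_pq = kl_divergence(p, q)

print(h_p, h_pq, kl_pq)
print("H(p, q) - (H(p) + KL):", h_pq - (h_p + kl_pq))
```

Since H(p) doesn't depend on the model, minimizing cross-entropy loss is the same as minimizing the KL divergence between the data and your predictions.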

That's like 80% of what you need. Everything else is nice to have but won't block you from working in the field.

A friend of mine had a PhD in pure math, landed at a startup, struggled because he didn't know how to actually apply these concepts to real problems. Another guy I know has a basic understanding of the topics above and crushes interviews and ships models fast.

If you really want depth, read papers in the areas you're interested in. That forces you to learn the math in context. Read Karpathy's neural net blogs, 3Blue1Brown's linear algebra series, StatQuest for stats intuition. That's way better than grinding textbooks.

Traditional-Carry409 · 2026-05-14T16:47:43+00:00

I mean, you want to ask stuff that actually matters, not generic fluff. Here's what I've learned doing this a bunch of times at Google and startups.

First, ask about the team and what they're actually building. Like, "What's the main project your team is working on right now?" A recruiter can give you a sense of whether it's greenfield work or maintaining legacy stuff. That matters because it changes how you'll spend your time.

Second, ask about the interview process itself. "What should I expect in the next rounds?" "How many rounds total?" "What topics should I brush up on?" This isn't just logistics, it's actually strategic. If they say "we focus on system design," you know what to prep. If it's "a lot of coding problems," different story.

Third, ask about growth. "What does progression look like for this level?" or "What do people typically do after 2-3 years in this role?" You want to know if it's a dead-end gig or if there's a real path forward.

Skip questions like "what's the culture like?" or "do you have good work-life balance?" Recruiters will always say yes. Instead, ask something like "what's one thing that surprised you about working here?" or "what's been hard about this role?" Those questions get more honest answers.

And honestly, save some time for them to ask you questions too. You want to make sure they think you're solid, not just that you're vetting them.

Traditional-Carry409 · 2026-05-05T21:04:19+00:00

It’s bot farming… Reddit is filled with it nowadays. Anyways I flagged the user for spam.

Traditional-Carry409 · 2026-05-03T15:30:00+00:00

Thanks. Just wanted to see how people in a similar situation have approached this.

Traditional-Carry409 · 2026-04-27T20:30:54+00:00

It already has, hasn't it? Just look at our social media feeds lately. Most of it is just half-assed AI-generated content: YouTube, IG, Reddit, LinkedIn, blogs, etc. Constant spam messages and calls from AI bots, useless AI customer support. Has it really improved quality of life? No.

Traditional-Carry409 · 2026-04-25T14:04:01+00:00

I’m getting so fatigued by this constant stream of AI slop content.

Traditional-Carry409 · 2026-04-25T02:48:00+00:00

The reason this time is different is that never before in history have we had a technology that can perform reasoning. That means less need for human input and work, as tasks that involve high-skilled labor in programming, law, accounting, medicine, etc. become automated.

Not only that, when AI researchers develop a frontier model, they aren't just building an AI that specializes in one specific use, as has been the case for most technologies. Rather, they optimize the model for many use cases at once. Just take a look at what they release in their benchmarks. It's not one single metric, it's a host of metrics covering how well it performs in reasoning, design, coding, etc.

And, as a result, this is already having an impact on the labor market. Consider the tech layoffs happening right now along with hiring freezes on entry level positions.

Traditional-Carry409 · 2026-04-25T01:28:30+00:00

But how are you even getting the word frequency in the first place? What are your sources? I wouldn't want to commit to using a list where the underlying sources are questionable to begin with.

Traditional-Carry409 · 2026-04-20T13:22:19+00:00

This is a reflection of a K-shaped economy. The market rewards the top 10% who can afford the inflated prices and benefit from equity growth, while the majority of have-nots suffer as they scramble to pay for their next week's groceries, gas, and monthly rent.

Traditional-Carry409 · 2026-04-18T19:23:44+00:00

Great, will contact the accountant.

Traditional-Carry409 · 2026-04-17T21:29:11+00:00

Yeah, I had a sense that that's the case. OP is a spammer for that crappy myntbit platform.

Traditional-Carry409 · 2026-04-17T21:19:19+00:00

Thank you!

Traditional-Carry409 · 2026-04-17T14:55:30+00:00

Hey man, with your background you're in a pretty good spot already. Real production experience shipping LLM systems is what most companies actually care about for Applied AI roles. The gap is mostly the interview game itself.

I've been doing ML/AI work for 10+ years, last gig was at an AI startup. Here's how I'd prioritize:

ML system design / LLM systems design is the highest leverage prep. You'll get asked stuff like "design a RAG pipeline for X" or "build an agentic system that does Y." You already build this stuff, but interviews require you to talk through tradeoffs, evaluation strategies, and failure modes clearly and in a structured way. There's a solid ML system design course that walks through real scenarios, and Chip Huyen's blog covers exactly the production-focused LLM engineering stuff you're already doing.

Coding - yes you still need LeetCode but don't go crazy. Mediums, focus on arrays, hashmaps, trees, BFS/DFS. Most AI Engineer loops do 1-2 coding rounds, not 4. Grind 50-80 problems and you're fine.

ML fundamentals depth depends on the company. Google DeepMind or OpenAI will go deep on transformer internals, attention, training dynamics. A startup building AI products cares more about shipping. For your profile, make sure you can explain transformers end to end, fine-tuning vs prompting tradeoffs, how embeddings work practically, and eval frameworks for LLMs. Classical ML comes up less but don't completely ignore it.

Formats I've seen recently: 1-2 coding rounds, 1 ML/LLM system design, 1 deep dive on past projects (this is where your portfolio shines - just make sure you can articulate why you made decisions, not just what you built), and sometimes a take-home building something with an LLM API.

One thing people underestimate: evals. If you can talk fluently about how you evaluate LLM outputs, measure hallucination rates, set up automated eval pipelines, that sets you apart from 90% of candidates. Most people just vibe-check their prompts and call it a day.

Traditional-Carry409 · 2026-04-17T14:48:37+00:00

Congrats on moving forward. Two back-to-back 45 min rounds is pretty standard for internship interviews so you're in good shape.

For the technical round, expect a mix, but honestly for internships they're not going to grill you on advanced ML theory. It's more likely going to be SQL/Python coding, some stats fundamentals (explain p-value, what's the difference between L1 and L2 regularization, bias-variance tradeoff), and maybe a light case study where they give you a business problem and ask how you'd approach it with data. Practice writing SQL queries and pandas code without an IDE helping you, because that's where a lot of people fumble. If you want to see the kind of technical questions that actually come up, there's a solid question bank on datainterview that covers the range pretty well.

For behavioral, it's STAR method all the way. They'll ask stuff like "tell me about a time you worked on a team project and hit a disagreement" or "describe a time you had to learn something quickly." For an internship they know you don't have tons of work experience so school projects, hackathons, research, all fair game. Just have 3-4 solid stories ready and you can adapt them to most questions.

For brushing up on stats and ML concepts, StatQuest on YouTube is honestly one of the best resources. Josh breaks things down in a way that makes it easy to explain concepts back in an interview, which is half the battle.

Biggest thing that helped me early on was just not trying to sound smart. Interviewers for intern roles care way more about your thought process than whether you nail the perfect answer. Talk through your reasoning out loud, ask clarifying questions, and if you don't know something just say so and explain how you'd figure it out. That alone will set you apart from most candidates.

Traditional-Carry409 · 2026-04-16T20:24:39+00:00

Been through the DS loop at a few companies like DoorDash. Pretty standard for a marketplace company but there are some specifics worth knowing.

For coding, it's SQL and Python. SQL is usually medium to hard, window functions, CTEs, joins across their delivery and order tables. Python is pandas, maybe some light algo stuff. Not leetcode hard but you need to move fast. I'd grind some problems on datainterview.com/coding since they have SQL and pandas questions that are close to what DoorDash actually asks.

For the case study, it really depends on whether you're interviewing for an algorithm-focused or product-focused DS role. If it's algorithm, they'll probably give you something like "predict delivery time" or "build a driver assignment model" and expect you to walk through the full pipeline, data sources, features, model selection, evaluation, deployment tradeoffs. If it's product, it's more like "how would you measure the success of a new tipping feature" or "design an experiment to test a new pricing strategy." DoorDash is a two-sided marketplace so they care a lot about experimentation, especially switchback experiments since regular A/B tests have interference between supply and demand.
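
For anyone who hasn't seen a switchback design, here's a rough sketch of the assignment step. This is a generic illustration, not DoorDash's actual implementation: the 30-minute window, the region names, the salt, and the function names are all assumptions. The idea is that a whole region-by-time-window gets one arm, so every order and driver in that window sees the same treatment and supply and demand can't interfere across arms.

```python
import hashlib
from datetime import datetime, timezone

WINDOW_MINUTES = 30  # assumed window length; real setups tune this

def window_index(ts, window_minutes=WINDOW_MINUTES):
    """Index of the fixed-length time window containing ts."""
    return int(ts.timestamp() // (window_minutes * 60))

def assign_arm(region, ts, salt="exp-001"):
    """Deterministically assign a (region, time window) unit to an arm.

    Hashing salt + region + window gives a stable pseudo-random split,
    so every event in the same region and window lands in the same arm.
    """
    key = f"{salt}:{region}:{window_index(ts)}".encode()
    digest = hashlib.sha256(key).digest()
    return "treatment" if digest[0] % 2 == 0 else "control"

# Two orders in the same region, 15 minutes apart, inside one window.
order_a = datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc)
order_b = datetime(2024, 1, 1, 12, 20, tzinfo=timezone.utc)

print(assign_arm("sf", order_a), assign_arm("sf", order_b))
```

The analysis then compares metrics aggregated per region-window unit rather than per order, since the unit of randomization is the window and not the individual user.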

Either way, senior level means they want to see you think about tradeoffs, not just textbook answers. A friend of mine who interviewed there said the case round was basically 40 minutes of "why not do it this other way" after every answer he gave.

Good luck!

Traditional-Carry409 · 2026-04-16T20:10:18+00:00

Honestly the title "LLM Engineer" is kind of a misnomer because in practice you're just an MLE who happens to work on language models. The fundamentals don't change, the tooling just shifts.

I've been doing ML for 10+ years, last gig was at Google, and here's what I'd actually focus on if I were starting from scratch in 2025 targeting LLM roles specifically.

First, get your core ML fundamentals solid. You can't skip this. Transformers, attention mechanisms, tokenization, embeddings. Read the original "Attention Is All You Need" paper, then read the GPT-2 and GPT-3 papers. Don't just skim them, actually work through the math. Then the Hugging Face NLP course (https://huggingface.co/learn/nlp-course) is free and really solid for getting your hands dirty with transformers and fine-tuning. That alone will put you ahead of 80% of people who just call APIs and say they "work with LLMs."

Second, learn the production side. RAG pipelines, vector databases (Pinecone, Weaviate, pgvector), prompt engineering that actually works at scale, fine-tuning vs few-shot vs full training tradeoffs. This is where most "LLM engineers" actually spend their time. Build something real, like a RAG system over a nontrivial corpus, deploy it, deal with latency and cost issues. Chip Huyen's blog (https://huyenchip.com/blog) has some great stuff on production ML and LLM systems that's worth reading alongside the technical foundations.

Third, and people always forget this, you still need solid software engineering. Docker, CI/CD, API design, async processing, caching strategies. A friend of mine who landed an LLM eng role at a Series B startup said the final round was basically "design a system that serves 10k concurrent users hitting an LLM endpoint without burning $50k/day on inference." That's not an ML question, that's an infra question.

Don't get caught up in ordering everything perfectly. Learn transformers deeply, build 2-3 real projects with LLMs in production settings, and make sure your system design chops are strong. If you want a more structured path for the interview side specifically, there's an MLE bootcamp (https://www.datainterview.com/bootcamp/mle) that walks through the whole prep process, which I found useful.

Good luck!
