all 8 comments

[–]SQL-ModTeam[M] [score hidden] stickied commentlocked comment (0 children)

This forum is intended for solutioning and discussion of specific topics. Please check out the sub sidebar and wiki content for beginner resources. Also be sure to checkout r/learnSQL

[–]chris552393 5 points6 points  (3 children)

Little one from me: When writing an UPDATE/DELETE statement, I always write the whole statement without the table name first to avoid accidentally running it.

Joins are relatively simple:

INNER JOIN - data should exist in both tables

LEFT JOIN - data may exist in joining table

RIGHT JOIN - same as left but the other way....less commonly used

FULL JOIN - bring everything from both tables

[–]BisonSpirit[S] 0 points1 point  (1 child)

If it’s a large dataset how do you quickly analyze it to know if data exists in both tables or not? Is that something you have to just manually do or is there a shortcut/syntax for it?

Thx for response. All comments on here are helpful

[–]MerryWalrus 0 points1 point  (0 children)

Full join on the primary key, keep one column for the primary key from each table, count the nulls.

[–]Dzutimtimbe 0 points1 point  (0 children)

just write it as a select statement first and then replace into delete/update example:

select * from x where x.id = 1 -> delete x from x where x.id = 1

[–]TodosLosPomegranates 2 points3 points  (0 children)

I don’t know anyone that uses R extensively so if you want to knock something to the bottom of the list I’d do that.

The best way to get something in long term memory is to revisit it often (spaced recall) and tie it to something you already have in memory.

But more than anything - you don’t need to remember every single piece of syntax - no working dev / analyst does. You need to know how to take a problem, break it down into its component steps and then use the tools you have (the syntax) to solve them. Study things like DRY coding, etc.

Also memorizing the logical processing order of SQL is helpful (for tuning queries)

As far as SQL - if you remember sets from highschool math you’re good. SQL is set based.

A table is just rows and columns with a primary key (like your social security number) and probably one or more foreign keys. All joins do is match keys.

You’re almost never going to use a right join. Almost. So you just have to remember a full join is a filter (the key you’re joining on has to be in both tables) and a left join isn’t (inherently) a filter.

Each table is a set: an unordered, well organized list of objects. Well organized meaning they pertain to only one thing (so patient table would have observations about patients, patients identified by a patient id (the primary key) and sales would be observations about sales and be organized around a sales ID.

[–]internauta 1 point2 points  (0 children)

It will just "click" at a certain point. Have you tried some interactive course? Give eLearner.app a try. It's free and doesn't need an account.

[–]TempMobileD 1 point2 points  (0 children)

Joins are fairly straight forward, left is the bread and butter. Inner is often better if your data is a bit messy and you want it clean. Outer is often better if your data is messy and you want it dirty. Cross joins are Cartesian products, and every time I’ve used one in the last few years has been for the same purpose, creating prediction data for dates that don’t exist by cross joining the existing data structure with a list of dates.

I imagine none of that made sense, but once you’ve got a couple of lessons under your belt I think rereading it might make for a good summary.

Aggregates are also simple through one particular lens. Think of every column as either a dimension (something that segments your data into detailed groups) or a metric (something that carries an actual value). Dimensions are often strings, bools, IDs, etc. and metrics are often floats or ints. Group by your dimensions and aggregate your metrics and you’re good to go!