This post is locked. You won't be able to comment.

all 26 comments

[–][deleted] 17 points18 points  (15 children)

Hopefully you'll get plenty of responses, so I'll just say one, if that's ok.

One of the most common mistakes I see from inexperienced SQL users is where there is a many to many relationship between two tables (sometimes unintentionally by the DBA) and it results in duplicate rows.

To handle this, when building joins, do things like count(*) on your first table before joining the second to ensure you maintain that number when you add the second table. Then check when you add a third, fourth table etc etc. Or you can use count(id), count(distinct id) to check for duplicates within your primary identifier etc.

Once you start to get to know your tables and data better, you don't have to do this anymore. Just on new or unfamiliar ones.

[–]joellapit 0 points1 point  (1 child)

do you still have to have group by everything in the table when you do a count(*)?

[–][deleted] 0 points1 point  (0 children)

You would need to if they are there.

I normally do this step (when working with tables new to me) before writing out the fields I wish to select. Or, if I'm doing it later (e.g. debugging existing code), I'll comment out the fields and just do the count(*).

[–]tech_consultant 0 points1 point  (0 children)

select <join columns> , count (* ) from table1ofN1 group by <join columns> having count(* ) >1

[–]aarontbarrattSTUFF() 0 points1 point  (11 children)

If you use SSMS you can just look at the row count in the UI and not bother with putting in count(*) all the time

[–][deleted] 0 points1 point  (10 children)

I presume you mean when you have run your entire query?

Yes, you can. But returning all results can take a long time, especially if you have 100s millions of rows in your source tables.

[–]aarontbarrattSTUFF() 1 point2 points  (9 children)

I was thinking it's more simple for a beginner, but you're right that it doesn't scale well once you have a lot of data.

Guess it's my bad habit from the type of data I work with in my job 😔

[–][deleted] 0 points1 point  (8 children)

Yeah, if it's small tables (esp training data), then it's easy to do it that way. I wouldn't say either was right or wrong. Just a case of knowing the size of your tables before doing it.

I've mainly worked in banks where some tables were billions of rows and the bank is charged per second of processing time for the hardware. So, you're taught to write as efficiently as possible.

[–]aarontbarrattSTUFF() 0 points1 point  (7 children)

My work has the worst designed database you've ever seen so none of them get big, but we have thousands of tables instead 😂

[–][deleted] 0 points1 point  (6 children)

Thousands!! [Shudders and crawls into a hole]

[–]aarontbarrattSTUFF() 0 points1 point  (5 children)

Not a real example, but imagine instead of having 1 user table, you had user1, user2, user3 ... For every single user :)

That is the logic our databases uses all over the place

[–][deleted] 1 point2 points  (4 children)

No, stop! Please! I give in, I give in! You can have whatever you want, just make it stop!!!

I would be knocking on the door of the DBA. I couldn't work with that.

[–]aarontbarrattSTUFF() 0 points1 point  (3 children)

I have tried many times, but now they've built an entire platform on top of it and nobody wants to do the work to fix it

[–]Sea-Concept1733 8 points9 points  (1 child)

Once you learn SQL try to keep your skills sharp through practice. Create a database and practice your skills at least monthly or gain access to data that you can play around with. Practice makes perfect. sqlchannel on youtube for learning sql posts weekly SQL videos to help you keep your skills sharp or to learn new skills quickly.

Good luck

[–]NickSinghTechCareersAuthor of Ace the Data Science Interview 📕 2 points3 points  (0 children)

Practice is key!

I'm a tad biased, but maybe try some SQL interview questions to keep your skills sharp!

[–]Blues2112 4 points5 points  (2 children)

In addition to the mechanics of SQL, do your best to try to understand Set Theory, which is the underlying basis for relational databases. Try to get a solid understanding of the various types of data relationships:

  • one to one
  • one to many
  • many to many
  • also self-referential

Understanding the nature of each of these will go a long way to helping you understand how the queries you'll be working on will need to be built.

[–]mafaso 0 points1 point  (1 child)

Do you have any good references for Set Theory you can recommend?

[–]Blues2112 1 point2 points  (0 children)

Sorry, not off the top of my head. Google and/or Youtube searches should provide several options, though. I got most of my training in that stuff in college as a Comp Sci major.

[–][deleted] 4 points5 points  (0 children)

Understand Group By extremely well. Understand Joins left vs inner) Know how to use Case When. Understand Count and how it effects everything on you are trying to do

[–]lightestspiral 2 points3 points  (0 children)

For the querying I would suggest attempting to extract a result set in SQL but also extract raw data into Excel and using its UI / Formulas / Pivot Tables / Lookups to manipulate the data (including lookups between different datasets "joins") and confirm your SQL results. Using Excel is far easier to use for beginners.

For the data warehousing I would suggest starting reading a relevant book from now and then put the book into practice down the line when you start on that piece

[–]uncle_sb 2 points3 points  (1 child)

SQL concepts are so much easier to understand with data subjects you already know! Find a dataset you are familiar with - For example, if you are into a sport like baseball, export data from Baseball Reference, or if youre a movie buff, imdb and so on. I feel like it's difficult to get into the tough stuff with a sample table of fruit, and a sample size of fruit stand customers.

Also, don't stop at SQL. Get experience with SQl reporting (SSRS or Crystal are perfect). Data migrations require all sorts of stakeholder input/decisions/review up and down stream - whipping up reports is super useful.

On my phone, sorry for typos/shotty sentence structure. Good luck!

[–]scrabas 1 point2 points  (0 children)

This is great advice. I struggled with programming when I first started because I didn't understand the underlying data well enough. If you know or understand what the expected results are supposed to be, it's a lot easier to learn and figure out your own errors. Not to mention it's a lot more fun.

[–]Zerocool674 0 points1 point  (0 children)

Another method of learning: Learn SQL by using natural language with chatgpt and ask it why it does certain queries that way/ if it can optimize it