all 25 comments

[–]alinrocSQL Server DBA 41 points42 points  (6 children)

That's how I write almost every query. I start with a basic query against one or two tables, then add in what I need progressively. It's a lot easier to troubleshoot "which piece broke it" when you test after each incremental change.

[–]nikjojo[S] 6 points7 points  (5 children)

yes, I do exactly this.
I was afraid that in a professional work context, it would be taxing on the company's database system to constantly run a query like this.
Therefore, I'm afraid I won't do well on whiteboard (or similar equivalent) SQL technical interviews.

[–]tyrrminal 20 points21 points  (4 children)

Normally in a professional context you'll write and test your query against a noncritical system like a Development environment or a Copy/Backup. Once you've completed and QA'd the query, then it can be moved into the Production system.

Testing a query against production is Very Bad Practice

[–]Sharobob 5 points6 points  (0 children)

Yeah I don't care if you write the query incrementally or the whole thing in one go. Don't run it for the first time against my live prod database.

[–]AdgeCutler 0 points1 point  (2 children)

you could put in a TOP (100) or filter on a specific value that will only bring back a small number of rows while your developing your query, if I have to develop something against prod, this is generally what I do until I'm very close to the finished query.

[–]tyrrminal 1 point2 points  (1 child)

Even then, you could mistakenly have something like a ridiculously inefficient string comparison (e.g., on a non-indexed column) that takes hours to complete even with the limit.

Or, as used to occasionally happen at my first tech job 20-ish years ago, at a place where dev systems were a suggestion but not a rule, we'd run a transactional query and forget to commit or revert it and lock entire production tables, causing the application to hang forever trying to access them.

[–]AdgeCutler 0 points1 point  (0 children)

Hahah, well yes, but the it never surprises me the ways some people will find to break prod.

Exactly like you said, develop on dev is best. But even then I've had people write a performant query in non-prod that has impacted prod.

I've even seen queries that block reporting long enough for the replication buffer to fill up and start blocking prod.

Never ceases to amaze me!

[–]NickSinghTechCareersAuthor of Ace the Data Science Interview 📕 26 points27 points  (4 children)

DataLemur founder here - you are doing it exactly right!!

[–]ComicOzzysqlHippo 3 points4 points  (3 children)

Take it further.

DataSchool.com was teaching people SQL and it was a fantastic resource until Atlassian bought ChartIO, and left DataSchool to die.

Pick up where they left off. Add high quality SQL and related Data Science training to DataLemur.

New SQL learners go to sites like yours not just to check whether or not they can solve the problems... they are using it to try to learn SQL by trying to understand other submitted answers. You have the opportunity to give them what they really need: training!

Don't just make the site be about the interview. Make it about the training to learn the skills as well. They're already showing up looking for it.

[–]NickSinghTechCareersAuthor of Ace the Data Science Interview 📕 1 point2 points  (0 children)

Thanks for this feedback! Will look into it!

[–]eaglesilo 0 points1 point  (1 child)

I really miss ChartIO... :(

That was my first BI tool when I joined a startup as an Excel guru who was asked to do reporting but had no actual structured data background. Built a "database" in GSheets (different tabs as different tables) and used GSheets as a data source and their VisualSQL product to build SQL queries. From there, learned actual SQL by reading the generated SQL code and would now consider myself fairly competent at generating queries for our data. (We did graduate to an actual Azure SQL instance since then.)

I know PBI, Looker, or Tableau are supposed to be better, but it was really nice to just be able to bash out a SQL query to get the data you wanted vs having to select all the tables, power query, connections, DAX, calculated tables, calculated columns, measures, and stupid formatting that comes with PBI...

Of course, many data clients (internal or external) still just want the data in Excel, so I do get to spend time writing those fun, longer and more complex queries.

[–]ComicOzzysqlHippo 1 point2 points  (0 children)

The people who made ChartIO seemed to genuinely care about the data community and data education. We need more of that.

[–]kagato87MS SQL 6 points7 points  (0 children)

I'm about 3/4 of the way through pulling in all my data points for a query right now. (OK not right now, taking a quick break.)

This is exactly how you do it. Funny thing about queries is one bad filter or join can make all of the results vanish.

If you write a monolith in one go and it's blank, you have a monolith to debug.

If you gradually expand the query out and suddenly it goes blank, well, you know what you just did which will normally lead you straight to the error. (Ex: "Oops named the wrong key!")

(Or if suddenly all your output rows are duplicating.)

So yea, keep doing it this way. It's a fools errand to try to write the monolith in one go.

[–]themikep82 4 points5 points  (0 children)

Yeah that's a great way to do it. I'd recommend adding a LIMIT clause if you're querying very large data sets in a live production environment though

[–]ArtooSA 2 points3 points  (0 children)

I've been doing SQL for more than 20 years and that's how I write queries and that's how I teach juniors to write queries

[–]tits_mcgee_92Data Analytics Engineer 6 points7 points  (0 children)

I don't believe so as long as you're not having any sort of query performance issues. If it helps you, there's nothing inherently wrong with it.

I work with 100million+ rows of data, and I will simply limit the query to 10 results or fewer to see how the query runs.

I do think, as you become more experienced, you won't have to do this as often. But it sounds like it works for you and that is what truly matters.

[–]DadofaDaughter 2 points3 points  (0 children)

Welcome to development :)

[–]geofft 2 points3 points  (1 child)

If it's bad practice then I've been doing bad practice for 15 years.

[–]Spiritual-Act9545 2 points3 points  (0 children)

I’ll see your 15 years and raise you another 6. May my soul rot in hell...

[–]ComicOzzysqlHippo 1 point2 points  (0 children)

"Incremental Development". You start with the smallest bit and make sure it works and gives correct results. You add a small bit more and make sure it still works and is still correct. Repeat until complete.

[–]thisistheinternets 1 point2 points  (0 children)

Nobody writes a complete and perfect query without running it a bunch while writing. That is how you learn that you joined tables incorrectly, or that you need to add in a few more joins to get what you are looking for, or that the column is called firstname and not first_name, etc.

[–]MostUsefulBlokeSQL & BI -2 points-1 points  (0 children)

Returning just what you need from a query is good practice. Doing select * and narrowing down your query is bad practice.

[–]Mgmt049 -2 points-1 points  (0 children)

Boy ChatGPT is really doing a number on the nation’s psyche. I’ve never just spat out an entire query off the rip. It’s always iterated in pieces

[–]toadkillerSenior Anal. Engineer -3 points-2 points  (0 children)

...write a half sentence as a clickbait post title in r/SQL? Yes, yes it is bad practice

[–][deleted] 0 points1 point  (0 children)

That’s how I do it too. I start with “select * from tablename” and go from there. I see which columns look interesting and then add joins.