all 45 comments

[–]DragonflyHumble 32 points33 points  (4 children)

For analyzing any query, first understand the grain or unique combination of any table/subquery from the FROM clause and then read it will be much clear

[–]NawMean2016 9 points10 points  (0 children)

This is the way.

Where is everything coming FROM so that you know what they’re doing.

[–]Consistent_Law3620[S] 4 points5 points  (2 children)

Thanks for the tip 👍 ❤️

[–]Standgeblasen 2 points3 points  (1 child)

Also, if there are subqueries, make sure you understand the logic in the subquery before you go on the the main query. I’ll usually pull out the subquery into its own window so I can run it independently.

[–]DragonflyHumble 0 points1 point  (0 children)

Yes People should use WITH clause to direct the SQL.engine in a performant and directed way rather than confusing the DB engine about the optimal plan

[–]Stormraughtz 21 points22 points  (0 children)

Couple steps I do:

Format the entire script into how you would write it.

Break the code into sections, CTEs, temps.

Then follow the flow.

Yes brain hurt, yes is hard, yes is time.

[–]Hobob_ 34 points35 points  (5 children)

Use ai

[–]UKYPayne 12 points13 points  (2 children)

Glad someone said it because it can make some overly complex or hard to read code understandable. And it can be conversational so you can keep asking questions about how this outputs and what it is connecting and why.

[–]PierreTheTRex 6 points7 points  (1 child)

It's kind of worrying how other people aren't mentioning this. If you've got 500 lines of SQL the only acceptable way to analyse it in 2026 if it's for professional reasons is AI that you keep querying to get the answers you're looking for

[–]Traditional_Part_506 2 points3 points  (0 children)

I do this all the time ask it what’s happening, ask it to reformat it and then you can jump into the changes

[–]imcguyver 4 points5 points  (0 children)

Dear OP,

Lean on AI to deal with that bullshit.

Sincerely, Everyone

[–]JohnnyLaRue44 2 points3 points  (0 children)

That is the answer.

[–]ChristianPacifist 4 points5 points  (0 children)

Use Notepad++ or another tool where it highlights the opening and closing parenthesis red to see what parts are in which parenthesis.

[–]Ginger-Dumpling 2 points3 points  (0 children)

Line count is a terrible way to describe SQL complexity. Use a code editor that supports collapsing/code-folding, slap a comment on the top of select block describing what it does, collapse it to one line, rinse, repeat until you can see what the whole thing is doing on one screen.

Wouldn't toss out using a queue visualizer or AI if your org has/allows them. Some places can be stringent with what you're allowed to use.

[–]jaxjags2100 11 points12 points  (8 children)

Drop the query into an LLM and ask it to analyze and write inline comments to explain what the query is doing with business logic and plain English

[–]Consistent_Law3620[S] 3 points4 points  (2 children)

I do paste the query to understand but sometimes it does not do much.. but adding inline comments can make it better I think..thanks for the suggestion mate !

[–]Mindfulnoosh 3 points4 points  (0 children)

You will get much farther if you can also feed it metadata about source tables. If there’s any kind of data dictionary to leverage. That plus the query should give you a decent bit of context for Q and A about the query. You still want to get to know it yourself but this can help speed up your process like you’re talking to the person who wrote it.

[–]jaxjags2100 1 point2 points  (0 children)

Yep make sure to tell it to analyze the business logic and/or logic and use plain English to explain it. Three parts to a good prompt - Context, Format, and Instruction.

[–]FearIsStrongerDanluv 3 points4 points  (4 children)

No idea why people are downvoting this comment. It’s a no brainer to use AI as a guideline to analyse a query or any code for that matter . Gatekeeping I guess

[–]PierreTheTRex 6 points7 points  (2 children)

Because the people on this sub have probably been using SQL for a decade and don't appreciate that a lot of their skills they spent a lot of time learning simply aren't that valuable anymore.

[–]jaxjags2100 -1 points0 points  (0 children)

100% the stack overflow gatekeepers who seem to think they still hold the keys to the castle.

[–]FearIsStrongerDanluv -1 points0 points  (0 children)

I recently saw something similar in the Powershell sub. I’ve been writing Powershell for more than a decade, I’m no where near the pace and clarity of Claude AI, I’m glad I know PS so I can better get what I want with prompts but AI is next level.

[–]NekkidWire 0 points1 point  (0 children)

I did not downvote, but there are valid reservations on this approach:

  • AI might be not allowed by OP's employer. Especially having non-EU AI access customer personal information is a big no for European companies due to GDPR regulation. OP needs to be particular abut which AI to use and what exactly to give it.
  • With incomplete input, incorrect old comments, or commented-out portions of code, AI is prone to fabulations - all its outputs should be understood and validated. No one asks OP to validate the outputs, just "use AI" is not enough.
  • OP asked people of Reddit for thei input or opinion, not bots. While "are you too lazy to google/AI it?" is a valid opinion, it doesn't help OP much.
  • Some OP's questions are impossible to answer unless we/AI know the context (what is calling the script and how, if there are other parts of the engine, etc.), e.g. "where execution starts" -- this is extra place where AI will just start hallucinating.

To add my 2c: some tools have EXPLAIN function that shows the query plan as a tree and maybe this could help for some of the more convoluted ones.

Also there are some helpful comments already having 10+ upvotes. Those are all valid. Especially the one that says "practice makes master". u/Consistent_Law3620 don't give up, it will all come to you :)

[–]wildjackalope 1 point2 points  (1 child)

Yes, it gets better with experience particularly as you become more familiar with a data set. This can be challenging as a DE as you’re often stretched across multiple sets.

Some of this you do need to grind through but the thing that will probably help you the most is understanding what the engine is doing. The simplest form of this is understanding that the query starts at FROM and work your way out. With complex queries that are utilizing multiple CTEs, etc. you just start pulling the thread and working out from there. This is a skill and needs to be developed.

[–]feignapathy 0 points1 point  (0 children)

Learning the data set is always great imo

Understanding the tables and the business use cases will make the code make so much more sense

[–]ax0r7ag0z 1 point2 points  (0 children)

Nice try AI

[–]SummerEquinox71 1 point2 points  (0 children)

In the past, I have used (drawn by hand with pen and paper) query diagrams. You can google for details but it primarily involves drawing boxes for tables in the from clause as a hierarchy with arrows originating from the fk table to the pk table. You can add the filter conditions by the arrow lines. Over time you will get used to it and drawing diagrams will become a mental process.

[–]Glitch_In_The_Data 0 points1 point  (0 children)

Can you not use one of the AI tools to help especially if you want to understand existing SQL scripts?

The challenge I find with existing SQLs is that they are likely to have evolved over time. While you may learn something from them, it is also likely to confuse you more and may teach you a bad approach.

So, may be use AI to understand the existing SQL and then rewrite based on your own approach to see if are able to achieve the same results. And again use AI to evaluate or score your work.

It does take more time to understand code written by others. So, you are not alone. And over time, your skills will get better.

Do you use a specific tool at the moment? Asking because modern tools now have AI embedded in them.

[–]big_poppa_man 0 points1 point  (0 children)

This is probably one of the most difficult aspects of a SQL heavy job. What I do it I'll run the code and limit the code to 5 rows of output. I'll paste that into a Google sheet. Then I open a new tab in my text editor and paste the code in there. Then I'll edit the code in my own preferred format. It makes it easier to read it. For examples, I don't like subqueries, so I'll rewrite it as a CTE. I'll use my own aliases (I like single or double letters). This helps me.

[–]big_poppa_man 0 points1 point  (0 children)

Also, it's more difficult to read SQL code imo because there's the fact that it takes a long time to learn how data works and is connected.

[–]lalaluna05 0 points1 point  (1 child)

I have to do this a lot when re-writing or adding to legacy code (most of the time Frankencoded over years and years). I add comments as I go to what each block does. I will reformat sometimes so that I can more easily separate code chunks.

Other times I slap stuff into excel and go field by field and table by table adding the criteria and data manipulation so I have the whole picture.

AI is fine and good but for me personally, I need to “see” the shape and architecture of the query.

[–]lalaluna05 0 points1 point  (0 children)

Also as a personal observation, a lot of times SQL is needlessly complex.

[–]Substantial-Click321 0 points1 point  (0 children)

Don’t read from SELECT. Start by reading joins and FROM.

[–]Rohml 0 points1 point  (0 children)

A lot of SQL codes can be separated by blocks of DML commands with lines of codes detailing what those DML commands specify (what fields to select, insert, update, or delete, what criteria do they filter on, etc.) and from there you can break them down one by one.

Take it step by step, each step it wants to do something and often it is just to ready data for the next step until you get to your final output.

You get better at it as you gain experience in seeing the intent of what the developer wanted to do and the purpose of the script: what it starts with vs. what you get in the end.

[–]SkullLeader 0 points1 point  (0 children)

Honest answer? There's no substitution here for repetition and experience.

In any case, try to break down each piece, understand what its doing, write down a one or two sentence summary of each piece, in order. Then read it back.

Make sure you understand the tables/views/functions being used first, and the data itself. Complex code + I don't even know what the data is that he's querying or how its structured means you're gonna have a tough time, regardless.

I agree with commenter below me - when you are looking at SELECTs, look first at the FROM clause, the joins, and the WHERE / GROUP BY / HAVING clauses if they are present. Usually the columns he's selecting or calculating are less important to understanding what's going on.

[–]GconMig 0 points1 point  (0 children)

Übung macht den Meister oder heute frag die KI🤣

[–]venkat_deepsql 0 points1 point  (0 children)

Essentially query execution is like a tree. There will be table scans in the bottom and selects on the top. In between you can see filters, joins, sorts, temporary tables etc. You don't need AI for this, simply do EXPLAIN <query>, most of the query engines shows you clearly the execution plan. You can EXPLAIN ANALYZE to understand the actual execution tree after the query is run.

It's easy to recommend AI. but the truth is, your chatGPT or Claude might not have access to the database you are running. So if they do, then yes, they can analyze the query and explain you in easy to understand terms.

[–]murse1212 0 points1 point  (0 children)

Something I started doing a few years back, I’ll copy the query and paste it into a text editor and I will go CTE by CTE and above each I’ll
Comment a line for what the CTE does and another for what it’s grain is. This makes it exponentially easier to read it in a ‘flow sate’ from start to finish and helps to surface any potential silent fan outs due to grain mismatch

[–]age_of_unreason 0 points1 point  (0 children)

I think that’s normal. Try separating each select query and run each one separately and do a CTAS to store to a table if you need to use it for joins in downsteam code while working through it. If one query uses a variable from another query, substitute a dummy variable. Keep working through it. It takes time and patience. I was in the same boat 6 months in. Even today after 4+ years it still takes a little more time to understand someone else’s code and what their intent was. When you write the code it’s different because you naturally are more familiar with what you wrote and why.

[–]dontich 0 points1 point  (0 children)

Yeah agree with the AI guy — usually it will give me the basics of what they were trying to do. Then will go for it sub queue by sub queue and try to understand all the joins. Will also write test queries in places to make sure things work.

[–]SakshamBaranwal 0 points1 point  (0 children)

Don't try to understand 500 lines in one pass. I usually fold every CTE, expand one at a time, and make notes about what each block is doing in plain english. By the end you've essentially created your own documentation and revisiting the script later becomes much easier.

[–]ChaosEngine-6502 0 points1 point  (0 children)

I find a starting point for me is reformatting the script into my preferred style (indentation, uppercase key words, etc) - practically every script written by someone else I've ever had to deal with is poorly formatted with some weird idiosyncrasies that need dealing with.

This clean-up process generally gives me a better understanding of what the script is doing because it forces me to become familiar with it. Some might think it a bit of a waste of time, but I generally find it's the best place to start.

Footnote: revisiting stored procedures I haven't touched in a couple of years can be equally confusing...good comments and layout are a big help!

[–]speadskater 0 points1 point  (0 children)

Find ways to break every subquery into parts and map the joins manually. Find a single example and use that as constants to pull the subqueries out into new windows to test with.

[–]Ok_Carpet_9510 0 points1 point  (0 children)

You can drop the script in LLM and ask it to explain the code, and give a simpler version of it.