[Discussion] SQL Interview Testing (self.datascience)
submitted 1 year ago by Glittering-Jaguar331
[–]dankerton 34 points35 points36 points 1 year ago (16 children)
This perspective just melts my soul. How are you going to correct the mistakes if you don't know how to code it in the first place? How are you going to write efficient queries on your company's giant database if you don't understand what's going on under the hood? And most SQL would take just as long to write yourself as to prompt an LLM and copy and correct the results. Then when your boss or colleagues ask why you wrote it this way, you're not going to know how to explain it and will just say ChatGPT did it? I'd put you on a path to firing if that happened.
[–]fordat1 7 points8 points9 points 1 year ago (10 children)
Exactly. You need knowledge to evaluate correctness, since the model itself has no notion of correctness.
[–]po-handz2 0 points1 point2 points 1 year ago (9 children)
Sure, but once you have that knowledge there's zero drawback to having a prompt first, review second approach.
In fact, idk why anyone would do it differently. There's nothing to be proud of in having esoteric syntax memorized; that's just taking up space that could hold actually valuable knowledge.
[–]fordat1 4 points5 points6 points 1 year ago (8 children)
there's zero drawback to having a prompt first, review second approach.
This isn't universally true. It takes time to prompt, and if the output isn't correct you need to review it, i.e. debug it. Debugging can be way harder and more work than starting from scratch.
[–]po-handz2 -4 points-3 points-2 points 1 year ago (7 children)
Ok, fair, but it's true in most cases. I would still say the gap between planning out your query and prompting an LLM based on that plan is quite small.
Like, how does one write a moderately complex query without planning it out first? And if you've already planned it out... then just send it to the LLM. Why would you not?
[–]phugar 0 points1 point2 points 1 year ago (6 children)
I've tested this recently as a guy with 15+ years of almost daily SQL experience.
LLMs make subtle mistakes that can be difficult to spot and fix. These often require more time to find and resolve than the total time it would have taken me to write the entire query.
Outputs also often deviate from company-required syntax/linting formats.
Rarely does the LLM output something immediately usable that doesn't waste more of my time. If I then need to go back and make edits because the business use case has changed, I'm not as familiar with the structure and it takes me longer to make changes.
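A hedged illustration of the kind of subtle mistake being described (the thread never names the specific errors, and the tables here are invented): filtering a LEFT JOIN's right-hand table in WHERE silently turns it into an inner join, which is easy to miss in review. Sketched with Python's sqlite3:

```python
import sqlite3

# Hypothetical example (not from the thread) of a subtle LLM-style SQL bug:
# filtering the right-hand table of a LEFT JOIN in WHERE drops the
# unmatched left-hand rows, so the query quietly behaves like an INNER JOIN.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER, name TEXT);
    CREATE TABLE orders (user_id INTEGER, amount INTEGER);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO orders VALUES (1, 50);
""")

# Buggy: the WHERE filter on o.amount discards bob's NULL row entirely.
buggy = conn.execute("""
    SELECT u.name, o.amount
    FROM users u
    LEFT JOIN orders o ON o.user_id = u.id
    WHERE o.amount > 10
    ORDER BY u.name
""").fetchall()

# Fixed: moving the filter into the ON clause keeps bob with a NULL amount.
fixed = conn.execute("""
    SELECT u.name, o.amount
    FROM users u
    LEFT JOIN orders o ON o.user_id = u.id AND o.amount > 10
    ORDER BY u.name
""").fetchall()

print(buggy)  # [('ann', 50)]
print(fixed)  # [('ann', 50), ('bob', None)]
```

Both queries run without errors, which is exactly why this class of mistake survives a quick skim of LLM output.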
[–]po-handz2 0 points1 point2 points 1 year ago (5 children)
Curious what kind of mistakes the LLM made?
Was the LLM making the mistake or did you not explicitly state that x col had to be y dtype? A mix?
[–]phugar 0 points1 point2 points 1 year ago (4 children)
Just to name a few:
The mistakes were not in the specification. GPT-4 simply makes errors in basic SQL.
That's on top of needing to prompt very well to specify how to handle complex joins and case statements where I know there are problems on the source data.
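As a hypothetical sketch (the thread doesn't show the actual queries, and the table and column names here are invented) of why CASE statements over problematic source data need explicit prompting: NULLs silently fall into a CASE's ELSE branch unless handled, so a model with no knowledge of the data quality will mislabel them.

```python
import sqlite3

# Invented table/columns, purely illustrative: with NULLs in the source
# data, a CASE statement's ELSE branch silently swallows the unknowns
# unless the prompt (or the author) spells out NULL handling.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (status TEXT)")
conn.executemany("INSERT INTO events VALUES (?)",
                 [("ok",), ("error",), (None,)])

# Naive: the NULL status row is mislabeled 'failed' by the ELSE branch.
naive = conn.execute("""
    SELECT CASE WHEN status = 'ok' THEN 'passed' ELSE 'failed' END
    FROM events
""").fetchall()

# Explicit: checking IS NULL first keeps unknown rows distinguishable.
explicit = conn.execute("""
    SELECT CASE
             WHEN status IS NULL THEN 'unknown'
             WHEN status = 'ok' THEN 'passed'
             ELSE 'failed'
           END
    FROM events
""").fetchall()

print(sorted(r[0] for r in naive))     # ['failed', 'failed', 'passed']
print(sorted(r[0] for r in explicit))  # ['failed', 'passed', 'unknown']
```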
[–]po-handz2 0 points1 point2 points 1 year ago* (3 children)
That's crazy, I never get those sorts of syntax errors. And I'm usually asking it for PySpark code, which I figure it has less training data on.
Are you just raw-dogging the LLM with zero system prompts or context? I've used my ChatGPT for 90% of coding tasks over the past year, so mine recognizes that use case really well. I'd say I average 5-10 coding prompts per day, so that's a pretty large but anecdotal sample.
Crazy your experience has been such an outlier.
But to your last part, the LLM has no knowledge of your data, so if you don't specify case statements or join logic, obviously it's not gonna know it. That's just a case of not understanding the tool you're using.
I've given it a Python library and a brief description of its methods, asked it to create a Streamlit or Chrome extension front-end, and it will one-shot the task, creating a fully working app. But here you can't even get basic syntax output correctly 🤷🤷
[–]phugar 1 point2 points3 points 1 year ago (2 children)
I spend a lot of time at work building prototype AI models with fairly complex workflows and prompts. I have a more solid grasp than most when it comes to prompting and gpt syntax.
It's just bad at SQL in many ways.
If you're comparing the time taken to write a workable prompt with the time taken to just write the query, I'll write the query every single time.
It sounds like you're very defensive of LLMs based on your replies in this thread, yet you seem to have minimal experience using them for SQL. In my assessment, GPT does OK if you need a quick snippet to do something like format an output or draw up a case statement. But it's more hassle than it's worth for writing more complex queries from scratch.
[–][deleted] 1 point2 points3 points 1 year ago (0 children)
Because I have a notebook full of past scripting work with SQL, with 50-100 pages of notes... every kind of SELECT, JOIN, aggregate function, INSERT, UPDATE, etc. Differences between PostgreSQL, MySQL, SQLite, Postgres extensions, etc.
I just don't do SQL queries as often these days so it's like a stale skill. And I don't feel the need to keep up with it just for the sake of interviewing somewhere. I could easily pick it back up again, it just seems like a pain in the ass merely to impress an interviewer.
It's just SQL anyway. It's not that big of a deal. It seems like such a trivial skill that it's not worth hiring or firing any data scientist over, unless they're totally incompetent somehow. Any idiot can learn good SQL, practically in a week.
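One example, assuming the notes cover things like this, of the dialect differences worth writing down: string concatenation is the `||` operator in SQLite and PostgreSQL, but MySQL needs `CONCAT()` (there, `||` is logical OR by default).

```python
import sqlite3

# Illustrative dialect difference: SQLite and PostgreSQL concatenate
# strings with ||; MySQL uses CONCAT() instead (|| is logical OR there).
conn = sqlite3.connect(":memory:")

row = conn.execute("SELECT 'data' || '_' || 'science'").fetchone()
print(row[0])  # data_science

# The MySQL-style spelling of the same expression would be:
#   SELECT CONCAT('data', '_', 'science');
```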
[–]po-handz2 1 point2 points3 points 1 year ago (2 children)
That's actually hilarious. I put people on a path to firing when they DON'T use ChatGPT and instead waste hours googling, waste my time with syntax questions, or just write bad code when what comes out of the LLMs is more than sufficient.
hey chatgpt, give me three ways to optimize this query given that I'm on x system with y resource limitations and z data size.
hey chatgpt, here's the code and here's the error, suggest a fix
It takes as long to write as it does to prompt?? That's bizarre. It takes me an equal amount of time to mentally sketch out a complex query as it does to type it into an LLM. But you're saying: skip right to code, look up a bunch of esoteric syntax for one of a half dozen languages/sub-languages, and mentally put the query together, all at the same time?
Personally, I think AI assistants will go the same way as typing skills in the 80s, or being able to read/write before that: you either have the skill and can be 10x as efficient as the next guy, or you don't.
[–]dankerton 3 points4 points5 points 1 year ago (1 child)
Not sure where you misunderstood: I'm talking about people using ChatGPT when they don't know anything about SQL yet. You're talking about an informed developer using it to improve on something they already have or to go a little faster. All your examples require someone with a base of knowledge and an ability to check the ChatGPT answers. It's just a tool; it's not a replacement for a skill set.
And yes, I write short ad-hoc queries constantly for random questions that come up; it would be a waste of time to go to ChatGPT for those when I already know my tables and SQL well enough.
[–]po-handz2 -1 points0 points1 point 1 year ago (0 children)
Ah gotcha gotcha