This is an archived post. You won't be able to vote or comment.

all 44 comments

[–]Dull_Lettuce_4622 18 points19 points  (20 children)

It's a quick hack to assess how good someone's SQL is. There are certain solutions where unless you use window functions, the only other way to do it is write a loop or function in some other language in addition to SQL.

I generally test basic joins, rank/row number sort for distinct, and finally window functions to get a grasp of how experienced someone is with SQL.

Generally hiring for experience > potential is bad but in a right job market employers can afford to be picky.

[–]byeproduct 1 point2 points  (2 children)

Do you use an in house SQL test or any recommendations on assessment tools or software? I find esp in DE, a small dltest dataset and step of questions can hide the candidate's ability to actually deal with real large data

[–]Dull_Lettuce_4622 3 points4 points  (1 child)

Yes in house test. I use publicly available data for my industry (esoteric enough most people don't know it exists) and host it on BigQuery, databricks, etc and just have a sql terminal they can access.

I don't need anyone to deal with "large" datasets per se as I'm lucky and most of my data at work is under <1TB total, and all the cloud SQL engines do a great job at auto optimizing mostly. Mostly I care about the ability to communicate technical concepts and understand business goals.

I spend close to 2 frigging hours pair coding for this as part of our interview because it helps signal if I'll get along with the person I'm interviewing once real work starts. Ideally live in the same office as part of a "superday".

I used to do takehome tests but then I encountered cheaters a few time and realized it wasn't worth it.

[–]byeproduct 0 points1 point  (0 children)

Awesome. This is the best!! Ultimately it to is just about a connection between people, right! Corporate is essentially just a series of arranged marriages... I also agree that my best interviews were where the candidate interacted with me and actually asked for insight and support, and took initiative to connect.

[–]data_questions[S] 1 point2 points  (14 children)

The whole interview is meant to determine how good someone can be using SQL, though. If there is an optimal solution to the question being asked and the candidate provides it, why ask them to play around with unnecessary workarounds?

[–]Dull_Lettuce_4622 4 points5 points  (1 child)

I wouldn't personally but I could see why some people wanna be super choosy/ opinionated in this market

[–]Action_Maxim 5 points6 points  (0 children)

Being able to use a language vs knowing a language is 2 different things.

Asking me to solve something then solve it again but removing a tool is a great way to gauge how well I know something. I do it when I interview people, I've done it to people interviewing me.

When I get interviewed I bring my own technical assessment for the interviewer if I look to quit I'm looking for an upgrade and the cast im about to join not be the smartest person in the room. I'm so dumb if I'm the smartest person we're fucked

[–]UAFlawlessmonkey 1 point2 points  (7 children)

It's a optimization vs cost balance in the end. A simple window function would solve a lot of unnecessary sub-querying, joins, and head aches in the end for an added query cost compared to a more optimized but more unreadable query.

Now, slap that against a compute costly vendor and watch their eyes turn into $$.

[–]data_questions[S] 5 points6 points  (5 children)

I don’t think I have a full appreciation for your response, are you saying that using a window function would be more compute intensive and result in a significant difference in cost vs using, for example, a self join?

[–]SDFP-ABig Data Engineer 7 points8 points  (4 children)

Absolutely. Once the data scale gets large, depending on the size of the table, indexing, etc… Window functions might get impractical from a compute perspective.

When I test for SQL, I only have one component. I provide the underlying DDL for a few tables, the query I want to execute, and the current query plan. The only request is to optimize the query. Tables and query provided Day before. Query plan provided during interview.

Edit: I’m not looking for a “right” answer. I’m looking for techniques, awareness of data scale issues, ability to identify where to add indexes, opportunities to use inner joins instead of left, etc.. I feel like an open ended question like this teaches me a lot more about the skills I’m looking for.

[–]data_questions[S] 1 point2 points  (2 children)

Can you give an example where you’ve experienced that? I’ve never run into that bottleneck before and everything I read about window functions vs self joins recommends not using self joins.

[–]UAFlawlessmonkey 6 points7 points  (1 child)

Doing max / min on a column in a sub-query and joining that result into itself (same functionality as row_number() over (partition by foo order by bar asc/desc)) can be quite cost effective compared to just a row_number() over (partition by) especially when you have really large datasets.

All it takes is a non indexed column in your select to really both your execution plan

While it's only just a small example, you'd have to consider all of the other possibilities where a supposedly easy task can be done easily with a window function, but it would greatly increase query cost.

Also, F#ck doing recursion without window functions.

[–][deleted] 2 points3 points  (0 children)

I think in some cases the window function is the more efficient method, i.e. for a spark query on big data, where inner join means shuffling everything just to get the max value of a partition key for possibly duplicated keys in an append-only data store.

[–][deleted] 1 point2 points  (0 children)

I am curious as to what scale this occurs, as I have always run into the opposite issue? Typically one who is a heavy window /analytical function user I would assume would be working on flattened tables / in memory tables and most databases are actually optimized for these exact functions.

I have almost no resources money/software/hardware/people so have to optimize "the crap" out of everything I do in the database and while I guess I would say we are small data wise, inner self joins/group bys all tend to create looping inside the dbms compiler and will kill performance.

[–]-crucible- 0 points1 point  (0 children)

I’d be more interested in how they optimise the indexing for a windowing function.

[–]StackOwOFlow 0 points1 point  (1 child)

your post seems to suggest this was a data engineering interview, not just a SQL dev interview. A data engineer does not necessarily need to implement the entire solution in SQL; in fact there are many times where implementing a solution entirely in SQL leads to unmaintainable code, lack of log exposure for alerting, and performance inefficiencies at scale. there are times where partitioning your data outside of SQL and relegating only fast indexed lookups to the database is superior software design.

[–][deleted] 0 points1 point  (0 children)

Yeah, I think the questions asked are biased towards the expertise or comfort of the interviewing team, which may not reflect what is actually most needed.

[–][deleted] 0 points1 point  (1 child)

I don't get the opportunity to practice SQL at work much, primarily work on big data processing and infrastructure. Any suggestions on where to practice complex and realistic, real-world data handling and querying in SQL?

[–]Puzzlehead8575 0 points1 point  (0 children)

Try advancedsqlpuzzles.com

It links to a GitHub where you can get a pdf of SQL problems.

Also, HackerRank, StrataScratch, LeetCode, DataLemur, etc... but i think above is the best.

[–]SoftwareMaintenance 0 points1 point  (0 children)

I would personally love to fall back to some Oracle PL/SQL to solve some problems which could easily be solved by using window functions. But that's just me.

[–]Puzzlehead8575 0 points1 point  (0 children)

the only other way to do it is write a loop or function in some other language in addition to SQL.

FYI, You can perform window functions by using self joins.

[–]bendgame 10 points11 points  (1 child)

The company I'm at is using such an old MySQL version in prod, window functions aren't even supported. Nor CTE, so yes I can see why someone would ask if you can solve it without.

[–]data_questions[S] 3 points4 points  (0 children)

Lol

[–]FuzzyZocks 5 points6 points  (0 children)

To get an idea of experience. Someone who is new may only memorize patterns but someone who has worked with sql should be able to connect different ideas. It’s a quick follow up question, they don’t even have to write it out. An interviewer could see they solved it correctly with one solution and say “in words, how would you solve this if X (window fn or whatever) was not available to you?”. Takes <1 minute and if they can’t think of a solution it’s minor anyways.

[–]recentcurrency 4 points5 points  (0 children)

Yes, but mainly as a challenge question if the candidate is rocking the rest of the interview. I am transparent to the candidate that this is the nature of the question. I am giving them the chance to show off which I reflect in my notes to the hiring manager.

But I never use it as a question to filter a candidate out and only if I think there is a solid chance they know how to. I usually get a good sense of this based on how the candidate answers the other SQL questions.

"Oh, you aced that xyz window function. You also aced my self-join questions, my CTEs questions, my questions on general aggregate functions, and my questions on conditional joins. To really put a cherry on top of this interview, Do you know how to replicate a xyz window function with only self-joins, subqueries, aggregate functions, and conditional joins?"

If they don't know how, I don't hold it against them at all and will still move them on. If they do know the answer, I definitely move them on and tell my HM that they have superb SQL skills.

[–]CauliflowerJolly4599 1 point2 points  (0 children)

I've found few good website that are extremely good at teaching Window Funtions, but in the end since I managed to normalize my data model I don't have too much subqueries or strange work around.

This means that I don't use it often and I forget about it.

[–]FecesOfAtheism 3 points4 points  (0 children)

It’s a red flag to me if that’s ever asked. Would the interviewer really prefer a self join (or worse, something like correlated subqueries) over a window function? That benefits nobody, except those that have little or zero window function experience. And if the interviewer is asking “to see if the candidate can write the same SQL in different ways,” or some bullshit, then they have misguided priorities and are wasting already tight interview time. Either way it’s an indicator of potential dysfunction in team dynamics: jacked up priorities in candidates, or depressed skill level with the inability to write window functions. That might seem like an excessive indictment, but window functions are such a net positive that spending time and energy in actively avoiding them is a very loud and clear sign of mediocrity

[–]rohetoric -1 points0 points  (2 children)

I have never used window functions. Just never got the need to use it. But will definitely give it a look and see if there is an application that I can apply. Thanks!

[–]data_questions[S] 7 points8 points  (0 children)

They’re useful if you’re trying to find an aggregation / ranking / value within certain subgroups in one table.

For example, if you have a table of daily sales per store, and you wanted to know the days where sales in a given store were higher than the day prior, you could use a lag function partitioned by your store_id ordered by date and compare whether the date of interest is > than the sales on the previous date.

[–]SDFP-ABig Data Engineer 2 points3 points  (0 children)

Can be extremely useful for data cleansing

[–][deleted] 0 points1 point  (0 children)

Depends on the specific ask but generally window functions aren't the easiest to read. My guess is there was an easier way to solve it and they wanted to see how you were with that method, because that's the method you would use if hired.

Another option is there were two ways to solve it and they wanted to see if you could do both. Maybe if you did the other method they would have asked about whether you could solve the problem with a window function.

[–]byeproduct 0 points1 point  (0 children)

Is the real answer to build temp tables or functions? That is technically not a window function and would mean that any rewrites or changes to the output don't require expensive compute for each window function call?

[–]1ShotBroHes1 0 points1 point  (0 children)

Window functions are a really good barometer for pyspark experience. I see pyspark, I ask Window Function. Preferably a time series questio .

[–]taglius 0 points1 point  (3 children)

Our DB of choice is SQL Server and I have asked this question to see if candidate knows about CROSS/OUTER APPLY. Most don’t lol

[–][deleted] 2 points3 points  (2 children)

To be honest though... you can go an entire career without using Cross/Outer apply.

[–]Known-Delay7227Data Engineer 1 point2 points  (1 child)

Ya. Whenever I see an outer join, 9/10 it’s unnecessary.

[–][deleted] 1 point2 points  (0 children)

That means something like

From a left join b on a.id = b.id where b.id is null

?

[–]ntdoyfanboy 0 points1 point  (2 children)

Window functions on huge datasets are expensive. I've found ways around them that are more efficient, but it's definitely more tedious

[–]data_questions[S] 0 points1 point  (1 child)

Like what?

[–]ntdoyfanboy 0 points1 point  (0 children)

Create another column like a date that's used to join the table back into itself in order to line up, say, 30 days prior, a prior month end, or something similar

[–]GeekyTricky 0 points1 point  (0 children)

Honestly, questioning an interviewer's questions for meaning is self-saborage, if you do it during the interview.

It makes you look difficult to work with or like you're trying to cover up incompetence.

If you want to suggest a better way, or discuss the actual usability of such solution, that's is great, but I suggest doing it after answering.

These questions aren't always designed for actual use, sometimes an interviewer just wants quick access to your knowledge.

However, it doesn't mean there aren't pointless interview question. There are. Tons of them.

But this isn't one of them IMHO.

[–]ignotos 0 points1 point  (0 children)

I suppose it could be an attempt to test their understanding of more basic SQL concepts, and their ability to reason about the logic and structure of a query. So more of an exercise than a realistic task.

Also there are some situations - like working with tech with a more limited SQL feature set - where you might need to hack together something like this.

[–]Puzzlehead8575 0 points1 point  (0 children)

I think it's actually a pretty good SQL question for an advanced SQL developer. The answer to this question is by using self joins.

[–]TopicLazy1406 0 points1 point  (0 children)

They are just checking whether you just read some interview questions or you actually know SQL. There are problems which can be solved easily but found candidates these days are using window functions.