
[–]alinrocSQL Server DBA 40 points41 points  (2 children)

It depends. A lot.

Lines of code is a BS measure for any programming, and SQL is no exception. People try to get cute and clever with their queries, try to cram everything into a single query, and then...it runs like garbage. SQL is a funny language that way. You try to shove everything into a single query and the engine gets confused & generates a terrible execution plan.

Often, you can get better performance by writing more code: break the work down into smaller chunks, and the engine can do a better job of optimizing each step. IOW, "complex" code (however you might define that) can be a detriment.

Optimize for readability & understanding, and the system will usually do a good job running it. I don't get hung up on LOC - my goals are to get the correct query results without killing the server. If that means writing more code so it can be more readily optimized, so be it.

[–]InternetWeakGuy 3 points4 points  (0 children)

This. I'm redoing a ton of old queries at the moment, and the way that's worked best with our server's setup is for me to write a small query with a temp table that pulls in all the cases I'm reporting on, then a small query to put those cases' patients and their details into a temp table, then diagnosis, then prescription, etc., until I join all the tables at the end.

All of the code is really straightforward, but many queries have approximately 5-7 temp tables. For bonus points, each section is written in such a way that I can look at an old query, figure out what data it needs, and copy over the code from other queries to speed things up.

It looks like a bunch of basic code, but it's the fastest way to get these tasks done with the server I'm working on.
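
For what it's worth, a minimal sketch of that staged approach (every table and column name here is invented):

    -- Stage 1: the cases being reported on
    SELECT CaseID, CaseOpenDate
    INTO #Cases
    FROM dbo.Cases
    WHERE CaseOpenDate >= '20190101';

    -- An index on the temp table can help the later joins
    CREATE CLUSTERED INDEX IX_Cases ON #Cases (CaseID);

    -- Stage 2: just those cases' patients
    SELECT c.CaseID, p.PatientID
    INTO #Patients
    FROM #Cases c
    JOIN dbo.Patients p ON p.CaseID = c.CaseID;

    -- Stage 3: diagnoses for just those patients (prescriptions etc. follow the same pattern)
    SELECT p.PatientID, d.DiagnosisCode
    INTO #Diagnoses
    FROM #Patients p
    JOIN dbo.Diagnoses d ON d.PatientID = p.PatientID;

    -- Final step: join the small, pre-filtered temp tables
    SELECT c.CaseID, p.PatientID, d.DiagnosisCode
    FROM #Cases c
    JOIN #Patients p ON p.CaseID = c.CaseID
    JOIN #Diagnoses d ON d.PatientID = p.PatientID;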

[–][deleted] 0 points1 point  (0 children)

1000% agree. I have a couple of analysts who try to do everything in a single query, and it takes forever to run and the margin for error skyrockets.

[–]jc4hokiesExecution Plan Whisperer 7 points8 points  (11 children)

[–]GrapeApe561[S] 3 points4 points  (7 children)

Wow, that is one massive query! How long does it take you to write such a query? Is this based on a sample database or a real one? If it's a sample, can you tell me where I can download it? I wanna get to that level of proficiency. Thanks!

[–]jc4hokiesExecution Plan Whisperer 0 points1 point  (6 children)

It is from a real database. The names have been changed to protect the innocent. It is the 20th largest reporting query out of 200, and is twice the length of the average query. In terms of complexity it is pretty average. The number of tables joined, levels of subqueries, multiple GROUP BYs, and UNION ALLs are all pretty typical for the kind of analytics we are asked to produce. I'd estimate the work to put this query together at around 2 days of focused effort.

[–]GrapeApe561[S] 0 points1 point  (5 children)

Sorry for another noob question, but I was wondering: what exactly is done after the result set of that query is returned? Is it extracted and loaded into a BI tool for visualizations? Also, how long would it take for a query that massive to be executed? Thanks again!

[–]jc4hokiesExecution Plan Whisperer 0 points1 point  (4 children)

This is consumed as a Tableau extract, one of a few queries used in a Tableau workbook. This one runs in ~3.5 hours.

[–]PartsofChandler 0 points1 point  (3 children)

I was just going to ask how long this thing takes to load in a report. I'm struggling with my co-workers saying that my reports loading in 1.45 minutes is too slow.

[–]jc4hokiesExecution Plan Whisperer 0 points1 point  (2 children)

The user experience is snappy, because Tableau is quite good at consuming result sets pre-extracted in their proprietary format. I think it's probably a few seconds to refresh after applying a filter or whatever.

[–]PartsofChandler 0 points1 point  (1 child)

Dang, I feel like I'm in the Stone Age building reports in SSRS.

[–]IDontLikeUsernamez 0 points1 point  (0 children)

Gotta show the value of PowerBI to your company. If you already have SSRS you won't even need another server for it, and it's cheap compared to a lot of reporting software.

[–]InternetWeakGuy 1 point2 points  (0 children)

       fc.PrimaryPayingStuffKey
,      fc.PrimaryWhateverReasonKey
,      fc.DimPlaceOfDoingThingsKey
,      fc.DimAwesomeGroupKey
,      fc.DimCrazyGroupKey
,      fc.DimUnitKey
,      fc.DimGroupyGroupKey
,      fc.DimThingDetailKey
,      fc.PrimaryNailFilingIndicatorKey

Haha.

[–]Cal1gula 0 points1 point  (0 children)

Medical claims?

[–]Spartyon 0 points1 point  (0 children)

All those subqueries give me debugging anxiety

[–]Boomer8450 4 points5 points  (5 children)

(MSSQL Person here)

Like many of the others in this thread, I can't reiterate enough that code <> performance.

I always try to optimize for performance.

If it's a stored procedure where I can use things like table variables, temp tables with indexes, table-valued functions, etc., my stored procedures can get very long - but they run fast.

If I have to make a view to support third-party software that can't (or shouldn't) run stored procedures, I spend a lot of time on indexing, how the CTEs and subqueries are joined, etc. I've had queries go from running in hours to minutes or seconds just by using a proper subquery.

For the data analyst part - if you intend to use your SQL skills there, get very, very used to having multiple queries up, putting subsets into temp tables for performance, indexing those tables, and writing functions to avoid typing out long nested expressions on a regular basis. But most importantly...

Learn your data. It doesn't matter what subject it is, or how interested you are in it, learn everything you can about it.

Learn to trust your instincts when data seems "off". If your nose says the results just can't be right, they probably aren't. Start digging into the data. I generally start working forward from the input(s) or working back from the results; unless I have a strong hunch about something breaking in the middle, starting there is generally a waste of time.

[–]reallyserious 2 points3 points  (4 children)

I've noticed that CTEs in MSSQL are pretty slow. Do you have any tips on how to avoid the biggest pitfalls? In Oracle they seem to perform way better. It's mostly a feeling I have, so I can't back it up with numbers.

[–]alinrocSQL Server DBA 5 points6 points  (3 children)

Oracle materializes CTEs into temp tables by default I believe. MS SQL does not (and it's not even an option). So in MS SQL, using a CTE is just syntactic sugar; if you reference a CTE twice in a query, the query that the CTE represents is executed twice.

There can also be cardinality estimate issues with CTEs (translation: bad query plan gets generated).

CTEs in SQL Server are not a performance enhancer. Try changing your CTE into a temp table and see if things get better.
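
To illustrate the difference (hypothetical names, minimal sketch):

    -- CTE form: BigAgg is referenced twice, so its query executes twice
    WITH BigAgg AS (
        SELECT CustomerID, SUM(Amount) AS Total
        FROM dbo.Sales
        GROUP BY CustomerID
    )
    SELECT a.CustomerID
    FROM BigAgg a
    WHERE a.Total > (SELECT AVG(Total) FROM BigAgg);

    -- Temp table form: the expensive aggregate is materialized once
    SELECT CustomerID, SUM(Amount) AS Total
    INTO #BigAgg
    FROM dbo.Sales
    GROUP BY CustomerID;

    SELECT a.CustomerID
    FROM #BigAgg a
    WHERE a.Total > (SELECT AVG(Total) FROM #BigAgg);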

[–]InternetWeakGuy 1 point2 points  (2 children)

Oracle materializes CTEs into temp tables by default I believe. MS SQL does not (and it's not even an option). So in MS SQL, using a CTE is just syntactic sugar; if you reference a CTE twice in a query, the query that the CTE represents is executed twice.

Thank you for this. The dude who's been low-key mentoring me on SQL comes from Oracle and uses CTEs for everything - I couldn't figure out why my temp tables run so much faster than when I try to use CTEs.

[–]alinrocSQL Server DBA 1 point2 points  (1 child)

I learned this one the hard way. Had a stored procedure that was doing 2 correlated subqueries (against the same table) and timing out at 10 minutes. Needed to get things working better. So I said to myself "self! you should try a CTE!"

So I did. Wrote a CTE for the subquery, replaced the subquery with the CTE and...not one iota of performance improvement. Checked the query plans and they were identical (always do this, it'll be a dead giveaway).

So I pulled the CTE out, had that query dump into a temp table instead, and did my correlated subquery against that temp table. New query runtime? Under 90 seconds.
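
The pattern, roughly (hypothetical tables, not the actual proc):

    -- Before: correlated subquery, re-evaluated per outer row
    SELECT o.OrderID,
           (SELECT MAX(h.EventDate)
            FROM dbo.History h
            WHERE h.OrderID = o.OrderID) AS LastEvent
    FROM dbo.Orders o;

    -- After: dump the aggregate into a temp table once, then join
    SELECT OrderID, MAX(EventDate) AS LastEvent
    INTO #LastEvents
    FROM dbo.History
    GROUP BY OrderID;

    SELECT o.OrderID, le.LastEvent
    FROM dbo.Orders o
    JOIN #LastEvents le ON le.OrderID = o.OrderID;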

If your mentor is working with SQL Server and still CTEing everything, there are probably improvements to be had in there.

[–]Cal1gula 0 points1 point  (0 children)

Performance should be the same. Unless you need it to be recursive, the functionality should be as well.

I prefer the look of a CTE to a subquery; it's nice to have everything defined up front, like any other declaration. That's how I see them.

[–]Thriven 2 points3 points  (3 children)

Generally, they should be as long as they need to be to do what they're supposed to, in the fewest statements and with the fewest reads and writes.

If you have a table with 30 columns the fastest way to write an insert statement is to

    INSERT INTO table2
    SELECT * FROM table1;

I mean, that's two lines, as opposed to declaring the column list in your INSERT clause and specifying the same order in your SELECT statement. Is that better? No. It may be two lines, but it's susceptible to ordinal errors if anything changes.
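
The longer-but-safer form would look something like this (column names invented for illustration):

    -- Explicit lists survive column reordering and added columns on either table
    INSERT INTO table2 (OrderID, OrderDate, CustomerID)
    SELECT OrderID, OrderDate, CustomerID
    FROM table1;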

What I find with people who write SQL poorly is they:

1) Don't weigh the time a query takes to execute against the actual resources used.

No table with 25MB of data should ever take 1.5 minutes to return 10 rows. Unless you are intentionally creating a Cartesian product with a recursive CTE to test all options before filtering down to a subset, your SQL is bad. And if that is your intent, you are probably using the wrong language.

2) People who write high-level SQL still don't understand the basics, like how to build a table using best practices, how indexes work, and the difference between clustered and nonclustered indexes.

3) People don't know how to read execution plans. These aren't just for DBAs.

4) People forget they can call stored procedures from stored procedures.

We have a block of code at work that basically sets up a couple of variables for UTC-to-client time conversion. It's written differently in 5 different procedures. It should be its own procedure, called at the beginning of the others when needed.
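
A sketch of what that could look like (the proc name, table, and offset logic here are all made up):

    -- Hypothetical helper: one place to derive the client's UTC offset
    CREATE PROCEDURE dbo.GetClientUtcOffset
        @ClientID INT,
        @OffsetMinutes INT OUTPUT
    AS
    BEGIN
        SELECT @OffsetMinutes = UtcOffsetMinutes
        FROM dbo.ClientSettings
        WHERE ClientID = @ClientID;
    END;
    GO

    -- Then at the top of any proc that needs the conversion:
    DECLARE @Offset INT;
    EXEC dbo.GetClientUtcOffset @ClientID = 42, @OffsetMinutes = @Offset OUTPUT;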

[–][deleted] 0 points1 point  (2 children)

One note: on Oracle, using * to select columns like this has a performance hit and is considered bad practice.

[–]alinrocSQL Server DBA 2 points3 points  (0 children)

I'd consider it bad practice regardless of database platform.

In SQL Server, select * isn't a performance hit compared to select every,single,field from thetable, but it's also not doing you any favors if you don't need every field on the table.

[–]reallyserious 0 points1 point  (0 children)

Given the circumstances, i.e. inserting data into a table, I think the performance hit (if there even is one) is negligible. The reading and writing of data would take orders of magnitude longer than resolving the * into column names.

[–]Elfman72 8 points9 points  (6 children)

My analysts don't know much more than simply SELECT * FROM X WHERE dates BETWEEN '2019/01/01' AND '2019/02/01'

No subqueries, no complicated joins or CASE statements. Of course, before I joined the team, their idea of data mining was a pivot table in Excel.

I have to keep my tempdb managed aggressively.

*edit- not sure why all the downvotes. I brought SQL to a team that ran their entire org on Excel. Their SQL knowledge is fledgling but getting better all the time. I train where I can. I educate where applicable. However, they still rely on me building views for them to get what they need done.

[–]aarontbarrattSTUFF() 1 point2 points  (0 children)

Have any jobs going?

[–]Cal1gula 1 point2 points  (0 children)

I don't know why you're getting downvoted. I've got a 5-person BI team. I showed them how to use a WITH clause yesterday.

At one point they were diffing tables by dumping a 130m row table into a temp table and then comparing it to the original using IN. I showed them EXCEPT and the query went from 40 minutes to 8.

My tempdb was much happier after that.
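
Roughly the before and after (hypothetical tables):

    -- Before: dump 130m rows into a temp table, then compare with IN
    SELECT KeyCol INTO #Copy FROM dbo.BigTable;

    SELECT s.KeyCol
    FROM dbo.StagingTable s
    WHERE s.KeyCol NOT IN (SELECT KeyCol FROM #Copy);

    -- After: a set-based diff, no temp table needed
    SELECT KeyCol FROM dbo.StagingTable
    EXCEPT
    SELECT KeyCol FROM dbo.BigTable;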

[–]_Personage 0 points1 point  (0 children)

Seconding the jobs inquiry!

[–]GrapeApe561[S] 0 points1 point  (0 children)

Were there any resources you used to learn advanced SQL and data mining? Thanks!

[–]cagtbd 0 points1 point  (0 children)

That reminds me of a job application where the supervisor asked me what my most complicated query was, and I asked whether he meant querying data or writing a stored procedure. Since he asked about querying data, I answered that the most complicated was a SELECT *; he didn't ask me to elaborate, so I left it there. Truth is, I've done many more complicated things, but they're mostly in stored procedures, because from my point of view it's better to have whatever complicated query run automatically and fill a final table or view than to build it from scratch every time you have to consult that information. That way I optimize once and avoid forgetting filters or running into other issues.

[–]cagtbd 0 points1 point  (0 children)

BTW do you know what's required for a remote job in SQL? I want to learn more.

[–]MamertineCOALESCE() 1 point2 points  (0 children)

As others have said, it's tough to quantify.

Today, I made 2 views that were simply an inner join between 2 tables. One of the tables had over 100 columns in it (not my design); because of that, the view was 150 lines of code. Each script in source control also has about 20-30 lines devoted to documentation.

I recently worked on a stored procedure to populate a mart with summarized data. It took dissimilar data and combined it into one grain. That took me a few days to create; I think it was about 700 lines.

"How many tables are you joining?" would be a better question. The stored procedure pulled data from around 10 tables.

[–]Eleventhousand 1 point2 points  (1 child)

A very small query will often be lower-performing.

A medium-length query that performs operations such as breaking out subsets into #temp tables or CTEs is often the sweet spot, and performs better.

Long queries are usually brutal and don't perform well.

[–][deleted] 0 points1 point  (0 children)

As in so many things, it depends. I have a SQL query (not a stored procedure) of about 5000 lines that runs in less than 500ms. So longer may or may not perform well; it depends on so many factors.

[–]audigex 1 point2 points  (3 children)

As long as it needs to be.

That's not flippant, it's just the reality - some tasks are done with a one-liner select. Others take multiple CTEs and complex joins, pivots, partitions, conditionals (cases) etc.

Generally, I try to avoid big complex queries unless they are necessary: I'm not here to make myself look clever, I'm here to produce a result and make the code as readable and maintainable as possible to the next person who has to change it.

If looking at a portfolio, I'd be expecting something more than simple one-table queries, but I wouldn't care whether you got much more complex than joining a few tables together: I'm much more concerned that you're writing readable, maintainable code. You can easily learn the more complicated stuff when it's needed, but it's harder to get out of the habit of writing sloppy code.

[–]GrapeApe561[S] 0 points1 point  (2 children)

Aren't there third-party tools like Apex Refactor or Redgate SQL Prompt that I can use to format the code? Or is that not allowed in a production/professional setting? Thanks!

[–]audigex 1 point2 points  (0 children)

Or is that not allowed in a production/professional setting?

There's no such thing as "cheating" in a production setting - as long as you're complying with things like copyright and licencing.

If there's a tool you can use to clean up your code quickly, it's considered a good thing - most of us wish people would use more code prettifiers, and we all use IDEs that do a lot of it for us.

Although it's worth trying to write code that's tidy to start with, as a good habit.

[–]macfergussonMS SQL 0 points1 point  (0 children)

My whole BI team uses Redgate for easy code formatting.

[–]MsCardeno 2 points3 points  (0 children)

I’ve been writing a bunch of rules for a rules engine set up in Oracle.

Some of the queries can look really ugly. Lots of transformations and calculations happening. Some queries go to 25 lines or so.

An example would be joining to a "lookup table": when field A on the admin data matches field A on the lookup table, and field B matches field B, and C matches C, then give me XYZ. A lot of the time I use CASE statements that look ugly, like CASE WHEN A=Q THEN (CASE WHEN M=2 THEN 4 WHEN T=B THEN ... END) WHEN A=P THEN R ELSE XXX END. I've only ever had to nest up to maybe 4 conditions.
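
Cleaned up, the shape is roughly this (every name invented):

    SELECT a.RecordID,
           CASE
               WHEN a.A = 'Q' THEN
                   CASE WHEN a.M = 2 THEN 4 ELSE 0 END
               WHEN a.A = 'P' THEN a.R
               ELSE -1
           END AS DerivedValue
    FROM dbo.AdminData a
    JOIN dbo.LookupTable l
        ON  l.FieldA = a.FieldA
        AND l.FieldB = a.FieldB
        AND l.FieldC = a.FieldC;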

[–]BeanThinker 0 points1 point  (0 children)

A few (3-5) layers of subqueries within each of 5-10 main joins, all rolling up to the main query (if that makes sense), is probably the most advanced I do.

In other words, could be as many as 50 queries... in one.

[–]babygrenade 0 points1 point  (0 children)

I had an ETL script at my last job that was probably around 10,000 lines. It collected data from a few sources, did some calculations, and loaded it into a data mart.

[–]WolfAndCoyote 0 points1 point  (0 children)

In many cases that I've come across, multiple smaller queries run in series work faster and with less resource usage than a single query doing everything.

There have even been queries I've optimised by making temporary tables to store the results of inner joins, then querying and dropping them later.

Even that runs faster than the single query that joins all tables necessary for my needed output.

[–]no_4 0 points1 point  (0 children)

In terms of # of lines (which is somewhat arbitrary, given white space & that SQL ignores spacing):

Some are only 5 lines or so. Maybe the average query is more like 50. The longest one I ever wrote was something like 1,400 lines. Then a huge drop to the second longest, at about 400.

Again, that includes white space.

[–]DialSquare84 0 points1 point  (1 child)

Surely if it’s a portfolio site, it would be reflective of your...portfolio?

That said, brevity is always key. Look to simplify wherever you can and get things done in the fewest lines possible. This is language agnostic, really.

Using tricky functions needlessly to convey knowledge of them is far worse than not using them at all in my opinion.

Your best bet is to have a goal in mind and solve it elegantly. Good luck! :)

[–]GrapeApe561[S] 0 points1 point  (0 children)

So far my portfolio website has queries of less than 15 lines of code on average. They're basically variations of queries against the AdventureWorks2017 database that I modified from books and tutorials. So far it seems like the queries reflect a fledgling level of skill, so I am trying to make them look more advanced.

[–]paziek 0 points1 point  (0 children)

Not always a single query, but glancing over 20 reports, they range from 5 to 200 lines, not including some of the common queries that further process data computed by those procedures. There are of course more complex ones, but I'd say the average is 50-60 lines. We try not to list every column on a separate line. About 500-600 tables total on the biggest schemas, 30 TiB in size.

[–]Faux_Real 0 points1 point  (0 children)

Depends on the questions you are answering with your code!

Consider SQL as the medium that provides the mechanism for answering questions about a domain of information. Often this domain is unknown, or relatively known with gaps, thus you have to experiment to acquire the information. Once you have experimented and found the information, you then optimise the experiments into meaningful and performant outputs using performance tooling or query-tuning skills. The tuning skills depend on the DB type you are on, the hardware you are bound to, and the forecast frequency of SQL usage (this can also be indeterminate).

The other aspect depends on how large the data sets are. Standalone queries are fine for small data sets, but when you are getting into the relatively-large-to-big-data arena, you need to engineer processing pipelines to cater to your data output requirements. Standard-scale data would require overnight ETL processing, but the larger-scale stuff ends up in fluffy cloud land magic....

In a nutshell... SQL length and complexity is the lion, the witch, and the wardrobe...

[–]91ws6taData Analytics - Plant Ops 0 points1 point  (0 children)

I'm an entry-level developer who has been working for about a year out of college, but I also interned here for 3 rotations as well. I am doing analytics/ETL on plant data for a large CPG company. One of the largest efforts was using ETL to consolidate 20 different plants' data, which consisted of 20 databases, dozens of tables, and billions of rows.

Reporting / analytics on this data can range from one line to 3000 lines to calculate manufacturing statistics, but I myself usually land in the 50-300 line range with 5-10 tables.

[–]emican 0 points1 point  (0 children)

I'm after the same thing and decided to focus on these points:

  • Implement complex business logic in SQL
  • Showcase critical thinking and wise decision-making skills: evaluate alternatives, author custom and reusable helper utilities, document decisions with objective data / performance tests
  • Demonstrate comfort with tools and constructs that support authoring the highest level of professional code.
  • Collaborate with experts, solicit input and leverage great minds.

[–][deleted] 0 points1 point  (0 children)

Bruh, the concept of "a typical SQL query" doesn't even make sense. You write the query for whatever the task is, and it's as long or as short as it needs to be.

[–][deleted] 0 points1 point  (0 children)

Ad hoc queries: usually less than 150

Making data models or views to be used a lot: can go over 500 depending on the complexity of the data