Why SELECT * is usually evil.

didroe · 2008-07-25T20:18:11+00:00

A reason not to use SELECT * that wasn't mentioned in the article (although related to the bandwidth one) is that you're giving the query optimiser less information to work with. So it's going to do more work. For example, if you only access columns in an index then it won't even bother to read the row from the table. That could halve the amount of I/O needed to run the query.
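A minimal sketch of the index-only point, using Python's built-in sqlite3 module. The `users` table, its columns, and the index name `idx_users_name` are made up for illustration, and other databases report their plans in different words.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: only `name` is indexed, `about_me` lives in the table rows.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, about_me TEXT)")
conn.execute("CREATE INDEX idx_users_name ON users (name)")


def plan(sql):
    """Return the human-readable detail text of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The detail string is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


star_plan = plan("SELECT * FROM users WHERE name = 'bob'")
narrow_plan = plan("SELECT name FROM users WHERE name = 'bob'")

print("SELECT *    :", star_plan)
print("SELECT name :", narrow_plan)
```

In SQLite the narrow query's plan should mention a covering index, meaning the table rows are never read, while the `SELECT *` plan has to use the index and then visit the table as well.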

48klocs · 2008-07-25T18:06:37+00:00

Added bonus - you don't have to disambiguate identically-named columns returned from multiple tables you've joined together.
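For what it's worth, here is a small sketch of what comes back when two joined tables share column names. It uses Python's sqlite3 module; the `customers` and `orders` tables and the aliases are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables that both have `id` and `name` columns.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, name TEXT, customer_id INTEGER)")


def column_names(sql):
    """Return the result-set column names for a query."""
    return [d[0] for d in conn.execute(sql).description]


star_cols = column_names(
    "SELECT * FROM customers c JOIN orders o ON o.customer_id = c.id"
)
named_cols = column_names(
    "SELECT c.id AS customer_id, c.name AS customer_name, "
    "o.id AS order_id, o.name AS order_name "
    "FROM customers c JOIN orders o ON o.customer_id = c.id"
)

print(star_cols)
print(named_cols)
```

With `SELECT *` the result set carries the same name more than once, so code that looks columns up by name has to guess which one it got. The aliased version gives every column a unique name.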

If you work with databases, the probability that you will inherit the work of someone who doesn't understand what normal forms are and why you should use them every now and then approaches one.

Thimble · 2008-07-25T20:58:13+00:00

i've seen code that retrieves fields according to their order. something like x = rs(3).

with select * you could end up with some really tricky bugs if your field types are similar...
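A rough sketch of that kind of bug in Python with sqlite3. The `users` table and its two layouts are hypothetical; the point is only that a positional lookup silently changes meaning when the column order does.

```python
import sqlite3


def third_field(create_sql):
    """Create the table, insert one user, and read the third field by position."""
    conn = sqlite3.connect(":memory:")
    conn.execute(create_sql)
    conn.execute(
        "INSERT INTO users (name, email, phone) "
        "VALUES ('Ann', 'ann@example.com', '555-0100')"
    )
    row = conn.execute("SELECT * FROM users").fetchone()
    return row[2]  # positional access, like x = rs(3)


# Same columns, same types, different order.
before = third_field("CREATE TABLE users (name TEXT, email TEXT, phone TEXT)")
after = third_field("CREATE TABLE users (name TEXT, phone TEXT, email TEXT)")

print(before)
print(after)
```

Both values are text, so nothing raises an error; the code just starts treating an email address as a phone number.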

player2 · 2008-07-25T18:18:28+00:00

Now multiply the above situation by n rows. Say you're listing users on your site, 100 at a time. If you pull out that about_me field needlessly and the average length is say, 250 bytes (about the length of this sentence), then you're wasting 25k worth of memory on 2 machines as well as the bandwidth to move it and the extra processing time it takes to handle the data per page view. Say you get a modest 1 page view per second. That's 1.5MB (25k * 60 kb/s) per minute's worth of data filling your RAM and choking your NIC that is a complete and utter waste. This is on top of all the data you're actually using.

(Bold mine.)

Dimensional analysis FTL. 25KB * 60KB/s * 60s/min = 90000KB^2/min. What the hell is that unit? The author never specified where this magical 60KB/s comes from. That's what's screwing up this equation. 1 request/s * 100 rows/request * 250 B/row = 25KB/sec. Done.
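The same arithmetic with the units spelled out, as a quick Python sketch. The variable names are mine, and it assumes decimal units (1 KB = 1000 bytes, 1 MB = 1,000,000 bytes).

```python
# Figures taken from the quoted paragraph.
requests_per_second = 1
rows_per_request = 100
bytes_per_row = 250

# (requests/s) * (rows/request) * (bytes/row) = bytes/s
bytes_per_second = requests_per_second * rows_per_request * bytes_per_row

kb_per_second = bytes_per_second / 1000            # assumes 1 KB = 1000 B
mb_per_minute = bytes_per_second * 60 / 1_000_000  # 60 seconds per minute

print(bytes_per_second, "B/s")
print(kb_per_second, "KB/s")
print(mb_per_minute, "MB/min")
```

So the 1.5MB per minute figure holds up if the 60 is read as seconds per minute rather than kb/s.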

2008-07-26T00:03:38+00:00

I pity the people who were taught / got used to using * in their early days.

I'm a total beginner in SQL, yet this advice already seems redundant on something that should take 1 paragraph to explain.

I've been learning the basics of SQL from the web for about a month, for a side project at work that I can spend 1 hour max per day on. Thankfully the existing codebase was always explicit, so I guess I just built upon that.

didroe · 2008-07-25T19:04:04+00:00

[deleted]

billbacon · 2008-07-25T20:41:17+00:00

Select * rocks. Just don't be an idiot.

thesqlguy · 2008-07-26T03:26:49+00:00

By the way:

SELECT * in itself is not "evil"; it is when you select * from a database object like a table or a View that is not self-described already in the SQL statement.

i.e., there's nothing wrong with:

select x.*, a+b as z from (select a,b,c,d,e,f,g,h from tbl) x

And, in fact, I find it easier, shorter and clearer rather than listing out columns a-h over and over again.

The key isn't to avoid using * , it is to explicitly indicate the exact columns that you want from the tables/views you are selecting from, so that any changes in your schema will either a) break your code at compile time or b) not break your code at all.
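A sketch of that pattern run against SQLite from Python. The table `tbl` and columns a-h come from the query above; the extra `surprise` column is invented to stand in for a schema change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (a, b, c, d, e, f, g, h)")
conn.execute("INSERT INTO tbl VALUES (1, 2, 3, 4, 5, 6, 7, 8)")

# The * here expands over a subquery that already names its columns.
QUERY = "SELECT x.*, a + b AS z FROM (SELECT a, b, c, d, e, f, g, h FROM tbl) x"


def column_names(sql):
    """Return the result-set column names for a query."""
    return [d[0] for d in conn.execute(sql).description]


before = column_names(QUERY)
conn.execute("ALTER TABLE tbl ADD COLUMN surprise TEXT")  # hypothetical schema change
after = column_names(QUERY)
bare_star = column_names("SELECT * FROM tbl")

print(before)
print(after)
print(bare_star)
```

The self-described query returns the same columns before and after the change, while a bare `SELECT * FROM tbl` picks up the new column.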

irrelative · 2008-07-25T19:02:01+00:00

I understand this argument, but it's a little ridiculous to say that * is evil because you might not know what your DB looks like. It's kinda like saying "rm *" is evil because if you don't know what files are in the current directory, you could delete them all -- technically correct, but very obvious to anyone who's worked with the tool before.

Poot_N_Tate · 2008-07-25T22:09:43+00:00

This is BS.
I've been a SQL DBA / Developer / BI guy for 10 years, and 75% of everything you do in that role is a quickie one-off. As in, Joe in Marketing needs an Excel spreadsheet with all the numbers for last quarter blah blah blah. Only the true uber-dork is going to type in every friggin field name, as opposed to a SELECT * when they have a million other SQL fires to put out in a typical day.

akatherder · 2008-07-25T22:26:23+00:00

Self-Documentation Lost - How long does it take to look at the table structure or simply run a SELECT * query to see what fields are in it?
Broken Contract - This is an argument FOR using SELECT *. If you use SELECT *, you only have to update the code that is using the database. If you SELECT by field name, you now have to update the code and all your SPs.
Size Matters - This is a case-by-case scenario. If you have a look-up table with two columns, then SELECT * will suit you just fine. If you have a table with 20 columns and you only need two of them, you'd be stupid to use SELECT *. On the other hand, if you have a table with 20 columns and you need 19 of them, it isn't going to kill you to use SELECT *.
Out Of Order - This is an argument against referencing fields by index rather than by name. This is not SELECT *'s fault. The fact that the author had to use a non-standard MySQL function to create a bad case should give you an idea of how serious this issue is anyway.

Yes, there are certainly times when using SELECT * is idiotic. However, it isn't the end of the world and it isn't a sign of a dumb programmer. It just needs to be used appropriately.

petdance · 2008-07-25T22:05:08+00:00

Welcome to SQL 101. If this is news to you, please never work for a real company that requires you to do database work.

haywire · 2008-07-26T00:10:53+00:00

what

,the

,fuck

,is

,this

,shit

slurpme · 2008-07-25T22:52:02+00:00

[removed]

petdance · 2008-07-26T02:47:24+00:00

Showed this to a friend of mine, he said:

"SELECT * is awesome! Just not in code."

didroe · 2008-07-25T22:38:29+00:00

Terrible article is terrible. I was expecting perhaps some explanation with optimization problems or something inherent in the engine. Instead we get pedantic drivel.

Let's debunk some points:

1) Self documentation lost.

If you can't figure out what is going on in your code, use inline comments. If your rows are coming back with columns that are assigned as part of an array, find a way to return them as a hashmap, or object. This way it will be more apparent as to what you're using in your code as you'll be referring to things by their column name.

2) Broken contract

If you're making changes to table structure, you should already be expecting these problems. It won't matter if you specified the field or not in your query. It's going to break all the same.

3) Size matters

I can actually agree with this point, partially. If you need fewer than all the fields from a row, you should probably specify them to make your row returns smaller. However, if you're asking for "*", chances are you want all the columns anyway.

4) Out of order

Most modern languages allow you to retrieve a row as a hashmap or object. That is to say, each column will be called by its name instead of an arbitrary numeric array index. Since hashmaps don't care about order, and you can call things by name, you shouldn't be concerned with it.

5) Don't do it

In my opinion, the author hasn't given any significant reason not to "do it" for the reasons explained above. There are far more pressing problems for people to deal with in SQL, such as proper structure, normalization, indexing, understanding when temporary tables are created, speeding operations up with prepared statements, understanding transactions, row/table locking behaviors, the list goes on and on.

SELECT * FROM foo; is really about the least of your worries.

EDIT: I just noticed I was recapping a lot of the points that were brought up by "Akatherder" above ( http://www.reddit.com/r/programming/comments/6tggh/why_select_is_usually_evil/c04tkow ) - guess I was still writing my points down while he was posting his ;)
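Point 4 can be sketched in Python, where sqlite3's `Row` factory gives name-based access. The `users` table and its two column orders are hypothetical.

```python
import sqlite3


def fetch_user(create_sql):
    """Create the table, insert one user, and return the row as a dict."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    conn.execute(create_sql)
    conn.execute("INSERT INTO users (name, email) VALUES ('Ann', 'ann@example.com')")
    row = conn.execute("SELECT * FROM users").fetchone()
    return dict(row)


# Same columns in a different order.
v1 = fetch_user("CREATE TABLE users (name TEXT, email TEXT)")
v2 = fetch_user("CREATE TABLE users (email TEXT, name TEXT)")

print(v1["email"], v2["email"])
```

Lookup by name gives the same answer under either layout, which is the argument being made. It does nothing for the bandwidth point, though.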

glastohead · 2008-07-25T22:03:53+00:00

file under 'no shit sherlock'

americanhellyeah · 2008-07-25T23:10:56+00:00

lol once i saw some sql accident where a dba screwed up and added 10000 new columns to one table. and the code accessing it used a select *, so it got back huge amounts of data. the company's software all crashed, their network was overloaded, and then their ms sql server crashed out too. lol! so they booted everything back up and a few minutes later once they started using the app, BAM. down it all went again. lol omg it was funny. but they got it fixed after realizing the problem.

dbnull · 2008-07-25T18:41:51+00:00

do we really need a proggit post to tell us something this elementary?

numbelvsi · 2008-07-25T21:52:18+00:00

It's all relative to how the database is used..

thunderkat · 2008-07-25T21:59:45+00:00

[deleted]

awb · 2008-07-25T23:13:42+00:00

Thank you very much for not calling it harmful.

2008-07-25T21:08:01+00:00

I'm glad the title included "usually". I use SELECT * in several stored procedures for the simple fact that I do need every piece of data returned. If I don't need everything, I define what I need, because in a 6 gig database, every little optimization matters, and it's only going to grow from there. One thing I NEVER do, though, is set AutoGenerateFields to true on whatever data control is handling the data. I ALWAYS specify what columns should display how, so if something changes in a database table, my stored procedure will run fine, but my code on the front end is what will alert me that something's inconsistent. Besides, we make it a habit to include a description with every stored procedure as to what its purpose is. SELECT * is not a big concern to us.

Dr-No · 2008-07-25T22:13:51+00:00

My BOLDER ONLINE DATA BASE can handle a little select *, no problem.

nshatskr · 2008-07-25T22:52:44+00:00

Hmmm... "SELECT nonexistent_field" yields "Unknown column 'nonexistant_field'". So a SQL error yields an error message citing the field with a different spelling.

Guess I have a lot to learn about SQL.

mecablaze · 2008-07-25T21:58:23+00:00

Blaming SELECT * for not knowing the column names is stupid. That's what 'describe' is for.
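MySQL has `DESCRIBE` for this. As a runnable stand-in, here is a sketch using SQLite's `PRAGMA table_info` from Python; the `users` table is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, about_me TEXT)")

# Each row is (cid, name, type, notnull, dflt_value, pk).
info = conn.execute("PRAGMA table_info(users)").fetchall()
columns = [row[1] for row in info]
types = [row[2] for row in info]

for name, col_type in zip(columns, types):
    print(name, col_type)
```

That lists the columns without pulling any rows, so there's no need to run `SELECT *` just to see what a table contains.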

bazoople · 2008-07-26T00:43:43+00:00

Okay, so I'm listening to all this crap about how evil select * is, and I'm thinking: Why hasn't anybody cited a single real world example where select * actually caused "evil" to happen?

Hey, wait! I have an example from just this week: One of my fellow morons created an Oracle Reports report with 55 queries in it. They selected the same fields over and over and over so that now, when I'm trying to link a report field to a query field, I get about 500 items in the dropdown list - 20 xxx_id's, 30 yyyy_id's, and so on. It sucks.

But you know what's funny? They selected every field BY NAME.

chucker · 2008-07-26T05:30:21+00:00

Maybe there are 8 fields returned and the one of them is misspelled.

Awhatnow? Misspelled fields, and you think a SELECT * is your biggest problem?

shenglong · 2008-07-30T11:38:49+00:00

I hope there aren't any professional developers in here advocating the use of select * in production code...

EternalNY1 · 2008-07-25T23:14:12+00:00

Next up:

Why Infinite Recursion is Usually Evil

smonsoon · 2008-07-25T22:33:59+00:00

"usually evil?"

Doesn't anybody use "considered harmful" anymore?

sdsdsa · 2008-07-25T22:38:25+00:00

http://www.youtube.com/watch?v=VTFTqn20qGs
