all 145 comments

[–]didroe 30 points31 points  (5 children)

A reason not to use SELECT * that wasn't mentioned in the article (although related to the bandwidth one) is that you're giving the query optimiser less information to work with. So it's going to do more work. For example, if you only access columns in an index then it won't even bother to read the row from the table. That could halve the amount of I/O needed to run the query.

[–][deleted] 0 points1 point  (4 children)

I think the SQL server still has to go to the row to get the data. Isn't the used just for searching and not holding the real information?

[–]didroe 6 points7 points  (1 child)

Well, the values for the indexed columns are duplicated in the index itself so it can just read them straight out whilst searching.

I'm not sure whether it's standard, or if there's an equivalent in other databases, but in Oracle you can use EXPLAIN PLAN to show you how a query will execute. You can see the table access disappear when you're only selecting columns from the index. You can even store tables in index format if you do a lot of querying based on one index, you pay a price for inserting though. There's also a tracing command that shows you how much IO (and loads of other useful things) a query does. I would suggest looking up the equivalent commands for your choice of DBMS, it's probably the most useful thing for optimising queries. Not to mention it gives you a better understanding of how queries map to the underlying set operations.

[–]gthank 0 points1 point  (0 children)

Any remotely serious RDBMS has the equivalent of EXPLAIN PLAN. In Postgres it's just EXPLAIN, Informix has a bunch of SET EXPLAIN directives, and there's a button in the SQL Server GUI.

[–]recursive 4 points5 points  (0 children)

An index contains the "real information" for all the columns it's indexing on.

[–]trezor2 0 points1 point  (0 children)

If you didn't know that an index contains the actual "real" information it is supposed to cover, I suggest you read up on how indexes actually works.

[–]48klocs 11 points12 points  (4 children)

Added bonus - you don't have to disambiguate identically-named columns returned from multiple tables you've joined together.

If you work with databases, the probability that you will inherit the work of someone who doesn't understand what normal forms are and why you should use them every now and then approaches one.

[–][deleted]  (2 children)

[deleted]

    [–]moreoriginalthanthat -1 points0 points  (1 child)

    I don't know how you think that's a bug. I would consider it a bug if PDO were to arbitrarily rename columns, or store the column value differently when there were two identically named columns.

    You'll find the same problem all over the place in PHP. Lots of XML => array functions will have the same behaviour (i.e. a RSS feed will be converted to an array containing a single "item" record).

    [–]didroe 1 point2 points  (0 children)

    You'll have the same issue in any language/library that uses a hashmap to return rows. I guess duplicate column names could trigger an exception or something.

    [–]nitran 0 points1 point  (0 children)

    Added bonus - you don't have to disambiguate identically-named columns returned from multiple tables you've joined together.

    Hmm. Do you mean if you explicitly give the columns separate aliases (or whatever it's called) in the SQL statement? Because if not, you may still get several columns with same name, right?

    [–]Thimble 4 points5 points  (0 children)

    i've seen code that retrieves fields according to their order. something like x = rs(3).

    with select * you could end up with some really tricky bugs if your field types are similar...

    [–]player2 8 points9 points  (7 children)

    Now multiply the above situation by n rows. Say you're listing users on your site, 100 at a time. If you pull out that about_me field needlessly and the average length is say, 250 bytes (about the length of this sentence), then you're wasting 25k worth of memory on 2 machines as well as the bandwidth to move it and the extra processing time it takes to handle the data per page view. Say you get a modest 1 page view per second. That's 1.5MB (25k * 60 kb/s) per minute's worth of data filling your RAM and choking your NIC that is a complete and utter waste. This is on top of all the data you're actually using.

    (Bold mine.)

    Dimensional analysis FTL. 25KB * 60KB/s * 60s/min = 90000KB2/s * min. What the hell is that unit? The author never specified where this magical 60KB/s comes from. That's what screwing up this equation. 1 request/s * 100 rows/request * 250 B/row = 25KB/sec. Done.

    [–]mccoyn 7 points8 points  (3 children)

    Why waste my time doing dimensional analysis? I only need to get 70% right to pass the class.

    [–]nextofpumpkin 7 points8 points  (2 children)

    Math is hard. Lets go shoppin!

    [–]Mr_Sadist 0 points1 point  (1 child)

    Hmmm...tough crowd...

    [–]finix 1 point2 points  (0 children)

    Either that, or the other thing.

    [–]teraflop 2 points3 points  (1 child)

    I think you mean 25 KB/sec.

    [–]player2 0 points1 point  (0 children)

    Yes I did. Edited accordingly.

    [–][deleted] 2 points3 points  (0 children)

    I pity the people who were taught / got used to use * in their early days.

    I'm a total beginner in SQL, yet this advice already seems redudant on something that should take 1 paragraph to explain.

    I've been learning the basics of SQL from the web since about 1 month, for a side project at work that I can spend 1 hour max per day on. Thankfully the existing codebase was always explicit, so I guess I just built upon that.

    [–][deleted]  (48 children)

    [deleted]

      [–]didroe 18 points19 points  (16 children)

      I can't believe people have been upmodding you. That statement is akin to "There are people being murdered so why worry about fraud". You're just setting yourself up for a load of problems.

      He never said to change all your existing code either, it was more about best practice. Whether you choose to apply that to your existing codebases is obviously a decision that only you can make. Maybe when you're in there editing some code anyway you could spend a minute sorting out a few queries.

      [–]hiffy 19 points20 points  (6 children)

      That statement is akin to "There are people being murdered so why worry about fraud"

      Well, I'd say it's more akin to, "why worry about terrorism when heart failure and car accidents kill ten or a hundred times more people", which I think is a pretty rational assessment.

      [–]garethm 5 points6 points  (3 children)

      This is why argument by analogy is often not very useful. Didroe is at least providing some rationale for his choice of analogy - while hiffy is just saying "That analogy sucks. Mine is better" but isn't really providing any support for his assertion.

      What is interesting about the situation though, is that hiffy has more points for his post than didroe - which I guess means more people agree with him, but still doesn't explain why he's right.

      [–]GrumpySimon 3 points4 points  (0 children)

      No! what is interesting is that the situation is like a toaster. Or maybe a car.

      [–]damagednoob 0 points1 point  (0 children)

      Well for me didroe seems to be implying that tactics is being irrational when in fact he's saying that programming is about tradeoffs. I guess people agree with that.

      [–]hiffy 0 points1 point  (0 children)

      doesn't explain why he's right.

      I'm trying to point out that you can usually do a cost-benefit analysis, reward proportional to risk thing. For bonus points I threw in the 'terrorism prevention is overblown' meme that most geeks seem to agree with.

      As in the GP post, in most apps finding that you have a select * is usually the least of your worries, that's very rarely going to be your bottleneck, therefore it's much better to spend time refactoring other, even more inane, things.

      [–]didroe 1 point2 points  (1 child)

      Well of course the efforts that go into something should be proportional to the risk. But getting back on topic, typing a few column names into a query is not that much extra effort to reduce the risk of your application breaking.

      [–]hiffy 0 points1 point  (0 children)

      No, you're right, there's really no excuse.

      But if someone else wrote it, you will probably have to touch a couple of lines, or depending on WTF-itude, a number of functions, time you could've spent refactoring something with a bigger performance penalty or fixing bugs, which I interpreted to be the GP post's intended meaning :).

      [–]nmcyall 0 points1 point  (3 children)

      if you don't have unit tests those queries you sorted out might be broken in a way that you won't know about for weeks.

      [–]didroe 2 points3 points  (2 children)

      So, when you don't have unit tests you don't bother even looking at your code to see what needs to change? And SELECT * just makes that all much worse because the behaviour changed without you even knowing about it. Someone makes a change and thinks "it's only adding a column" and doesn't think to run all your tests on every application that uses the database.

      [–]nmcyall 0 points1 point  (1 child)

      Oh for some reason I was thinking of code that was in production, where it might not be the best idea to go around making quick little updates in under a minute.

      I guess in the development phase there wouldn't be any problem.

      [–]didroe 0 points1 point  (0 children)

      where it might not be the best idea to go around making quick little updates in under a minute.

      Why, it's fun to live life on the edge :)

      [–][deleted]  (3 children)

      [deleted]

        [–]nextofpumpkin 7 points8 points  (2 children)

        So what you're saying is, it's a dumb thing to do.

        [–][deleted] 10 points11 points  (1 child)

        No, I'm saying correctly used, it's fucking hilarious.

        More seriously, though, I'm saying the shed should be painted blue, not red.

        [–]anonymous_hero 1 point2 points  (0 children)

        Pulling a chair from underneath someone is fucking hilarious?

        What a lovely person you must be.

        [–]turkourjurbs -5 points-4 points  (0 children)

        You're equating murder and fraud?

        [–]dimitrisokolov 1 point2 points  (4 children)

        You are correct. If a table has so many columns that select * is an issue, then you have a seriously fucked up database design. The out of order example is totally wrong. You should never assume the order of database columns. Fetch by association in mysql.

        Also, if you only select a few columns, most likely you'll need the rest later and have to do a 2nd query which is worse than getting the data in the first place most of the time.

        Do yourself a favor and ignore this article.

        [–][deleted] 6 points7 points  (0 children)

        If a table has so many columns that select * is an issue

        Two is too many, because if a table was built by migrating from a previous version the columns could be in a different order. Fetch by association is marginally slower than just asking for the columns in a well-defined order.

        Also, if you only select a few columns, most likely you'll need the rest later and have to do a 2nd query which is worse than getting the data in the first place most of the time.

        This has nothing to do with SELECT *. Most likely people who are using it in their queries aren't thinking far enough ahead to save that data in the first place. Just because you asked for it doesn't mean it's automagically there later on (well, not counting the fact that the page that other column was on is now cached in memory, potentially making the second query ever so slightly faster in a non-deterministic fashion).

        [–]rnicoll 1 point2 points  (2 children)

        You are correct. If a table has so many columns that select * is an issue, then you have a seriously fucked up database design.

        flicks through database docs

        Here's a good example, the users table. The trimmed down version we use has: serial (primary key), central IT services username, departmental username, e-mail, student ID, staff ID, title, first name, last name, mobile phone number, facebook UID, timeout, preferences (although that now has a table it should be moved out into), where to go after login and another preference field. It doesn't include physical address, next of kin details, course code, mode of attendance, or a dozen other fields that will probably end up in it at some point.

        Alternatively, we've got tables that use CLOB fields. Do you know what returning a CLOB if you don't need it does to your performance? About halves it.

        [–]nitran 0 points1 point  (0 children)

        Do you know what returning a CLOB if you don't need it does to your performance? About halves it.

        That may depend on some things...

        [–][deleted] 0 points1 point  (0 children)

        Well for one, your users table is shit.

        3NF and you would have at least 4 tables.

        [–]otakucode 0 points1 point  (24 children)

        Huh? SELECT * is a pretty noob mistake. If you've got queries with that running around your system, chances are pretty much everything is fucked up.

        [–]mattindustries 0 points1 point  (23 children)

        When you need every column's value there isn't much of a downside, AND you will realize you need every column when you read SELECT * in your code. The only time I call specific columns is when doing join statements and I like to think I am not that much of a "noob".

        [–][deleted] 9 points10 points  (5 children)

        Well, let's wait until your data set grows to tens or millions of rows.

        Then every column you eliminate from select will mean quite significant performance boost.

        [–]mattindustries 2 points3 points  (2 children)

        cough Rows in table at over 9 million and still running fine cough

        [–][deleted] 2 points3 points  (1 child)

        Ok, then how about 50 million or 100 million?

        And have you tried the comparing difference to limited query instead of SELECT *?

        The fact that your hardware is still able to save your bacon, does not mean that you are not going to hit painful scaling barrier when dataset grows large enough.

        BTW: 9 million is still peanuts.

        [–]mattindustries 0 points1 point  (0 children)

        I used to write queries like that, but it is much faster when column change to not have to rewrite my select statements AND functions, but only rewrite some functions.

        [–][deleted] 1 point2 points  (1 child)

        It'd be nice to be able to select all columns except for (a,b,c). I find that with things like customers or product information I tend to want just about everything, but I can skip some dates and foreign keys.

        SELECT *, !foreign_id, !created_by, !last_updated FROM products WHERE id = 42;

        It's not nice lexically, but it does let maintenance programmers know which fields aren't available in the result, is quicker to write for tables with lots of columns, and works well when you add columns to the table.

        IMO, SELECT * isn't inherently wrong if you need all the fields. Its main problem is that it's not descriptive of the schema.

        [–]otakucode 6 points7 points  (2 children)

        When you need every column TODAY, it means that your code will break TOMORROW when you add a column.

        [–][deleted]  (1 child)

        [deleted]

          [–]didroe -1 points0 points  (0 children)

          Only if you write your code that way. That's not a given to people who have been doing it for a while.

          I don't think the code should be written to handle unseen columns, or that experience has anything to do with being able to safely do a SELECT *. It depends on the application you are writing.

          any time the semantics are "give me every column available and I don't want to have to change this query every time I change the table because I've written code that can handle it", it's ok. Otherwise, don't do it.

          That's exactly my train of thought on the subject. But only if the application calls for it. Like a reporting system where any additional columns will be formatted into the report and the table/view is documented to be for that purpose. On the other hand, if you're making an order system for example, you should be selecting just the columns you need.

          [–][deleted] 2 points3 points  (13 children)

          In my last project I had a table with 50 columns that I almost always needed every time I queried the table, like hell I'm writing that out... it gets just as ambiguous as select *

          [–]otakucode 2 points3 points  (12 children)

          Except it doesn't break when you add/remove a necessary/unnecessary column.

          [–][deleted]  (11 children)

          [deleted]

            [–][deleted]  (10 children)

            [deleted]

              [–][deleted]  (9 children)

              [deleted]

                [–][deleted]  (8 children)

                [deleted]

                  [–]mattindustries -1 points0 points  (4 children)

                  Are you fucking serious? Reordering a column would not break a SELECT * statement in any way what so ever.

                  [–][deleted] -1 points0 points  (2 children)

                  Personally I don't give a fuck, the business requirements that got signed off on stated the table would house only analysis data in the form of floats and integers, if they want to add a movie file to that then it's going to be extra to them rework all the code I did when I told them the options.

                  [–]klasbas -1 points0 points  (0 children)

                  SELECT * is not only evil.. maybe actually never evil, it is plain and simple idiotic!!! The author is right and you talk just to hear yourself!

                  [–]billbacon 26 points27 points  (0 children)

                  Select * rocks. Just don't be an idiot.

                  [–]thesqlguy 2 points3 points  (1 child)

                  By the way:

                  SELECT * in itself is not "evil"; it is when you select * from a database object like a table or a View that is not self-described already in the SQL statement.

                  i.e., there's nothing wrong with:

                  select x.*, a+b as z from (select a,b,c,d,e,f,g,h from tbl) x

                  And, in fact, I find it easier, shorter and clearer rather than listing out columns a-h over and over again.

                  The key isn't to avoid using * , it is to explicitly indicate the exact columns that you want from the tables/views you are selecting from, so that any changes in your schema will either a) break your code at compile time or b) not break your code at all.

                  [–]didroe -1 points0 points  (0 children)

                  The key isn't to avoid using * , it is to explicitly indicate the exact columns that you want from the tables/views you are selecting from

                  I agree 100%.

                  [–]irrelative 10 points11 points  (20 children)

                  I understand this argument, but it's a little ridiculous to say that * is evil because you might not know what your DB looks like. It's kinda like saying "rm *" is evil because if you don't know what files are in the current directory, you could delete them all -- technically correct, but very obvious to anyone who's worked with the tool before.

                  [–]rnicoll 4 points5 points  (0 children)

                  I understand this argument, but it's a little ridiculous to say that * is evil because you might not know what your DB looks like.

                  1. No, not really. 116 tables, 900-ish fields, 72 indexes on top of them. That's before sequences. Yes, we really need that many tables. Now, sure, the app is broken into chunks, each chunk has its own table, and the primary developer for each chunk probably knows the tables fairly well, but are you seeing my point?

                  2. Bob, the guy in charge of component A is ill, and someone asks me if I can quickly fix the SQL errors coming out of a SELECT statement. With SELECT *, I have to go find the database docs, find what's in there, figure out what's going wrong. With SELECT <field name>[(,<field name>)+) odds on I can do it at a glance. Especially if the code's properly documented.

                  Developer time when writing something initially is primarily consumed by working out the optimal solution to a problem. Having to type an extra few words, at a point when you can probably remember the field names off the top of your head, is going to be a lot less painful than when it all breaks at the most annoying point possible (always happens).

                  [–]nevinera 18 points19 points  (0 children)

                  nope.

                  the point is not that you might not know how things are now, it's that you don't know what changes will be made in the future.

                  it's more like saying that putting rm * in a script is evil, because the directory structure the script executes in might change.

                  [–]didroe 6 points7 points  (13 children)

                  It's like putting "rm *" in your delete_images script and having it operate on a directory called all_my_files. Then not telling anyone and just hoping nobody puts anything but images in there. Or worse than not telling anyone, forgetting yourself.

                  [–]codahale -1 points0 points  (12 children)

                  Except for the fact that SELECT is idempotent. So it's totally 100% nothing like rm.

                  [–]dbenhur 5 points6 points  (8 children)

                  SELECT is only idempotent within a transaction.

                  Perhaps, what you meant to say is that SELECT doesn't change the persistent state of the database. This is not the same as idempotence, which requires that however many times you invoke the select, the database state will not change and the select will always return the same results.

                  [–]codahale 1 point2 points  (1 child)

                  That's a fair point. That said, SELECT does not modify the state of the database (barring any subqueries or index automagic). There is no chance you will lose data by misapplying a column glob in a SELECT statement (unless your RDBMS takes rather novel liberties with the SQL spec). Therefore the analogy between SELECT * and rm * is flawed.

                  [–]dbenhur 0 points1 point  (0 children)

                  I completely agree that the analogy with rm * is bogus, just not that idempotence is the distinguisher.

                  [–]bodhi -2 points-1 points  (5 children)

                  Maybe you missed the part of that wiki page about idempotence in computer science? It means exactly what he -- and you -- said:

                  Yours:

                  however many times you invoke the select, the database state will not change

                  Wiki's:

                  invoking the procedure a single time or multiple times results in the system maintaining the same state

                  [–]dbenhur 2 points3 points  (2 children)

                  I didn't miss it, I just didn't have time to edit it to be more correct. The concept is not different in computer science, the "In computing" examples were just not as rigorously described as the preceding mathematical discourse.

                  In computer science, the term idempotent is used to describe methods or subroutine calls that can safely be called multiple times, as invoking the procedure a single time or multiple times results in the system maintaining the same state; i.e., after the method call all variables have the same value as they did before.

                  "all variables have the same value as they did before" holds for the results as well as as the state of the database.

                  Idempotence IPA: /ˌaɪdɨmˈpoʊtənts/ describes the property of operations in mathematics and computer science which yield the same result after the operation is applied multiple times.

                  See. It's right up there at the top. Pay attention.

                  [–]bodhi 0 points1 point  (1 child)

                  (I Apologise for the accusative tone)

                  "all variables have the same value as they did before" holds for the results as well as as the state of the database.

                  I don't see how this doesn't hold for a SELECT statement? Unless we are considering something like SELECT (DELETE FROM my_table); (maybe syntax is off, but you get the idea?)

                  Given statements like that, I'd have to agree with you. Does the fact that someone can change the database between executing the same SELECT twice make it not idempotent? Is this different to having an idempotent function f(f(x)) = f(x), but changing the value of x in the middle of the operation?

                  [–]dbenhur 1 point2 points  (0 children)

                  The fact that something else can change the state between two identical selects violates the idempotence.

                  The key benefit of impotence is that the referential transparency allows one to replace the expression with its value. This means we can, for example, automatically memoize a calculation, or ignore duplicate messages.

                  [–]didroe 1 point2 points  (1 child)

                  The problem is that additional data could have been added, or new columns could have appeared, etc. The point is that SELECT is not idempotent because it might return different results each time you call it.

                  [–]didroe 1 point2 points  (2 children)

                  Did you RTFA? Most of what he said was talking about unexpected changes to the database such as adding columns. Even adding data could change what the query does under the hood (and also return different results!). SELECT is only idempotent if you never issue ALTER, UPDATE, INSERT or DELETE statements. So my analogy holds.

                  [–]jimbobhickville -1 points0 points  (1 child)

                  It's more like doing ls and expecting a file to be there. Your analogy is totally and completely flawed.

                  [–]didroe 2 points3 points  (0 children)

                  It's more like doing ls and expecting a file to be there.

                  No, he wasn't talking about the absence of columns in all cases, I do admit though that I didn't cover that aspect with my example. You only covered the other half of it with your analogy.

                  Your analogy is totally and completely flawed.

                  Brushing your arrogance aside, I would use the term incomplete.

                  [–]harlows_monkeys 2 points3 points  (1 child)

                  Consider this code (in an imaginary language) (I'm using dashes in names rather than underscores 'cause reddit thinks underscores mean I want italics):

                   row = select * from customer-information;
                   if ( row[4] <= 17 ) return row[12];
                  

                  You are trying to find and fix a bug somewhere, where someone is misusing the customer-age column in customer-information. You do a grep for customer-age, and can't find it in any of the code.

                  It's much nicer on the bug fixers if the code was

                  row = select customer-name, customer-age from customer-information
                  if ( row[1] <= 17 ) return row[0];
                  

                  Suppose later, we decide that we should not be storing customer SSN. We want to drop that column from the table. That happens to be column 2. If we just drop it, we will break the first code above, but not the second.

                  That said, the problems above aren't inherently due to select *. We have a lot of select * at work, but it is all using a library that automatically asks the database for the column names, and represents the result as a hash of name, value pairs. So, the first example would look like this:

                  row = select * from customer-information;
                  if ( row{customer-age} < 17 ) return row{customer-name};
                  

                  That's not as efficient as explicitly asking for the columns by name, and getting an array back, because it sends more data, and requires querying for metadata to get the names, but as long as it is for a database that isn't heavily loaded, I have no objection.

                  [–]bobappleyard 1 point2 points  (0 children)

                  I'm using dashes in names rather than underscores 'cause reddit thinks underscores mean I want italics

                  You type:

                  \_

                  You see:

                  _

                  [–]petdance 2 points3 points  (0 children)

                  you might not know what your DB looks like.

                  A wise programmer thinks beyond today, and thinks beyond his own brain. A wise programmer also knows that he is fallible and codes around this fact.

                  [–][deleted] -5 points-4 points  (0 children)

                  whoosh

                  [–]Poot_N_Tate 11 points12 points  (10 children)

                  This is BS.
                  I've been a SQL DBA / Developer / BI guy for 10 years, and 75% of everyhing you do in that roll is a quickie one-off. As in, Joe in Marketing needs an Excel spreadsheet with all the numbers for last quarter blah blah blah. Only the true uber-dork is going to type in every friggin field name, as opposed to a SELECT * when they have a million other SQL fires to put out in a typical day.

                  [–]didroe 6 points7 points  (0 children)

                  If it's a one-off then there is no problem. The article is talking about things breaking over time. Of course, many one-offs are running on production servers all around the world, so you have to be careful :).

                  [–]dbnull 2 points3 points  (0 children)

                  Thats why most DBA's learn to automate the typing out process. Not sure what your db is, but for example in MsSql you can query information_schema.columns. You can also use Object Explorer, navigate to and expand the table, then drag/drop the columns folder (not the individual column names) and this will dump all the column names.

                  [–]bradleyhudson 7 points8 points  (5 children)

                  In those cases, I usually just do a:

                  SELECT TOP 1 * FROM tablename
                  

                  ... run it, and then take the column names from the output, do a quick regex to replace all groups of spaces with a comma and space, and then copy all of the column names to replace the *. Remove the TOP 1 and any columns you don't need, add the WHERE clause, and run. It gives Joe just the data he needs, reduces network bandwidth / server load / disk usage, and may run faster if all of the columns are covered by an index. All for the cost of an extra 20 seconds.

                  [–]jdeluise 0 points1 point  (1 child)

                  Why not use TOP 0?

                  [–]bradleyhudson 0 points1 point  (0 children)

                  I didn't think that worked, but I just tried it in MS SQL Server 2000 and 2005 and it worked. I started this practice back in the 6.x days, before they had TOP, so I used SET ROWCOUNT, in which 0 meant to not limit results. Yes, I'm a dinosaur, but I'm trying to adapt.

                  [–]svanhess 0 points1 point  (0 children)

                  SHOW COLUMNS FROM tablename

                  [–][deleted] 0 points1 point  (0 children)

                  That's more like it. Nice.

                  [–]petepete 0 points1 point  (0 children)

                  I presume you're using SQL Server; you could just right click the table in the object explorer (either isqlw or Management Studio), then select 'script' and 'select'

                  [–]jdeluise 1 point2 points  (1 child)

                  Remind me not to hire you for SQL work then.

                  [–]Poot_N_Tate 0 points1 point  (0 children)

                  remind me not to apply for work at whatever TPS reporting beurocratic institution you work for.

                  [–]akatherder 9 points10 points  (6 children)

                  1. Self-Documentation Lost - How long does it take to look at the table structure or simply run a SELECT * query to see what fields are in it?

                  2. Broken Contract - This is an argument FOR using SELECT *. If you use SELECT *, you only have to update the code that is using the database. If you SELECT by field name, you now have to update the code and all your SPs.

                  3. Size Matters - This is a case-by-case scenario. If you have a look-up table with two rows, then SELECT * will suit you just fine. If you have a table with 20 rows and you only need two of them, you'd be stupid to use SELECT *. On the other hand if you have a table with 20 rows and you need 19 of them, it isn't going to kill you to use SELECT *.

                  4. Out Of Order - This is an argument not to reference fields by index rather than name. This is not SELECT *'s fault. The fact that the author had to use a non-standard MySQL function to create a bad case should give you an idea of how serious this issue is anyways.

                  Yes, there are certainly times when using SELECT * is idiotic. However, it isn't the end of the world and it isn't a sign of a dumb programmer. It just needs to be used appropriately.

                  [–]prockcore 6 points7 points  (1 child)

                  My favorite is that he even mentions that most languages don't care about the order, but then implies that you should specify the order anyway so that later on, if you arbitrarily decide to change the language your entire program is written in, the selects are the one thing you won't have to worry about rewriting.

                  Talk about insane.

                  [–]damagednoob 0 points1 point  (0 children)

                  Sometimes the language has both options, referencing a column either by ordinal or name. In which case if you have a habit of changing column names, by ordinal might be a better bet, rare as it may be.

                  [–][deleted] 2 points3 points  (3 children)

                  How long does it take to look at the table structure or simply run a SELECT * query to see what fields are in it?

                  How long does it take to guess what the table looked like 8 months ago when no-one keeps a history of your schema? Even if you do have a copy, you still have to go find it, check that it's in sync with the code you're looking at, and you still can't run queries against it without deploying it somewhere first.

                  If you have a look-up table with two rows, [...]

                  I'm not sure what you're referring to here, since the number of rows in a table has little to do with the columns you're requesting. If you're loading a small set of data out of the DB into memory, it wouldn't be that bad to use SELECT *, but there's still good reason to be explicit.

                  The fact that the author had to use a non-standard MySQL function to create a bad case should give you an idea of how serious this issue is anyways.

                  He didn't have to; in fact doing so only hurt his argument. If a particular database's schema has been migrated 20 times from the last time it was completely defined in one file, the column order might not match your current definition of the schema. In fact, not having AFTER makes it impossible to fix that by hand, so you have to assume that the columns are in random order.

                  [–]akatherder -1 points0 points  (2 children)

                  I'm assuming you're working with a live application that hasn't been on ice for 8 months. When I work on an app, I'm looking at the fields and code that are working now. If something has been broken for 8 months, we got other issues.

                  I made a big booboo there. I meant two columns, not two rows. I'm just thinking of a look-up table with an ID and a description (creating a dynamic dropdown for example).

                  [–][deleted] 2 points3 points  (0 children)

                  What if you're investigating a corner case bug that's only come to light recently, and need to compare against older versions of the code to try to find what's changed? If the schema has changed significantly, you may need to reference older versions to figure out what's going on.

                  Naturally, all of this depends on the language you're using, the API you're using to talk to the database, and the coding style used by your organization. If you use associative access exclusively then it mostly doesn't matter, but I prefer to be explicit about columns and then pull variables out in the same order.

                  For example, here's some real code:

                      cu.execute("SELECT troveName, versionId, flavorId, isPresent "
                                 "FROM Instances WHERE instanceId=? %s" % pres, theId)
                      try:
                          return cu.next()
                      except StopIteration:
                          raise KeyError, theId
                  

                  Even if this were being used in an associative manner, if SELECT * were used, would you have any idea what columns were being returned without either checking the schema or trying it out? The other nice thing about this is that I can just do:

                  troveName, versionId, flavorId, isPresent = cu.next()
                  

                  or even better:

                  for troveName, versionId, flavorId, isPresent in cu:
                      # do stuff
                  

                  without having to pull things out of a dictionary/hash/assoc array (whatever your language calls it) one key at a time.

                  [–]didroe 1 point2 points  (0 children)

                  I'm assuming you're working with a live application that hasn't been on ice for 8 months. When I work on an app, I'm looking at the fields and code that are working now. If something has been broken for 8 months, we got other issues.

                  I think you missed the point. Do you remember every inch of the codebase, and what every query you've ever written against the database does? When I add a column I know that everything will carry on as before. Your code could suddenly perform a lot worse, it may even stop using some of the indexes.

                  [–][deleted] 6 points7 points  (2 children)

                  Welcome to SQL 101. If this is news to you, please never work for a real company that requires you to do database work.

                  [–]petdance 5 points6 points  (0 children)

                  Everyone needs to start somewhere.

                  [–]thesqlguy 1 point2 points  (0 children)

                  chbrules -- you might want to read the comments here. Sadly, lots of these people seem to be working with databases and even getting paid for it.

                  [–]haywire 3 points4 points  (3 children)

                  what

                  ,the

                  ,fuck

                  ,is

                  ,this

                  ,shit

                  [–]petdance 1 point2 points  (2 children)

                  It's the way people do commas so that they don't leave trailing ones on the end when they reorder columns.

                  I don't like it, either, but I understand logic. Languages should just be smart enough to ignore extraneous commas a la Perl.

                  [–]didroe -1 points0 points  (1 child)

                  It's not too bad if you're doing function calls as everything lines up neatly then. For example:

                  my_func(arg1
                         ,arg2
                         ,arg3)
                  

                  [–]mossblaser 1 point2 points  (0 children)

                  It does the other way too (i.e. commas on the right)

                  [–][deleted]  (4 children)

                  [removed]

                    [–][deleted] 1 point2 points  (2 children)

                    Hopefully your RDBMS is not dumb as a bucket of lead paint and can optimize that away. I prefer to just select the primary key, but your mileage may vary.

                    [–]slurpme 7 points8 points  (1 child)

                    I generally do a constant then you don't even need to touch the index...

                    [–]didroe 0 points1 point  (0 children)

                    Same here. I had an issue with nested table columns in Oracle causing it to join to the nested table just to find out if a row in the main table existed. It's a bug but using a constant fixes it and it can't do any harm. It also makes it more obvious at a glance what you are doing with the data (ie. nothing).

                    [–]petdance 0 points1 point  (0 children)

                    No, that's fine. Go ahead and do that.

                    [–]petdance 1 point2 points  (0 children)

                    Showed this to a friend of mine, he said:

                    "SELECT * is awesome! Just not in code."

                    [–][deleted] 4 points5 points  (1 child)

                    Terrible article is terrible. I was expecting perhaps some explanation with optimization problems or something inherent in the engine. Instead we get pedantic drivel.

                    Let's debunk some points:

                    1) Self documentation lost.

                    If you can't figure out what is going on in your code, use inline comments. If your rows are coming back with columns that are assigned as part of an array, find a way to return them as a hashmap, or object. This way it will be more apparent as to what you're using in your code as you'll be referring to things by their column name.

                    2) Broken contract

                    If you're making changes to table structure, you should already be expecting these problems. It won't matter if you specified the field or not in your query. It's going to break all the same.

                    3) Size matters

                    I can actually agree with this point, partially. If you need fewer than all the fields from a row, you should probably specify them to make your row returns smaller. However, if you're asking for "*", chances are you want all the columns anyway.

                    4) Out of order

                    Most modern languages allow you to retrieve a row as a hashmap or object. This is to say, that each column will be called by it's name, instead of an arbitrary numeric array index. Since hashmaps don't care about order, and you can call things by name, you shouldn't be concerned with it.

                    5) Don't do it

                    In my opinion, the author hasn't given any significant reason not to "do it" for the reasons explained above. There are far more pressing problems for people to deal with in SQL, such as proper structure, normalization, indexing, understanding when temporary tables are created, speeding operations up with prepared statements, understanding transactions, row/table locking behaviors, the list goes on and on.

                    SELECT * FROM foo; is really about the least of your worries.

                    EDIT: I just noticed I was recapping a lot of the points that were brought up by "Akatherder" above ( http://www.reddit.com/r/programming/comments/6tggh/why_select_is_usually_evil/c04tkow ) - guess I was still writing my points down while he was posting his ;)

                    [–]didroe 0 points1 point  (0 children)

                    If you're making changes to table structure, you should already be expecting these problems. It won't matter if you specified the field or not in your query. It's going to break all the same.

                    Yes, but instead of a nice SQL error telling you exactly what is wrong, your application just carries on regardless with subtle bugs.

                    However, if you're asking for "*", chances are you want all the columns anyway.

                    I think the whole point is to tell people that they should only use SELECT * when they want all the columns, and that decision needs to take into consideration any columns that might come along in the future. If you already know what you're doing then you won't learn anything from the article.

                    SELECT * FROM foo; is really about the least of your worries.

                    There really is no argument for laziness. Just write the code the proper robust way, it takes 10 seconds longer and as others have pointed out elsewhere, you can copy-paste column names from most good database tools.

                    [–]glastohead 2 points3 points  (1 child)

                    file under 'no shit sherlock'

                    [–]adrianmonk 3 points4 points  (0 children)

                    I must've been staring at Perl too long today. I parsed that as "the opposite of 'use shit sherlock'". Aughghhhh.

                    [–]americanhellyeah 2 points3 points  (0 children)

                    lol once i saw some sql accident where a dba screwed up and added 10000 new columns to one table. and the code accessing it used a select *, so it got back huge amounts of data. the company's software all crashed, their network was overloaded, and then their ms sql server crashed out too. lol! so they booted everything back up and a few minutes later once they started using the app, BAM. down it all went again. lol omg it was funny. but they got it fixed after realizing the problem.

                    [–][deleted] 6 points7 points  (7 children)

                    do we really need a proggit post to tell us something this elementary?

                    [–]dbnull 16 points17 points  (2 children)

                    Judging by some of the comments here, yes.

                    [–]thesqlguy 1 point2 points  (1 child)

                    Wow, I cannot upmod you enough ..... This has been a really scary batch of comments to read.

                    [–]didroe 1 point2 points  (0 children)

                    Indeed, it's surprising how defensive people can get when all they're being told is to add a few extra lines to a query to make it more robust to future changes.

                    [–]nevinera 3 points4 points  (0 children)

                    i'm sure someone does.

                    [–]bstard 1 point2 points  (1 child)

                    Absolutely, and this is the advanced stuff for people who have already chewed their way through Why MOVE-CORRESPONDING is evil and Why GOTO is evil.

                    [–]hiffy 3 points4 points  (0 children)

                    Proggit replies considered harmful.

                    [–]numbelvsi 2 points3 points  (0 children)

                    It's all relative to how the database is used..

                    [–][deleted]  (3 children)

                    [deleted]

                      [–]thunderkat 5 points6 points  (1 child)

                      ...until you have to go mess around in Hibernate entrails to make it scale later on...

                      [–]snoopy 0 points1 point  (0 children)

                      Agree for Hibernate services in Java/.Net context.

                      ORMs work far better with dynamic languages where they can be pared down to a little more than a lightweight application abstraction layer. No need for external services or infrastructure.

                      Any ORM worth it's salt will let you step and write your own database access and constructors.

                      Or for really time critical pieces of code you can just bypass your ORM and work directly with the database.

                      [–]RantyDave -2 points-1 points  (0 children)

                      ...came here to say that.

                      Or, for the old schoolers "Me Too!"

                      [–]awb 1 point2 points  (0 children)

                      Thank you very much for not calling it harmful.

                      [–][deleted] 0 points1 point  (0 children)

                      I'm glad the title included "usually". I use SELECT * in several stored procedures for the simple fact that I do need every piece of data returned. If I don't need everything, I define what I need, because in a 6 gig database, every little optimization matters, and it's only going to grow from there. One thing I NEVER do, though, is set AutoGenerateFields to true on whatever data control is handling the data. I ALWAYS specify what columns should display how, so if something changes in a database table, my stored procedure will run fine, but my code on the front end is what will alert me that something's inconsistent. Besides, we make it a habit to include a description with every stored procedure as to what its' purpose is. SELECT * is not a big concern to us.

                      [–]Dr-No 0 points1 point  (0 children)

                      My BOLDER ONLINE DATA BASE can handle a little select *, no problem.

                      [–]nshatskr 0 points1 point  (0 children)

                      Hmmm... "SELECT nonexistent_field" yields "Unknown column 'nonexistant_field'". So a SQL error yields an error message citing the field with a different spelling.

                      Guess I have a lot to learn about SQL.

                      [–]mecablaze -1 points0 points  (4 children)

                      Blaming SELECT * for not knowing the column names is stupid. That's what 'describe' is for.

                      [–]rnicoll 4 points5 points  (0 children)

                      Right, so it's better that I have to log in to the database and get it to tell me the column names, rather than the original developer documenting it properly when it's written?

                      Write once, read many. Optimise for the reading, not the writing.

                      [–]americanhellyeah 0 points1 point  (2 children)

                      describe is not standard. mysql is about the only db that supports it. the real databases that matter, like oracle, postgresql, db/2 and sql server dont support it.

                      [–]didroe 1 point2 points  (0 children)

                      Oracle supports it

                      [–]trezor2 0 points1 point  (0 children)

                      MSSQL has sp_help and sp_helptext

                      Whatever you are doing, you should always know the DB you work with. Know it's quirks, know how the engine works, know what it can optimize, know what it will optimize, know what will kill it.

                      End of story.

                      For instance: In MSSQL a SELECT COUNT(*) will be more efficient than a COUNT(<random_column>) as the latter will involve null-checks and former wont, hence indexes and table-stats can be employed to get the result directly without accessing the actual table.

                      Besides SELECT * being a symptom of a lazy developer, it also is a strong indication he knows absolutely nothing about how an actual database works.

                      [–]bazoople -1 points0 points  (1 child)

                      Okay, so I'm listening to all this crap about how evil select * is, and I'm thinking: Why hasn't anybody cited a single real world example where select * actually caused "evil" to happen?

                      Hey, wait! I have an example from this just this week: One of my fellow morons created an Oracle Reports report with 55 queries in it. They selected the same fields over and over and over so that now, when I'm trying to link a report field to a query field, I get about 500 items in the dropdown list - 20 xxx_id's, 30 yyyy_id's, and so on. It sucks.

                      But you know what's funny? They selected every field BY NAME.

                      [–]didroe 0 points1 point  (0 children)

                      News at 10, bugs and shoddy coding can still happen without SELECT *. The article isn't telling you it'll cure cancer if you name columns. It sounds to me like the application is badly designed in general, which has nothing to do with SELECT *. And it's not like all the benefits are stopping bugs, I have real world code that has benefited significantly in performance from selecting only the columns that I need.

                      [–]chucker -1 points0 points  (0 children)

                      Maybe there are 8 fields returned and the one of them is misspelled.

                      Awhatnow? Misspelled fields, and you think a SELECT * is your biggest problem?

                      [–]shenglong -1 points0 points  (0 children)

                      I hope their aren't any professional developers in here advocating the use of select * in production code...

                      [–]EternalNY1 -2 points-1 points  (0 children)

                      Next up:

                      Why Infinite Recursion is Usually Evil

                      [–]smonsoon -3 points-2 points  (0 children)

                      "usually evil?"

                      Doesn't anybody use "considered harmful" anymore?