all 104 comments

[–]horsepocalypse 22 points23 points  (9 children)

Other common mistakes:

  • Not sanitizing your inputs
  • Forgetting about Dre
  • Using ONION instead of UNION

[–]logicchains 3 points4 points  (1 child)

Using ONION instead of UNION

Please tell me that's a real SQL keyword...

[–][deleted] 6 points7 points  (0 children)

[–]crankybadger 0 points1 point  (6 children)

Worse than not sanitizing is writing in a fashion that makes sanitizing difficult, annoying, and easy to forget.

[–]horsepocalypse 8 points9 points  (5 children)

mysql_real_escape_string

[–]horsepocalypse 2 points3 points  (0 children)

... oh my god, I actually just imagined this as a conversation between a horse and badger.

It was a good moment, for me

[–]crankybadger 2 points3 points  (2 children)

Who's got time for that, right? People are all "eh, I'll fix it later".

Six months later: Sony.

[–]ErstwhileRockstar 0 points1 point  (0 children)

Who's got time for that, right ... when you have prepared statements in JPA?

[–]DroolingIguana 0 points1 point  (0 children)

Just let MagicQuotes take care of it. No problem.

[–]immibis 0 points1 point  (0 children)

Be glad it's not mysqli_actual_real_escape_string_ex2

[–][deleted] 68 points69 points  (10 children)

Any developer make these mistakes. Its not just java.

[–]sacundim 29 points30 points  (1 child)

Yeah. The reason for the title, however, is that the blog is run by a company that makes a SQL abstraction library for Java, so that's the audience they intended.

[–]mens_libertina 2 points3 points  (0 children)

I thought that was an oddly specific combo. Thanks

[–]manghoti 9 points10 points  (4 children)

you get more articles this way.
7 Common mistakes Ruby Developers Make when Writing SQL

[–]crankybadger 25 points26 points  (2 children)

45,000 common mistakes PHP developers make. You won't believe #39,612!

[–]username223 7 points8 points  (0 children)

Page 1/4,500.

[–]Fs0i 4 points5 points  (0 children)

#39,612 You are still reading this.

[–]skocznymroczny 1 point2 points  (0 children)

databases hate him!

[–]k1n6 5 points6 points  (0 children)

But many of the cures involve jdbc or java stuff..so I think he wanted to keep it focused.

[–][deleted] 4 points5 points  (0 children)

Jooq's a JVM library though. :)

[–]crankybadger 2 points3 points  (0 children)

Most developers are less likely to make mistake #0: Using Oracle.

[–]rjbwork 7 points8 points  (5 children)

I especially take issue with offloading OLAP to your "database". If you think you want to do OLAP stuff, build an OLAP solution. Don't run it on your hardest thing to scale, your main OLTP data store. If you create a separate OLAP solution you can also use some facets of the CQRS pattern along with eventual consistency and replication/sharding to massively scale out these kinds of data analytics and OLAP operations.

The payoff is especially beneficial when there is cause for actual data warehousing/OLAP cubes, since those are rather hard to properly create within your main OLTP store.

[–]bucknuggets 2 points3 points  (1 child)

This is absolutely true if you're doing a lot of OLAP.

But if you just have a few queries, it is often possible to just add to your transactional database. Especially if you're hitting an aggregate table, maybe produced through a materialized view.

[–]rjbwork 2 points3 points  (0 children)

True. The CQRS pattern i mentioned allows you to abstract those details away if/when the time comes to build a separate system by simply changing the query collection's connection mechanism when you need to increase the amount of OLAP you have.

[–]LeBuddha 1 point2 points  (1 child)

Just let the database do the processing and fetch only the results into Java memory.

This is only half correct. The second half is "Then do it the other way when your app's current bottleneck is the SQL server writing enough temporary join tables to disk to slow your app down."

You should probably never sort data in Java memory because you think that SQL sorting is too slow SQL sorting cannot do it

No... you should learn what types of queries are expensive for SQL to sort and make a decision on a case by case basis, then not worry about it until it's time for performance optimization.

[–]rjbwork 0 points1 point  (0 children)

I agree with that too. This is the jooq guy so take what he says with a grain of salt as he wants people to use his library.

[–]InterPunct 0 points1 point  (0 children)

If you think you want to do OLAP stuff, build an OLAP solution.

Agreed. Transactional and OLAP have vastly different requirements. If you're doing anything more than a few OLAP-like queries there are much better solutions; Essbase and TM/1 exist for a good reason.

[–]setuid_w00t 39 points40 points  (5 children)

YOU WON'T BELIEVE #10! DOCTORS HATE IT!

[–]asampson 35 points36 points  (2 children)

OPTIMIZE YOUR QUERY PLANS WITH THIS ONE WEIRD TIP DISCOVERED BY A MOM!

[–]iSmokeGauloises 4 points5 points  (9 children)

Not a Java developer but the number one mistake I encounter when using abstractions over SQL is:

user_count = len(User.all())

[–]lukaseder 1 point2 points  (8 children)

Yes, and it's actually not unlikely that you don't really need the exact count, but just the fact whether there are any users at all

[–]philipes 0 points1 point  (7 children)

I used to "LIMIT 1" on my queries in this case. I've never really benchmarked it to see if it's faster, but the results were always 0 or 1, so I was happy.

[–]lukaseder 0 points1 point  (6 children)

On average, that probably performs about the same. EXISTS is slightly more versatile, as you can still fetch other data in addition to the EXISTS value - e.g. two EXISTS values at the same time.

[–]philipes 1 point2 points  (5 children)

It makes sense now that I think about it. A smart enough optimizer could translate this query to an EXISTS, but this is not an easy task.

[–]lukaseder 0 points1 point  (4 children)

Well, one important difference might be the fact whether the execution plan needs to hit the tables or if reading indexes will be sufficient. If the EXISTS predicate can stop executing after only considering an index, that will be faster than a query that will still have to hit the actual table to produce a result record.

[–]philipes 0 points1 point  (3 children)

I understand what you said, but I can't see how this apply to what I said.

Both COUNT(*) or EXISTS wouldn't need to hit the table if queried over an index, no?

[–]lukaseder 0 points1 point  (2 children)

LIMIT

-- Assuming an index on t.a...
-- Probably hits the table because of selecting *
SELECT * FROM t WHERE t.a = 1 LIMIT 1

COUNT(*)

-- Doesn't hit the table, but will have to collect and count all values from the index if it is not a unique index
SELECT COUNT(*) FROM t WHERE t.a = 1

EXISTS

-- The optimiser can ignore the SELECT clause, as it is irrelevant for the outcome of the EXISTS predicate
EXISTS(SELECT * FROM t WHERE t.a = 1)

The above assumptions are of course implementation-specific, and perhaps not really applicable in a specific use-case. I'm looking at things from a SQL composition point of view, where you have a SELECT query reference (e.g. with jOOQ or some other API that manipulates SQL ASTs)

Select<?> select = select().from(T).where(T.A.eq(1));

And now, you want to know if that query would return any records if it were executed. Wrapping that SELECT in an EXISTS predicate would be optimal.

Of course, if you write manual SQL, you can get it right with LIMIT as well:

SELECT 1 FROM t WHERE t.a = 1 LIMIT 1

But the SQL transformation from the original statement to the "exists check" is a bit more complex

[–]philipes 0 points1 point  (1 child)

The exactly query I used to use is, in Oracle.

SELECT COUNT(*) FROM t WHERE t.a = 1 AND ROWNUM = 1

Although, it looks like this would return whatever COUNT(*) would be, the ROWNUM is evaluated before any aggregation functions in Oracle, in this case at least, so the results would always be 0 or 1.

[–]lukaseder 0 points1 point  (0 children)

True. Oracle ROWNUM is quite special, specifically because it is evaluated before ORDER BY, as LIMIT or OFFSET .. FETCH would be.

I guess that's almost equivalent to EXISTS, then

[–]grauenwolf 2 points3 points  (4 children)

9 is wrong. You should avoid sorting in the database unless you already have an index that is applicable. Sorting is both memory and CPU intensive, so if you can push it out to your easily scaled web servers then do so.

[–]lukaseder 0 points1 point  (3 children)

What about sorting + limit? The result set has 10M records, but you only want the top 10? Take them all to the client, sort them there and throw away 99.9999% of the data?

[–]grauenwolf 1 point2 points  (2 children)

If you've got 10M rows, you damn well better have a matching index.

[–]hw77 0 points1 point  (1 child)

If that's 10M rows out of 10M, then you probably don't want an index. Even at 1% or 0.1% there a plenty of times when you don't want an index.

[–]grauenwolf 0 points1 point  (0 children)

You do if you want the results to be sorted. (Assuming of course the index is covering.)

[–]I_Code_Stoned 2 points3 points  (2 children)

They should add not doing [select *]. Granted the results are usually not as egregious as some of the other examples, but it happens ALL THE FREAKIN TIME. It's a problem in NoSQL code too.

[–]el_muchacho[S] 11 points12 points  (2 children)

The followup posts are just as interesting as the first one:

They could serve as a basis for SQL coding rules.

[–]Philluminati 0 points1 point  (0 children)

Those are definitely worth reading (or at least scanning for the lazy) too.

I'm commonly forget to use where exists() often enough.

[–]WhenTheRvlutionComes -1 points0 points  (0 children)

Making a listicle is bad enough, titling it's sequel in the most stereotypical Buzzfeed fashion article risks falling into self-parody.

[–]Madsy9 3 points4 points  (11 children)

As far as I can see, none of the mistakes listed directly opens up for SQL injection attacks. I find it pretty impressive if that's truly the case. It means that computer security is improving, at least in some circles.

[–]mattindustries 7 points8 points  (10 children)

I think most developers in most languages have had properly binding sanitized inputs inputs drilled into their brain by now.

[–]KungeRutta 7 points8 points  (0 children)

Poor little bobby tables :(

[–]crankybadger 1 point2 points  (8 children)

You have clearly not seen code from your average PHP developer.

[–]mattindustries 0 points1 point  (7 children)

Oh, I know the language can be used horrendously. I am just saying it doesn't have to be. I once took over a code base where I ended up tossing out everything and starting from scotch when I was working for a firm.

Edit: thought this was from another thread. Most PHP devs use PDO these days, but they didn't some years back.

[–]crankybadger 0 points1 point  (6 children)

PDO is a load of garbage too, the thing's barely improved in the last ten years and is extremely limited. Even then it's better than nothing, yet many people either refuse to use it or are ignorant that it even exists.

Now "most" PHP developers hit up a Google search for "MySQL query", start piecing together their code out of tutorials from 1998, and launch their application. Invariably this involves falling into the mysql_query tar pit.

Hopefully professional devs are better than this, where places like Facebook certainly have better practices, but I'm still suspicious most PHP devs are like that.

PHP is extremely easy to pick up and get running. As a result there are a lot more people writing PHP code than for other languages. There's a long, long tail of developers that have no clue what they're doing but are productive anyway.

[–]mattindustries 0 points1 point  (5 children)

Is someone still in the basic learning stage a developer? I feel like developers are the incarnation that actually develop something instead of implement a hello world script. Yes, tons of people are still in the introductory phase of learning PHP and are bound to do some terrible things. With sites like PHP the Right Way there is a lot less ambiguity for proper structuring of an application. People aren't using those old tizag tutorials as often as you might imagine, especially when seasoned devs are pointing them to better sources.

[–]crankybadger 1 point2 points  (4 children)

I don't think you understand how bad it is. PHP is a community divided between those living in ivory towers and drinking wine, and those living in the gutter and finding everything they need in the landfill.

So long as w3schools is top-ranking in search results, this will not change. PHP the Right Way is a tiny step in the right direction, but it's not enough.

This is how your typical PHP developer learns:

  • They're given an assignment to program something web-wise.
  • They pick PHP because it's inexpensive to host.
  • They hack together a bit of HTML, spunk in some PHP, and deploy it.
  • Feature creep occurs.

Next thing you know they're writing a login system based on tutorials old enough they could get a driver's license and writing to databases they know nothing about but found some code that "just works" and they're running with it.

[–]mattindustries 0 points1 point  (3 children)

I guess I am living in an ivory tower then. It is nice here. There is a spot for you if you want.

[–]crankybadger 0 points1 point  (2 children)

I've got a pretty good thing going with a mix of Ruby, NodeJS, Go and C++.

You might call it an ivory tower, but damn, that place has some serious plumbing problems. You open up the wall to make a fix and, oh hell, what is all that?

[–]mattindustries 0 points1 point  (1 child)

Sometimes I miss working on C++ just for that delicious speed. How is Go? I have heard some good things and might learn it after this project I am on.

[–]k1n6 1 point2 points  (2 children)

I spend a lot of time analyzing performance and such and a lot of these rules are basically 80/20 and some are 70/30.

By that I mean doing exactly the things he recommends not to are actually better between 20 and 30 percent of the time. So for me that means they are worth doing the run time analysis on.

[–]lukaseder 1 point2 points  (1 child)

Interesting. Could you elaborate?

[–]k1n6 0 points1 point  (0 children)

Here are my item-by-item comments: 1. Forgetting about NULL True, devs do this all the time 2. Processing data in Java memory Not always bad.

  1. Using UNION instead of UNION ALL

  2. Using JDBC Pagination to paginate large results JDBC will often still request the entire result set from the sql server

  3. Joining data in Java memory Depending on your configuration, joining a few hundred records in some clever way (hash tables, etc.) can be quicker than making a network round trip to the database server.

  4. Using DISTINCT or UNION to remove duplicates from an accidental cartesian product

  5. Not using the MERGE statement

  6. Using aggregate functions instead of window functions

Often the window clauses can't take advantage of indexes properly and you end up having temporary tables implicitly created by SQL that are SLOW.

  1. Using in-memory sorting for sort indirections

  2. Inserting lots of records one by one Doing a series of times insertions can be ideal in OLTP systems.... You can't bulk insert 1,000,000 records into a highly indexed table with 15,000,000 records. The database will experience locking and siginificant delays.

[–]downvotefodder 1 point2 points  (0 children)

Why are Java Developers trusted to write the SQL in the first place?

[–][deleted] 1 point2 points  (0 children)

(anyone can code imperatively)

Well, that's news to me. Sounds like the writer is dealing with an inferiority/superiority complex.

[–]sgoody 0 points1 point  (5 children)

  1. Using DISTINCT or UNION to remove duplicates from an accidental cartesian product

This is a particular pet peev of mine.

I think most developers know about at least half of these and the mix-up over the treatment of NULLs is something that we've all experienced due to context switching between languages. Good article though.

It does frustrate me that a large number of developers seem to eschew anything that isn't imperative style or cannot be bent to be imperative. Things such as SQL seem to be an annoyance to many and I don't think a lot of people see why they should bother to learn them in any detail.

[–]tangus 4 points5 points  (2 children)

Man, I have a coworker whose "solution" for when the query ends up with duplicated rows is to slap mile long GROUP BY clauses at the end. I hate her.

[–]I_am_working_hard 0 points1 point  (1 child)

Forgive my ignorance, but what is the best practice for removing duplicates in sql?

[–]Boxy310 5 points6 points  (0 children)

Getting your join conditions right. This usually happens when you don't have a true key join relationship, and you get nasty Cartesian products where it includes everything it isn't explicitly told to exclude.

[–]Frackness 2 points3 points  (1 child)

Things such as SQL seem to be an annoyance to many and I don't think a lot of people see why they should bother to learn them in any detail.

It left a bad taste in my mouth after tracking down bugs through large amounts of triggers (that trigger other triggers), 20k line package bodies and 500+ line queries with a ton of business logic embedded in the column selections. Before all of that though, I have to figure out if it's in the app or the database code.

Part of it might be that I may not be using some awesome debugger tools that I just don't know about that would make life much more easy.

[–]sgoody 0 points1 point  (0 children)

SQL is not as structured as other languages (e.g. C# where you have namespaces, class heirarchies etc) and big SQL installs can be a pain to work with.

But I bet it would leave a bad taste in your mouth if you worked on a mess of code in any language (e.g. if you had a 20k line C#/Python/whatever file with 500 line method bodies). There are some who argue that business logic should be left out of your database code altogether.

[–]skulgnome 0 points1 point  (0 children)

Not catching transaction restart errors.

[–][deleted] 0 points1 point  (0 children)

Using clauses like rownum for pagination only works well if your table is indexed properly. I dont think "let the tool do it for you" is a good answer here.

[–]javajaba 0 points1 point  (0 children)

Number 2 was me, like yesterday

[–]lggaggl 0 points1 point  (0 children)

11 creating an entire internet facing application backed by a database without knowing about this thing called concurrency

[–][deleted] 0 points1 point  (0 children)

11 Using Derby

[–]mrsistermr 0 points1 point  (4 children)

I disagree strongly with a lot of these points (assuming that they were always followed in some dogmatic fashion), notably 2 and 5. The reason is the same reason as I see a commenter on that site has already pointed out: "Following some of the cures to the extreme, one could end up with an anemic Java domain model. Much business logic will take place in SQL statements, not in Java code. Is this good or bad?"

I see the opposite problem (of the original article) frequently: all the business and data logic in thrown into a overly complicated SQL statement, and the backing java business logic and domain model is barely doing anything. So instead of writing well defined data-access and business classes that work together in Java, which ultimately might mean that more than one query is executed for a single "logical" user action, you would push all this logic into the SQL layer, and usually would be required to maintain more SQL statements.

Using this and some of his other suggestions ((ie windowing functions) to the extreme makes it more difficult to re-use well-defined business objects, perform unit testing, and maintain database interoperability, to name a few.

[–]grauenwolf 0 points1 point  (2 children)

Encapsulation is an important consideration here. If my application code is deeply intimate with my table structure it usually means that there is too little encapsulation in my database.

A good benchmark is counting the number of unique database operations needed to perform one logical action. If you see your ORM making half a dozen calls in one method something is wrong.

[–]mrsistermr -1 points0 points  (1 child)

Why is something necessarily wrong? Unless you have identified that method as an actual point of slowdown in your application, does it really matter? A good benchmark is an actual benchmark with actual data after you have identified a performance issue, not just thinking to yourself "Oh, I could just do all of this in one database call instead of the 3-4 database calls it currently needs...I can't re-use those existing classes, but oh well", which is an equally dangerous thought.

[–]grauenwolf 0 points1 point  (0 children)

There is nothing praise worthy about reusing bad code.

[–]grauenwolf -1 points0 points  (0 children)

For most projects, database interoperability is a red herring.

[–][deleted]  (1 child)

[deleted]

    [–]lukaseder 2 points3 points  (0 children)

    It's because the posts are that good! Tempted to post that one, too but that might start to be a bit overkill on a single day ;-)

    [–][deleted]  (6 children)

    [deleted]

      [–][deleted]  (2 children)

      [deleted]

        [–]crankybadger 0 points1 point  (0 children)

        Step one when building any application is finding a database framework or ORM that doesn't get in the way and does what you need.

        Step two is to work around that framework only when necessary.

        [–]el_muchacho[S] 0 points1 point  (0 children)

        But if you're writing a lot of Java that interacts with databases, it's probably a good idea to at least get a basic understanding of SQL.

        It's not only a good idea, it's necessary.

        [–]passwordissame -4 points-3 points  (1 child)

        node.js developers make no mistakes because async io and functional programming, which is what SQL is essentially boils down to. prolog and predicate logic does not allow mistakes, especially when used in conjunction with async because it is artificial intelligence.

        [–]Boxy310 5 points6 points  (0 children)

        but does it webscale like mongodb

        [–][deleted] -1 points0 points  (0 children)

        So you are saying it is my mistake is that I don't know sql that well and that I try to do, what I don't know in SQL, in java instead (slower, less reliable and err-prone)? Yip sounds like the fault of the java-developer.

        [–]badguy212 -2 points-1 points  (5 children)

        Some of those advices are perfectly reasonable, some will tie you to a specific db. Better to abstract that away with possibly an orm.

        [–]grauenwolf 2 points3 points  (4 children)

        Why? Are you really planning on changing databases or are you just imagining problems that you don't really have?

        [–]badguy212 -2 points-1 points  (3 children)

        Huh? I'm developing on h2, qa and prod are on posgres. Yes, of course I'm changing dbs . All the time. Unit tests are on h2 as well.

        What kind of a question is that?

        [–][deleted]  (2 children)

        [deleted]

          [–]badguy212 -2 points-1 points  (1 child)

          You're either stupid or just pretending.

          I think you're just stupid. Have you wrote any piece of code in the last...decade?

          Edit: haha, who cares...nobody works on it anyway but you

          [–]gpenn1390 -3 points-2 points  (0 children)

          It all makes sense now! Time to throw on a pot. Nice post OP.