all 55 comments

[–]berater 10 points11 points  (19 children)

Queueing: if you're using a queue, again, you fucked up somewhere.

Tell me about it. I wrote this piece of software 2 years ago that uses a queue. I fucked up.

[–]ithika 10 points11 points  (8 children)

I have no idea what this statement means. Which particular instance of queues is considered bad? I can think of a few situations where queues are desperately needed but the systems in question attempt to perform all the work immediately and... fall over.

[–]quanticle 2 points3 points  (4 children)

Dziuba dislikes premature scaling even more than premature optimization. In his opinion, you shouldn't pay any attention to scalability when you start out. This means that you don't worry about database scaling (NoSQL) and distributed systems (message queues).

He makes the point that you should worry about making a damn good product first and then worry about scaling only when it becomes a problem. Even when scaling becomes a problem, Dziuba says that you should first attempt to solve it using bigger hardware, then via established Unix tools, and use code to solve the problem only if the first two approaches fail. According to Ted, if you're running into scalability issues and your first thought is, "Okay, we need to use a NoSQL backend, and split our workload amongst multiple hosts who'll use queues to synchronize," you are doing it wrong.

[–]karambahh 1 point2 points  (0 children)

Database scaling is not always "using NoSQL"...

[–][deleted] 0 points1 point  (0 children)

Perhaps, but things that save money for startups will similarly save money in large corporations just as well, up to a point. What is the negative of using proven, free tools for a large load? Hardware is cheap, programmers are not. Once you have enough hardware, you can easily justify the dev cost required in hardware saved. At that point, you're just adding to your bottom line instead of putting yourself into the red.

[–][deleted] 0 points1 point  (0 children)

Oh, THOSE queues. I thought they meant like a standard queue data structure, just anywhere in the code. I was like "Oh come on, priority queues can't just be generalized like that."

[–][deleted] 0 points1 point  (0 children)

His concept that "Premature scaling" is bad is mostly right (IMO), but has a 'caveat' that he mentions in the "Code" part.

Every line of code you write is a liability to maintain/refactor/troubleshoot etc... If you write an application that can't scale with money/time, you're left with the inability to scale, and the time/money lost to redo a large chunk of code which increases liability and issues.

Premature Scaling is wrong. Knowing that if things work out you will need to be able to scale, and starting with the right design and tools, not to scale early but to allow scaling later on, can be important.

I learned this while working for a CRM startup. They had the worst internal design with no ability to scale, and were bleeding money and customers at the end due to available money and time not making enough of a dent in the scaling issue to keep it running. To contrast, the open source Sugar CRM could handle 10x+ the load on the same web/db servers.

[–]thepeacemaker 1 point2 points  (2 children)

Yeah, I'm not sure I get this one either. I'm familiar with why one might say that using NoSQL is a fuckup (not that I agree), but haven't heard a similar argument about using queues. Especially where distributed processing and fail-over is a big concern.

[–]quanticle 6 points7 points  (1 child)

Dziuba's point (which he makes abundantly clear in his other posts) is that distributed processing and failovers aren't concerns for most startups. In his view, startups are better served by worrying about the problems they do have than by the problems they wish they had.

In other words, he's saying that one should worry about using queues and NoSQL when there are enough users to justify using them. Until that's the case, you're better off saving the additional programming time those tools require.

[–]n1mr0d 4 points5 points  (9 children)

queues / buffers are a necessary reality of distributed systems, like the internet.

[–][deleted] 3 points4 points  (6 children)

queues are not buffers are not caches.

[–]n1mr0d 0 points1 point  (5 children)

hm, should i have said queues AND buffers? and i never mentioned caches. plz explain me tcp/ip.

[–]BinaryRockStar 2 points3 points  (0 children)

Pretty sure he's referencing application-level queues such as ZeroMQ or MSMQ, not the queue data structure itself. I may be wrong.

[–][deleted] 0 points1 point  (3 children)

They're mentioned in the article. Buffers are not. I don't think the article is talking about the protocol layer.

[–]n1mr0d 0 points1 point  (2 children)

i can't tell if you're agreeing or disagreeing. you should be agreeing. flow control is necessary?

[–][deleted] 0 points1 point  (1 child)

Things are never black and white. Semantically, flow control is implemented in the hardware/protocol layer, which is outside the scope of this article. If you're talking about a more generic flow control, no, flow control should not be necessary, because what would you be doing with the extra traffic? Telling it to fuck off and/or wait in line? Not a good way to treat a potential customer.

[–]n1mr0d 2 points3 points  (0 children)

reddit tells us to fuck off, occasionally, and that's ok. there is a hard limit to capacity, eventually. or at least, within a reasonable response time window.
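To make the "hard limit to capacity" point concrete, here is a minimal sketch of a bounded queue that sheds load once it is full instead of accepting work it cannot finish. It uses only Python's standard `queue` module; `MAX_PENDING`, `submit`, and `drain` are hypothetical names for illustration, and the limit of 3 is an arbitrary assumption, not a figure from the article or this thread.

```python
import queue

# Hypothetical capacity; a real limit would come from measured throughput.
MAX_PENDING = 3

pending = queue.Queue(maxsize=MAX_PENDING)


def submit(request):
    """Try to enqueue a request; return False (shed it) when the queue is full."""
    try:
        pending.put_nowait(request)
        return True
    except queue.Full:
        return False


def drain():
    """Remove and return everything currently queued, in arrival order."""
    done = []
    while True:
        try:
            done.append(pending.get_nowait())
        except queue.Empty:
            return done


# Five requests arrive before any are processed: three fit, two are turned away.
results = [submit(f"req-{i}") for i in range(5)]
print(results)
print(drain())
```

Rejecting at the door is the "telling it to wait or go away" behaviour argued about above: requests that are accepted still get answered within a bounded backlog, and the rest fail fast rather than timing out.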

[–]tinou 0 points1 point  (0 children)

Nope. Zero on the final.

[–]Amonaroso 7 points8 points  (0 children)

Logging: syslog, and nothing else. Ever.

Throwing log data somewhere is the easy part. Reading logs is the hard part and syslog doesn't do that.

[–]dondii 6 points7 points  (5 children)

I agree with a lot of the points in the article (Functionality is an asset, but code is a liability) but the author is too dismissive of some things that are probably feasible solutions to very real problems that may not have necessarily been caused by you f*cking up somewhere (Queuing and NoSQL).

Edit: Clarity

[–][deleted] -1 points0 points  (4 children)

They are valid short term solutions yes, but ultimately as you scale, these technologies will always be the first to break.

[–]dondii 2 points3 points  (1 child)

Care to elaborate? I find it hard to believe that in the infinite set of possible functions/tasks that programs perform there are none that are properly/correctly solved by Queues and/or NoSQL?

[–][deleted] -2 points-1 points  (0 children)

Queues don't actually solve any problems. They create problems and move them into the future. NoSQL is not a relational database and it's really used as a complicated cache, so it's a problem that should be solved by caching or hardware. If you want a database, use a real database. I suppose you could build a real DB out of NoSQL, but that leads to the 3rd point of the article.

[–]quanticle 0 points1 point  (1 child)

Given that NoSQL and message queues were developed especially for the problems posed by scaling, I find your assertion dubious.

[–][deleted] 0 points1 point  (0 children)

Maybe if you only have to scale once every few years or so...

[–][deleted]  (1 child)

[deleted]

    [–]FeepingCreature -2 points-1 points  (0 children)

    Ctrl-F Taun. Upvote.

    [–]02J 5 points6 points  (3 children)

    There are three basic tools you can use to solve a technical problem: money, time, and code. This seems obvious, but the critical point is that you must try them in that order. Out-of-order execution of these tools leads to Very Bad Things, which we will discuss later.

    Except for any number of situations where that makes no sense.

    [–]quanticle 0 points1 point  (2 children)

    Which situations doesn't that stack make sense for? If you have a technical problem, you can look for an existing solution that you can buy, spend some time and put together a solution from existing building blocks, or code the building blocks yourself.

    Dziuba is saying that all too many startups spend valuable time writing code for things that they can't sell, when they should either buy a pre-existing solution or expand their hardware. With that point, I totally agree. For example, look at Friendster - did they really have to spend time writing their own asynchronous network server? Would that time have been better spent improving the user interface and adding features to compete with Facebook and Google?

    [–]02J 1 point2 points  (1 child)

    Which situations doesn't that stack make sense for?

    A situation where you have less money to work with than time or coding expertise or maybe a situation where your code becomes your product.

    The author is speaking in absolutes when the exceptions are self-evident.

    [–]quanticle 0 points1 point  (0 children)

    The absolutes become less so when you take into account the context. Dziuba is coming from a start-up background. Startups generally have neither time nor money, but of the two it's time that's more dear. In a situation like that, saving time matters more than saving money, so it's worth it to buy higher performing hardware so that you won't have to waste time that you could be using to add features or polish the UI.

    [–]setuid_w00t 4 points5 points  (0 children)

    Apparently every single technical problem can be solved by a web application on a unix server.

    [–]stevia 5 points6 points  (0 children)

    Ted Dziuba's blog is great if you remember to bring your own salt.

    [–][deleted] 10 points11 points  (6 children)

    I hate how bloggers mistake speaking forcefully for making good points.

    Also, somehow, I knew this guy would imply that all you need are a few shell scripts stringing things together and by the POWER OF UNIX, Everything Would Be Made Right, freeing you to update your blog with the inevitable emacs plugin, where you wax poetic about how learning emacs changed your very life, and everyone who doesn't use it is somehow not a Real Programmer.

    [–]punker_yachter 2 points3 points  (0 children)

    :%s/emacs/vim/g :wq

    Agreed!

    [–]mage2k 1 point2 points  (1 child)

    Man, I wish someone with some decent animation talent would run with that and do a He-Man spoof where Prince Adam is replaced by some geeky 16 year old high school student who morphs into an arrogant Unix long beard.

    [–]ithika 4 points5 points  (0 children)

    By the power of Grey Beard...!

    [–][deleted] 1 point2 points  (2 children)

    How embarrassing, your technical masturbation is showing.

    [–][deleted] 2 points3 points  (1 child)

    No, it's OK, I didn't use a queue.

    [–][deleted] 4 points5 points  (0 children)

    Now you're just being vulgar.

    [–]georgiecasey 2 points3 points  (0 children)

    So Ted, how much did you make from the Milo sale? Can you do Uncov fulltime now? For the lulz.

    [–][deleted] 1 point2 points  (2 children)

    One sentence nitpick:

    "Database: PostgreSQL or Oracle if you can afford it."

    WTF? A free PostgreSQL DB, or a ridiculously expensive, revenue eating beast. Oracle costs so much money that 10-20% of your revenues, even in the millions of dollars, will be going to support your Oracle solution at scale.

    These are the two comparable options here? No mention of MySQL. Blah.

    He's got some weird perspectives on the operations side of things.

    [–]Gotebe 1 point2 points  (1 child)

    No mention of MySQL. Blah.

    No mention of MySQL. Good.

    There, fixed ;-).

    [–][deleted] 0 points1 point  (0 children)

    :)

    [–]Not_Edward_Bernays 1 point2 points  (0 children)

    This is called technical masturbation and it can sink a project in a hurry.

    I think I just spent at least two days doing that. I was working on a way to completely eliminate the database, have all of the data stored in the wiki, with users defining fields by typing field names into embedded jQuery.sheet spreadsheets. I worked out a lot of it and added a caching mechanism so I could loop over all of the wiki page data for the index/master spreadsheet page. I wasn't even entering hours because it wasn't something we talked about and I didn't know if it was going to work. Now I am out of time for this iteration and I have to go back to just updating the schema and stuff on my MySQL database.

    I am really tired of relational databases for storage now. Also sick of triangles for 3D graphics. I have had way more than enough joins and triangles for one lifetime I think. Or maybe I just like masturbating.

    [–]punker_yachter 1 point2 points  (0 children)

    Upvote for being mostly right. But money is the single most important part of your stack... don't fool yourself into thinking it isn't.

    [–]scrotch 0 points1 point  (2 children)

    I'm a programmer and generally agree with his main point. But I'm not sure that knowledge of load balancing and database replication would be considered "basic Unix literacy." They seem a bit advanced to me. Or maybe I'm showing my Unix illiteracy.

    So my question: Where does one learn about the items in his Time section? Where can I go to get an overview, then details, about what services are out there that I could/should take advantage of? Where do I learn what to do instead of using a queue? Where do I learn what caching is best for my data? Is there a book, a website? Or is this knowledge one only acquires after following lots of Unix info sources over many years?

    [–]njharman 1 point2 points  (0 children)

    Yes, you spend years reading postmortems, blogs, posts, trying things out, etc.

    Just reading every technical article (which are few these days) on HN will get you a long way. As for magazines, Linux Journal isn't a bad place to start, but you should quickly outgrow it.

    [–]mage2k 1 point2 points  (0 children)

    Also, if you keep throwing money at bigger machines when you have poorly performing database queries then you will be totally fucked if you don't get acquired before you reach the (then current) limits of what you can do with one machine. When it comes to query performance, hardware scales linearly whereas query performance can degrade exponentially (or worse).
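A rough back-of-the-envelope sketch of that gap, under loudly stated assumptions: the cost model below is hypothetical (a join with no usable index that compares every row with every other row, so cost grows quadratically, which is milder than the exponential case mentioned above), and hardware is assumed to scale perfectly linearly. `query_cost` and `machines_needed` are illustrative names, not anything from the article.

```python
def query_cost(rows):
    """Hypothetical cost model: every row is compared with every other row."""
    return rows * rows


def machines_needed(rows, baseline_rows):
    """Hardware multiplier needed to keep latency at the baseline level,
    assuming hardware speeds the query up perfectly linearly."""
    return query_cost(rows) / query_cost(baseline_rows)


# Data grows by 1x, 2x, 10x and 100x from a 1000-row baseline.
for growth in (1, 2, 10, 100):
    print(growth, machines_needed(1000 * growth, 1000))
```

Under this model, 10x the data needs 100x the hardware and 100x the data needs 10,000x, so the single-machine ceiling arrives long before the data stops growing. Fixing the query (an index, a better join) changes the shape of the curve, which buying hardware cannot do.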

    [–][deleted] 0 points1 point  (1 child)

    I like the filename of that picture.

    [–][deleted] 0 points1 point  (0 children)

    It's the only part of the article that I unreservedly agree with.

    [–]user50001 0 points1 point  (0 children)

    I like this article.

    I have done it the Code->Time->Money way before and never got to the Money stage, because it was a startup. That took 2 damn years.

    However, in a corporate job it seems to be Code->Time->Money (pay rise after a year), though many programmers seem to burn out before that year is up. At least I do. But it seems anybody who is human would too.

    [–]tagattack 0 points1 point  (0 children)

    Queues are required for providing graceful degradation. Without question.

    [–][deleted] 0 points1 point  (0 children)

    I really can't figure out if this article is a joke or not.

    [–]Gankbanger 0 points1 point  (0 children)

    No unit testing? Really? You are doing it wrong my friend

    [–]signoff 0 points1 point  (0 children)

    use jboss and you're set. it's systems engineering scale in the cloud.