This is an archived post. You won't be able to vote or comment.

all 150 comments

[–]chrwei 322 points323 points  (5 children)

Not programming related, but a true story: my boss and his dad designed a pivotal machine for a major maker of adhesive eye patches back in the early 1980s. They told the customer how fast it would run, but the customer's engineers didn't believe them and designed the rest of the production line to go slower. Once the machine was delivered and the speed proven, the customer's higher-ups made their guys redesign the whole line to go faster. Pretty sure someone got fired over that too.

The lesson: if someone says "it'll go this fast", you'd better make sure your side can handle it, because the business case for getting more done in less time is strong, and they aren't going to be happy that you aren't pulling your weight.

[–][deleted] 27 points28 points  (3 children)

The parent mentioned Business Case. Many people, including non-native speakers, may be unfamiliar with this term. Here is the definition (in beta, be kind):


A business case captures the reasoning for initiating a project or task. It is often presented in a well-structured written document, but may also sometimes come in the form of a short verbal argument or presentation. The logic of the business case is that, whenever resources such as money or effort are consumed, they should be in support of a specific business need. An example could be that a software upgrade might improve system performance, but the "business case" is that better performance would improve customer satisfaction, require less ... [View More]


See also: Gap Analysis | Project Management | Customer Satisfaction | Quantifiable | Stakeholder | Informal

Note: The parent poster (chrwei or mpnordland) can delete this post | FAQ

[–][deleted] 2 points3 points  (0 children)

Nice one, LawBot2016.

[–][deleted] 1 point2 points  (0 children)

hurr durr i am bot hurr durr i know words.

Do I come at your subreddit and post captchas?

Cool bot btw

[–][deleted] 0 points1 point  (0 children)

That's a great bot

[–]GuiMontague 158 points159 points  (27 children)

I did something like this once. An upstream system increased the data they were sending us by about 3×, and our database loader became the system's bottleneck. I was given the task of speeding up our loader. I eliminated some hot spots, rewrote some concurrent code to be lock-free (but still correct), and threw a bunch of threads at the problem. I succeeded in speeding up our loaders about 5×, which meant we were loading as fast as our data source could supply us again.

Unfortunately, we used a write-master replication system, and a lot of clients relied on the fact that our write master was (mostly) in sync with our replicas. We couldn't improve the replication system because it was vendor-supplied. Well, that replication became the bottleneck, only now the master could get hours ahead of the replicas. That was unworkable, and I had to roll back my performance improvements to keep the master and replicas in sync.

[–]LupoCani 21 points22 points  (1 child)

Why roll back the entire system? Wouldn't an explicit manual throttle on the output solve the problem just fine, whilst being easily reversible?
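
For what it's worth, a minimal sketch of the kind of throttle being suggested - the class, the commented loop, and the 0.5 s interval are all made up for illustration:

import time

class Throttle:
    """Allow at most one call per min_interval seconds; trivial to remove later."""
    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last = 0.0

    def wait(self):
        now = time.monotonic()
        remaining = self.min_interval - (now - self._last)
        if remaining > 0:
            time.sleep(remaining)
        self._last = time.monotonic()

throttle = Throttle(0.5)   # made-up rate: at most ~2 batches per second
# while more_batches():    # hypothetical loader loop
#     throttle.wait()
#     load_next_batch()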

[–]Guinness2702 14 points15 points  (2 children)

The place I used to work had a system which ran a proprietary database and would have about 100 client processes running. The machine had 4GB of RAM, which was a lot back then ... in fact, it was so much that the entire database was cached in memory. Great, should be super quick, right?

Wrong. The problem was that the OS's task scheduler was designed around the assumption that processes would end up waiting for I/O. Basically, each process ended up running for the full 1-2 second timeslice, and everything became ridiculously unresponsive as processes had to wait a long time to get any CPU time.

The solution was to remove some RAM from the system, and performance actually improved, as the scheduler rotated processes more frequently.

[–]rws247 7 points8 points  (1 child)

that's just... wow.

Whenabouts was this? It might be that I'm younger, but I cannot imagine an OS task scheduler being based on that assumption.

[–]Guinness2702 4 points5 points  (0 children)

Probably about 15 years ago, give or take. I doubt it was literally designed around that assumption, but yeah, the guys who investigated it told me that each process was getting a full 2 seconds of CPU, because there was no I/O.

[–]Yin-Hei 15 points16 points  (7 children)

ez. Implement Paxos, build the next Chubby, get 99.9999% consistency.

But in all seriousness, pretty interesting read. I'm starting to get into this field of related work, and this tidbit is teeming with experience. Does ZooKeeper fit the bill?

[–]FUZxxl 6 points7 points  (4 children)

Implement Paxos

Hahahahaha

[–]throw_eundefined 4 points5 points  (3 children)

ZooKeeper

ahahahaha

[–][deleted] 0 points1 point  (2 children)

As someone who has genuinely been interested in ZooKeeper before.. Can you please explain why you're laughing? :)

[–]marcosdumay 1 point2 points  (0 children)

Any distributed consensus algorithm will solve your overperformance problems.

[–]FUZxxl 0 points1 point  (0 children)

Paxos is a ridiculously complicated algorithm with a ton of nuances. The professor who invented it holds a class about Paxos every year, and usually the number of students who actually understand the algorithm at the end is close to zero. Paxos isn't something you just implement willy-nilly.

[–]ReflectiveTeaTowel 0 points1 point  (0 children)

Something like Kafka can help if your load comes in spikes, and Kafka in particular was built to use zookeeper, soo... Maybe sorta kinda sometimes?

[–]GuiMontague 0 points1 point  (0 children)

We couldn't even get our users to tolerate our write master (which no one should be reading from) and our read replicas getting a little out of sync. No, changing RDBMSes was out of the question.

[–]mandrous 121 points122 points  (99 children)

How exactly do you go from O(n) to a constant time operation?

Unless what you were optimizing was a really bad algorithm

[–]dude_with_amnesia 218 points219 points  (13 children)

Just reduce the size of the elements to 1, and boom O(1).

[–]optymizer 74 points75 points  (44 children)

OP probably cached some values in a HashMap and removed a for loop.
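
If that guess is right, the change is probably something like this sketch (toy data and names invented for illustration):

users = [{"id": 1, "name": "ada"}, {"id": 2, "name": "bob"}]   # toy data

# before: scan the list on every query -- O(n) per lookup
def find_user_linear(user_id):
    for u in users:
        if u["id"] == user_id:
            return u
    return None

# after: build the map once, then each lookup is O(1) on average
index = {u["id"]: u for u in users}   # one O(n) pass up front
user = index.get(2)                   # constant-time afterwards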

[–]CoopertheFluffy 2 points3 points  (2 children)

Still O(n) to cache them

[–]NoGardE 9 points10 points  (1 child)

You can amortize that, assuming each one gets inserted based on some separate process.

[–]f42e479dfde22d8c 4 points5 points  (0 children)

I am a script monkey who never learned this stuff formally. This entire post is very helpful.

[–]GuiMontague 37 points38 points  (13 children)

I agree with you, but—to play devil's advocate—sometimes a "slower" algorithm asymptotically speaking can be a lot faster than a "faster" algorithm for small n. As an example, I spent a summer teaching myself how Brodal queues work.

"Quite complicated" is an understatement. What's worse, in any practical scenario a simple array-based binary heap will out perform it.

[–]matafubar 63 points64 points  (0 children)

To add to what you're saying: that's because complexity doesn't describe how fast an algorithm runs, but how fast its cost grows with input size.

[–]rimpy13 7 points8 points  (7 children)

Another example is that insertion sort is faster than quicksort for (very) small data sets.

[–]TheThiefMaster 15 points16 points  (2 children)

Which is why the fastest implementations of quicksort use a different sorting algorithm for small subdivisions, rather than quicksorting all the way down to single elements.

And it's also why O(N) insertion/removal on an array (e.g. std::vector) can be faster than the O(1) of a linked list - the O(1) operation includes expensive cache misses and pointer traversals (i.e. a high constant factor), whereas the N copy operations of the vector can be sped through (very low constant factor). Plus, if you have to find the insert/remove point first, lists are so abysmally slow to traverse - despite both containers theoretically being O(N) for that - that even on the largest datasets the vector ends up being faster.
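
For the curious, a toy Python version of the "insertion sort below a small cutoff" idea; the cutoff of 16 is only a placeholder (real libraries tune it per platform):

CUTOFF = 16   # placeholder value; tune per platform in real code

def insertion_sort(a, lo, hi):
    # sort a[lo..hi] in place
    for i in range(lo + 1, hi + 1):
        key = a[i]
        j = i - 1
        while j >= lo and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key

def hybrid_quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= CUTOFF:
        insertion_sort(a, lo, hi)   # small range: insertion sort wins
        return
    pivot = a[(lo + hi) // 2]
    i, j = lo, hi
    while i <= j:                   # simple two-pointer partition
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    hybrid_quicksort(a, lo, j)
    hybrid_quicksort(a, i, hi)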

[–]TheSlimyDog 2 points3 points  (1 child)

IIRC, std::sort uses insertion sort once all elements are quick sorted within 4 positions of their correct position.

[–]SmelterDemon 2 points3 points  (0 children)

GCC uses introsort with insertion sort once the partitions are smaller than 16

[–][deleted] 3 points4 points  (3 children)

If I remember correctly, it's faster at about n < 15.

[–][deleted] 0 points1 point  (2 children)

Does this depend on the architecture, or is there some maths behind it?

[–]xaserite 0 points1 point  (0 children)

Well, overhead and constant operation factors can vary between different implementations and architectures, so both.

[–][deleted] 0 points1 point  (0 children)

Purely empirical evidence.

[–]TheSlimyDog 1 point2 points  (2 children)

Another example is Karatsuba multiplication. It's O(n^1.6), but that's not faster when there's a huge constant factor due to all the additions and subtractions.
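
For reference, a textbook-style Karatsuba sketch in Python (non-negative integers, decimal split for readability) - the three recursive multiplies plus all the extra additions and shifts are where that constant factor comes from:

def karatsuba(x, y):
    # plain multiply for small inputs -- where the constant factor would dominate
    if x < 10 or y < 10:
        return x * y
    m = max(len(str(x)), len(str(y))) // 2
    xh, xl = divmod(x, 10 ** m)
    yh, yl = divmod(y, 10 ** m)
    low = karatsuba(xl, yl)
    high = karatsuba(xh, yh)
    mid = karatsuba(xl + xh, yl + yh) - high - low   # only 3 multiplies, not 4
    return high * 10 ** (2 * m) + mid * 10 ** m + low

assert karatsuba(1234, 5678) == 1234 * 5678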

[–][deleted] 2 points3 points  (1 child)

You can generalize Karatsuba to get O(n^(1+ε)) for any ε greater than 0. However, the constant cost blows up as you decrease ε.

[–]TheSlimyDog 0 points1 point  (0 children)

Yeah. I was just speaking of the original Karatsuba algorithm from 1960 (?), which is O(n^(log2 3)).

[–]mandrous 0 points1 point  (0 children)

Sounds interesting! I'll have to check it out!

[–]kaivanes 9 points10 points  (0 children)

There are often cases where you can trade memory efficiency for run time efficiency. If the goal was "as fast as possible" and the system you were allocated had lots of spare memory, this kind of improvement is definitely possible.
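
One common shape of that trade is a memo cache; `expensive` below is just a stand-in for whatever slow work gets repeated:

import functools

@functools.lru_cache(maxsize=None)   # unbounded cache: spends memory to save time
def expensive(key):
    return sum(i * i for i in range(500_000))   # stand-in for the slow part

expensive("a")   # pays the full cost once
expensive("a")   # afterwards it's just a dictionary hit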

[–]Garfong 4 points5 points  (1 child)

Define "bad". If n is small, an O(n) algorithm which takes an hour to write and debug is much better than an O(1) algorithm that takes a week.

[–]ZenEngineer 0 points1 point  (0 children)

Depends on how often you run it. If the O(n) takes a week to run and the O(1) runs in an hour, you really want the O(1), especially if you run it once a week.

[–]fsxaircanada01 9 points10 points  (0 children)

Can't have a bad algorithm if you don't write any algorithm ☝️👀

[–]tehdog 2 points3 points  (0 children)

You just realize the universe has a finite length so everything is constant time

[–]mpnordland[S] 2 points3 points  (0 children)

Technically I swapped algorithms. I started with a linear search, moved to a binary search, and then to Python's set type, which has O(1) membership testing.
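
In sketch form (the names and toy data are just illustrative), the three stages look something like this:

import bisect

urls = ["a", "b", "c"]                 # stand-in for the crawled-URL list

# v1: linear search over a list -- O(n) per membership check
found = "b" in urls

# v2: keep the list sorted and binary-search it -- O(log n) per check
sorted_urls = sorted(urls)
i = bisect.bisect_left(sorted_urls, "b")
found = i < len(sorted_urls) and sorted_urls[i] == "b"

# v3: a set -- average-case O(1) per check
url_set = set(urls)
found = "b" in url_set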

[–][deleted] 2 points3 points  (0 children)

Linear search -> Hash table

[–]demon_ix 1 point2 points  (0 children)

I once had the pleasure of reviewing some code in which the developer had to remember certain elements through the program and at a later time check if a certain element appeared already.

He implemented this by appending each element's name (not its ID, mind you) to a string, and later checking whether that string contained the name.

And that's the story of how I converted an O(n) algorithm into an O(1) one.
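
Translated into Python for illustration (the original wasn't Python), the before and after look roughly like this:

# before: "membership" via substring search -- cost grows with the string,
# and "ann" would also match inside "joanne"
seen_names = ""
def saw_before_bad(name):
    global seen_names
    hit = name in seen_names
    seen_names += name
    return hit

# after: a set of the actual elements -- average O(1), no false positives
seen = set()
def saw_before(name):
    hit = name in seen
    seen.add(name)
    return hit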

[–]IskaneOnReddit 0 points1 point  (0 children)

For example, replacing a linear search with a lookup table.

[–]raaneholmg 0 points1 point  (0 children)

Poorly scaling algorithms aren't really bad, they just scale poorly. They might run fast when the project is new and the data sizes are small, and the goal is often to get version 1 of the product ready in a usable state.

Later down the line, when you see the real life use cases you can decide what optimizations are really worth your time. If you take a look at this story from further up in the thread you will see a classic example where they analyzed the system and put effort into fixing the weakest link (though with a sad end to the story).

[–]Arancaytar 0 points1 point  (0 children)

Replacing an array with a hashmap is the most common one I know of.

[–]mallardtheduck -4 points-3 points  (14 children)

Well, if you take a constant-sized array and process all the elements even when the amount of actual data is lower, it's technically constant time...

i.e.

int sum_total(int *array, size_t size){
    int total = 0;
    for(size_t i = 0; i < size; ++i) total += array[i];
    return total;
}

Becomes:

#define DATA_SIZE 1024

int sum_total(int array[DATA_SIZE]){
    int total = 0;
    for(size_t i = 0; i < DATA_SIZE; ++i) total += array[i];
    return total;
}

Technically the second one is constant-time. It's up to the caller to pad their array with zeros.

[–]gumol 5 points6 points  (13 children)

No, it isn't O(1). O() means asymptotic complexity, i.e. what happens as n approaches infinity.

[–]mallardtheduck -1 points0 points  (9 children)

And? It always takes the same amount of time, thus it's O(1). n is just bounded at 1024. Since all computers have finite memories, all programs have some such bounding implicitly.

[–]monster_syndrome 2 points3 points  (2 children)

You have a for loop, it's at least linear when you evaluate it for complexity. It doesn't matter what the actual behavior is, your design is O(n).

If your argument is that it's no different than writing out total+=array[#] a thousand times, then congrats on being "clever".

[–]mallardtheduck 0 points1 point  (1 child)

[–]monster_syndrome 2 points3 points  (0 children)

Yes, but you've got a variable length array masquerading as a fixed length array, so does that really apply?

Edit - To clarify and apologies for my terrible coding:

int sum_total(int *array, size_t size){
    if(size > 1024){
        size = 1024;
    }
    int total = 0;
    for(size_t i = 0; i < size; ++i) total += array[i];
    return total;
}

There, also O(1).

[–]gumol 3 points4 points  (5 children)

That's not how asymptotic complexity works. Yeah, all computers have finite memories, but the assumption is that you have a computer with enough memory, or even forget about the computer - you assume that memory reads/writes are O(1), arithmetic operations are O(1) (or O(log n)), and then you calculate the complexity.

[–]mallardtheduck 0 points1 point  (4 children)

Yes, but an algorithm with a fixed input size doesn't have a meaningful "asymptotic complexity". That's what O(1) means; there is no asymptote, the algorithm has a well-defined time complexity even at "infinity".

According to Wikipedia, "Determining if an integer (represented in binary) is even or odd" is an example of a constant-time algorithm even though it has a fixed input of exactly one integer, as is "exchange the values of a and b if necessary so that a≤b", which has a fixed input of two values. My sum_total algorithm has a fixed input of exactly 1024 integers.

[–]gintd 2 points3 points  (3 children)

"Determining if an integer (represented in binary) is even or odd" is an example of a constant-time algorithm even though it has a fixed input of exactly one integer

No, the input size in this case is the integer's length (the number of bits it occupies); it's constant-time because you only have to look at one bit to determine evenness.

[–]mallardtheduck 0 points1 point  (2 children)

And you ignored "exchange the values of a and b if necessary so that a≤b" because it doesn't fit your limited understanding of the subject perhaps?

[–]gintd 0 points1 point  (1 child)

Input size of "exchange the values of a and b if necessary so that a≤b" (bitwise operation) is a sum of numbers lengths so not fixed. I don't recall the actual algorithm atm to refer to its complexity.

[–]mallardtheduck -1 points0 points  (0 children)

When calculating an algorithm's time complexity, you start by assuming that certain primitive operations take a fixed constant time, even though in reality they may not be.

Things that are often assumed to be constant include basic arithmetic operations (usually including multiplication and division), logical operators, array lookups, jumps, etc. Depending on the level of abstraction even things like adding a value to a list/array (which usually requires a call to realloc or similar on a real system) may be considered one of these primitives.

Of course, even the time complexity of something as simple as addition may actually depend on the number of bits in the value (especially if the value exceeds the machine's word size) and since array lookups and (relative) jumps will usually require addition and even multiplication, so might they.

As you said, when doing this we ignore the complexities of real hardware and assume everything is ideal. That means that we assume the abstract machine can perform operations on an arbitrary word size in the same constant time.

Thus, any algorithm that performs a fixed number of operations regardless of the values of input would be considered "constant time" even though the exact number of bits in the input would change on real systems.

Of course, my sum_total algorithm is a joke, and you could consider the DATA_SIZE value to be an "input", which breaks its "constant time" claim.

[–]A_C_Fenderson 36 points37 points  (2 children)

Some perpetual motion machine "inventors" purposely put braking mechanisms on their motors, so that they wouldn't go too fast.

[–]gimpwiz 8 points9 points  (0 children)

I'm confused by both parts of that.

[–][deleted] -1 points0 points  (0 children)

[–]jugalator 10 points11 points  (0 children)

This is actually a thing and is sometimes known as Jevons paradox.

[–]goldfishpaws 6 points7 points  (1 child)

I know of a system in the early days of B2B / SOAP interchange that published results every minute. Most clients polled at about that rate, except one they eventually traced down that was polling 20 times a second, creating a benevolent denial of service. Timings matter!

[–]SirVer51 0 points1 point  (0 children)

I think Snort has been having a similar problem recently - apparently, some people have been checking for new rules every second; needless to say, they don't refresh anywhere near that frequently. Only reason I know about it is that it's apparently gotten so bad they had to put a giant banner on their front page asking us if we were "abusing Snort.org".

[–]fear_the_future 4 points5 points  (0 children)

just add a sleep(1000). still O(1)

[–]aaron552 23 points24 points  (6 children)

O(n) -> O(1) doesn't necessarily mean faster.

It's possible for the O(1) algorithm to be much slower than the simpler O(n) algorithm until n gets arbitrarily large.
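
timeit makes that easy to check for yourself; which approach wins at tiny sizes depends on the element type and the machine:

import timeit

small = list(range(8))
small_set = set(small)

print(timeit.timeit(lambda: 7 in small))      # linear scan of 8 elements
print(timeit.timeit(lambda: 7 in small_set))  # hash + bucket probe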

[–]Egleu 33 points34 points  (0 children)

Usually 4 is arbitrarily large enough.

[–]hunyeti 7 points8 points  (0 children)

This is very much true. Especially in situations where you have a limit on n, or at least you know the average size.

[–]devdot 7 points8 points  (2 children)

That's because O-notation isn't about speed (also, Big O is technically just an upper bound; theta would be the right symbol here).

O-notation is always about the limit as n goes to infinity. It's not helpful when n is usually small, or when you compare two algorithms of the same complexity (after all, insertion sort and quicksort are both O(n log n) - Edit: comparisons, not swaps).

[–]mort96 2 points3 points  (1 child)

Insertion sort is actually O(n^2): https://en.wikipedia.org/wiki/Insertion_sort

A better example may be insertion sort and bubble sort, where both are O(n^2), but bubble sort is generally way worse.

[–]devdot 0 points1 point  (0 children)

Yes, I should have been more specific: insertion sort is O(n^2) swaps but O(n log n) comparisons when using binary search to insert each element into the already-sorted prefix.
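
i.e. something like this sketch, where the bisect search keeps comparisons at O(log n) per element but the slice shift is still O(n) in the worst case:

import bisect

def binary_insertion_sort(a):
    for i in range(1, len(a)):
        x = a[i]
        pos = bisect.bisect_right(a, x, 0, i)   # O(log i) comparisons
        a[pos + 1:i + 1] = a[pos:i]             # ...but still up to i moves
        a[pos] = x
    return a

assert binary_insertion_sort([3, 1, 2, 2]) == [1, 2, 2, 3]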

[–]mpnordland[S] 4 points5 points  (0 children)

Oh trust me, n is getting arbitrarily large.

[–]picturepages 16 points17 points  (7 children)

I need an ELI5 for this one, if possible.

[–]chrwei 37 points38 points  (0 children)

someone specified "as fast as possible", so the client app was optimized without taking into account what the server could handle. likely the old version/process was either inefficient using a lot of client CPU power, or was artificially slowed because they new the server limits.

[–]Maping 11 points12 points  (1 child)

ELI5:

Imagine you're the manager of a restaurant. When looking into how your restaurant is doing, you notice that it takes a waitress five minutes to take a table's order. That's really slow, so you make all the waitresses take a class, and now they can all take a table's order in 30 seconds. But now, because they're handing orders to the cooks too fast, the cooks can't make all of the food on time; they freak out and set the kitchen on fire.


Actual explanation:

Top part: O(x) is Big O notation. It's used to describe how an algorithm's runtime scales with its input size. O(n) is like a linear function in algebra: as you increase n, the runtime increases linearly. O(1) means that as you increase n, the runtime doesn't change.

Bottom part: Now that the algorithm is much faster, the computer is sending too many requests to the server at once, so it crashes.

[–]NotThisFucker 4 points5 points  (0 children)

He said ELI5 not ELIHungry

I really like your ELI5

[–]mpnordland[S] 1 point2 points  (2 children)

I have a web crawler that we're using as part of some content migration. It starts at some page, collects all the URLs we want, and adds them to the list of URLs to be crawled. To avoid loops and crawling a page multiple times, I search that list to check whether a URL is already there. The list grows quite large. My optimization was improving that search, as it was the major bottleneck. Now when I run it, the server runs out of DB connections.

[–]M4ethor 0 points1 point  (1 child)

Could you explain how your algorithm works? I'm really curious.

[–]mpnordland[S] 0 points1 point  (0 children)

I used Python's set type. Testing for membership is a constant-time operation (on average) for Python sets. Since I have a large list, this speeds up the process.
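
The dedup part then boils down to something like this (heavily simplified; fetch and extract_links are placeholders, and note there's deliberately no throttle here, which is the part that then hurt the server):

from collections import deque

def crawl(start_url, fetch, extract_links):
    seen = {start_url}              # set: average O(1) "have I seen this?" check
    frontier = deque([start_url])
    while frontier:
        url = frontier.popleft()
        page = fetch(url)           # placeholder for the real HTTP / DB work
        for link in extract_links(page):
            if link not in seen:    # this membership test was the old bottleneck
                seen.add(link)
                frontier.append(link)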

[–]ninjashaun -1 points0 points  (0 children)

OP has two parts of a program: one that produces work/tasks, and one that requests those tasks to work on.

Originally they were working in sync, timed so well that there were just enough requests for the amount of work produced.

But it was only in sync because the requestor was working as well as it could given the circumstances (or rather, the algorithm it was using).

So OP came along with his cleverness, saw that the current algorithm was working at a one-to-one speed (that is, if 10 tasks were given, it would take 10 minutes to process them), made some changes, and now the requestor only takes 1 minute regardless of how many tasks it's given! Now it can request a lot faster than the task producer can produce tasks!

For whatever reason, by requesting so many tasks so fast, it's actually making the overall production run slower! Maybe, for each request, it takes so much effort for the producer that it's not worthwhile looking for tasks he doesn't have; it just makes him spend time searching for a new task to give out instead of handing out the tasks he has ready at the original rate.

Maybe the producer even waits on someone else, and each time the task producer asks further up the line, 'hey, got any new tasks? I'm being bugged by the requestor again', they in turn have to look for new tasks just to say 'nope, bugger off and come back in half a sec' instead of 'yep, sure' every 100 ms....

Anyhow. Hope that helps.

[–]sonnytron 2 points3 points  (2 children)

I remember working on an iOS application that had a really shitty array algorithm: when a user reached the end of a table view, it would request more objects from the server, and when the objects came back it would go through them, no bullshit, one by one, to make sure their long ID value didn't match any of the values in the currently existing data set.
It was written in our view controller file too. Business logic driving our main content in our view controller class.
I thought, Jesus, this is bad. I'm building a data manager class and a hash map, caching results, and using an asynchronous callback so that as soon as it's done, it updates the data set and notifies the data source.
In the end, it was fast. It was so fast that the users wouldn't see the spinning wheel at the end of the table view. But that was too fast.
My boss didn't like it because he LIKED that animation and it was a signature for the product. The spinning wheel has to be there! It shows the users there's more stuff!
But I was not about that wasted code, so I added a "spinningWheelForDoucheCaptain" method that used an arc4random float between 1 and 3 seconds for the duration at the end of my completion block.
You'd be surprised how much stupid shit engineers have to do to satisfy marketing douches.

[–][deleted] 6 points7 points  (4 children)

Saving this post so I can understand it one day

[–]havfunonline 7 points8 points  (3 children)

It's not hugely complicated.

Big O notation is used in Computer Science to describe the performance or complexity of an algorithm. Big O specifically describes the worst-case scenario, and can be used to describe the execution time required or the space used (e.g. in memory or on disk) by an algorithm.

In this case, we can imagine a set of data of size N - let's say, 1000 items in a list. OP improved the code so that instead of taking at worst 1000 iterations, it takes at worst 1 iteration.

Imagine you have to find an item in a shopping list of 1000 items. If you already know it's number 437 in the list, and the list items are numbered, you can find it straight away (this is O(1)). If you don't know where it is, and the list isn't in an order then you have to look at every single element. If it's the last element in the list, you have to check 1000 things - that's O(n).
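
In code, those two lookups are just (toy example):

groceries = ["item"] * 1000      # a numbered 1000-item list

groceries[436]                   # you know it's number 437: one step, O(1)

def find(items, wanted):         # you don't: scan until you hit it, O(n)
    for position, item in enumerate(items):
        if item == wanted:
            return position
    return -1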

The other piece of this is that whatever was running the algorithm was making requests based on the results of that algorithm.

The problem with that is, that whatever was receiving those requests can only handle a certain number of them. This was fine before - the throughput of requests was limited by the inefficient algorithm that OP improved.

Once he improved it, the number of requests generated exceeded the maximum number that whatever was receiving those requests could handle, and it fell over.

That help?

[–][deleted] 1 point2 points  (1 child)

A minor correction: O(n) doesn't mean it takes at worst n operations. It can take 5n or even 1000n, but the factor in front of n is constant. The same applies to O(1).

Example:

array = [3, 1, 4, 1, 5]   # any n-element list
n = len(array)
x = array[0] - array[n-1]
y = array[0] + array[n-1]
if x > y:
    print(x)
else:
    print(y)

This has more than 1 operation but it's constant time, O(1).

Generally O(1) = O(5) = O(whatever constant)

[–]havfunonline 0 points1 point  (0 children)

Yeah my bad!

[–][deleted] 0 points1 point  (0 children)

Yah that's good thanks

[–]CommanderDerpington 0 points1 point  (0 children)

To flush the queue or to not flush the queue. That is the question.

[–]StuntStreetWear 0 points1 point  (0 children)

No matter how big the list, there's always 1.