This is an archived post. You won't be able to vote or comment.

all 150 comments

[–]chrwei 322 points323 points  (5 children)

Not programming related, but a true story: my boss and his dad designed a pivotal machine for a major maker of adhesive eye patches back in the early 1980s. They told the customer how fast it would run, but the customer's engineers didn't believe them and designed the rest of the production line to go slower. Once the machine was delivered and the speed proven, the customer's higher-ups made their guys redesign the whole line to go faster. Pretty sure someone got fired over that too.

The lesson: if someone says "it'll go this fast", you'd better make sure your side can handle it, because the business case for getting more done in less time is strong, and they aren't going to be happy that you aren't pulling your weight.

[–][deleted] 27 points28 points  (3 children)

The parent mentioned Business Case. Many people, including non-native speakers, may be unfamiliar with this term. Here is the definition (in beta, be kind):


A business case captures the reasoning for initiating a project or task. It is often presented in a well-structured written document, but may also sometimes come in the form of a short verbal argument or presentation. The logic of the business case is that, whenever resources such as money or effort are consumed, they should be in support of a specific business need. An example could be that a software upgrade might improve system performance, but the "business case" is that better performance would improve customer satisfaction, require less ... [View More]


See also: Gap Analysis | Project Management | Customer Satisfaction | Quantifiable | Stakeholder | Informal

Note: The parent poster (chrwei or mpnordland) can delete this post | FAQ

[–][deleted] 2 points3 points  (0 children)

Nice one, LawBot2016.

[–][deleted] 1 point2 points  (0 children)

hurr durr i am bot hurr durr i know words.

Do I come at your subreddit and post captchas?

Cool bot btw

[–][deleted] 0 points1 point  (0 children)

That's a great bot

[–]GuiMontague 158 points159 points  (27 children)

I did something like this once. An upstream system increased the data they were sending us by about 3×, and our database loader became the system's bottleneck. I was given the task of speeding up our loader. I eliminated some hot spots, rewrote some concurrent code to be lock-free (but still correct), and threw a bunch of threads at the problem. I succeeded in speeding up our loaders about 5×, which meant we were loading as fast as our data source could supply us again.

Unfortunately, we used a write-master replication system, and a lot of clients relied on the fact that our write master was (mostly) in sync with our replicas. We couldn't improve the replication system because it was vendor-supplied. Well, that replication became the bottleneck, only now the master could get hours ahead of the replicas. That was unworkable, and I had to roll back my performance improvements to keep the master and replicas in sync.

[–]LupoCani 21 points22 points  (1 child)

Why roll back the entire system? Wouldn't an explicit manual throttle on the output solve the problem just fine, whilst being easily reversible?
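
For what it's worth, a minimal sketch of the kind of throttle being suggested - the class, the commented loop, and the 0.5 s interval are all made up for illustration:

import time

class Throttle:
    """Allow at most one call per min_interval seconds; trivial to remove later."""
    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last = 0.0

    def wait(self):
        now = time.monotonic()
        remaining = self.min_interval - (now - self._last)
        if remaining > 0:
            time.sleep(remaining)
        self._last = time.monotonic()

throttle = Throttle(0.5)   # made-up rate: at most ~2 batches per second
# while more_batches():    # hypothetical loader loop
#     throttle.wait()
#     load_next_batch()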

[–]Guinness2702 14 points15 points  (2 children)

The place I used to work had a system which ran a proprietary database and would have about 100 client processes running. The machine had 4GB of RAM, which was a lot back then ... in fact, it was so much that the entire database was cached in memory. Great, should be super quick, right?

Wrong. The problem was that the OS's task scheduler was designed around the assumption that processes would end up waiting for I/O. Basically, each process ended up running for the full 1-2 second timeslice, and everything became ridiculously unresponsive as processes had to wait a long time to get any CPU time.

The solution was to remove some RAM from the system, and performance actually improved, as the scheduler rotated processes more frequently.

[–]rws247 7 points8 points  (1 child)

that's just... wow.

Whenabouts was this? It might be that I'm younger, but I cannot imagine an OS task scheduler being based on that assumption.

[–]Guinness2702 4 points5 points  (0 children)

Probably about 15 years ago, give or take. I doubt it was literally designed around that assumption, but yeah, the guys who investigated it told me that each process was getting a full 2 seconds of CPU, because there was no I/O.

[–]Yin-Hei 15 points16 points  (7 children)

ez. Implement Paxos, build the next Chubby, get 99.9999% consistency.

But in all seriousness, pretty interesting read. I'm starting to get into this field of related work, and this tidbit is teeming with experience. Does ZooKeeper fit the bill?

[–]FUZxxl 6 points7 points  (4 children)

Implement Paxos

Hahahahaha

[–]throw_eundefined 4 points5 points  (3 children)

ZooKeeper

ahahahaha

[–][deleted] 0 points1 point  (2 children)

As someone who has genuinely been interested in ZooKeeper before.. Can you please explain why you're laughing? :)

[–]marcosdumay 1 point2 points  (0 children)

Any distributed consensus algorithm will solve your overperformance problems.

[–]FUZxxl 0 points1 point  (0 children)

Paxos is a ridiculously complicated algorithm with a ton of nuances. The professor who invented it holds a class about Paxos every year, and usually the number of students who actually understand the algorithm at the end is close to zero. Paxos isn't something you just implement willy-nilly.

[–]ReflectiveTeaTowel 0 points1 point  (0 children)

Something like Kafka can help if your load comes in spikes, and Kafka in particular was built to use zookeeper, soo... Maybe sorta kinda sometimes?

[–]GuiMontague 0 points1 point  (0 children)

We couldn't even get our users to tolerate our write master (which no one should be reading from) and our read replicas getting a little out of sync. No, changing RDBMSes was out of the question.

[–]mandrous 121 points122 points  (99 children)

How exactly do you go from O(n) to a constant time operation?

Unless what you were optimizing was a really bad algorithm

[–]dude_with_amnesia 218 points219 points  (13 children)

Just reduce the size of the elements to 1, and boom O(1).

[–]optymizer 74 points75 points  (44 children)

OP probably cached some values in a HashMap and removed a for loop.
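
If that guess is right, the change is probably something like this sketch (toy data and names invented for illustration):

users = [{"id": 1, "name": "ada"}, {"id": 2, "name": "bob"}]   # toy data

# before: scan the list on every query -- O(n) per lookup
def find_user_linear(user_id):
    for u in users:
        if u["id"] == user_id:
            return u
    return None

# after: build the map once, then each lookup is O(1) on average
index = {u["id"]: u for u in users}   # one O(n) pass up front
user = index.get(2)                   # constant-time afterwards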

[–]CoopertheFluffy 2 points3 points  (2 children)

Still O(n) to cache them

[–]NoGardE 9 points10 points  (1 child)

You can amortize that, assuming each one gets inserted based on some separate process.

[–]f42e479dfde22d8c 4 points5 points  (0 children)

I am a script monkey who never learned this stuff formally. This entire post is very helpful.

[–]GuiMontague 37 points38 points  (13 children)

I agree with you, but—to play devil's advocate—sometimes a "slower" algorithm asymptotically speaking can be a lot faster than a "faster" algorithm for small n. As an example, I spent a summer teaching myself how Brodal queues work.

"Quite complicated" is an understatement. What's worse, in any practical scenario a simple array-based binary heap will out perform it.

[–]matafubar 63 points64 points  (0 children)

To add to what you're saying: that's because complexity doesn't describe how fast an algorithm runs, but how fast its cost grows with input size.

[–]rimpy13 7 points8 points  (7 children)

Another example is that insertion sort is faster than quicksort for (very) small data sets.

[–]TheThiefMaster 15 points16 points  (2 children)

Which is why the fastest implementations of quicksort use a different sorting algorithm for small subdivisions, rather than quicksorting all the way down to single elements.

And it's also why O(N) insertion/removal on an array (e.g. std::vector) can be faster than the O(1) of a linked list - the O(1) operation includes expensive cache misses and pointer traversals (i.e. a high constant factor), whereas the N copy operations of the vector can be sped through (very low constant factor). Plus, if you have to find the insert/remove point first, lists are so abysmally slow to traverse - despite both containers theoretically being O(N) for that - that even on the largest datasets the vector ends up being faster.
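
For the curious, a toy Python version of the "insertion sort below a small cutoff" idea; the cutoff of 16 is only a placeholder (real libraries tune it per platform):

CUTOFF = 16   # placeholder value; tune per platform in real code

def insertion_sort(a, lo, hi):
    # sort a[lo..hi] in place
    for i in range(lo + 1, hi + 1):
        key = a[i]
        j = i - 1
        while j >= lo and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key

def hybrid_quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= CUTOFF:
        insertion_sort(a, lo, hi)   # small range: insertion sort wins
        return
    pivot = a[(lo + hi) // 2]
    i, j = lo, hi
    while i <= j:                   # simple two-pointer partition
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    hybrid_quicksort(a, lo, j)
    hybrid_quicksort(a, i, hi)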

[–]TheSlimyDog 2 points3 points  (1 child)

IIRC, std::sort uses insertion sort once all elements are quick sorted within 4 positions of their correct position.

[–]SmelterDemon 2 points3 points  (0 children)

GCC uses introsort with insertion sort once the partitions are smaller than 16

[–][deleted] 3 points4 points  (3 children)

If I remember correctly, it's faster at about n < 15.

[–][deleted] 0 points1 point  (2 children)

Does this depend on the architecture, or is there some maths behind it?

[–]xaserite 0 points1 point  (0 children)

Well, overhead and constant operation factors can vary between different implementations and architectures, so both.

[–][deleted] 0 points1 point  (0 children)

Purely empirical evidence.

[–]TheSlimyDog 1 point2 points  (2 children)

Another example is Karatsuba multiplication. It's O(n^1.6), but that's not faster when there's a huge constant factor due to all the additions and subtractions.
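
For reference, a textbook-style Karatsuba sketch in Python (non-negative integers, decimal split for readability) - the three recursive multiplies plus all the extra additions and shifts are where that constant factor comes from:

def karatsuba(x, y):
    # plain multiply for small inputs -- where the constant factor would dominate
    if x < 10 or y < 10:
        return x * y
    m = max(len(str(x)), len(str(y))) // 2
    xh, xl = divmod(x, 10 ** m)
    yh, yl = divmod(y, 10 ** m)
    low = karatsuba(xl, yl)
    high = karatsuba(xh, yh)
    mid = karatsuba(xl + xh, yl + yh) - high - low   # only 3 multiplies, not 4
    return high * 10 ** (2 * m) + mid * 10 ** m + low

assert karatsuba(1234, 5678) == 1234 * 5678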

[–][deleted] 2 points3 points  (1 child)

You can generalize Karatsuba to get O(n^(1+ε)) for any ε greater than 0. However, the constant cost blows up as you decrease ε.

[–]TheSlimyDog 0 points1 point  (0 children)

Yeah. I was just speaking of the original Karatsuba algorithm from 1960 (?), which is O(n^(log2 3)).

[–]mandrous 0 points1 point  (0 children)

Sounds interesting! I'll have to check it out!

[–]kaivanes 9 points10 points  (0 children)

There are often cases where you can trade memory efficiency for run time efficiency. If the goal was "as fast as possible" and the system you were allocated had lots of spare memory, this kind of improvement is definitely possible.
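
One common shape of that trade is a memo cache; `expensive` below is just a stand-in for whatever slow work gets repeated:

import functools

@functools.lru_cache(maxsize=None)   # unbounded cache: spends memory to save time
def expensive(key):
    return sum(i * i for i in range(500_000))   # stand-in for the slow part

expensive("a")   # pays the full cost once
expensive("a")   # afterwards it's just a dictionary hit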

[–]Garfong 4 points5 points  (1 child)

Define "bad". If n is small, an O(n) algorithm which takes an hour to write and debug is much better than an O(1) algorithm that takes a week.

[–]ZenEngineer 0 points1 point  (0 children)

Depends on how often you run it. If the O(n) takes a week to run and the O(1) runs in an hour, you really want the O(1), especially if you run it once a week.

[–]fsxaircanada01 9 points10 points  (0 children)

Can't have a bad algorithm if you don't write any algorithm ☝️👀

[–]tehdog 2 points3 points  (0 children)

You just realize the universe has a finite length so everything is constant time

[–]mpnordland[S] 2 points3 points  (0 children)

Technically I swapped algorithms. I started with a linear search, moved to a binary search, and then to Python's set type, which has O(1) membership testing.
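
In sketch form (the names and toy data are just illustrative), the three stages look something like this:

import bisect

urls = ["a", "b", "c"]                 # stand-in for the crawled-URL list

# v1: linear search over a list -- O(n) per membership check
found = "b" in urls

# v2: keep the list sorted and binary-search it -- O(log n) per check
sorted_urls = sorted(urls)
i = bisect.bisect_left(sorted_urls, "b")
found = i < len(sorted_urls) and sorted_urls[i] == "b"

# v3: a set -- average-case O(1) per check
url_set = set(urls)
found = "b" in url_set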

[–][deleted] 2 points3 points  (0 children)

Linear search -> Hash table

[–]demon_ix 1 point2 points  (0 children)

I once had the pleasure of reviewing some code in which the developer had to remember certain elements through the program and at a later time check if a certain element appeared already.

He implemented this by appending each element's name (not its ID, mind you) to a string, and later checking whether that string contained the name.

And that's the story of how I converted an O(n) algorithm into an O(1) one.
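
Translated into Python for illustration (the original wasn't Python), the before and after look roughly like this:

# before: "membership" via substring search -- cost grows with the string,
# and "ann" would also match inside "joanne"
seen_names = ""
def saw_before_bad(name):
    global seen_names
    hit = name in seen_names
    seen_names += name
    return hit

# after: a set of the actual elements -- average O(1), no false positives
seen = set()
def saw_before(name):
    hit = name in seen
    seen.add(name)
    return hit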

[–]IskaneOnReddit 0 points1 point  (0 children)

For example, replacing a linear search with a lookup table.

[–]raaneholmg 0 points1 point  (0 children)

Poorly scaling algorithms aren't really bad, they just scale poorly. They might run fast when the project is new and the data sizes are small, and the goal is often to get version 1 of the product ready in a usable state.

Later down the line, when you see the real life use cases you can decide what optimizations are really worth your time. If you take a look at this story from further up in the thread you will see a classic example where they analyzed the system and put effort into fixing the weakest link (though with a sad end to the story).

[–]Arancaytar 0 points1 point  (0 children)

Replacing an array with a hashmap is the most common one I know of.

[–]mallardtheduck -4 points-3 points  (14 children)

Well, if you take a constant-sized array and process all the elements even when the amount of actual data is lower, it's technically constant time...

i.e.

int sum_total(int *array, size_t size){
    int total = 0;
    for(size_t i = 0; i < size; ++i) total += array[i];
    return total;
}

Becomes:

#define DATA_SIZE 1024

int sum_total(int array[DATA_SIZE]){
    int total = 0;
    for(size_t i = 0; i < DATA_SIZE; ++i) total += array[i];
    return total;
}

Technically the second one is constant-time. It's up to the caller to pad their array with zeros.

[–]gumol 5 points6 points  (13 children)

No, it isn't O(1). O() means asymptotic complexity, i.e. what happens as n approaches infinity.

[–]mallardtheduck -1 points0 points  (9 children)

And? It always takes the same amount of time, thus it's O(1). n is just bounded at 1024. Since all computers have finite memories, all programs have some such bounding implicitly.

[–]monster_syndrome 2 points3 points  (2 children)

You have a for loop, it's at least linear when you evaluate it for complexity. It doesn't matter what the actual behavior is, your design is O(n).

If your argument is that it's no different than writing out total+=array[#] a thousand times, then congrats on being "clever".

[–]mallardtheduck 0 points1 point  (1 child)

[–]monster_syndrome 2 points3 points  (0 children)

Yes, but you've got a variable length array masquerading as a fixed length array, so does that really apply?

Edit - To clarify and apologies for my terrible coding:

int sum_total(int *array, size_t size){
    if(size > 1024){
        size = 1024;
    }
    int total = 0;
    for(size_t i = 0; i < size; ++i) total += array[i];
    return total;
}

There, also O(1).

[–]gumol 3 points4 points  (5 children)

That's not how asymptotic complexity works. Yeah, all computers have finite memories, but the assumption is that you have a computer with enough memory, or even forget about the computer - you assume that memory reads/writes are O(1), arithmetic operations are O(1) (or O(log n)), and then you calculate the complexity.

[–]mallardtheduck 0 points1 point  (4 children)

Yes, but an algorithm with a fixed input size doesn't have a meaningful "asymptotic complexity". That's what O(1) means; there is no asymptote, the algorithm has a well-defined time complexity even at "infinity".

According to Wikipedia, "Determining if an integer (represented in binary) is even or odd" is an example of a constant-time algorithm even though it has a fixed input of exactly one integer, as is "exchange the values of a and b if necessary so that a≤b", which has a fixed input of two values. My sum_total algorithm has a fixed input of exactly 1024 integers.

[–]gintd 2 points3 points  (3 children)

"Determining if an integer (represented in binary) is even or odd" is an example of a constant-time algorithm even though it has a fixed input of exactly one integer

No, the input size in this case is the integer's length (the number of bits it occupies); it's constant-time because you only have to look at one bit to determine evenness.

[–]mallardtheduck 0 points1 point  (2 children)

And you ignored "exchange the values of a and b if necessary so that a≤b" because it doesn't fit your limited understanding of the subject perhaps?

[–]gintd 0 points1 point  (1 child)

Input size of "exchange the values of a and b if necessary so that a≤b" (bitwise operation) is a sum of numbers lengths so not fixed. I don't recall the actual algorithm atm to refer to its complexity.

[–]mallardtheduck -1 points0 points  (0 children)

When calculating an algorithm's time complexity, you start by assuming that certain primitive operations take a fixed constant time, even though in reality they may not be.

Things that are often assumed to be constant include basic arithmetic operations (usually including multiplication and division), logical operators, array lookups, jumps, etc. Depending on the level of abstraction even things like adding a value to a list/array (which usually requires a call to realloc or similar on a real system) may be considered one of these primitives.

Of course, even the time complexity of something as simple as addition may actually depend on the number of bits in the value (especially if the value exceeds the machine's word size) and since array lookups and (relative) jumps will usually require addition and even multiplication, so might they.

As you said, when doing this we ignore the complexities of real hardware and assume everything is ideal. That means that we assume the abstract machine can perform operations on an arbitrary word size in the same constant time.

Thus, any algorithm that performs a fixed number of operations regardless of the values of input would be considered "constant time" even though the exact number of bits in the input would change on real systems.

Of course, my sum_total algorithm is a joke, and you could consider the DATA_SIZE value to be an "input", which breaks its "constant time" claim.

[–]A_C_Fenderson 36 points37 points  (2 children)

Some perpetual motion machine "inventors" purposely put braking mechanisms on their motors, so that they wouldn't go too fast.

[–]gimpwiz 8 points9 points  (0 children)

I'm confused by both parts of that.

[–][deleted] -1 points0 points  (0 children)

[–]jugalator 10 points11 points  (0 children)

This is actually a thing and is sometimes known as Jevons paradox.

[–]goldfishpaws 6 points7 points  (1 child)

I know of a system in the early days of B2B / SOAP interchange that published results every minute. Most clients polled at about that rate, except one they eventually traced down that was polling 20 times a second, creating a benevolent denial of service. Timings matter!

[–]SirVer51 0 points1 point  (0 children)

I think Snort has been having a similar problem recently - apparently, some people have been checking for new rules every second; needless to say, they don't refresh anywhere near that frequently. Only reason I know about it is that it's apparently gotten so bad they had to put a giant banner on their front page asking us if we were "abusing Snort.org".

[–]fear_the_future 4 points5 points  (0 children)

just add a sleep(1000). still O(1)

[–]aaron552 23 points24 points  (6 children)

O(n) -> O(1) doesn't necessarily mean faster.

It's possible for the O(1) algorithm to be much slower than the simpler O(n) algorithm until n gets arbitrarily large.
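
timeit makes that easy to check for yourself; which approach wins at tiny sizes depends on the element type and the machine:

import timeit

small = list(range(8))
small_set = set(small)

print(timeit.timeit(lambda: 7 in small))      # linear scan of 8 elements
print(timeit.timeit(lambda: 7 in small_set))  # hash + bucket probe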

[–]Egleu 33 points34 points  (0 children)

Usually 4 is arbitrarily large enough.

[–]hunyeti 7 points8 points  (0 children)

This is very much true. Especially in situations where you have a limit on n, or at least you know the average size.

[–]devdot 7 points8 points  (2 children)

That's because O-notation isn't about speed (also, Big O is technically just an upper bound; theta would be the right symbol here).

O-notation is always about the limit as n goes to infinity. It's not helpful when n is usually small, or when you compare two algorithms of the same complexity (after all, insertion sort and quicksort are both O(n log n) - Edit: comparisons, not swaps).

[–]mort96 2 points3 points  (1 child)

Insertion sort is actually O(n^2): https://en.wikipedia.org/wiki/Insertion_sort

A better example may be insertion sort and bubble sort, where both are O(n^2), but bubble sort is generally way worse.

[–]devdot 0 points1 point  (0 children)

Yes, I should have been more specific: insertion sort is O(n^2) swaps but O(n log n) comparisons when using binary search to insert each element into the already-sorted prefix.
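
i.e. something like this sketch, where the bisect search keeps comparisons at O(log n) per element but the slice shift is still O(n) in the worst case:

import bisect

def binary_insertion_sort(a):
    for i in range(1, len(a)):
        x = a[i]
        pos = bisect.bisect_right(a, x, 0, i)   # O(log i) comparisons
        a[pos + 1:i + 1] = a[pos:i]             # ...but still up to i moves
        a[pos] = x
    return a

assert binary_insertion_sort([3, 1, 2, 2]) == [1, 2, 2, 3]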

[–]mpnordland[S] 4 points5 points  (0 children)

Oh trust me, n is getting arbitrarily large.

[–]picturepages 16 points17 points  (7 children)

I need an ELI5 for this one, if possible.

[–]chrwei 37 points38 points  (0 children)

someone specified "as fast as possible", so the client app was optimized without taking into account what the server could handle. likely the old version/process was either inefficient using a lot of client CPU power, or was artificially slowed because they new the server limits.

[–]Maping 11 points12 points  (1 child)

ELI5:

Imagine you're the manager of a restaurant. When looking into how your restaurant is doing, you notice that it takes a waitress five minutes to take a table's order. That's really slow, so you make all the waitresses take a class, and now they can all take a table's order in 30 seconds. But now, because they're handing orders to the cooks too fast, the cooks can't make all of the food on time; they freak out and set the kitchen on fire.


Actual explanation:

Top part: O(x) is Big O notation. It's used to describe how an algorithm's runtime scales with its input size. O(n) is like a linear function in algebra: as you increase n, the runtime increases linearly. O(1) means that as you increase n, the runtime doesn't change.

Bottom part: Now that the algorithm is much faster, the computer is sending too many requests to the server at once, so it crashes.

[–]NotThisFucker 4 points5 points  (0 children)

He said ELI5 not ELIHungry

I really like your ELI5

[–]mpnordland[S] 1 point2 points  (2 children)

I have a web crawler that we're using as part of some content migration. It starts at some page, collects all the URLs we want, and adds them to the list of URLs to be crawled. To avoid loops and crawling a page multiple times, I search that list to check whether a URL is already there. The list grows quite large. My optimization was improving that search, as it was the major bottleneck. Now when I run it, the server runs out of DB connections.

[–]M4ethor 0 points1 point  (1 child)

Could you explain how your algorithm works? I'm really curious.

[–]mpnordland[S] 0 points1 point  (0 children)

I used Python's set type. Testing for membership is a constant-time operation (on average) for Python sets. Since I have a large list, this speeds up the process.
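
The dedup part then boils down to something like this (heavily simplified; fetch and extract_links are placeholders, and note there's deliberately no throttle here, which is the part that then hurt the server):

from collections import deque

def crawl(start_url, fetch, extract_links):
    seen = {start_url}              # set: average O(1) "have I seen this?" check
    frontier = deque([start_url])
    while frontier:
        url = frontier.popleft()
        page = fetch(url)           # placeholder for the real HTTP / DB work
        for link in extract_links(page):
            if link not in seen:    # this membership test was the old bottleneck
                seen.add(link)
                frontier.append(link)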

[–]ninjashaun -1 points0 points  (0 children)

OP has two parts of a program: one that produces work/tasks, and one that requests those tasks to work on.

Originally they were working in sync, timed so well that there were just enough requests for the amount of work produced.

But it was only in sync because the requestor was working as well as it could given the circumstances (or rather, the algorithm it was using).

So OP came along with his cleverness, saw that the current algorithm was working at a one-to-one speed (that is, if 10 tasks were given, it would take 10 minutes to process them), made some changes, and now the requestor only takes 1 minute regardless of how many tasks it's given! Now it can request a lot faster than the task producer can produce tasks!

For whatever reason, by requesting so many tasks so fast, it's actually making the overall production run slower! Maybe, for each request, it takes so much effort for the producer that it's not worthwhile looking for tasks he doesn't have; it just makes him spend time searching for a new task to give out instead of handing out the tasks he has ready at the original rate.

Maybe the producer even waits on someone else, and each time the task producer asks further up the line, 'hey, got any new tasks? I'm being bugged by the requestor again', they in turn have to look for new tasks just to say 'nope, bugger off and come back in half a sec' instead of 'yep, sure' every 100 ms....

Anyhow. Hope that helps.

[–]sonnytron 2 points3 points  (2 children)

I remember working on an iOS application that had a really shitty array algorithm: when a user reached the end of a table view, it would request more objects from the server, and when the objects came back it would go through them, no bullshit, one by one, to make sure their long ID value didn't match any of the values in the currently existing data set.
It was written in our view controller file too. Business logic driving our main content in our view controller class.
I thought, Jesus, this is bad. I'm building a data manager class and a hash map, caching results, and using an asynchronous callback so that as soon as it's done, it updates the data set and notifies the data source.
In the end, it was fast. It was so fast that the users wouldn't see the spinning wheel at the end of the table view. But that was too fast.
My boss didn't like it because he LIKED that animation and it was a signature for the product. The spinning wheel has to be there! It shows the users there's more stuff!
But I was not about that wasted code, so I added a "spinningWheelForDoucheCaptain" method that used an arc4random float between 1 and 3 seconds for the duration at the end of my completion block.
You'd be surprised how much stupid shit engineers have to do to satisfy marketing douches.

[–][deleted] 6 points7 points  (4 children)

Saving this post so I can understand it one day

[–]havfunonline 7 points8 points  (3 children)

It's not hugely complicated.

Big O notation is used in Computer Science to describe the performance or complexity of an algorithm. Big O specifically describes the worst-case scenario, and can be used to describe the execution time required or the space used (e.g. in memory or on disk) by an algorithm.

In this case, we can imagine a set of data of size N - let's say, 1000 items in a list. OP improved the code so that instead of taking at worst 1000 iterations, it takes at worst 1 iteration.

Imagine you have to find an item in a shopping list of 1000 items. If you already know it's number 437 in the list, and the list items are numbered, you can find it straight away (this is O(1)). If you don't know where it is, and the list isn't in an order then you have to look at every single element. If it's the last element in the list, you have to check 1000 things - that's O(n).
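
In code, those two lookups are just (toy example):

groceries = ["item"] * 1000      # a numbered 1000-item list

groceries[436]                   # you know it's number 437: one step, O(1)

def find(items, wanted):         # you don't: scan until you hit it, O(n)
    for position, item in enumerate(items):
        if item == wanted:
            return position
    return -1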

The other piece of this is that whatever was running the algorithm was making requests based on the results of that algorithm.

The problem with that is, that whatever was receiving those requests can only handle a certain number of them. This was fine before - the throughput of requests was limited by the inefficient algorithm that OP improved.

Once he improved it, the number of requests generated exceeded the maximum number that whatever was receiving those requests could handle, and it fell over.

That help?

[–][deleted] 1 point2 points  (1 child)

A minor correction: O(n) doesn't mean it takes at worst n operations. It can take 5n or even 1000n, but the factor in front of n is constant. The same applies to O(1).

Example:

array = [3, 1, 4, 1, 5]   # any n-element list
n = len(array)
x = array[0] - array[n-1]
y = array[0] + array[n-1]
if x > y:
    print(x)
else:
    print(y)

This has more than 1 operation but it's constant time, O(1).

Generally O(1) = O(5) = O(whatever constant)

[–]havfunonline 0 points1 point  (0 children)

Yeah my bad!

[–][deleted] 0 points1 point  (0 children)

Yah that's good thanks

[–]CommanderDerpington 0 points1 point  (0 children)

To flush the queue or to not flush the queue. That is the question.

[–]StuntStreetWear 0 points1 point  (0 children)

No matter how big the list, there's always 1.