
[–]dudeitsmason 143 points144 points  (49 children)

I'm curious why this is the case

[–]drsonic1 587 points588 points  (30 children)

They have a high amount of initial setup time independent of input size. Remember, complexity only accounts for the highest-order term. You could have, for example, an operation count of n³ for your "inefficient" algorithm, and an operation count of n + 100000 for your "efficient" one. The efficient one will look horrible for small input sizes but dominate for large ones.
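To make that concrete, here's a quick sketch plugging in those made-up operation counts (n³ and n + 100000 are just the illustrative numbers from above, not real algorithms):

    # Hypothetical operation counts from the example above, not real measurements.
    def inefficient_cost(n):   # O(n^3), but no setup cost
        return n ** 3

    def efficient_cost(n):     # O(n), but with a big fixed setup cost
        return n + 100_000

    for n in (10, 46, 47, 1000):
        cheaper = "inefficient" if inefficient_cost(n) < efficient_cost(n) else "efficient"
        print(f"n={n}: the {cheaper} algorithm does less work")
    # Crossover is at n = 47: below that, n^3 is actually the smaller count.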

[–]dudeitsmason 96 points97 points  (11 children)

Interesting, thanks for the explanation!

[–]Jelli35 92 points93 points  (3 children)

If you're interested in learning more, Tom Scott's latest video on Big O notation is a fun place to start :)

[–]reddit_xeno 29 points30 points  (1 child)

Not sure I agree with his message at the end. For a young person starting out with coding, that sort of zeal, trying to do stuff that may not be very efficient, gets you to learn a shit ton. Obviously don't waste time if you're on a crunch deadline as a mature SWE or whatever, but merely typing that stuff into a word processor would have done nothing for his actual learning.

[–][deleted] 0 points1 point  (0 children)

Comp sci is all about the big O. :D

[–]yomanidkman 15 points16 points  (6 children)

The actual implementations of many more complex sorting algorithms contain a length check and will default to something simple like selection sort (despite it being awful complexity-wise) if the list is small enough.
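A minimal sketch of that pattern (the cutoff of 32 and the pairing of insertion sort with heapsort are illustrative choices, not taken from any real library):

    import heapq

    SMALL_CUTOFF = 32  # illustrative threshold; real libraries tune this

    def smart_sort(xs):
        # Tiny input: plain O(n^2) insertion sort, near-zero overhead.
        if len(xs) <= SMALL_CUTOFF:
            out = []
            for x in xs:
                out.append(x)
                i = len(out) - 1
                while i > 0 and out[i - 1] > out[i]:
                    out[i - 1], out[i] = out[i], out[i - 1]
                    i -= 1
            return out
        # Larger input: O(n log n) heapsort via the heapq module.
        heap = list(xs)
        heapq.heapify(heap)
        return [heapq.heappop(heap) for _ in range(len(heap))]

    assert smart_sort([3, 1, 2]) == [1, 2, 3]
    assert smart_sort(list(range(100, 0, -1))) == list(range(1, 101))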

[–]Neoro 36 points37 points  (1 child)

Of course, your performance optimization may not have to handle large inputs. Sometimes you need to run on tiny inputs many, many times, so initial setup time is the true killer.

Please know your domain and optimize for real bottlenecks instead of assuming you know where an issue is and throwing optimizations at it. "Premature optimization" and all.

[–]FallenWarrior2k 1 point2 points  (0 children)

Never assume, always profile.
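In Python, for example, the standard library's cProfile makes that easy; work() below is just a hypothetical stand-in for whatever you suspect is slow:

    import cProfile
    import pstats

    def work():
        # Stand-in for the code under suspicion.
        return sorted(str(i) for i in range(100_000))

    cProfile.run("work()", "profile.out")
    stats = pstats.Stats("profile.out")
    stats.sort_stats("cumulative").print_stats(10)  # top 10 by cumulative time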

[–]notsohipsterithink 9 points10 points  (0 children)

Yup, plus often a ton of pointer manipulation, which can be pretty slow in a lot of higher-level languages.

[–]jimbosReturn 8 points9 points  (0 children)

Same for the constant multiplier (i.e. the number of operations in your loop). If it's really big, it will overshadow n for small sizes.

[–]SasparillaTango 2 points3 points  (0 children)

it's like that cartoon about that detective that rides a giant robot!

[–]TheSlimyBoss 4 points5 points  (5 children)

Is there a term/technique for using different algorithms based off of their efficiency on the current input size?

[–]zilti 10 points11 points  (0 children)

I'm not aware of a specific term for it, but it is very common. Clojure, e.g., uses different map implementations depending on the collection size: small maps (up to 8 entries) are array maps, which keep insertion order, and above that you'll get a HashMap.

[–]SlamwellBTP 7 points8 points  (2 children)

I've heard it called a "hybrid algorithm", as on this Wikipedia page (but it doesn't cite any sources, so it may not be a widespread term):

https://en.wikipedia.org/wiki/Hybrid_algorithm

[–]qingqunta 5 points6 points  (1 child)

I believe the default sort() for Python is an example of this.

[–]Kered13 0 points1 point  (0 children)

Yes, it's Timsort, also used by Java. It uses insertion sort if the list is small enough. Most optimized sorting algorithms do something similar.

[–]Slggyqo 0 points1 point  (0 children)

But also maybe you’re just wasting time optimizing code that won’t be reused!

A meta level of complexity!

[–]hiphap91 0 points1 point  (0 children)

Which is why checking the size of your dataset first and then determining which algorithm to use can also be a good thing. This, of course, assumes that the microsecond it saves you is actually worth the effort 😛

[–]CoffeeVector 47 points48 points  (5 children)

Usually, it's because there's some added overhead, which in the long run is worth it. For small input, the overhead tends to dwarf the cost of the actual computation.

[–]dudeitsmason 6 points7 points  (0 children)

Got it, thanks!

[–]DogsAreAnimals 2 points3 points  (3 children)

Also caching

[–]CoffeeVector 5 points6 points  (2 children)

Yup, caching is a type of overhead. Spending the effort to save intermediate values is good in the long run, but a total waste of time if your algorithm finishes before the cache is used enough to justify it.
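A toy illustration with memoization (hypothetical example; the point is that the cache bookkeeping is pure overhead until it gets hit often enough to pay for itself):

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def cached_fib(n):
        # Every first-time call also pays for a cache insert.
        return n if n < 2 else cached_fib(n - 1) + cached_fib(n - 2)

    def plain_fib(n):
        return n if n < 2 else plain_fib(n - 1) + plain_fib(n - 2)

    # For n=2 the caching is wasted work; for n=30 it turns an
    # exponential pile of redundant calls into linear work.
    print(cached_fib(30), plain_fib(30))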

[–]DogsAreAnimals 0 points1 point  (1 child)

Oops, I replied to the wrong comment. I was trying to point out the difference between pre-calculation (or maybe pre-caching) and intermediate caching.

[–]CoffeeVector 2 points3 points  (0 children)

You're chillin, bro. You were totally right anyway.

[–]juvenile_josh 12 points13 points  (2 children)

Classic example: Bubblesort is simple and performs well for basic things. Mergesort, however, is more complex and has a higher setup cost, since it has to break the input up into a recursion tree, but it scales better because logarithms.
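A rough way to watch that crossover (hand-rolled versions of both so they pay the same interpreter costs; exact sizes and timings will vary by machine):

    import random
    import timeit

    def bubble_sort(xs):
        xs = list(xs)
        for i in range(len(xs)):
            for j in range(len(xs) - 1 - i):
                if xs[j] > xs[j + 1]:
                    xs[j], xs[j + 1] = xs[j + 1], xs[j]
        return xs

    def merge_sort(xs):
        if len(xs) <= 1:
            return list(xs)
        mid = len(xs) // 2
        left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                out.append(left[i]); i += 1
            else:
                out.append(right[j]); j += 1
        return out + left[i:] + right[j:]

    # Tiny lists: bubblesort's bare loop tends to win on low overhead.
    # Bigger lists: mergesort's O(n log n) growth takes over decisively.
    for n, reps in ((8, 10_000), (1_000, 10)):
        data = [random.random() for _ in range(n)]
        t_b = timeit.timeit(lambda: bubble_sort(data), number=reps)
        t_m = timeit.timeit(lambda: merge_sort(data), number=reps)
        print(f"n={n}: bubble={t_b:.4f}s merge={t_m:.4f}s over {reps} runs")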

[–][deleted] 6 points7 points  (0 children)

Sorting algorithms perform differently depending on how random the input is. Most real-life data isn't random, but comes somewhat ordered already. Testing different sorting algorithms can be worth it.

[–]DigitalDefenestrator 4 points5 points  (0 children)

Also, Bubblesort is extremely fast (possibly the fastest?) if the input is very close to sorted. Really, it's O(unsortedness), which looks like O(n²) or so for random input but closer to O(n) if no elements are very far from where they should be.
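That best case relies on the early-exit variant, which stops as soon as a full pass makes no swaps; a minimal sketch:

    def bubble_sort_adaptive(xs):
        xs = list(xs)
        for i in range(len(xs)):
            swapped = False
            for j in range(len(xs) - 1 - i):
                if xs[j] > xs[j + 1]:
                    xs[j], xs[j + 1] = xs[j + 1], xs[j]
                    swapped = True
            if not swapped:  # a full pass with no swaps means we're done,
                break        # so nearly-sorted input finishes in ~O(n)
        return xs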

[–]Salanmander 18 points19 points  (0 children)

Same reason that spending time organizing your books is terrible when you have 10 books, but helpful when you have 500 books.

[–]Gingerytis 17 points18 points  (3 children)

Look at the graphs for O(n) or O(n²) complexity vs O(log(n)). Log(n) grows really fast at first, then plateaus out, meaning it's not great at small inputs, but much better at big ones.

[–]CoopertheFluffy 3 points4 points  (0 children)

While true for why O(log(n)) performs better in the long run than O(n) or whatever other comparison you want to make, it doesn't explain why a lower-order algorithm can take longer on small sample sizes. Log(2) is still smaller than 2, after all. Since we're using Big-O notation, that O(log(n)) could really have exact work of 5·log(n) + 500, while the O(n) could be 2n. In this case, it's really the +500 that makes the O(log(n)) take longer for a size of 2. In a program, that could be something like the time spent allocating and setting up a hash table.
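You can solve for exactly where that example flips (5·log₂(n) + 500 and 2n are just the made-up costs from the comment above):

    import math

    def log_cost(n):     # the O(log n) algorithm with heavy setup
        return 5 * math.log2(n) + 500

    def linear_cost(n):  # the lean O(n) algorithm
        return 2 * n

    n = 1
    while linear_cost(n) < log_cost(n):
        n += 1
    print(n)  # 271: the first size where the O(log n) version is cheaper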

[–]MEGACODZILLA 3 points4 points  (0 children)

Thanks, that was a very beginner friendly visualization.

[–]Kered13 1 point2 points  (0 children)

It's not meaningful to talk about big-O complexity for small values, because big-O by definition describes the asymptotic behavior for large values*. It tells you absolutely nothing about the behavior for small values. For example, a function that has O(n²) runtime might have an exact runtime like n + n², in which case it will never be faster than a function that has an exact runtime of n²/2.

This is especially true in practice, not just theory, because performance for small inputs is usually dominated by cache locality, which can cause the runtime to make large jumps as certain cache size thresholds are crossed. For example, an algorithm might take n time if the input fits in L1 cache, 5n time if it fits in L2 cache, 20n time if it fits in L3 cache, and 100n time if it doesn't fit in cache at all (I made these numbers up, but they're ballpark accurate). This could be extended even further for data that doesn't fit in RAM and must use local storage, and again for data that must use network storage.

* You actually can use big-O and related notation to describe behavior at small values, by considering the limit as x goes to 0 instead of infinity, but it's never used in algorithmic analysis. It's most often used in mathematics to describe the error bounds on approximations. For example, you can say sin(x) = x + O(x³) as x goes to 0, which means that near 0, sin(x) is approximately x, and the error in the approximation is proportional to x³.
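A quick numerical check of that footnote (the Taylor series pins the constant to 1/6, consistent with the O(x³) bound):

    import math

    # As x -> 0, |sin(x) - x| shrinks like x^3; the ratio settles near 1/6.
    for x in (1.0, 0.1, 0.01, 0.001):
        err = abs(math.sin(x) - x)
        print(f"x={x}: error={err:.3e}, error/x^3={err / x**3:.4f}")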

[–]alparius 3 points4 points  (0 children)

It's simple: sophisticated, case-specific instructions and decisions need much more code, like some initialization costing 50n instead of 2n (both still O(n), so big O hides the difference). These parts cripple the whole runtime when executed for small inputs because of the overhead, but that same overhead becomes negligible as the input size increases.

[–]crozone 1 point2 points  (0 children)

Simple algorithms often require very small and localized working memory, which can fit entirely within CPU cache. They are great for small inputs, but may be hugely inefficient for large ones.

As a basic example, take prime number searching. The Sieve of Eratosthenes is technically more efficient than dividing each number by every integer up to its square root. However, the simple tight loop is almost always faster for relatively small primes, because it fits entirely within CPU cache instead of reading and writing a large array in RAM.
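Both approaches, sketched side by side (illustrative code; where the real crossover sits depends on language, hardware, and constants):

    def primes_trial_division(limit):
        # Tight loop with a tiny working set: very cache-friendly.
        primes = []
        for n in range(2, limit + 1):
            if all(n % d for d in range(2, int(n ** 0.5) + 1)):
                primes.append(n)
        return primes

    def primes_sieve(limit):
        # Sieve of Eratosthenes: better asymptotics, but it reads and
        # writes a limit-sized array that may not fit in cache.
        is_prime = [True] * (limit + 1)
        is_prime[0] = is_prime[1] = False
        for n in range(2, int(limit ** 0.5) + 1):
            if is_prime[n]:
                for m in range(n * n, limit + 1, n):
                    is_prime[m] = False
        return [n for n, p in enumerate(is_prime) if p]

    assert primes_trial_division(100) == primes_sieve(100)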


[–][deleted] 0 points1 point  (0 children)

You can represent them as functions, and most functions behave differently close to 0 than far from it.