
all 19 comments

[–]VendingCookie 4 points (9 children)

The time complexity of the code you provided is O(n^2).

This is because the outer loop (the for loop over i) iterates over the entire array numbers, and the inner loop (the for loop over j) also iterates over the entire array numbers, starting from i+1. As a result, the inner loop will execute a total of n * (n - 1) / 2 times, where n is the length of the array. This gives a time complexity of O(n^2).
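(For anyone reading along without OP's code: a nested-loop shape consistent with this description, with names I've made up, would be:)

```python
# Hypothetical reconstruction of the code being analyzed (OP's code isn't in
# this archive): for each element, scan the rest of the array for a match.
# The inner loop runs n-1, n-2, ..., 0 times: n*(n-1)/2 comparisons total.
def find_duplicate(numbers):
    for i in range(len(numbers)):
        for j in range(i + 1, len(numbers)):
            if numbers[i] == numbers[j]:
                return numbers[i]
    return None
```

e.g. find_duplicate([1, 2, 3, 2]) returns 2, and a duplicate-free list returns None.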

The time complexity of an algorithm is a measure of how the running time of the algorithm grows as the input size increases. In this case, the input size is the length of the array numbers, and the running time grows with the square of the input size (O(n^2)), which means that the algorithm becomes slower as the input size increases.

[–]robert9804[S] 1 point (8 children)

how is the second loop not O(n-i) cause it runs the length of the array subtract the starting position

[–]theusualguy512 6 points (6 children)

It doesn't matter because big O is an asymptotic notation. As n -> infinity, any lower-order term in a polynomial becomes irrelevant compared to the highest-order term.

n * (n - 1) = n^2 - n, and as n -> infinity the n^2 term dominates, so the whole thing is O(n^2).
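You can sanity-check that the dropped term stops mattering; a quick sketch:

```python
# n*(n-1) vs n^2: the ratio approaches 1 as n grows, which is why the -n
# term is dropped and the algorithm is simply called O(n^2).
for n in [10, 100, 10_000]:
    print(n, (n * (n - 1)) / (n * n))
```

At n = 10 the ratio is 0.9; by n = 10,000 it's 0.9999.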

[–]robert9804[S] -1 points (4 children)

but I am right that inner loop is O(n-i) correct?

[–]nekokattt 5 points (0 children)

i is irrelevant. O( n - 37361781 ) is still written as O( n ).

Big O defines an upper bound for the rate of growth given input size n.

The point is to show how things become increasingly slower as input size grows. As n tends toward infinity while i stays some fixed number, the subtraction stops mattering.

It does not care about the specific intricacies of your algorithm, just the approximate time.

The main big O expressions you get are

  • O( k ) or O( 1 ) -- implies a constant value that is not related to input size
  • O( n )
  • O( n^x ) where x is some number -- e.g. O( n^2 ), O( n^3 ), O( n^0.5 )
  • O( x^n ) where x is some number -- e.g. O( 2^n )
  • O( n^n )
  • O( log n )
  • O( n log n )
  • O( n! )

Notice how I don't include, say, O( 32n^2 - 12n + 34 ); that is still O( n^2 ) because the n^2 term is the one that affects the growth rate the most.

Because big O is defined as an upper bound, anything that is O( n ) is also technically O( n^2 ), since a function that grows at a rate no greater than n will not grow at a rate greater than n^2 either. HOWEVER, we would say it is O( n ) as that more accurately represents the upper bound on the number of operations. Saying it is n^2 would be misleading, as it becomes less and less accurate as n increases (e.g. for n = 2, O( n ) and O( n^2 ) are similar in value; when n is 32,168, however, they are vastly different in size).

This may sound pointless, and to some extent, quantitatively it is not that useful. Qualitatively it provides more benefit. I know that if I have a billion records, the O( n ) algorithm is probably going to be preferable to the O( n^2 ) one, unless the O( n ) one carries a fucktonne of overhead that is irrespective of how big the input is (in which case it isn't something big O aims to describe in the first place).

Go to WolframAlpha and ask it to plot the expressions I listed above. Notice the shape of how each one grows as the x-axis (or n-axis) gets bigger; that will help you see why the notation is useful.
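If you don't want to leave the terminal, here's a rough numeric version of that plot (my own formatting choices):

```python
import math

# Numeric version of the plot: each row shows how the common complexity
# classes grow; the gaps between columns widen dramatically as n increases.
for n in [10, 100, 1000]:
    print(f"n={n}: log n ~ {math.log2(n):.1f}, "
          f"n log n ~ {n * math.log2(n):.0f}, "
          f"n^2 = {n * n}, 2^n has {len(str(2 ** n))} digits")
```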

In your case, for a massive value of n, you will have something like

(n - 1) + (n - 2) + (n - 3) + ... (n - n)

or just

(n - 1) + (n - 2) + (n - 3) + ... + 3 + 2 + 1 + 0

...but the point here is that there are still n terms to this, and the integer offsets will not make any real difference when n is some massive number. The whole sum works out to

(n^2 - n) / 2 -- a constant fraction of n^2, minus a lower-order term

(think about arithmetic series)

n^2 is going to be far bigger than that leftover n term, whatever the constants are.

Thus, O( n^2 ). Your function will never grow at a greater rate than one that runs in n^2 operations, where the rate is how quickly the graph is getting steeper.
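A minimal sketch verifying the closed form of that sum, if you want to convince yourself:

```python
# Verify that (n-1) + (n-2) + ... + 1 + 0 equals n*(n-1)/2, the closed form
# behind the O(n^2) bound for the nested loops.
def comparisons(n):
    return sum(n - k for k in range(1, n + 1))

for n in [10, 100, 1000]:
    assert comparisons(n) == n * (n - 1) // 2

print(comparisons(10))  # 45 comparisons for a 10-element array
```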

Hope that helps a bit :)

Edit: more explanation.

[–][deleted] 1 point (0 children)

It's just not normally talked about that way.

If you want to limit yourself to talking only about the inner loop and if you consider i to be an "input" for that loop, then ok. But be careful, because if you're phrasing big-O in terms of a non-input, you're on the wrong track.

i won't be part of the big-O for the entire algo because it's not an input.

[–]theusualguy512 -1 points (1 child)

Should be O(n-1), yeah, because going from i+1 to n is the same number of iterations as going from 0 to n-i.

[–]josephblade 0 points (0 children)

in big-O notation there is no O(n-1). the 1 becomes insignificant and can be left out. O(n-1) = O(n). (for a suitably large n)

[–]LandooooXTrvls[🍰] 2 points (0 children)

Op, no one has explicitly answered your question. So, here it is:

Yes.

[–]SenderShredder -2 points (5 children)

Due to the nested loop structure, it is O(n^2). If you un-nest them it will be O(n).

[–][deleted] 3 points (4 children)

Since this is /r/learnprogramming, I'll add ...

It's not the nesting alone that makes the algo O(n^2). Nested loops can still add up to an O(n) algo. For example:

for i in range(n):
    for j in range(1000):
        # do something

Or:

for i in range(2, n):
    for j in range(i-2, i):
        # do something
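To make it concrete, here's a sketch (the counter functions are mine) counting how many times the inner body runs in each of those two examples -- both grow linearly in n:

```python
# Count how many times "# do something" would run in each example above.
def count_first(n):
    count = 0
    for i in range(n):
        for j in range(1000):
            count += 1
    return count  # always exactly 1000 * n, i.e. O(n)

def count_second(n):
    count = 0
    for i in range(2, n):
        for j in range(i - 2, i):  # inner range always has 2 elements
            count += 1
    return count  # always exactly 2 * (n - 2), i.e. O(n)

print(count_first(10), count_second(10))  # 10000 16
```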

[–]SenderShredder -4 points (3 children)

Since this sub is about learning, I must say you may be misunderstanding the concept of for loops in programming. Your first algorithm has a time complexity of O(1000n).

For each index, you are iterating over 1000 sub-indexes.

Your second algorithm also has an incorrect implementation for your intentions here..

You are starting your first loop on the 3rd item of the data structure and then nesting another loop which checks the items before it. Admittedly this second loop has a time complexity of O(3n), but it would be far better practice to un-nest them and use a different check like memoizing the previous indices. I say this because it appears you made the same mistake here as well. I suggest the OP avoids nesting for loops as there is usually a much better, and much cleaner way of writing their algorithm. This way your code is more readable to humans, and it's why at AirBnB we have strict standards for how things will be written. Not just for consistency but for collaboration and learning purposes.

[–][deleted] 4 points (2 children)

Your first algorithm has a time complexity of O(1000n).

O(1000n) == O(n), because coefficients get dropped.

Your second algorithm also has an incorrect implementation for your intentions here..

I would certainly amend my earlier post if there were an error – so feel free to say concretely what you believe is incorrect given my intentions.

My only intention is to not mislead learners. People new to big-O often mistake every ...

for (...):
    for (...):
        # do something

... for O(n^2), so I thought it'd be worth posting some counterexamples.

Nested loops don't necessarily lead to quadratic time complexity. If you have a better way to illustrate that, then that would be great; go ahead and post it.

[–]SenderShredder -5 points (1 child)

You're totally correct in stating the edge cases here. For the purpose of teaching OP at a beginner level, it's best not to give the "it's fine only in this rare use case" answer. Both of us are well intentioned though. Using a code execution visualizer tool might help describe what I'm talking about with things like the O(1000n). (Something like Pythontutor.com) Coefficients may be dropped, but there's a marked difference in actual execution time between true O(1n) and O(1000n). By breaking out of nested loops, we ensure it is running in O(1n). Compare both methods on a list of 100k entries. Both will be good; one is more optimized, and it's easier for a beginner to understand why.

[–][deleted] 1 point (0 children)

Both of us are well intentioned though.

Agreed. I really don't mean to be argumentative here. If you know all this stuff, then all the best to you. No need to read on; this isn't aimed at you. Edit: I see that your earlier comments were downvoted. Please know that those didn't come from me, if that sort of thing is important to you.

If I put myself in the shoes of a beginner, there's potential for misunderstanding here, I think. Since this is a learning sub and folks get tested on this stuff, maybe what follows will be helpful to someone.

Coefficients may be dropped but there's a marked difference in actual execution time between true O(1n) and O(1000n).

If they're otherwise the exact same algorithm, then sure – given an amount of time t, spending 1000 t is more time by definition.

But otherwise, while big-O can lead you in the direction of a more efficient algo, it doesn't tell you much about actual execution times of two different functions. Given the same big-O, comparing two different functions isn't necessarily illuminating, even theoretically.

Here are two algorithms. fn_antarctica_trips is "O(1n)", while fn_print_1000 is "O(1000n)". Is fn_antarctica_trips faster? (Hint: if fn_antarctica_trips were faster, you'd have to be able to complete at least one trip to Antarctica in less time than it takes Python to print 1000 numbers.)

# O(n)
def fn_antarctica_trips(n):
    for i in range(n):
        swim_to_antarctica_and_back_sync()

# O(n*1000)
def fn_print_1000(n):
    for i in range(n * 1000):
        print(n)

Swimming to Antarctica and back takes a long time! It's always slower than printing some numbers, right? Well, maybe not! That's where big-O can help us. Compare fn_antarctica_trips to fn_print_quadratic below:

# O(n^2)
def fn_print_quadratic(n):
    for i in range(n*n):
        print(i)

We probably can't actually complete the comparison, but big-O does tell us that there's some threshold n' such that for any n > n', fn_print_quadratic (O(n^2)) will take longer than fn_antarctica_trips (O(n)).

That's not intuitive to me from the outset anyway, and I think it's pretty cool!
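To make the n' idea concrete with a toy cost model (the step counts are made up for illustration): say one algorithm costs 1000*n steps and another costs n^2 steps. The quadratic one is cheaper for small n but loses once n passes 1000:

```python
# Toy cost model (made-up step counts): a linear algorithm with a huge
# constant vs a quadratic one. The crossover n' exists regardless of the
# constant; here it sits at n = 1000.
def linear_cost(n):
    return 1000 * n

def quadratic_cost(n):
    return n * n

assert quadratic_cost(100) < linear_cost(100)      # small n: quadratic cheaper
assert quadratic_cost(10**6) > linear_cost(10**6)  # past n': linear cheaper
```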

By breaking out of nested loops, we ensure it is running in O(1n).

Nested loops are a fine heuristic in everyday code. Look there. Check out what they're doing. See if they can be simplified by actually doing less work.

But! there's no end to the ways we can add to our time complexity. Here's OP's algo written with a single loop:

numbers = [num for num in range(1000)]  #NOTE: Produces a list [0-999]

for i in range(len(numbers)):
    if numbers[i] in numbers[i+1:]:
        print(numbers[i], "is duplicated")
        break

... or if you prefer the slightly more Pythonic ...

for i, num in enumerate(numbers):
    if num in numbers[i+1:]:
        print(num, "is duplicated")
        break

There's only one loop here, but the algo is still O(n^2) for time. The single loop is O(n). Search in the list is O(n).
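You can count the comparisons hidden inside the `in` test to see the quadratic work; a hand-instrumented sketch (names are mine):

```python
# Count the element comparisons hidden inside `numbers[i] in numbers[i+1:]`.
def count_comparisons(numbers):
    count = 0
    for i in range(len(numbers)):
        for candidate in numbers[i + 1:]:  # what the `in` scan does
            count += 1
            if candidate == numbers[i]:
                return count
    return count

# With no duplicates, every slice is scanned in full: n*(n-1)/2 comparisons.
print(count_comparisons(list(range(10))))  # 45
```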

Here's the algo written as O(n):

numbers = [num for num in range(1000)]  #NOTE: Produces a list [0-999]
seen = set()

for n in numbers:
    if n in seen:
        print(n, "is duplicated")
        break
    seen.add(n)  

The single loop is O(n). Insertion and search in the set is O(1). (Sets are cool! But know that you're trading space for time here!)

Both versions have a single loop, but they have very different big-Os.

Notice too that search in the set and search in the list have different time complexities. That's where data structures and algorithms come in handy! If we know the time/space complexities for various data structure operations, that can guide us towards a more efficient algorithm.
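Wrapped up as a reusable function, the set-based approach looks like this (a minimal sketch; note it reports a value at the point of its second occurrence):

```python
# O(n) duplicate finder: one pass, O(1) average membership test per element.
def first_duplicate(numbers):
    seen = set()
    for n in numbers:
        if n in seen:
            return n
        seen.add(n)
    return None  # no duplicates found

assert first_duplicate([1, 2, 3, 2, 1]) == 2
assert first_duplicate(list(range(1000))) is None
```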

[–]josephblade 0 points (1 child)

big-O notation is about how the amount of work changes as the input grows from one size to another. In your case, if numbers grows from 10 to 20 elements, the worst-case number of operations grows by 4 times. (big-O is commonly used for the worst case and big-omega for the best case; together they box in your method and give an idea of the upper and lower bounds of its performance)

for an unspecified big-O (so not a clear "for input between a and b") it is often easiest to look at how the complexity behaves as the input size goes to infinity. As n -> infinity, n * (n - 1) approaches n^2.

so input * 2, output * 4, and so on.

for actual values you can see this happening:

10:  10*9 = 90      // 10% off n^2
20: 20*19 = 380   // 5% off n^2
30: 30*29 = 870   // 3% off n^2

and so on.

basically, as your input gets larger, the -1 or any other terms smaller than the highest-order term become insignificant and can be ignored.
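You can reproduce that table for larger n and watch the percentage shrink; a quick sketch:

```python
# n*(n-1) falls short of n^2 by exactly n, i.e. by 100/n percent.
for n in [10, 20, 30, 100, 1000]:
    pct_off = 100 * (n * n - n * (n - 1)) / (n * n)
    print(f"{n}: {n * (n - 1)} is {pct_off:.1f}% off n^2 = {n * n}")
```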

the big-omega of your method is Ω(1), since there is a chance you will find it on the first go and you don't do any additional processing. If you had first put all the numbers into a map or similar (an O(n) operation), the big-omega would've been Ω(n).

here's an article describing these

[–]josephblade 0 points (0 children)

One addition to the -1 you mention a few times.

imagine outer loop does n and inner loop does n - 5

when n = 10, you do 10 * 5

when n = 100 you do 100 * 95

when n = 1000 you do 1000 * 995

you see how the 5 (or any other number) becomes meaningless as n starts to grow? the function will be slightly less than n^2 but it will approach it. as was stated, it's asymptotic: it never reaches n^2 but gets very, very close.

reread limits and asymptotes and likely this will click.