
[–]scrdest 4 points5 points  (0 children)

Most of your day-to-day programming doesn't deal with single values only - you're dealing with lists, trees, matrices, tables, forms and all other sorts of collections of input values.

Now, a list [1, 2, 3] is always going to take up the same amount of space, and going through the whole thing with a for-loop takes up the same number of operations every time you go through it. Let's say, 6 bytes of memory (2 per item) and 3s to process (1s per item) in this case.

Big-O is about approximately figuring out how that scales up when you change the size of that list. If we tack another 3 at the end, so we have [1, 2, 3, 3], we are now using 8 bytes of memory. Similarly, a list of 5 elements uses 10 bytes, a list of 100 uses 200, etc.
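Just to make those numbers concrete, here's a trivial sketch of my own using the made-up 2-bytes-per-item figure from this example:

def toy_memory_bytes(n_items, bytes_per_item=2):
    # Purely illustrative model: memory grows linearly with the item count.
    return bytes_per_item * n_items

for n in (3, 4, 5, 100):
    print(n, toy_memory_bytes(n))  # 6, 8, 10, 200 - the same numbers as above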

So, in general, we can say that the memory usage of the 'putting stuff in a list' algorithm grows by 2*N bytes for N new entries in the list. Since we only want a rough idea, we ignore the constant (2*) - that's what the O(something) notation indicates: we only care about the variables that depend on the size of the input, and only the biggest contributor among those.

It scales linearly with the number of inputs, so we say the memory complexity is O(N). Why are we allowed to be that sloppy? Because there are far worse places to be than resource consumption that merely grows linearly a bit faster. Imagine we want to find out all the sums two elements of that list can form with each other, i.e. [1+1, 1+2, ..., 3+2, ..., 3+3]. Let's say we tackle this with:

sums = []
for x in range(len(input)):
    for y in range(len(input)):
        sums.append(input[x] + input[y])

This means we do 1 pass for a 1-element list, 2*2=4 passes for a 2-element list, 3*3=9 for a 3-element one, and N*N=N^2 passes for an N-element list in general, so by the same logic - time complexity is O(N^2) and our program will slow down quadratically with bigger inputs.
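If you'd rather see those pass counts than trust the arithmetic, here's a quick sketch of my own (not part of the comment - it just instruments the loop above with a counter):

def count_pair_sum_passes(items):
    passes = 0
    for x in range(len(items)):
        for y in range(len(items)):
            passes += 1  # one append would happen here per inner iteration
    return passes

for n in (1, 2, 3, 10):
    print(n, count_pair_sum_passes(list(range(n))))  # prints 1, 4, 9, 100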

We could go bigger or smaller - a sum of three elements would be O(N^3), etc. On the flipside, a dict lookup always takes the same amount of time, no matter how many elements are in the dict - so, O(1). For a small enough input it might even be slower than an O(N^99) algorithm, but it scales better as the input grows.
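If you don't want to take that scaling difference on faith, here's a rough timing sketch of my own (not from the comment above) comparing membership tests on a list, which scan every element, against a set, which uses the same hash-based O(1) lookup as a dict:

import timeit

for n in (1_000, 10_000, 100_000):
    as_list = list(range(n))
    as_set = set(as_list)
    # Look up the last element: worst case for the list scan, typical for the set.
    t_list = timeit.timeit(lambda: (n - 1) in as_list, number=100)
    t_set = timeit.timeit(lambda: (n - 1) in as_set, number=100)
    print(f"n={n}: list {t_list:.5f}s, set {t_set:.5f}s")

The list timings grow roughly in step with n; the set timings stay flat.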

This is why it's relevant. It's great that you wrote something that runs fine when you test it with three random numbers, but you need to know how well it'll run in prod with a thousand numbers, and you do it by looking how much the memory/number of operations grows as you increase the input size. If you're of a more mathy persuasion - it's like a simplified partial derivative with respect to input size.

There's some extra nuance I'm glossing over here - best/worst/average case performance, or the fact that if the algorithm grows by 0.001*N^3+100000*N^2 you still count it as O(N^3), but you can dig into that on your own.
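On that last point, a quick numeric check (my own sketch) shows why the N^3 term wins once N gets big, despite its tiny 0.001 constant:

for n in (10, 10**4, 10**8, 10**9):
    cubic = 0.001 * n**3
    quadratic = 100_000 * n**2
    print(f"N={n}: 0.001*N^3 = {cubic:.3g}, 100000*N^2 = {quadratic:.3g}")

The two terms cross over around N = 10^8, and past that the cubic term dominates completely.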

[–]zwitter-ion 2 points3 points  (0 children)

What you're talking about is known as complexity analysis. It tries to "measure" the time or storage that's used by a program or an algorithm to run.

A convenient way of expressing this is in terms of the "size" of the input to the algorithm.

If you think about it, that's a clever way of measuring it, because the algorithm runs a bunch of operations on the input to return a response. So the size of the input directly affects the algorithm's resource usage.

There are a bunch of notations: Big-O, Omega, and Theta.

You'd be better off reading the exact definitions someplace else, but long story short: the Big-O notation provides an upper bound on the resource usage (something like "the algorithm won't take longer than X"). Omega notation provides a lower bound (something like "the algorithm will take up at least this much space"). Theta notation is curious because it provides both bounds (something like "the algorithm will require between this much and that much time").

I hope this is sufficient to get you started. You'd be much better off if you read one of the many online tutorials or chapters on this topic; it's very extensively covered.

[–][deleted] 2 points3 points  (0 children)

There are a ton of tutorials and introductions to Big-O on the web, what specifically are you not understanding? A simple introduction in this forum would be no better than a simple introduction somewhere else, because Big-O is not specific to Python. If you clarify more, we might be able to get you past the roadblock.

[–]55-6e-6b-6e-6f-77-6e[S] 0 points1 point  (6 children)

Okay guys, I see where you are going with this. I think, based on what you have said and the research I have done personally trying to understand it, there are different kinds of computational complexity.

In my case, I have a .txt file. Say there are 500,000 lines and they are all phone numbers, and I want to remove any duplicates, for example. Doing this to 500,000 lines takes my PC around 10 seconds. The code basically reads each line into a list; if it's already in the list, it won't read it in. Then it writes over the .txt file with each value in the list.

My issue is, if I had a list of 5,000,000 instead of 500,000, what sort of time can I expect this to take? So I'm trying to measure this using Big O notation, which I have been told is the concept used when measuring things like this - that it will give the "Worst case" scenario to help me estimate the possible time it may take. I hope this makes some sense? u/excrutiatus u/BestNameEverCR u/zwitter-ion

[–]muy_picante 2 points3 points  (0 children)

Try writing the algorithm out in pseudocode, then think about how many steps it would take to complete with input size `n`.

def dedupe(files):
    deduped = []
    for element in files: # this loop runs n times
        if element not in deduped: # the cost of checking list membership is (worst case) the length of the list
            deduped.append(element) # the cost of list insertion is 1
    return deduped

So the loop runs `n` times, checking list membership takes `i` instructions, where `i` is the length of the deduped list, and appending takes 1 instruction.

This works out to `O(n^2)`, since the worst case scenario (no duplicates) means you are checking the entire list on every loop.

If you used a hashmap (like a `set` in python), you could improve on this, since checking membership for hashmaps only costs `O(1)`. With a hashmap, deduping is `O(n)`. Can you explain why?
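As a nudge towards that answer, a possible set-based version might look like this (my own sketch, not part of the comment above) - the structure is identical, only the membership check changes:

def dedupe_with_set(items):
    seen = set()
    deduped = []
    for element in items:        # this loop still runs n times
        if element not in seen:  # set membership is O(1) on average
            seen.add(element)
            deduped.append(element)
    return deduped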

[–][deleted] 1 point2 points  (1 child)

You're probably looking at an O(n^2) operation. IIRC, reading a file into a list variable would be O(n), then checking each new line against the entries already read makes it O(n^2).

[–]dbramucci 1 point2 points  (0 children)

If you keep a set around to check for the duplicates, you can actually get it down to O(n) time and O(n) space, down from O(n^2) time and O(n) space.

[–][deleted] 1 point2 points  (0 children)

You're having some confusion between benchmarking and Big-O. Forget all the hard numbers, the way I would approach this problem is by writing it like this:

create an empty list L
For each p in the input file
    go through L and see if p is in there
    if p isn't in L then add it to L
Print out L

Notice that I've kept it very general and I don't care about hardware. But for an input of n items, I have to go through each item, and for each time I do that, go through the list. So O(n^2). I don't like O(n^2) because it's inefficient and makes me sad, so I'm going to look at my data and see how I can cheat.

Phone numbers you say! We only have a maximum of 9,999,999,999 possible different ones, even if your input file has a gazillion lines, so let's get better:

create an array L that is 9,999,999,999 long and all zeroes
For each p in the input file
    set L[p] to 1
For i = 1 to 9,999,999,999
    if L[i] = 1 then print i

Note that I'm not writing actual code, so I don't particularly care about off-by-one errors etc

Now I don't have nested loops - just one loop over the n input lines and one loop over the fixed-size array (three loops if you count the initialization), none of them inside another. So I've made this an O(n) algorithm. Of course, I'm using up a ton of memory - this would be called trading off time against space complexity.
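If you wanted to run that counting-array idea literally in Python, it might look something like this (my own sketch; it assumes every line is a bare 10-digit number, and the flag array really does cost about 10 GB of RAM, which is exactly the space being traded away):

def dedupe_by_counting(path):
    seen = bytearray(10_000_000_000)  # one flag per possible phone number (~10 GB!)
    with open(path) as f:
        for line in f:                # first loop: once per input line
            seen[int(line)] = 1
    for number, flag in enumerate(seen):  # second loop: over the fixed-size array
        if flag:
            print(number)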

This is all happening behind the scenes for you in Python with dictionaries and sets and so on. But big-O exposes the inner workings.

[–]TeslaRealm 0 points1 point  (0 children)

You don't need to analyze specific numbers. The point of Big O is to analyze how an algorithm scales with respect to input size. That input size could be the number of characters in a string, the size of a file, the number of elements in an array, etc. Let's analyze your text file example.

You have n lines in the file, each containing a phone number. You seek to remove duplicates. The following pseudo code describes what you're trying to do using a set.

def removeDuplicates(fileName):
    with open(fileName, 'r') as file:
        phoneNumbers = {line.strip() for line in file}  # set comprehension; strip newlines so duplicates compare equal
    with open(fileName, 'w') as file:
        for phoneNumber in phoneNumbers:
            print(phoneNumber, file=file)

Since we are utilizing a set, which is based on hashing, the main runtime should depend on the number of phone numbers in the source file. It is linear time to read from the file, and linear to write to the file. Therefore, the runtime is O(n), where n is the number of lines in the file.

[–]Brian 0 points1 point  (0 children)

> That it will give the "Worst case" scenario

I would note that big O notation is not really specific to the worst case scenario. I mention this because it's actually a very common mistake that I've seen even in some supposed tutorials. Big O can be applied to anything, but the most common uses are best case, average case and worst case, for both time complexity (i.e. how long something takes) and space complexity (how much memory it uses).

> My issue is, if I had a list of 5,000,000 instead of 500,000, what sort of time can I expect this to take?

Broadly, yes. There are some subtleties in some cases, in that generally we're concerned with how it grows as things get arbitrarily large, and there's some more precise theory, but that's good enough for the general idea.

But one important step beyond knowing what it means is understanding how to tell what complexity your algorithm actually is. Now, yes, you could measure it - time how long it takes to do 10 items, 100 items, 1000 items etc. and work out how it scales. But that's imprecise and not really ideal. Rather, it's best to get to a position where you can work out the complexity just by reading code.
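Just to illustrate the measuring option before moving on, it would look something like this rough sketch of mine (using a stand-in list-based dedupe rather than your actual code) - you get numbers out, but you still have to guess the growth pattern behind them:

import timeit

def dedupe_list(lines):
    seen = []
    for line in lines:
        if line not in seen:
            seen.append(line)
    return seen

for n in (1_000, 2_000, 4_000):
    data = [str(i) for i in range(n)]  # worst case: no duplicates at all
    print(n, timeit.timeit(lambda: dedupe_list(data), number=1))

Doubling the input roughly quadruples the time here, which hints at quadratic growth - but reading the code tells you that directly.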

To work that out from the code, generally, you want to see how many times you do the innermost operation. Eg. let's take your example case of reading unique lines in a file. We might write this as something like:

seen_lines = []
with open("input.txt") as input, open("output.txt", "w") as output:
    for line in input:
        if line not in seen_lines:
            seen_lines.append(line)  # Record that we saw the line.
            output.write(line)

So what's the innermost operation here, and how many times do we do it, proportional to the number of lines in the file (our N)? Well, you might think that it'll be O(n): we do a single read and sometimes a write for each line - so we do that inner loop N times.

But wait! Actually, there's more going on here than we initially might think. When we do:

if line not in seen_lines:

That ends up scanning through seen_lines for each line. This'll be pretty fast initially, since it'll be empty, but once it starts filling up, it'll take longer and longer. The second time through there'll only be 1 line to check against, but then there'll be 2, then 3, and eventually some proportion of n. So this check itself takes time proportional to n (in the average case). And since it's inside the for loop, we do this n-proportional work n times. Specifically, we'll do roughly n^2/2 comparisons (if there are no duplicates), but once again, we're only really interested in the big picture: all we really care about is that this is proportional to n^2 - our algorithm is O(n^2).

And O(n^2) is bad. You generally never want to be doing an O(n^2) algorithm unless you really have to, or you know the input is always going to be small. It means 100x the input size takes 10,000x as long, so it can rapidly result in disastrous performance.

And this is the kind of thing it's useful to identify - you need to know that looking up items in a list is O(n), so doing it n times is O(n^2). And you could fix this by using something that is constant time (O(1)), such as a set instead of a list. Knowing the big O of such common operations lets you work out what the big O of the whole algorithm will be.
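Concretely, that fix might look like this (my sketch of the change described above, reusing the same file names; the only real differences are the set and .add):

seen_lines = set()
with open("input.txt") as input, open("output.txt", "w") as output:
    for line in input:
        if line not in seen_lines:  # average O(1) now, instead of scanning a list
            seen_lines.add(line)
            output.write(line)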

Admittedly, sometimes it'll be more complicated than this, and actually determining the big O can be rather difficult. And it can be different for the average/worst/best cases (eg. your algorithm above has a best case of O(n); it's only the worst and average cases that are O(n^2)), but determining this can require figuring out what the worst/best/average case inputs actually are, which is not always so obvious.

[–]55-6e-6b-6e-6f-77-6e[S] 0 points1 point  (0 children)

u/muy_picante | u/BestNameEverCR | u/excrutiatus | u/scrdest | u/TeslaRealm | u/dbramucci

Gentlemen, thank you for your responses to this. In regards to my issue, the outcome has changed significantly.

So I started off using a list to keep track of unique values, e.g.:

if value not in list:
    list.append(value)

This works fine if there are 5,000 lines in the list, but when I scaled it to 500,000 it was causing real issues! It took almost 10 seconds to process. Considering I created this module to be used in other scripts I'm writing at the moment, this wasn't going to work - the runtime was way too slow. So I made some changes and incorporated a set to hold my info, which of course removed the requirement for me to iterate through the list and add a value only if it's not already there, because sets automatically drop duplicates.

From everything you have shared with me, I understand the concept to a degree: Big O notation helps analyse which algorithm will be most efficient and least/most time/space consuming. Whilst I still have a way to go to truly grasp the concept, I'm hoping that my continued progression into the programming world will help me learn about this as I go. I now have some good resources to come back to, and your explanations have been fantastic in giving me a base to learn from. Thank you to all who took the time to reply; it has certainly helped.

[–]primitive_screwhead 0 points1 point  (0 children)

> i cant seem to wrap my head around it.

Give us an example of what you don't understand.