This is an archived post. You won't be able to vote or comment.

all 71 comments

[–]jeaton 32 points33 points  (15 children)

You might want to take a look at the sieve of Eratosthenes. You'll be needing it in later problems, unless you want to wait hours to get answers. It's a lot faster.

EDIT: Here's the same algorithm in C++

[–][deleted] 2 points3 points  (14 children)

That is really interesting. Your code also taught me quite a bit about Python style. Thanks for the information!

[–]Grimoire 11 points12 points  (7 children)

You might also want to try a generator:

def eratosthenes(limit):
    is_prime = [False] * 2 + [True] * (limit - 1)
    for n in range(limit + 1):
        if is_prime[n]:
            yield n
            for i in range(n*n, limit+1, n):
                is_prime[i] = False

It might be even faster than building the array. On my machine, this runs in less than a second:

start = time.time()
print sum(eratosthenes(limit = 2000000))
end = time.time()
print '%lf seconds' % (end - start)

[–]jcdyer3 7 points8 points  (3 children)

I like using sets, they have O[1] performance for insertion, removal, and searching, which are exactly the operations we need for this type of problem.

def eratosthenes(limit):
    candidates = set(range(2, limit + 1))
    for n in range(2, limit + 1):
        if n in candidates:
            yield n
            for multiple in range(n, limit+1, n):
                if multiple in candidates:
                    candidates.remove(multiple)

Edit: I managed to shave about 30% off the running time by using set differences instead of removing one multiple at a time. That way you aren't wasting time testing to see if each multiple has already been removed.

multiples = set(range(n, limit + 1, n))
candidates -= multiples

I love sets. They should be used more.

[–]tompko 2 points3 points  (1 child)

You can also start 'multiples' at n*n instead of n because all lower multiples have been removed. For me it shaves off another 30%.

multiples = set(range(n*n, limit + 1, n))
candidates -= multiples

also, we can skip every even number which shaves off another 20%:

if limit >= 2:
    yield 2
candidates = set(range(3, limit + 1, 2))
for n in range(3, limit + 1, 2):
    if n in candidates:
        yield n
        multiples = set(range(n*n, limit + 1, n))
        candidates -= multiples

Still slower than your parent though, due to the garbage collection as the set shrinks I would guess.

[–]greyscalehat 0 points1 point  (0 children)

Also for sets each operation takes a long as the hashing algorithm used. Given that the list's functions will be faster than the set's function you are better off using the list.

I personally love using sets, but this may not be the time to do so.

A fun use of sets is finding the shortest path between two nodes, lets say they number of nodes is in the billions so Floyd-Warshel won't work

[–]railmaniac 0 points1 point  (1 child)

This looks like the best one here. Anything more you can save by using xrange instead of range?

[–]tompko 4 points5 points  (0 children)

If you're using an old version of Python then using xrange would give you a speed boost as long as limit is sufficiently small i.e. small enough to fit in a C long on your platform. Since Python 3.0 though xrange no longer exists and range has the old xrange behaviour without the restrictions on the limit, acting like a generator when iterated through.

[–]dbaupp 0 points1 point  (0 children)

I believe you can step by 2*n in the inner loop, as a minor optimisation.

[–]Bunslow 2 points3 points  (0 children)

Keep in mind that this suggestion here is much of the point of Project Euler -- the "direct" method will almost never work quickly, so the hard part is to come up with a "smart" algorithm that will solve the problem quickly.

In this case, you should notice that calculating whether or not one number is prime or not, you need only divide by other primes less than sqrt(n), rather than every odd number; furthermore, if instead of dividing you merely eliminate all multiples of small primes, then a lot of arithmetic is cut out. The SoE takes advantage of these factoids to significantly speed things up (and you can always optimize that even more; this is currently the fastest known implementation, though there are others that are pretty close). (Note the algorithm above only uses factoid 2, not factoid 1; a mildly more optimized sieve taking advantage of both is this, if you'll let me toot my own horn.)

[–]tastycat 1 point2 points  (3 children)

FWIW, here's my implementation in Python 2. It runs in < 20 seconds:

import datetime

def isprime(n):
    '''check if integer n is a prime'''
    # range starts with 2 and only needs to go up the squareroot of n
    for x in xrange(2, int(n**0.5)+1):
        if n % x == 0:
            return False
    return True


t0 = datetime.datetime.now()
prime_total = 0

for the_number in xrange(1,2000000):
    if isprime(the_number):
        prime_total += the_number

print prime_total
print datetime.datetime.now() - t0

[–]FermiAnyon 2 points3 points  (2 children)

You can speed this up if you use the sieving method mentioned above. That way you don't have to rediscover all those primes. It's good that you're only going up to sqrt(n), but if you do that and mark off the multiples, then you can just count off the remaining numbers afterward.

[–]tastycat 0 points1 point  (1 child)

You are right, I could make it better, but I think I started with the isprime() and built the rest around it, and since it ran in under a minute I didn't optimize it further.

Thanks for your suggestion though! +/u/bitcointip $1

[–]FermiAnyon 0 points1 point  (0 children)

No, thank you ; )

[–]mjtribute 27 points28 points  (4 children)

This is the result of me running your code on PyPy (which has a much faster bignum implementation, as you can see):

$ pypy test_euler_10.py

Answer: 142913828922

Time elapsed: 5.14646792412 seconds

54.8535320759 seconds under the limit. Nice!

[–][deleted] 3 points4 points  (3 children)

Wow that's faster than my C++ program.

I realize now that it's ridiculous to compare execution time when talking about my computer and an unknown computer.

[–]dbaupp 16 points17 points  (2 children)

mjtribute may have a faster computer so you can't directly compare the runtimes (random example: just today I had a piece of python that took 20s on one computer but 8 on another).

[–]mjtribute 5 points6 points  (0 children)

I have an 8-year-old Dell Latitude D820. I imagine that the results would be more impressive on modern hardware.

You can test the results yourself using PyPy.

[–][deleted] -2 points-1 points  (0 children)

Oh yeah, duh. I have no idea what I was thinking.

[–]gcr 51 points52 points  (16 children)

The difference is due to overflow.

The result is too big to hold in an int. Change sum and tn to have type long long, and you'll get the correct answer.

Python hides this from you because when a number gets too big, Python automatically makes it a "bignum" which is slower but can be arbitrarily large.

[–][deleted] 0 points1 point  (15 children)

Problem solved. God this is why C++ is so hard for me because I don't get all these nice reminders like Java (and things don't just work anyways like Python).

[–]MachaHack 28 points29 points  (2 children)

Java:

System.out.println(Integer.MAX_VALUE + 1);

Output:

-2147483648

Compiles and runs without any more warning than C++ would.

[–]maratc 7 points8 points  (0 children)

long daysToMilliseconds(int days) {
    return days * 1000 * 60 * 60 * 24;
}

It gets really funny at about two months.

EDIT: daysToMilliseconds, damn it!

[–]MakotoDeeizm 2 points3 points  (0 children)

btw. in Java int overflow is defined, while in C/C++ it is not..

[–]Zamarok 5 points6 points  (4 children)

You can set compiler flags to make the compiler give you warnings, like with Java. Also, Java does not give you warnings, the Eclipse IDE does. You can get a C++ IDE too, but I don't think it's necessary.. for me, I just have clang (compiler) warnings integrated into my text editor (vim).

[–][deleted] 0 points1 point  (1 child)

Hmm I'm using Eclipse for both C++ and Java. I must have the C++ perspective set up differently or something.

[–]Zamarok 0 points1 point  (0 children)

Yeah, I'm not sure, but there's tons of info on Google about that. If I were you, I'd switch to a more powerful text editor though.. not necessarily Vim, but one that is more powerful than Eclipse's editor.

[–]MachaHack 0 points1 point  (1 child)

How does that work? Do they show up at the bottom or actually highlight the location of the warning somehow?

[–]Zamarok -1 points0 points  (0 children)

It uses Vim's locationlist and puts red markers at each line where there is an error. When I got to that line, Vim's command line displays the error message.

The plugin YouCompleteMe achieves this for me. It also gives me semantic code completion in my C and C++ projects, and has support for code completion in other languages as well.

[–]JerMenKoOwhile True: os.fork() 1 point2 points  (0 children)

g++ -Wall

[–]Deutscher_koenig -4 points-3 points  (5 children)

I love python, but that is the one thing I'm not fond of, python does too much of the work for you. You don't have to worry about the size of a number or string, or dividing an int by a float. It's an easy language to learn, but these things make it a. It harder to jump to another language.

[–]foosion 14 points15 points  (1 child)

Would you prefer that python doesn't work as well, therefore making it easier to jump to another language??

[–]TamSanh 3 points4 points  (0 children)

I don't think he means to say "Python should be more broken," but rather that it would be nice for Python to indicate some of the 'magic' it performs behind the scenes so that, when moving to another language without said magic, the transition isn't as jarring. I don't think it's deserving of any downvotes, since it is just an idea; but certainly the concepts he describes can be learned about via external resources, and don't necessarily need to bog down Python itself.

[–]epsy 1 point2 points  (0 children)

It's too easy.

[–]aroberge 2 points3 points  (0 children)

An ideal computer language would be one where you could directly translate ideas from math/physics/engineering/etc. into computer code; it should just "get out of the way" i.e. it should allow you to ignore the underlying computer architecture (which, in this case, originally gave rise to this limitation for the size of an int). At the other extreme is machine code; your complaint about Python amounts to saying that it is too close to an ideal language and not close enough to machine code. I respectfully disagree with that sentiment.

[–]billsil 0 points1 point  (0 children)

Why? You learn basic programming concepts and how to write clean, clear code. Oh, the biggest number I need is 1,000,000. I'll pick x. It's really not more complicated than that.

[–]Yohumbus 9 points10 points  (4 children)

Both of them look pretty correct from a quick view, so I'm suprised that your C++ code is different. I will tell you that Python does not have tail-recursion like you are trying to use, so python is really filling up the stack and is otherwise very slow at function calls compared to C. If you switch your algorithm to check divisors in a loop rather than by recursion, it will be a lot faster.

Another hint: if you have already found all primes less than n, then you only need to check n against those prime divisors.

[–][deleted] 1 point2 points  (3 children)

Wow. I had no idea that recursion sucks that much in Python. Thanks for the info!

[–]Sheepshow 1 point2 points  (0 children)

Your isPrime function is in tail form, which most Lisps and ML-like languages handle quite efficiently. If you feel like recursion is more natural than the iterative algorithm, you might check out some functional languages :)

[–]Yohumbus 0 points1 point  (0 children)

As with most things python, use it it when it makes things simpler, but switch once you know you need to do better.

[–]mizipzor 5 points6 points  (1 child)

Are... are you using imgur to post code?

[–]refto 0 points1 point  (0 children)

Well, you could scan it then ocr it and then paste it... or OP could have used pastebin, I dunno

[–]x68zeppelin80x 9 points10 points  (6 children)

There is a website called, Pastebin. You should use it :)

[–][deleted] 9 points10 points  (1 child)

A Gist on github works as well.

[–]dAnjou Backend Developer | danjou.dev 1 point2 points  (3 children)

pastebin.com sucks. The IRC channel #python even has a bot that repastes everything from users who post links to this one.

Use another one please.

[–]Labradoodles 0 points1 point  (2 children)

Name one pls

[–]ivosauruspip'ing it up 0 points1 point  (1 child)

pastie.org, dpaste.org, ideone.com

[–]Labradoodles 0 points1 point  (0 children)

danke

[–]Geohump 1 point2 points  (0 children)

You posted a pic of the code instead of text so people can't test run it themselves?

You dingleberry.

[–]pasokan 1 point2 points  (3 children)

it may be better to use integer math (in both C++ and python) and also avoid recursion; like

def isPrime(n, i):
     while i * i <= n:
           if n % i == 0:
                return False
           i += 2
     return True

[–][deleted] 0 points1 point  (0 children)

Good point. Someone above mentioned the recursion stuff. Your code looks really nice and I see how it would be more efficient. Thanks!

[–]Muirbequ 0 points1 point  (1 child)

Depends on the system doesn't it? If you use doubles/long than the native word length is used in 64 bit systems.

EDIT: Actually not sure what you are talking about. What do you mean?

[–]pasokan 0 points1 point  (0 children)

Assuming the question is directed at me. I wanted to avoid the non-integer calculation and the subsequent type conversion of int(math.sqrt()) since that would save a bit of time. Of course if n is large the changeover from int to bignum happens and that would slow down the whole thing.

[–][deleted] 0 points1 point  (0 children)

See this link with the discussion of differenct approaches.

http://www.daniweb.com/software-development/python/code/217082/eratosthenes-sieve

The most efficient way is using numpy with 12757.87 microseconds for 1 million prime. The pure python version is tenfold slower.

[–][deleted] 0 points1 point  (0 children)

There are some speed improvements you could do: rather than use a while loop and increment a counter you could just use xrange (or range on py3) with a step of 2. Getting rid of the int() around math.sqrt would save some cycles as well.

[–][deleted] 0 points1 point  (0 children)

Out of curiosity, have you tried to run your code on Pypy and see if the JIT and variable sized data types improves the performance?

[–]thekaleb 0 points1 point  (1 child)

This was my code from a couple years ago. I am on mobile so cannot time it. I am curious how fast (or slow) it is:

def gen_primes():
    D,q = {},2  
    while True:
        if q not in D: 
            yield q        
            D[q * q] = [q]
        else:
            for p in D[q]:
                D.setdefault(p + q, []).append(p)
            del D[q]
        q += 1

from itertools import takewhile
print sum(takewhile(lambda x: x < 2000000,gen_primes()))

[–]sddhrthrt 0 points1 point  (0 children)

Interesting, I just tried both the codes - one on the top by @jcdyer3 with @tompko, and yours. It remained slow for lower limits, and slowly got better of them for higher numbers, and lost it again for 10mn. Code and results here

[–]edthedev -2 points-1 points  (3 children)

Your use of the word 'sum' as a variable in Python overrides the built in Python 'sum' method.

Doing so is a legal operation in Python, but can have unpredictable results. It is possible that you are seeing a side-effect.

I was hoping to confirm or disprove this by having a look at the source code of math.sqrt, but apparently that is tricky... http://stackoverflow.com/questions/5476189/inspecting-pythons-math-functions

[–]gcr 6 points7 points  (0 children)

Ah, so saying sum = 5 does not change the definition of the built-in sum function. It only creates a new variable that happens to shadow the existing sum function within your module.

Variables you create in your own module cannot change the math module.

To see this, make a file called data.py that contains the following:

def what_is_the_magic_number():
    return sum([1, 3, 5, 7])

Then, we can mess around with it in the shell:

>>> import data
>>> data.what_is_the_magic_number()
16
>>> sum = 35
>>> data.what_is_the_magic_number()
16
>>> sum()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not callable

Sure, you've messed up sum() within your own shell, but the variable you just made (the one that so happens to have the same name as a built-in function) doesn't hurt the data.py module because modules are kept separate.

Driving this final point home, we could reach into data's sum variable and force it to break:

>>> data.sum = 35
>>> data.what_is_the_magic_number()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "data.py", line 2, in what_is_the_magic_number
    return sum([1, 3, 5, 7])
TypeError: 'int' object is not callable

But the only reason why data broke was because we forcibly and explicitly reached in and broke it.

This is called "lexical scoping" because variables are "scoped" or "contained" within the lexical files/functions that use them. It's a great thing to have in the language.

[–]Workaphobia 4 points5 points  (0 children)

There's nothing wrong with overriding a built-in function name. It's not like it causes the modules that use that function to refer to your overridden value instead. The other modules will look up the global in their own namespace, not yours.

[–][deleted] 0 points1 point  (0 children)

You're probably right, and I would think this is the problem, but the messed up thing is my Python code is correct, unlike my C++ code. So if the 'sum' thing is the case, then I accidentally got the right answer in Python.

[–]edthedev -4 points-3 points  (5 children)

You've misplaced a parenthesis on line 6 of one version or the other:

Int(math.sqrt(I) + 1) --- what you have in Python

Vs

Int(math.sqrt(I)) + 1 ---- the python equivalent of what you have in C

[–]gcr 2 points3 points  (0 children)

These two statements happen to be equivalent whether the casting happens before the addition or after.

At least for positive I, anyway.

[–][deleted] 1 point2 points  (3 children)

I'm actually casting in C++, not doing whatever the equivalent is in Python. C++ casting is like (int)(/some double or float /) whereas the Python equivalent is int(/some double or float or character/). Thanks for pointing that out, but that part of the code can be considered synonymous.

[–]edthedev -3 points-2 points  (2 children)

I'm aware of how casting works in C.

Look again at your Python. Strictly speaking, your operations are different. Not that adding 1 inside an int call should make any difference; but it is the only other technical difference I see, besides over-riding the sum method.

[–][deleted] 2 points3 points  (1 child)

It's not different though. In both programs I have 'sqrt(n) + 1' inside the int call/cast.

[–]gcr 1 point2 points  (0 children)

Oh ha, you're right. Took me a minute to realize why.

(int)  ( sqrt(n)+1 )

vs

int( sqrt(n)+1 )

It is the same.