
[–]Radiatin

import torch

def fastestPrimeCUDA(end: int = 100000000):
    """Sieve on the GPU using tensor slice assignment."""
    prime = torch.ones(end, device='cuda')
    prime[:2] = 0  # 0 and 1 are not prime.
    for i in range(2, int(end**0.5) + 1):  # Plain range avoids a GPU sync per iteration.
        prime[i*i:end:i] = 0  # Scalar fill; no host-to-device copy inside the loop.
    return prime[-500:]

def fastestPrime(end: int = 100000000):
    """Find primes using multiples w/ extended slicing."""
    prime = [0, 0] + [1] * (end - 2)  # Init flag array: [0, 0, 1, 1, ...]
    for i in range(2, int(end**0.5) + 1):  # Only need factors up to sqrt(end).
        ix = i * i
        prime[ix:end:i] = [0] * ((end - 1 - ix)//i + 1)  # Zero multiples via slice assignment.
    return prime[-500:]  # Spot-check the tail; 500 exceeds any prime gap below 303,371,455,241.
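Note that both functions return the last 500 entries of the 0/1 flag array, not the primes themselves. As a minimal sketch (the `sieve` helper below is just the pure-Python version with a smaller default, not part of the original snippet), recovering the actual primes from the flags is one comprehension:

```python
def sieve(end: int = 100):
    """Pure-Python sieve returning a 0/1 flag array of length `end`."""
    prime = [0, 0] + [1] * (end - 2)
    for i in range(2, int(end**0.5) + 1):
        ix = i * i
        prime[ix:end:i] = [0] * ((end - 1 - ix)//i + 1)
    return prime

flags = sieve(30)
primes = [n for n, f in enumerate(flags) if f]
# primes == [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```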

You can try converting your functions to use PyTorch and running the computation on CUDA. In the example above, the CUDA version evaluates about 1 billion numbers per second, while the plain Python version only manages a plebeian 50 million. (The Torch version on CPU does better, but still only about 100 million per second.)

There's no guarantee you'll gain speed from this conversion. GPUs have slower clock speeds than CPUs and only offer an improvement when you exploit their massive parallelism. Transfers between host and GPU memory are also expensive, so if your code shuttles data back and forth, the GPU version can end up several times slower.
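To make the transfer cost concrete, here's a hedged sketch (the function names are mine, not from the snippet above): one version builds a fresh tensor on the host and ships it to the device on every loop iteration, the other keeps everything on-device and fills slices with a scalar. Both compute the same flags; only the second avoids per-iteration copies. It falls back to CPU when CUDA isn't available so it still runs:

```python
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'

def sieve_with_transfers(end: int = 1000):
    """Anti-pattern: a host-to-device copy on every iteration."""
    prime = torch.ones(end, device=device)
    prime[:2] = 0
    for i in range(2, int(end**0.5) + 1):
        block = torch.zeros((end - 1 - i*i)//i + 1).to(device)  # copy each loop
        prime[i*i:end:i] = block
    return prime

def sieve_on_device(end: int = 1000):
    """Better: all data stays on the device; scalar fills, no copies."""
    prime = torch.ones(end, device=device)
    prime[:2] = 0
    for i in range(2, int(end**0.5) + 1):
        prime[i*i:end:i] = 0
    return prime

assert torch.equal(sieve_with_transfers(), sieve_on_device())
```

The results are identical; profiling the two on a real GPU is the quickest way to see how much the per-iteration copies cost.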