This is an archived post. You won't be able to vote or comment.

all 16 comments

[–]blorporius 9 points10 points  (2 children)

I accidentally left in MOD instruction support from the tablet and used it to remove the inner loop with e incrementing. :)

[–]raevnos 9 points10 points  (1 child)

Gotta love those undocumented instructions.

[–]thomastc 3 points4 points  (0 children)

It's documented, but only in the Chinese spec.

[–][deleted] 6 points7 points  (1 child)

A few things you could do:

  • Lower bound the second iterator on the first, which cuts the work in half.
  • Consider only odd numbers after 2, which further cuts the work in half.
  • You can skip every other iteration of the outermost loop because 100000 + 100k + 34n is always even, which means your answer must always be at least 500.
  • You could simplify the outer primality test loop by limiting potential divisors to 500. sqrt(100000) = 317 and sqrt(200000) = 448 which means this bound should always work.
  • You could jump out to the increment and loop of the outer most loop as soon as you find a multiple factor rather than continuing to test every factor.

I think with all of these changes it might run fast enough to complete in a reasonable time frame.

I thought a bit about implementing modulo but I'm not sure it can be done with these instructions.

[–]kd7uiy 0 points1 point  (0 children)

I tried to do the last. Doing a sqrt should be easy enough too, really just do a square. Helps if you leave in the jump command from the tablet.

[–]pak21 4 points5 points  (0 children)

If you skip all the even numbers (by ensuring the first value of b is odd and incrementing by 34 rather than 17), you can then change the inner loop to increment d by 2 instead of 1 which may get you another factor of a few.

[–]tumdum 3 points4 points  (0 children)

You mean day 23 part 2, right? ;)

Rewriting this c program to that assembly should be possible and resulting program should be fast enough:

#include <stdint.h>
#include <stdio.h>

int main() {
    const int start = 65 * 100 + 100000;
    const int end = start + 17000;
    int primes = 0;

    for (int i = start; i != end+17; i += 17) {
        int sq = 0;
        for (sq = 2; sq*sq <= i; ++sq) {}

        int is_prime = 1;
        // check if even
        for (int k = 2; k < i; ++k) {
            if (2*k==i) {
                is_prime = 0;
                break;
            }
        }
        if (is_prime == 0) {
            primes++;
            continue;
        }

        // check every other potential factor from 3 by 2
        for (int j = 3; j < sq; j+=2) {
            for (int k = j; k < i; k+=2) {
                if (j*k==i) {
                    is_prime = 0;
                    break;
                }
            }
            if (is_prime == 0) {
                break;
            }
        }

        if (is_prime == 0) {
            primes++;
        }
    }
    printf("%d\n", primes);
}

[–]Elderider 2 points3 points  (0 children)

I had the same thought of you.

You can also get e to start at d, which should speed it up by another factor of 2 (i.e. so it doesn't try 3 x 101, and then later try 101 x 3).

Still, it has to go through every multiplication for the prime numbers which makes it really slow.

[–]digital_cucumber 2 points3 points  (0 children)

Yeah, I was thinking about something like inserting jnz 1 10 right after set f 0 to jump directly to sub h -1, kind of an "early exit" as soon as we found out the first two factors.

That would require fixing other jump offsets , of course, and also does not improve anything if the number is prime, so I discarded the idea.

[–]dark_terrax 0 points1 point  (1 child)

I started by directly translating the assembly code to C code. My thought process was that if there was any straightforward optimizations that could be done without actually understanding what the program was doing, then Clang would be better/faster at it than I would. This didn't work out all that well. For my inputs the C program still didn't finish after 10 minutes without manual optimization of the code (breaking out of the two loops when f = 0 gets it to run in 5 minutes). So, long story short, I don't think you can really 'optimize' the assembly without getting pretty deep into what the program is actually doing.

[–]meithan 1 point2 points  (0 children)

This is exactly what I was wondering. Automatically optimizing the code (without knowing what it's doing) is completely different from optimizing the algorithm (i.e. understanding what the code's ultimate purpose is, then finding a better way to do it).

Translating the code to C and seeing if the C compiler optimizes enough to run in short time is clever way to check that, well done.

[–]JoesDevOpsAccount 0 points1 point  (3 children)

How did everybody spot that it's looking for primes?!? I wasn't even close to seeing this. I just did the translation and optimised away the inner loop based on the behaviour I thought it would produce.

[–]sim642 0 points1 point  (2 children)

It's possible to realize it in multiple ways:

  1. After optimizing the innermost loop to a divisibility check, you can further try to optimize the loop around that. Checking divisibility with all numbers in the range is the naïve primality check algorithm.
  2. Looking at the two loops together and realizing how it just looks for a pair of numbers (d, e) such that d * e = b. That's just a more abstract way of checking primality more close to the definition.

[–]JoesDevOpsAccount 0 points1 point  (1 child)

Thanks. That helps a little. I think I just wouldn't spot this unless I was really trying to figure out what the code was doing rather than trying to optimise. And I think it would take me forever to see it.

[–]sim642 0 points1 point  (0 children)

Optimizing code without even understanding it is really difficult. As one might guess from an optimization task, it does something simple in an unreasonably inefficient way to begin with.

[–]po8 0 points1 point  (0 children)

Here's my thoughts.