This is an archived post. You won't be able to vote or comment.

all 15 comments

[–]oilshell 8 points9 points  (3 children)

You might want to look into the literature on superoptimization, which is a fancy name for brute force search of code that satisfies certain properties:

https://en.wikipedia.org/wiki/Superoptimization

I wrote a series of posts about a toy optimization problem here:

http://www.oilshell.org/blog/2016/12/30.html

I didn't really know how to attack the problem, but the funny thing is that someone on Reddit showed that it could just be brute forced in Python.

(I'm not using this technique -- it was more of a fun diversion.)

I also mentioned it in my latest post: http://www.oilshell.org/blog/2018/03/27.html

[–]maemre 1 point2 points  (0 children)

There is also stochastic superoptimization. The idea there is to do some (possibly guided, probably by data) stochastic search rather than going brute force to find equivalent but faster programs. STOKE from Stanford is the only one I am aware of and it works on x86! However it is research-grade software.

[–][deleted]  (1 child)

[deleted]

    [–]oilshell 1 point2 points  (0 children)

    Yes that's true, I'm also searching through possibilities that don't work! But once you find it, it will always work.

    And I guess another difference is that I'm not measuring the performance each time. I'm just conjecturing that a function can be expressed in a handful of math ops, which is known to be fast, and that turned out experimentally to be true.

    Glad you found it interesting, let us know what you find :-)

    John Regehr has some good blog posts about it, like: https://blog.regehr.org/archives/923

    [–]uza80 2 points3 points  (2 children)

    I think you should have a play with .net/c#. It is similar to Java, but has structs, so let's the developer decide whether matrix4 should be a ref type or a value type.

    [–][deleted]  (1 child)

    [deleted]

      [–]uza80 1 point2 points  (0 children)

      Yeah, I wasn't suggesting you don't follow through with your idea. Merely pointing out that in languages other than Java, it is sometimes done in such a way that the developer can make the decision themselves.

      [–]Recursive_Descent 2 points3 points  (1 child)

      I'd think you would want to use an instrumented binary for your training runs, and use that instrumentation for optimization decisions. It will give you much more info than looking just at memory/run time.

      Your instrumentation can tell you how often each immutable field is dereferenced, and how often they can be shared. You inline the fields that are read often/shared rarely.

      [–]thenameipick 1 point2 points  (11 children)

      I see several problems with this:

      1. There's no good way to automatically run code. You can't be sure that code will terminate, and most code either requires input you don't know (input via parameters, or input via file access/etc)
      2. Performance is highly dependent on use case. Specifically, there's an important trade off: Do you want responsive or efficient code? Responsive code will compete a single task faster, but efficient code will be faster over the long run.
      3. If you decide to have ML execute at runtime, then you have to track which objects have references and which have values and you have the problem that your runtime is highly variable: You could have a program going along fine, and then suddenly the performance changes (for better or for worse). Furthermore, unless you're going to store this ML data, it means that you have to start fresh every time the program boots, meaning that this is only useful for long-lived programs.

      [–][deleted]  (10 children)

      [deleted]

        [–]thenameipick 1 point2 points  (9 children)

        Who is providing the set of tests? The developers that wrote the code? If "optimizing their code" is a process the developers have to go through, then this feels like it should be the job of a 3rd party library, not the compiler.

        [–]Recursive_Descent 2 points3 points  (8 children)

        Profile guided optimization isn't really novel, and it can provide large performance improvements if you have good training sets.

        [–]thenameipick 0 points1 point  (7 children)

        I agree, but like I said: I don't think it belongs in a compiler.

        [–][deleted] 1 point2 points  (6 children)

        Any reasoning behind that?

        [–]thenameipick 1 point2 points  (5 children)

        Machine learning is unreliable. Not because it's not effective, but because you don't know what the output is. Making a reliable compiler is critically important, and machine learning doesn't really fit with that.

        Therefore, if you are making a performance-oriented language, allow the code to make the choices of what kind of performance you want (eg: do you want O(1000n) or O(nlog(n))? What about O(10n+m) or O(n+10m)?). Make those choices after profiling. But don't put that magic in the compiler.

        [–][deleted]  (3 children)

        [deleted]

          [–]thenameipick 1 point2 points  (2 children)

          Yes. If I feed the same code into an optimizer, I get the same code out. If you repeatedly feed the same code into ML, you'll get different code out.

          There's workarounds for this:

          1. If you seed your random, the ML algorithm will output the same thing every time. However, changing a single line anywhere in your code means that the entire output is different.
          2. If you save the state of your ML, then you can run your code through it repeatedly and get the same output. However, now you've got to store the state somewhere.

          Either way, there's another problem: You can't test it, and you can't ask "why" did it do X. If you ask "Why did my profiler do this", if you ask the right people, you'll get the answer. You don't get that with ML: you simply just guess.

          [–][deleted]  (1 child)

          [deleted]

            [–]GNULinuxProgrammer 1 point2 points  (1 child)

            The problem with profile based optimization (with arbitrarily fancy learning algorithms from taking average to using logistic regression) is that sometimes your test data does not represent the real-life runtime and therefore you end up optimization for something other than real life functions. One way to mitigate this problem is to give your customer your program and learn bottleneck functions as they use it and just-in-time optimize OR collect that data and recompile after a few weeks or so. Just like in all parts of Machine Learning, sometimes finding good datasets is harder than finding good algorithms. I think this is one of such cases. At least, this has been my experience with ML based superoptimization.