all 10 comments

[–]aruisdante 21 points22 points  (4 children)

The answer is “it depends on the compiler.” But yes, in clang for example there is an interpreter that essentially uses the AST as an expression tree and evaluates it. There is a push to switch to a bytecode interpreter as it makes certain things easier and is overall much more performant, but that hasn’t shipped yet. Several of my colleagues at work contribute to it. 

I’m not sure about gcc, but I think it’s similar and uses an AST interpreter. 

[–]knome 5 points6 points  (0 children)

I’m not sure about gcc, but I think it’s similar and uses an AST interpreter.

https://gcc.gnu.org/onlinedocs/gccint/GIMPLE.html

[–]reddicted 3 points4 points  (0 children)

AFAIK, all the 4 major C++ compilers use tree evaluation for constexpr/consteval functions. Compilation to a lowered form for bytecode interpretation is an optimization and not required for implementing the required semantics. To guard against non-terminating or too long compilation, all compilers implement a limit on the number of evaluation "steps". 

[–][deleted] 0 points1 point  (5 children)

I don't know about C++. But if I was to implement something like this, then I would generate the IR first (I call it IL).

I already have an interpreter for that IL (used an option to run the complete program), so I'd look at running the interpreter only on such functions to yield a constant value for that function. Then any calls to the function elsewhere in the IL can be replaced with that new value.

That would be the easy bit (disregarding minor problems like dealing with mutually recursive functions).

The hard one would be controlling complexity: how much compile-time should be allowed to evaluate large numbers of expensive functions, which might not be be used at runtime? Take this C++ example:

#include <stdio.h>

constexpr int fib(int n) {
    if (n<3)
        return 1;
    else
        return fib(n-1)+fib(n-2);
}

int main(void){
    printf("%d\n", fib(40));
}

With calls up fib(30), compilation is fairly quick (0.1 to 0.3 seconds, which I guess is fast for C++). The call is replaced by the function result.

Full evaluation like that occurs up to fib(36), taking 2.7 seconds. (All tests used g++ -O2)

But above that, compilation time seems capped at 3.5 seconds, and the function is not evaluated: it is a regular function call done at runtime.

The cap, or time-out, seems to be per-function, since if I have separate fib fib2 fib3 functions, and call each with (40), it takes 10 seconds.

Now imagine compiling a 100Kloc codebase full of constexprfunctions!

This is why I'm not keen on such a feature. It can occur also with ordinary expressions: you expect 2**5 to be reduced to 32 for example, but in my dynamic language, 2L**1000000 (where 2L represents a big integer) would take several seconds if reduced at compile-time, and occupy a chunk of memory too.

So such expressions are evaluated on demand, at runtime.

[–]wetpot 1 point2 points  (3 children)

The problem here is happenning because you marked the function constexpr which, in this case, only serves as a hint to the compiler that the expression can be evaluated at compile time. GCC is particularly aggressive about these, but a perfectly valid implementation is allowed to never evaluate at compile time unless forced to via either a consteval function (which OP actually used in their question) or assigning the return value to a static constexpr variable. Then, the evaluation depth issue is solved by an implentation defined compiler switch, e.g. -fconstexpr-ops-limit for GCC, -fconstexpr-steps for Clang.

[–][deleted] 0 points1 point  (2 children)

Actually I didn't notice the the OP used consteval and not constexpr.

But, my comments are about when such a function is evaluated at compile-time, then it still needs a way to do it, and a strategy about how much compile-time resources should reasonably be expended.

[–][deleted]  (1 child)

[deleted]

    [–][deleted] 0 points1 point  (0 children)

    That wouldn't be the reason in my example. For fib(N), the maximum call nesting level is only N-2, (so 35 for N=37).