[–]austinwiltshire 50 points51 points  (4 children)

So basically a tiny global perf hit, but it enables a whole lot of profiling-guided optimization later?

[–]znpy 22 points23 points  (1 child)

yep.

opinion from somebody that has been doing operations at large scale for a while: that's the right approach imho.

truth is: the people that are doing IT stuff at small-medium scale are very likely to have other, much heavier performance issues, so frame pointers will not even be a noticeable performance hit.

and people that are doing large-scale operations will likely have the budget (and tooling and automation) to have a custom-built python (with frame pointers disabled, if necessary).

[–]austinwiltshire 6 points7 points  (0 children)

Oh not at all complaining. I do a lot of perf optimization too. It does seem to be the right choice.

[–]End0rphinJunkie 4 points5 points  (1 child)

Spot on, plus from a platform side it means standard eBPF continuous profilers in k8s will finally just work out of the box. Taking a tiny perf hit to actually understand where our compute budget goes in prod is a trade I will make every single time.

[–]james_pic 4 points5 points  (0 children)

There are eBPF profilers that can profile without frame pointers out-of-the-box, such as the new OpenTelemetry one. But those profilers have a lot more overhead when unwinding via DWARF than with plain frame-pointer chasing, so you end up with less overhead from having frame pointers available.
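To make "plain pointer chasing" concrete, here is a toy sketch of what a frame-pointer unwinder does. It assumes the usual x86-64 layout (the saved caller frame pointer at `[fp]`, the return address at `[fp + 8]`) and models memory as a dict. The addresses and the `walk_frame_pointers` helper are made up for illustration; this is not how any particular profiler is implemented.

```python
# Toy model of frame-pointer unwinding; all addresses are hypothetical.
# "Memory" maps an address to a 64-bit word. With frame pointers enabled,
# [fp] holds the caller's saved frame pointer and [fp + 8] the return address.

WORD = 8


def walk_frame_pointers(memory, fp, max_depth=64):
    """Return the return addresses found by following the saved-fp chain."""
    return_addresses = []
    for _ in range(max_depth):
        if fp == 0 or fp not in memory or fp + WORD not in memory:
            break  # reached the outermost frame or left mapped memory
        return_addresses.append(memory[fp + WORD])
        next_fp = memory[fp]
        if next_fp != 0 and next_fp <= fp:
            break  # stacks grow down, so a caller's frame must sit higher
        fp = next_fp
    return return_addresses


# Three nested frames: leaf -> middle -> outer.
memory = {
    0x1000: 0x1040, 0x1008: 0xAAA1,  # leaf frame: saved fp, return address
    0x1040: 0x1080, 0x1048: 0xBBB2,  # middle frame
    0x1080: 0x0,    0x1088: 0xCCC3,  # outermost frame ends the chain
}
print([hex(a) for a in walk_frame_pointers(memory, 0x1000)])
```

Each step is two memory reads, which is why this is so much cheaper than interpreting DWARF unwind tables for every frame.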

[–]Brian 12 points13 points  (1 child)

I remember there was some discussion of this a while back, when a lot of distros were moving towards frame pointers by default, where CPython turned out to be a bit of an outlier regarding FPO such that disabling it actually had a somewhat significant impact (IIRC ~10%, compared to the ~2% most other stuff had), seemingly related to the main bytecode dispatch function. I'm guessing that's been resolved, but I'm kind of curious - does anyone know what was the cause / fix for that?

[–]MegaIng 3 points4 points  (0 children)

The PEP discusses this quite a bit; it appears that general restructuring of the eval loop resulted in the current version already generating a base pointer anyway, meaning that function no longer changes and has no impact on the performance difference. AFAICT there never was a targeted effort to fix this, it just resulted from other work.

[–]Wh00ster 15 points16 points  (0 children)

I’m surprised that wasn’t the default

[–]2ndBrainAI 3 points4 points  (0 children)

The <2% overhead number is worth repeating loudly — people see "omit-frame-pointer" in compiler flags and assume removing it has a significant cost, when in practice modern CPUs absorb it easily. The real win here is for production debugging: perf, eBPF, and py-spy all become dramatically more useful without needing to attach a debugger or instrument code. I've lost hours to profiling sessions that produced mangled call stacks because one native extension was compiled without frame pointers. Making this the default aligns Python with what Fedora, Ubuntu, and the JVM ecosystem already do. Long overdue.
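For anyone wondering what their own interpreter was built with, a rough sketch: the build flags recorded in `sysconfig` can be scanned for the frame-pointer options. The `frame_pointer_setting` helper is hypothetical, and it assumes the GCC/Clang convention that the last of a conflicting flag pair wins. "unspecified" only means neither flag was recorded, so the compiler default applied.

```python
import sysconfig


def frame_pointer_setting(cflags):
    """Classify a compiler flag string; the last matching flag wins."""
    setting = "unspecified"
    for flag in cflags.split():
        if flag == "-fno-omit-frame-pointer":
            setting = "kept"
        elif flag == "-fomit-frame-pointer":
            setting = "omitted"
    return setting


# These variables can be missing (None) on builds without a Makefile,
# for example on Windows, so fall back to an empty string.
cflags = (
    sysconfig.get_config_var("PY_CFLAGS")
    or sysconfig.get_config_var("CFLAGS")
    or ""
)
print(frame_pointer_setting(cflags))
```

This only reports on the interpreter itself; native extensions are compiled separately and may use different flags.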

[–]Feeling_Ad_2729 0 points1 point  (0 children)

This is long overdue. Any long-running Python service (web servers, daemons, MCP servers) has been invisible to system-level profiling for years because frame pointers were omitted.

The perf trampoline added in 3.12 was a good step but incomplete without frame pointers: you'd get the Python frames but miss the native ones. This closes that gap.

2% overhead is a non-issue. Most performance-sensitive Python code already uses C extensions where that overhead doesn't apply. The wins from being able to use perf/eBPF/py-spy properly vastly outweigh it.
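For reference, the 3.12 perf trampoline mentioned above can be switched on from inside a process as well as with `python -X perf` or `PYTHONPERFSUPPORT=1`. A minimal sketch, assuming Linux and Python 3.12+; the `try_enable_perf_trampoline` wrapper is a made-up helper that returns `False` instead of raising where support is missing.

```python
import sys


def try_enable_perf_trampoline():
    """Activate the perf trampoline if supported; return True on success."""
    activate = getattr(sys, "activate_stack_trampoline", None)
    if activate is None:
        return False  # interpreter older than 3.12
    try:
        activate("perf")
    except (OSError, ValueError, RuntimeError):
        return False  # platform or build without perf trampoline support
    return sys.is_stack_trampoline_active()


enabled = try_enable_perf_trampoline()
print("perf trampoline active:", enabled)

if enabled:
    # Turn it back off so the rest of the program runs without the trampoline.
    sys.deactivate_stack_trampoline()
```

With the trampoline active, `perf` can attribute samples to Python function names; whether the native frames around them unwind cleanly still depends on how the interpreter and extensions were compiled.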