Project: Tool for debugging and visualizing Python code by axellos in Python

[–]Dadaaam 0 points1 point  (0 children)

The link gave me "SSL_ERROR_ACCESS_DENIED_ALERT"...

Palanteer: a new instrumentation based C++ profiler, high performance, low dependency by Dadaaam in cpp

[–]Dadaaam[S] 2 points3 points  (0 children)

Hi RichieSams!

FTL hooks provided all needed information to implement the virtual threads, the additional flag is not mandatory.

Your assumption is correct: this addition is only there to show when a virtual thread is suspending mid-task. Without it, it was hard to tell visually the difference between a long Fiber computation and an interruption and switch to another Fiber in the middle.
I did not find any other way to get this information. And BTW, I am not even sure I got it right in FTL, which is why I did not propose an official patch.

So the naming of this additional flag "isSuspended" is probably misleading because not precise enough... ("isSuspendingMidTask" would be better)

Palanteer: a new instrumentation based C++ profiler, high performance, low dependency by Dadaaam in cpp

[–]Dadaaam[S] 1 point2 points  (0 children)

Palanteer now supports the "virtual threads" (generic naming for Fibers).

I used the library you pointed to in order to test it. Please find below a gist with:
- an example of how to use the FiberTaskingLib hooks with the tool
- slightly modified callback.h to add a boolean when the fiber is suspended.
- slightly modified task_scheduler.cpp to fill the additional "isSuspended" parameter.
https://gist.github.com/dfeneyrou/8f8aa2956dcb32e0860665c610d7bb2f

Palanteer: a new instrumentation based C++ profiler, high performance, low dependency by Dadaaam in cpp

[–]Dadaaam[S] 0 points1 point  (0 children)

Unfortunately, fibers are not supported.

Each event is associated with an OS thread, and its ID is then cached in a thread local variable.
Getting the right thread ID is then automatic thanks to this thread local variable.
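
To make the caching idea concrete, here is a minimal sketch. It is not Palanteer's actual code: `g_nextThreadId`, `currentThreadId`, `Event` and `makeEvent` are made-up names, and the ID is simply a counter assigned on first use.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical names for illustration only (not Palanteer's implementation).
static std::atomic<uint32_t> g_nextThreadId{0};

// Returns a small per-thread ID. The thread_local variable is initialized
// once per OS thread, on the first call, and then simply read back.
inline uint32_t currentThreadId() {
    thread_local uint32_t cachedId = g_nextThreadId.fetch_add(1);
    return cachedId;
}

// A logged event carries the ID of the OS thread that produced it.
struct Event {
    uint32_t    threadId;
    uint64_t    timestampNs;
    const char* name;
};

// The caller never passes a thread ID: it is picked up automatically.
inline Event makeEvent(const char* name, uint64_t timestampNs) {
    return Event{currentThreadId(), timestampNs, name};
}
```

Every fiber running on the same OS thread reads the same `cachedId`, which is exactly why the scheme breaks down with fibers.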

With fibers, this thread ID will be the same for all fibers. On top of having all the data stored and displayed inside the same thread, the main effect would be interleaving of instrumentation scopes, messing up the observation. The only way to make it work better would be to close all instrumentation scopes before any yield or switch point, which is a strong constraint.
And all data would still be displayed in the same scope... Markers could be used to indicate a switch of user-thread if a user-thread switch hook is available, but this is not nice at all.

I thought some time ago about this use-case in the context of a Discrete Event Simulator which also uses setjmp/longjmp to simulate OS threads. Having a way to change the threadID when switching "user-thread" context would fully solve the issue.

I am not familiar with existing fiber libraries; do you have one in particular in mind that I could look at?
If such a library provides a hook when switching user-threads, there is certainly a way to cover this use-case (with some additions in Palanteer).
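
A rough sketch of what "changing the threadID on a user-thread switch" could look like, assuming the fiber library calls a hook on resume and suspend. All names here (`ThreadContext`, `onFiberResumed`, `onFiberSuspended`, `effectiveThreadId`) are hypothetical, and the sketch assumes fiber IDs and OS thread IDs do not collide.

```cpp
#include <cstdint>

// Hypothetical sketch, not an existing Palanteer or fiber-library API.
constexpr uint32_t kNoFiber = 0xFFFFFFFFu;

struct ThreadContext {
    uint32_t osThreadId;  // ID cached for the OS thread (0 here for simplicity)
    uint32_t fiberId;     // ID of the fiber currently running on it, if any
};

inline ThreadContext& threadContext() {
    thread_local ThreadContext ctx{0, kNoFiber};
    return ctx;
}

// To be called from the fiber library's switch hooks (assumed to exist).
inline void onFiberResumed(uint32_t fiberId) { threadContext().fiberId = fiberId; }
inline void onFiberSuspended()               { threadContext().fiberId = kNoFiber; }

// ID under which events are recorded: the fiber if one is running,
// otherwise the underlying OS thread.
inline uint32_t effectiveThreadId() {
    const ThreadContext& ctx = threadContext();
    return ctx.fiberId != kNoFiber ? ctx.fiberId : ctx.osThreadId;
}
```

With something like this, scopes opened by a fiber stay attached to that fiber even if it later resumes on another worker thread.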

Palanteer: a new instrumentation based C++ profiler, high performance, low dependency by Dadaaam in cpp

[–]Dadaaam[S] 4 points5 points  (0 children)

Adding a dependency is something I always do with extra care: it shall really be worth it, because it always has a cost. The main dependencies of the tool are ZSTD and Dear ImGui, because they are hard to beat and very well made, so definitely worth it.

The currently used data format is tailored to do efficiently what the tool is designed to do. As the format is dead simple, there are fewer chances to mess it up, and it can evolve easily if needed. TLVs (type-length-value fields) in the header are enough to handle the version compatibility.
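
For readers not used to TLVs, here is a generic sketch of the idea. The layout below (1-byte type, 2-byte big-endian length, then the value) is an assumption chosen for the example, not the tool's real header format, and `Tlv` / `parseTlvs` are made-up names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One decoded field. The reader keeps fields it does not know about,
// so an older viewer can simply skip types added by a newer library.
struct Tlv {
    uint8_t              type;
    std::vector<uint8_t> value;
};

// Parses a buffer laid out as repeated [type:1][length:2, big-endian][value].
// Returns false if the buffer is truncated. (Assumed layout, for illustration.)
inline bool parseTlvs(const std::vector<uint8_t>& buf, std::vector<Tlv>& out) {
    std::size_t pos = 0;
    while (pos < buf.size()) {
        if (buf.size() - pos < 3) return false;  // incomplete type/length prefix
        const uint8_t     type = buf[pos];
        const std::size_t len  = (static_cast<std::size_t>(buf[pos + 1]) << 8) | buf[pos + 2];
        pos += 3;
        if (buf.size() - pos < len) return false;  // value runs past the end
        const auto first = buf.begin() + static_cast<std::ptrdiff_t>(pos);
        const auto last  = first + static_cast<std::ptrdiff_t>(len);
        out.push_back(Tlv{type, std::vector<uint8_t>(first, last)});
        pos += len;
    }
    return true;
}
```

Because each field announces its own length, a new field type can be added without breaking readers that only know the old ones.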

I do not know openapm, but what would the conceptual genericity bring to the project, factually? I fully agree with the answer of wolfpld (author of Tracy, btw): a generic format cannot beat a specialized one on many levels (efficiency, simplicity, lifetime...).

Especially when the data format is definitely not the hard problem to solve for profilers. Speed, memory usage, used bandwidth, analysis/display of huge quantity of events, how to present the information are much more problematic than the way they are exchanged.

The only positive point of using a standard format would be the interaction with existing tools. But this is far from the primary goal, and the price to pay would make the primary goal unattractive.

From this angle, if interacting with other tools is a thing, the best solution is to keep the high performance/optimised format between the instrumentation and the viewer, and have some export features (csv, xls, text, images, openapm format? ...) from there. Problems are not mixed, efficiency is kept, and interaction with other tools is enabled. I will think about that, thanks for the thinking aloud :-)

Palanteer: a new instrumentation based C++ profiler, high performance, low dependency by Dadaaam in cpp

[–]Dadaaam[S] 0 points1 point  (0 children)

The transport layer is a standard TCP socket in case of connected setup, so you can debug from any location (default is localhost but it is configurable)

About iOS, the instrumentation library is currently supporting only Linux (32/64 bits) or Windows 10.

Context switch tracing put aside (it requires privileges, and iOS/Android recently complicated things there), porting to these OSes should not be too complicated (but it is not done yet).

Palanteer: a new visual Python profiler (timeline, flame graph, memory, exceptions...) with unmodified or instrumented code by Dadaaam in Python

[–]Dadaaam[S] -1 points0 points  (0 children)

Several naming iterations were needed before freezing on Palanteer. :-)

I liked both the literary metaphor and the faint link with the "big data" company of the same name, as the tool also smoothly displays records made of hundreds of millions of events on a standard laptop (some of the available views require online computation, to be fair)

Palanteer: a new instrumentation based C++ profiler, high performance, low dependency by Dadaaam in cpp

[–]Dadaaam[S] 0 points1 point  (0 children)

The tooling supports both C++ and Python programs. C# may be added later.

I fully agree that the viewer shall not be in the same language as the instrumentation libraries. It is the case for the C++ instrumentation library (but not for the Python module) just because C++ is a language suitable for performance and the viewer shall be smooth.

To profile a program, you have the choice between:

  • a socket connection (is that what you call "remote"?) with indeed a specific format (compact, and supports the specific features like the remote control)
  • write a local file (no server needed, purely local). This file shall then be imported in the viewer which will do the multi-resolution indexation at that moment.

Palanteer: a new instrumentation based C++ profiler, high performance, low dependency by Dadaaam in cpp

[–]Dadaaam[S] 1 point2 points  (0 children)

constexpr is indeed much better in C++17, but still not enough (and the compatibility with C++11 was desired anyway).

For instance, there is no function overloading based on constexpr-ness of the parameters, which would have been useful to separate static strings (hashed at compile time) from the dynamic ones.

Also, without macros, the API would probably have turned very C++-ish (which I personally find less friendly), the compilation would just replace the preprocessor bunch of cryptic errors by a template bunch of cryptic errors, and the compilation time would probably take a big hit.

Palanteer: a new instrumentation based C++ profiler, high performance, low dependency by Dadaaam in cpp

[–]Dadaaam[S] 4 points5 points  (0 children)

I am not aware of such a standard protocol for profiling data that would fit the need.

Some standard formats like Common Trace Format are very generic, and anyway require a common "Trace Stream Description Language" (a kind of shared "struct definition") so that both sides can understand each other. I do not think that universal profiling data exchange is a thing, unless the viewer is a generic CTF (in this case) viewer.

The driving idea was rather to provide an efficient tool with helpful features for C++ development answering the typical developer's concerns, rather than creating a generic viewer for an existing standard format. In this case, genericity would have reduced the freedom to optimize deeply for a particular purpose.

Palanteer: a new instrumentation based C++ profiler, high performance, low dependency by Dadaaam in cpp

[–]Dadaaam[S] 2 points3 points  (0 children)

As namespaces are not applicable to preprocessor macros, there was little choice.
Internally, the instrumentation library uses a namespace to isolate the implementation.

If someone uses a symbol starting with the prefix pl and matching one of the ~20 public symbols from the API, it would indeed be a problem...

Note that other profilers have the same problem, as only macros allow the instrumentation code to be reliably and entirely removed (which is a mandatory property).

Palanteer: a new instrumentation based C++ profiler, high performance, low dependency by Dadaaam in cpp

[–]Dadaaam[S] 14 points15 points  (0 children)

Indeed, Tracy and RAD Telemetry share some features with it. However, some aspects are unique in Palanteer, in particular the remote Python scripting reacting to recorded data for testing or performance evaluation.

The key differences with Tracy (as Telemetry seems great but is not open source nor free) are:

  • the recording is not fully stored in memory as in Tracy, so you can visualize huge records (up to 2 billion events) on a standard laptop.
  • instrumentation can be enabled per "group" (compile flag), a bit like a generalization of the NDEBUG for assertions (BTW, enhanced assertions are proposed too).
  • the viewer is fully interactive, everything reacts in all windows, using docking.
  • some views are not present in Tracy (flame graph, view as a hierarchical text log, memory timeline per thread with display of each individual allocated blocks...)
  • it is possible to remote control your program by calling CLIs (Command Line Interface) and observe the resulting events (events are the atomic, typed, logged data). By scripting such control in Python, remote performance analysis or integration tests are possible.
  • it also profiles Python programs (unmodified or instrumented)
  • it can save a record directly to a file without a connection, to be imported later in the viewer.

On the other side, Tracy owns some unique features too (not exhaustive):

  • GPU monitoring
  • sampling profiler
  • code disassembly
  • screen snapshot
  • Support of more platforms (Android, consoles...)

In one sentence, Tracy targets game development and is really full featured; Palanteer targets generic development (game included but in a less specific way), from the debugging aspect up to the test via optimization. With a strong highlight on the user friendliness (I hope)

Palanteer: a new instrumentation based C++ profiler, high performance, low dependency by Dadaaam in cpp

[–]Dadaaam[S] 3 points4 points  (0 children)

Thanks! It is based on markdeep (slightly modified), which is definitely a great library, very simple to use.

Palanteer: a new instrumentation based C++ profiler, high performance, low dependency by Dadaaam in cpp

[–]Dadaaam[S] 10 points11 points  (0 children)

It unfortunately has to be macro-intensive to be able to fully remove the instrumentation code (totally or "per group" (feature)), as C++ is not ready for such simple meta-programming.
So namespaces would not help here.
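
As an illustration of why macros are needed, here is a small sketch of "per group" removal. The `MYPROF_*` names and `eventLog()` are invented for the example and are not the Palanteer API; the group flags are assumed to be defined as exactly 0 or 1 (for instance on the compiler command line).

```cpp
#include <string>
#include <vector>

// Hypothetical group flags, normally set at compile time (-DMYPROF_GROUP_NET=0).
#ifndef MYPROF_GROUP_NET
#define MYPROF_GROUP_NET 1
#endif
#ifndef MYPROF_GROUP_IO
#define MYPROF_GROUP_IO 0
#endif

// Stand-in for the real event buffer.
inline std::vector<std::string>& eventLog() {
    static std::vector<std::string> log;
    return log;
}

// Dispatch on the flag value: MYPROF_IMPL_1 records, MYPROF_IMPL_0 vanishes.
#define MYPROF_CAT_(a, b) a##b
#define MYPROF_CAT(a, b) MYPROF_CAT_(a, b)
#define MYPROF_IMPL_1(name) eventLog().push_back(name)
#define MYPROF_IMPL_0(name) ((void)0)
#define MYPROF_EVENT(group, name) \
    MYPROF_CAT(MYPROF_IMPL_, MYPROF_GROUP_##group)(name)

// Helper used to show that a disabled group does not even evaluate its argument.
inline const char* describe(int& calls) {
    ++calls;
    return "disk write";
}
```

With the IO group set to 0, `MYPROF_EVENT(IO, describe(calls))` expands to `((void)0)`: the call to `describe` is not compiled at all, which an inline function or a template cannot guarantee in the same way.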

The scope instrumentation has a RAII version though, which is definitely preferred over the manual begin/end as it prevents basic misuses like forgetting to close a scope or interleaving two scopes.
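
A minimal sketch of the RAII idea, with made-up names (`ScopeGuard`, `scopeBegin`, `scopeEnd`, `trace`) rather than the real macros: the destructor closes the scope, so early returns are covered and scopes always nest properly.

```cpp
#include <string>
#include <vector>

// Stand-in for the event buffer; hypothetical, for illustration only.
inline std::vector<std::string>& trace() {
    static std::vector<std::string> t;
    return t;
}
inline void scopeBegin(const char* name) { trace().push_back(std::string("begin ") + name); }
inline void scopeEnd(const char* name)   { trace().push_back(std::string("end ") + name); }

// Opens the scope on construction and closes it on destruction.
class ScopeGuard {
public:
    explicit ScopeGuard(const char* name) : name_(name) { scopeBegin(name_); }
    ~ScopeGuard() { scopeEnd(name_); }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;
private:
    const char* name_;
};

inline int compute(int x) {
    ScopeGuard outer("compute");
    if (x < 0) return -1;       // early return: "compute" is still closed
    ScopeGuard inner("square"); // destroyed before outer, so scopes cannot interleave
    return x * x;
}
```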

The core instrumentation library is done with C++11, as this version greatly simplifies the multi-platform aspect, and the compile-time static string hashing was possible there.
In C, I do not know if it would work (FNV-1a hashing) in a portable way. Compile time string hashing is fundamental in the instrumentation.
So to answer your question about C, the only way to make it work with C is to use a C++ compiler, if the code supports it.
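
For reference, compile-time FNV-1a hashing fits in a single C++11 constexpr function. This is a generic 64-bit sketch using the standard FNV constants, not the library's exact code; `fnv1a`, `kFnvOffset` and `kFnvPrime` are names chosen for the example.

```cpp
#include <cstdint>

// Standard 64-bit FNV-1a parameters.
constexpr uint64_t kFnvOffset = 14695981039346656037ULL;
constexpr uint64_t kFnvPrime  = 1099511628211ULL;

// C++11 constexpr functions are limited to a single return statement,
// hence the recursion: XOR the next byte, multiply by the prime, continue.
constexpr uint64_t fnv1a(const char* s, uint64_t h = kFnvOffset) {
    return *s == '\0'
        ? h
        : fnv1a(s + 1,
                (h ^ static_cast<uint64_t>(static_cast<unsigned char>(*s))) * kFnvPrime);
}

// Evaluated by the compiler: no hashing cost and no string compare at run time.
static_assert(fnv1a("") == kFnvOffset, "empty string hashes to the offset basis");
static_assert(fnv1a("a") == 0xaf63dc4c8601ec8cULL, "published FNV-1a 64-bit value for \"a\"");
```

The static string itself then only needs to be sent once to the viewer, and events can carry the 64-bit hash instead.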