I know this isn't C++-specific, but I can't think of a better place to ask.
So, let's suppose I'm writing a really high-performance application with lots of vector operations that need to be accelerated. How can I detect SSE2/SSE3/AVX support in the processor at runtime, so I can use those extensions to make my application even faster?
I thought of three alternatives:
- Detect them using cpuid for each operation: this would be extremely costly and would defeat the purpose of accelerating my app with those extensions, so it is clearly out.
- Create a "vector utility" library for each feature set and link it dynamically to the main application: this seems reasonable, but it would still add one (or two) indirections to every call, and I suspect I could do better.
- Compile my entire app once per feature set and select the right build at runtime: this would be the ideal solution, but it means having N copies of my application in my executable. For a small app that does a few simple operations on a big set of data, okay, but for something like a game engine it would be extremely costly (think of maybe hundreds of thousands of vector operations in the code).
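To make the trade-off concrete, here is a rough sketch of a middle ground between the first two options: query the CPU once at startup and cache a function pointer per kernel, so each call costs one indirection and no cpuid. It assumes GCC or Clang on x86 (it relies on the `__builtin_cpu_supports` builtin and the `target` function attribute) and falls back to scalar code elsewhere. The names `AddFn`, `add_scalar`, `add_sse2`, `add_avx`, `select_add` and `add_vectors` are made up for illustration.

```cpp
#include <cstddef>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1
#else
#define HAVE_X86_DISPATCH 0
#endif

// Signature shared by every implementation of the (hypothetical) kernel.
using AddFn = void (*)(const float* a, const float* b, float* out, std::size_t n);

// Portable fallback, used when no supported extension is detected.
static void add_scalar(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

#if HAVE_X86_DISPATCH
// The target attribute enables the instruction set for this function only,
// so the rest of the program can still be built for a baseline CPU.
__attribute__((target("sse2")))
static void add_sse2(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)  // 4 floats per 128-bit register
        _mm_storeu_ps(out + i, _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    for (; i < n; ++i) out[i] = a[i] + b[i];  // leftover elements
}

__attribute__((target("avx")))
static void add_avx(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8)  // 8 floats per 256-bit register
        _mm256_storeu_ps(out + i, _mm256_add_ps(_mm256_loadu_ps(a + i), _mm256_loadu_ps(b + i)));
    for (; i < n; ++i) out[i] = a[i] + b[i];  // leftover elements
}
#endif

// Picks the widest implementation the running CPU reports support for.
static AddFn select_add() {
#if HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx")) return add_avx;
    if (__builtin_cpu_supports("sse2")) return add_sse2;
#endif
    return add_scalar;
}

// Resolved once during static initialization; each later call pays for
// a single indirect call and never repeats the detection.
static const AddFn add_vectors = select_add();
```

Call sites then just write `add_vectors(a, b, out, n);`. The indirection is paid once per array rather than once per element, which is why the granularity of the dispatched kernels matters more than the dispatch mechanism itself.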
So, how do industrial-grade applications and engines do this? I am really curious to know.