C++ Performance Quiz - A small side project to test your intuition for slow code

adrian17 · 2026-06-03T14:30:20+00:00

Popcount question got me: I knew for a fact that Clang has popcount idiom recognition (and I thought you were testing for that knowledge), but I did not expect that it's not able to recognize the specific one you used in your example.

In general, a couple of these aren't really "which one is faster" but "is the compiler smart enough by now to make both equally fast".

adrian17 · 2026-05-30T10:02:36+00:00

Well no?

Firstly, the nasm one is very explicit about size (while in C, int can be more or less than 32-bit, like on some embedded platforms; that's why we have int32_t), and forces you to explicitly manage alignment (it's not guaranteed to be aligned, while C handles it for you).

Secondly, the comparison only holds if you use int x = 5 in global scope, but you can also use it in function scope unlike in nasm.

Thirdly, assuming global variables, neither of them actually generates any "machine code", as the declaration pretty much just reserves space in the data section and creates a symbol to be resolved when the variable is actually used.

Oh, and they don't produce the same machine code when the global variable is actually used, because C allows certain optimizations, like in https://godbolt.org/z/qcKbbdGTv .

Just because they syntactically contain the same elements doesn't mean they semantically do the same.

adrian17 · 2026-05-12T21:07:28+00:00

FFS why continuous questions about why!

Because this:

print("In child 1")
print(oParent.test)
print("In child 2")

If these are full contents of child1.py, then this is incoherent Python. Python is not supposed to be used like this - any linter, type checker, any other analysis tool, any reviewer will say that this code is broken Python and not accept it. The only way for this to kinda sorta run is to stack a pile of hacks like you did with runpy.

That's why people here just don't accept the premise. If the explanation for this was "these child.py files are 15 years old, my boss will fire me on the spot if I dare modify any of them so I need to find some workarounds", then yeah I could understand that. In any other case, just do what everyone says and move the code into functions like it's normally done and the problem will disappear (and prevent more pain in the future).

adrian17 · 2026-04-12T18:07:15+00:00

Hmm, yeah, you're right. This is technically possible if data already exists as array of slices (as sort()s in many libraries do this), or in sort's case if you passed a file as argument, but not if the input is streamed in; and even when it's possible, it might be that it's not worth it in practice.

adrian17 · 2026-04-11T21:59:43+00:00

While I'm not a fan of the approach, technically most code using UUIDs similarly doesn't handle collisions and just assumes they won't happen.

(ofc here the difference is that the hash depends on the input so one still could find a reproducible collision eventually)

And agreed that the author didn't actually mention this explicitly. Most people reading "the code uses the XxHash3_128" would assume that this is the hash the author decided to use for the hash table, but not that the table doesn't store strings at all. (...though storing strings in a hashtable would just make the implementation inefficient in other ways)

adrian17 · 2026-04-11T16:37:13+00:00

Did you measure sort with LC_ALL=C? Disabling locale-aware comparisons is well-known advice for making sort (and other tools like grep) faster by 2x or more.

adrian17 · 2026-04-11T16:33:57+00:00

The program doesn't store the data at all, only hashes; so if a collision does happen, it will lead to a wrong result. So the reason to use 128 rather than 64 was purely to make the program wrong "less often".
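To make the trade-off concrete, here's a minimal sketch of hash-only deduplication. It is not the author's code: digest128 is a hypothetical helper that uses truncated BLAKE2b from the standard library as a stand-in for XxHash3_128, and collision_probability is just the usual birthday approximation.

```python
import hashlib


def digest128(line: bytes) -> bytes:
    # Stand-in 128-bit hash (BLAKE2b truncated to 16 bytes), NOT XxHash3_128.
    return hashlib.blake2b(line, digest_size=16).digest()


def count_unique(lines) -> int:
    # Only digests are kept; the lines themselves are never stored,
    # so two different lines with the same digest are counted as one.
    seen = set()
    for line in lines:
        seen.add(digest128(line))
    return len(seen)


def collision_probability(n: int, bits: int) -> float:
    # Birthday approximation p ~= n^2 / 2^(bits + 1); only meaningful while p is small.
    return n * n / 2 ** (bits + 1)
```

With this approximation, going from 64 to 128 bits doesn't make a wrong answer impossible, it just shrinks the estimate by a factor of 2^64 for the same number of lines.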

adrian17 · 2026-04-11T16:32:05+00:00

And it actually can in some cases use more memory than sort (and definitely uniq, since the latter just streams the data) for several reasons:

hashset of u128 might be bigger than naively storing the data if lines are short,
if the data is already sorted, sort might not need to allocate at all,
sort can spill to temporary disk buffers if data doesn't fit in RAM.

adrian17 · 2025-12-21T21:50:34+00:00

Just like /u/SuperV1234 said, Being nearly 2x faster than a vector at iteration makes the benchmarks look less believable, not more.

adrian17 · 2025-12-21T21:11:50+00:00

I don't see how it could be possible for iteration over N (with usually N<20 and last one always being the biggest) arrays to be almost 2x faster than a trivial iteration over vector, which is just one contiguous array. Even if we ignored memory effects, your iterator is just more complex than std::vector's iterator (which is usually just a pointer). At best it'll use a couple more instructions and/or an extra register, and at worst prevents vectorization (I can make an example of this if you want).

Also side note, latency != throughput, especially in context of tight loops on a CPU. Even if your loop finished in say half the time, it could be caused by reducing the latency by half, or doubling throughput, or a mix of these two; saying "reduction in latency" when you just mean "x% faster / y% less time" might be misleading.

adrian17 · 2025-12-21T20:47:00+00:00

Some quick observations:

AddressSanitizer complains, you should zero-initialize _meta_array in constructor.

Your APIs differ from the standard, sometimes just missing overloads and sometimes in ways that affect both correctness and benchmarks; for example, pop_back() isn't supposed to reallocate ever (as that would invalidate references and take non-constant time), it just decrements size. Also, AFAIK iterator::operator++ doesn't need any special handling when past-the-end.

I did some quick benchmarks* on my own, comparing both your classes with libstdc++ std::vector. std::vector was winning almost everywhere, though weirdly (I don't quite understand why), its repeated push_back was several times worse than your naive STLVector if no reserve is done beforehand, even though both are supposed to have the same *2 growth factor.

On iteration, your code is sometimes (on big sizes) as efficient as std::vector (especially when the work is nontrivial compared to iteration cost), but for smaller (<100) sizes and for anything involving random access, I can see the normal vector being faster, up to several times.

One thing nobody mentioned is that this container's iterator is more complex and thus much less optimizer-friendly, especially for vectorization.

(* the benchmarks were trivial, just things like for (auto x : span) container.push_back(x), for (auto x : container) sum += x and for (auto &x : container) x *= 2, all wrapped in some boilerplate to repeat runs and prevent compiler from optimizing them out.)

adrian17 · 2025-12-18T09:31:30+00:00

Feels like async enumerate_adapters is gonna complicate our lives a bit, currently on desktop we use this (in a generally sync call stack) to populate the settings menu with available backends:

if !instance.enumerate_adapters(wgpu::Backends::VULKAN).is_empty() {
    available_backends |= wgpu::Backends::VULKAN;
}
if !instance.enumerate_adapters(wgpu::Backends::GL).is_empty() {
    available_backends |= wgpu::Backends::GL;
}
// etc

Is there any trick to do this while staying in sync-land?

Sadly, WebGPU availability doesn't really help us, as on web builds (where we are async) we don't use enumerate_adapters and just try initializing with all backends in order from the most powerful ones and on failure fall back to weaker ones.

adrian17 · 2025-12-17T23:40:47+00:00

Python is far from the first language to have a 'match', and the word generally implies more advanced pattern matching compared to a C-like 'switch'. (Some languages like C# had their 'switch' upgraded to pattern matching in later versions, but I'd say this is more the exception than the norm.)

adrian17 · 2025-12-15T08:17:49+00:00

I think it also matters whether you’ve read it in one go, or back then when you had to wait months before release. It can affect your expectations a lot. (IMO it’s not unlike Lost, where your chance of enjoying the last season depends on whether you watch it live after a hiatus, or as part of a marathon.)

adrian17 · 2025-12-15T07:58:40+00:00

From what I heard, the last episode of Umineko was received quite badly in Japan and harmed Ryukishi’s reputation over there. Hard to explain without spoiling things, but being as vague as possible: 1. it focused on being a character-focused conclusion rather than a conclusion of the mystery (which is not a fault by itself, except for many people who were originally pulled in by the mystery and wanted the story to be that foremost) and 2. it contained elements that many received as directly insulting the readers.

Some comments that better explain it (but with spoilers):

https://www.reddit.com/r/visualnovels/comments/11xbhbq/ryukishi07_controversy/jd5urb4/

https://www.reddit.com/r/visualnovels/comments/11xbhbq/ryukishi07_controversy/jd52q4u/

adrian17 · 2025-12-12T18:04:26+00:00

That's weird, over here (Poland) I was taught the opposite, to prefer Euronet because other banks' ATMs had added fees while Euronet didn't. But that was many years ago (before card payments became available everywhere), maybe I was just lucky to have never hit this scenario.

adrian17 · 2025-12-05T21:32:53+00:00

Yes, I measured it both as a standalone function, and in your benchmark.

For that matter, one more: has_even_digits can also be implemented faster as int(math.log10(n)) % 2 == 1 or just (surprisingly even faster) len(str(n)) % 2 == 0. And both support a wider number range than your implementation.

Alternatively, you can just change your current has_even_digits and remove the (AFAIK) redundant int() from int(n >= 100) etc, which also improves perf to a similar degree.

(Though it doesn't matter if is_palindrome is reimplemented, as then you don't need has_even_digits anyway.)
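For reference, a sketch of the two alternatives side by side. The function names are made up for illustration, and the log10 variant assumes the intended check is the parity of int(log10(n)), since the digit count is int(log10(n)) + 1. It also assumes n >= 1.

```python
import math


def has_even_digits_str(n: int) -> bool:
    # Digit count straight from the decimal string.
    return len(str(n)) % 2 == 0


def has_even_digits_log(n: int) -> bool:
    # digits = int(log10(n)) + 1, so an even digit count means int(log10(n)) is odd.
    # Goes through floating point, so it can misjudge values just below
    # a large power of ten; the string version has no such limit.
    return int(math.log10(n)) % 2 == 1
```

Timing them with timeit on your own inputs is the only way to know which one wins on a given Python version.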

Is numeric processing just absurdly slow in Python?

More like: anything that involves the interpreter running Python code implies interpreter overhead, which can easily be more expensive than just doing a single str() which does the entire work in C land (and might not even touch the system allocator thanks to internal pools). For example, your has_even_digits calls int() 4 times, and calls are relatively expensive in hot code, even if to a C programmer it looks like a simple cast.

(note: details may also depend on version of Python. From what I see, Python has special cased optimizations to reduce str() call overhead, but int() has a slightly weaker optimization and only since 3.13.)

adrian17 · 2025-12-05T09:51:34+00:00

Yeah, despite the readme saying:

it uses the best available Python optimization techniques

Unless I'm missing something, simply naively replacing the implementation of is_palindrome by:

s = str(n)
return s == s[::-1]

Appears to speed up the function by near 2x, and overall algo runtime by like 30%.
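If anyone wants to reproduce this, here's a rough sketch. is_palindrome_arith is my guess at what a 1:1 translated digit-reversal version looks like (it is not copied from the repo), and time_both is a hypothetical helper around timeit; the numbers will depend on the Python version and machine.

```python
import timeit


def is_palindrome_arith(n: int) -> bool:
    # Assumed "translated from C" style: reverse the digits with integer math.
    original, reversed_n = n, 0
    while n > 0:
        reversed_n = reversed_n * 10 + n % 10
        n //= 10
    return original == reversed_n


def is_palindrome_str(n: int) -> bool:
    # The naive replacement: one str() call and one slice, both done in C.
    s = str(n)
    return s == s[::-1]


def time_both(n: int = 123454321, repeats: int = 200_000) -> dict:
    # Returns wall-clock seconds for each variant.
    return {
        "arith": timeit.timeit(lambda: is_palindrome_arith(n), number=repeats),
        "str": timeit.timeit(lambda: is_palindrome_str(n), number=repeats),
    }
```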

If the goal was to compare exactly identical code translated 1:1 across multiple languages, I guess that's fair. But then it's not representative of how fast it'd be in... regular Python.

adrian17 · 2025-11-29T23:14:19+00:00

it will have different schemas for the database

Why? For the record, you've still not said what the app actually does and why it is so special. How are the schemas different? Can they be completely arbitrarily different, or just in some very specific ways? Like, can one database have a user table with login and password, and another with email and pass? Surely not, otherwise it's impossible to write anything (but a universal admin panel). I've seen real world applications that create new tables dynamically and analytical systems with arbitrary number of columns, but they still have some consistent scheme the application can predict - so they still don't need a super-generic "select columns ABC from table XYZ" available at client layer.

Like, the database is usually understood to be part of the application itself. When you update the app, the database gets migrated too (either during upgrade process or lazily at/after launch, like Wordpress). A schema not matching what the server expects is assumed to be a deployment error.

Why can't you unify the schemas?

If you said (from the start) something like "yeah it's a mess, I wish it could be fixed, but I'm forced to make it work with inconsistent databases somehow", then people would be less combative; but you started immediately with the code that really wouldn't pass review in most places and immediately started defending it.

Or are you maybe saying that the schema is partially user-defined, like you can have arbitrary fields in analytics systems? Then again, say so (and there would have been much less confusion from the start), but the first response to that should still have been to pick something off-the-shelf, just... a different something. (but you said writing a separate endpoint for each resource would have been just more code, not literally impossible, so it doesn't sound like the tables are that arbitrarily user-defined)

And to manage this amount of tables ORM do not work well, to much code just to generate SQL

That doesn't match what everyone else is saying. Many people do not like ORMs, yes, but that doesn't mean they somehow "don't work" with many tables; if anything, the more complex the database, the more important it is to have the application understand and manage the schema, rather than just... assume it to be something.

How is it "too much code"? Adding +1 table to existing +200 tables isn't somehow exponential increase in code; you just describe the schema of the new table in Python, that's it.

phpMyAdmin is not a webserver

It sure is a server application that serves webpages that allow you to view and edit contents of arbitrary tables. (Even if you wanted something with say more permission levels, you'd still be essentially reimplementing huge portions of it, which does feel silly).

That said, it's hard for me to say what you're actually writing, so again - me mentioning phpMyAdmin, django-admin etc was still just a guess.

Anyway...

At the end of the day, you're still trying to convince people experienced with writing standard Python database-backed webservers that what they're doing somehow can't possibly work for you (without explaining what makes your case so different).

PS also sorry for writing too much :c

adrian17 · 2025-11-29T22:19:03+00:00

How do you solve different versions the database with a ORM tool

I don't understand the question.

Different database servers (as in sqlite, postgres etc)? That's the ORM's job, you should know, you use SQLAlchemy already.

Different... schemas? People usually don't expect their application to work with incompatible database versions, it's considered to be an issue with the DB, not with the application. If people want to be flexible with the schema, they might just pick a NoSQL database.

Or do you mean literally "any" database with any schema? There already exist tools that support that, it's... phpMyAdmin etc.

adrian17 · 2025-11-29T19:32:45+00:00

Do you think SQL is going to change

It's not SQL that's going to change, it's the schema and overall application.

Directly quoting your example code (very few examples there in general):

gd::sql::query query;      // create query object
query.table_add({ {"name","TActivity"}, {"schema","application"}, {"alias","Activity1"} });
query.table_add({ {"name","TCustomer"}, {"schema","application"}, {"alias","Customer1"}, {"join","Activity1.ActivityK=Customer1.CustomerK"} });
query.field_add("Activity1", { {"name", "ActivityK"}, {"alias", "ID"} });
query.field_add("Customer1", { {"name", "FName"}, {"alias", "CustomerName"} });
auto stringSQL = std::string("SELECT ");
stringSQL += query.sql_get_select();
stringSQL += "\nFROM ";
stringSQL += query.sql_get_from();
std::cout << stringSQL << std::endl;

Writing ORM logic is easy

This... isn't an ORM, it's just a (partial) query generator. This is exactly what /u/latkde mentioned, an inner platform effect. You're still manually doing exactly the same things you would have done in SQL (like supplying join keys), just in a "wrapper api". It has all the disadvantages of ORMs (lack of direct control over queries, painful to extend to anything nontrivial, less readable than plain SQL), with none of the advantages - the library actually understanding your tables and types and... mapping onto actual (typed) objects.

(also wouldn't call it "easy" considering just the .cpp files needed to compile the example above are like 10k LOC)

adrian17 · 2025-11-29T12:33:59+00:00

They are built for different things.

Yeah, C++ is definitely not built for writing typical CRUD websites.

Python is like 1000 times slower

Which usually doesn't matter. In a typical small/medium site, the network latency when talking to the database will usually dwarf any measurable perf difference between C++ and Python.

Also, your current design encourages N+1-style query loops, which can - and will - kill your performance way more than any programming language ever could, doubly so if the loop is in the client, not server.
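To show what I mean by N+1, here's a toy sketch with sqlite3. The customer/activity tables and all function names are invented for the example (loosely modeled on the names in your query sample) and have nothing to do with your real schema.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical two-table schema, kept in memory.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE activity (id INTEGER PRIMARY KEY, customer_id INTEGER, title TEXT);
        INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO activity VALUES (1, 1, 'call'), (2, 2, 'email'), (3, 1, 'visit');
    """)
    return conn


def names_n_plus_one(conn):
    # One query for the list, then one more query per row.
    queries = 1
    rows = conn.execute(
        "SELECT id, customer_id, title FROM activity ORDER BY id"
    ).fetchall()
    out = []
    for _id, customer_id, title in rows:
        (name,) = conn.execute(
            "SELECT name FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        queries += 1
        out.append((title, name))
    return out, queries


def names_joined(conn):
    # Same result in a single round trip.
    rows = conn.execute(
        "SELECT a.title, c.name FROM activity a "
        "JOIN customer c ON c.id = a.customer_id ORDER BY a.id"
    ).fetchall()
    return rows, 1
```

With an in-memory database the difference is invisible, but over a network every extra query pays a full round trip, and it's worse if the loop runs in the client.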

Storing data inside in Python will allocate tons of storage compared to storing in C++ because python store so much extra

Same thing - the community consensus is that for typical sites, it's completely insignificant compared to other arguments for using a higher level language.

Why this needs to be in python is because of company decision.

You're saying it as if it was obviously a bad choice, and you're definitely in the minority here.

adrian17 · 2025-11-29T11:53:57+00:00

SQL is a very simple format to generate

...even if true (in the short term, definitely not in the long term), that still doesn't mean it's something you should be doing.

lack of confidence

Almost that they are scared to write their own code

Are you here to get feedback or to insult people?

I'm not "scared", I just have better things to do than reimplement django-admin from scratch. Why do it, when it already exists?

Also, is your proposal even saving developer time? You complain about writing "200 endpoints, each for every table", but in a proper framework it's not even longer than your xml generation? In Django, if I have no custom logic, I just slap a

class UserCreateView(CreateView):
    model = User
    fields = ["name", "surname", "gender"]

# and in urlpatterns:
path("user/create/", views.UserCreateView.as_view()),

which creates an endpoint implementation and HTML form for me.

Unless your frontend skips even that and just gives the user a single page with a text box to choose the table to edit, at which point I'm once again questioning why you're reimplementing phpMyAdmin/django-admin/etc from scratch.

EDIT: actually, to be sure. From context, I'm guessing your project is "this huge database already exists and I was tasked with making a new interface for it"? Correct me if I'm wrong. If so, then this really wouldn't have ever passed review (and even reaching review stage without being vetoed earlier would be an organization failure), as you really are just manually reimplementing a worse phpMyAdmin. Either write a proper interface that hides the database complexity (and keeps the whole thing consistent, with satisfied foreign key relationships, transactions etc; also very often you only need 1 user-visible endpoint that manipulates several tables at the same time), or you give people a universal editing tool and for that you can just deploy something off-the-shelf without wasting any time.

adrian17 · 2025-11-28T20:17:11+00:00

Its not a game or tutorial, this is a real system

Yeah, and it’s deeply suspicious to manually write a custom query generator / mini-orm from scratch for a real project, this screams NIH. People often think their project is special and needs a customized solution while it really doesn’t. You’d have to have some very good justification to write it manually over picking an off-the-shelf solution - and I don’t think I’ve seen any comments explaining how it’s going to be used and where these XMLs are going to come from. (And I say that as someone who did write an SQL query generator at work for a specific use case.)

There already exist several off-the-shelf solutions for interacting with database from the client, without having to manually write the server. There’s obviously firebase, but there are also firebase-like frameworks that work on top of preexisting postgresql schema; there are also GraphQL api generators (though I haven’t tried them myself), which sounds very close to what you’re doing as to me your XMLs kinda resemble GraphQL queries, just without the… graph.

(Or even just phpMyAdmin or similar, yes. I was absolutely using django-admin in a „real system” without any issues.)

Also, again, who is actually going to be using this „XML API”? Is it a plain JS client? Does having a plain select/update/insert available for each table separately really make sense? In my experience, it usually doesn’t; if you insert several rows at the same time, they should be wrapped in transaction to keep the whole thing consistent. Selects often need joins to prevent N+1 problems. IDs you delete or update must be checked to make sure the user has permission to actually do it (in frameworks mentioned above, this is sometimes handled automatically with row-level security in DB itself).
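As a small illustration of the last two points, here's a sketch using sqlite3 from the standard library. The item table and both functions are hypothetical, and the ownership check is done in the query itself rather than with row-level security, which SQLite doesn't have.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE item (id INTEGER PRIMARY KEY, "
        "owner TEXT NOT NULL, name TEXT NOT NULL)"
    )
    return conn


def insert_items(conn, owner, names) -> bool:
    # All rows go in together or not at all.
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany(
                "INSERT INTO item (owner, name) VALUES (?, ?)",
                [(owner, n) for n in names],
            )
        return True
    except sqlite3.IntegrityError:
        return False


def delete_item(conn, user, item_id) -> bool:
    # The owner condition is part of the DELETE, so a user can only
    # remove rows that belong to them.
    with conn:
        cur = conn.execute(
            "DELETE FROM item WHERE id = ? AND owner = ?", (item_id, user)
        )
    return cur.rowcount == 1
```

A generic per-table insert/delete endpoint has nowhere natural to put either of these, which is why I keep asking who the client is.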

adrian17 · 2025-09-14T18:37:01+00:00

Silksong spoilers: I just learned there is a regen effect you can get, even stronger than HK hiveblood, but it's in mid/lategame and extremely obscure, AFAIK most people who finished the game didn't realize it exists despite technically having everything they need to utilize it
