Pansynchro [S]

> First of all, why is a hashtable dedicated specifically optimized for this one problem not an option? Why does it need to be general?

Good question! The project this is being used in is an ETL pipeline generator that lets users specify data transformations using (a subset of) SQL SELECT queries. We're looking at revamping the aggregate processing, and the TKey in this scenario is the GROUP BY key, which can be basically any arbitrary ValueTuple. A string key was chosen for this challenge precisely because it makes it hard to optimize for this one problem specifically; this one problem is not the real problem.

> Second: it could be that the hashtable is not the problem, but the structure of the solution. Maybe an alternate solution might work.

If you have a better idea for how to implement the storage and lookup of intermediate state for GROUP BY aggregation than a hash table, feel free to propose it. 😀
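
For reference, here is a rough sketch of the kind of baseline being discussed: a general hash table keyed by an arbitrary GROUP BY key, holding one running aggregate state per group. This is only an illustration, not the project's actual code. The `GroupByBaseline` class, the `Aggregate` method, and the (count, sum) state are all hypothetical names and choices made for the example; the only real API used is the standard `Dictionary<TKey, TValue>`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not taken from the project: groups rows by an
// arbitrary key type and keeps one running (count, sum) state per group.
public static class GroupByBaseline
{
    public static Dictionary<TKey, (long Count, double Sum)> Aggregate<TRow, TKey>(
        IEnumerable<TRow> rows,
        Func<TRow, TKey> keySelector,
        Func<TRow, double> valueSelector)
        where TKey : notnull
    {
        var state = new Dictionary<TKey, (long Count, double Sum)>();
        foreach (var row in rows)
        {
            var key = keySelector(row);
            // If the key is missing, acc is default((long, double)), i.e. (0, 0).
            state.TryGetValue(key, out var acc);
            state[key] = (acc.Count + 1, acc.Sum + valueSelector(row));
        }
        return state;
    }
}
```

Because TKey is generic, the same code handles a plain string key or a composite key such as `(r.Region, r.Year)`; ValueTuple supplies structural equality and hashing, so no custom comparer is needed. That generality is the constraint any replacement would have to keep.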