Commit b30d3ea
Add a macro templatized hashtable.
dynahash.c hash tables aren't quite fast enough for some
use-cases. There are several reasons for lacking performance:
- the use of chaining for collision handling makes them cache
  inefficient; that's especially an issue when the tables get bigger.
- as the element sizes for dynahash are only determined at runtime,
  offset computations are somewhat expensive
- hash and element comparisons are indirect function calls, causing
  unnecessary pipeline stalls
- its two-level structure has some benefits (somewhat natural
  partitioning), but increases the number of indirections

To fix several of these, the hash tables have to be adjusted to the
individual use-case at compile-time. C unfortunately doesn't provide a
good way to do compile-time code generation (like e.g. C++'s templates,
for all their weaknesses, do). Thus the somewhat ugly approach taken
here is to allow for code generation using a macro-templatized header
file, which generates functions and types based on a prefix and other
parameters.

Later patches use this infrastructure to use such hash tables for
tidbitmap.c (bitmap scans) and execGrouping.c (hash aggregation, ...).
In queries where these use up a large fraction of the time, this has
been measured to lead to performance improvements of over 100%. There
are other cases where this could be useful (e.g. catcache.c).

The hash table design chosen is a variant of linear open-addressing.
The biggest disadvantages of simple linear addressing schemes are
highly variable lookup times due to clustering, and deletions leaving a
lot of tombstones around. To address these issues a variant of "robin
hood" hashing is employed. Robin hood hashing optimizes chaining
lengths by moving elements close to their optimal bucket ("rich"
elements) out of the way if a to-be-inserted element is further away
from its optimal position (i.e. it's "poor"). While that can make
insertions slower, the average lookup performance is a lot better, and
higher fill factors can be used in a still performant manner.
To avoid tombstones - which normally solve the issue that a deleted
node's presence is relevant to determine whether a lookup needs to
continue looking or is done - buckets following a deleted element are
shifted backwards, unless they're empty or already at their optimal
position.

There are further possible improvements that can be made to this
implementation. Amongst others:
- Use distance as a termination criterion during searches. This is
  generally a good idea, but I've been able to see the overhead of
  distance calculations in some cases.
- Consider combining the 'empty' status into the hashvalue, and
  enforce storing the hashvalue. That could, in some cases, increase
  memory density and remove a few instructions.
- Experiment further with the, very conservatively chosen, fillfactor.
- Make the maximum size of the hashtable configurable, to allow
  storing very very large tables. That'd require 64bit hash values to
  be more common than now, though.
- Some smaller memcpy calls could be optimized to copy larger chunks.

But since the new implementation is already considerably faster than
dynahash it seems sensible to start using it.

Reviewed-By: Tomas Vondra
Discussion: <20160727004333.r3e2k2y6fvk2ntup@alap3.anarazel.de>

1 parent aa3ca5e   commit b30d3ea
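The robin hood insertion and backward-shift deletion described above can be sketched as follows. This is a simplified, hypothetical illustration of the general technique, not the committed implementation: a fixed-size power-of-two table, integer keys, no growth handling, and the caller must never fill the table completely.

```c
#include <assert.h>

#define CAP 16					/* table size; must be a power of two */

typedef struct
{
	int		key;
	int		used;
} slot_t;

static slot_t table[CAP];		/* zero-initialized: all slots empty */

static unsigned hashfn(int key)
{
	return (unsigned) key * 2654435761u;	/* multiplicative hash */
}

/* probe distance of an entry at 'pos' whose optimal bucket is 'home' */
static unsigned dist(unsigned pos, unsigned home)
{
	return (pos - home) & (CAP - 1);
}

/* Robin hood insert: displace "richer" residents (smaller probe
 * distance) so lookup chains stay short.  Caller must ensure the
 * table is not full and the key is not already present. */
static void rh_insert(int key)
{
	unsigned pos = hashfn(key) & (CAP - 1);
	slot_t	cur = {key, 1};
	unsigned d = 0;

	for (;;)
	{
		if (!table[pos].used)
		{
			table[pos] = cur;
			return;
		}
		unsigned od = dist(pos, hashfn(table[pos].key) & (CAP - 1));

		if (od < d)				/* resident is richer: swap and go on */
		{
			slot_t	tmp = table[pos];

			table[pos] = cur;
			cur = tmp;
			d = od;
		}
		pos = (pos + 1) & (CAP - 1);
		d++;
	}
}

/* No tombstones: an empty slot always terminates the probe chain. */
static int rh_lookup(int key)
{
	unsigned pos = hashfn(key) & (CAP - 1);

	while (table[pos].used)
	{
		if (table[pos].key == key)
			return 1;
		pos = (pos + 1) & (CAP - 1);
	}
	return 0;
}

/* Backward-shift delete: pull following entries back until one is
 * empty or already at its optimal position, instead of leaving a
 * tombstone behind. */
static void rh_delete(int key)
{
	unsigned pos = hashfn(key) & (CAP - 1);

	while (table[pos].used && table[pos].key != key)
		pos = (pos + 1) & (CAP - 1);
	if (!table[pos].used)
		return;					/* not present */
	for (;;)
	{
		unsigned next = (pos + 1) & (CAP - 1);

		if (!table[next].used ||
			dist(next, hashfn(table[next].key) & (CAP - 1)) == 0)
		{
			table[pos].used = 0;
			return;
		}
		table[pos] = table[next];
		pos = next;
	}
}
```

Note how `rh_lookup` stops only at an empty slot: because deletion shifts successors backwards, no per-slot "deleted" marker is needed, which is the tombstone avoidance the message describes.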
2 files changed: +881 −0 lines