Skip to content

gh-149807: Fix hash(frozendict): compute (key, value) pair hash#149841

Open
vstinner wants to merge 3 commits into
python:mainfrom
vstinner:frozendict_pair_hash
Open

gh-149807: Fix hash(frozendict): compute (key, value) pair hash#149841
vstinner wants to merge 3 commits into
python:mainfrom
vstinner:frozendict_pair_hash

Conversation

@vstinner
Copy link
Copy Markdown
Member

@vstinner vstinner commented May 14, 2026

@tim-one
Copy link
Copy Markdown
Member

tim-one commented May 14, 2026

It's good to avoid the frozenset hash code. It's not a good hash function. You can check this by constructing subsets of [i/2**n for i in range(2**n)[. The hashes of those elements vary in only the high-order bits, and the frozenset hash function is poor at "avalanching" high-order changes to lower-order bits. It's good in the other direction, though. For frozensets of low-precision floats, collisions are far too common.

This showed up when trying to construct "bad cases" for the xxHash-based tuple hashing. Raymond was made aware of it, but never got around to "doing something" about it.

No idea how the Boost-inspired scheme would work. Its scrambler does do some high-to-low propagation (via right shifts), but xxHash's rotate is best-of-all (and we took care to ensure that all major compilers did emit a "rotate" instruction instead of the longer-winded portable C spelling we use).

Long story short: properly validating a compound hash function in the context of how it plays with CPython's hash results for primitive types (which, apart from string hashes, make no attempt at creating "random-looking" results) can require weeks of work.

I can't make time for that, and have less than no interest in doing it again anyway ;-) I do have confidence in the tuple hashing approach - which was hard won.

@lambda-abstraction
Copy link
Copy Markdown

what about changing the starting hash to avoid the collision with frozenset(frozendict.items())? im not sure if its realistic for a single set/dict to use both frozensets and frozendicts as keys, but i dont think making the hash of a frozendict just exactly identical to the frozenset of entries is perfect

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants