hash - Is it possible to come up with distinct strings with identical hashcodes algorithmically? - Stack Overflow

In Java we know some examples of strings which have identical hash codes despite being distinct strings, such as Ea and FB both having a hash code of 2236.

Is there a way, algorithmically, to come up with more examples of distinct strings sharing hash codes without using brute force testing of many strings, given we know the exact hashing algorithm used in each language?

Asked Nov 20, 2024 at 3:36 by Aamir
  • Yes, but the solving algorithm is going to be different depending on the hashing algorithm. Are you looking for any specific one? I guess most generically, there are solvers like Z3. – Ry- Commented Nov 20, 2024 at 3:43
  • I didn't have any in mind particularly, but I was curious as to the approach once given a hashing algorithm to come up with arbitrary collisions. I suppose if I were to try this myself it would be with Java's algorithm for strings – Aamir Commented Nov 20, 2024 at 3:51
  • 1 By "in each language" do you mean you want an algorithm that would work for every language? What if a language doesn't have a built-in concept of "hash code"? Or what if it uses a hash scheme that's specifically designed to resist this, e.g. with a random seed (as in Perl), or a deterministic but cryptographically secure hash like SHA-256? – ruakh Commented Nov 20, 2024 at 3:52
  • I mean an approach for coming up with arbitrary examples of collisions giving a hashing algorithm. Doesn't even necessarily need to be for programming languages implementations but just in general – Aamir Commented Nov 20, 2024 at 3:56
  • 2 @MartinBrown: That's exactly what I wrote. – President James K. Polk Commented Nov 21, 2024 at 21:52

1 Answer

Is there a way, algorithmically, to come up with more examples of distinct strings sharing hash codes without using brute force testing of many strings, given we know the exact hashing algorithm used in each language?

Sometimes. Let's take your Java example... "Ea and FB both having a hash code of 2236" and consider the hash function (here in C++, and only for ASCII strings):

int java_hash(const std::string& s) {
    int hash = 0;
    for (char c : s)
        hash = hash * 31 + c;  // same recurrence as Java's String.hashCode()
    return hash;               // (for long strings, use unsigned arithmetic:
}                              //  signed overflow is undefined behaviour in C++)

We can see it starts by taking the value of the first character. Then for two-character strings, it multiplies that by 31 and adds the next character.

Generalising: for a two-character string KL (e.g. for "Ea", K == 'E' and L == 'a'), the hash value ends up being 31K + L. To find a colliding key, we can increment K to the next character value (e.g. 'F'), which increases the hash output by 31, and counter-balance by decreasing L's character value by 31 (i.e. 'a' == ASCII 97 back to 'B' == 66).

We can find yet another collision by incrementing to 'G' and decreasing to '#'.

Or, we could move in the other direction: if the new L value had fallen outside the valid range for the character type, we could instead decrement K and increase L by 31, still having a net-zero effect on the hash value. In this particular case, though, "Ea" would need 'E' to become 'D' and 'a' to become character value 128, which may or may not be valid in your string type.

If we add more characters, we can observe something interesting: the overall hash value gets more complicated, but to create a collision we can still just tweak the last two characters in the same way, making sure their net impact on the overall hash value is zero across our colliding keys.

With better and better hash functions, this kind of analysis becomes more and more difficult, and with unbroken cryptographic-strength hash functions it should be impractical even for an expert mathematician. (That's why cryptocurrency mining has to brute-force search for "nonce" values, seeking to produce a desired overall hash value for the block.)
