I am trying to implement a Java hashCode equivalent in Node.js and Python to implement Redis sharding. I am following this really good blog post to achieve it: http://mechanics.flite./blog/2013/06/27/sharding-redis/
But I am stuck on a difference in the hash codes when the string contains non-ASCII characters, as in the example below. For regular strings, Node.js and Python give me the same hash code.
Here is the code I am using to generate it:
--Python
import ctypes

def _java_hashcode(s):
    hash_code = 0
    for char in s:
        hash_code = 31 * hash_code + ord(char)
    return ctypes.c_int32(hash_code).value
--Node as per the above blog
String.prototype.hashCode = function() {
    for (var ret = 0, i = 0, len = this.length; i < len; i++) {
        ret = (31 * ret + this.charCodeAt(i)) << 0;
    }
    return ret;
};
--Python output
For string '者:s��2�*�=x�' hash is = 2014651066
For string '359196048149234' hash is = 1145341990
--Node output
For string '者:s��2�*�=x�' hash is = 150370768
For string '359196048149234' hash is = 1145341990
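To illustrate what I am seeing, here is a quick check of the values each side feeds into the hash loop for '者' (one of the characters above); this is a diagnostic sketch in Python 3, assuming my Python 2 loop iterated raw UTF-8 bytes while Node's charCodeAt returned UTF-16 code units:

```python
# What each runtime sees for '者' (U+8005).
s = '者'
code_points = [ord(c) for c in s]     # what Node's charCodeAt (and Java) use for a BMP char
utf8_bytes = list(s.encode('utf-8'))  # what iterating a Python 2 byte string yields
print(code_points)  # [32773]
print(utf8_bytes)   # [232, 128, 133]
```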
Please guide me on where I am going wrong. Do I need to set some type of encoding in the Python and Node programs? I tried a few, but my program breaks in Python.
asked Apr 3, 2014, edited Apr 4, 2014 by balyanrobin · 2 Answers
def java_string_hashcode(s):
    """Mimic Java's hashCode in python 2"""
    try:
        s = unicode(s)
    except:
        try:
            s = unicode(s.decode('utf8'))
        except:
            raise Exception("Please enter a unicode type string or utf8 bytestring.")
    h = 0
    for c in s:
        h = int((((31 * h + ord(c)) ^ 0x80000000) & 0xFFFFFFFF) - 0x80000000)
    return h
This is how you should do it in Python 2.
The problem is twofold:
- You should be using the unicode type, and make sure that it is so.
- After every step you need to prevent Python from auto-converting to the long type, using bitwise operations to get the correct int value for the following step. (Swapping the sign bit, masking to 32 bits, then subtracting the weight of the sign bit gives a negative int when the sign bit is set and a positive int when it is not. This mimics the int overflow behavior in Java.)
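The sign-bit trick from the loop can be checked in isolation; a minimal sketch of just that conversion step:

```python
def to_int32(n):
    # Flip the sign bit, mask to 32 bits, then subtract the sign bit's weight.
    # The result lands in [-2**31, 2**31 - 1], mimicking Java int overflow.
    return ((n ^ 0x80000000) & 0xFFFFFFFF) - 0x80000000

print(to_int32(0x7FFFFFFF))  # 2147483647 (largest positive int, sign bit clear)
print(to_int32(0x80000000))  # -2147483648 (sign bit set)
print(to_int32(0xFFFFFFFF))  # -1
```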
Also, as in the other answer, for hard-coded non-ASCII characters, save your source file as UTF-8 and write at the top of the file:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
And if you receive user input, make sure you handle it as the unicode type and not the str type. (Not a problem in Python 3.)
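In Python 3, where every str is already unicode, the same idea reduces to the hash loop plus the overflow handling; a minimal sketch (the per-step mask plus a final ctypes signed conversion is equivalent to the bitwise trick above):

```python
import ctypes

def java_string_hashcode(s):
    """Java-compatible String.hashCode for Python 3 strings.

    Note: valid for BMP characters only. Java hashes UTF-16 code units,
    while ord() here yields code points, so astral characters would differ.
    """
    h = 0
    for c in s:
        h = (31 * h + ord(c)) & 0xFFFFFFFF  # keep the running value within 32 bits
    return ctypes.c_int32(h).value          # reinterpret as signed, like Java's int

print(java_string_hashcode("hello"))  # 99162322, same as "hello".hashCode() in Java
```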
Python 2 will assume ASCII encoding unless told otherwise. Since PEP 263, you can declare a UTF-8 encoded source file with the following lines at the top of the file:
#!/usr/bin/python
# -*- coding: utf-8 -*-