What's the best way to consistently hash an object/dictionary that's limited to what JSON can represent, in both JavaScript and Python? What about in many different languages?
Of course there are hash functions implemented consistently in many different languages that take a string, but to hash an object you have to convert it to a string representation first.
I want a hash function that will always return the same value for the same dictionary in any language, but the JSON spec doesn't guarantee anything about the order of keys in the serialized representation.
Do json.dumps() and JSON.stringify() behave identically? How would you verify this?
If not, is there a serialization format with libraries in many languages (I'm practically interested in Python and JavaScript but also curious about all languages) that doesn't require any additional processing by the caller to produce consistent results?
asked Nov 8, 2013 at 5:06 by mwhite, edited Nov 8, 2013 at 5:15
- They behave identically if you give them the right types of data (i.e. simple JavaScript objects and simple Python dicts, numbers, strings, and lists/arrays). – Ry-, Nov 8, 2013 at 5:10
- So sorting is part of the spec? – mwhite, Nov 8, 2013 at 5:11
- No, order shouldn't have anything to do with it. Order is never guaranteed, and neither Python's dicts nor JavaScript's objects are ordered. What are you doing, comparing strings? – Ry-, Nov 8, 2013 at 5:12
3 Answers
I would split this into two problems.
1. How do you get the same serialized string in both JavaScript and Python?
2. Which byte array hash function should you use? It must be an established algorithm with identical implementations in both JavaScript and Python.
Use (1) to get two strings, UTF-8 encode them, then use (2) to get the hashes.
Since (2) is straightforward, I'll only address (1).
There are multiple facets to the problem of making sure the two JSON strings you generate are identical.
- You'll want to use unformatted JSON (no extraneous spaces, tabs, or newlines).
- null values must be treated identically. Some serializers will by default throw away a dictionary key-value pair if the value is null.
- Ordering of key-value pairs within a dictionary must be consistent.
- JSON number serialization should be consistent. For example, you can't have integer one serialize as 1 on one side and 1.0 on the other. (This probably won't be as big of an issue, however.)
- The string encoding should be the same for both. JSON allows serialization to Unicode text, mandating only that ", \, and control characters be escaped in JSON strings. Most serializers do more than necessary, however, and reduce almost all non-ASCII characters to the \uXXXX equivalent. See the JSON specification for the details on JSON string encoding. One way to remove all ambiguity is to only escape when absolutely necessary.
You'll want to make sure all of these are matched between JavaScript and Python. Most JSON serialization libraries I've used provide configuration hooks for all of the things I mention in the list above. Unfortunately, I'm not very familiar with the JavaScript or Python libraries.
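On the Python side, the facets above map fairly directly onto arguments of json.dumps. The following is a minimal sketch, not a definitive canonical form: the names canonical_json and stable_hash are made up for illustration, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib
import json


def canonical_json(obj):
    """Serialize obj to a compact, key-sorted JSON string (hypothetical helper)."""
    return json.dumps(
        obj,
        sort_keys=True,          # consistent key ordering
        separators=(",", ":"),   # no extraneous whitespace
        ensure_ascii=False,      # do not turn every non-ASCII character into \uXXXX
        allow_nan=False,         # NaN/Infinity are not valid JSON, so reject them
    )


def stable_hash(obj):
    """UTF-8 encode the canonical string, then hash the bytes with SHA-256."""
    return hashlib.sha256(canonical_json(obj).encode("utf-8")).hexdigest()


a = {"b": None, "a": [1, 2, "é"]}
b = {"a": [1, 2, "é"], "b": None}   # same content, different insertion order

print(canonical_json(a))                  # {"a":[1,2,"é"],"b":null}
print(stable_hash(a) == stable_hash(b))   # True
```

Null values are kept as null rather than dropped. Number formatting is not addressed here: Python writes the float 1.0 as 1.0, so integers and whole-number floats would need normalizing separately.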
JSON is a well-defined language for representing the state of objects. The functions do not behave identically, but they do behave equivalently.
For instance:
json.dumps({'hello':'goodbye', 123: 456})
May produce either of the following (on Python versions before 3.7, where dict ordering is not guaranteed):
{"hello": "goodbye", "123": 456}
or
{"123": 456, "hello": "goodbye"}
And if you pass in the indent parameter, you get even more possibilities for different results.
Most languages that do not have a built-in way to handle JSON (as Python and JS do) have a third-party library that is perfectly sufficient (see the Newtonsoft JSON library for .NET).
Each language that I'm aware of will produce valid JSON, which means that it can be parsed by each other language that provides a JSON parser.
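To illustrate "equivalent but not identical" in Python, the sketch below serializes the same dictionary with different options and checks that the strings differ while the parsed values are equal. The variable names are arbitrary.

```python
import json

data = {"hello": "goodbye", "123": 456}

compact = json.dumps(data, separators=(",", ":"))
indented = json.dumps(data, indent=2)
reordered = json.dumps({"123": 456, "hello": "goodbye"})

# The serialized strings all differ from one another...
print(len({compact, indented, reordered}))  # 3

# ...but every one of them parses back to the same dictionary.
print(all(json.loads(s) == data for s in (compact, indented, reordered)))  # True
```

This is why a hash taken over the raw string is not stable unless the serialization options are pinned down first.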
I thought I might attempt a practical example.
In JavaScript I did:
import stringify from 'json-stable-stringify'
import sha256 from 'simple-sha256'
// simple-sha256 is assumed here to return a Promise, hence the await
const hash_str = await sha256(stringify({'hello':'goodbye', '123': 456}))
// hash_str = 72804f4e0847a477ee69eae4fbf404b03a6c220bacf8d5df34c964985acd473f
json-stable-stringify guarantees sorted JSON output. simple-sha256 allows for Node.js / browser compatibility.
In Python 3.8 I did:
import hashlib
import json
serialized = json.dumps({'hello':'goodbye', '123': 456}, sort_keys=True, separators=(',', ':'))
hash_str = hashlib.sha256(serialized.encode("utf-8")).hexdigest()
# hash_str = 72804f4e0847a477ee69eae4fbf404b03a6c220bacf8d5df34c964985acd473f
I haven't yet done extensive testing but with the json objects I've tried, it has successfully matched.
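Two cases may be worth adding to that testing: non-ASCII strings and whole-number floats. JSON.stringify leaves non-ASCII characters unescaped and writes 1.0 as 1, whereas json.dumps with the arguments used above escapes non-ASCII characters by default and keeps the ".0". The sketch below shows only the Python side; dumps_sorted is a hypothetical helper, and the JavaScript behaviour should be confirmed in your own runtime.

```python
import json


def dumps_sorted(obj, **kwargs):
    # Same arguments as the Python snippet above, plus any overrides.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), **kwargs)


# Non-ASCII text: the default escapes it, ensure_ascii=False keeps it literal.
print(dumps_sorted({"name": "é"}))                      # {"name":"\u00e9"}
print(dumps_sorted({"name": "é"}, ensure_ascii=False))  # {"name":"é"}

# Whole-number floats keep their ".0" in Python.
print(dumps_sorted({"n": 1.0}))  # {"n":1.0}
print(dumps_sorted({"n": 1}))    # {"n":1}
```

If the data can contain either case, passing ensure_ascii=False and converting whole-number floats to int before serializing is one possible way to keep the two sides aligned.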