Should I store objects in an Array or inside an Object, with top importance given to Write Speed?
I'm trying to decide whether data should be stored as an array of objects or as nested objects inside a MongoDB document.
In this particular case, I'm keeping track of a set of continually updated files: I add and update entries, the file name acts as the key, and the value tracks the number of lines processed within the file.
The document looks something like this:
{
t_id:1220,
some-other-info: {}, // there's other info here not updated frequently
files: {
log1-txt: {filename:"log1.txt",numlines:233,filesize:19928},
log2-txt: {filename:"log2.txt",numlines:2,filesize:843}
}
}
or this:
{
t_id:1220,
some-other-info: {},
files:[
{filename:"log1.txt",numlines:233,filesize:19928},
{filename:"log2.txt",numlines:2,filesize:843}
]
}
I am assuming that when handling a document, especially when it comes to updates, it is easier to deal with objects, because the location of an object can be determined by its key; unlike an array, where I have to look through each object's values until I find the match.
Because the object key will have periods, I will need to convert (or drop) the periods to create a valid key (e.g. fi.le.log to filelog or fi-le-log).
I'm not worried about duplicate file names emerging from that conversion (such as fi.le.log and fi-le.log colliding), so I would prefer to use objects, because the number of files is relatively small but the updates are frequent.
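A minimal sketch of what that key sanitization and update might look like in the mongo shell (the helper toKey and the literal values here are mine, for illustration only):

// Dots are not allowed in BSON field names, so map them to dashes.
function toKey(filename) {
  return filename.replace(/\./g, '-'); // "fi.le.log" -> "fi-le-log"
}
var fileDoc = { filename: 'log1.txt', numlines: 233, filesize: 19928 };
var setDoc = {};
setDoc['files.' + toKey(fileDoc.filename)] = fileDoc;
db.Col.update({ t_id: 1220 }, { $set: setDoc });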
Or would it be better to handle this data in a separate collection, for the best write performance?
{
  "_id": ObjectId('56d9f1202d777d9806000003'),
  "t_id": "1220",
  "filename": "log1.txt",
  "filesize": 1843,
  "numlines": 554
},
{
  "_id": ObjectId('56d9f1392d777d9806000004'),
  "t_id": "1220",
  "filename": "log2.txt",
  "filesize": 5231,
  "numlines": 3027
}
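If I go that route, each write could be a single upsert keyed on (t_id, filename); a sketch, assuming a collection named files (an index on { t_id: 1, filename: 1 } would make sense here):

// Upsert: update the matching document, or insert it if none exists yet.
db.files.update(
  { t_id: "1220", filename: "log1.txt" },
  { $set: { numlines: 554, filesize: 1843 } },
  { upsert: true }
);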
Comments:
- A quick test is worth lots of speculation... – dandavis (Mar 4, 2016 at 20:25)
- What does t_id signify? – AxxE (Mar 4, 2016 at 20:59)
- It's an ambiguous id; the significance is that there are multiple t_ids and each has multiple file_names (1:m). – Daniel (Mar 4, 2016 at 21:12)
- This is really more a question of how you "read" the data than of "write" performance. Clearly, if you intend to read multiple series at once, it's generally better to keep the data within the same collection object. If not, and particularly if there is more "create" than "update", then separate collection objects make much more sense from a "write" perspective. The general difference is reasonably negligible in terms of writing on modern engines, and separate documents give you more concurrency with document-level locking on WiredTiger. Use whichever suits your case. And test, then test again. – Blakes Seven (Mar 5, 2016 at 0:48)
1 Answer
From what I understand, you are talking about write speed without any read consideration, so we have to think about how you will insert/update your document.
We have to compare the two update commands (assuming you know the _id of the document you are updating; replace {key} with the key name, in your example log1-txt or log2-txt):
// Object form: set the sub-document directly via its field path.
db.Col.update({ _id: '' }, { $set: { 'files.{key}': object } })
vs
// Array form: find the matching element, then update it via the positional $ operator.
db.Col.update({ _id: '', 'files.filename': '{key}' }, { $set: { 'files.$': object } })
The second one means that MongoDB has to scan the array, find the matching index, and update that element. The first one means MongoDB just updates the specified field.
The worst part: the second command will not work if the matching filename is not present in the array! So you have to execute it, check whether nMatched is 0, and create the element if so, as sketched below. That's really bad for write speed (see MongoDB: upsert sub-document).
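A sketch of that two-step dance (docId and fileDoc are placeholders; nMatched comes from the WriteResult the shell returns):

// Step 1: try to update the element in place via the positional $ operator.
var res = db.Col.update(
  { _id: docId, 'files.filename': fileDoc.filename },
  { $set: { 'files.$': fileDoc } }
);
// Step 2: nothing matched, so the element does not exist yet -- push it.
if (res.nMatched === 0) {
  db.Col.update({ _id: docId }, { $push: { files: fileDoc } });
}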
If you will never (or almost never) run read queries or use the aggregation framework on this collection, go for the first one; it will be faster. If you want to aggregate, unwind, and do some analytics on the parsed files to get statistics about file sizes and line counts, consider using the second one; you will avoid some headaches.
Pure write speed will be better with the first solution.
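And, as the comments say, a quick test is worth lots of speculation. A rough timing sketch for the mongo shell (colObj and colArr are assumed to be pre-seeded with the two document shapes above; absolute numbers will vary by storage engine and hardware):

var N = 10000;
var t0 = new Date();
// Object form: direct field-path update.
for (var i = 0; i < N; i++) {
  db.colObj.update({ _id: 1220 }, { $set: { 'files.log1-txt.numlines': i } });
}
print('object form: ' + (new Date() - t0) + ' ms');

t0 = new Date();
// Array form: array scan plus positional update.
for (var i = 0; i < N; i++) {
  db.colArr.update(
    { _id: 1220, 'files.filename': 'log1.txt' },
    { $set: { 'files.$.numlines': i } }
  );
}
print('array form: ' + (new Date() - t0) + ' ms');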