Should I store objects in an Array or inside an Object, with top importance given to Write Speed?
I'm trying to decide whether data should be stored as an array of objects or as nested objects inside a MongoDB document.
In this particular case, I'm keeping track of a set of continually updated files: I add and update entries, the file name acts as the key, and the value tracks the number of lines processed within the file.
The document looks something like this:
{
t_id:1220,
some-other-info: {}, // there's other info here not updated frequently
files: {
log1-txt: {filename:"log1.txt",numlines:233,filesize:19928},
log2-txt: {filename:"log2.txt",numlines:2,filesize:843}
}
}
or this:
{
t_id:1220,
some-other-info: {},
files:[
{filename:"log1.txt",numlines:233,filesize:19928},
{filename:"log2.txt",numlines:2,filesize:843}
]
}
I am assuming that when handling a document, especially when it comes to updates, it is easier to deal with objects, because the location of an object can be determined by its key; unlike an array, where I have to look through each object's values until I find the match.
Because the object key will have periods, I will need to convert (or drop) the periods to create a valid key (e.g. fi.le.log to filelog or fi-le-log).
I'm not worried about duplicate file names emerging from that conversion (such as fi.le.log and fi-le.log colliding), so I would prefer to use objects, because the number of files is relatively small but the updates are frequent.
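A minimal sketch of what that key sanitization and update might look like in the mongo shell (the helper toKey and the literal values here are mine, for illustration only):

// Dots are not allowed in BSON field names, so map them to dashes.
function toKey(filename) {
  return filename.replace(/\./g, '-'); // "fi.le.log" -> "fi-le-log"
}
var fileDoc = { filename: 'log1.txt', numlines: 233, filesize: 19928 };
var setDoc = {};
setDoc['files.' + toKey(fileDoc.filename)] = fileDoc;
db.Col.update({ t_id: 1220 }, { $set: setDoc });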
Or would it be better to handle this data in a separate collection, for the best write performance?
{
  "_id": ObjectId('56d9f1202d777d9806000003'),
  "t_id": "1220",
  "filename": "log1.txt",
  "filesize": 1843,
  "numlines": 554
},
{
  "_id": ObjectId('56d9f1392d777d9806000004'),
  "t_id": "1220",
  "filename": "log2.txt",
  "filesize": 5231,
  "numlines": 3027
}
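If I go that route, each write could be a single upsert keyed on (t_id, filename); a sketch, assuming a collection named files (an index on { t_id: 1, filename: 1 } would make sense here):

// Upsert: update the matching document, or insert it if none exists yet.
db.files.update(
  { t_id: "1220", filename: "log1.txt" },
  { $set: { numlines: 554, filesize: 1843 } },
  { upsert: true }
);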
Comments:
- A quick test is worth lots of speculation... – dandavis (Mar 4, 2016 at 20:25)
- What does t_id signify? – AxxE (Mar 4, 2016 at 20:59)
- It's an ambiguous id; the significance is that there are multiple t_ids and each has multiple file_names (1:m). – Daniel (Mar 4, 2016 at 21:12)
- This is really more a question of how you "read" the data than of "write" performance. Clearly, if you intend to read multiple series at once, it's generally better to keep the data within the same collection object. If not, and particularly if there is more "create" than "update", then separate collection objects make much more sense from a "write" perspective. The general difference is reasonably negligible in terms of writing on modern engines, and separate documents give you more concurrency with document-level locking on WiredTiger. Use whichever suits your case. And test, then test again. – Blakes Seven (Mar 5, 2016 at 0:48)
1 Answer
From what I understand, you are talking about write speed without any read consideration, so we have to think about how you will insert/update your document.
We have to compare the two update commands (assuming you know the _id of the document you are updating; replace {key} with the key name, in your example log1-txt or log2-txt):
// Object form: set the sub-document directly via its field path.
db.Col.update({ _id: '' }, { $set: { 'files.{key}': object } })
vs
// Array form: find the matching element, then update it via the positional $ operator.
db.Col.update({ _id: '', 'files.filename': '{key}' }, { $set: { 'files.$': object } })
The second one means that MongoDB has to scan the array, find the matching index, and update that element. The first one means MongoDB just updates the specified field.
The worst part: the second command will not work if the matching filename is not present in the array! So you have to execute it, check whether nMatched is 0, and create the element if so, as sketched below. That's really bad for write speed (see MongoDB: upsert sub-document).
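A sketch of that two-step dance (docId and fileDoc are placeholders; nMatched comes from the WriteResult the shell returns):

// Step 1: try to update the element in place via the positional $ operator.
var res = db.Col.update(
  { _id: docId, 'files.filename': fileDoc.filename },
  { $set: { 'files.$': fileDoc } }
);
// Step 2: nothing matched, so the element does not exist yet -- push it.
if (res.nMatched === 0) {
  db.Col.update({ _id: docId }, { $push: { files: fileDoc } });
}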
If you will never (or almost never) run read queries or use the aggregation framework on this collection, go for the first one; it will be faster. If you want to aggregate, unwind, and do some analytics on the parsed files to get statistics about file sizes and line counts, consider using the second one; you will avoid some headaches.
Pure write speed will be better with the first solution.
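And, as the comments say, a quick test is worth lots of speculation. A rough timing sketch for the mongo shell (colObj and colArr are assumed to be pre-seeded with the two document shapes above; absolute numbers will vary by storage engine and hardware):

var N = 10000;
var t0 = new Date();
// Object form: direct field-path update.
for (var i = 0; i < N; i++) {
  db.colObj.update({ _id: 1220 }, { $set: { 'files.log1-txt.numlines': i } });
}
print('object form: ' + (new Date() - t0) + ' ms');

t0 = new Date();
// Array form: array scan plus positional update.
for (var i = 0; i < N; i++) {
  db.colArr.update(
    { _id: 1220, 'files.filename': 'log1.txt' },
    { $set: { 'files.$.numlines': i } }
  );
}
print('array form: ' + (new Date() - t0) + ' ms');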