indexing - MongoDB: Creating an Index for Efficient Regex Search on a Text Field - Stack Overflow

I have a MongoDB collection where I store values as concatenated object references in a single string field. The values are structured like this:

{
  "resource": {
    "fields": {
      "value": {
        "to": "parent_671f3db04b00e7efd82a6c5b;image_67d00437953155602e87b6fa;file_671f3db04b00e7efd82a6123"
      }
    }
  }
}

Currently, I am searching for specific ObjectId substrings within this field using a regex query:

{
  "resource.fields.value.to": {
    "$regex": "671f3db04b00e7efd82a6c5b",
    "$options": "i"
  }
}

This search works but is slow, especially on large datasets. I want to optimize the query performance by creating an index on this field.

Questions:

1. How can I create an index to speed up regex-based searches on this field?
2. Would a full-text index help in this case, or is there a better approach?
3. Are there alternative ways to structure my data to make queries more efficient? (not optimal)

asked Mar 11 at 13:44 by Jovan Dimov
  • 1 From the docs: Index use and performance for $regex queries varies depending on whether the query is case-sensitive or case-insensitive. ... Case-insensitive indexes typically do not improve performance for $regex queries. The $regex implementation is not collation-aware and cannot utilize case-insensitive indexes efficiently. – aneroid Commented Mar 11 at 13:52

1 Answer

From the $regex docs: "Index use and performance for $regex queries varies depending on whether the query is case-sensitive or case-insensitive. ... Case-insensitive indexes typically do not improve performance for $regex queries. The $regex implementation is not collation-aware and cannot utilize case-insensitive indexes efficiently."

Regarding "How can I create an index to speed up regex-based searches on this field?":

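1. If you can store the resource.fields.value.to field in lowercase and also lowercase the id before searching, you can drop the case-insensitive "i" option, and the regex search can then use an index on the field. Note that an unanchored pattern still has to examine every key in the index, so this is usually cheaper than a collection scan but not as fast as a prefix match.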

2. If you can use prefix expressions with case-sensitive queries, the docs say: "Further optimization can occur if the regular expression is a "prefix expression", which means that all potential matches start with the same string. This allows MongoDB to construct a "range" from that prefix and only match against those values from the index that fall within that range." With the current concatenated string, this only helps when the id you are looking for is the first entry in the value.

3. Regarding "Are there alternative ways to structure my data to make queries more efficient? (not optimal)":

This may be obvious, but you can restructure the data so that each reference gets its own field. Like:

{
  "resource": {
    "fields": {
      "value": {
        "to": {
          "parent": "671f3db04b00e7efd82a6c5b",
          "image": "67d00437953155602e87b6fa",
          "file": "671f3db04b00e7efd82a6123"
        }
      }
    }
  }
}

Note that I have dropped the prefixes parent_, image_ and file_, since the field name now says which id it holds. You can then use a prefix regex, or just standard equality checks in an $or clause, which will use the indexes. This approach requires 3 indexes: one for each of the to sub-fields. (You may also want to consider un-nesting the fields a bit, but that depends on your usage.)

The query then becomes:

db.collection.find({
  $or: [
    { "resource.fields.value.to.parent": "671f3db04b00e7efd82a6c5b" },
    { "resource.fields.value.to.image": "671f3db04b00e7efd82a6c5b" },
    { "resource.fields.value.to.file": "671f3db04b00e7efd82a6c5b" }
  ]
})

Mongo Playground
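To make the split-field approach concrete, here is a minimal sketch of the conversion, the three indexes and the $or query. The helper names parseRefs and buildIdQuery are hypothetical, the collection name is taken from the queries above, and the sketch assumes each of parent, image and file appears at most once per document. The ids are lowercased so that plain equality can replace the case-insensitive regex.

```javascript
// Hypothetical helper: turn "parent_<id>;image_<id>;file_<id>" into
// { parent: "<id>", image: "<id>", file: "<id>" }, lowercasing each id.
// Assumes every kind (parent/image/file) occurs at most once per value.
function parseRefs(to) {
  const refs = {};
  for (const part of to.split(";")) {
    const sep = part.indexOf("_");
    if (sep === -1) continue; // skip segments without a "<kind>_" prefix
    refs[part.slice(0, sep)] = part.slice(sep + 1).toLowerCase();
  }
  return refs;
}

const BASE = "resource.fields.value.to";
const KINDS = ["parent", "image", "file"];

// One single-field ascending index per sub-field (3 in total).
const indexSpecs = KINDS.map((kind) => ({ [`${BASE}.${kind}`]: 1 }));

// Equality checks OR'ed across the three sub-fields.
function buildIdQuery(id) {
  const needle = id.toLowerCase();
  return { $or: KINDS.map((kind) => ({ [`${BASE}.${kind}`]: needle })) };
}

// This part only runs inside mongosh, where the global `db` exists.
if (typeof db !== "undefined") {
  indexSpecs.forEach((spec) => db.collection.createIndex(spec));
  printjson(db.collection.find(buildIdQuery("671f3db04b00e7efd82a6c5b")).toArray());
}
```

If a document can hold several references of the same kind, this object layout no longer fits and an array-based layout is the safer choice.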


An alternative structure is to split the to field into an array of separate strings:

{
  "resource": {
    "fields": {
      "value": {
        "to": [
          "parent_671f3db04b00e7efd82a6c5b",
          "image_67d00437953155602e87b6fa",
          "file_671f3db04b00e7efd82a6123"
        ]
      }
    }
  }
}

To query it, you can use an $or or $in clause. Note that you need to include each prefix in the values for the equality check to match and take advantage of the index:

// with OR clause
db.collection.find({
  $or: [
    { "resource.fields.value.to": "parent_671f3db04b00e7efd82a6c5b" },
    { "resource.fields.value.to": "image_671f3db04b00e7efd82a6c5b" },
    { "resource.fields.value.to": "file_671f3db04b00e7efd82a6c5b" }
  ]
})

// with IN clause
db.collection.find({
  "resource.fields.value.to": {
    $in: [
      "parent_671f3db04b00e7efd82a6c5b",
      "image_671f3db04b00e7efd82a6c5b",
      "file_671f3db04b00e7efd82a6c5b"
    ]
  }
})

Mongo Playground 2A, with $or
Mongo Playground 2B, with $in
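For the array layout, a single index on the array field should be enough, because MongoDB builds a multikey index when the indexed field holds an array. The sketch below shows a possible conversion and a generated $in query. The helper names splitRefs and buildInQuery are hypothetical, and the list of prefixes is assumed from the sample document.

```javascript
const FIELD = "resource.fields.value.to";
const PREFIXES = ["parent", "image", "file"]; // assumed from the sample document

// Hypothetical migration helper: split the concatenated string into an
// array of lowercase "<kind>_<id>" entries, dropping empty segments.
function splitRefs(to) {
  return to
    .split(";")
    .map((entry) => entry.trim().toLowerCase())
    .filter((entry) => entry.length > 0);
}

// Build the $in query for one id by prepending every known prefix.
function buildInQuery(id) {
  const needle = id.toLowerCase();
  return { [FIELD]: { $in: PREFIXES.map((prefix) => `${prefix}_${needle}`) } };
}

// This part only runs inside mongosh, where the global `db` exists.
if (typeof db !== "undefined") {
  // One index on the array field; it becomes multikey once arrays are stored.
  db.collection.createIndex({ [FIELD]: 1 });
  printjson(db.collection.find(buildInQuery("671f3db04b00e7efd82a6c5b")).toArray());
}
```

Generating the values from one list of prefixes keeps the query in sync if another reference kind is added later.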

Or as a single regex with the prefix expressions OR'ed together, but check whether this actually uses the index:

db.collection.find({
  "resource.fields.value.to": {
    "$regex": "^parent_671f3db04b00e7efd82a6c5b|^image_671f3db04b00e7efd82a6c5b|^file_671f3db04b00e7efd82a6c5b"
  }
})

Mongo Playground 3
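If you go the regex route, the pattern can be generated instead of typed by hand, and explain() can show whether the index is used. This is only a sketch: escapeRegex and buildPrefixRegex are hypothetical helper names, and whether an alternation of anchored prefixes gets the prefix-range optimization is something to verify on your own server version rather than assume.

```javascript
// Escape regex metacharacters so the id is always matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build "^parent_<id>|^image_<id>|^file_<id>" for the given prefixes.
function buildPrefixRegex(id, prefixes) {
  return prefixes.map((prefix) => "^" + escapeRegex(`${prefix}_${id}`)).join("|");
}

const pattern = buildPrefixRegex("671f3db04b00e7efd82a6c5b", ["parent", "image", "file"]);

// This part only runs inside mongosh, where the global `db` exists.
if (typeof db !== "undefined") {
  const plan = db.collection
    .find({ "resource.fields.value.to": { $regex: pattern } })
    .explain("executionStats");
  // Look for an IXSCAN stage (index used) versus COLLSCAN (full scan).
  printjson(plan.queryPlanner.winningPlan);
}
```

Comparing totalKeysExamined and totalDocsExamined in the executionStats output against the $in version is a simple way to decide which form to keep.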
