elasticsearch - Fuzzy matching multi term query wrong results - Stack Overflow

It looks like I'm missing something obvious when trying to fuzzy-match a multi-term query.

What I'd like to achieve is to get only the "Goleniow Helenow" result when providing the query "Goleniow Heleniow" (city + district name with a typo). Instead, I get all the docs.

I think I've tried all combinations of the minimum_should_match, operator and even fuzziness parameters, with no satisfactory result.

Could anyone point out what I am missing?

Index setup

curl -X PUT "localhost:9200/test-index?pretty" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "properties": { 
      "name": { 
        "type": "text",
        "index": true
      } 
    } 
  }   
}'

Docs to index

curl -X POST "localhost:9200/test-index/_doc?pretty" -H 'Content-Type: application/json' -d'
{
  "name": "Goleniow Helenow"
}'
curl -X POST "localhost:9200/test-index/_doc?pretty" -H 'Content-Type: application/json' -d'
{
  "name": "Goleniow"
}'
curl -X POST "localhost:9200/test-index/_doc?pretty" -H 'Content-Type: application/json' -d'
{
  "name": "Goleniow Jaworow"
}'

Query and result

curl -X POST "localhost:9200/test-index/_search?pretty" -H 'Content-Type: application/json' -d'{
  "query": {
    "match": {
      "name": {
        "minimum_should_match": "100%",
        "operator": "and",
        "fuzziness": "2",
        "query": "Goleniow Heleniow"
      }
    }
  }
}'
{
  "took" : 104,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.32180583,
    "hits" : [
      {
        "_index" : "test-index",
        "_id" : "Waqnj5UBnvH7uZURvQTX",
        "_score" : 0.32180583,
        "_source" : {
          "name" : "Goleniow Helenow"
        }
      },
      {
        "_index" : "test-index",
        "_id" : "Wqqoj5UBnvH7uZUR2QSO",
        "_score" : 0.2793999,
        "_source" : {
          "name" : "Goleniow"
        }
      },
      {
        "_index" : "test-index",
        "_id" : "W6qoj5UBnvH7uZUR8AT4",
        "_score" : 0.21600665,
        "_source" : {
          "name" : "Goleniow Jaworow"
        }
      }
    ]
  }
}
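
A likely explanation for the three hits, offered as a sketch rather than a verified diagnosis: with fuzziness 2, the query term "heleniow" is only two edits away from the indexed term "goleniow" (h→g, e→o), so a single "goleniow" token can satisfy both query terms. The Python model below is not Elasticsearch code. The names edit_distance, matches and hits are hypothetical helpers, and the model assumes the default standard analyzer (lowercasing, splitting on whitespace) and plain Levenshtein distance, ignoring scoring and term-expansion limits.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


# The three documents indexed in the question.
DOCS = ["Goleniow Helenow", "Goleniow", "Goleniow Jaworow"]


def matches(query: str, doc: str, max_edits: int) -> bool:
    """Rough model of match + operator "and": every query token must be
    within max_edits of at least one token of the document."""
    doc_tokens = doc.lower().split()  # assumption: standard analyzer behaviour
    return all(
        any(edit_distance(q, t) <= max_edits for t in doc_tokens)
        for q in query.lower().split()
    )


def hits(query: str, max_edits: int) -> list:
    """Documents from DOCS that the model considers a match."""
    return [d for d in DOCS if matches(query, d, max_edits)]


if __name__ == "__main__":
    print(edit_distance("heleniow", "goleniow"))
    print(edit_distance("heleniow", "helenow"))
    print(hits("Goleniow Heleniow", max_edits=2))
    print(hits("Goleniow Heleniow", max_edits=1))
```

In this model, lowering fuzziness to 1 leaves only "Goleniow Helenow"; whether that is acceptable depends on how many typos real queries must tolerate. The match query's prefix_length parameter may be another way to stop "goleniow" from being an expansion of "heleniow". Neither option has been checked against a cluster here.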

Asked Mar 14 at 8:38 by John Tamed

1 Answer


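You can use a span_near query: each term is wrapped in a span_multi fuzzy clause, and slop 0 with in_order true requires the fuzzy-matched terms to be adjacent and in the given order. Here is a similar discussion.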

Example query:

GET test-index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "span_near": {
            "clauses": [
              {
                "span_multi": {
                  "match": {
                    "fuzzy": {
                      "name": {
                        "value": "Goleniow",
                        "fuzziness": 2
                      }
                    }
                  }
                }
              },
              {
                "span_multi": {
                  "match": {
                    "fuzzy": {
                      "name": {
                        "value": "Heleniow",
                        "fuzziness": 2
                      }
                    }
                  }
                }
              }
            ],
            "slop": 0,
            "in_order": true
          }
        }
      ]
    }
  }
}
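
The query above can also be generated for arbitrary input. The sketch below is one way to do that in Python; build_fuzzy_phrase_query is a hypothetical helper, not part of any Elasticsearch client. It lowercases the input on the assumption that the field uses the default standard analyzer: fuzzy is a term-level query and, to my knowledge, its value is not analyzed, so an uppercase first letter would otherwise cost one edit against the lowercased index terms. It also drops the bool/must wrapper, which holds only a single clause in the example.

```python
import json


def build_fuzzy_phrase_query(field: str, text: str,
                             fuzziness: int = 2, slop: int = 0) -> dict:
    """Build a search body with one fuzzy span clause per word of `text`.

    Hypothetical helper: the structure mirrors the span_near example above.
    """
    # Assumption: index terms are lowercased, so the query terms are too.
    tokens = text.lower().split()
    if not tokens:
        raise ValueError("text must contain at least one word")

    def fuzzy(token: str) -> dict:
        return {"fuzzy": {field: {"value": token, "fuzziness": fuzziness}}}

    if len(tokens) == 1:
        # A single word needs no positional constraint.
        return {"query": fuzzy(tokens[0])}

    clauses = [{"span_multi": {"match": fuzzy(t)}} for t in tokens]
    return {
        "query": {
            "span_near": {
                "clauses": clauses,
                "slop": slop,       # 0: matched terms must be adjacent
                "in_order": True,   # and appear in the order given
            }
        }
    }


if __name__ == "__main__":
    body = build_fuzzy_phrase_query("name", "Goleniow Heleniow")
    print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a request to test-index/_search, the same way as the example query.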

