映射与统计
最后更新于:2022-04-01 00:39:23
# 映射与统计
当我们在进行搜索的事情,我们会发现有一些奇怪的事情。比如有一些内容似乎是被打破了:在我们的索引中有12条推文,中有一个包含了`2014-09-15`这个日期,但是看看下面的查询结果中的总数量:
~~~
GET /_search?q=2014 # 12 results
GET /_search?q=2014-09-15 # 12 results !
GET /_search?q=date:2014-09-15 # 1 result
GET /_search?q=date:2014 # 0 results !
~~~
为什么我们使用字段`_all`搜索全年就会返回所有推文,而使用字段`date`搜索年份却没有结果呢?为什么使用两者所得到的结果是不同的?
推测大概是因为我们的数据在`_all`和`date`在索引时没有被相同处理。我们来看看Elasticsearch是如何处理我们的文档结构的。我们可以对`gb`的`tweet`使用_mapping_请求:
~~~
GET /gb/_mapping/tweet
~~~
我们得到:
~~~
{
"gb": {
"mappings": {
"tweet": {
"properties": {
"date": {
"type": "date",
"format": "dateOptionalTime"
},
"name": {
"type": "string"
},
"tweet": {
"type": "string"
},
"user_id": {
"type": "long"
}
}
}
}
}
}
~~~
Elasticsearch会根据系统自动判断字段类型并生成一个映射。返回结果告诉我们`date`字段被识别成了`date`类型。`_all`没有出现是因为他是默认字段,但是我们知道字段`_all`实际上是`string`类型的。
所以类型为`date`的字段和类型为`string`的字段的索引方式是不同的。
So fields of type `date` and fields of type `string` are indexed differently,and can thus be searched differently. That's not entirely surprising.You might expect that each of the core data types -- strings, numbers, booleansand dates -- might be indexed slightly differently. And this is true:there are slight differences.
But by far the biggest difference is actually between fields that represent_exact values_ (which can include `string` fields) and fields thatrepresent _full text_. This distinction is really important -- it's the thingthat separates a search engine from all other databases.