Complex_datatypes

最后更新于:2022-04-01 00:39:34

[[complex-core-fields]]=== Complex core field types Besides the simple scalar datatypes that we mentioned above, JSON alsohas `null` values, arrays and objects, all of which are supported byElasticsearch: ==== Multi-value fields It is quite possible that we want our `tag` field to contain morethan one tag. Instead of a single string, we could index an array of tags: ~~~ { "tag": [ "search", "nosql" ]} ~~~ There is no special mapping required for arrays. Any field can contain zero,one or more values, in the same way as a full text field is analyzed toproduce multiple terms. By implication, this means that _all of the values of an array must beof the same datatype_. You can't mix dates with strings. If you createa new field by indexing an array, Elasticsearch will use thedatatype of the first value in the array to determine the `type` of thenew field. The elements inside an array are not ordered. You cannot refer to `the firstelement'' or`the last element''. Rather think of an array as a _bag ofvalues_. ==== Empty fields Arrays can, of course, be empty. This is the equivalent of having zerovalues. In fact, there is no way of storing a `null` value in Lucene, soa field with a `null` value is also considered to be an emptyfield. These four fields would all be considered to be empty, and would not beindexed: ~~~ "empty_string": "","null_value": null,"empty_array": [],"array_with_null_value": [ null ] ~~~ ==== Multi-level objects The last native JSON datatype that we need to discuss is the _object_-- known in other languages as hashes, hashmaps, dictionaries orassociative arrays. _Inner objects_ are often used to embed one entity or object insideanother. For instance, instead of having fields called `user_name`and `user_id` inside our `tweet` document, we could write it as: ```js{ "tweet": "Elasticsearch is very flexible", "user": { "id": "@johnsmith", "gender": "male", "age": 26, "name": { "full": "John Smith", "first": "John", "last": "Smith" } } ### } ==== Mapping for inner objects Elasticsearch will detect new object fields dynamically and map them astype `object`, with each inner field listed under `properties`: ### [source,js] { "gb": { "tweet": { <1> "properties": { "tweet": { "type": "string" }, "user": { <2> "type": "object", "properties": { "id": { "type": "string" }, "gender": { "type": "string" }, "age": { "type": "long" }, "name": { <2> "type": "object", "properties": { "full": { "type": "string" }, "first": { "type": "string" }, "last": { "type": "string" } } } } } } } } ### } <1> Root object. <2> Inner objects. The mapping for the `user` and `name` fields have a similar structureto the mapping for the `tweet` type itself. In fact, the `type` mappingis just a special type of `object` mapping, which we refer to as the_root object_. It is just the same as any other object, except that it hassome special top-level fields for document metadata, like `_source`,the `_all` field etc. ==== How inner objects are indexed Lucene doesn't understand inner objects. A Lucene document consists of a flatlist of key-value pairs. In order for Elasticsearch to index inner objectsusefully, it converts our document into something like this: ### [source,js] { "tweet": [elasticsearch, flexible, very], "user.id": [@johnsmith], "user.gender": [male], "user.age": [26], "user.name.full": [john, smith], "user.name.first": [john], "user.name.last": [smith] ### } _Inner fields_ can be referred to by name, eg `"first"`. To distinguishbetween two fields that have the same name we can use the full _path_,eg `"user.name.first"` or even the `type` name plusthe path: `"tweet.user.name.first"`. NOTE: In the simple flattened document above, there is no field called `user`and no field called `user.name`. Lucene only indexes scalar or simple values,not complex datastructures. ==== Arrays of inner objects Finally, consider how an array containing inner objects would be indexed.Let's say we have a `followers` array which looks like this: ### [source,js] { "followers": [ { "age": 35, "name": "Mary White"}, { "age": 26, "name": "Alex Jones"}, { "age": 19, "name": "Lisa Smith"} ] ### } This document will be flattened as we described above, but theresult will look like this: ### [source,js] { "followers.age": [19, 26, 35], "followers.name": [alex, jones, lisa, smith, mary, white] ### } The correlation between `{age: 35}` and `{name: Mary White}` has been lost aseach multi-value field is just a bag of values, not an ordered array. This issufficient for us to ask: - _Is there a follower who is 26 years old?_ but we can't get an accurate answer to: - _Is there a follower who is 26 years old **and who is called Alex Jones?**_ Correlated inner objects, which are able to answer queries like these,are called _nested_ objects, and we will discuss them later on in<>.
';