ElasticSearch Cookbook

Mapping arrays

Arrays, or multi-value fields, are very common in data models, but traditional SQL solutions do not support them natively.

In SQL, multi-value fields require accessory tables that must be joined to gather all the values, which results in poor performance when the number of records is very large.

Getting ready

You need a working ElasticSearch cluster.

How to do it...

Every field is automatically managed as an array. For example, to store tags for a document, the mapping will be as shown in the following code snippet:

{
  "document" : {
    "properties" : {
      "name" : {"type" : "string", "index" : "analyzed"},
      "tag" : {"type" : "string", "store" : "yes", "index" : "not_analyzed"}
    }
  }
}

This mapping is valid for indexing the following documents:

{"name": "document1", "tag": "awesome"}

and

{"name": "document2", "tag": ["cool", "awesome", "amazing"] }
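
The following is a minimal Python sketch of the same idea, using only the standard library and no running cluster. It parses the mapping, serializes both documents as they would be sent, and reads the tag field uniformly as a list. The helper values_of is a hypothetical name introduced for illustration; it is not part of any ElasticSearch client.

```python
import json

# The mapping from this recipe, parsed to confirm it is valid JSON.
mapping = json.loads("""
{
  "document": {
    "properties": {
      "name": {"type": "string", "index": "analyzed"},
      "tag": {"type": "string", "store": "yes", "index": "not_analyzed"}
    }
  }
}
""")

# One document with a single tag, one with an array of tags.
documents = [
    {"name": "document1", "tag": "awesome"},
    {"name": "document2", "tag": ["cool", "awesome", "amazing"]},
]


def values_of(doc, field):
    """Return a field's values as a list, whether it holds a scalar or an array.

    Hypothetical client-side helper, not an ElasticSearch API.
    """
    value = doc.get(field, [])
    return value if isinstance(value, list) else [value]


# JSON bodies as they would be sent when indexing.
bodies = [json.dumps(doc) for doc in documents]

# Both documents can be read the same way on the client side.
tags = [values_of(doc, "tag") for doc in documents]
```

Normalizing on the client is a design choice: the source of a document comes back in the shape in which it was indexed, so code that reads it should accept both a scalar and a list.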

How it works...

ElasticSearch manages arrays transparently: because of its Lucene core, it makes no difference whether you supply a single value or multiple values for a field.

Lucene manages multiple values for a field by adding each of them to the document under the same field name (index_name in ElasticSearch). If index_name is not defined in the mapping, it is taken from the name of the field. It can also be set to another value for custom behaviors, such as renaming a field at the indexing level or merging two or more JSON fields into a single Lucene field. Redefine index_name with caution, as it also affects searching.
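
The behavior described above can be modeled with a small Python sketch. It is a simplified illustration, not Lucene's or ElasticSearch's actual API: the function to_lucene_fields and its index_names argument are hypothetical names, and the "label" field is invented for the example.

```python
from collections import defaultdict


def to_lucene_fields(doc, index_names=None):
    """Simplified model: group every value under the Lucene field name it is stored with.

    index_names maps a JSON field name to a custom index_name;
    fields without an entry keep their own name.
    """
    index_names = index_names or {}
    fields = defaultdict(list)
    for json_name, value in doc.items():
        lucene_name = index_names.get(json_name, json_name)
        values = value if isinstance(value, list) else [value]
        # Each value is added separately under the same field name.
        fields[lucene_name].extend(values)
    return dict(fields)


single = to_lucene_fields({"name": "document1", "tag": "awesome"})
multi = to_lucene_fields({"name": "document2", "tag": ["cool", "awesome", "amazing"]})

# Merging two JSON fields into one Lucene field through index_name.
merged = to_lucene_fields(
    {"tag": ["cool"], "label": "amazing"},
    index_names={"label": "tag"},
)
```

In this model a single value and a one-element array produce the same result, which is why the mapping does not need to distinguish them. The merged case also shows why redefining index_name affects searching: a query on tag now matches values that arrived as label.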

For people coming from SQL, this behavior may seem quite strange, but it is a key point in the NoSQL world, as it reduces the need for join queries and for separate tables to manage multiple values.

Tip

An array of embedded objects behaves in the same way as an array of simple fields.
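
As a rough illustration of this tip, the sketch below flattens an array of embedded objects into dotted field names, each holding multiple values. It is a simplified model written for this example, not ElasticSearch code; the flatten function and the "authors" document are assumptions.

```python
def flatten(doc, prefix=""):
    """Simplified model: flatten embedded objects into dotted, multi-value fields."""
    fields = {}
    for key, value in doc.items():
        path = prefix + key
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict):
                # Recurse into the embedded object and merge its fields.
                for sub_path, sub_values in flatten(item, path + ".").items():
                    fields.setdefault(sub_path, []).extend(sub_values)
            else:
                fields.setdefault(path, []).append(item)
    return fields


doc = {
    "name": "document3",
    "authors": [
        {"first": "Ada", "last": "Lovelace"},
        {"first": "Alan", "last": "Turing"},
    ],
}
flat = flatten(doc)
```

Each inner field ends up as a multi-value field, exactly as a simple array would. One consequence of this model is that the pairing between values of the same embedded object is not preserved.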