ElasticSearch Cookbook

Mapping base types

The schema-less approach, in which ElasticSearch detects field types automatically, lets you start inserting data quickly. To achieve better results and indexing performance, however, you need to define the mapping manually.

Fine-tuning the mapping brings some advantages, such as:

Reducing the index size on disk
Indexing only interesting fields (general speed up)
Precooking data for faster search or real-time analytics (such as facets)

ElasticSearch allows using base fields with a wide range of configurations.

Getting ready

You need a working ElasticSearch cluster and a test index in which to put the mappings.

How to do it...

Let's use a semi real-world example of a shop order for our eBay-like shop.

We initially define an order with the following fields: id, date, customer_id, sent, name, quantity, and vat.

Our order record must be converted into an ES mapping:

{
  "order" : {
    "properties" : {
      "id" : {"type" : "string", "store" : "yes" , "index":"not_analyzed"},
      "date" : {"type" : "date", "store" : "no" , "index":"not_analyzed"},
      "customer_id" : {"type" : "string", "store" : "yes" , "index":"not_analyzed"},
      "sent" : {"type" : "boolean", "index":"not_analyzed"},
      "name" : {"type" : "string",  "index":"analyzed"},
      "quantity" : {"type" : "integer", "index":"not_analyzed"},
      "vat" : {"type" : "double", "index":"no"}
    }
  }
}

Now, the mapping is ready to be added to the index. We'll see how to do it in the Putting a mapping in an index recipe in Chapter 4, Standard Operations.
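
As a quick sanity check before adding the mapping, the following Python sketch loads it and compares it with a sample order document. The sample values and the helper name unmapped_fields are assumptions made for illustration; they are not part of the original recipe.

```python
import json

# The order mapping from this recipe, as a JSON string.
ORDER_MAPPING = """
{
  "order": {
    "properties": {
      "id": {"type": "string", "store": "yes", "index": "not_analyzed"},
      "date": {"type": "date", "store": "no", "index": "not_analyzed"},
      "customer_id": {"type": "string", "store": "yes", "index": "not_analyzed"},
      "sent": {"type": "boolean", "index": "not_analyzed"},
      "name": {"type": "string", "index": "analyzed"},
      "quantity": {"type": "integer", "index": "not_analyzed"},
      "vat": {"type": "double", "index": "no"}
    }
  }
}
"""

# Hypothetical order document; every value here is invented.
sample_order = {
    "id": "1234",
    "date": "2013-06-07T12:14:54",
    "customer_id": "customer1",
    "sent": True,
    "name": "Big shipment",
    "quantity": 2,
    "vat": 20.0,
}


def unmapped_fields(mapping, type_name, document):
    """Return the document fields that have no explicit mapping."""
    properties = mapping[type_name]["properties"]
    return sorted(set(document) - set(properties))


mapping = json.loads(ORDER_MAPPING)
print(unmapped_fields(mapping, "order", sample_order))
```

An empty list means that every field of the document is covered by the explicit mapping. Any field reported here would be left to automatic type detection, assuming the default dynamic mapping settings.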

How it works...

Each standard field type must be mapped to the correct ElasticSearch field type, adding options that describe how the field must be indexed.

The next table is a reference of the mapping types:

Depending on the data type, it's possible to give hints to ElasticSearch on how to process the field for better management. The most used options are as follows:

store (defaults to no): This marks the field to be stored in a separate index fragment for fast retrieval. Storing a field consumes disk space, but reduces computation if you need to extract it from a document (for example, in scripting and faceting). The possible values for this option are no and yes.

Note

Stored fields are faster than others in faceting.

index (defaults to analyzed): This configures how the field is indexed. Possible values for this parameter are as follows:
- no: This field is not indexed at all. It is useful for data that does not need to be searchable.
- analyzed: This field is analyzed with the configured analyzer.
- not_analyzed: This field is processed and indexed, but without being changed by an analyzer.
null_value: This defines a default value to use if the field value is null.
boost (defaults to 1.0): This is used to change the importance of a field.
index_analyzer (defaults to null): This defines the analyzer used to process this field during indexing. If not defined, the analyzer of the parent object is used.
search_analyzer (defaults to null): This defines the analyzer used during the search. If not defined, the analyzer of the parent object is used.
include_in_all (defaults to true): This marks the current field to be indexed in the special _all field (a field that contains the text of all fields).
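
To make the defaults above concrete, here is a minimal Python sketch that merges a field definition over the defaults listed in this recipe. The helper effective_options and the two field definitions are hypothetical; the sketch only mirrors the text and does not query ElasticSearch.

```python
import json

# Defaults as listed in this recipe; they apply when an option is omitted.
FIELD_DEFAULTS = {
    "store": "no",
    "index": "analyzed",
    "boost": 1.0,
    "include_in_all": True,
}


def effective_options(field_mapping):
    """Merge an explicit field mapping over the documented defaults."""
    options = dict(FIELD_DEFAULTS)
    options.update(field_mapping)
    return options


# Hypothetical field definitions, for illustration only.
name_field = {"type": "string", "boost": 2.0}
quantity_field = {"type": "integer", "index": "not_analyzed", "null_value": 1}

for field in (name_field, quantity_field):
    print(json.dumps(effective_options(field), sort_keys=True))
```

Only the options that differ from the defaults need to be written in the mapping, which keeps the definitions short.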

There's more...

In this recipe, we have seen the most used options for the core types, but there are many other options that are useful in less common cases.

An important parameter, available only for string mapping, is term_vector. A term vector is the vector of terms that compose a string; refer to the Lucene documentation at http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/index/Terms.html for further details. Its possible values are as follows:

no: This is the default value; the term vector is not stored
yes: This stores the term vector
with_offsets: This stores the term vector with token offsets (the start and end position of each token in a block of characters)
with_positions: This stores the position of each token in the term vector
with_positions_offsets: This stores all the term vector data

Tip

Term vectors allow fast highlighting, but consume disk space due to the storage of additional text information. It's best practice to activate them only in fields that require highlighting, such as title or document content.
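
The following Python sketch shows one way to apply this advice when building a mapping: term vectors are enabled only on the fields that need highlighting. The article type, its field names, and the helper string_field are assumptions for illustration.

```python
import json

# The term_vector values listed in this recipe.
TERM_VECTOR_VALUES = (
    "no",
    "yes",
    "with_offsets",
    "with_positions",
    "with_positions_offsets",
)


def string_field(term_vector="no"):
    """Build a string field mapping with the given term_vector setting."""
    if term_vector not in TERM_VECTOR_VALUES:
        raise ValueError("unknown term_vector value: %r" % term_vector)
    field = {"type": "string", "index": "analyzed"}
    if term_vector != "no":  # "no" is the default, so it can be omitted
        field["term_vector"] = term_vector
    return field


# Hypothetical "article" type: only title and content need highlighting.
properties = {
    "title": string_field("with_positions_offsets"),
    "content": string_field("with_positions_offsets"),
    "author": string_field(),
}

print(json.dumps({"article": {"properties": properties}}, indent=2, sort_keys=True))
```

The author field keeps the default and so stores no term vector, which saves the extra disk space on a field that is never highlighted.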