Mapping base types
ElasticSearch can work schema-less, guessing field types as documents arrive, but an explicit mapping allows faster and more predictable data insertion. Thus, to achieve better results and performance in indexing, it's necessary to define the mapping manually.
Fine-tuning mapping brings some advantages such as:
- Reducing the index size on disk
- Indexing only interesting fields (general speed up)
- Precooking data for faster search or real-time analytics (such as facets)
ElasticSearch allows the base field types to be used with a wide range of configuration options.
Getting ready
You need a working ElasticSearch cluster and a test index in which to put the mappings.
How to do it...
Let's use a semi-real-world example: a shop order for our eBay-like shop.
We initially define an order with an identifier, a date, a customer identifier, a sent flag, an item name, a quantity, and a VAT rate.
This order record must be converted into an ElasticSearch mapping:
{ "order" : { "properties" : { "id" : {"type" : "string", "store" : "yes" , "index":"not_analyzed"}, "date" : {"type" : "date", "store" : "no" , "index":"not_analyzed"}, "customer_id" : {"type" : "string", "store" : "yes" , "index":"not_analyzed"}, "sent" : {"type" : "boolean", "index":"not_analyzed"}, "name" : {"type" : "string", "index":"analyzed"}, "quantity" : {"type" : "integer", "index":"not_analyzed"}, "vat" : {"type" : "double", "index":"no"} } } }
Now, the mapping is ready to be added in the index. We'll see how to do it in the Putting a mapping in an index recipe in Chapter 4, Standard Operations.
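As a quick preview of that recipe (assuming a local node listening on localhost:9200 and an already created index named test), the mapping can be registered with a call similar to the following sketch, where the request body is the order mapping defined above (trimmed here to a single field for brevity):

curl -XPUT 'http://localhost:9200/test/order/_mapping' -d '{
  "order" : {
    "properties" : {
      "id" : {"type" : "string", "store" : "yes", "index" : "not_analyzed"}
    }
  }
}'

If the call succeeds, ElasticSearch acknowledges the new mapping in its JSON response.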
How it works...
Each standard data type must be mapped to the corresponding ElasticSearch field type, adding options that control how the field must be indexed.
The following is a quick reference of how standard types map to ElasticSearch mapping types:
- String, VarChar, Text: string
- Integer: integer
- Long: long
- Float: float
- Double: double
- Boolean: boolean
- Date/Datetime: date
- Bytes/Binary: binary
Depending on the data type, it's possible to give ElasticSearch hints on how to process the field for better management. The most used options are as follows (a combined example is shown after the list):
- store (defaults to no): This marks the field to be stored in a separate index fragment for fast retrieval. Storing a field consumes disk space, but reduces computation if you need to extract it from a document (that is, in scripting and faceting). The possible values for the option are no and yes.
- index (defaults to analyzed): This configures whether the field is indexed. The possible values for this parameter are as follows:
  - no: The field is not indexed at all. This is useful for data that must not be searchable.
  - analyzed: The field is analyzed with the configured analyzer.
  - not_analyzed: The field is processed and indexed, but without being changed by an analyzer.
- null_value: This defines a default value to be used if the field is missing.
- boost (defaults to 1.0): This is used to change the importance of a field.
- index_analyzer (defaults to null): This defines an analyzer to be used to process this field. If not defined, the analyzer of the parent object is used.
- search_analyzer (defaults to null): This defines an analyzer to be used during the search. If not defined, the analyzer of the parent object is used.
- include_in_all (defaults to true): This marks the current field to be indexed in the special _all field (a field that contains the concatenated text of all the document's fields).
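To see how several of these options combine, here is a hypothetical variant of the order's name field (the store, boost, and null_value settings are illustrative, not part of the recipe's mapping):

"name" : {
  "type" : "string",
  "index" : "analyzed",
  "store" : "yes",
  "boost" : 2.0,
  "null_value" : "unknown",
  "include_in_all" : true
}

With this definition, matches on name weigh twice as much at score time, the raw value can be fetched without loading the whole document, documents missing the field are indexed as if it contained unknown, and the text still flows into the _all field.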
There's more...
In this recipe, we have seen the most used options for the core types, but there are many other options that are useful for more specialized use cases.
An important parameter, available only for string mapping, is term_vector (a vector of the terms that compose a string; refer to the Lucene documentation for further details at http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/index/Terms.html). Its possible values are as follows (a short example follows the list):
- no: This is the default value; the term vector is skipped
- yes: This stores the term vector
- with_offsets: This stores the term vector with token offsets (the start and end positions in a block of characters)
- with_positions: This stores the position of each token in the term vector
- with_positions_offsets: This stores all the term vector data
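For example, a hypothetical description field that needs highlighting-friendly term data could be mapped as follows (the field name is chosen for illustration):

"description" : {
  "type" : "string",
  "index" : "analyzed",
  "term_vector" : "with_positions_offsets"
}

Storing positions and offsets increases the index size, but it lets features such as highlighting reuse the term data without re-analyzing the stored text.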
See also
- The Mapping different analyzers recipe shows alternative analyzers to the standard one.