How to Change Elasticsearch Index Mapping Without Losing Data

As you might know, changing the mapping of an existing Elasticsearch index is mostly not allowed. Certain changes are possible, such as adding new fields, but other changes, like changing the type of an existing field, are not. In this article we see how we can change an Elasticsearch mapping without losing data. I’m also going to bring some examples that you can execute in order to test the solution proposed here.

Change Mapping Using ReIndexing API and Alias

We can solve this problem by using the Elasticsearch reindex API, aliases and the put mapping API. What we do in these situations is create an alias for our indexes and have the application use the alias instead of the index name. The alias refers to a specific version of our index at a time. Now suppose we want to change something about the mapping that cannot be done without deleting the index.

At a high level, what we can do is create an index with the new mapping, then use the reindex API to move the data to the new index. Then we change the alias to refer to our newly created index. There are other problems that we might face while using this approach, which I’m going to address in subsequent sections. Here is an image describing this method.

Examples and Sample Queries

Create an Index With Specific Mapping

But first let’s look at an example. I’ve created some queries which demonstrate the ideas. Suppose we have an index with this specification.
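As a rough sketch, the starting index could look like the request body below, built here as a Python dict. The alias name hmdjobs comes from the error message quoted later in this article; the index name hmdjobs_v1 and the exact field types of Title, Latitude and Longitude are assumptions made for illustration.

```python
import json

# Assumed names: only the alias "hmdjobs" appears in the article itself.
OLD_INDEX = "hmdjobs_v1"
ALIAS = "hmdjobs"

# Body for: PUT /hmdjobs_v1
old_index_body = {
    # Attach the alias at creation time so clients never use the index name.
    "aliases": {ALIAS: {}},
    "mappings": {
        "properties": {
            "Title": {"type": "text"},
            "Reference": {"type": "date"},    # the type we want to change
            "Latitude": {"type": "float"},    # root-level coordinates
            "Longitude": {"type": "float"},
        }
    },
}

print(f"PUT /{OLD_INDEX}")
print(json.dumps(old_index_body, indent=2))
```

The printed JSON can be pasted into Kibana Dev Tools or sent with any HTTP client.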

As you can see, the field Reference is of type “date”, and we want to change the type to “text” instead. But if we just try to change the mapping using the put mapping API, we receive the following error.

"mapper [Reference] cannot be changed from type [date] to [text]"

Mapping We Want to Switch To

The solution to this problem is creating another index with the new mapping and re-indexing the data into it.
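A sketch of what the body of the new index could look like is given below. The index name hmdjobs_v2 and the concrete analyzers are assumptions; the article only says that Reference becomes text, that Latitude and Longitude move under Coordinates, and that the default and Title analyzers change.

```python
import json

NEW_INDEX = "hmdjobs_v2"  # assumed name for the second version of the index

# Body for: PUT /hmdjobs_v2
new_index_body = {
    "settings": {
        # Index-level default analyzer (the choice of "simple" is an assumption).
        "analysis": {"analyzer": {"default": {"type": "simple"}}}
    },
    "mappings": {
        # Unmapped fields stay in _source but are not indexed or auto-mapped.
        "dynamic": False,
        "properties": {
            # Field-level analyzer (the choice of "english" is an assumption).
            "Title": {"type": "text", "analyzer": "english"},
            "Reference": {"type": "text"},  # was "date" in the old index
            "Coordinates": {
                "properties": {
                    "Latitude": {"type": "float"},
                    "Longitude": {"type": "float"},
                }
            },
        },
    },
}

print(f"PUT /{NEW_INDEX}")
print(json.dumps(new_index_body, indent=2))
```

Note that this index is created without the alias; the alias is switched only after the data has been copied.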

Here I’ve changed the “Reference” type to text and created another field called Coordinates. In our previous index we had Latitude and Longitude at the root level of the document, without a separate key for them. Now I also want to move those into the Coordinates field.

Reindexing the Data to the New Index

After I’ve created the new index with the new mapping, I can go ahead and use the reindex API like so.
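The reindex request might look roughly like this. The index names are the assumed hmdjobs_v1 and hmdjobs_v2, and the Painless script is a minimal sketch that should be tried on a test index first; it assumes every document has both Latitude and Longitude at the root level.

```python
import json

# Painless: remove() deletes the root-level key from _source and returns its
# value, which is then nested under a new Coordinates object.
MOVE_COORDINATES = (
    "ctx._source.Coordinates = ["
    "'Latitude': ctx._source.remove('Latitude'), "
    "'Longitude': ctx._source.remove('Longitude')"
    "]"
)

# Body for: POST /_reindex
reindex_body = {
    "source": {"index": "hmdjobs_v1"},  # assumed old index name
    "dest": {"index": "hmdjobs_v2"},    # assumed new index name
    "script": {"lang": "painless", "source": MOVE_COORDINATES},
}

print("POST /_reindex")
print(json.dumps(reindex_body, indent=2))
```

No script is needed for Reference: the value in _source is copied as-is and the new mapping decides how it is indexed.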

When we re-index, the Reference values are indexed as text according to the new mapping. There is also a script that moves Latitude and Longitude from the root level into the Coordinates object.

Change the Alias to Point to the New Index

Now we can go ahead and change the alias to point to our new index.
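One way to express the alias change is a single request to the aliases API, sketched below with the assumed index names hmdjobs_v1 and hmdjobs_v2. Both indexes stay behind the alias for reads, and only the new one accepts writes.

```python
import json

ALIAS = "hmdjobs"

# Body for: POST /_aliases
# All actions in one request are applied atomically.
alias_actions = {
    "actions": [
        # Old index stays readable through the alias but no longer takes writes.
        {"add": {"index": "hmdjobs_v1", "alias": ALIAS, "is_write_index": False}},
        # New index becomes the single write target of the alias.
        {"add": {"index": "hmdjobs_v2", "alias": ALIAS, "is_write_index": True}},
    ]
}

print("POST /_aliases")
print(json.dumps(alias_actions, indent=2))
```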

In the code for the alias we see “is_write_index”. This is used to indicate which index should be used for writing if the alias points to more than one, because we cannot write to an alias that is associated with two indexes unless we specify which one is intended for writes. So we need to specify the write index after adding the second index to the alias. Otherwise we’ll get this error when we try to index something using the alias:

"reason" : "no write index is defined for alias [hmdjobs]. The write index may be explicitly disabled using is_write_index=false or the alias points to multiple indices without one being designated as a write index"

The last step in the migration is to stop updating the old index and drop it. We simply delete the old index in Elasticsearch.
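As a small sketch of that last step, the old index can be dropped through the same aliases API with a remove_index action, which behaves like the delete index API. The index name hmdjobs_v1 is again an assumption; a plain DELETE /hmdjobs_v1 would work as well.

```python
import json

# Body for: POST /_aliases
# remove_index deletes the index itself, so the alias is left pointing
# only at the new index.
cleanup_actions = {
    "actions": [
        {"remove_index": {"index": "hmdjobs_v1"}},  # assumed old index name
    ]
}

print("POST /_aliases")
print(json.dumps(cleanup_actions, indent=2))
```

This is irreversible, so it only makes sense once the new index has been validated.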

Problems We Solve with This Approach

Adding New Fields: There wouldn’t be any problem with adding new fields, because a field that we add in the new mapping does not exist in the documents of the old index. So when we re-index, the old documents simply have no value for it.

Removing Fields: If we delete a field in the new mapping, the deleted field does not get indexed when we use the reindex API. But we need to set “dynamic: false” in our mapping; otherwise Elasticsearch is going to create the mapping for the field dynamically, because it does not exist. Also, the field that we’ve deleted is still going to show up in our _source, but it is in fact not indexed.

Changing Field Data Type: If the types are compatible then it is not a problem; we can just create the mapping with the desired type and re-index. But if the types are incompatible, then it’s up to us to write a script that converts the data for the changed field.

Changing Field Location/Placement: If we need to change where a field is located, or move the associated data to another new field, we can write a script as part of our re-indexing, as can be seen in our example.

Changing Analyzers: Changing analyzers is also possible. In the example above we changed the analyzer both at the level of the index (the default analyzer) and at the level of a field (the analyzer of the Title field).

Remaining Concerns

Integrity of Reindexed Data

Elasticsearch does not support transactions, so we need some way of validating the data after re-indexing. We can keep the previous version of our index until we are sure. Otherwise we need to be sure that the newly indexed data is valid and the re-indexing operation succeeded. To achieve that, we can check the status codes and responses of our operations. Another option is to make the mappings strict (“dynamic: strict”). By doing so we receive an exception if a document contains a field that is not in the mapping.

If we don’t use strict mappings, we need to set the mapping’s dynamic setting to false. That’s because, suppose we removed a field in the new mapping and we don’t want to index it. If the mapping is dynamic, the field is going to be added back to the new index’s mapping as soon as a document containing it is re-indexed.
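A minimal sketch of such a check is shown below. It inspects the JSON body returned by the reindex API and compares it with a document count taken from the old index beforehand (for example via the count API). The function name and the sample values are made up for illustration; the field names are the ones the reindex response contains.

```python
from typing import List


def reindex_problems(response: dict, expected_total: int) -> List[str]:
    """Return human-readable problems found in a _reindex response body."""
    problems = []
    if response.get("timed_out"):
        problems.append("request timed out")
    failures = response.get("failures", [])
    if failures:
        problems.append(f"{len(failures)} failure(s) reported")
    conflicts = response.get("version_conflicts", 0)
    if conflicts:
        problems.append(f"{conflicts} version conflict(s)")
    # Every source document should have been created or updated in the target.
    written = response.get("created", 0) + response.get("updated", 0)
    if written != expected_total:
        problems.append(f"wrote {written} of {expected_total} documents")
    return problems


# Made-up response values, for illustration only.
sample_response = {
    "timed_out": False,
    "total": 3,
    "created": 3,
    "updated": 0,
    "version_conflicts": 0,
    "failures": [],
}
print(reindex_problems(sample_response, expected_total=3))
```

An empty list means none of these checks found a problem; it does not prove that the content of every document is correct, so spot checks on the new index are still worthwhile.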

What Happens to Ongoing Operations in the Old Index?

What if documents get indexed in the old index while we are re-indexing it into another index? Do we need to lock the indices during the re-index? One option is to have a job that copies the documents that were indexed in the old index during that time into the new index.
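One possible shape for such a catch-up job is a second reindex request that only creates documents missing from the new index. The sketch below uses the reindex API’s op_type and conflicts options with the assumed index names from earlier. It does not pick up updates or deletes made to documents that were already copied, so it is only a partial answer.

```python
import json

# Same assumed Painless script as in the main reindex, so late documents
# get the same Coordinates structure.
MOVE_COORDINATES = (
    "ctx._source.Coordinates = ["
    "'Latitude': ctx._source.remove('Latitude'), "
    "'Longitude': ctx._source.remove('Longitude')"
    "]"
)

# Body for: POST /_reindex
catch_up_body = {
    # Count version conflicts instead of aborting on the first one.
    "conflicts": "proceed",
    "source": {"index": "hmdjobs_v1"},
    # "create" skips documents whose _id already exists in the destination.
    "dest": {"index": "hmdjobs_v2", "op_type": "create"},
    "script": {"lang": "painless", "source": MOVE_COORDINATES},
}

print("POST /_reindex")
print(json.dumps(catch_up_body, indent=2))
```

Documents that already exist in the new index show up as version conflicts in the response, which is expected for this kind of run.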

What Happens in Multi-Tenant Situations?

What if one tenant wants to stay on the old index while another wants to migrate to the new index? If one wants to stay on the old index, we now have two indexes that need to be kept in sync. Do we need the old index data to be up to date? If so, we might need to set up workers that push the documents to both indexes at the same time. Also, I think the alias needs to reference the new index, so if someone still wants to use the old index, they should reference the index name directly and not the alias.

Tools We Can Use

  • We can use index templates for the mapping
  • We can use Curator to create/update the index/alias automatically

More Readings

Reindex API

Changing Mapping with Zero Downtime

Put mapping API

Curator

Index templates/Index template

Summary

In this post I discussed how we can change the Elasticsearch mapping of an index even when it already has data in it. We saw the steps to take if we want to preserve our data while changing the index’s mapping.


Hamid Mosalla

Hi, I'm Hamid Mosalla, I'm a software developer, indie cinema fan and a classical music aficionado. Here I write about my experiences mostly related to web development and .Net.

 
