How should you handle multi-tenancy for your users and clients in your Elasticsearch / OpenSearch architecture? In this post we look into various methods of handling this common problem and propose several solutions to implementing proper multi-tenant Elasticsearch / OpenSearch.
The problem of how to handle multi-tenancy in Elasticsearch is not new. In fact, there are many who have been plagued with this same issue since the product’s infancy.
Multi-tenancy in this sense refers to having multiple users or clients (i.e. tenants) with disparate sets of data stored within the same Elasticsearch cluster.
The main reason for wanting to keep multiple tenants on one single cluster is to reduce infrastructure complexity, and keep costs lower by sharing resources. Of course, that is not always the best, or even a possible solution - for example when data is sensitive, or isolation is required for other reasons such as compliance. But this is usually the solution users are asking for, and in this post, we will describe a couple of potential methods of handling this problem and the pros/cons of each.
Silos Method - Exclusive indices per tenant
The first method involves each tenant having their data isolated in their own specific index or group of indices. This method is by far the easiest to implement and maintain. You don’t have to worry about things like field mapping collisions (explained more in the pool method). Data purges can be done by simply deleting all indices with “tenant1” in the name. You can set up different ILM policies and index management solutions for each individual tenant. All of this makes the Silos method simpler to implement and manage, but there are some drawbacks.
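With the Silos method, most per-tenant operations reduce to simple index name patterns. Here is a minimal sketch of that idea; the naming scheme (`<tenant>-logs-<date>`) and tenant IDs are illustrative assumptions, not a convention Elasticsearch imposes.

```python
# Silos method sketch: one index (or index series) per tenant, so purges
# and per-tenant policies become simple wildcard patterns.

def silo_index_name(tenant_id: str, date: str) -> str:
    """Build a per-tenant index name, e.g. 'tenant1-logs-2024.01'."""
    return f"{tenant_id}-logs-{date}"

def purge_pattern(tenant_id: str) -> str:
    """Wildcard pattern matching every index belonging to one tenant,
    suitable for a delete-index request or an ILM policy scope."""
    return f"{tenant_id}-logs-*"

print(silo_index_name("tenant1", "2024.01"))  # tenant1-logs-2024.01
print(purge_pattern("tenant1"))               # tenant1-logs-*
```

Deleting a tenant is then a single delete-index call against the pattern, which is exactly why this method is so easy to operate, right up until the index count explodes.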
One of the main drawbacks to this method is the index explosion problem. If you have thousands or even millions of tenants/users then you can easily create more indices than your cluster can efficiently support. Each index needs to have metadata stored in the cluster state and takes up a portion of the memory. Too many indices can easily lead to a slow performing cluster.
Another potential problem is having a lot of undersized indices, or more specifically, index shards. We recommend a shard size of 10GB to 50GB. Having a ton of undersized shards means you aren’t using each index to its full potential. You may have tenants with only a few KBs or MBs of data. By isolating these tenants with small amounts of data to their own indices, you run into the index explosion problem mentioned above that much faster.
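One way to reason about the shard-count tradeoff is to size primaries from expected data volume. The helper below is an illustrative sketch using the 10GB-50GB guidance above; the 50GB target parameter is an assumption you would tune for your workload.

```python
import math

# Rough sizing sketch: estimate a primary shard count that keeps each
# shard near the recommended upper bound (~50GB), never below one shard.

def suggested_primary_shards(data_gb: float, target_shard_gb: float = 50.0) -> int:
    return max(1, math.ceil(data_gb / target_shard_gb))

print(suggested_primary_shards(120))    # 3 shards of ~40GB each
print(suggested_primary_shards(0.002))  # 1 - a few MB still costs a whole shard
```

The second call shows the problem in miniature: a tenant with a few MB of data still consumes a full shard, so thousands of tiny silos burn cluster resources for almost no data.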
Pool Method - Shared indices between tenants
With this method, each tenant’s data is stored in a shared index or group of indices. This doesn’t mean that the data cannot be isolated between tenants. Elasticsearch has a feature called filtered aliases that allows you to use an alias on a subset of the data within indices. You can use a filtered alias to, for example, retrieve all the data within a group of indices where “tenant ID” equals “CompanyA”. If you use these aliases properly, you can isolate the data within the index so you only receive the data for the tenant you wish to query.
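Here is a sketch of the request body a filtered alias takes on the Elasticsearch `_aliases` endpoint. The index name, alias naming scheme, and the `tenant_id` keyword field assumed to exist on every document are all illustrative.

```python
import json

# Pool method sketch: build a filtered-alias action that scopes an alias
# to one tenant's documents inside a shared index.

def filtered_alias_action(index: str, tenant_id: str) -> dict:
    return {
        "add": {
            "index": index,
            "alias": f"{tenant_id}-view",            # hypothetical naming scheme
            "filter": {"term": {"tenant_id": tenant_id}},
        }
    }

body = {"actions": [filtered_alias_action("shared-logs", "CompanyA")]}
print(json.dumps(body, indent=2))
```

Once the alias exists, the application queries `CompanyA-view` instead of the underlying index, and Elasticsearch applies the term filter on every search, which is what keeps tenants from seeing each other’s documents.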
There are a few drawbacks to this method, however. One is the mapping problem. What if “Tenant1” wants to use a field named “field1” and map it as a text field, but “Tenant2” wants to use “field1” mapped as a date type? You could just allow a first-write-wins strategy and have the loser rename their field, but this isn’t a very elegant solution. A better solution would be to allow both tenants to use “field1” with their desired mapping, but internally rename the field to “tenant1_field1” or “tenant1.field1” to differentiate between the tenants.
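The per-tenant renaming idea can be sketched as a document rewrite in the indexing application: both tenants keep their preferred field names, and the application namespaces them internally so the two mappings never collide. The prefix format is illustrative.

```python
# Sketch of per-tenant field namespacing: rewrite each document so its
# field names are prefixed with the tenant ID before indexing.

def namespace_document(tenant_id: str, doc: dict) -> dict:
    return {f"{tenant_id}.{key}": value for key, value in doc.items()}

print(namespace_document("tenant1", {"field1": "hello"}))
# {'tenant1.field1': 'hello'}
print(namespace_document("tenant2", {"field1": "2024-01-01"}))
# {'tenant2.field1': '2024-01-01'}
```

The querying side would apply the same rewrite to field references, so tenants never see the internal names. Note that this fixes collisions but makes the field-count problem below strictly worse, since every tenant now contributes its own copy of every field.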
Another drawback is potentially having a large number of fields within an index. The solution mentioned above works fairly well until you find out you have quickly reached the 1000 fields per index default limit. You could always increase the limit to allow for more fields, but don’t forget that default exists for a reason. Increasing the limit too high can lead to performance degradations and memory issues.
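For reference, the default cap mentioned above is the index setting `index.mapping.total_fields.limit` (1000 by default). A sketch of the settings body that raises it, with the caveats above about doing so sparingly; the value 2000 is just an example.

```python
import json

# Settings fragment raising the per-index field limit above the default
# of 1000. Apply via the index settings API; increase with caution.

settings = {"index": {"mapping": {"total_fields": {"limit": 2000}}}}
print(json.dumps(settings))
```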
Another consideration, no less important, is performance. If all your customers share the same index, then the same sharding and replication settings apply to everyone, and you can’t tune them for a specific tenant’s usage. There is also often a significant “noisy neighbor” problem: field cardinality grows, so some search and aggregation operations become slower, and caches aren’t used to their full potential, since queries from different tenants target different documents and cache entries are invalidated more often.
Hybrid Method - One Pool and then some Silos
For most of our customers, neither the Pool nor the Silos method is the right approach for 100% of their tenants. Usually what we see is that 80% of tenants can’t justify the resources to maintain a Silo, while the other 20% are significantly bigger than those 80% and usually drive the most revenue - so we can’t really put them in the main Pool.
Both the Silos and Pool methods mentioned above have potential, but each has limitations that make it a lot less desirable on its own. So we often recommend that our customers opt for a hybrid approach. This way we can cherry-pick the best of both worlds and do away with some of the aforementioned drawbacks. Let us consider this approach:
- Tenants use shared indices with filtered aliases to isolate the respective data
- Field names are managed internally by combining similar fields between tenants. For example:
- Tenant1 creates a new custom field named ‘someId’ of type ‘keyword’ while Tenant2 creates a new custom field with the same name of type ‘integer’
- Both fields are indexed, but not using the name given by the tenants. Instead the field ‘someId’ from Tenant1 is indexed into the field named ‘keywordField1’ and the field ‘someId’ from Tenant2 is indexed into the field named ‘integerField1’
- A list is created and maintained (in Elasticsearch or some other source of record) to map these fields ‘Tenant1.someId’ -> ‘keywordField1’, ‘Tenant2.someId’ -> ‘integerField1’
- The next time a document from Tenant1 or Tenant2 is indexed or a query is done on these indices, a lookup is done against this list to find the proper field mapping. The indexing or querying application then adjusts the indexing/querying parameters to use the mapped field names instead.
- Now if Tenant3 wants to store a new document with two fields, one mapped as a keyword type and the other mapped as an integer type, these fields can be inserted into ‘keywordField1’ and ‘integerField1’ and the mapping list can be updated with the new field mappings.
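The steps above can be sketched as a small field-mapping registry. This is a minimal illustration under stated assumptions: the internal slot names (`keywordField1`, `integerField1`, …) follow the example above, and in production the mapping table would live in Elasticsearch or another source of record rather than in memory.

```python
# Hybrid method sketch: map each (tenant, field) pair to a shared typed
# slot. Slots are reused across tenants, but a tenant never gets the same
# slot for two of its own fields.

class FieldRegistry:
    def __init__(self):
        self.mapping = {}   # (tenant, field) -> internal slot name
        self.slots = {}     # field type -> list of existing slot names

    def resolve(self, tenant: str, field: str, field_type: str) -> str:
        key = (tenant, field)
        if key in self.mapping:
            return self.mapping[key]
        # Reuse the first slot of this type not already taken by this tenant.
        taken = {slot for (t, _), slot in self.mapping.items() if t == tenant}
        for slot in self.slots.setdefault(field_type, []):
            if slot not in taken:
                self.mapping[key] = slot
                return slot
        # No reusable slot: allocate a new one, e.g. 'keywordField2'.
        slot = f"{field_type}Field{len(self.slots[field_type]) + 1}"
        self.slots[field_type].append(slot)
        self.mapping[key] = slot
        return slot

registry = FieldRegistry()
print(registry.resolve("Tenant1", "someId", "keyword"))   # keywordField1
print(registry.resolve("Tenant2", "someId", "integer"))   # integerField1
print(registry.resolve("Tenant3", "label", "keyword"))    # keywordField1 (reused)
print(registry.resolve("Tenant3", "count", "integer"))    # integerField1 (reused)
```

At indexing or query time, the application calls `resolve` to translate tenant field names into the internal slot names, exactly the lookup step described above. Tenant3’s two fields land in the existing slots, so three tenants with four fields cost the index only two mapped fields.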
With this hybrid approach, we have been able to overcome the drawbacks of both individual methods. Tenants can now name the fields whatever they want without the potential problem of mapping collisions. Because the indices are shared, we can now handle many more tenants within the cluster before overloading the cluster with too many shards. We can manage the index size problem by combining tenants with smaller data requirements with tenants with larger data requirements. Additionally, in the example above we were able to use four fields across three tenants by only adding two field mappings for the index (keywordField1 & integerField1), cutting our number of fields mapped in half!
Unfortunately, this all comes at the cost of increased complexity and maintenance. We now have to figure out how to decide which tenant to put in which index, how best to manage these field mappings, how to move tenants between tiers, etc. But if things were easy, there wouldn’t be any need for blog posts like these, would there?
In this post we reviewed some potential methods for handling the multi-tenancy problem using one shared cluster, and sharing resources as much as possible. Other possible solutions can of course involve deploying tenants to their own clusters where necessary.
If you come across any of these multi-tenancy issues and would like a helping hand, please feel free to reach out! With our Pulse solution and in-house Elasticsearch and OpenSearch specialists we can review your usage and tailor the ideal solution to your specific use case.