Elasticsearch Exporter

This chart creates an Elasticsearch-Exporter deployment on a Kubernetes cluster using the Helm package manager. Elasticsearch-Exporter is a Prometheus exporter, written in Go, for various metrics about Elasticsearch.

Exporters

The purpose of exporters is to take data collected from any Elastic Stack source and route it to the monitoring cluster. It is possible to configure more than one exporter, but the general and default setup is to use a single exporter.

There are two types of exporters in Elasticsearch:

local

The default exporter used by X-Pack monitoring for Elasticsearch. This exporter routes data back into the same cluster. See Local Exporters.

http

The preferred exporter, which you can use to route data into any supported Elasticsearch cluster accessible via HTTP. Production environments should always use a separate monitoring cluster. See HTTP Exporters.

Both exporters serve the same purpose: to set up the monitoring cluster and route monitoring data. However, they perform these tasks in very different ways. Despite these differences, both exporters are capable of sending all of the same data.

Exporters are configurable at both the node and cluster level. Cluster-wide settings, which are updated with the _cluster/settings API, take precedence over settings in the elasticsearch.yml file on each node. When you update an exporter, it is completely replaced by the updated version of the exporter.
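
As a rough illustration of the replace-on-update behaviour, the sketch below builds a request body for the _cluster/settings API that defines an http exporter. The exporter name my_remote and the host URL are placeholders chosen for this example. Because an update replaces the exporter completely, the body carries the full definition rather than only a changed field.

```javascript
// Build a body for `PUT _cluster/settings` that (re)defines one http exporter.
// The exporter name and host passed in below are hypothetical placeholders.
function buildHttpExporterSettings(name, hosts) {
  const prefix = `xpack.monitoring.exporters.${name}`;
  return {
    persistent: {
      // An update replaces the exporter as a whole, so send every field.
      [`${prefix}.type`]: "http",
      [`${prefix}.host`]: hosts,
    },
  };
}

const body = buildHttpExporterSettings("my_remote", ["http://monitoring.example:9200"]);
console.log(JSON.stringify(body, null, 2));
```

The same keys, written under xpack.monitoring.exporters in elasticsearch.yml, would act as the node-level definition that the cluster-wide setting overrides.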

Features:

  • Node.js based command line tool
  • Export to Elasticsearch or (compressed) flat files
  • Recreates mapping on target
  • Source data can be filtered by a query
  • Specify scope as type, index or whole cluster
  • Sync Index settings along with existing mappings
  • Run in test mode without modifying any data

Usage

Many combinations of options are possible; a few are listed here to give you an idea of what can be done. Running the exporter without any parameters prints the complete list of options. The script tries to guess any missing options: for example, if you specify a target index but no target type, the type is copied over without any changes.
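
The guessing behaviour can be pictured roughly as follows. This is only an illustrative sketch, not the exporter's actual code: the function name and the option keys (sourceIndex, sourceType, targetIndex, targetType) are hypothetical and chosen for readability.

```javascript
// Illustrative only: fill in a missing target type from the source options.
// Function name and option keys are hypothetical, not the tool's real API.
function fillMissingTargetType(opts) {
  const resolved = { ...opts };
  // No target type given: copy the source type over unchanged.
  if (resolved.targetType === undefined) {
    resolved.targetType = resolved.sourceType;
  }
  return resolved;
}

const resolved = fillMissingTargetType({
  sourceIndex: "logs",
  sourceType: "event",
  targetIndex: "logs_copy",
});
console.log(resolved.targetType); // prints "event"
```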

Requirements

To run this script you need Node.js v0.10 or later, as well as the common, colors and through packages (which are installed automatically via npm).

Improving Performance

Exporting a large amount of data can take quite a while. Here are some tips that might help you speed up the process.

Reduce Network Hops

In most cases the limiting resource when running the exporter is not CPU or memory but network I/O and the response time of Elasticsearch. You can sometimes speed up the process by reducing network hops: the closer you run to either the source or the target database, the better. Try running the script on one of the cluster nodes to reduce latency. On a larger cluster, try to run it on the node that holds most of the shards of the data, which further reduces the internal hops Elasticsearch has to make.

Increase Process Memory

In some cases the queued-up requests fill up memory. When running with garbage collection enabled, the client waits until memory has been freed, but this can make the entire process take longer to finish. Instead, you can try increasing the amount of memory available to the node process by passing this option, with your desired memory setting, to the node executable: --max-old-space-size=600. Note that there is an upper limit to the amount of memory a node process can receive, so beyond some point increasing it further makes little sense.
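
For example, the node command line could be assembled as in the sketch below. The script name exporter.js and the helper function are assumptions made for this example; only the --max-old-space-size flag (a V8 option whose value is given in megabytes) comes from the text above.

```javascript
// Assemble arguments for launching a script with a larger V8 old-space heap.
// "exporter.js" is a placeholder script name; adjust it to your installation.
function nodeArgsWithMemory(script, megabytes, scriptArgs = []) {
  if (!Number.isInteger(megabytes) || megabytes <= 0) {
    throw new RangeError("megabytes must be a positive integer");
  }
  // V8 flags go before the script name; anything after it is passed to the script.
  return [`--max-old-space-size=${megabytes}`, script, ...scriptArgs];
}

const args = nodeArgsWithMemory("exporter.js", 600);
console.log(["node", ...args].join(" ")); // node --max-old-space-size=600 exporter.js
```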

Increase Concurrent Request limit

Your network connection may be able to handle far more than is typical, in which case the script spends most of its time waiting for sockets to become free. To get around this, increase the maximum number of sockets on the global HTTP agent with the --maxSockets option flag, and check whether a higher value improves throughput.
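
In Node.js terms, raising the socket limit on the global HTTP agent looks roughly like the sketch below. It illustrates the underlying setting rather than the exporter's own code, and the value 30 is an arbitrary example.

```javascript
const http = require("http");

// Raise the number of concurrent sockets per host on the global HTTP agent.
// 30 is an arbitrary example value; tune it against your own network.
const previous = http.globalAgent.maxSockets;
http.globalAgent.maxSockets = 30;

console.log(`maxSockets: ${previous} -> ${http.globalAgent.maxSockets}`);
```

The previous value printed here depends on the Node.js version, since the default limit has changed between releases.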

Split up into multiple Jobs

It may be possible to run the script several times in parallel. The exporter is single-threaded and uses only one core, so performance can be gained by querying Elasticsearch from multiple processes at once. To do so, run the exporter against individual types or indexes instead of the entire cluster. If the bulk of your data is contained in one type, use the query parameter to partition the data further. Because partitioning requires an understanding of the structure of the existing data, there are no plans for the exporter to attempt any of these optimizations automatically.
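
One way to partition a large type is to split a numeric or date field into ranges and use each range as the query of a separate job. The sketch below generates standard Elasticsearch range queries; the field name timestamp and the bounds are made up for the example, and how each query is handed to the exporter depends on its options.

```javascript
// Split [min, max) on a numeric field into up to `parts` non-overlapping range queries.
// Field name and bounds used below are hypothetical example values.
function partitionRange(field, min, max, parts) {
  const step = Math.ceil((max - min) / parts);
  const queries = [];
  for (let lower = min; lower < max; lower += step) {
    const upper = Math.min(lower + step, max);
    // gte/lt keeps neighbouring slices from overlapping.
    queries.push({ range: { [field]: { gte: lower, lt: upper } } });
  }
  return queries;
}

const slices = partitionRange("timestamp", 0, 1000, 4);
slices.forEach((q) => console.log(JSON.stringify(q)));
```

Each generated query can then drive its own exporter process, so that several cores and connections are used at once. Note that the upper bound is exclusive, so choose max one step above the largest value you want to include.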

Export to file first

Sometimes the whole pipe from source to target cluster is simply slow, unstable and annoying. In such a case try to export to a local file first. This way you have a complete backup with all the data and can transfer this to the target machine. While this might overall take more time, it might increase the speed of the individual steps.

Change the fetching size

It might help to change the size of each scan request that fetches data. The --sourceSize option currently defaults to 10. Increasing or decreasing this value can have a large impact on export performance.

Optimizing the Elasticsearch Cluster

This tool can only run as fast as your cluster can keep up. If the nodes are under heavy load, errors can occur and the entire process takes longer. How to optimize your cluster is a topic of its own and depends on the version of Elasticsearch that you're running.
