This chart creates an Elasticsearch-Exporter deployment on a Kubernetes cluster using the Helm package manager. Elasticsearch-Exporter is a Prometheus exporter, written in Go, that exposes various metrics about ElasticSearch.
The purpose of exporters is to take data collected from any Elastic Stack source and route it to the monitoring cluster. It is possible to configure more than one exporter, but the general and default setup is to use a single exporter.
There are two types of exporters in Elasticsearch:
- local: The default exporter used by X-Pack monitoring for Elasticsearch. This exporter routes data back into the same cluster. See Local Exporters.
- http: The preferred exporter, which you can use to route data into any supported Elasticsearch cluster accessible via HTTP. Production environments should always use a separate monitoring cluster. See HTTP Exporters.
Both exporters serve the same purpose: to set up the monitoring cluster and route monitoring data. However, they perform these tasks in very different ways. Despite those differences, both exporters are capable of sending all of the same data.
Exporters are configurable at both the node and cluster level. Cluster-wide settings, which are updated with the _cluster/settings API, take precedence over settings in the elasticsearch.yml file on each node. When you update an exporter, it is completely replaced by the updated version of the exporter.
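As a sketch of the cluster-wide route, an HTTP exporter can be registered through the _cluster/settings API. The exporter name `my_remote` and the monitoring host below are placeholders; check the exporter settings reference for your Elasticsearch version before applying this.

```shell
# Register an HTTP exporter cluster-wide; these persistent settings
# take precedence over exporter settings in elasticsearch.yml.
# "my_remote" and "monitoring-cluster:9200" are placeholder values.
curl -X PUT "localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "xpack.monitoring.exporters.my_remote.type": "http",
    "xpack.monitoring.exporters.my_remote.host": "monitoring-cluster:9200"
  }
}'
```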
- Node.js based command line tool
- Export to ElasticSearch or (compressed) flat files
- Recreates mapping on target
- Source data can be filtered by a query
- Specify scope as type, index or whole cluster
- Sync Index settings along with existing mappings
- Run in test mode without modifying any data
A number of combinations of these options can be used; some are listed here to give you an idea of what can be done. The complete list of options is printed when you run the exporter without any parameters. The script tries to be smart about guessing missing options, so that, for example, if you specify a target index but no target type, the type is copied over unchanged.
To run this script you will need at least node v0.10, as well as the common, colors and through packages (which are installed automatically via npm).
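A minimal setup sketch, assuming the entry point is named `exporter.js` (the file name may differ in your checkout): install the dependencies via npm, then run the tool without parameters to print the complete option list mentioned above.

```shell
# Install the common, colors and through dependencies.
npm install

# Running without any parameters prints the full list of options.
node exporter.js
```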
If you're trying to export a large amount of data it can take quite a while to export that data. Here are some tips that might help you speed up the process.
Reduce Network Hops
In most cases the limiting resource when running the exporter is not CPU or memory, but network I/O and response time from ElasticSearch. In some cases it is possible to speed up the process by reducing network hops; the closer you can get to either the source or target database, the better. Try running the script on one of the nodes to reduce latency. If you're running a larger cluster, try to run the script on the node where most shards of the data are available. This prevents ElasticSearch from making additional internal hops.
Increase Process Memory
In some cases the number of queued requests can fill up memory. When running with garbage collection enabled, the client will wait for memory to be freed if it fills up, but this can also cause the entire process to take longer to finish. Instead, you can try to increase the amount of memory that the node process has available. To set memory to a higher value, pass this option with your desired memory setting to the node executable: --max-old-space-size=600. Note that there is an upper limit to the amount a node process can receive, so at some point it doesn't make much sense to increase it any further.
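For example, the flag from the text is passed to the node executable itself, before the script name. The value 600 (in megabytes) comes from the text above; `exporter.js` and the remaining arguments are placeholders.

```shell
# Raise the V8 old-space heap limit to roughly 600 MB for this run.
# Tune the value to your machine; too-high values stop paying off.
node --max-old-space-size=600 exporter.js
```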
Increase Concurrent Request limit
It might be the case that your network connection can handle a lot more than is typical, and that the script spends most of its time waiting for sockets to become free. To get around this you can increase the maximum number of sockets on the global HTTP agent with the corresponding option flag (--maxSockets). Increase this to see if it improves anything.
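As a sketch, the --maxSockets flag named above is passed to the exporter itself; the value 30 is an arbitrary example, and `exporter.js` is a placeholder for the actual script name.

```shell
# Allow up to 30 concurrent sockets on the global HTTP agent.
# Measure throughput before and after to see whether it helps.
node exporter.js --maxSockets 30
```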
Split up into multiple Jobs
It might be possible to run the script multiple times in parallel. Since the exporter is single-threaded it will only make use of one core, so performance can be gained by querying ElasticSearch multiple times in parallel. To do so, simply run the exporter tool against individual types or indexes instead of the entire cluster. If the bulk of your data is contained in one type, make use of the query parameter to further partition the existing data. Since this requires an understanding of the structure of the existing data, there are no plans for the exporter to attempt any of these optimizations automatically.
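The partitioning idea above can be sketched as one exporter process per index, run in the background. The --sourceIndex flag name and the index names are assumptions for illustration; check the option list your version prints for the real flag.

```shell
# One single-threaded exporter process per index, in parallel.
# Flag and index names are placeholders -- adapt to your setup.
node exporter.js --sourceIndex logs-a &
node exporter.js --sourceIndex logs-b &
wait   # block until both background jobs have finished
```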
Export to file first
Sometimes the whole pipe from source to target cluster is simply slow, unstable, and annoying. In such a case, try exporting to a local file first. This way you have a complete backup of all the data and can transfer it to the target machine. While this might take more time overall, it can speed up the individual steps.
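A two-step transfer along those lines might look like the following. The --targetFile and --sourceFile flag names, the file name, and the host are assumptions for illustration; the tool's file export/import flags may be named differently in your version.

```shell
# Step 1: near the source cluster, export to a compressed flat file.
node exporter.js --targetFile backup.json.gz

# Step 2: move the file to the target machine (host is a placeholder).
scp backup.json.gz target-host:

# Step 3: near the target cluster, import from the file.
node exporter.js --sourceFile backup.json.gz
```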
Change the fetching size
It might help to change the size of each scan request that fetches data. The current default of the --sourceSize option is 10. Increasing or decreasing this value can have a significant performance impact on the actual export.
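For instance, fetching larger batches per scan request might be tried like this; the value 100 is an arbitrary starting point, and `exporter.js` is a placeholder.

```shell
# Fetch 100 documents per scan request instead of the default 10.
# Benchmark a few values -- both larger and smaller can win.
node exporter.js --sourceSize 100
```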
Optimizing the ElasticSearch Cluster
This tool will only run as fast as your cluster can keep up. If the nodes are under heavy load, errors can occur and the entire process will take longer. How to optimize your cluster is a whole other chapter and depends on the version of ElasticSearch you're running.