An old fact from December 12th, 2014 is significant for Vinted: the company switched from the Sphinx search engine to Elasticsearch 1.4.1. At the time of writing this post, we use Elasticsearch 7.15. Without a doubt, a lot has happened in between. This post will focus on Elasticsearch metrics: we will share our accumulated experience from four generations of collecting them.

Managing search requires both product and engineering efforts – two complementary parts. At Vinted, search engineers work with product, maintain a sound infrastructure, set up a scalable indexing pipeline and scale up search throughput; this is unattainable without proper metrics.

The first generation of metrics was collected by parsing the /_cat APIs with Ruby scripts. Nothing sophisticated: we ran Elasticsearch versions 1.x - 2.x, and monitoring was done on demand with Elasticsearch plugins such as ElasticHQ.

The second generation ran on the Sensu observability pipeline, which we used to collect metrics for indices, shards and segments. Graphite was used as the persistence layer for the collected Elasticsearch metrics. We ran Elasticsearch versions 2.x - 5.x at that time; monitoring segments was important, as fine-tuning the segmentation policy improved read performance. Graphite would apply sampling to metrics spanning longer periods, and that sampling was the main pain point in moving to another storage engine.

The third generation of metrics ran on the pull-based metrics collection system, Prometheus. Prometheus was not yet established as the de facto monitoring system; at that time we ran Elasticsearch version 5.x, just as open-source Elasticsearch exporters were emerging. Numerous open-source Elasticsearch exporters were tried out, and we used the video streaming company's one. During that time, as our infrastructure grew, the exporter failed to deliver, occasionally running out of memory or timing out on /metrics endpoint requests. It lacked fine-grained configuration, such as limiting unnecessary metrics and configuring polling time per subsystem. Metrics were static, and their naming was inconsistent. The authors did not accept code change requests from the OSS community, there was no active delivery, and new metrics from recent Elasticsearch versions weren't introduced. After our in-house fork branched out from the upstream Elasticsearch exporter repository, it became apparent that the fork was beginning to look like a complete rewrite. The rewriting effort was so significant that we decided to write a completely new exporter instead.

The fourth generation of the Elasticsearch exporter solves multiple problems:

- Handle large amounts of metrics without crashing (the exporter was able to export 940,272 metrics).
- Handle ephemeral states of nodes, indices and shards.
- Automatically generate new metrics and remove stale ones based on a user-configurable lifetime.
- Keep track of cluster state metadata: node changes, cluster version and shard relocations.

The new Elasticsearch exporter is written in the Rust programming language and is open-sourced on GitHub at /vinted/elasticsearch-exporter-rs. It uses the asynchronous Tokio runtime, the Rust Prometheus instrumentation library and the official Elasticsearch client library. Metrics collection is decoupled from serving the /metrics endpoint. Metric names preserve the subsystem namespace (e.g. /_cat) down to the last leaf of the response; for example, /_nodes/info jvm mem heap_max becomes elasticsearch_nodes_info_jvm_mem_heap_max. In addition, Elasticsearch time-based metrics in milliseconds are converted into seconds to comply with Prometheus best practices: metrics ending in “millis” are renamed to “seconds”, and “_bytes” and “_seconds” postfixes are added where appropriate.
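The naming and unit conventions above can be sketched in a few lines of Rust. This is an illustrative sketch, not the exporter's actual code: the function names (`metric_name`, `millis_to_seconds`) are ours, and the real exporter derives names from JSON responses rather than a hand-passed key list.

```rust
// Illustrative sketch only (not the actual exporter implementation):
// derive a Prometheus metric name from an Elasticsearch API path plus
// the path of JSON keys down to the leaf, and convert "*_millis"
// metrics into base-unit "*_seconds" metrics.

/// Join "elasticsearch", the API path segments and the JSON key path
/// with underscores, e.g. "/_nodes/info" + ["jvm", "mem", "heap_max"]
/// -> "elasticsearch_nodes_info_jvm_mem_heap_max".
fn metric_name(api_path: &str, leaf: &[&str]) -> String {
    let mut parts = vec!["elasticsearch".to_string()];
    parts.extend(
        api_path
            .split('/')
            .filter(|s| !s.is_empty())
            .map(|s| s.trim_start_matches('_').to_string()),
    );
    parts.extend(leaf.iter().map(|s| s.to_string()));
    parts.join("_")
}

/// Rename a "*_millis" metric to "*_seconds" and scale its value by
/// 1/1000, per Prometheus base-unit conventions; other metrics pass
/// through unchanged.
fn millis_to_seconds(name: &str, value: f64) -> (String, f64) {
    if let Some(stem) = name.strip_suffix("_millis") {
        (format!("{}_seconds", stem), value / 1000.0)
    } else {
        (name.to_string(), value)
    }
}

fn main() {
    let name = metric_name("/_nodes/info", &["jvm", "mem", "heap_max"]);
    println!("{}", name); // elasticsearch_nodes_info_jvm_mem_heap_max

    let (n, v) = millis_to_seconds("elasticsearch_indices_flush_total_time_millis", 1500.0);
    println!("{} = {}", n, v); // elasticsearch_indices_flush_total_time_seconds = 1.5
}
```

Converting to seconds at collection time (rather than leaving raw milliseconds) keeps dashboards and alerting rules consistent with the rest of a Prometheus deployment, where durations are expected in base units.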
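Removing stale metrics after a user-configurable lifetime (needed because nodes, indices and shards are ephemeral) can be sketched as a last-seen map plus a periodic sweep. This is a hypothetical illustration of the idea, not the exporter's actual data structure; the type and method names are ours.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical sketch (names are ours, not the exporter's): each metric
/// carries a last-seen timestamp; anything not refreshed within the
/// user-configurable lifetime is dropped on the next sweep, so metrics
/// for relocated shards or departed nodes disappear instead of going stale.
struct MetricTracker {
    lifetime: Duration,
    last_seen: HashMap<String, Instant>,
}

impl MetricTracker {
    fn new(lifetime: Duration) -> Self {
        Self { lifetime, last_seen: HashMap::new() }
    }

    /// Record that a metric was observed during the latest poll.
    fn touch(&mut self, name: &str, now: Instant) {
        self.last_seen.insert(name.to_string(), now);
    }

    /// Drop metrics whose last observation is older than the lifetime;
    /// returns the names that were removed.
    fn sweep(&mut self, now: Instant) -> Vec<String> {
        let lifetime = self.lifetime;
        let stale: Vec<String> = self
            .last_seen
            .iter()
            .filter(|(_, seen)| now.duration_since(**seen) > lifetime)
            .map(|(name, _)| name.clone())
            .collect();
        for name in &stale {
            self.last_seen.remove(name);
        }
        stale
    }
}

fn main() {
    let mut tracker = MetricTracker::new(Duration::from_secs(60));
    let start = Instant::now();
    tracker.touch("elasticsearch_cat_shards_docs", start);

    // Fresh within the lifetime: nothing is removed.
    assert!(tracker.sweep(start).is_empty());

    // 61 simulated seconds later with no refresh: the metric is swept.
    let removed = tracker.sweep(start + Duration::from_secs(61));
    assert_eq!(removed, vec!["elasticsearch_cat_shards_docs".to_string()]);
}
```

The same sweep also covers the "automatically generate new metrics" half of the feature: because names are derived from the API responses on every poll, a new shard simply shows up via `touch`, and a vanished one ages out.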