ElasticSearch

ElasticSearch is a search engine based on the Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.

ElasticSearch exposes a REST API (port 9200 by default) to retrieve the health, state and settings of a running cluster. The examples below query a node on localhost.

Check cluster health

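A good starting point is the "_cluster/health" endpoint. Its "status" is green when all shards are allocated, yellow when all primary shards are allocated but at least one replica is not, and red when at least one primary shard is unassigned.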

curl 'localhost:9200/_cluster/health?pretty'
{
  "cluster_name" : "elastic-demo-cluster-us-west-2",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 68,
  "number_of_data_nodes" : 65,
  "active_primary_shards" : 16200,
  "active_shards" : 32400,
  "relocating_shards" : 4,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
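The health endpoint also accepts "wait_for_status" and "timeout" parameters. This is handy in scripts, for example to wait until the cluster is green again after maintenance. If the timeout expires, the response reports "timed_out": true together with the current status. The following is a minimal sketch that assumes a hypothetical ES_URL variable (defaulting to http://localhost:9200). "health_status" and "wait_for_green" are hypothetical helper names. The status is extracted from the "?pretty" output format shown above.

```sh
# Base URL of the cluster (assumption: override via the ES_URL environment variable).
ES_URL="${ES_URL:-http://localhost:9200}"

# Read pretty-printed _cluster/health JSON on stdin and print the value of "status".
health_status() {
  sed -n 's/^[[:space:]]*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Wait until the cluster is green or the timeout (default 60s) expires.
# Prints the final status; anything other than "green" means the timeout expired.
wait_for_green() {
  curl -s "$ES_URL/_cluster/health?wait_for_status=green&timeout=${1:-60s}&pretty" | health_status
}

# Example: wait_for_green 5m
```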

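Delay shard allocation during maintenance

When a node leaves the cluster, ElasticSearch waits for "index.unassigned.node_left.delayed_timeout" (1m by default) before it reallocates that node's replica shards to other nodes. If you want to reboot a machine or do some other maintenance, it makes sense to raise this timeout for all indices, for example to 5m as below or to 10m for longer work. This way the cluster does not start copying shards around while the node is only briefly away. Afterwards, change it back to the previous value, for example 30s or the default of 1m. For more details see https://www.elastic.co/guide/en/elasticsearch/reference/current/delayed-allocation.html

curl -XPUT -H "Content-Type: application/json" \
  localhost:9200/_all/_settings \
  -d '{ "settings": { "index.unassigned.node_left.delayed_timeout": "5m" }}'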
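The raise-then-restore routine can be wrapped in two small shell functions. This is a minimal sketch that assumes a hypothetical ES_URL variable. "delayed_timeout_body" and "set_delayed_timeout" are hypothetical helper names that send the same request as the command above, with the timeout passed as an argument.

```sh
# Base URL of the cluster (assumption: override via the ES_URL environment variable).
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the settings body for index.unassigned.node_left.delayed_timeout.
delayed_timeout_body() {
  printf '{ "settings": { "index.unassigned.node_left.delayed_timeout": "%s" }}' "$1"
}

# Apply the timeout to all existing indices (same request as the curl command above).
set_delayed_timeout() {
  curl -s -XPUT -H "Content-Type: application/json" \
    "$ES_URL/_all/_settings" \
    -d "$(delayed_timeout_body "$1")"
}

# Example:
#   set_delayed_timeout 10m   # before taking the node down
#   ...reboot / maintenance...
#   set_delayed_timeout 1m    # restore afterwards
```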

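Cat Endpoint

Calling "_cat" without a path lists the available cat APIs. They return compact, human-readable tables instead of JSON.

curl localhost:9200/_cat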
=^.^=
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes
/_cat/indices
/_cat/indices/{index}
/_cat/segments
/_cat/segments/{index}
/_cat/count
/_cat/count/{index}
/_cat/recovery
/_cat/recovery/{index}
/_cat/health
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/thread_pool
/_cat/plugins
/_cat/fielddata
/_cat/fielddata/{fields}
/_cat/nodeattrs
/_cat/repositories
/_cat/snapshots/{repository}
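Every cat endpoint accepts "?v" to print column headers, "?help" to list the available columns, and "h=" to select columns. Selecting only the columns you need keeps the output easy to filter with standard tools. This is a minimal sketch that lists indices whose health is not green. "non_green_indices" is a hypothetical helper, and ES_URL is an assumed variable.

```sh
# Base URL of the cluster (assumption: override via the ES_URL environment variable).
ES_URL="${ES_URL:-http://localhost:9200}"

# Print "health index" for every index that is not green.
# Expects _cat/indices output restricted to the columns health,index (no header).
non_green_indices() {
  awk '$1 != "green" { print $1, $2 }'
}

# Example: curl -s "$ES_URL/_cat/indices?h=health,index" | non_green_indices
```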

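Check pending tasks

This lists cluster-level changes, such as index creation or mapping updates, that have been queued on the master node but not yet executed. A long or growing list indicates an overloaded master.

curl 'localhost:9200/_cluster/pending_tasks?pretty'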

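Check max result size settings

The index setting "max_result_window" caps "from + size" for search requests on an index (10000 by default). The following command counts how many indices use each value:

curl -s localhost:9200/_settings | jq . | grep max_result_window | sort | uniq -c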
     64         "max_result_window": "150000",
   2586         "max_result_window": "300000",
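If an index really needs a larger window, "index.max_result_window" can be changed per index through the index settings API. Large windows make deep pagination costly in heap, so the scroll API is usually the better tool for walking through many results. This is a minimal sketch that assumes a hypothetical ES_URL variable. "max_result_window_body" and "set_max_result_window" are hypothetical helpers, and "my-index" in the example is a placeholder.

```sh
# Base URL of the cluster (assumption: override via the ES_URL environment variable).
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the body that sets index.max_result_window to the given integer.
max_result_window_body() {
  printf '{ "index": { "max_result_window": %d }}' "$1"
}

# Usage: set_max_result_window INDEX VALUE
set_max_result_window() {
  curl -s -XPUT -H "Content-Type: application/json" \
    "$ES_URL/$1/_settings" \
    -d "$(max_result_window_body "$2")"
}

# Example: set_max_result_window my-index 300000
```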

Get list of nodes

curl localhost:9200/_cat/nodes
10.254.101.109 10.254.101.109 79 99 3.13 d - Aragorn
10.254.105.237 10.254.105.237 82 99 9.42 d - Geirrodur
10.254.127.205 10.254.127.205 65 99 3.06 d - Nezarr the Calculator
10.254.122.73  10.254.122.73  56 98 3.57 d - Psi-Lord
10.254.84.58   10.254.84.58   38 99 5.29 d - Patriot II
10.254.126.196 10.254.126.196 45 99 4.90 d - Abominatrix
10.254.95.218  10.254.95.218  54 99 2.69 d - Warstrike
...
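In the default output shown above, the columns are host, IP, heap usage in percent, RAM usage in percent, load, node role ("d" for a data node), master flag ("*" for the elected master, "m" for master-eligible, "-" otherwise) and node name. Adding "?v" prints them as a header row. Selecting columns with "h=" makes it easy to spot nodes under heap pressure. This is a minimal sketch in which "high_heap_nodes" is a hypothetical helper and the 75% default threshold is an arbitrary assumption.

```sh
# Base URL of the cluster (assumption: override via the ES_URL environment variable).
ES_URL="${ES_URL:-http://localhost:9200}"

# Print nodes whose heap usage is at or above a threshold (default 75%).
# Expects _cat/nodes output with the columns heap.percent,name (no header).
# heap.percent comes first because node names may contain spaces.
high_heap_nodes() {
  awk -v limit="${1:-75}" '$1 + 0 >= limit + 0 { print }'
}

# Example: curl -s "$ES_URL/_cat/nodes?h=heap.percent,name" | high_heap_nodes 80
```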

Check cluster settings

curl -s localhost:9200/_cluster/settings | jq .
{
  "persistent": {
    "cluster": {
      "routing": {
        "allocation": {
          "cluster_concurrent_rebalance": "5",
          "node_concurrent_recoveries": "10",
          "disk": {
            "watermark": {
              "low": "70%",
              "high": "73%"
            }
          }
        }
      }
    },
    "indices": {
      "breaker": {
        "fielddata": {
          "limit": "65%"
        },
        "request": {
          "limit": "35%"
        }
      },
      "recovery": {
        "concurrent_streams": "5",
        "max_bytes_per_sec": "200mb"
      }
    }
  },
  "transient": {
    "cluster": {
      "routing": {
        "allocation": {
          "cluster_concurrent_rebalance": "5",
          "node_concurrent_recoveries": "10",
          "disk": {
            "threshold_enabled": "true",
            "watermark": {
              "low": "78%",
              "high": "85%"
            }
          },
          "exclude": {
            "_ip": ""
          },
          "awareness": {
            "attributes": "az",
            "force": {
              "az": {
                "values": "eu-west-1a,eu-west-1b,eu-west-1c"
              }
            }
          },
          "enable": "all"
        }
      }
    },
    "logger": {
      "_root": "INFO",
      "action": "INFO"
    }
  }
}
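Both sections can hold the same key. Transient values take precedence over persistent ones and are lost on a full cluster restart. In the output above, this means the effective disk watermarks are 78% and 85%. Settings are changed with a PUT to the same endpoint. This is a minimal sketch that builds the request body for a single setting. "cluster_setting_body" and "put_cluster_setting" are hypothetical helpers, and ES_URL is an assumed variable.

```sh
# Base URL of the cluster (assumption: override via the ES_URL environment variable).
ES_URL="${ES_URL:-http://localhost:9200}"

# Build a cluster settings body from SCOPE (persistent|transient), KEY and VALUE.
# The value is always sent as a JSON string, as in the output above.
cluster_setting_body() {
  printf '{ "%s": { "%s": "%s" }}' "$1" "$2" "$3"
}

# Usage: put_cluster_setting SCOPE KEY VALUE
put_cluster_setting() {
  curl -s -XPUT -H "Content-Type: application/json" \
    "$ES_URL/_cluster/settings" \
    -d "$(cluster_setting_body "$1" "$2" "$3")"
}

# Example: put_cluster_setting transient cluster.routing.allocation.disk.watermark.low 80%
```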

Find problematic shards

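If a node crashes and leaves the cluster, the shards it held become UNASSIGNED with the unassigned reason "NODE_LEFT" until they are allocated elsewhere or the node returns. The first command lists every shard that is not STARTED. The second also shows why a shard is unassigned.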

curl -XGET localhost:9200/_cat/shards | grep -v STARTED

# more output fields
curl -XGET 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED

Shards can also stay UNASSIGNED because of the disk watermarks. A node whose disk usage is above the low watermark is no longer allocated new shards, and above the high watermark ElasticSearch tries to move shards away from it. If many nodes hit the watermarks, this is a good indicator that the cluster needs to be scaled out.
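To see at a glance why shards are unassigned, the state and reason columns from the second command above can be aggregated. This is a minimal sketch in which "unassigned_by_reason" is a hypothetical helper and ES_URL is an assumed variable.

```sh
# Base URL of the cluster (assumption: override via the ES_URL environment variable).
ES_URL="${ES_URL:-http://localhost:9200}"

# Count unassigned shards per unassigned.reason (e.g. NODE_LEFT), sorted by reason.
# Expects _cat/shards output with the columns state,unassigned.reason (no header).
unassigned_by_reason() {
  awk '$1 == "UNASSIGNED" { count[$2]++ } END { for (r in count) print r, count[r] }' | sort
}

# Example: curl -s "$ES_URL/_cat/shards?h=state,unassigned.reason" | unassigned_by_reason
```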

Retrieve more information about shard allocation issues

curl -XGET 'localhost:9200/_cluster/allocation/explain?pretty'
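Without a request body, the explain API describes the first unassigned shard it finds. To ask about one particular shard, pass its index, its shard number, and whether it is the primary copy. This API was added in ElasticSearch 5.0. This is a minimal sketch in which "explain_body" and "explain_shard" are hypothetical helpers, "my-index" is a placeholder index name, and ES_URL is an assumed variable.

```sh
# Base URL of the cluster (assumption: override via the ES_URL environment variable).
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the request body from INDEX, SHARD number and PRIMARY (true|false).
explain_body() {
  printf '{ "index": "%s", "shard": %d, "primary": %s }' "$1" "$2" "$3"
}

# Usage: explain_shard INDEX SHARD PRIMARY
explain_shard() {
  curl -s -XGET -H "Content-Type: application/json" \
    "$ES_URL/_cluster/allocation/explain?pretty" \
    -d "$(explain_body "$1" "$2" "$3")"
}

# Example: explain_shard my-index 0 true
```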

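Decommission a node from an ElasticSearch cluster

Excluding the node's IP address from shard allocation makes ElasticSearch relocate all of its shards to the remaining nodes. Once the node holds no shards, it can be shut down.

curl -XPUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d '{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "10.255.80.90"
  }
}'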
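To check whether the node has been drained, count the shards still located on its IP. To undo the decommission, set the exclusion back to an empty string, as in the "_ip": "" entry in the transient cluster settings shown earlier. This is a minimal sketch in which "shards_on_ip" is a hypothetical helper and ES_URL is an assumed variable.

```sh
# Base URL of the cluster (assumption: override via the ES_URL environment variable).
ES_URL="${ES_URL:-http://localhost:9200}"

# Count shards still located on the given IP.
# Expects _cat/shards output restricted to the ip column (no header);
# unassigned shards have an empty ip and are never counted.
shards_on_ip() {
  awk -v ip="$1" '$1 == ip { n++ } END { print n + 0 }'
}

# Example: curl -s "$ES_URL/_cat/shards?h=ip" | shards_on_ip 10.255.80.90
# Clear the exclusion again:
#   curl -XPUT "$ES_URL/_cluster/settings" -H 'Content-Type: application/json' \
#     -d '{ "transient": { "cluster.routing.allocation.exclude._ip": "" } }'
```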
