Elasticsearch is a search engine based on the Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.
Elasticsearch has a convenient REST API for retrieving all the important settings of a running cluster.
Check cluster health
A good starting point for checking cluster health is the "_cluster/health" endpoint.
curl 'localhost:9200/_cluster/health?pretty'
{
  "cluster_name" : "elastic-demo-cluster-us-west-2",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 68,
  "number_of_data_nodes" : 65,
  "active_primary_shards" : 16200,
  "active_shards" : 32400,
  "relocating_shards" : 4,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
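For monitoring scripts it can help to turn the status into an exit code. The following is a minimal sketch, not a definitive check: the helper names status_to_exit and check_cluster, the ES_URL variable, and the Nagios-style exit codes (0, 1, 2, 3) are assumptions made for illustration. It queries "_cat/health" with h=status so that only the status column is returned.

```shell
# Hypothetical helper: map a cluster status string to a Nagios-style exit code.
status_to_exit() {
  case "$1" in
    green)  echo 0 ;;
    yellow) echo 1 ;;
    red)    echo 2 ;;
    *)      echo 3 ;;  # unknown or empty status
  esac
}

# Hypothetical wrapper: fetch the status and return the matching exit code.
# ES_URL is an assumed variable; it defaults to the host used in this post.
check_cluster() {
  status=$(curl -s "${ES_URL:-localhost:9200}/_cat/health?h=status" | tr -d '[:space:]')
  echo "cluster status: ${status:-unknown}"
  return "$(status_to_exit "$status")"
}
```

Calling check_cluster from a cron job or a monitoring agent then yields a non-zero exit code whenever the cluster is not green.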
Configure cluster wide settings
If you want to reboot a machine or do some maintenance, it makes sense to raise "index.unassigned.node_left.delayed_timeout" (to 5 minutes in the command below) so that shards are not reallocated while the node is briefly away. Afterwards you can change it back, for example to 30 seconds. For more details see https://www.elastic.co/guide/en/elasticsearch/reference/current/delayed-allocation.html
curl -XPUT -H "Content-Type: application/json" \
localhost:9200/_all/_settings \
-d '{ "settings": { "index.unassigned.node_left.delayed_timeout": "5m" }}'
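Since the timeout is changed twice around a maintenance window, the call can be wrapped in a small function. This is only a sketch under assumptions: delayed_timeout_body and set_delayed_timeout are hypothetical names, and ES_URL is an assumed variable that defaults to the host used above.

```shell
# Hypothetical helper: print the JSON body for a given timeout value (e.g. 5m, 30s).
delayed_timeout_body() {
  printf '{ "settings": { "index.unassigned.node_left.delayed_timeout": "%s" }}\n' "$1"
}

# Hypothetical wrapper: apply the timeout to all indices, as in the command above.
set_delayed_timeout() {
  curl -s -XPUT -H "Content-Type: application/json" \
    "${ES_URL:-localhost:9200}/_all/_settings" \
    -d "$(delayed_timeout_body "$1")"
}
```

A maintenance run would then call set_delayed_timeout 5m before the reboot and set_delayed_timeout 30s once the node has rejoined.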
Cat Endpoint
curl localhost:9200/_cat
=^.^=
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes
/_cat/indices
/_cat/indices/{index}
/_cat/segments
/_cat/segments/{index}
/_cat/count
/_cat/count/{index}
/_cat/recovery
/_cat/recovery/{index}
/_cat/health
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/thread_pool
/_cat/plugins
/_cat/fielddata
/_cat/fielddata/{fields}
/_cat/nodeattrs
/_cat/repositories
/_cat/snapshots/{repository}
Check pending tasks
curl 'localhost:9200/_cluster/pending_tasks?pretty'
Check max result size settings
curl -s localhost:9200/_settings | jq . | grep max_result_window | sort | uniq -c
64 "max_result_window": "150000",
2586 "max_result_window": "300000",
Get list of nodes
curl localhost:9200/_cat/nodes
10.254.101.109 10.254.101.109 79 99 3.13 d - Aragorn
10.254.105.237 10.254.105.237 82 99 9.42 d - Geirrodur
10.254.127.205 10.254.127.205 65 99 3.06 d - Nezarr the Calculator
10.254.122.73 10.254.122.73 56 98 3.57 d - Psi-Lord
10.254.84.58 10.254.84.58 38 99 5.29 d - Patriot II
10.254.126.196 10.254.126.196 45 99 4.90 d - Abominatrix
10.254.95.218 10.254.95.218 54 99 2.69 d - Warstrike
...
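To spot nodes under memory pressure, the node list can be filtered by heap usage. The sketch below assumes the columns are selected explicitly with h=heap.percent,name, so the filter does not depend on the default column order of a particular Elasticsearch version; the function name hot_nodes and the threshold of 75 are hypothetical.

```shell
# Hypothetical filter: reads lines of "<heap.percent> <node name>" on stdin and
# prints the nodes whose heap usage is at or above the given threshold.
hot_nodes() {
  awk -v limit="$1" '$1 + 0 >= limit + 0 { $1 = $1; print }'
}

# Usage against a live cluster (threshold of 75 percent is an arbitrary example):
#   curl -s 'localhost:9200/_cat/nodes?h=heap.percent,name' | hot_nodes 75
```

Putting heap.percent first keeps the filter simple even though node names such as "Nezarr the Calculator" contain spaces.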
Check cluster settings
curl -s localhost:9200/_cluster/settings | jq .
{
  "persistent": {
    "cluster": {
      "routing": {
        "allocation": {
          "cluster_concurrent_rebalance": "5",
          "node_concurrent_recoveries": "10",
          "disk": {
            "watermark": {
              "low": "70%",
              "high": "73%"
            }
          }
        }
      }
    },
    "indices": {
      "breaker": {
        "fielddata": {
          "limit": "65%"
        },
        "request": {
          "limit": "35%"
        }
      },
      "recovery": {
        "concurrent_streams": "5",
        "max_bytes_per_sec": "200mb"
      }
    }
  },
  "transient": {
    "cluster": {
      "routing": {
        "allocation": {
          "cluster_concurrent_rebalance": "5",
          "node_concurrent_recoveries": "10",
          "disk": {
            "threshold_enabled": "true",
            "watermark": {
              "low": "78%",
              "high": "85%"
            }
          },
          "exclude": {
            "_ip": ""
          },
          "awareness": {
            "attributes": "az",
            "force": {
              "az": {
                "values": "eu-west-1a,eu-west-1b,eu-west-1c"
              }
            }
          },
          "enable": "all"
        }
      }
    },
    "logger": {
      "_root": "INFO",
      "action": "INFO"
    }
  }
}
Find problematic shards
If nodes crash and leave the cluster, the affected shards become UNASSIGNED and their unassigned reason is reported as "NODE_LEFT".
curl -XGET localhost:9200/_cat/shards | grep -v STARTED
# more output fields
curl -XGET 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED
Another source of problems is shards that stay unassigned. This can happen when disk usage on the data nodes has reached the configured disk watermark, so that no new shards can be assigned to those nodes. This is a good indicator that the cluster needs to be scaled out.
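On a large cluster the list of unassigned shards can be long, so a per-reason summary is often easier to read. The following sketch groups the output of the "_cat/shards" query above by its unassigned.reason column; the function name unassigned_by_reason is hypothetical, and it assumes exactly the column order index,shard,prirep,state,unassigned.reason.

```shell
# Hypothetical summary: reads "_cat/shards?h=index,shard,prirep,state,unassigned.reason"
# output on stdin and prints "<count> <reason>" for every unassigned reason.
unassigned_by_reason() {
  awk '$4 == "UNASSIGNED" { count[$5]++ }
       END { for (r in count) print count[r], r }' | sort -k2
}

# Usage against a live cluster:
#   curl -s 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' \
#     | unassigned_by_reason
```

A high NODE_LEFT count points at lost nodes, while other reasons usually call for a closer look with the allocation explain endpoint.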
Retrieve more information about shard allocation issues
curl -XGET 'localhost:9200/_cluster/allocation/explain?pretty'
Decommission a node from an Elasticsearch cluster
curl -XPUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d '{
"transient" :{
"cluster.routing.allocation.exclude._ip" : "10.255.80.90"
}
}';
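After the exclusion is set, the cluster moves shards off the node, and the machine should only be shut down once none are left on it. The sketch below polls "_cat/shards" until the IP no longer holds any shards. It is an illustration under assumptions: the function names, the ES_URL variable, and the 30-second polling interval are hypothetical, and clearing the exclusion with an empty string mirrors the "_ip": "" value seen in the cluster settings output earlier.

```shell
# Hypothetical counter: reads "_cat/shards?h=ip" output on stdin and prints
# how many shards are still located on the given IP.
shards_on_ip() {
  awk -v ip="$1" '$1 == ip { n++ } END { print n + 0 }'
}

# Hypothetical loop: poll until the excluded node holds no more shards.
wait_for_drain() {
  while :; do
    out=$(curl -sf "${ES_URL:-localhost:9200}/_cat/shards?h=ip") || return 1
    left=$(printf '%s\n' "$out" | shards_on_ip "$1")
    echo "shards left on $1: $left"
    [ "$left" -eq 0 ] && break
    sleep 30
  done
}

# Hypothetical cleanup: remove the exclusion once maintenance is finished.
clear_exclusion() {
  curl -s -XPUT "${ES_URL:-localhost:9200}/_cluster/settings" \
    -H 'Content-Type: application/json' \
    -d '{ "transient": { "cluster.routing.allocation.exclude._ip": "" } }'
}
```

For the node above, wait_for_drain 10.255.80.90 returns once the node is empty, or returns non-zero if the cluster cannot be reached.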