Overview
During one of my vRA upgrades from 7.4 to 7.6, I faced this issue post upgrade. Everything went well except the error below on the VAMI page of both vRA appliances (I had two nodes). If you have more nodes and you are stuck with this error, you will see it on all of them.
================
Elasticsearch validation failed:
status: red
number_of_nodes: 2
unassigned_shards: 4
number_of_pending_tasks: 0
number_of_in_flight_fetch: 0
timed_out: False
active_primary_shards: 113
cluster_name: horizon
relocating_shards: 0
active_shards: 226
initializing_shards: 0
number_of_data_nodes: 2
delayed_unassigned_shards: 0
=================
If you read the error above, you will see that there are 4 unassigned shards which were not automatically assigned to any of the available vRA nodes.
Cause
This happens when DB synchronization between the primary (master) and replica (slave) vRA nodes is broken: the primary node no longer holds the latest data while the replica nodes carry additional data, i.e. a total break in master/replica DB replication. In my case there were already many DB issues before the upgrade.
Even if you recover the cluster state, these shards might not be assigned automatically and the alert above persists. In that case you have to assign the unassigned shards manually. Let's see the process.
Resolution
1. Check the cluster state from the master node CLI with the below command
#curl http://localhost:9200/_cluster/health?pretty=true
You will see the error reflected in the output
{
"cluster_name" : "horizon",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 2,
"number_of_data_nodes" : 2,
"active_primary_shards" : 113,
"active_shards" : 226,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 4,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0
}
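If you want to script this check, a minimal sketch is to grep the status field out of the pretty-printed JSON and flag anything that is not green. (This parses the sample output above with grep/sed rather than a proper JSON parser; against a live node you would pipe `curl -s http://localhost:9200/_cluster/health?pretty=true` in instead.)

```shell
# Sketch: extract the "status" field from the pretty-printed health JSON.
# HEALTH here is the sample output from above; pipe in live curl output instead.
HEALTH='{
  "cluster_name" : "horizon",
  "status" : "red",
  "unassigned_shards" : 4
}'
STATUS=$(printf '%s\n' "$HEALTH" | grep '"status"' | sed 's/.*"status" *: *"\([a-z]*\)".*/\1/')
if [ "$STATUS" != "green" ]; then
  echo "Cluster status is $STATUS - investigate unassigned shards"
fi
```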
2. Check the cluster information with the below command
#curl -s -XGET http://localhost:9200/_cat/nodes
You will see similar output
master.mylab.local 172.25.3.199 8 d * Dreadknight
replica.mylab.local 172.25.3.200 8 d m Masque
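The `*` in the fifth column marks the elected master, and the last column is the node name you will need in step 4. As a sketch, you can pull the master's name out programmatically (parsing the sample output above; against a live cluster you would pipe the `_cat/nodes` call in instead):

```shell
# Sketch: find the elected master's node name from _cat/nodes output.
# Columns in this output are: host, ip, heap, role, master marker (*), name.
NODES='master.mylab.local 172.25.3.199 8 d * Dreadknight
replica.mylab.local 172.25.3.200 8 d m Masque'
MASTER=$(printf '%s\n' "$NODES" | awk '$5 == "*" { print $6 }')
echo "Master node: $MASTER"
```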
3. Search for unassigned shards
#curl -s -XGET 'http://localhost:9200/_cat/shards' | grep UNASSIGNED
You will see output similar to this
v3_2020-10-02 4 p UNASSIGNED
v3_2020-10-02 4 r UNASSIGNED
v3_2020-10-02 2 p UNASSIGNED
v3_2020-10-02 2 r UNASSIGNED
4. Re-assign these shards using the following commands, where the index is v3_2020-10-02, the shards to be re-assigned are '2' and '4', and the target is the master node 'Dreadknight'. Adjust the commands for your environment: the values after "index", "shard", and "node" will change; the rest stays the same.
curl -XPOST 'localhost:9200/_cluster/reroute' -d
'{"commands":[{"allocate":{"index":"v3_2020-10-02","shard":2,"node":"Dreadknight","allow_primary":"true"}}]}'
and
curl -XPOST 'localhost:9200/_cluster/reroute' -d
'{"commands":[{"allocate":{"index":"v3_2020-10-02","shard":4,"node":"Dreadknight","allow_primary":"true"}}]}'
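If you have many unassigned shards, issuing one reroute per shard by hand gets tedious. As a sketch, you can generate the reroute commands from the `_cat/shards` listing (here fed from the sample output above; 'Dreadknight' is this lab's master node name, so substitute your own, and review the echoed commands before running them):

```shell
# Sketch: build one reroute command per UNASSIGNED shard.
# Echoes the commands for review instead of executing them directly.
NODE="Dreadknight"   # target node name from step 2
SHARDS='v3_2020-10-02 4 p UNASSIGNED
v3_2020-10-02 2 p UNASSIGNED'
CMDS=$(printf '%s\n' "$SHARDS" | while read index shard _prirep _state; do
  printf "curl -XPOST 'localhost:9200/_cluster/reroute' -d '{\"commands\":[{\"allocate\":{\"index\":\"%s\",\"shard\":%s,\"node\":\"%s\",\"allow_primary\":\"true\"}}]}'\n" "$index" "$shard" "$NODE"
done)
echo "$CMDS"
```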
That's it. The shards have now been assigned manually.
Log out of the VAMI on all nodes and log back in. You will no longer see the error.