Kill the ongoing/stuck process in VMware Cloud Director (KB#00099)

Hello guys,

You may find today's post more interesting than usual, because the only documented resolution for this issue is a restart of the vCloud services. Many posts on the web say that to clear a stuck job in the VMware Cloud Director GUI, you need to restart the vCD services.

Today I faced the same issue, but I decided to look for another solution. Restarting the entire Cloud Director service for a single VM is never a good idea, is it?

Issue : A task has been running for a long time in the Cloud Director GUI and does not time out even after more than 6 hours.

First, let me show you what exactly a long-running or stuck job means. Below is the task detail. In the vCloud Director GUI, you will also see that this particular task has been running for a long time. Hope it is clear.


Observation : The above job had been running for the last 6-7 hours. Earlier, I had noticed that such jobs used to time out in around 4 hours, which is the default timeout value for vCD tasks, but this one was a stubborn job. Because of this, I couldn't perform any operation on the VM other than power off and power on. There were no such issues in vCenter, so this was clearly a VMware Cloud Director issue.

Solution :

1. Log in to the primary cell with the root account and then log in to the DB (if it is embedded; in case of an external DB, log in directly to the DB server).

2. Run the command below to look up the organization that owns the stuck task

select * from organization where name = 'org_name';

It will give you the org_id. Note it down.

3. Now run the command below using the org_id you got from the 2nd command.

select * from task where org_id = 'org_id';

It will show you the task name and job id (as in the above snippet), which will confirm that this is the right task to cancel. Generally you will see only this stuck task, but if you see many, search by job id rather than org_id. If this line is not clear to you, the comment box is yours.

4. Now execute the command below to delete this task from the DB. Please take a backup first! Note that this command removes every task row for that org_id, so run it as written only when the stuck task was the only row returned in step 3.

Delete from task where org_id = 'org_id';

The stuck task has now been deleted from the DB. None of the above operations requires any downtime, so go ahead without fear!
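If step 3 returns more than one task, the delete can be narrowed to a single row and wrapped in a transaction. Below is a minimal sketch of that idea in Python, run against a throwaway in-memory SQLite table and not against a real vCD database. The column names (id, org_id, name, status), the sample rows and the helper name delete_single_task are assumptions for illustration only; check the real layout of the task table (for example with \d task in psql) before adapting it.

```python
import sqlite3


def delete_single_task(conn, org_id, task_id):
    """Delete one task row; roll back unless exactly one row matched."""
    cur = conn.cursor()
    cur.execute(
        "delete from task where org_id = ? and id = ?", (org_id, task_id)
    )
    if cur.rowcount != 1:
        conn.rollback()  # nothing (or too much) matched, so undo the delete
        return False
    conn.commit()
    return True


# Throwaway stand-in for the task table (assumed columns, dummy data).
conn = sqlite3.connect(":memory:")
conn.execute("create table task (id text, org_id text, name text, status text)")
conn.executemany(
    "insert into task values (?, ?, ?, ?)",
    [
        ("task-1", "org-a", "stuck-task", "running"),  # the stuck task
        ("task-2", "org-a", "other-task", "success"),  # must survive
    ],
)
conn.commit()

deleted = delete_single_task(conn, "org-a", "task-1")
remaining = [row[0] for row in conn.execute("select id from task order by id")]
print(deleted, remaining)
```

The point of the row-count check is that a wrong or mistyped ID leaves the table untouched instead of silently deleting nothing or too much.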

Cheers! 

Fix null ip_scope_id of a vCD network (KB#00098)

Hi guys,

Since the introduction of the HTML5 interface of VMware Cloud Director, there have been some major issues with the portal and the DB. Today I am going to give you the fix for one of those issues, one for which many people around the globe are looking for a solution. From my discussions in the VMware community, a few people are not getting a solution even from VMware. I got this fix from VMware myself, but I still found many people struggling with it, so I thought I should create a web record of this issue.

Symptoms/Observation :

1. When you open the external networks page in the Cloud Director /provider portal and browse through the pages one by one, one of the pages will keep trying to load but will fail with a timeout.

2. When you apply a filter and search for that external network, your search will end with a timeout and no result.

3. When you try to add the problematic network to your vApp, it will always fail and give you weird errors.

4. This network can be an external network, an org network or a vApp network.

5. Edges to which the problematic network is attached will intermittently lose network packets, or there can even be total packet loss.

Basically, you will feel that the network is broken, and you can't see or edit that problematic network in the vCD GUI at all.

Reason :

This happens because the ip_scope_id value in the Cloud Director DB for that particular network has become null.

Resolution :

1. If you are facing similar symptoms with your network, run the command below to confirm that your network has the same issue. This command needs to be run in the Cloud Director DB, which can be PostgreSQL or Microsoft SQL Server. The command is the same for both DB flavors:

select * from logical_network where name = 'problematic_network_name';

Below is the example output


In the above snippet, I have clearly marked where the issue is. If you also see that ip_scope_id is null for your network, run the next command.
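If you don't know the network name up front, the same check can be turned around to list every network whose ip_scope_id is null. Here is a small sketch of that in Python against a throwaway in-memory SQLite table. It uses only the logical_network columns that appear in this post (id, name, ip_scope_id); the sample rows and the helper name networks_missing_scope are made up for the demo. I am also assuming that a null value by itself is only a hint, so compare any hit with the symptoms above before touching it.

```python
import sqlite3

# Throwaway stand-in for logical_network with dummy rows.
conn = sqlite3.connect(":memory:")
conn.execute("create table logical_network (id text, name text, ip_scope_id text)")
conn.executemany(
    "insert into logical_network values (?, ?, ?)",
    [
        ("net-1", "healthy-net", "scope-1"),
        ("net-2", "broken-net", None),  # ip_scope_id is null here
    ],
)
conn.commit()


def networks_missing_scope(conn):
    """Return (id, name) for every network whose ip_scope_id is null."""
    query = (
        "select id, name from logical_network "
        "where ip_scope_id is null order by name"
    )
    return conn.execute(query).fetchall()


print(networks_missing_scope(conn))
```

The select inside the helper is plain SQL, so the same statement should also work when typed directly into the DB console.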

2. To solve this, we need to find the ip_scope_id value and then insert it into the DB. To get the value, run the command below. The only thing to note is that you need to change the logical_network_id value. You get this value from the output of the first command. Copy it, put it into the command below and run it. Running this has no impact, as it is only a read (select) command.

select * from ip_scope where id in (select scope_id from allocated_ip_address where id in (select allocated_ip_address_id from gateway_assigned_ip where gateway_interface_id in (select id from gateway_interface where logical_network_id = 'db62f356-2f24-48b4-a841-87512e720f65')));

You will get output like the sample below:


The ID in the above snippet is your ip_scope_id. Now that you have it, let's insert it into the logical_network_ip_scope table.
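The nested select from step 2 is easier to follow when read from the inside out: logical network, then gateway interface, then assigned IP, then allocated IP address, then IP scope. The sketch below rehearses that chain in Python on a throwaway in-memory SQLite copy. The tables contain only the columns that the query itself names; the row values (net-1, gi-1 and so on) and the helper name find_scope_ids are dummy data for the demo, not real vCD content.

```python
import sqlite3

# Same query as in step 2, with the network ID turned into a parameter.
LOOKUP = """
select * from ip_scope where id in (
  select scope_id from allocated_ip_address where id in (
    select allocated_ip_address_id from gateway_assigned_ip
    where gateway_interface_id in (
      select id from gateway_interface where logical_network_id = ?)))
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table gateway_interface (id text, logical_network_id text);
create table gateway_assigned_ip (gateway_interface_id text, allocated_ip_address_id text);
create table allocated_ip_address (id text, scope_id text);
create table ip_scope (id text);
insert into gateway_interface values ('gi-1', 'net-1');
insert into gateway_assigned_ip values ('gi-1', 'ip-1');
insert into allocated_ip_address values ('ip-1', 'scope-1');
insert into ip_scope values ('scope-1'), ('scope-2');
""")


def find_scope_ids(conn, logical_network_id):
    """Follow network -> interface -> assigned IP -> allocated IP -> scope."""
    return [row[0] for row in conn.execute(LOOKUP, (logical_network_id,))]


print(find_scope_ids(conn, "net-1"))
```

Because every level uses "in", the real query can in principle return no row or several rows. This post only covers the case where exactly one ID comes back, so stop and double-check if you see anything else.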

Before running the insert, take a backup of your DB. If you have the embedded DB, use the command

#/opt/vmware/appliance/bin/create-db-backup

Once the backup is done, let's insert the value by running the next command.

3. From the 1st and 2nd commands you have the logical_network_id and ip_scope_id respectively. Replace both IDs in the command below and then run it.

insert into logical_network_ip_scope(scope_id,logical_network_id) values('eb578ab9-2f4b-49c6-8fa6-4cd25472a1bf','5149b5ef-fc53-453d-a59d-a6f349624307'); 


 

Warning : If you are not well versed in this, or if you do not understand what is happening here, take help from VMware, or comment here and I will help you out for sure.

Once the above command is done, you will see that the network is visible in the vCD GUI again and you can find it in the portal. Edges or vApps connected to this network now behave normally.

4. Now run the first command again and check whether ip_scope_id has a value. It should be in place, but there is a small (about 1%) chance that it is still not showing there even though the issue is resolved. To fix that, run the command below:

update logical_network set ip_scope_id = 'd0953b40-3426-44af-a719-349b1245b3de' where id = '7353056c-2ba9-4ee0-9f75-631b3f0be77f';

By now I hope you know what to change in the above line before running it: the ip_scope_id value and the id of your logical network. If you have any doubt, I am not far from you guys. Feel free to comment.
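Pasting the two IDs in the wrong place is the easiest mistake to make here, so below is a small helper sketch in Python that checks both values are well-formed UUIDs and then prints the insert and update statements from steps 3 and 4 with the IDs filled in. The function names are my own for illustration, and the sample IDs are the ones from step 3 of this post. It only builds text; it does not connect to any DB, and it cannot tell whether you swapped the two IDs, so still read the result before running it.

```python
import uuid


def _checked(value):
    """Return the canonical form of a UUID string, or raise ValueError."""
    return str(uuid.UUID(value.strip()))


def build_fix_statements(logical_network_id, ip_scope_id):
    """Build the step 3 insert and step 4 update with validated IDs."""
    net = _checked(logical_network_id)
    scope = _checked(ip_scope_id)
    insert_sql = (
        "insert into logical_network_ip_scope(scope_id,logical_network_id) "
        f"values('{scope}','{net}');"
    )
    update_sql = (
        f"update logical_network set ip_scope_id = '{scope}' "
        f"where id = '{net}';"
    )
    return insert_sql, update_sql


insert_sql, update_sql = build_fix_statements(
    "5149b5ef-fc53-453d-a59d-a6f349624307",  # logical network id (sample)
    "eb578ab9-2f4b-49c6-8fa6-4cd25472a1bf",  # ip_scope_id (sample)
)
print(insert_sql)
print(update_sql)
```

A leftover placeholder such as 'problematic_network_name' is not a valid UUID, so the helper raises ValueError instead of producing a statement.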

Your issue is fixed! Enjoy 😊