Yes, you guessed right: I am learning Kubernetes, so I wanted to share some useful insights, and I will continue to share more on this. Below are some commands for daily operations while working with Kubernetes. I will keep adding to this list.
Many blogs show how to fetch the names of VMs restarted by HA after an ESXi host failure using the Get-VIEvent PowerCLI command. However, the extracted VM name is not in a format you can use as it is; you have to open the output in Excel and use Text to Columns to extract the VM name. In my case I also have vCD, so when an ESXi host fails and HA events fire, I need not only the VM name but also the Org and OrgvDC details to share with my customer. That makes the manual process even longer, and I needed to make it quick. This script is an extended solution for that kind of scenario. I hope you find it useful.
Let's see how I did it using PowerShell.
Script
#Start here
Write-Host "This script will help you get the names of VMs restarted by HA due to ESXi host failures" -ForegroundColor Yellow
If SMTP is configured in your environment, you can mail the report from the same script using the Send-MailMessage command, but that needs a small tweak to the script above.
Hint: you have to save the final report first. Change the last line of the script above to
$myView | Out-File C:\Temp\vmsrestartedbyHA.csv
and then use the command below.
Send-MailMessage -From 'gautam.johar@vcnotes.in' -To 'my.reader@home.com', 'myreader2@home.com' -Subject 'HA Event is triggered and VM list is attached' -Body "Please find the attachment" -Attachments C:\Temp\vmsrestartedbyHA.csv -Priority High -DeliveryNotificationOption OnSuccess, OnFailure -SmtpServer 'smtp.vcnotes.in'
Change the addresses, subject and SMTP server wherever applicable.
If you are comfortable with PowerShell, there are many ways to build on this idea. For me, this basic script works fine.
Side Note
I wrote this script to run in PowerShell ISE, so please run it there. If you get errors running it in a plain PowerShell console, you may need to fix the errors it reports.
During a vRA upgrade from 7.4 to 7.6, I faced this issue after the upgrade. Everything went well except for the error below on the VAMI page of both vRA appliances (I had two nodes). If you have more nodes and hit this error, you will see it on all of them.
The error says that there are 4 unassigned shards that were not automatically assigned to any of the available vRA nodes.
Cause
It happens when the DB sync between the primary and replica vRA nodes is broken: the primary node did not have up-to-date data while the replica nodes were running with some additional data, that is, a total break in Master-Replica DB replication. In my case too, there were many DB issues before the upgrade.
Even after you recover the cluster state, these shards might not be assigned automatically, and the alert above remains. You then have to assign the unassigned shards manually. Let's see the process.
Resolution
1. Check the shard state from the master node CLI with the command below
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 15870  100 15870    0     0   484k      0 --:--:-- --:--:-- --:--:--  484k
v3_2020-10-02 4 p UNASSIGNED
v3_2020-10-02 4 r UNASSIGNED
v3_2020-10-02 2 p UNASSIGNED
v3_2020-10-02 2 r UNASSIGNED
4. Re-assign these shards using the following command, where the index is v3_2020-10-02, the shards to be re-assigned are '2' and '4', and the command is run on the master node, 'Dreadknight'. Change the command according to your environment: the values after index, shard and node will differ, while everything else stays the same.
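To avoid reading the shard list by eye, the output from step 1 can be filtered with a few lines of Python. This is only a sketch: the function name `unassigned_shards` is made up for this example, and it assumes the four-column layout (index, shard number, p or r, state) seen in the output above.

```python
def unassigned_shards(cat_output):
    """Map each index name to the sorted shard numbers that have an UNASSIGNED copy."""
    found = {}
    for line in cat_output.splitlines():
        fields = line.split()
        # Expected columns: index, shard, p (primary) or r (replica), state.
        # Lines from the curl progress meter do not match and are skipped.
        if len(fields) >= 4 and fields[3] == "UNASSIGNED" and fields[1].isdigit():
            found.setdefault(fields[0], set()).add(int(fields[1]))
    return {index: sorted(shards) for index, shards in found.items()}


sample = """\
v3_2020-10-02 4 p UNASSIGNED
v3_2020-10-02 4 r UNASSIGNED
v3_2020-10-02 2 p UNASSIGNED
v3_2020-10-02 2 r UNASSIGNED
"""
print(unassigned_shards(sample))  # {'v3_2020-10-02': [2, 4]}
```

The result gives the index name and shard numbers that have to be substituted into the re-assign command.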
This is a living post and I will keep adding points here. I generally add small but useful things that do not warrant a long post of their own.
How to
Explanation
transfer the tech-support bundle to FTP on Arista Router
copy flash:/EOS-4.18.2F.swi ftp:/user:password@192.168.10.15/EOS-4.18.2F.swi
user = username of the FTP server account
password = password of the FTP server account
192.168.10.15 = IP address of the FTP server
EOS-4.18.2F.swi = tech-support bundle file name
1. Check the kernel version in Linux - rpm -qa | grep -i kernel
2. Change the IP on an interface - ifconfig eth1 192.168.2.2 netmask 255.255.255.0
3. Set or change the default gateway of a VM - route add default gw 192.168.2.1
4. File location to change the IP - vi /etc/sysconfig/network-scripts/ifcfg-eth0
5. Search for specific text on a Linux server - grep -rnw '/path/to/somewhere/' -e 'pattern'
Download the powershell script from Google Drive. Click here
How to delete any iso file in all datastores which is older than 15 days
foreach ($ds in Get-Datastore) {
    # Mount the datastore as a temporary PowerShell drive named GJ
    New-PSDrive -Name GJ -PSProvider VimDatastore -Root '/' -Datastore $ds > $null
    # Search every folder in the datastore for .iso files older than 15 days.
    # The age filter must come before Remove-Item, otherwise every .iso file is offered for deletion.
    Get-ChildItem -Path GJ:\ -Recurse -Include *.iso |
        Where-Object { $_.LastWriteTime -lt (Get-Date).AddDays(-15) } |
        Remove-Item -Confirm:$true
    Remove-PSDrive -Name GJ -Confirm:$false
}
With Remove-Item -Confirm:$true you are asked to confirm each file one by one. Change it to Remove-Item -Confirm:$false if you want to delete the files without being prompted.
How to edit Login Banner in Vmware Cloud Director Appliance
1. Create or edit the file /etc/login.warn and put your message in it.
2. Edit the /etc/ssh/sshd_config file and change the line #Banner none to Banner /etc/login.warn (remove the leading # so that the line takes effect), then restart the sshd service.
How to search largest files in Linux
sudo du -a /dir/ | sort -n -r | head -n 20
This command lists the 20 largest files and directories under the given directory.
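If you prefer to do the same from a script, here is a rough Python equivalent of the du pipeline above. It is a sketch under a few assumptions: the function name `largest_files` is invented for this example, it reports plain file sizes in bytes rather than the disk usage that du reports, and it lists files only, not directories.

```python
import heapq
import os


def largest_files(root, count=20):
    """Return up to `count` (size_in_bytes, path) pairs under `root`, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                # Skip files that vanish or cannot be read while scanning.
                continue
    return heapq.nlargest(count, sizes)
```

Calling `largest_files("/dir/")` returns the list; loop over it and print each size and path to get output similar to the shell command.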
How to search particular word or string in file in Linux
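tr '[:space:]' '[\n*]' < /var/log/apache2/access.log | grep -i -c 172.18.101.16
Here 172.18.101.16 is the string to search for in the access.log file. The command puts every word on its own line and prints the number of words that contain the string.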
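The same count can be sketched in Python. The function name `count_matching_words` is an assumption for this example, and unlike grep it treats the search string literally, so the dots in an IP address match only dots.

```python
def count_matching_words(path, needle):
    """Count whitespace-separated words in the file that contain `needle`, ignoring case."""
    needle = needle.lower()
    total = 0
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            total += sum(1 for word in line.split() if needle in word.lower())
    return total
```

For example, `count_matching_words("/var/log/apache2/access.log", "172.18.101.16")` would return the number of words in the log that contain that address.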
This is not a big thing, but I still wanted to document it for my own reference. I got a request asking which VMs are in which DRS rules, so I put together the script below.
#Start here
$VC = Read-host "Enter the FQDN\IP of vCenter Server"
I am writing this post because there is no clear-cut article on the subject on the web, or at least I could not find a straightforward process. In the vCD GUI there is an option to disable or enable auto-discovery for the entire vCD system. You cannot disable or enable auto-discovery at the Org level, but you can override the setting at the OrgvDC level with the help of the Admin APIs. You probably know about APIs, but what are Admin APIs? This post answers that along the way. To know more about auto-discovery, check out this post by Tom Fojta.
How to connect
You cannot even check the auto-discovery status of an OrgvDC from the GUI; you need to use the API. I have already covered how to connect to vCD from an API tool in my previous posts. Have a look here.
How to check existing setting
Once you are connected, use the API query below to get your Org details.
1. GET https://vcloud_ip_or_fqdn/api/org
Copy the entire output and paste it into Notepad++ or any other text editor. In that file, search for the name of the Org where your OrgvDC was created; you will find its href link there. Copy that link, paste it into the API tool and send a GET request. An example is shown below.
2. GET https://vcloud_ip_or_fqdn/api/org/a038859f-bf22-4d64-b6dc-e1cb8fdf2fbc
Now you will get the list of OrgvDCs in this Org. Copy the entire output and paste it into Notepad++ again. Search for the target OrgvDC name and copy the href for that OrgvDC. Below is the example-
Note that the auto-discovery option appears in the output only if you run the GET command with "admin" added to the URL. Below is the example command and output with the "admin" keyword-
Note that if the output for an OrgvDC does not have this line, it means the OrgvDC is following the global setting; you override the value by adding this line. I will explain how.
False means VM auto-discovery is disabled and true means it is enabled. That is the process to get the VM auto-discovery status for an OrgvDC. Now let's see how to change this value.
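The "paste into Notepad++ and search for the name" step from the procedure above can also be done in code. Below is a minimal Python sketch; the function name `find_href`, the Org name ExampleOrg and the trimmed sample XML are all assumptions made for illustration, and a real response carries many more attributes and elements.

```python
import xml.etree.ElementTree as ET


def find_href(xml_text, name):
    """Return the href of the first element whose name attribute equals `name`, else None."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if element.get("name") == name and element.get("href"):
            return element.get("href")
    return None


# Hypothetical, heavily trimmed response from GET /api/org.
sample = (
    '<OrgList xmlns="http://www.vmware.com/vcloud/v1.5">'
    '<Org name="ExampleOrg" '
    'href="https://vcloud_ip_or_fqdn/api/org/a038859f-bf22-4d64-b6dc-e1cb8fdf2fbc"/>'
    "</OrgList>"
)
print(find_href(sample, "ExampleOrg"))
```

The same helper works on the Org output to pick the OrgvDC href, because it only looks at the name and href attributes and not at the element type.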
How to update existing setting
To change this value from false to true or true to false, or to add the whole line, follow the steps below.
1. In step 3 above, you sent a GET query to the OrgvDC href to get the VM auto-discovery state. Now replace the GET command with a PUT command.
2. In the OrgvDC output that you copied into Notepad++, if vmDiscoveryEnabled is false and you want to make it true, change the value from false to true, and vice versa.
3. After changing the value, copy the entire output again and paste it into the BODY; select RAW and select XML, as shown in my previous post.
4. Do not click the Send button yet. You need to add one more header alongside the headers already in place. The header is given here and its practical use is shown below. This header is the only reason I had to write an entire post; it is not clearly mentioned in any article on the web, so now you have one.
If you want to use JSON instead, you can, but make sure JSON is selected in the body where you pasted the data from Notepad++.
Once you have set the Content-Type, make sure you have entered the right vDC href and selected PUT, not GET, as the operation.
Now hit the Send button.
You will get the message "202 Accepted" if all went well.
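Step 2 above (flipping the value in the copied output) is easy to get wrong by hand, so here is a small Python sketch of it. The function name `set_vm_discovery` and the trimmed sample document are assumptions for illustration. The element is matched by name regardless of case, and adding the line when it is missing is left out on purpose because its position in the document matters.

```python
import xml.etree.ElementTree as ET


def set_vm_discovery(xml_text, enabled):
    """Return the vDC XML with the VM discovery flag set to 'true' or 'false'."""
    root = ET.fromstring(xml_text)
    if root.tag.startswith("{"):
        # Keep the document's default namespace unprefixed when serializing.
        ET.register_namespace("", root.tag[1:].split("}", 1)[0])
    for node in root.iter():
        local_name = node.tag.rsplit("}", 1)[-1]
        if local_name.lower() == "vmdiscoveryenabled":
            node.text = "true" if enabled else "false"
            return ET.tostring(root, encoding="unicode")
    raise ValueError("no VM discovery element found; the vDC follows the global setting")


# Hypothetical, heavily trimmed vDC document used only for illustration.
sample = (
    '<AdminVdc xmlns="http://www.vmware.com/vcloud/v1.5" name="ExampleVdc">'
    "<VmDiscoveryEnabled>false</VmDiscoveryEnabled>"
    "</AdminVdc>"
)
print(set_vm_discovery(sample, True))
```

The returned text is what you would paste into the request body before sending the PUT.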
This post shares the process to change the protocols available in an NSX-v Edge firewall rule (not DFW). The protocols available on vCD's Edge Services Gateway page are TCP, UDP, ICMP and Any. See the image below.
My customer wanted to set another protocol here, ESP. I checked the GUI and it was clear that this is not possible from there, so I changed it successfully using API queries.
How to connect
Before updating this firewall rule field, we must know how to connect to vCloud Director from an API tool. You can use Postman, Insomnia or ARC (Advanced REST Client). You might need to disable the SSL check before executing any API call. The snippet below is from Postman.
Once the SSL check is disabled:
1. Set Authorization to Basic Auth. See the image below.
2. Set the header as mentioned below.
Accept application/*;version=32.0
The version depends on your vCD version.
3. Now create an API query for https://vcloud_ip_or_fqdn/api/sessions and select POST as the query type:
POST https://vcloud_ip_or_fqdn/api/sessions
This query gets the authorization and access token. Once you have entered the URL and selected POST, hit the "Send" button to run the query. After it runs you will get "200 OK" along with the authorization and access token headers. See the images below.
Use these two headers as shown in the images below.
Now you are ready to do any operation in vCD using this API tool.
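If you want to script the login instead of using an API tool, the headers from steps 1 and 2 above can be built as in the Python sketch below. The function name `session_headers` and the example credentials are assumptions; vCD expects the Basic Auth user in the form user@org. The sketch only builds the headers and does not send the POST.

```python
import base64


def session_headers(user, org, password, api_version="32.0"):
    """Build the headers for POST /api/sessions using Basic authentication."""
    credentials = f"{user}@{org}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {
        "Accept": f"application/*;version={api_version}",
        "Authorization": f"Basic {token}",
    }


# Example values only; replace them with your own user, Org and password.
headers = session_headers("administrator", "System", "secret")
print(headers["Accept"])
```

Send these headers with the POST to https://vcloud_ip_or_fqdn/api/sessions using whichever HTTP client you prefer, and set the API version to match your vCD version.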
How to extract edge firewall rules config
Use the API query below to get your Org details.
1. GET https://vcloud_ip_or_fqdn/api/org
Copy the output and paste it into Notepad++. Search for the name of the target OrgvDC where your edge resides. Then create another query and run it.
2. GET https://vcloud_ip_or_fqdn/api/vdc/a038859f-bf22-4d64-b6dc-e1cb8fdf2fbc
You will see similar output in your Notepad++ data. Copy the vDC href from your own Notepad++ file, not from here, paste it into Postman and hit Send.
This gives you another output, for the OrgvDC. Search it for the edge name and you will find a line like the one below. Copy that line and use it for the next query.
https://iaas-sin.aticloud.aero/network/vdc/a038859f-bf22-4d64-b6dc-e1cb8fdf2fbc/edges
Now create an API call like
3. GET https://vcloud_ip_or_fqdn/network/vdc/a038859f-bf22-4d64-b6dc-e1cb8fdf2fbc/edges
It will give you output like the one below, only a single line.
Going from version 9.5 to version 10.1.2 is a three-step process. I suggest completing the prerequisites properly, and the process will be flawless. Check interoperability first, so that your other components can function with the vCD versions you will be upgrading to. In my experience, the migration order should be: deploy the primary node → transfer the DB → replace custom certificates with self-signed certificates → make sure your primary node is up → add more nodes if you want a multi-cell architecture → change certificates.ks on the standby nodes.
When we talk about primary and standby nodes, it applies only to the PostgreSQL DB, which is active on the primary node and standby on the standby nodes. The vmware-vcd service is always active-active on all three nodes (if you deploy the minimum of three nodes in a multi-cell architecture). See the image below.
The upgrade path will be-
Current 9.5 on Linux → in-place upgrade to 9.7 on Linux → migrate to 9.7 appliance → upgrade to 10.1.2 appliance. You can check the vendor documentation for this upgrade path. Now that you know the workflow, let's proceed to the planning phase.
Planning
First of all, check the interoperability of your product versions. Click here for the VMware interoperability guide.
Plan your upgrade as per this guide. This is the most important phase, I must say: if you plan well, there is very little chance of failure. The in-place upgrade is quite simple, with no real complexity. All the planning is needed for the migration from Linux to the appliance. Let's see what you need to plan.
1. IP Addresses
You have two choices: move the existing IPs of the existing vCD cells to the new cells, or use new IPs on the new vCD cells. Why? Because you are going to deploy new cells for the migration to the appliance, as I describe in the next steps. I reused the existing IPs of the existing cells. If you reuse the same IP addresses, then
1. At the time of the 9.7 appliance deployment, change the old cell's IP address to a temporary but reachable IP address. This IP address must be reachable from your new cell as well as from your existing external DB server. Why? Because
1.1. The old cell's IP address will be assigned to the new vCD cell's eth0 NIC.
1.2. We still need one old vCD cell (any one) for the DB migration, so it must be reachable from the new cell and your external DB server.
2. Free the IPs of all three old cells so that the same three IPs can be assigned to the eth0 NICs of the three new 9.7 appliance nodes.
3. Change the DNS entries of your old cells to the new temporary IPs, and then create DNS entries (host and PTR records) for the new cells with the old IP addresses. Any confusion? Please comment.
4. Create a separate VLAN for the eth1 IPs of all three new vCD cells, if you do not already have one.
2. Network Route
This is a crucial part of the migration. The old vCD cells used three different NICs carrying different traffic such as HTTPS, VMRC and NFS, but in the vCD 9.7 appliance each cell has only two NICs carrying these services. Make sure these two NICs are on different VLANs\subnets. Also make sure the eth1 NIC of your new vCD cells can reach your NFS server; if required, configure static routes according to your network flow.
3. Single-cell or Multi-cell Architecture
Decide on this point and plan your upgrade accordingly. Additional points to take care of are
- Check the load-balancer configuration. It might need to be modified after the vCD 9.7 deployment.
- Check whether you have more than one LB balancing different traffic, for example Internet and intranet.
- Deploy additional nodes at the very last stage, that is, after replacing the certificates and bringing the first node fully up.
4. NFS
NFS is another critical part of this migration. Some people have NFS on a Linux machine, some on Windows, and some directly on a storage box.
Make sure that when you deploy the first primary node of the 9.7 appliance, the NFS mount point is empty; otherwise the 9.7 deployment will fail. Also make sure your NFS server is reachable from the eth1 NIC of all vCD nodes in the 9.7 appliance deployment.
I think I have given enough pointers to help you plan this migration. Now, let's go through the steps one by one.
In-place upgrade from 9.5 to 9.7
Pre-requisites
- Make sure the upgrade .bin file is uploaded to a directory on the vCD cell(s) and the user has sufficient permissions on it.
- Do an md5sum check. The command is
# md5sum installation-file.bin
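If md5sum is not available on the machine where you downloaded the file, the same checksum can be computed with a short Python sketch. The function name `md5_of_file` is an assumption for this example; compare its result with the checksum published for the download.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, reading it in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `md5_of_file("installation-file.bin")` returns the same 32-character string that md5sum prints before the file name.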
Main activity
Run all the commands below on all vCD cells, one by one.
Step 1 : Stop vCD services on all cells using ./cell-management-tool
To see the current status:
./cell-management-tool -u administrator -p 'password' cell --status
To stop new tasks from being accepted:
./cell-management-tool -u administrator -p 'password' cell --quiesce true
To put the cell in maintenance mode:
Step 3 : Take a full backup of the external MS SQL database
Step 4 : Make the downloaded .bin file executable. The command is
#chmod u+x installation-file.bin
Step 5 : Install the upgrade file now
#./installation-file.bin
If you placed the .bin file in the /tmp folder, change to that directory in the CLI and then run the command. It is a simple one, with no real complexity.
Best Practices for next migration
1. Traffic should be segregated as below. This is how I did it, so I am sharing it as my personal recommendation. You can choose to do it the other way round as well.
vCD 9.7 appliance | eth0 | eth1
Primary | HTTPS+VMRC+API | PostgreDB+NFS
Standby | HTTPS+VMRC+API | PostgreDB+NFS
Standby | HTTPS+VMRC+API | PostgreDB+NFS
2. If the primary is deployed as "Primary-Large", all standby cells must be deployed as "Standby-Large". If the primary is deployed as "Primary-small", all standby cells must be deployed as "Standby-small".
3. In a multi-cell architecture, deploy standby cells at the very last stage, that is, after the first cell is up and fully functional and after replacing the certificates. It keeps things simple.
4. Both NICs of the vCD 9.7 appliance cells must be on different VLANs.
Migrate from ver 9.7 In-linux to 9.7 appliance
Pre-requisites
1. A clean and accessible NFS share. It must be accessible from the eth1 NIC if you plan to move NFS traffic to eth1, as I did.
2. Accurate DNS entries for the new vCD cells with the old IPs (if you do not plan to change the existing IPs).
3. If you have custom certificates, have the keystore, HTTPS and console proxy passwords at hand.
4. Make sure network flows are open between the new vCD cell's eth1 NIC and the old cell. Follow the vCD 9.7 Install guide attached at the end of this post.
5. Configure AMQP with version 3.7.
6. Here is the vendor documentation for all pre-requisites.
7. Your production downtime starts when you change the production IP of the old vCD cell. We will assign this IP to the new primary cell.
Start the vCloud Director Appliance Deployment : Primary Node
1. Start deploying the OVA as usual. A wizard will open → give the vCD cell a valid name → select the folder location → select a compatible datastore → select the underlying ESXi host → select the eth1 and eth0 portgroups → click Next to reach the Customize template page.
Under Customize template, fill in the following-
NTP Server to use: 8.8.8.8
Initial Root Password: VMware1!
Expire Root Password Upon First Login: uncheck
Enable SSH root login: check
NFS mount for transfer file location: IPaddress:/sharename
'vcloud' DB password for the 'vcloud' user: VMware1!
Admin User Name: administrator
Admin Full Name: vCD Admin
Admin user password: VMware1!
Admin email: vcd97@vcnotes.in
System name: vcd4
Installation ID: 12
eth0 Network Routes: blank
eth1 Network Routes: blank
Default Gateway: 172.17.2.1
Domain Name: vcnotes.in
Domain Name Servers: 8.8.8.8,8.8.4.4
eth0 Network IP Address: 172.17.2.21
eth0 Network Netmask: 255.255.255.224
eth1 Network IP Address: 172.17.2.22
eth1 Network Netmask: 255.255.255.240
Modify the details above as per your environment. A few doubts you might have-
System Name - For the first primary node, you can put vcd1.
Installation ID - In a brownfield setup, note the installation ID of the running setup and use the same here.
eth0 and eth1 Network Routes - Use these to add static routes according to your network design.
Once all info is given, review it and click Finish.
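Since both NICs must sit on different subnets, it is worth checking the eth0 and eth1 values before you click Finish. The Python sketch below does that with the standard ipaddress module; the function name `nic_networks_overlap` is an assumption for this example, and the 172.17.3.22 address in the second call is a made-up value.

```python
import ipaddress


def nic_networks_overlap(ip0, mask0, ip1, mask1):
    """Return True when the networks of the two NICs share any addresses."""
    net0 = ipaddress.ip_interface(f"{ip0}/{mask0}").network
    net1 = ipaddress.ip_interface(f"{ip1}/{mask1}").network
    return net0.overlaps(net1)


# Sample values from the template above: 172.17.2.0/27 contains 172.17.2.16/28.
print(nic_networks_overlap("172.17.2.21", "255.255.255.224",
                           "172.17.2.22", "255.255.255.240"))  # True

# Hypothetical eth1 address on a separate subnet.
print(nic_networks_overlap("172.17.2.21", "255.255.255.224",
                           "172.17.3.22", "255.255.255.240"))  # False
```

Note that the sample values shown in the template overlap, so treat them as placeholders and pick eth0 and eth1 addresses for which this check returns False.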
1. You do not need to change the installation ID on every standby cell. The installation ID is what vCD uses to generate unique MAC addresses for vCD VMs. I have seen a few blogs asking you to change it for standby nodes; this is incorrect.
2. The domain name and domain search path should both be vcnotes.in. They should not be like vcdcell01.vcnotes.in. When you give the VM name at the start of the deployment, the FQDN is generated automatically.
Post-Checks-
1. Once the OVA deployment is finished, connect over SSH. If SSH is not responding, open the console and start the sshd service with service sshd start. Check the logs below to make sure everything is good.
#cat /opt/vmware/var/log/firstboot
#cat /opt/vmware/var/log/vcd/setupvcd.log
#cat /opt/vmware/var/log/vami/vami-ovf.log
If everything went well during deployment, the firstboot log shows success; otherwise it refers you to setupvcd.log and then vami-ovf.log.
2. Browse to the VAMI interface at https://IP_FQDN_of_primary_cell:5480. It should look like below.
3. Browse to https://IP_FQDN_of_primary_cell/cloud and https://IP_FQDN_of_primary_cell/provider. Both portals should be accessible without any error.
Start the vCloud Director Appliance Deployment : Standby Node
Not Now :)
Once your primary cell is deployed, do not deploy the standby nodes right away. Now it is time to transfer the DB.
Take a backup of the internal embedded PostgreSQL database
#/opt/vmware/appliance/bin/create-db-backup
Configure External Access to the vCloud Director Database
1. Stop vCD services on all cells, including the new primary and the three old cells.
2. SSH to the new vCD cell and create a file named external.txt in /opt/vmware/appliance/etc/pg_hba.d with the command below.
Note: 172.25.2.194/32 is the IP address of your old external DB server in CIDR notation, and 172.25.2.209 is the IP address of the eth1 NIC of the old vCD cell. Refer to pages 82 and 83 of the vCD 9.7 Install guide attached at the end of this post.
You can verify the file you just created by checking pg_hba.conf. Run #cat pg_hba.conf and it should show the entries you made in the steps above.
3. SSH to the old vCD cell and run the command below. Refer to page 122 of the vCD 9.7 Install guide.
eth1_IP_new_primary - the eth1 IP address of the new primary vCD appliance cell
database_password_new_primary - the database password given while deploying the primary vCD node
-dbname - should be vcloud unless you changed it intentionally
/ - many get confused by this character; in the VMware documentation it only indicates that the command continues on the next line, and it does not matter whether you use it or not.
The rest of the parameters are self-explanatory and can be used as they are. If all went well, this transfers your external SQL DB to the embedded PostgreSQL.
Transfer Certificates from Old Cell and Integrate it to New Primary Cells
1. Copy all the following files from the old vCD cell (the migration source) to the new vCD cell. Do not edit any entries in these files during this process. Use WinSCP to move the files between the two machines. Rename the certificates file to certificates.ks.migrated before pasting it into the new vCD cell, to avoid any confusion.
You have now transferred all the certificates from the old cell to the new one, and it is time to run the configure command so that the new vCD primary appliance can use them.
6. Below is the command.
Before this, note that /opt/vmware/vcloud-director/certificates.ks (the custom certificate copied from the old cell) is not in use, because we renamed it to certificates.ks.migrated. We do all the initial configuration with self-signed certificates and then switch to the custom certificate in the last step.
5. Run the configuration tool to import the new certificate.
/opt/vmware/vcloud-director/bin/configure
If asked “Please enter the path to the Java keystore containing your SSL certificates and private keys:”, enter the location you uploaded the file to. In our case: /opt/vmware/vcloud-director/certificates.ks. It will ask for the HTTPS, console proxy and keystore passwords. Supply them all.
Press Y wherever it prompts, and you are done!
You do not need to start the vCD service manually now. It will start automatically.
Check the /cloud, /provider and :5480 portals and make sure they are accessible from both intranet and Internet.
Some Useful Commands for HA Cluster Operations-
These are show commands that you can run for a deeper look into the vCD HA cluster status.
sudo -i -u postgres /opt/vmware/vpostgres/current/bin/repmgr -f /opt/vmware/vpostgres/current/etc/repmgr.conf node status
sudo -i -u postgres /opt/vmware/vpostgres/current/bin/repmgr -f /opt/vmware/vpostgres/current/etc/repmgr.conf cluster show
systemctl status appliance-sync.timer #Checks the time sync between all the nodes; run it on each node separately
Start the vCloud Director Appliance Deployment : Standby Node1
1. The process to deploy a standby node is the same, except
- You only provide the info applicable to a standby node at the time of deployment
- You only need to copy the certificates.ks file to its default location; no other certificate replacement is required on a standby node
Start the vCloud Director Appliance Deployment : Standby Node2
Same as above, no change.
If everything goes well, you will see all three nodes with a green tick icon in the /cloud or /provider interface.
Now all nodes are deployed in vCD. A multi-cell deployment needs only one or two additional steps here.
Load-Balancer Configuration : Check your existing load-balancer configuration and modify it if required. If your load balancer was already configured with the in-use IP addresses, you only need to change the in-use port from 443 to 8443. In my case, the LB for internal traffic was configured in NSX and an F5 handled Internet traffic.
I would like to share some issues I encountered during the migration.
Known Errors during above deployment and migration
Listing down where I got stuck.
Issue 1: After deployment of the first node, I got the error below on the VAMI interface.
The deployment of the primary vCloud Director appliance fails because of insufficient access permissions to the NFS share. The appliance management user interface displays the message: No nodes found in cluster, this likely means PostgreSQL is not running on this node. The /opt/vmware/var/log/vcd/appliance-sync.log file contains an error message: creating appliance-nodes directory in the transfer share /usr/bin/mkdir: cannot create directory ‘/opt/vmware/vcloud-director/data/transfer/appliance-nodes’: Permission denied.
Solution: It means the NFS share was not clean and the PostgreSQL service could not run. The firstboot and setupvcd.log files mentioned above will give you an idea. Delete all the content of the NFS share, delete the existing node and retry the deployment. There is no other fix.
Issue 2: sun.security.validator.ValidatorException: PKIX path validation failed: java.security.cert.CertPathValidatorException: signature check failed.
These were the log entries in cell.log, and the portal was not up.
Solution: Edit the global.properties file on the new primary cell, comment out (#) the three lines associated with the SSL connection, and run the configure command
/opt/vmware/vcloud-director/bin/configure --unattended-installation --database-type postgres --database-user vcloud --database-password db_password_new_primary --database-host eth1_ip_new_primary --database-port 5432 --database-name vcloud --database-ssl true --uuid --keystore /opt/vmware/vcloud-director/etc/certificates.ks --keystore-password root_password_new_primary --primary-ip appliance_eth0_ip --console-proxy-ip appliance_eth0_ip --console-proxy-port-https 8443
If it does not work, run the command below
/opt/vmware/vcloud-director/bin/configure --unattended-installation --database-type postgres --database-user vcloud --database-password db_password_new_primary --database-host eth1_ip_new_primary --database-port 5432 --database-name vcloud --database-ssl false --uuid --keystore /opt/vmware/vcloud-director/etc/certificates.ks --keystore-password root_password_new_primary --primary-ip appliance_eth0_ip --console-proxy-ip appliance_eth0_ip --console-proxy-port-https 8443
It should work, as it did for me twice. Then run the command below again
/opt/vmware/vcloud-director/bin/configure --unattended-installation --database-type postgres --database-user vcloud --database-password db_password_new_primary --database-host eth1_ip_new_primary --database-port 5432 --database-name vcloud --database-ssl true --uuid --keystore /opt/vmware/vcloud-director/etc/certificates.ks --keystore-password root_password_new_primary --primary-ip appliance_eth0_ip --console-proxy-ip appliance_eth0_ip --console-proxy-port-https 8443
Issue 3: The DB transfer was failing. I could not capture the error, but it complained about the old cell's IP address.
Solution: When preparing the /opt/vmware/appliance/etc/pg_hba.d/external.txt file, I mentioned putting in the IP address of the external DB; you also need to put in the IP address of your old cell, as mentioned in the steps above. In my case, I had to put the IP address of the eth1 NIC of the old cell.
Issue 4: The vCD portal was up from both Internet and intranet, but the VM console was not accessible from the Internet.
Solution: In a multi-cell deployment with more than one LB, make sure you update the new cell's IP address in every LB configuration. In my case, the change on the Internet-facing LB was missed; once we corrected it, the issue was resolved.
Upgrade from version 9.7 appliance to Cloud Director 10.1.2 appliance
Prerequisites
Take a snapshot of the primary vCloud Director appliance.
Log in to the vCenter Server instance on which resides the primary vCloud Director appliance of your database high availability cluster.
Navigate to the primary vCloud Director appliance, right-click it, and click Power > Shut Down Guest OS.
Right-click the appliance and click Snapshots > Take Snapshot. Enter a name and, optionally, a description for the snapshot, and click OK.
Right-click the vCloud Director appliance and click Power > Power On.
Verify that all nodes in your database high availability configuration are in a good state. See Check the Status of a Database High Availability Cluster.
Procedure
In a Web browser, log in to the appliance management user interface of a vCloud Director appliance instance to identify the primary appliance, https://appliance_ip_address:5480.
Make a note of the primary appliance name. You must upgrade the primary appliance before the standby and application cells. You must use the primary appliance when backing up the database. Note: You must upgrade primary cell first.
vCloud Director is distributed as an executable file with a name of the form VMware_vCloud_Director_v.v.v.v-nnnnnnnn_update.tar.gz, where v.v.v.v represents the product version and nnnnnnnn the build number. For example, VMware_vCloud_Director_10.0.0.4424-14420378_update.tar.gz.
Create the local-update-package directory in which to extract the update package. #mkdir /tmp/local-update-package
Extract the update package in the newly created directory. #cd /tmp #tar -vzxf VMware_vCloud_Director_v.v.v.v-nnnnnnnn_update.tar.gz -C /tmp/local-update-package
Set the local-update-package directory as the update repository. #vamicliupdate --repo file:///tmp/local-update-package
Check for updates to verify that you established correctly the repository. #vamicli update --check You will see similar output, if all went well
Shut down vCloud Director by running the following command #/opt/vmware/vcloud-director/bin/cell-management-tool -u <admin username> cell --shutdown OR #Service vmware-vcd stop You can use either way to stop vcd services
Apply the available upgrade #vamicli update --install latest
Note: Repeat all of the steps above on every cell, one at a time, and restart each cell after upgrading the application. Then log in to the primary cell only and upgrade the database schema.
From the primary appliance, back up the vCloud Director appliance embedded database.
# /opt/vmware/appliance/bin/create-db-backup
From any appliance, run the vCloud Director database upgrade utility.
# /opt/vmware/vcloud-director/bin/upgrade
Reboot each vCloud Director appliance.
# shutdown -r now
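If you prefer to keep the steps in one place, the procedure above can be sketched as a small shell wrapper. This is only a sketch under assumptions: the helper names (run, upgrade_cell, upgrade_schema) and the DRY_RUN switch are mine, not part of the vendor tooling, and the package name is the same placeholder used above. With DRY_RUN=1 (the default here) it only prints the commands, so you can review them before running anything on a real appliance.

```shell
#!/bin/sh
# Sketch of the upgrade steps above. Helper names and DRY_RUN are illustrative.
# DRY_RUN=1 only prints each command; set DRY_RUN=0 on the appliance to execute.
DRY_RUN="${DRY_RUN:-1}"
PKG="${PKG:-VMware_vCloud_Director_v.v.v.v-nnnnnnnn_update.tar.gz}"  # placeholder name
REPO_DIR="/tmp/local-update-package"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"          # show what would be executed
    else
        "$@"                 # execute the command as given
    fi
}

# Steps to repeat on every cell, one cell at a time.
upgrade_cell() {
    run mkdir -p "$REPO_DIR"
    run tar -vzxf "/tmp/$PKG" -C "$REPO_DIR"
    run vamicli update --repo "file://$REPO_DIR"
    run vamicli update --check
    run service vmware-vcd stop
    run vamicli update --install latest
}

# Steps to run once, after every cell has been upgraded.
upgrade_schema() {
    run /opt/vmware/appliance/bin/create-db-backup   # on the primary appliance
    run /opt/vmware/vcloud-director/bin/upgrade
}

upgrade_cell
# upgrade_schema   # uncomment on the primary cell once all cells are upgraded
```

Review the dry-run output first, then set DRY_RUN=0 and PKG to the real package name.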
Now for what the vendor article does not cover.
1. After the application upgrade, when you log in to the HTML interface of vCD 10.1.2, you might notice that vCenter shows as disconnected and does not connect even after using the reconnect and refresh options. In that case, follow the steps below.
Log in to the primary cell as root.
Run the command below to accept the certificate. (The issue is with the certificate exchange in the new vCD version 10.1.2; the certificate needs to be accepted.)
For more information, see https://kb.vmware.com/s/article/78885
2. After upgrading the vCD application, the postgres service might stop, and while upgrading the database schema you may see the error "unable to establish initial connection with database". To resolve this, either start the service manually or reboot the cell once.
2. The guide above has all the details, but in case you need something specific, here is the certificate replacement guide from VMware.
3. Here are the database migration steps from VMware; the same steps are mentioned in guide number 1.
4. An awesome article written by Richard Harris on the same process. A must-read; I learned from his experience too.
I am covering this error because I couldn't find any article on it and had to open a case with VMware to resolve it. Luckily, my case landed with a good engineer and we resolved it after a call of around 5-6 hours. So I thought I would cover it here as well; it may help someone.
In this article I will not share the exact solution. Instead, I will explain why it happens, that is, the root cause, and then you will have to raise a case with VMware. At least you will know the root cause. Why not the solution? Because it requires changes in the vCloud Director database, and touching that database on your own is very risky unless you are an expert.
The error is shown in the image below, and the impacted vCD version is 9.7.
Reason: In my case, it happened because my customer deleted a user directly from the LDAP server without first transferring the user's objects in vCloud Director. As you may know, when you delete a user in vCD, it asks you to transfer that user's objects. That did not happen here, so all the objects the user owned were locked against any modification.
VMware identified all the objects that were still owned by that user ID and replaced it with the system account's user ID. They did this directly in the vCD database, so you need to raise a case with them.
In my previous post, I explained how to automate CPU addition with PowerShell. I then got a request to explain the steps for vROPS as well. As I said, if you have vROPS, it is a better option than doing it from PowerShell. vROPS is undoubtedly an enterprise-level solution, whereas PowerShell here is more of a trick.
Here you go...
Pre-requisite
CPU hot add must be enabled for the VM(s) you want to automate. How to do it?
The user account you are using must have sufficient access rights.
Procedure
1. Log in to vROPS with an account that has admin privileges.
2. Click on Alerts in the top menu. Which one? See the image below.
3. Expand Alert Settings, click on "Alert Definition", and then click on the green + icon as shown below.
4. Fill in the form ;)
Name : Give any suitable name. I gave "_increase cpu count".
Base Object : Select Virtual Machine under vCenter Adapter. Type virt and it will be highlighted automatically.
Alert Impact :
Impact : Efficiency (because a continuous CPU spike decreases efficiency)
Criticality : Critical (you can select any level as per your requirement)
Alert Type and Subtype : Application:Performance
Wait Cycle : 1
Cancel Cycle : 1
5. Add symptom definitions
1. Click on the + icon to create a new one if you don't have one already.
2. Select the CPU|Usage% or CPU|Workload% metric as per your requirement. I chose CPU|Workload% as it makes more sense in vROPS version 10.1.2.
3. Drag it to the right or double-click on the metric.
It looks like this:
3.1. Give it a name.
3.2. Click on the drop-down and select "Immediate" or "Critical" as you wish.
3.3. For "when metric is greater than", I entered 99. This means the action triggers when CPU|Workload% is higher than 99%.
3.4. Once done, click on Save. You will be back on the alert definition page.
3.5. Add a recommendation if you want.
3.5.1. Click on the + icon.
Select the adapter type "vCenter Adapter".
Select the action "Set CPU count for VM".
Once done, click on Save. Your new Alert Definition has been created.
Let's automate it now.
1. Check your current active policy. How? See below.
2. Go to "Policy Library" and find the active policy name in the list.
3. Select it and click on Edit.
4. Go directly to "Alert/Symptoms:Definition" and search for the Alert Definition created in the steps above. It will look like the image below.
5. Click on the Automate column and select "Local". You will see a green tick icon, as shown in the image above.
That's it!
You have now enabled an automated action that increases the CPU count (by +1) whenever a VM's CPU Workload% goes above 99%.
To see the automated task's action, go to:
Administration → History → Recent Tasks. Below is an example of a successful automated task.
The above is the standard way to automate it, and it should work in most scenarios.
Caution : If you are targeting only a few VMs, make sure you do not apply the automation to the "Current Default Policy"; otherwise CPU addition will trigger on all the VMs in your vCenter Server.
Solution : Create a new policy → create a new group and add the target VMs to it → apply the new policy to the new group. This way, only the VMs you add to the group will be automated.
Known issues in this automation:
Problem 1 : Input Parameter 'CpuCount' not in range, positive number range;passed value 0
This means that while increasing the CPU count, vROPS detected that the new CPU value passed was 0. For example, the current value is 2 and vROPS asked for it to be set to 0, which is why it fails.
Solution:
1. Check the VMware Tools version. It should be running and up to date.
2. Check the maximum core capacity of the host CPU, i.e. how many cores it can assign to a single VM.
3. Check the ESXi host's CPU family and other compatibility. Check this guide.
Problem 2 : Automation is not triggering
Solution : This has only one cause: automation is not enabled in the policy.
If you run into any issues other than the ones above, feel free to comment. I will try my best to help you out.
I got this question in one of the VMware groups, so I thought I would add it to my blog and share it. Automating CPU addition is not tough, but adding a condition to it can be a little challenging. You can do it very easily with vROPS, but if you don't have vROPS, I have a solution for you.
The pre-requisite is to enable hot add for CPU.
Below is the base code.
Connect-VIServer vcenter_ip   # connect to vCenter
$vm = Get-VM VM_Name          # get the VM object
# Current CPU usage in GHz (the API reports MHz, so divide by 1000).
# Summary.Runtime.MaxCpuUsage is the VM's upper bound, not its live usage,
# so the live counter Summary.QuickStats.OverallCpuUsage is read instead.
$cpuUsage = $vm.ExtensionData.Summary.QuickStats.OverallCpuUsage / 1000
# Total CPU capacity of the VM in GHz. Replace 2.80 with the core speed of your physical host.
$cpus = $vm.ExtensionData.Summary.Config.NumCpu * 2.80
# 90% of that capacity is the trigger point
$cpuLimit = $cpus * 90 / 100
# Hot-add CPU when usage exceeds the limit
if ($cpuUsage -gt $cpuLimit) {
    $vm | Set-VM -NumCpu 2 -Confirm:$false   # current value is 1; this changes it to 2
}
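To make the threshold arithmetic concrete, here is the same comparison as a tiny shell sketch, written in plain shell rather than PowerCLI purely to show the numbers. The function name needs_more_cpu and its arguments are hypothetical, and it assumes decimal units (1 GHz = 1000 MHz). For a 1 vCPU VM on 2.80 GHz cores with a 90% threshold, the limit is 1 x 2.80 x 90 / 100 = 2.52 GHz.

```shell
# needs_more_cpu USAGE_MHZ NUM_CPU CORE_GHZ THRESHOLD_PCT
# Returns success (exit 0) when usage is above the threshold, failure otherwise.
# Hypothetical helper for illustration; it does not talk to vCenter.
needs_more_cpu() {
    awk -v u="$1" -v n="$2" -v g="$3" -v p="$4" 'BEGIN {
        usage_ghz = u / 1000           # MHz to GHz
        limit_ghz = n * g * p / 100    # share of the VM total capacity
        exit (usage_ghz > limit_ghz) ? 0 : 1
    }'
}

# 2600 MHz used on a 1 vCPU VM with 2.80 GHz cores: 2.6 GHz is above 2.52 GHz
if needs_more_cpu 2600 1 2.80 90; then
    echo "above threshold: add a vCPU"
fi
```

A usage of 2500 MHz on the same VM stays below the 2.52 GHz limit, so nothing would be triggered.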
Now the question is how to automate it.
You can wrap the check in a loop so that it runs indefinitely. I created the function below.
Connect-VIServer vcenter_ip
Function AutoAddCPU {
    # A loop is used instead of the function calling itself, because endless
    # self-calls would eventually exceed PowerShell's call depth.
    while ($true) {
        $vm = Get-VM VM_Name
        # current CPU usage in GHz (live counter, reported in MHz)
        $cpuUsage = $vm.ExtensionData.Summary.QuickStats.OverallCpuUsage / 1000
        # total CPU capacity of the VM in GHz
        $cpus = $vm.ExtensionData.Summary.Config.NumCpu * 2.80
        $cpuLimit = $cpus * 90 / 100
        if ($cpuUsage -gt $cpuLimit) {
            $vm | Set-VM -NumCpu 2 -Confirm:$false
        }
        Start-Sleep -Seconds 60   # wait before the next check
    }
}
AutoAddCPU
The next problem is how to monitor that particular VM continuously.
The simple solution is to use Task Scheduler on a machine that can reach the vCenter Server. Add the script above to Task Scheduler and run it once; it will then run forever.
Another problem: if that server or jump server restarts, the script stops. The solution is to create a scheduled task that runs the script at every system startup.
There may be many more requirements here. For example, you might need to monitor many VMs at the same time, or trigger only when CPU usage goes above 95%, and so on.
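As a rough illustration of the many-VMs case, the decision step can be reduced to a filter over a list of VMs. The sketch below is plain shell and makes several assumptions: the input format (one "vm_name usage_mhz num_cpu" line per VM), the function name over_threshold, and the sample VM names are all made up for the example, and collecting the numbers from vCenter is not shown.

```shell
# over_threshold CORE_GHZ THRESHOLD_PCT
# Reads "vm_name usage_mhz num_cpu" lines on stdin and prints the names of the
# VMs whose usage is above the given percentage of their total CPU capacity.
# Hypothetical helper and input format, for illustration only.
over_threshold() {
    awk -v g="$1" -v p="$2" '
        NF == 3 && ($2 / 1000) > ($3 * g * p / 100) { print $1 }
    '
}

# Sample data: three made-up VMs checked against a 95% threshold on 2.80 GHz cores
printf '%s\n' \
    "web01 2700 1" \
    "db01 3000 2" \
    "app01 5400 2" | over_threshold 2.80 95
```

With these sample numbers the limits are 2.66 GHz for the 1 vCPU VM and 5.32 GHz for the 2 vCPU VMs, so web01 and app01 are listed and db01 is not.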
This is a base solution; it can be customized or enhanced as per individual need.
If you have a similar requirement, feel free to write to me. I will be glad to assist.
If anyone wants the same thing from vROPS, let me know and I will create a post on that as well. Using vROPS is the more authentic and efficient way to do it.
Hope you are all healthy and safe and enjoying the present. If you are healthy and safe and still not enjoying life, then start enjoying it :) That was today's Gyan!
I wish that life had a snapshot feature like VMware does. When you feel most content, successful, and happy, you could take a snapshot and save it somewhere, and on adverse days revert to it. Unfortunately, it doesn't work that way, because whatever happens in life happens once and is committed permanently. No more Gyan! Let's start the main discussion :D
You may be surprised to know that VMware has not documented this behavior, either as a KB article or in the product documentation. There is one good write-up from Tomas Fojta that explains it, but I wanted to show this behavior in a bit more detail and with screenshots. Before demonstrating it, I would like to share the following.
1. vCenter works differently from vCD in terms of snapshots.
2. When you take a snapshot of a VM in vCloud Director, vCD reserves space in its allocation table equal to the total size of all the VM's disks.
3. Actual storage usage by the snapshot on the datastore still grows as per the snapshot's behavior in vCenter Server.
4. In vCenter, if you take a snapshot of a 100 GB VM, the snapshot size might be only a few MB.
5. In vCD, if you take a snapshot of a 100 GB VM, the snapshot size will still be a few MB, but the vCD allocation table reserves space equal to the total size of the VM. If that sounds weird to you, keep reading.
6. Thin or thick provisioning doesn't change the snapshot behavior in vCD.
7. This means that if you have a 1 TB VM and want to take a snapshot, you must have 1 TB of free space in the allocation given to your OrgvDC; otherwise you will have to increase the allocation quota.
I ran a test, and below are the details.
vCD version - 9.7 (version doesn't matter)
VM HDD size - 40 GB
VM memory size - 2 GB
When I created this VM, the used allocation of the OrgvDC storage quota changed from 0 GB to 42 GB (40 GB disk and 2 GB memory). See the image below. The allocated space is 200 GB.
Then I took a snapshot with memory, and the used allocation changed to 82 GB: 40 GB actual HDD size + 2 GB memory size + 40 GB reserved for the snapshot by vCD in the allocation table.
In vCenter, the snapshot size will still be a few MB.
This happens because vCD can't know how much your snapshot will grow. So it assumes the snapshot can grow up to the size of your HDD and blocks the same amount of storage in advance.
8. I took a snapshot without memory as well, but the allocation was still 82 GB. I think that "might be" because space still needs to be reserved to potentially suspend the VM.
The conclusion is that a snapshot reserves space equal to the size of the HDD(s), regardless of whether memory is included.
9. Last but not least, you can keep only one snapshot of any VM hosted on vCD. In vCenter you can take many snapshots of a VM at different stages, but in vCD, if you take a second snapshot, the first one is deleted automatically.
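The allocation arithmetic from the test above can be written down as a small shell sketch. It only mirrors what I observed (disk plus memory, plus the disk size again once a snapshot exists); the function name vcd_allocation and its arguments are made up for the example, and it is not how vCD itself computes anything internally.

```shell
# vcd_allocation DISK_GB MEM_GB HAS_SNAPSHOT
# Prints the storage (in GB) counted against the OrgvDC quota, following the
# behavior observed in the test. Hypothetical helper, for illustration only.
vcd_allocation() {
    disk_gb=$1
    mem_gb=$2
    has_snapshot=$3
    total=$((disk_gb + mem_gb))          # VM disks plus memory
    if [ "$has_snapshot" = "yes" ]; then
        total=$((total + disk_gb))       # snapshot reserves the full disk size again
    fi
    echo "$total"
}

vcd_allocation 40 2 no     # prints 42
vcd_allocation 40 2 yes    # prints 82
```

By the same arithmetic, a VM with a 1 TB disk needs another 1 TB of free quota before you can take a snapshot of it.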
I hope this information is helpful to you. Do comment if you have any doubt or query.