Fix - Invalid Guest on Virtual Center
December 9, 2008
After encountering an ESX host problem the other night, I ran into an issue today with a VM guest showing up as “invalid” in Virtual Center. I was able to bring the guest back into VC without taking an outage by following the procedure below.
First some background.
Due to circumstances still being investigated, the console of an ESX box froze, disconnecting it from Virtual Center. All of the guests (approximately 40) on the host were still available and running, but VMware support confirmed that the state of the server was so degraded that fixing it would require a reboot of the host, and thus an outage of all the guests on it. Since the ESX box is in an HA cluster, after some necessary VM guest applications were shut down the ESX box was rebooted, and HA promptly brought up the guest VMs on other hosts in the cluster. All the affected guests were then checked out and appeared fine.
Thinking I was in the clear, today I noticed that the icon for one of the affected VMs appeared blue and italicized in Virtual Center, with “(invalid)” appended to the VM name. Knowing that I had successfully started and checked this particular VM the night before, I was, needless to say, confused.
First things first: since the VM was a Linux guest, I tried to ssh to the guest to see if it was still running. Luckily, I was able to log in to the VM and everything looked normal. Next, I logged onto the console of the ESX host that this VM had last been registered to and issued a vmware-cmd -l. There was no entry for the invalid VM, so to double check I issued a ps -axf | grep -i and found that there was indeed a process running for the VM in question on this particular ESX host.
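The two checks above can be sketched as small shell helpers. This is only an illustration: the function names is_registered and find_vm_process are hypothetical, and the sketch assumes that vmware-cmd -l prints one .vmx path per line and that the VM name appears somewhere in the process arguments.

```shell
# Succeeds if the .vmx path in $1 appears in the list of registered
# config files read from stdin (assumed: one full path per line).
is_registered() {
    grep -Fxq -- "$1"
}

# Prints the process lines read from stdin that mention the VM name in $1
# (case-insensitive), dropping the grep process itself from the listing.
find_vm_process() {
    grep -i -- "$1" | grep -v 'grep'
}

# Example use on the ESX service console (not run here):
#   vmware-cmd -l | is_registered "$vmx" || echo "not registered"
#   ps -axf | find_vm_process vmabc123
```

A VM that fails the first check but still shows up in the second is in the state described here: running, but unknown to the management agents.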
I decided to try to re-add the VM into VC manually by first removing the invalid guest from inventory in VC and then re-adding it by browsing to the .vmx file. To do this, click on the ESX host in VC and, on the Summary tab, double-click the datastore that the VM’s .vmx file lives on. You can then browse to the directory for the VM guest and should be able to right-click the .vmx file and choose the “Add to inventory” option. I say “should be able to” because in this particular instance that option was grayed out and not selectable.
In an attempt to get more information from the ESX host logs, I then logged onto the ESX host the VM was last registered on and navigated to the /var/log/vmware directory. Issuing a grep -i gave a lot of good output. The interesting bits were some entries concerning .vmx file syntax errors. They appeared as follows:
hostd-9.log:[2008-12-07 17:28:17.388 'BaseLibs' 20241328 info] Reloading config state: /vmfs/volumes/48dabc48-573b1344-46f8-001ec939c5cb/vmabc123/vmabc123.vmx
hostd-9.log:[2008-12-07 17:28:17.435 'BaseLibs' 20241328 warning] VMHSVMLoadConfig failed: File "/vmfs/volumes/48dabc48-573b1344-46f8-001ec939c5cb/vmabc123/vmabc123.vmx" line 94: Syntax error.
hostd-9.log:[2008-12-07 17:28:17.448 'vm:/vmfs/volumes/48dabc48-573b1344-46f8-001ec939c5cb/vmabc123/vmabc123.vmx' 3076424608 info] Failed to load virtual machine.
hostd-9.log:[2008-12-07 17:28:17.466 'vm:/vmfs/volumes/48dabc48-573b1344-46f8-001ec939c5cb/vmabc123/vmabc123.vmx' 3076424608 info] Failed to load virtual machine. Marking as unavailable: vim.fault.InvalidVmConfig
hostd-9.log:[2008-12-07 17:28:17.467 'vm:/vmfs/volumes/48dabc48-573b1344-46f8-001ec939c5cb/vmabc123/vmabc123.vmx' 3076424608 info] State Transition (VM_STATE_INITIALIZING -> VM_STATE_INVALID_CONFIG)
hostd-9.log:[2008-12-07 17:28:17.467 'vm:/vmfs/volumes/48dabc48-573b1344-46f8-001ec939c5cb/vmabc123/vmabc123.vmx' 3076424608 info] Marking VirtualMachine invalid
hostd-9.log:[2008-12-07 17:28:17.467 'Vmsvc' 3076424608 info] Loaded virtual machine: /vmfs/volumes/48dabc48-573b1344-46f8-001ec939c5cb/vmabc123/vmabc123.vmx
hostd.log:[2008-12-08 09:18:04.516 'vm:/vmfs/volumes/48dabc48-573b1344-46f8-001ec939c5cb/vmabc123/vmabc123.vmx' 60660656 info] State Transition (VM_STATE_INVALID_CONFIG -> VM_STATE_UNREGISTERING)
hostd.log:[2008-12-08 09:18:04.586 'vm:/vmfs/volumes/48dabc48-573b1344-46f8-001ec939c5cb/vmabc123/vmabc123.vmx' 60660656 info] State Transition (VM_STATE_UNREGISTERING-> VM_STATE_GONE)
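As an aside, the same filtering can be sketched as two shell helpers that read log text on stdin. The function names are hypothetical, and the match strings are taken only from the log lines quoted above, so other hostd versions may word these messages differently.

```shell
# Print hostd log lines from stdin that mention the .vmx path in $1 and
# report a config problem or a state change.
vmx_log_events() {
    grep -F -- "$1" |
        grep -E 'Syntax error|State Transition|InvalidVmConfig|Marking VirtualMachine invalid'
}

# Print the .vmx line numbers that hostd flagged with "Syntax error",
# one per line, sorted and de-duplicated.
vmx_syntax_error_lines() {
    sed -n 's/.*line \([0-9][0-9]*\): Syntax error.*/\1/p' | sort -un
}

# Example use on the ESX service console (not run here):
#   cat /var/log/vmware/hostd*.log | vmx_log_events "$vmx"
#   cat /var/log/vmware/hostd*.log | vmx_syntax_error_lines
```

The line number reported by the second helper is a starting point for inspecting the .vmx file, not proof that it is the only bad line.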
These log entries are from approximately 17 hours after I successfully restarted the invalid VM following the ESX host outage. Since they pointed to bad .vmx entries, I navigated to the .vmx file in question and made a backup copy of the file. Then I opened the original .vmx file and noticed that the last three lines of the file were:
evcCompatibilityMode = "FALSE"
0001e9ebd3fbff"
evcCompatibilityMode = "FALSE"
The .vmx file is basically the configuration of the VM, and each line should be a valid configuration entry. The second-to-last line, a stray string of hex digits with a dangling closing quote, is not a valid entry, and the evcCompatibilityMode entry should appear only once. It seemed I had found the syntax errors the hostd logs were complaining about. After editing the .vmx file to remove the last two lines, I decided to stop and restart the VMware management agents to see if they could now pick up the orphaned VM guest process.
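A quick way to spot both kinds of problem before editing is a small lint pass over the file. The sketch below is not a VMware tool: check_vmx is a hypothetical helper, and it assumes every real entry has the form key = "value" on a single line, with blank lines and lines starting with # ignored. Entries that legitimately break that assumption would be reported as false positives.

```shell
# Report malformed lines and repeated keys in the .vmx file named by $1.
# Exits non-zero if anything suspicious was found.
check_vmx() {
    awk '
        # Skip blank lines and lines starting with "#".
        /^[ \t]*$/ || /^[ \t]*#/ { next }

        # Anything that is not  key = "value"  is flagged as malformed.
        !/^[ \t]*[A-Za-z0-9_.:-]+[ \t]*=[ \t]*"[^"]*"[ \t]*$/ {
            printf "line %d: malformed: %s\n", NR, $0
            bad = 1
            next
        }

        # Well-formed entry: extract the key and check for repeats.
        {
            key = $0
            sub(/[ \t]*=.*/, "", key)
            sub(/^[ \t]+/, "", key)
            if (key in seen) {
                printf "line %d: duplicate key (first seen on line %d): %s\n", NR, seen[key], $0
                bad = 1
            } else {
                seen[key] = NR
            }
        }

        END { exit (bad ? 1 : 0) }
    ' "$1"
}

# Example use (not run here):
#   check_vmx /path/to/guest.vmx
```

Run against a file ending in the three lines shown earlier, this would flag the stray hex line as malformed and the second evcCompatibilityMode line as a duplicate. It only reports; removing lines is still a manual edit, made after taking a backup copy.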
This was done using the following commands:
#/etc/rc.d/init.d/vmware-vpxa stop
#service mgmt-vmware stop
#service mgmt-vmware start
#/etc/rc.d/init.d/vmware-vpxa start
After restarting the services, I tried manually registering the VM guest to the host using #vmware-cmd -s register . This returned successfully, so I checked the VM’s operational state using vmware-cmd . The command showed that the VM was in a powered-on state, which also meant that the VMware services now recognized the VM as a valid guest. I logged back into VC and, sure enough, the VM guest icon was now showing as powered on and I was able to open a console to the guest.
I’m still not sure who or what created the bad entries in the .vmx file to begin with, or why they didn’t cause an issue until so long after the guest was rebooted, but at least I was able to fix the issue without an outage.
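For repeat use, the restart and re-register steps could be wrapped as below. This is a sketch under assumptions: the function names are hypothetical, the restart commands are copied from the list above, and the arguments to vmware-cmd are assumed to be the full path to the .vmx file for register and the getstate operation for the state check, since the exact arguments used are not shown in the text.

```shell
# Command name is overridable so the functions can be dry-run with a stub.
VMWARE_CMD=${VMWARE_CMD:-vmware-cmd}

# Restart the management agents in the same order as the commands above.
# Needs root on the ESX service console.
restart_mgmt_agents() {
    /etc/rc.d/init.d/vmware-vpxa stop
    service mgmt-vmware stop
    service mgmt-vmware start
    /etc/rc.d/init.d/vmware-vpxa start
}

# Register the .vmx file in $1 and, if that succeeds, print its state.
# Assumption: "-s register <path>" and "<path> getstate" are the intended calls.
register_and_check() {
    vmx=$1
    "$VMWARE_CMD" -s register "$vmx" || return 1
    "$VMWARE_CMD" "$vmx" getstate
}

# Example use on the ESX service console (not run here):
#   restart_mgmt_agents
#   register_and_check /path/to/guest.vmx
```

Stopping at a failed register keeps the state check from reporting on a VM the host still does not know about.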


