vmware

VMWare EAM Failing, and not Allowing Upgrades

I was attempting to upgrade my homelab which I pushed to VMWare vSphere 8.0 because of… YOLO… and after a recent 8.0.1 update I was no longer able to upgrade individual ESXi hosts. I had already updated vCenter to the latest version, now I wanted to upgrade the hosts. That is my normal course of action, vCenter, then hosts; as recommended. When I went to upgrade the hosts I was told:

"Health check fails to retrieve data about service 'vSphere ESX Agent Manager' on '3 Node And Friends'. Verify that the service 'vSphere ESX Agent Manager' is running and try again."

This had me SSH into the appliance and looking at logs. (To quickly mention EAM = “vSphere ESX Agent Manager“) Here are some of the fun errors I was getting in “/var/vmware/eam/eam.log”:

  • “Re-login to vCenter because method: currentTime of managed object: null::ServiceInstance:ServiceInstance failed due to expired client session: null”
  • “failed to authenticate extension com.vmware.vim.eam to vCenter”

Some older guides mentioned unregistering EAM and then re-registering it. This broke my install even worse, and I ended up reverting to a snapshot. (Always snapshot before upgrades…) When I reverted back to before the vCenter upgrade, I realized that EAM was actually failing before the vCenter upgrade; except now I had EAM back in my extension list both on https://vcenter/mob/?moid=ExtensionManager and in vCenter, which was missing after I followed the guide saying to un-register it.

Now that I had the plugin registered, again, I found this KB, and this persons blog very helpful. I ran the recommended commands:

mkdir /certificate

/usr/lib/vmware-vmafd/bin/vecs-cli entry getcert --store vpxd-extension --alias vpxd-extension --output /certificate/vpxd-extension.crt

/usr/lib/vmware-vmafd/bin/vecs-cli entry getkey --store vpxd-extension --alias vpxd-extension --output /certificate/vpxd-extension.key

python /usr/lib/vmware-vpx/scripts/updateExtensionCertInVC.py -e com.vmware.vim.eam -c /certificate/vpxd-extension.crt -k /certificate/vpxd-extension.key -s vcenter.my.domain -u Administrator@vsphere.local

And then EAM suddenly showed happy, and the log started showing useful things:

2023-06-06T16:53:37.573Z |  INFO | vim-monitor | ExtensionSessionRenewer.java | 190 | [Retry:Login:com.vmware.vim.eam:f86509907b4cb7c6] Re-login to vCenter b
ecause method: currentTime of managed object: null::ServiceInstance:ServiceInstance failed due to expired client session: null
2023-06-06T16:53:37.573Z |  INFO | vim-monitor | OpId.java | 37 | [vim:loginExtensionByCertificate:443bbd7c03dce9c6] created from [Retry:Login:com.vmware.vim
.eam:f86509907b4cb7c6]
2023-06-06T16:53:37.947Z |  INFO | vim-async-2 | OpIdLogger.java | 35 | [vim:loginExtensionByCertificate:443bbd7c03dce9c6] Completed.

Thats it! Now I can run updates again! If anyone has the same issue, drop a line in the comments. I hope this isn’t a big new vSphere 8.0 issue. I had upgraded this appliance from 7.0, and perhaps that or a cert issue caused issues.

Below is some of my eam.log to help people:

2023-06-06T02:20:29.728Z | ERROR | vlsi | DispatcherImpl.java | 468 | Internal server error during dispatch
com.vmware.vim.binding.eam.fault.EamServiceNotInitialized: EAM is still loading from database. Please try again later.
        at com.vmware.eam.vmomi.EAMInitRequestFilter.handleBody(EAMInitRequestFilter.java:57) ~[eam-server.jar:?]
        at com.vmware.vim.vmomi.server.impl.DispatcherImpl$SingleRequestDispatcher.handleBody(DispatcherImpl.java:373) [vlsi-server.jar:?]
        at com.vmware.vim.vmomi.server.impl.DispatcherImpl$SingleRequestDispatcher.dispatch(DispatcherImpl.java:290) [vlsi-server.jar:?]
        at com.vmware.vim.vmomi.server.impl.DispatcherImpl.dispatch(DispatcherImpl.java:246) [vlsi-server.jar:?]
        at com.vmware.vim.vmomi.server.http.impl.CorrelationDispatcherTask.run(CorrelationDispatcherTask.java:58) [vlsi-server.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_362]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_362]
        at java.lang.Thread.run(Thread.java:750) [?:1.8.0_362]
2023-06-06T02:20:31.769Z |  INFO | vim-monitor | ExtensionSessionRenewer.java | 190 | [Retry:Login:com.vmware.vim.eam:9ae94019eb8cb9a2] Re-login to vCenter b
ecause method: currentTime of managed object: null::ServiceInstance:ServiceInstance failed due to expired client session: null
2023-06-06T02:20:31.769Z |  INFO | vim-monitor | OpId.java | 37 | [vim:loginExtensionByCertificate:b63ca4cf0b995a54] created from [Retry:Login:com.vmware.vim
.eam:9ae94019eb8cb9a2]
2023-06-06T02:20:34.775Z |  INFO | vim-async-2 | OpIdLogger.java | 43 | [vim:loginExtensionByCertificate:b63ca4cf0b995a54] Failed.
2023-06-06T02:20:34.775Z |  WARN | vim-async-2 | ExtensionSessionRenewer.java | 227 | [Retry:Login:com.vmware.vim.eam:9ae94019eb8cb9a2] Re-login failed, due
to:
com.vmware.eam.security.NotAuthenticated: Failed to authenticate extension com.vmware.vim.eam to vCenter.
        at com.vmware.eam.vim.security.impl.SessionManager.convertLoginException(SessionManager.java:329) ~[eam-server.jar:?]
        at com.vmware.eam.vim.security.impl.SessionManager.lambda$loginExtension$4(SessionManager.java:154) ~[eam-server.jar:?]
        at com.vmware.eam.async.remote.Completion.onError(Completion.java:86) [eam-server.jar:?]
        at com.vmware.eam.vmomi.async.FutureAdapter.setException(FutureAdapter.java:81) [eam-server.jar:?]
        at com.vmware.vim.vmomi.client.common.impl.MethodInvocationHandlerImpl$ClientFutureAdapter.setException(MethodInvocationHandlerImpl.java:731) [vlsi-c
lient.jar:?]
        at com.vmware.vim.vmomi.client.common.impl.MethodInvocationHandlerImpl$RetryingFuture.fail(MethodInvocationHandlerImpl.java:578) [vlsi-client.jar:?]
        at com.vmware.vim.vmomi.client.common.impl.MethodInvocationHandlerImpl$RetryingFuture$RetryActionImpl.proceed(MethodInvocationHandlerImpl.java:625) [
vlsi-client.jar:?]
        at com.vmware.eam.vim.security.impl.ExtensionSessionRenewer.retry(ExtensionSessionRenewer.java:149) [eam-server.jar:?]
        at com.vmware.vim.vmomi.client.common.impl.MethodInvocationHandlerImpl$RetryingFuture.setException(MethodInvocationHandlerImpl.java:541) [vlsi-client
.jar:?]
        at com.vmware.vim.vmomi.client.common.impl.ResponseImpl.setResponse(ResponseImpl.java:239) [vlsi-client.jar:?]
        at com.vmware.vim.vmomi.client.http.impl.HttpExchangeBase.parseResponse(HttpExchangeBase.java:286) [vlsi-client.jar:?]
        at com.vmware.vim.vmomi.client.http.impl.HttpExchange.invokeWithinScope(HttpExchange.java:54) [vlsi-client.jar:?]
        at com.vmware.vim.vmomi.client.http.impl.TracingScopedRunnable.run(TracingScopedRunnable.java:24) [vlsi-client.jar:?]
        at com.vmware.vim.vmomi.client.http.impl.HttpExchangeBase.run(HttpExchangeBase.java:60) [vlsi-client.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_362]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_362]
        at java.lang.Thread.run(Thread.java:750) [?:1.8.0_362]
Caused by: com.vmware.vim.binding.vim.fault.InvalidLogin: Cannot complete login due to an incorrect user name or password.
        at sun.reflect.GeneratedConstructorAccessor58.newInstance(Unknown Source) ~[?:?]
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_362]
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_362]
        at java.lang.Class.newInstance(Class.java:442) ~[?:1.8.0_362]
        at com.vmware.vim.vmomi.core.types.impl.ComplexTypeImpl.newInstance(ComplexTypeImpl.java:174) ~[vlsi-core.jar:?]
        at com.vmware.vim.vmomi.core.types.impl.DefaultDataObjectFactory.newDataObject(DefaultDataObjectFactory.java:25) ~[vlsi-core.jar:?]
        at com.vmware.vim.vmomi.core.soap.impl.unmarshaller.ComplexStackContext.<init>(ComplexStackContext.java:30) ~[vlsi-core.jar:?]
        at com.vmware.vim.vmomi.core.soap.impl.unmarshaller.UnmarshallerImpl$UnmarshallSoapFaultContext.parse(UnmarshallerImpl.java:167) ~[vlsi-core.jar:?]
        at com.vmware.vim.vmomi.core.soap.impl.unmarshaller.UnmarshallerImpl$UnmarshallSoapFaultContext.unmarshall(UnmarshallerImpl.java:105) ~[vlsi-core.jar
:?]
        at com.vmware.vim.vmomi.core.soap.impl.unmarshaller.UnmarshallerImpl.unmarshalSoapFault(UnmarshallerImpl.java:92) ~[vlsi-core.jar:?]
        at com.vmware.vim.vmomi.core.soap.impl.unmarshaller.UnmarshallerImpl.unmarshalSoapFault(UnmarshallerImpl.java:86) ~[vlsi-core.jar:?]
        at com.vmware.vim.vmomi.client.common.impl.SoapFaultStackContext.setValue(SoapFaultStackContext.java:41) ~[vlsi-client.jar:?]
        at com.vmware.vim.vmomi.client.common.impl.ResponseUnmarshaller.processNextElement(ResponseUnmarshaller.java:127) ~[vlsi-client.jar:?]
        at com.vmware.vim.vmomi.client.common.impl.ResponseUnmarshaller.unmarshal(ResponseUnmarshaller.java:70) ~[vlsi-client.jar:?]
        at com.vmware.vim.vmomi.client.common.impl.ResponseImpl.unmarshalResponse(ResponseImpl.java:284) ~[vlsi-client.jar:?]
        at com.vmware.vim.vmomi.client.common.impl.ResponseImpl.setResponse(ResponseImpl.java:241) ~[vlsi-client.jar:?]
        ... 7 more
2023-06-06T02:20:34.777Z | ERROR | vim-monitor | VcListener.java | 124 | An unexpected error in the changes polling loop
com.vmware.eam.EamRemoteSystemException: Unexpected error communicating with the vCenter server.
        at com.vmware.eam.vim.server.impl.VimRoot.rootOperation(VimRoot.java:106) ~[eam-server.jar:?]
        at com.vmware.eam.vim.server.impl.VimRoot.currentTime(VimRoot.java:78) ~[eam-server.jar:?]
        at com.vmware.eam.vc.VcListener.main(VcListener.java:140) ~[eam-server.jar:?]
        at com.vmware.eam.vc.VcListener.call(VcListener.java:118) [eam-server.jar:?]
        at com.vmware.eam.vc.VcListener.call(VcListener.java:58) [eam-server.jar:?]
        at com.vmware.eam.async.impl.AuditedJob.call(AuditedJob.java:58) [eam-server.jar:?]
        at com.vmware.eam.async.impl.FutureRunnable.run(FutureRunnable.java:55) [eam-server.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_362]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_362]
        at java.lang.Thread.run(Thread.java:750) [?:1.8.0_362]
Caused by: com.vmware.vim.binding.vim.fault.NotAuthenticated: The session is not authenticated.
        at sun.reflect.GeneratedConstructorAccessor57.newInstance(Unknown Source) ~[?:?]
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_362]
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_362]
        at java.lang.Class.newInstance(Class.java:442) ~[?:1.8.0_362]
        at com.vmware.vim.vmomi.core.types.impl.ComplexTypeImpl.newInstance(ComplexTypeImpl.java:174) ~[vlsi-core.jar:?]
        at com.vmware.vim.vmomi.core.types.impl.DefaultDataObjectFactory.newDataObject(DefaultDataObjectFactory.java:25) ~[vlsi-core.jar:?]
        at com.vmware.vim.vmomi.core.soap.impl.unmarshaller.ComplexStackContext.<init>(ComplexStackContext.java:30) ~[vlsi-core.jar:?]
        at com.vmware.vim.vmomi.core.soap.impl.unmarshaller.UnmarshallerImpl$UnmarshallSoapFaultContext.parse(UnmarshallerImpl.java:167) ~[vlsi-core.jar:?]
        at com.vmware.vim.vmomi.core.soap.impl.unmarshaller.UnmarshallerImpl$UnmarshallSoapFaultContext.unmarshall(UnmarshallerImpl.java:105) ~[vlsi-core.jar
:?]
        at com.vmware.vim.vmomi.core.soap.impl.unmarshaller.UnmarshallerImpl.unmarshalSoapFault(UnmarshallerImpl.java:92) ~[vlsi-core.jar:?]
        at com.vmware.vim.vmomi.core.soap.impl.unmarshaller.UnmarshallerImpl.unmarshalSoapFault(UnmarshallerImpl.java:86) ~[vlsi-core.jar:?]
        at com.vmware.vim.vmomi.client.common.impl.SoapFaultStackContext.setValue(SoapFaultStackContext.java:41) ~[vlsi-client.jar:?]
        at com.vmware.vim.vmomi.client.common.impl.ResponseUnmarshaller.processNextElement(ResponseUnmarshaller.java:127) ~[vlsi-client.jar:?]
        at com.vmware.vim.vmomi.client.common.impl.ResponseUnmarshaller.unmarshal(ResponseUnmarshaller.java:70) ~[vlsi-client.jar:?]
        at com.vmware.vim.vmomi.client.common.impl.ResponseImpl.unmarshalResponse(ResponseImpl.java:284) ~[vlsi-client.jar:?]
        at com.vmware.vim.vmomi.client.common.impl.ResponseImpl.setResponse(ResponseImpl.java:241) ~[vlsi-client.jar:?]
        at com.vmware.vim.vmomi.client.http.impl.HttpExchangeBase.parseResponse(HttpExchangeBase.java:286) ~[vlsi-client.jar:?]
        at com.vmware.vim.vmomi.client.http.impl.HttpExchange.invokeWithinScope(HttpExchange.java:54) ~[vlsi-client.jar:?]
        at com.vmware.vim.vmomi.client.http.impl.TracingScopedRunnable.run(TracingScopedRunnable.java:24) ~[vlsi-client.jar:?]
        at com.vmware.vim.vmomi.client.http.impl.HttpExchangeBase.run(HttpExchangeBase.java:60) ~[vlsi-client.jar:?]
        at com.vmware.vim.vmomi.client.http.impl.HttpProtocolBindingBase.executeRunnable(HttpProtocolBindingBase.java:229) ~[vlsi-client.jar:?]
        at com.vmware.vim.vmomi.client.http.impl.HttpProtocolBindingImpl.send(HttpProtocolBindingImpl.java:114) ~[vlsi-client.jar:?]
        at com.vmware.vim.vmomi.client.common.impl.MethodInvocationHandlerImpl$CallExecutor.sendCall(MethodInvocationHandlerImpl.java:693) ~[vlsi-client.jar:
?]
        at com.vmware.vim.vmomi.client.common.impl.MethodInvocationHandlerImpl$CallExecutor.executeCall(MethodInvocationHandlerImpl.java:674) ~[vlsi-client.j
ar:?]
        at com.vmware.vim.vmomi.client.common.impl.MethodInvocationHandlerImpl.completeCall(MethodInvocationHandlerImpl.java:371) ~[vlsi-client.jar:?]
        at com.vmware.vim.vmomi.client.common.impl.MethodInvocationHandlerImpl.invokeOperation(MethodInvocationHandlerImpl.java:322) ~[vlsi-client.jar:?]
        at com.vmware.vim.vmomi.client.common.impl.MethodInvocationHandlerImpl.invoke(MethodInvocationHandlerImpl.java:195) ~[vlsi-client.jar:?]
        at com.sun.proxy.$Proxy51.currentTime(Unknown Source) ~[?:?]
        at com.vmware.eam.vim.server.impl.VimRoot.rootOperation(VimRoot.java:101) ~[eam-server.jar:?]
        ... 9 more
2023-06-06T02:20:34.778Z |  INFO | vim-monitor | VcListener.java | 125 | Full stack trace: com.vmware.eam.EamRemoteSystemException: Unexpected error communic
ating with the vCenter server.
        at com.vmware.eam.vim.server.impl.VimRoot.rootOperation(VimRoot.java:106)
        at com.vmware.eam.vim.server.impl.VimRoot.currentTime(VimRoot.java:78)
        at com.vmware.eam.vc.VcListener.main(VcListener.java:140)
        at com.vmware.eam.vc.VcListener.call(VcListener.java:118)
        at com.vmware.eam.vc.VcListener.call(VcListener.java:58)
        at com.vmware.eam.async.impl.AuditedJob.call(AuditedJob.java:58)
        at com.vmware.eam.async.impl.FutureRunnable.run(FutureRunnable.java:55)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: (vim.fault.NotAuthenticated) {
   faultCause = null,
   faultMessage = null,
   object = ManagedObjectReference: type = ServiceInstance, value = ServiceInstance, serverGuid = f0ee8343-1721-4676-9069-1a837625c60b,
   privilegeId = ,
   missingPrivileges = null
}
        at sun.reflect.GeneratedConstructorAccessor57.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at java.lang.Class.newInstance(Class.java:442)
        at com.vmware.vim.vmomi.core.types.impl.ComplexTypeImpl.newInstance(ComplexTypeImpl.java:174)
        at com.vmware.vim.vmomi.core.types.impl.DefaultDataObjectFactory.newDataObject(DefaultDataObjectFactory.java:25)
        at com.vmware.vim.vmomi.core.soap.impl.unmarshaller.ComplexStackContext.<init>(ComplexStackContext.java:30)
        at com.vmware.vim.vmomi.core.soap.impl.unmarshaller.UnmarshallerImpl$UnmarshallSoapFaultContext.parse(UnmarshallerImpl.java:167)
        at com.vmware.vim.vmomi.core.soap.impl.unmarshaller.UnmarshallerImpl$UnmarshallSoapFaultContext.unmarshall(UnmarshallerImpl.java:105)
        at com.vmware.vim.vmomi.core.soap.impl.unmarshaller.UnmarshallerImpl.unmarshalSoapFault(UnmarshallerImpl.java:92)
        at com.vmware.vim.vmomi.core.soap.impl.unmarshaller.UnmarshallerImpl.unmarshalSoapFault(UnmarshallerImpl.java:86)
        at com.vmware.vim.vmomi.client.common.impl.SoapFaultStackContext.setValue(SoapFaultStackContext.java:41)
        at com.vmware.vim.vmomi.client.common.impl.ResponseUnmarshaller.processNextElement(ResponseUnmarshaller.java:127)
        at com.vmware.vim.vmomi.client.common.impl.ResponseUnmarshaller.unmarshal(ResponseUnmarshaller.java:70)
        at com.vmware.vim.vmomi.client.common.impl.ResponseImpl.unmarshalResponse(ResponseImpl.java:284)
        at com.vmware.vim.vmomi.client.common.impl.ResponseImpl.setResponse(ResponseImpl.java:241)
        at com.vmware.vim.vmomi.client.http.impl.HttpExchangeBase.parseResponse(HttpExchangeBase.java:286)
        at com.vmware.vim.vmomi.client.http.impl.HttpExchange.invokeWithinScope(HttpExchange.java:54)
        at com.vmware.vim.vmomi.client.http.impl.TracingScopedRunnable.run(TracingScopedRunnable.java:24)
        at com.vmware.vim.vmomi.client.http.impl.HttpExchangeBase.run(HttpExchangeBase.java:60)
        at com.vmware.vim.vmomi.client.http.impl.HttpProtocolBindingBase.executeRunnable(HttpProtocolBindingBase.java:229)
        at com.vmware.vim.vmomi.client.http.impl.HttpProtocolBindingImpl.send(HttpProtocolBindingImpl.java:114)
        at com.vmware.vim.vmomi.client.common.impl.MethodInvocationHandlerImpl$CallExecutor.sendCall(MethodInvocationHandlerImpl.java:693)
        at com.vmware.vim.vmomi.client.common.impl.MethodInvocationHandlerImpl$CallExecutor.executeCall(MethodInvocationHandlerImpl.java:674)
        at com.vmware.vim.vmomi.client.common.impl.MethodInvocationHandlerImpl.completeCall(MethodInvocationHandlerImpl.java:371)
        at com.vmware.vim.vmomi.client.common.impl.MethodInvocationHandlerImpl.invokeOperation(MethodInvocationHandlerImpl.java:322)
        at com.vmware.vim.vmomi.client.common.impl.MethodInvocationHandlerImpl.invoke(MethodInvocationHandlerImpl.java:195)
        at com.sun.proxy.$Proxy51.currentTime(Unknown Source)
        at com.vmware.eam.vim.server.impl.VimRoot.rootOperation(VimRoot.java:101)
        ... 9 more

2023-06-06T02:20:34.778Z |  INFO | vim-monitor | VcListener.java | 131 | Retrying in 10 sec.

Homelab: Hypervisors – Part 2 – VMware

What I want to say is, after deciding it was time to move to VMware and attempt to use vSAN instead of Storage Spaces Direct (S2D) I wanted to research the hardware I had and see if it would work on ESXi 7.0. But of course I did not thoroughly read all of the changes vSphere 7.0 has brought. The holiday was approaching and I was going to use this time to do my migration. I had read up on vSAN and knew I needed cache drives. I bought a few small (250GB) NVME drives to put into each system. Getting those drives installed took a day because I needed to create a custom 3D printed mount. That would give me a good speed boost for my storage no matter what. Having recently upgraded to 10GB networking, I already had HP and SolarFlare 10gb networking cards. The time came and I copied all of the VMs I had in Microsoft VHDX format to my NAS (which wasn’t getting changed), then unplugged the first Hypervisor, and attempted a ESXi 7.0 Install.

One hardware change I should note, I am using USB 3.0 128GB thumb drives for the ESXi OS. This also allowed me to leave the original Windows drive untouched, allowing for easy rollback if this was a nightmare. I put the ESXi 7.0 disk into the first system AND! Error, no networking card found… I started searching online and quickly found a lot of people pointing to this article. ESXi 7.0 had cut a ton of network driver support, everything from the Realtek motherboard NIC to the 10GB SolarFlare card would not be supported, with no way around it (I tried). It comes down to 6.x had a compatibility layer in it where Linux drivers could be used if there were not native drivers, 7.0 removes this. I then got a ESXi 6.7 installer (VMware doesn’t allow you to just download older versions on a random account, but Dell still hosts their version) and installed that. Everything came online and started working. Now that I knew the one thing blocking me was that, I installed all my systems with 6.7 while I waited for the 3 new Supermicro AOC-STGN-i2S Rev 2.0 Intel 82599 2-Port 10GbE SFP+ cards I ordered. Using the Intel 82599 chipset, they have wide support. 2 Ports is nice; and, the 2.0 revision of the card is compact allowing them to fit into my cases. So far I recommend them, they also are around $50 on eBay, which is not bad.

I played with a few of the systems, but decided to wait till the new network cards were in a few days later to initialize vSAN and copy all of the data back over. I used this guide, from the same author of the other post about ESXi 7.0 changes to configure the disks in the system how I wanted them. At one point I thought I was stuck, but I just had to have VMware rescan the drives. I setup a vSphere appliance on one of the hosts. This gives me all the cluster functionality, and single webpage to manage all the hosts. Here I an also create a “Distributed Switch” which is a virtual switch template which can be applied to each of the hosts. I can set the vlans I have, and how I want them to work in one place, then deploy it to all the systems easily. This works as long as all your hosts have identical network configurations. After watching a YouTube video or two on vSAN setup I went ahead setting that up. The setup was straight forward, the drives reported healthy, and now I was ready to put some data on it.

A small flag about vSAN, it uses a lot of RAM to manage itself and track which system has what. I was seeing about 10-12 gb of ram used on each of my hosts, that has 32gb to begin with. There are guides online for this, and I believe it can be tweaked. It has to do with how large a cache drive you have, and your total storage. Not a big deal, but if you are running a full cluster, something to be aware of.

Migrating the old VMs from their Hyper-V disk images to VMware was not too difficult. I used qemu-img to convert from VHDX to VMDK. The VMDK images that qemu creates are the desktop version of the VMDK format. VMwares desktop products create slightly different disk images than the server versions. I then unloaded these VMDKs onto the vSAN and used the internal vmkfstools on ESXi Shell to convert those images to the server versions. The Windows systems realized the changes, and did a hardware reset, they worked right away. The Linux systems (mostly CentOS 8) would not boot under any of the SCSI controllers VMware had. After reading online, and a bit of guessing, I booted them with the IDE controller which appeared to be the only one dracut had modules for. Once the systems were online I could do updates, and with the new kernel version they had available they made new initrd images. These images being created on the platform with the new virtual hardware, installed the SCSI controller modules and could then be changed from IDE to SCSI mode.

So far other than the hardware changes that needed to happen, moving to VMware has worked out well. I am using a VMware Users Group license, https://www.vmug.com/, which is perfect for homelabs, and doesn’t break the bank. I am starting to experiment with some of the newer or just more advanced VMware features that I have not used before. We spoke of vSAN, I also have setup DRS (Distributed Resource Scheduler, allowing for VMs to move between hosts as resources are needed), and want to setup a key manager server to play with VM encryption and virtual TPMs.

Now that I am off of that… unsupported… Storage Spaces Direct configuration updates are much easier. I can put a host into maintenance mode, which moves any running VMs, then reboot it and once its back online, things re shuffle. This does mean I need enough space on the cluster for 1/3 of it to be off at a time, but that is ok. I am running 32gb of ram, with 2 empty DIMMS in each system, when the time comes I can inexpensively add more RAM.

If you/your work has a NetApp subscription, there is a NetApp Simulator which is a cool OVA you can deploy on VMware to learn NetApp related things. I was using that at work to learn how to do day to day management of NetApps. Another neat VM image that comes in the form of OVA I found recently is Nextcloud’s appliance. They have a single OVA that has a great flow for taking you through configuring their product.

Overall the VMware setup as been as easy as I thought it could be. Coming from a workplace who runs their management systems without a lot of access, it has been nice having vSphere 7.0. It automatically checks in online, and lets me know when there are updates for different parts of the system.

Homelab: Hypervisors – Part 1 – Hyper-V

For the last year I have been running Microsoft Hyper-V on Server 2019. Due to mounting issues I moved over to VMware vSphere, this first post will discuss my Hyper-V setup and feedback about it; then the next post will speak about my migration and new setup. When I started building out my home setup I was studying to take a Windows Server certification for work, with that and about half of the virtual machines I had at home were Windows, Hyper-V was the choice for hypervisor. One feature that stood out to me was Dynamic Memory on Hyper-V because my home setup was not that large; as well as the automatic virtual machine activation (Microsoft Doc). Later, I was attempting to run Storage Spaces Direct (S2D, SSD would be a confusing acronym so Storage Spaces Direct goes by S2D), except my setup was not supported, which made me run a… not recommended configuration… more on that soon; and I kept having issues around the Hyper-V management tools. I decided it was time to migrate from Hyper-V and S2D to VMware vSphere and vSAN.

(Please note for feedback I am discussing Windows Server 2019 here, and VMware vSphere 7.0)

Selecting a Hypervisor

I wanted to briefly go over a bit more of my thought process when selecting a Hypervisor. I already mentioned some of the reasons it made sense, but I also wanted to mention more of my thoughts. I started the search looking for a Type-1 Hypervisor awhile ago, I was going to be running on a Intel NUC with a few Windows and Linux VMs. Being a homelab, I thought I would look at free options.

Having used Proxmox years ago with a ton of issues I wanted to steer clear of it (looking back this probably was not fair, it was several years since I last used it and I believe it has gotten better); I had also used CitrixHypervisor (formerly XenServer) with many issues including a storage array killing itself randomly in one reboot. One of the requirements I gave myself was to have a real management system, I did not want to run KVM on random Linux hosts. That brought me to the 2 big ones, VMware and Microsoft. VMware has a lot of licensing around different features, but was the system I knew better. I could get a VMware Users group membership for homelabs, and that would take care of the licensing. On the other hand, with me studying for Windows Server tests, and the book speaking of different Windows Server and Hyper-V features, I thought I would give it a try. The following are things I liked about it, and then what turned me away from it.

Great Things About Hyper-V

I want to give a fair overview of my year plus running Hyper-V. There are some great features; dynamic memory allows you to run modern OSes with a upper and lower limit on memory, and then most of the time while the VMs are idling your memory footprint is very low. Another great feature is the earlier mentioned automatic activation, as long as your Hyper-V host (Windows Server 2019 Standard or Datacenter not the free Hyper-V Server) is activated, it can pass that activation to your guests and allow you to run Server 2012+. All of the services are running on Windows, out of the box you get all the benefits there; such as creating group policies for your servers and using that to do a lot of your fleet management. I recently have started using Windows Admin Center, which gives you a single view on all your Windows systems and allows you to update them all in one place. Hyper-V works well if you have a single node, and want to do basic things with it; when you move to doing clustering and advanced storage Hyper-V starts to give you a lot of issues.

Hyper-V Manager on Server 2016 (and 2019)

General Hyper-V Issues

To dive more into the woes I was having with Hyper-V, some of it is my own doing, some of it is the tools. Even before I was running S2D, I was running several Hyper-V boxes each with its own storage. I will go into my issues with S2D soon. Hyper-V’s management tools are not good. You have several options on how you will manage the systems, the first and easiest is Hyper-V Manager. This is a simple program that allows you to 1-1 manage a Hyper-V system. I mean 1-1 because if you have VMs that are part of a failover cluster, you can connect to them here to view them but that is it. Hyper-V Manager only allows you to manage VMs that live on one hypervisor with no redundancy; for casual use, it works. I use it for my primary AD host because I don’t want anything fancy going on with that box, when I need to start everything from scratch, I need AD and DNS to come up cleanly.

Maybe you have outgrown the one off server management and want to move your systems into a cluster. Now its time for Failover Cluster Manager. You add all the servers into a Failover Cluster together, and get through the checks you have to pass. Then there is a wizard to migrate your VMs from Hyper-V Manager into Failover Cluster Manager. One requirement to do this is to have storage that every box in the cluster can use, either S2D or iSCSI (you can do things like Fibre Channel, but I was not going to do that). I used the tool and the VM said all its files were moved onto shared iSCSI storage that all the machines could use. Should be good right? Things seem to be working. Then I would move certain VMs to other hosts and it would fail, just some of them. It came down to either a ISO, or one of the HDD hibernation files, or checkpoints (Microsoft version of Snapshots) being on one of the hosts, and the UI NOT mentioning this. Thus, when the VM tried to load on another system, a file it needed was not there and it could not load. Failover Manager is also fairly simplistic and doesn’t not give you a ton of tools. Again, Windows Admin Center adds some nice info on a standard cluster, but it is not fantastic; leaving you to dig through Powershell to try to manage your Failure Cluster.

On occasion the Virtual Machine Manager service that is in charge of managing the VMs, and gives the interface to monitor, modify, and access the VMs would lock up. Hyper-V Manager and Cluster Manager would show no status for the VMs, and I would have to restart the service. These minor issues would stack up over time.

To manage Hyper-V remotely (meaning from any other system) you need to setup Windows Remote Management, winrm. This system by default uses unencrypted HTTP. Encryption can be turned on with a few commands in the command line, but it creates a cert based on your hostname and IP address. If you have more than one IP, OR you are in a Failover Cluster this means you will be spending a lot of time customizing these certificates because it will just get a cert for your host, and when that node because the Failover Cluster manager, it needs that virtual IP and hostname in the cert. I had to create different certs for that virtual interface and put them on the different nodes manually, there are people in the Microsoft support forum talking about this. Here is an example incase it helps anyone of creating a cluster listener after manually creating a cluster cert.

winrm create winrm/config/listener?Address=IP:192.168.3.8+Transport=HTTPS ‘@{Hostname=”home-cluster.home.ntbl.co”;CertificateThumbprint=”BFCDE6C85A0B12426A44BC3F44236313317C63CC”;ListeningOn=”192.168.3.8″}’

There also is a System Center Manager that is another package you can purchase from Microsoft to manage Hyper-V. Having dealt with System Center to manage Windows systems at work, I did not want to touch that at all. Hyper-V has a lot of things going for it, and the underlying code running VMs works well 99% of the time. I wish Microsoft put more time to grow the tools you use for managing it. Parts of the process like setting up networking on different nodes could be much smoother, in comparison with VMware Distributed Switching. I installed one of the systems I had on Windows Server Core (no user GUI) to learn more about that. If your primary interface needs a VLAN for management, this a painful experience. You have to create the Hyper-V virtual switch and attach your management interface to it and assign the VLAN all from within Powershell. If you need to do it, this is a good resource. Thing like this, and the winrm issues, make Hyper-V feel unpolished even after being in the market for years.

S2D Issues

I wanted to put these systems into a failover cluster, allowing them to move VMs between each other as needed, except then I needed shared storage. I attempted to use iSCSI from my FreeNAS box; alas, with 7, old, spinning drives, the speed was not great with more than a few VMs. Then I thought I had some spare SATA SSDs and I could use S2D to do shared storage. For those who have not attempted to setup S2D, your hard drives have to either be NVME, or an internal HDD controller. The system will refuse to work on any configuration that it does not like. With most of my systems being small form factor PCs, and I am just using a few SATA drives I got USB 3.1 4 bay SATA enclosures. Not optimal but decent speed and it allows me to add a good number of drives to each system without a large expense of a full RAID or SAS controller.

S2D refused to work with these drives. I believe it came down to the controller the USB drives was using and it not signaling something the systems wanted. The drives showing up as Removable also made Windows refuse. There are commands you can run like below that will enable more disk types to work, but I could not get my dries to show up.

(Get-Cluster).S2DBusTypes=4294967295
Powershell command to enable all disk types in Storage Spaces Direct from this article

Then I had an idea, an evil and terrible and great idea. I created 3 VMs, one on each Hyper-V box, then gave the 3 disks of each server to the VM in full. Now I had 3 VMs, each with 3 drives (and a separate OS drive) to run S2D. To the VM OS, it looked like they had 3 SCSI HDDs that was happy to use for S2D. I put these three Windows Servers into a failover cluster together, and setup S2D. Overall setup was not too bad. If you have Windows Admin Center configured it is much easier to setup and use Storage Spaces Direct than the GUI in Windows Server. There are a ton of Powershell commands for configuring S2D and you will probably end up using a bunch of them.

This worked! The systems were in a failover cluster, of their own, and my main failover cluster that controlled the VMs could use it as shared storage. If you use Windows Admin Center you can get nice stats from the Storage cluster about the sync status of the disks. Every time one of the storage nodes reboots, the cluster needs to re-sync itself. There are different RAID levels you can set the S2D setup to, I set it to have 2 additional copies of each set of data, this means each node has a full copy of the data; this uses a lot of space but i can have 1 node run everything (which ended up being overkill).

This setup ran for a while decently, other than the small VM overhead, it was fast and worked. The issues arose when the second Tuesday of the month came around and I needed to do patching. The storage network was sitting on top of the Hypervisors, and they didn’t really understand that. I often ran into problems where I would shutdown one of the storage nodes to patch it, and patch the host, then the other 2 nodes would lock up or say all storage was lost. This would occur even when preemptively moving who was the main node, and prepping to restart. With storage dropping out from under all the VMs, they would die and need to manually be rebooted or repaired. This made me start to look for a new setup after a few of these months.

All in all, I ended up running for over a year about 5 Windows VMs and 5 Linux VMs on Hyper-V with good uptime. One benefit of Hyper-V is you get the hardware compatibility of Windows, which is vast. The big downside of Hyper-V is the tools around it. At times they seem unfinished, at other times buggy. My next post will be about the migration, and my experience with vSphere 7.0!

Resources

Good guide for Storage Spaces Direct

http://woshub.com/configure-storage-spaces-direct-s2d-windows-server-2016/

Booting VMware vSphere ESXi 7.0 on Certain Dell Hardware

I recently attempted to boot a Dell Precision M6800 into ESXi 7.0u1 to test some functionality before going to prod. Unfortunately this was met with “Invalid Partition Table”, switching between UEFI and BIOS boot didn’t seem to fix it giving “No boot device available” instead. After searching online I found this, https://communities.vmware.com/t5/ESXi-Discussions/quot-Invalid-Partion-Table-quot-Error-booting-ESXi-7-from-USB/m-p/1823852 which had comments such as “just dont run on a laptop” which was not very helpful.
I spent a chunk of time playing with the partitions and seeing how they were configured. I noticed when I went into the UEFI on the laptop it said it couldn’t find any file systems available, but when I loaded Windows or Linux on the system, the UEFI could see those boot partitions. I tried updating the firmware like Dell recommended, with no change. I then realized the ESXi 7.0 image is FAT16 for the EFI partition, while all other EFI partitions I have seen are FAT32.

I copied the files and folder out of the boot partition, reformatted it with FAT32 instead of FAT16, marked it as EFI type (ESP in Gparted), and moved the files back. The system booted fine the first time, with ESXi running happily. If you need boot ESXi on a Dell M6800, or M4800, or other give that a try. If this worked or didn’t work for you leave a comment below.

Fixing CentOS 6.6 Kickstart Issues

I recently have been working on a system automating CentOS 6 installs for servers. When upgrading to 6.6 my test environment (VMWare Fusion) stopped working. I got a hard kernel panic and halt on loading. Now VMware forums and CentOS site, have posts about work arounds for this. A bunch of them are complex and involve changing modules around, and other files. There is a very easy fix for this, and its detailed below.

NOTE: I am running VMware Fusion, so I will open a package, in Windows and Linux you dont have to do this, just go to the folder.

  1. Stop the VM
  2. Find the VM files
    1. For Fusion there will be a %Your VM%.vmwarevm file, you have to right click that and “Show package contents”
  3. There should be a %Your VM%.vmx file, open that with a text editor
    1. If you are on a Mac, or other machine that likes to do smart quotes, make sure to use a program like vim or Sublime Text that doest add “smart quotes”
  4. A line will read: ethernet0.virtualDev = “e1000e”, change that to ethernet0.virtualDev = “e1000”, just remove the last e. This changes the card from a E1000 in enhanced mode to a normal one. Now CentOS 6.6 will boot.

Here are some place people have discussed issues:

https://communities.vmware.com/message/2443777