VMM Host not responding – WinRM Error and access is denied

If you have a Virtual Host in Virtual Machine Manager that is not responding, and forcing a manual refresh returns an error like this:

Error (2910)
VMM does not have appropriate permissions to access the resource C:\Windows\system32\qmgr.dll on the server.
Access is denied (0x80070005)

It can often be remedied by one of the following: re-installing the VMM agent, restarting the Virtual Machine Manager agent and WMI services, or restarting the virtual host.  It is also worth making sure your hosts are fully patched and up to date.
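If you want to try the service restarts without rebooting the whole host, a minimal sketch from an elevated PowerShell prompt on the host looks like this (assuming the default service names SCVMMAgent and Winmgmt):

# Restart the VMM agent and the WMI service on the affected host (default service names assumed)
Restart-Service -Name SCVMMAgent
Restart-Service -Name Winmgmt -Force   # -Force also restarts the services that depend on WMI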

Occasionally I see a host where this doesn’t work and, no matter what I try, it remains as “not responding” in VMM.  In my case the cause appears to be a broken WinRM configuration.  You can be fooled into thinking WinRM is set up correctly because “winrm /quickconfig” reports it is already configured and the WinRM service is running.

It looks like all the “winrm /quickconfig” command does is check that WinRM has been enabled; it won’t reset any other incorrect or broken settings.
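If you want to see the full configuration rather than just whether the listener is enabled, you can dump it on both a known-good host and the broken one and compare the output:

winrm get winrm/config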

Comparing the WinRM configuration and registry of an identical working host against a “not responding” host, I have found that the following commands correct the deviated settings and usually result in a host that responds to VMM again.

reg add HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System /v LocalAccountTokenFilterPolicy /t REG_DWORD /d 1 /f

winrm set winrm/config/service/auth @{CredSSP="True"}
winrm set winrm/config/winrs @{AllowRemoteShellAccess="True"}
winrm set winrm/config/winrs @{MaxMemoryPerShellMB="2048"}
winrm set winrm/config/client @{TrustedHosts="*"}
winrm set winrm/config/client/auth @{CredSSP="True"}

Be sure to run these on the affected host in an admin command prompt.
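To check the values have actually taken effect, you can read each section back:

winrm get winrm/config/service/auth
winrm get winrm/config/winrs
winrm get winrm/config/client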

Cannot deploy Virtual Machines via App Controller or Virtual Machine Manager

I found myself unable to deploy Virtual Machine templates via App Controller or Virtual Machine Manager.  The detailed error was available when attempting the deployment via VMM and stated:

“The projected CPU utilization exceeds the CPU utilization of 0% specified at the host reserve level”

All the hosts had zero stars and I couldn’t continue.  Normally this is a useful message, as it prevents me from putting too many virtual machines on my hosts and stretching them too far.  In this instance, however, I knew we would be OK: we need around 50 VMs on each host and they don’t use much CPU.  It was possible to create more virtual machines manually in Hyper-V and then manage them in VMM post-deployment, but that defeats the point of having a scripted template deployment and App Controller set up so users can deploy their own test VMs.

I found that there is a somewhat hidden option in the host reserve settings, available only in PowerShell.  You can see what yours are configured to use with the Get-SCHostReserve PowerShell command.  The setting you need to change to bypass CPU reserves is the “CPUReserveOff” parameter.

Get-SCHostReserve

So as you can see, I have my CPU reserve level set to 0%, but when VMM evaluates the deployment, if it believes there will be less than 0% CPU available, it still says no.

You can change this with the Set-SCHostReserve command.

Get-SCHostReserve -VMHostGroup "your host group here" | Set-SCHostReserve -CPU -Enabled $false
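To confirm the change took, you can read the host reserve settings back for the same host group; as a rough check (the exact property names can vary between VMM versions, so treat the wildcard as an assumption):

Get-SCHostReserve -VMHostGroup "your host group here" | Format-List *CPU*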


The wizard still shows “The projected CPU utilization exceeds the CPU utilization of 0% specified at the host reserve level”, but it now lets you continue and deploy anyway.

 

WMI reset failed

Recently I have found myself needing to reset WMI to resolve various problems, such as the SCCM client failing to install or be detected and disappearing cluster namespaces.  Resetting the WMI repository is a last resort and should only be attempted once you have exhausted the other options, e.g. restarting the WMI service or restarting the server.

The command to reset the WMI repository must be run in an admin / elevated command prompt:

winmgmt /resetrepository

However sometimes this fails with this error:

C:\Users\adminuser>Winmgmt /resetrepository
WMI repository reset failed

Error code:     0x8007041B
Facility:       Win32
Description:    A stop control has been sent to a service that other running services are dependent on.

While it is possible to work around this by stopping the dependent services manually, or even by editing the registry so that nothing depends on the WMI service (as I have seen suggested elsewhere), there is a much easier solution.

1. Launch an elevated (admin) powershell window.

2. Enter the following command.

Stop-Service winmgmt -Force; winmgmt /resetrepository

3. Restart the computer you just reset the WMI repository on.
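After the reboot it is worth sanity-checking the repository; winmgmt has a built-in consistency check for this:

winmgmt /verifyrepository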

10 tips for a Happy Hyper-V or VMware Network

 

  1. Make sure your external out-of-band access is working (if you are lucky enough to have it, that is).  There is nothing worse than having to trek to a remote site or drag someone out of bed just to go and press F1 on a keyboard.  So test your iLO or IP KVM now and make sure it works over your remote access solution as well.
  2. Keep an eye on those disks (or better yet, have an automated solution to monitor them for you).  Not just on the guest machines but on your cluster shared volumes.  It is far better to forecast a growth trend and plan for disk growth than to run out unexpectedly and suffer an outage or, worse, corruption.  A quick way to check CSV free space is sketched after this list.
  3. Test those backups.  Virtualization makes it easier than ever to take portable backups of all your servers but have you ever tested them?  It is much better to find out there is a problem in your business continuity plan in a test when everything is working than after a disaster so go plan a test now.
  4. How many hosts can you lose?  It is tempting to use all the available CPU and RAM on all your hosts, but what happens when you have a failure?  Even keeping just enough spare capacity for the loss of one host (N+1) can be a risky position.  This is especially true with Hyper-V: if you want to upgrade to the latest 2012 hypervisor you will need to start again with a new cluster and move hosts over one at a time, so if you only have N+1 capacity, during a migration you may well have no spare capacity to cope with the loss of a host at all.
  5. So where will we restore all these backups to?  For all but the largest and most cash-rich organizations an off-site backup datacentre is likely to be a dream.  Get an agreement in place now for new hardware in the event of a DR situation, or have an account ready with Azure/Amazon/Rackspace etc. to host all your guest machines.  Once again, test it, as the devil is in the details, and have as much as possible pre-configured; there is nothing worse than battling firewall rules when a configuration could have been prepared and tested earlier.
  6. Updates.  Plan ahead how you are going to deploy updates and when.  Are you going to have them install automatically, or will you need to test them in a dev environment first and deploy them to production later?  Either way, think about it now and plan accordingly; no one likes downtime, and it’s always a good idea to keep all of your hosts on the same patch level.
  7. Document everything.  Something which during the initial build you know like the back of your hand will be quickly forgotten in a few months when you need to re-visit it for a change.  What happens when you leave for a new job or fall under a bus?  Your current employer will still need to keep things running, and it’s never nice for a newcomer to walk into an undocumented environment where everything has to be worked out from scratch.
  8. Log changes.  If you have an official change control procedure then use it, but even if your organization doesn’t have any official change control, write down any changes you are making, in a helpdesk call, email or anywhere you can refer to if required.  Better yet try to make changes in a pair.  If both you and a colleague agree on a change it is less likely that you have forgotten something crucial and when you leave for your 3 week jungle adventure holiday there is someone else in the team who knows what was done.
  9. Licensing.  Make sure your Windows hosts are all activated and any VMware hosts have the required license keys installed.  You don’t want your grace period to run out and leave you in the lurch.  You have bought licenses, haven’t you?
  10. Security, access and auditing.  You should know exactly who has access to what and have auditing enabled for all changes.  Not so that you can apportion blame but so you know who to talk to about a particular change or can easily spot unauthorised or unexpected changes should they occur.  Also “have a go” at your hosts and guests, check what services they have available and if necessary get a professional in to check your security.  It is a lot nicer when a penetration tester finds a hole than a malicious hacker.
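For tip 2, here is a minimal sketch of the kind of check you could schedule against your cluster shared volumes (this assumes the FailoverClusters PowerShell module is available on a cluster node; adjust to taste):

# List each Cluster Shared Volume with its size and free space (FailoverClusters module assumed)
Import-Module FailoverClusters
Get-ClusterSharedVolume | ForEach-Object {
    $partition = $_.SharedVolumeInfo.Partition
    [pscustomobject]@{
        Volume  = $_.Name
        SizeGB  = [math]::Round($partition.Size / 1GB, 1)
        FreeGB  = [math]::Round($partition.FreeSpace / 1GB, 1)
        FreePct = [math]::Round($partition.PercentFree, 1)
    }
}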

KDC Authentication problems with 2003 to 2008 domain functional level

Recently I have had problems connecting to the console on a number of 2008 R2 Hyper-V guest virtual machines.  The error was “An Authentication Error Has Occurred.  The Encryption Type Requested Is not supported by the KDC”.  At the same time a single Exchange 2010 server failed with event IDs 2102, 2103, 2114 and 9106, all reporting LDAP problems, non-responding domain controllers and global catalogs:

Process MSEXCHANGEADTOPOLOGYSERVICE.EXE (PID=1696). Topology discovery failed, error 0x80040952 (LDAP_LOCAL_ERROR (Client-side internal error or bad LDAP message)). Look up the Lightweight Directory Access Protocol (LDAP) error code specified in the event description. To do this, use Microsoft Knowledge Base article 218185, “Microsoft LDAP Error Codes.” Use the information in that article to learn more about the cause and resolution to this error. Use the Ping or PathPing command-line tools to test network connectivity to local domain controllers.

Process STORE.EXE (PID=4084). All Global Catalog Servers in forest DC=xxx,DC=xx,DC=xx are not responding:

Process STORE.EXE (PID=4084). All Domain Controller Servers in use are not responding:

Attempting to open the Exchange Management Console on the local server console ended with an HTTP server error status 500 and “Kerberos” authentication failed.

The Exchange server was able to ping and resolve all DNS names correctly, and the problem went away on restarting, only to recur within 24 hours or so.

The rather simple resolution in the end turned out to be restarting the Kerberos Key Distribution Center (KDC) service on all domain controllers.  While restarting all the domain controllers in their entirety is also a good idea, it isn’t always possible (or desirable) in a live production environment.
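If you have a number of domain controllers, the restart can be scripted; a rough sketch, assuming the ActiveDirectory PowerShell module is installed and PowerShell remoting is enabled on the DCs (the underlying service name is Kdc):

# Restart the Kerberos Key Distribution Center service on every domain controller
Import-Module ActiveDirectory
Get-ADDomainController -Filter * | ForEach-Object {
    Invoke-Command -ComputerName $_.HostName -ScriptBlock { Restart-Service -Name Kdc }
}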