Add virtual machine host fails with error 20408

I recently had a problem adding a host to a VMM server. All the obvious things had been checked: WinRM was enabled, firewall rules were in place, the service account had admin rights and DNS was correct.

Still, every time I attempted to add the host an error occurred:

“Error (20408) VMM could not get the specified instance Microsoft:{668f165d-4dae-bcb6-5007ff1fc2e8} of class http://schemas.microsoft.com/wbem/wsman/1/wmi/root/standardcimv2/MSFT_NetAdapterRssSettingData on the server server.fqdn. The operation failed with error NO_PARAM”

In this instance the server was a 2016 one which had been upgraded from 2012 R2. The fix was bizarre: save all the VMs and remove the vswitches so that only the normal physical adapters remain, then recreate the vswitches. The config was identical, but clearly something behind the scenes was wrong and recreating the vswitches fixed it. Retrying the same job in VMM resulted in success and the host was added.
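If you need to script the vswitch rebuild, a minimal PowerShell sketch on the host looks something like the below. The switch and adapter names are hypothetical, so record your real settings first and make sure every VM is saved before removing anything:

# Record the current switch config so it can be recreated exactly
Get-VMSwitch | Select-Object Name, SwitchType, NetAdapterInterfaceDescription, AllowManagementOS

# Remove and recreate the switch (hypothetical names)
Remove-VMSwitch -Name "vSwitch01" -Force
New-VMSwitch -Name "vSwitch01" -NetAdapterName "Ethernet 2" -AllowManagementOS $true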

WsusPool keeps stopping and the console shows Reset Server Node

I recently found myself in a situation where WSUS would only work for a few minutes or even seconds at a time. A restart or IISReset could bring it back for a few minutes, but it would soon stop again. The Configuration Manager console didn’t show any errors, but it also could not see any new updates.

The event log contained this message:

The WSUS administration console was unable to connect to the WSUS Server via the remote API.

Eventually the fix was to increase the amount of memory available to the app pool from the default 1843200 KB – you can set this to 0 so there is no limit, or to a higher sensible limit. After doing this and running an IISReset, the app pool remained running and I was able to synchronize new updates as well as service updates to clients.

To do this, open up IIS Manager and click the plus by your server name, then on “Application Pools”. Next right-click on WsusPool, then left-click on “Advanced Settings”, scroll down and locate “Private Memory Limit (KB)” near the bottom, and edit this value to 0 or something higher.
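If you prefer the command line, the same change can be made with appcmd (the value is in KB, and 0 means no limit):

%windir%\system32\inetsrv\appcmd.exe set apppool "WsusPool" /recycling.periodicRestart.privateMemory:0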

Meltdown – Patching CentOS

The patches are out for CentOS 5/6/7 and you can install them with the normal update command:

yum update

and then restarting.
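Once the server is back up, it is worth confirming it actually booted into the new kernel:

uname -r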

To check the patches are installed run:

rpm -q --changelog kernel | egrep 'CVE-2017-5715|CVE-2017-5753|CVE-2017-5754'

and make sure you have entries for all three CVE numbers.

Broken libc error: Can’t exec “locale”: No such file or directory at /usr/share/perl5/Debconf/Encoding.pm line 16.

While trying to fix an Ubuntu 14 server that someone (something?) had managed to shoehorn a broken/old version of libc-bin onto, I ran into this error when trying to run “apt-get install” or “apt-get upgrade”:

Can't exec "locale": No such file or directory at /usr/share/perl5/Debconf/Encoding.pm line 16.
Use of uninitialized value $Debconf::Encoding::charmap in scalar chomp at /usr/share/perl5/Debconf/Encoding.pm line 17.
Preconfiguring packages ...
dpkg: warning: 'ldconfig' not found in PATH or not executable
dpkg: error: 1 expected program not found in PATH or not executable
Note: root's PATH should usually contain /usr/local/sbin, /usr/sbin and /sbin
E: Sub-process /usr/bin/dpkg returned an error code (2)

So you can’t reinstall or repair the libc package, because the package manager depends on it working in the first place.

Fortunately we can download and install the package ourselves:

apt-get download libc-bin
dpkg -x libc-bin*.deb unpackdir/
sudo cp unpackdir/sbin/ldconfig /sbin/

This is enough to get apt-get working again, and we can then reinstall the package properly and upgrade:

sudo apt-get install --reinstall libc-bin
sudo apt-get install -f
sudo apt-get upgrade
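As a final check, running “locale” should now return normal output instead of the exec error, confirming libc-bin is healthy again:

locale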

SCOM 2016 Domain controllers agent status greyed out

I have noticed with Operations Manager 2016 that by default the agent enters a grey state on all domain controllers. This looks to be caused by a permissions problem with the Local System account (or your alternative, if you have configured one). Fortunately it is a simple fix. Assuming you have WinRM set up and working and have administrator access, the following should resolve the issue for you:

1. Connect to the server with the grey status:

WINRS -r:MYDCNAME01 cmd.exe

2. Change directory to the Agent location:

cd "C:\Program Files\Microsoft Monitoring Agent\Agent"

3. Use the HSLockdown program to permit your user account:

HSLockdown.exe /A "NT AUTHORITY\SYSTEM"

and/or

HSLockdown.exe /A "mydomain\someaccount"

4. Stop and start the health service:

net stop healthservice
net start healthservice
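If you want to verify the change, HSLockdown can also list the accounts it currently allows or denies (from memory, the /L flag does this):

HSLockdown.exe /L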

Your agent status should shortly turn green. Happy monitoring!

Quickly check the dates on a certificate from the command line in Linux

Sometimes you need to quickly check an endpoint or site and grab the dates a certificate is valid for / when it expires. Of course, you could just log onto the server in question and inspect the certificate, or try to coax your browser into letting you view the certificate properties. That said, this is often quicker, and it could be handy as part of a larger script, check or automation piece.

This quick one-liner will show us the dates the certificate on centos.org is valid for; just change the site name and port to those of the endpoint you want to check.

[root@host ~]# echo | openssl s_client -connect centos.org:443 2>/dev/null | openssl x509 -noout -dates
notBefore=Jul 29 00:00:00 2014 GMT
notAfter=Aug  2 12:00:00 2017 GMT
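One caveat: if the endpoint serves several sites from one IP, you may need to tell s_client which hostname you are after (SNI), otherwise you can get the wrong or default certificate back:

echo | openssl s_client -connect centos.org:443 -servername centos.org 2>/dev/null | openssl x509 -noout -dates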

ADFS Configuration Wizard Fails with Error “The certificates with the CNG private key are not supported”

When running the ADFS Configuration Wizard or renewing the service communications certificate, you will get a “The certificates with the CNG private key are not supported” error unless the certificate was created with a legacy, non-CNG key. There is a useful blog post here https://blogs.technet.microsoft.com/mspfe/2013/11/29/adfs-configuration-wizard-fails-with-error-the-certificates-with-the-cng-private-key-are-not-supported/ on what to do if you are using a Microsoft certificate authority.

Assuming you are using someone else’s CA, the following steps can be used to get a CSR and a legacy, non-CNG private key (this will work post-SHA-1 sunset):

1. Run an MMC and add the local computer’s certificate store.

2. Expand Personal and Certificates, right-click on Certificates > All Tasks > Advanced Operations > Create Custom Request.

3. Click on Next then select “Proceed without enrollment policy” and Next again.

4. Change the template to “(no template) Legacy key”

5. Expand the Details drop-down, click on Properties and make sure to set the correct CN, DNS names, country code, etc. as required. You must also set the key size to 2048 or higher, and you may want to mark the key as exportable if you have other servers that need to share the same private key.

6. Click through and save the CSR and provide this to your CA.

7. When you have the certificate from the CA, import it to the personal store on this computer.

8. Run this command in PowerShell to determine the CertificateHash of the new certificate:

dir cert:\localmachine\my

Review the list of returned certs and note the Thumbprint of the new one.
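If the store is busy, you can narrow the list down; the subject filter here is just an example, so adjust it to match your certificate:

dir cert:\localmachine\my | Where-Object { $_.Subject -like "*adfs*" } | Select-Object Subject, NotAfter, Thumbprint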

9. You can now set the service to use this certificate. While the GUI will let you select the service certificate, the http.sys-hosted SSL endpoint can only be updated in PowerShell, so you might as well do both that way.

Set-AdfsCertificate -CertificateType Service-Communications -Thumbprint <thumbprint>

Set-AdfsSslCertificate -Thumbprint <thumbprint>

If you have any doubt as to what the service is configured to use, either before or after the change, you can run the equivalent Get- commands: Get-AdfsCertificate and Get-AdfsSslCertificate.

Linux file system is full, but you can’t find any large files? – When df and du don’t agree

Often df and du do not agree: df reports the disk space that is used by reading the filesystem metadata, while du and ncdu report the disk space that is used by walking the directory tree. Reading the whole tree is slower, but it gives you a better picture of where the data is. I recently came across a situation where SNMP was reporting a disk as nearly full, and sure enough df -h showed that things were nearly full:

root@host:~# df -h
Filesystem                          Size  Used Avail Use% Mounted on
udev                                 16G     0   16G   0% /dev
tmpfs                               3.2G   17M  3.2G   1% /run
/dev/mapper/ubuntu1404lts--vg-root  8.5G  7.5G  587M  93% /
tmpfs                                16G  472K   16G   1% /dev/shm
tmpfs                               5.0M     0  5.0M   0% /run/lock
tmpfs                                16G     0   16G   0% /sys/fs/cgroup
/dev/sda1                           236M   87M  137M  39% /boot

While du shows a different picture:

root@host:~# du -Lsh /
5.4G /

So df thinks 7.5G is used while du thinks only 5.4G is in use. Where is the missing 2.1G?

Initially I thought this could be due to hidden files or areas the process cannot read, but it turned out to be something much simpler: a file had been deleted while an active process was still writing to it. The file is hidden from utilities like du because it is unlinked, but the space is not actually released until the process closes the file. Running lsof +L1 will show all open files that are unlinked.

For example:

root@host:~# lsof +L1
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME
dockerd 902 root 13r REG 252,0 1691426449 0 266537 /var/lib/docker/containers/d3569390cd7fed1eadba67627-json.log (deleted)
dockerd 902 root 14r REG 252,0 1691426449 0 266537 /var/lib/docker/containers/d3569390cd7fed1eadba678627-json.log (deleted)
dockerd 902 root 17w REG 252,0 1691426449 0 266537 /var/lib/docker/containers/d3569390cd7fed1eadba678627-json.log (deleted)
mysqld 924 mysql 4u REG 252,0 0 0 130242 /tmp/ib9FrYkL (deleted)
mysqld 924 mysql 5u REG 252,0 0 0 132358 /tmp/ibsW1bdg (deleted)
mysqld 924 mysql 6u REG 252,0 0 0 132359 /tmp/ibPi2p5K (deleted)
mysqld 924 mysql 7u REG 252,0 0 0 132360 /tmp/ibuTFORK (deleted)
mysqld 924 mysql 11u REG 252,0 0 0 132361 /tmp/ibH3DXVf (deleted)

The solution then becomes obvious: restart the server, process or service that is writing to these files.
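If a restart is not possible right away, a stop-gap that often works is truncating the deleted file via the proc filesystem, using the PID and FD columns from the lsof output above (the trailing r/w/u on the FD is the access mode, not part of the number). For the first dockerd log that would be:

: > /proc/902/fd/13

The process keeps its file descriptor, but the blocks are freed immediately. Obviously only do this to files you are happy to empty, such as logs.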

Windows 2012 Dedupe – huge chunk store and 0%

One of the best new features in 2012 was file de-duplication. That said, it does sometimes behave a bit strangely under some workloads. I recently faced an issue where a 40TB volume with de-duplication enabled resulted in a huge chunk store that was using more space than the original data!

At a glance it looks like the best thing to do is turn off dedupe for the volume, but all this seems to do is disable further dedup work; anything that is already deduplicated will remain so. I found the best/fastest way to “re-hydrate” your data and get rid of the chunk store (you could just format the volume if you don’t need the data) is to leave dedupe enabled but set an exclusion on the root.
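The exclusion can be set in the volume’s dedup settings in Server Manager; Set-DedupVolume should be able to do the same from PowerShell, though I have only set a root exclusion through the GUI, so treat this as a sketch and verify the result (assuming drive letter F:):

Set-DedupVolume -Volume "F:" -ExcludeFolder "F:\"
Get-DedupVolume -Volume "F:" | Format-List Volume, ExcludeFolder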

Then run the commands below in PowerShell (assuming drive letter F:):

Start-DedupJob -Volume "F:" -Type Unoptimization -Memory 50

Then run:

Start-DedupJob -Volume "F:" -Type GarbageCollection -Memory 50

You can then monitor the size of the chunk store, and/or see the progress of any dedup jobs with this command:

Get-DedupJob

Do bear in mind the increased IO and server load while this runs; it may be best to start it out of hours. Please also note that the unoptimization job will only actually re-hydrate your files if dedupe is still enabled.

VMM Host not responding – WinRM Error and access is denied

If you have a Virtual Host in Virtual Machine Manager that is not responding, and forcing a manual refresh returns an error like this:

Error (2910)
VMM does not have appropriate permissions to access the resource C:\Windows\system32\qmgr.dll on the server.
Access is denied (0x80070005)

It can often be remedied by one of the following: re-installing the VMM agent, restarting the Virtual Machine Manager agent and WMI services, or restarting the virtual host. It is also worth making sure your hosts are all up to date.
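For the service restarts, something like this on the host usually does it (service names from memory, so check Get-Service if in doubt; note that restarting WMI will bounce its dependent services too):

Restart-Service SCVMMAgent
Restart-Service Winmgmt -Force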

Occasionally I see a host where this doesn’t work and, no matter what, it remains “not responding” in VMM. For me the cause appears to be a broken WinRM configuration. You can be fooled into thinking WinRM is set up correctly, as “winrm /quickconfig” reports it is already set up and the winrm service is running.

It looks like all the “winrm /quickconfig” command does is check that WinRM has been enabled; it won’t reset other possibly incorrect configurations or broken settings.

Comparing the WinRM configuration and registry of an identical working host against a “not responding” host, I have found the following commands will correct the deviated settings and usually result in a host that responds to VMM again.

reg add HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System /v LocalAccountTokenFilterPolicy /t REG_DWORD /d 1 /f

winrm set winrm/config/service/auth @{CredSSP="True"}
winrm set winrm/config/winrs @{AllowRemoteShellAccess="True"}
winrm set winrm/config/winrs @{MaxMemoryPerShellMB="2048"}
winrm set winrm/config/client @{TrustedHosts="*"}
winrm set winrm/config/client/auth @{CredSSP="True"}

Be sure to run these on the affected host in an admin command prompt.
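If you want to see the full picture before and after making changes, you can dump the whole WinRM configuration on both a working and a broken host and compare them:

winrm get winrm/config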