Reviewing VMM traces has become so commonplace that we may tend to forget other information related to the failure. More important than the trace itself is the context of the failure. What was being performed at the time and what error message was recorded for the task? These other sources of information are your starting point. They are what make the trace valuable.
Reading a VMM trace is an art and a science. There are basic methods to follow and items to locate in the trace, but the only way to become proficient is practice, practice, practice. Perform actions on your own machines, gathering a trace at the same time, and then review the trace to learn how the actions you initiated are recorded. What follows are general best practices for resolving customer issues by using all data available.
To understand a failure you need information from various sources. You cannot begin blindly reading a trace, for example, and expect to get far. Prepare yourself with these items:
Essentially, Notepad.exe is all that is required to view a trace. Unfortunately, Notepad takes a long time to open the large traces created by VMM. In the examples that follow, TextAnalysisTool.Net will be used instead. Also keep a copy of all error codes returned by VMM on hand for reference: System Center 2012 – Virtual Machine Manager (VMM) Error Codes.
When a job fails, select the ‘Jobs’ tab in the console and review the error recorded. There will be an error code, usually a 3 to 5 digit number. Below this will be more specific information that explains what went wrong in plain English (or localized language). There will also be a return code in hexadecimal format and a recommended action if available.
The return code begins with ‘0x’ followed by eight hexadecimal digits. Often this return code is related to WinRM, and if its meaning is not already provided, it can easily be determined. Take the example below:
Error from VMM console
‘Error (2915),’ though specific, is really not specific enough. Perform a search of the VKB or the Internet and you will find numerous reasons for this error. To use this error effectively let’s dig deeper.
First, read the error as it is presented. ‘The WS-Management service cannot process the request. Object not found on the <servername> server.’ This error also provides a Recommended Action. All of this is useful information:
From this simple error we have two action plans:
Further, if all servers are reporting this error, it seems likely that the issue may in fact involve the VMM management server itself. Check the services mentioned on the VMM management server, and verify communication with simple WinRM tests explained later in this document under ‘WinRM Troubleshooting’.
Now let’s imagine two things being different regarding this error. First, let’s imagine that there was no explanation for the return code (not too difficult, as it did report ‘Unknown error (0x80338000)’), and second, that there was no recommended action. What now? Start by looking up the original three- to five-digit error code, in this case ‘2915’. Searching ‘Error Codes_VMM R2’ we find the following:
2915
The WS-Management service cannot process the request. Object not found on the %ServerName; server.
Ensure that the agent is installed and running. If the error persists, reboot %ServerName; and then try the operation again.
This represents the code, the message, and the recommended action. So, in this case the recommended action was already provided in the Admin console message, but this is not always the case. Let’s move on.
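The lookup just described can be sketched in a few lines of Python. This is only an illustration, not part of VMM: the table holds just the single 2915 entry quoted above, the names `VMM_ERRORS` and `describe_error` are hypothetical, and the sketch assumes the `%ServerName;` placeholder is simply replaced with the host name.

```python
# Hypothetical local lookup table; only the 2915 entry quoted above is included.
VMM_ERRORS = {
    2915: (
        "The WS-Management service cannot process the request. "
        "Object not found on the %ServerName; server.",
        "Ensure that the agent is installed and running. If the error "
        "persists, reboot %ServerName; and then try the operation again.",
    ),
}


def describe_error(code, server_name):
    """Return (message, recommended_action) with the server name filled in.

    Returns None when the code is not in the local table.
    """
    entry = VMM_ERRORS.get(code)
    if entry is None:
        return None
    message, action = entry
    return (
        message.replace("%ServerName;", server_name),
        action.replace("%ServerName;", server_name),
    )
```

Calling `describe_error(2915, "HOST01")` returns the message and recommended action with `HOST01` substituted; the host name here is a made-up example.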
Resolve the return code. As these are usually WinRM related, start there.
winrm helpmsg 0x80338000
This returns the following:
The WS-Management service cannot process the request. The service cannot find the resource identified by the resource URI and selectors.
Ok, maybe this is not the most useful error message, but it is a bit different from the one provided in the Admin Console and may give you a few key terms that return a better result on the Internet. Also, notice that if you do not precede the eight-digit code with ‘0x’, nothing is returned. This is a ‘feature’ of winrm helpmsg.
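Because the ‘0x’ prefix is easy to forget, a small helper can normalize the return code before it is handed to winrm. The following Python sketch is an assumption-laden illustration: the function name `helpmsg_command` is hypothetical, and it only builds the argument list rather than running the command.

```python
import re


def helpmsg_command(return_code):
    """Build the argument list for 'winrm helpmsg', adding '0x' if it is missing."""
    text = return_code.strip()
    if text.lower().startswith("0x"):
        text = text[2:]
    # The return code is expected to be exactly eight hexadecimal digits.
    if not re.fullmatch(r"[0-9A-Fa-f]{8}", text):
        raise ValueError(f"expected eight hex digits, got {return_code!r}")
    return ["winrm", "helpmsg", "0x" + text.upper()]
```

For the code from the example, both `helpmsg_command("80338000")` and `helpmsg_command("0x80338000")` produce the arguments of the command line `winrm helpmsg 0x80338000`.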
Keep in mind that if winrm returns nothing for the error message, the error probably is not WinRM related. There is one additional trick that can narrow down an error even more. Take the last four digits of an eight-digit hex code, convert them to decimal (‘net helpmsg’ expects a decimal message number), and run the result through ‘net helpmsg’. This is worth testing, but the results are not predictably useful.
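The ‘last four digits’ trick amounts to masking off the low 16 bits of the return code and reading them as a decimal number, since ‘net helpmsg’ takes a decimal argument. A minimal Python sketch follows; the function name `low_word_decimal` is hypothetical.

```python
def low_word_decimal(return_code):
    """Return the last four hex digits of a return code as a decimal number."""
    value = int(return_code, 16)  # accepts '0x80070057' as well as '80070057'
    return value & 0xFFFF         # keep only the low 16 bits
```

For example, `low_word_decimal("0x80070057")` gives 87, so the command to try would be `net helpmsg 87`.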
A final comment on return codes: you may have noticed that most of the return codes you see begin with ‘0x8’. If the code begins with 8004, 8007, or 80005 (three zeros), you have a WMI-related error. Don’t mistake this to mean the error is due to WMI, just that its origin can be determined. Using the table below, we see that errors in the range 800410xx through 800440xx did in fact originate in WMI, and so WMI should be investigated. Errors beginning with 8007xxxx, 80040xxx, and 80005xxx originated elsewhere, although they were reported through WMI. In most cases, skip WMI and look further for the source of these errors. This table is included in the WinRM and WMI appendix module of this training.
Tip:
Occasionally there will be an error code beginning with ‘-2’ that is ten digits long. These can be converted into regular hexadecimal numbers by entering the number, minus sign included, into calc.exe while in decimal mode and then changing the format to hex.
-2147024809 for example becomes FFFFFFFF80070057. Just remove the first eight ‘F’s.
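The same conversion can be done without calc.exe by masking the signed number to 32 bits, which is equivalent to dropping the leading ‘F’s. A minimal Python sketch, with the hypothetical function name `signed_to_hex`:

```python
def signed_to_hex(decimal_code):
    """Convert a signed 32-bit decimal error code to '0x' plus eight hex digits."""
    # Masking to 32 bits yields the two's-complement value of a negative number.
    return "0x{:08X}".format(decimal_code & 0xFFFFFFFF)
```

With the value from the tip, `signed_to_hex(-2147024809)` returns `0x80070057`.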
Common ranges of WMI errors

| Error range | Description |
| --- | --- |
| 0x800410xx - 0x800440xx | Errors that originate in WMI itself. A specific WMI operation failed because of: |
| 0x8007xxxx | Errors originating in the core operating system. WMI may return this type of error because of an external failure, for example, DCOM security failure. |
| 0x80040xxx | Errors originating in DCOM. For example, the DCOM configuration for operations to a remote computer may be incorrect. |
| 0x80005xxx | Errors originating from ADSI (Active Directory Service Interfaces) or LDAP (Lightweight Directory Access Protocol), for example, an Active Directory access failure when using the WMI Active Directory providers. |
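The ranges in the table above can be applied mechanically. The Python sketch below is an illustration under stated assumptions: the function name `classify_origin` is hypothetical, and the WMI range is read as inclusive from 0x80041000 through 0x800440FF.

```python
def classify_origin(return_code):
    """Map a hex return code to the origin listed in the table of WMI error ranges."""
    value = int(return_code, 16) & 0xFFFFFFFF
    digits = "{:08X}".format(value)
    # Assumption: the WMI range is inclusive, 0x80041000 through 0x800440FF.
    if 0x80041000 <= value <= 0x800440FF:
        return "WMI"
    if digits.startswith("8007"):
        return "core operating system"
    if digits.startswith("80040"):
        return "DCOM"
    if digits.startswith("80005"):
        return "ADSI or LDAP"
    return "unknown"
```

A code such as 0x80070057 is classified as originating in the core operating system, while the WinRM code 0x80338000 from the earlier example falls outside all four ranges and is reported as unknown.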
System Center 2012 - Virtual Machine Manager (VMM) General Troubleshooting Guide