How to Interpret Job Failures in VMM

Reviewing VMM traces has become so commonplace that we may tend to forget other information related to the failure. More important than the trace itself is the context of the failure. What was being performed at the time and what error message was recorded for the task? These other sources of information are your starting point. They are what make the trace valuable.

Reading a VMM trace is an art and a science. There are basic methods to follow and items to locate in the trace, but the only way to become proficient is practice, practice, practice. Perform actions on your own machines, gathering a trace at the same time, and then review the trace to learn how the actions you initiated are recorded. What follows are general best practices for resolving customer issues by using all data available. 

Where to begin

To understand a failure you need information from various sources. You cannot begin blindly reading a trace, for example, and expect to get far. Prepare yourself with these items:

  • What job was being run at the time of the trace? A P2V, adding a host?
  • Which traces do you have? Depending on the job, you will need traces from more than one system. A trace from the VMM management server is always needed. For a P2V, traces are also required from the source machine and the destination host. You must also know which trace came from which machine.
  • Did the job fail or was it cancelled? The better answer is ‘failed’. A failed job will produce an error in the Admin Console along with a hex error code. This error code is the first item you will search for in the trace starting from the bottom, so it’s important to have.
  • Verify that the trace being reviewed was captured during the time that the error occurred.
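
The bottom-up search for the error code can also be scripted. What follows is a minimal Python sketch, assuming the trace has been saved as plain text and that the code appears in it exactly as the console shows it. The function name and the sample lines are hypothetical; real VMM trace lines will look different.

```python
def find_last_occurrence(lines, error_code):
    """Return (line_number, text) for the last line containing error_code, or None."""
    needle = error_code.lower()
    # Walk the trace from the bottom up, as you would when reading it by hand.
    for index in range(len(lines) - 1, -1, -1):
        if needle in lines[index].lower():
            return index + 1, lines[index].rstrip("\n")
    return None


# Hypothetical trace content, invented for illustration only.
sample_trace = [
    "10:01:02 job started",
    "10:01:05 WinRM call failed with 0x80338000",
    "10:01:06 retrying",
    "10:01:09 WinRM call failed with 0x80338000",
    "10:01:10 job failed",
]

last_hit = find_last_occurrence(sample_trace, "0x80338000")
print(last_hit)
```

For a real trace, the list of lines could come from reading the file with open(path).readlines(). The line number returned is then the place to start reading upward for context.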

Tools to Use

Essentially, Notepad.exe is all that is required to view a trace. Unfortunately, Notepad takes a long time to open the large traces created by VMM. In the examples that follow, TextAnalysisTool.Net will be used instead. A copy of all error codes returned by VMM should also be kept on hand for reference.  

System Center 2012 – Virtual Machine Manager (VMM) Error Codes

Interpreting Job Failures

When a job fails, select the ‘Jobs’ tab in the console and review the error recorded. There will be an error code, usually a 3 to 5 digit number. Below this will be more specific information that explains what went wrong in plain English (or localized language). There will also be a return code in hexadecimal format and a recommended action if available.

The return code begins with ‘0x’ followed by eight hexadecimal digits. Often this return code is related to WinRM, and if its meaning is not already provided it can easily be determined. Take the example below:

Error from VMM console 

‘Error (2915),’ though specific, is really not specific enough. Perform a search of the VKB or the Internet and you will find numerous reasons for this error. To use this error effectively let’s dig deeper.

First, read the error as it is presented. ‘The WS-Management service cannot process the request. Object not found on the <servername> server.’ This error also provides a Recommended Action. All of this is useful information:

  • WS-Management is a reference to WinRM. So the issue is likely caused by a condition that led to a general communication failure, which bubbles up in VMM as a WinRM error. Checking the WinRM service on the indicated server would be a good starting place.
  • Object not found on <servername>. Whatever the name of the server provided, this is where you should focus your efforts.
  • The recommended action suggests that the agent be checked, and recommends rebooting <servername>. The only agents used in VMM are the VMM Agent and the P2V Agent. A better action than rebooting the indicated server would be to check and restart the VMM Agent service.

From this simple error we have two action plans:

  1. Verify the WinRM service on the remote server.
  2. Verify the VMM Agent on the remote server.

Further, if all servers are reporting this error, it seems likely that the issue may in fact involve the VMM management server itself. Check the services mentioned on the VMM management server, and verify communication with simple WinRM tests explained later in this document under ‘WinRM Troubleshooting’.

Now let’s imagine two things being different regarding this error: there was no explanation for the return code (not too difficult, as it did report ‘Unknown error (0x80338000)’), and there was no recommended action. What now? First, look up the original 3 to 5 digit error code, in this case ‘2915.’ Searching ‘Error Codes_VMM R2’ we find the following:

2915

The WS-Management service cannot process the request. Object not found on the %ServerName; server.

Ensure that the agent is installed and running. If the error persists, reboot %ServerName; and then try the operation again.

This represents the code, the message, and the recommended action. So, in this case the recommended action was already provided in the Admin console message, but this is not always the case. Let’s move on.

Resolve the return code. As these are usually WinRM related, start there.

winrm helpmsg 0x80338000

This returns the following:

winrm helpmsg 0x80338000

The WS-Management service cannot process the request. The service cannot find the resource identified by the resource URI and selectors.

OK, maybe this is not the most useful error message, but it is a bit different from the one provided in the Admin Console and may give you a few key terms that return a better result on the Internet. Also, notice that if you do not precede the eight digit code with ‘0x’, nothing is returned. This is a ‘feature’ of winrm helpmsg.

Keep in mind that if winrm returns nothing for the error message, the error probably is not WinRM related. There is one additional trick that can narrow down an error even more. Take the last four digits of an eight digit hex code, convert them from hexadecimal to decimal (net helpmsg expects a decimal message number), and run the result through ‘net helpmsg’. This is worth testing, but the results are not predictably useful.
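
Because ‘net helpmsg’ takes a decimal message number, the last four hex digits have to be converted before they are used. The following Python sketch shows one way to do that conversion; the function name is hypothetical, and the example code 0x80070057 is the one used in the tip later in this article.

```python
def net_helpmsg_argument(return_code):
    """Return the low 16 bits of a hex return code as a decimal number."""
    value = int(return_code, 16)  # accepts strings with or without the 0x prefix
    return value & 0xFFFF         # keep only the last four hex digits


message_number = net_helpmsg_argument("0x80070057")
print("net helpmsg {}".format(message_number))  # prints: net helpmsg 87
```

The printed line is the command to run at a command prompt. Whether the message it returns is meaningful still depends on the error actually being a Win32 error underneath.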

A final comment on return codes: you may have noticed that most of the return codes you see begin with ‘0x8’. If the code begins with 8004, 8007, or 80005 (three zeros), you have a WMI related error. Don’t mistake this to mean the error is due to WMI, only that it was reported through WMI and that its origin can be determined. Using the table below we see that errors beginning with 80041xxx or 800440xx did in fact originate in WMI, and so WMI should be investigated. Errors beginning with 8007xxxx, 80040xxx and 80005xxx originated elsewhere, although they were reported through WMI. In most cases with these, skip WMI and look further for the source of the error. This table is included in the WinRM and WMI appendix module of this training.

 Tip:

Occasionally there will be an error code beginning with ‘-2’ that is ten digits long. These can be converted into regular hexadecimal numbers by entering the number, minus sign included, into calc.exe while in decimal mode, then changing the format to hex.

-2147024809 for example becomes FFFFFFFF80070057. Just remove the first eight ‘F’s.
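
The same conversion can be done without calc.exe. As a minimal Python sketch (the function name is hypothetical), masking the signed number to 32 bits is the equivalent of removing the leading ‘F’s by hand:

```python
def to_hex_return_code(decimal_code):
    """Convert a signed 32-bit decimal error code to its 0x-prefixed hex form."""
    # Masking with 0xFFFFFFFF keeps the low 32 bits, which drops the
    # sign extension that calc.exe shows as leading F digits.
    return "0x{:08X}".format(decimal_code & 0xFFFFFFFF)


converted = to_hex_return_code(-2147024809)
print(converted)  # prints: 0x80070057
```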

Common ranges of WMI errors

Error code range | Description

0x800410xx - 0x800440xx | Errors that originate in WMI itself. A specific WMI operation failed because of:

  • An error in the request, for example, a WQL query fails or the account does not have the correct permissions.
  • A WMI infrastructure problem, such as incorrect CIM or DCOM registration.

0x8007xxxx | Errors originating in the core operating system. WMI may return this type of error because of an external failure, for example, a DCOM security failure.

0x80040xxx | Errors originating in DCOM. For example, the DCOM configuration for operations to a remote computer may be incorrect.

0x80005xxx | Errors originating from ADSI (Active Directory Service Interfaces) or LDAP (Lightweight Directory Access Protocol), for example, an Active Directory access failure when using the WMI Active Directory providers.
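
The lookup against these ranges can be sketched in Python as well. This is an illustration only: the function name and the returned labels are hypothetical, and it assumes the first range is meant as an inclusive span from 0x80041000 through 0x800440FF.

```python
def classify_return_code(return_code):
    """Map a hex return code to the origin suggested by the ranges above."""
    value = int(return_code, 16) & 0xFFFFFFFF
    # Assumption: the WMI range is read as one inclusive span.
    if 0x80041000 <= value <= 0x800440FF:
        return "WMI"
    if value >> 16 == 0x8007:   # 0x8007xxxx
        return "core operating system"
    if value >> 12 == 0x80040:  # 0x80040xxx
        return "DCOM"
    if value >> 12 == 0x80005:  # 0x80005xxx
        return "ADSI or LDAP"
    return "not in table"


for code in ("0x80041001", "0x80070057", "0x80040154", "0x80005000", "0x80338000"):
    print(code, "->", classify_return_code(code))
```

A result of ‘not in table’, as with the WinRM code 0x80338000 from the earlier example, simply means these ranges say nothing about the code and it should be resolved another way.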
