Troubleshooting UNIX/Linux Agent Discovery in System Center 2012 Operations Manager

When performed by the Discovery Wizard or PowerShell cmdlets, discovery of UNIX and Linux agents in Operations Manager typically involves three activities:

1) Agent deployment
2) Certificate signing
3) Discovery of the agent

Agent deployment and certificate signing are performed using ssh, but these steps can also be performed manually.  The final agent discovery is performed using a WS-Management query to the deployed agent. The discovery process may fail due to configuration issues, credential or privilege problems, or network and name resolution problems.

This document describes common errors that may be encountered during the discovery process, with potential causes and resolution steps.

Certificate Errors/Certificate Signing Errors

Network/Name Resolution Errors

SSH Connectivity Errors

WSMan Connectivity Errors

Other Errors


 

Certificate Errors/Certificate Signing Errors

 

Signed certificate verification operation was not successful


Error Description

Agent verification failed. Error detail: The server certificate on the destination computer (lx1.contoso.com:1270) has the following errors:
The SSL certificate could not be checked for revocation. The server used to check for revocation might be unreachable.    
The SSL certificate contains a common name (CN) that does not match the hostname.    
It is possible that:
   1. The destination certificate is signed by another certificate authority not trusted by the management server.
   2. The destination has an invalid certificate, e.g., its common name (CN) does not match the fully qualified domain name (FQDN) used for the connection.  The FQDN used for the connection is: lx1.contoso.com.
   3. The servers in the resource pool have not been configured to trust certificates signed by other servers in the pool.

Possible Causes

  • The agent certificate’s CN value does not match the provided or resolved fully qualified domain name (FQDN)

Resolutions

  • For certificate CN failures, confirm that the agent host’s hostname and domain name match the fully qualified domain name (FQDN) resolved through DNS.  More information can be found here.
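
The CN comparison described above can be sketched in a few lines of Python. This is only an illustration, not the check Operations Manager itself performs: it assumes the certificate subject has already been decoded into the dictionary shape documented for Python's ssl.SSLSocket.getpeercert(), and the helper names common_names and cn_matches_fqdn are hypothetical. How the certificate is retrieved and decoded is outside the sketch.

```python
def common_names(peer_cert):
    """Collect every commonName value from a decoded certificate subject.

    peer_cert is assumed to use the dictionary layout documented for
    ssl.SSLSocket.getpeercert(): "subject" is a tuple of RDNs, and each
    RDN is a tuple of (key, value) pairs.
    """
    return [
        value
        for rdn in peer_cert.get("subject", ())
        for key, value in rdn
        if key == "commonName"
    ]


def cn_matches_fqdn(peer_cert, fqdn):
    """Return True if any CN equals the FQDN used for the connection.

    The comparison ignores case and a trailing dot; that normalization
    is an assumption of this sketch.
    """
    wanted = fqdn.rstrip(".").lower()
    return any(cn.rstrip(".").lower() == wanted for cn in common_names(peer_cert))
```

For example, a subject containing only the CN lx1 would not match a connection made to lx1.contoso.com, which corresponds to the mismatch reported in the error text.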

Error Description

The server certificate on the destination computer (lx1.contoso.com:1270) has the following errors:
The SSL certificate could not be checked for revocation. The server used to check for revocation might be unreachable.    
The SSL certificate contains a common name (CN) that does not match the hostname.
It is possible that:
   1. The destination certificate is signed by another certificate authority not trusted by the management server. 
   2. The destination has an invalid certificate, e.g., its common name (CN) does not match the fully qualified domain name (FQDN) used for the connection.  The FQDN used for the connection is: lx1.contoso.com.
   3. The servers in the resource pool have not been configured to trust certificates signed by other servers in the pool.

Possible Causes

  • The certificate was signed by an untrusted authority. This occurs when multiple Management Servers are members of the Resource Pool used for discovery, but certificate trust has not been configured between the Management Servers.

Resolutions

  • Confirm that all Management Servers in the Resource Pool used for discovery trust each other’s certificates. More information can be found here.


Certificate signing operation was not successful


Possible Causes 

  • The user account specified for discovery has insufficient privileges to perform file operations involved in signing
  • Sudo elevation privileges for the user account specified for discovery were not correctly configured

Resolutions

  • Inspect the StdErr output in the error details to identify the cause of the failure
  • Verify sudo privilege configuration for the account used for certificate signing 

Network/Name Resolution Errors


The target address is not resolvable


Error Description 

Failed to resolve IP address 192.168.25.25 to name


Possible Causes 

  • An IP address for the host was entered for discovery, but it is not resolvable to a name in DNS (reverse lookup)

Resolutions

  • Correct name resolution (DNS) configuration for reverse lookup

   

Error Description

Failed to resolve name lxbad.test.com to IP address  


Possible Causes

  • An FQDN for the host was entered for discovery, but it is not resolvable to an IP address in DNS (forward lookup)

Resolutions

  • Correct name resolution (DNS) configuration for forward lookup

DNS configuration: forward DNS resolution does not match reverse DNS resolution


Error Description 

The provided hostname host1 resolved to the IP address of 10.137.216.102. The hostname lx1.contoso.com returned by reverse lookup of the IP address 10.137.216.102 did not match the provided hostname. Verify the DNS configuration and try the request again.


Possible Causes 

  • Forward and reverse DNS lookups do not match

Resolutions

  • Correct name resolution (DNS) configuration for forward and/or reverse lookup
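
The forward, reverse, and mismatch cases above can be reproduced with a short Python sketch run from the machine whose resolver configuration is in question. It is a rough diagnostic only: the function name check_dns_round_trip is hypothetical, the case-insensitive comparison is an assumption, and the exact comparison Operations Manager performs may differ.

```python
import socket


def check_dns_round_trip(name, forward=socket.gethostbyname,
                         reverse=socket.gethostbyaddr):
    """Resolve name to an IP address, resolve that address back to a name,
    and report whether the two names agree.

    Returns (ok, detail). The resolver functions are parameters so the
    logic can be exercised without a live DNS server.
    """
    try:
        ip = forward(name)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, f"forward lookup failed for {name}: {exc}"
    try:
        reverse_name = reverse(ip)[0]  # (hostname, aliases, addresses)
    except OSError as exc:  # socket.herror is a subclass of OSError
        return False, f"reverse lookup failed for {ip}: {exc}"
    if reverse_name.rstrip(".").lower() != name.rstrip(".").lower():
        return False, f"{name} -> {ip} -> {reverse_name} (mismatch)"
    return True, f"{name} -> {ip} -> {reverse_name}"
```

Calling check_dns_round_trip("host1") in an environment like the one in the error text above would be expected to report a mismatch, because the reverse lookup returns the fully qualified name rather than the short name that was provided.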


The target address is unreachable


Error Description 

The WinRM client cannot complete the operation within the time specified. Check if the machine name is valid and is reachable over the network and firewall exception for Windows Remote Management service is enabled.


Possible Causes 

  • The host is unreachable due to incorrect name resolution, network outage, or host outage
  • A network or host-based firewall is blocking TCP port 1270 connectivity to the target host

Resolutions

  • Verify that Management Server can ping the agent host by Fully-Qualified Domain Name
  • Verify that no network or host-based firewall is blocking TCP port 1270
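
Because a ping only shows that the host answers ICMP, a direct TCP connection attempt is a more precise test of the firewall resolution above. The following Python sketch is one way to do that; the function name tcp_port_open and the five-second default timeout are assumptions made for illustration.

```python
import socket


def tcp_port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) can be established.

    A False result covers refused connections, timeouts, and name
    resolution failures alike, so it narrows the problem down rather
    than identifying it.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run from the Management Server, tcp_port_open("lx1.contoso.com", 1270) tests the agent's WS-Management port; the same function with port 22 tests ssh reachability.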

   

SSH Connectivity Errors

SSH connection error 


Error Description 

Failed during SSH discovery. Exit code: -1073479162
Standard Output:
Standard Error:
Exception Message:An exception (-1073479162) caused the SSH command to fail - No connection could be made because the target machine actively refused it.


Possible Causes 

  • The ssh daemon is not running on the target system
  • A network or host-based firewall is preventing ssh connections on TCP port 22

Resolutions

  • Verify that the ssh daemon is running
  • Verify that no network or host-based firewall is blocking TCP port 22

   

Error Description

Failed during SSH discovery. Exit code: -1073479118
Standard Output:
Standard Error:
Exception Message:An exception (-1073479118) caused the SSH command to fail - Server sent disconnect message: type 2 (protocol error : Too many authentication failures for root)


Possible Causes 

  • The user account specified for discovery is not permitted to login via ssh.
  • The user account specified for discovery was input with an invalid username or password

Resolutions

  • Verify that the user is permitted to login via ssh
  • Verify the input credentials and that the user is defined on the target host

   

Error Description

Failed during SSH discovery. Exit code: 1
Standard Output: Sudo path: /usr/bin/
Standard Error: sudo: sorry, you must have a tty to run sudo
Exception Message:


Possible Causes 

  • Sudo elevation was selected in the user credential input, but the requiretty option was not disabled for the user in sudoers.

Resolutions

  • Edit the sudoers file on the target host (using the visudo command) and add the following line, replacing “<username>” with the name of the user account specified for discovery:
    Defaults:<username> !requiretty
    More information is available here.
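
To confirm the sudoers change took effect for a given account, the Defaults lines can be scanned with a small Python sketch. This is a deliberately simplified reading of sudoers, offered as an assumption-laden illustration: it handles only global "Defaults" lines and per-user "Defaults:user" lines in file order, and ignores aliases, include directives, line continuations, and host, runas, or command scoped Defaults. The function name requiretty_enabled_for is hypothetical.

```python
import re

# Matches "Defaults <options>" and "Defaults:user1,user2 <options>".
# Host (@), runas (>) and command (!) scoped Defaults are not matched.
_DEFAULTS = re.compile(r"^\s*Defaults(?::(?P<users>\S+))?\s+(?P<opts>.+)$")


def requiretty_enabled_for(user, sudoers_text):
    """Return True if requiretty appears to be in effect for user.

    Later matching lines override earlier ones. requiretty is treated
    as off when no matching line sets it.
    """
    required = False
    for line in sudoers_text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        match = _DEFAULTS.match(line)
        if not match:
            continue
        users = match.group("users")
        if users is not None and user not in users.split(","):
            continue  # per-user line for a different account
        for option in (o.strip() for o in match.group("opts").split(",")):
            if option == "requiretty":
                required = True
            elif option == "!requiretty":
                required = False
    return required
```

With a file containing "Defaults requiretty" followed by "Defaults:opsuser !requiretty", the function returns False for opsuser and True for any other account.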

   

Error Description

.[?1034hopsuser@lx1:~> su - root -c 'sh /tmp/scx-opsuser/GetOSVersion.sh; EC=$?; rm -rf
/tmp/scx-opsuser; exit $EC'
Password:
exit
su: incorrect password
opsuser@lx1:~> exit
logout

Possible Causes

  • Su elevation was selected in the user credential input, but an invalid root password was provided for su elevation

Resolutions

  • Verify the password input for root in the Elevation configuration dialog

   

Error Description

Failed during SSH discovery. Exit code: -2147221248
Standard Output:
Standard Error: Could not chdir to home directory /home/username: No such file or directory


Possible Causes 

  • The user account specified for discovery does not have a home directory

Resolutions

  • Verify that the user has a home directory at /home/<username> and that the user is able to write to this directory

   

Error Description

Failed during SSH discovery. Exit code: -2147221248
Standard Output:
Standard Error: root's password:
Exception Message:Operation timed out
 


Possible Causes 

  • Sudo elevation was selected in the user credential input, but the user account specified for discovery is not correctly configured to use passwordless sudo elevation or the required sudo elevation privileges were not granted for the user account used in discovery.

Resolutions

  • Review sudo elevation configuration documentation and verify user configuration for sudo. Note that passwordless sudo must be configured.
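
The SSH errors in this section can be told apart by a characteristic phrase in the error detail. The Python sketch below collects those phrases and the causes listed above into a lookup; the table is taken from this document only, the names KNOWN_SSH_FAILURES and likely_cause are hypothetical, and a match is a hint rather than a diagnosis.

```python
# (phrase found in the error detail, likely cause as described above)
KNOWN_SSH_FAILURES = [
    ("actively refused",
     "ssh daemon not running, or a firewall is blocking TCP port 22"),
    ("Too many authentication failures",
     "user not permitted to log in via ssh, or invalid username/password"),
    ("you must have a tty to run sudo",
     "requiretty is not disabled for the user in sudoers"),
    ("su: incorrect password",
     "invalid root password provided for su elevation"),
    ("Could not chdir to home directory",
     "user account has no home directory"),
    # Generic phrase: other timeouts may produce it as well.
    ("Operation timed out",
     "passwordless sudo not configured, or sudo privileges not granted"),
]


def likely_cause(error_text):
    """Return the first matching cause for error_text, or None."""
    lowered = error_text.lower()
    for phrase, cause in KNOWN_SSH_FAILURES:
        if phrase.lower() in lowered:
            return cause
    return None
```

Pasting the full error detail from the Discovery Wizard into likely_cause returns the corresponding cause when one of the phrases is present, and None otherwise.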

WSMan Connectivity Errors


Invalid credentials    


Error Description

The agent responded to the request but the WSMan connection failed due to :  Access is Denied.

 

Possible Causes

  • The agent is installed, and the agent certificate has been signed, but the user credential provided for agent verification is invalid.
  • The user account specified for discovery was configured to authenticate with an ssh key, but the user credential provided for agent verification is invalid.

Resolutions

  • Verify that the username and password for agent verification were input correctly and that the user is a valid user on the target host.

The target address is unreachable


Error Description
WSMan Only Discovery failed for 192.168.1.30

Possible Causes

  • The Discovery Type option was set to “Only computers with an installed agent and signed certificate,” the target host has the agent installed, but the target host certificate has not been signed.   In order to use the WSMan-only “Only computers with an installed agent and signed certificate” option, the agent must be installed and the certificate manually signed.
  • The Discovery Type option was set to  “Only computers with an installed agent and signed certificate,” but the target host does not have the UNIX/Linux agent currently installed.
  • The Discovery Type option was set to  “Only computers with an installed agent and signed certificate,” but the UNIX/Linux agent is not currently running.
  • The Discovery Type option was set to  “Only computers with an installed agent and signed certificate,” but the target host is unreachable, a network or host-based firewall is preventing connectivity, or the UNIX/Linux agent is currently down.

Resolutions

  • Manually sign the certificate
  • Verify that the UNIX/Linux agent has been installed
  • Change the option to “Discover all computers” to allow the Discovery Wizard to perform the certificate signing
  • Verify that the UNIX/Linux agent is running and that the target host is reachable
  • Verify that no network or host-based firewall is preventing access on TCP port 1270

 


Other Errors


Agent deployment operation was not successful


Error Description

The task cannot be executed against the object(s) because the target of the task does not match any of the classes of the object.


Possible Causes

  • In a System Center 2012 – Operations Manager management group, the UNIX/Linux management packs imported are Operations Manager 2007 R2 versions.

Resolutions

  • Import the System Center 2012 versions of the UNIX/Linux operating system management packs. 

No actions are available

 

Error Description

The agent is installed and the computer is already being monitored by Operations Manager.

Possible Causes

  • The target host has already been discovered in this Management Group

Resolution

  • No action is required.  Agent upgrade or migration to an alternate resource pool can be performed from the UNIX/Linux Servers view in the Administration pane of the Operations Console

Platform not supported


Error Description

Failed to find a matching supported agent instance in the imported management packs.
Import the Management Pack(s) for this platform in order to discover this computer.

 

Possible Causes

  • The target host is running an unsupported operating system.
  • The correct management pack for the target host’s operating system has not been imported.
  • The correct management pack for the operating system has recently been imported, and has not yet fully loaded.

Resolutions

  • Confirm that the target host is running a supported operating system. 
  • Import the management pack for the target host’s operating system and version
  • If the management pack was just imported, it may still be loading.  Wait a few minutes and rerun discovery.


Pool Not Initialized


Error Description

Unable to enumerate Installable agent types.  The associated resource pool may still be initializing. If you had selected a newly created resource pool, please wait a few minutes before using it.


Possible Causes

  • The Resource Pool used in discovery is not healthy (a majority of member servers are offline).
  • The Resource Pool used in discovery was recently created, but has not fully initialized.

Resolutions

  • If the Resource Pool used in discovery was recently created, retry the discovery after several minutes to allow the pool to initialize.
  • Otherwise, check the Operations Manager Event Log on the servers that are members of the Resource Pool used for discovery for indications of problems.
