SharePoint 2007: Best Practices for Using a Dedicated Front-End Web Server for Crawling

By default, Microsoft Office SharePoint Server 2007 uses all of the front-end Web servers in a server farm to crawl content in the farm. When a farm is configured in this way, crawler behavior depends on the number of front-end Web servers in the farm. If the farm has only one front-end Web server, the index server sends HTTP GET requests directly to that server. If the farm has multiple front-end Web servers, the index server sends GET requests to the network load balancer, which forwards each request to one of the front-end Web servers. (A farm with more than one front-end Web server must use a network load balancer to distribute user content requests across those servers.) Over time, the network load balancer spreads requests across all front-end Web servers. When a front-end Web server receives a content request, it retrieves the content from the content databases associated with the SharePoint sites being crawled and returns that content to the index server.
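The default routing described above can be modeled in a few lines of Python. This is a minimal sketch, not SharePoint code: the server names are hypothetical, and the network load balancer is assumed to use plain round-robin, whereas a real load balancer may distribute requests by affinity or load. Each pair returned is (host the index server addresses, front-end Web server that answers the request).

```python
from itertools import cycle
from typing import List, Tuple


def default_crawl_routing(front_ends: List[str], nlb_name: str,
                          n_requests: int) -> List[Tuple[str, str]]:
    """Model default crawling: (address used by the index server, server that answers)."""
    if not front_ends:
        raise ValueError("a farm needs at least one front-end Web server")
    if len(front_ends) == 1:
        # One front-end Web server: the index server sends GET requests straight to it.
        return [(front_ends[0], front_ends[0])] * n_requests
    # Several front-end Web servers: requests go to the load balancer, modeled
    # here as simple round-robin (an assumption; real balancers vary).
    rotation = cycle(front_ends)
    return [(nlb_name, next(rotation)) for _ in range(n_requests)]
```

For example, `default_crawl_routing(["WFE1", "WFE2"], "NLB", 3)` addresses the load balancer three times and lands on WFE1, WFE2, WFE1. Over a long crawl, every server in the rotation therefore carries a share of the crawl load.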

Performance issues caused by using all front-end Web servers for crawling

Using all front-end Web servers for crawling can work well for small to medium-size organizations. Large organizations, however, tend to crawl more content; such organizations might crawl gigabytes or even terabytes of content. Crawling content in a farm can cause surges in network traffic and can place considerable demands on front-end Web server resources such as disk, processors, and memory. Crawling a large amount of content can generate more traffic to the farm's front-end Web servers than all user requests combined. This traffic can degrade the performance of every front-end Web server in the farm and thereby increase response times for end-user requests for SharePoint site content.

Recommended solution

We recommend that you use a dedicated front-end Web server for crawling, especially if crawling content is producing more traffic on the front-end Web servers than user requests. You can specify any front-end Web server in your farm for crawling. However, for best performance, we recommend that you configure the index server as the dedicated front-end Web server for crawling if the index server has the capacity for both roles. By using the same computer as both the index server and dedicated front-end Web server, you eliminate the need for the index server to send requests to a different computer when crawling content. This reduces overall network traffic and improves crawl performance.

We also recommend that you do not include the dedicated front-end Web server in the network load balancing rotation for incoming user requests. Otherwise, user requests that the network load balancer directs to that server must compete with crawl traffic and might experience inconsistent performance.

When not to configure a dedicated front-end Web server for crawling

Do not configure a dedicated front-end Web server for crawling under any of the following conditions:

  • Another application (such as Excel Calculation Services) is running on the index server. Configuring a dedicated front-end Web server for crawling might prevent that application from communicating with other servers in the farm.

    If other applications are running on the index server, move those applications to another application server before configuring a dedicated front-end Web server for crawling.

  • You want to use the index server as the dedicated front-end Web server for crawling and the index server is also configured as a query server.

  • The NetBIOS name of your query server is also the host name of your SharePoint site.

In either of the last two cases, configuring a dedicated front-end Web server for crawling can prevent the index server from propagating the index to another server.
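The conditions above amount to a pre-flight checklist. The following Python sketch encodes them as a function that returns the reasons, if any, not to configure a dedicated front-end Web server for crawling. The `FarmFacts` fields and the function name are hypothetical inputs you would fill in from your own farm inventory; nothing here queries SharePoint.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FarmFacts:
    """Hypothetical description of a farm; fill it in from your own inventory."""
    other_apps_on_index_server: List[str] = field(default_factory=list)
    dedicated_server_is_index_server: bool = False
    index_server_is_query_server: bool = False
    query_server_netbios_name: str = ""
    site_host_names: List[str] = field(default_factory=list)


def reasons_not_to_dedicate(facts: FarmFacts) -> List[str]:
    """Return the listed conditions that rule out a dedicated crawl server."""
    reasons = []
    if facts.other_apps_on_index_server:
        apps = ", ".join(facts.other_apps_on_index_server)
        reasons.append(f"other applications run on the index server ({apps}); move them first")
    if facts.dedicated_server_is_index_server and facts.index_server_is_query_server:
        reasons.append("the index server would be the crawl server and is also a query server")
    # NetBIOS names and host names are compared case-insensitively.
    hosts = {h.lower() for h in facts.site_host_names}
    if facts.query_server_netbios_name and facts.query_server_netbios_name.lower() in hosts:
        reasons.append("the query server's NetBIOS name is also a site host name")
    return reasons
```

An empty list means none of the listed conditions applies; it does not replace checking that the chosen server has the capacity for the extra role.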

About configuring a dedicated front-end Web server for crawling

There are two ways to configure a dedicated front-end Web server for crawling:

  • Use the Configure Office SharePoint Server Search Service Settings page in Central Administration.

  • Update the Hosts file directly.

Before you configure a dedicated front-end Web server for crawling, we recommend that you read the following section to determine which configuration method to use.

How the Hosts file is affected when you use the user interface to configure a dedicated front-end Web server for crawling

When crawling content, Office SharePoint Server 2007 reads the Hosts file on the index server to determine whether to use all front-end Web servers for crawling (the default), or to use a dedicated front-end Web server for crawling.

When you use the Configure Office SharePoint Server Search Service Settings page in Central Administration to select a dedicated front-end Web server for crawling, the SharePoint timer service writes the following entries to the Hosts file:

  • One entry that specifies the IP address and the computer name of the front-end Web server.

  • One entry for each Web application on the front-end Web server that you configured to use a host header. Each such entry specifies the IP address of the front-end Web server, followed by the host header.

Each entry is on a separate line in the Hosts file, like this:

111.11.111.111 MyMossMachine #Added by Office SharePoint Server Search (7/15/2008 2:56 PM).
111.11.111.111 Marketing #Added by Office SharePoint Server Search (7/15/2008 2:56 PM).
111.11.111.111 HumanResources #Added by Office SharePoint Server Search (7/15/2008 2:57 PM).

Possible problems

In some cases, the timer service writes an incorrect IP address to the Hosts file. (For more information, see the blog post at http://go.microsoft.com/fwlink/?LinkId=135698.) This can cause problems ranging from an inability to crawl content to an inability to view sites such as the Shared Services Provider (SSP) administration site or the Central Administration site. The timer service can add an incorrect IP address to the Hosts file in cases such as the following:

  • The server that you specified as your dedicated front-end Web server for crawling has multiple IP addresses assigned to one or more network cards.

  • Your server farm is using network load balancing.

If either of these conditions is true, we recommend that you add the entries to the Hosts file directly instead of using the user interface to specify a dedicated front-end Web server for crawling.
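Whichever way the entries are written, it is worth confirming that every name maps to the address you intended before you start a crawl. This Python sketch parses Hosts file text (on Windows the file is normally %SystemRoot%\System32\drivers\etc\hosts) and reports entries for the given names whose IP address differs from the expected one. The function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def mismatched_entries(hosts_text: str, expected_ip: str,
                       names: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (ip, name) pairs where one of `names` maps to an address other than expected_ip."""
    wanted = {n.lower() for n in names}  # host names are case-insensitive
    mismatches = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments and blank lines
        if not line:
            continue
        ip, *hosts = line.split()
        for host in hosts:
            if host.lower() in wanted and ip != expected_ip:
                mismatches.append((ip, host))
    return mismatches
```

Pass it the Hosts file contents, the address you want the dedicated front-end Web server to use, and the computer name plus every host header; an empty result means all of those names point at that address.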

Important: When you use the Configure Office SharePoint Server Search Service Settings page in Central Administration to specify a dedicated front-end Web server for crawling, you cannot fix a wrong IP address by editing the Hosts file manually, because the timer service overwrites its entries in the Hosts file every few minutes. If this occurs, use the Configure Office SharePoint Server Search Service Settings page in Central Administration to specify that all front-end Web servers are used for crawling, and then remove the entries that the timer service added to the Hosts file.
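The cleanup step can be sketched in Python as follows. It assumes that you have already switched Central Administration back to using all front-end Web servers for crawling (otherwise the timer service rewrites the entries), that you run it with administrative rights, and that the timer service's entries are identifiable by the comment shown in the sample entries. The function names are hypothetical, and the sketch keeps a .bak copy of the original file.

```python
import shutil
from pathlib import Path

# Comment text copied from the sample entries; assumed to appear on every timer-service line.
MARKER = "#Added by Office SharePoint Server Search"


def strip_timer_entries(hosts_text: str) -> str:
    """Return the Hosts file text without lines carrying the timer-service comment."""
    kept = [line for line in hosts_text.splitlines() if MARKER not in line]
    return "\n".join(kept) + "\n" if kept else ""


def clean_hosts_file(path: str) -> None:
    """Back up the Hosts file, then rewrite it without the timer-service entries."""
    hosts = Path(path)
    shutil.copy2(hosts, hosts.with_name(hosts.name + ".bak"))
    hosts.write_text(strip_timer_entries(hosts.read_text()))
```

Keeping the text transformation separate from the file handling lets you preview the result with `strip_timer_entries` before `clean_hosts_file` touches the real file.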

To configure a dedicated front-end Web server for crawling, perform one of the following procedures:

Further Reading

Also check out the SharePoint 2010 Best Practices page at http://social.technet.microsoft.com/wiki/contents/articles/8666.sharepoint-2010-best-practices-en-us.aspx

Revision history
  • Gokan Ozcifci edited Revision 5. Comment: Title change  

  • Fernando Lugão Veltem edited Revision 3. Comment: remove en-us from title and added toc

  • Margriet Bruggeman edited Revision 2. Comment: linked to best practices page

  • Craig Lussier edited Revision 1. Comment: added en-US to tags and title
