SharePoint 2013: Crawl Scaling Recommendations

In SharePoint 2013 Search, crawling, filtering, and indexing are no longer tied to a single component (i.e. the Crawler in SharePoint 2010). The Crawl Component in 2013 is only responsible for downloading documents ("gathering") and feeding these to the Content Process Component(s).

By offloading the filtering ("content processing") and indexing tasks, the crawler is no longer I/O or CPU intensive, and does not need to be scaled past a couple of components (for fault tolerance and network throughput to content sources). With 2x1Gbit/s connections, content farms are likely to be the bottleneck, rather than the crawler itself. Host distribution rules are also gone (see http://blogs.msdn.com/b/sharepoint_strategery/archive/2013/06/30/why-host-distribution-rules-dont-apply-to-sharepoint-2013.aspx), but with the new search architecture they are no longer needed either. Likewise, with the architectural changes, Crawl DBs are now added to accommodate content volume, not to improve crawl performance.

This will sound strange to those coming from a SharePoint Search background (e.g. http://blogs.msdn.com/b/russmax/archive/2010/04/16/search-2010-architecture-and-scale-part-1-crawl.aspx), but familiar to those coming from FAST Search for SharePoint 2010. As in FS4SP, crawl performance is scaled up primarily by increasing the number of Content Processing Components (analogous to procservers in FS4SP, with contentdistributor and indexingdispatcher functionally rolled in). The CPCs also scale on their own, based on CPU availability, up to a limit (default is good for up to 12 cores - http://technet.microsoft.com/en-us/library/cc262787.aspx#Search).
For most SharePoint content, a CPC will process 5-10 items per second per core. So for example, on an 8 core server, with an Admin Component, Crawl Component, and a Content Processing Component (with ~6 cores to itself), you might see a crawl rate of ~45 items per second (e.g. 6 cores at an average 7.5 items per second), assuming no content source or index bottleneck.
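
The arithmetic in the example above can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate under the stated rule of thumb (5-10 items per second per core); the function name and its parameters are hypothetical and not part of any SharePoint API.

```python
def estimate_crawl_rate(cpc_cores, items_per_core_per_sec=7.5):
    """Rough crawl rate in items/s for one Content Processing Component.

    Assumes no content source or index bottleneck, and uses the
    5-10 items/s per core rule of thumb (7.5 is the midpoint).
    """
    if cpc_cores < 0:
        raise ValueError("cpc_cores must be non-negative")
    return cpc_cores * items_per_core_per_sec


# 8-core server where the CPC has roughly 6 cores to itself
low = estimate_crawl_rate(6, 5.0)    # pessimistic end of the range
mid = estimate_crawl_rate(6)         # midpoint used in the text
high = estimate_crawl_rate(6, 10.0)  # optimistic end of the range
print(f"expected crawl rate: {low:.0f}-{high:.0f} items/s (midpoint {mid:.0f})")
```

Real crawl rates depend on the content mix, so treat the result as a starting point for sizing rather than a prediction.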

The 2013 Index Component builds the shadow index, does merging, and propagates the index journal to replicas (other Index Components in the same Partition). So the I/O considerations for this component are more in line with FS4SP's than with SharePoint 2010's indexing, since there is no longer a property store DB, and all indexes are built & stored locally by the index component. During small crawls/shadow index builds, the Index Component utilizes small writes (~256 KB) sustained at a rate of 100 IOPS. For handling queries, the component utilizes small reads (~64 KB), with about 30 IOPS per query. To support 10 QPS at low latency, for example, the storage subsystem would need to be capable of 300 IOPS for 64 KB reads.
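
As a quick illustration of the query-side sizing above, the following Python sketch multiplies a target query rate by the ~30 IOPS per query figure. The function and constant names are hypothetical, and the per-query figure is the approximate value quoted in this article, not a measured constant.

```python
IOPS_PER_QUERY = 30  # approximate 64 KB read IOPS per query (from the text)


def required_read_iops(target_qps, iops_per_query=IOPS_PER_QUERY):
    """64 KB read IOPS the index storage should sustain for target_qps."""
    if target_qps < 0:
        raise ValueError("target_qps must be non-negative")
    return target_qps * iops_per_query


for qps in (5, 10, 20):
    print(f"{qps} QPS -> {required_read_iops(qps)} IOPS of 64 KB reads")
```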

If a crawl happens to change more than 10% of the currently indexed items, a master merge will be triggered, leading to large reads & writes by the indexer (~100MB per operation), which can cause the performance of both small writes (shadow index) and small reads (queries) to drop. For this reason, the documented recommendation of a separate disk (http://technet.microsoft.com/en-us/library/jj219628.aspx) is really more of a requirement for production environments, to ensure that performance is acceptable even during a master merge.
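
The 10% rule of thumb can be expressed as a simple check, which may help when estimating whether a planned crawl is likely to cause a master merge. This is a sketch of the arithmetic only: the names are hypothetical, and the index component's actual trigger logic is internal and may consider more than this ratio.

```python
MASTER_MERGE_THRESHOLD = 0.10  # "more than 10%" of indexed items (from the text)


def likely_triggers_master_merge(changed_items, indexed_items,
                                 threshold=MASTER_MERGE_THRESHOLD):
    """True if the changed fraction exceeds the rule-of-thumb threshold."""
    if indexed_items <= 0:
        return False  # nothing indexed yet, so the ratio is undefined
    return changed_items / indexed_items > threshold


# 150,000 changed items against a 1,000,000 item index is 15%
print(likely_triggers_master_merge(150_000, 1_000_000))
```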

So, to summarize the 2013 crawl performance scaling story, the factors to consider are:

  1. Response time from content sources and network bandwidth to content sources
  2. CPU resources for the Content Processing Component
  3. I/O resources for the Index Component

The I/O requirements described above are lower than in FAST Search for SharePoint 2010. The new minimums are as follows:
  • 256 KB write – 100 IOPS [shadow index]
  • 64 KB read – 300 IOPS [10 queries per second]
  • 100 MB read – 200 MB/s [master merge]
  • 100 MB write – 200 MB/s [master merge]
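
One way to use these minimums is to compare them against benchmark results for the index volume. The Python sketch below does that comparison; the `StorageProfile` type, the field names, and the measured numbers are all hypothetical, and the 300 IOPS read figure assumes the 10 queries per second target listed above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class StorageProfile:
    small_write_iops: float  # 256 KB writes (shadow index)
    small_read_iops: float   # 64 KB reads (queries, sized for 10 QPS)
    large_read_mbps: float   # 100 MB reads (master merge), in MB/s
    large_write_mbps: float  # 100 MB writes (master merge), in MB/s


# Minimums listed in the article
MINIMUMS = StorageProfile(100, 300, 200, 200)


def shortfalls(measured, minimum=MINIMUMS):
    """Return {metric: (measured, required)} for every metric below minimum."""
    return {
        f.name: (getattr(measured, f.name), getattr(minimum, f.name))
        for f in fields(measured)
        if getattr(measured, f.name) < getattr(minimum, f.name)
    }


# Made-up benchmark numbers for a candidate index disk
candidate = StorageProfile(120, 250, 220, 180)
for metric, (got, need) in shortfalls(candidate).items():
    print(f"{metric}: measured {got}, need at least {need}")
```

An empty result means the candidate volume meets all four minimums; it says nothing about headroom for higher query rates.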


The TechNet documentation related to this topic in 2010 can be found here: http://technet.microsoft.com/en-us/library/gg604775(v=office.14).aspx

Revision comments:
  • Naomi N edited Revision 6. Comment: Title case
  • Dan Pandre edited Revision 3. Comment: Fleshed out with other crawl performance data