Capacity Planning, Sizing and High Availability for Search in SharePoint 2013: SPC172 (Barry Waldbaum and Olaf Birkeland)

Highlights:

One Search Core:  On-Premises, Office 365 and Exchange 2013
One Installer, One Farm:  multi-tenant as well
Search is a core service:  Building block for ECM, WCM/Internet Business, Productivity Search and Social
Flexible deployment with robust fault tolerance
Major overhaul of the UI
Much easier to configure


Three dimensions in search scaling.  SharePoint 2013 allows independent scaling for:
   Content Volume
   Query Load
   Crawl Load


Query Processing Component (QPC):

CPU Load:  Driving Factors
   QPS
   Query transformations
   Guideline:  4 QPS per CPU core

Network Load:  Driving Factors
   Number of index partitions
   Size of queries and results
   Example:  20 index partitions at 20 QPS => 200/100 Mbit/s inbound/outbound
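
The QPC figures above can be turned into a rough sizing calculation. This is a minimal sketch, not an official sizing tool: the function names are hypothetical, and the network estimate assumes that traffic scales linearly with both partition count and QPS, which the session notes do not state.

```python
import math

QPS_PER_CORE = 4  # guideline quoted in the notes


def qpc_cores(peak_qps):
    """Whole CPU cores needed by the QPC at the 4 QPS per core guideline."""
    return math.ceil(peak_qps / QPS_PER_CORE)


def qpc_network_mbit(partitions, qps):
    """(inbound, outbound) Mbit/s, scaled from the 20 partitions @ 20 QPS example.

    Linear scaling is an assumption made for illustration only.
    """
    scale = (partitions * qps) / (20 * 20)
    return 200 * scale, 100 * scale


print(qpc_cores(20))             # 5
print(qpc_network_mbit(20, 20))  # (200.0, 100.0)
```

Treat the result as a starting point for load testing rather than a final answer.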


Index Component:

CPU Load:  Driving Factors
   QPS and Item count
   Guidelines per index component at 2 GHz CPU:
       1M items:   5 QPS per CPU core
       5M items:   2 QPS per CPU core
       10M items:  1 QPS per CPU core
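
The index component guideline can be read as a lookup table. The sketch below is an illustration only: the function name is hypothetical, it picks the next guideline row at or above the item count instead of interpolating (a conservative assumption), and it assumes the 2 GHz CPU from the guideline.

```python
import math

# (maximum items per index component, QPS per CPU core) from the notes
GUIDELINE = [(1_000_000, 5), (5_000_000, 2), (10_000_000, 1)]


def index_cores(items, peak_qps):
    """Whole CPU cores for one index component, using the nearest row above."""
    for max_items, qps_per_core in GUIDELINE:
        if items <= max_items:
            return math.ceil(peak_qps / qps_per_core)
    raise ValueError("more than 10M items is outside the guideline table")


print(index_cores(3_000_000, 10))   # 5
print(index_cores(10_000_000, 10))  # 10
```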


Index Disk IOPS recommendations:

Crawl Load: 
  Typical:  10-60 IOPS @32-512KB writes

Query Load @ 10M items:
  During high crawl rate:  ~30x reads per query
  Without crawl: ~3x reads per query (caching)

Index Merge:
  Concurrent 150MB/s read + 150MB/s write
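
As a rough illustration of the query load figures, the sketch below multiplies QPS by reads per query. It assumes that "~30x" and "~3x" mean roughly 30 and 3 disk read operations per query at 10M items; the notes do not spell this out, and the function name is hypothetical.

```python
def query_read_iops(qps, crawling):
    """Approximate index disk reads per second at 10M items.

    Assumes ~30 reads per query during a high crawl rate and ~3 without
    a crawl (caching), as interpreted from the notes.
    """
    reads_per_query = 30 if crawling else 3
    return qps * reads_per_query


print(query_read_iops(10, crawling=True))   # 300
print(query_read_iops(10, crawling=False))  # 30
```

Crawl writes and index merge traffic come on top of this and are not modeled here.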



Crawl Component:


CPU Load:  Driving Factors
  Documents per second
  Link discovery
  Crawl management

Network Load:  Driving Factors
  Downloading items from content sources
  Passing items on to CPC

Disk Load:  Driving Factors
  All documents are temporarily stored in data folder



Content Processing Component (CPC):


CPU Load:  Driving Factors
  Documents per second
  Document size and complexity
  Feature extraction
  Estimate:  5-10 DPS per CPU core

Network Load:  Driving Factors
  Documents per second
  Document size
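
The 5-10 DPS per CPU core estimate above gives a range, not a single number. A minimal sketch, with a hypothetical function name, that returns the optimistic and pessimistic core counts for a target crawl rate:

```python
import math


def cpc_core_range(docs_per_second):
    """(fewest, most) CPU cores for the CPC at 10 and 5 DPS per core."""
    return math.ceil(docs_per_second / 10), math.ceil(docs_per_second / 5)


print(cpc_core_range(50))  # (5, 10)
```

Where a deployment falls in that range depends on the factors listed above: document size and complexity, and feature extraction.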


Analytics Processing Component (APC):


CPU Load:  Driving Factors
  Number of items
  Site activity

Network Load:  Driving Factors
  Same as for CPU load
  Plus:  Network traffic increases when distributing APC across multiple machines

Disk Load:  Driving Factors
  Local disk used for temporary storage
  Bulk load, primary concern is load isolation



Search Administration Component:


Low CPU and network load
Load increases with more components in the search topology



Components:  Key takeaways:


Split bulk processing from query traffic:  

   Bulk:  Crawl, analytics, content processing
   Query traffic:  index and query processing


Two options for scaling:

   Scale up with more/faster hardware resources
   Scale out with more components across multiple machines


Avoid sharing critical resources: 

   Index is disk intensive and crucial in all load scenarios.
   Consider shared load on network, disk and CPU:
      Within a VM
      Between VMs on the same physical host




Small Search Topology Notes:

Windows Server 2012 can host all search components in one VM
The same applies to a physical deployment on Windows Server 2008 R2
Windows Server 2008 Hyper-V supports a maximum of 4 CPU cores per VM



High Availability for Search:

Content Side High Availability:  Full redundancy in the content feeding chain
Query Side High Availability:  Full redundancy of all query components
Disaster Recovery Options:  Hot, Warm or Cold.  Backup/Restore is now a best practice.



Fault-Tolerance:

Indexing Fault-Tolerance:  Journal sync
Query Processing Fault-Tolerance:  Load balancing (round robin, "lowest load", "sticky load")
Admin Fault-Tolerance:  Lease (expired?)
Database Fault-Tolerance:  Database and index files must be in sync.  Supported:  synchronous mirroring.  Not supported:  asynchronous modes and log shipping.



Backup and Restore:


Index designed for robust backup/restore
Everything but the index is in the database
"Point in time" backup:  No query down time
Backup/restore can make disaster recovery easier


Restore notes:

Restore the whole farm from a backup
Restore only the SSA:  The entire topology must be restored.  This can also replace an existing topology.

Only a single node failure?
   Add a new node to the farm
   Add the missing SSA components to that node via the topology cmdlets
   Remove the components of the dead node from the topology


NOTE:  Don't forget to recreate your Search Service Application Proxy!  Search will not work otherwise.

Disaster Recovery made easy:


Primary and DR should be as similar as possible:  Farm layout, hostnames, database version, directory locations
Test your recovery procedures:  don't wait for the failure!



