When Office SharePoint Server 2007 search crawls content, the crawler must identify each individual word in the content it finds. Languages that separate words with spaces make this relatively easy. In other languages, finding the boundaries between words can be more complex.

Office SharePoint Server 2007 provides word breakers and stemmers by default to help crawl and index content in many languages. Word breakers find word boundaries in full-text indexed data, while stemmers generate the inflected forms of a word, such as verb conjugations, so that different forms of the same word can be matched.
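To make the two roles concrete, here is a minimal Python sketch. It is not how Office SharePoint Server 2007 implements word breaking or stemming; the function names `break_words` and `toy_stem` and the suffix list are hypothetical and chosen only for illustration. The word breaker splits on non-word characters, and the stemmer strips a few English suffixes so that inflected forms map to a shared stem.

```python
import re

def break_words(text):
    """Naive word breaker: split at runs of non-word characters."""
    return [word.lower() for word in re.findall(r"\w+", text)]

# Hypothetical suffix list, for illustration only. Real stemmers are
# language-specific and far more sophisticated.
SUFFIXES = ("ing", "ed", "s")

def toy_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

words = break_words("The crawler indexed pages.")
stems = [toy_stem(w) for w in words]
```

This naive approach works only when words are delimited by spaces or punctuation. A sentence written without spaces, as is typical in Japanese, comes back from `break_words` as a single token, which is why such languages need a dedicated word breaker.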

If you crawl content in any of the following languages, Office SharePoint Server 2007 automatically uses the appropriate word breaker and stemmer for that language. An asterisk (*) indicates that the stemming feature is on by default for that language.

  • Arabic
  • Bengali
  • Bulgarian*
  • Catalan
  • Croatian
  • Czech*
  • Danish
  • Dutch
  • English
  • Finnish*
  • French*
  • German*
  • Greek*
  • Gujarati
  • Hebrew
  • Hindi
  • Hungarian*
  • Icelandic*
  • Indonesian
  • Italian
  • Japanese
  • Kannada*
  • Korean
  • Latvian*
  • Lithuanian*
  • Malay
  • Malayalam*
  • Marathi
  • Norwegian_Bokmaal
  • Polish*
  • Portuguese
  • Portuguese_Brazilian
  • Punjabi
  • Romanian*
  • Russian*
  • Serbian_Cyrillic*
  • Serbian_Latin*
  • Slovak*
  • Slovenian*
  • Spanish*
  • Swedish
  • Tamil*
  • Telugu*
  • Thai
  • Turkish*
  • Ukrainian*
  • Urdu*
  • Vietnamese

When the crawler indexes content in a language that is not supported, it uses the neutral word breaker. If the neutral word breaker does not give you the results you expect, you can try third-party solutions that work with Office SharePoint Server 2007.
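The selection and fallback behavior described above can be sketched as a simple lookup. This is an illustrative model only, not a SharePoint API: the dictionary `STEMMING_ON_BY_DEFAULT` and the function `select_components` are hypothetical names, the table holds just a few entries copied from the list above, and treating stemming as off for the neutral word breaker is an assumption.

```python
# A few entries from the language list above:
# language name -> whether stemming is on by default (asterisk in the list).
STEMMING_ON_BY_DEFAULT = {
    "English": False,
    "French": True,
    "German": True,
    "Japanese": False,
    "Spanish": True,
}

def select_components(language):
    """Return (word_breaker_name, stemming_enabled) for a language.

    Unsupported languages fall back to the neutral word breaker.
    Stemming is assumed to be off in that case.
    """
    if language in STEMMING_ON_BY_DEFAULT:
        return (language, STEMMING_ON_BY_DEFAULT[language])
    return ("Neutral", False)

supported = select_components("French")
fallback = select_components("Klingon")  # not in the table
```

A lookup like this can help when planning a deployment: listing the languages of your content and checking which ones fall through to the neutral word breaker shows where third-party components might be worth evaluating.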

As a best practice, install the appropriate word breaker and stemmer for each language that you need to support. Word breakers and stemmers must be installed on every server that runs the Office SharePoint Server Search service.

For more information about word breakers and stemmers, see Plan for multilingual sites.
