Word Breakers and Stemmers by Language in Office SharePoint Server 2007

When crawling content in Office SharePoint Server 2007 search, the crawler determines each individual word in the content it finds. Languages that have words separated by spaces make it relatively easy for the crawler to distinguish each word. In other languages, finding the boundary between words can be more complex.

Office SharePoint Server 2007 provides word breakers and stemmers by default to help crawl and index content in many languages. Word breakers find word boundaries in full-text indexed data, while stemmers generate the inflected forms of a word, such as the conjugations of a verb.
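To make the two roles concrete, here is a minimal Python sketch. It is not how Office SharePoint Server 2007 implements these components: the functions `break_words` and `toy_stem` and the `SUFFIXES` list are hypothetical names invented for illustration, and the stemmer uses simple suffix stripping rather than real linguistic rules.

```python
import re

def break_words(text):
    """Toy word breaker: split on runs of non-word characters.

    This only works for languages that separate words with spaces
    or punctuation.
    """
    return [w for w in re.split(r"\W+", text) if w]

# Hypothetical suffix list, for illustration only.
SUFFIXES = ("ing", "ed", "s")

def toy_stem(word):
    """Toy stemmer: strip one common English suffix, keeping at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

words = break_words("The crawler indexed the documents.")
print(words)                         # ['The', 'crawler', 'indexed', 'the', 'documents']
print([toy_stem(w) for w in words])  # ['the', 'crawler', 'index', 'the', 'document']

# A sentence written without spaces comes back as a single token,
# which is why such languages need a dedicated word breaker.
print(len(break_words("東京都に住んでいます")))  # 1
```

The last call shows the limitation described above: splitting on spaces and punctuation is enough for English, but it cannot find word boundaries in text that does not mark them.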

If you are crawling any of the languages listed below, Office SharePoint Server 2007 automatically uses the appropriate word breaker and stemmer for that language. An asterisk (*) indicates that the stemming feature is on by default.

  • Arabic
  • Bengali
  • Bulgarian*
  • Catalan
  • Croatian
  • Czech*
  • Danish
  • Dutch
  • English
  • Finnish*
  • French*
  • German*
  • Greek*
  • Gujarati
  • Hebrew
  • Hindi
  • Hungarian*
  • Icelandic*
  • Indonesian
  • Italian
  • Japanese
  • Kannada*
  • Korean
  • Latvian*
  • Lithuanian*
  • Malay
  • Malayalam*
  • Marathi
  • Norwegian_Bokmaal
  • Polish*
  • Portuguese
  • Portuguese_Brazilian
  • Punjabi
  • Romanian*
  • Russian*
  • Serbian_Cyrillic*
  • Serbian_Latin*
  • Slovak*
  • Slovenian*
  • Spanish*
  • Swedish
  • Tamil*
  • Telugu*
  • Thai
  • Turkish*
  • Ukrainian*
  • Urdu*
  • Vietnamese

When the crawler indexes content in a language that is not supported, it uses the neutral word breaker. If the neutral word breaker does not give you the results you expect, you can try third-party solutions that work with Office SharePoint Server 2007.
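The selection behavior described above can be sketched as a simple lookup with a fallback. This is an illustrative model only, not the product's API: `SUPPORTED` and `select_components` are hypothetical names, the table holds just a handful of entries copied from the list above, and `None` is used for the neutral case because this article does not say whether stemming applies there.

```python
# A few entries from the list above. True means the language is marked
# with an asterisk (stemming on by default); False means it is not.
SUPPORTED = {
    "English": False,
    "French": True,
    "German": True,
    "Japanese": False,
    "Spanish": True,
}

def select_components(language):
    """Return the word breaker to use and the default stemming state.

    Languages missing from the table fall back to the neutral word breaker.
    """
    if language in SUPPORTED:
        return {"word_breaker": language, "stemming_default": SUPPORTED[language]}
    # Stemming state for the neutral breaker is not stated in this article.
    return {"word_breaker": "neutral", "stemming_default": None}

print(select_components("French"))
print(select_components("Klingon"))
```

A lookup with an explicit default keeps the fallback path visible, which matters when you are checking whether content in a given language is likely to be indexed with language-specific rules.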

As a best practice, be sure that you install the appropriate word breaker and stemmer for each of the languages that you need to support. Word breakers and stemmers must be installed on all of the servers that are running the Office SharePoint Server Search service.

For more information about word breakers and stemmers, see Plan for multilingual sites.
