Wiki: Authoring Articles in Bidirectional Languages


A number of authors have published articles in bidirectional languages such as Persian, Arabic or Hebrew.

As of July 2013, there are limitations in the Wiki platform that require workarounds to improve the look of such articles. This article attempts to compile the best practices that the community has come up with.


What is Bidirectional Text

English text is always read from left to right (LTR). The general flow of Hebrew text is right to left (RTL): text is right justified, bullets appear to the right of headings, and so on. However, elements such as numbers or non-Hebrew words within it are still read left to right, so the direction of reading can change within a block of text. Hence the name "bidirectional text". The Wikipedia article on bidirectional text provides more detailed information.

Text is stored by computers as a stream of characters. The characters are stored in logical order, and the direction in which the text is actually rendered is typically determined at display time using an algorithm such as the Unicode Bidirectional Algorithm.
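
As a rough illustration of the point above, the two paragraphs in this sketch contain the same characters in the same logical order; only the base direction differs. The Hebrew sample text and the choice of the dir attribute on a p element are assumptions made for this example, not taken from an actual Wiki article.

```html
<!-- Same stored (logical) character sequence, two base directions. -->
<!-- Hypothetical sample text: "שלום עולם!" means "Hello world!" -->

<!-- Base direction LTR: the Hebrew words still run right to left, but the
     trailing "!" follows the paragraph direction and lands at the right end,
     which is the wrong side for a Hebrew reader. -->
<p dir="ltr">שלום עולם!</p>

<!-- Base direction RTL: the "!" is displayed at the left end of the line,
     after the last Hebrew word, as intended. -->
<p dir="rtl">שלום עולם!</p>
```

Opening the fragment in a browser should show the punctuation on opposite sides of the two lines, which is the kind of misplacement described in the symptoms later in this article.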


What are the challenges in displaying bidirectional text correctly on the TechNet Wiki


Title of article

Symptoms

Incorrect order of words and/or punctuation in the title of the article.
The title appears left justified above the article, rather than right justified like the body of the article.

Workaround

The title of the article is entered in a plain text field with no way to override the direction. Currently, the only workaround is to avoid mixing languages in the title.
There is currently no workaround to the incorrect justification of the title.

Body of article

Symptoms

Incorrect order of words and/or punctuation in the body of an article.

A text editor with full support for bidirectional script will typically allow you to "force" the reading direction for just a paragraph or even a group of words. The current Wiki editor does not have this functionality.

Workaround

The current workaround is to insert the correct direction information using the HTML editor.

Example

In one Persian article, the author uses the following style to ensure that the overall flow of the Persian text is right to left, with each paragraph right justified:
<p style="margin: 0cm 0cm 10pt; text-align: justify; unicode-bidi: embed; direction: rtl;" dir="RTL">
But sample code that should display as:

$ListOWKO = Get-ADObject (Get-ADRootDSE).DefaultNamingContext -Properties otherWellKnownObjects

appeared as:

ListOWKO = Get-ADObject (Get-ADRootDSE).DefaultNamingContext -Properties otherWellKnownObjects$

Workaround
Insert the proper text direction markers (LTR or RTL) in the HTML code manually:

<span style="color: #4f81bd;" dir="LTR"><strong>$ListOWKO = Get-ADObject (Get-ADRootDSE).DefaultNamingContext -Properties otherWellKnownObjects</strong></span><br />

This forces the English text within a Persian article to be laid out left to right, including its punctuation, rather than following the right-to-left direction of the surrounding text.

It also works the other way round if you want to force a whole paragraph to follow right-to-left rules, similar to the setting you can use in Word or other editors with full bidi support:
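
The original example for this case is not available here, so the following is only a minimal sketch of what such markup might look like in the HTML editor. The Persian sentence is placeholder text ("This is a sample paragraph."), and the style values are an assumption modeled on the paragraph style quoted earlier.

```html
<!-- Hypothetical sample: force a whole paragraph to right-to-left. -->
<p dir="rtl" style="direction: rtl; unicode-bidi: embed; text-align: right;">
  این یک پاراگراف نمونه است.
  <!-- Embedded code keeps its own left-to-right direction. -->
  <span dir="ltr">$ListOWKO = Get-ADObject</span>
</p>
```

Setting the direction on the paragraph and overriding it only on the embedded span keeps the two concerns separate: the paragraph decides the overall flow, and each span decides the flow of its own content.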

Encoding & Direction

Everything written here assumes one specific encoding: UTF-8. The explanation does not apply to other encodings, because different encodings behave differently with respect to text direction. For example, Hebrew encoded as ISO-8859-8 (nicknamed "Visual Hebrew") and Hebrew encoded as ISO-8859-8-i (nicknamed "Logical Hebrew") behave in opposite ways. With Visual Hebrew, the Hebrew word for "hello" will look like םולש, while with Logical Hebrew it will look like שלום. The order of the characters is completely reversed: Logical Hebrew keeps the characters in the order in which the word is normally written (שלום), while Visual Hebrew reverses them. There are several encodings for each language. For Hebrew, for example, you can use ISO-8859-8, ISO-8859-8-i, ISO-8859-8-e, Windows-1255, or a Unicode encoding such as UTF-8.

Web developers must remember to tell the browser which encoding to use to display the page. This is done with a meta tag like this:
<meta http-equiv="content-type" content="text/html;charset=iso-8859-8-i">
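
Since the rest of this article assumes UTF-8, a page following that assumption might be declared as in this minimal sketch. The title and body text are placeholders, and the lang and dir values on the html element are assumptions for a Hebrew page; the file itself also has to be saved as UTF-8 for the declaration to be truthful.

```html
<!DOCTYPE html>
<html lang="he" dir="rtl">
<head>
  <!-- HTML5 short form of the encoding declaration. -->
  <meta charset="utf-8">
  <!-- Placeholder title: "Example" -->
  <title>דוגמה</title>
</head>
<body>
  <!-- Placeholder text: "Hello world!" -->
  <p>שלום עולם!</p>
</body>
</html>
```

The longer form, using http-equiv="content-type" with content="text/html; charset=utf-8", declares the same encoding and may be preferable for older browsers.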

This site and all the Microsoft websites that I have checked use UTF-8 encoding.
* If you see gibberish text that cannot be read, your browser is probably trying to display the page using an incorrect encoding. You can change the browser encoding using this guide: https://support.google.com/news/answer/61689?hl=en

References

Farsi/Persian language problem with Wiki Editor - forum discussion started by Patris about formatting issues with mixed Persian/Latin text and formatting of title
Wiki Hebrew Plan - forum discussion started by Ronen about adding support for insertion of direction markers in the UI of the Wiki editor
Creating HTML Pages in Arabic, Hebrew and Other Right-to-left Scripts - Tutorial by Richard Ishida on the W3C site covering basics of using HTML to get correct formatting of bidi text
The bidi algorithm and inline markup - Another article by Richard Ishida on the W3C site


