### JavaScript port of Markus Kuhn's wcwidth() implementation

The following explanation comes from the original C implementation:

This is an implementation of wcwidth() and wcswidth() (defined in
IEEE Std 1003.1-2001) for Unicode.

http://www.opengroup.org/onlinepubs/007904975/functions/wcwidth.html
http://www.opengroup.org/onlinepubs/007904975/functions/wcswidth.html
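
The contracts of the two functions can be sketched in JavaScript. This is a minimal illustration, not the code of this port. The names mirror the C functions, but the signatures (a numeric code point, and a string plus an optional count) are assumptions for this sketch. Here `wcwidth` returns 0 for NUL, -1 for C0/C1 control characters and DEL, and 1 for everything else, deliberately ignoring wide and combining characters. `wcswidth` sums the widths of at most `n` code points and returns -1 if any of them is non-printable, as the POSIX pages above describe.

```javascript
// Simplified sketch: only NUL and control characters are handled.
// Wide (2-cell) and combining (0-cell) characters are ignored on purpose.
function wcwidth(codePoint) {
  if (codePoint === 0) return 0; // NUL occupies no columns
  // C0 controls, DEL and C1 controls are not printable
  if (codePoint < 0x20 || (codePoint >= 0x7f && codePoint < 0xa0)) return -1;
  return 1; // simplified: every other character takes one cell
}

// Sum the widths of at most `n` code points of `str`, returning -1 as soon
// as one of them is non-printable (the wcswidth(pwcs, n) contract).
function wcswidth(str, n = Infinity) {
  let width = 0;
  let count = 0;
  for (const ch of str) { // for...of iterates by code point, not UTF-16 unit
    if (count >= n) break;
    const w = wcwidth(ch.codePointAt(0));
    if (w < 0) return -1;
    width += w;
    count += 1;
  }
  return width;
}

console.log(wcswidth("hello"));     // 5
console.log(wcswidth("tab\there")); // -1 (TAB is a control character)
console.log(wcswidth("hello", 2));  // 2
```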

In fixed-width output devices, Latin characters all occupy a single
"cell" position of equal width, whereas ideographic CJK characters
occupy two such cells. Interoperability between terminal-line
applications and (teletype-style) character terminals using the
UTF-8 encoding requires agreement on how many cell positions each
character advances the cursor. No established formal standard
currently specifies how many cell positions each Unicode character
occupies on character terminals. These routines are a first attempt
at defining such behavior based on simple rules applied to data
provided by the Unicode Consortium.

For some graphical characters, the Unicode standard explicitly
defines a character-cell width via the East Asian Fullwidth (F),
Wide (W), Halfwidth (H), and Narrow (Na) classes. In all these
cases, there is no ambiguity about which width a terminal shall
use. For characters in the East Asian Ambiguous (A) class, the
width choice depends purely on whether backward compatibility with
historic CJK practice or with Western practice is preferred.
Choosing single width for these characters is easy to justify as
the appropriate long-term solution, because the CJK practice of
displaying them as double width stems from historic implementation
simplicity (8-bit encoded characters were displayed single width
and 16-bit ones double width, even for Greek, Cyrillic, etc.), not
from any typographic consideration.
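
The class-based rule can be sketched as a lookup from the East Asian Width property value to a cell count. This sketch assumes the class of each character is already known, for example from the Unicode Character Database file `EastAsianWidth.txt`. The function name and the `ambiguousWide` option are hypothetical. They only show where the Ambiguous (A) choice would plug in.

```javascript
// Hypothetical sketch: East Asian Width class -> number of terminal cells.
function cellsForEastAsianWidth(eaw, { ambiguousWide = false } = {}) {
  switch (eaw) {
    case "F":  // Fullwidth
    case "W":  // Wide
      return 2;
    case "H":  // Halfwidth
    case "Na": // Narrow
      return 1;
    case "A":  // Ambiguous: single width by default; double width only
               // when mimicking historic CJK practice
      return ambiguousWide ? 2 : 1;
    case "N":  // Neutral (Not East Asian): single width for simplicity
      return 1;
    default:
      throw new RangeError(`unknown East Asian Width class: ${eaw}`);
  }
}

console.log(cellsForEastAsianWidth("W"));                          // 2
console.log(cellsForEastAsianWidth("A"));                          // 1
console.log(cellsForEastAsianWidth("A", { ambiguousWide: true })); // 2
```

Keeping the Ambiguous decision behind a single flag reflects the text's point that it is a compatibility preference rather than something the Unicode data settles.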

Much less clear is the choice of width for the Not East Asian
(Neutral) class. Existing practice does not dictate a width for any
of these characters. It would nevertheless make sense
typographically to allocate two character cells to characters such
as EM SPACE or VOLUME INTEGRAL, which cannot be represented
adequately with a single-width glyph. These routines currently
assign a single-cell width to all Neutral characters, in the
interest of simplicity. This is not entirely satisfactory and
should be reconsidered before a formal standard is established in
this area. At the moment, the question of which Neutral characters
should be represented by double-width glyphs cannot be answered by
applying a simple rule to the Unicode database content. Setting up
a proper standard for the behavior of UTF-8 character terminals
will require a careful analysis not only of each Unicode character,
but also of each presentation form, something the author of these
routines has so far avoided doing.
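
One hypothetical way to leave room for the reconsideration suggested above is an explicit override list layered on top of the single-cell default. The set below contains only the two characters the text names: EM SPACE (U+2003) and VOLUME INTEGRAL (U+2230). Neither the set nor the `useOverrides` option is part of these routines, which always assign one cell to Neutral characters.

```javascript
// Hypothetical override list: Neutral characters a future rule might widen.
const NEUTRAL_DOUBLE_WIDTH = new Set([
  0x2003, // EM SPACE
  0x2230, // VOLUME INTEGRAL
]);

// Width of a character already known to be in the Neutral (N) class.
function neutralWidth(codePoint, { useOverrides = false } = {}) {
  if (useOverrides && NEUTRAL_DOUBLE_WIDTH.has(codePoint)) return 2;
  return 1; // default: single cell, in the interest of simplicity
}

console.log(neutralWidth(0x2230));                         // 1
console.log(neutralWidth(0x2230, { useOverrides: true })); // 2
```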

http://www.unicode.org/unicode/reports/tr11/

Markus Kuhn -- 2007-05-26 (Unicode 5.0)

Permission to use, copy, modify, and distribute this software
for any purpose and without fee is hereby granted. The author
disclaims all warranties with regard to this software.

Latest version: http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c