gxvalid: TrueType GX validator | |
============================== | |
1. What is this | |
--------------- | |
`gxvalid' is a module to validate TrueType GX tables: a collection of | |
additional tables in TrueType font which are used by `QuickDraw GX | |
Text', Apple Advanced Typography (AAT). In addition, gxvalid can | |
validates `kern' tables which have been extended for AAT. Like the | |
otvalid module, gxvalid uses Freetype 2's validator framework | |
(ftvalid). | |
You can link gxvalid with your program; before running your own layout | |
engine, gxvalid validates a font file. As the result, you can remove | |
error-checking code from the layout engine. It is also possible to | |
use gxvalid as a stand-alone font validator; the `ftvalid' test | |
program included in the ft2demo bundle calls gxvalid internally. | |
A stand-alone font validator may be useful for font developers. | |
This documents documents the following issues. | |
- supported TrueType GX tables | |
- fundamental validation limitations | |
- permissive error handling of broken GX tables | |
- `kern' table issue. | |
2. Supported tables | |
------------------- | |
The following GX tables are currently supported. | |
bsln | |
feat | |
just | |
kern(*) | |
lcar | |
mort | |
morx | |
opbd | |
prop | |
trak | |
The following GX tables are currently unsupported. | |
cvar | |
fdsc | |
fmtx | |
fvar | |
gvar | |
Zapf | |
The following GX tables won't be supported. | |
acnt(**) | |
hsty(***) | |
The following undocumented tables in TrueType fonts designed for Apple | |
platform aren't handled either. | |
addg | |
CVTM | |
TPNM | |
umif | |
*) The `kern' validator handles both the classic and the new kern | |
formats; the former is supported on both Microsoft and Apple | |
platforms, while the latter is supported on Apple platforms. | |
**) `acnt' tables are not supported by currently available Apple font | |
tools. | |
***) There is one more Apple extension, `hsty', but it is for | |
Newton-OS, not GX (Newton-OS is a platform by Apple, but it can | |
use sfnt- housed bitmap fonts only). Therefore, it should be | |
excluded from `Apple platform' in the context of TrueType. | |
gxvalid ignores it as Apple font tools do so. | |
We have checked 183 fonts bundled with MacOS 9.1, MacOS 9.2, MacOS | |
10.0, MacOS X 10.1, MSIE for MacOS, and AppleWorks 6.0. In addition, | |
we have checked 67 Dynalab fonts (designed for MacOS) and 189 Ricoh | |
fonts (designed for Windows and MacOS dual platforms). The number of | |
fonts including TrueType GX tables are as follows. | |
bsln: 76 | |
feat: 191 | |
just: 84 | |
kern: 59 | |
lcar: 4 | |
mort: 326 | |
morx: 19 | |
opbd: 4 | |
prop: 114 | |
trak: 16 | |
Dynalab and Ricoh fonts don't have GX tables except of `feat' and | |
`mort'. | |
3. Fundamental validation limitations | |
------------------------------------- | |
TrueType GX provides layout information to libraries for font | |
rasterizers and text layout. gxvalid can check whether the layout | |
data in a font is conformant to the TrueType GX format specified by | |
Apple. But gxvalid cannot check a how QuickDraw GX/AAT renderer uses | |
the stored information. | |
3-1. Validation of State Machine activity | |
----------------------------------------- | |
QuickDraw GX/AAT uses a `State Machine' to provide `stateful' layout | |
features, and TrueType GX stores the state transition diagram of | |
this `State Machine' in a `StateTable' data structure. While the | |
State Machine receives a series of glyph IDs, the State Machine | |
starts with `start of text' state, walks around various states and | |
generates various layout information to the renderer, and finally | |
reaches the `end of text' state. | |
gxvalid can check essential errors like: | |
- possibility of state transitions to undefined states | |
- existence of glyph IDs that the State Machine doesn't know how | |
to handle | |
- the State Machine cannot compute the layout information from | |
given diagram | |
These errors can be checked within finite steps, and without the | |
State Machine itself, because these are `expression' errors of state | |
transition diagram. | |
There is no limitation about how long the State Machine walks | |
around, so validation of the algorithm in the state transition | |
diagram requires infinite steps, even if we had a State Machine in | |
gxvalid. Therefore, the following errors and problems cannot be | |
checked. | |
- existence of states which the State Machine never transits to | |
- the possibility that the State Machine never reaches `end of | |
text' | |
- the possibility of stack underflow/overflow in the State Machine | |
(in ligature and contextual glyph substitutions, the State | |
Machine can store 16 glyphs onto its stack) | |
In addition, gxvalid doesn't check `temporary glyph IDs' used in the | |
chained State Machines (in `mort' and `morx' tables). If a layout | |
feature is implemented by a single State Machine, a glyph ID | |
converted by the State Machine is passed to the glyph renderer, thus | |
it should not point to an undefined glyph ID. But if a layout | |
feature is implemented by chained State Machines, a component State | |
Machine (if it is not the final one) is permitted to generate | |
undefined glyph IDs for temporary use, because it is handled by next | |
component State Machine and not by the glyph renderer. To validate | |
such temporary glyph IDs, gxvalid must stack all undefined glyph IDs | |
which can occur in the output of the previous State Machine and | |
search them in the `ClassTable' structure of the current State | |
Machine. It is too complex to list all possible glyph IDs from the | |
StateTable, especially from a ligature substitution table. | |
3-2. Validation of relationship between multiple layout features | |
---------------------------------------------------------------- | |
gxvalid does not validate the relationship between multiple layout | |
features at all. | |
If multiple layout features are defined in TrueType GX tables, | |
possible interactions, overrides, and conflicts between layout | |
features are implicitly given in the font too. For example, there | |
are several predefined spacing control features: | |
- Text Spacing (Proportional/Monospace/Half-width/Normal) | |
- Number Spacing (Monospaced-numbers/Proportional-numbers) | |
- Kana Spacing (Full-width/Proportional) | |
- Ideographic Spacing (Full-width/Proportional) | |
- CJK Roman Spacing (Half-width/Proportional/Default-roman | |
/Full-width-roman/Proportional) | |
If all layout features are independently managed, we can activate | |
inconsistent typographic rules like `Text Spacing=Monospace' and | |
`Ideographic Spacing=Proportional' at the same time. | |
The combinations of layout features is managed by a 32bit integer | |
(one bit each for selector setting), so we can define relationships | |
between up to 32 features, theoretically. But if one feature | |
setting affects another feature setting, we need typographic | |
priority rules to validate the relationship. Unfortunately, the | |
TrueType GX format specification does not give such information even | |
for predefined features. | |
4. Permissive error handling of broken GX tables | |
------------------------------------------------ | |
When Apple's font rendering system finds an inconsistency, like a | |
specification violation or an unspecified value in a TrueType GX | |
table, it does not always return error. In most cases, the rendering | |
engine silently ignores such wrong values or even whole tables. In | |
fact, MacOS is shipped with fonts including broken GX/AAT tables, but | |
no harmful effects due to `officially broken' fonts are observed by | |
end-users. | |
gxvalid is designed to continue the validation process as long as | |
possible. When gxvalid find wrong values, gxvalid warns it at least, | |
and takes a fallback procedure if possible. The fallback procedure | |
depends on the debug level. | |
We used the following three tools to investigate Apple's error handling. | |
- FontValidator (for MacOS 8.5 - 9.2) resource fork font | |
- ftxvalidator (for MacOS X 10.1 -) dfont or naked-sfnt | |
- ftxdumperfuser (for MacOS X 10.1 -) dfont or naked-sfnt | |
However, all tests were done on a PowerPC based Macintosh; at present, | |
we have not checked those tools on a m68k-based Macintosh. | |
In total, we checked 183 fonts bundled to MacOS 9.1, MacOS 9.2, MacOS | |
10.0, MacOS X 10.1, MSIE for MacOS, and AppleWorks 6.0. These fonts | |
are distributed officially, but many broken GX/AAT tables were found | |
by Apple's font tools. In the following, we list typical violation of | |
the GX specification, in fonts officially distributed with those Apple | |
systems. | |
4-1. broken BinSrchHeader (19/183) | |
---------------------------------- | |
`BinSrchHeader' is a header of a data array for m68k platforms to | |
access memory efficiently. Although there are only two independent | |
parameters for real (`unitSize' and `nUnits'), BinSrchHeader has | |
three additional parameters which can be calculated from `unitSize' | |
and `nUnits', for fast setup. Apple font tools ignore them | |
silently, so gxvalid warns if it finds and inconsistency, and always | |
continues validation. The additional parameters are ignored | |
regardless of the consistency. | |
19 fonts include such inconsistencies; all breaks are in the | |
BinSrchHeader structure of the `kern' table. | |
4-2. too-short LookupTable (5/183) | |
---------------------------------- | |
LookupTable format 0 is a simple array to get a value from a given | |
GID (glyph ID); the index of this array is a GID too. Therefore, | |
the length of the array is expected to be same as the maximum GID | |
value defined in the `maxp' table, but there are some fonts whose | |
LookupTable format 0 is too short to cover all GIDs. FontValidator | |
ignores this error silently, ftxvalidator and ftxdumperfuser both | |
warn and continue. Similar problems are found in format 3 subtables | |
of `kern'. gxvalid warns always and abort if the validation level | |
is set to FT_VALIDATE_PARANOID. | |
5 fonts include too-short kern format 0 subtables. | |
1 font includes too-short kern format 3 subtable. | |
4-3. broken LookupTable format 2 (1/183) | |
---------------------------------------- | |
LookupTable format 2, subformat 4 covers the GID space by a | |
collection of segments which are specified by `firstGlyph' and | |
`lastGlyph'. Some fonts store `firstGlyph' and `lastGlyph' in | |
reverse order, so the segment specification is broken. Apple font | |
tools ignore this error silently; a broken segment is ignored as if | |
it did not exist. gxvalid warns and normalize the segment at | |
FT_VALIDATE_DEFAULT, or ignore the segment at FT_VALIDATE_TIGHT, or | |
abort at FT_VALIDATE_PARANOID. | |
1 font includes broken LookupTable format 2, in the `just' table. | |
*) It seems that all fonts manufactured by ITC for AppleWorks have | |
this error. | |
4-4. bad bracketing in glyph property (14/183) | |
---------------------------------------------- | |
GX/AAT defines a `bracketing' property of the glyphs in the `prop' | |
table, to control layout features of strings enclosed inside and | |
outside of brackets. Some fonts give inappropriate bracket | |
properties to glyphs. Apple font tools warn about this error; | |
gxvalid warns too and aborts at FT_VALIDATE_PARANOID. | |
14 fonts include wrong bracket properties. | |
4-5. invalid feature number (117/183) | |
------------------------------------- | |
The GX/AAT extension can include 255 different layout features, but | |
popular layout features are predefined (see | |
http://developer.apple.com/fonts/Registry/index.html). Some fonts | |
include feature numbers which are incompatible with the predefined | |
feature registry. | |
In our survey, there are 140 fonts including `feat' table. | |
a) 67 fonts use a feature number which should not be used. | |
b) 117 fonts set the wrong feature range (nSetting). This is mostly | |
found in the `mort' and `morx' tables. | |
Apple font tools give no warning, although they cannot recognize | |
what the feature is. At FT_VALIDATE_DEFAULT, gxvalid warns but | |
continues in both cases (a, b). At FT_VALIDATE_TIGHT, gxvalid warns | |
and aborts for (a), but continues for (b). At FT_VALIDATE_PARANOID, | |
gxvalid warns and aborts in both cases (a, b). | |
4-6. invalid prop version (10/183) | |
---------------------------------- | |
As most TrueType GX tables, the `prop' table must start with a 32bit | |
version identifier: 0x00010000, 0x00020000 or 0x00030000. But some | |
fonts store nonsense binary data instead. When Apple font tools | |
find them, they abort the processing immediately, and the data which | |
follows is unhandled. gxvalid does the same. | |
10 fonts include broken `prop' version. | |
All of these fonts are classic TrueType fonts for the Japanese | |
script, manufactured by Apple. | |
4-7. unknown resource name (2/183) | |
------------------------------------ | |
NOTE: THIS IS NOT A TRUETYPE GX ERROR. | |
If a TrueType font is stored in the resource fork or in dfont | |
format, the data must be tagged as `sfnt' in the resource fork index | |
to invoke TrueType font handler for the data. But the TrueType font | |
data in `Keyboard.dfont' is tagged as `kbd', and that in | |
`LastResort.dfont' is tagged as `lst'. Apple font tools can detect | |
that the data is in TrueType format and successfully validate them. | |
Maybe this is possible because they are known to be dfont. The | |
current implementation of the resource fork driver of FreeType | |
cannot do that, thus gxvalid cannot validate them. | |
2 fonts use an unknown tag for the TrueType font resource. | |
5. `kern' table issues | |
---------------------- | |
In common terminology of TrueType, `kern' is classified as a basic and | |
platform-independent table. But there are Apple extensions of `kern', | |
and there is an extension which requires a GX state machine for | |
contextual kerning. Therefore, gxvalid includes a special validator | |
for `kern' tables. Unfortunately, there is no exact algorithm to | |
check Apple's extension, so gxvalid includes a heuristic algorithm to | |
find the proper validation routines for all possible data formats, | |
including the data format for Microsoft. By calling | |
classic_kern_validate() instead of gxv_validate(), you can specify the | |
`kern' format explicitly. However, current FreeType2 uses Microsoft | |
`kern' format only, others are ignored (and should be handled in a | |
library one level higher than FreeType). | |
5-1. History | |
------------ | |
The original 16bit version of `kern' was designed by Apple in the | |
pre-GX era, and it was also approved by Microsoft. Afterwards, | |
Apple designed a new 32bit version of the `kern' table. According | |
to the documentation, the difference between the 16bit and 32bit | |
version is only the size of variables in the `kern' header. In the | |
following, we call the original 16bit version as `classic', and | |
32bit version as `new'. | |
5-2. Versions and dialects which should be differentiated | |
--------------------------------------------------------- | |
The `kern' table consists of a table header and several subtables. | |
The version number which identifies a `classic' or a `new' version | |
is explicitly written in the table header, but there are | |
undocumented differences between Microsoft's and Apple's formats. | |
It is called a `dialect' in the following. There are three cases | |
which should be handled: the new Apple-dialect, the classic | |
Apple-dialect, and the classic Microsoft-dialect. An analysis of | |
the formats and the auto detection algorithm of gxvalid is described | |
in the following. | |
5-2-1. Version detection: classic and new kern | |
---------------------------------------------- | |
According to Apple TrueType specification, there are only two | |
differences between the classic and the new: | |
- The `kern' table header starts with the version number. | |
The classic version starts with 0x0000 (16bit), | |
the new version starts with 0x00010000 (32bit). | |
- In the `kern' table header, the number of subtables follows | |
the version number. | |
In the classic version, it is stored as a 16bit value. | |
In the new version, it is stored as a 32bit value. | |
From Apple font tool's output (DumpKERN is also tested in addition | |
to the three Apple font tools in above), there is another | |
undocumented difference. In the new version, the subtable header | |
includes a 16bit variable named `tupleIndex' which does not exist | |
in the classic version. | |
The new version can store all subtable formats (0, 1, 2, and 3), | |
but the Apple TrueType specification does not mention the subtable | |
formats available in the classic version. | |
5-2-2. Available subtable formats in classic version | |
---------------------------------------------------- | |
Although the Apple TrueType specification recommends to use the | |
classic version in the case if the font is designed for both the | |
Apple and Microsoft platforms, it does not document the available | |
subtable formats in the classic version. | |
According to the Microsoft TrueType specification, the subtable | |
format assured for Windows and OS/2 support is only subtable | |
format 0. The Microsoft TrueType specification also describes | |
subtable format 2, but does not mention which platforms support | |
it. Aubtable formats 1, 3, and higher are documented as reserved | |
for future use. Therefore, the classic version can store subtable | |
formats 0 and 2, at least. `ttfdump.exe', a font tool provided by | |
Microsoft, ignores the subtable format written in the subtable | |
header, and parses the table as if all subtables are in format 0. | |
`kern' subtable format 1 uses a StateTable, so it cannot be | |
utilized without a GX State Machine. Therefore, it is reasonable | |
to assume that format 1 (and 3) were introduced after Apple had | |
introduced GX and moved to the new 32bit version. | |
5-2-3. Apple and Microsoft dialects | |
----------------------------------- | |
The `kern' subtable has a 16bit `coverage' field to describe | |
kerning attributes, but bit interpretations by Apple and Microsoft | |
are different: For example, Apple uses bits 0-7 to identify the | |
subtable, while Microsoft uses bits 8-15. | |
In addition, due to the output of DumpKERN and FontValidator, | |
Apple's bit interpretations of coverage in classic and new version | |
are incompatible also. In summary, there are three dialects: | |
classic Apple dialect, classic Microsoft dialect, and new Apple | |
dialect. The classic Microsoft dialect and the new Apple dialect | |
are documented by each vendors' TrueType font specification, but | |
the documentation for classic Apple dialect is not available. | |
For example, in the new Apple dialect, bit 15 is documented as | |
`set to 1 if the kerning is vertical'. On the other hand, in | |
classic Microsoft dialect, bit 1 is documented as `set to 1 if the | |
kerning is horizontal'. From the outputs of DumpKERN and | |
FontValidator, classic Apple dialect recognizes 15 as `set to 1 | |
when the kerning is horizontal'. From the results of similar | |
experiments, classic Apple dialect seems to be the Endian reverse | |
of the classic Microsoft dialect. | |
As a conclusion it must be noted that no font tool can identify | |
classic Apple dialect or classic Microsoft dialect automatically. | |
5-2-4. gxvalid auto dialect detection algorithm | |
----------------------------------------------- | |
The first 16 bits of the `kern' table are enough to identify the | |
version: | |
- if the first 16 bits are 0x0000, the `kern' table is in | |
classic Apple dialect or classic Microsoft dialect | |
- if the first 16 bits are 0x0001, and next 16 bits are 0x0000, | |
the kern table is in new Apple dialect. | |
If the `kern' table is a classic one, the 16bit `coverage' field | |
is checked next. Firstly, the coverage bits are decoded for the | |
classic Apple dialect using the following bit masks (this is based | |
on DumpKERN output): | |
0x8000: 1=horizontal, 0=vertical | |
0x4000: not used | |
0x2000: 1=cross-stream, 0=normal | |
0x1FF0: reserved | |
0x000F: subtable format | |
If any of reserved bits are set or the subtable bits is | |
interpreted as format 1 or 3, we take it as `impossible in classic | |
Apple dialect' and retry, using the classic Microsoft dialect. | |
The most popular coverage in new Apple-dialect: 0x8000, | |
The most popular coverage in classic Apple-dialect: 0x0000, | |
The most popular coverage in classic Microsoft dialect: 0x0001. | |
5-3. Tested fonts | |
----------------- | |
We checked 59 fonts bundled with MacOS and 38 fonts bundled with | |
Windows, where all font include a `kern' table. | |
- fonts bundled with MacOS | |
* new Apple dialect | |
format 0: 18 | |
format 2: 1 | |
format 3: 1 | |
* classic Apple dialect | |
format 0: 14 | |
* classic Microsoft dialect | |
format 0: 15 | |
- fonts bundled with Windows | |
* classic Microsoft dialect | |
format 0: 38 | |
It looks strange that classic Microsoft-dialect fonts are bundled to | |
MacOS: they come from MSIE for MacOS, except of MarkerFelt.dfont. | |
ACKNOWLEDGEMENT | |
--------------- | |
Some parts of gxvalid are derived from both the `gxlayout' module and | |
the `otvalid' module. Development of gxlayout was supported by the | |
Information-technology Promotion Agency(IPA), Japan. | |
The detailed analysis of undefined glyph ID utilization in `mort' and | |
`morx' tables is provided by George Williams. | |
------------------------------------------------------------------------ | |
Copyright 2004-2015 by | |
suzuki toshiya, Masatake YAMATO, Red hat K.K., | |
David Turner, Robert Wilhelm, and Werner Lemberg. | |
This file is part of the FreeType project, and may only be used, | |
modified, and distributed under the terms of the FreeType project | |
license, LICENSE.TXT. By continuing to use, modify, or distribute this | |
file you indicate that you have read the license and understand and | |
accept it fully. | |
--- end of README --- |