Understanding Font Selection in iTextPDF HTML to PDF Conversion
When converting HTML to PDF using iTextPDF's html2pdf (version 4.0.5), font selection follows a two-step process based on the specified font-family in HTML styles.
Font Selection Process
- Font Matching by Family Name
  The converter compares each font in the font-family list with the available fonts and scores every candidate. The family-name check looks like this:

  if (!"".equals(fontFamily)
          && (null == fontInfo.getAlias()
                  && null != fontDescriptor.getFamilyNameLowerCase()
                  && fontDescriptor.getFamilyNameLowerCase().equals(fontFamily)
              || fontDescriptor.getFamilyNameLowerCase().startsWith(fontFamily.trim())
              || (null != fontInfo.getAlias()
                  && fontInfo.getAlias().toLowerCase().equals(fontFamily)))) {
      score += FONT_FAMILY_EQUALS_AWARD;
  }

  The highest-scoring available font is selected. For example, KaiTi_GB2312 will match if its lowercase family name equals the specified font-family value or starts with it.

- Character Compatibility Check
  Even after selecting a font, each character is verified against it:

  for (FontInfo f : selector.getFonts()) {
      int codePoint = isSurrogatePair(text, nextUnignorable)
              ? TextUtil.convertToUtf32(text, nextUnignorable)
              : (int) text.charAt(nextUnignorable);
      if (f.getFontUnicodeRange().contains(codePoint)) {
          PdfFont currentFont = getPdfFont(f);
          Glyph glyph = currentFont.getGlyph(codePoint);
          if (null != glyph && 0 != glyph.getCode()) {
              font = currentFont;
              break;
          }
      }
  }

  If a character (such as the non-breaking space, U+00A0, code point 160) isn't supported by the primary font, the converter falls back to another font (e.g., SimSun for Chinese text).
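The two steps above can be sketched in plain Java without any iText dependency. This is only an illustration of the idea, not the library's implementation: the `Font` class, the `FAMILY_MATCH_AWARD` value, and the claim that the sample KaiTi_GB2312 font lacks a glyph for U+00A0 are all assumptions made up for the example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.function.IntPredicate;
import java.util.stream.Collectors;

public class FontSelectionSketch {

    // Hypothetical stand-in for a registered font: a family name plus a
    // predicate telling whether the font has a glyph for a code point.
    public static final class Font {
        public final String family;
        public final IntPredicate hasGlyph;

        public Font(String family, IntPredicate hasGlyph) {
            this.family = family;
            this.hasGlyph = hasGlyph;
        }
    }

    // Illustrative award only; not the value of the library's constant.
    public static final int FAMILY_MATCH_AWARD = 10;

    // Step 1: award points when the lowercase family name equals, or starts
    // with, the requested name (startsWith also covers the equality case).
    public static int familyScore(Font font, String requestedFamily) {
        String requested = requestedFamily.trim().toLowerCase(Locale.ROOT);
        if (requested.isEmpty()) {
            return 0;
        }
        String family = font.family.toLowerCase(Locale.ROOT);
        return family.startsWith(requested) ? FAMILY_MATCH_AWARD : 0;
    }

    // Step 1, continued: put the best family match first. The sort is stable,
    // so fonts with equal scores keep their registration order.
    public static List<Font> rank(List<Font> fonts, String requestedFamily) {
        return fonts.stream()
                .sorted(Comparator.comparingInt(
                        (Font f) -> familyScore(f, requestedFamily)).reversed())
                .collect(Collectors.toList());
    }

    // Step 2: return the first ranked font that has a glyph for the code
    // point, or null when no font supports it.
    public static Font fontFor(int codePoint, List<Font> ranked) {
        for (Font f : ranked) {
            if (f.hasGlyph.test(codePoint)) {
                return f;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumption for the demo: this KaiTi font has no glyph for U+00A0.
        Font kaiTi = new Font("KaiTi_GB2312", cp -> cp != 0xA0);
        Font simSun = new Font("SimSun", cp -> true);

        List<Font> ranked = rank(Arrays.asList(simSun, kaiTi), "KaiTi_GB2312");
        System.out.println(fontFor(0x4E2D, ranked).family); // a Han character
        System.out.println(fontFor(0xA0, ranked).family);   // non-breaking space
    }
}
```

With these sample fonts, the Han character resolves to the requested family while the non-breaking space falls through to the next font in the ranked list, which mirrors the fallback behaviour described above.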
Script-Based Segmentation
Text is segmented by Unicode script type (e.g., HAN for Chinese, LATIN for English). Each segment uses a font that supports its script:
Character.UnicodeScript unicodeScript = nextSignificantUnicodeScript(nextUnignorable);
for (int i = nextUnignorable; i < text.length(); i++) {
    int codePoint = isSurrogatePair(text, i)
            ? TextUtil.convertToUtf32(text, i)
            : (int) text.charAt(i);
    Character.UnicodeScript currScript = Character.UnicodeScript.of(codePoint);
    if (isSignificantUnicodeScript(currScript) && currScript != unicodeScript) {
        break;
    }
    // Process compatible characters
}
This ensures mixed-script text (e.g., Chinese with Latin acronyms like "USDA") uses appropriate fonts for each segment.
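As a rough, self-contained illustration of script-based segmentation, the sketch below splits a string into runs using only `java.lang.Character.UnicodeScript`. The `ScriptSegmenter` class and its `isSignificant` rule (treating COMMON, INHERITED, and UNKNOWN as scripts that never start a new run) are assumptions for the example, not a copy of the library's logic.

```java
import java.lang.Character.UnicodeScript;
import java.util.ArrayList;
import java.util.List;

public class ScriptSegmenter {

    // Assumed rule: spaces, digits, punctuation and combining marks
    // (COMMON / INHERITED / UNKNOWN) stay attached to the current run.
    public static boolean isSignificant(UnicodeScript script) {
        return script != UnicodeScript.COMMON
                && script != UnicodeScript.INHERITED
                && script != UnicodeScript.UNKNOWN;
    }

    // Splits text into runs; a new run starts whenever a significant script
    // differs from the script of the run being built.
    public static List<String> segment(String text) {
        List<String> runs = new ArrayList<>();
        int start = 0;
        UnicodeScript runScript = null; // unknown until a significant char is seen
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i); // handles surrogate pairs
            UnicodeScript script = UnicodeScript.of(codePoint);
            if (isSignificant(script)) {
                if (runScript != null && script != runScript) {
                    runs.add(text.substring(start, i));
                    start = i;
                }
                runScript = script;
            }
            i += Character.charCount(codePoint);
        }
        if (start < text.length()) {
            runs.add(text.substring(start));
        }
        return runs;
    }

    public static void main(String[] args) {
        // Five Han characters followed by the Latin acronym "USDA".
        String mixed = "\u7f8e\u56fd\u519c\u4e1a\u90e8USDA";
        for (String run : segment(mixed)) {
            System.out.println(run);
        }
    }
}
```

Each run returned by `segment` could then be handed to a per-character font lookup, so the Han run and the Latin run can end up with different fonts.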