
Understanding Font Selection in iTextPDF HTML to PDF Conversion

Tech Apr 24 10

When converting HTML to PDF using iTextPDF's html2pdf (version 4.0.5), font selection follows a two-step process based on the specified font-family in HTML styles.

Font Selection Process

Font Matching by Family Name

The converter compares each name in the font-family list against the available fonts and awards a score to every font whose family name or alias matches:

if (!"".equals(fontFamily) 
    && (null == fontInfo.getAlias() 
    && null != fontDescriptor.getFamilyNameLowerCase() 
    && fontDescriptor.getFamilyNameLowerCase().equals(fontFamily) 
    || fontDescriptor.getFamilyNameLowerCase().startsWith(fontFamily.trim())
    || (null != fontInfo.getAlias() && fontInfo.getAlias().toLowerCase().equals(fontFamily)))) {
    score += FONT_FAMILY_EQUALS_AWARD;
}

The highest-scoring available font is selected. For example, KaiTi_GB2312 matches when its lowercase family name equals (or, in the condition above, starts with) the requested name, or when its alias equals the requested name.
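To make the scoring rule easier to experiment with, here is a minimal sketch in plain Java that imitates the condition above without depending on iText. CandidateFont, FamilyMatchSketch, and the award value of 10 are hypothetical names and numbers chosen for illustration. The sketch also normalizes case and whitespace up front and null-guards the prefix test, which the excerpt does not do.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a registered font; this is not an iText class.
class CandidateFont {
    final String familyName; // may be null
    final String alias;      // may be null

    CandidateFont(String familyName, String alias) {
        this.familyName = familyName;
        this.alias = alias;
    }
}

class FamilyMatchSketch {
    // Illustrative value only; the real FONT_FAMILY_EQUALS_AWARD is defined inside iText.
    static final int FAMILY_AWARD = 10;

    // Imitates the three branches of the condition above: exact family match
    // (only when no alias is set), family prefix match, or alias match.
    static int score(CandidateFont font, String requestedFamily) {
        String wanted = requestedFamily.trim().toLowerCase(Locale.ROOT);
        if (wanted.isEmpty()) {
            return 0;
        }
        String family = font.familyName == null
                ? null
                : font.familyName.toLowerCase(Locale.ROOT);
        boolean exact = font.alias == null && family != null && family.equals(wanted);
        // Assumption: the prefix test is null-guarded here, unlike the excerpt.
        boolean prefix = family != null && family.startsWith(wanted);
        boolean aliasHit = font.alias != null
                && font.alias.toLowerCase(Locale.ROOT).equals(wanted);
        return (exact || prefix || aliasHit) ? FAMILY_AWARD : 0;
    }

    // Returns the highest-scoring font, or null when nothing matches.
    // Ties keep the font that was registered first.
    static CandidateFont best(List<CandidateFont> fonts, String requestedFamily) {
        CandidateFont winner = null;
        int winnerScore = 0;
        for (CandidateFont font : fonts) {
            int s = score(font, requestedFamily);
            if (s > winnerScore) {
                winner = font;
                winnerScore = s;
            }
        }
        return winner;
    }
}
```

With SimSun and KaiTi_GB2312 registered, calling FamilyMatchSketch.best(fonts, "KaiTi_GB2312") returns the KaiTi entry, and an unregistered family such as "Arial" returns null. In the real converter, other criteria contribute to the score as well; this sketch covers only the family-name award.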

Character Compatibility Check

Even after selecting a font, the converter verifies each character individually to confirm the font can render it:

for (FontInfo f : selector.getFonts()) {
    int codePoint = isSurrogatePair(text, nextUnignorable)
          ? TextUtil.convertToUtf32(text, nextUnignorable)
          : (int) text.charAt(nextUnignorable);

    if (f.getFontUnicodeRange().contains(codePoint)) {
        PdfFont currentFont = getPdfFont(f);
        Glyph glyph = currentFont.getGlyph(codePoint);
        if (null != glyph && 0 != glyph.getCode()) {
            font = currentFont;
            break;
        }
    }
}

If the primary font has no glyph for a character (for example the non-breaking space &nbsp;, U+00A0, decimal 160), the converter falls back to the next font that does (e.g., SimSun for Chinese text).
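The fallback behaviour can be sketched in plain Java as a loop over fonts in priority order that stops at the first font claiming a glyph for the code point. CoverageFont and GlyphFallbackSketch are hypothetical names, and the coverage predicates in the usage note are made up for illustration; they say nothing about what the real KaiTi_GB2312 or SimSun files contain.

```java
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical model of a font: a name plus a test for "has a glyph for this code point".
class CoverageFont {
    final String name;
    final IntPredicate hasGlyph;

    CoverageFont(String name, IntPredicate hasGlyph) {
        this.name = name;
        this.hasGlyph = hasGlyph;
    }
}

class GlyphFallbackSketch {
    // Walks the fonts in priority order and returns the name of the first one
    // that can render the code point, or null when none can.
    static String pickFont(int codePoint, List<CoverageFont> fontsInPriorityOrder) {
        for (CoverageFont font : fontsInPriorityOrder) {
            if (font.hasGlyph.test(codePoint)) {
                return font.name;
            }
        }
        return null;
    }
}
```

For example, if the first font is modelled as covering only U+4E00 to U+9FFF and the second as covering that range plus U+00A0, then pickFont(0x4E2D, fonts) returns the first font's name while pickFont(0xA0, fonts) returns the second's. A per-character check like this is why one paragraph can end up rendered in more than one font.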

Script-Based Segmentation

Text is segmented by Unicode script type (e.g., HAN for Chinese, LATIN for English). Each segment uses a font that supports its script:

Character.UnicodeScript unicodeScript = nextSignificantUnicodeScript(nextUnignorable);
for (int i = nextUnignorable; i < text.length(); i++) {
    int codePoint = isSurrogatePair(text, i) 
          ? TextUtil.convertToUtf32(text, i) 
          : (int) text.charAt(i);
    Character.UnicodeScript currScript = Character.UnicodeScript.of(codePoint);
    if (isSignificantUnicodeScript(currScript) && currScript != unicodeScript) {
        break;
    }
    // Process compatible characters
}

This ensures mixed-script text (e.g., Chinese with Latin acronyms like "USDA") uses appropriate fonts for each segment.
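The segmentation idea can be tried outside the converter with the JDK's own Character.UnicodeScript. The sketch below is a simplified illustration, not iText's implementation: ScriptRunSketch and its method names are hypothetical, and it assumes that COMMON, INHERITED, and UNKNOWN are the scripts treated as not significant, so spaces, punctuation, and digits stay attached to the run they appear in.

```java
import java.lang.Character.UnicodeScript;
import java.util.ArrayList;
import java.util.List;

class ScriptRunSketch {
    // Assumption: these three scripts never start a new run on their own.
    static boolean isSignificant(UnicodeScript script) {
        return script != UnicodeScript.COMMON
                && script != UnicodeScript.INHERITED
                && script != UnicodeScript.UNKNOWN;
    }

    // Splits text into runs, starting a new run whenever the significant script changes.
    static List<String> splitByScript(String text) {
        List<String> runs = new ArrayList<>();
        int runStart = 0;
        UnicodeScript runScript = null; // no significant script seen yet in this run
        int i = 0;
        while (i < text.length()) {
            int codePoint = text.codePointAt(i); // handles surrogate pairs
            UnicodeScript script = UnicodeScript.of(codePoint);
            if (isSignificant(script)) {
                if (runScript == null) {
                    runScript = script;
                } else if (script != runScript) {
                    runs.add(text.substring(runStart, i));
                    runStart = i;
                    runScript = script;
                }
            }
            i += Character.charCount(codePoint);
        }
        if (runStart < text.length()) {
            runs.add(text.substring(runStart));
        }
        return runs;
    }
}
```

Given a string of Han characters with "USDA" embedded in the middle, splitByScript yields three runs: the leading Han text, "USDA", and the trailing Han text. Each run could then be handed to a font lookup of its own.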
