Fading Coder

Understanding Font Selection in iTextPDF HTML to PDF Conversion

When converting HTML to PDF with iTextPDF's html2pdf (version 4.0.5), font selection follows a two-step process driven by the font-family declared in the HTML's styles.

Font Selection Process

  1. Font Matching by Family Name. The converter compares each name in the font-family list against the available fonts and awards a score to every font that satisfies this condition:

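    // Award points when the requested family matches this font. Because &&
    // binds tighter than ||, the inner condition reads as (A) || (B) || (C):
    //   A: no alias is set and the lowercase family name equals fontFamily
    //   B: the lowercase family name starts with the trimmed fontFamily
    //   C: an alias is set and the lowercased alias equals fontFamily
    if (!"".equals(fontFamily)
        && ((null == fontInfo.getAlias()
                && null != fontDescriptor.getFamilyNameLowerCase()
                && fontDescriptor.getFamilyNameLowerCase().equals(fontFamily))
            || fontDescriptor.getFamilyNameLowerCase().startsWith(fontFamily.trim())
            || (null != fontInfo.getAlias() && fontInfo.getAlias().toLowerCase().equals(fontFamily)))) {
        score += FONT_FAMILY_EQUALS_AWARD;
    }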
    

    The highest-scoring available font is selected. For example, KaiTi_GB2312 earns the award when the requested font-family value equals its lowercase family name.

  2. Character Compatibility Check. Even after selecting a font, the converter verifies that each character can actually be rendered with it:

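    // Try the candidate fonts in order until one can draw the character.
    for (FontInfo f : selector.getFonts()) {
        // Read a full code point, combining a surrogate pair when present.
        int codePoint = isSurrogatePair(text, nextUnignorable)
              ? TextUtil.convertToUtf32(text, nextUnignorable)
              : (int) text.charAt(nextUnignorable);

        // Skip fonts whose declared Unicode range excludes the code point.
        if (f.getFontUnicodeRange().contains(codePoint)) {
            PdfFont currentFont = getPdfFont(f);
            Glyph glyph = currentFont.getGlyph(codePoint);
            // A glyph code of 0 means the font has no real glyph for it.
            if (null != glyph && 0 != glyph.getCode()) {
                font = currentFont;
                break;
            }
        }
    }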
    

    If the primary font has no glyph for a character (such as the non-breaking space, U+00A0, decimal 160), the converter falls back to the next font that does (e.g., SimSun for Chinese text).
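The two steps above can be sketched in plain Java without any iText dependency. This is a simplified model, not the library's implementation: the `Font` class, the `rank` and `pick` methods, and the glyph coverage in `main` are all hypothetical, and the family comparison models only the exact alias-or-family-name match, not the prefix match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified model of the two steps above; these are not iText classes.
class FontSelectionSketch {

    // Hypothetical stand-in for a registered font.
    static final class Font {
        final String family;        // family name from the font file
        final String alias;         // optional alias, may be null
        final Set<Integer> glyphs;  // code points the font can draw

        Font(String family, String alias, Integer... codePoints) {
            this.family = family;
            this.alias = alias;
            this.glyphs = new HashSet<>(Arrays.asList(codePoints));
        }
    }

    // Step 1 (simplified): does this font match the requested family name?
    // The alias is compared when one is set, otherwise the family name.
    static boolean matchesFamily(String requested, Font font) {
        String wanted = requested.trim().toLowerCase(Locale.ROOT);
        if (wanted.isEmpty()) {
            return false;
        }
        String name = (font.alias != null) ? font.alias : font.family;
        return name.toLowerCase(Locale.ROOT).equals(wanted);
    }

    // Orders fonts so that matches for earlier font-family entries come first,
    // followed by every remaining font as a last-resort fallback.
    static List<Font> rank(List<String> fontFamilies, List<Font> available) {
        List<Font> ranked = new ArrayList<>();
        for (String family : fontFamilies) {
            for (Font font : available) {
                if (matchesFamily(family, font) && !ranked.contains(font)) {
                    ranked.add(font);
                }
            }
        }
        for (Font font : available) {
            if (!ranked.contains(font)) {
                ranked.add(font);
            }
        }
        return ranked;
    }

    // Step 2: the first ranked font that has a glyph for the code point wins.
    static Font pick(int codePoint, List<Font> ranked) {
        for (Font font : ranked) {
            if (font.glyphs.contains(codePoint)) {
                return font;
            }
        }
        return null; // no registered font can draw this character
    }

    public static void main(String[] args) {
        // Glyph coverage below is invented for the demonstration.
        Font kai = new Font("KaiTi_GB2312", null, 0x4E2D);
        Font sim = new Font("SimSun", null, 0x4E2D, 0x00A0);
        List<Font> ranked = rank(Arrays.asList("kaiti_gb2312"), Arrays.asList(sim, kai));

        System.out.println(pick(0x4E2D, ranked).family); // KaiTi_GB2312
        System.out.println(pick(0x00A0, ranked).family); // SimSun
    }
}
```

In this sketch the Han character U+4E2D is drawn with the requested font, while the non-breaking space falls through to the next font that covers it, which is the behavior described in step 2.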

Script-Based Segmentation

Text is segmented by Unicode script type (e.g., HAN for Chinese, LATIN for English). Each segment uses a font that supports its script:

    // Script of the first significant character in the current segment.
    Character.UnicodeScript unicodeScript = nextSignificantUnicodeScript(nextUnignorable);
    for (int i = nextUnignorable; i < text.length(); i++) {
        // Read a full code point, combining a surrogate pair when present.
        int codePoint = isSurrogatePair(text, i)
              ? TextUtil.convertToUtf32(text, i)
              : (int) text.charAt(i);
        Character.UnicodeScript currScript = Character.UnicodeScript.of(codePoint);
        // A different significant script ends the current segment.
        if (isSignificantUnicodeScript(currScript) && currScript != unicodeScript) {
            break;
        }
        // Process compatible characters
    }

This ensures mixed-script text (e.g., Chinese with Latin acronyms like "USDA") uses appropriate fonts for each segment.
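To make the segmentation concrete, here is a minimal standalone sketch built only on the JDK's `Character.UnicodeScript`. The class and method names are hypothetical, and the rule that COMMON, INHERITED, and UNKNOWN do not count as significant scripts is an assumption made for this example rather than something confirmed by the snippet above.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone illustration of splitting text into same-script runs.
class ScriptSegmentationSketch {

    // Assumption: spaces, digits, punctuation (COMMON), combining marks
    // (INHERITED) and unassigned code points (UNKNOWN) never start a segment.
    static boolean isSignificant(Character.UnicodeScript script) {
        return script != Character.UnicodeScript.COMMON
                && script != Character.UnicodeScript.INHERITED
                && script != Character.UnicodeScript.UNKNOWN;
    }

    // Splits text wherever the significant script changes.
    static List<String> segment(String text) {
        List<String> segments = new ArrayList<>();
        int start = 0;
        Character.UnicodeScript current = null; // script of the open segment
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i); // handles surrogate pairs
            Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
            if (isSignificant(script)) {
                if (current != null && script != current) {
                    // Close the previous run and open a new one here.
                    segments.add(text.substring(start, i));
                    start = i;
                }
                current = script;
            }
            i += Character.charCount(codePoint);
        }
        if (start < text.length()) {
            segments.add(text.substring(start));
        }
        return segments;
    }

    public static void main(String[] args) {
        // Five Han characters followed by a Latin acronym.
        List<String> parts = segment("\u7F8E\u56FD\u519C\u4E1A\u90E8USDA");
        System.out.println(parts.size() + " segments, last = " + parts.get(parts.size() - 1));
    }
}
```

For the sample string this produces two segments, the Han run and "USDA", so a font lookup can then be done once per segment. Neutral characters such as spaces and digits stay attached to the segment that precedes them.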
