Pitfall: Editing a PDF with multiple CIDFont subsets causes missing characters.
Cause: The added text uses glyphs not present in any existing subset (+f1 .. +f6).
Fix: Subset the missing glyphs into a new subset (+f7), or embed the full font.
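Before choosing between a new subset and a full embed, it can help to know exactly which characters are uncovered. The sketch below is a minimal illustration, not a PDF parser: the `subset_coverage` dictionary and the `missing_characters` helper are hypothetical, and in practice the coverage sets would have to be built from each subset's /ToUnicode CMap or embedded font program.

```python
def missing_characters(new_text, subset_coverage):
    """Return the characters of new_text that no existing subset covers.

    subset_coverage maps a subset label (e.g. "+f1") to the set of
    characters that subset can render. Both are illustrative inputs.
    """
    covered = set()
    for chars in subset_coverage.values():
        covered.update(chars)

    # Keep first-seen order and report each missing character once.
    seen = set()
    missing = []
    for ch in new_text:
        if ch not in covered and ch not in seen:
            seen.add(ch)
            missing.append(ch)
    return missing


# Hypothetical coverage for two of the existing subsets.
coverage = {"+f1": set("漢字"), "+f2": set("日本語")}
print(missing_characters("日本語の漢字", coverage))  # ['の']
```

Any character this returns would need to go into the new subset (+f7) before the edited text can render.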
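To inspect the font objects, unpack the PDF with qpdf and search for the CIDFont entries (the -a flag makes grep treat the file as text, since a QDF file can still contain binary stream data):

    qpdf --qdf --object-streams=disable document.pdf unpacked.pdf
    grep -a -A5 "/CIDFont" unpacked.pdf

You will see something like:

Example simplified PDF object: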
    12 0 obj
    <<
      /Type /Font
      /Subtype /CIDFontType0
      /BaseFont /AAAAAA+NotoSansCJK
      /CIDSystemInfo << /Registry (Adobe) /Ordering (Identity) /Supplement 0 >>
      /FontDescriptor 13 0 R
      /DW 1000
      /W [ 1 [500] 2 [600] ]
    >>
    endobj

Pitfall: Text extraction returns garbled CJK text.
Cause: Using +f1's CMap incorrectly, for example applying it to text drawn with a different subset.
Fix: Ensure your extractor uses the CMap referenced in the PDF for each font (usually /Encoding /Identity-H on the parent Type0 font, with that font's /ToUnicode CMap supplying the Unicode values).
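To make the extraction pitfall concrete, here is a small sketch of how a per-font /ToUnicode CMap turns character codes into text. It rests on several assumptions: the CMap stream is already uncompressed (as in QDF output), only `beginbfchar` sections are handled (`beginbfrange` is ignored), and the `sample` CMap is invented for illustration. The helper names `parse_bfchar` and `decode` are hypothetical.

```python
import re

# One bfchar entry: <source code> <destination UTF-16BE value>
BFCHAR_ENTRY = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")


def parse_bfchar(cmap_text):
    """Map character codes to Unicode strings from beginbfchar sections."""
    mapping = {}
    for section in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for code_hex, uni_hex in BFCHAR_ENTRY.findall(section):
            # Destination values are UTF-16BE code units.
            text = bytes.fromhex(uni_hex).decode("utf-16-be")
            mapping[int(code_hex, 16)] = text
    return mapping


def decode(codes, mapping):
    """Decode codes with one font's mapping; unknown codes become U+FFFD."""
    return "".join(mapping.get(code, "\ufffd") for code in codes)


# Invented ToUnicode fragment for a single subset.
sample = """
2 beginbfchar
<0001> <6F22>
<0002> <5B57>
endbfchar
"""
table = parse_bfchar(sample)
print(decode([1, 2], table))  # 漢字
```

Because each subset numbers its glyphs independently, code 1 in another subset would normally mean a different character, which is why a mapping table must never be shared across subsets.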
Would you like a Python script example that iterates through all CIDFont subsets in a PDF and reports their original font names and glyph counts?
If you are working on a specific PDF with f1…f6 and need to reduce or analyze them, tools like cpdf (Coherent PDF), HexaPDF (Ruby), or PyMuPDF (Python) give programmatic control.
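For a quick look without any of those libraries, the QDF text produced by qpdf can be scanned with plain Python. This is only a rough sketch: it assumes subset fonts follow the usual six-uppercase-letter tag convention (such as AAAAAA+NotoSansCJK), it reports tags and base names but not glyph counts, and `list_subsets` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Subset tag: six uppercase letters, "+", then the PostScript font name.
BASEFONT = re.compile(r"/BaseFont\s*/([A-Z]{6})\+([^\s/<>\[\]()]+)")


def list_subsets(qdf_text):
    """Group subset tags by the original font name they were cut from."""
    subsets = defaultdict(set)
    for tag, name in BASEFONT.findall(qdf_text):
        # A set removes repeats when the same /BaseFont appears twice.
        subsets[name].add(tag)
    return {name: sorted(tags) for name, tags in subsets.items()}


# Invented sample resembling lines from an unpacked (QDF) file.
sample = """
/BaseFont /AAAAAA+NotoSansCJK
/BaseFont /BBBBBB+NotoSansCJK
/BaseFont /AAAAAA+NotoSansCJK
"""
print(list_subsets(sample))
# {'NotoSansCJK': ['AAAAAA', 'BBBBBB']}
```

To run it on a real file, read unpacked.pdf in binary mode and decode it as latin-1 before passing the text in, since the file may contain non-text stream bytes.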