wkhtmltopdf converts HTML content into PDF format, which can be a simple and effective way to generate PDF files.
However, wkhtmltopdf sometimes generates a blank last page, depending on the original content.
It may not be obvious which changes are needed in the original content to avoid this problem.
A viable workaround could be to post-process the PDF file and remove blank pages.
I decided to investigate how to do this.
PDF files contain objects, including media, page content, page definitions and a page index (the page tree, in the terms of the PDF specification).
Here’s an example of the page index:
3 0 obj << /Type /Pages /Kids [ 6 0 R 14 0 R 19 0 R 24 0 R ] /Count 4 /ProcSet [/PDF /Text /ImageB /ImageC] >> endobj
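The page index can be recognized by the identifier /Type /Pages. Its /Kids array lists references to the pages in order, and /Count holds the number of pages.
Objects are numbered with an object number and a generation number. In this case the page index itself was object 3 0, while the last page, the final entry in /Kids, was object 24 0.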
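Locating the page index and reading its /Kids and /Count entries could be sketched in Python as follows. This is a minimal sketch assuming, as in the wkhtmltopdf output above, that objects are stored as plain text (not inside compressed object streams) and that the page tree is flat, so every /Kids entry is a page. The name find_page_index is hypothetical.

```python
import re

# An indirect reference such as "24 0 R" (object number, generation number).
REF = re.compile(rb"(\d+)\s+(\d+)\s+R")

def find_page_index(pdf: bytes):
    """Return (object number, generation, kid references, count) for the first
    object whose dictionary contains /Type /Pages, or None if there is none."""
    for m in re.finditer(rb"(\d+)\s+(\d+)\s+obj(.*?)endobj", pdf, re.S):
        body = m.group(3)
        # \b keeps "/Type /Page" (a single page) from matching.
        if re.search(rb"/Type\s*/Pages\b", body):
            kids = re.search(rb"/Kids\s*\[([^\]]*)\]", body).group(1)
            count = int(re.search(rb"/Count\s+(\d+)", body).group(1))
            refs = [(int(num), int(gen)) for num, gen in REF.findall(kids)]
            return int(m.group(1)), int(m.group(2)), refs, count
    return None
```

For the page index shown above it returns (3, 0, [(6, 0), (14, 0), (19, 0), (24, 0)], 4); the last tuple in the list identifies the last page.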
Searched for the last page (24 0 obj) and found:
24 0 obj << /Type /Page /Parent 3 0 R /Contents 25 0 R /Resources 27 0 R /Annots 28 0 R /MediaBox [0 0 612 792] >> endobj
The page definition contained a reference to the actual content (/Contents 25 0 R).
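Following a reference from the page definition to the object it points to could look like this. get_object and contents_ref are hypothetical helper names, and the regular expressions assume plain-text objects like the ones shown here.

```python
import re
from typing import Optional, Tuple

def get_object(pdf: bytes, num: int, gen: int = 0) -> bytes:
    """Return the text between 'num gen obj' and 'endobj'."""
    # (?<!\d) stops "4 0 obj" from matching inside "24 0 obj".
    m = re.search(rb"(?<!\d)%d\s+%d\s+obj(.*?)endobj" % (num, gen), pdf, re.S)
    if m is None:
        raise KeyError(f"object {num} {gen} not found")
    return m.group(1)

def contents_ref(page_body: bytes) -> Optional[Tuple[int, int]]:
    """Return (object number, generation) of a single /Contents reference.

    /Contents may also be an array of references; this sketch returns None then."""
    m = re.search(rb"/Contents\s+(\d+)\s+(\d+)\s+R", page_body)
    return (int(m.group(1)), int(m.group(2))) if m else None
```

Applied to the page object above, contents_ref(get_object(pdf, 24)) gives (25, 0).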
Searched for the content (25 0 obj) and found:
25 0 obj << /Length 26 0 R /Filter /FlateDecode >> stream ... (encoded content left out) ... endstream endobj
Searched for the content length (26 0 obj) and found:
26 0 obj 203 endobj
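/Length can be a direct integer or, as here, an indirect reference to another object (26 0 R). A sketch that handles both forms, under the same plain-text assumption, follows; stream_length is a hypothetical name. Note that /Length counts the bytes of the stream as stored in the file, so for a /FlateDecode stream it is the compressed size.

```python
import re

def stream_length(pdf: bytes, num: int, gen: int = 0) -> int:
    """Return the /Length of the stream object 'num gen obj'."""
    # The stream dictionary: "num gen obj << ... >> stream".
    m = re.search(rb"(?<!\d)%d\s+%d\s+obj\s*<<(.*?)>>\s*stream" % (num, gen), pdf, re.S)
    if m is None:
        raise KeyError(f"stream object {num} {gen} not found")
    stream_dict = m.group(1)
    # Check the indirect form first, otherwise "/Length 26 0 R" would read as 26.
    ind = re.search(rb"/Length\s+(\d+)\s+(\d+)\s+R", stream_dict)
    if ind:
        target = re.search(rb"(?<!\d)%s\s+%s\s+obj\s*(\d+)\s*endobj"
                           % (ind.group(1), ind.group(2)), pdf)
        if target is None:
            raise KeyError("length object not found")
        return int(target.group(1))
    return int(re.search(rb"/Length\s+(\d+)", stream_dict).group(1))
```

For the objects shown above, stream_length(pdf, 25) returns 203.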
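A small content length was a good indication that the last page was blank. The length counts the bytes of the stream as stored in the file (here compressed with /FlateDecode), and 203 bytes was not a lot of content, so I decided to remove the last page from the page index.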
(A nicer but much more complicated solution would be to decode the content, analyze it and determine whether the page was blank.)
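The decoding half of that more complicated approach could start from something like the sketch below, which inflates a /FlateDecode stream with Python's zlib module and looks for operators that show text (Tj, TJ) or paint an image or form XObject (Do). This is a rough heuristic and an assumption, not part of the approach described here: it deliberately ignores vector graphics, since a blank page may still paint a white background, and it does not handle other filters or the ' and " text operators. The function names are hypothetical.

```python
import re
import zlib

# Operators that show text (Tj, TJ) or paint an XObject (Do). Vector graphics
# are ignored on purpose: a blank page may still fill a white rectangle.
MARKING_OPS = re.compile(rb"(?<![A-Za-z])(?:Tj|TJ|Do)(?![A-Za-z])")

def decode_flate_stream(obj_body: bytes) -> bytes:
    """Inflate the data between the 'stream' and 'endstream' keywords.

    Slicing by /Length would be more robust than searching for 'endstream'."""
    m = re.search(rb"stream\r?\n(.*?)endstream", obj_body, re.S)
    if m is None:
        raise ValueError("no stream found")
    # decompressobj keeps trailing end-of-line bytes in unused_data
    # instead of treating them as an error.
    return zlib.decompressobj().decompress(m.group(1))

def looks_blank(obj_body: bytes) -> bool:
    """Heuristic: True if the decoded content shows no text and no XObjects."""
    return MARKING_OPS.search(decode_flate_stream(obj_body)) is None
```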
Modified the page index by removing the last page (24 0 R) from /Kids and decrementing /Count, like this:
3 0 obj << /Type /Pages /Kids [ 6 0 R 14 0 R 19 0 R ] /Count 3 /ProcSet [/PDF /Text /ImageB /ImageC] >> endobj
This was enough to remove the blank page from the PDF file.
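The edit to the page index could be automated roughly as below. Removing " 24 0 R" makes object 3 shorter, which shifts the byte offsets of everything after it, and the cross-reference table records those offsets. Viewers typically repair a slightly wrong cross-reference table on their own, but this sketch avoids relying on that by padding the edited object with spaces to its original length. drop_last_kid is a hypothetical name, and the regular expressions assume the plain-text layout shown above.

```python
import re

def drop_last_kid(pdf: bytes, index_num: int, index_gen: int = 0) -> bytes:
    """Remove the last /Kids entry of the page index object and decrement /Count.

    The edited object is padded with spaces to its original length so the byte
    offsets recorded in the cross-reference table stay correct."""
    m = re.search(rb"(?<!\d)%d\s+%d\s+obj.*?endobj" % (index_num, index_gen), pdf, re.S)
    if m is None:
        raise KeyError("page index object not found")
    old = m.group(0)
    # Drop the final "N G R" reference before the closing "]" of /Kids.
    new = re.sub(rb"(/Kids\s*\[[^\]]*?)\s*\d+\s+\d+\s+R\s*\]", rb"\1 ]", old, count=1)
    pages = int(re.search(rb"/Count\s+(\d+)", new).group(1))
    new = re.sub(rb"/Count\s+\d+", b"/Count %d" % (pages - 1), new, count=1)
    if len(new) > len(old):
        raise ValueError("edited object grew; cannot pad it")
    # Put the padding just before "endobj" so the object stays well-formed.
    new = new[:-len(b"endobj")] + b" " * (len(old) - len(new)) + b"endobj"
    return pdf[:m.start()] + new + pdf[m.end():]
```

Applied to the original page index with drop_last_kid(pdf, 3), it produces the /Kids [ 6 0 R 14 0 R 19 0 R ] and /Count 3 shown above, followed by padding before endobj.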
Verified that the modified PDF looked as expected in:
- Adobe Acrobat Reader DC
- Sumatra PDF
- Evince Document Viewer
(If the pages had displayed a total page count, such as a "page 1 of 4" footer, additional modifications might have been needed after removing the blank page.)
A program could use this technique to remove a blank last page from a PDF file:
- Search the page index (/Type /Pages) for the last page, which is the final entry in /Kids.
- Find the last page.
- Find the content it references (/Contents).
- Find the content length (/Length, which may refer to a separate object).
- If the content length is small, remove the last page from the page index and decrement the total page count (/Count).
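Putting the steps together, a small command-line program might look like the sketch below. It assumes plain-text objects (no compressed object streams), a flat page tree and a single /Contents reference per page. MAX_BLANK_LENGTH is a hypothetical threshold that would need tuning for real content, and remove_blank_last_page is a hypothetical name. The objects belonging to the removed page are left in the file unreferenced, which is harmless but does not shrink the file.

```python
import re
import sys

# Hypothetical threshold (an assumption to tune): content streams at or below
# this many stored bytes are treated as blank. The blank page above used 203.
MAX_BLANK_LENGTH = 300

def get_object(pdf, num, gen=0):
    """Return the match object for 'num gen obj ... endobj'."""
    m = re.search(rb"(?<!\d)%d\s+%d\s+obj(.*?)endobj" % (num, gen), pdf, re.S)
    if m is None:
        raise KeyError(f"object {num} {gen} not found")
    return m

def ref(body, key):
    """Return (num, gen) for an indirect reference such as '/Contents 25 0 R'."""
    m = re.search(rb"/%s\s+(\d+)\s+(\d+)\s+R" % key, body)
    return (int(m.group(1)), int(m.group(2))) if m else None

def remove_blank_last_page(pdf):
    """Return pdf with a small last page removed, or pdf unchanged."""
    # 1. Search the page index (/Type /Pages) for the last page.
    index = next((m for m in re.finditer(rb"(\d+)\s+(\d+)\s+obj(.*?)endobj", pdf, re.S)
                  if re.search(rb"/Type\s*/Pages\b", m.group(3))), None)
    if index is None:
        return pdf
    kids_array = re.search(rb"/Kids\s*\[([^\]]*)\]", index.group(3))
    kids = re.findall(rb"(\d+)\s+(\d+)\s+R", kids_array.group(1)) if kids_array else []
    if len(kids) < 2:
        return pdf  # never remove the only page
    # 2. Find the last page.
    last = get_object(pdf, int(kids[-1][0]), int(kids[-1][1]))
    # 3. Find the content (a single /Contents reference is assumed).
    contents = ref(last.group(1), b"Contents")
    if contents is None:
        return pdf
    # 4. Find the content length, which may be direct or an indirect reference.
    stream_dict = get_object(pdf, *contents).group(1)
    length_ref = ref(stream_dict, b"Length")
    if length_ref:
        length = int(get_object(pdf, *length_ref).group(1).strip())
    else:
        length = int(re.search(rb"/Length\s+(\d+)", stream_dict).group(1))
    # 5. If the content is small, drop the last /Kids entry and decrement /Count.
    if length > MAX_BLANK_LENGTH:
        return pdf
    old = index.group(0)
    new = re.sub(rb"(/Kids\s*\[[^\]]*?)\s*\d+\s+\d+\s+R\s*\]", rb"\1 ]", old, count=1)
    pages = int(re.search(rb"/Count\s+(\d+)", new).group(1))
    new = re.sub(rb"/Count\s+\d+", b"/Count %d" % (pages - 1), new, count=1)
    # Pad before 'endobj' so the byte offsets in the cross-reference table stay valid.
    new = new[:-len(b"endobj")] + b" " * (len(old) - len(new)) + b"endobj"
    return pdf[:index.start()] + new + pdf[index.end():]

if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python remove_blank_page.py input.pdf output.pdf (hypothetical script name)
    with open(sys.argv[1], "rb") as f:
        data = f.read()
    with open(sys.argv[2], "wb") as f:
        f.write(remove_blank_last_page(data))
```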