Removing blank pages from PDF files

wkhtmltopdf converts HTML content into PDF format, which can be a simple and effective way to generate PDF files.

However, wkhtmltopdf sometimes generates a blank last page, depending on the original content.

 

It may not be obvious which changes are needed in the original content to avoid this problem.

A viable workaround would be to post-process the PDF files and remove the blank pages.

I decided to investigate how to do this.

 

PDF files contain objects, including media, page content, page definitions and a page index.

Here’s an example of the page index:

3 0 obj
<<
/Type /Pages
/Kids
[
6 0 R
14 0 R
19 0 R
24 0 R
]
/Count 4
/ProcSet [/PDF /Text /ImageB /ImageC]
>>
endobj

 

The page index can be recognized by the identifier: /Type /Pages

Objects are numbered. In this case the page index itself was object 3 0 (object number 3, generation 0), while the last page was object 24 0.
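As a rough illustration, the page index can be located and its page references listed with a few regular expressions. This is a minimal sketch, not a PDF parser: the function name find_page_index is made up for this example, and it assumes the objects are stored uncompressed (as in the listing above) and that the first /Type /Pages object found is the one of interest.

```python
import re

def find_page_index(pdf: bytes):
    """Return (object number, [page object numbers]) for the first
    /Type /Pages object, or None if there is none.

    Assumption: objects are plain text in the file, not packed into
    compressed object streams.
    """
    # Each object looks like "N G obj ... endobj".
    for m in re.finditer(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", pdf, re.DOTALL):
        body = m.group(3)
        if re.search(rb"/Type\s*/Pages\b", body):
            kids = re.search(rb"/Kids\s*\[(.*?)\]", body, re.DOTALL)
            # Every entry in /Kids is an indirect reference such as "24 0 R".
            refs = re.findall(rb"(\d+)\s+\d+\s+R", kids.group(1)) if kids else []
            return int(m.group(1)), [int(r) for r in refs]
    return None
```

For the page index shown above, the last element of the returned list would be the object number of the last page.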

 

I searched for the last page (24 0 obj) and found:

24 0 obj
<<
/Type /Page
/Parent 3 0 R
/Contents 25 0 R
/Resources 27 0 R
/Annots 28 0 R
/MediaBox [0 0 612 792]
>>
endobj

 

The page definition contained a reference to the actual content.

I searched for the content (25 0 obj) and found:

25 0 obj
<<
/Length 26 0 R
/Filter /FlateDecode
>>
stream
... (encoded content left out) ...
endstream
endobj

 

I searched for the content length (26 0 obj) and found:

26 0 obj
203
endobj

 

A small content length was a good indication that the last page was blank. 203 bytes was not a lot of content, so I decided to remove the last page from the page index.

(A nicer but much more complicated solution would be to decode the content, analyze it and determine whether the page was blank or not.)
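To give an idea of what decoding could look like: /Filter /FlateDecode means the stream is zlib-compressed, so Python's zlib module can decompress it. The check below is only a crude heuristic of my own, not a real content analysis. It treats a page as blank when the decoded content contains no text-showing operator (Tj, TJ) and no XObject painting operator (Do). The function name looks_blank is hypothetical, and a page that only draws lines or shapes would wrongly count as blank.

```python
import re
import zlib

def looks_blank(stream_data: bytes) -> bool:
    """Heuristic: True if a FlateDecode content stream shows no text
    and paints no XObject (such as an image).

    stream_data is the raw bytes between "stream" and "endstream".
    Assumption: the only filter applied is /FlateDecode.
    """
    content = zlib.decompress(stream_data)
    # Tj and TJ show text; Do paints an external object such as an image.
    # The lookarounds keep names like /Do1 from matching.
    painting = re.search(rb"(?<![\w/])(Tj|TJ|Do)(?!\w)", content)
    return painting is None
```

A string that happens to contain " Tj " inside its text would also trigger a match, which errs on the side of keeping the page.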

 

I modified the page index by removing the original last page and decrementing the count, like this:

3 0 obj
<<
/Type /Pages
/Kids
[
6 0 R
14 0 R
19 0 R
]
/Count 3
/ProcSet [/PDF /Text /ImageB /ImageC]
>>
endobj

 

This was enough to remove the blank page from the PDF file.

I verified that the modified PDF displayed as expected in:

  • Adobe Acrobat Reader DC
  • Sumatra PDF
  • Evince Document Viewer
  • Firefox
  • Chrome

(If the PDF had contained a visible total page count, additional modifications might have been needed after removing the blank page.)

Conclusion

A program could use this technique to remove a blank last page in a PDF file:

  1. Search the page index (/Type /Pages) for the reference to the last page.
  2. Find the last page's definition.
  3. Find the content object that the page refers to.
  4. Find the content length.
  5. If the content length is small, then remove the last page from the page index and decrement the total page count.
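
The steps above could be sketched in Python roughly as follows. This is an illustration under several assumptions, not a tested tool: the function name remove_blank_last_page and the 300-byte threshold are made up, objects are assumed to be stored uncompressed, the page tree is assumed to be flat, and /Contents is assumed to be a single reference. One addition compared with the manual edit: the rewritten page index is padded with spaces so the file keeps its size and the byte offsets in the cross-reference table stay valid.

```python
import re

# Matches "N G obj ... endobj"; group 3 is the text between "obj" and "endobj".
OBJ = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)

def remove_blank_last_page(pdf: bytes, max_length: int = 300) -> bytes:
    """Drop the last page from the page index if its content is short.

    Returns the input unchanged when a required piece is missing.
    max_length is a hypothetical threshold chosen for this sketch.
    """
    objects = {int(m.group(1)): m for m in OBJ.finditer(pdf)}

    # 1. Search the page index for the reference to the last page.
    index = next((m for m in objects.values()
                  if re.search(rb"/Type\s*/Pages\b", m.group(3))), None)
    kids = index and re.search(rb"/Kids\s*\[(.*?)\]", index.group(3), re.DOTALL)
    refs = list(re.finditer(rb"(\d+)\s+\d+\s+R\s*", kids.group(1))) if kids else []
    if len(refs) < 2:  # no page index found, or only one page
        return pdf
    last = refs[-1]

    # 2. Find the last page. 3. Find the content.
    page = objects.get(int(last.group(1)))
    contents = page and re.search(rb"/Contents\s+(\d+)\s+\d+\s+R", page.group(3))
    stream = contents and objects.get(int(contents.group(1)))

    # 4. Find the content length: a direct number or a reference like "26 0 R".
    length = stream and re.search(rb"/Length\s+(\d+)(\s+\d+\s+R)?", stream.group(3))
    if not length:
        return pdf
    value = int(length.group(1))
    if length.group(2):
        target = objects.get(value)
        if target is None:
            return pdf
        value = int(target.group(3).strip())

    # 5. If the length is small, remove the page and decrement the count.
    if value > max_length:
        return pdf
    body = index.group(3)
    start = kids.start(1) + last.start()
    end = kids.start(1) + last.end()
    new_body = body[:start] + body[end:]
    new_body = re.sub(rb"/Count\s+(\d+)",
                      lambda m: b"/Count %d" % (int(m.group(1)) - 1),
                      new_body, count=1)
    # Pad with spaces so every other object keeps its byte offset.
    new_body += b" " * (len(body) - len(new_body))
    return pdf[:index.start(3)] + new_body + pdf[index.end(3):]
```

It would be used by reading the file in binary mode, passing the bytes to the function and writing the result to a new file. The removed page's objects stay in the file as unreferenced data, just as in the manual edit.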

Error when copying large files to USB memory stick

While copying files to a new USB memory stick I encountered unexpected error messages in Windows.

 

Robocopy failed with:

2017/05/04 21:45:38 ERROR 87 (0x00000057) Copying File f:\Transport\Test\4GB.txt

The parameter is incorrect.

 

Cmd copy failed with:

The parameter is incorrect.

 

PowerShell Copy-Item failed with:

Copy-Item : The parameter is incorrect.
At line:1 char:10
+ Copy-Item <<<<  .\4GB.txt G:\
+ CategoryInfo          : NotSpecified: (:) [Copy-Item], IOException
+ FullyQualifiedErrorId : System.IO.IOException,Microsoft.PowerShell.Commands.CopyItemCommand

 

Windows Explorer copy and paste failed with:

The file '4GB.txt' is too large for the destination file system.

 

This prompted me to check the file system, which was FAT32.

FAT32 cannot store a file of 4 GB or larger (the maximum file size is 4 GiB minus 1 byte), a limitation I have run into before.

As before, I reformatted the USB memory stick to NTFS, which solved the problem.
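
A script that copies files to a FAT32 drive could check the size up front instead of relying on the rather unhelpful "The parameter is incorrect" message. A minimal sketch, with hypothetical names FAT32_MAX_FILE_SIZE and fits_on_fat32; it only compares sizes and does not detect which file system the destination actually uses:

```python
import os

# Largest file FAT32 can hold: 4 GiB minus 1 byte.
FAT32_MAX_FILE_SIZE = 2**32 - 1

def fits_on_fat32(size_in_bytes: int) -> bool:
    """Return True if a file of this size is within the FAT32 per-file limit."""
    return size_in_bytes <= FAT32_MAX_FILE_SIZE

def file_fits_on_fat32(path: str) -> bool:
    """Same check for an existing file, using its size on disk."""
    return fits_on_fat32(os.path.getsize(path))
```

A file of exactly 4 GiB, which the name 4GB.txt suggests, is one byte over the limit and would be rejected by this check.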