A compilation of documentation

How to manipulate pages of PDF documents with pdftk

Author: Xavier Béguin

A freeware graphical tool and a paid version of the original PDFtk also exist for Windows, but this article only describes the command-line version, which its publisher calls PDFtk Server. More precisely, the examples below use the Java port of pdftk, usually available on GNU/Linux distributions as the package pdftk-java.

Basic syntax

The basic syntax for using the command is the following:

pdftk <input files> <operation> [<operation arguments>] [output <output file>]
  • <input files> is the list of the input files, but can also be - to read the input from the standard input, or PROMPT to have pdftk prompt the user for the name(s) of the input file(s);
  • <operation arguments> is optional, as some operations do not require arguments;
  • output <output file> is optional with some operations, but mandatory with others. <output file> is usually the name of the output file. It can also be a printf-style format string to name the files resulting from the split of a document with the burst operation. Or it can be - to print the result to the standard output, or PROMPT to have pdftk prompt the user for the name of the output file.
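
As a small illustration of the - placeholders, here is a minimal sketch of a pipeline-friendly call wrapped in a shell function. The function name first_page_from_stdin is hypothetical; the command itself only combines the - input and output forms described above with the cat operation covered in the next section.

```shell
# Hypothetical helper: read a PDF on standard input and write its first
# page to standard output. Both "-" placeholders come from the syntax above.
first_page_from_stdin() {
    pdftk - cat 1 output -
}

# Possible usage:
#   first_page_from_stdin < input.pdf > first_page.pdf
```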

Extract pages

Pages are extracted with the cat operation.

To extract page 3 of the document input.pdf and write it to the document page3.pdf, use the following command:

pdftk input.pdf cat 3 output page3.pdf

Several pages can also be extracted at once. They will be concatenated into the output document in the specified order. This example extracts pages 2, 3, 4 and 6 from input.pdf into result.pdf:

pdftk input.pdf cat 2-4 6 output result.pdf

Here are a few other examples of page ranges to illustrate the possibilities:

  • 1-21 : pages 1 to 21 (included);
  • 4-10 5-7 8 : pages 4 to 10, then pages 5 to 7 again, then page 8 again;
  • end-2 : pages in reverse order, from the last page down to the second page;
  • end-2odd : same as above, but only the odd-numbered pages are extracted;
  • 1-4 5 5 4 : pages 1 to 4, then page 5 twice, then page 4 again;
  • r1 : the last page of the input document (same as end);
  • r3-r5 : the third-to-last page down to the fifth-to-last page of the input document.
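
To make the range notation more concrete, the following sketch expands a single page range into the page numbers it would select for a document of a given length. It is only an illustrative model written for this article, not pdftk's own parser: the function names expand_range and resolve_page are hypothetical, and it assumes well-formed input limited to the forms listed above (N, N-M, end, rN, and an optional even or odd suffix).

```shell
# Illustrative model only (not pdftk's parser).
# Usage: expand_range <range> <total number of pages>
expand_range() {
    range=$1 total=$2 qualifier=
    # Split off an optional even/odd suffix.
    case $range in
        *even) qualifier=even; range=${range%even} ;;
        *odd)  qualifier=odd;  range=${range%odd} ;;
    esac
    # A bare qualifier (e.g. "odd") applies to the whole document.
    [ -n "$range" ] || range=1-end
    case $range in
        *-*) first=${range%%-*}; last=${range#*-} ;;
        *)   first=$range; last=$range ;;
    esac
    first=$(resolve_page "$first" "$total")
    last=$(resolve_page "$last" "$total")
    # A range whose start is after its end is walked backwards.
    step=1
    [ "$first" -le "$last" ] || step=-1
    page=$first result=
    while :; do
        case $qualifier in
            even) [ $((page % 2)) -eq 0 ] && result="$result $page" ;;
            odd)  [ $((page % 2)) -eq 1 ] && result="$result $page" ;;
            *)    result="$result $page" ;;
        esac
        [ "$page" -eq "$last" ] && break
        page=$((page + step))
    done
    # Unquoted on purpose: drops the leading space.
    echo $result
}

# Turn "end", "rN" or a plain number into an absolute page number.
resolve_page() {
    case $1 in
        end) echo "$2" ;;
        r*)  echo $(( $2 - ${1#r} + 1 )) ;;
        *)   echo "$1" ;;
    esac
}

# Possible usage:
#   expand_range 2-4 10       prints: 2 3 4
#   expand_range end-2odd 7   prints: 7 5 3
#   expand_range r3-r5 10     prints: 8 7 6
```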

Concatenate several full documents

To concatenate all the pages of the two documents beginning.pdf and ending.pdf into the document result.pdf, run:

pdftk beginning.pdf ending.pdf cat output result.pdf
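
The same command accepts more than two input files. As a sketch, the hypothetical helper concat_pdfs below takes the output name first so that any number of inputs can follow, which makes it convenient to use with a shell glob.

```shell
# Hypothetical helper: concatenate any number of PDFs, in the order given,
# into the file named by the first argument.
concat_pdfs() {
    output_file=$1
    shift
    pdftk "$@" cat output "$output_file"
}

# Possible usage (the file names are only examples):
#   concat_pdfs book.pdf chapter_*.pdf
```

Keep in mind that the shell expands a glob in lexicographic order, so chapter_10.pdf would come before chapter_2.pdf unless the numbers are zero-padded.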

Concatenate specific pages from different documents

You can choose specific pages from the input documents and concatenate them. In this case, assign each input file a handle (one or more upper-case letters of your choosing), then prefix each page range with the handle of the file it applies to.

For example, to use pages 1, 2, 3 and 6 from a document named input1.pdf, page 2 from input2.pdf, all pages starting from page 4 from input3.pdf, and write all these pages to a document named result.pdf, you would run:

pdftk A=input1.pdf B=input2.pdf C=input3.pdf cat A1-3 A6 B2 C4-end output result.pdf

A similar example would be to replace page 4 of the document input1.pdf with the next-to-last page of input2.pdf (r2 is the second page from the end of the document):

pdftk FIRST=input1.pdf SECOND=input2.pdf cat FIRST1-3 SECONDr2 FIRST5-end output result.pdf

Note that you can also use the keywords even or odd, alone or as a suffix of a page range to respectively select even or odd pages from that range (see the examples of page ranges above).
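
The page replacement shown above can be generalized with a small wrapper. This is only a sketch: the function name replace_page and its argument order are hypothetical, and it assumes the replaced page is not the last page of the document (otherwise the trailing range would start past the end).

```shell
# Hypothetical helper.
# Usage: replace_page <doc> <page number> <donor doc> <donor page> <output>
# Assumption: <page number> is not the last page of <doc>.
replace_page() {
    doc=$1 page=$2 donor=$3 donor_page=$4 output_file=$5
    before=
    # Pages before the replaced one (none when replacing page 1).
    [ "$page" -gt 1 ] && before="A1-$((page - 1))"
    # $before is intentionally unquoted so that it vanishes when empty.
    pdftk A="$doc" B="$donor" cat $before "B$donor_page" "A$((page + 1))-end" \
        output "$output_file"
}

# Possible usage, equivalent to the FIRST/SECOND example above:
#   replace_page input1.pdf 4 input2.pdf r2 result.pdf
```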

Split all the pages of a document into different files

Use the burst operation to split a document into one file per page. By default, the files are named pg_0001.pdf, pg_0002.pdf, etc.:

pdftk input.pdf burst

The names of the destination files can be chosen by providing the keyword output and a printf-style format string. If you want the page files to be named page_01.pdf, page_02.pdf, etc., you can use:

pdftk input.pdf burst output page_%02d.pdf

The default format used for page filenames is pg_%04d.pdf.
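
To check a format string before running burst, you can preview the names it yields with the shell's own printf. The helper name preview_burst_names is hypothetical, and the sketch assumes that pdftk expands %d conversions with the page number the same way the shell's printf does.

```shell
# Hypothetical helper: print the file names a burst format string would
# give for the listed page numbers.
# Usage: preview_burst_names <format> <page number>...
preview_burst_names() {
    format=$1
    shift
    # printf reuses the format for each remaining argument.
    printf "$format\n" "$@"
}

# Possible usage:
#   preview_burst_names 'page_%02d.pdf' 1 2 10
# prints:
#   page_01.pdf
#   page_02.pdf
#   page_10.pdf
```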

Collate pages from documents

If you scanned the odd pages of a document separately from the even pages (for example using a scan feeder that only scans one side of a sheet), the action shuffle can be used to easily reassemble them into a new document.

As the pdftk(1) manual page puts it, shuffle “works like the cat operation except that it takes one page at a time from each page range to assemble the output PDF”.

To collate all the pages of the input documents, you do not need to specify a range, and you could simply write:

pdftk odd.pdf even.pdf shuffle output output.pdf

But if odd.pdf was scanned in reverse order, you can simply specify a reversed range for it, like this:

pdftk O=odd.pdf E=even.pdf shuffle Oend-1 E output output.pdf

In this case, shuffle will produce the document output.pdf by writing, in order:

  • the last page of odd.pdf;
  • then the first page of even.pdf;
  • then the next-to-last page of odd.pdf;
  • then the second page of even.pdf;
  • etc.

If you want to test this functionality, remember you can extract the odd pages or the even pages from a document using cat:

pdftk input.pdf cat odd output odd.pdf
pdftk input.pdf cat even output even.pdf
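
The two collation variants can be gathered in one wrapper. This is a sketch only: the function name collate_scans and its optional fourth argument are hypothetical, while the pdftk command lines it builds are the ones shown in this section.

```shell
# Hypothetical helper.
# Usage: collate_scans <odd pages> <even pages> <output> [forward|reversed]
# The optional last argument says how the odd pages were scanned.
collate_scans() {
    odd_file=$1 even_file=$2 output_file=$3 odd_order=${4:-forward}
    odd_range=O
    # Read the odd pages from last to first if they were scanned backwards.
    [ "$odd_order" = reversed ] && odd_range=Oend-1
    pdftk O="$odd_file" E="$even_file" shuffle "$odd_range" E output "$output_file"
}

# Possible usage:
#   collate_scans odd.pdf even.pdf output.pdf
#   collate_scans odd.pdf even.pdf output.pdf reversed
```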

Rotate the pages of a document

To rotate pages of a document, use rotate and specify the list of pages to rotate, as you would give a list of pages to concatenate with cat. Pages you do not list are kept in the output, unrotated.

Note that the order of the pages in the rotation command does not matter: it will not change the order of the pages in the output.

A rotate command would therefore follow this kind of syntax:

pdftk <input files> rotate [<begin page number>[-<end page number>[<qualifier>]]][<page rotation>] output <output file>

To specify the type of rotation in an absolute way, use the following keywords. The original top of the page will be rotated to point to that many degrees to the right:

  • north: 0°;
  • east: 90°;
  • south: 180°;
  • west: 270°.

The following keywords specify a rotation relative to the page's current rotation:

  • left: -90°;
  • right: +90°;
  • down: +180°.
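
The difference between the absolute and the relative keywords can be summed up with a little arithmetic. The function below is an illustrative model, not part of pdftk: its name resulting_rotation is hypothetical, and it assumes rotations are expressed in degrees clockwise and normalized to the range 0 to 359.

```shell
# Illustrative model only.
# Usage: resulting_rotation <current rotation in degrees> <keyword>
resulting_rotation() {
    current=$1 keyword=$2
    case $keyword in
        # Absolute keywords ignore the current rotation.
        north) echo 0 ;;
        east)  echo 90 ;;
        south) echo 180 ;;
        west)  echo 270 ;;
        # Relative keywords add to it (-90 is the same as +270).
        left)  echo $(( (current + 270) % 360 )) ;;
        right) echo $(( (current + 90) % 360 )) ;;
        down)  echo $(( (current + 180) % 360 )) ;;
        *)     echo "unknown rotation keyword: $keyword" >&2; return 1 ;;
    esac
}

# Possible usage, for a page already rotated by 90 degrees:
#   resulting_rotation 90 east    prints: 90
#   resulting_rotation 90 right   prints: 180
```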

For example, to orientate the top of all the pages of the document input.pdf 90° to the right, use:

pdftk input.pdf rotate 1-endeast output rotated.pdf

To rotate pages 2 and 4 of the document input.pdf 90° to the left (leaving the other pages untouched and the original page order unchanged), you could use:

pdftk input.pdf rotate 4left 2left output rotated.pdf

Combine rotation and concatenation

Rotation can also be combined with cat, simply by appending a rotation keyword to a page range to extract.

For example, after running the following command, the resulting document result.pdf would:

  • start with pages 1 and 2 of input1.pdf, untouched;
  • then include all the pages of input2.pdf, each rotated 90° to the right;
  • and finally include page 5 and all the following pages of input1.pdf:

pdftk A=input1.pdf B=input2.pdf cat A1-2 B1-endright A5-end output result.pdf