Difference between revisions of "Pdftk"

From Freephile Wiki
 
(2 intermediate revisions by one other user not shown)
Latest revision as of 11:24, 9 February 2024

PDF Toolkit, or pdftk for short, is a great free software command-line program for manipulating documents in the Portable Document Format (PDF). To help regular users, while also supporting the author and his free software work, PDF Labs now also offers graphical desktop versions. PDFTk Free will merge and split PDFs. PDFTk Pro does other processing and costs a mere $3.99.

Manual

https://www.pdflabs.com/docs/pdftk-man-page/

Examples

http://www.pdflabs.com/docs/pdftk-cli-examples/

Add a signature page

# first, remove page 5 (the final page of the source document having an unfilled signature line)
pdftk ".\Massachusetts Contractor Agreement.pdf" cat 1-4 output ".\Massachusetts Contractor Agreement.pp1-4.pdf"
# now join pp 1-4 with the new p5 (from a scan - see note)
pdftk ".\Massachusetts Contractor Agreement.pp1-4.pdf" ".\Massachusetts Contractor signature page.pdf" output ".\Massachusetts Contractor Agreement.signed.pdf"

Note: your scan may not be "letter" size. If it is not, you will get a weird result when pdftk happily merges two differently sized source documents. To fix this, open the scanned PDF signature page in LibreOffice Draw. Set Page -> Page Properties [Page][Paper Format] to "Letter" so that the document has the correct size. Next, select the object (just click anywhere on the drawing), right-click and choose "Position and Size" (or press F4) to resize the scanned object to fit the page dimensions. In the size properties, make sure the "keep ratios" checkbox is selected, and change one dimension (e.g. height = 11in) to fit the page. Press 'tab' to apply the change and check your result. The scan should now visibly fit exactly on the 'page' in LibreOffice Draw. Then "Export directly to PDF" by clicking the pdf icon in the LibreOffice toolbar. You do not have to save the document in ODG format; just export it "live" (you can even overwrite the original PDF file). Now you can combine that result with the other, properly sized source document.
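
The two steps above could probably be collapsed into a single call by giving each input a handle, the same way the collating example on this page does. The sketch below is one way to do that. The function name replace_last_page and the PDFTK override variable are made up for illustration and are not part of pdftk.

```shell
# Hypothetical helper, not part of pdftk: keep pages 1..N of the first
# document and append every page of the second one.
# PDFTK is an assumed override variable; set PDFTK=echo for a dry run.
replace_last_page() {
  doc="$1"     # original document
  sig="$2"     # scanned signature page
  keep="$3"    # last page to keep from the original
  out="$4"     # output file
  "${PDFTK:-pdftk}" A="$doc" B="$sig" cat "A1-$keep" B output "$out"
}

# Assumed usage with the file names from the example above:
#   replace_last_page "Massachusetts Contractor Agreement.pdf" \
#     "Massachusetts Contractor signature page.pdf" 4 \
#     "Massachusetts Contractor Agreement.signed.pdf"
```

This skips the intermediate pp1-4 file. The note about page sizes still applies, since pdftk merges the pages as they are.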


Discard the cover page of a pdf

pdftk wCover.pdf cat 2-end output NoCover.pdf

Collating two-sided documents

2-sided document? No problem. Scan the original face side up first (odd pages); then flip it over and scan the second side (even pages). Astute people will recognize that the second document is in reverse order compared to the first document. pdftk can not only merge the two documents, but ALSO reverse the second document during collation so that the pages are in order.

pdftk A=my.odd.pdf B=my.even.pdf shuffle A Bend-1 output my.full.pdf

In our example, we specify the document handles 'A' and 'B' to make the files easier to refer to. The operator "shuffle" acts like "cat", but collates the documents like shuffling a deck of cards. Using the handles we can also specify a page range, and the range "Bend-1" tells pdftk to read 'B' in reverse, from the "end" to "page 1".
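
To make the page order concrete, here is a small sketch that only prints the order in which "shuffle A Bend-1" would pick pages, assuming both scans have the same number of pages. It does not call pdftk at all, and the function name shuffle_order is made up for this illustration.

```shell
# Illustration only: print the order in which "shuffle A Bend-1" would
# take pages from two scans of N pages each. No PDF files are touched.
shuffle_order() {
  n="$1"
  i=1
  order=""
  while [ "$i" -le "$n" ]; do
    # A is read front to back, B is read back to front
    order="$order A$i B$((n - i + 1))"
    i=$((i + 1))
  done
  echo "${order# }"
}

shuffle_order 3   # prints: A1 B3 A2 B2 A3 B1
```

So if A holds the front sides (pages 1, 3, 5) and B holds the back sides as scanned in reverse (pages 6, 4, 2), the result comes out as pages 1 through 6.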

Discard blank pages

If you have a scan that added blank pages (every even page), and you want to get rid of those, you would ask pdftk to 'cat' pages 1-end (but only the odd ones) and 'output' that to the file of your choice.

pdftk ~/Desktop/DOC033115.pdf cat 1-endodd output ~/Desktop/ProofOfLearning.pdf
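
A quick sanity check after dropping pages is to compare the page counts of the input and the output. This is a minimal sketch, assuming the report printed by pdftk's dump_data operation contains a "NumberOfPages:" line. The helper name parse_page_count is made up for illustration.

```shell
# Hypothetical helper: pull the page count out of "pdftk <file> dump_data"
# output, which is assumed to include a line like "NumberOfPages: 12".
parse_page_count() {
  awk '/^NumberOfPages:/ { print $2 }'
}

# Assumed usage with the file names from the example above:
#   pdftk ~/Desktop/DOC033115.pdf dump_data | parse_page_count
#   pdftk ~/Desktop/ProofOfLearning.pdf dump_data | parse_page_count
```

If every even page really was blank, the second number should be about half of the first.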

Cleaning up Bank Statements

This is a long one, because Jack Henry sucks.

# first download the statements, manually inserting a month digit for each one
# then rename the whole batch to something more intelligent
rename 's/E-Statement_/2014_ifs_/' E-Statement*
ll 2014_ifs_??.pdf
# use pdftk to combine them
pdftk 2014_ifs_??.pdf cat output 2014_ifs_summary.pdf
# and then keep only the pages you want
pdftk 2014_ifs_summary.pdf cat 1 3 5 7 9 14 15 17 18 20 22 24 26 28 output 2014_ifs_condensed.pdf
# and delete the monthly statement clutter
rm 2014_ifs_??.pdf

You don't need to use extended globbing [1], but if you did, you'd be able to use patterns like

# turn on extended globbing
shopt -s extglob
# everything but the condensed pdf
ll 2014_ifs_!(condensed).pdf
# turn extended globbing back off
shopt -u extglob
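
Since the combine-and-delete steps in this section end with an rm, it may be worth making the delete depend on the merge succeeding. The following is only a sketch: the function name combine_then_clean and the PDFTK override variable are assumptions for illustration, not anything pdftk provides.

```shell
# Hypothetical helper, not part of pdftk: merge the given PDFs into one file
# and delete the inputs only if the merge succeeded.
# PDFTK is an assumed override variable, handy for trying this without pdftk.
combine_then_clean() {
  out="$1"
  shift
  if "${PDFTK:-pdftk}" "$@" cat output "$out"; then
    rm -- "$@"
  else
    echo "merge failed; keeping the input files" >&2
    return 1
  fi
}

# Assumed usage, mirroring the file names above:
#   combine_then_clean 2014_ifs_summary.pdf 2014_ifs_??.pdf
```

The ?? pattern matches exactly two characters, so the summary file itself is never passed in as an input.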

Order Numerically

This is really a feature of the ls command. If you have a series of files you wish to load in numeric order, but the numbering is 'natural' instead of computer-friendly (e.g. you have 1.pdf, 2.pdf ... 11.pdf, 12.pdf), then use the -v option to ls.

pdftk $(ls -v *.pdf) cat output my.combined.pdf
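
To see what difference the ordering makes, the sketch below sorts the same four names lexically and then in version order. It assumes a sort that supports -V (as GNU coreutils does), which is used here as a stand-in for the kind of ordering ls -v applies.

```shell
# Illustration only: compare plain lexical order with version ("natural")
# order. sort -V is assumed to be available (GNU coreutils).
printf '%s\n' 1.pdf 2.pdf 10.pdf 11.pdf | LC_ALL=C sort
# prints 1.pdf, 10.pdf, 11.pdf, 2.pdf (one per line)

printf '%s\n' 1.pdf 2.pdf 10.pdf 11.pdf | sort -V
# prints 1.pdf, 2.pdf, 10.pdf, 11.pdf (one per line)
```

Note that $(ls -v *.pdf) is split on whitespace by the shell, so this trick is only safe when the file names contain no spaces.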

References

1. http://mywiki.wooledge.org/glob