Add bookmarks to a PDF file in batch
Under Linux, the tool JPdfBookmarks can be used to add a collection of bookmarks to a PDF book in batch by loading them from a text file that stores the table of contents and the page numbers. To prepare this text file, I've summarized the following steps from my own practice for your reference.
- Take snapshots of the table-of-contents pages and save them as images using ImageMagick:

    convert -density 300 book.pdf[${page_no}] content_page.png

  Here ${page_no} should be replaced with the number of the content page. Note that ImageMagick counts pages from 0, so book.pdf[0] is the first page of the PDF.
- Extract the text from the image of the content page using the OCR tool Tesseract:

    tesseract content_page.png stdout -l eng >> bookmark_text.txt

  The option stdout tells Tesseract to write the extracted text to standard output, which is then appended to bookmark_text.txt, and -l eng specifies English as the OCR language. To check the list of languages supported by Tesseract, execute tesseract --list-langs:

    $ tesseract --list-langs
    List of available languages (8):
    chi_sim
    chi_sim_vert
    chi_tra
    chi_tra_vert
    deu
    eng
    fra
    osd

- Open the generated text file bookmark_text.txt and clean and reorganize it manually if needed. I suggest doing this work in Emacs, where the command regexp-builder helps construct and verify the regular expressions used for efficient text matching and replacement.
- Open the target PDF file in JPdfBookmarks and load the text file bookmark_text.txt by clicking the menu item Load in the Tools menu.
- Check that each bookmark points to the correct page of the PDF, and finally save the PDF file.
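When the table of contents spans several pages, the convert and tesseract commands from the first two steps can be run in a loop. The following is a minimal Bash sketch, assuming both tools are on the PATH; the function name ocr_toc_pages, its arguments and the per-page image names are hypothetical.

```shell
# Hypothetical helper: OCR a run of table-of-contents pages into one file.
# Usage: ocr_toc_pages PDF FIRST LAST OUTFILE
#   FIRST and LAST are zero-based page indices, as ImageMagick expects.
ocr_toc_pages() {
    local pdf="$1" first="$2" last="$3" out="$4"
    local page_no img
    : > "$out"    # truncate first, so a re-run does not append duplicate text
    for page_no in $(seq "$first" "$last"); do
        img="content_page_${page_no}.png"
        # Render one PDF page to a 300 dpi image.
        convert -density 300 "${pdf}[${page_no}]" "$img" || return 1
        # OCR the image and append its text after the previous pages.
        tesseract "$img" stdout -l eng >> "$out" || return 1
    done
}

# Example: the contents are on the 5th to 7th pages of book.pdf.
# ocr_toc_pages book.pdf 4 6 bookmark_text.txt
```

Keeping one image per page makes it easy to re-run the OCR on a single page if Tesseract misreads it.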
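Part of the cleaning can also be scripted. The sketch below assumes that, after cleaning, every entry ends with its printed page number (for example 1.2 Installation ..... 15), and that JPdfBookmarks accepts one bookmark per line in the form Title/page, with leading tabs for nested entries; please verify this format against the documentation of your JPdfBookmarks version. Because printed page numbers usually differ from PDF page numbers by the length of the front matter, the hypothetical function toc_to_bookmarks also takes an offset.

```shell
# Hypothetical helper: turn cleaned TOC lines into "Title/page" lines.
# Usage: toc_to_bookmarks [OFFSET] < cleaned_toc.txt
#   OFFSET (default 0) is added to each printed page number.
# Assumes each entry ends with its page number; dot leaders and spaces
# before the number are dropped, leading tabs (nesting) are kept.
# Lines without a trailing number are printed unchanged.
toc_to_bookmarks() {
    local offset="${1:-0}"
    awk -v off="$offset" '
        match($0, /[ \t.]*[0-9]+[ \t]*$/) {
            title = substr($0, 1, RSTART - 1)
            page = substr($0, RSTART)
            gsub(/[^0-9]/, "", page)        # keep only the digits
            printf "%s/%d\n", title, page + off
            next
        }
        { print }
    '
}

# Example: body text starts 10 PDF pages after printed page 1.
# toc_to_bookmarks 10 < bookmark_text.txt > bookmarks.txt
```

A title that itself ends in a number but has no page number (such as Chapter 1 on a line of its own) would be misread, so such lines are best fixed during the manual cleaning.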
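OCR errors in page numbers are easy to miss, so a quick pre-check before loading the file into JPdfBookmarks can help with the final verification step. This is a minimal sketch under the same assumed Title/page format; the function name check_bookmarks and its MAX_PAGE argument (the number of pages in the PDF) are hypothetical. It does not replace checking the targets in JPdfBookmarks itself.

```shell
# Hypothetical pre-check for a "Title/page" bookmark file.
# Usage: check_bookmarks MAX_PAGE < bookmarks.txt
# Prints each suspicious line with its line number and returns 1 if a
# page target is missing, outside 1..MAX_PAGE, or smaller than the previous one.
check_bookmarks() {
    awk -v max="$1" '
        BEGIN { bad = 0; prev = 0 }
        NF == 0 { next }                    # ignore blank lines
        {
            n = split($0, parts, "/")
            page = parts[n] + 0             # text after the last "/"
            if (n < 2 || page < 1 || page > max) {
                print NR ": bad page target: " $0; bad = 1
            } else if (page < prev) {
                print NR ": page goes backwards: " $0; bad = 1
            }
            if (n >= 2) prev = page
        }
        END { exit bad }
    '
}

# Example: check_bookmarks 320 < bookmarks.txt
```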