stonemilker

joined 2 years ago
[email protected] 1 points 2 years ago

For my personal use, the only drawback of mat2 is that it converts each PDF page into a PNG image (https://0xacab.org/jvoisin/mat2/-/blob/master/libmat2/pdf.py), so you lose the OCR layer and any searchable text from the original file. Because ExifTool's edits to PDF metadata can be undone unless the file is linearized afterwards (https://exiftool.org/TagNames/PDF.html), I now use this script to clean files safely while keeping the output searchable: https://gist.github.com/sneakers-the-rat/172e8679b824a3871decd262ed3f59c6.
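Roughly, the idea is a two-step clean: let ExifTool blank the tags, then have qpdf rewrite the whole file so the superseded objects that still hold the old metadata are dropped (the ExifTool PDF page points to `qpdf --linearize` for this). The Python below is a minimal sketch of that idea under my own assumptions, not the gist's actual code. It assumes `exiftool` and `qpdf` are on your PATH, and the function names and the `.tmp.pdf` intermediate file are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def temp_path(dst: Path) -> Path:
    # Hypothetical intermediate file placed next to the final output.
    return dst.with_name(dst.stem + ".tmp.pdf")


def build_commands(src: Path, dst: Path) -> list[list[str]]:
    tmp = temp_path(dst)
    return [
        # Step 1: blank all writable tags. For PDFs, ExifTool appends an
        # incremental update, so the old metadata is still inside the file.
        ["exiftool", "-all:all=", "-o", str(tmp), str(src)],
        # Step 2: rewrite the file from scratch; qpdf only writes objects the
        # current document still references, so the old metadata is dropped.
        ["qpdf", "--linearize", str(tmp), str(dst)],
    ]


def clean_pdf(src: Path, dst: Path) -> None:
    for tool in ("exiftool", "qpdf"):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} not found on PATH")
    strip_cmd, linearize_cmd = build_commands(src, dst)
    tmp = temp_path(dst)
    tmp.unlink(missing_ok=True)  # exiftool -o refuses to overwrite a file
    try:
        subprocess.run(strip_cmd, check=True)
        result = subprocess.run(linearize_cmd)
        # qpdf exits with 3 when it wrote the output but issued warnings.
        if result.returncode not in (0, 3):
            raise subprocess.CalledProcessError(result.returncode, linearize_cmd)
    finally:
        tmp.unlink(missing_ok=True)
```

You'd call it as `clean_pdf(Path("scan.pdf"), Path("scan-clean.pdf"))`. Since the pages are never rasterized, the text layer survives.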

I guess you could compare the output files from mat2 and ExifTool with the fc or diff commands to find out what the difference is.
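Since PDFs are binary, plain `diff` will usually just tell you that the files differ. A byte-level comparison, roughly what `fc /b` or `cmp` do, can be sketched in a few lines of Python. This is only an illustration; `first_difference` and `compare_files` are made-up helper names.

```python
from pathlib import Path
from typing import Optional


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One file is a prefix of the other: they differ where the shorter ends.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


def compare_files(path_a: Path, path_b: Path) -> str:
    a, b = path_a.read_bytes(), path_b.read_bytes()
    offset = first_difference(a, b)
    if offset is None:
        return "files are identical"
    return f"first difference at byte {offset} (sizes: {len(a)} vs {len(b)})"
```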
