The problem: Once in a while, a client sends a file that falls outside the working limits of OCR software (e.g. with oversized pages), probably out of the mistaken notion that OCR will make a mess of the translation work. What such clients don’t realize is how good and how necessary OCR can be in the hands of a professional. (Do they imagine that we translators are actually going to sit there and retype a large and complex document in order to translate it, while observing strict deadlines?)
This happened recently in our work, so this solution is tested and proven. The process depends on Adobe InDesign. I’ve tried to duplicate the same solution using Scribus or Inkscape (both open-source), which work similarly to InDesign, but these programs (at the time of this writing; this is expected to change) require each page to be imported separately, which makes them a ‘deal-breaker’ time-wise.
The solution: Open a new document, set the number of pages to match the PDF to be resized, and set the page size to something normal and manageable, like A4. (Bleed and slug areas are for printing and are unnecessary here.)
After clicking ‘OK’, we have our blank document (14 pages in this case).
Now we need to import the PDF into it: File → Place → browse to and select the PDF file, uncheck ‘Replace Selected Item’ and check ‘Show Import Options’, then click ‘Open’. In the options box, select ‘All’ (pages) and ‘Crop to’ → ‘Media’, then click ‘OK’.
Place the cursor at the upper left-hand corner of the first page and click. Page 1 will be inserted. Go to the second page and click again. This will insert the second page of the PDF. We continue in the same way up to the end of the document, placing the pages consecutively.
I’ve zoomed out a little to be able to see the border of the entire image which I’ve selected:
Now we need to start resizing. From the contextual menu (right-click), select ‘Transform’ → ‘Scale’. In the dialog box, enter 50%, which in this case should bring the image boundaries close enough to the page for the frame handles to be dragged.
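The 50% figure suits this particular file. For other oversized PDFs, the percentage can be worked out from the page dimensions. Here is a minimal sketch in Python, assuming the source page size is known in millimetres; the function name and the sample page size are illustrative and have nothing to do with InDesign itself:

```python
A4_MM = (210.0, 297.0)  # A4 portrait, width x height in millimetres


def fit_scale_percent(src_w_mm, src_h_mm, target=A4_MM):
    """Return the uniform scale (in percent) that fits the source page inside the target page."""
    if src_w_mm <= 0 or src_h_mm <= 0:
        raise ValueError("page dimensions must be positive")
    # Use the smaller of the two ratios so that both width and height fit.
    factor = min(target[0] / src_w_mm, target[1] / src_h_mm)
    return factor * 100.0


if __name__ == "__main__":
    # Hypothetical example: an A2 page (420 x 594 mm) needs to be scaled to 50%.
    print(round(fit_scale_percent(420, 594), 1))
```

The result is only a starting value for the ‘Scale’ dialog; the frame still has to be positioned and fitted by hand as described in the next steps.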
Drag and drop the image frame so that its top left corner sits on the top left corner of the document frame. Then drag the bottom right corner of the image frame to match the bottom right corner of the document frame.
From the same menu, select ‘Fitting’ → ‘Fit Content to Frame’. This resize-and-refit procedure needs to be repeated for every page.
When this is done, we can export the newly resized, OCR-compatible PDF file:
File → Export → Export. (We can turn off printer’s marks, slugs and bleeds.)
The resulting PDF file is now A4 size and can be read by OCR software :)
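To double-check the result before sending it to the OCR software, the page size shown in a PDF viewer’s document properties can be compared against A4. PDF page sizes are usually given in points (1/72 inch). This is a small sketch, assuming the width and height are read off by hand; the helper names and the one-point tolerance are my own choices:

```python
MM_PER_INCH = 25.4
PT_PER_INCH = 72.0


def mm_to_pt(mm):
    """Convert millimetres to PDF points (1/72 inch)."""
    return mm / MM_PER_INCH * PT_PER_INCH


def is_a4(width_pt, height_pt, tolerance_pt=1.0):
    """Return True if the page is A4 (portrait or landscape) within the given tolerance."""
    a4_w, a4_h = mm_to_pt(210), mm_to_pt(297)  # roughly 595.28 x 841.89 pt
    portrait = abs(width_pt - a4_w) <= tolerance_pt and abs(height_pt - a4_h) <= tolerance_pt
    landscape = abs(width_pt - a4_h) <= tolerance_pt and abs(height_pt - a4_w) <= tolerance_pt
    return portrait or landscape


if __name__ == "__main__":
    print(is_a4(595, 842))          # typical rounded A4 values
    print(is_a4(1190.55, 1683.78))  # an A2-sized page, for comparison
```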