
Thread: Convert scanned book to legible text

  1. #1
    watercooled

    Convert scanned book to legible text

    Now and again I come across interesting old books/publications on sites like archive.org which have been scanned to PDF. The text has been OCR'd to some extent, as I can select and copy sections of it with no problem. However, the scanned text displayed on the page is pretty rubbish quality, which makes it unpleasant to read.

    Does anyone have any suggestions for how to deal with this, e.g. a reader that can replace the scanned text with a proper font, or a tool to convert and do the same? As above, it seems the hard work of actually OCR-ing has already been done by the archivers.

    It had crossed my mind to just copy/paste all of the text, but the books often have in-line images so it's not ideal as they would be missing with a straightforward text rip.

    And yes, I have looked into just buying a modern reprint (actually my first thought before checking the archives), but some are long out of print, so the digital copy is about the only way to read them unless I can find a copy in a library somewhere.

    Thanks for any ideas!!

  2. #2
    Kumagoro

    Re: Convert scanned book to legible text

    This is what I found to be the best way to scan books.

    Scan at the highest resolution you can. Some scanners have a document mode which removes the show-through from the other side of the page.

    Use Scan Tailor (scantailor.org) to align the pages and make them look like a book.

    Actually, I think I may have ended up using one called Scan Tailor Advanced, which was a fork with more features. There may be other forks now, as it was more than 6 years ago that I last did this.

    Now the bit you are actually talking about.

    Use Acrobat Pro, as it has a function that creates a font from the page images you give it. It combines every instance of each letter and averages them out to make the font. The result is pretty good in my opinion, as you keep the same layout as the scans. When I'm home I will see if I can find what I was able to produce.
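
    To give a rough feel for the "average every instance of a letter" idea, here is a toy sketch in Python. It is only an illustration of the principle and not how Acrobat actually does it; the tiny 3x4 bitmaps, the `average_glyph` name and the 0.5 threshold are all made up for the example.

```python
# Illustration only: average several noisy bitmaps of one letter into a clean glyph.
# This is NOT Acrobat's actual algorithm, just the averaging idea in miniature.

def parse(bitmap):
    """Turn rows of '#' (ink) and '.' (paper) into a grid of 1s and 0s."""
    return [[1 if ch == "#" else 0 for ch in row] for row in bitmap]

def average_glyph(samples, threshold=0.5):
    """Pixel-wise average of same-sized samples, thresholded back to ink/paper."""
    grids = [parse(s) for s in samples]
    height, width = len(grids[0]), len(grids[0][0])
    if any(len(g) != height or any(len(r) != width for r in g) for g in grids):
        raise ValueError("all samples must have the same size")
    result = []
    for y in range(height):
        row = ""
        for x in range(width):
            # Fraction of samples that have ink at this pixel.
            ink = sum(g[y][x] for g in grids) / len(grids)
            row += "#" if ink > threshold else "."
        result.append(row)
    return result

# Three hypothetical scans of the letter "T", each with one damaged pixel.
scans = [
    ["###", ".#.", ".#.", "..."],  # bottom of the stem faded out
    ["###", ".#.", "##.", ".#."],  # stray speck left of the stem
    ["#.#", ".#.", ".#.", ".#."],  # hole in the top bar
]

clean = average_glyph(scans)
print("\n".join(clean))
```

    Each defect only shows up in one of the three scans, so it gets outvoted and the averaged glyph comes out as a clean "T". A real tool would also have to find and align the letters first, which this sketch skips entirely.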
    Last edited by Kumagoro; 06-11-2022 at 04:36 PM.
