Tesseract-OCR

**TooNice** · 04-09-2015, 09:08 AM

I am wondering if anyone is familiar with Tesseract-OCR. I'd like to train it, but I find the instructions here a bit hard to digest.

I am not trying to do anything complicated. I only need to recognise a single line of numbers. As it is, it sometimes manages this perfectly, yet at other times it fails completely.

Here is an example:

Running with the digits parameter, Tesseract can cope with it perfectly.

Yet over here:

It completely fails. It isn't that it gets the digits wrong; it is unable to output anything at all.

In making the box file, language is irrelevant (I am dealing with digits only), and I don't know what font it is. Another difficulty is that if I work with the black borders, the numbers commonly touch each other, making it hard to establish the exact box. Can I just draw an approximation? Really, I think it only needs the white areas.
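For reference, the "digits parameter" run mentioned above can be sketched roughly as follows. This assumes the `tesseract` binary is on the PATH and that the stock `digits` config file is installed; the helper names are made up for illustration. Page segmentation mode 7 tells Tesseract to treat the image as a single text line, and older 3.x releases spell the flag `-psm` rather than `--psm`.

```python
import subprocess


def build_tesseract_command(image_path, psm_flag="--psm"):
    """Build a command line for a single-line, digits-only OCR run.

    Hypothetical helper: "stdout" makes Tesseract print the result instead
    of writing a file, mode 7 means "single text line", and "digits" is the
    stock config file that restricts output to numeric characters.
    """
    return ["tesseract", image_path, "stdout", psm_flag, "7", "digits"]


def read_digits(image_path, psm_flag="--psm"):
    """Run Tesseract and return the recognised text with whitespace stripped."""
    result = subprocess.run(
        build_tesseract_command(image_path, psm_flag),
        capture_output=True,
        text=True,
        check=True,  # raise if Tesseract exits with an error
    )
    return result.stdout.strip()
```

Calling `read_digits("score.png")` would return whatever Tesseract prints for that (hypothetical) file; an empty string corresponds to the "no output at all" case described above.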

**TheAnimus** · 04-09-2015, 10:10 AM

http://www.whatfontis.com/ suggests it's "Futura Bold".

You might be able to make it much simpler by masking just the white? That solves the problem of the borders touching!
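A minimal sketch of that masking idea, in plain Python so it has no dependencies. It assumes the image has already been loaded as rows of `(r, g, b)` tuples, and the threshold of 200 is a guess that would need tuning on the real screenshots. Near-white pixels become black and everything else (orange background, black borders) becomes white, giving Tesseract dark digits on a light background.

```python
def mask_white(pixels, threshold=200):
    """Return a binary image: 0 where the source pixel is near-white, 255 elsewhere.

    `pixels` is a list of rows, each row a list of (r, g, b) tuples.
    A pixel counts as "white" only if all three channels reach `threshold`
    (an assumed value), so orange and black both end up as background.
    """
    return [
        [0 if min(r, g, b) >= threshold else 255 for (r, g, b) in row]
        for row in pixels
    ]
```

With a real image library the same test would be applied per pixel before saving the result and passing it to Tesseract.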

**TooNice** · 05-09-2015, 05:09 AM

Well, that is odd. Upon more testing, it turns out that if I crop the box so that there is less of the orange background, it works perfectly (so far). Notice that the first image has more space, especially to the right. This was done on purpose to accommodate more digits if necessary, but it seems to throw off the OCR engine.
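One way to automate that tighter crop, sketched in plain Python: find the bounding box of the white digit pixels and keep only a small margin of background around it. The pixel format (rows of `(r, g, b)` tuples), the threshold, and the margin are all assumptions for illustration, not values tested against the actual images.

```python
def white_bounding_box(pixels, threshold=200, margin=2):
    """Return (left, top, right, bottom) around near-white pixels, or None.

    `right` and `bottom` are exclusive. The box is padded by `margin`
    pixels on each side and clamped to the image edges.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    xs, ys = [], []
    for y, row in enumerate(pixels):
        for x, (r, g, b) in enumerate(row):
            if min(r, g, b) >= threshold:  # all channels bright: treat as white
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # no white pixels found, nothing to crop to
    left = max(min(xs) - margin, 0)
    top = max(min(ys) - margin, 0)
    right = min(max(xs) + 1 + margin, width)
    bottom = min(max(ys) + 1 + margin, height)
    return left, top, right, bottom


def crop(pixels, box):
    """Cut the rectangle described by `box` out of the image."""
    left, top, right, bottom = box
    return [row[left:right] for row in pixels[top:bottom]]
```

This keeps the capture region wide enough for extra digits while handing Tesseract only the part that actually contains them.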
