I am wondering if anyone is familiar with Tesseract-OCR. I'd like to train it, but I find the instructions here a bit hard to digest.
I am not trying to do anything complicated. I only need to recognise a single line of digits. As it stands, it sometimes manages this perfectly, yet at other times it fails completely.
Here is an example:
Running with the digits parameter, Tesseract can cope with it perfectly.
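For reference, this is roughly how I invoke it, written as a minimal Python sketch. It assumes the `tesseract` binary is on the PATH and that the stock `digits` config file is installed; the function names and the choice of page segmentation mode 7 (treat the image as a single text line) are my own for illustration.

```python
import subprocess


def build_digits_command(image_path, psm=7):
    """Build the tesseract command line for digits-only recognition.

    'stdout' as the output base prints the result instead of writing a file.
    '--psm' is the Tesseract 4+ spelling; older 3.x releases use '-psm'.
    'digits' is the config file that restricts output to numeric characters.
    """
    return ["tesseract", image_path, "stdout", "--psm", str(psm), "digits"]


def read_digits(image_path):
    """Run tesseract on one image and return only the digits it reported."""
    result = subprocess.run(
        build_digits_command(image_path),
        capture_output=True,
        text=True,
        check=True,
    )
    # Drop whitespace and any stray non-digit characters from the output.
    return "".join(ch for ch in result.stdout if ch.isdigit())
```

Calling `read_digits("numbers.png")` (a hypothetical file name) returns an empty string when Tesseract outputs nothing, which is what happens in the failing case.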
Yet over here:
It completely fails. It isn't that it gets the digits wrong; it produces no output at all.
In making the box file, the language is irrelevant (I am dealing with digits only), and I don't know what font it is. Another difficulty is that if I work with the black borders, the numbers often touch each other, making it hard to establish the exact box. Can I just draw an approximation? Really, I think it only needs the white areas.
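To make the box question concrete, here is a small sketch of how I understand the box file lines would be written from approximate rectangles. As far as I can tell, each line has the form `<char> <left> <bottom> <right> <top> <page>`, with the origin at the bottom-left of the image, so rectangles measured from the top-left (as in most image editors) need their y values flipped. The helper name and the sample coordinates are made up for illustration.

```python
def format_box_line(char, left, top, right, bottom, image_height, page=0):
    """Convert a top-left-origin rectangle into one box file line.

    Input coordinates are pixels measured from the top-left corner of the
    image. The box file measures y from the bottom edge, so both y values
    are subtracted from the image height.
    """
    box_bottom = image_height - bottom
    box_top = image_height - top
    return f"{char} {left} {box_bottom} {right} {box_top} {page}"


# Hypothetical hand-drawn boxes for the digits "4" and "2" in a 50 px tall image.
approximate_boxes = [("4", 10, 5, 30, 40), ("2", 32, 5, 52, 40)]
lines = [format_box_line(c, l, t, r, b, image_height=50)
         for c, l, t, r, b in approximate_boxes]
print("\n".join(lines))
```

Whether boxes this rough are good enough for training when the digits touch is exactly what I am unsure about.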