TL;DR: the website Free Online OCR does a fantastic job of extracting text from screenshots.

I really, really dislike the service called Issuu. Many newspapers use it to publish online. I find it hard to read: zooming and scrolling are generally obnoxious. What’s more, some publications make it impossible to copy text.

That problem struck today when I wanted to quote a couple of paragraphs from an Ōtaki Mail item for our local Waikawa Beach Ratepayers Association blog, where a member had referred to the article. I ended up making a screenshot of the approximately 200 words and apologising for not offering a text version. Totally inaccessible!

Then I went searching and turned up the website Free Online OCR, which claims to get text out of images. What the heck, it was worth a try! The last time I tried OCR was more than a decade ago, and the results were execrable.

Well, I was flabbergasted: the website rendered the text perfectly! It lost the paragraph breaks and somehow an em dash went missing, but the words themselves were perfect.

I noticed another article in the same paper was relevant to a discussion I’d been having with a neighbour the other day, so I tried the approach again. Make a screenshot, OCR the screenshot. This other 700+ word article was also rendered perfectly, a missed macron notwithstanding.

I’m so impressed!

Ōtaki Mail article in Issuu format.
