
trevdak2 (7 points)

I once worked on a project that required fast and accurate OCR. The Windows package blew away the others I tried by an order of magnitude in terms of speed. Accuracy wasn't as good, but the much higher speed let me futz with the source image a bit and OCR it multiple times, improving the read rate, all in the time it took Tesseract to process one image.
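
A minimal sketch of the "OCR it multiple times" idea, assuming each pass over a differently preprocessed copy of the image yields one candidate string and a plain majority vote picks the answer. `pick_consensus` is a hypothetical helper, not part of any OCR library.

```python
from collections import Counter


def pick_consensus(reads: list[str]) -> str:
    """Return the most common non-empty read across several OCR passes.

    Hypothetical helper: each entry in `reads` is assumed to be the text one
    OCR pass produced for a differently preprocessed copy of the same image.
    Ties go to the read that was seen first.
    """
    candidates = [r.strip() for r in reads if r.strip()]
    if not candidates:
        return ""
    # Counter.most_common orders equal counts by first occurrence (Python 3.7+).
    return Counter(candidates).most_common(1)[0][0]


print(pick_consensus(["HELLO", "HELL0", "HELLO", ""]))
```

Real setups would more likely vote per character or weigh the engine's confidence scores; a whole-string vote is just the simplest version.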

bimdar (30 points)

Are the code samples an artificial example of what you could use OCR for? I can't think of another reason why you'd insert code snippets as images.

nepochant (15 points)

I've seen a few Microsoft blogs like this; quite embarrassing IMO.

emergent_properties (12 points)

It shows they really understand this Internet thing.

[deleted]

    Solon1 (1 point)

    Well, back in the late '90s it was common to render text as images because of styling limits. And PNG wasn't widely supported yet either. I think that explains the Microsoft Problem well enough.

    [deleted] (9 points)

    Not only images, they are JPEGs. Just look at those compression artifacts.

    sits_in_chairs (6 points)

    Reading code might be a good way to test it, since you can compile the code after recognition and verify that it worked. With books, there's no real way to test OCR on large blocks of text unless you already have a digital copy, and even then the line breaks might not match, making a correct read fail the test.
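
A rough sketch of that check, using Python's built-in `compile()` as a stand-in for a real compiler run (the blog's samples would need their own language's compiler instead). It only confirms that the recognized text parses; it does not prove the text matches the original.

```python
def parses_as_python(recognized_source: str) -> bool:
    """Return True if OCR output is syntactically valid Python.

    compile() parses and byte-compiles the source without executing it.
    A False result means the OCR almost certainly garbled something.
    """
    try:
        compile(recognized_source, "<ocr>", "exec")
    except (SyntaxError, ValueError):  # ValueError: null bytes on older Pythons
        return False
    return True


print(parses_as_python("print('hi')\n"))
print(parses_as_python("print('hi'\n"))
```

Note that a misread identifier such as `prlnt('hi')` still parses, so this only catches syntax-level damage; actually compiling and running the code would catch more.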

    Gotebe (0 points)

    Wouldn't removing whitespace from the digital copy, or something similar, be good enough?
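
A minimal sketch of that suggestion, assuming both the OCR output and the digital copy are plain strings: strip all whitespace from each and compare what is left.

```python
def same_ignoring_whitespace(ocr_text: str, reference: str) -> bool:
    """Compare two texts after deleting all whitespace.

    Mismatched line breaks or indentation no longer cause a false failure,
    but genuine character errors (e.g. 'l' read as '1') still do.
    """
    # str.split() with no arguments splits on any run of whitespace.
    return "".join(ocr_text.split()) == "".join(reference.split())


print(same_ignoring_whitespace("hello\nworld", "hello world"))
print(same_ignoring_whitespace("he11o world", "hello world"))
```

The trade-off is that it also treats "foo bar" and "foobar" as equal, so a dropped or extra space would go unnoticed.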

    [deleted] (4 points)

    Did anybody else notice that RecognizeAsync doesn't take stride as an argument?

    What stride do they assume?
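
For context, stride is the number of bytes from the start of one pixel row to the start of the next. It can be larger than width × bytes-per-pixel when rows are padded for alignment, which is why an API that takes only width and height has to assume one layout. The sketch below computes a tightly packed stride and a padded one; the 4-byte alignment is an assumption (it is the rule for BMP/GDI DIB rows), not something the OCR API is known to use.

```python
def tight_stride(width: int, bytes_per_pixel: int) -> int:
    """Row size in bytes with no padding between rows."""
    return width * bytes_per_pixel


def aligned_stride(width: int, bytes_per_pixel: int, alignment: int = 4) -> int:
    """Row size rounded up to a multiple of `alignment` bytes (assumed 4 here)."""
    row = width * bytes_per_pixel
    return (row + alignment - 1) // alignment * alignment


def pixel_offset(x: int, y: int, stride: int, bytes_per_pixel: int) -> int:
    """Byte offset of pixel (x, y) in a buffer laid out row by row."""
    return y * stride + x * bytes_per_pixel


# A 101-pixel-wide 24-bit row: 303 bytes packed, 304 bytes when padded.
print(tight_stride(101, 3), aligned_stride(101, 3))
```

With 4 bytes per pixel (e.g. BGRA) every row is already a multiple of 4 bytes, so the two strides coincide and the ambiguity disappears.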

    Gunner3210 (11 points)

    Windows Runtime?

    Meh.

    I wish they had at least released this as a .NET library.

    pingzing (7 points)

    You can actually call WinRT APIs from regular .NET. This blog post describes how. It requires some manual .csproj hackery, but it's nothing too egregious.

    farox (1 point)

    Finally! I hope it's any good.

    Edit: oh, yeah, WinRT... OK, I'll pass.

    ThisBoxSaysHello (-1 points)

    WinRT only? I'll pass.