Urdu OCR!
I’ve been secretly working on a way to perform efficient and accurate Urdu OCR!
I started working with binary images in C++ but soon moved to Matlab. After in-depth analysis, I realized that a character-level OCR would be harder to build and less accurate. The main idea behind designing such a system was to convert all the Urdu text images floating around on the web into actual text, and maybe scanned book content in the future. The system I’m developing requires training before it can perform OCR. I’m sure several products already exist that can do a similar task once trained on specialized glyphs, but it’s so much fun to build something from scratch!
Here is some sample text in image form I grabbed off a website:

The number on each frame corresponds to the ID of its successful match in the library:

I have a lot of ideas to automate the library expansion process but time is an enemy on this one. There is a huge amount of detail which I’m not posting on the blog at this time.
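The post says each frame is labeled with the ID of its match in a trained library but deliberately leaves out how the matching works. Purely as an illustration, here is a minimal Python sketch that assumes each frame is a small binary bitmap (rows of 0/1, 1 = ink) and that the library maps integer IDs to trained template bitmaps. The names `match_frame`, `jaccard` and `resize_nearest`, the 16×16 normalization size and the 0.8 threshold are all hypothetical. Jaccard (intersection over union) template matching is a stand-in technique, not necessarily the one used here.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: a binary image as rows of 0/1 pixels (1 = ink).
Bitmap = List[List[int]]


def resize_nearest(img: Bitmap, height: int, width: int) -> Bitmap:
    """Nearest-neighbour resize so frames of different sizes can be compared."""
    src_h, src_w = len(img), len(img[0])
    return [
        [img[r * src_h // height][c * src_w // width] for c in range(width)]
        for r in range(height)
    ]


def jaccard(a: Bitmap, b: Bitmap) -> float:
    """Intersection over union of the ink pixels of two equal-sized bitmaps."""
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += pa & pb
            union += pa | pb
    # Two blank bitmaps are treated as identical.
    return 1.0 if union == 0 else inter / union


def match_frame(
    frame: Bitmap,
    library: Dict[int, Bitmap],
    size: Tuple[int, int] = (16, 16),  # hypothetical normalization size
    threshold: float = 0.8,            # hypothetical minimum similarity
) -> Optional[int]:
    """Return the ID of the best-matching library template, or None if no
    template reaches the threshold."""
    height, width = size
    probe = resize_nearest(frame, height, width)
    best_id: Optional[int] = None
    best_score = threshold
    for glyph_id, template in library.items():
        score = jaccard(probe, resize_nearest(template, height, width))
        if score >= best_score:
            best_id, best_score = glyph_id, score
    return best_id
```

Labeling a line of segmented frames is then `[match_frame(f, library) for f in frames]`. In this sketch a `None` marks a frame with no entry in the library, which is exactly the kind of frame a library-expansion step would need to handle. Comparing ink overlap rather than raw pixel agreement keeps the large blank background from inflating the score.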
Munir said,
August 13, 2006 @ 11:06 pm
I might decide to release an Urdu OCR software product.
ahm ahem : )
(y)
Whiz Kid said,
August 13, 2006 @ 11:23 pm
Whaa?
Munir said,
August 14, 2006 @ 12:55 am
Worst use of edit feature!
B said,
August 17, 2006 @ 4:42 am
You’re now officially part of the girly girl network.