Document Image Analysis

11 sept. 2012

Dr. Umapada Pal works at the Indian Statistical Institute. The institute's founder, P. C. Mahalanobis, is known for the "Mahalanobis distance".

He was working on how to make mobile devices recognize scripts within images, in order to determine indirectly what language the text had been written in. India has about 22 official languages, often with different scripts, and each region typically uses one official language, English and one local language, hence the need for script recognition.

Methods included a "water level" method applied from the top, bottom, left and right, and looking at whether a script carries more information above or below the "writing line". His program marked text in different colors. This was done on printed text. Handwritten characters would be much harder. He also talked about reading text from video, but took a frame-by-frame approach without any tracking, so that part was not that interesting.
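To make the idea more concrete, here is a minimal sketch of how I picture such features. It is my own guess, not Dr. Pal's implementation. It assumes a binarized image given as rows of 0 (background) and 1 (ink). The function names, the `band_ratio` threshold and the tiny sample glyph are all hypothetical. The first function measures how far "water" poured in from each side travels before it hits ink, and the second estimates how much ink lies above, inside and below the dense band of rows around the writing line.

```python
def directional_profiles(img):
    """Distance from each border to the first ink pixel, per column or row.

    Columns or rows without any ink get the full height or width.
    """
    h, w = len(img), len(img[0])
    top = [next((r for r in range(h) if img[r][c]), h) for c in range(w)]
    bottom = [next((h - 1 - r for r in range(h - 1, -1, -1) if img[r][c]), h)
              for c in range(w)]
    left = [next((c for c in range(w) if img[r][c]), w) for r in range(h)]
    right = [next((w - 1 - c for c in range(w - 1, -1, -1) if img[r][c]), w)
             for r in range(h)]
    return {"top": top, "bottom": bottom, "left": left, "right": right}


def zone_ink_fractions(img, band_ratio=0.5):
    """Fractions of ink above, inside and below the main text band.

    Assumption: the band spans the first to the last row whose ink count
    is at least band_ratio times the largest row count.
    """
    rows = [sum(row) for row in img]
    total = sum(rows)
    if total == 0:
        return (0.0, 0.0, 0.0)
    threshold = band_ratio * max(rows)
    dense = [i for i, n in enumerate(rows) if n >= threshold]
    first, last = dense[0], dense[-1]
    above = sum(rows[:first])
    below = sum(rows[last + 1:])
    inside = total - above - below
    return (above / total, inside / total, below / total)


# Hypothetical toy glyph: a dense body with one stroke above and one below.
sample = [
    "..#.....",
    "..#.....",
    "#######.",
    "#.#.#.#.",
    "#######.",
    "....#...",
]
img = [[1 if ch == "#" else 0 for ch in row] for row in sample]
profiles = directional_profiles(img)
above, inside, below = zone_ink_fractions(img)
```

A script with many strokes above the writing line would give a larger `above` fraction than one with most of its strokes below it, and values like these could then be fed to any classifier.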

He talked a little about using handwriting to determine personality (graphology). This is of course highly controversial, and no proof has been presented. It still seems to be a hot topic in India. Hiring people based on their handwriting... His methods (curves and lines) are still interesting though.

He also showed us some work on reading text in graphical documents such as logos, where the difficulty is text written along curved lines: working out where the line goes and separating the characters.