Tesseract revisited

September 9, 2008 at 7:43 pm (Uncategorized)

I worked on adding Indic script support to the Tesseract OCR engine during my summer vacation. Mr. Sankarshan from Red Hat was my mentor. I have not touched the project since then. Accuracy was at 86% when I left it. The released files are at http://code.google.com/p/tesseractindic/ and the hacker documentation is at http://debayanin.googlepages.com/hackingtesseract .

It was a pleasant surprise to see that Tom of Tesseract/Ocropus has taken note of my work and added Indic script support to Ocropus. He mentions my name and the techniques I used in two of his project pages (http://sites.google.com/site/ocropus/languages/devanagari-hindi-sanskrit and http://sites.google.com/site/ocropus/morphological-operations). He uses morphological operations to segment characters instead, which is obviously better than my technique. I have not tested his work yet, but I will.

In the meantime I am trying out Fedora 9. I shifted to Fedora from openSUSE, which I had used for the past year. I need to create a new Fedora repository for the college. Also, I found kernel hacking somewhat difficult on openSUSE. They suck at maintaining kernel-debuginfo packages.

And guess what, I sat for a Microsoft interview on campus!! I cleared the written round with 11 others. It was a subjective paper for the first time in their history: coding questions. Then I cleared the tech interview round with 7 others. Ultimately, the HR round was pretty interesting. Looking at Linux/OS written all over my CV, the interviewer gave me some tough looks. He abruptly said, “We do not work on OS at MSIT”. I said that it does not matter. And when asked “Why do you want to join Microsoft?”, I have no idea what I said, because I never imagined anyone would ask me that question. I was not selected in the end. Every year they take 7-8 people from here. This year they came after a lot of nagging from our college’s side, and they shortlisted 1 guy and 2 girls (they always take 50% girls, some kind of company policy) for a final telephonic round, which has not yet taken place.

Companies still to come are IBM, HP, Oracle and some more. Let's see where I go.


  1. akahs sinha said,

    why u left suse????????????? 😦

