Comments
Summary
This paper addresses the problem of separating text strokes from graphical strokes. It takes a three-pronged approach. First, each stroke is analyzed in isolation. Second, temporal information is used to capture the correlation between successive class labels. Third, the gaps between strokes are taken into account.
In the first phase, stroke features are extracted and used to train a probabilistic classifier based on a feed-forward neural network, also known as a multi-layer perceptron. A total of nine features are used for training.
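To make the first phase concrete, here is a minimal sketch of how a multi-layer perceptron with one hidden layer could turn a nine-element feature vector into a probability that a stroke is text. The function name, the tanh hidden units, the sigmoid output, and the weight layout are all my assumptions for illustration; the summary above does not say which features or which network architecture the paper uses.

```python
import math


def mlp_text_probability(features, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer MLP (hypothetical layout).

    features: list of stroke features (nine in the paper; values assumed here)
    w_hidden: one weight row per hidden unit, each as long as `features`
    b_hidden: one bias per hidden unit
    w_out:    one output weight per hidden unit
    b_out:    output bias
    Returns a value in (0, 1) interpreted as P(stroke is text).
    """
    # Hidden layer: weighted sum of the features followed by tanh.
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    # Output layer: weighted sum of hidden activations, squashed by a sigmoid.
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-z))
```

With all weights set to zero the network is uninformative and returns 0.5 for any stroke; training would adjust the weights so that text strokes are pushed toward 1 and shape strokes toward 0.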
In the second phase, a hidden Markov model (HMM) is used to predict the label of each stroke as text or shape. The main idea is that people tend to draw graphical or textual strokes in succession: if the last stroke was text, there is a good chance that the next stroke is also text.
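The "labels tend to repeat" idea can be sketched as a two-state HMM decoded with the Viterbi algorithm. This is only an illustration under my own assumptions: the `stay` probability of 0.8, the uniform prior, and the use of the per-stroke classifier probability directly as the emission score are not taken from the paper.

```python
import math

STATES = ("text", "shape")


def viterbi_labels(p_text, stay=0.8, prior_text=0.5):
    """Most likely text/shape label sequence for a list of strokes.

    p_text:     per-stroke P(text) from an isolated-stroke classifier
    stay:       assumed probability that the next stroke keeps the same label
    prior_text: assumed probability that the first stroke is text
    """
    if not p_text:
        return []

    def log(x):
        return math.log(max(x, 1e-12))  # guard against log(0)

    def emit(state, p):
        # Assumption: the classifier output stands in for the emission score.
        return log(p if state == "text" else 1.0 - p)

    trans = {(a, b): log(stay if a == b else 1.0 - stay)
             for a in STATES for b in STATES}

    score = {
        "text": log(prior_text) + emit("text", p_text[0]),
        "shape": log(1.0 - prior_text) + emit("shape", p_text[0]),
    }
    back = []  # back[i][state] = best previous state for stroke i + 1
    for p in p_text[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            prev = max(STATES, key=lambda s: score[s] + trans[(s, cur)])
            pointers[cur] = prev
            new_score[cur] = score[prev] + trans[(prev, cur)] + emit(cur, p)
        back.append(pointers)
        score = new_score

    # Trace the best path backwards from the best final state.
    path = [max(STATES, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

For example, with per-stroke probabilities `[0.9, 0.45, 0.9]` an isolated classifier would call the middle stroke a shape, while `viterbi_labels` labels all three strokes as text because switching labels twice is penalized by the transition model.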
In the third phase, spatial information about the strokes is used to label them as text or shape. The main idea is that the gap between two consecutive strokes has different characteristics in text and in shapes.
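As a rough illustration of what "gap characteristics" might look like, the sketch below measures the jump from the end of one stroke to the start of the next. The stroke representation as `(x, y, t)` samples and the particular measurements (distance, horizontal and vertical offset, pause) are my assumptions; the summary does not say exactly which gap features the paper computes.

```python
import math


def gap_features(prev_stroke, next_stroke):
    """Describe the gap between two consecutive strokes.

    Each stroke is assumed to be a non-empty list of (x, y, t) samples
    in drawing order, with t in seconds.
    """
    x0, y0, t0 = prev_stroke[-1]  # pen-up point of the earlier stroke
    x1, y1, t1 = next_stroke[0]   # pen-down point of the later stroke
    return {
        "distance": math.hypot(x1 - x0, y1 - y0),  # straight-line jump
        "dx": x1 - x0,                             # horizontal offset
        "dy": y1 - y0,                             # vertical offset
        "pause": t1 - t0,                          # time with the pen lifted
    }
```

Handwriting might be expected to produce many short, mostly horizontal gaps, whereas diagram strokes may jump farther and in arbitrary directions, which is the kind of difference such features could expose to a classifier.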
The accuracy for text and graphics achieved by this model was 93.9% and 6.1%, respectively, on the testing data.
Discussion
Some good techniques and initial work on text-shape separation. In the third phase, Bishop used only the gaps between strokes for spatial information. I think he could have used more features here, such as the alignment of strokes or the point density after including the adjacent strokes. One can think of spatial characteristics other than the gaps between strokes that differ between text and shapes.