Wednesday, October 1, 2008

Envisioning Sketch Recognition: A Local Feature Based Approach to Recognizing Information Sketches

Comments

Daniel's blog

Summary

This thesis paper discusses the recognition algorithm based around the idea of identifying shapes according to the visual parts that they are mode of. The classification of a shape at the level of pixels is difficult because of the wide range of signal and conceptual variation. Here the author tries to represent the shape as a collection of "parts" where each part represents the appearance of a portion of the shape.

The author here introduces the concept of a 'Bulls Eye' in which the input stroke is analyzed inside a circular region divided into wedges like a dartboard at the point of interest. The algorithm basically monitors the number of ink points inside the wedges. The rings are not evenly spaced; the two outer rings are further apart than the inner rings. The rings are spaced so that the inner bins are smaller to represent more detailed information and the outer bins contain more general information. The radius of the 'Bulls Eye' is chosen so as to span the majority of the input shape.

The author here also uses the direction information of the stroke to label each point with the direction, this will help in recognition of the sketches which are rotationally in variant. The stroke is preprocessed by first scaling the stroke so that the maximum distance between the two points in the shape is 75 pixels. The points are then resampled so that each point is equally spaced.

For the comparison of bullseyes of two stroke the algorithm mus first train itself to form a standard vocabulary of part, called the codebook. It does this from a set of training examples and parts are clusetered into a group of parts that represent patches with similar appearances. The algorithm takes the input stroke and determines the bulleyes for the input stroke. It then compares it with entries in the codebook using a distance function. The input part that result in the smallest distance value are more likely to belong to the class represented by the codebook. A match vector is then constructed which represent how each part of the input stroke match with the standard part in the code book.

Input stroke classification is done using a one-vs-one strategy. For one input shape, the result from each classifies is counted as a vote for the class it predicted. The final decision is made based on which class received the most votes. The author uses the technique of shape localization, in which the shape found in the context of a complete sketch. The basic strategy is to run the isolated shape classifier on a large number of regions in the sketch and then combine the information to form a final set of predictions. The steps involved in the process are selecting candidate regions, classifying the candidate regions and combining the classified regions to form the final predictions.

The system is divided into two parts; isolated shape recognizer and the full sketch processor. The local classifier reached the recognition rate of 89.5% for the analog circuit diagrams while image based classified on Zernike moments only achieved 76.5% accuracy. For power point style shapes it author's algorithm reached a recognition rate of 94.4% while Zernike's recognition rate was 96.7%.

Discussion

The main highlight of this paper are bulls eye features which are innovative and takes inspiration from image based recognizer. It address the problem of sketch recognition in great detail from isolated shape classification to shape localization in the context of the full sketch.

It beats the image based recognizers for the shapes which had high variations in sketch and was also equally comparable to image recognizers for more invariant power point shapes.

No comments: