Monday, October 27, 2008

Grouping Text Lines in Freeform Handwritten Notes

Comments

Summary


This paper addresses the problem of grouping handwritten text lines in freeform digital ink notes.

It uses a cost function to determine the grouping of the strokes. There are three likelihood terms and 2 prior terms.

Likelihood of line means the fitted line's direction and the max interstroke distances in it's x and y planes. Configuration consistency means that two strokes are neighbours if the distance between the two is below a threshold and there is not stroke in between them. Model complexity of a partitioning is the number of lines.

Optmization is done using a gradient-descent method. An initial solution is obtained from 'Temporal Grouping' which is based on the fact that most text lines are composed of temporaly adjacent strokes. Then two alternative hypothesis are built whose function is to merge two adjacent strokes and correction high-configuration-energy errors. Then global cost is computed which is essentially the cost change at each hypothesis.

The author uses a recall metric to evaluate the algorithm which is defined by number of correct grouping upon number of labeled groupings. For perfect labeled diagrams the recall metric was 0.93 and for crude diagrams recall metric was 0.87.

Discussion

This paper discusses some nice features for grouping text which can also be used as context features for distinguishing shape vs text.

Perceptually-Supported Image Editing of Text and Graphics

Comments

Summary

This paper describes the ScanScribe system which implements a novel image editing program that emphasize easy selection and manipulation of image materials in the foreground such as sketch on paper.

ScanScribe provides four major advantages. 1) Intuitive model to maintaining image objects and groups 2) It separates fore-ground from light background 3) new interface techniques manipulation of image objects without resorting to pallets 4) provides platfrom for exploiting image recognition and analysis methods.

The user can select each segment in fore-ground image objects and do basic image functions such as move, copy, delete. Group and ungroup them using the flat grouping model, which allows each fragment to be a part of more than one group.

I also separates the background from fore-ground by turning the backgroup pixels to transparent so they doesnot interfere in foregrounnd manipulation of objects. The user is also allowed to input texts from keyboard on the canvas.

Automatic structure recognition is automatic grouping of objects. It uses Gestalt laws of visual perception which includes proximity, smooth continuation, feature similarity, and figural closure. The objects labeled as curvilinear are strokes and objects labeled as blob are text.

Disucssion

A nice tool for image manipulation. Automatic structure recognition was the most interesting part in the paper. Perhaps some more explanation and stats on automatic recognition would have clarified a lot.

Author also did not explained the fragmentation algorithm used by ScanScribe.

Sunday, October 26, 2008

Distributing Text from Graphics in On-line Handwritten Ink

Comments

Summary

This paper discusses the problem of separating text from images. It uses a three sided approach to deal with the problem. First each stroke is analyzed in isolation then some temporal information is used to capture correlation between successive class labels, then it also considers the gap between each stroke.

In the first phase stroke features a extracted through which a probalistic classifier based on feed forward neural network also known as multi-layer perceptron is trained. A total of nine features are used for the training.

In the second phase it uses the HMM to predict the label of the stroke as text or shape. The main idea behind this is the people tend to draw graphical or texual strokes in succession. If the last stroke is a text then there is a good chance that the next stroke is also a text.

In the third phase it uses the saptial information between the strokes to label them as text or shape. The main idea is that the gaps between the two consecutive strokes has different characterstics in text and shapes.

The accuracy for text and graphics achieved by this model was 93.9 and 6.1 % respectively on testing data.

Discussion

Some good techniques and intial work on text-shape separation. In third phase bishop only used the gaps between strokes for spatial information. I think here he could have used some more features as the alignment of strokes or point density after including the adjacent strokes etc. One can think of other spatial characterstics which are different for text and shapes than only the gaps between strokes.

Tuesday, October 21, 2008

Template-based Online Character Recognition

Comments

Summary

This paper discusses the template-based matching technique for the recognition of online characters. The digitized points are taken from a tablet pc and then the stroke points are preprocessed to remove noise by applying smoothing to the stoke points.

The OCR system aligns the input stroke and calculate distance between the two strokes to get a measure of distance. Similar strokes result in lesser distance. The distance calculation is done by a string matching technique.

For classification the author uses two techniques of nearest neighbor and decision trees. The author is able to achieve 86.9 % accuracy in classification accuracy for a 36-class set of alpha-numeric characters.

Discussion

One of the few paper we read on OCR techniques. I like the authors idea of using two different techniques for classification of characters.

Sketch Recognition for Computer Aided Design

Comments

Summary

This paper describes a program which employs a model for the user as a means of inferring user intent as a function of sequence, speed and pressure. The author proposes that this modeling technique can be applied to interactive graphics systems, leading to better interface for the solution of design problems.

The paper mainly talks about the techinques which can be applied for the beautification of the sketch. The system mentioned in this paper requires no user intervention which the author admits can be useful and frustrating at the system as it does not gives the user the amount of control he/she may desire.

Discussion

This paper talks about similar stuff we read in his paper before. Uses speed, pressure and sequence to identify users' intent.

Sunday, October 19, 2008

Ink Features for Diagram Recognition.

Comments

Summary


This paper discusses technique to separate text from images in a free-hand drawn sketch system. Technique used in this paper is to use a classification tree after identifying a set of features which can help in distinguishing between text and shape. The classification tree mentioned in this paper is a binary tree. Each feature is node in the tree and this feature decides to split the input into a text, shape or sub tree.

The most important features are put at the highest node in the classification tree. To determine if the feature is more important or not, a simple test on all features separately in conducted. The feature which most accurately classifies a stroke into shape or text is considered as an important feature and consequently takes a higher position on the classification tree. Some of the proposed features are bounding box, total angle, distance from last stroke, distance to next stroke, speed to next stroke, amount of ink inside, perimeter to area, and time till next stroke.

The system is compared with Microsoft Ink SDK and InkKit. The overall misclassification on shape and text are fairly low as compared to the other techniques for the algorithm presented by the author. The misclassification for shape is 42.1% and for text is 21.4% on testing dataset.

Discussion

This paper addresses an important problem in sketch recognition of separating text vs shape. The reason which I feel that most sketch recognition systems are not widely used because they cannot appropriately distinguish between text and shape.

MathBrush: A Case Study for Pen-based Interactive Mathematics

Comments

Summary

This paper discusses the rationale of pen-math system called the MathBrush. The goal of the MathBrush is to support mathematical tasks which are currently done on CAS. This system uses sketch recognition technologies to allow users to input mathematical expressions using a free form sketching system. The system then recognizes these input sketches and convets them to MathML.

It uses the CAS to provide support for the entry, interaction and display of results from expressions that may be large and complex. To limit the number of commands it also supports the context sensitive menus. Working in different domains in mathematics the menu will only contain the commands used in the particular domain.

It also editing of the subexpressions. The user can just circle a particular sub expression and the expression will be extracted in a floating window for editing.

Discussion

This is a user study and doesnot give explanations on technical details on sketch recognition. It definitely uses domain specific recognition techniques since the sytem is confined to recognition of mathematical symbols.

Renegade Gaming: Practices Surrounding Social Use of the Nintendo DS Handheld Gaming System.

Comments

Summary

An interesting paper which has nothing to do with sketch recognition technology. This paper reports finding from a qualitative study examining the multiplayer gaming practices of the Nintendo DS owners. The primary purpose of this study was to answer the fundamental questions regarding multiplayer gaming with the DS like who people play with, under what circumstances and for what reasons.

The study was done on nine participants who considered themselves experienced gamers and had participated in multiplayer games. The study was also based on three gaming events by local students gaming clubs.

Renegade Gaming: Players created multiplayer gaming sub contexts within larger host contexts (contexts that do not consider gaming a legitimate activity). DS features allow the users to maneuver around obstacles and reduces physical preconditions for multiplayer gaming.
Gaming goals as identified during the study were.
  1. To pass time
  2. To learn or keep one’s mind sharp
  3. To be social
  4. To engage in competitive play
The DS gaming experience when compared to the console gaming experience was found to be less social since the audience cannot watch the game. The players are focused on their screens. DS also allows the players to site farther away from each other so lesser communication is done between players.

The social aspect in the gaming experience of the DS can be improved through the use of external display which can give the “birds-eye-view” of the game in progress.

Discussion

A simple study which gives out a fairly interesting result that console gaming are more social than handheld gaming .

Still I don’t understand the purpose of this study. Is it leading towards the exploration of a new genre of games or consoles?

Recognizing Free-form Hand-sketched Constraint Network Diagrams by Combining Geometry and Context

Comments

Summary


In this paper the author discusses the modeling problem of constraint satisfaction problems and uses LADDER framework to solve the modeling problem. The author develops a hand recognition technology that can recognize hand drawn representations of problems, and automatically generate constraint satisfaction models of them.

The system uses a geometric-based recognizer to allow freeform drawing. Users require no training than would be necessary to sketch on a piece of paper. The system recognizes free-form constraint network diagrams that consist of labeled nodes, labeled links and domain information about the allowable range of the variables.

Nodes are recognized as ellipses that contain variable names inside them. Undirected links are recognized as a line between two notes. Directed links are recognized as arrow that extends between two nodes. A set of rules of geometric properties are defined for variable recognition inside LADDER.

The author uses a perception based threshold to counter noise in the input. Also the author uses a length-based threshold to some what relieve the problems where perception based threshold run into problems.

The system also supports editing in the form of deletion of shapes, movement of shapes, rubber-band of links and scaling of nodes.

Discussion

Yet another implementation using the LADDER framework. This simply demonstrates the capabilities of the framework and possibilities of its use.

The paper does not demonstrate any new or innovative features.

Sketch-based educational games: “Drawing” Kids Away from Traditional Interfaces

Comments

Summary

This paper discusses the use of sketch-based systems for child education and skill development. There are four basic styles in child education and development auditory, visual, tactile, and kinesthetic. Most technology work with auditory and visual styles in education whereas sketch based system address the need of children who learn better from kinesthetic and tactile methods.

Six games are discussed in this paper which helps in the development of different skills.
APPLES game allows user to draw planets and then draw arrows for planetary motion and gravity with a spiral gesture. It familiarizes children with basic concept of physics at an early age. It uses the LADDER framework.

Simon Says Sketch game allows player to draw six sketches and the player must then remember the order in which two shapes changed their color and then mark those shapes with a stroke. The number of shapes that change color increases at each round.
Go (Sketch-a) Fish game the player draws a set predefined shapes and then surround it with rectangular box to make it a memory card. It then hides the card and allows a friend to memory. It also uses the LADDER framework.

Sketch-based Geography Tool is simple geography tool that allows maps to be presented to children for labeling through sketch.
Learn Your Shapes! Game is a tool that allows children to learn their basic shapes through sketching. It utilizes the sketch recognition that can automatically recognize if the child has drawn the right sketch. It also allows the children to draw the shapes in any manner without any constraints.

Sentence Diagramming is a tool for automatically providing feedback about hand drawn sentence drawing. It has also been implemented on the LADDER framework.

Discussion

It’s a fun paper talks about a totally different domain in which sketch recognition can be applied. Too bad the author couldn’t find children to test his games ;). Btw what are IRB constraints?

Monday, October 6, 2008

LADDER : A sketching language for user interface developers

Comments

Yuxiang's blog

Summary

This paper discusses the LADDER system which is a domain-description language for shapes. LADDER has some limitations which are that it only describe shapes with a fixed graphical grammar, the shapes must be composed of the primitive constraints, it can only describe domains that have few curves and shapes which have a lot of regularity and not too much details.

The main features in shape definitions are components, constraints, aliases, editing behavior and display properties. The shapes can also be defined in Hierarchical shape definitions. It also allows the definition of abstract shapes and shape groups.

The language also has some primitive shapes such as Point, Path, Line, Bezier curve, Curve, Arc, Ellipse and spiral. Predefined shapes derived from primitive include Rectangle, Diamond etc. The language also has an abstract 'Shape' from which all the shapes are derived to have some common properties.

The language also has some predefined constraints such as perpendicular, parallel, collinear, same side etc. There are also some predefined editing behaviors eg dragInside. The user can also define more editing behaviors, action and triggers.

Recognition of primitive shapes is done in a bottom-up approach and domain shapes is done by a Jess-rule based system. Editing recognition is done by first identifying if the editing gesture is defined for the shape, and if the mouse is over the shape and the particular editing gesture is made the drawing gesture is short circuited and the editing gesture takes over.

LADDER has been used to for a variety of domains including UML class diagrams, mechanical engineering diagrams, finite state machines, flowcharts and a simplified version of the course of action diagrams.

Discussion

LADDER is an innovative system which depends on a powerful low level recognizer. The ability to define shapes with a language is really the highlight of the system. This also allows the definition of domains comprised of a particular set of shapes. I would really like to see the LADDER system to have the ability to define more complex shapes which a richer grammar.

Ambiguous intentions: a Paper-like Interface for creative design

Comments

Daniel's bog

Summary

In this paper the author discusses a pen-based interface that acquires information about ambiguity and precision from freehand input, represents it internally, and echoes it to users visually and through constraint based edit behavior.

In this paper the author speaks about sketch interface Electronic Cocktail Napkin and its support for abstraction, ambiguity and imprecision. Abstraction is letting symbol takes the place of a more detailed configuration of parts, enabling a designer to work with components without specifying their internal structure. Ambiguity is to postpone the commitment yet retain a marker for a later decision. Imprecision permits postponing decisions about exact dimensions and positions.

Napkin allows users to draw as they would do on a paper e.g. harder pressures cause the brush instrument to display thicker line etc. Napkin also tries to recognize glyphs the user draws, and it may echo this recognition by displaying the name of the glyph. Napkin also allows configuration definition and recognition e.g. it allows the user to draw a configuration of dining table and chairs and mark this configuration with a letter ‘D’ inside the box. The configuration of dining room is abstracted with ‘D’ inside the box. Napkin’s configuration works as daemons and would try to recognize the configuration if there is a pause of 5 seconds in user drawing. Napkin also represents imprecision with internal constraint representation. Napkin identifies spatial relation among drawing elements and asserts them as constraints on the drawing. It also allows the user to modify, add, and delete the constraints identified by Napkin.

For the implementation of Napkin there is a low level recognizer which helps in the recognition of glyphs. It does it by maintaining a 3x3 grid and keeping Hash map for the pen-path. It also assigns each input glyph with confidence values. The user is also allowed to enter new glyphs on the fly by drawing them and assigning it a name. Recognition of configuration is done through recognizer functions which look over all pairs of appropriately typed elements in the diagram of these patterns. New configurations can also be added by drawing a configuration and napkin automatically extract constraints from the diagram (the user can also edit these configurations).

The system was tested out by undergraduate students, architects and design students. Most designers have understood and appreciated the need for end-user training of symbols and contextual definition of configurations.

Discussion

Napkin is an exciting system which does a lot high-level contextual recognition and definition by I think it's low level recognizer is not very strong. I seems to work with simple and small gestures, but the idea of resolving ambiguity through contextual information is really interesting.

Saturday, October 4, 2008

What no Rubine Features? Using Geometric-based Features to Produce Normalized Confidence Values for Sketch Recognition

Comments

Daniel's blog

Summary

In this paper the author tries to merge the two techniques of Geometric-based sketch recognition and Gesture-based sketch recognition. Both the techniques have their advantages and disadvantages and the author's focus is to get the best out of both the techniques.

For this the author modified the Rubine algorithm to use a quadratic classifier instead of a linear classifier. He integrated all the 31 features of the PaleoSketch and also 13 features from the Rubine algorithm. This called the full feature set. With the full feature the accuracy rate was not comparable to the best accuracy available from the PaleoSketch.

The quadratic classifier can be optimized by removing the features which had the negative effect on recognition. For this author tried to pick out a subset of features which could give the best result. The technique used was the sequential forward selection (SFS). In which the author tried all the possible combination starting with a feature set of 1 feature. The author achieved better accuracies with the optimal feature set.

Interesting to note was that a 93% accuracy was achieved with only top six features. Only one of the Rubine feature was which is total rotation was included in the top six features.

Discussion

The author uses a different approach by merging the feature sets of the two different recognizers to produce a feature set which is most significant in the recognition of the sketch. The accuracy achieved is nearly equal to the hand tuned algorithm PaleoSketch. As it uses a classifier more prmitives can be easily added to the system and easily trained for the recognition of the new primitive.

The only problem that i see is that its application is only tested on a very limited set of symbols and there is high probability that its result wont be as good for complex shapes.

Backpropagation Applied to Handwritten Zip Code

Comments

Andrew's blog

Summary

This paper discusses the use of neural networks in sketch recognition. In this paper the author tackles the problem of recognizing zip codes written on letters which will help in the sorting of the letters. The zip codes are written with pens on the letter paper. The system first scans the digits and then does some pre-processing on the image to retrieve the zip code in digital format.

The digits are then separated from each other and then each digits is fed into the the neural networks to be recognized. The system uses a three tier neural network layers name H1, H2 and H3. In the first layer the image is divided into pieces of 5x5 pixels and the features are extracted from the sub-parts and fed to next layer. H1 consists of 12x64 hidden units, H2 consists of 12x64 hidden units and H3 consists of 30 hidden units. The output can be mapped to 10 units which are the digits in the number series.

According to the author the system misclassified patterns was 0.14% on training set and 5% on testing set.

Discussion

This paper discusses an alternate research area of sketch recognition. Neural networks can also be effectively used in the process of classifying the input sketch to an output sketch. I think the paper lacked detailed information on the what features the author used to train the neural networks.

Also his technique was tested on a very small domain in which there is not high variation in the input data.

Wednesday, October 1, 2008

Envisioning Sketch Recognition: A Local Feature Based Approach to Recognizing Information Sketches

Comments

Daniel's blog

Summary

This thesis paper discusses the recognition algorithm based around the idea of identifying shapes according to the visual parts that they are mode of. The classification of a shape at the level of pixels is difficult because of the wide range of signal and conceptual variation. Here the author tries to represent the shape as a collection of "parts" where each part represents the appearance of a portion of the shape.

The author here introduces the concept of a 'Bulls Eye' in which the input stroke is analyzed inside a circular region divided into wedges like a dartboard at the point of interest. The algorithm basically monitors the number of ink points inside the wedges. The rings are not evenly spaced; the two outer rings are further apart than the inner rings. The rings are spaced so that the inner bins are smaller to represent more detailed information and the outer bins contain more general information. The radius of the 'Bulls Eye' is chosen so as to span the majority of the input shape.

The author here also uses the direction information of the stroke to label each point with the direction, this will help in recognition of the sketches which are rotationally in variant. The stroke is preprocessed by first scaling the stroke so that the maximum distance between the two points in the shape is 75 pixels. The points are then resampled so that each point is equally spaced.

For the comparison of bullseyes of two stroke the algorithm mus first train itself to form a standard vocabulary of part, called the codebook. It does this from a set of training examples and parts are clusetered into a group of parts that represent patches with similar appearances. The algorithm takes the input stroke and determines the bulleyes for the input stroke. It then compares it with entries in the codebook using a distance function. The input part that result in the smallest distance value are more likely to belong to the class represented by the codebook. A match vector is then constructed which represent how each part of the input stroke match with the standard part in the code book.

Input stroke classification is done using a one-vs-one strategy. For one input shape, the result from each classifies is counted as a vote for the class it predicted. The final decision is made based on which class received the most votes. The author uses the technique of shape localization, in which the shape found in the context of a complete sketch. The basic strategy is to run the isolated shape classifier on a large number of regions in the sketch and then combine the information to form a final set of predictions. The steps involved in the process are selecting candidate regions, classifying the candidate regions and combining the classified regions to form the final predictions.

The system is divided into two parts; isolated shape recognizer and the full sketch processor. The local classifier reached the recognition rate of 89.5% for the analog circuit diagrams while image based classified on Zernike moments only achieved 76.5% accuracy. For power point style shapes it author's algorithm reached a recognition rate of 94.4% while Zernike's recognition rate was 96.7%.

Discussion

The main highlight of this paper are bulls eye features which are innovative and takes inspiration from image based recognizer. It address the problem of sketch recognition in great detail from isolated shape classification to shape localization in the context of the full sketch.

It beats the image based recognizers for the shapes which had high variations in sketch and was also equally comparable to image recognizers for more invariant power point shapes.

Constellation Models for Sketch Recognition

Comments

Summary

In this paper the author develops a constellation model for the recognition of strokes in sketches of particular classes of objects. In this model the author captures the structure of the object based on shape, size and pairwise features. The recognition algorithm determines a maximum-likelihood labeling for an unlabelled sketch by searching through the space of possible label assignments using a multi-pass branch and bound algorithm.

The constellation model consists of features of individual as well as features of pairs of parts. Individual features capture shape and global position of parts and pairwise features capture relative positions of parts. Some features are labeled as mandatory and some are labeled as optional. Individual features are calculated for all features whereas pairwise features are only calculated for the pair which has at least one mandatory feature.

An object class model is represented by a probability distribution over features in object constellation models. This function is learned from a set of example labeled sketches. To support recognition with a minimum number of examples it uses a diagonal covariance matrix.
Labeling Likelihood is the quality of particular matching between labels and strokes is scored using a cost function. Maximum likelihood search procedure finds the most plausible labeling for all strokes that appear in the image.

The algorithm was tested on 5 classes of objects with 7-15 labels and drawings that had 3-200 strokes. Most of the time spent in for the running of the algorithm was in the initialization function which calculated all the features of the stroke. Images with higher number strokes consumed most time so the author developed a technique to reject spurious strokes using a threshold. The recognition went wrong for several cases such as inability to find suitable mandatory features, mislabeling of mandatory strokes and mislabeling of optional strokes.


Discussion

In this paper the author discusses the technique of sketch recognition for complex sketches based on individual features and spatial features. Its application is very limited on objects with fewer features. This algorithm can recognize a stroke with the database of strokes but It cannot help in beautification of strokes.