Sketch It!: 2008

Tuesday, December 2, 2008

Effect of Fidelity in Diagram Presentation

Comments

Daniel's Blog

Summary

Ahh! the last paper of our sketch class. Has nothing to do with sketch recognition but only such paper which compares the feedback and usability of sketch systems with paper & pencil inputs. A good motivation from the evaluation study for researchers in the field of sketch recognition comes from this paper.

It basically presents a sketch diagram of a web-page layout from low-fidelity to high-fidelity after small portions of beautification in the diagram. The low-fidelity is the paper drawn sketch and high-fidelity is the final beautified version of the sketch.

In the user study the author selects a small group of students each is shown all the sketches from low-fidelity to high-fidelity. The results show that the changes asked by the users to be done with the sketch decreases as they move from low-fidelity to high-fidelity. An overall people preferred working with the high-fidelity sketch than paper and pencil sketch. Another interesting result is that people preferred pencil and paper sketch over the low-fidelity tablet version.

Discussion

A nice user study for sketch systems but some there are some specific cases the author has looked upon. The sketch is very domain specific to web-page layouts and user opinions might differ for other domains. The user study is also conducted on a very small group of people. The beautification stages can be modified by modifying one aspect of the sketch while keeping other constants, instead of the author's approach which slowly beautifies each aspect at each stage.

SHADY: A Shape Description Debugger for Use in Sketch Recognition

Comments

Manoj's Blog

Summary

This paper simply builds upon the effective use of the LADDER system. In this paper the author describes the addition of a debugging capability of LADDER domain description using the idea of near-miss from the previous paper. The system is called SHADY

Many times it happens when defining the domain constraints of a shape and the shape is not properly recognized by the LADDER system. The developer is not provided of any feedback of what went wrong with the domain descriptions that LADDER was not able to recognize the shape.

In this modification the developer can define constraints and the draw a shape on the LADDER system. If the shape is not recognized the systems exactly shows which constraint is not met by the input sketch and give the editor to the user to fix the error. There can be number of constraints that are not met by the input sketch. If SHADY shows all the results this could become overwhelming for the developer. Instead SHADY tries to choose a smaller subset from the result which can closely match to the user intentions.

SHADY can also generate the constraints from the drawn shape and also generate the shapes which satisfy the constraints provide by the user.

Discussion

A fairly nice improvement which makes it easy for the user to define the constraints in the domain through proactive feedback from shady. This also helps the developer understand the underlying logic of LADDER as he engages into the debugging of constraints.

Multimodal Collaborative Handwriting training for Visually-Impaired People

Comments

Manoj's blog

Summary

In this paper the author presents a Multimodal Haptic feedback system for visually impaired people to help them learn writing and drawing shapes. The system is called "McSig". In this paper the author discusses the general un-awarness of people that blind people find it really hard to draw and write because of lack of feedback.

In this paper the author creates a system which enables teachers to make the students learn how to draw and write through a sketch-based system. In this setting the teacher draws a shape on a tablet pc and then the student can feel explore and move around the shape on a device called PHANTOM which echoes the shape the teacher had drawn. PHANTOM is an omni force-feedback device.

An evaluation study was conducted on two types of visually impaired subjects. Once which are completely blind and the other who are partially visually impaired. There was a significant improvement in the accuracy of the shapes drawn by both type of subjects. The partially impaired subjects were able to learn the system very quickly and after a very short time they were able to draw shapes accurately. The completely blind subjects took more time in learning the system but once they got used to the devices they also showed improvement in their sketching.

Discussion

A different domain of the usability of sketch systems through haptic feeback. There is lot of discussion on the effective feedback of sketch systems and this paper simply opens a new dimension of providing richer feedback.

Tuesday, November 25, 2008

Sketch Recongition User Interfaces: Guidelines for Design and Development

Comments

Akshay's Blog

Summary

This system in general is user study of a sketch system which mainly focuses on the interfaces of such systems. The system used in this study is a free-hand drawing sketch system which can recognize the shapes drawn by the user and immediately they are translated into power point objects, when the user switches the screens.

After the evaluation of the system the author describes a set of rules which according to the author should be considered when designing a sketch recognition system. Points mentioned in short are.

display recognition results only when the author is done sketching.
provide obvious indications to distinguish free sketching from recognition.
restrict recognition to single domain until automatic domain detection becomes feasible.
incorporate pen-based editing.
sketching and edition should use distinct pen motion.
SkRUIs require large buttons.
the pen must always respond in real-time.

The author at the end also makes the assumption that iterative design techniques paper-prototyping, heuristic evaluation, low-fidelity prototypes or wizard of oz techniques are not possible when designing such applications.

Discussion

This paper is thinking on different line of making the sketch system more usable. It define a nice framework for the usability of these sketch systems but some more work is definitely required refining the framework.

Fluid Sketches: Continous Recognition and Morphing of Simple Hand-Drawn Shapes

Commnets

Akshay's Blog

Summary

This paper mainly talks about the beautification technique which is different from the techniques which have been used by the other sketch recognition systems. In this systems as the user draws a sketch on the system the points of the systems are moved from the original location to the location of the recognized shapes.

Currently the system only supports only two types of shape which are mainly circle and rectangle. The important here in this paper is that the user is getting an immediate feedback by the sketch of the shape it's being recognized into. This interpretation of the system is not final and can change during the course of drawing.

A user study was conducted with eleven subjects and qualitative data was gathered for the system. In general the system was appreciated by the users and subjects could easily pick the concepts of the fluid sketches.

Discussion

This system is very basic, which recognizes only two shapes. It can act as a proof of a concept but it's possibility of becoming a more general system for sketch recognition is still a question, since the system will have to work around many hurdles for full fledged sketch system which can recognize many shapes.

Wednesday, November 12, 2008

What Are Intelligence? And Why?

Comments

Daniel's blog

Summary

The author takes an interesting route to define the evolution of intelligence of human beings as different from animals. Based on the evolutionary evidences the rise of the human intelligence can be defined by various theories.

the primal tool maker: says that human intelligence was evolved from making effective tools. the killer frisbee theory suggests that human made frisbee to kill animals. There are various other theories. the killer climate, the primal frugivor, the primal psychologist, the primal linguist.

Then the author discusses the intelligence in other living beings. The mathematical ability of the horse clever hans which was for quite some time misunderstood as intelligence was later found out to be the ability of horse to pick up answers from the audience. The birds and bees does show some sort of intelligence, like the bees telling the direction of the flower field and parrot recognizing human speech and ability to communicate. Monkeys have also shown some of intelligence.

The evolution of the human intelligence has a more bilogical perspective attached to it and its spans over a million years.

Discussion

I loved this paper!. Human brain and intelligence has always intrigued me. The ability of the humans to perform various abstract calculations in no time is remarkable.

I agree with the idea of the author here that AI as a discipline should have more common in biology than mathematics and physics.

Monday, November 10, 2008

Magic Paper: Sketch-Understanding Research

Comments

Manoj's Blog

Summary

In this paper the author describes the basic history of sketch recognition. It's correlation and deviance with the problem of speech recognition. The author here his work in sketch recognition as towards the creation of the 'magic paper' which is as intuitive to use as natural paper and yet it's able to understand what the user has drawn to give him a feel of 'intelligent paper'.

The author thinks that the problem is important even in front of some very good modeling tools because according to a study in cognitive science whose result is that people tend to be more creative and innovative when they are working with more natural utensils than a CAD system.

Author here discusses the difficulties in sketch recognition which are unique to the field or which are common among other fields such as speech recognition.

The author then tries to define a framework in understanding sketch which a popular LADDER system uses quite effectively.

Discussion

A light-weight discussion in the field of sketch recognition. It more sets a tone for research in the field of sketch recognition than any technicalities or innovations.

Interactive Learning of Structural Shape Descriptions from Automatically Generated Near-miss Examples

Comments

Akshay's Blog

Summary

This paper talks about an improvement in the LADDER system which is not related to the recognition of sketch but an improvement in the usability of the LADDER system. LADDER required users to provide shape descriptions for the sketch recognition. With a large vocabulary of constraints and shapes it becomes a difficult task for the user to write those description accurately.

When defining constraints either manually be developer or generated by the system. They can become over or under constrained resulting in false positives or false negatives. Here the author uses a near-miss strategy to help correct over and under constrained descriptions. For over constrained descriptions it removes a constraint and generates a shape which takes advantage of the removed constraint. The user than provides a feedback whether this shape is acceptable or not. Similarly for under-constrained descriptions it adds a constraint and generated a shape demonstrates the effect of the modification.

Discussion

A very useful feature added to the LADDER framework makes it more easier for the user to write the descriptions by visually seeing the possibilities and effect of constraints rather than to extensively think for each description and possibilities.

I think the near-miss strategy works by adding/removing one constraint. What happens if the description is under/over constrained by more than on constraint? Is the process iterative?

Monday, October 27, 2008

Grouping Text Lines in Freeform Handwritten Notes

Comments

Summary

This paper addresses the problem of grouping handwritten text lines in freeform digital ink notes.

It uses a cost function to determine the grouping of the strokes. There are three likelihood terms and 2 prior terms.

Likelihood of line means the fitted line's direction and the max interstroke distances in it's x and y planes. Configuration consistency means that two strokes are neighbours if the distance between the two is below a threshold and there is not stroke in between them. Model complexity of a partitioning is the number of lines.

Optmization is done using a gradient-descent method. An initial solution is obtained from 'Temporal Grouping' which is based on the fact that most text lines are composed of temporaly adjacent strokes. Then two alternative hypothesis are built whose function is to merge two adjacent strokes and correction high-configuration-energy errors. Then global cost is computed which is essentially the cost change at each hypothesis.

The author uses a recall metric to evaluate the algorithm which is defined by number of correct grouping upon number of labeled groupings. For perfect labeled diagrams the recall metric was 0.93 and for crude diagrams recall metric was 0.87.

Discussion

This paper discusses some nice features for grouping text which can also be used as context features for distinguishing shape vs text.

Perceptually-Supported Image Editing of Text and Graphics

Comments

Summary

This paper describes the ScanScribe system which implements a novel image editing program that emphasize easy selection and manipulation of image materials in the foreground such as sketch on paper.

ScanScribe provides four major advantages. 1) Intuitive model to maintaining image objects and groups 2) It separates fore-ground from light background 3) new interface techniques manipulation of image objects without resorting to pallets 4) provides platfrom for exploiting image recognition and analysis methods.

The user can select each segment in fore-ground image objects and do basic image functions such as move, copy, delete. Group and ungroup them using the flat grouping model, which allows each fragment to be a part of more than one group.

I also separates the background from fore-ground by turning the backgroup pixels to transparent so they doesnot interfere in foregrounnd manipulation of objects. The user is also allowed to input texts from keyboard on the canvas.

Automatic structure recognition is automatic grouping of objects. It uses Gestalt laws of visual perception which includes proximity, smooth continuation, feature similarity, and figural closure. The objects labeled as curvilinear are strokes and objects labeled as blob are text.

Disucssion

A nice tool for image manipulation. Automatic structure recognition was the most interesting part in the paper. Perhaps some more explanation and stats on automatic recognition would have clarified a lot.

Author also did not explained the fragmentation algorithm used by ScanScribe.

Sunday, October 26, 2008

Distributing Text from Graphics in On-line Handwritten Ink

Comments

Summary

This paper discusses the problem of separating text from images. It uses a three sided approach to deal with the problem. First each stroke is analyzed in isolation then some temporal information is used to capture correlation between successive class labels, then it also considers the gap between each stroke.

In the first phase stroke features a extracted through which a probalistic classifier based on feed forward neural network also known as multi-layer perceptron is trained. A total of nine features are used for the training.

In the second phase it uses the HMM to predict the label of the stroke as text or shape. The main idea behind this is the people tend to draw graphical or texual strokes in succession. If the last stroke is a text then there is a good chance that the next stroke is also a text.

In the third phase it uses the saptial information between the strokes to label them as text or shape. The main idea is that the gaps between the two consecutive strokes has different characterstics in text and shapes.

The accuracy for text and graphics achieved by this model was 93.9 and 6.1 % respectively on testing data.

Discussion

Some good techniques and intial work on text-shape separation. In third phase bishop only used the gaps between strokes for spatial information. I think here he could have used some more features as the alignment of strokes or point density after including the adjacent strokes etc. One can think of other spatial characterstics which are different for text and shapes than only the gaps between strokes.

Tuesday, October 21, 2008

Template-based Online Character Recognition

Comments

Summary

This paper discusses the template-based matching technique for the recognition of online characters. The digitized points are taken from a tablet pc and then the stroke points are preprocessed to remove noise by applying smoothing to the stoke points.

The OCR system aligns the input stroke and calculate distance between the two strokes to get a measure of distance. Similar strokes result in lesser distance. The distance calculation is done by a string matching technique.

For classification the author uses two techniques of nearest neighbor and decision trees. The author is able to achieve 86.9 % accuracy in classification accuracy for a 36-class set of alpha-numeric characters.

Discussion

One of the few paper we read on OCR techniques. I like the authors idea of using two different techniques for classification of characters.

Sketch Recognition for Computer Aided Design

Comments

Summary

This paper describes a program which employs a model for the user as a means of inferring user intent as a function of sequence, speed and pressure. The author proposes that this modeling technique can be applied to interactive graphics systems, leading to better interface for the solution of design problems.

The paper mainly talks about the techinques which can be applied for the beautification of the sketch. The system mentioned in this paper requires no user intervention which the author admits can be useful and frustrating at the system as it does not gives the user the amount of control he/she may desire.

Discussion

This paper talks about similar stuff we read in his paper before. Uses speed, pressure and sequence to identify users' intent.

Sunday, October 19, 2008

Ink Features for Diagram Recognition.

Comments

Summary

This paper discusses technique to separate text from images in a free-hand drawn sketch system. Technique used in this paper is to use a classification tree after identifying a set of features which can help in distinguishing between text and shape. The classification tree mentioned in this paper is a binary tree. Each feature is node in the tree and this feature decides to split the input into a text, shape or sub tree.

The most important features are put at the highest node in the classification tree. To determine if the feature is more important or not, a simple test on all features separately in conducted. The feature which most accurately classifies a stroke into shape or text is considered as an important feature and consequently takes a higher position on the classification tree. Some of the proposed features are bounding box, total angle, distance from last stroke, distance to next stroke, speed to next stroke, amount of ink inside, perimeter to area, and time till next stroke.

The system is compared with Microsoft Ink SDK and InkKit. The overall misclassification on shape and text are fairly low as compared to the other techniques for the algorithm presented by the author. The misclassification for shape is 42.1% and for text is 21.4% on testing dataset.

Discussion

This paper addresses an important problem in sketch recognition of separating text vs shape. The reason which I feel that most sketch recognition systems are not widely used because they cannot appropriately distinguish between text and shape.

MathBrush: A Case Study for Pen-based Interactive Mathematics

Comments

Summary

This paper discusses the rationale of pen-math system called the MathBrush. The goal of the MathBrush is to support mathematical tasks which are currently done on CAS. This system uses sketch recognition technologies to allow users to input mathematical expressions using a free form sketching system. The system then recognizes these input sketches and convets them to MathML.

It uses the CAS to provide support for the entry, interaction and display of results from expressions that may be large and complex. To limit the number of commands it also supports the context sensitive menus. Working in different domains in mathematics the menu will only contain the commands used in the particular domain.

It also editing of the subexpressions. The user can just circle a particular sub expression and the expression will be extracted in a floating window for editing.

Discussion

This is a user study and doesnot give explanations on technical details on sketch recognition. It definitely uses domain specific recognition techniques since the sytem is confined to recognition of mathematical symbols.

Renegade Gaming: Practices Surrounding Social Use of the Nintendo DS Handheld Gaming System.

Comments

Summary

An interesting paper which has nothing to do with sketch recognition technology. This paper reports finding from a qualitative study examining the multiplayer gaming practices of the Nintendo DS owners. The primary purpose of this study was to answer the fundamental questions regarding multiplayer gaming with the DS like who people play with, under what circumstances and for what reasons.

The study was done on nine participants who considered themselves experienced gamers and had participated in multiplayer games. The study was also based on three gaming events by local students gaming clubs.

Renegade Gaming: Players created multiplayer gaming sub contexts within larger host contexts (contexts that do not consider gaming a legitimate activity). DS features allow the users to maneuver around obstacles and reduces physical preconditions for multiplayer gaming.
Gaming goals as identified during the study were.

To pass time
To learn or keep one’s mind sharp
To be social
To engage in competitive play

The DS gaming experience when compared to the console gaming experience was found to be less social since the audience cannot watch the game. The players are focused on their screens. DS also allows the players to site farther away from each other so lesser communication is done between players.

The social aspect in the gaming experience of the DS can be improved through the use of external display which can give the “birds-eye-view” of the game in progress.

Discussion

A simple study which gives out a fairly interesting result that console gaming are more social than handheld gaming .

Still I don’t understand the purpose of this study. Is it leading towards the exploration of a new genre of games or consoles?

Recognizing Free-form Hand-sketched Constraint Network Diagrams by Combining Geometry and Context

Comments

Summary

In this paper the author discusses the modeling problem of constraint satisfaction problems and uses LADDER framework to solve the modeling problem. The author develops a hand recognition technology that can recognize hand drawn representations of problems, and automatically generate constraint satisfaction models of them.

The system uses a geometric-based recognizer to allow freeform drawing. Users require no training than would be necessary to sketch on a piece of paper. The system recognizes free-form constraint network diagrams that consist of labeled nodes, labeled links and domain information about the allowable range of the variables.

Nodes are recognized as ellipses that contain variable names inside them. Undirected links are recognized as a line between two notes. Directed links are recognized as arrow that extends between two nodes. A set of rules of geometric properties are defined for variable recognition inside LADDER.

The author uses a perception based threshold to counter noise in the input. Also the author uses a length-based threshold to some what relieve the problems where perception based threshold run into problems.

The system also supports editing in the form of deletion of shapes, movement of shapes, rubber-band of links and scaling of nodes.

Discussion

Yet another implementation using the LADDER framework. This simply demonstrates the capabilities of the framework and possibilities of its use.

The paper does not demonstrate any new or innovative features.

Sketch-based educational games: “Drawing” Kids Away from Traditional Interfaces

Comments

Summary

This paper discusses the use of sketch-based systems for child education and skill development. There are four basic styles in child education and development auditory, visual, tactile, and kinesthetic. Most technology work with auditory and visual styles in education whereas sketch based system address the need of children who learn better from kinesthetic and tactile methods.

Six games are discussed in this paper which helps in the development of different skills.
APPLES game allows user to draw planets and then draw arrows for planetary motion and gravity with a spiral gesture. It familiarizes children with basic concept of physics at an early age. It uses the LADDER framework.

Simon Says Sketch game allows player to draw six sketches and the player must then remember the order in which two shapes changed their color and then mark those shapes with a stroke. The number of shapes that change color increases at each round.
Go (Sketch-a) Fish game the player draws a set predefined shapes and then surround it with rectangular box to make it a memory card. It then hides the card and allows a friend to memory. It also uses the LADDER framework.

Sketch-based Geography Tool is simple geography tool that allows maps to be presented to children for labeling through sketch.
Learn Your Shapes! Game is a tool that allows children to learn their basic shapes through sketching. It utilizes the sketch recognition that can automatically recognize if the child has drawn the right sketch. It also allows the children to draw the shapes in any manner without any constraints.

Sentence Diagramming is a tool for automatically providing feedback about hand drawn sentence drawing. It has also been implemented on the LADDER framework.

Discussion

It’s a fun paper talks about a totally different domain in which sketch recognition can be applied. Too bad the author couldn’t find children to test his games ;). Btw what are IRB constraints?

Monday, October 6, 2008

LADDER : A sketching language for user interface developers

Comments

Yuxiang's blog

Summary

This paper discusses the LADDER system which is a domain-description language for shapes. LADDER has some limitations which are that it only describe shapes with a fixed graphical grammar, the shapes must be composed of the primitive constraints, it can only describe domains that have few curves and shapes which have a lot of regularity and not too much details.

The main features in shape definitions are components, constraints, aliases, editing behavior and display properties. The shapes can also be defined in Hierarchical shape definitions. It also allows the definition of abstract shapes and shape groups.

The language also has some primitive shapes such as Point, Path, Line, Bezier curve, Curve, Arc, Ellipse and spiral. Predefined shapes derived from primitive include Rectangle, Diamond etc. The language also has an abstract 'Shape' from which all the shapes are derived to have some common properties.

The language also has some predefined constraints such as perpendicular, parallel, collinear, same side etc. There are also some predefined editing behaviors eg dragInside. The user can also define more editing behaviors, action and triggers.

Recognition of primitive shapes is done in a bottom-up approach and domain shapes is done by a Jess-rule based system. Editing recognition is done by first identifying if the editing gesture is defined for the shape, and if the mouse is over the shape and the particular editing gesture is made the drawing gesture is short circuited and the editing gesture takes over.

LADDER has been used to for a variety of domains including UML class diagrams, mechanical engineering diagrams, finite state machines, flowcharts and a simplified version of the course of action diagrams.

Discussion

LADDER is an innovative system which depends on a powerful low level recognizer. The ability to define shapes with a language is really the highlight of the system. This also allows the definition of domains comprised of a particular set of shapes. I would really like to see the LADDER system to have the ability to define more complex shapes which a richer grammar.

Ambiguous intentions: a Paper-like Interface for creative design

Comments

Daniel's bog

Summary

In this paper the author discusses a pen-based interface that acquires information about ambiguity and precision from freehand input, represents it internally, and echoes it to users visually and through constraint based edit behavior.

In this paper the author speaks about sketch interface Electronic Cocktail Napkin and its support for abstraction, ambiguity and imprecision. Abstraction is letting symbol takes the place of a more detailed configuration of parts, enabling a designer to work with components without specifying their internal structure. Ambiguity is to postpone the commitment yet retain a marker for a later decision. Imprecision permits postponing decisions about exact dimensions and positions.

Napkin allows users to draw as they would do on a paper e.g. harder pressures cause the brush instrument to display thicker line etc. Napkin also tries to recognize glyphs the user draws, and it may echo this recognition by displaying the name of the glyph. Napkin also allows configuration definition and recognition e.g. it allows the user to draw a configuration of dining table and chairs and mark this configuration with a letter ‘D’ inside the box. The configuration of dining room is abstracted with ‘D’ inside the box. Napkin’s configuration works as daemons and would try to recognize the configuration if there is a pause of 5 seconds in user drawing. Napkin also represents imprecision with internal constraint representation. Napkin identifies spatial relation among drawing elements and asserts them as constraints on the drawing. It also allows the user to modify, add, and delete the constraints identified by Napkin.

For the implementation of Napkin there is a low level recognizer which helps in the recognition of glyphs. It does it by maintaining a 3x3 grid and keeping Hash map for the pen-path. It also assigns each input glyph with confidence values. The user is also allowed to enter new glyphs on the fly by drawing them and assigning it a name. Recognition of configuration is done through recognizer functions which look over all pairs of appropriately typed elements in the diagram of these patterns. New configurations can also be added by drawing a configuration and napkin automatically extract constraints from the diagram (the user can also edit these configurations).

The system was tested out by undergraduate students, architects and design students. Most designers have understood and appreciated the need for end-user training of symbols and contextual definition of configurations.

Discussion

Napkin is an exciting system which does a lot high-level contextual recognition and definition by I think it's low level recognizer is not very strong. I seems to work with simple and small gestures, but the idea of resolving ambiguity through contextual information is really interesting.

Saturday, October 4, 2008

What no Rubine Features? Using Geometric-based Features to Produce Normalized Confidence Values for Sketch Recognition

Comments

Daniel's blog

Summary

In this paper the author tries to merge the two techniques of Geometric-based sketch recognition and Gesture-based sketch recognition. Both the techniques have their advantages and disadvantages and the author's focus is to get the best out of both the techniques.

For this the author modified the Rubine algorithm to use a quadratic classifier instead of a linear classifier. He integrated all the 31 features of the PaleoSketch and also 13 features from the Rubine algorithm. This called the full feature set. With the full feature the accuracy rate was not comparable to the best accuracy available from the PaleoSketch.

The quadratic classifier can be optimized by removing the features which had the negative effect on recognition. For this author tried to pick out a subset of features which could give the best result. The technique used was the sequential forward selection (SFS). In which the author tried all the possible combination starting with a feature set of 1 feature. The author achieved better accuracies with the optimal feature set.

Interesting to note was that a 93% accuracy was achieved with only top six features. Only one of the Rubine feature was which is total rotation was included in the top six features.

Discussion

The author uses a different approach by merging the feature sets of the two different recognizers to produce a feature set which is most significant in the recognition of the sketch. The accuracy achieved is nearly equal to the hand tuned algorithm PaleoSketch. As it uses a classifier more prmitives can be easily added to the system and easily trained for the recognition of the new primitive.

The only problem that i see is that its application is only tested on a very limited set of symbols and there is high probability that its result wont be as good for complex shapes.

Backpropagation Applied to Handwritten Zip Code

Comments

Andrew's blog

Summary

This paper discusses the use of neural networks in sketch recognition. In this paper the author tackles the problem of recognizing zip codes written on letters which will help in the sorting of the letters. The zip codes are written with pens on the letter paper. The system first scans the digits and then does some pre-processing on the image to retrieve the zip code in digital format.

The digits are then separated from each other and then each digits is fed into the the neural networks to be recognized. The system uses a three tier neural network layers name H1, H2 and H3. In the first layer the image is divided into pieces of 5x5 pixels and the features are extracted from the sub-parts and fed to next layer. H1 consists of 12x64 hidden units, H2 consists of 12x64 hidden units and H3 consists of 30 hidden units. The output can be mapped to 10 units which are the digits in the number series.

According to the author the system misclassified patterns was 0.14% on training set and 5% on testing set.

Discussion

This paper discusses an alternate research area of sketch recognition. Neural networks can also be effectively used in the process of classifying the input sketch to an output sketch. I think the paper lacked detailed information on the what features the author used to train the neural networks.

Also his technique was tested on a very small domain in which there is not high variation in the input data.

Wednesday, October 1, 2008

Envisioning Sketch Recognition: A Local Feature Based Approach to Recognizing Information Sketches

Comments

Daniel's blog

Summary

This thesis paper discusses the recognition algorithm based around the idea of identifying shapes according to the visual parts that they are mode of. The classification of a shape at the level of pixels is difficult because of the wide range of signal and conceptual variation. Here the author tries to represent the shape as a collection of "parts" where each part represents the appearance of a portion of the shape.

The author here introduces the concept of a 'Bulls Eye' in which the input stroke is analyzed inside a circular region divided into wedges like a dartboard at the point of interest. The algorithm basically monitors the number of ink points inside the wedges. The rings are not evenly spaced; the two outer rings are further apart than the inner rings. The rings are spaced so that the inner bins are smaller to represent more detailed information and the outer bins contain more general information. The radius of the 'Bulls Eye' is chosen so as to span the majority of the input shape.

The author here also uses the direction information of the stroke to label each point with the direction, this will help in recognition of the sketches which are rotationally in variant. The stroke is preprocessed by first scaling the stroke so that the maximum distance between the two points in the shape is 75 pixels. The points are then resampled so that each point is equally spaced.

For the comparison of bullseyes of two stroke the algorithm mus first train itself to form a standard vocabulary of part, called the codebook. It does this from a set of training examples and parts are clusetered into a group of parts that represent patches with similar appearances. The algorithm takes the input stroke and determines the bulleyes for the input stroke. It then compares it with entries in the codebook using a distance function. The input part that result in the smallest distance value are more likely to belong to the class represented by the codebook. A match vector is then constructed which represent how each part of the input stroke match with the standard part in the code book.

Input stroke classification is done using a one-vs-one strategy. For one input shape, the result from each classifies is counted as a vote for the class it predicted. The final decision is made based on which class received the most votes. The author uses the technique of shape localization, in which the shape found in the context of a complete sketch. The basic strategy is to run the isolated shape classifier on a large number of regions in the sketch and then combine the information to form a final set of predictions. The steps involved in the process are selecting candidate regions, classifying the candidate regions and combining the classified regions to form the final predictions.

The system is divided into two parts; isolated shape recognizer and the full sketch processor. The local classifier reached the recognition rate of 89.5% for the analog circuit diagrams while image based classified on Zernike moments only achieved 76.5% accuracy. For power point style shapes it author's algorithm reached a recognition rate of 94.4% while Zernike's recognition rate was 96.7%.

Discussion

The main highlight of this paper are bulls eye features which are innovative and takes inspiration from image based recognizer. It address the problem of sketch recognition in great detail from isolated shape classification to shape localization in the context of the full sketch.

It beats the image based recognizers for the shapes which had high variations in sketch and was also equally comparable to image recognizers for more invariant power point shapes.

Constellation Models for Sketch Recognition

Comments

Summary

In this paper the author develops a constellation model for the recognition of strokes in sketches of particular classes of objects. In this model the author captures the structure of the object based on shape, size and pairwise features. The recognition algorithm determines a maximum-likelihood labeling for an unlabelled sketch by searching through the space of possible label assignments using a multi-pass branch and bound algorithm.

The constellation model consists of features of individual as well as features of pairs of parts. Individual features capture shape and global position of parts and pairwise features capture relative positions of parts. Some features are labeled as mandatory and some are labeled as optional. Individual features are calculated for all features whereas pairwise features are only calculated for the pair which has at least one mandatory feature.

An object class model is represented by a probability distribution over features in object constellation models. This function is learned from a set of example labeled sketches. To support recognition with a minimum number of examples it uses a diagonal covariance matrix.
Labeling Likelihood is the quality of particular matching between labels and strokes is scored using a cost function. Maximum likelihood search procedure finds the most plausible labeling for all strokes that appear in the image.

The algorithm was tested on 5 classes of objects with 7-15 labels and drawings that had 3-200 strokes. Most of the time spent in for the running of the algorithm was in the initialization function which calculated all the features of the stroke. Images with higher number strokes consumed most time so the author developed a technique to reject spurious strokes using a threshold. The recognition went wrong for several cases such as inability to find suitable mandatory features, mislabeling of mandatory strokes and mislabeling of optional strokes.

Discussion

In this paper the author discusses the technique of sketch recognition for complex sketches based on individual features and spatial features. Its application is very limited on objects with fewer features. This algorithm can recognize a stroke with the database of strokes but It cannot help in beautification of strokes.

Tuesday, September 30, 2008

Yu: A Domain-Independent System for Sketch Recognition

Comments

Summary

In this paper the author talks about a domain independent system for sketch recognition. In this he talks about stroke approximation through direction graphs, curvature graphs and feature area of the strokes.

Here uses a different approach for vertex detection by first trying to classify a stroke into a primitive. If the stroke cannot be classified into a primitive he breaks the stroke from the point of highest curvature and recursively tries to classify the sub strokes.

For line segment approximation he tries to fit the gradient graph of the stroke to a horizontal and the line to a straight line from the endpoints. For circles the direction graphs should be constant and increasing so he tries to fit the gradient graph with a straight line.

For self intersecting strokes such as helix its not a good methodology to break the stroke from the point of highest curvature so here the author uses a different methodology. He breaks the stroke from the point of highest curvature as well as from the point of intersection and then tries to classify the sub strokes. The result from both sub strokes is obtained and analyzed which one to chose. Here the author follows the strategy of 'simpler is better'. That stroke classified as a circle is preferred over the sub stroke classified as a set of lines etc.

Author then uses some post processing to clean up the stroke for beautification and basic object recognition into square, circle and rectangles etc.

The author claims to have achieved an accuracy rate of 98% for polylines and 94% for arcs.

Discussion

The paper presents good ideas for domain independent object recognition. The author explains the process of basic object recognition quite vaguely about how he uses his algorithm for basic object recognition.

Although it shows some new techniques for stroke approximation but I would still prefer PaleoSketch over it because it explains in detail about each process and does some very similar work.

GLADDER: Combining Gesture and Geometric Sketch Recognition

Comments

Summary

This paper proposes a recognition system which tries to utilize the advantage of two types of sketch recognition systems 1) Gesture based system 2) Geometric based system.

Gesture based depends upon how the user is supposed to draw the sketches and has a good accuracy rate and Geometric based systems allows the user to draw more naturally but its difficult to describe the shapes using their geometric sub-parts.

GLADDER tries to merge both the recognition system to produce a higher accuracy rate. In its implementation it modifies the rubine algorithm to use a quadratic classifier instead of a geometric classifier for gesture recognition. It uses the LADDER system for geometric recognition. A recognition assistant must decide which recognition system to use. It uses the Mahanabolis distance to reject or accept the input for a recognition algorithm. For Rubine the mahanabolis rejection is done on 24 and for LADDER primitives is done on the valur of 100. A mid value of 30 is set to decide which system to use.

If the value is below 30 rubine algorithm is used for recognition and if the value is above 30 LADDER system is used for recognition.

With all the inputs the Modified Rubine has accuracy of 61.9% and LADDER has 75.2% and after merging the GLADDER has the highest accuracy of 79.9%.

Discussion

This system shows how two systems can be merged to produce a system which is better than the two by utilizing the best in both.

Kim: A curvature estimation for pen input segmentation in sketch-based modeling

Comments

Summary

In this paper the author discusses the techniques for the segmentation of the input through curvature estimation. The features discussed in this paper are direction at a point, support for curvature estimation at Point 'j' and local convexity at Point'j' with respect to P'i'.

Direction at a point for A,B and C is the change in angle formed by the line segments AB & BC. Curvature estimation at point j is the angle from the horizontal of the Line segment AB.
A polygon is locally convexity at Point j with respect to Point i, if the curvature estimation at point j and point i has the same sign.

For segmentation the author uses the local maximum of positive curvature and local minimum of negative curvature at the identified points. These points are then taken as the segmenting points.

The algorithm proposed by Kim produced an accuracy rate of 95% for power point basic shapes and some basic shapes used by other researchers for curvature finding.

Discussion

This paper gives out new feature for curvature estimation for input strokes. Similar features are used by other algorithms for curvature estimation this paper just gives a different approach to the same problem.

MergeCF : Eliminating false positives during corner finding by Merging Similar Segments

Comments

Summary

This paper actually discusses corner finding algorithm based on curvature and speed differences within a stroke. Once the corner are found from the algorithm tries to merge the smaller stroke segments with longer segments and if the fit for the segment is below a threshold that corner is removed between the two segments.

The segments after merging produces a low primitive error is used for merging and no merging is done if the error when merging the two segments is much higher that the sum of the original errors of the segments.

MergeCF has a high accuracy rate when compare to Sezgin and Kim.

Discussion

This algorithm is extension to corner finding algorithms from Sezgin and ShortStraw for arcs which rely on curvature and speed differences and then uses the top down approach for eliminating the false positives.

Thursday, September 18, 2008

PaleoSketch: Accurate Primitive Sketch Recognition and Beautification

Comments

Summary

In this paper the author discusses the techniques which aid in the sketch recognition and beautification of the sketch without hampering the user ability to draw freely and naturally. It adds no constraint on the user drawing which could help in the recognition process. In this paper the author tries to recognize some primitive set of strokes.

Line
Polyline
Circle
Ellipse
Arc
Curve
Spiral
Helix

The system has a structure which first takes the stroke into pre-recognition routine. In the pre-recognition routine a series of graphs and values are computed. Graphs calculated are speed graph, direction graph and curvature graphs. Then the corners are calculated for the stroke. In addition to these graphs some other features are also calculated. Normalized distance between direction extremes (NDDE) and direction change ratio (DCR) is calculated. Polylines will have lower NDDE values and higher DCR values and vice versa for curves.
Then a series of tests are performed for each shape and the author in detail explains the conditions which need to be satisfied for the recognition in his paper. One thing to note here is that the author successfully recognizes shapes which are not recognized by most recognizers such as Sezgin’s recognizer. These shapes include Arc, Spiral and Helix.
If all the test of the shapes fails then the input shape is termed as complex fit. The author here defines a novel hierarchy which helps in distinguishing the shape in complex interpretation or polyline interpretation. Each primitive shape has defined weight which is calculated based on the number of corners of the primitive shapes. The cumulative weights are calculated for both types of interpretations. The interpretations with the lowest weight are taken as the interpretation of the stroke (complex wins tie).

Results:
The author analyzed a dataset of 900 shapes with three version of his own recognizers and the Sezgin’s recognizer. The Paleo (proposed recognizer), Paeo-F (Paleo without NDDE DCR features), Paleo-R (Paleo without ranking algorithm) and SSD (Sezgin’s algorithm) were used. The results with Paleo were very good and achieved an accuracy of 99.89% for correct interpretation and 98.56% for top interpretation.

Discussion

The techniques discussed in this paper are a very in-depth analysis of the shapes, which accounts for the brilliant accuracy achieved by this recognizer. It does a great job by extending the work of the Sezgin and very effectively utilizes his techniques to introduce his own novel features.
I also particularly liked the ability of this low-level recognizer to be integrated into the high-level recognition system, LADDER.

Sketch Based Interfaces: Early Processing for Sketch Understanding

Comments

Daniel's blog

Summary
This paper describes about the algorithm which accroding to the author is a directed study for creating the pen input devices to be more usable in terms of the end user’s ability to interact with the system like he/she would do on a paper. This paper tries to define methods which can allow the user interaction more intuitive combined with the power of computing.
The author’s approach for the early processing of sketch is based on three phases approximation, beautification, and basic recognition.

Stroke Approximation:
It is to approximate the stoke with a more compact and abstract description, while both minimizing error and avoiding overfitting. The first step in stroke approximation is Vertex Detection. Vertex detection explains the methods used to find corners in a stroke. First a direction graph of the stroke is generated. From the direction curvature graph can be determined. The peaks in the curvature graph can pointed out as the vertexes or the corners of the stroke. The curvature graph has limitation is that it cannot properly identify the strokes which has a small curvature value so that it falls below the mean. To identify such corners the authors also presents the idea of speed graph. The speed graph algorithm works on the assumption that the user tends to slow down when it is drawing a corner. Using the speed graph alone has it’s own limitation. Poly lines formed from a short and long vertexes can be problematic. In such cases two corners can be regarded as on corner.
The author next presents the idea of Hybrid fit which uses both the above mentioned techniques to identify the best set of vertices.
The next problem in stroke approximation is the related to handling curves. The above mentioned techinques are good for polygons. The author approximate the curve regions with Bezier curves which are defined by two endpoints and two control points.

Beautification:
It refers to adjusting the sketch output to make it look as it was intended by the user. Here the author adjusts the slopes of line segments in order to ensure that were apparently meant to have the same slope end up being parallel.
Basic Object Recognition:
In this final step the author tries to recognize the basic objects that were built from line segments and curve segments. These simple geometric objects include ovals, circles, rectangles and squares.

Evaluation:
Overall the users of the system were very happy when they drew sketches on it. They could interact with the system very naturally. The recognition of the system for vertices and approximation of shapes with lines and curves was correct 96% of time.

Discussion
The presents some very good ideas in corner detection particularly the hybrid approach. But what I think that the paper lacks is the proper description of the ideas beautification and basic object recognition. After reading the paper it seems quite difficult to implement these with only the explanation in this paper.

Algorithms for the Reduction of the Number of Points Required to Represent a Digitized Line or Its Caricature

Comments

Summary

This paper proposes an algorithm for the poly line simplification. This can be useful in situations where your graphical application must draw many poly lines and time becomes an issue, like cartographic applications. Although this is a line simplification algorithm but it can be usefull in finding the corner in a particular sketch.

The algorithm works by first finding the two end points of sketch, which is the starting and and ending point of the sketch. Then it tries to find the point which is most distant in terms of the euclidean distance from the two points. If the euclidean distance is above the threshold value the algorithm assumes that this is not a line and there are possible corners on the line. The farthest point away becomes the corner and also the ending point for the starting points and starting point for the ending point.

The process is repeated recursively until no points can be found above the threshold. The algorithm works well and finds corners with good accuracy on polygons. While working with curves the algorithm sometimes calculate two points for a single point.

Discussion

The algorithm is the well defined and seems to work well for polygons. It's a very basic algorithm and several improvements can be made on it to produce better results with curves and arcs.

Monday, September 15, 2008

ShortStraw: A Simple and Effective Corner Finder for Polylines

Comments

Daniel's blog

Summary

In this paper the author discusses a simple corner finding algorithm which he compares to more complex corner finding algorithms given by Sezgin and Kim Kim. The basic theme defined for this algorithm is that it can be implemented easily and still be very accurate when compared to it's counterparts. When implementing this algorithm the developer only has to have a very basic knowledge of higher mathematical function defined in calculus.

The algorithm works in three stages

Resampling of input points of a stroke.
Calculate 'straws' which is the distance between the endpoints of window.
Points with minimum 'straw' distances as the corners of the stroke.

In resampling the point from the original stroke a resampled in such a way that the resulting stroke has equidistant points. Its based on the Wobbrock resampling algorithm.

In corner finding the author takes two approaches. First is the bottom-up approach in which the corner are indentified by a defined algoritym. Secondly the author takes a top-down approach in which the corner which are not identified by the first algorithm are identified and the corners which are misrecognized earlier are removed.
The bottom-up approach of corner finding algorithm works by first calculating straw values at each point. A straw value is the euclidean distance between the two points pi-w and pi+w where w is the window size. The algorithm works on the fact that these straw values will become samller when points will get closer to a corner. The corners are identified by first taking the median of all the straw values. Then multiplying the median with a constant value to calculate the threshold t. The local minima below the threshold t is taken as corner.
The top-down approach of the corner finding algorithm works to find the missed corners and remove false positives. This is done by calculating the ratio between the euclidean distance to the path distance between the two points taken as corners. If the ratio is below a defined threshold value then the two points are considered as lines and there are no corners between the two points. If the value is larger than the threshold then there could be more corners between the tow points. The threshold value is relaxed and the minimum straw value which is in the middle half of the stroke is taken as a corner. A collinear check is then run on the subsets of triplet, consecutive corners and the middle point is removed if the three points are collinear.

It is interesting to note that the ShortStraw with it's simplicity is able to produce better results than Sezgin and Kim Kim with this logic. It also doesnot uses any temporal information which as Sezgin and Kim Kim so it can also be used for sketches taken from offline sources.

Discussion
I like the idea of the ShortStraw which uses a very intuitive algorithm for corner finding. The good thing about ShortStraw is that uses no temporal information which more close to how the human perceives corners from sketches.

The algorithm uses a lot of threshold values and constants which tells that ShortStraw can only be fine tuned for a perticular set of strokes and not a blanket solution for corner finding. I think this is where the contextual information about the stroke can make a differenece in the accuracy and usability of this algorithm for a wide range of problems.

Prototype Pruning by Feature Extraction for Handwritten Mathematical Symbol Recognition

Comments

Yuxiang's blog

Summary

In this paper the author discusses the problem of recognizing math symbols. The problem is tough because there are around 1000 to 2000 mathematics symbols today. Mathematics writing is a blend of drawing and writing. In this paper the author defines some features related to mathematics symbols, gives algorithms to extract those and use these features to recognize the symbols.
In preprocessing the collected data the author describes some techniques he used. Techniques used are chopping head & tail, re-sampling, smoothing and size normalization. Author also identifies some features which he categorized in these broad categories.

Geometric features
Ink related features
Directional features
Global features

The recognition method he used was elastic matching, which is to determine the minimum distance between the unknown symbols and a set of models.

Discussion

This paper discusses some new set of features which might feel more relevant to the mathematics symbols but are also good for other set of sketches

Wednesday, September 10, 2008

Graphical Input through Machine Recognition of Sketches

Comments

Manoj's blog

Summary

In this paper the author first tries to answer the question “Could a machine makes useful interpretations of a sketch without employing knowledge of the subject domain?” He tries to answer this question by means of a system called “HUNCH”. HUNCH is a set of FORTRAN programs and has several components. One part of the program is called STRAIT which found corners in a sketch as function of speed. Curves are considered as a special case of corners. When the curvature of the corner is too gradual or the curve is drawn too carefully the output of the straightening program would go through the curve-fitting program. CURVIT would make one or more passes over the raw data at places pointed out by STRAIT. It was seen that when the output of the STRAIT and CURVIT was shown to the participants; the interpretations made by these programs were not always what the participants expected.

Latching is the idea of joining two or more lines if the sketcher was not able to join them. This also suffered from problems if the domain knowledge was not available. Like in 3D shapes and pictures with varying scales.

Overtracing is idea of replacing several close line with one line by inferring what the user had intended to draw. This also suffered from problems same as of latching for 3D pictures.

The question posed earlier by the author “That is there a syntax of sketching independent of the semantics?” - Is still unresolved. It’s evident from the above scenarios that sketch recognition involves drawing of a sketch and also the context explaining the domain the sketch belongs.

Towards an interactive system: The author here explains an interactive system of sketching where the user can draw in a unobtrusive manner. The system maintains the user inputs and its interpretation in the form of a database. The HUNCH components than can use the database. The HUNCH has three kinds of components. 1) Inference programs: which are improved versions STRAIN, LATCH, OVERTRACE and GUESS. 2) Display programs: which allow displaying any levels of database. 3) Manipulation programs: which allow the user to modify the database directly. In order for the system to be interactive STRAIN works in real time and finds lines and curves on the fly.

In conclusion the author says that the sketch recognition problem has come in a full circle from a insistence on machine recognition with no demands on the user, through knowledge-based systems, and back to more modest interactive approach.

Discussion

The author brings up the idea and complexities involved in sketch recognition and beautification techniques. The author is very right to say that sketch recognition is not possible without the knowledge of context in which the sketch is drawn.

I think when in the end the author goes back to the interactive approach is basically his disappointment of not able to achieve the desired results or some solution other than knowledge based system. I think sketch recognition with the knowledge based systems is a perfect model to deal with this problem.

User Sketches: A Quick, Inexpensive, and Effective way to Elicit More Reflective User Feedback

Comments

Daniel's blog

Summary

In this paper the author gives a new idea about the prototype design in comparison to other more commonly used methods for usability testing. The author here focuses on the making the right design instead of making the design right. In usability testing (UT) the participants usually generate more reactive comments and criticisms that reflective design suggestions or redesign proposals. The author here introduces a new technique called the user sketching.
The reflective methodology of the user sketching gives the idea of users sketching the design of the system after they are given ideas on the possible designs of the system. Here the author conducts the experiment by taking four groups of 12 people each group is shown with different prototypes of a house climate control system. There were three types of prototypes. 1) Circular prototype. 2) Tabular prototype. 3) Linear prototype. The last group is shown all three prototypes. When verbally asked for user feedback on the design the participants gave more comments than suggestions for all prototypes. The participants were then asked to draw the design of in their view would be the ideal interface of the system. A ‘Quick and Dirty’ analysis showed that the user designs were stereotyped to the designs they were showed earlier, but the important thing was that some users even came up with the new ideas which were not part of the prototypes they were shown earlier.

The author here classifies his subjects into three categories. 1) The Quiet Sketcher: The participant who highly rated the prototype and when asked about the change suggestions he said ‘No’ immediately. When asked to sketch to draw, he drew a design which included totally new features. 2) The Passive Sketcher: She also highly rated the prototype but when asked for changes couldn’t figure out what she would change. When asked to draw she discovered a totally new solution for representing intervals in the system. 3) The Overly Excited Sketcher: She was really excited to contribute to the study but had confused and mixed suggestions for the system. When asked to draw she drew a totally different interface which even changed the shape of the device.

Here the author illustrated the benefits of engaging users in a sketching activity as an extension to the conventional usability testing. The act of sketching proved to facilitate reflection and discovery better than the other methods used.

Discussion

The idea of the author here is good and does work out well because the participants are able convey to the designer the actual interface they are looking for. I think this method can work out well for the devices and systems which are existent today. The design features the participants gave in these experiments were not novel; instead they were taken from some other devices and artifacts around them.

Wednesday, September 3, 2008

Gestures without Libraries, Tookits or Training

Comments

Manoj's blog

Summary

This paper discusses another gesture recognition algorithm which according to the author is simple to implement. The author focused on these 3 points when designing this algorithm which he name as $1. 1) To present easy to implement algorithm 2) To compare $1 with other advanced algorithms to show that $1 is as good as them for certain shapes. 3) To give insight which user interface gesture are best in terms of human and recognizer performance.

A Simple Four-Step Algorithm:

Step1: Resample the point path Here the author explains that two same sketches drawn at different will have different number of input points on it. Sketch drawn slower will have more input points. The that goal of this step is to resample the gestures such that the path defined by their original points M is define by N equidistantly space points. The value of N=64 was found to be very reasonable for implementation.
Step2: Rotate once based on indication angle The goal of this step is to rotate both the sketches so they are best aligned for future steps. Here the author gives the concept of indicative angle which is the angle formed between the centroid of the gesture and gesture's firs point. Both gestures are rotated such that the indicative angle of both the gestures are 0.
Step3: Scale and translate here the gesture is first scaled to a reference square. This scaling is non-uniform. Then the gesture is then translated to a reference point. For simplicity the author keeps the reference point as origin.
Step5: Find the optimal angle for best score Here the author explains the computation of a value which can then be used for the recognition of the gesture. The candidate gesture is compared to each template gesture to find the average distance between the corresponding points. The lowest value of the path distance will lead to the recognition of the gesture.

Limitations

$1 cannot distinguish between gestures whose identities depend on specific orientations, aspect ratio, or locations.

$1 does not use time, so gestures cannot be differentiated on the basis of speed.

Evaluation results
$1 performed very well for user interface gestures with 99% accuracy overall. With one template is showed 97% accuracy. With 3 loaded templates it showed 99.5% accuracy. Rubine on the other hand was performed at 95% accuracy using 9 training examples of 16 gestures.

Discussion

I liked the way the authors projected their algorithm as '$1' which immediately tells its a fast easy to implement solution for gesture recognition. The algorithm is very interesting and the author very rightly states it's limitations.

I don't see any real world application of this algorithm. It's a good read for people like me who are looking towards getting started in this field and implement something.

MARQS

Comments

Yuxiang's blog

Summary

This paper discusses an algorithm that can identify multiple stroke sketches using a set of global features that are both domain and style independent. To give a real world example the author have created an application called MARQS. In this application the user can store photo and music albums, which can be retrieved by matching multi-stroke sketches which the user designated during album creation.
Recognition algorithm uses two different classifiers depending upon the the number of training examples available. Initially the user is asked to give only one example sketch and whenever the user performs a search the sketch he gave he gave for searching is also added to examples. The algorithm uses global features for the sketches since it puts no constraint on the users on how they would draw the sketches. Currently it uses four global features to describe a sketch. 1) Bounding box aspect ratio: the total width of the sketch divided by the total height of the sketch. 2) Pixel density: the ratio of the filled (black) pixels to total pixels within the sketch’s bounding box. 3) Average curvature: the sum of the curvature values of all the points in all strokes divided by the total sketch length (sum of the stroke lengths of all strokes in the sketch). 4) The number of perceived corners across all strokes of the sketch.
MARQS is real world application which utilizes the the recognition algorithm mentioned above. It’s a media storage and retrieval query sketch system. It allows users to create, edit, open, add and delete albums and pictures. It also allows user to search for an album through a sketch and gives the top 4 sketches that matched.
To gather the preliminary data 1350 different sketch queries were performed (15 sketches, 9 queries each, 10 tests). The system used the single classifier 27% of the time and linear classifier 73% of the time. 70% of the time the system produced top result and 98% in the top 4. 2% of the time the results were not in top 4.

Discussion
Here I liked the idea of the system becoming more accurate with every search performed. The idea of adding the query sketch to example space is good. But I am not sure that its a good enough algorithm for other real world application like drawing a circuit diagram.
In MARQS the application shows top 4 results and then the person chooses one from it which tells the system to associate that query sketch to a particular example class. By using MARQS the person is training the system without actually knowing that he is training the system.

I also think 70% accuracy for top result will not be very effective in real world applications. Nonetheless this system opens a new dimension of multi-stroke recognition

Monday, September 1, 2008

Visual Similarity of Pen Gestures

Comments

Daniel's blog

Summary

In this paper the author discusses the issues of designing good gestures so that they are easy to remember for the user of the system. The author is trying to develop a tool which will enable the UI designers to improve their gesture set so they are easy to remember and use.

The author investigates the gesture similarity and develops a computable, quantitative model of gesture similarity which will help in creating a gesture designer tool. The author conducts two experiments on the human beings to develop a computable, quantitative model of gesture similarity.

Perceptual similarity is the concept of how human beings perceive two shapes to be similar to each other. Psychologists have conducted investigations on the shapes which are simpler than gestures. Investigations conducted by Attneave found that the similarity of shape correlated with log of the area and tilt for parallelograms.

Multi-dimensional scaling is the technique for reducing the number of dimensions of the data set so that patterns can be more easily seen by viewing a plot of data, in two or three dimensions. Author here uses the MDS version called INDSCAL, that takes as input a proximity matrix of each participant and takes individual differences into account.

In experiment one the author makes a set of 14 gestures which are a widely dissimilar to each other. Each of the twenty participants are shown a set of 3 gestures on the screen and are asked to select the gesture which is mostly dissimilar to the the other two gestures. All possible combination are shown to the participants i.e. a total of 364 screens are shown. After the experiment the author had to analyze two important points from the data collected. 1) to determine what geometric properties of gesture influenced their perceived similarity. 2) to produce a model of gesture similarity, that given two gesture the system could predict the similarity that humans would perceive. The firs point was addressed through MDS plotting. The Euclidean inter-gesture distances corresponded to inter-gesture dissimilarities. The second point was addressed by running the regression analysis to determine which geometric features correlated with reported similarity. Some features were taken from Rubine’s algorithm and some inspired from MDS analysis. The author was able to derive a model which correlated 0.74 with the reported similarities.

In experiment 2 the the author wanted to explore how systematically varying different features would affect the perceived similarity. For this the author made 3 gesture sets of 9 gestures each. First set was to explore total absolute angle and aspect. Second was to explore length and area. Third was to explore rotation-related features. The author then took 2 from each set of gestures and made a fourth set of gestures. Again twenty people were shown the a set of 3 gestures and a total of 538 gesture sets were shown same as in experiment one. The trial was also examined using the techniques used in experiment one. Author was able to determine that length and area are not very significant contributors to similarity judgment. Another finding was that the perceived similarity among gesture is not proportional to the angle rotation of the gesture, instead gestures with horizontal and vertical lines are perceived more similar than those gestures whose components are diagonal.

Author concludes that human perception of similarity is very complicated and there are several cues involved in human perception which determines the similarity and dissimilarity of the gestures. However the authors model correlates 0.74 with the perceived similarity in experiment one which is a fairly good model.

Discussion

The author have conducted an extensive investigation to determine a model for perceived similarity of gestures. Even if we can determine that two gesture are similar to each we still cannot make a gesture set that is easier to remember. Remembrance of a gesture not only depends upon a gesture being dissimilar to another gesture so the user does not overlap the gestures in memory but also on the shapes and actions mapped to a gesture, the complexity of the gesture itself and also similar gestures for similar meanings will also contribute in the remembrance of the gestures.

Sunday, August 31, 2008

Specifying Gestures by Example

Comments

Andrew's blog

Summary

'Gesture' the term mentioned by the author is basically a stroke made by a stylus or pen on a computer screen.

In this paper the author starts with mentioning some of the earlier gesture-based recognition system developed by various authors and teams. The problem with all these systems is their gesture recognizer is 'hand coded'. This code is complicated and not easy to maintain. There is another recognition technology known as 'GRANDMA' which shows how the need of 'hand coding' can be eliminated by automatically creating gesture recognizers from example gestures.

Author discusses the design of GDP which is a gesture-based drawing program built using 'GRANDMA'. 'GRANDMA' utilizes a MVC like system where an input event handler may be associated with a view class. It allows the user to define a set of gestures and their examples. Examples here are important because they define the variance in the gesture. The user can then also define the 'Semantics' of the gesture using the interface of GRANDMA. On the semantics window there are three main functionality. 1) 'recog' the user can define an expression once a gesture is recognized. 2) 'manip' defines the expression evaluated at subsequent mouse points. 3) 'done' when mouse button is released.

Gesture recognition is done in two steps 1) statistical features are extracted from the gesture 2) extracted features are then classified into one of the class of gestures earlier defined. There are thirteen features which are chosen as the important and could be able to distinguish between to different gestures. These features are sine and cosine of the initial angle, the length and angle of the bounding box diagonal, distance between first and last point, sine and cosine of then angle between the first and last point, total length of gesture, total angle traversed, the sum of absolute value of angle at each point, the sum of squared values of those angles, the maximum speed of the gesture, and the duration of gesture. One set back of these feature extraction is that there are some cases where these features may be same for totally different gestures.

Gesture classification is simple. Every feature has an assigned weight and the sum is evaluated of the weight multiplied by the feature value. The resultant value can be used for determining in which class the input gesture falls into.

Author also discusses the training concept. Earlier the user was actually required to input a set of example gesture for every gesture class. For each of these examples the features are extracted. Then simply an average of these feature values a taken as the base values. Author then discusses the methods by which the system can reject ambiguous gestures and outliers.

The algorithm explained by the author is fairly simple in understanding and it produces good results when classifiers are trained using this algorithm. As the number of classes are increased the accuracy rate slightly falls and the number of examples per class needs to be increased to achieve the better results.

Discussion

The author presents an algorithm which addresses the issue with previous gesture recognition tools. Earlier tools were 'hand coded' so they complicated and hard to maintain. GRANDMA on the other hand can actually be trained to recognize any gesture. The algorithm it uses is conceptually similar to fingerprint recognition algorithms where features are first extracted from the fingerprint and then these features are classified to match the the fingerprint from the database.

There are two problems with this system.
1) Feature values can be similar for two different gestures.
2) Accuracy in matching falls when the number of classes are increased. (Author gives a graph of accuracy against number of examples for up to 30 classes. But I had be more interested in the graph of accuracy against number of classes for up to say 100 classes).

Thursday, August 28, 2008

Introduction to Sketch Recognition

Comments

Akshay's blog

Summary

This paper discusses the history in the field of sketch recognition. One of the early breakthroughs in this field was the birth of pen-based computer interaction in 1963 by Ivan Sutherland at the MIT. Sketchpad was a very basic device which allowed user to draw diagrams on a computer screen with the help of a pen. This device was based on vector graphics which was later substituted with raster graphics because of many digital benefits.

With the advent of tablet PCs the pen-based interaction with the computer was taken to next level. There are various manufacturers of the tablet PCs. Tablet systems are of many types. They are in a slate form where only interaction is through the screen of the slate. The other type is convertible type which is basically a notebook computer whose screen can twist and be placed over the keyboard to actually look like a slate. Then there are some USB connected pen tablets which can be plugged into the system as a pen-based input device.

The input technology in the table PCs are of two types. A passive digitizer which uses only touch and active digitizers which detects electromagnetic signals from a special. Active digitizers are better in terms of precision. They also gives the functionality of 'hover' just like in mouse where the pointer can hover over the objects without actually without actually triggering a click event.

Some software features are also unique to tablet PCs. For example Microsoft support for Tablet PCs allows user to write with a pen on a computer screen and the software can recognize the user input which is translated to text. It also gives an on screen keyboard for the users which are familiar with QWERTY keyboard. Linux can also be installed to tablet PCs and there are some open source softwares available such as 'Jarnal' to take notes and sketch. There is another specialized software such as 'camatasia' which can record user activity on the screen.

Tablet PCs main application is in the field of presentation of and teaching. Tablet PCs can provide a more intuitive interface when preparing PowerPoint slides for the lectures. They can also be used for delivering lecture material 'on the fly' or interaction with static materials prepared in advance. Then editing on the tablet PC is fairly simple and easy as compared to paper.

There can also be some disadvantages of Tablet based presentations. The presenter may not be accustomed to the tools available on the tablets such as highlighters or he/she may not be comfortable with a tablet which can reduce the effectiveness of presentation.

The application of a tablet PC joined with the specialized software of sketch recognition can be tremendous. There are already some fields where these techniques are being used such as music composition where a user can draw the nodes and the system can play the music for the composer. In the field of chemistry where 'ChemPad' can be used to draw molecular structures. This is also being used in drawing mechanical diagrams, finite state machines, UML diagrams, and military course of action symbols.

For sketch recognition the author has provided a FLUID framework which can be used to build a sketch recognition system for any field. In this framework the instructor can draw a random diagram and then write a 'Domain specific information' in the LADDER (Domain description language). The GUILD is fed with these two information which can then generate a sketch recognition system which will have the ability to recognize data in a particular domain.

In the end the paper discusses two case studies. One is of high school teacher who uses table PCs and a lot of sketching softwares to present his lectures. The other teacher uses a tablet PC for presentation but utilizes a polling devices for students to interact. Both the teachers preferred tablet PCs because their students where more attentive in class and excited about the course.

Discussion

Ah!.. That was a difficult read, I think some pages were missing there.

This paper basically gives a background history of sketch recognition and then summarizes some latest milestones achieved.

The good thing about the FLUID framework is its 'generic nature'. The ability to describe a domain for any sketch recognition system gives it a lot of power. But will its generic nature become it's shortcoming when someday specialized frameworks will come into picture?.

Wednesday, August 27, 2008

Sketchpad

Comments

Akshay's blog

Summary

Sketchpad basically is one major breakthrough in the field of sketch recognition. Ivan E. Sutherland consultant Lincoln lab of the MIT is the creator of sketchpad. A sketchpad in simple terms is a device which enable human beings to draw digital diagrams using a light pen. The resulting diagram can be easily manipulated on the computer screen.

Considering sketchpad was built in 1963 with the limited resources of the time a simple task involved a lot of complexities. Sketchpad on its conception only supported a set of predefined shapes for example lines, circles and points to be made on the screen. But it was designed in such a way which could cater more shapes such as ellipse etc. Sketch pad also provided other functionality of joining points, adding constraints, copying, merging, deletion, rotation and magnification of the diagrams. Its usefulness at that time was expected to be in the field of the topological input devices and highly repetitive diagrams.

Sketchpad also have the capability of displaying 'text' and 'numbers'. In sketchpad the letters and the numbers are more or less the combination of curves and lines. Sketchpad utilizes some recursive functions to be able to manipulate the diagrams. 1) Expansion of instances: which means its possible to have sub pictures within sub pictures 2) Recursive deletion: which is deleting an object also means the deletion of all depended objects. 3) Recursive merging: which means merging of two independent objects will result in all the dependent objects to be now dependent upon the result of the merger. Sketchpad also use 'recursive display' to display diagrams on the screen. For every instance to be drawn it breaks the picture into the smaller part which was earlier drawn and so on until the it could not broken down into any smaller instance.

Sketchpad can also define 'attachers' since small diagrams can be used to make a larger diagram and these smaller parts need to be connected with other parts so the user must define the attachers in smaller parts so these pieces can be joined. Sketchpad apart from the light pen also uses a set of buttons which actually tells the sketchpad when a copy operation is to be performed or when a delete operation is to be performed and so forth. What makes sketchpad different from the paper and pencil concept is that in sketchpad the user can define the design constraints. For example user can define a constraint that two particular lines will be parallel. If the user had not drawn the two lines parallel to each other the computer will adjust the diagram so the two lines become parallel. There are a set of constraints which are compiled in the form of a manual for the user. This makes it easier for the user to draw as he/she wishes to draw by utilizing these constraints.

Discussion

Although it has been four decades since the sketchpad was developed but still it proposes some very futuristic concepts. Sketchpad was developed as the first pen based input device on computers but its use and progress was marred by the concurrent invention of a computer mouse. Since the computer mouse was a cheaper built and pen-based device was very expensive sketchpad didn't get much attention of the researchers.

The fault that I see in this system but that can be justified is that the system rely a lot on user input. To posses the system was actually very expensive at that time and it seems that it was equally difficult to use the it also.

If I had invented this sketchpad I would have definitely gone onto the next level to make this device more user friendly and accessible to common people for use in activities such as teaching, planning and simulation.