Andrew's blog
Summary
A 'gesture', as the author uses the term, is basically a stroke made by a stylus or pen on a computer screen.
In this paper the author starts by mentioning some of the earlier gesture-based recognition systems developed by various authors and teams. The problem with all of these systems is that their gesture recognizers are 'hand coded': the code is complicated and not easy to maintain. The author's toolkit, called 'GRANDMA', shows how the need for hand coding can be eliminated by automatically creating gesture recognizers from example gestures.
The author discusses the design of GDP, a gesture-based drawing program built using GRANDMA. GRANDMA uses an MVC-like system in which an input event handler may be associated with a view class. It allows the user to define a set of gestures along with examples of each; the examples are important because they capture the variance in the gesture. The user can then define the 'semantics' of a gesture using GRANDMA's interface. The semantics window has three main hooks: 1) 'recog', an expression the user defines that is evaluated once a gesture is recognized; 2) 'manip', an expression evaluated at each subsequent mouse point; and 3) 'done', an expression evaluated when the mouse button is released.
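The three hooks can be pictured as callbacks attached to a gesture class. Here is a minimal Python sketch (the class and parameter names are hypothetical; GRANDMA's actual semantics are expressions entered in its interface, not Python code):

```python
# Hypothetical sketch of the three semantic hooks GRANDMA associates
# with a gesture class; the names mirror the paper's recog/manip/done.
class GestureSemantics:
    def __init__(self, recog, manip, done):
        self.recog = recog  # run once, when the gesture is recognized
        self.manip = manip  # run at each subsequent mouse point
        self.done = done    # run when the mouse button is released

# Example: a "line" gesture that creates a line, rubber-bands its
# endpoint while the mouse moves, and commits it on release.
events = []
line = GestureSemantics(
    recog=lambda start, end: events.append(("recog", start, end)),
    manip=lambda point: events.append(("manip", point)),
    done=lambda: events.append(("done",)),
)
line.recog((0, 0), (5, 5))
line.manip((6, 5))
line.done()
```

The point of the split is that recognition happens once, mid-stroke, while manipulation continues until the button is released.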
Gesture recognition is done in two steps: 1) statistical features are extracted from the gesture, and 2) the extracted features are classified into one of the previously defined gesture classes. Thirteen features were chosen as important and able to distinguish between two different gestures: the sine and cosine of the initial angle; the length and angle of the bounding-box diagonal; the distance between the first and last point; the sine and cosine of the angle between the first and last point; the total length of the gesture; the total angle traversed; the sum of the absolute values of the angle at each point; the sum of the squared values of those angles; the maximum speed of the gesture; and the duration of the gesture. One drawback of this feature set is that in some cases the features may be the same for totally different gestures.
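The feature extraction step can be sketched roughly as follows. This is my own reconstruction from the list above, not the paper's code; I assume the gesture arrives as a list of (x, y, t) tuples with strictly increasing timestamps, and I compute the maximum speed in squared form to avoid a square root:

```python
import math

def rubine_features(points):
    """Compute the thirteen Rubine-style features from a gesture given
    as a list of (x, y, t) tuples. Sketch only; assumes at least three
    points and strictly increasing timestamps."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]

    # f1, f2: cosine and sine of the initial angle (measured over the
    # first three points for stability)
    dx0, dy0 = xs[2] - xs[0], ys[2] - ys[0]
    h = math.hypot(dx0, dy0) or 1.0
    f1, f2 = dx0 / h, dy0 / h

    # f3, f4: length and angle of the bounding-box diagonal
    w, hgt = max(xs) - min(xs), max(ys) - min(ys)
    f3 = math.hypot(w, hgt)
    f4 = math.atan2(hgt, w)

    # f5: distance between first and last point
    dxe, dye = xs[-1] - xs[0], ys[-1] - ys[0]
    f5 = math.hypot(dxe, dye)

    # f6, f7: cosine and sine of the angle between first and last point
    f6 = dxe / f5 if f5 else 1.0
    f7 = dye / f5 if f5 else 0.0

    # f8: total length; f9: total signed angle traversed;
    # f10: sum of absolute angles; f11: sum of squared angles
    f8 = f9 = f10 = f11 = 0.0
    for i in range(1, len(points)):
        dx, dy = xs[i] - xs[i - 1], ys[i] - ys[i - 1]
        f8 += math.hypot(dx, dy)
        if i >= 2:
            pdx, pdy = xs[i - 1] - xs[i - 2], ys[i - 1] - ys[i - 2]
            theta = math.atan2(pdx * dy - pdy * dx, pdx * dx + pdy * dy)
            f9 += theta
            f10 += abs(theta)
            f11 += theta * theta

    # f12: maximum (squared) speed; f13: total duration
    f12 = max(
        ((xs[i] - xs[i - 1]) ** 2 + (ys[i] - ys[i - 1]) ** 2)
        / (points[i][2] - points[i - 1][2]) ** 2
        for i in range(1, len(points))
    )
    f13 = points[-1][2] - points[0][2]
    return [f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13]
```

Note how every feature is a single number computed from the whole stroke, which is what makes the later classification step a simple weighted sum.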
Gesture classification is simple. Every class assigns a weight to each feature, and the classifier evaluates the sum of each weight multiplied by the corresponding feature value. The class with the largest resultant value is the one the input gesture falls into.
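In code, the classification step is just an argmax over per-class linear scores. The data layout here is my own (a dict mapping each class name to a bias term followed by one weight per feature); the paper's method for deriving these weights from training data is not shown:

```python
def classify(features, weights):
    """Return the class whose linear discriminant scores highest.
    `weights` maps a class name to [w0, w1, ..., wn]: a bias term
    followed by one weight per feature."""
    best_class, best_score = None, float("-inf")
    for name, w in weights.items():
        score = w[0] + sum(wi * fi for wi, fi in zip(w[1:], features))
        if score > best_score:
            best_class, best_score = name, score
    return best_class

# Toy example with two classes and two features: class "a" rewards
# the first feature, class "b" penalizes it.
weights = {"a": [0.0, 1.0, 0.0], "b": [0.0, -1.0, 0.0]}
result = classify([2.0, 5.0], weights)  # "a" scores 2, "b" scores -2
```

Because each score is linear in the features, classification is cheap even with many classes, which matters for an interactive drawing program.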
The author also discusses training. The user is required to input a set of example gestures for every gesture class. For each of these examples the features are extracted, and the average of the feature values is taken as the base value for the class. The author then discusses methods by which the system can reject ambiguous gestures and outliers.
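The averaging step described above can be sketched directly. This is only the first stage of training as the summary describes it; the paper goes further and turns these per-class averages into classifier weights, which is omitted here:

```python
def mean_features(examples):
    """Average the feature vectors of one class's training examples.
    `examples` is a non-empty list of equal-length feature lists."""
    n = len(examples)
    dim = len(examples[0])
    return [sum(ex[i] for ex in examples) / n for i in range(dim)]

# Two toy training examples with two features each.
base = mean_features([[1.0, 2.0], [3.0, 4.0]])  # → [2.0, 3.0]
```

Averaging is why the number and variety of examples matter: the mean only represents the class well if the examples cover the gesture's natural variance.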
The algorithm the author explains is fairly simple to understand, and classifiers trained with it produce good results. As the number of classes increases, the accuracy rate falls slightly, and the number of examples per class needs to be increased to achieve better results.
Discussion
The author presents an algorithm that addresses the issues with previous gesture recognition tools. Earlier tools were 'hand coded', so they were complicated and hard to maintain. GRANDMA, on the other hand, can actually be trained to recognize any gesture. The algorithm it uses is conceptually similar to fingerprint recognition algorithms, where features are first extracted from the fingerprint and then classified to match a fingerprint in the database.
There are two problems with this system.
1) Feature values can be similar for two different gestures.
2) Matching accuracy falls when the number of classes is increased. (The author gives a graph of accuracy against the number of examples for up to 30 classes, but I would be more interested in a graph of accuracy against the number of classes, for up to, say, 100 classes.)