Automatic Schaeffer's Gestures Recognition System

Francisco Gomez-Donoso, Miguel Cazorla, Alberto Garcia-Garcia, Jose Garcia-Rodriguez
Computer Science Research Institute, University of Alicante
P.O. Box 99, E-03080 Alicante, Spain

April 10, 2016

This is a previous version of the article published in Expert Systems, 2016, 33(5): 480-488. doi:10.1111/exsy.12160 (http://dx.doi.org/10.1111/exsy.12160)

Abstract

Schaeffer's sign language consists of a reduced set of gestures designed to help children with autism or cognitive learning disabilities to develop adequate communication skills. Our automatic recognition system for Schaeffer's gesture language uses the information provided by an RGB-D camera to capture body motion and recognize gestures using Dynamic Time Warping combined with k-Nearest Neighbors methods. The learning process is reinforced by the interaction with the proposed system, which accelerates learning itself and thus helps both children and educators. To demonstrate the validity of the system, a set of qualitative experiments with children were carried out. As a result, a system which is able to recognize a subset of 11 gestures of Schaeffer's sign language online was achieved.

Keywords: Schaeffer's gestures, 3D gesture recognition, Human-Machine Interaction, RGB-D sensors

1 Introduction

Disabled people are a group that requires special attention from governments, and people with cognitive disabilities (learning difficulties, cerebral palsy, etc.) are a special group within that one. Caregivers and educators need a way to communicate with them. In the case of children with autism, aided communication systems, from low tech (pictures) to high tech (speech devices), have been used. However, gestural communication or sign languages are still recommended as well to provide autistic children with augmentative or alternative means of communication [29].

Several studies have been presented in fields like psychology, neuroscience or linguistics, demonstrating that the combination of speech with gestures forms an integrated and useful system during language production and comprehension [27]. In particular, several works have studied how that combination can improve children's comprehension in many teaching contexts [28]. For that purpose, a special gesture set was developed by Schaeffer et al. [18]. To the best of our knowledge, there exists no system that is able to recognize these gestures.

Gesture languages based on hand poses (static gestures) or movement patterns (dynamic gestures) have been used for implementing command and control interfaces [1–4]. Gestures, which involve spontaneous hand and arm movements that complement speech, have been proven to be a very effective tool for multimodal user interfaces [5–9]. For instance, object manipulation interfaces [10–12] use the hand for navigation, selection and manipulation tasks in virtual environments.

Several applications, such as heavy machinery or manipulator control, handling computer-generated avatars, or musical interaction [13], use the hand as an efficient control device with a high number of Degrees of Freedom (DoF). In addition, some applications, such as surgical immersive virtual reality simulations [14] and training systems (VGX, nd), include the manipulation of complex objects in their own definition.
Almost all Human-Computer Interaction (HCI) systems based on gestures use hand movements as their main input. Currently, the most effective hand motion capture tools are electromechanical or magnetic detection devices (data gloves) [15, 16]. These devices are placed on the hand to measure the location and angles of the finger joints. They offer the most comprehensive set of real-time measurements, they are application-independent, and they allow full functionality of the hand in HCI systems. However, they have several disadvantages in terms of use and cost. In addition, they also hinder the movement of the hand and require complex calibration and installation procedures to obtain accurate measurements.

Computer vision represents a promising alternative to data gloves because of its potential to provide a more natural interaction without intrusive devices. However, there are still several challenges to overcome for it to become more widely used, namely precision, processing speed, and generality. Among the different parts of the body, the hand is the most effective tool for general purpose interaction due to its communicative and manipulative functionalities. Some trends in interaction tend to adopt both modalities, thus allowing an intuitive and natural interaction.

This work falls within HCI, the branch of computer science that studies and develops new paths in the communication process between humans and machines. HCI is a multidisciplinary field that includes topics such as artificial intelligence, design, social sciences, and natural language understanding. In particular, we present a gesture recognition system that is specially designed for Schaeffer's gestures. The system, apart from gesture recognition, is able to show the user examples of the correct way to make the gestures, combined with speech reproduction of the described concept. Once the user imitates the gesture, the system provides the most similar one in the dictionary.

This document is structured as follows. In Section 2 we briefly introduce Schaeffer's gestures, what they are used for, and how they can help cognitively disabled people. Section 3 describes the general system architecture and explains the modules of the system: the input data, preprocessing and classification. Next, in Section 4 we present some experiments that were carried out using the gesture recognition system with different parameters over a set of recorded gestures in order to test its performance and its impact on children's learning. Finally, we draw some conclusions and outline future work in Section 5.

2 Schaeffer's gestures

Research on hand gestures falls within the behavioural analysis of interactions. Behavioural studies often involve observational methods, i.e., the adoption and/or adaptation of reliable coding systems shared by the scientific community [30]. There are many research studies that propose different taxonomies of gesture languages [30–34]. However, we are interested in a special group of patients: children with autism or cognitive learning disabilities. These patients need a special way to learn to talk.

In 1980, Schaeffer, Musil and Kollinzas published a book entitled "Total Communication: A signed speech program for non-verbal children" [18] in which they laid the foundations for interaction among people who are not able to speak. That work describes a complete sign language so that these people can interact with others more effectively.
The signed speech program is an example of a system of signs, as classified by Kiernan [19], in which the therapist introduces the user to signed speech. It follows the structure of oral language, with some spoken words accompanied by signs. The real strength of this system is that its use is based on the child's overall development framework. The study of common development enables us to understand the communicative disorders that certain diseases cause. This signed speech program can be used without special authorization or training, and it can be modified to meet the personal needs of the people who might use it. Learning and using it does not obstruct, hinder or slow the onset of language; quite the opposite, it promotes and motivates language development. Both this Alternative Communication System (ACS) and other alternative systems can be more than augmentative speech enhancers, since they unlock this unique way of communication and allow others to be developed. The theoretical basis for the ACS appeared in the USA in 1980, and a revised edition was published in 1994.

The proposed system can recognize a subset of Schaeffer's gestures (see Table 1). We aim to recognize the complete Schaeffer's language in the future. The system has been designed as a tool which educators can use with autistic children who do not speak at all. The curiosity and interest of children in computers can accelerate the learning process, which begins with the children just repeating the gesture but, after some time, includes the pronunciation of the words. The system can help to promote self-correction.

There are three main components in a sign: the final movement, the position of the hand (relative to the body), and the shape of the hand. Firstly, it is necessary to use tactile help to show the children how to perform the gesture. Next, imitation aspects are very important and constitute one of the steps where the system could be helpful. Finally, verbal support includes full words, parts of words or even sentences. That part could also be improved with the system.

Table 1: Schaeffer's gestures.
Water, Help, Sandwich, Sleep, Shower, Clean, Mom, Dad, Want, Sick, Dirty

3 Gesture recognition

In this section we explain the full system by giving an overview and describing the main device: the Microsoft Kinect v2 (https://dev.windows.com/en-us/kinect). Next, we outline the gesture acquisition and characterization process and describe the classification methodology. Finally, we provide a description of the model database.

3.1 System Overview

Figure 1 shows a diagram of the system. When a person makes a gesture in front of the camera, the motion is captured by the Kinect v2. This information, summarized and packaged, is what the system understands as a gesture. The gesture object is sent to the Gesture Class Pre Selection (GCPS) module, which quickly selects a subset of possible classes for the gesture using naive rules, discarding the others to improve performance. Then, both the subset of candidate classes and the gesture itself are sent to the classifier. The classifier compares the unknown gesture with every gesture present in the model using Dynamic Time Warping (DTW) [21], and it uses the Nearest Neighbor (NN) algorithm [22] to select the one with the shortest distance. That class is returned as the tag for the unknown gesture. This result is then sent to the user. The whole process is executed online. There is also an offline stage that handles the training of the model, in which editing and condensing processes are applied in order to obtain a fitted and error-free model.

Figure 1: System architecture.
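As a rough illustration of this online flow (GCPS preselection followed by DTW comparison and nearest-neighbor selection), the following Python sketch outlines the logic. The function names and the model layout (gcps_candidate_classes, dtw_distance, a list of labeled examples) are hypothetical placeholders chosen for the example, not the authors' actual API.

    # Sketch of the online recognition flow of Section 3.1. All names are
    # illustrative placeholders, not the authors' implementation.

    def classify_gesture(gesture, model, gcps_candidate_classes, dtw_distance):
        """Return the label of the model example closest to `gesture` under DTW,
        restricted to the classes preselected by the GCPS rules."""
        candidates = gcps_candidate_classes(gesture)        # naive, fast rules (Section 3.4)
        best_label, best_distance = None, float("inf")
        for label, example in model:                        # recorded, labeled gestures
            if label not in candidates:
                continue                                    # class pruned by the GCPS
            distance = dtw_distance(gesture, example)       # DTW over the joint trajectories
            if distance < best_distance:                    # nearest neighbor selection
                best_label, best_distance = label, distance
        return best_label

The GCPS step only shrinks the loop over the model; the final decision is still taken by the DTW-based nearest neighbor.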
3.2 Acquiring gestures: Microsoft Kinect v2

Microsoft Kinect v2 is an RGB-D camera capable of capturing the color and depth of a scene separately. The color stream is obtained with a common high-definition RGB digital camera, while for the depth stream the device emits light beams that reflect on the surfaces of the scene and return to the sensor. By measuring the time difference between emission and reception, it calculates the depth of the element reflecting the beam. This process is known as Time of Flight (ToF). The device is capable of capturing color images with a resolution of 1920 × 1080 pixels, while the depth images are captured with a resolution of 512 × 424. The device is also able to capture audio and its direction of origin, segment elements such as bodies and other objects, and, most importantly for the task at hand, detect the joints of the skeleton of a person through its API, which enables us to detect gestures. Kinect v2 is able to capture up to 25 joints, although our recognition system only uses the 11 joints of the upper part of the body; the others are ignored.

3.3 Gesture Characterization

Once Kinect has captured these 11 joints of interest, two tasks are carried out before proceeding to the next module of the system: firstly, the points are grouped by joint type in order to facilitate the DTW comparison process in the classifier module, and then a downsampling process is run in order to speed up the system. Kinect captures information at 30 fps, which means 330 points (11 joints × 30 frames) per second for as long as a gesture lasts. Working with this amount of data implies a high computational cost, so the system reduces the information. The downsampling method used is k-means clustering [23] with 20 centroids.

In addition, it is necessary to make the representation independent of the angle at which the subject faces the camera and of their position in the scene. For this purpose, a change of the reference system of the captured points is performed. Firstly, the system obtains two vectors, one from the neck joint to the right shoulder joint, and another from the neck joint to the head joint. These represent the X and Y axes of the new reference system. The Z axis is obtained by computing the cross product of these two vectors. Then, the transformation matrix is calculated from the rotation and translation between the new reference system and the camera one. Finally, the transformation matrix is applied to all the points that compose the gesture. Some sample gestures and their associated point clouds are shown in Figure 2.

Figure 2: Clean and Sleep gestures with their associated joint point clouds.
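A minimal NumPy sketch of the reference-system change described above is given below, assuming the neck, right shoulder and head joints are available as 3D points in camera coordinates. It is only an illustration of the idea under our own assumptions: the sketch does not re-orthogonalize the axes (the shoulder and head directions are not exactly perpendicular in practice), and the function names are ours, not the authors'.

    import numpy as np

    def body_reference_transform(neck, right_shoulder, head):
        """Build a 4x4 matrix taking camera coordinates to a body-centered frame
        (Section 3.3): X = neck -> right shoulder, Y = neck -> head, Z = X x Y.
        Simplified sketch; the axes are not re-orthogonalized."""
        x = right_shoulder - neck
        y = head - neck
        x = x / np.linalg.norm(x)
        y = y / np.linalg.norm(y)
        z = np.cross(x, y)
        z = z / np.linalg.norm(z)
        rotation = np.vstack([x, y, z])        # rows of the rotation are the new axes
        transform = np.eye(4)
        transform[:3, :3] = rotation
        transform[:3, 3] = -rotation @ neck    # place the neck joint at the origin
        return transform

    def to_body_frame(points, transform):
        """Apply the 4x4 transform to an (N, 3) array of joint positions."""
        homogeneous = np.hstack([points, np.ones((len(points), 1))])
        return (transform @ homogeneous.T).T[:, :3]

After this step, a gesture is expressed in the same body-centered frame regardless of where the subject stands or how they face the camera.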
3.4 Gesture Class Pre Selection

The classification process compares the unknown gesture with all the gestures that compose the model, so it is possible to accelerate the process by comparing it only with a subset of them. The GCPS module implements a series of naive but fast-to-evaluate rules which examine gesture features, such as hand positions or point cloud centroids, and is able to discard some classes. For example, if the gesture for Want is performed with the hand continuously below the shoulders, all gestures performed above them are automatically discarded. In this way, some classes are taken as impossible and pruned, so when the comparison with the model occurs, it involves a reduced set of examples, which leads to an improved runtime.

The rules are evaluated in order and, if the gesture does not meet any of them, the classifier runs with the full model. See Table 2 for an example of how the rules are checked. The rules implemented by the GCPS are:

• Rule 1: If the performer keeps the hand over his head all the time and the variation along the Z axis is below a threshold of 0.1 meters, the gesture will be classified as "Shower".

• Rule 2: If the performer moves the hand over and between his shoulders all the time, and the variation along the Z axis is below a threshold of 0.05 meters, the gesture will be classified as "Dirty".

• Rule 3: If the performer keeps his hands closer than 0.22 meters to each other all the time, the selected classes will be "Help" or "Sleep".

• Rule 4: If the performer keeps his hand below and between his shoulders, and more than 0.2 meters away from his head, the selected classes will be "Help", "Want" or "Sandwich".

• Rule 5: If the performer moves his hand over his neck all the time, the selected classes will be "Sick", "Shower" or "Water".

• Rule 6: If the centroid of the points that describe the hand motion when performing a gesture is located to the right of the performer, the selected classes will be "Water", "Dad" or "Clean".

• Rule 7: If the average distance between the hand and the head is below a threshold of 0.28 meters, the selected classes will be "Mom", "Dad", "Sick" or "Water".

Table 2: The GCPS module evaluates the rules from first to last (the order is important). If at some point a rule evaluates to true, the classifier is executed with the examples bound to that rule. Example for the "Help" gesture:
Rule 1: Is the performer keeping the hand over his head all the time, with a variation along the Z axis below a threshold of 0.1 meters? No. Evaluate the next rule.
Rule 2: Is the performer moving the hand over and between his shoulders all the time, with a variation along the Z axis below a threshold of 0.05 meters? No. Evaluate the next rule.
Rule 3: Is the performer keeping his hands closer than 0.22 meters to each other all the time? Yes. Run the classifier with examples of the "Help" and "Sleep" classes.

3.5 Classifiers

In the classification process, the distance provided by DTW is calculated between the unknown gesture and every gesture that composes the model. The distance between two gestures is calculated by adding the partial distances arising from comparing the point collections of each joint; that is why the points are packed by joint in the gesture object. The NN algorithm is then executed and its class is returned as the label for the unknown gesture.

An early abandon technique is also applied as follows: after each comparison between the unknown gesture and a gesture of the model, a check is performed to determine whether the distance returned is the minimum distance so far, in which case it is stored. This distance is passed to the following comparison as a threshold value, and if at some point of the comparison between two gestures the accumulated partial distance exceeds that threshold, the comparison is abandoned. In this way, we improve the runtime of this module [24].
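To make the DTW-plus-early-abandon idea concrete, here is a simplified Python sketch. It assumes each gesture is stored as a dictionary mapping joint names to NumPy arrays of 3D points (the 20 centroids per joint of Section 3.3); both this data layout and the particular abandon criterion (abort a DTW computation once a whole row of the cost matrix exceeds the best distance found so far) are our assumptions in the spirit of [24], not the authors' exact implementation.

    import numpy as np

    def dtw_distance(seq_a, seq_b, abandon_threshold=np.inf):
        """Classic DTW between two (N, 3) point sequences with an early-abandon
        check: if every entry of the current row already exceeds the best
        distance found so far, this comparison cannot win and is aborted."""
        n, m = len(seq_a), len(seq_b)
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
                cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
            if cost[i, 1:].min() > abandon_threshold:
                return np.inf                    # early abandon
        return cost[n, m]

    def nearest_neighbor(gesture, model):
        """NN over the model; each joint's trajectory contributes a partial distance.
        `model` is an iterable of (label, dict joint -> (N, 3) array) pairs."""
        best_label, best_distance = None, np.inf
        for label, joints in model:
            total = sum(dtw_distance(gesture[j], joints[j], best_distance) for j in joints)
            if total < best_distance:
                best_label, best_distance = label, total
        return best_label

Passing the best distance so far as the abandon threshold is conservative: a comparison is only cut short when a single joint's DTW cost already rules the candidate out.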
3.6 Model

The model is composed of all the gestures that the system has learned. These gestures were obtained from multiple persons who were recorded as they performed every gesture several times. Then, the gestures were labeled and stored. Within the model there are gestures that were not properly performed, were mislabeled, or provide redundant or useless information. To eliminate these problems, two processes are performed. Firstly, the editing algorithm [25] is applied so that mislabeled gestures are discarded, and then the condensing (CNN) algorithm [26] is executed. The latter extracts only those examples that actually provide new information to form the final model. This whole training process is performed offline. The model used by the final recognition system is composed of 253 different gestures spread over 11 classes.
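The editing and condensing steps cited above correspond to Wilson's editing rule [25] and Hart's condensed nearest neighbor rule [26]. The sketch below shows generic versions of both over a list of (label, gesture) pairs and a gesture distance function (for example, a DTW-based distance as in the previous sketch). Details such as k = 3 for the editing vote and the choice of the first stored example are our assumptions; the paper does not state them.

    def wilson_editing(examples, distance, k=3):
        """Wilson editing [25] (sketch): drop examples whose k nearest neighbors
        within the set disagree with their own label."""
        kept = []
        for i, (label_i, gesture_i) in enumerate(examples):
            dists = [(distance(gesture_i, g), lab)
                     for j, (lab, g) in enumerate(examples) if j != i]
            neighbors = [lab for _, lab in sorted(dists, key=lambda t: t[0])[:k]]
            majority = max(set(neighbors), key=neighbors.count)
            if majority == label_i:
                kept.append((label_i, gesture_i))
        return kept

    def condensing(examples, distance):
        """Hart's condensed nearest neighbor rule [26] (sketch): keep only the
        examples needed for 1-NN to classify the rest of the set correctly."""
        store = [0]                              # indices into `examples`
        changed = True
        while changed:
            changed = False
            for i, (label, gesture) in enumerate(examples):
                if i in store:
                    continue
                dists = [(distance(gesture, examples[s][1]), examples[s][0]) for s in store]
                nearest_label = min(dists, key=lambda t: t[0])[1]
                if nearest_label != label:       # misclassified: keep this example
                    store.append(i)
                    changed = True
        return [examples[i] for i in store]

Editing removes the noisy or mislabeled recordings, and condensing then shrinks the cleaned set to the examples that actually carry new information.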
4 Experiments

Two different sets of experiments were carried out. On the one hand, we tested the system with a number of samples to prove its accuracy in recognizing the proposed gestures. On the other hand, a small group of children with autism used the system under the supervision of educators to study the impact of its use on the sign language and speech learning process. A screenshot of the developed application is shown in Figure 3.

Figure 3: Screenshot of the application, which shows a person choosing a gesture and performing it. The recognized gesture is shown on the right side.

4.1 Gesture Recognition Accuracy

The experimentation consists of using the system to classify a collection of 264 gestures captured with Kinect v2. In this collection, there are 24 samples of each gesture type performed by five different persons. The classifier was set up with various parameters in order to determine the configuration that provides the best performance:

• Downsampling with 20 centroids, with GCPS activated and using k-Nearest Neighbors (k-NN) (k = 3) for the classifier.

• Downsampling with 20 centroids, with GCPS deactivated and using NN for the classifier.

• Downsampling with 20 centroids, with GCPS activated and using NN for the classifier.

• Downsampling with 10 centroids, with GCPS activated and using NN for the classifier.

Figure 4: Success rate (per gesture type) with the different parameter configurations.

Figure 4 shows the results for the different parameter configurations. The system setup that provides the best success rate is the one downsampled with 20 centroids, with the GCPS module deactivated and using NN as the classifier method. However, this configuration requires too much runtime, making it impractical for the real-time use this project aims to address. Figure 5 shows that the fastest method for gesture classification tasks is the one with 10 centroids, with the GCPS module activated and using NN, but its success rate is below the threshold of acceptance. The second fastest system configuration, the one downsampled with 20 centroids with the GCPS module activated and using NN, provides a very high success rate, making it the best option with a reasonable ratio of success rate to elapsed time. Figure 5 shows the average elapsed runtime of a five-fold cross validation round for these four configurations (a round is composed of 55 unknown gesture classifications).

Figure 5: Execution time (elapsed time, in ms) for a five-fold cross validation round with each configuration.

Table 3: Timing for different configurations. All values are in ms.

5CV      3NN, GCPS, 20 centroids     NN, no GCPS, 20 centroids     NN, GCPS, 20 centroids      NN, GCPS, 10 centroids
round    Total    GCPS   Class.      Total                         Total    GCPS   Class.      Total    GCPS   Class.
1        5642     97     5545        1519                          720      99     621         212      62     150
2        5396     58     5338        1463                          563      59     504         179      29     150
3        6113     66     6047        1465                          644      68     576         201      34     167
4        5708     60     5648        1399                          632      63     569         190      31     159
5        5158     54     5104        1129                          473      51     422         138      24     114

As seen in Table 3, running the system with the GCPS module activated adds almost no overhead to the overall runtime, yet it reduces the classification stage runtime. As explained in Section 3.4, the GCPS module preselects some classes, decreasing the effective model size and thus producing fewer comparisons in the classification stage. Although the GCPS module introduces some error, it improves the execution time in every case while providing a high success rate, so its use is justified. We can also see how the configurations with 20 centroids improve the execution time, while the 10-centroid one provides the best runtime but fails in terms of success rate. Regarding the classification algorithm, k-NN (k = 3) provides a high success rate but with a prohibitive processing time. Instead, the best option is to use the nearest neighbor algorithm, which not only provides a high success rate but is also faster. So, in the light of the experiments, the best setup for the gesture classification task is provided by the system with NN, GCPS activated and 20 centroids.

The evaluation of the evolution of the gesture learning process should be based on two main groups of features: the sign components (shape, position with respect to the body, and final movement) and the conditions (in the presence or absence of the object). Finally, it is also important to consider whether full help, partial help or no help from the educator was necessary.

4.2 Learning Improvement

To evaluate the impact of the system on the children's learning process, three main stages are considered: gesture imitation, gesture and speech imitation, and autonomous speech. A study with a large group of children, in parallel with and without the use of the system, was not possible because of the absence of a large number of children at the same learning stage. The system is used mainly with children between 2 and 5 years old, and only a few children (1 or 2) begin at the same time. However, the qualitative evaluation of the educators indicates that, in all cases, the learning process is accelerated. Once they pass the phase in which tactile help is necessary, the use of the system stimulates visual imitation and fosters self-correction, thus reducing the average time to achieve autonomous spoken communication.

5 Conclusions

This approach provides an innovative, customizable and reliable system for Schaeffer's gesture language detection and learning using Kinect v2. The main use case of the system is providing a tool to help children with autism to communicate with gestures and speech. The system has been tested, with excellent results, with a group of children between two and five years old. It is also oriented to human-machine interaction for everyone, including people with cognitive disabilities, who can use this system to communicate with another person or a robot companion. The system can only recognize a subset of 11 different gesture classes, but we aim, as future work, to recognize the whole of Schaeffer's sign language and to implement a system to detect when a gesture starts and ends in order to create a continuous real-time classification system. The main drawback of the proposed system is the GCPS module: it is necessary to achieve real-time response, but its use decreases the accuracy of the system.
In order to improve the performance of the system and make it more suitable for a real-time application with more gestures in the model, we also aim, as future work, to parallelize the NN module. In addition, the whole system could be deployed on a heterogeneous computing platform for mobile robots such as the NVIDIA Jetson TK1, which features a multi-core CPU together with a many-core GPU for parallelization with low power consumption.

The proposed system has been tested with children, obtaining a positive qualitative evaluation from the educators. We could not obtain a quantitative evaluation due to the small population we can reach in a short period of time. We plan, as future work, to evaluate the system with more children in order to reach a representative population.

Acknowledgments

This work has been supported by the Spanish Government DPI2013-40534-R grant, supported with FEDER funds. We would like to thank APSA, an association that helps people with mental handicaps in Alicante (Spain), for their collaboration in the experiments with autistic children.

References

[1] F. Quek, "Unencumbered gestural interaction". IEEE MultiMedia, 3, 1996.

[2] M. Turk, "Handbook of Virtual Environments: Design, Implementation, and Applications". Hillsdale: K.M. Stanney, 2002.

[3] S. Lenman, "Using marking menus to develop command sets for computer vision based hand gesture interfaces". NY: ACM Press, 2002.

[4] M. Nielsen, "A procedure for developing intuitive and ergonomic gesture interfaces for HCI". 5th International Gesture Workshop, 2003.

[5] A. Wexelblat, "An approach to natural gesture in virtual environments". 1995.

[6] F. Quek, "Multimodal human discourse: gesture and speech". 2002.

[7] R. Bolt, "Put-that-there: voice and gesture at the graphics interface". NY: ACM Press, 1980.

[8] D.B. Koons, "Iconic: speech and depictive gestures at the human-machine interface". NY: ACM Press, 1994.

[9] M. Billinghurst, "Put that where? Voice and gesture at the graphics interface". 1998.

[10] D. Bowman, "Principles for the design of performance-oriented interaction techniques". Hillsdale: Lawrence Erlbaum Associates, 2002.

[11] J. Gabbard, "A taxonomy of usability characteristics in virtual environments". Department of Computer Science, University of Western Australia, 1997.

[12] V. Buchmann, "FingARtips: gesture based direct manipulation in augmented reality". NY: ACM Press, 2004.

[13] D. Sturman, "Whole hand input". MIT, 1992.

[14] A. Liu, "A survey of surgical simulation: applications, technology, and education". 2003.

[15] D. Sturman, "A survey of glove-based input". 1994.

[16] E. Foxlin, "Motion tracking requirements and technologies". Hillsdale: Lawrence Erlbaum Associates, 2002.

[17] I. Oikonomidis, N. Kyriazis and A.A. Argyros, "Efficient model-based 3D tracking of hand articulations using Kinect", in Proceedings of the 22nd British Machine Vision Conference, BMVC 2011, University of Dundee, UK, Aug. 29-Sep. 1, 2011.

[18] B. Schaeffer, A. Musil and G. Kollinzas, "Total Communication: A Signed Speech Program for Nonverbal Children". Research Press, 1980.

[19] C. Kiernan, "Alternatives to speech: A review of research on manual and other forms of communication with the mentally handicapped and other noncommunicating populations". British Journal of Mental Subnormality, 1977.

[20] A. Rebollo et al., "Diccionario de signos para alumnos con necesidades educativas especiales en el área de comunicación/lenguaje: programa de comunicación total habla signada de B. Schaeffer". Consejería de Educación y Universidades de la Región de Murcia, 2011.
Conserjeŕıa de Educación y Universidades de la región de Murcia, 2011. [21] H. Sakoe and S. Chiba, Dynamic programming algorithm optimization for spoken word recognition, Acoustics, Speech and Signal Processing, IEEE Transactions on, vol 26, 1978. [22] S. Arya, et al, ”An optimal algorithm for approximate nearest neighbor searching in fixed dimensions”. University of Maryland, 1994. [23] S. P. Lloyd, ”Least square quantization un PCM”. Bell Telephone Labora- tories Paper, 1982. [24] J. Li and Y. Wang, ”EA DTW: Early abandon to accelerate exactly warp- ing matching of time series”. College of Computer Science and Technology, Huazhong University of Science and Technology, 2007. [25] D. S. Wilson, ”Asymptotic properties of nearest neighbor rules using edited data”. IEEE Transactions on Systems, Man, and Cybernetics, vol. smc-2, no 3, 1972. [26] P.E. Hart, The condensed nearest neighbor rule. IEEE Transactions on Information Theory, IT-14(3):515516, 1968. [27] S.D. Kelly, S.M. Manning and S. Rodak, Gesture Gives a Hand to Language and Learning: Perspectives from Cognitive Neuroscience, Developmental Psychology and Education,Language and Linguistics Compass 2 (2008). [28] S. Wagner-Cook and S. Goldin-Meadow, The Role of Gesture in Learning: Do Children Use Their Hands to Change Their Minds?, Journal of Cognition and Development, 7(2), 211232, 2006. 13 [29] J.B. Schwartz, C. Nye, Improving Communication for Children with Autism: Does Sign Language Work?, EBP Briefs, 1(2), 1-17, 2006. [30] R. Bakeman, J.M. Gottman, Observing interaction. An introduction to sequential analysis, II edn. Cambridge University Press, New York 1997. [31] J.B. Bavelas, N. Chovil,D.A. Lawrie, A. Wade, Interactive gestures. Dis- course Processes 15, 469489, 1992. [32] P. Ekman, W.V Friesen, The repertoire of nonverbal behavior. Semiotica 1, 4998, 1969. [33] A. Kendon, Gesture and speech: How they interact. In: Wiemann, J.M., Harrison, R.P. (eds.) Nonverbal Interaction, pp. 1345. Sage Publications, Beverly Hills, 1983. [34] D. McNeill, Hand and Mind. The University of Chicago Press, Chicago, 1992. 14