SCG Gesture Coding Manual

This SCG gesture coding manual describes our methods for labeling gestures for their movement, so that we can explore the relationship between the kinematics of gesture and prosody, intonational breaks, morphosyntax, discourse coherence, and pragmatics.

Expand and collapse the toggles below to show and hide more information, labeling notes, and extended support for labeling each feature.

Methodology

ELAN Annotation Tool

We use ELAN, a video annotation tool created by the Max Planck Institute for Psycholinguistics, to label the co-speech gestures in our video samples. To download ELAN and learn more about how to use it, check out http://tla.mpi.nl/tools/tla-tools/elan/. Some tips for using ELAN are provided below.

Process

Our coding process reduces the decision-making burden on the labeler by focusing on a single cognitive task at a time. This means that a labeler may make multiple passes even when labeling a single dimension, such as the gesture stroke. For each pass, the labeler does only one thing. The broad task of identifying the stroke is not the same as the detailed task of defining the stroke boundaries. Some labelers may prefer to organize their workflow in other ways, and that is okay. Acknowledging and separating out the tasks by what kind of mental work the labeler is doing can have implications for future automatic labeling of gestures.

Mute the Audio

Most of the features described in this labeling manual are kinematic, and are labeled while the video is muted. This prevents the speech signal from influencing the labeler's perception of what is going on with the hands. Unless otherwise mentioned, these features are labeled with the video sound off.

Label by Video Frame

Most videos are 30 frames per second. Be sure to restrict your annotation boundaries to video frames. You may see a still frame in ELAN in the milliseconds between video frames, but that image is merely computed. There is an ELAN preferences setting that allows you to restrict your labeling to video frames.
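If annotations have already been exported with boundaries that fall between frames, they can be snapped back afterwards. The following is a minimal sketch, assuming boundaries are stored as millisecond times and the video runs at 30 frames per second; the function name `snap_to_frame` is hypothetical and not part of ELAN.

```python
FPS = 30  # assumed frame rate; check the actual rate of your video


def snap_to_frame(time_ms: float, fps: float = FPS) -> int:
    """Return the time (in ms) of the video frame nearest to time_ms."""
    # Index of the nearest frame, counting from frame 0 at 0 ms.
    frame_index = round(time_ms * fps / 1000)
    # Convert the frame index back to a millisecond time stamp.
    return round(frame_index * 1000 / fps)


if __name__ == "__main__":
    for boundary in (0, 40, 1010):
        print(boundary, "->", snap_to_frame(boundary))
```

Rounding to the nearest frame is one choice among several; always rounding down to the frame that is actually displayed would be equally defensible.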

Initial Setup

Starting from an unlabeled video sample, we create a tier to chunk areas of non-gesturing. We then go back to the beginning of the sample and create a new tier for labeling more defined chunks of gesturing. These chunks are later refined into Perceived Gesture Groupings (PGGs). Next, focusing on these chunks, we roughly label for where the strokes are. Refining the boundaries of the stroke happens when labeling for phases.

Terms
- Labeler: Someone who annotates or labels the gesture features
- Labels and annotations: Individual annotated tokens
- Annotation value: The text or label of the annotation
- Annotation range: The start to end of an annotation

Tips for labeling
- Use a question mark at the end if you are unsure
- Use forward slash “/” when unable to decide between two annotation values, with the more likely one first
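When annotation values are analyzed later, the two conventions above can be unpacked programmatically. This is a minimal sketch, assuming the question mark always comes last and the slash separates the alternatives; the function name and the returned dictionary keys are hypothetical.

```python
def parse_annotation_value(value: str) -> dict:
    """Split an annotation value into its candidate labels and an uncertainty flag."""
    text = value.strip()
    # A trailing question mark marks the whole annotation as uncertain.
    uncertain = text.endswith("?")
    if uncertain:
        text = text[:-1].strip()
    # Alternatives are separated by "/", with the more likely one first.
    candidates = [part.strip() for part in text.split("/") if part.strip()]
    return {"candidates": candidates, "uncertain": uncertain}


if __name__ == "__main__":
    print(parse_annotation_value("C/S?"))
    print(parse_annotation_value("RH"))
```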

Tips for Workflow
- Take some time to explore the ELAN preferences panel. Some options like “center on selected annotation” can be turned off so that the annotation you just selected doesn’t suddenly jump to the center of the screen. <IMAGE OF ELAN PREFERENCES PANEL>
- Use the horizontal zoom slider to zoom in to see in more detail, or zoom out to see the bigger picture. <IMAGE>
- Play a selection multiple times with context. Select a half second or more before and after the area of interest and use the Play Selection feature to play it multiple times to help you decide.
- Copy the strokes or SDGs tier for labeling features like trajectory shape or hand shape, so there's less work to recreate and line up the annotation boundaries.

Perceived Gesture Groupings (PGGs)

Gestural strokes can appear grouped together. They may look like a set of repeated hits, beats, or something else. Novice labelers can identify perceived gesture groupings (PGGs) even when the boundaries between strokes may not be so clear.

When labeling PGGs, include the gesture strokes that appear to group together. As you do this, you will develop mental hypotheses about why and how you are grouping gestures. Save your ideas and notes for later discussion. The PGG annotation range does not have to be perfect, so long as all the gesture strokes that should be in a group are included in the range. The start and end boundaries will be later refined to include all the individual phases around the stroke.

If you feel that you need a hierarchy of PGGs, feel free to use one. We use PGG1 for the smallest groups and PGG2 for larger, higher-level groups (where a PGG1 is grouped with other PGG1s or with a single gesture). In our experience, a third-level grouping, PGG3, may be found for some speakers and for longer videos (in one 30-minute sample, we found only 13 PGG3s).

Annotation value

Leave the annotation value blank. When you are done labeling the entire sample, use ELAN’s “Label and Number Annotations” tool to automatically label and number the PGG annotations. We often use this number as an ID when referring to a PGG. It is more convenient than referring to the time stamp.

Reliability

Before looking at PGG reliability, SDG labels must be finalized.

Because perception of gesture grouping levels can vary slightly between labelers, we have two labelers annotate the same sample, with a consensus labeling round for the final annotations. The differences are usually about the level of detail. For example, a PGG1 for one labeler may be a PGG2 for another labeler. Disagreements are only counted when the larger PGG does not contain all the strokes of the smaller PGG that it overlaps with. <INSERT EXAMPLE OF DISAGREEMENT, EXAMPLE OF AGREEMENT>
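The disagreement rule can be sketched as a containment check. This is only an illustration under stated assumptions: PGGs and strokes are represented as (start_ms, end_ms) pairs, a stroke belongs to a PGG when its range lies entirely inside the PGG range, and "larger" means longer in duration. The function names are hypothetical.

```python
def strokes_in(pgg, strokes):
    """Return the set of strokes whose range lies entirely inside the PGG range."""
    start, end = pgg
    return {s for s in strokes if start <= s[0] and s[1] <= end}


def is_disagreement(pgg_a, pgg_b, strokes):
    """True if two overlapping PGGs from different labelers count as a disagreement."""
    overlap = pgg_a[0] < pgg_b[1] and pgg_b[0] < pgg_a[1]
    if not overlap:
        return False  # PGGs that do not overlap are not compared
    # Order the two PGGs by duration, longest first.
    larger, smaller = sorted([pgg_a, pgg_b], key=lambda p: p[1] - p[0], reverse=True)
    # Disagreement: the larger PGG misses at least one stroke of the smaller one.
    return not (strokes_in(smaller, strokes) <= strokes_in(larger, strokes))


if __name__ == "__main__":
    strokes = [(0, 100), (200, 300), (400, 500)]
    print(is_disagreement((0, 300), (0, 500), strokes))    # nested grouping
    print(is_disagreement((0, 300), (200, 500), strokes))  # crossing grouping
```

Nested groupings (one labeler's PGG1 inside another labeler's PGG2) pass the check, while groupings that cut across each other are flagged.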

Gesture Strokes and other Phases

Following Kendon's widely accepted view, we label gesture strokes and their related phases. Kendon (1980) describes the stroke as the peak of effort in a G-Phrase: “A phrase of gesticulation, or G-Phrase is distinguished for every phase in the excursionary movement in which the limb, or part of it, shows a distinct peaking of effort, ‘effort’ here used in the technical sense of Rudolf Laban (Dell 1970). Such an effort peak, or less technically, such a moment of accented movement, is termed the stroke of the G-Phrase.” The stroke phase is necessary for identification of the movement as a gesticulation unit. There can also be supporting phases around the stroke, including preparation, hold, and recovery from the peak. These phases are not always present.

The phases we label are:
- preparation phase: movement to the onset of a stroke
- pre-stroke hold: static hold before a stroke
- stroke: movement with the peak of effort
- post-stroke hold: static hold after a stroke
- recovery: movement to rest position
- relaxed: static rest position

Recovery is when the hand is in motion, moving to a rest position. Relaxed is when the hand is not moving, in a rest position. Relaxed cannot occur without recovery. This means that between two strokes, if there is a non-moving phase, the non-moving phase is labeled as post-stroke hold.
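The constraint that relaxed cannot occur without recovery can be checked mechanically once the phases are combined into one time-ordered sequence. This is a minimal sketch, assuming the phases are given as a list of label strings; the function name is hypothetical. It also flags a relaxed phase at the very start of a sample, which is an assumption on our part that you may want to relax.

```python
def find_invalid_relaxed(phases):
    """Return the indices of 'relaxed' phases not directly preceded by 'recovery'."""
    invalid = []
    for i, phase in enumerate(phases):
        if phase != "relaxed":
            continue
        # Assumption: a sample-initial relaxed phase is also flagged for review.
        if i == 0 or phases[i - 1] != "recovery":
            invalid.append(i)
    return invalid


if __name__ == "__main__":
    sequence = ["preparation", "stroke", "relaxed", "stroke", "recovery", "relaxed"]
    # Index 2 sits between two strokes without a recovery,
    # so it should be relabeled as a post-stroke hold.
    print(find_invalid_relaxed(sequence))
```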

Annotation range

It can be difficult to label where gesture strokes start and end. The gesture may be too slow, too fast, or too blurry. For blurry video, we use the frames where the hands clear up as boundary markers for the annotation. Our video frame rate is 30 frames per second; although that is much too low for automatic motion capture, it does make this blur cue available. "Clearing up" means that the hand has slowed down.

Progression of blurry to clear video frames

(Image here)

Annotation value

Each gesture phase has its own tier. The annotation value is created with the “Label and Number Annotations” tool under the Tier menu option. After all phases are labeled and reviewed, they may be combined into a single tier.

Motion Tracking

If you are able to use motion tracking, use the calculated velocity and acceleration to help determine the start and end of the gesture strokes. As for what to track, you can choose the point of the index finger or the center of the palm.
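As an illustration of how velocity and acceleration might be derived from tracked positions, here is a minimal sketch using finite differences. It assumes one (x, y) position per video frame for the tracked point and a frame rate of 30 frames per second; the function names are hypothetical, and real tracking data would usually need smoothing first.

```python
import math


def speed_profile(points, fps=30.0):
    """Speed between consecutive frames, in position units per second."""
    return [math.dist(p, q) * fps for p, q in zip(points, points[1:])]


def acceleration_profile(speeds, fps=30.0):
    """Change in speed between consecutive frame intervals, per second."""
    return [(b - a) * fps for a, b in zip(speeds, speeds[1:])]


if __name__ == "__main__":
    # Hypothetical track: the hand moves 5 units in one frame, then stops.
    track = [(0, 0), (3, 4), (3, 4)]
    speeds = speed_profile(track)
    print(speeds)
    print(acceleration_profile(speeds))
```

Frames where the speed drops toward zero are candidates for stroke boundaries, which matches the blur cue used in manual labeling.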

Stroke-Defined Groupings (SDGs)

Stroke-defined groupings are similar to Kendon's G-Phrase, but include the relaxed phase. They allow for a focus on the kinematics of movement, separate from the semantics and pragmatics of gesturing. Including the relaxed phase in SDGs allows us to test various hypotheses related to intonational breaks in speech and discourse structure.

Annotation range

SDGs include all the phases around a stroke, so the start and end boundaries depend on which phases are associated with that stroke.
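For example, the SDG range can be derived from the phases that belong to one stroke. This is a minimal sketch, assuming each phase is a (label, start_ms, end_ms) tuple and that the phases passed in have already been assigned to the same stroke; the function name is hypothetical.

```python
def sdg_range(phases):
    """Return (start_ms, end_ms) spanning all phases associated with one stroke."""
    if not any(label == "stroke" for label, _, _ in phases):
        raise ValueError("an SDG needs a stroke phase")
    start = min(phase_start for _, phase_start, _ in phases)
    end = max(phase_end for _, _, phase_end in phases)
    return (start, end)


if __name__ == "__main__":
    phases = [
        ("preparation", 100, 300),
        ("stroke", 300, 500),
        ("recovery", 500, 700),
        ("relaxed", 700, 1200),
    ]
    print(sdg_range(phases))
```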

Annotation value

The annotation value is created with the "Label and Number Annotations" tool under the Tier menu option.

Handedness

Handedness refers to which hand or hands are actively gesturing during the stroke. Both hands may be moving in synchrony or asynchrony. One hand may be moving while the other hand is inactive or aimless.

Annotation values
- BHES: Both hands equal synchronous
- BHEA: Both hands equal asynchronous
- RHD: Right hand dominant (left hand may be aimlessly moving)
- RH: Right hand (left hand is not moving)
- LHD: Left hand dominant (right hand may be aimlessly moving)
- LH: Left hand (right hand is not moving)

Annotation range

Handedness is labeled for what is happening during the stroke but the annotation range is the same as the SDG. Make a copy of the SDG tier and use ELAN's "Remove Annotations or Values" tool under the Tier menu option to remove only the values. Use this as a blank template for labeling for handedness.

Palm Orientation

Palm orientation does not always have to be labeled. It is important to know it exists when labeling for hand shape. Having palm orientation as a separate kinematic feature allows hand shape labeling to have a smaller list of options to choose from.

Palm orientation is labeled for the direction the palm faces during the stroke. If the hand rotates during the stroke, label the direction the palm is facing near the end of the stroke. Palm orientation captures a moment in time, unlike trajectory shape. The annotation value can combine the options below in order to capture the 3-dimensional Cartesian direction the palm is facing. For example, when the palms are facing the head, it would be SU for Self and Up. If the palms face each other, it would be I for Inside.

Annotation value
- U: Up
- D: Down
- L: Left
- R: Right
- I: Inside
- O: Outside
- S: Self
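Combined values such as SU can be expanded back into their component directions for analysis. This is a minimal sketch, assuming each letter in the value is one of the options listed above; the function and dictionary names are hypothetical.

```python
PALM_DIRECTIONS = {
    "U": "Up",
    "D": "Down",
    "L": "Left",
    "R": "Right",
    "I": "Inside",
    "O": "Outside",
    "S": "Self",
}


def decode_palm_orientation(value):
    """Expand a palm orientation value such as 'SU' into its direction names."""
    unknown = [letter for letter in value if letter not in PALM_DIRECTIONS]
    if unknown:
        raise ValueError(f"unknown palm orientation letters: {unknown}")
    return [PALM_DIRECTIONS[letter] for letter in value]


if __name__ == "__main__":
    print(decode_palm_orientation("SU"))
    print(decode_palm_orientation("I"))
```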

Annotation range

Unless you are interested in change in palm orientation during a stroke or SDG, the annotation range can be the same as the SDG. Make a copy of the SDG tier and use ELAN's "Remove Annotations or Values" tool under the Tier menu option to remove only the values. Use this as a blank template for labeling for palm orientation.

Hand Shape

Our hand shapes are labeled independently of palm orientation. This helps keep the number of different hand shapes to a minimum.

You can add more to the library if they are distinct hand shapes. For uncertainties between two hand shapes, we use a 5-point slider, written as the two hand shape abbreviations with a number from 1 to 5 in between. For example: C4O. The first letter indicates the shape that the labeler initially saw (here, Cup), and the number indicates how close the shape is to the second hand shape (here, Open). C4O means the shape is more like Open than Cup. It is equivalent to O2C, so the order does not matter.
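Because C4O and O2C describe the same judgment, it can help to normalize slider values to one form before counting or comparing them. This is a minimal sketch, assuming the scale mirrors as 6 minus the number when the letters are swapped (consistent with the C4O and O2C example) and that each hand shape is a single capital letter; the function name is hypothetical.

```python
import re

# One capital letter, a digit from 1 to 5, one capital letter.
_SLIDER = re.compile(r"^([A-Z])([1-5])([A-Z])$")


def canonical_slider(value):
    """Rewrite a slider value so that its letters are in alphabetical order."""
    match = _SLIDER.match(value)
    if match is None:
        raise ValueError(f"not a slider value: {value!r}")
    first, number, second = match.group(1), int(match.group(2)), match.group(3)
    if first > second:
        # Swapping the letters mirrors the position on the 5-point scale.
        first, second, number = second, first, 6 - number
    return f"{first}{number}{second}"


if __name__ == "__main__":
    print(canonical_slider("C4O"), canonical_slider("O2C"))
```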

Here's our list of hand shapes to use. Check the library of hand shapes at the end of this page. It contains images of the different hand shapes as a reference.

Annotation range

Hand shape usually stays the same during a stroke or SDG, so the annotation range can be the same as the SDG. Make a copy of the SDG tier and use ELAN's "Remove Annotations or Values" tool under the Tier menu option to remove only the values. Use this as a blank template for labeling for hand shape. Depending on your research question, you may want to do this differently.

Annotation value

The value is usually a single capital letter abbreviated from the word that would best describe the hand shape. For example, we use “F” for “fist,” “O” for “Open,” and “R” for “Relaxed.”
- D: Deictic, pointing
- G: Gun
- F: Fist
- R: Relaxed
- O: Open, fingers spread outwards
- C: Cup or claw
- K: Knife, fingers straight together and flat
- A: Angled, fingers flat together, bent at an angle (about 90°) above palm
- H: Hole, making an empty cylinder shape
- Q: Okay, emblematic "okay" sign
- T: Two or three fingers (not thumb) extended, emblematic peace sign
- L: Loose fist, relaxed shape with fingers curled in

Two-handed hand shapes

Two-handed gestures often use the same hand shape on both hands. We label them the same way as one-handed gestures.

When the two hands have different hand shapes, label for the dominant gesturing hand. If the non-dominant hand is not passive, label its hand shape in a tier called non-dominant hand shape. Since there is a handedness tier that defines which hand is the dominant gesturing hand, you can compute which hand has which hand shape. This decision is focused on minimizing the number of tiers needed for labeling, and on making it easier for labelers to label for hand shape. If you prefer to have separate hand shape tiers for the left and right hands, you may do so.
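The computation of which hand has which hand shape can be sketched as a lookup on the handedness value. This is a minimal sketch with hypothetical names; for the both-hands values (BHES, BHEA) it assumes both hands share the labeled shape, which is our assumption rather than a rule of the manual.

```python
def shapes_by_hand(handedness, dominant_shape, non_dominant_shape=None):
    """Map the dominant and non-dominant hand shape labels onto right and left hands."""
    if handedness in ("RH", "RHD"):
        return {"right": dominant_shape, "left": non_dominant_shape}
    if handedness in ("LH", "LHD"):
        return {"right": non_dominant_shape, "left": dominant_shape}
    if handedness in ("BHES", "BHEA"):
        # Assumption: with no dominant hand, both hands carry the labeled shape.
        return {"right": dominant_shape, "left": dominant_shape}
    raise ValueError(f"unknown handedness value: {handedness!r}")


if __name__ == "__main__":
    print(shapes_by_hand("LHD", "F", "O"))
    print(shapes_by_hand("RH", "D"))
```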

Individual speaker differences

Hand shapes may vary for different speakers. One speaker’s default “Cup” hand shape may be more rounded than another speaker’s “Cup” hand shape. If you are doing an inter-speaker comparison of hand shapes, we recommend going through the video and taking screenshots of the different hand shapes used.

Location

Location refers to where the hands are with respect to the body.

Slight movements that are visible might not be captured as having a location change because the trajectory motion does not travel far enough to qualify as a substantial change.

Annotation value

Use the mapping grid for the values. Location is annotated as (right_hand_y),(right_hand_x);(left_hand_y),(left_hand_x). The y-values can have half values. For example, “3.5” would refer to the middle of the torso. Annotating the right hand first makes the labeling process faster.
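A location value in this format can be split into its four coordinates for later analysis. This is a minimal sketch, assuming the parentheses in the pattern above are placeholders, that a value looks like 3.5,2;3,4, and that both grid axes are numeric; the function name is hypothetical.

```python
def parse_location(value):
    """Parse 'right_y,right_x;left_y,left_x' into a dictionary of coordinates."""
    right_text, left_text = value.split(";")

    def parse_hand(text):
        # The y-value comes first and may be a half value such as 3.5.
        y, x = (float(part) for part in text.split(","))
        return {"y": y, "x": x}

    return {"right": parse_hand(right_text), "left": parse_hand(left_text)}


if __name__ == "__main__":
    print(parse_location("3.5,2;3,4"))
```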

(Insert image)

Annotation range

Every gesture stroke has a start location and an end location even if the locations are the same. For large gestures that cover more space, also label the extremes in between the start and end locations. Location can be labeled in a single tier, copying the stroke tier, and contain both the start and end locations as the annotation value. Or location can be labeled in two tiers, one for start location, one for end location, and, if necessary, one for location between if the trajectory path follows an extreme curvature.

Automatic tracking

Location labeling can use automatic tracking data for the hands and body. Make sure to use hand location relative to the body, rather than absolute location.

Analysis

Several analyses can be done for location: distance traveled during a stroke, comparing strokes that have similar start or end locations, and more.
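With tracking data, distance traveled during a stroke could be computed on body-relative positions. This is a minimal sketch, assuming one (x, y) point per frame for both the hand and a body reference point (for example, the center of the torso); the function names and the choice of reference point are hypothetical.

```python
import math


def relative_path(hand_points, body_points):
    """Hand positions expressed relative to the body reference point, per frame."""
    return [
        (hand_x - body_x, hand_y - body_y)
        for (hand_x, hand_y), (body_x, body_y) in zip(hand_points, body_points)
    ]


def path_length(points):
    """Total distance traveled along a sequence of points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


if __name__ == "__main__":
    hand = [(10, 10), (13, 14), (16, 18)]
    body = [(10, 10), (10, 10), (13, 14)]  # the body shifts in the last frame
    print(path_length(hand))                       # absolute distance
    print(path_length(relative_path(hand, body)))  # body-relative distance
```

In this made-up track, the second half of the hand's absolute movement comes from the whole body shifting, so the body-relative distance is only half of the absolute distance.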

Trajectory Shape

Trajectory shape labeling describes the shape of the gesture stroke's trajectory.

Observing the active hand movement, imagine the stroke tracing out a straight path, curved path, or, if looking at multiple gesture strokes, a looping path.

To label for trajectory shape, play the video segment of the stroke and then the entire SDG or a longer segment that includes the stroke to verify the trajectory shape. Play the segment over and over a few times. Identify the trajectory shape, making sure it refers to the stroke, not the preparation phase or the recovery phase.

As with hand shape, the annotation range for trajectory shape can be the same as the SDG.

Annotation value

The annotation value is one of S, S-H, S-D, C, or L, depending on the trajectory shape.
- S: Straight
- S-H: Straight-horizontal
- S-D: Straight-diagonal
- C: Curved
- L: Looping

Annotation range

Make a copy of the SDG tier and then use ELAN's "Remove Annotations or Values" tool under the Tier menu option to remove only the values. Use this as a blank template for labeling for trajectory shape.

Straight trajectory shape variations

The straight trajectory shape is usually a vertical down or up motion. It can occasionally be diagonal (in most examples we’ve seen, the motion is upwards and out) or horizontal, usually outwards. Keep in mind that these variations are usually infrequent, perhaps because they take more energy to carry out, and may occur more often for a particular speaker.

Curved versus straight trajectory shape variations

When deciding between curved and the straight-horizontal or straight-diagonal variations, be sure to review the video snippet containing the entire SDG with all its phases, and evaluate how the preparation and recovery phases influence your perception of the stroke trajectory. Take them into consideration as contextual clues, but remember that the trajectory shape label refers to the stroke phase, not to any of the other phases.

Looping trajectory shape

The looping trajectory shape must apply to multiple consecutive strokes that do not have a visual or temporal pause between them. A single loop without preceding or succeeding loops would be labeled as a curved trajectory shape. Successive looping strokes are cut into individual “loops” at the end of the loop, or the lowest point in the loop, and each is labeled as a stroke.

Curved versus looping trajectory shape

The main difference between curved and looping is that there is a pause, hold phase, or break in the movement path between successive curved strokes, whereas these do not occur between successive looping strokes. Another helpful cue is the preparation phase, which may occur before each curved stroke, but only before a set of successive looping strokes. Likewise, the recovery phase may occur after each curved stroke, but only at the end of a set of successive looping strokes.
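The distinction can be sketched as a check on whether loop-like strokes are directly adjacent in time. This is a minimal sketch, assuming strokes are time-ordered (start_ms, end_ms) pairs and that "no pause" means one stroke ends exactly where the next begins, with no hold, preparation, or recovery annotation in between; the function name is hypothetical.

```python
def label_loop_like_strokes(strokes):
    """Label each loop-like stroke as 'L' (looping) or 'C' (curved)."""
    labels = []
    for i, (start, end) in enumerate(strokes):
        # A stroke is joined to a neighbor when their boundaries coincide.
        joined_to_previous = i > 0 and strokes[i - 1][1] == start
        joined_to_next = i + 1 < len(strokes) and strokes[i + 1][0] == end
        # Looping requires at least one directly adjacent loop-like stroke.
        labels.append("L" if joined_to_previous or joined_to_next else "C")
    return labels


if __name__ == "__main__":
    strokes = [(0, 300), (300, 600), (600, 900), (1500, 1800)]
    # Three connected loops, then a single isolated loop after a break.
    print(label_loop_like_strokes(strokes))
```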

Speech Communication Group