Speech Communication Group

Gesture Research » Training

This training is designed for new undergraduate researchers to familiarize themselves with gesture labeling and best practices for data management.


  • Complete the CITI certification
  • Read select research articles
  • Introduction to Gesture Labeling
  • Access to the Corpora
  • Data Management Practices
  • Transcription and ToBI labeling
  • Self Training Procedure

At the beginning, meetings will introduce the other tiers you need to be aware of before you label each tier. Meetings will also review any difficulties you have come across and answer your questions. We also have write-ups of difficult situations and questions (with feedback) that other labelers had when they started labeling those tiers. Once your understanding and expertise in labeling a specific tier have been confirmed, you may label other samples for that tier.

CITI certification

  • If you have not already done so, please complete the CITI course for Human Subject Research: https://about.citiprogram.org/en/series/human-subjects-research-hsr/ (Or: https://www.citiprogram.org/index.cfm?pageID=154)
  • Click on the button that says "Buy Now". As an MIT student, you will not need to pay for it. It will take you to a page where you can choose your institution, and you'll be asked to log in with your MIT Certificate.
  • Email the certificate to your supervisors. We need your certificate for our records.

Read Research Articles

  1. Kendon 1980 (introduces gesture groupings) (1.5MB) https://www.dropbox.com/s/2kykl4afjdrn637/kendon_1980_gesticulation_speech.pdf?dl=0
  2. McNeill 1985 So you think gestures are non-verbal (1.7MB) https://www.dropbox.com/s/r7exqp23275q6v4/McNeill_1985_so-you-think-gestures-are-nonverbal.pdf?dl=0

Introduction to Gesture Labeling

Pre-meeting preparation

  1. Download and install ELAN (85MB for MacOS): https://archive.mpi.nl/tla/elan/download
  2. Download the video (Insert video link)
    • Watch and listen. Think about:
      • What do you notice about the speaker's gestures?
      • How do the gestures relate to the speech, e.g. in timing and in meaning?
      • What body parts are gesturing during speech?
  3. Set up requirements (ELAN Preferences)
    • Understanding video frames
  4. How to create a new ELAN file
  5. How to create a new tier
  6. How to create an annotation
    • Some tiers (PGGs and SDGs) do not need hand-typed labels; their labels can be auto-populated.

See Data Management Practices for how to name tiers and how to save and manage your work.

Dropbox Folder

In the Dropbox folder you'll find two main folders: Corpora and Readings for UROPs. The Corpora folder contains our entire corpus. To start with, until you have gained expertise:

  • Do not delete anything.
  • Any new files you add that you work on should have your initials appended to the filename.
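The naming rule above can be sketched in a few lines of Python. This is only an illustration: the underscore separator, the example path, and the initials "AB" are assumptions, not a lab convention stated here.

```python
from pathlib import Path


def with_initials(path: Path, initials: str) -> Path:
    """Return a path whose filename ends with the labeler's initials."""
    # Keep the extension (.eaf, .TextGrid, ...) and append the initials to the stem.
    # The "_" separator is an assumption; use whatever your supervisors prefer.
    return path.with_name(f"{path.stem}_{initials}{path.suffix}")


# Hypothetical file name, for illustration only.
print(with_initials(Path("sample_handedness.eaf"), "AB").name)
```

Working on a renamed copy also makes it harder to overwrite or delete a shared file by accident.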

Data Management Practices

All completed labels are saved as individual Praat textgrid files. Before labels count as completed, they need to be reviewed and vetted by an experienced labeler. The work done before the review also gets saved into an archive folder.

The Praat textgrid file format is easy to import into and export from ELAN, and the files can be opened in Praat if we need to do that. It also makes it easier to keep an inventory of completed labels. ELAN files can have multiple tiers, so we think of them more as a workspace. We use spreadsheets for analysis, and can either export the tier from ELAN to a spreadsheet format, or use an in-house script to convert the Praat textgrid file to csv.
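To give a sense of what a textgrid-to-csv conversion involves, here is a minimal sketch. It is not the in-house script. It assumes the long (default) Praat text format, interval tiers only, and text that has already been read as a Python string; the function names and csv column names are made up for illustration.

```python
import csv
import io
import re

TIER_SPLIT = re.compile(r"item \[\d+\]:")
NAME_RE = re.compile(r'name = "([^"]*)"')
# One interval: xmin, xmax, then a quoted label (Praat doubles any inner quotes).
INTERVAL_RE = re.compile(
    r'xmin = ([\d.eE+-]+)\s+xmax = ([\d.eE+-]+)\s+text = "((?:[^"]|"")*)"'
)


def textgrid_to_rows(textgrid_text, skip_empty=True):
    """Return (tier, start, end, label) tuples for every interval."""
    rows = []
    # The first chunk is the file header, so it is skipped.
    for chunk in TIER_SPLIT.split(textgrid_text)[1:]:
        name = NAME_RE.search(chunk)
        tier = name.group(1) if name else ""
        for start, end, label in INTERVAL_RE.findall(chunk):
            label = label.replace('""', '"')
            if skip_empty and not label.strip():
                continue  # drop the blank in-between intervals
            rows.append((tier, float(start), float(end), label))
    return rows


def rows_to_csv(rows):
    """Serialize the rows as csv text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["tier", "start", "end", "label"])  # hypothetical column names
    writer.writerows(rows)
    return buffer.getvalue()


# A tiny hand-written textgrid used only to exercise the sketch.
SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 2.5
tiers? <exists>
size = 1
item []:
    item [1]:
        class = "IntervalTier"
        name = "SDG"
        xmin = 0
        xmax = 2.5
        intervals: size = 3
        intervals [1]:
            xmin = 0
            xmax = 0.5
            text = ""
        intervals [2]:
            xmin = 0.5
            xmax = 1.75
            text = "SDG-1"
        intervals [3]:
            xmin = 1.75
            xmax = 2.5
            text = ""
'''

print(rows_to_csv(textgrid_to_rows(SAMPLE)))
```

Real textgrid files may be saved in Praat's short format or in a UTF-16 encoding, which this sketch does not handle.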

Some tiers are labeled together, such as PGG1, PGG2, and PGG3. In addition to the individual files, it may be convenient to have a textgrid file that includes all three of them, to make it easier to import for analysis.

  • Note on importing textgrids into ELAN

    When you import Praat textgrids into ELAN, make sure to click on the checkbox for "Skip empty intervals / annotations" so you don't get the in-between blank intervals.

  • Exporting data for spreadsheets
    1. Select a tier to export.
    2. Go to File → Export As… → Tab-delimited Text…
    3. In the dialog box, under Select tiers, select only the SDG tier.
    4. Under Output options, select only "Separate column for each tier".
    5. Under Include time column for, check only the boxes for "Begin Time" and "End Time".
    6. Under Include time format, check only the "ss.msec" box.
    7. Save the file in the "data" folder for that speaker.
    8. If you see any pop-up dialog boxes about encoding, go with the default setting of UTF-8.
    9. This file can be opened in any spreadsheet program. If it does not work, open it with a plain text editor program and copy the contents. Then paste it into a spreadsheet.
  • Our data layout

    In our main spreadsheet, each case is an SDG.

    The attributes include:

    • Start time of SDG
    • End time of SDG
    • Start time of stroke
    • End time of stroke
    • PGG1 ID that contains this SDG
    • PGG2 ID that contains this SDG
    • Whether it has a preparation phase
    • Whether it has a pre-stroke hold
    • Handedness label
    • Trajectory shape label
    • Hand shape label
    • … and many more
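As a rough illustration of this layout, the sketch below builds one spreadsheet row for a single SDG and looks up which PGG1 and PGG2 contain it by checking time containment. The column names, the ID formats, and the containment rule are assumptions made for the example; the lab's actual spreadsheet may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Interval:
    label: str
    start: float  # seconds
    end: float    # seconds


def containing_id(inner: Interval, groups: list[Interval]) -> str | None:
    """Return the label of the first group that fully contains `inner`."""
    for group in groups:
        if group.start <= inner.start and inner.end <= group.end:
            return group.label
    return None


def build_row(sdg: Interval, stroke: Interval,
              pgg1: list[Interval], pgg2: list[Interval]) -> dict:
    """One case (row) per SDG; column names are hypothetical."""
    return {
        "sdg_id": sdg.label,
        "sdg_start": sdg.start,
        "sdg_end": sdg.end,
        "stroke_start": stroke.start,
        "stroke_end": stroke.end,
        "pgg1_id": containing_id(sdg, pgg1),
        "pgg2_id": containing_id(sdg, pgg2),
    }


# Made-up times and IDs, for illustration only.
sdg = Interval("SDG-1", 0.5, 1.75)
stroke = Interval("stroke", 0.9, 1.2)
pgg1 = [Interval("PGG1-1", 0.0, 2.5)]
pgg2 = [Interval("PGG2-1", 0.0, 1.0)]
print(build_row(sdg, stroke, pgg1, pgg2))
```

In this example the SDG ends after the only PGG2 does, so its PGG2 column stays empty (None).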

Typical Order of Gesture Tier Labeling

This is the typical order of gesture tier labeling in the Speech Communication Group. As you start labeling specific tiers, you will need to be familiar with other, more detailed tiers to make informed decisions.

  1. Introduction to PGGs and strokes. Introduction to phases (you will need to know about phases before you label PGGs and strokes.)
    • Learn to label for PGGs and strokes. Imprecise labeling.
  2. Learn what normal phases and strokes look like, and what odd phases look like (for example, upward strokes and downward preparations). Learn about trajectory shape to help you think about this.
    • Label for phases, and be sure the labels are aligned with video frames. Refine stroke labels as you do so. Precise labeling.
  3. Review strokes and phases labels. Consensus labeling if using a new sample.
  4. Create SDG tier based on stroke and phase labels.
  5. Learn handedness labeling. Copy SDG tier, clear out tier annotation labels, use blank annotations to label for handedness of strokes.
  6. Learn and review trajectory shape labeling. Label copied SDG tier for trajectory shape labeling.
  7. Learn about hand shapes. Read and review notes on hand shape characteristics. Learn about hand shape difference labeling. Label hand shape differences.
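Aligning labels with video frames, as asked for in step 2, comes down to moving each boundary onto the nearest frame time. The sketch below shows the arithmetic only; the frame rate of 25 fps and the 1 ms tolerance are assumptions, so check the actual frame rate of the video you are labeling.

```python
def snap_to_frame(time_s: float, fps: float) -> float:
    """Move a time (in seconds) to the nearest video frame boundary."""
    frame_index = round(time_s * fps)
    return frame_index / fps


def is_frame_aligned(time_s: float, fps: float, tolerance_s: float = 0.001) -> bool:
    """True if the time is within `tolerance_s` of a frame boundary."""
    return abs(time_s - snap_to_frame(time_s, fps)) <= tolerance_s


FPS = 25.0  # assumed frame rate; one frame lasts 1 / 25 = 0.04 s

print(snap_to_frame(1.23, FPS))     # nearest boundary is frame 31
print(is_frame_aligned(1.23, FPS))  # 1.23 s falls between two frames
print(is_frame_aligned(1.24, FPS))  # 1.24 s is exactly frame 31
```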

Transcription and ToBI (RPT) Labeling

This can be done in parallel with gesture labeling.

  1. Learn to use Praat.
  2. Transcribe the sound file for words. This may mean starting from scratch or correcting an AI-transcribed file.
  3. Make a copy of the words tier to create the syllables tier.
  4. RPT (Rapid Prosody Transcription) labeling for syllables that have a pitch accent.
  5. Intonational Phrases
  • Note on creating the syllable tier

    In Praat, make sure the cursor is at the point where you want to break the word apart: click the place on the waveform where you want to split the word.

  • AI automatic transcription

    There are many inexpensive options for automatic transcription. To get the best results, be sure to find one that gives you the start and end times of each word, rather than chunks of words like closed captioning or subtitles for videos.
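Word-level start and end times matter because they can be turned directly into a words tier. The sketch below assumes a hypothetical transcription output, a list of (word, start, end) tuples sorted by time with no overlaps, and writes it as a one-tier Praat textgrid in the long text format. Gaps between words become empty intervals, since a Praat interval tier has to cover the whole duration.

```python
def words_to_textgrid(words, total_duration, tier_name="words"):
    """Build long-format TextGrid text from (word, start, end) tuples."""
    intervals = []
    cursor = 0.0
    for text, start, end in words:
        if start > cursor:
            intervals.append((cursor, start, ""))  # silence before this word
        intervals.append((start, end, text))
        cursor = end
    if cursor < total_duration:
        intervals.append((cursor, total_duration, ""))  # trailing silence

    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        "xmin = 0",
        f"xmax = {total_duration}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f'        name = "{tier_name}"',
        "        xmin = 0",
        f"        xmax = {total_duration}",
        f"        intervals: size = {len(intervals)}",
    ]
    for i, (start, end, text) in enumerate(intervals, start=1):
        escaped = text.replace('"', '""')  # Praat doubles quotes inside labels
        lines += [
            f"        intervals [{i}]:",
            f"            xmin = {start}",
            f"            xmax = {end}",
            f'            text = "{escaped}"',
        ]
    return "\n".join(lines) + "\n"


# Made-up word timings, for illustration only.
WORDS = [("hello", 0.25, 0.6), ("world", 0.6, 1.1)]
print(words_to_textgrid(WORDS, total_duration=1.5))
```

Output that only gives times for chunks of words, like subtitles, cannot be converted this way without aligning each word by hand.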

Self-Training Procedure

Overview: Use a pre-labeled sample. Observe how it is labeled. Hide the labels. Label it on your own. Unhide the pre-labeled section. Compare your labels. Discuss.

  1. If you are training using *.textgrid files, start by creating a new ELAN file. (ELAN: File> New> Add Media File, Select the video file from the media folder). If you are training with an ELAN file *.eaf, skip steps 2 and 3.
  2. Locate the *.textgrid file you want to use for training.
  3. Import the *.textgrid files into the ELAN file (ELAN: File> Import > Praat TextGrid File…). Be sure to check the box for "Skip empty intervals / annotations".
  4. If the tier or dimension needs to be annotated without sound, make sure your computer is muted or use the ELAN controls to mute the video.
  5. Observe the first 30 to 60 seconds of the annotations on the pre-labeled tier and familiarize yourself with them. Where does each annotation start and end? Why do you think that is? Does the label for the annotation make sense to you? Why is it labeled this way?
  6. Hide the tier. (Right click on the tier name, "Hide…")
    • Note

      If you are learning to label a tier that has pre-determined annotation locations, such as handedness or hand shape, you can just copy the original (usually the SDG) tier, and then empty the annotations so that you have a tier with blank annotations. (ELAN: Tier> Remove Annotations or Values > make sure to deselect any selected tiers and select ONLY the copied tier. Make sure you select "Annotation Values" and not "Annotations" and choose "All annotations"). This will clear out the annotation values.

  7. Create a new tier (Tier → New Tier…) for your labels.
  8. Label the same segment you have observed.
  9. Once you are done, unhide the pre-labeled tier (Right click on any tier name, go to "Visible tiers…", select the pre-labeled tier to unhide it).
  10. Compare with the original labels. How did you do? Where do you agree or disagree? Why do you think the differences arise? If possible, discuss with someone with more expertise in labeling this dimension.

After you have discussed your labels with more experienced labelers and have gained confidence in labeling this specific tier, you can start labeling a sample that does not already have that tier labeled.
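For tiers with pre-determined annotation locations, the comparison in step 10 can also be tallied with a short script before you discuss it. This is a sketch under assumptions: both tiers are given as lists of (start, end, label) tuples that share the same boundaries, and the handedness values "L", "R", and "B" are placeholders, since the real label set is defined in the Coding Manual.

```python
def compare_labels(mine, reference):
    """Return (agreement rate, list of disagreements) for two label tiers."""
    ref_by_span = {(start, end): label for start, end, label in reference}
    matched = 0
    disagreements = []
    for start, end, label in mine:
        ref_label = ref_by_span.get((start, end))
        if ref_label is None:
            continue  # no annotation with the same boundaries in the reference
        matched += 1
        if label != ref_label:
            disagreements.append((start, end, label, ref_label))
    agreed = matched - len(disagreements)
    rate = agreed / matched if matched else 0.0
    return rate, disagreements


# Made-up annotations with placeholder labels.
reference = [(0.5, 1.0, "R"), (1.5, 2.0, "L"), (2.5, 3.0, "B"), (3.5, 4.0, "R")]
mine = [(0.5, 1.0, "R"), (1.5, 2.0, "R"), (2.5, 3.0, "B"), (3.5, 4.0, "R")]

rate, differences = compare_labels(mine, reference)
print(f"agreement: {rate:.0%}")
for start, end, my_label, ref_label in differences:
    print(f"{start}-{end}: you labeled {my_label}, reference has {ref_label}")
```

The number is only a starting point. The list of disagreements is what to bring to the discussion with a more experienced labeler.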

Notes for Labeling Specific Tiers

Occluded Regions

The occlusion tier is for marking areas of occlusion so that we can ignore them when labeling for gestures. For example, the video may have an image or text box that partially or completely covers the speaker. Sometimes when hands are partially occluded we can still see or surmise what is going on.

  • Cases that are not marked as occlusion
    • In places where you can only see one hand actively gesturing, do not label it as occlusion.
    • You can mark it as occluded if the hands are out of the frame for a long time. If they are just going out of the frame for a short time, you don't need to mark it as occluded.
    • When the hands are behind the speaker's back, you also don't need to mark that as occluded.
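Once an occlusion tier exists, ignoring occluded stretches during analysis amounts to an interval-overlap check. The sketch below assumes annotations are (start, end, label) tuples and treats any overlap with an occluded region as grounds for exclusion. That rule is an assumption for illustration; partially occluded gestures may still be usable, as noted above.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if the two time intervals share any stretch of time."""
    return a_start < b_end and b_start < a_end


def outside_occlusion(annotations, occluded):
    """Keep only annotations that do not overlap any occluded region."""
    return [
        (start, end, label)
        for start, end, label in annotations
        if not any(overlaps(start, end, o_start, o_end)
                   for o_start, o_end, _ in occluded)
    ]


# Made-up times and labels, for illustration only.
strokes = [(0.5, 1.0, "stroke"), (2.0, 2.4, "stroke"), (5.0, 5.5, "stroke")]
occluded = [(1.8, 3.0, "occluded")]
print(outside_occlusion(strokes, occluded))
```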

Labeling Phases

Refer to the Coding Manual on labeling phases. Depending on the current hypotheses we are exploring, phases may be labeled in a single tier or on separate tiers.

Create the SDG Labels

Stroke-defined groupings (SDGs) contain a stroke and its accompanying phases (preparation, pre-stroke hold, stroke, post-stroke hold, recovery, relaxed). Note that the relaxed phase is just the pause between gestures and is grouped within SDGs for data analysis.

  1. Create a new tier and label it SDG.
  2. Double-click on the tier name in ELAN to make it the "active tier."
  3. Select and copy (Command+C) the first phase of a set of phases around a single stroke. It would be either a preparation phase or stroke.
  4. Paste (Command+V) the annotation, and it will show up in the active tier.
  5. Select and paste the last phase into the active tier.
  6. Right-click on the pasted last phase to bring up the option to merge the annotation with the previous annotation.
  7. Do this for the entire sample.
  8. Once you're done, use ELAN's automatic numbering feature to give each one of them an ID we can refer to. Go to Tier > Label and number annotations… and select options so you get an output label that looks like "SDG-123." Check to make sure you have the correct tier selected before you make the changes.
  • Tip

    To make the merging step faster, you can set up a custom keyboard shortcut in ELAN to do this.
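The copy, paste, and merge steps above can be pictured as the following grouping logic. This is only a sketch, not a replacement for doing the steps in ELAN. It assumes a time-ordered list of (start, end, label) phase annotations, uses phase label strings that are placeholders for whatever the Coding Manual specifies, and starts a new group at every preparation, or at a stroke when the current group already has one.

```python
def build_sdgs(phases):
    """Merge phase annotations into numbered SDG spans."""
    groups = []     # each group is [start, end, has_stroke]
    current = None
    for start, end, label in phases:
        starts_new = (
            current is None
            or label == "preparation"
            or (label == "stroke" and current[2])
        )
        if starts_new:
            current = [start, end, False]
            groups.append(current)
        else:
            current[1] = end  # extend the current group to cover this phase
        if label == "stroke":
            current[2] = True
    # Only groups that contain a stroke count as SDGs.
    with_stroke = [g for g in groups if g[2]]
    return [(s, e, f"SDG-{i}") for i, (s, e, _) in enumerate(with_stroke, start=1)]


# Made-up phase annotations, for illustration only.
PHASES = [
    (0.0, 0.4, "preparation"),
    (0.4, 0.9, "stroke"),
    (0.9, 1.2, "recovery"),
    (1.2, 1.5, "relaxed"),
    (1.5, 1.8, "stroke"),
    (1.8, 2.0, "post-stroke hold"),
    (2.0, 2.3, "preparation"),
    (2.3, 2.6, "stroke"),
]

for sdg in build_sdgs(PHASES):
    print(sdg)
```

Unusual sequences, such as a pre-stroke hold with no preparation before it, would need a different rule, so any automatic grouping like this should still be checked against the video.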

How to Copy the SDG tier

Some tiers like handedness and trajectory shape are features of the strokes and labeled based on the SDG tier labels. Instead of creating new annotations, just copy the SDG tier and remove the annotation values. This gives you blank annotations to label for those features.

  1. Copy the SDG tier (ELAN: Tier> Copy Tier). Select the SDG tier. You don't need to check any checkboxes. Just click on the next button until you get to the finish button.
  2. Rename your copied tier (Right click on the tier name, "Change attributes of…") and change the tier name.
  3. Clear the annotations for your copied tier (ELAN: Tier> Remove Annotations or Values > deselect any selected tiers and select ONLY the copied tier. Make sure you select "Annotation Values" and not "Annotations" and choose "All annotations").

Handedness Labeling

Note that "left hand" and "right hand" refer to the speaker's perspective.

  1. Copy the SDG tier, and remove the annotation values.
  2. Label for handedness. Refer to the Coding Manual for how to label handedness.
  3. Discuss any difficulties in the next meeting.
  • Tip

    Put sticky notes on the top corners of your computer screen to remind you which side is the speaker's left and which side is right.

Trajectory Shape Labeling

  1. Find a short 30-second sample that already has trajectory shape labeled.
  2. If you can only find the trajectory shape labels as a Praat textgrid file, import the textgrid into ELAN.
  3. Hide the pre-labeled trajectory shape tier.
  4. Create your own tier and label for trajectory shape.
  5. Un-hide the pre-labeled tier and compare.
  6. How did you do? What do you notice?
  7. Take note of any disagreements, difficulties, and thoughts. They will be discussed in the next meeting.
  • Note

    Trajectory shape is labeled using the duration of the SDG, but refers to the trajectory shape of the stroke. This can be difficult since the preparation phase lends more curvature to the stroke, and does influence the trajectory shape label. Ideally we want to focus on the trajectory shape of the stroke. (After our meeting, we will ask you to label trajectory shape for the Louvre sample.)