Construction

Collection Protocol and Procedure

Subjects:

44 subjects (22 females and 20 males), with ages ranging from 19 to 30.

All subjects major in Drama at the National Taiwan University of Arts (NTUA).

Dyadic groups: 7 female-female pairs, 10 female-male pairs, and 5 male-male pairs.

Environment:

All scenes were performed in a student performance room.

The imagined stage setting was a real-life home.

6 affective atmospheres: anger, sadness, happiness, frustration, neutral, and surprise.

Procedure:

Before the actual recording, the director checked the acting and gave advice to the actors.

Each session lasted about 3 minutes.

Actors were asked to perform each scene as a spontaneous interaction.

Multimodal Recording Setup

Annotation

For every actor in each session, we have 44 peer-reports, 1 director-report, 1 self-report, and 4 observer-reports (a total of 49 unique raters). This large pool of raters, with different backgrounds and viewpoints, is another novel feature of this database.

Perspectives: Peer, Director, Self, Naïve-Observer
Continuous-in-time annotation
1. Annotators rate the activation and valence (one at a time) of each actor for every session, at a rate of one frame per second.
2. The value of annotations ranged from -1 to

Continuous annotations were collected with the Feeltrace software.
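As a rough illustration of the one-frame-per-second sampling described above, the sketch below linearly interpolates an irregularly timed rating trace onto a regular 1-second grid. The input layout (parallel lists of timestamps and values) and the function name `resample_to_frames` are assumptions made for this example; they are not the Feeltrace export format.

```python
def resample_to_frames(times_s, values, frame_period_s=1.0):
    """Linearly interpolate a rating trace onto a regular frame grid.

    times_s must be strictly increasing. Frames start at times_s[0] and
    are spaced frame_period_s apart (1 second, as in the annotation setup).
    """
    if not times_s or len(times_s) != len(values):
        raise ValueError("times_s and values must be non-empty and equal length")

    start, end = times_s[0], times_s[-1]
    # The small epsilon guards against floating-point drift at the last frame.
    n_frames = int((end - start) / frame_period_s + 1e-9) + 1

    frames = []
    i = 0  # index of the raw sample at or before the current frame time
    for k in range(n_frames):
        t = start + k * frame_period_s
        while i + 1 < len(times_s) and times_s[i + 1] <= t:
            i += 1
        if i + 1 == len(times_s):
            frames.append(values[i])  # frame coincides with the final sample
        else:
            t0, t1 = times_s[i], times_s[i + 1]
            weight = (t - t0) / (t1 - t0)
            frames.append(values[i] + weight * (values[i + 1] - values[i]))
    return frames


# A trace spanning a 3-minute session yields 181 frames (0 s through 180 s).
print(len(resample_to_frames([0.0, 180.0], [0.0, 0.5])))
```

Linear interpolation is only one possible choice here; a hold-last-value scheme would also give one value per second.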

Discrete session-level annotation
1. Dimensional (Valence-Activation ranging from 1 to 5)
2. Categorical (six categories: angry, happy, sad, neutral, frustration, surprise)
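One possible way to represent a single session-level rating is sketched below, with the 1 to 5 range and the six category names taken from the list above. The class `SessionRating`, its field names, and the `summarize` helper (mean scores plus most frequent category) are hypothetical and are not part of the corpus distribution.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean

# The six category labels listed above.
CATEGORIES = ("angry", "happy", "sad", "neutral", "frustration", "surprise")


@dataclass(frozen=True)
class SessionRating:
    """One rater's session-level judgment of one actor (illustrative layout)."""
    rater_id: str
    perspective: str  # e.g. "peer", "director", "self", "observer"
    valence: int      # 1 to 5
    activation: int   # 1 to 5
    category: str     # one of CATEGORIES

    def __post_init__(self):
        for name in ("valence", "activation"):
            score = getattr(self, name)
            if not 1 <= score <= 5:
                raise ValueError(f"{name} must be in 1..5, got {score}")
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")


def summarize(ratings):
    """Mean valence/activation and the most frequent category (an assumed scheme)."""
    top_category, _ = Counter(r.category for r in ratings).most_common(1)[0]
    return {
        "valence": mean(r.valence for r in ratings),
        "activation": mean(r.activation for r in ratings),
        "category": top_category,
    }
```

Averaging and majority voting are shown only as a simple baseline; other ways of combining raters from different perspectives may be more appropriate.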

Participants

The participants were recruited from the Department of Drama at the National Taiwan University of Arts (NTUA).
There were a total of 44 subjects (22 females, 20 males), with ages ranging from 19 to 30.
The 44 subjects were paired into dyadic groups (seven female-female, ten female-male, and five male-male pairs).
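The pairing arithmetic can be checked with a short sketch. The pair counts come from the text above; the names `PAIR_COUNTS`, `total_pairs`, and `total_subjects` are illustrative only.

```python
# Pair counts as reported above; the variable name is illustrative.
PAIR_COUNTS = {
    "female-female": 7,
    "female-male": 10,
    "male-male": 5,
}


def total_pairs(pair_counts):
    """Number of dyadic groups."""
    return sum(pair_counts.values())


def total_subjects(pair_counts):
    """Each dyadic group contains exactly two subjects."""
    return 2 * total_pairs(pair_counts)


print(total_pairs(PAIR_COUNTS), total_subjects(PAIR_COUNTS))
```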

Dual-channel Audio

Each actor in the dyadic scene wore a Bluetooth wireless close-up microphone (CAROL BTM-210C).
Wireless microphones were used to avoid interfering with the actors' natural behaviors.
The two wireless audio signal streams were transmitted to a multi-channel digital recorder (ROLAND BR-800) for dual-channel time synchronization.
The recorder was set to sample at 44.1 kHz with 24-bit A/D conversion.
All audio files (two per session, each lasting approximately 3 minutes) were manually segmented into spoken utterances (6701 utterances in total).
Each utterance was marked as speech, laugh, sigh, sobbing, or audience background noise, to enable further studies of the role of non-verbal vocalizations in affective interactions.
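To make the segmentation step concrete, the sketch below cuts one utterance out of a WAV file given its start and end times, using Python's standard `wave` module. Storing segment boundaries in seconds is an assumption of this example, as are the function names; the corpus's own segmentation format is not described here. The `make_silent_wav` helper only builds stand-in data with the 44.1 kHz, 24-bit settings stated above.

```python
import io
import wave

SAMPLE_RATE_HZ = 44_100   # recorder sampling rate stated above
SAMPLE_WIDTH_BYTES = 3    # 24-bit samples


def extract_utterance(wav_file, start_s, end_s):
    """Return the raw PCM bytes of the [start_s, end_s) span of a WAV file.

    wav_file may be a path or a file-like object. Boundaries past the end
    of the recording are clamped to the last frame.
    """
    with wave.open(wav_file, "rb") as wav:
        rate = wav.getframerate()
        n_frames = wav.getnframes()
        start_frame = min(int(round(start_s * rate)), n_frames)
        end_frame = min(int(round(end_s * rate)), n_frames)
        wav.setpos(start_frame)
        return wav.readframes(max(0, end_frame - start_frame))


def make_silent_wav(duration_s):
    """Build an in-memory mono WAV of silence (stand-in data for the example)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(bytes(SAMPLE_WIDTH_BYTES * int(SAMPLE_RATE_HZ * duration_s)))
    buffer.seek(0)
    return buffer


# Half a second of 24-bit mono audio: 22050 frames of 3 bytes each.
clip = extract_utterance(make_silent_wav(1.0), 0.25, 0.75)
print(len(clip))
```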

HD Video

Each interaction session was recorded using a high-definition camcorder (SONY HDR-PJ790V).
The raw video was recorded at 1920 x 1080, 60p, and 28 Mbps.
The video camera was placed in front of the acting area at a fixed position that captured both actors' movements in the scene.
The actors were free to move as naturally as possible, provided they stayed within the field of view of the HD camera.
These videos were also used for the annotation of emotion.
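For a sense of scale, the stated bitrate and the roughly 3-minute session length give a back-of-the-envelope size for each raw recording. This sketch assumes the 28 Mbps figure is the average stream bitrate and ignores container overhead; the function name is illustrative.

```python
BITRATE_MBPS = 28           # from the camcorder specification above
SESSION_LENGTH_S = 3 * 60   # sessions last about 3 minutes


def estimated_size_megabytes(bitrate_mbps, duration_s):
    """Approximate stream size in megabytes (10**6 bytes)."""
    bits = bitrate_mbps * 1_000_000 * duration_s
    return bits / 8 / 1_000_000


# Roughly 630 MB per 3-minute session under these assumptions.
print(estimated_size_megabytes(BITRATE_MBPS, SESSION_LENGTH_S))
```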

Electrocardiogram (ECG)

In every session, each actor wore a multichannel physiological integrated circuit from Texas Instruments (ADS1292R) as the analog front-end circuit (a low-power integrated circuit with a 24-bit analog-to-digital converter, sampling at 250 Hz).
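Because the ECG is sampled at 250 Hz while the continuous annotations use 1-second frames, each annotation frame spans 250 ECG samples. The sketch below groups a sample sequence into such per-frame windows. It assumes the ECG stream and the annotation trace start at the same instant, which is not stated above, and the function name `ecg_windows` is hypothetical.

```python
ECG_SAMPLE_RATE_HZ = 250   # from the ECG front-end description above
ANNOTATION_FRAME_S = 1     # continuous annotations use 1-second frames


def ecg_windows(samples, sample_rate_hz=ECG_SAMPLE_RATE_HZ,
                frame_s=ANNOTATION_FRAME_S):
    """Split ECG samples into consecutive windows, one per annotation frame.

    A trailing partial window (fewer samples than a full frame) is dropped.
    """
    per_frame = sample_rate_hz * frame_s
    n_frames = len(samples) // per_frame
    return [samples[k * per_frame:(k + 1) * per_frame] for k in range(n_frames)]


# A 3-minute recording (45000 samples) yields 180 windows of 250 samples.
windows = ecg_windows(list(range(45_000)))
print(len(windows), len(windows[0]))
```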

Manual Transcript

Transcripts were completed manually for all sessions.

NNIME

The NTHU-NTUA Chinese Interactive Multimodal Emotion Corpus
