
This week, we started building our emotion-labeled sentence dataset, which now contains over 500 entries across our seven emotion classes and will serve as the basis for generating our custom multimodal testing data. We also developed our demonstration webapp, which now supports live emotion prediction and transcript recording, giving us a tangible way to showcase our technology to stakeholders and users.
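As a rough illustration of the sanity checks we run while assembling the sentence list, the Python sketch below loads a labeled CSV and verifies the entry count and label set. The file name (`emotion_sentences.csv`) and column names (`sentence`, `emotion`) are placeholders rather than our actual schema.

```python
import csv
from collections import Counter

def load_sentences(path="emotion_sentences.csv"):
    """Load the labeled-sentence file and sanity-check the label set.

    Assumes a CSV with "sentence" and "emotion" columns; both the file
    name and the column names are placeholders, not the real schema.
    """
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    label_counts = Counter(row["emotion"] for row in rows)
    assert len(label_counts) == 7, f"expected 7 emotion classes, got {len(label_counts)}"
    assert len(rows) >= 500, f"expected 500+ entries, got {len(rows)}"
    return rows, label_counts

if __name__ == "__main__":
    rows, label_counts = load_sentences()
    print(f"{len(rows)} sentences across {len(label_counts)} classes: {dict(label_counts)}")
```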
Our audio team implemented weighted loss for both Wav2Vec and Whisper fine-tuning to counter class imbalance in the training data. Meanwhile, the visual team expanded our dataset resources with two new datasets, Emo135 and ExpW-Cleaned, tested EmotionCLIP on the Affectnet-YOLO dataset, and validated video functionality with the MOSEI dataset. Our late fusion system was also tested on CMU-MOSEI using all of our best-performing models.
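For context, the sketch below shows one common way to build such a weighted criterion in PyTorch, with per-class weights inversely proportional to class frequency. The helper name and the toy label distribution are illustrative; our actual fine-tuning code may compute the weights differently.

```python
import torch
import torch.nn as nn
from collections import Counter

def make_weighted_loss(train_labels, num_classes=7):
    """Cross-entropy whose class weights are inversely proportional
    to each class's frequency in the training split."""
    counts = Counter(train_labels)
    weights = torch.tensor(
        [len(train_labels) / (num_classes * counts.get(c, 1)) for c in range(num_classes)],
        dtype=torch.float,
    )
    return nn.CrossEntropyLoss(weight=weights)

# Toy, deliberately imbalanced labels over seven emotion classes.
labels = [0] * 500 + [1] * 60 + [2] * 50 + [3] * 40 + [4] * 30 + [5] * 20 + [6] * 10
criterion = make_weighted_loss(labels)

logits = torch.randn(8, 7)            # a batch of classifier outputs
targets = torch.randint(0, 7, (8,))   # ground-truth emotion indices
print(criterion(logits, targets))
```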
For the coming week, we’ll focus on collecting recordings of sentences from our dataset so we can finalize our multimodal testing data. The transcript team will complete BART fine-tuning and compare its performance against our other models. Our fusion efforts will concentrate on implementing weighted loss functions, while the audio team will build an LSTM model over COVAREP features and survey state-of-the-art emotion classification models. The visual team will evaluate current model performance on the new datasets and prepare the ExpW dataset for training. We’ll also enhance the demo by adding per-modality prediction details and pitch and volume markers.
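As a first pass at the planned pitch and volume markers, the sketch below extracts a frame-level pitch track (pYIN) and an RMS volume track with librosa. The function name, the pYIN/RMS choices, and the toy tone are assumptions for illustration, not a settled design for the demo.

```python
import numpy as np
import librosa

def pitch_and_volume(y, sr, hop_length=512):
    """Return frame times, a pitch (f0) track via pYIN, and an RMS volume track."""
    f0, voiced_flag, _ = librosa.pyin(
        y,
        fmin=librosa.note_to_hz("C2"),
        fmax=librosa.note_to_hz("C7"),
        sr=sr,
        hop_length=hop_length,
    )
    rms = librosa.feature.rms(y=y, hop_length=hop_length)[0]
    times = librosa.times_like(rms, sr=sr, hop_length=hop_length)
    return times, f0, rms

# Toy signal: a 2-second 220 Hz tone standing in for a recorded utterance.
sr = 16000
y = 0.1 * np.sin(2 * np.pi * 220 * np.arange(sr * 2) / sr).astype(np.float32)
times, f0, rms = pitch_and_volume(y, sr)
print(times.shape, np.nanmedian(f0), rms.mean())
```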