Week 11 – Prototype Refinement and Dataset Expansion

Team Noesys discussing potential improvements to the audio model

Our team has been busy this week preparing for our Project Implementation Design (PID) presentation. The text modality team created a CSV of over 1,000 sentences labeled across seven emotion classes, giving us more data for both training and testing. We fine-tuned our BART model to work more effectively across different datasets. We’ve also continued iterating on our demo webapp, improving its live emotion prediction and transcript recording features.

The audio team implemented weighted loss and balanced sampling techniques for COVAREP LSTM and wav2vec2 models, training on the MOSEI dataset. Our visual team obtained ResEmoteNet, a new open-source model with pre-trained weights, and developed a combined dataset from our previous resources.
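For readers unfamiliar with weighted loss and balanced sampling, here is a minimal sketch of both ideas in plain Python. It is not our training code: the function names and the toy label list are hypothetical, and it assumes the common inverse-frequency weighting scheme, which may differ from the weights used with the COVAREP LSTM and wav2vec2 models.

```python
import math
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {class: weight} with weight = N / (K * count).

    N is the number of samples and K the number of classes, so rare
    classes receive larger weights (assumed scheme, for illustration).
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}


def weighted_cross_entropy(probs, labels, class_weights):
    """Weighted mean of -log p(true class).

    probs is a list of {class: probability} dicts, one per sample.
    The sum is normalized by the total weight of the samples.
    """
    total = sum(class_weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    norm = sum(class_weights[y] for y in labels)
    return total / norm


def balanced_sample(labels, size, rng):
    """Draw sample indices with replacement so each class is equally likely per draw."""
    by_class = inverse_frequency_weights(labels)
    per_sample = [by_class[y] for y in labels]
    return rng.choices(range(len(labels)), weights=per_sample, k=size)


# Toy, imbalanced label list (hypothetical data, not from MOSEI).
labels = ["happy"] * 8 + ["sad"] * 2
weights = inverse_frequency_weights(labels)

# A model that always predicts 50/50 gets a loss of log(2) regardless of weights.
uniform = [{"happy": 0.5, "sad": 0.5} for _ in labels]
loss = weighted_cross_entropy(uniform, labels, weights)

# With balanced sampling, roughly half of the drawn samples are "sad".
rng = random.Random(0)
drawn = balanced_sample(labels, 2000, rng)
sad_fraction = sum(labels[i] == "sad" for i in drawn) / len(drawn)
```

The two techniques attack class imbalance from different sides: weighted loss keeps the data as it is and makes mistakes on rare classes cost more, while balanced sampling changes how often each class appears in a batch.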

Next week, we’ll continue preparing for the PID presentation: finalizing our prototype, putting together the slides and demo video, and working on further model improvements. The text team plans to expand their sentence collection and combine it with MOSEI for training. Our fusion team will implement weighted loss in the intermediate fusion model, while the audio team will explore LSTM implementations from the CMU-MOSEI paper. The visual team will fine-tune and test new and existing models on our combined dataset. We’ll also improve our demo’s data flow and potentially add summary generation.
