Week 6 – Late Fusion Testing

Example data from CMU-MOSEI

This week our team integrated the text and bounding-box components into our late fusion model. We verified that the fused model's accuracy exceeds that of each individual modality, validating our multi-modal approach. Our audio team completed 1D CNN model evaluations and began testing wav2vec2 for improved performance. Meanwhile, the video team ran extensive cross-dataset evaluations: a CLIP model trained on AffectNet-YOLO was tested against CMU-MOSEI, and a DINOv2 model trained on CMU-MOSEI was tested against AffectNet-YOLO.
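As a rough illustration of the late fusion step, here is a minimal sketch that averages per-modality class probabilities and takes the argmax. The function name, uniform weighting, and toy numbers are our own assumptions for the example, not the project's actual implementation.

```python
import numpy as np

def late_fusion(prob_list, weights=None):
    """Fuse per-modality class probabilities by (weighted) averaging.

    prob_list: list of (n_samples, n_classes) probability arrays,
               one per modality (e.g. text, audio, video).
    weights:   optional per-modality weights; uniform if omitted.
    Returns the fused class prediction per sample.
    """
    probs = np.stack(prob_list)                    # (n_modalities, n, c)
    if weights is None:
        weights = np.ones(len(prob_list))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()              # normalize to sum to 1
    fused = np.tensordot(weights, probs, axes=1)   # (n, c) averaged probs
    return fused.argmax(axis=1)

# Toy example: the modalities disagree on sample 1; fusion resolves it.
text_probs  = np.array([[0.9, 0.1], [0.4, 0.6]])
video_probs = np.array([[0.8, 0.2], [0.7, 0.3]])
preds = late_fusion([text_probs, video_probs])     # -> array([0, 0])
```

Per-modality weights (e.g. favoring the stronger modality) are one knob this scheme leaves open for tuning.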

Next week, we'll focus on addressing class imbalance in our datasets through weighted loss functions, and we'll begin reporting results as macro F1 scores for a more representative performance metric. We're also working to incorporate training datasets that better reflect our intended use case. The audio team will continue fine-tuning and evaluating wav2vec2 while performing hyperparameter tuning on models trained on MELD. The video team plans to cross-compare CLIP and DINOv2 performance and identify another compatible commercial dataset for additional cross-evaluation. All teams will standardize their evaluation approach by training and testing models on identical datasets to ensure valid comparisons.
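To make the imbalance-handling plan concrete, here is a small sketch of the two pieces mentioned above: inverse-frequency class weights (which can be passed to a weighted loss) and a macro F1 computation. Both helpers are illustrative stand-ins, not the team's actual code; in practice a library routine such as scikit-learn's `f1_score(average="macro")` would serve the same purpose.

```python
import numpy as np

def class_weights(labels, n_classes):
    """Inverse-frequency weights: rare classes get proportionally more weight,
    suitable for passing to a weighted cross-entropy loss."""
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    return counts.sum() / (n_classes * counts)

def macro_f1(y_true, y_pred, n_classes):
    """Macro F1: unweighted mean of per-class F1, so every class counts
    equally regardless of how many samples it has."""
    f1s = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec  = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return float(np.mean(f1s))

# A majority-class predictor looks good on accuracy (0.75) but poor on
# macro F1, which is exactly why we are switching metrics.
y_true = np.array([0, 0, 0, 1])
y_pred = np.array([0, 0, 0, 0])
w = class_weights(y_true, 2)        # class 1 gets the larger weight
score = macro_f1(y_true, y_pred, 2)
```

The toy example shows the motivation: macro F1 drops to roughly 0.43 here because the minority class is never predicted, while plain accuracy would still read 0.75.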
