Blog Posts

Week 5 – Cross-Dataset Testing

Team Noesys obtaining valuable feedback from the team coach and liaison

Our audio team successfully migrated the model from TensorFlow to PyTorch and completed grid search optimization for the 1D CNN hyperparameters. The video team focused on CLIP development, working with the text encoder and improving zero-shot accuracy, while also running grid searches for optimal CLIP hyperparameters. We reached an important milestone by implementing late fusion with our current CLIP and 1D MFCC CNN models and evaluating their combined performance on the CMU-MOSEI test data.
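
For readers curious what late fusion looks like in practice, here is a minimal PyTorch sketch of one common scheme: a weighted average of each unimodal model's class probabilities. The class count, weighting, and function names are illustrative rather than our exact implementation.

```python
import torch
import torch.nn.functional as F

def late_fuse(clip_logits: torch.Tensor, audio_logits: torch.Tensor,
              clip_weight: float = 0.5) -> torch.Tensor:
    """Weighted average of per-class probabilities from the two unimodal models."""
    clip_probs = F.softmax(clip_logits, dim=-1)
    audio_probs = F.softmax(audio_logits, dim=-1)
    fused = clip_weight * clip_probs + (1.0 - clip_weight) * audio_probs
    return fused.argmax(dim=-1)  # predicted emotion class per sample

# Example with dummy logits for a batch of 4 clips over 6 emotion classes (placeholder numbers)
clip_logits = torch.randn(4, 6)
audio_logits = torch.randn(4, 6)
print(late_fuse(clip_logits, audio_logits))
```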

For the coming week, we’ll focus on comprehensive performance documentation, including detailed data samples from our training and testing sets. The audio team will showcase audio samples from each dataset, while the video team explores bounding box implementation, black-and-white image processing, and cross-dataset training. We’re expanding our focus on generalizability by conducting cross-dataset training and testing, incorporating multimodal datasets, and using training data that better represents our use case. Finally, we’ll integrate text and AU analysis components into our fusion model.

Week 4 – Preparing for Fusion

HiPerGator, UF’s supercomputer

This week, we successfully fine-tuned our CLIP model with an 80-20 train-test split, achieving 84% accuracy. Our audio team made progress by implementing grid search capabilities for the audio modality. We delivered our DFX presentation in class and expanded our dataset by annotating additional data for CLIP. We also integrated our best-performing models into the fusion system and successfully produced output.
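
For context, grid search here simply means exhaustively training and scoring the model over a small set of hyperparameter combinations and keeping the best one. A minimal sketch of the pattern is below; the search space and the training function are placeholders, not our actual values.

```python
from itertools import product

# Hypothetical search space for the audio CNN; values are illustrative only.
param_grid = {
    "lr": [1e-4, 3e-4, 1e-3],
    "kernel_size": [3, 5, 7],
    "dropout": [0.25, 0.5],
}

def train_and_evaluate(lr, kernel_size, dropout):
    """Placeholder: build the CNN with these settings, train it, return validation accuracy."""
    return 0.0

best_acc, best_params = -1.0, None
for lr, kernel_size, dropout in product(*param_grid.values()):
    acc = train_and_evaluate(lr, kernel_size, dropout)
    if acc > best_acc:
        best_acc = acc
        best_params = {"lr": lr, "kernel_size": kernel_size, "dropout": dropout}

print("Best configuration:", best_params, "accuracy:", best_acc)
```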

Next week, we’ll evaluate late fusion performance and conduct grid searches for the audio and CLIP models on HiPerGator. The audio and CLIP teams will incorporate preprocessing steps into the intermediate fusion model, followed by training and performance reporting. We’re also preparing comprehensive documentation slides detailing each model’s specifications, including input/output shapes, performance metrics, next steps, and dataset information.

Week 3 – QRB1

Team Noesys preparing for QRB1

This week, we presented our progress at QRB1 and received valuable feedback from various coaches. We shared detailed model performance metrics with our liaison, and we’ve achieved our target accuracy goals for audio, transcript, and action unit recognition as specified in our Technical Performance Measures. We have also implemented a joint-representation multimodal transformer architecture for the fusion model.
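
For those curious what “joint representation” means here: each modality’s features are projected into a shared token space, and a transformer attends across all of them at once before classification. Below is a minimal PyTorch sketch of that idea; the dimensions, layer counts, and names are illustrative rather than our exact architecture.

```python
import torch
import torch.nn as nn

class MultimodalFusionTransformer(nn.Module):
    """Joint-representation fusion: project each modality into a shared token
    space, concatenate the tokens, and let a transformer encoder attend across
    modalities before classification. Dimensions here are illustrative."""

    def __init__(self, audio_dim=128, video_dim=512, text_dim=768,
                 d_model=256, num_classes=6):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, d_model)
        self.video_proj = nn.Linear(video_dim, d_model)
        self.text_proj = nn.Linear(text_dim, d_model)
        encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                                   batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, audio_feats, video_feats, text_feats):
        # Each input: (batch, modality_seq_len, feature_dim)
        tokens = torch.cat([self.audio_proj(audio_feats),
                            self.video_proj(video_feats),
                            self.text_proj(text_feats)], dim=1)
        joint = self.encoder(tokens)   # cross-modal attention over all tokens
        pooled = joint.mean(dim=1)     # simple mean pooling over the sequence
        return self.classifier(pooled)

# Dummy forward pass with made-up sequence lengths and feature sizes
model = MultimodalFusionTransformer()
logits = model(torch.randn(2, 10, 128), torch.randn(2, 8, 512), torch.randn(2, 20, 768))
print(logits.shape)  # torch.Size([2, 6])
```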

For the upcoming week, we’ll focus on enhancing our CLIP model through image-text pair training and hyperparameter tuning. We’re moving forward with end-to-end testing of our fusion model using our highest-performing components. Additional tasks include training CNN and ConvLSTM models for audio emotion recognition, testing FG-Net accuracy for emotion-specific markers, and acquiring supplementary training data. Our project continues to progress on schedule as we move toward our integration goals.

Week 2 – Model Testing and Integration

EmotionCLIP architecture

Our team made significant progress in model development this week. We tested the accuracy of Recurrent Neural Networks and Support Vector Classification for the audio modality and set up EmotionCLIP for accuracy evaluation. We began setting up the FG-Net system for Facial Action Unit training and testing. We also established our intermediate-layer fusion code, which we tested using off-the-shelf ResNet and MobileNet models.
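
As a rough illustration of intermediate-layer fusion (not our production code), the sketch below concatenates penultimate-layer features from off-the-shelf ResNet and MobileNet backbones and classifies from the joint vector; the class count and pooling choices are placeholders.

```python
import torch
import torch.nn as nn
from torchvision import models

class IntermediateFusion(nn.Module):
    """Illustrative intermediate fusion: concatenate penultimate-layer features
    from two off-the-shelf backbones and classify from the joint vector."""

    def __init__(self, num_classes=6):
        super().__init__()
        resnet = models.resnet18(weights=None)  # weights=None keeps the sketch offline
        self.resnet_backbone = nn.Sequential(*list(resnet.children())[:-1])  # -> (B, 512, 1, 1)
        mobilenet = models.mobilenet_v2(weights=None)
        self.mobilenet_backbone = mobilenet.features                         # -> (B, 1280, H', W')
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Linear(512 + 1280, num_classes)

    def forward(self, x):
        f1 = torch.flatten(self.resnet_backbone(x), 1)
        f2 = torch.flatten(self.pool(self.mobilenet_backbone(x)), 1)
        return self.head(torch.cat([f1, f2], dim=1))

model = IntermediateFusion()
print(model(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2, 6])
```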

Next week, we’ll complete the initial testing of individual components like CLIP, EmotionCLIP, spectrogram CNN, and spectrogram ConvLSTM to ensure smooth integration into our fusion model. We’ll finalize our QRB1 presentation, share testing results with our coach and liaison, and integrate selected models into our fusion implementation. Our project continues to progress systematically toward our goals.

Week 1 – Spring Semester Kickoff

First Zoom meeting of the spring semester!

Our team hit the ground running in our first week of the spring semester. We established new meeting times to accommodate everyone’s schedules and developed our critical path for the semester ahead. After creating our January work breakdown schedule, we received valuable feedback from both our coach and liaison. We’ve organized into specialized teams to focus on fine-tuning individual modalities, setting the stage for our upcoming development sprint.

Looking ahead to next week, we’ll begin development of our fusion model while identifying and fine-tuning pretrained models for each modality. The project remains on schedule.

Week 15 – SLDR

Team Noesys after SLDR

This week, we successfully delivered our final SLDR presentation in front of team INSIGHT, faculty coaches, and mentors. We gathered valuable feedback from our presentation that we hope to implement in our product. Looking ahead to winter break, we hope to make progress on our subsystems so that we can seamlessly combine them beginning in the spring semester. We look forward to continuing the development process and hope to deliver a high-quality product by the end of the spring.

Week 13 – SLDR Progress and Model Refinements

System mockup presented at SLDR Peer Review on 11/19

We focused on advancing multiple aspects of our project while preparing for our System Level Design Review (SLDR) this week. We presented our initial SLDR presentation to our peers and gathered valuable feedback. Based on our evolving understanding of system interactions, we updated our Project Architecture diagram to better reflect our current design. We also completed and submitted our SLDR report draft for review. Meanwhile, we continued implementing improvements to our models based on the suggestions received during our PID presentation.

For the coming week, we’ll be enhancing our SLDR report and revising our presentation based on peer feedback. We also plan to compute and evaluate additional metrics to determine our system’s performance. Our technical work continues with ongoing model improvements and implementation of PID feedback.

Week 12 – PID Feedback and SLDR Preparation

Team Noesys at Prototype Inspection Day last Tuesday

The team reached a significant milestone with our Prototype Inspection Day presentation, where we demonstrated our current prototype to three pairs of judges. We received many helpful comments and suggestions that we plan to implement in our product and put into practice for SLDR. Following the presentation, we carefully analyzed the feedback and discussed potential implementations with our liaison. Our technical progress continued as we merged our CLIP and heart rate models into a single script and finalized our audio analysis prototypes.

Moving forward, we’ll focus on incorporating the feedback received during PID while continuing to develop our SLDR draft. We’re preparing for our upcoming SLDR peer review session and working on implementing the specific model improvements suggested by the judges. Our project remains on track and progressing as planned.

Week 11 – PID Preparation

This week, we finished the development of our unimodal model prototypes and began testing to ensure everything functions properly for our upcoming demo. We enhanced our CLIP model’s accuracy through fine-tuning with additional labels. Our work on audio analysis continued with further development of audio marker detection and spectrogram analysis code. We also implemented emotion detection capabilities for transcripts using LLMs.
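
The transcript component is conceptually simple: prompt an LLM to label each utterance with a single emotion. A minimal sketch using the OpenAI Python client is below; the model name, emotion list, and prompt are placeholders rather than our tuned prompt.

```python
from openai import OpenAI  # assumes the `openai` package is installed and an API key is configured

client = OpenAI()

EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

def classify_transcript(transcript: str) -> str:
    """Ask the LLM to pick the single best-fitting emotion label for a transcript."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system",
             "content": f"Classify the speaker's emotion. Reply with one word from: {', '.join(EMOTIONS)}."},
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content.strip().lower()

print(classify_transcript("I can't believe we finally got the demo working!"))
```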

Looking ahead to next week, our focus shifts to preparing for our PID presentation and demo. We’ll be coordinating the setup, making final adjustments to our unimodal models, and starting work on our SLDR draft.

Week 10 – Model Development Progress

This week, we collected datasets for individual modalities to create our unimodal model architectures. Our team gained access to our shared GitHub repository, where we’ve begun uploading our code and data processing methods to integrate our various components. We advanced our audio analysis capabilities, developing code for audio marker detection and spectrogram analysis. Additionally, we implemented a system that uses Large Language Models to determine emotions from transcripts.
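
For the spectrogram side, the idea is to turn raw audio into a 2D time-frequency representation that a CNN can process like an image. A minimal sketch with librosa (the file path and parameters are illustrative):

```python
import librosa
import numpy as np

# Load an audio clip (path is illustrative) and compute a log-mel spectrogram,
# the kind of 2D representation a spectrogram CNN can consume.
y, sr = librosa.load("clip.wav", sr=16000)
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
log_mel = librosa.power_to_db(mel, ref=np.max)
print(log_mel.shape)  # (n_mels, time_frames)
```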

We have focused our development efforts on creating prototypes that we aim to present for Prototype Inspection Day on November 12. For example, we set up facial action unit detection using the OpenFace library:

Facial AU detection with OpenFace
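
A minimal sketch of how OpenFace’s FeatureExtraction tool can be driven from Python, assuming a local OpenFace build is on the PATH; the video path, flags shown, and output filename are illustrative and depend on the local installation.

```python
import subprocess
import pandas as pd

# Run OpenFace's FeatureExtraction binary on a video and read back the AU CSV.
video = "sample.mp4"
subprocess.run(["FeatureExtraction", "-f", video, "-aus", "-out_dir", "processed"],
               check=True)

# OpenFace typically names the output CSV after the input file.
aus = pd.read_csv("processed/sample.csv")
au_intensities = aus[[c for c in aus.columns if c.strip().endswith("_r")]]  # AU intensity columns
print(au_intensities.head())
```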

We also completed our prototype of real-time emotion detection using CLIP:

Real-time emotion detection with CLIP
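
The rough idea behind the real-time demo: grab webcam frames, score each one against a set of emotion text prompts using CLIP’s zero-shot matching, and overlay the best match. A minimal sketch follows; the prompts, model variant, and overlay details are illustrative, not our exact demo code.

```python
import cv2
import torch
import clip  # assumes the openai/CLIP package is installed
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Placeholder emotion prompts for zero-shot matching
emotions = ["an angry face", "a disgusted face", "a fearful face",
            "a happy face", "a sad face", "a surprised face"]
text_tokens = clip.tokenize(emotions).to(device)

cap = cv2.VideoCapture(0)  # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    image = preprocess(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))).unsqueeze(0).to(device)
    with torch.no_grad():
        logits_per_image, _ = model(image, text_tokens)
        probs = logits_per_image.softmax(dim=-1)
    label = emotions[probs.argmax().item()]
    cv2.putText(frame, label, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow("CLIP emotions", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```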

For the upcoming week, we’ll be finalizing our prototype plans for Prototype Inspection Day while continuing the development of our unimodal models. We plan to set up API key access for the team and enhance our transcript analysis through improved prompt engineering.