
From Strategy to Practice: Key Technical Guidance from This Week’s Meeting
This week, our team met with our sponsor liaison engineers. The meeting reinforced our project objectives and gave us specific technical advice on data processing and model development.
Fine-Tuning Our Technical Strategy & New Resources
We reaffirmed our two core missions: Fiducial Landmark Detection and Arrhythmia Classification. More importantly, we received several key implementation details during the meeting:
Access to Internal Data: An exciting update is that Aventusoft will provide us with an internal dataset of “normal” ECGs from company employees. This will allow us to get familiar with their device’s signal characteristics early on, preparing us for future model transfer.
Fiducial Landmark Detection as a Regression Problem: Dr. Hoyos clarified that detecting PQRST points should be treated as a regression problem rather than a classification problem, with the goal of predicting the precise sample location of each waveform peak. Aventusoft’s standard approach is to first detect the R-peak, use it to center and extract a single heartbeat, and then train a multi-output regressor to predict the locations of the other points relative to the R-peak.
Strategy for Using Multi-Lead Data: For small datasets like LUDB (with only 200 recordings), we should not use all 12 leads to augment the data. The morphology of some leads (e.g., an inverted R-peak) differs too much from our target device’s signal, and including them could confuse the model. The guidance is to select the three or four leads that are most morphologically similar to Lead I and Lead II.
The Necessity of Data Augmentation: For small datasets, data augmentation is crucial for success. We were encouraged to research methods like adding noise or applying time-stretching/shrinking to the signal, while ensuring that the annotation positions are transformed accordingly.
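To make the beat-centering and augmentation guidance above concrete, here is a minimal sketch of how it might look with NumPy. It is not Aventusoft’s implementation. The function names, the window size, and the stretch factor are all our own assumptions for illustration. The idea is to cut a window centered on the R-peak, express the other landmarks as offsets from the R-peak (the regression targets), and scale those offsets whenever the beat is stretched or shrunk in time.

```python
import numpy as np

def extract_beat(signal, r_peak, half_window):
    """Cut a window of 2*half_window+1 samples centred on the R-peak.

    Returns None when the window would run off either end of the record.
    (half_window is a hypothetical parameter; pick it from the sampling rate.)
    """
    start, stop = r_peak - half_window, r_peak + half_window + 1
    if start < 0 or stop > len(signal):
        return None
    return signal[start:stop]

def relative_targets(landmarks, r_peak):
    """Express landmark sample indices as offsets from the R-peak."""
    return np.asarray(landmarks) - r_peak

def time_stretch(beat, offsets, factor):
    """Stretch (factor > 1) or shrink (factor < 1) a beat about its centre.

    The output keeps the original length. Each output sample is read from
    the original beat by linear interpolation, so a landmark that sat at
    offset d from the centre ends up at offset d * factor.
    """
    n = len(beat)
    centre = n // 2
    # Position in the original beat that each output sample draws from.
    src = centre + (np.arange(n) - centre) / factor
    # np.interp holds the edge values when src falls outside the window.
    stretched = np.interp(src, np.arange(n), beat)
    return stretched, np.asarray(offsets) * factor

def add_noise(beat, sigma, rng):
    """Add Gaussian noise; landmark offsets are unaffected by this step."""
    return beat + rng.normal(0.0, sigma, size=beat.shape)
```

After stretching, any landmark whose scaled offset falls outside the window should be dropped or the beat discarded, and the scaled offsets should be rounded only if integer sample positions are required.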
Next Steps: A Data Deep Dive with New Guidance
With this clearer technical focus, our data exploration next week will be more targeted. As each team member examines their assigned dataset, we will go beyond the usual exploratory data analysis and identify disease labels that are shared across datasets, such as LBBB, to assess whether the databases could later be merged into a larger training set. Before starting model development from scratch, we will also conduct a thorough literature review, looking for existing studies that have already worked with these datasets so we can build on established methods. When analyzing 12-lead ECG data, we will limit our focus to Lead I, Lead II, and other leads with similar morphology, to better replicate the real-world conditions in which our final product, a wearable device, will operate.
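The shared-label check described above could start from something as simple as the sketch below. The dataset names and label lists in the usage example are placeholders, not the actual contents of any database, and `shared_labels` is a hypothetical helper of our own. It also assumes label codes only need trimming and upper-casing to match, while real datasets may use different coding schemes that need a manual mapping first.

```python
from collections import Counter

def shared_labels(label_sets, min_datasets=2):
    """Return labels that appear in at least min_datasets datasets.

    label_sets maps a dataset name to an iterable of its label codes.
    Codes are normalised by stripping whitespace and upper-casing.
    """
    counts = Counter()
    for labels in label_sets.values():
        # Use a set so each dataset counts a label at most once.
        counts.update({label.strip().upper() for label in labels})
    return sorted(label for label, n in counts.items() if n >= min_datasets)

# Placeholder inputs for illustration only.
example = {
    "dataset_a": ["LBBB", "AFIB"],
    "dataset_b": ["lbbb ", "RBBB"],
    "dataset_c": ["RBBB", "LBBB"],
}
print(shared_labels(example))
```

Raising `min_datasets` narrows the result to labels common to more of the datasets, which is a quick way to see which classes would have the broadest support in a merged training set.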
This week’s meeting was a key step in moving us from high-level goals to concrete implementation details. We are excited to move forward with this valuable guidance and begin building the foundational components of our analysis pipeline.
See you next week!