Blog Posts

Week 7

Gearing Up for the Sponsor Review: Insights from Data and Peer Feedback

This week was a critical period of refinement and preparation for our team. We focused on two parallel efforts: conducting our first deep dive into the public ECG datasets and, just as importantly, processing the valuable feedback from our in-class PDR peer review session. This combination of hands-on data exploration and constructive criticism has significantly sharpened our focus as we prepare for our formal presentation to our sponsor, Aventusoft.

Key Accomplishments This Week

Our progress this week was driven by both technical analysis and strategic refinement. The main highlights include:

Coordinated Exploratory Data Analysis (EDA): Our team completed a coordinated first-pass analysis of five key public datasets: PTB-XL, MIT-BIH, the CPSC 2020 Challenge dataset, LUDB, and the Chapman-Shaoxing dataset. This effort allowed us to visualize the signals, confirm technical specifications, and identify available labels for our target conditions.
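
For a flavor of what that first pass looked like, here is a minimal sketch using the wfdb package to load and plot a PhysioNet-style record; the record path and name are illustrative and depend on where each dataset is downloaded:

```python
# Minimal sketch of our first-pass dataset check using the wfdb package.
# The record path/name below is illustrative, not a fixed location.
import wfdb
import matplotlib.pyplot as plt

record = wfdb.rdrecord("mit-bih-arrhythmia-database-1.0.0/100")  # example MIT-BIH record
print(record.fs, record.sig_name, record.p_signal.shape)  # sampling rate, lead names, shape

# Plot the first 10 seconds of the first channel to eyeball signal quality.
sig = record.p_signal[: 10 * record.fs, 0]
plt.plot(sig)
plt.title("First 10 s of an example record, channel 0")
plt.xlabel("Sample")
plt.ylabel("Amplitude (mV)")
plt.show()
```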

Uncovering a Universal Challenge: Class Imbalance: A crucial finding from our EDA is the presence of severe class imbalance across all datasets. This insight is vital, as it confirms that addressing this imbalance through techniques like class weighting or resampling will be a core part of our modeling strategy.
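
As a rough illustration of the class-weighting idea (not our final recipe), inverse-frequency weights can be computed from the label counts and passed directly to the loss:

```python
# Hedged sketch of inverse-frequency class weighting; the toy labels stand in
# for whatever per-record labels we settle on.
import numpy as np
import torch
import torch.nn as nn

labels = np.array([0, 0, 0, 0, 0, 0, 1, 0, 2, 0])  # toy example: class 0 dominates
counts = np.bincount(labels)
weights = counts.sum() / (len(counts) * counts)     # rarer classes get larger weights

criterion = nn.CrossEntropyLoss(weight=torch.tensor(weights, dtype=torch.float32))
```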

Incorporating Peer Review Feedback: After presenting our draft PDR in a peer review session this week, we have been actively incorporating the constructive feedback received. Our action plan focuses on improving our presentation pacing and refining the clarity of our technical explanations to ensure our message is clear and impactful.

Streamlining Collaboration: To enhance our workflow, we have successfully established a team GitHub repository. This provides a central platform for code sharing and version control as we move into the implementation phase.

Next Steps: The Formal Sponsor PDR

All of our efforts this week are geared towards one major goal: the formal Preliminary Design Review (PDR) with our sponsor, Aventusoft, scheduled for this coming Tuesday, October 14th. Our immediate tasks are to apply the peer feedback to finalize both the presentation slides and the formal PDR report. Immediately following a successful PDR, we will begin the hands-on implementation of our data preprocessing pipeline and our baseline CNN model.

This week was about turning plans into action and feedback into improvement. The combination of insights from real-world data and our peers has left us better prepared and more confident for our upcoming sponsor review. We look forward to presenting our finalized plan and initial findings.

See you next week!

Week 6

From Strategy to Practice: Key Technical Guidance from This Week’s Meeting

This week, our team held a productive meeting with our sponsor liaison engineers, which not only reinforced our project objectives but also provided concrete technical guidance on specific techniques for data processing and model development.

Fine-Tuning Our Technical Strategy & New Resources

We reaffirmed our two core missions: Fiducial Landmark Detection and Arrhythmia Classification. More importantly, we received several key implementation details during the meeting:

Access to Internal Data: An exciting update is that Aventusoft will provide us with an internal dataset of “normal” ECGs from company employees. This will allow us to get familiar with their device’s signal characteristics early on, preparing us for future model transfer.

Fiducial Landmark Detection as a Regression Problem: Dr. Hoyos clarified that the task of detecting PQRST points should fundamentally be treated as a regression problem, with the goal of predicting the precise sample location of the waveform peaks, rather than as a classification problem. Aventusoft’s standard approach is to first detect the R-peak, use it to center and extract a single heartbeat, and then train a multi-output regressor to predict the relative locations of the other points.
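
To make that framing concrete, here is an illustrative PyTorch sketch of a multi-output regressor operating on a single R-centered beat. The architecture, beat length, and loss are placeholders for illustration, not Aventusoft’s implementation:

```python
# Illustrative sketch of the regression framing: the input is a single
# R-peak-centered beat, and the model predicts the offsets of the remaining
# fiducial points relative to the R-peak. Sizes are placeholders.
import torch
import torch.nn as nn

class FiducialRegressor(nn.Module):
    def __init__(self, beat_len=400, n_points=4):  # e.g. P, Q, S, T offsets
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * (beat_len // 4), 64), nn.ReLU(),
            nn.Linear(64, n_points),   # one regression output per fiducial point
        )

    def forward(self, x):              # x: (batch, 1, beat_len)
        return self.head(self.features(x))

model = FiducialRegressor()
beat = torch.randn(8, 1, 400)                     # batch of R-centered beats
offsets = model(beat)                             # predicted sample offsets, shape (8, 4)
loss = nn.MSELoss()(offsets, torch.randn(8, 4))   # regression loss against true offsets
```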

Strategy for Using Multi-Lead Data: For small datasets like LUDB (with only 200 recordings), we should not use all 12 leads to augment the data. Because the morphology of some leads (e.g., an inverted R-peak) differs too much from our target device’s signal, including them could confuse the model. The guidance is to select the 3-4 leads that are most morphologically similar to Lead I and Lead II.
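
One simple heuristic for picking those leads, sketched here purely as an assumption on our part rather than a prescribed method, is to correlate each lead with Lead I over a record and keep the most similar ones:

```python
# Rough sketch: rank leads by correlation with a reference lead and keep the top k.
import numpy as np

def most_similar_leads(signals, lead_names, reference="I", k=4):
    """signals: (n_samples, n_leads) array; lead_names: list of lead labels."""
    ref = signals[:, lead_names.index(reference)]
    scores = {}
    for i, name in enumerate(lead_names):
        if name == reference:
            continue
        scores[name] = np.corrcoef(ref, signals[:, i])[0, 1]
    return sorted(scores, key=scores.get, reverse=True)[:k]
```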

The Necessity of Data Augmentation: For small datasets, data augmentation is crucial for success. We were encouraged to research methods like adding noise or applying time-stretching/shrinking to the signal, while ensuring that the annotation positions are transformed accordingly.
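
Here is a hedged sketch of both augmentations mentioned above, with the annotation indices rescaled alongside the time-stretched signal; the parameters are illustrative:

```python
# Additive noise and time-stretching via resampling, with annotations rescaled to match.
import numpy as np
from scipy.signal import resample

def add_noise(sig, noise_std=0.01):
    return sig + np.random.normal(0.0, noise_std, size=sig.shape)

def time_stretch(sig, ann_idx, factor=1.1):
    """Stretch/shrink a 1-D signal by `factor` and rescale annotation indices."""
    new_len = int(round(len(sig) * factor))
    stretched = resample(sig, new_len)
    new_ann = np.round(np.asarray(ann_idx) * factor).astype(int)
    return stretched, new_ann

# Example: a 10 s, 500 Hz segment with illustrative annotated fiducial points.
sig = np.random.randn(5000)
ann = [1200, 1260, 1310, 1400]
noisy = add_noise(sig)
stretched, stretched_ann = time_stretch(sig, ann, factor=0.9)
```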

Next Steps: A Data Deep Dive with New Guidance

With this clearer technical focus, our data exploration next week will be more precise. As each team member examines their assigned dataset, in addition to the usual exploratory data analysis tasks, we will concentrate on identifying disease labels shared across the various datasets, such as LBBB, to assess whether these databases can be merged into a larger training set in the future. At the same time, we will prioritize a thorough literature review before starting model development from scratch, seeking out existing studies that have already worked with these datasets so we can leverage established methods. When analyzing 12-lead ECG data, we will limit our focus to Lead I, Lead II, or other leads with similar morphology, to better replicate the real-world conditions in which our final product, a wearable device, will operate.

This week’s meeting was a key step in moving us from high-level goals to concrete implementation details. We are excited to move forward with this valuable guidance and begin building the foundational components of our analysis pipeline.

See you next week!

Week 5

From Strategy to Signals: Kicking Off ECG Analysis

This week, our team, BEATNET, made significant progress in defining the technical roadmap for our project. A productive meeting with our liaisons at Aventusoft provided crucial clarity and set a clear direction for the weeks ahead.

Defining Our Mission: Project Goals and Data Strategy

The primary outcome of our meeting was the confirmation of our two main project goals: arrhythmia classification (specifically targeting conditions like AFib, Flutter, and PVCs) and the detection of ECG landmarks (fiducial points). We learned that while Aventusoft has implemented Q and R point detection, the P, S, and T points remain open tasks for us to tackle.

Since most of Aventusoft’s data is currently unlabeled, we will begin by using well-known public datasets for our initial model development, including PTB-XL and MIT-BIH. This approach will allow us to build and validate our models before applying them to Aventusoft’s data in later stages.

Architecting Our Approach: Preprocessing and Deep Learning

A key technical decision from our meeting was to focus on a deep learning approach where the raw ECG signal is fed directly into the neural network. The network itself will act as a feature extractor, which avoids the need for manual feature engineering and allows the model to learn the most predictive patterns from the data.
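
As a minimal sketch of that idea (layer sizes are placeholders, not our final design), a small 1-D CNN can map a preprocessed ECG window directly to class logits with no hand-crafted features:

```python
# Raw signal in, learned features out: a tiny 1-D CNN baseline for illustration.
import torch
import torch.nn as nn

class BaselineECGNet(nn.Module):
    def __init__(self, n_classes=5):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=15, padding=7), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=9, padding=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),       # global pooling -> fixed-size feature vector
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):                  # x: (batch, 1, window_len)
        feats = self.backbone(x).squeeze(-1)
        return self.classifier(feats)

logits = BaselineECGNet()(torch.randn(4, 1, 5000))  # e.g. 10 s windows at 500 Hz
```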

To prepare the data for our models, we received clear guidance on the preprocessing pipeline; a rough code sketch of these steps follows the list. The core steps will include:

  • Applying a Butterworth bandpass filter to clean the signal
  • Resampling all data to a standard 500 Hz frequency
  • Segmenting the recordings into 5- or 10-second windows for analysis
  • Applying z-score normalization to standardize the signal amplitude
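
A rough SciPy sketch of the pipeline above, with illustrative filter cutoffs, window length, and input sampling rate:

```python
# Bandpass filter -> resample to 500 Hz -> fixed windows -> per-window z-score.
import numpy as np
from scipy.signal import butter, filtfilt, resample

def preprocess(sig, fs_in, fs_out=500, low=0.5, high=40.0, win_sec=10):
    # 1. Butterworth bandpass filter to suppress baseline wander and high-frequency noise.
    b, a = butter(N=4, Wn=[low, high], btype="bandpass", fs=fs_in)
    sig = filtfilt(b, a, sig)
    # 2. Resample to the common 500 Hz target.
    sig = resample(sig, int(len(sig) * fs_out / fs_in))
    # 3. Segment into fixed-length windows.
    win = win_sec * fs_out
    windows = [sig[i : i + win] for i in range(0, len(sig) - win + 1, win)]
    # 4. Z-score normalize each window.
    return [(w - w.mean()) / (w.std() + 1e-8) for w in windows]

clean = preprocess(np.random.randn(60 * 360), fs_in=360)  # e.g. one minute of a 360 Hz channel
```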

Our dataset exploration revealed that PTB-XL contains 21,837 clinical 12-lead ECGs from 18,885 patients with 10-second recordings and comprehensive multi-label annotations across 71 diagnostic classes. Meanwhile, MIT-BIH provides longer recordings but focuses primarily on arrhythmia detection with beat-level annotations. This diversity will strengthen our model’s generalization capabilities.
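
For reference, loading the PTB-XL metadata and a single record is straightforward, assuming the standard release layout with ptbxl_database.csv plus wfdb record files; paths below are illustrative:

```python
# Load PTB-XL metadata and one 500 Hz record (paths/columns per the standard release).
import ast
import pandas as pd
import wfdb

base = "ptb-xl-1.0.3/"
meta = pd.read_csv(base + "ptbxl_database.csv", index_col="ecg_id")
meta["scp_codes"] = meta["scp_codes"].apply(ast.literal_eval)  # per-record label dict

row = meta.iloc[0]
signal, fields = wfdb.rdsamp(base + row["filename_hr"])  # 12-lead signal at 500 Hz
print(signal.shape, fields["sig_name"], row["scp_codes"])
```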

Next Steps: Diving into the Data

With a clear plan in place, our immediate focus shifts to hands-on data exploration. For the upcoming week, each team member will download and analyze at least one public dataset, with the goal of exploring six datasets in total. Our primary objectives are to understand the available labels, learn how to load and visualize the data, and assess data quality and class distribution. In parallel, we will begin implementing the preprocessing pipeline and start replicating baseline CNN-based models for arrhythmia detection.

Recent research shows that single-lead ECG analysis using deep learning can achieve impressive results, with models like VGG16 reaching F1-scores of 98.11% on certain leads, while lightweight architectures like MobileNetV2 achieve 97.24% accuracy with faster inference times suitable for real-time monitoring. This validates our approach of exploring individual lead performance before moving to Aventusoft’s proprietary data.

This week marked a critical transition from high-level planning to detailed technical execution. We are excited to get our hands on the data and begin building the foundation for BEATNET.

See you next week!

Week 4

This week, our BEATNET team dedicated efforts to building a solid foundation for our ECG deep learning project. We dived into the required documentation, mapping out the project’s scope, stakeholders, and benefits, while drafting weekly status reports and project diagrams to guide our process. A key focus was clarifying our data situation through a detailed session with our company liaison. We learned that Aventusoft’s proprietary ECG data is largely unlabeled, with annotations and clinical info available for only a fraction of patients—and with columns varying across different datasets. Most of the annotated data supports seismocardiogram (SCG) research rather than the ECG deep learning tasks we’re pursuing.

Given these findings, our immediate priorities are twofold: designing robust ECG landmark detection algorithms resilient to noise and diverse conditions, and developing models to identify arrhythmias and conduction abnormalities. To accomplish this, we’ll use well-annotated public datasets like PTB-XL and MIT-BIH as our starting point. Next week, we’ll shift into hands-on exploration of these public datasets, refining our preprocessing pipelines, and aligning our workflow with industry best practices. With the planning phase wrapping up, we’re eager to start coding and build out the core baseline models for BEATNET.

Week 3

Meeting Our Liaisons

This week, our team officially kicked off Project BEATNET with a meeting attended by Aventusoft’s liaisons, Dr. Keider Hoyos and Dr. Diego Pava, along with our faculty advisor, Prof. Kejun Huang. The meeting established the foundation for our collaboration by clearly defining the project’s objectives and expectations. Aventusoft highlighted three key focus areas for our work: landmark detection, disease state classification, and system deployment. Our task is to develop AI models that can accurately identify PQRS-T waveforms in both normal and abnormal ECG signals, as well as classify various conditions, including arrhythmias, conduction abnormalities, pacemaker types, and signs of electrolyte imbalance. A distinctive feature of the project is to create both comprehensive models for cloud-based inference and streamlined models optimized for mobile device performance.

At the beginning of the meeting, the liaisons highlighted the importance of data. Ultimately, we will receive single-lead, 30-second ECG recordings in periodic batches, provided in MATLAB or NumPy formats, along with limited demographic details. Meanwhile, our team is expected to actively search for and use publicly available datasets, such as those from PhysioNet and MIT-BIH, to start initial model development while Aventusoft finalizes data sharing agreements. We are fully responsible for cleaning and preprocessing all the data.
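
Since the delivery format will be MATLAB or NumPy files, loading should be simple; the sketch below assumes hypothetical file names and a hypothetical variable key inside the .mat files until we see the actual batches:

```python
# Load a single-lead recording from either delivery format.
import numpy as np
from scipy.io import loadmat

def load_recording(path):
    if path.endswith(".mat"):
        mat = loadmat(path)
        return np.squeeze(mat["ecg"])      # assumed variable name inside the .mat file
    return np.load(path)                   # .npy single-lead array

# sig = load_recording("batch_01/recording_0001.mat")  # hypothetical 30 s trace
```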

From a technical standpoint, Aventusoft instructed us to use Python as our main programming language and PyTorch for all deep learning tasks, explicitly advising against TensorFlow due to previous compatibility issues. Our approach will begin with an extensive review of the latest research on single-lead, short-duration ECG analysis, adapting and fine-tuning existing models to fit the specific requirements of our project. We are also encouraged to apply semi-supervised learning methods to utilize unlabeled data for improved results.
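
As one example of the semi-supervised direction, sketched here as an assumption rather than a chosen method, pseudo-labeling recycles confident model predictions on unlabeled data as extra training examples:

```python
# Keep only unlabeled samples the current model predicts with high confidence.
import torch
import torch.nn.functional as F

def pseudo_label(model, unlabeled_batch, threshold=0.95):
    """Return (inputs, labels) for samples the model is confident about."""
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(unlabeled_batch), dim=1)
        conf, preds = probs.max(dim=1)
    keep = conf >= threshold
    return unlabeled_batch[keep], preds[keep]
```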

A particularly important topic discussed was documentation. Aventusoft requires FDA-style documentation from all team members, including specifications of requirements, descriptions of system and algorithm designs, risk assessments, and verification and validation procedures. They emphasized that the quality of documentation is just as critical as the quality of the code for the project’s success and regulatory compliance.

Looking ahead, the immediate priorities for the upcoming week are to identify and evaluate suitable public ECG datasets, start an in-depth literature review of methodologies relevant to our project, brainstorm and document preprocessing strategies, and thoughtfully assign sub-tasks among team members. With clear direction from our liaisons and defined next steps, our team is excited to move from the planning phase into concrete research and development. We look forward to sharing our progress in the coming weeks!

Week 2

Hello everyone, welcome to our team’s blog! We are excited to collaborate with Aventusoft on the ECG Deep Learning project, which sits at the crossroads of artificial intelligence and healthcare technology. Our main goal is to create deep learning algorithms that can precisely analyze electrocardiogram (ECG) data, focusing on detecting key landmarks, classifying diseases, and deploying solutions on both cloud and mobile platforms. This groundbreaking work aims to enhance digital health tools and aid in the early detection and diagnosis of cardiac conditions such as heart failure.

This week, we focused on understanding the project’s scope by carefully reviewing Aventusoft’s Statement of Work (SOW). The SOW highlights our objectives: building highly accurate models to identify important landmarks across different ECG patterns and developing strong disease classification algorithms. These models will be deployed as efficient cloud services and lightweight mobile apps, all while complying with FDA design control documentation standards to prepare for future regulatory approval.

To get ready technically, we held internal discussions with Professor Kejun Huang. Looking ahead, our next step is to have our first meeting with Aventusoft’s liaison engineers. This meeting will be essential to clarify details about dataset access, structure, and project priorities. We also plan to finalize our choice of software tools and create a detailed checklist of dataset requirements. Meanwhile, we will begin outlining our data preprocessing strategy and continue refining the framework for FDA-required documentation. Additionally, we are working on scheduling regular weekly meetings with our industry mentors to maintain smooth communication throughout the project.

At this point, our team is on track and motivated, with clear next steps outlined. Once we receive the ECG dataset and detailed project guidance from Aventusoft, we will proceed with data processing and developing baseline models. We look forward to sharing updates as we advance in this important project!