SCIENTIFIC COMPUTING AND IMAGING INSTITUTE
at the University of Utah

An internationally recognized leader in visualization, scientific computing, and image analysis

SCI Publications

2024


Q.C. Nguyen, T. Tasdizen, M. Alirezaei, H. Mane, X. Yue, J.S. Merchant, W. Yu, L. Drew, D. Li, T.T. Nguyen. “Neighborhood built environment, obesity, and diabetes: A Utah siblings study,” In SSM - Population Health, Vol. 26, 2024.

ABSTRACT

Background

This study utilizes innovative computer vision methods alongside Google Street View images to characterize neighborhood built environments across Utah.

Methods

Convolutional Neural Networks were used to create indicators of street greenness, crosswalks, and building type on 1.4 million Google Street View images. The demographic and medical profiles of Utah residents came from the Utah Population Database (UPDB). We implemented hierarchical linear models with individuals nested within zip codes to estimate associations between neighborhood built environment features and individual-level obesity and diabetes, controlling for individual- and zip code-level characteristics (n = 1,899,175 adults living in Utah in 2015). Sibling random effects models were implemented to account for shared family attributes among siblings (n = 972,150) and twins (n = 14,122).
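
As a rough, self-contained sketch of the modeling setup described above (not the authors' code; the column names and synthetic data are assumptions), a random-intercept model with individuals nested within zip codes can be fit with statsmodels:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data (the study uses UPDB records, not reproduced here).
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "zipcode": rng.integers(0, 50, n),       # grouping factor
    "greenness": rng.uniform(0, 1, n),       # street greenness indicator
    "sidewalks": rng.uniform(0, 1, n),       # walkability indicator
    "age": rng.integers(18, 90, n),
})
# Outcome with a small zip-code-level random effect baked in.
zip_effect = rng.normal(0, 0.5, 50)
df["bmi"] = (27 - 1.0 * df["greenness"] - 0.5 * df["sidewalks"]
             + 0.02 * df["age"] + zip_effect[df["zipcode"]]
             + rng.normal(0, 3, n))

# Random intercept for each zip code; fixed effects for built-environment
# measures and covariates. A sibling random-effects model would instead use
# a (hypothetical) family identifier as the grouping variable.
model = smf.mixedlm("bmi ~ greenness + sidewalks + age", data=df,
                    groups=df["zipcode"])
result = model.fit()
print(result.summary())
```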

Results

Consistent with prior neighborhood research, the variance partition coefficients (VPC) of our unadjusted models nesting individuals within zip codes were relatively small (0.5%–5.3%), except for HbA1c (VPC = 23%), suggesting a small percentage of the outcome variance is at the zip code level. However, proportional change in variance (PCV) attributable to zip codes after the inclusion of neighborhood built environment variables and covariates ranged between 11% and 67%, suggesting that these characteristics account for a substantial portion of the zip code-level effects. Non-single-family homes (indicator of mixed land use), sidewalks (indicator of walkability), and green streets (indicator of neighborhood aesthetics) were associated with reduced diabetes and obesity. Zip codes in the third tertile for non-single-family homes were associated with a 15% reduction (PR: 0.85; 95% CI: 0.79, 0.91) in obesity and a 20% reduction (PR: 0.80; 95% CI: 0.70, 0.91) in diabetes. This tertile was also associated with a BMI reduction of −0.68 kg/m² (95% CI: −0.95, −0.40).
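
The VPC and PCV quantities reported above follow standard multilevel-model formulas; a minimal sketch with made-up variance components:

```python
def variance_partition_coefficient(var_between: float, var_within: float) -> float:
    """VPC: share of total outcome variance attributable to the cluster
    (zip-code) level in a random-intercept model. For binary outcomes modeled
    on the logit scale, var_within is commonly taken as pi**2 / 3."""
    return var_between / (var_between + var_within)

def proportional_change_in_variance(var_between_null: float,
                                    var_between_adjusted: float) -> float:
    """PCV: fraction of the zip-code-level variance in the unadjusted (null)
    model explained after adding built-environment variables and covariates."""
    return (var_between_null - var_between_adjusted) / var_between_null

# Example with made-up variance components:
print(variance_partition_coefficient(0.4, 7.2))    # ~0.053 -> VPC of about 5.3%
print(proportional_change_in_variance(0.4, 0.2))   # 0.5    -> PCV of 50%
```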

Conclusion

We observe associations between neighborhood characteristics and chronic diseases, accounting for biological, social, and cultural factors shared among siblings in this large population-based study.



Q.C. Nguyen, M. Alirezaei, X. Yue, H. Mane, D. Li, L. Zhao, T.T. Nguyen, R. Patel, W. Yu, M. Hu, D. Quistberg, T. Tasdizen. “Leveraging computer vision for predicting collision risks: a cross-sectional analysis of 2019–2021 fatal collisions in the USA,” In Injury Prevention, BMJ, 2024.

ABSTRACT

Objective The USA has higher rates of fatal motor vehicle collisions than most high-income countries. Previous studies examining the role of the built environment were generally limited to small geographic areas or single cities. This study aims to quantify associations between built environment characteristics and traffic collisions in the USA.

Methods Built environment characteristics were derived from Google Street View images and summarised at the census tract level. Fatal traffic collisions were obtained from the 2019–2021 Fatality Analysis Reporting System. Fatal and non-fatal traffic collisions in Washington DC were obtained from the District Department of Transportation. Adjusted Poisson regression models examined whether built environment characteristics are related to motor vehicle collisions in the USA, controlling for census tract sociodemographic characteristics.
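
A minimal sketch of an adjusted Poisson model of this kind (illustrative only; the column names and synthetic tract-level data are assumptions, not the study's variables):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Synthetic census-tract-level stand-in data.
rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "fatal_collisions": rng.poisson(3, n),             # outcome count per tract
    "sidewalk_tertile": rng.integers(1, 4, n),         # 1 = lowest, 3 = highest
    "greenness_tertile": rng.integers(1, 4, n),
    "median_income": rng.normal(60_000, 15_000, n),    # sociodemographic covariate
    "population": rng.integers(1_000, 10_000, n),      # exposure for the offset
})

# Poisson regression with a log link; the population offset turns counts into
# rates, and tertiles enter as categorical indicators.
model = smf.glm(
    "fatal_collisions ~ C(sidewalk_tertile) + C(greenness_tertile) + median_income",
    data=df,
    family=sm.families.Poisson(),
    offset=np.log(df["population"]),
)
result = model.fit()
# exp(coef) gives rate ratios, e.g., highest vs. lowest tertile.
print(np.exp(result.params))
```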

Results Census tracts in the highest tertile of sidewalks, single-lane roads, streetlights and street greenness had 70%, 50%, 30% and 26% fewer fatal vehicle collisions compared with those in the lowest tertile. Street greenness and single-lane roads were associated with 37% and 38% fewer pedestrian-involved and cyclist-involved fatal collisions. Analyses with fatal and non-fatal collisions in Washington DC found streetlights and stop signs were associated with fewer pedestrian- and cyclist-involved vehicle collisions, while road construction had an adverse association.

Conclusion This study demonstrates the utility of using data algorithms that can automatically analyse street segments to create indicators of the built environment to enhance understanding of large-scale patterns and inform interventions to decrease road traffic injuries and fatalities.



R. Nihalaani, T. Kataria, J. Adams, S.Y. Elhabian. “Estimation and Analysis of Slice Propagation Uncertainty in 3D Anatomy Segmentation,” Subtitled “arXiv preprint arXiv:2403.12290,” 2024.

ABSTRACT

Supervised methods for 3D anatomy segmentation demonstrate superior performance but are often limited by the availability of annotated data. This limitation has led to a growing interest in self-supervised approaches in tandem with the abundance of available unannotated data. Slice propagation has emerged as a self-supervised approach that leverages slice registration as a self-supervised task to achieve full anatomy segmentation with minimal supervision. This approach significantly reduces the need for domain expertise, time, and the cost associated with building fully annotated datasets required for training segmentation networks. However, this shift toward reduced supervision via deterministic networks raises concerns about the trustworthiness and reliability of predictions, especially when compared with more accurate supervised approaches. To address this concern, we propose the integration of calibrated uncertainty quantification (UQ) into slice propagation methods, providing insights into the model’s predictive reliability and confidence levels. Incorporating uncertainty measures enhances user confidence in self-supervised approaches, thereby improving their practical applicability. We conducted experiments on three datasets for 3D abdominal segmentation using five UQ methods. The results illustrate that incorporating UQ improves not only model trustworthiness, but also segmentation accuracy. Furthermore, our analysis reveals various failure modes of slice propagation methods that might not be immediately apparent to end-users. This study opens up new research avenues to improve the accuracy and trustworthiness of slice propagation methods.



T.A.J. Ouermi, J. Li, T. Athawale, C.R. Johnson. “Estimation and Visualization of Isosurface Uncertainty from Linear and High-Order Interpolation Methods,” In IEEE Workshop on Uncertainty Visualization: Applications, Techniques, Software, and Decision Frameworks, IEEE, pp. 51--61. 2024.
DOI: 10.1109/UncertaintyVisualization63963.2024.00012

ABSTRACT

Isosurface visualization is fundamental for exploring and analyzing 3D volumetric data. Marching cubes (MC) algorithms with linear interpolation are commonly used for isosurface extraction and visualization. Although linear interpolation is easy to implement, it has limitations when the underlying data is complex and high-order, which is the case for most real-world data. Linear interpolation can output vertices at the wrong location. Its inability to deal with sharp features and features smaller than grid cells can lead to an incorrect isosurface with holes and broken pieces. Despite these limitations, isosurface visualizations typically do not include insight into the spatial location and the magnitude of these errors. We utilize high-order interpolation methods with MC algorithms and interactive visualization to highlight these uncertainties. Our visualization tool helps identify the regions of high interpolation errors. It also allows users to query local areas for details and compare the differences between isosurfaces from different interpolation methods. In addition, we employ high-order methods to identify and reconstruct possible features that linear methods cannot detect. We showcase how our visualization tool helps explore and understand the extracted isosurface errors through synthetic and real-world data.
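
A one-dimensional toy example (not the paper's code) of why linear interpolation can misplace an isovalue crossing along a cell edge, compared with a cubic-spline reconstruction of the same samples:

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.optimize import brentq

# A smooth, higher-order test function sampled on a coarse grid (cell edges).
f = lambda x: np.sin(3.0 * x) - 0.4
xs = np.linspace(0.0, 1.0, 5)          # coarse grid samples
ys = f(xs)

# True crossing of the isovalue 0 on [0, 0.5].
true_root = brentq(f, 0.0, 0.5)

# Linear interpolation between the two samples that bracket the sign change,
# which is what standard marching cubes does along a cell edge.
i = np.where(np.sign(ys[:-1]) != np.sign(ys[1:]))[0][0]
t = ys[i] / (ys[i] - ys[i + 1])
linear_root = xs[i] + t * (xs[i + 1] - xs[i])

# A higher-order (cubic spline) reconstruction of the same samples.
spline = CubicSpline(xs, ys)
cubic_root = np.min(spline.roots(extrapolate=False))

print(f"true   : {true_root:.6f}")
print(f"linear : {linear_root:.6f}  (error {abs(linear_root - true_root):.2e})")
print(f"cubic  : {cubic_root:.6f}  (error {abs(cubic_root - true_root):.2e})")
```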



T.A.J. Ouermi, J. Li, Z. Morrow, B. Waanders, C.R. Johnson. “Glyph-Based Uncertainty Visualization and Analysis of Time-Varying Vector Fields,” In IEEE Workshop on Uncertainty Visualization: Applications, Techniques, Software, and Decision Frameworks, IEEE, pp. 73--77. 2024.
DOI: 10.1109/UncertaintyVisualization63963.2024.00014

ABSTRACT

Uncertainty is inherent to most data, including vector field data, yet it is often omitted in visualizations and representations. Effective uncertainty visualization can enhance the understanding and interpretability of vector field data. For instance, in the context of severe weather events such as hurricanes and wildfires, effective uncertainty visualization can provide crucial insights about fire spread or hurricane behavior and aid in resource management and risk mitigation. Glyphs are commonly used for representing vector uncertainty but are often limited to 2D. In this work, we present a glyph-based technique for accurately representing 3D vector uncertainty and a comprehensive framework for visualization, exploration, and analysis using our new glyphs. We employ hurricane and wildfire examples to demonstrate the efficacy of our glyph design and visualization tool in conveying vector field uncertainty.
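
As a rough illustration of the per-location summaries such a glyph can encode (not the authors' glyph construction), the sketch below reduces a hypothetical ensemble of 3D vectors to a mean direction, a magnitude spread, and an angular spread:

```python
import numpy as np

# Hypothetical ensemble of 3D wind vectors at one grid location
# (e.g., members of a hurricane forecast ensemble).
rng = np.random.default_rng(2)
ensemble = np.array([10.0, 2.0, 0.5]) + rng.normal(0, 1.5, size=(50, 3))

magnitudes = np.linalg.norm(ensemble, axis=1)
mean_vec = ensemble.mean(axis=0)
mean_dir = mean_vec / np.linalg.norm(mean_vec)

# Angle of each member relative to the mean direction (angular uncertainty).
unit = ensemble / magnitudes[:, None]
angles = np.degrees(np.arccos(np.clip(unit @ mean_dir, -1.0, 1.0)))

print("mean vector          :", np.round(mean_vec, 2))
print("magnitude mean ± std :", round(magnitudes.mean(), 2), "±", round(magnitudes.std(), 2))
print("angular spread (95th percentile, deg):", round(np.percentile(angles, 95), 1))
```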



A. Panta, X. Huang, N. McCurdy, D. Ellsworth, A. Gooch. “Web-based Visualization and Analytics of Petascale data: Equity as a Tide that Lifts All Boats,” In Proceedings of the IEEE Visualization Conference, IEEE, 2024.

ABSTRACT

Scientists generate petabytes of data daily to help uncover environmental trends or behaviors that are hard to predict. For example, understanding climate simulations based on the long-term average of temperature, precipitation, and other environmental variables is essential to predicting and establishing root causes of future undesirable scenarios and assessing possible mitigation strategies. While supercomputer centers provide a powerful infrastructure for generating petabytes of simulation output, accessing and analyzing these datasets interactively remains challenging on multiple fronts. This paper presents an approach to managing, visualizing, and analyzing petabytes of data within a browser on equipment ranging from the top NASA supercomputer to commodity hardware like a laptop. Our novel data fabric abstraction layer allows user-friendly querying of scientific information while hiding the complexities of dealing with file systems or cloud services. We also optimize network utilization while streaming from petascale repositories through state-of-the-art progressive compression algorithms. Based on this abstraction, we provide customizable dashboards that can be accessed from any device with any internet connection, enabling interactive visual analysis of vast amounts of data for a wide range of users - from top scientists with access to leadership-class computing environments to undergraduate students from disadvantaged backgrounds at minority-serving institutions. We focus on NASA’s use of petascale climate datasets as an example of particular societal impact and, therefore, a case where achieving equity in science participation is critical. We validate our approach by improving the ability of climate scientists to visually explore their data via two fully interactive dashboards. We further validate our approach by deploying the dashboards and simplified training materials in the classroom at a minority-serving institution. These dashboards, released in simplified form to the general public, contribute significantly to a broader push to democratize the access and use of climate data.



M. Parashar. “Enabling Responsible Artificial Intelligence Research and Development Through the Democratization of Advanced Cyberinfrastructure,” In Harvard Data Science Review, Special Issue 4: Democratizing Data, 2024.

ABSTRACT

Artificial intelligence (AI) is driving discovery, innovation, and economic growth, and has the potential to transform science and society. However, realizing the positive, transformative potential of AI requires that AI research and development (R&D) progress responsibly; that is, in a way that protects privacy, civil rights, and civil liberties, and promotes principles of fairness, accountability, transparency, and equity. This article explores the importance of democratizing AI R&D for achieving the goal of responsible AI and its potential impacts.



M. Parashar. “Everywhere & Nowhere: Envisioning a Computing Continuum for Science,” Subtitled “arXiv:2406.04480v1,” 2024.

ABSTRACT

Emerging data-driven scientific workflows are seeking to leverage distributed data sources to understand end-to-end phenomena, drive experimentation, and facilitate important decision-making. Despite the exponential growth of available digital data sources at the edge, and the ubiquity of non-trivial computational power for processing this data, realizing such science workflows remains challenging. This paper explores a computing continuum that is everywhere and nowhere – one spanning resources at the edges, in the core and in between, and providing abstractions that can be harnessed to support science. It also introduces recent research in programming abstractions that can express what data should be processed and when and where it should be processed, and autonomic middleware services that automate the discovery of resources and the orchestration of computations across these resources.



S. Parsa, B. Wang. “Harmonic Chain Barcode and Stability,” Subtitled “arXiv:2409.06093,” 2024.

ABSTRACT

The persistence barcode is a topological descriptor of data that plays a fundamental role in topological data analysis. Given a filtration of the space of data, a persistence barcode tracks the evolution of its homological features. In this paper, we introduce a new type of barcode, referred to as the canonical barcode of harmonic chains, or harmonic chain barcode for short, which tracks the evolution of harmonic chains. As our main result, we show that the harmonic chain barcode is stable and it captures both geometric and topological information of data. Moreover, given a filtration of a simplicial complex of size n with m time steps, we can compute its harmonic chain barcode in O(m^2 n^ω + mn^3) time, where n^ω is the matrix multiplication time. Consequently, a harmonic chain barcode can be utilized in applications in which a persistence barcode is applicable, such as feature vectorization and machine learning. Our work provides strong evidence in a growing list of literature that geometric (not just topological) information can be recovered from a persistence filtration.
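
Harmonic k-chains are the null space of the combinatorial (Hodge) Laplacian L_k = ∂_k^T ∂_k + ∂_{k+1} ∂_{k+1}^T. The toy sketch below computes them for a single simplicial complex (a hollow triangle); it illustrates the object being tracked, not the barcode algorithm itself:

```python
import numpy as np
from scipy.linalg import null_space

# Hollow triangle: vertices {0, 1, 2}, oriented edges e0=(0,1), e1=(1,2),
# e2=(0,2), and no 2-simplices. Rows of d1 are vertices, columns are edges.
d1 = np.array([
    [-1,  0, -1],   # vertex 0
    [ 1, -1,  0],   # vertex 1
    [ 0,  1,  1],   # vertex 2
], dtype=float)
d2 = np.zeros((3, 0))   # no triangles are filled in

# Hodge Laplacian on 1-chains.
L1 = d1.T @ d1 + d2 @ d2.T

harmonic = null_space(L1)    # basis of harmonic 1-chains
print(harmonic.round(3))     # one column proportional to (1, 1, -1), up to sign:
                             # the loop around the hole
```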



M. Penwarden, H. Owhadi, R.M. Kirby. “Kolmogorov n-Widths for Multitask Physics-Informed Machine Learning (PIML) Methods: Towards Robust Metrics,” Subtitled “arXiv preprint arXiv:2402.11126,” 2024.

ABSTRACT

Physics-informed machine learning (PIML) as a means of solving partial differential equations (PDE) has garnered much attention in the Computational Science and Engineering (CS&E) world. This topic encompasses a broad array of methods and models aimed at solving a single or a collection of PDE problems, called multitask learning. PIML is characterized by the incorporation of physical laws into the training process of machine learning models in lieu of large data when solving PDE problems. Despite the overall success of this collection of methods, it remains incredibly difficult to analyze, benchmark, and generally compare one approach to another. Using Kolmogorov n-widths as a measure of effectiveness of approximating functions, we judiciously apply this metric in the comparison of various multitask PIML architectures. We compute lower accuracy bounds and analyze the model's learned basis functions on various PDE problems. This is the first objective metric for comparing multitask PIML architectures and helps remove uncertainty in model validation from selective sampling and overfitting. We also identify avenues of improvement for model architectures, such as the choice of activation function, which can drastically affect model generalization to "worst-case" scenarios, which is not observed when reporting task-specific errors. We also incorporate this metric into the optimization process through regularization, which improves the models' generalizability over the multitask PDE problem.
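
A common computational proxy for the Kolmogorov n-width of a sampled solution set is the singular-value decay of its snapshot matrix; a toy sketch on an assumed parameterized family of functions (not the paper's benchmark problems):

```python
import numpy as np

# Parameterized family u(x; mu) = exp(-mu * x) sampled on a grid, standing in
# for solutions across a family of PDE tasks.
x = np.linspace(0.0, 1.0, 200)
mus = np.linspace(1.0, 10.0, 100)
snapshots = np.stack([np.exp(-mu * x) for mu in mus], axis=1)   # 200 x 100

# sigma_{n+1} of the snapshot matrix is a common proxy for how well the best
# n-dimensional linear subspace can capture the whole family.
sigma = np.linalg.svd(snapshots, compute_uv=False)
for n in (1, 2, 4, 8):
    print(f"n = {n}: relative proxy error = {sigma[n] / sigma[0]:.2e}")
```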



D.A. Quistberg, S.J. Mooney, T. Tasdizen, P. Arbelaez, Q.C. Nguyen. “Deep Learning Methods to Amplify Epidemiological Data Collection and Analyses,” In American Journal of Epidemiology, Oxford University Press, 2024.

ABSTRACT

Deep learning is a subfield of artificial intelligence and machine learning based mostly on neural networks and often combined with attention algorithms that has been used to detect and identify objects in text, audio, images, and video. Serghiou and Rough (Am J Epidemiol. 0000;000(00):0000-0000) present a primer for epidemiologists on deep learning models. These models provide substantial opportunities for epidemiologists to expand and amplify their research in both data collection and analyses by increasing the geographic reach of studies, including more research subjects, and working with large or high dimensional data. The tools for implementing deep learning methods are not quite yet as straightforward or ubiquitous for epidemiologists as traditional regression methods found in standard statistical software, but there are exciting opportunities for interdisciplinary collaboration with deep learning experts, just as epidemiologists have with statisticians, healthcare providers, urban planners, and other professionals. Despite the novelty of these methods, epidemiological principles of assessing bias, study design, interpretation and others still apply when implementing deep learning methods or assessing the findings of studies that have used them.



S. Saklani, C. Goel, S. Bansal, Z. Wang, S. Dutta, T. Athawale, D. Pugmire, C.R. Johnson. “Uncertainty-Informed Volume Visualization using Implicit Neural Representation,” In IEEE Workshop on Uncertainty Visualization: Applications, Techniques, Software, and Decision Frameworks, IEEE, pp. 62--72. 2024.
DOI: 10.1109/UncertaintyVisualization63963.2024.00013

ABSTRACT

The increasing adoption of Deep Neural Networks (DNNs) has led to their application in many challenging scientific visualization tasks. While advanced DNNs offer impressive generalization capabilities, understanding factors such as model prediction quality, robustness, and uncertainty is crucial. These insights can enable domain scientists to make informed decisions about their data. However, DNNs inherently lack the ability to estimate prediction uncertainty, necessitating new research to construct robust uncertainty-aware visualization techniques tailored for various visualization tasks. In this work, we propose uncertainty-aware implicit neural representations to model scalar field data sets effectively and comprehensively study the efficacy and benefits of estimated uncertainty information for volume visualization tasks. We evaluate the effectiveness of two principled deep uncertainty estimation techniques: (1) Deep Ensemble and (2) Monte Carlo Dropout (MC-Dropout). These techniques enable uncertainty-informed volume visualization in scalar field data sets. Our extensive exploration across multiple data sets demonstrates that uncertainty-aware models produce informative volume visualization results. Moreover, integrating prediction uncertainty enhances the trustworthiness of our DNN model, making it suitable for robustly analyzing and visualizing real-world scientific volumetric data sets.
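
A minimal sketch of the MC-Dropout idea applied to a coordinate-based (implicit neural) scalar-field model: dropout stays active at inference, and repeated stochastic forward passes yield a per-point mean prediction and an uncertainty estimate. The architecture and shapes below are assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn

class ScalarFieldINR(nn.Module):
    """Tiny coordinate MLP: (x, y, z) -> scalar value, with dropout layers."""
    def __init__(self, hidden: int = 128, p_drop: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, 1),
        )

    def forward(self, coords: torch.Tensor) -> torch.Tensor:
        return self.net(coords)

@torch.no_grad()
def mc_dropout_predict(model: nn.Module, coords: torch.Tensor, n_samples: int = 32):
    """Mean and standard deviation over stochastic forward passes."""
    model.train()            # keep dropout active at inference time
    samples = torch.stack([model(coords) for _ in range(n_samples)])
    return samples.mean(dim=0), samples.std(dim=0)

# Query an (untrained, for illustration) model on a batch of 3D coordinates.
model = ScalarFieldINR()
coords = torch.rand(1024, 3)
mean, std = mc_dropout_predict(model, coords)
print(mean.shape, std.shape)   # value and per-point uncertainty for volume rendering
```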



S.A. Sakin, K.E. Isaacs. “A Literature-based Visualization Task Taxonomy for Gantt Charts,” Subtitled “arXiv:2408.04050,” 2024.

ABSTRACT

Gantt charts are a widely used idiom for visualizing temporal discrete event sequence data where dependencies exist between events. They are popular in domains such as manufacturing and computing for their intuitive layout of such data. However, these domains frequently generate data at scales that tax both the visual representation and the ability to render it at interactive speeds. To aid visualization developers who use Gantt charts in these situations, we develop a task taxonomy of low-level visualization tasks supported by Gantt charts and connect them to the data queries needed to support them. Our taxonomy is derived through a literature survey of visualizations using Gantt charts over the past 30 years.



C. Scully-Allison, I. Lumsden, K. Williams, J. Bartels, M. Taufer, S. Brink, A. Bhatele, O. Pearce, K. Isaacs. “Design Concerns for Integrated Scripting and Interactive Visualization in Notebook Environments,” In IEEE Transactions on Visualization and Computer Graphics, IEEE, 2024.
DOI: 10.1109/TVCG.2024.3354561

ABSTRACT

Interactive visualization can support fluid exploration but is often limited to predetermined tasks. Scripting can support a vast range of queries but may be more cumbersome for free-form exploration. Embedding interactive visualization in scripting environments, such as computational notebooks, provides an opportunity to leverage the strengths of both direct manipulation and scripting. We investigate interactive visualization design methodology, choices, and strategies under this paradigm through a design study of calling context trees used in performance analysis, a field that exemplifies typical exploratory data analysis workflows with Big Data and hard-to-define problems. We first produce a formal task analysis assigning tasks to graphical or scripting contexts based on their specificity, frequency, and suitability. We then design a notebook-embedded interactive visualization and validate it with intended users. In a follow-up study, we present participants with multiple graphical and scripting interaction modes to elicit feedback about notebook-embedded visualization design, finding consensus in support of the interaction model. We report and reflect on observations regarding the process and design implications for combining visualization and scripting in notebooks.



N. Shingde, T. Blattner, A. Bardakoff, W. Keyrouz, M. Berzins. “An illustration of extending Hedgehog to multi-node GPU architectures using GEMM,” In Springer Nature (to appear), 2024.

ABSTRACT

Asynchronous task-based systems offer the possibility of making it easier to take advantage of scalable heterogeneous architectures. This paper extends the previous work, demonstrating how Hedgehog, a dataflow graph-based model developed at the National Institute of Standards and Technology, can be used to obtain high performance for numerical linear algebraic operations as a starting point for complex algorithms. While the results were promising, it was unclear how to scale them to larger matrices and compute node counts. The aim here is to show how the new, improved algorithm inspired by DPLASMA performs equally well using Hedgehog. The results are compared against the leading library DPLASMA to illustrate the performance of different asynchronous dataflow models. The work demonstrates that using general-purpose, high-level abstractions, such as Hedgehog’s dataflow graphs, makes it possible to achieve similar performance to the specialized linear algebra codes such as DPLASMA.
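
Hedgehog itself is a C++ dataflow library; purely to illustrate the tile-level decomposition that such task-based GEMM codes exploit, the sketch below partitions C = A · B into independent output tiles and computes them concurrently (this is not Hedgehog or DPLASMA code):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def tiled_gemm(A: np.ndarray, B: np.ndarray, tile: int = 256) -> np.ndarray:
    """Blocked C = A @ B; each (i, j) tile of C is an independent task that
    accumulates partial products over the shared k dimension."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n), dtype=A.dtype)

    def compute_tile(i: int, j: int) -> None:
        # Tiles of C are disjoint, so tasks can run concurrently without locks.
        for p in range(0, k, tile):
            C[i:i + tile, j:j + tile] += A[i:i + tile, p:p + tile] @ B[p:p + tile, j:j + tile]

    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(compute_tile, i, j)
                   for i in range(0, m, tile)
                   for j in range(0, n, tile)]
        for f in futures:
            f.result()
    return C

A = np.random.rand(1024, 512)
B = np.random.rand(512, 768)
assert np.allclose(tiled_gemm(A, B), A @ B)
```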



A. Singh, S. Adams-Tew, S. Johnson, H. Odeen, J. Shea, A. Johnson, L. Day, A. Pessin, A. Payne, S. Joshi. “Treatment Efficacy Prediction of Focused Ultrasound Therapies Using Multi-parametric Magnetic Resonance Imaging,” In Cancer Prevention, Detection, and Intervention, Springer Nature Switzerland, pp. 190-199. 2024.

ABSTRACT

Magnetic resonance guided focused ultrasound (MRgFUS) is one of the most attractive emerging minimally invasive procedures for breast cancer, which induces localized hyperthermia, resulting in tumor cell death. Accurately assessing the post-ablation viability of all treated tumor tissue and surrounding margins immediately after MRgFUS thermal therapy is essential for evaluating treatment efficacy. While both thermal and vascular MRI-derived biomarkers are currently used to assess treatment efficacy, no adequately accurate methods exist for the in vivo determination of tissue viability during treatment. The non-perfused volume (NPV) acquired three or more days following MRgFUS thermal ablation treatment is most correlated with the gold standard of histology. However, its delayed timing impedes real-time guidance for the treating clinician during the procedure. We present a robust deep-learning framework that leverages multiparametric MR imaging acquired during treatment to predict treatment efficacy. The network uses qualitative T1- and T2-weighted images and MR temperature image-derived metrics to predict the three-day post-ablation NPV. To validate the proposed approach, an ablation study was conducted on a dataset (N=6) of VX2 tumor model rabbits that had undergone MRgFUS ablation. Using a deep learning framework, we evaluated which of the acquired MRI inputs were most predictive of treatment efficacy as compared to the expert radiologist-annotated 3-day post-treatment images.
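
Purely to make the input/output setup concrete, a simplified multi-channel encoder-decoder that maps stacked MR-derived images to a predicted NPV mask might look like the sketch below; the architecture, channel count, and image size are assumptions, not the paper's network:

```python
import torch
import torch.nn as nn

class NPVPredictor(nn.Module):
    """Assumed toy encoder-decoder: input channels are stacked MR-derived
    images (e.g., T1-weighted, T2-weighted, thermal-dose metrics); output is a
    per-pixel probability of belonging to the 3-day non-perfused volume."""
    def __init__(self, in_channels: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.decoder(self.encoder(x)))

x = torch.rand(2, 3, 128, 128)       # batch of 2, three MR-derived channels
print(NPVPredictor()(x).shape)       # -> torch.Size([2, 1, 128, 128])
```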



S. Subramaniam, M. Miller, C.R. Johnson, et al. “Grand Challenges at the Interface of Engineering and Medicine,” In IEEE Open Journal of Engineering in Medicine and Biology, Vol. 5, IEEE, pp. 1--13. 2024.
DOI: 10.1109/OJEMB.2024.3351717

ABSTRACT

Over the past two decades Biomedical Engineering has emerged as a major discipline that bridges societal needs of human health care with the development of novel technologies. Every medical institution is now equipped at varying degrees of sophistication with the ability to monitor human health in both non-invasive and invasive modes. The multiple scales at which human physiology can be interrogated provide a profound perspective on health and disease. We are at the nexus of creating “avatars” (herein defined as an extension of “digital twins”) of human patho/physiology to serve as paradigms for interrogation and potential intervention. Motivated by the emergence of these new capabilities, the IEEE Engineering in Medicine and Biology Society, the Departments of Biomedical Engineering at Johns Hopkins University and Bioengineering at University of California at San Diego sponsored an interdisciplinary workshop to define the grand challenges that face biomedical engineering and the mechanisms to address these challenges. The Workshop identified five grand challenges with cross-cutting themes and provided a roadmap for new technologies, identified new training needs, and defined the types of interdisciplinary teams needed for addressing these challenges. The themes presented in this paper include: 1) accumedicine through creation of avatars of cells, tissues, organs and whole human; 2) development of smart and responsive devices for human function augmentation; 3) exocortical technologies to understand brain function and treat neuropathologies; 4) the development of approaches to harness the human immune system for health and wellness; and 5) new strategies to engineer genomes and cells.



K.M. Sultan, M.H.H. Hisham, B. Orkild, A. Morris, E. Kholmovski, E. Bieging, E. Kwan, R. Ranjan, E. DiBella, S. Elhabian. “HAMIL-QA: Hierarchical Approach to Multiple Instance Learning for Atrial LGE MRI Quality Assessment,” Subtitled “arXiv:2407.07254v1,” 2024.

ABSTRACT

The accurate evaluation of left atrial fibrosis via high-quality 3D Late Gadolinium Enhancement (LGE) MRI is crucial for atrial fibrillation management but is hindered by factors like patient movement and imaging variability. The pursuit of automated LGE MRI quality assessment is critical for enhancing diagnostic accuracy, standardizing evaluations, and improving patient outcomes. The deep learning models aimed at automating this process face significant challenges due to the scarcity of expert annotations, high computational costs, and the need to capture subtle diagnostic details in highly variable images. This study introduces HAMIL-QA, a multiple instance learning (MIL) framework, designed to overcome these obstacles. HAMIL-QA employs a hierarchical bag and sub-bag structure that allows for targeted analysis within sub-bags and aggregates insights at the volume level. This hierarchical MIL approach reduces reliance on extensive annotations, lessens computational load, and ensures clinically relevant quality predictions by focusing on diagnostically critical image features. Our experiments show that HAMIL-QA surpasses existing MIL methods and traditional supervised approaches in accuracy, AUROC, and F1-Score on an LGE MRI scan dataset, demonstrating its potential as a scalable solution for LGE MRI quality assessment automation.
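
A simplified sketch of the hierarchical bag/sub-bag idea: attention pooling first aggregates slice-level features within each sub-bag, then a second attention layer aggregates sub-bag summaries into a volume-level quality prediction. Layer sizes and wiring are assumptions, not the HAMIL-QA implementation:

```python
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    """Attention-based MIL pooling: weighted sum of instance features."""
    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (n_instances, dim)
        w = torch.softmax(self.score(x), dim=0)             # attention weights
        return (w * x).sum(dim=0)                           # (dim,)

class HierarchicalMILQA(nn.Module):
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.subbag_pool = AttentionPool(feat_dim)   # slices -> sub-bag summary
        self.bag_pool = AttentionPool(feat_dim)      # sub-bags -> volume summary
        self.classifier = nn.Linear(feat_dim, 1)     # volume-level quality score

    def forward(self, subbags: list[torch.Tensor]) -> torch.Tensor:
        # subbags: list of (n_slices_i, feat_dim) tensors of slice features.
        summaries = torch.stack([self.subbag_pool(sb) for sb in subbags])
        volume = self.bag_pool(summaries)
        return torch.sigmoid(self.classifier(volume))

# One LGE MRI volume split into 4 sub-bags of slice features (assumed shapes).
subbags = [torch.rand(6, 256) for _ in range(4)]
print(HierarchicalMILQA()(subbags))   # scalar quality probability
```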



X. Tang, B. Zhang, B.S. Knudsen, T. Tasdizen. “DuoFormer: Leveraging Hierarchical Visual Representations by Local and Global Attention,” Subtitled “arXiv:2407.13920,” 2024.

ABSTRACT

Here we propose a novel hierarchical transformer model that adeptly integrates the feature extraction capabilities of Convolutional Neural Networks (CNNs) with the advanced representational potential of Vision Transformers (ViTs). Addressing the lack of inductive biases and dependence on extensive training datasets in ViTs, our model employs a CNN backbone to generate hierarchical visual representations. These representations are then adapted for transformer input through an innovative patch tokenization. We also introduce a 'scale attention' mechanism that captures cross-scale dependencies, complementing patch attention to enhance spatial understanding and preserve global perception. Our approach significantly outperforms baseline models on small and medium-sized medical datasets, demonstrating its efficiency and generalizability. The components are designed as plug-and-play for different CNN architectures and can be adapted for multiple applications.
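
One possible reading of the scale-attention idea (an interpretation with assumed shapes, not the DuoFormer code): tokens drawn from different levels of a CNN feature hierarchy are projected to a shared dimension and attend to one another along the scale axis at each spatial location:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleAttention(nn.Module):
    """Self-attention across the scale dimension of a multi-scale feature pyramid."""
    def __init__(self, in_dims: list[int], dim: int = 256, heads: int = 4):
        super().__init__()
        self.proj = nn.ModuleList([nn.Conv2d(c, dim, kernel_size=1) for c in in_dims])
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        # feats: CNN feature maps at different scales, shapes (B, C_s, H_s, W_s);
        # resample all of them to the coarsest grid.
        target = feats[-1].shape[-2:]
        tokens = [F.adaptive_avg_pool2d(p(f), target) for p, f in zip(self.proj, feats)]
        x = torch.stack(tokens, dim=-1)                  # (B, dim, H, W, S)
        B, D, H, W, S = x.shape
        x = x.permute(0, 2, 3, 4, 1).reshape(B * H * W, S, D)
        out, _ = self.attn(x, x, x)                      # attend across scales
        return out.reshape(B, H, W, S, D).mean(dim=3)    # (B, H, W, dim)

feats = [torch.rand(2, 256, 56, 56), torch.rand(2, 512, 28, 28), torch.rand(2, 1024, 14, 14)]
print(ScaleAttention([256, 512, 1024])(feats).shape)     # torch.Size([2, 14, 14, 256])
```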



X. Tang, J. Berquist, B.A. Steinberg, T. Tasdizen. “Hierarchical Transformer for Electrocardiogram Diagnosis,” Subtitled “arXiv:2411.00755,” 2024.

ABSTRACT

Transformers, originally prominent in NLP and computer vision, are now being adapted for ECG signal analysis. This paper introduces a novel hierarchical transformer architecture that segments the model into multiple stages by assessing the spatial size of the embeddings, thus eliminating the need for additional downsampling strategies or complex attention designs. A classification token aggregates information across feature scales, facilitating interactions between different stages of the transformer. By utilizing depth-wise convolutions in a six-layer convolutional encoder, our approach preserves the relationships between different ECG leads. Moreover, an attention gate mechanism learns associations among the leads prior to classification. This model adapts flexibly to various embedding networks and input sizes while enhancing the interpretability of transformers in ECG signal analysis.
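
A compact sketch of the ingredients named above: a depth-wise 1D convolutional encoder that keeps the leads separate, a learned attention gate over the leads, and a transformer encoder with a classification token. Layer sizes and the exact wiring are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class ECGTransformerSketch(nn.Module):
    def __init__(self, n_leads: int = 12, dim: int = 64, n_classes: int = 5):
        super().__init__()
        # Depth-wise convolutions (groups=n_leads) process each lead separately.
        self.encoder = nn.Sequential(
            nn.Conv1d(n_leads, n_leads * dim, kernel_size=15, stride=4, groups=n_leads),
            nn.ReLU(),
        )
        self.lead_gate = nn.Linear(dim, 1)          # attention gate over leads
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, ecg: torch.Tensor) -> torch.Tensor:   # ecg: (B, n_leads, T)
        B, L, _ = ecg.shape
        feats = self.encoder(ecg)                            # (B, L*dim, T')
        feats = feats.reshape(B, L, -1, feats.shape[-1])     # (B, L, dim, T')
        feats = feats.permute(0, 3, 1, 2)                    # (B, T', L, dim)
        gate = torch.softmax(self.lead_gate(feats), dim=2)   # weights over leads
        tokens = (gate * feats).sum(dim=2)                   # (B, T', dim)
        tokens = torch.cat([self.cls_token.expand(B, -1, -1), tokens], dim=1)
        out = self.transformer(tokens)
        return self.head(out[:, 0])                          # classify from the CLS token

print(ECGTransformerSketch()(torch.rand(2, 12, 2000)).shape)   # torch.Size([2, 5])
```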