SCI Publications
2024
J. Adams, K. Iyer, S. Elhabian.
Weakly Supervised Bayesian Shape Modeling from Unsegmented Medical Images, Subtitled arXiv:2405.09697v1, 2024.
Anatomical shape analysis plays a pivotal role in clinical research and hypothesis testing, where the relationship between form and function is paramount. Correspondence-based statistical shape modeling (SSM) facilitates population-level morphometrics but requires a cumbersome, potentially bias-inducing construction pipeline. Recent advancements in deep learning have streamlined this process at inference time by predicting SSM directly from unsegmented medical images. However, these approaches are fully supervised and require a traditional SSM construction pipeline to create training data, thus inheriting the associated burdens and limitations. To address these challenges, we introduce a weakly supervised deep learning approach to predict SSM from images using point cloud supervision. Specifically, we propose reducing the supervision associated with the state-of-the-art fully Bayesian variational information bottleneck DeepSSM (BVIB-DeepSSM) model. BVIB-DeepSSM is an effective, principled framework for predicting probabilistic anatomical shapes from images with quantification of both aleatoric and epistemic uncertainties. Whereas the original BVIB-DeepSSM method requires strong supervision in the form of ground truth correspondence points, the proposed approach utilizes weak supervision via point cloud surface representations, which are more readily obtainable. Furthermore, the proposed approach learns correspondence in a completely data-driven manner without prior assumptions about the expected variability in the shape cohort. Our experiments demonstrate that this approach yields similar accuracy and uncertainty estimation to the fully supervised scenario while substantially enhancing the feasibility of model training for SSM construction.
J. Adams, S. Elhabian.
Point2SSM++: Self-Supervised Learning of Anatomical Shape Models from Point Clouds, Subtitled arXiv:2405.09707v1, 2024.
Correspondence-based statistical shape modeling (SSM) stands as a powerful technology for morphometric analysis in clinical research. SSM facilitates population-level characterization and quantification of anatomical shapes such as bones and organs, aiding in pathology and disease diagnostics and treatment planning. Despite its potential, SSM remains under-utilized in medical research due to the significant overhead associated with automatic construction methods, which demand complete, aligned shape surface representations. Additionally, optimization-based techniques rely on bias-inducing assumptions or templates and have prolonged inference times as the entire cohort is simultaneously optimized. To overcome these challenges, we introduce Point2SSM++, a principled, self-supervised deep learning approach that directly learns correspondence points from point cloud representations of anatomical shapes. Point2SSM++ is robust to misaligned and inconsistent input, providing SSM that accurately samples individual shape surfaces while effectively capturing population-level statistics. Additionally, we present principled extensions of Point2SSM++ tailored for dynamic spatiotemporal and multi-anatomy use cases, showcasing the broad versatility of the framework. Through extensive validation across diverse anatomies, evaluation metrics, and clinically relevant downstream tasks, we demonstrate Point2SSM++’s superiority over existing state-of-the-art deep learning models and traditional approaches. Point2SSM++ substantially enhances the feasibility of SSM generation and significantly broadens its array of potential clinical applications.
S.I. Adams-Tew, H. Odéen, D.L. Parker, C.C. Cheng, B. Madore, A. Payne, S. Joshi.
Physics Informed Neural Networks for Estimation of Tissue Properties from Multi-echo Configuration State MRI, In Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024, Springer Nature Switzerland, pp. 502--511. 2024.
This work investigates the use of configuration state imaging together with deep neural networks to develop quantitative MRI techniques for deployment in an interventional setting. A physics modeling technique for inhomogeneous fields and heterogeneous tissues is presented and used to evaluate the theoretical capability of neural networks to estimate parameter maps from configuration state signal data. All tested normalization strategies achieved similar performance in estimating T2 and T2*. Varying network architecture and data normalization had substantial impacts on estimated flip angle and T1, highlighting their importance in developing neural networks to solve these inverse problems. The developed signal modeling technique provides an environment that will enable the development and evaluation of physics-informed machine learning techniques for MR parameter mapping and facilitate the development of quantitative MRI techniques to inform clinical decisions during MR-guided treatments.
T. M. Athawale, B. Triana, T. Kotha, D. Pugmire, P. Rosen.
A Comparative Study of the Perceptual Sensitivity of Topological Visualizations to Feature Variations, In IEEE Transactions on Visualization and Computer Graphics, Vol. 30, No. 1, pp. 1074-1084. Jan, 2024.
DOI: 10.1109/TVCG.2023.3326592
Color maps are a commonly used visualization technique in which data are mapped to optical properties, e.g., color or opacity. Color maps, however, do not explicitly convey structures (e.g., positions and scale of features) within data. Topology-based visualizations reveal and explicitly communicate structures underlying data. Although we have a good understanding of what types of features are captured by topological visualizations, how people perceive those features is not as well understood. This paper evaluates the sensitivity of topology-based isocontour, Reeb graph, and persistence diagram visualizations compared to a reference color map visualization for synthetically generated scalar fields on 2-manifold triangular meshes embedded in 3D. In particular, we built and ran a human-subject study that evaluated the perception of data features characterized by Gaussian signals and measured how effectively each visualization technique portrays variations of data features arising from the position and amplitude variation of a mixture of Gaussians. For positional feature variations, the results showed that only the Reeb graph visualization had high sensitivity. For amplitude feature variations, persistence diagrams and color maps demonstrated the highest sensitivity, whereas isocontours showed only weak sensitivity. These results take an important step toward understanding which topology-based tools are best for various data and task scenarios and their effectiveness in conveying topological variations as compared to conventional color mapping.
T.M. Athawale, Z. Wang, D. Pugmire, K. Moreland, Q. Gong, S. Klasky, C.R. Johnson, P. Rosen.
Uncertainty Visualization of Critical Points of 2D Scalar Fields for Parametric and Nonparametric Probabilistic Models, In IEEE Transactions on Visualization and Computer Graphics, IEEE, pp. 1--11. 2024.
This paper presents a novel end-to-end framework for closed-form computation and visualization of critical point uncertainty in 2D uncertain scalar fields. Critical points are fundamental topological descriptors used in the visualization and analysis of scalar fields. The uncertainty inherent in data (e.g., observational and experimental data, approximations in simulations, and compression), however, creates uncertainty regarding critical point positions. Uncertainty in critical point positions, therefore, cannot be ignored, given their impact on downstream data analysis tasks. In this work, we study uncertainty in critical points as a function of uncertainty in data modeled with probability distributions. Although Monte Carlo (MC) sampling techniques have been used in prior studies to quantify critical point uncertainty, they are often expensive and are infrequently used in production-quality visualization software. We, therefore, propose a new end-to-end framework to address these challenges that comprises a threefold contribution. First, we derive the critical point uncertainty in closed form, which is more accurate and efficient than the conventional MC sampling methods. Specifically, we provide the closed-form and semianalytical (a mix of closed-form and MC methods) solutions for parametric (e.g., uniform, Epanechnikov) and nonparametric models (e.g., histograms) with finite support. Second, we accelerate critical point probability computations using a parallel implementation with the VTK-m library, which is platform portable. Finally, we demonstrate the integration of our implementation with the ParaView software system to demonstrate near-real-time results for real datasets.
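The closed-form-versus-Monte-Carlo comparison at the heart of this framework can be illustrated with a toy calculation. The sketch below is ours, not the paper's implementation: it assumes independent uniform noise at a grid vertex and its four neighbors (all distribution parameters invented for illustration) and compares a quadrature evaluation of the local-minimum probability against plain Monte Carlo sampling.

```python
import numpy as np

rng = np.random.default_rng(0)

# Probability that a grid vertex is a local minimum when each scalar value
# is an independent uniform random variable (illustrative parameters only).
mu_c, h_c = 0.0, 0.5                       # center vertex: U(mu_c - h_c, mu_c + h_c)
mu_n = np.array([0.2, 0.3, 0.1, 0.4])      # neighbor means (hypothetical)
h_n = np.full(4, 0.5)                      # neighbor half-widths

def cdf(x, mu, h):
    """CDF of U(mu - h, mu + h), clipped outside the support."""
    return np.clip((x - mu + h) / (2 * h), 0.0, 1.0)

# Closed-form-style evaluation: P = E_x[ prod_n P(X_n > x) ], integrated
# over the center vertex's support with a simple quadrature rule.
xs = np.linspace(mu_c - h_c, mu_c + h_c, 20001)
pdf = 1.0 / (2 * h_c)
survival = np.prod([1.0 - cdf(xs, m, h) for m, h in zip(mu_n, h_n)], axis=0)
p_quad = np.sum(pdf * survival) * (xs[1] - xs[0])

# Monte Carlo baseline, the approach the paper's closed form replaces.
n = 200_000
center = rng.uniform(mu_c - h_c, mu_c + h_c, n)
nbrs = rng.uniform(mu_n - h_n, mu_n + h_n, (n, 4))
p_mc = (center < nbrs.min(axis=1)).mean()

print(f"quadrature: {p_quad:.4f}   Monte Carlo: {p_mc:.4f}")
```

The quadrature answer is deterministic and reusable across the field, whereas the Monte Carlo estimate carries sampling noise that shrinks only as the sample count grows, which is the efficiency gap the paper exploits.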
B. Aubert, N. Khan, F. Toupin, M. Pacheco, A. Morris.
Deformable Vertebra 3D/2D Registration from Biplanar X-Rays Using Particle-Based Shape Modelling, In Shape in Medical Imaging, Springer Nature Switzerland, pp. 33--47. 2024.
ISBN: 978-3-031-75291-9
Patient-specific 3D vertebra models are essential for quantitatively assessing spinal deformities in 3D and for surgical planning, including determining the optimal implant size and 3D positioning. Calibrated biplanar X-rays serve as an alternative to CT scans for generating 3D models in a weight-bearing standing position. This paper presents an intensity-based 3D/2D registration method for a vertebra statistical shape model (VSSM), incorporating two key elements: particle-based shape modeling and an image domain transfer for efficient image matching. In the 3D/3D setting, the VSSMs reach a surface reconstruction error of less than 0.5 mm. For 3D reconstruction from biplanar X-rays, the root mean square point-to-surface errors are 1.05 mm for the L1 to L4 vertebrae and 1.6 mm for the L5 vertebra. The particle-based VSSMs strike a favorable balance between model compactness and reconstruction error, which is advantageous for deformable 3D/2D registration.
A.Z.B. Aziz, M.S.T. Karanam, T. Kataria, S.Y. Elhabian.
EfficientMorph: Parameter-Efficient Transformer-Based Architecture for 3D Image Registration, Subtitled arXiv preprint arXiv:2403.11026, 2024.
Transformers have emerged as the state-of-the-art architecture in medical image registration, outperforming convolutional neural networks (CNNs) by addressing their limited receptive fields and overcoming gradient instability in deeper models. Despite their success, transformer-based models require substantial resources for training, including data, memory, and computational power, which may restrict their applicability for end users with limited resources. In particular, existing transformer-based 3D image registration architectures face three critical gaps that challenge their efficiency and effectiveness. Firstly, while mitigating the quadratic complexity of full attention by focusing on local regions, window-based attention mechanisms often fail to adequately integrate local and global information. Secondly, feature similarities across attention heads that were recently found in multi-head attention architectures indicate a significant computational redundancy, suggesting that the capacity of the network could be better utilized to enhance performance. Lastly, the granularity of tokenization, a key factor in registration accuracy, presents a trade-off; smaller tokens improve detail capture at the cost of higher computational complexity, increased memory demands, and a risk of overfitting. Here, we propose EfficientMorph, a transformer-based architecture for unsupervised 3D image registration. It optimizes the balance between local and global attention through a plane-based attention mechanism, reduces computational redundancy via cascaded group attention, and captures fine details without compromising computational efficiency, thanks to a Hi-Res tokenization strategy complemented by merging operations. We compare the effectiveness of EfficientMorph on two public datasets, OASIS and IXI, against other state-of-the-art models. Notably, EfficientMorph sets a new benchmark for performance on the OASIS dataset with ∼16-27× fewer parameters.
Z. Bastiani, R.M. Kirby, J. Hochhalter, S. Zhe.
Complexity-Aware Deep Symbolic Regression with Robust Risk-Seeking Policy Gradients, Subtitled arXiv:2406.06751, 2024.
This paper proposes a novel deep symbolic regression approach to enhance the robustness and interpretability of data-driven mathematical expression discovery. Despite its success, the state-of-the-art method, DSR, is built on recurrent neural networks, is purely guided by data fitness, and can meet tail barriers, which zero out the policy gradient and cause inefficient model updates. To overcome these limitations, we use transformers in conjunction with breadth-first search to improve the learning performance. We use the Bayesian information criterion (BIC) as the reward function to explicitly account for expression complexity and optimize the trade-off between interpretability and data fitness. We propose a modified risk-seeking policy that not only ensures the unbiasedness of the gradient but also removes the tail barriers, thus ensuring effective updates from top performers. Through a series of benchmarks and systematic experiments, we demonstrate the advantages of our approach.
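For concreteness, here is one way a BIC-based reward could look in code. This is a hedged sketch of ours, not the paper's implementation: it assumes Gaussian residuals (so the likelihood term reduces to n·ln(SSE/n) up to a constant), and the squashing of BIC into a bounded reward is one arbitrary choice among many.

```python
import numpy as np

def bic_reward(y_true, y_pred, num_params):
    """Hypothetical BIC-based reward: lower BIC -> higher reward.

    Assumes Gaussian residuals, so -2 ln(L_hat) reduces to n * ln(SSE / n)
    up to an additive constant that does not affect rankings.
    """
    n = len(y_true)
    sse = float(np.sum((y_true - y_pred) ** 2))
    bic = num_params * np.log(n) + n * np.log(max(sse / n, 1e-12))
    # Squash to (0, 1); dividing by n keeps the exponent numerically tame.
    return 1.0 / (1.0 + np.exp(bic / n))

# Toy comparison: a 2-parameter linear fit vs. a 5-parameter quartic fit of
# noisy linear data. BIC penalizes the extra parameters of the quartic.
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 100)
y = 2 * x + 0.1 * rng.standard_normal(100)
lin = np.polyval(np.polyfit(x, y, 1), x)
qua = np.polyval(np.polyfit(x, y, 4), x)
print(bic_reward(y, lin, 2), bic_reward(y, qua, 5))  # linear scores higher
```

Because the complexity penalty k·ln(n) grows with the number of expression parameters, a near-tie in data fitness resolves in favor of the simpler expression, which is the interpretability trade-off the abstract describes.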
J.W. Beiriger, W. Tao, Z. Irgebay, J. Smetona, L. Dvoracek, N. Kass, A. Dixon, C. Zhang, M. Mehta, R. Whitaker, J. Goldstein.
A Longitudinal Analysis of Pre-and Post-Operative Dysmorphology in Metopic Craniosynostosis, In The Cleft Palate Craniofacial Journal, Sage, 2024.
DOI: 10.1177/10556656241237605
C.C. Berggren, D. Jiang, Y.F. Wang, J.A. Bergquist, L. Rupp, Z. Liu, R.S. MacLeod, A. Narayan, L. Timmins.
Influence of Material Parameter Variability on the Predicted Coronary Artery Biomechanical Environment via Uncertainty Quantification, Subtitled arXiv preprint arXiv:2401.15047, 2024.
Central to the clinical adoption of patient-specific modeling strategies is demonstrating that simulation results are reliable and safe. Indeed, simulation frameworks must be robust to uncertainty in model input(s), and levels of confidence should accompany results. In this study, we applied a coupled uncertainty quantification-finite element (FE) framework to understand the impact of uncertainty in vascular material properties on variability in predicted stresses. Univariate probability distributions were fit to material parameters derived from layer-specific mechanical behavior testing of human coronary tissue. Parameters were assumed to be probabilistically independent, allowing for efficient parameter ensemble sampling. In an idealized coronary artery geometry, a forward FE model for each parameter ensemble was created to predict tissue stresses under physiologic loading. An emulator was constructed within the UncertainSCI software using polynomial chaos techniques, and statistics and sensitivities were directly computed. Results demonstrated that material parameter uncertainty propagates to variability in predicted stresses across the vessel wall, with the largest dispersions in stress within the adventitial layer. Variability in stress was most sensitive to uncertainties in the anisotropic component of the strain energy function. Moreover, unary and binary interactions within the adventitial layer were the main contributors to stress variance, and the leading factor in stress variability was uncertainty in the stress-like material parameter that describes the contribution of the embedded fibers to the overall artery stiffness. Results from a patient-specific coronary model confirmed many of these findings. Collectively, these data highlight the impact of material property variation on uncertainty in predicted artery stresses and present a pipeline to explore and characterize forward model uncertainty in computational biomechanics.
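The emulator-based propagation step can be sketched generically. The code below is a minimal 1D polynomial chaos example of ours using a Legendre basis and least squares; it does not use UncertainSCI's actual API, and the toy stress model and parameter distribution are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def stress(c):
    """Hypothetical forward model: stress as a function of a normalized
    material parameter c (stands in for an expensive FE solve)."""
    return 10.0 + 3.0 * c + 0.5 * c ** 2

# Sample the uncertain parameter on its normalized domain U(-1, 1).
xi = rng.uniform(-1, 1, 200)

# Fit polynomial chaos coefficients by least squares in a Legendre basis,
# which is orthogonal with respect to the uniform density on [-1, 1].
V = np.polynomial.legendre.legvander(xi, 4)
coef = np.linalg.lstsq(V, stress(xi), rcond=None)[0]

# For Legendre polynomials under U(-1, 1), E[P_k] = 0 for k >= 1 and
# E[P_k^2] = 1 / (2k + 1), so mean and variance come straight from coef.
norms = 1.0 / (2 * np.arange(5) + 1)
mean = coef[0]
var = np.sum(coef[1:] ** 2 * norms[1:])
print(f"PCE mean: {mean:.3f}, variance: {var:.3f}")
```

Once the coefficients are in hand, statistics (and, with a little more bookkeeping, Sobol-style sensitivities like those reported in the study) are read directly off the expansion instead of being re-estimated by repeated forward solves.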
J.A. Bergquist, B. Zenger, J. Brundage, R.S. MacLeod, T.J. Bunch, R. Shah, X. Ye, A. Lyons, M. Torre, R. Ranjan, T. Tasdizen, B.A. Steinberg.
Performance of Off-the-Shelf Machine Learning Architectures and Biases in Low Left Ventricular Ejection Fraction Detection, In Heart Rhythm O2, Vol. 5, No. 9, pp. 644 - 654. 2024.
K. Borkiewicz, E. Jensen, Y. Miao, S. Levy, J.P. Naiman, J. Carpenter, K.E. Isaacs.
Audience Reach of Scientific Data Visualizations in Planetarium-Screened Films, 2024.
Quantifying the global reach of planetarium dome shows presents significant challenges due to the lack of standardized viewership tracking mechanisms across diverse planetarium venues. We analyze the global impact of dome shows, presenting data on four documentary films from a single visualization lab. Specifically, we designed and administered a viewership survey of four long-running shows that contained cinematic scientific visualizations. Reported survey data show that between 1.2 and 2.6 million people have viewed these four films across the 68 responding planetariums (mean: 1.9 million). When we include estimates and extrapolate for the 315 planetariums that licensed these shows, we arrive at an estimate of 16.5 to 24.1 million people having seen these films (mean: 20.3 million).
O. Cankur, A. Tomar, D. Nichols, C. Scully-Allison, K. Isaacs, A. Bhatele.
Automated Programmatic Performance Analysis of Parallel Programs, Subtitled arXiv:2401.13150v1, 2024.
Developing efficient parallel applications is critical to advancing scientific development but requires significant performance analysis and optimization. Performance analysis tools help developers manage the increasing complexity and scale of performance data, but often rely on the user to manually explore low-level data and are rigid in how the data can be manipulated. We propose a Python-based API, Chopper, which provides high-level and flexible performance analysis for both single and multiple executions of parallel applications. Chopper facilitates performance analysis and reduces developer effort by providing configurable high-level methods for common performance analysis tasks such as calculating load imbalance, hot paths, scalability bottlenecks, correlation between metrics and CCT nodes, and causes of performance variability within a robust and mature Python environment that provides fluid access to lower-level data manipulations. We demonstrate how Chopper allows developers to quickly and succinctly explore performance and identify issues across applications such as AMG, Laghos, LULESH, Quicksilver and Tortuga.
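To give a flavor of the kind of analysis Chopper packages into a single high-level call, the snippet below computes a standard load-imbalance metric (per-node maximum over ranks divided by the mean over ranks) from toy per-rank timings using plain pandas. It illustrates the computation only; it is not Chopper's actual API, and the data are invented.

```python
import pandas as pd

# Rows are (call-tree node, MPI rank, exclusive time) triples, the shape of
# data a profiler would produce for a 3-rank run (values are made up).
df = pd.DataFrame({
    "node": ["solve", "solve", "solve", "exchange", "exchange", "exchange"],
    "rank": [0, 1, 2, 0, 1, 2],
    "time": [1.00, 1.05, 0.98, 0.20, 0.90, 0.25],
})

# A common imbalance metric: max over ranks divided by mean over ranks.
# A value near 1.0 means well balanced; larger values flag hotspots.
g = df.groupby("node")["time"]
imbalance = (g.max() / g.mean()).sort_values(ascending=False)
print(imbalance)  # 'exchange' stands out (~2.0) versus 'solve' (~1.04)
```

An API like Chopper would wrap this pattern, and analogues for hot paths, scalability, and variability, behind configurable methods so that the developer reasons about questions rather than groupby mechanics.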
A.M. Chalifoux, L. Gibb, K.N. Wurth, T. Tenner, T. Tasdizen, L. MacDonald.
Morphology of uranium oxides reduced from magnesium and sodium diuranate, In Radiochimica Acta, Vol. 112, No. 2, pp. 73-84. 2024.
Morphological analysis of uranium materials has proven to be a key signature for nuclear forensic purposes. This study examines the morphological changes to magnesium diuranate (MDU) and sodium diuranate (SDU) during reduction in a 10% hydrogen atmosphere with and without steam present. Impurity concentrations of the materials were also examined pre- and post-reduction using energy dispersive X-ray spectroscopy combined with scanning electron microscopy (SEM-EDX). The structures of the MDU, SDU, and UOx samples were analyzed using powder X-ray diffraction (p-XRD). Using this method, UOx from MDU was found to be a mixture of UO2, U4O9, and MgU2O6, while UOx from SDU was a combination of UO2, U4O9, U3O8, and UO3. By SEM, the MDU and UOx from MDU had identical morphologies composed of large agglomerates of rounded particles in an irregular pattern. SEM-EDX revealed pockets of high U and high Mg content distributed throughout the materials. The SDU and UOx from SDU had slightly different morphologies. The SDU consisted of massive agglomerates of platy sheets with rough surfaces. The UOx from SDU was composed of massive agglomerates of acicular and sub-rounded particles that appeared slightly sintered. Backscatter images of SDU and related UOx materials showed sub-rounded dark spots indicating areas of high Na content, especially in UOx materials created in the presence of steam. SEM-EDX confirmed the presence of high sodium concentration spots in the SDU and UOx from SDU. Elemental compositions were found not to change from pre- to post-reduction of MDU and SDU, indicating that reduction with or without steam does not affect Mg or Na concentrations. The identification of Mg and Na impurities using SEM analysis presents a readily accessible tool in nuclear material analysis, with high Mg and Na impurities likely indicating processing via MDU or SDU, respectively. Machine learning using convolutional neural networks (CNNs) found that the MDU and SDU had unique morphologies compared to previous publications and that there are distinguishing features between materials created with and without steam.
N. Cheng, O.A. Malik, Y. Xu, S. Becker, A. Doostan, A. Narayan.
Subsampling of Parametric Models with Bifidelity Boosting, In Journal on Uncertainty Quantification, ACM, 2024.
Least squares regression is a ubiquitous tool for building emulators (a.k.a. surrogate models) of problems across science and engineering for purposes such as design space exploration and uncertainty quantification. When the regression data are generated using an experimental design process (e.g., a quadrature grid) involving computationally expensive models, or when the data size is large, sketching techniques have shown promise at reducing the cost of the construction of the regression model while ensuring accuracy comparable to that of the full data. However, random sketching strategies, such as those based on leverage scores, lead to regression errors that are random and may exhibit large variability. To mitigate this issue, we present a novel boosting approach that leverages cheaper, lower-fidelity data of the problem at hand to identify the best sketch among a set of candidate sketches. This in turn specifies the sketch of the intended high-fidelity model and the associated data. We provide theoretical analyses of this bifidelity boosting (BFB) approach and discuss the conditions the low- and high-fidelity data must satisfy for a successful boosting. In doing so, we derive a bound on the residual norm of the BFB sketched solution relating it to its ideal, but computationally expensive, high-fidelity boosted counterpart. Empirical results on both manufactured and PDE data corroborate the theoretical analyses and illustrate the efficacy of the BFB solution in reducing the regression error, as compared to the nonboosted solution.
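The boosting mechanism is simple enough to sketch: score several candidate row-subsamples on cheap low-fidelity data, then reuse the winner for the expensive high-fidelity regression. The toy below is ours, not the paper's algorithm; it uses uniform random subsampling rather than leverage scores, and the synthetic data and noise levels are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression problem standing in for an expensive parametric model.
n, d, m, n_candidates = 2000, 10, 60, 8
A = rng.standard_normal((n, d))                      # design matrix
x_true = rng.standard_normal(d)
b_hi = A @ x_true + 0.01 * rng.standard_normal(n)    # high-fidelity data
b_lo = b_hi + 0.1 * rng.standard_normal(n)           # assumed cheap surrogate

def residual(rows, b):
    """Full-data residual of the least squares fit on a row subsample."""
    x = np.linalg.lstsq(A[rows], b[rows], rcond=None)[0]
    return np.linalg.norm(A @ x - b)

# Bifidelity boosting, schematically: rank candidate sketches on the
# low-fidelity data, then solve only the winning sketch at high fidelity.
candidates = [rng.choice(n, size=m, replace=False) for _ in range(n_candidates)]
best = min(candidates, key=lambda rows: residual(rows, b_lo))
x_bfb = np.linalg.lstsq(A[best], b_hi[best], rcond=None)[0]
print("BFB parameter error:", np.linalg.norm(x_bfb - x_true))
```

The point of the construction is that only m high-fidelity rows are ever used in the final solve; all the candidate comparisons are paid for at the low-fidelity price, which is exactly the condition the paper's theory formalizes.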
Y. Chen, Y. Ji, A. Narayan, Z. Xu.
TGPT-PINN: Nonlinear model reduction with transformed GPT-PINNs, Subtitled arXiv preprint arXiv:2403.03459, 2024.
We introduce the Transformed Generative Pre-Trained Physics-Informed Neural Networks (TGPT-PINN) for accomplishing nonlinear model order reduction (MOR) of transport-dominated partial differential equations in an MOR-integrating PINNs framework. Building on the recent development of the GPT-PINN that is a network-of-networks design achieving snapshot-based model reduction, we design and test a novel paradigm for nonlinear model reduction that can effectively tackle problems with parameter-dependent discontinuities. Through incorporation of a shock-capturing loss function component as well as a parameter-dependent transform layer, the TGPT-PINN overcomes the limitations of linear model reduction in the transport-dominated regime. We demonstrate this new capability for nonlinear model reduction in the PINNs framework by several nontrivial parametric partial differential equations.
M. Cooley, S. Zhe, R.M. Kirby, V. Shankar.
Polynomial-Augmented Neural Networks (PANNs) with Weak Orthogonality Constraints for Enhanced Function and PDE Approximation, Subtitled arXiv preprint arXiv:2406.02336, 2024.
We present polynomial-augmented neural networks (PANNs), a novel machine learning architecture that combines deep neural networks (DNNs) with a polynomial approximant. PANNs combine the strengths of DNNs (flexibility and efficiency in higher-dimensional approximation) with those of polynomial approximation (rapid convergence rates for smooth functions). To aid in both stable training and enhanced accuracy over a variety of problems, we present (1) a family of orthogonality constraints that impose mutual orthogonality between the polynomial and the DNN within a PANN; (2) a simple basis pruning approach to combat the curse of dimensionality introduced by the polynomial component; and (3) an adaptation of a polynomial preconditioning strategy to both DNNs and polynomials. We test the resulting architecture for its polynomial reproduction properties, ability to approximate both smooth functions and functions of limited smoothness, and as a method for the solution of partial differential equations (PDEs). Through these experiments, we demonstrate that PANNs offer superior approximation properties to DNNs for both regression and the numerical solution of PDEs, while also offering enhanced accuracy over both polynomial and DNN-based regression (each) when regressing functions with limited smoothness.
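A minimal rendition of the architecture for a 1D regression problem, as we read the abstract (our sketch, not the authors' code): the model output is the sum of a polynomial term and a small MLP, and a penalty on the batch inner product of the two components stands in for the paper's family of orthogonality constraints.

```python
import torch

class PANN(torch.nn.Module):
    """Toy polynomial-augmented network: polynomial + MLP, summed."""
    def __init__(self, degree=5, width=32):
        super().__init__()
        self.coeffs = torch.nn.Parameter(torch.zeros(degree + 1))
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(1, width), torch.nn.Tanh(),
            torch.nn.Linear(width, width), torch.nn.Tanh(),
            torch.nn.Linear(width, 1),
        )

    def components(self, x):
        # Monomial basis; a production version would use an orthogonal basis.
        powers = torch.stack(
            [x.squeeze(-1) ** k for k in range(len(self.coeffs))], dim=-1)
        return powers @ self.coeffs, self.mlp(x).squeeze(-1)

    def forward(self, x):
        poly, net = self.components(x)
        return poly + net

model = PANN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.linspace(-1, 1, 256).unsqueeze(-1)
y = torch.sin(4 * x.squeeze(-1)) + x.squeeze(-1) ** 3  # toy target

for step in range(2000):
    opt.zero_grad()
    poly, net = model.components(x)
    mse = torch.mean((poly + net - y) ** 2)
    ortho = torch.mean(poly * net) ** 2  # weak orthogonality penalty (one choice)
    (mse + ortho).backward()
    opt.step()
```

The penalty discourages the MLP from re-learning what the polynomial already captures, so the polynomial handles the smooth bulk of the target and the network is free to model the remainder.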
M. Cooley, R.M. Kirby, S. Zhe, V. Shankar.
HyResPINNs: Adaptive Hybrid Residual Networks for Learning Optimal Combinations of Neural and RBF Components for Physics-Informed Modeling, Subtitled arXiv:2410.03573, 2024.
Physics-informed neural networks (PINNs) are an increasingly popular class of techniques for the numerical solution of partial differential equations (PDEs), where neural networks are trained using loss functions regularized by relevant PDE terms to enforce physical constraints. We present a new class of PINNs called HyResPINNs, which augment traditional PINNs with adaptive hybrid residual blocks that combine the outputs of a standard neural network and a radial basis function (RBF) network. A key feature of our method is the inclusion of adaptive combination parameters within each residual block, which dynamically learn to weigh the contributions of the neural network and RBF network outputs. Additionally, adaptive connections between residual blocks allow for flexible information flow throughout the network. We show that HyResPINNs are more robust to training point locations and neural network architectures than traditional PINNs. Moreover, HyResPINNs offer orders of magnitude greater accuracy than competing methods on certain problems, with only modest increases in training costs. We demonstrate the strengths of our approach on challenging PDEs, including the Allen-Cahn equation and the Darcy-Flow equation. Our results suggest that HyResPINNs effectively bridge the gap between traditional numerical methods and modern machine learning-based solvers.
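As a companion to the description above, here is one plausible shape for a single adaptive hybrid residual block, inferred from the abstract rather than taken from the authors' code: an MLP branch and a Gaussian RBF branch blended by a learnable combination parameter inside a residual connection. All layer sizes are invented.

```python
import torch

class HybridResidualBlock(torch.nn.Module):
    """Toy HyResPINN-style block: residual sum of MLP and RBF branches,
    weighted by a learnable, per-block combination parameter."""
    def __init__(self, dim=32, n_centers=32):
        super().__init__()
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(dim, dim), torch.nn.Tanh(),
            torch.nn.Linear(dim, dim))
        self.centers = torch.nn.Parameter(torch.randn(n_centers, dim))
        self.log_gamma = torch.nn.Parameter(torch.zeros(n_centers))
        self.rbf_out = torch.nn.Linear(n_centers, dim)
        self.alpha = torch.nn.Parameter(torch.tensor(0.5))  # adaptive mix

    def forward(self, h):
        d2 = torch.cdist(h, self.centers) ** 2          # squared distances
        rbf = self.rbf_out(torch.exp(-self.log_gamma.exp() * d2))
        a = torch.sigmoid(self.alpha)                    # keep mix in (0, 1)
        return h + a * self.mlp(h) + (1 - a) * rbf      # residual connection

block = HybridResidualBlock()
h = torch.randn(16, 32)
print(block(h).shape)  # torch.Size([16, 32])
```

Because alpha is trained alongside everything else, each block can drift toward whichever branch, smooth global network or localized RBF kernels, best fits its part of the solution, which is the adaptivity the abstract emphasizes.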
H. Csala, A. Mohan, D. Livescu, A. Arzani.
Physics-constrained coupled neural differential equations for one dimensional blood flow modeling, Subtitled arXiv:2411.05631, 2024.
Computational cardiovascular flow modeling plays a crucial role in understanding blood flow dynamics. While 3D models provide fine-grained detail, they are computationally expensive, especially with fluid-structure interaction (FSI) simulations. 1D models offer a computationally efficient alternative by simplifying the 3D Navier-Stokes equations through an axisymmetric flow assumption and cross-sectional averaging. However, traditional 1D models based on finite element methods (FEM) often lack accuracy compared to 3D averaged solutions. This study introduces a novel physics-constrained machine learning technique that enhances the accuracy of 1D cardiovascular flow models while maintaining computational efficiency. Our approach, utilizing a physics-constrained coupled neural differential equation (PCNDE) framework, demonstrates superior performance compared to conventional FEM-based 1D models across a wide range of inlet boundary condition waveforms and stenosis blockage ratios. A key innovation lies in the spatial formulation of the momentum conservation equation, departing from the traditional temporal approach and capitalizing on the inherent temporal periodicity of blood flow. This spatial neural differential equation formulation swaps the roles of space and time and overcomes issues related to coupling stability and smoothness, while simplifying boundary condition implementation. The model accurately captures flow rate, area, and pressure variations for unseen waveforms and geometries. We evaluate the model’s robustness to input noise and explore the loss landscapes associated with the inclusion of different physics terms. This advanced 1D modeling technique offers promising potential for rapid cardiovascular simulations, achieving computational efficiency and accuracy. By combining the strengths of physics-based and data-driven modeling, this approach enables fast and accurate cardiovascular simulations.
H. Dai, S. Joshi.
Refining Skewed Perceptions in Vision-Language Models through Visual Representations, Subtitled arXiv preprint arXiv:2405.14030, 2024.
Large vision-language models (VLMs), such as CLIP, have become foundational, demonstrating remarkable success across a variety of downstream tasks. Despite their advantages, these models, akin to other foundational systems, inherit biases from the disproportionate distribution of real-world data, leading to misconceptions about the actual environment. Prevalent datasets like ImageNet are often riddled with non-causal, spurious correlations that can diminish VLM performance in scenarios where these contextual elements are absent. This study presents an investigation into how a simple linear probe can effectively distill task-specific core features from CLIP’s embedding for downstream applications. Our analysis reveals that CLIP’s text representations are often tainted by spurious correlations inherited from the biased pre-training dataset. Empirical evidence suggests that relying on visual representations from CLIP, as opposed to text embeddings, is more practical for refining the skewed perceptions in VLMs, emphasizing the superior utility of visual representations in overcoming embedded biases.
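A linear probe in the sense used here is simply a linear classifier fit on frozen embeddings. The toy below shows the mechanics only, not the paper's experimental setup: random vectors stand in for CLIP image features, and the binary labels are synthesized so the probe has something learnable to find.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in for frozen CLIP image embeddings (real usage would encode images
# with a pretrained model and never update its weights).
emb = rng.standard_normal((500, 512))
# Synthetic task: the label depends on one embedding direction plus noise.
labels = (emb[:, 0] + 0.1 * rng.standard_normal(500) > 0).astype(int)

# The "probe" is just a linear classifier on top of the frozen features.
probe = LogisticRegression(max_iter=1000).fit(emb[:400], labels[:400])
print("probe accuracy:", probe.score(emb[400:], labels[400:]))
```

Because the probe has no capacity to transform the features, its accuracy directly measures what the frozen representation linearly encodes, which is why the paper can use probes to compare the bias content of CLIP's visual versus text embeddings.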