Assistant Professor at Mayo Clinic (Department of AI & Informatics). I work on medical image analysis and machine learning.
My research focuses on computational pathology, with particular emphasis on visual search in histopathology, multimodal agentic and generative AI for pathology image analysis, visual generative models, and reinforcement learning to emulate pathologist diagnostic workflows. This is complemented by methodological advances in supervised, self-supervised, and semi- and weakly supervised learning, as well as in diffusion models and reinforcement learning.
The study introduces HeteroTissue-Diffuse, a latent diffusion framework for synthesizing histopathology images that maintain tissue heterogeneity and fine morphological detail. Unlike conventional generative approaches that yield homogeneous samples, this method employs a novel conditioning mechanism and scales to both annotated and unannotated datasets, enabling the creation of realistic, diverse, and annotated synthetic tissue slides.
In this work, we introduce a fast patch selection method (FPS) for efficient selection of representative patches while preserving spatial distribution. HistoRotate is a 360° rotation augmentation for training histopathology models, enhancing learning without compromising contextual information. PathDino is a compact histopathology Transformer with five small vision transformer blocks and ≈9 million parameters.
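As a rough illustration of what a full-circle rotation augmentation involves, the sketch below rotates a small patch by an arbitrary angle about its center using nearest-neighbor sampling. It is not the HistoRotate implementation: the function names `rotate_patch` and `random_rotation`, the nearest-neighbor sampling, and the constant fill for corners that fall outside the source patch are all assumptions made for this example, and the published method may handle borders differently.

```python
import math
import random


def rotate_patch(patch, angle_deg, fill=0):
    """Rotate a 2D patch (list of rows) about its center by angle_deg.

    Hypothetical helper: nearest-neighbor sampling, output has the same
    size as the input, and pixels with no source are set to `fill`.
    """
    h, w = len(patch), len(patch[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = y - cy, x - cx
            # Inverse mapping: locate the source pixel for this output pixel.
            sx = cos_t * dx + sin_t * dy + cx
            sy = -sin_t * dx + cos_t * dy + cy
            ix, iy = int(round(sx)), int(round(sy))
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = patch[iy][ix]
    return out


def random_rotation(patch, rng=random):
    """Apply a rotation drawn uniformly from the full 0-360 degree range."""
    return rotate_patch(patch, rng.uniform(0.0, 360.0))


patch = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
half_turn = rotate_patch(patch, 180)
augmented = random_rotation(patch, random.Random(0))
```

In practice a library routine with interpolation would replace the inner loops; the point here is only that the angle is sampled from the whole circle rather than from multiples of 90°.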
This paper investigates the efficacy of foundation models in histopathology through a detailed comparison between these models, specifically CLIP derivatives (PLIP and BiomedCLIP), and traditional, domain-specific histology models that leverage well-curated datasets. A rigorous evaluation on eight diverse datasets, including four internal datasets from Mayo Clinic and four well-known public datasets (PANDA, BRACS, CAMELYON16, DigestPath), shows that domain-specific models, such as DinoSSLPath and KimiaNet, provide better performance across various metrics, underlining the significance of large, clean datasets for histopathological analyses.
We propose SDM, a novel method for selecting diverse WSI patches, minimizing patch count while capturing all morphological variations. SDM outperforms the state-of-the-art, achieving high representativeness without needing parameter tuning.
We leveraged 3D graphics and computer vision techniques to tackle a real-world problem: object-to-spot rotation estimation, which is of particular significance for intelligent surveillance systems, bike-sharing systems, and smart cities. We introduced a rotation estimator (OSRE) that estimates a parked bike's rotation with respect to its parking area.
We propose a new spatiotemporal attention scheme, termed synchronized spatiotemporal and spatial attention (SSTSA), which derives the spatiotemporal features with temporal and spatial multiheaded self-attention (MSA) modules.
Multimodal learning for video understanding (Text, Audio, RGB, Motion). We present a multimodal learning approach that leverages several modalities and several off-the-shelf models for both audio and language understanding. We propose Irrelevant Modality Dropout (IMD), which drops irrelevant audio from further processing while fusing the relevant audio-visual data for better video understanding.
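The gating idea behind dropping an irrelevant modality can be sketched in a few lines. This is only a toy illustration, not the published IMD module: it assumes the audio and visual embeddings live in a shared space, uses cosine similarity with a fixed `threshold` as a stand-in relevance score, and fuses by concatenation. All names (`cosine`, `fuse_with_modality_dropout`, `threshold`) are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fuse_with_modality_dropout(visual, audio, threshold=0.5):
    """Return (fused_features, audio_kept).

    Assumption for this sketch: relevance is the cosine similarity of the
    two embeddings. Audio below the threshold is replaced by zeros so the
    fused vector keeps a fixed length.
    """
    relevance = cosine(visual, audio)
    if relevance < threshold:
        # Audio judged irrelevant: drop it from further processing.
        return visual + [0.0] * len(audio), False
    # Audio judged relevant: fuse by simple concatenation.
    return visual + audio, True


fused_kept, kept = fuse_with_modality_dropout([1.0, 0.0], [1.0, 0.0])
fused_dropped, dropped_flag = fuse_with_modality_dropout([1.0, 0.0], [0.0, 1.0])
```

A learned relevance predictor would normally take the place of the fixed cosine threshold; the sketch only shows where the drop-or-fuse decision sits in the pipeline.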
This research study addresses the following question: To what extent can a fast independent adaptive algorithm select the most discriminative and representative frames to downsize huge video datasets while improving action recognition performance?
We propose variational representation learning for object re-identification. The proposed method has been evaluated on vehicle re-identification, person re-identification, and face recognition.
For detecting small objects such as pedestrians in outdoor surveillance, we propose a fast, lightweight, auto-zooming-based framework for small pedestrian detection.