Few Exemplar-Based General Medical Image Segmentation via Domain-Aware Selective Adaptation

The research proposes a domain-aware selective adaptation method for medical image segmentation using only a few exemplars. This approach adapts general knowledge from large natural-image models to medical domains, overcoming limitations of existing methods and offering an efficient, LMIC-friendly solution for healthcare diagnostics.
The dimensionality-reduction visualisation map shows the relative relationship and distribution of different datasets in a two-dimensional feature space. The characteristics of the medical datasets (red points) can be contrasted with those of the general natural-image datasets (blue/black points); a clear domain gap is visible in this visualisation.
Main Contributions:
1. We propose, to our knowledge, the first attempt towards adapting general prior knowledge to various medical domains with only a few exemplars.
2. We introduce a new domain-aware selective adaptation approach, which enables simple yet effective fine-tuning of large pre-trained models and boosts their performance in target domains.
3. We identify issues with the use of prompts in current prompt-based medical image segmentation models and propose a coarse prompt setting that better aligns with real-world scenarios.
4. Extensive experiments validate the effectiveness of the proposed method, which achieves state-of-the-art performance under the challenging few-exemplar setting, surpassing existing works by a large margin.
FEMed Architecture

The architecture of our Image Encoder enhanced with two specialized Adapters: (a) the Multi-Scale Features Adapter that captures features at various granularities through pyramid pooling, and (b) the High-Frequency Adapter that emphasizes salient textural details from frequency domain analysis. (c) These Adapters feed into the Selection Module, which uses a trainable binary decision layer to selectively integrate the most informative feature set at each transformer stage, effectively tailoring the feature landscape for optimal tumour delineation in CT/MRI scans.
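To make the selection step concrete, the sketch below gates between two adapter feature maps with a hard binary decision derived from a learned logit. This is a hypothetical illustration of the idea, not the authors' implementation; the function name, shapes, and the simple sign-based binarization are all assumptions.

```python
import numpy as np

def select_adapter_features(feat_multi_scale, feat_high_freq, gate_logit):
    """Pick one of two adapter feature sets via a binary gate.

    Hypothetical sketch of the Selection Module: a learned scalar logit
    is binarized into a hard 0/1 decision, and the chosen adapter output
    is passed on to the corresponding transformer stage.
    """
    gate = 1.0 if gate_logit > 0 else 0.0  # hard binary decision
    return gate * feat_multi_scale + (1.0 - gate) * feat_high_freq

# Toy example: two stand-in adapter outputs for one transformer stage.
a = np.ones((4, 4))   # stand-in for multi-scale adapter features
b = np.zeros((4, 4))  # stand-in for high-frequency adapter features
out = select_adapter_features(a, b, gate_logit=0.7)  # positive logit -> picks `a`
```

In a trainable version the hard decision would need a gradient estimator (e.g. a straight-through trick), but the forward behaviour is as shown.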

Visual Prompt Settings:
The Segment Anything Model (SAM) can accept many types of visual prompts, \ie, scribbles, clicks, or boxes, to segment an arbitrary object within an image, and it demonstrates highly generalized segmentation performance when prompts are supplied during training and testing. This paper focuses on the bounding-box form of prompt. Mainstream approaches to applying SAM for medical image segmentation follow this setting, utilizing prompts in both training and testing. We highlight that previous methods' use of prompts in the medical segmentation domain is not suitable. We categorize prompts into two types: \textbf{fine-grained prompts and coarse prompts}. Fine-grained prompts, as shown in Fig. \ref{compa} A and C, are customarily user-provided or generated from manually annotated results; they are bespoke for each image and provide strong prior knowledge of the target location. Coarse prompts, as illustrated in Fig. \ref{compa} B and D, remain consistent across different images and offer almost no prior knowledge. Note that our definitions of fine-grained and coarse prompts differ from those in [1].

Setting A:
Trained with fine-grained bbox prompts and tested with fine-grained bbox prompts.
Setting B:
Trained with fine-grained bbox prompts and tested with coarse bbox prompts.
Setting C:
Trained with coarse bbox prompts and tested with fine-grained bbox prompts.
Setting D:
Trained with coarse bbox prompts and tested with coarse bbox prompts.

Most SAM adapters for medical image segmentation rely on user-provided prompts or assume prompts generated from segmentation annotations, i.e., the lesion area is already known, a bounding box prompt for that area is given, and the model is expected to segment the lesion accurately within this region (setting A). However, this assumption does not hold in real diagnostic scenarios: for unseen samples, the lesion area is unknown, making it impossible to provide such fine-grained visual prompts. A prompt setting that aligns with real-world applications is therefore setting B or D, where only a coarse bounding box prompt can be provided during inference, for example, a rectangular box almost the same size as the original image. Since setting B uses different prompts for training and testing, its performance is affected. Thus, this paper primarily investigates setting D in Fig. \ref{compa}. It is more challenging than the other settings since no accurate lesion-area information is provided.
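For contrast, a fine-grained prompt in setting A is typically derived from the annotation mask itself, e.g. the tight bounding box around the labelled lesion pixels, which is exactly what is unavailable for unseen samples. A minimal sketch of this derivation (a hypothetical helper, not the authors' code):

```python
import numpy as np

def bbox_from_mask(mask):
    """Tight bounding box (x_min, y_min, x_max, y_max) around the
    nonzero pixels of a binary annotation mask. This is the kind of
    fine-grained prompt that cannot be produced without a ground-truth
    annotation of the lesion."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        raise ValueError("mask contains no lesion pixels")
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Toy 5x5 mask with a "lesion" in rows 1-2, columns 2-3.
mask = np.zeros((5, 5), dtype=np.uint8)
mask[1:3, 2:4] = 1
print(bbox_from_mask(mask))  # -> (2, 1, 3, 2)
```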
  1. Yang, Lingfeng, Wang, Yueze, Li, Xiang, Wang, Xinlong, Yang, Jian. "Fine-grained visual prompting." Advances in Neural Information Processing Systems, vol. 36, 2024.
Prompt Strategy

Four settings of using bbox prompts during training and testing stages. The coarse bounding box prompt is designed to be almost the same size as the input image data, with different ratios indicating the proportion of pixels by which the rectangle is shrunk inward relative to the entire image. A pseudo-code for coarse bbox prompt generation is shown in Algorithm 1 in the paper.
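Based on the description above (a rectangle shrunk inward on each side by a given ratio of the image size), a coarse prompt generator might look like the sketch below. The exact procedure is Algorithm 1 in the paper; treat this as a hedged approximation with an assumed (x_min, y_min, x_max, y_max) box convention.

```python
def coarse_bbox_prompt(width, height, shrink_ratio=0.0):
    """Coarse bounding-box prompt: a rectangle almost as large as the
    image, shrunk inward on every side by `shrink_ratio` of the
    corresponding image dimension. The same box is reused for every
    image, so it carries almost no prior knowledge of the target."""
    dx = int(width * shrink_ratio)
    dy = int(height * shrink_ratio)
    return dx, dy, width - 1 - dx, height - 1 - dy

print(coarse_bbox_prompt(256, 256))        # ratio 0: full-image box -> (0, 0, 255, 255)
print(coarse_bbox_prompt(256, 256, 0.05))  # shrunk by 5% per side -> (12, 12, 243, 243)
```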

Comparison of MedSAM [2], SAM-MED2D [3], and our FEMed method (5-shot, 10-shot). The first column is the input image, the second column shows the ground-truth masks in colour, and the third and fourth columns show the masks predicted by MedSAM and SAM-MED2D. The rightmost two columns show the masks predicted by our FEMed method.

  2. Ma, et al. "Segment Anything in Medical Images with MedSAM." 2024.
  3. Cheng, et al. "SAM-MED2D: Medical Image Segmentation with Segment Anything Model." 2023.