More for Less: Compact Convolutional Transformers Enable Robust Medical Image Classification with Limited Data

Questions

1. What is the main limitation of transformers in medical imaging?

2. What is the BloodMNIST dataset and how was it used in the study?

3. What was the classification accuracy on the test set?

4. How do Compact Convolutional Transformers (CCT) perform in image classification tasks?

Accepted Answer

The main limitation of transformers in medical imaging is their need for large amounts of training data. Large, high-quality labeled datasets are difficult and expensive to obtain in medical imaging because they require expert annotation and must protect patient privacy. This challenge has led researchers to propose hybrid models that combine the strengths of CNNs and transformers, such as the Compact Convolutional Transformer (CCT). CCTs tokenize images with convolutional layers and then process the resulting tokens with transformer layers, achieving high accuracy on image classification tasks with modestly sized datasets. However, the efficacy of CCTs in medical image classification with limited data has not been thoroughly investigated, which leaves a critical gap in the field. This study aims to fill that gap by evaluating CCTs on a benchmark dataset of peripheral blood cell images in which each cell type is represented by approximately 2,000 low-resolution samples. The results provide insight into the potential of CCTs as a solution to the data scarcity problem in medical imaging.
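To make the idea of convolutional tokenization concrete, here is a minimal NumPy sketch, not the study's implementation. It assumes a 28x28 RGB input, a single 3x3 convolution with 64 random filters, and 2x2 non-overlapping max pooling; all of these sizes and the function names are illustrative choices that the answer above does not specify.

```python
import numpy as np

def conv2d_same(image, kernels):
    """Stride-1 convolution with zero 'same' padding (odd kernel sizes only).
    image: (H, W, C_in); kernels: (kh, kw, C_in, C_out)."""
    kh, kw, _, c_out = kernels.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw), (0, 0)))
    h, w, _ = image.shape
    out = np.zeros((h, w, c_out), dtype=np.float32)
    for i in range(kh):
        for j in range(kw):
            # (H, W, C_in) @ (C_in, C_out) -> (H, W, C_out)
            out += padded[i:i + h, j:j + w] @ kernels[i, j]
    return out

def max_pool(features, size=2):
    """Non-overlapping max pooling; H and W must be divisible by size."""
    h, w, c = features.shape
    return features.reshape(h // size, size, w // size, size, c).max(axis=(1, 3))

def conv_tokenize(image, kernels):
    """Convolution + ReLU + pooling, then flatten the grid into a token sequence."""
    features = np.maximum(conv2d_same(image, kernels), 0.0)  # ReLU
    pooled = max_pool(features)
    return pooled.reshape(-1, pooled.shape[-1])  # (num_tokens, embed_dim)

rng = np.random.default_rng(0)
image = rng.random((28, 28, 3), dtype=np.float32)            # assumed input size
kernels = rng.normal(size=(3, 3, 3, 64)).astype(np.float32)  # hypothetical filters
tokens = conv_tokenize(image, kernels)
print(tokens.shape)  # (196, 64): a 14x14 grid of 64-dimensional tokens
```

Each token summarizes a small overlapping neighborhood of the image rather than a fixed, non-overlapping patch. The resulting sequence is what the transformer layers would then consume.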

Accepted Answer

The BloodMNIST dataset is a subset of the MedMNIST benchmark dataset collection, composed of microscopic images of eight types of blood cells. The original source of the data is a data brief by Acevedo et al. titled 'A dataset of microscopic peripheral blood cell images for development of automatic recognition systems' [24]. The images were captured using the CellaVision DM96 analyzer at the Core Laboratory at the Hospital Clinic of Barcelona in Spain. The dataset contains images of individual normal cells, obtained from individuals who were free of any infection, hematologic disease, or oncologic disease. The cells were stained purple with May Grünwald-Giemsa stain in the Sysmex SP1000i machine. The labels for each cell image were assigned by expert pathologists.

In the study, the dataset was split into training and test sets (90% and 10%, respectively). The images were converted to NumPy arrays and normalized to the range 0-1. In each batch during training, images were randomly cropped and flipped to combat overfitting. The labels were one-hot encoded using Keras's to_categorical function.

A Compact Convolutional Transformer was used to classify the images. Its architecture consists of an all-convolution mini-network that creates image patches (tokenization with convolutional and max pooling layers), positional embedding, transformer layers with multi-head self-attention and a feed-forward neural network, stochastic depth regularization, layer normalization, and a dense layer to calculate attention weights and output logits. The model was compiled using the AdamW optimizer with a learning rate of 0.0018 and a weight decay of 0.00012. The batch size was set to 64 and the number of epochs to 75. The loss function was categorical cross-entropy with label smoothing.

The model's performance was evaluated using top-1 and top-2 accuracy, training and validation loss and accuracy curves, multi-class ROC curves, and a confusion matrix. The model was saved to an HDF5 file and will be made available.
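The preprocessing and loss described above can be sketched in plain NumPy. This is an illustrative sketch under stated assumptions, not the study's code. The 28x28 image size, the 24-pixel crop, horizontal flipping, and the smoothing factor of 0.1 are all assumptions, and the helper names are hypothetical. The one_hot helper reproduces what Keras's to_categorical returns for integer labels.

```python
import numpy as np

def normalize(images):
    """Scale uint8 pixel values to the range 0-1."""
    return images.astype(np.float32) / 255.0

def one_hot(labels, num_classes):
    """One-hot encode integer labels."""
    return np.eye(num_classes, dtype=np.float32)[labels]

def random_crop_and_flip(batch, crop_size, rng):
    """Crop each image at a random offset and flip it horizontally half the time."""
    n, h, w, c = batch.shape
    out = np.empty((n, crop_size, crop_size, c), dtype=batch.dtype)
    for i in range(n):
        top = rng.integers(0, h - crop_size + 1)
        left = rng.integers(0, w - crop_size + 1)
        crop = batch[i, top:top + crop_size, left:left + crop_size]
        if rng.random() < 0.5:
            crop = crop[:, ::-1]  # reverse the width axis
        out[i] = crop
    return out

def smoothed_cross_entropy(logits, targets, smoothing=0.1):
    """Categorical cross-entropy with label smoothing, averaged over the batch."""
    num_classes = targets.shape[-1]
    soft = targets * (1.0 - smoothing) + smoothing / num_classes
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return float(-(soft * log_probs).sum(axis=-1).mean())

rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(64, 28, 28, 3), dtype=np.uint8)  # one synthetic batch of 64
labels = rng.integers(0, 8, size=64)                                  # eight cell types
x = random_crop_and_flip(normalize(images), crop_size=24, rng=rng)
y = one_hot(labels, num_classes=8)
loss = smoothed_cross_entropy(np.zeros((64, 8)), y)  # uniform predictions
```

Label smoothing moves a small share of the target probability from the true class to the others, which discourages over-confident predictions on a small dataset.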

Accepted Answer

The model achieved a classification accuracy of 92.49% on the test set of 3,421 unseen sample images. Precision, recall, and F1 score were examined for each of the eight cell types and varied between classes. Cell type 5 had a precision of 0.91 and a recall of 0.73, while cell type 7 had a precision, recall, and F1 score of 0.99 each, indicating strong performance in identifying cells of type 7. The multi-class ROC curves and confusion matrix further demonstrated robust classification performance: the lowest AUC was 0.9833 (cell type 5) and the highest was 0.9999 (cell type 7). Overall, the model showed promising results in classifying cell types.
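As a rough illustration of how these metrics relate, the sketch below computes per-class precision, recall, and F1 from a confusion matrix, plus top-k accuracy. The function names and toy inputs are hypothetical and none of this is the study's evaluation code. Applying the F1 formula to the reported (rounded) precision of 0.91 and recall of 0.73 for cell type 5 gives roughly 0.81, a value derived here rather than quoted from the study.

```python
import numpy as np

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def per_class_metrics(confusion):
    """Per-class precision, recall, and F1.
    confusion[i, j] counts samples of true class i predicted as class j."""
    confusion = np.asarray(confusion, dtype=float)
    true_pos = np.diag(confusion)
    precision = true_pos / confusion.sum(axis=0)  # column sums: predicted counts
    recall = true_pos / confusion.sum(axis=1)     # row sums: true counts
    return precision, recall, f1_score(precision, recall)

def top_k_accuracy(probs, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    top_k = np.argsort(probs, axis=1)[:, -k:]
    return float(np.mean([labels[i] in top_k[i] for i in range(len(labels))]))

# F1 implied by the reported precision and recall for cell type 5
f1_type5 = f1_score(0.91, 0.73)
print(round(f1_type5, 2))  # 0.81

# Toy two-class confusion matrix (made-up counts)
precision, recall, f1 = per_class_metrics([[8, 2], [1, 9]])

# Toy three-class scores (made-up values)
probs = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7]])
labels = np.array([0, 2, 0])
top1 = top_k_accuracy(probs, labels, k=1)
top2 = top_k_accuracy(probs, labels, k=2)
```

The gap between precision and recall for cell type 5 means that predictions of type 5 are usually right, but a noticeable share of true type 5 cells are assigned to other classes.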

Accepted Answer

Compact Convolutional Transformers (CCTs) can achieve high accuracy in image classification tasks despite limited training data. In this study, a CCT achieved a classification accuracy of about 92% using a modest dataset of 17,092 peripheral blood cell images covering eight distinct cell types, or around two thousand low-resolution samples per cell type on average. The results demonstrate that CCTs can cope with data scarcity in biomedical imaging, and the high accuracy on low-resolution images suggests that they remain effective when image resolution is limited. Overall, the findings indicate that CCTs are effective where data is limited, opening up possibilities for their application in biomedical imaging and other data-constrained fields, and they represent a step towards democratizing the use of transformers in niche areas.
