Han Zhang, Qian Zhang, Toan Duc Bui, Ismail Ben Ayed, Gang Li, Guodong Zeng, Oualid M. Benkarim, Dinggang Shen, Andrew Doyle, Li Wang, Kim-Han Thung, Guannan Li, Jitae Shin, Josien P. W. Pluim, GuoYan Zheng, Gerard Sanroma, Jing Xia, Dong Nie, Zhengwang Wu, Yongchao Xu, Pim Moeskops, Weili Lin, Jose Dolz, Élodie Puybareau, Jie Chen
21 Jan 2026
TL;DR: The iSeg-2017 challenge evaluates 21 automatic segmentation methods for 6-month-old infant brain images, highlighting top-ranked teams' pipelines and limitations, providing insights for methodological development in infant brain segmentation.
Abstract: Accurate segmentation of infant brain magnetic resonance (MR) images into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF) is an indispensable foundation for the early study of brain growth patterns and morphological changes in neurodevelopmental disorders. Nevertheless, in the isointense phase (approximately 6-9 months of age), due to the inherent myelination and maturation process, WM and GM exhibit similar levels of intensity in both T1-weighted (T1w) and T2-weighted (T2w) MR images, making tissue segmentation very challenging. Although many efforts have been devoted to brain segmentation, only a few studies have focused on the segmentation of 6-month infant brain images. With the idea of boosting methodological development in the community, the iSeg-2017 challenge (http://iseg2017.web.unc.edu) provides a set of 6-month infant subjects with manual labels for training and testing the participating methods. Among the 21 automatic segmentation methods participating in iSeg-2017, we review the 8 top-ranked teams in terms of Dice ratio, modified Hausdorff distance, and average surface distance, and introduce their pipelines, implementations, and source code. We further discuss limitations and possible future directions. We hope the dataset in iSeg-2017 and this review article can provide insights into methodological development for the community.
Şenol Pişkin, J S F Josephin, K B Kose, F Z Gungoren, M H Alzaeim, O F Sahin, I Yilmaz, M A K Umair, M O Alboushi, Ruqaiyah Mirza, F Aktay, B N Yazici, Ahmed Boray, Z Dulli, Ibrahim Faress
TL;DR: This study develops and evaluates a 3D U-Net model for automatic segmentation of pediatric cardiovascular structures, exploring dataset composition, data augmentation, hyperparameter tuning, and architectural enhancements, achieving a mean Dice coefficient of 0.8330 and Jaccard Index of 0.7356.
Abstract: Background/Introduction Automatic segmentation of cardiovascular structures from medical images is essential for the diagnosis and monitoring of congenital heart disease. Manual segmentation is a labour-intensive and time-consuming process. Deep learning models, including the U-Net architecture, offer an alternative for automatic and precise segmentation. Achieving high accuracy is crucial but challenging, and it depends heavily on dataset quality, data processing strategies, and model architecture. Purpose This study set out to develop and evaluate a 3D U-Net model tailored for segmenting major cardiovascular structures in pediatric medical images. Our goal was to understand how different strategies, such as dataset composition (local, open-source, and combined), data augmentation, hyperparameter tuning, and architectural enhancements like attention mechanisms, affect segmentation performance. Methods We trained a 3D U-Net using three dataset configurations (local, open-source, and combined); the local clinical dataset comprised 48 patients and the combined dataset 85 patients. Training was conducted on a high-performance computing (HPC) system with a Tesla V100 GPU. To assess the impact of various optimization methods, we systematically experimented with and without data augmentation techniques, including random affine and elastic transformations. Additionally, we performed hyperparameter tuning using Optuna and compared the results against baselines trained without it. Furthermore, we compared the performance of a standard 3D U-Net architecture with that of a modified version incorporating Attention Gates. We assessed model performance using the Dice Similarity Coefficient (DSC) and Jaccard Index. Results We trained the U-Net model on the combined dataset of 85 patients. The predicted masks revealed performance differences depending on the applied methodology.
The U-Net model was configured for 1000 epochs. Several methods were applied to improve model performance, including data augmentation and Optuna hyperparameter tuning. The model achieved a mean Dice coefficient of 0.8330 and a Jaccard Index of 0.7356. In contrast, incorporating Attention Gates into the model resulted in a lower mean Dice coefficient of 0.7774. Qualitative results (Figure 1) demonstrate the segmentation capabilities of models trained on the combined dataset when tested on unseen local and open-source patient data. A summary of these key performance metrics is presented in Table 1. Conclusion(s) On a combined, heterogeneous dataset, a standard 3D U-Net architecture without augmentation yielded the most consistent segmentation performance. Pre-processing techniques such as CLAHE remain valuable complementary methods for enhancing data quality.
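The two metrics reported above can be computed from a predicted and a reference mask as in the following minimal sketch. It is not the authors' evaluation code: the function name `dice_and_jaccard`, the flat 0/1 mask representation, and the convention of scoring two empty masks as 1.0 are all assumptions made for illustration.

```python
def dice_and_jaccard(pred, truth):
    """Return (Dice, Jaccard) for two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Count voxels labelled foreground in both masks, and in each mask alone.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_sum = sum(1 for p in pred if p)
    truth_sum = sum(1 for t in truth if t)
    union = pred_sum + truth_sum - intersection
    if union == 0:
        # Both masks are empty: treated as perfect agreement (assumed convention).
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_sum + truth_sum)
    jaccard = intersection / union
    return dice, jaccard


# Toy example: 6 voxels, 2 of which overlap.
pred = [1, 1, 1, 0, 0, 0]
truth = [0, 1, 1, 1, 0, 0]
dice, jaccard = dice_and_jaccard(pred, truth)
print(f"Dice={dice:.4f}  Jaccard={jaccard:.4f}")  # Dice=0.6667  Jaccard=0.5000
```

For a single pair of masks the two values are linked by J = D / (2 - D). That identity does not generally hold between a mean Dice and a mean Jaccard averaged over several cases, so the two reported means need not satisfy it.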
Max Steiger, Mohammad Rezapourian, Christian Hansen
8 Jan 2026
TL;DR: This dataset provides 30,000 frames of multi-view RGB video sequences for training and evaluating markerless needle-tracking systems in CT-guided surgical environments, with synchronized RGB cameras capturing realistic needle-manipulation scenarios under diverse conditions.
Abstract: Needle Tracking Dataset - Clinical & Semi-Clinical

Overview
This dataset contains multi-view RGB video sequences for training and evaluating markerless needle-tracking systems in CT-guided surgical environments. The data were acquired at the Research Campus STIMULATE in Magdeburg, using synchronized RGB cameras to capture realistic needle-manipulation scenarios under diverse conditions.

The dataset comprises two complementary subsets:
- Clinical Dataset: ~22,000 frames recorded in a CT laboratory replicating real interventional workflows
- Semi-Clinical Dataset: ~8,000 frames acquired under controlled laboratory conditions with systematic parameter variations

Combined, the dataset provides approximately 30,000 frames spanning a comprehensive range of needle movements, environmental conditions, and realistic procedural scenarios.

Hardware Setup
- Cameras: Three hardware-synchronized RGB cameras (1920×1080 px resolution, 60 FPS)
- Needles: Five biopsy needles with varying specifications:
  - 10 cm / 14G
  - 15 cm / 14G
  - 15 cm / 16G
  - 15 cm / 18G
  - 20 cm / 16G
- Phantom: Abdomen phantom with surgical drapes
- Environment: Clinical setup mimics realistic CT interventions; semi-clinical setup focuses on controlled parameter variations

Ground Truth Acquisition
Accurate 3D needle-tip coordinates were obtained using independent reference systems operating simultaneously with the RGB-based tracking:
- Semi-Clinical Dataset: Atracsys optical tracker (Target Registration Error < 0.3 mm)
- Clinical Dataset: AprilTag-based image tracking (Target Registration Error < 0.7 mm)

All frames were processed using needle-tracking software that integrates three synchronized RGB cameras and a UNetConvNeXt segmentation model to estimate markerless tip and base positions (TRE < 0.7 mm).
Data Characteristics

Sequence Properties:
- Duration: 2 to 30 seconds per sequence
- Temporal continuity was ensured across all sequences
- Frame rate: 60 FPS
- Resolution: 1920×1080 pixels

Variation Coverage:
The dataset includes systematic variations across multiple dimensions to ensure robust model training:
1. Lighting Conditions: Range from optimal CT-lab lighting to challenging low-light scenarios
2. Occlusion Patterns: Partial occlusions from hands, surgical tools, and drapes
3. Motion Dynamics:
  - Stable, controlled needle movements
  - Rapid intentional displacements
  - Complex manipulation patterns
  - Initial stabilization phases
4. Environmental Factors:
  - Different backgrounds (laboratory, CT environment)
  - With and without phantom presence
  - Realistic surgical drape configurations
5. User Interactions: Multiple operators with different handling techniques and viewing angles
6. Camera Perspectives: Multi-view synchronized capture from three fixed positions

Data Content
Each frame contains:
- RGB Images: Three synchronized camera views (1920×1080)
- Camera System Tracking: 2D and 3D coordinates from the tracking system
- Ground Truth: 2D and 3D coordinates from independent reference systems (Atracsys/AprilTag)

Use Cases
This dataset is designed for:
- Training and evaluating needle tracking algorithms
- Uncertainty quantification in surgical tracking systems
- Real-time needle localization in CT-guided interventions
- Robustness testing under challenging conditions (occlusion, motion, lighting)
- Multi-view triangulation and sensor fusion research
- Temporal prediction and motion modeling
- Tracking error analysis and calibration validation

Data Format
The dataset is provided as HDF5 files with JPEG-compressed images for efficient storage and fast loading.
File Structure:

    needle_tracking_dataset_jpeg.h5
    ├── sequence_name_1/
    │   ├── tracking          # Camera system predictions (N_frames × 15)
    │   ├── ground_truth      # Reference system measurements (N_frames × 6)
    │   ├── camera_0/
    │   │   └── frames_jpeg   # JPEG-compressed frames (variable length)
    │   ├── camera_1/
    │   │   └── frames_jpeg
    │   └── camera_2/
    │       └── frames_jpeg
    ├── sequence_name_2/
    │   └── ...

Data Columns:

Tracking data (camera system):
- frame_id: Frame number
- tip_2d_cam1_x, tip_2d_cam1_y, tip_2d_cam2_x, tip_2d_cam2_y, tip_2d_cam3_x, tip_2d_cam3_y: 2D tip coordinates in each camera view
- base_2d_cam1_x, base_2d_cam1_y, base_2d_cam2_x, base_2d_cam2_y, base_2d_cam3_x, base_2d_cam3_y: 2D base coordinates in each camera view
- tip_3d_x, tip_3d_y, tip_3d_z: 3D tip coordinates from tracking system

Ground truth data (reference system):
- marker_tip_3d_x, marker_tip_3d_y, marker_tip_3d_z: Ground-truth 3D tip coordinates
- marker_base_3d_x, marker_base_3d_y, marker_base_3d_z: Ground-truth 3D base coordinates

Loading Example:

    import h5py
    from PIL import Image
    import numpy as np
    import io

    with h5py.File('needle_tracking_dataset_clinical_jpeg.h5', 'r') as f:
        seq = f['']

        # Load tracking data
        tracking_2d = seq['tracking_2d'][:]  # Shape: (N_frames, 13), dtype: int16
        tracking_3d = seq['tracking_3d'][:]  # Shape: (N_frames, 3), dtype: float32

        # Load ground truth data
        ground_truth = seq['ground_truth'][:]  # Shape: (N_frames, 6), dtype: float32

        # Load JPEG frame
        jpeg_bytes = seq['camera_0/frames_jpeg'][0].tobytes()
        img = Image.open(io.BytesIO(jpeg_bytes))
        frame = np.array(img)

Compression:
- Images: JPEG quality 98 (~70-75% size reduction from PNG)
- Tracking/Ground truth: gzip compression level 5

Quality Assurance
- All sequences manually reviewed for quality
- Frames with calibration failures excluded
- Ground-truth accuracy verified against reference measurements
- Temporal consistency validated across sequences

Citation
If you use this dataset, please cite:
Steiger, M., Rezapourian, M., Rak, M., Hansen, C.
(2025). Dynamic Uncertainty Level Assessment Framework for Real-Time Needle Tracking in CT-Guided Surgical Environments.

Funding
The work was partially funded by the German Federal Ministry of Research, Technology, and Space (within the Research Campus STIMULATE under the grant number ‘13GW0473A’) and by the European Regional Development Fund (under the operation number ‘ZS/2023/12/182010’ as part of the initiative ‘Sachsen-Anhalt WISSENSCHAFT Schwerpunkte’).
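One of the listed use cases, tracking error analysis, can be sketched as a per-frame distance between the tracked 3D tip and the reference 3D tip. The sketch assumes the layout given in the loading example (a 3-column `tracking_3d` array and a 6-column `ground_truth` array whose first three columns are the marker tip) and that both are expressed in the same coordinate frame and unit. The function names `tip_errors` and `summarize` are hypothetical, and plain Python lists stand in for the HDF5 arrays.

```python
import math


def tip_errors(tracking_3d, ground_truth):
    """Per-frame Euclidean distance between tracked and reference tip positions.

    tracking_3d: rows of (tip_3d_x, tip_3d_y, tip_3d_z)
    ground_truth: rows of (marker_tip_3d_x/y/z, marker_base_3d_x/y/z)
    """
    errors = []
    for tracked_tip, gt_row in zip(tracking_3d, ground_truth):
        reference_tip = gt_row[:3]  # first three columns: ground-truth tip
        errors.append(math.dist(tracked_tip, reference_tip))
    return errors


def summarize(errors):
    """Return (mean error, RMSE, maximum error) over all frames."""
    mean = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mean, rmse, max(errors)


# Two made-up frames: a perfect match and a 5-unit offset.
tracking = [(10.0, 20.0, 30.0), (11.0, 20.0, 30.0)]
reference = [
    (10.0, 20.0, 30.0, 0.0, 0.0, 0.0),
    (11.0, 23.0, 34.0, 0.0, 0.0, 0.0),
]
errors = tip_errors(tracking, reference)
print(errors)             # [0.0, 5.0]
print(summarize(errors))  # mean 2.5, RMSE about 3.54, max 5.0
```

With the real files, the rows returned by `seq['tracking_3d'][:]` and `seq['ground_truth'][:]` could be passed in directly, since the functions only iterate over rows and slice the first three columns.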
Alice Gros, Jules Vanaret, Valentin Dunsing, Agathe Rostan, Philippe Roudot, Pierre-François Lenne, Léo Guignard, Sham Tlili
13 Jan 2026
TL;DR: Researchers developed a pipeline for whole-mount deep imaging and analysis of multi-layered organoids, enabling quantification of cell-scale processes and tissue architecture, and demonstrated its application in gastruloids, revealing insights into tissue-scale organization and development.
Abstract: Whole-mount 3D imaging at the cellular scale is a powerful tool for exploring complex processes during morphogenesis. In organoids, it allows examining tissue architecture, cell types, and morphology simultaneously in 3D models. However, cell packing in multilayered organoid tissues hinders both deep imaging and quantification of cell-scale processes. To address these challenges, we developed an experimental and computational pipeline to extract properties at scales ranging from cell to tissue. The experimental module is based on two-photon imaging of immunostained organoids. The computational module corrects for optical artifacts, performs accurate 3D nuclei segmentation, and reliably quantifies gene expression. We provide the computational module as a user-friendly Python package called Tapenade, along with napari plugins that enable joint data processing and exploration across scales. We demonstrate the pipeline by quantifying 3D spatial patterns of gene expression and nuclear morphology in gastruloids, revealing how local cell deformations and gene co-expression relate to tissue-scale organization. This quantitative pipeline improves our understanding of gastruloid development and lays the groundwork for a wide range of multi-layered organoid and tumoroid systems.
TL;DR: Researchers provide source code and a dataset for semantic segmentation modeling of few-shot sample batik patterns, enabling reproducibility and advancement of techniques for pattern recognition and classification in limited data scenarios.
Abstract: Source Code and Dataset for Research on Semantic Segmentation Modeling of Few-Shot Sample Batik Patterns
Lisa Guzzi, María A. Zuluaga, Riccardo Taiello, Fabien Lareyre, Gilles Di Lorenzo, Sébastien Goffart, Andréa Chierici, Juliette Raffort, Hervé Delingette
TL;DR: This study develops a digital twin-based system for restoring ancient sculptures, integrating laser point clouds, IoT sensors, and GPU rendering for precise, real-time decision support, achieving a stable RMSE of 3.5 mm and expert score of 4.04/5.
Abstract: This study applies digital twin technology to enhance the restoration and monitoring of ancient sculptures, integrating laser point clouds, LoRa-enabled IoT sensors, and GPU rendering for precise, real-time decision support. Using a Ming Dynasty grotto as a case study, the authors propose a dual-loop framework: an acquisition loop with multi-source sensing and normal-constrained semantic segmentation generating a millimeter-level semantic mesh, and a monitoring loop with rule-data hybrid calibration for real-time micro-strain and environmental monitoring. A GPU-accelerated pipeline enables interactive visualization, while a state machine ensures twin-physical synchronization. Deployed in 48 hours, the system achieves a stable RMSE of 3.5 mm, a 0.3% false negative rate, and an expert score of 4.04/5 (18% higher than controls). This work demonstrates the value of visualized digital twins for precision and decision-making in cultural heritage restoration, with modular adaptability for preserving other heritage types, including agricultural heritage (e.g., ancient terraces, irrigation systems).
TL;DR: UltraMamba, a novel multimodal ultrasound image fusion framework, improves the Dice Similarity Coefficient for breast lesion segmentation by 2.59% and reduces HD95 by 6.78 mm relative to the second-best method on the BreLS dataset, a comprehensive 2D multimodal ultrasound breast lesion dataset.
Abstract: Multimodal ultrasound imaging, combining B-mode ultrasound, shear wave velocity, and shear wave time, is crucial for diagnosing and treating breast lesions, providing insights into lesion characteristics and tissue properties. However, challenges arise from intermodal feature misalignment and attention shifts due to varied capture methods and an overemphasis on vibrant color data. To tackle these issues, we introduce two innovations: a novel segmentation framework and a comprehensive dataset. The UltraMamba framework utilizes bidirectional alignment between modalities and enhances region-specific information to improve breast lesion segmentation accuracy. Key components include the Cross-Modal Knowledge Interaction module for robust information exchange and the Region-Aware Feature Excitation module to focus on relevant features. We also present the BreLS dataset, the first two-dimensional multimodal ultrasound breast lesion dataset, with paired images from 506 cases, serving as a valuable resource for analysis. UltraMamba shows strong performance on the BreLS dataset, achieving a Dice Similarity Coefficient of 72.16% and an HD95 of 42.02 mm, reflecting improvements of 2.59% in DSC and a 6.78 mm reduction in HD95 compared to the second-best framework, MMCA-NET. These results highlight UltraMamba's potential to enhance segmentation accuracy in clinical settings, facilitating precise treatment planning and, ultimately, leading to improved outcomes. Code: https://github.com/deepang-ai/UltraMamba.
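HD95, the boundary metric quoted above, can be illustrated with a small sketch on boundary point sets. Conventions for HD95 differ between toolkits; this version takes the larger of the two directed 95th-percentile nearest-neighbour distances and uses linear interpolation for the percentile, both of which are assumptions rather than the paper's stated definition. The points are assumed to be in millimetres already (pixel indices multiplied by the pixel spacing), and the function names are hypothetical.

```python
import math


def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between sorted samples."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty sequence is undefined")
    position = (len(ordered) - 1) * q / 100.0
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def directed_distances(source, target):
    """Distance from every source point to its nearest target point."""
    return [min(math.dist(s, t) for t in target) for s in source]


def hd95(boundary_a, boundary_b):
    """Symmetric 95th-percentile Hausdorff distance between two point sets."""
    forward = percentile(directed_distances(boundary_a, boundary_b), 95)
    backward = percentile(directed_distances(boundary_b, boundary_a), 95)
    return max(forward, backward)


# Two parallel 3-point contours that are exactly 1 mm apart.
contour_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
contour_b = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
print(hd95(contour_a, contour_b))  # 1.0
```

The brute-force nearest-neighbour search is quadratic in the number of boundary points, which is fine for a toy example; real evaluations usually rely on distance transforms or spatial indexes instead.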
TL;DR: Researchers tested the "Segment Anything Model 2" (SAM 2) to reduce manual workload in CT and MRI annotation, achieving 30-53% workload reduction with comparable segmentation model performance, expediting 3D medical imaging dataset annotation and model development.
Abstract: Volumetric segmentation in CT and MRI is valuable for artificial intelligence workflows in radiology, yet creating the large, precisely annotated datasets required for training segmentation models remains laborious. Here, we tested in simulation whether the foundation model “Segment Anything Model 2” (SAM 2) can reduce expert annotation workload. In our workflow, annotators provide a single box at the object’s center, and SAM 2 automatically segments the object across slices; annotators then review and correct the masks as needed. Workload reduction was defined as the proportion of SAM 2’s predicted segmentation masks that were accepted without modification. Downstream segmentation models were then trained on the SAM 2-assisted masks and compared with reference models trained on ground truth masks. For femoral bone segmentation in MRI and liver tumor segmentation in CT, 36,614 sagittal and 16,311 axial slices were annotated, with 30% and 53% of SAM 2-generated masks accepted as is, respectively, indicating workload reduction. Crucially, segmentation models trained on SAM 2-assisted masks performed comparably to reference models, with a median Dice similarity coefficient of 98.5% compared with 98.7% for femoral bone segmentation, and 77.3% compared with 77.0% for liver tumor segmentation. Using SAM 2 could thus expedite 3D medical imaging dataset annotation and segmentation model development for both research and clinical applications.
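The workload-reduction measure defined above is a simple proportion, sketched below. The abstract does not say how a simulated annotator decides to accept a mask, so the acceptance rule used here (a per-slice Dice score against ground truth at or above `accept_threshold`) and the default of 0.95 are purely illustrative assumptions, as is the function name.

```python
def workload_reduction(slice_dice_scores, accept_threshold=0.95):
    """Fraction of predicted masks accepted without modification.

    A mask counts as accepted when its Dice score against the ground truth
    reaches accept_threshold (hypothetical rule, not taken from the study).
    """
    scores = list(slice_dice_scores)
    if not scores:
        raise ValueError("at least one slice is required")
    accepted = sum(1 for score in scores if score >= accept_threshold)
    return accepted / len(scores)


# Four made-up slices, two of which clear the threshold.
scores = [0.99, 0.80, 0.97, 0.60]
print(workload_reduction(scores))  # 0.5
```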
TL;DR: EHPNet, a novel edge-aware method, is proposed for accurate leaf segmentation in complex field environments, achieving superior performance (98.25-99.25%) on a composite dataset, outperforming state-of-the-art methods in mean IoU, accuracy, precision, recall, and F1 score.
Abstract: Accurate plant leaf image segmentation plays a crucial role in species recognition, phenotypic analysis, and disease detection. However, most segmentation models perform poorly in complex field environments due to challenges such as overlapping leaves and uneven sunlight. This research proposes an Edge-Aware High-Frequency Preservation Network (EHPNet) for leaf segmentation in complex field environments. Specifically, a High-Frequency Edge Fusion Module (HEFM) is introduced into the skip connections to preserve high-frequency edge information during feature extraction and enhance boundary localization. In addition, a Structural Recalibration Attention Module (SRAM) is incorporated into the decoder to refine edge structural features across multiple scales and retain spatial continuity, which leads to more accurate reconstruction of leaf boundaries. Experimental results on a composite dataset constructed from Pl@ntLeaves and ATLDSD show that EHPNet achieves 98.25%, 99.25%, 99.03%, 98.51%, and 98.77% in mean Intersection over Union (mIoU), accuracy, precision, recall, and F1 score, respectively. Compared with state-of-the-art methods, EHPNet achieves superior overall performance, which demonstrates its effectiveness for leaf segmentation in complex field environments.
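The five reported metrics can all be derived from pixel-level confusion counts, as the sketch below shows for a two-class (leaf versus background) setting. The abstract does not state how mIoU is averaged, so taking the mean of the leaf IoU and the background IoU is an assumption, and the function name and counts are made up for illustration.

```python
def segmentation_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, F1 and two-class mIoU from pixel counts.

    tp/fp/fn/tn are counted with "leaf" as the positive class.
    """
    def safe_div(numerator, denominator):
        # Avoid ZeroDivisionError when a class is absent from the image.
        return numerator / denominator if denominator else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    iou_leaf = safe_div(tp, tp + fp + fn)
    iou_background = safe_div(tn, tn + fp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        # Assumed convention: unweighted mean over the two classes.
        "miou": (iou_leaf + iou_background) / 2,
    }


metrics = segmentation_metrics(tp=80, fp=10, fn=10, tn=100)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In this toy case accuracy is 0.9 while mIoU is about 0.817, which illustrates why IoU-based scores are usually lower than pixel accuracy on the same prediction.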
TL;DR: This study introduces HHF-SAM, a segmentation framework optimized for abdominal CT images, overcoming general segmentation model limitations in medical image processing, and providing a reliable tool for clinical auxiliary diagnosis and lesion delineation.
Abstract: This study presents an SAM-based framework optimized for the unique characteristics of abdominal CT images, effectively overcoming the limitations of general segmentation models in medical image processing. The proposed HHF-SAM provides a reliable tool for clinical auxiliary diagnosis, reducing inter-reader variability and improving efficiency in lesion delineation.
TL;DR: This study develops a contrastive learning framework to enrich ECG signals with cardiac MRI features, predicting structural features and cardiovascular disease outcomes with high accuracy, particularly using 3D CMR data and temporal dynamics.
Abstract: Background Cardiovascular disease (CVD) remains the leading cause of mortality. Early detection of CVD requires diagnostic tools that are scalable, accessible, and low-cost. While cardiac magnetic resonance imaging (CMR) provides detailed structural and functional cardiac information, its limited availability and high costs restrict widespread use. In contrast, the electrocardiogram (ECG) is widely available but lacks the rich anatomical and mechanical information of CMR. We hypothesize that ECG-based latent representations can be enriched with CMR features by leveraging contrastive learning (CL). Purpose We aim to develop a CL framework that fuses CMR metrics to the ECG signal representation and use it to predict CMR features and CVD outcomes. Methods We used 63,448 subjects from the UK Biobank with same-day short-axis cine CMR and 12-lead resting ECG recordings. A pretrained segmentation model was used to crop the CMR images around the heart. As proposed by previous work, the model learns cross-modal latent representations by minimizing the distance between data from the same subject and maximizing the distance between pairs from different participants, as described in Figure 1. We further enhance the CMR representation with 3D cardiac volumes from ES and ED timepoints. We evaluated the ECG encoder's ability to predict: (1) CMR metrics, including left ventricular ejection fraction (LVEF), right ventricular ejection fraction (RVEF) and cardiac output; and (2) the most prevalent cardiovascular diseases, including coronary artery disease (CAD), atrial fibrillation (AF), sudden cardiac death (SCD), heart failure (HF), myocardial infarction (MI) and cardiomyopathy (CMP).
We used 47,527/10,185/10,185 subjects for the training/validation/held-out test cohorts. We compared five models: (1) ECG only, (2) ECG with CL trained on one mid-ventricular end-diastolic slice from the CMR image, (3) ECG with CL trained on 2D+time data of one middle slice over time, (4) ECG with CL using 3D CMR data with multiple slices over the volume, and (5) ECG with triplet contrastive loss (TCL) with 4D CMR data combining the end-diastolic and end-systolic volumes over time. Results The model with TCL showed the highest performance for the prediction of CMR features, with an average R² of 0.605 (Figure 2). Adding temporal dynamics and 3D volume improved the model performance compared to using a single image. However, it did not improve performance on the clinical endpoints. Conclusion We design a CL pre-training strategy that proves effective in enriching ECG representations with 3D-volume CMR-derived embeddings, enabling low-cost and non-invasive ECG-based risk stratification for cardiac pathology in the general population.
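The cross-modal objective described in the Methods (pull together the ECG and CMR embeddings of the same subject, push apart those of different subjects) can be sketched with an InfoNCE-style loss. The abstract does not give the exact loss, so this formulation, the cosine similarity, the `temperature` value, and the function names are assumptions and should not be read as the authors' implementation, nor as their triplet variant.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def info_nce(ecg_embeddings, cmr_embeddings, temperature=0.1):
    """Mean InfoNCE loss; row i of each list belongs to the same subject."""
    ecg = [normalize(v) for v in ecg_embeddings]
    cmr = [normalize(v) for v in cmr_embeddings]
    total = 0.0
    for i, anchor in enumerate(ecg):
        # Similarity of this subject's ECG to every subject's CMR embedding.
        logits = [sum(a * b for a, b in zip(anchor, other)) / temperature
                  for other in cmr]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the matching subject (index i) as the target.
        total += log_denominator - logits[i]
    return total / len(ecg)


ecg = [[1.0, 0.0], [0.0, 1.0]]
cmr_matched = [[0.9, 0.1], [0.1, 0.9]]   # each CMR close to its own ECG
cmr_swapped = [[0.1, 0.9], [0.9, 0.1]]   # pairs deliberately mismatched
print(info_nce(ecg, cmr_matched) < info_nce(ecg, cmr_swapped))  # True
```

The loss is small when each ECG embedding is most similar to its own subject's CMR embedding and large when the pairing is wrong, which is the behaviour the training objective rewards.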
TL;DR: UVInsDet is an open dataset of 591 UV camera images with 1,415 annotated insulator instances, providing polygon-based masks and COCO-compatible annotations for training and testing instance segmentation models, particularly for glass and porcelain insulator detection.
Abstract: UVInsDet is an open dataset of narrow-angle, RGB-channel ultraviolet camera images of high-voltage insulator strings acquired during diagnostic inspections of power substations. The dataset contains 591 images with 1,415 manually annotated instances belonging to two classes (glass and porcelain insulators). Polygon-based instance segmentation masks are provided as primary ground truth in LabelMe format. For convenience and reproducibility, COCO-compatible annotations are included as a derived format, generated from the original LabelMe annotations using the provided conversion script. The dataset is split into training and test subsets by directory structure. Baseline instance segmentation results obtained using Mask R-CNN and Cascade Mask R-CNN are reported in the accompanying manuscript (currently under review). The dataset is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
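The LabelMe-to-COCO derivation mentioned above can be illustrated as follows. This is not the dataset's provided conversion script: it is a minimal sketch that takes already-parsed LabelMe JSON documents, keeps polygon shapes only, and fills the standard COCO fields. The label strings `"glass"` and `"porcelain"`, the category ids, and the file name are hypothetical.

```python
def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    twice_area = 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def labelme_to_coco(labelme_docs, category_ids):
    """Convert parsed LabelMe documents into one COCO-style dictionary."""
    images, annotations = [], []
    next_annotation_id = 1
    for image_id, doc in enumerate(labelme_docs, start=1):
        images.append({
            "id": image_id,
            "file_name": doc["imagePath"],
            "width": doc["imageWidth"],
            "height": doc["imageHeight"],
        })
        for shape in doc["shapes"]:
            if shape.get("shape_type", "polygon") != "polygon":
                continue  # this sketch handles polygon masks only
            points = shape["points"]
            xs = [p[0] for p in points]
            ys = [p[1] for p in points]
            annotations.append({
                "id": next_annotation_id,
                "image_id": image_id,
                "category_id": category_ids[shape["label"]],
                # COCO stores each polygon as a flat [x1, y1, x2, y2, ...] list.
                "segmentation": [[c for p in points for c in p]],
                # COCO boxes are [x_min, y_min, width, height].
                "bbox": [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)],
                "area": polygon_area(points),
                "iscrowd": 0,
            })
            next_annotation_id += 1
    categories = [{"id": cid, "name": name}
                  for name, cid in sorted(category_ids.items(), key=lambda kv: kv[1])]
    return {"images": images, "annotations": annotations, "categories": categories}


doc = {
    "imagePath": "example.png", "imageWidth": 100, "imageHeight": 100,
    "shapes": [{"label": "glass", "shape_type": "polygon",
                "points": [[10, 20], [50, 20], [50, 80], [10, 80]]}],
}
coco = labelme_to_coco([doc], {"glass": 1, "porcelain": 2})
print(coco["annotations"][0]["bbox"], coco["annotations"][0]["area"])  # [10, 20, 40, 60] 2400.0
```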
TL;DR: A novel deep learning model is developed for automated segmentation of internal mammary artery, aorta, and perivascular regions from CT angiography, achieving high accuracy (0.7876-0.961) and generalizability for large-scale clinical applications.
Abstract: Background A prior study demonstrated that the novel radiotranscriptomic signature, C19RS, holds prognostic significance for clinical outcomes (Figure 1a). The automated segmentation of vascular structures, including the internal mammary artery (IMA), the aorta, and their surrounding perivascular areas from contrast-enhanced CT angiography (CCTA), would facilitate the high-throughput extraction of radiomic profiles. Purpose Our goal is to create an innovative deep learning (DL) model for the IMA and aorta that facilitates the automated calculation of C19RS in extensive cohort analyses. Methods The model utilises a distinct architecture that combines a CNN (squeeze-and-excitation block) and a transformer (Swin block) to improve segmentation by alternating between these blocks, which helps in capturing discriminative features (Figure 1b). The model was built using the CCTA (n = 227) dataset from the OxHVF study conducted in the UK, applying standardised preprocessing techniques such as resampling, clipping, and intensity normalisation. An iterative refinement process was performed three times (n = 140), resulting in a robust model (see Figure 1c). An external validation cohort (n = 751) from an international site in the United States was utilised, with all segmentations subjected to manual expert review for quality assessment. Lastly, the model was also validated externally on a publicly available dataset (ASOCA; n = 318). Results The model achieved a mean Dice similarity score (DSC) of 0.7876±0.0176 for IMA/peri-IMA segmentation and 0.9207±0.0057 for aorta/periaortic region segmentation (see figure). After refinement, it achieved a DSC of 0.947 for IMA/peri-IMA segmentation (Figure 1c,d). In the external cohort, 679 out of 751 cases (90.4%) were considered clinically acceptable for both regions; the remaining cases were excluded because the CCTAs' narrow field of view did not capture the IMA/aorta.
In the ASOCA cohort, the model consistently performed at 0.961 ± 0.039. These results underscore the model’s generalizability and scalability for large-scale clinical applications. Conclusion This study presents a powerful and clinically flexible DL model designed for the automatic segmentation of vascular structures, specifically the IMA, aorta, and their surrounding perivascular space. Its use in radiotranscriptomic biomarker analysis presents an exciting opportunity for non-invasive prediction of patient outcomes, making it a significant resource for cardiovascular research and clinical applications.
TL;DR: This paper proposes Parkllite, a satellite-assisted smart parking system using LEO imagery and AI-based urban segmentation to determine real-time parking availability globally, aiming to deliver an infrastructure-free solution for smart cities and future mobility systems.
Abstract: Urban mobility suffers from chronic inefficiencies due to the lack of real-time information on parking availability. Existing systems rely heavily on ground sensors, CCTV networks, or manual user input, all of which fail to scale across large cities or global regions. This paper proposes a satellite-assisted smart parking system that utilizes Low Earth Orbit (LEO) satellite imagery, geospatial AI, and parking slot segmentation models to determine real-time parking availability across cities worldwide. Inspired by an idea conceptualized in 2024, this system aims to deliver a global, infrastructure-free parking detection solution capable of serving smart cities, autonomous vehicles, and future AR-based mobility systems.
Qiu Guan, Zhiqiang Yang, Dezhang Ye, Yang Chen, Xinli Xu, Ying Tang
8 Jan 2026
TL;DR: DB-MSMUNet, a novel encoder-decoder architecture, is proposed for robust pancreatic segmentation in CT scans, achieving Dice Similarity Coefficients of 89.47%, 87.59%, and 89.02% on three datasets and outperforming most existing state-of-the-art methods in accuracy, edge preservation, and robustness.
Abstract: Accurate segmentation of the pancreas and its lesions in CT scans is crucial for the precise diagnosis and treatment of pancreatic cancer. However, it remains a highly challenging task due to several factors such as low tissue contrast with surrounding organs, blurry anatomical boundaries, irregular organ shapes, and the small size of lesions. To tackle these issues, we propose DB-MSMUNet (Dual-Branch Multi-scale Mamba UNet), a novel encoder-decoder architecture designed specifically for robust pancreatic segmentation. The encoder is constructed using a Multi-scale Mamba Module (MSMM), which combines deformable convolutions and multi-scale state space modeling to enhance both global context modeling and local deformation adaptation. The network employs a dual-decoder design: the edge decoder introduces an Edge Enhancement Path (EEP) to explicitly capture boundary cues and refine fuzzy contours, while the area decoder incorporates a Multi-layer Decoder (MLD) to preserve fine-grained details and accurately reconstruct small lesions by leveraging multi-scale deep semantic features. Furthermore, Auxiliary Deep Supervision (ADS) heads are added at multiple scales to both decoders, providing more accurate gradient feedback and further enhancing the discriminative capability of multi-scale features. We conduct extensive experiments on three datasets: the NIH Pancreas dataset, the MSD dataset, and a clinical pancreatic tumor dataset provided by collaborating hospitals. DB-MSMUNet achieves Dice Similarity Coefficients of 89.47%, 87.59%, and 89.02%, respectively, outperforming most existing state-of-the-art methods in terms of segmentation accuracy, edge preservation, and robustness across different datasets. These results demonstrate the effectiveness and generalizability of the proposed method for real-world pancreatic CT segmentation tasks.
TL;DR: This paper proposes VLD-PL, a Vision-Language Driven Prompt Learning framework for weakly supervised semantic segmentation, addressing under-activation and co-occurrence issues through auxiliary class matching and background class filtering, achieving state-of-the-art performance on PASCAL VOC and MS COCO benchmarks.
Abstract: The primary challenges in image-level weakly supervised semantic segmentation (WSSS) lie in addressing the under-activation issue of target pixels and mitigating the co-occurrence phenomenon in class activation maps. In recent years, Vision-Language Models (VLM) have demonstrated exceptional performance across various vision tasks, primarily attributed to their cross-modal semantic alignment capabilities achieved through contrastive learning mechanisms. Leveraging VLM’s capability to capture fine-grained visual-textual correspondences, this paper proposes a novel Vision-Language Driven Prompt Learning (VLD-PL) framework that addresses two fundamental challenges in WSSS by establishing explicit semantic correspondences between textual descriptors and visual components, ultimately enabling efficient semantic segmentation. The VLD-PL framework consists of two core components: Auxiliary Class Matching (ACM) and Background Class Filtering (BCF). The ACM module dynamically identifies semantically relevant auxiliary classes through feature alignment between image and textual embeddings, effectively enlarging target activation while mitigating co-occurrence interference by expanding semantic coverage. Simultaneously, the BCF module constructs image-specific background prompts and adaptively refines background feature representations, achieving precise suppression of irrelevant background regions. These dual mechanisms synergistically address both target localization accuracy and background noise suppression, achieving state-of-the-art performance on both the PASCAL VOC 2012 and MS COCO 2014 benchmarks.
TL;DR: This paper proposes EDSC-HRAFNet, a novel apple tree branch semantic segmentation model for orchard harvesting robots, achieving state-of-the-art performance with 91.50% precision, 91.71% recall, and 95.58% mean pixel accuracy in complex orchard environments.
Abstract: Accurate semantic understanding of tree branches is critical for orchard harvesting robots in automated fruit harvesting and pruning. Existing methods suffer from low detection accuracy and limited adaptability in complex orchard environments. This paper proposes a novel apple tree branch semantic segmentation model (EDSC-HRAFNet) for orchard harvesting robots under complex orchard conditions. The presented Enhanced Dynamic Snake Convolution module (EDSC_unit) is integrated into the High-Resolution Network (HRNet) backbone to extract topological features such as branching points and bifurcations. Then, the HeteroFPN module is designed as the neck structure; it performs cyclic semantic-position information interaction on the multi-level output features of the backbone in a dual-path collaborative framework, obtaining multi-level features with stronger comprehensive representation capabilities. The Parallel-M4 Decode module is designed for the network head, performing parallel processing based on the characteristics of features at different levels. This framework concatenates the features to generate geometrically precise segmentation masks. Finally, we constructed a dataset of in-situ apple trees under diverse real-world conditions to verify the performance and superiority of EDSC-HRAFNet. EDSC-HRAFNet demonstrates state-of-the-art segmentation performance across eight challenging orchard scenarios and exhibits robust generalization. Experiments show that the proposed model achieves precision, recall, Dice, IoU, MIoU, and MPA of 91.50%, 91.71%, 91.60%, 84.51%, 91.72%, and 95.58%, respectively. These are improvements of 6.98%, 10.09%, 9.21%, 13.5%, 7.25%, and 5.24% over HRNet. Compared with existing models including PSPNet, the DeepLabv3+ series, and the U-Net series, precision, recall, and IoU are improved by 13.92% to 29.61%, 23.59% to 42.38%, and 28.79% to 46.67%, respectively.
EDSC-HRAFNet's superior branch segmentation capability in challenging orchard environments provides a practical foundation for robotic automation in agricultural orchards. • This paper proposes a novel apple tree branch semantic segmentation model (EDSC-HRAFNet) for orchard harvesting robots under complex orchard conditions. • The presented EDSC_unit module is integrated into the repeated multi-scale fusion of the HRNet backbone to better capture slender branch structures and complex tree morphology. • The proposed HeteroFPN provides topologically-aware features for branch segmentation. Cyclic semantic-localization interaction significantly improves bifurcation and thin-end perception. • The designed Parallel-M4 Decoder preserves extreme-scale feature integrity, optimizes mid-high-level geometric consistency via parallel processing, and enhances topological adaptability. • Experiments show that EDSC-HRAFNet achieves Precision, Recall, Dice, IoU, MIoU, and MPA of 91.50%, 91.71%, 91.60%, 84.51%, 91.72%, and 95.58%, respectively.
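The reported precision, recall, Dice, and IoU values can be checked against one another with the standard identities for a single foreground class. The sketch below assumes the metrics are computed from pooled pixel counts; if they were averaged per image, the identities would hold only approximately. The function names are illustrative and do not come from the paper.

```python
def dice_from_precision_recall(precision, recall):
    # Dice is the harmonic mean of precision and recall (same as the F1 score).
    return 2.0 * precision * recall / (precision + recall)


def iou_from_dice(dice):
    # For a single class, IoU = Dice / (2 - Dice).
    return dice / (2.0 - dice)


# Values reported in the abstract, written as fractions.
precision, recall = 0.9150, 0.9171

dice = dice_from_precision_recall(precision, recall)
iou = iou_from_dice(dice)

print(f"Dice = {dice:.4f}")
print(f"IoU  = {iou:.4f}")
```

Under these assumptions the derived values come out at roughly 91.60% Dice and 84.51% IoU, in line with the figures quoted in the abstract.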
TL;DR: Researchers developed an AI-driven model using RGB and thermal imaging to evaluate dairy cow cleanliness, achieving improved accuracy with thermal input and outperforming traditional methods, enabling objective and interpretable monitoring in livestock barn environments.
Abstract: • RGB-Thermal imaging was used to evaluate dairy cow cleanliness automatically. • A two-stage machine learning pipeline (segmentation + score prediction) was developed. • Thermal input boosted segmentation accuracy over RGB alone (mIoU +48 %). • CNN outperformed MLR (on dirt pixel counts) for predicting human cleanliness scores. • The approach supports interpretable, objective monitoring in livestock barn environments. Accurate assessment of dairy cow cleanliness is essential for ensuring animal welfare, maintaining udder health, and optimising milk production. Traditional visual inspections are subjective and often fail to distinguish dirt from natural coat patterns, especially in spotted breeds. This research investigates the applicability of a two-stage approach for automated cleanliness evaluation, consisting of (i) semantic segmentation of dirt areas on cow coats and (ii) regression from the resulting masks to numerical cleanliness scores. The first stage was implemented using the U-Net and DeepLabV3 architectures, which were trained on either RGB-only or RGB-Thermal (RGB-T) images. Incorporating thermal information significantly improved segmentation accuracy: U-Net achieved a mean Intersection over Union (mIoU) of 0.5244 on RGB-T images, compared to 0.3537 on RGB images, while DeepLabV3 on RGB-T images reached an mIoU of 0.5049. The second stage compared two regression strategies: multiple linear regression (MLR) on the number of pixels classified as dirt, and convolutional neural networks (CNNs) trained directly on the masks. CNN-based regression consistently outperformed MLR, with the best performance obtained by combining RGB-T segmentation and CNN regression (DeepLabV3 + CNN: MAPE 23.05 %; U-Net + CNN: MAPE 25.24 %). These findings support the feasibility of a two-stage RGB-T-based approach for objective cleanliness evaluation, highlighting the benefits of thermal information for segmentation and CNNs for score prediction.
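Two of the quantities above are simple to reproduce. The sketch below reads the "+48 %" highlight as the relative gain of the U-Net mIoU on RGB-T over RGB; that reading is an assumption, though it matches the reported 0.3537 and 0.5244. It also shows the usual definition of mean absolute percentage error (MAPE). The helper names and the toy cleanliness scores are hypothetical and are not data from the study.

```python
def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (true values must be non-zero)."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    errors = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(errors) / len(errors)


# mIoU values reported in the abstract for U-Net (RGB vs. RGB-T).
gain = relative_gain(0.3537, 0.5244)
print(f"Relative mIoU gain: {gain:.1f} %")

# Toy, made-up cleanliness scores purely to exercise the MAPE formula.
toy_true = [2.0, 4.0]
toy_pred = [1.0, 5.0]
toy_mape = mape(toy_true, toy_pred)
```

Because MAPE divides by the true score, errors on low scores weigh more heavily than the same absolute error on high scores, which is worth keeping in mind when comparing the reported 23.05 % and 25.24 % figures.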
TL;DR: A deep learning-based semantic segmentation method is applied to robotic combine harvesters for safe and efficient operation, detecting various objects in paddy fields, including rice areas, humans, and field features, to enable quick and accurate identification of the surrounding environment.
Abstract: In recent years, robotic combine harvesters have been developed globally to alleviate the shortage of agricultural labor. However, they are only allowed to work automatically under the supervision of a human operator, as they are not equipped with any object detection sensors for safety. To ensure safe and efficient operation of a robotic combine harvester, a deep learning-based semantic segmentation method was applied for pixel-wise detection of various objects in paddy fields. The target objects to be detected by semantic segmentation included harvested rice areas, unharvested rice areas, lodging rice areas, humans, the header of the combine, and field ridges. By using this technique, significant objects in the paddy field can be detected simultaneously, helping robotic combine harvesters identify their surrounding environment quickly and accurately.
Evangelos Spatharis, Christos Papaioannidis, Vasileios Mygdalis, I. Pitas
9 Jan 2026
TL;DR: This paper presents Unrealfire, a free, open-access pipeline for creating synthetic annotated wildfire image datasets using Unreal Engine, enabling diverse, high-quality training data for Deep Neural Network (DNN) training in Natural Disaster Management (NDM) scenarios.
Abstract: High-quality training data are essential for Deep Neural Network (DNN) training. In Natural Disaster Management (NDM) scenarios, annotated training data are needed to train DNN models, e.g., for wildfire detection/segmentation. However, image annotation in such scenarios is prone to annotation errors, mostly due to the unpredictable visual structure of the fire/smoke. To this end, photorealistic simulators hold substantial promise, since they allow the creation of synthetic wildfire images. Yet, existing assets depicting fires in simulator engines are typically inserted as particle objects. As a result, they do not feature a fixed 3D mesh and therefore have no 2D projection, i.e., it is not trivial to generate fire segmentation annotation maps. This paper presents a free, open-access pipeline for creating diverse synthetic annotated wildfire image datasets. More specifically, we developed a novel particle segmentation camera for the AirSim plugin, which enables the generation of segmentation maps of objects made of particles. We also integrate Procedural Content Generation (PCG) tools to gather unlimited amounts of diverse, high-quality annotated training data. To evaluate our framework, we generated a sample fire dataset called AUTH-Unreal-Wildfire (AUW) for wildfire segmentation. In our experiments, we use a state-of-the-art segmentation DNN, namely PIDNet, and compare our synthetic wildfire images to different real image datasets, along with their potential to augment real wildfire datasets.