Segmentation-Guided Dual-Backbone Fusion for Robust Early Cancer Detection in Heterogeneous Medical Imaging
Abstract
Early detection of lung, liver, and colorectal cancers is clinically decisive because outcomes are strongly stage-dependent, yet early malignant lesions remain intrinsically difficult to identify in routine imaging due to small size, low contrast, ambiguous boundaries, partial-volume effects, and motion artifacts. Although deep learning has achieved strong performance in medical image classification and segmentation, early-stage detection under heterogeneous acquisition conditions is still limited by two structural failure modes: (i) discriminative cues can be diluted by dominant normal anatomy when models are not explicitly guided to relevant regions, and (ii) single-backbone feature extractors can amplify inductive bias and become brittle under scanner and protocol shifts. To address these challenges, this paper proposes a segmentation-guided dual-backbone fusion architecture for robust and auditable early cancer detection across CT, MRI, and X-ray modalities. A U-Net–based module first generates a probabilistic region-of-interest (ROI) mask that enables ROI-focused learning via soft gating or ROI cropping, ensuring that representation learning concentrates on clinically meaningful tissue even in “needle-in-a-haystack” conditions. The ROI-focused image is then processed in parallel by two complementary encoders, ResNet and EfficientNet, whose deep embeddings are fused by deep feature concatenation to reduce reliance on any single network’s failure modes and to improve robustness to domain shift. The system outputs both calibrated malignancy probabilities and interpretable ROI evidence, enabling structured error attribution that separates localization failures from downstream classification failures.
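The gating-and-fusion pipeline described above can be sketched in plain Python. This is purely illustrative: the U-Net segmenter and the two backbone encoders are stubbed out as given inputs, and all function names here are our own, not identifiers from the proposed system.

```python
import math

def soft_gate(image, mask):
    """Soft gating: weight each pixel by the predicted ROI probability.

    `image` and `mask` are same-shape nested lists of floats; in the
    proposed architecture the mask would come from the U-Net module.
    """
    return [[p * m for p, m in zip(row, mrow)]
            for row, mrow in zip(image, mask)]

def fuse_by_concatenation(feat_a, feat_b):
    """Deep feature concatenation: join the two backbones' embedding
    vectors into one fused representation (here, simple list concat)."""
    return feat_a + feat_b

def malignancy_probability(fused, weights, bias):
    """Minimal classification head: linear layer plus sigmoid,
    standing in for the fused classifier that outputs a probability."""
    z = sum(w * f for w, f in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

In a real implementation the gated image would be fed to pretrained ResNet and EfficientNet encoders and `fused` would be their pooled embeddings; the sketch only shows how the three stages compose.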
In addition, a reproducible synthetic evaluation protocol is provided to illustrate reporting practice and deployment-aligned assessment, covering overall performance, size-stratified early-lesion sensitivity, domain-shift robustness, segmentation reliability indicators, and calibration measures. The proposed framework thus delivers a practical architecture blueprint that integrates robustness, interpretability, and auditability as first-class objectives for early cancer detection in heterogeneous real-world imaging settings.
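Among the calibration measures mentioned above, a standard choice is the Expected Calibration Error (ECE). As a sketch, the following implements one common binary-classification variant that bins predictions by the predicted malignancy probability (other variants bin by max-class confidence):

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Expected Calibration Error: partition predictions into equal-width
    confidence bins, then average the |accuracy - confidence| gap per bin,
    weighted by the fraction of samples falling in that bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into last bin
        bins[idx].append((p, y))
    ece, n = 0.0, len(probs)
    for b in bins:
        if not b:
            continue
        conf = sum(p for p, _ in b) / len(b)  # mean predicted probability
        acc = sum(y for _, y in b) / len(b)   # empirical positive rate
        ece += (len(b) / n) * abs(acc - conf)
    return ece
```

A well-calibrated model's predicted probabilities match empirical positive rates within each bin, driving the ECE toward zero.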
Keywords:
Early cancer detection, medical imaging, deep learning, U-Net, ResNet, EfficientNet, deep feature concatenation, feature fusion