Article

Machine Learning-Based Model Helps to Decide which Patients May Benefit from Pancreatoduodenectomy

1
Hepatobiliopancreatic and Transplantation Center, Hospital de Curry Cabral-CHULC, 1050-099 Lisbon, Portugal
2
Faculdade de Ciências Médicas, NOVA Medical School, Universidade NOVA de Lisboa, 1169-056 Lisbon, Portugal
3
CEDOC—Chronic Diseases Research Center, Nova Medical School, 1150-082 Lisbon, Portugal
4
Blood and Transplantation Center of Lisbon, Instituto Português do Sangue e da Transplantação, Alameda das Linhas de Torres, no. 117, 1769-001 Lisboa, Portugal
5
iNOVA4Health-Advancing Precision Medicine, RG11: Reno-Vascular Diseases Group, Faculdade de Ciências Médicas, NOVA Medical School, Universidade NOVA de Lisboa, 1169-056 Lisbon, Portugal
*
Authors to whom correspondence should be addressed.
Submission received: 10 June 2023 / Revised: 31 July 2023 / Accepted: 7 August 2023 / Published: 10 August 2023

Abstract

Pancreatic ductal adenocarcinoma (PDAC) is an invasive tumor with similar incidence and mortality rates. Pancreaticoduodenectomy has morbidity and mortality rates of up to 60% and 5%, respectively. The purpose of our study was to assess preoperative features contributing to unfavorable 1-year survival prognosis. Study Design: Retrospective, single-center study evaluating the impact of preoperative features on short-term survival outcomes in patients with PDAC of the pancreatic head. Forty-four preoperative features of 172 patients were tested using different supervised machine learning models. Patient records were randomly divided into training and validation sets (80% and 20%, respectively), and model performance was assessed by area under the curve (AUC) and classification accuracy (CA). Additionally, 33 patients were included as an independent revalidation or holdout dataset. Results: Eleven relevant features were identified: age, sex, Ca-19-9, jaundice, ERCP with biliary stent, neutrophils, lymphocytes, lymphocyte/neutrophil ratio, neoadjuvant treatment, imaging tumor size, and ASA. Tree regression (tree model) and logistic regression (LR) performed better than the other tested models. The tree model had an AUC = 0.92 and CA = 0.85. LR had an AUC = 0.74 and CA = 0.78, allowing the development of a nomogram based on absolute feature significance. The best-performing model was the tree model, which yields a decision tree to support clinical decisions. Discussion and conclusions: Based only on preoperative data, it was possible to predict 1-year survival (91.5% vs. 78.1% alive and 70.9% vs. 76.6% deceased for the tree model and LR, respectively). These results contribute to informed decision-making in the selection of which patients with PDAC can benefit from pancreatoduodenectomy.

A machine learning algorithm was developed for the recognition of unfavorable 1-year survival prognosis in patients with pancreatic ductal adenocarcinoma. This will contribute to the identification of patients who would benefit from pancreatoduodenectomy. In our cohort, the tree regression model had an AUC = 0.92 and CA = 0.85, whereas the logistic regression had an AUC = 0.74 and CA = 0.78. To further inform decision-making, a decision tree based on tree regression was developed.

1. Introduction

Pancreatic ductal adenocarcinoma (PDAC) is an aggressive cancer, notoriously difficult to detect early and with limited treatment options, which contributes to its poor prognosis and to its similar incidence and mortality rates. It is the third leading cause of cancer-related deaths in the United States of America and the seventh leading cause worldwide, making it an increasing global health burden [1]. These statistics are projected to worsen, with an expected surge of 61.7% in the total number of cases globally by 2040 [2]. The main difference between PDAC and other cancers is the genomic heterogeneity of the tumors, which greatly complicates the identification of patient phenotypes that can predict better or worse prognosis [3]. For instance, based on transcriptome analysis, the International Cancer Genome Consortium currently divides PDAC into three molecular subtypes: progenitor, squamous, and aberrantly differentiated endocrine exocrine types [4]. However, in clinical practice, the decision made at multidisciplinary meetings on a patient’s suitability for surgery, which is often the only potentially curative treatment for PDAC, still relies on a traditional approach. This approach is predominantly grounded in clinical information such as laboratory testing and imaging, rather than in individual tumor genomic signatures. Despite the recent identification of novel and sensitive biomarkers from studies on non-coding RNAs, such as miRNAs, and from targeted or shotgun proteomic approaches, none of these promising biomarkers has been introduced into routine clinical practice [5].
Radical resection of the tumor, such as pancreaticoduodenectomy, is currently the only potentially curative treatment for PDAC [6]. However, even after resection with curative intent, up to 80% of patients experience disease relapse, resulting in a 5-year survival rate of only 20–30% [7]. The surgery itself is associated with a morbidity rate of up to 60% and a mortality rate below 5%, with a substantial impact on patients’ quality of life and healthcare costs [8]. The 1-year disease-related mortality after resection for pancreatic cancer is approximately 30% [9]. Some authors have developed the concept of “conditional survival” (CS), which is the chance of surviving a certain period of time, calculated for a group of patients who have already survived a predefined period. This measure can give more accurate insight into survival prospects, which is especially useful for patients who have undergone a pancreatic cancer resection [10].
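The conditional survival concept can be illustrated with a short calculation. The sketch below assumes the usual definition, CS(y | x) = S(x + y) / S(x), where S(t) is the overall survival probability at time t. The function name and the input values are illustrative only (they loosely echo the approximate 1-year and 5-year figures quoted above and are not results from this study).

```python
def conditional_survival(surv_at_x: float, surv_at_x_plus_y: float) -> float:
    """Probability of surviving a further y years, given survival to year x.

    Assumes CS(y | x) = S(x + y) / S(x), with both inputs given as
    survival probabilities between 0 and 1.
    """
    if not 0.0 < surv_at_x <= 1.0:
        raise ValueError("S(x) must be in (0, 1]")
    if not 0.0 <= surv_at_x_plus_y <= surv_at_x:
        raise ValueError("S(x + y) must be in [0, S(x)]")
    return surv_at_x_plus_y / surv_at_x


# Hypothetical inputs: 70% alive at 1 year, 25% alive at 5 years.
cs_4_given_1 = conditional_survival(0.70, 0.25)
print(f"Chance of reaching year 5 once year 1 is reached: {cs_4_given_1:.1%}")
```

With these illustrative inputs, a patient who has already survived the first year has a better outlook than the unconditional 5-year figure suggests, which is the point of the measure.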
While the last few years have seen considerable effort dedicated to identifying new biomarkers, improving therapies, and formulating better healthcare policies, several significant obstacles remain. Among the most prominent are the diversity of disease phenotypes, the high cost of implementing novel methodologies, and the need for more explainable, faithful, and high-performing models [11]. In healthcare, and particularly in oncology, this demand has increased considerably, driving wider adoption of machine learning (ML) in medical management. ML is a collection of artificial intelligence approaches and mathematical modeling aimed at predicting individual outcomes. As a result, ML can potentially inform decision-making and lead to patient-tailored therapeutic approaches [12,13,14]. In recent years, ML has had a transformative impact on many medical areas. Its applications range from diagnostics and therapy to patient monitoring and health management. Some of the major impacts have been in medical imaging and radiology, where ML analyzes images such as X-rays, computed tomography scans, magnetic resonance imaging, and mammograms to help detect disease, and in diagnosis more broadly, where algorithms draw on symptoms, medical history, and test results [15,16]. In solid organ transplantation, for instance, ML has the potential to improve patient and allograft outcomes, being applied in organ matching and allocation [17], organ quality assessment [17], predictive analytics and risk assessment [18,19], and long-term outcomes and survival prediction [19,20,21]. In oncology, ML has also made significant contributions to cancer research, diagnosis, treatment, and prognosis. Key areas where machine learning is being applied in oncology include:
Cancer Diagnosis: analyzing medical imaging data, such as mammograms, CT scans, and MRIs, providing support to detect and classify tumors [22].
Prognosis and Survival Prediction: by utilizing clinical and genomic data to predict patient outcomes, such as survival rates and disease progression. By considering a wide range of variables or features, including patient characteristics, tumor characteristics, and treatment history, these models can provide personalized prognostic information [23].
Drug Discovery and Development: ML techniques being employed in the analysis of large-scale genomic and proteomic data to expedite the process of identifying new drug candidates and optimizing existing treatments [24].
Supporting Treatment Planning and Personalized Medicine: assisting in the implementation of personalized treatment plans by the integration of patient-specific data (genomic profiles and clinical records). This approach has enabled oncologists to make more informed decisions about treatment strategies, including selecting appropriate therapies and determining optimal dosages [25,26,27].
Precision Oncology: supporting the identification of genetic mutations or biomarkers associated with specific cancer types [28].
Clinical Trial Optimization: supporting the identification of suitable patient cohorts for clinical trials, potentially leading to more efficient and successful trial outcomes.
Image-Guided Surgery: providing real-time surgical images to assist surgeons during procedures [29].
Pancreatic cancer, like many other types of cancer, has been no exception. Despite the aggressive nature and often late-stage diagnosis of this disease, which pose significant challenges for both diagnosis and treatment, in recent years there has been growing interest in exploring the potential of ML algorithms in the context of pancreatic cancer. Researchers have been making significant advances by applying ML and deep learning techniques to early detection, diagnosis, prognosis, and treatment planning.
Numerous studies have emerged, employing sophisticated algorithms to analyze complex datasets related to pancreatic cancer, combining large amounts of clinical data, genetic information, and imaging studies that enable the identification of patterns and estimates, and provide valuable insights into our understanding and management of this challenging disease.
Some studies have focused on early-stage identification or on predicting the risk of pancreatic cancer. Placido et al. applied ML models to 27,900 cases of pancreatic cancer from the Danish National Patient Registry and a US Veterans Affairs dataset and were able to predict cancer occurrence within incremental time windows, reaching an area under the receiver operating characteristic curve (AUROC) of 0.88 for disease identification up to 36 months before occurrence [30]. Savareh et al., working with circulating microRNAs associated with pancreatic cancer and using an Artificial Neural Network and Neighborhood Component Analysis, identified five microRNAs (miR-663a, miR-1469, miR-92a-2-5p, miR-125b-1-3p, and miR-532-5p) with an accuracy, sensitivity, and specificity of 0.93, 0.93, and 0.92, respectively, yielding a promising non-invasive diagnostic model for pancreatic cancer [31]. In 2020, Baek and Lee combined multi-omics data to forecast pancreatic cancer survival and recurrence, with an AUC of 0.795 [32]. ML models have also been used in drug response prediction and personalized medicine. Usually, these models are applied as a means to analyze patient-specific data, including genomic profiles and treatment history, to predict individual responses to different treatment options, supporting treatment decisions and identifying patients who are likely to benefit from specific therapies or clinical trials. An example of this strategy was presented by Wei and Ramsey, who were able to predict chemotherapy drug response for pancreatic cancer with an AUC of 0.77 [33].
As technology continues to advance, it is expected that ML will play an increasingly vital role in pancreatic cancer research, ultimately leading to improved diagnostic accuracy and treatment efficacy, as well as maximizing patient outcomes and ensuring the judicious use of aggressive treatments such as pancreatic resection [34].
The primary objective of our study was to comprehensively evaluate the preoperative features that could potentially influence an unfavorable 1-year survival prognosis in patients diagnosed with PDAC. To achieve this, we employed ML techniques to analyze a wide range of preoperative factors and their association with patient outcomes. By leveraging the power of ML algorithms, we aimed to identify key predictive features that could help in early prognostication and aid in the development of personalized treatment strategies for PDAC patients.

2. Materials and Methods

2.1. Study Population

This single-center, retrospective cohort study focused on patients who underwent pancreatoduodenectomy (PD) for pancreatic ductal adenocarcinoma (PDAC) at Hospital Curry Cabral from 2004 to 2016. The purpose was to assess how preoperative characteristics influence short-term survival outcomes (365 days post-operation). For inclusion in the training cohort, we considered 432 patients. Of these, 172 were included in this study using the following criteria: patients for whom the authors had full access to the original patient records; patients who were discharged without mortality from surgical complications; and patients with at least a 5-year follow-up. Features were extracted from electronic patient records, and all data were manually validated against the medical records. A second population of patients who underwent PD for PDAC during 2017 was included as an independent revalidation or holdout dataset (n = 33).
This study was approved by the Ethics Committee of CHULC (INV-106).

2.2. Features and Data Analysis

A total of 44 clinical features were extracted (24 categorical and 20 numeric) from electronic patient records. Categorical features included sex, jaundice, neoadjuvant chemotherapy (yes/no), and AJCC stage. Numeric features included age, levels of serum cancer antigen 19-9 (Ca 19-9), lymphocyte counts, neutrophil counts, and lymphocyte to neutrophil ratio (ly/n ratio), among others presented in Table 1.
Machine learning (ML) algorithms and feature selection were run in Orange 3 version 3.19.0 (Bioinformatics Lab, University of Ljubljana, Slovenia). Supervised ML algorithms were assessed by area under the receiver operating characteristic curve (AUC); classification accuracy (CA), F-1 score, and precision were also calculated.
AUC and CA are two of the available measures of the performance of a classification model. The receiver operating characteristic (ROC) curve plots two parameters against each other, the true positive rate and the false positive rate, and the AUC is the area under that curve. Classification accuracy is the proportion of correctly classified subjects. For cross-validation, patients were randomly divided into training and validation sets in an 80% to 20% ratio, respectively. An additional population of 33 patients was subsequently evaluated as a holdout dataset.
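As an illustration of these two metrics and of the 80/20 split, the following minimal sketch uses only the Python standard library. It is not the Orange workflow used in the study: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for the example. The AUC is computed through its rank interpretation, that is, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
import random


def classification_accuracy(y_true, y_pred):
    """Proportion of subjects whose predicted class equals the true class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def train_validation_split(records, validation_fraction=0.2, seed=0):
    """Shuffle the records and hold out a fraction of them for validation."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Toy data (hypothetical): 1 = survivor, 0 = non-survivor.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.5]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print("AUC:", round(roc_auc(y_true, scores), 3))
print("CA:", classification_accuracy(y_true, y_pred))

train, validation = train_validation_split(range(172))
print(len(train), "training records,", len(validation), "validation records")
```

The split is applied here to 172 placeholder record indices only to show the resulting set sizes; the actual random assignment used in the study is not reproduced.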
From the initial 44 features, the Info Gain feature selection algorithm identified 11 features that contributed to the strength of the models in identifying 1-year survivors: age, sex, Ca-19-9, jaundice, endoscopic retrograde cholangiopancreatography (ERCP) with biliary stent, neutrophils, lymphocytes, ly/n ratio, neoadjuvant chemotherapy, imaging tumor size, and ASA. Multicollinearity between features was evaluated, and only the ly/n ratio with lymphocyte counts presented a variance inflation factor above 1 (1.27, indicating low collinearity, well below the value of 5).
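Information gain scores a feature by how much knowing its value reduces the entropy of the target. The sketch below shows the calculation for a categorical feature under that standard definition; it is not Orange's implementation, the function names and toy data are hypothetical, and a numeric feature such as age would first need to be discretized into bins before being scored this way.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the target minus its expected entropy after splitting
    the subjects by the value of one categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy data (hypothetical): four survivors and four non-survivors.
outcome = ["alive"] * 4 + ["dead"] * 4
informative = ["no", "no", "no", "no", "yes", "yes", "yes", "yes"]
uninformative = ["a", "b", "a", "b", "a", "b", "a", "b"]

print("informative feature:", information_gain(informative, outcome))
print("uninformative feature:", information_gain(uninformative, outcome))
```

A feature that separates the two outcome groups perfectly recovers the full entropy of the target, while a feature distributed identically in both groups gains nothing. Ranking features by this score and keeping the top ones is the general idea behind the selection step described above.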
The target was defined as “Survivors” for patients who survived more than 1 year after PD surgery and “Non-survivors” for those who survived less than 1 year. We chose the 1-year cut-off because of the already high morbidity and mortality caused by the disease itself and considering the available therapeutic alternatives. All patients with a follow-up of less than 1 year had died.

2.3. Algorithms

Several supervised (e.g., Decision Trees, Naive Bayes, Support Vector Machines, Random Forest, Logistic Regression) and unsupervised (e.g., K-means Clustering, Principal Component Analysis, t-Distributed Stochastic Neighbor Embedding) ML algorithms were tested, and all models were evaluated on their AUC and CA. Based on this performance, two types of supervised classification models were retained: the tree model and logistic regression (LR).

2.3.1. Decision Tree

This is one of the most popular models in ML and splits the data into nodes by class purity. The tree consists of decision nodes connected by branches, running from the root node to the leaf nodes. An attribute is assessed at each decision node, and each outcome of that assessment results in a branch. To reach a decision, each branch leads either to another decision node or to a leaf (end) node [12,35].
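To make splitting by class purity concrete, the sketch below searches for the threshold on one numeric feature that gives the purest pair of child nodes. It assumes Gini impurity as the purity measure, which is one common choice; the criterion actually used by the software in this study is not stated, and the function names and toy values are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1); 0.0 means pure."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_threshold(values, labels):
    """Return (threshold, weighted_impurity) for the best binary split.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (None, inf) if all values are identical.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Toy data (hypothetical): a ratio-like feature and a 0/1 outcome.
ratio = [0.3, 0.5, 0.7, 0.9, 1.1, 1.3]
alive = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_threshold(ratio, alive)
print("split at", round(threshold, 2), "with weighted impurity", impurity)
```

A full tree repeats this search over all features at every node and recurses into each child until a stopping rule is met.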

2.3.2. Logistic Regression

This is a statistical classification method that is used to predict the probability of a target variable based on the fitting of data to a logistic function. The target or output must assume a dichotomous nature; the target is expected to be success/yes or unsuccessful/no. LR is frequently one of the first go-to algorithms used in classification problems [12,36].
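The logistic function maps a weighted sum of the features to a probability between 0 and 1. The sketch below shows that mapping only; the feature names, weights, and intercept are hypothetical placeholders and are not the coefficients fitted in this study.

```python
from math import exp


def logistic(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def predict_probability(features, weights, intercept):
    """Probability of the positive class for one patient.

    `features` and `weights` are dicts keyed by feature name.
    """
    z = intercept + sum(weights[name] * value for name, value in features.items())
    return logistic(z)


# Hypothetical coefficients, chosen only to illustrate the calculation.
weights = {"ly_n_ratio": 1.5, "jaundice": -0.5}
patient = {"ly_n_ratio": 1.0, "jaundice": 1}
p = predict_probability(patient, weights, intercept=0.0)
print(f"predicted probability: {p:.3f}")
```

A positive weight raises the predicted probability as the feature increases, and a negative weight lowers it; this additive structure is what makes a fitted LR model convertible into a nomogram.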

3. Results

The clinical, laboratory, and imaging variables of the training/validation dataset and of the independent holdout dataset are presented in Table 2. Only two features differed significantly between the training and holdout datasets: lymphocyte count and American Society of Anesthesiologists (ASA) score. This highlights the consistency of the data across the two sets. Validating the model on this completely separate cohort helps assess its generalizability beyond the training and validation cohorts and provides a more robust evaluation of its performance.
Tree and logistic regression (LR) ML models were built and validated externally. The relative importance of the features was evaluated, as shown in Figure 1. Preoperative lymphocyte to neutrophil ratio (ly/n), nodule size, age, preoperative neutrophils, preoperative lymphocytes, and neoadjuvant therapy were, in this order, the features that contributed most to the area under the receiver operating characteristic curve (AUC) obtained with the tree model (Figure 1a), with the ly/n ratio dominating. The ly/n ratio also emerged as a contributor to the LR AUC. In this model, Ca19-9 also contributed to the results, and jaundice had a negative weight in the final results (Figure 2c). For the tree model, the feature weights for classification accuracy (CA) were similar, with the exceptions of age and preoperative neutrophils. In the case of LR, different features contributed with different strengths to the CA (Figure 2d).
Figure 2 shows a comparison of the ROC curves of the two models for predicting death at 1 year after PD in the training cohort. Both models performed well in the training set, with the tree model performing better. The results were slightly better for patients classified as “Survivors” (Figure 3a) than for patients classified as “Non-survivors” (Figure 3b).
The decision tree model outperformed the logistic regression model in predicting 1-year survival. Figure 3 presents the AUC and CA in the training set for risk of death one year after PD: 0.92 and 0.85 for the tree model and 0.74 and 0.78 for LR, respectively. In the holdout validation dataset, the tree model retained the higher AUC: its AUC and CA for risk of death one year after PD were 0.89 and 0.76, whereas LR showed 0.68 and 0.82, respectively.

4. Discussion

The inherently aggressive nature of pancreatic ductal adenocarcinoma (PDAC), coupled with the complexity of the associated surgical treatments, underlines the need for well-informed decision-making to ensure the best therapeutic outcomes for patients. Comprehensive prognostic models have an important role in the stratification of patients with PDAC, and we focused on the variables that are examined during routine clinical examinations [37]. In our cohort, 28.5% of patients did not survive beyond one year. If the decision is to undergo surgery, it should be performed within 1 month after diagnosis. The criteria to consider are the survival and quality of life of the patients. Hence, we created ML-aided models (Curry-score) that combine all available information and identify those patients for whom surgery will provide a survival benefit [38].
In our study, we used an integrated approach, leveraging clinical, blood test, and imaging features to predict patient mortality one-year post-surgery. In the non-survivors group, PD surgery would likely increase morbidity and possibly contribute to premature death with a lower quality of life.
Lymph node status (N-stage) and vascular invasion are classic factors of poor prognosis. In this study, we sought to identify any earlier relevant features contributing to the informed discussion of the cases in a multidisciplinary meeting [39,40,41].
Predicting relapse after an operative intervention is difficult. ML algorithms may help to identify patients at risk of early recurrence who would not benefit from a treatment as aggressive as a PD. By identifying these high-risk patients, healthcare professionals can make better-informed decisions and possibly avoid unnecessary surgical interventions that may increase morbidity and compromise quality of life. This approach is particularly critical in multidisciplinary discussions, where an understanding of a patient’s overall condition is necessary for effective treatment planning. In addition, these patients can alternatively be candidates for clinical trials. Currently, artificial intelligence research in healthcare is accelerating rapidly, with potential applications across almost every domain of medicine [42,43,44]. Unlike conventional regression-based approaches, ML algorithms are capable of capturing higher-order nonlinear interactions between predictors [45], and their effectiveness has been shown in predicting the recurrence of various diseases [46,47,48].
Some authors have established a nomogram to predict the probability of recurrence within 12 months after surgery in a single medical center [49]. We approached data from a different perspective, seeking to assess not only recurrence but also to predict recurrence in time. We intended to identify patients with a more aggressive biology, so that a therapeutic strategy other than surgery would be offered. Kim et al. [49] also focused on identifying preoperative clinicopathologic factors for predicting early recurrence after surgery. In their study, a Cox proportional hazard regression analysis showed an AUC of 0.665, which is worse than ours with LR. Interestingly, Ca19-9 and tumor size overlapped with our study.
The tree model showed superior predictive capability in validation compared to the LR (Figure 4), producing a decision tree with the highest AUC in the training and validation sets. With the tree model, we achieved an AUC = 0.92 and CA = 0.85. Among patients with a ly/n ratio greater than 0.88, the probability of being alive at 1 year was 100%; in contrast, if the ratio was less than 0.88, only 68.2% were expected to be alive. In this branch, patients were split by nodule size at 36 mm into 52.9% (deceased) vs. 75.0% (alive). In the latter group, all patients who underwent neoadjuvant chemotherapy were alive at 1 year, while in the group without neoadjuvant chemotherapy, only 72.9% were alive at 1 year. In this last group, 93.3% of patients under 58 years of age were alive one year after surgery, while only 69.1% were alive if they were over 58 years of age. For example, a 70-year-old patient with a ly/n ratio below 0.88, a 28 mm nodule, and no neoadjuvant chemotherapy has a probability of 69.1% of being alive one year after surgery.
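The decision path described in this paragraph can be written out as nested conditions. The sketch below is only a reading of the text, not the published tree: it assumes that the 52.9% (deceased) figure belongs to nodules larger than 36 mm, so that the corresponding 1-year survival is 100 − 52.9 = 47.1%, and the handling of values exactly at a cut-off (0.88, 36 mm, 58 years) is also an assumption. The function and parameter names are hypothetical.

```python
def predicted_one_year_survival(ly_n_ratio, nodule_mm, neoadjuvant, age_years):
    """Percentage of patients alive at 1 year in the matching leaf.

    Transcribed from the percentages quoted in the text; branch boundaries
    and the >36 mm leaf are assumptions (see the lead-in paragraph).
    """
    if ly_n_ratio > 0.88:
        return 100.0
    # ly/n ratio at or below 0.88: 68.2% alive overall in this branch
    if nodule_mm > 36:
        return 47.1  # assumed: 100 - 52.9 (deceased)
    # nodule of 36 mm or less: 75.0% alive overall
    if neoadjuvant:
        return 100.0
    # no neoadjuvant chemotherapy: 72.9% alive overall
    if age_years < 58:
        return 93.3
    return 69.1


# The worked example from the text: 70 years old, ly/n ratio below 0.88,
# 28 mm nodule, no neoadjuvant chemotherapy.
print(predicted_one_year_survival(0.5, 28, False, 70))
```

The value 0.5 for the ly/n ratio in the usage line is an arbitrary stand-in for "below 0.88"; any value at or below the cut-off follows the same path.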
We were able to predict 91.5% of the surviving patients one year after surgery. In our model, 29.1% of the patients were misclassified, as they were predicted as deceased but were alive one year after surgery. However, even if patients in this group were not operated on earlier, they could receive neoadjuvant treatment and be operated on later, as long as biological behavior remained favorable. As such, we do not envision any detrimental effects imposed by the model.
This study had some limitations. The approximately 52% split in one branch of the tree challenges the clinical applicability of the model: when patients are divided roughly in half between the two outcomes, the result may not be viewed as very informative for clinical decision support. However, this value refers to a subgroup of patients already selected in the first step of the decision tree. As we continue to dichotomize the sample, we find that patients may still benefit from surgical treatment. Furthermore, the model predicts that 18 out of 34 patients have a discouraging prognosis and may therefore become candidates for neoadjuvant chemotherapy. The prevailing outcome of this analysis is that the model is clinically applicable in all groups.
Given the retrospective nature of our study, selection bias cannot be excluded. Due to the low incidence of PDAC, the relatively limited sample size of the training, validation, and independent datasets may compromise the quantification of interpatient variability effects. One of the benefits of machine learning is that it allows the identification of relevant and/or weighted variables for the defined model even in small samples. In this case, we used “Info.gain”, the expected amount of information (reduction of entropy). This study, being single-centered, inevitably raises the question of generalizability. We believe that the future collection of larger and more balanced cohorts from multiple medical centers will further enhance the robustness and validity of the proposed models. As artificial intelligence continues to evolve and embed itself in healthcare, the collaborative relationship between physicians and these human-centered AI tools will significantly influence the clinical landscape, ultimately leading to improved patient outcomes. As mentioned earlier, this study was conducted in a single center with a relatively small sample size. This could limit the generalizability of the findings, as they might not be representative of the larger population of patients with PDAC, and could introduce selection bias, as patients in a single center might not capture the full range of variability seen in the general population. As with any retrospective study, there is potential for information bias, meaning that the data collected may not have been as rigorous or as accurate as they would be in a prospective study design. In addition, retrospective studies are inherently subject to confounding variables that may not be fully adjusted for, even with statistical methods. While our models consider several important factors, there might be other unmeasured or unobserved confounding variables that could influence the outcomes.
Examples could be lifestyle factors, genetic predispositions, and other comorbidities that were not taken into account. While the predictive models (tree and logistic regression) provided meaningful results, they are based on algorithms that have their own limitations. For instance, they may not perfectly capture the complex biological and pathological interactions in real-world scenarios. Although our study included a validation set, an external validation with an independent dataset from a different center or population would further strengthen the generalizability of our models. Machine learning models, especially those built on relatively small datasets, can be prone to overfitting. Overfitting happens when a model learns the training data too well, to the extent that it performs poorly on new, unseen data. While the models provided good predictive accuracy for the survivors group, their practical implementation in clinical decision-making would require further study. This involves considering factors such as clinician acceptance, integration into current systems, and the potential for these models to actually change patient outcomes. Recognizing these limitations can also provide a basis for future studies that aim to address them.
We anticipate that additional research will play a decisive role in the understanding of the complex and evolving relationship between physicians and human-centered artificial intelligence tools in a live clinical environment, ultimately leading to better outcomes for our patients.

5. Conclusions

In conclusion, the field of cancer research has in recent years made great strides towards improving diagnosis, prognosis, and treatment options. The emergence of multi-omics technologies, which involve analyzing multiple types of biological data, has provided researchers with a wealth of information that can be used to better understand the underlying mechanisms of cancer and develop more effective interventions. However, despite these advancements, pancreatic cancer prognosis still presents significant challenges.
In this context, we present an inexpensive alternative approach that combines routine perioperative laboratory tests and demographic data with a modern machine learning algorithm to improve the accuracy of pancreatic cancer prognosis. By using these readily available data sources, our approach is both cost-effective and easily implemented in clinical settings where resources may be limited.
Our study demonstrates that the use of ML algorithms can significantly improve the prediction of high-risk pancreatic adenocarcinoma patients’ 1-year survival. Moreover, the user-friendliness and low costs of ML software programs make them particularly relevant in resource-limited centers, where the adoption of new diagnostic and prognostic technologies may be more challenging.
Overall, our findings suggest that machine learning has the potential to become a powerful tool for improving cancer prognosis, especially in low-resource settings. While there is still much work to be done to refine and validate these approaches, we are optimistic about the future of machine learning in cancer research and clinical practice.

Author Contributions

E.V. and L.R. conceptualized the draft. E.V. and L.R. contributed equally to writing and reviewing of the original draft through interpretation of the literature. E.V. and L.R. performed the computations, analysis, and interpretation of the results. E.F., L.B., A.N., P.M., M.M., C.A., S.C., B.C., J.B., P.C. and J.G. contributed to data collection and data clearance. H.P.M. critically commented and edited the article. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board (Ethics Committee of CHULC, INV-106).

Informed Consent Statement

Informed consent was obtained from all patients involved in this study before surgery.

Data Availability Statement

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Sung, H.; Ferlay, J.; Siegel, R.L.; Laversanne, M.; Soerjomataram, I.; Jemal, A.; Bray, F. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA A Cancer J. Clin. 2021, 71, 209–249. [Google Scholar] [CrossRef] [PubMed]
  2. Gupta, N.; Yelamanchi, R. Pancreatic adenocarcinoma: A review of recent paradigms and advances in epidemiology, clinical diagnosis and management. World J. Gastroenterol. 2021, 27, 3158. [Google Scholar] [PubMed]
  3. Ying, H.; Dey, P.; Yao, W.; Kimmelman, A.C.; Draetta, G.F.; Maitra, A.; DePinho, R.A. Genetics and biology of pancreatic ductal adenocarcinoma. Genes Dev. 2016, 30, 355–385. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  4. Collisson, E.A.; Bailey, P.; Chang, D.K.; Biankin, A.V. Molecular subtypes of pancreatic cancer. Nat. Rev. Gastroenterol. Hepatol. 2019, 16, 207–220. [Google Scholar] [CrossRef]
  5. Turanli, B.; Yildirim, E.; Gulfidan, G.; Arğa, K.Y.; Sinha, R. Current State of “omics” biomarkers in pancreatic cancer. J. Pers. Med. 2021, 11, 127. [Google Scholar] [CrossRef]
  6. Nevala-Plagemann, C.; Hidalgo, M.; Garrido-Laguna, I. From state-of-the-art treatments to novel therapies for advanced-stage pancreatic cancer. Nat. Rev. Clin. Oncol. 2020, 17, 108–123. [Google Scholar] [CrossRef]
  7. Ferrone, C.R.; Pieretti-Vanmarcke, R.; Bloom, J.; Zheng, H.; Szymonifka, J.; Wargo, J.A.; Thayer, S.P.; Lauwers, G.Y.; Deshpande, V.; Mino-Kenudson, M.; et al. Pancreatic ductal adenocarcinoma: Long-term survival does not equal cure. Surgery 2012, 152, S43–S49. [Google Scholar] [CrossRef] [Green Version]
  8. Sánchez-Velázquez, P.; Muller, X.; Malleo, G.; Park, J.-S.; Hwang, H.-K.; Napoli, N.; Javed, A.A.; Inoue, Y.; Beghdadi, N.; Kalisvaart, M.; et al. Benchmarks in pancreatic surgery: A novel tool for unbiased outcome comparisons. Ann. Surg. 2019, 270, 211–218. [Google Scholar] [CrossRef]
  9. Barugola, G.; Partelli, S.; Marcucci, S.; Sartori, N.; Capelli, P.; Bassi, C.; Pederzoli, P.; Falconi, M. Resectable pancreatic cancer: Who really benefits from resection? Ann. Surg. Oncol. 2009, 16, 3316–3322. [Google Scholar] [CrossRef]
  10. Latenstein, A.E.; van Roessel, S.; van der Geest, L.G.; Bonsing, B.A.; Dejong, C.H.C.; Koerkamp, B.G.; de Hingh, I.H.J.T.; Homs, M.Y.V.; Klaase, J.M.; Lemmens, V.; et al. Conditional survival after resection for pancreatic cancer: A population-based study and prediction model. Ann. Surg. Oncol. 2020, 27, 2516–2524. [Google Scholar]
  11. Das, T.; Andrieux, G.; Ahmed, M.; Chakraborty, S. Integration of online omics-data resources for cancer research. Front. Genet. 2020, 11, 578345. [Google Scholar] [CrossRef]
  12. Sidey-Gibbons, J.A.; Sidey-Gibbons, C.J. Machine learning in medicine: A practical introduction. BMC Med. Res. Methodol. 2019, 19, 64. [Google Scholar] [CrossRef] [Green Version]
  13. El Naqa, I.; Murphy, M.J. What is machine learning? In Machine Learning in Radiation Oncology; Springer: Berlin/Heidelberg, Germany, 2015; pp. 3–11. [Google Scholar]
  14. Rashidi, H.H.; Tran, N.K.; Betts, E.V.; Howell, L.P.; Green, R. Artificial intelligence and machine learning in pathology: The present landscape of supervised methods. Acad. Pathol. 2019, 6, 2374289519873088. [Google Scholar] [CrossRef]
  15. Noguerol, T.M.; Paulano-Godino, F.; Martín-Valdivia, M.T.; Menias, C.O.; Luna, A. Strengths, weaknesses, opportunities, and threats analysis of artificial intelligence and machine learning applications in radiology. J. Am. Coll. Radiol. 2019, 16, 1239–1247. [Google Scholar] [CrossRef]
  16. Fritz, B.; Yi, P.H.; Kijowski, R.; Fritz, J. Radiomics and deep learning for disease detection in musculoskeletal radiology: An overview of novel MRI-and CT-based approaches. Investig. Radiol. 2023, 58, 3–13. [Google Scholar] [CrossRef]
  17. Börner, N.; Schoenberg, M.B.; Pöschke, P.; Heiliger, C.; Jacob, S.; Koch, D.; Pöllmann, B.; Drefs, M.; Koliogiannis, D.; Böhm, C.; et al. A Novel Deep Learning Model as a Donor–Recipient Matching Tool to Predict Survival after Liver Transplantation. J. Clin. Med. 2022, 11, 6422. [Google Scholar] [CrossRef]
  18. Vigia, E.; Ramalhete, L.; Ribeiro, R.; Barros, I.; Chumbinho, B.; Filipe, E.; Pena, A.; Bicho, L.; Nobre, A.; Carrelha, S.; et al. Pancreas Rejection in the Artificial Intelligence Era: New Tool for Signal Patients at Risk. J. Pers. Med. 2023, 13, 1071. [Google Scholar] [CrossRef]
  19. Ayers, B.; Sandholm, T.; Gosev, I.; Prasad, S.; Kilic, A. Using machine learning to improve survival prediction after heart transplantation. J. Card. Surg. 2021, 36, 4113–4120. [Google Scholar] [CrossRef]
  20. Senanayake, S.; White, N.; Graves, N.; Healy, H.; Baboolal, K.; Kularatna, S. Machine learning in predicting graft failure following kidney transplantation: A systematic review of published predictive models. Int. J. Med. Inform. 2019, 130, 103957. [Google Scholar] [CrossRef]
  21. Vigia, E.; Ramalhete, L.; Ribeiro, R.; Barros, I.; Chumbinho, B.; Filipe, E. Predicting Function Delay with a Machine Learning Model Improve the Long-term Survival of Pancreatic Grafts. Pancreat. Disord. Ther. 2022, 12, 231. [Google Scholar]
  22. Koh, D.-M.; Papanikolaou, N.; Bick, U.; Illing, R.; Kahn, C.E., Jr.; Kalpathi-Cramer, J.; Matos, C.; Martí-Bonmatí, L.; Miles, A.; Mun, S.K.; et al. Artificial intelligence and machine learning in cancer imaging. Commun. Med. 2022, 2, 133. [Google Scholar] [CrossRef] [PubMed]
  23. Kourou, K.; Exarchos, T.P.; Exarchos, K.P.; Karamouzis, M.V.; Fotiadis, D.I. Machine learning applications in cancer prognosis and prediction. Comput. Struct. Biotechnol. J. 2015, 13, 8–17. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  24. Patel, L.; Shukla, T.; Huang, X.; Ussery, D.W.; Wang, S. Machine learning methods in drug discovery. Molecules 2020, 25, 5277. [Google Scholar] [CrossRef] [PubMed]
  25. Kandalan, R.N.; Nguyen, D.; Rezaeian, N.H.; Barragán-Montero, A.M.; Breedveld, S.; Namuduri, K.; Jiang, S.; Lin, M.-H. Dose prediction with deep learning for prostate cancer radiation therapy: Model adaptation to different treatment planning practices. Radiother. Oncol. 2020, 153, 228–235. [Google Scholar] [CrossRef] [PubMed]
  26. Huang, C.; Clayton, E.; Matyunina, L.; McDonald, L.; Benigno, B.; Vannberg, F. Machine learning predicts individual cancer patient responses to therapeutic drugs with high accuracy. Sci. Rep. 2018, 8, 16444. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  27. Rafique, R.; Islam, S.R.; Kazi, J.U. Machine learning in the prediction of cancer therapy. Comput. Struct. Biotechnol. J. 2021, 19, 4003–4017. [Google Scholar] [CrossRef]
  28. Zhang, W.; Chien, J.; Yong, J.; Kuang, R. Network-based machine learning and graph theory algorithms for precision oncology. NPJ Precis. Oncol. 2017, 1, 25. [Google Scholar] [CrossRef] [Green Version]
  29. Knospe, L.; Gockel, I.; Jansen-Winkeln, B.; Thieme, R.; Niebisch, S.; Moulla, Y.; Stelzner, S.; Lyros, O.; Diana, M.; Marescaux, J.; et al. New intraoperative imaging tools and image-guided surgery in gastric cancer surgery. Diagnostics 2022, 12, 507. [Google Scholar] [CrossRef]
  30. Placido, D.; Yuan, B.; Hjaltelin, J.X.; Zheng, C.; Haue, A.D.; Chmura, P.J.; Yuan, C.; Kim, J.; Umeton, R.; Antell, G.; et al. A deep learning algorithm to predict risk of pancreatic cancer from disease trajectories. Nat. Med. 2023, 29, 1113–1122. [Google Scholar] [CrossRef]
  31. Savareh, B.A.; Aghdaie, H.A.; Behmanesh, A.; Bashiri, A.; Sadeghi, A.; Zali, M.; Shams, R. A machine learning approach identified a diagnostic model for pancreatic cancer through using circulating microRNA signatures. Pancreatology 2020, 20, 1195–1204. [Google Scholar] [CrossRef]
  32. Baek, B.; Lee, H. Prediction of survival and recurrence in patients with pancreatic cancer by integrating multi-omics data. Sci. Rep. 2020, 10, 18951. [Google Scholar] [CrossRef]
  33. Wei, Q.; Ramsey, S.A. Predicting chemotherapy response using a variational autoencoder approach. BMC Bioinform. 2021, 22, 453. [Google Scholar]
  34. Pfob, A.; Mehrara, B.J.; Nelson, J.A.; Wilkins, E.G.; Pusic, A.L.; Sidey-Gibbons, C. Towards patient-centered decision-making in breast cancer surgery: Machine learning to predict individual patient-reported outcomes at 1-year follow-up. Ann. Surg. 2023, 277, e144–e152. [Google Scholar]
  35. Kingsford, C.; Salzberg, S.L. What are decision trees? Nat. Biotechnol. 2008, 26, 1011–1013. [Google Scholar] [CrossRef]
  36. Shipe, M.E.; Deppen, S.A.; Farjah, F.; Grogan, E.L. Developing prediction models for clinical use using logistic regression: An overview. J. Thorac. Dis. 2019, 11 (Suppl. 4), S574. [Google Scholar] [CrossRef]
  37. Lee, W.; Park, H.J.; Lee, H.-J.; Jun, E.; Song, K.B.; Hwang, D.W.; Lee, J.H.; Lim, K.; Kim, N.; Lee, S.S.; et al. Preoperative data-based deep learning model for predicting postoperative survival in pancreatic cancer patients. Int. J. Surg. 2022, 105, 106851. [Google Scholar] [CrossRef]
  38. Sala Elarre, P.; Oyaga-Iriarte, E.; Yu, K.H.; Baudin, V.; Moreno, L.A.; Carranza, O.; Ortega, A.C.; Ponz-Sarvise, M.; Sosa, L.D.M.; Sastre, F.R.; et al. Use of machine-learning algorithms in intensified preoperative therapy of pancreatic cancer to predict individual risk of relapse. Cancers 2019, 11, 606. [Google Scholar] [CrossRef] [Green Version]
  39. Li, X.; Yang, L.; Yuan, Z.; Lou, J.; Fan, Y.; Shi, A.; Huang, J.; Zhao, M.; Wu, Y. Multi-institutional development and external validation of machine learning-based models to predict relapse risk of pancreatic ductal adenocarcinoma after radical resection. J. Transl. Med. 2021, 19, 281. [Google Scholar] [CrossRef]
  40. He, C.; Mao, Y.; Wang, J.; Duan, F.; Lin, X.; Li, S. Nomograms predict long-term survival for patients with periampullary adenocarcinoma after pancreatoduodenectomy. BMC Cancer 2018, 18, 327. [Google Scholar] [CrossRef]
  41. de Castro, S.; Biere, S.; Lagarde, S.; Busch, O.; Van Gulik, T.; Gouma, D. Validation of a nomogram for predicting survival after resection for adenocarcinoma of the pancreas. J. Br. Surg. 2009, 96, 417–423. [Google Scholar] [CrossRef]
  42. Kelly, C.J.; Karthikesalingam, A.; Suleyman, M.; Corrado, G.; King, D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med. 2019, 17, 195. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  43. Topol, E.J. High-performance medicine: The convergence of human and artificial intelligence. Nat. Med. 2019, 25, 44–56. [Google Scholar] [CrossRef]
  44. Esteva, A.; Robicquet, A.; Ramsundar, B.; Kuleshov, V.; DePristo, M.; Chou, K.; Cui, C.; Corrado, G.; Thrun, S.; Dean, J. A guide to deep learning in healthcare. Nat. Med. 2019, 25, 24–29. [Google Scholar] [CrossRef] [PubMed]
  45. Zeevi, D.; Korem, T.; Zmora, N.; Israeli, D.; Rothschild, D.; Weinberger, A.; Ben-Yacov, O.; Lador, D.; Avnit-Sagi, T.; Lotan-Pompan, M.; et al. Personalized nutrition by prediction of glycemic responses. Cell 2015, 163, 1079–1094. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  46. Liang, J.-D.; Ping, X.-O.; Tseng, Y.-J.; Huang, G.-T.; Lai, F.; Yang, P.-M. Recurrence predictive models for patients with hepatocellular carcinoma after radiofrequency ablation using support vector machines with feature selection methods. Comput. Methods Programs Biomed. 2014, 117, 425–434. [Google Scholar] [CrossRef]
  47. Tseng, C.-J.; Lu, C.-J.; Chang, C.-C.; Chen, G.-D. Application of machine learning to predict the recurrence-proneness for cervical cancer. Neural Comput. Appl. 2014, 24, 1311–1316. [Google Scholar] [CrossRef]
  48. Ahmad, L.G.; Eshlaghy, A.; Poorebrahimi, A.; Ebrahimi, M.; Razavi, A. Using three machine learning techniques for predicting breast cancer recurrence. J. Health Med. Inf. 2013, 4, 3. [Google Scholar]
  49. Shin, S.; Han, I.; Heo, J.; Choi, D. Predictive nomogram for early recurrence after pancreatectomy in resectable pancreatic cancer: Risk classification using preoperative clinicopathologic factors. HPB 2021, 23, S231. [Google Scholar] [CrossRef]
Figure 1. Feature importance bar plots of the features contributing most to overall model performance, measured by AUC and CA: tree model, (a) AUC and (b) CA; logistic regression model, (c) AUC and (d) CA.
Figure 2. Receiver operating characteristic (ROC) curves for predicting 1-year survival of patients with resected PDAC in the training set for the tree model (green) and logistic regression (orange). (a) Patients classified as survivors; (b) patients classified as non-survivors.
Figure 3. Machine learning model performance and confusion matrix. AUC, area under the receiver operating characteristic curve; CA, classification accuracy.
Figure 4. Tree and logistic regression models, based only on preoperative clinical parameters, for predicting 1-year survival after pancreaticoduodenectomy. (a) Tree Viewer plot; (b) nomogram based on the logistic regression model.
Table 1. Demographic, clinical, and biomarker features analyzed for pancreatic cancer patient survival.

| Numerical Features | Categorical Features |
|---|---|
| Age at the time of surgery | Sex |
| CA 19-9 at diagnosis (U/mL) | Pre-operative cholangitis (Yes/No) |
| CEA at diagnosis (ng/mL) | Pre-operative biliary drainage (Yes/No) |
| Total proteins at diagnosis (g/dL) | Neoadjuvant therapy (Yes/No) |
| Albumin at diagnosis (g/dL) | AJCC stage (8th edition) |
| Total bilirubin at diagnosis/Jaundice (mg/dL) | Histological grade |
| Lymphocyte count | ASA score |
| Neutrophil count | Type of surgery |
| Total ICU stay (days) | Venous vascular resection (Yes/No) |
| Image tumor size (mm) | Arterial vascular resection (Yes/No) |
| Weight (kg) | Pancreas consistency (Soft/Firm) |
| Height (cm) | Wirsung localization (Eccentric/Concentric) |
| Number of excised lymph nodes | Sealant (Yes/No) |
| Number of metastasized lymph nodes | Hemorrhagic complication (Yes/No) |
| Post-op hospitalization days | Respiratory infection (Yes/No) |
| Date of recurrence | Degree of gastric stasis (A, B, C) |
| Date of death | Surgical re-intervention (Yes/No) |
| CEA in EUS aspirate (ng/mL) | ICU readmission (Yes/No) |
| Amylase in EUS aspirate (U/L) | Re-hospitalization (Yes/No) |
| Lymphocyte to neutrophil ratio | Diagnosis |
| | Recurrence (Yes/No) |
| | Status (Deceased/Alive) |
| | Degree of the pancreatic fistula |
| | Clavien |
Table 2. Demographic and clinical characteristics of patients in the training set and holdout dataset. (SD, standard deviation; ASA, American Society of Anesthesiologists.)

| Features | Statistic | Training (n = 172) | Holdout Dataset or Validation Set (n = 33) | p Value |
|---|---|---|---|---|
| Age (years) | Median; SD | 66.38; 9.95 | 66.52; 10.4 | 0.9451 |
| Gender | Female (%) | 48 | 39 | 0.3525 |
| | Male (%) | 52 | 61 | |
| CA 19-9 (U/mL) | Median; SD | 985; 3770 | 2273; 7881 | 0.1718 |
| Jaundice | % | 80 | 90 | 0.1455 |
| Neutrophils (×10⁹/L), pre-operative | Median; SD | 4631; 2203 | 4514; 2082 | 0.7786 |
| Lymphocytes (×10⁹/L), pre-operative | Median; SD | 1889; 777.6 | 1540; 752.4 | 0.0187 |
| Lymphocytes/Neutrophils, pre-operative | Median; SD | 0.559; 0.682 | 0.379; 0.22 | 0.1338 |
| Neoadjuvant chemotherapy | Yes (%) | 11.6 | 9.2 | 0.6741 |
| | No (%) | 88.4 | 90.8 | |
| Nodule size (mm) | Median; SD | 30.83; 11.78 | 30.81; 14.36 | 0.9959 |
| ASA | I (%) | 8.2 | 0.0 | 0.0001 |
| | II (%) | 65.3 | 22.2 | |
| | III (%) | 25.5 | 77.8 | |
| | IV (%) | 1.0 | 0.0 | |
| Sealant (Epiploplasty) | % | 69.7 | 75.7 | 0.8753 |
| Target (Survivors or Non-survivors) | % | 71.5 | 72.7 | 0.8878 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Citation

Vigia, E.; Ramalhete, L.; Filipe, E.; Bicho, L.; Nobre, A.; Mira, P.; Macedo, M.; Aguiar, C.; Corado, S.; Chumbinho, B.; et al. Machine Learning-Based Model Helps to Decide which Patients May Benefit from Pancreatoduodenectomy. Onco 2023, 3, 175-188. https://0-doi-org.brum.beds.ac.uk/10.3390/onco3030013
