Regression Algorithms in Predicting the SARS-CoV-2 Replicase Polyprotein 1ab Inhibitor: A Comparative Study

Daniel Febrian Sengkey; Angelina Masengi

doi:10.35882/jeeemi.v6i1.338

Daniel Febrian Sengkey Department of Electrical Engineering, Faculty of Engineering, Universitas Sam Ratulangi http://orcid.org/0000-0003-0021-8803
Angelina Stevany Regina Masengi Department of Pharmacology and Therapy, Faculty of Medicine, Universitas Sam Ratulangi, Jl. Kampus Unsrat, Bahu, Manado 95115, INDONESIA https://orcid.org/0000-0002-9760-0103


Keywords: SARS-CoV-2 replicase polyprotein 1ab, drug discovery, regression algorithms, IC50 prediction

Abstract

Drug discovery is a long and expensive process because of the many steps and trials it involves. Over the last decade, and with added urgency during the COVID-19 pandemic, advances in computational technology, including Machine Learning, have been used to assist the screening process. Classification has become one of the major Machine Learning approaches in drug discovery. However, classification relies on discretized labels, which may discard meaningful quantitative information. In this paper, we therefore compare various Machine Learning regression algorithms for predicting inhibitory bioactivity, specifically the IC₅₀ value, with the SARS-CoV-2 Replicase Polyprotein 1ab as the target. We downloaded 1,138 non-duplicated records from the ChEMBL database, engineered them into four dataset variants, and applied 42 regression algorithms to the prediction task. We found that using regression algorithms to predict bioactivity poses computational challenges: only a handful of algorithms, and only on one specific dataset variant, returned valid performance parameters upon testing. The three algorithms that yielded the highest counts of valid performance parameters are the Histogram Gradient Boosting Regressor (HGBR), the Light Gradient Boosting Machine Regressor (LGBR), and Random Forest Regression (RFR). Further statistical analyses show no significant difference between these three algorithms except in the time taken to train and test the model, where LGBR excels. These three algorithms should therefore be the first to be considered in studies of the same nature.
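
The concern about discretized labels can be illustrated with a minimal Python sketch. The 1000 nM activity cut-off and the three IC₅₀ values below are hypothetical, chosen only for illustration; they are not taken from the paper or its dataset. The only non-assumed part is the standard definition pIC₅₀ = -log₁₀(IC₅₀ in molar).

```python
import math

# Hypothetical cut-off (an assumption, not taken from the paper):
# compounds with IC50 <= 1000 nM are labelled "active".
ACTIVE_THRESHOLD_NM = 1000.0


def pic50(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in molar)."""
    return -math.log10(ic50_nm * 1e-9)


def discretize(ic50_nm: float) -> str:
    """Collapse a continuous IC50 value into a binary class label."""
    return "active" if ic50_nm <= ACTIVE_THRESHOLD_NM else "inactive"


# Made-up IC50 values in nM, for illustration only.
examples = {"A": 1.0, "B": 900.0, "C": 1100.0}

for name, value in examples.items():
    print(
        f"{name}: IC50={value:>7.1f} nM  "
        f"pIC50={pic50(value):.2f}  label={discretize(value)}"
    )
```

In this sketch, compounds A and B receive the same label although their potencies differ by almost three log units, while B and C receive different labels although their potencies are nearly identical. A regression target keeps that distinction, which is the motivation stated above for predicting IC₅₀ directly.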

