1D and 2D Feature Extraction Based on AAC and DC Protein Descriptors for Classification of Acetylation in Lysine Proteins using Convolutional Neural Network
Abstract
Post-translational modification (PTM) is a biochemical alteration of an amino acid residue that plays a crucial role in protein activity, function, and the regulation of protein structure. Identifying PTMs is fundamental to understanding biological processes, designing therapeutic interventions for diseases, and developing pharmaceutical agents. Computational (in silico) approaches offer an efficient and cost-effective way to identify PTM sites quickly. Protein classification begins with extracting features from protein sequences and converting them into numerical features for use in classification algorithms. Feature extraction commonly relies on protein descriptors such as Amino Acid Composition (AAC) and Dipeptide Composition (DC). These descriptors, however, discard important information about the order of amino acids in the sequence. Both also produce a small number of 1-dimensional (1D) features, which may not be ideal for a Convolutional Neural Network (CNN) classifier. This study presents a novel approach that increases feature diversity through protein sequence segmentation, using adjacent and overlapping segment strategies. It also shows how the features can be arranged in 1D and 2D formats for processing by 1D CNN and 2D CNN classifiers. The results indicate that increasing the number of protein sequence segments in a 2D configuration can improve the accuracy of lysine acetylation classification. The highest accuracies achieved with AAC-based and DC-based feature extraction are 77.39% and 76.75%, respectively.
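To make the feature pipeline described in the abstract more concrete, the following is a minimal Python sketch of AAC and DC computed per segment and then arranged in 1D or 2D form. It is an illustration under stated assumptions, not the authors' implementation: the function names, the example sequence, the number of segments, and in particular the exact definition of "overlapping" segments (here, extra segments that straddle the boundaries between adjacent segments) are all hypothetical choices, since the abstract does not specify them.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def aac(seq):
    """Amino Acid Composition: fraction of each residue (20 values)."""
    n = len(seq)
    return [seq.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def dc(seq):
    """Dipeptide Composition: fraction of each adjacent residue pair (400 values)."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    n = len(pairs)
    return [pairs.count(d) / n if n else 0.0 for d in DIPEPTIDES]


def adjacent_segments(seq, k):
    """Split seq into k contiguous, non-overlapping pieces of near-equal length."""
    bounds = [i * len(seq) // k for i in range(k + 1)]
    return [seq[bounds[i]:bounds[i + 1]] for i in range(k)]


def overlapping_segments(seq, k):
    """Assumed variant: the k adjacent pieces plus k - 1 pieces that
    straddle each boundary (from the middle of one piece to the next)."""
    bounds = [i * len(seq) // k for i in range(k + 1)]
    mids = [(bounds[i] + bounds[i + 1]) // 2 for i in range(k)]
    straddling = [seq[mids[i]:mids[i + 1]] for i in range(k - 1)]
    return adjacent_segments(seq, k) + straddling


def features_1d(segments, descriptor):
    """Concatenate per-segment descriptors into one flat vector (1D CNN input)."""
    return [value for s in segments for value in descriptor(s)]


def features_2d(segments, descriptor):
    """Stack per-segment descriptors as rows of a matrix (2D CNN input)."""
    return [descriptor(s) for s in segments]


# Hypothetical 21-residue window; not taken from the study's dataset.
seq = "MKTAYIAKQRQISFVKSHFSR"
adj = adjacent_segments(seq, 3)
ovl = overlapping_segments(seq, 3)

flat = features_1d(adj, aac)    # one vector of 3 * 20 values
grid = features_2d(ovl, aac)    # 5 rows (segments) by 20 columns (residues)
print(len(flat), len(grid), len(grid[0]))
```

In this sketch, a whole-sequence AAC yields only 20 values, whereas segmenting first multiplies the feature count by the number of segments and keeps coarse positional information. Stacking the segments as rows gives a matrix over which a 2D convolution can slide, which is the general idea the abstract describes.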
Copyright (c) 2024 Mohammad Reza Faisal, Laila Adawiyah, Triando Hamonangan Saragih, Dwi Kartini, Rudy Herteno, Favorisen Rosyking Lumbanraja, Lilies Handayani, Siti Aisyah Solechah

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.