A Multi-Level Gene-Disease Based Feature Extraction And Classification Framework For Large Biomedical Document Sets


  • V. Shiva Narayana Reddy* & Dr Divya Midhunchakkaravarthy


Biomedical documents, document classification, gene-disease rules.


In current biomedical repositories, gene and disease pattern discovery plays a vital role in biomedical document analysis and ranking. Most biomedical databases have heterogeneous features with different levels of gene and disease patterns, so gene identification and the ranking of high-dimensional patterns across biomedical repositories are complex and difficult to process because of noise, uncertainty, and missing values. In traditional biomedical repositories, data classification algorithms classify documents using MeSH terms or user-specified keywords. These algorithms also use static methods to find relationships among gene sets. As a result, such models have difficulty finding related genes and their disease patterns across different biomedical repositories. This work proposes a hybrid cross-gene-based disease document classification model built on a machine learning framework. An optimized GloVe feature extraction method and an advanced classification model are proposed to find key feature sets in biomedical documents. Experimental results show that the feature-extraction-based gene-disease prediction framework achieves better optimization than state-of-the-art techniques on various biomedical disease documents.
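As a rough illustration of the general idea (embedding-based features followed by classification), the following minimal sketch averages word vectors into a document vector and assigns the nearest class centroid by cosine similarity. It is not the authors' optimized GloVe method or their classifier: the toy vectors in `TOY_VECTORS` are hypothetical stand-ins for pre-trained GloVe embeddings, the labels `oncology` and `metabolic` are invented for the example, and the nearest-centroid rule is a swapped-in baseline.

```python
from math import sqrt

# Hypothetical 3-dimensional vectors standing in for pre-trained GloVe
# embeddings; real GloVe vectors typically have 50 to 300 dimensions.
TOY_VECTORS = {
    "brca1": [0.90, 0.10, 0.00],
    "tumor": [0.80, 0.20, 0.10],
    "cancer": [0.85, 0.15, 0.05],
    "insulin": [0.10, 0.90, 0.00],
    "glucose": [0.05, 0.85, 0.10],
    "diabetes": [0.10, 0.95, 0.05],
}
DIM = 3


def embed(text):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    vecs = [TOY_VECTORS[t] for t in text.lower().split() if t in TOY_VECTORS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def fit_centroids(docs, labels):
    """Compute one mean document vector per class label."""
    grouped = {}
    for doc, label in zip(docs, labels):
        grouped.setdefault(label, []).append(embed(doc))
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def predict(text, centroids):
    """Return the label whose centroid is most similar to the document."""
    vec = embed(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Tiny invented training set, for illustration only.
train_docs = ["BRCA1 mutation tumor", "insulin glucose regulation"]
train_labels = ["oncology", "metabolic"]
centroids = fit_centroids(train_docs, train_labels)

print(predict("cancer tumor growth", centroids))
print(predict("diabetes glucose levels", centroids))
```

In practice the toy dictionary would be replaced by vectors loaded from a pre-trained embedding file, and the centroid rule by whichever classifier is under evaluation.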






How to Cite

V. Shiva Narayana Reddy* & Dr Divya Midhunchakkaravarthy. (2022). A Multi-Level Gene-Disease Based Feature Extraction And Classification Framework For Large Biomedical Document Sets. Journal of Optoelectronics Laser, 41(2), 11–24. Retrieved from http://gdzjg.org/index.php/JOL/article/view/49