Determining the factors of tuberculosis: analysis of machine learning and neural network methods

DOI: 10.31673/2412-9070.2025.021793

Authors

  • Д. В. Невінський (Nevinskyi D. V.), Lviv Polytechnic National University
  • Д. І. Мартьянов (Martjanov D. I.), Lviv Polytechnic National University
  • О. А. Господарський (Hospodarskyy O. A.), Lviv Polytechnic National University
  • Я. І. Виклюк (Vyklyuk Y. I.), Lviv Polytechnic National University
  • І. О. Сем’янів (Semianiv I. O.), Bukovinian State Medical University, Chernivtsi

Abstract

Tuberculosis (TB) remains one of the most serious infectious diseases worldwide, particularly in India, where its high incidence places a significant burden on the healthcare system. This study analyzes the determinants of TB prevalence in India using machine learning (ML) and neural network (NN) methods. The objective is to identify the key factors influencing TB incidence and to develop accurate predictive models that support prevention and treatment strategies. The analysis is based on statistical data from 2019–2022 covering demographic characteristics, social factors, and medical indicators. Data processing included correlation analysis and oversampling with SMOGN to balance the sample. Modeling used linear regressions (LM, Ridge, Lasso), ML algorithms (Decision Tree, K-Nearest Neighbors, Random Forest), and a deep neural network. Linear models showed limited accuracy (R² Test up to 0.600), while Random Forest (R² Test = 0.832) and K-Nearest Neighbors (R² Test = 0.865) significantly outperformed them, owing to their ability to capture nonlinear relationships. The neural network reached a comparable level of accuracy (R² Test = 0.822, RMSE Test = 0.433), highlighting its effectiveness in detecting complex interdependencies. Key factors influencing TB incidence included population size (Population), gender ratio (Gender Ratio), the number of specialized centers (Nodal_DR_TB_Centres_Per_Population), and urban characteristics (City_Encoded). These findings underscore the potential of integrating ML and NN into medical research for TB forecasting and control, offering insights for developing personalized therapeutic approaches and improving public health outcomes.
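
The R² Test and RMSE Test figures quoted above are held-out evaluation metrics. The following is a minimal sketch, not the authors' code, of how such metrics are computed and of why a neighbor-based model can outscore a linear one when the relationship is nonlinear. The data are synthetic and purely illustrative, and the helper names (`r2_score`, `rmse`, `fit_line`, `fit_knn`) are hypothetical, written with the Python standard library only.

```python
import math

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns a predict function."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)

def fit_knn(xs, ys, k=3):
    """k-nearest-neighbors regressor: average the targets of the k closest training points."""
    def predict(x):
        nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x))[:k]
        return sum(y for _, y in nearest) / k
    return predict

# Synthetic data with a nonlinear (quadratic) relationship; NOT the study's dataset.
xs = [i / 10 for i in range(-30, 31)]
ys = [x ** 2 for x in xs]

# Hold out every fourth point as the test set.
train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % 4 != 0]
test = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % 4 == 0]
x_tr, y_tr = zip(*train)
x_te, y_te = zip(*test)

scores = {}
for name, model in {"linear": fit_line(x_tr, y_tr), "knn": fit_knn(x_tr, y_tr)}.items():
    preds = [model(x) for x in x_te]
    scores[name] = (r2_score(y_te, preds), rmse(y_te, preds))
    print(f"{name}: R2 test = {scores[name][0]:.3f}, RMSE test = {scores[name][1]:.3f}")
```

On this toy data the straight-line fit cannot follow the curve, so its test R² is far lower than that of the neighbor-based model. The numbers it prints have no relation to the values reported in the abstract.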

Keywords: machine learning; neural networks; disease prediction; oversampling; SMOGN; linear regression; Random Forest; K-Nearest Neighbors; determinants; integration.

Published

2025-07-21

Section

Articles