A Multi-Model Fusion and Risk Optimization Approach for Non-Invasive Prenatal Testing Based on Random Forest and K-means Clustering
DOI:
https://doi.org/10.54097/xtk12b55
Keywords:
Random Forest, K-means Clustering, Gradient Boosting Decision Tree.
Abstract
This paper examines the relationship between Y-chromosome concentration and multiple influencing factors in non-invasive prenatal testing (NIPT) and constructs a framework that integrates several models. First, because the variables are related in markedly nonlinear ways, a random forest model is used to assess feature importance and identify the key influencing factors, which also improves model robustness. Building on this, piecewise polynomial regression partitions the data into intervals and fits each interval locally, improving the overall fit. Next, the concentration threshold condition is reformulated as a risk minimization problem, from which an optimization model is constructed. K-means clustering is then used to jointly optimize the grouping and the testing time, reducing overall risk while keeping the grouping structure stable. To address class imbalance, SMOTE is applied to oversample the minority class, and a Gradient Boosting Decision Tree (GBDT) classifier is trained to identify abnormal samples. Overall, the method integrates random forest, piecewise regression, clustering, and ensemble learning models; it shows good generalization and stability and is of practical value for complex data modeling and risk control.
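The grouping and test-timing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, and the names `covariate` and `attain`, the cost weights `fail_cost` and `delay_cost`, and the per-sample "week at which the threshold is reached" formulation are all assumptions made for illustration. Samples are first grouped by a one-dimensional K-means on a single covariate; each group then receives the candidate test week that minimizes an assumed risk, defined as the cost of testing before the threshold is reached plus the cost of delaying the test.

```python
import random


def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm on scalars; returns (centroids, labels)."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: spread the initial centroids over the sorted data.
    centroids = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]

    def assign(cs):
        return [min(range(k), key=lambda j: abs(v - cs[j])) for v in values]

    for _ in range(iters):
        labels = assign(centroids)
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Keep the old centroid if a cluster happens to be empty.
            updated.append(sum(members) / len(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(centroids)


def best_week(attain_weeks, candidate_weeks, fail_cost=10.0, delay_cost=0.5):
    """Pick the week minimizing an assumed risk (hypothetical cost weights)."""

    def risk(week):
        # Share of the group that has not yet reached the threshold at `week`.
        miss = sum(1 for t in attain_weeks if t > week) / len(attain_weeks)
        return fail_cost * miss + delay_cost * week

    return min(candidate_weeks, key=risk)


# Synthetic data (assumption): a higher covariate delays threshold attainment.
rng = random.Random(0)
covariate = [rng.uniform(20.0, 45.0) for _ in range(300)]
attain = [10.0 + 0.25 * (x - 20.0) + rng.gauss(0.0, 1.0) for x in covariate]

k = 3
centroids, labels = kmeans_1d(covariate, k)
weeks = list(range(10, 26))

plan = {}
for j in range(k):
    group = [t for t, lab in zip(attain, labels) if lab == j]
    plan[j] = best_week(group, weeks)

for j in range(k):
    print(f"group {j}: centroid {centroids[j]:.1f}, recommended week {plan[j]}")
```

In this sketch the grouping and the timing are solved one after the other; a joint optimization of the kind the abstract describes would also compare the total risk across alternative groupings (for example, different values of k) before fixing the plan.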
License
Copyright (c) 2026 Highlights in Science, Engineering and Technology

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.







