Enhanced Machine Learning Framework for Autonomous Depression Detection Using ModWave Cepstral Fusion and Stochastic Embedding
Keywords:
depression detection, machine learning, ModWave Cepstral Fusion, background noise, XGBoost classifier, DAIC-WOZ dataset, autonomous detection system, accuracy
Abstract
Depression is a prevalent mental illness whose complexity calls for autonomous detection systems. Existing machine learning techniques are sensitive to background noise, adapt slowly, and struggle with imbalanced data. To address these limitations, this study proposes a novel ModWave Cepstral Fusion and Stochastic Embedding Framework for depression prediction. First, the Gain Modulated Wavelet Technique removes background noise and normalises the audio signals. Difficulties with generalisation, which result in a lack of interpretability, hinder the extraction of relevant characteristics from speech. To address these issues, Auto Cepstral Fusion extracts relevant features from speech, capturing temporal and spectral characteristics caused by background voice. Feature selection is then imperative, because selecting irrelevant features can result in overfitting, the curse of dimensionality, and reduced robustness to noise. Hence, the Principal Stochastic Embedding technique handles the high-dimensional data, minimising both the influence of noise and the dimensionality. Finally, the XGBoost classifier differentiates between depressed and non-depressed individuals. Evaluated on the DAIC-WOZ dataset from USC, the proposed method achieves an accuracy of 97.02%, precision of 97.02%, recall of 97.02%, F1-score of 97.02%, RMSE of 2.00, and MAE of 0.9, making it a promising tool for autonomous depression detection.
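For readers unfamiliar with the reported measures, the following is a minimal sketch of how accuracy, precision, recall, F1-score, RMSE, and MAE are conventionally computed. It is not the authors' evaluation code. The function names, the toy labels, and the use of numeric severity scores for RMSE and MAE are assumptions made for illustration; the abstract does not state which quantity the error measures were computed on.

```python
import math


def classification_metrics(y_true, y_pred, positive=1):
    """Standard binary metrics from true and predicted labels (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def regression_errors(scores_true, scores_pred):
    """RMSE and MAE between two equal-length sequences of numeric scores."""
    errors = [p - t for t, p in zip(scores_true, scores_pred)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    return rmse, mae


# Toy data, invented for illustration only (1 = depressed, 0 = non-depressed).
labels_true = [1, 1, 1, 0, 0, 0, 0, 1]
labels_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(labels_true, labels_pred)

# Assumed numeric severity scores, also invented for illustration.
rmse, mae = regression_errors([10, 4, 15], [12, 4, 14])
```

On a perfectly balanced confusion matrix (equal numbers of false positives and false negatives), precision, recall, and F1 coincide, which is one way identical values for several metrics can arise.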
Copyright (c) Jithin Jacob

This work is licensed under a Creative Commons Attribution 4.0 International License.