Phoneme-by-Phoneme Speech Recognition as a Classification of Series on a Set of Sequences of Elements of Complex Objects Using an Improved Trie-Tree
Keywords:
trie-tree, sets of sequences, classification of series on a set of sequences of elements of complex objects, dynamic programming, phoneme-by-phoneme recognition of speech commands
Abstract
Sequences, including vector sequences, arise in any subject domain. A sequence of scalar values or vectors (a series) can be produced by a higher-order sequence, for example a sequence of states or of elements of complex objects. This paper is devoted to applying an improved trie-tree to the classification of series on a set of sequences of elements of complex objects using dynamic programming. The application areas of dynamic programming are reviewed. It is shown that dynamic programming suits the multi-step computation of additive (multiplicative) similarity/difference measures, and it is argued that the improved trie-tree is applicable to classifying a series on a set of sequences of elements of complex objects with such measures. Hierarchical representations of sets of sequences are analyzed, and the advantages of the improved trie-tree over traditional representations of other highly branching trees are described. A formal description of the improved trie-tree is developed. An explanation is given for previously obtained data showing that adding and deleting sequences in the improved trie-tree is significantly faster than in an array with an index table (24 and 380 times faster, respectively). The problem of phoneme-by-phoneme recognition of speech commands is formulated as a problem of classifying series on a set of sequences of elements of complex objects, and a method for classifying a series on such a set using the improved trie-tree is developed. The method is studied using the example of phoneme-by-phoneme recognition with a hierarchical representation of the dictionary of speech command classes.
In this method, speech commands are recognized by traversing the improved trie-tree, which stores the set of speech command transcriptions, that is, sequences of transcription symbols denoting classes of sounds. Numerical studies have shown that classifying a series on a set of sequences of elements of complex objects increases the frequency of correct classification compared with classifying a series on a set of series, and that using the improved trie-tree reduces the time spent on classification.
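The idea of classifying a series by traversing a trie of transcriptions can be illustrated with a minimal sketch. It is not the paper's improved trie-tree or its exact similarity measure: the names `TrieNode`, `insert`, `local_cost`, and `classify` are hypothetical, the trie is a plain dictionary-based one, the series is assumed to be a list of per-frame sound-class labels, and the additive difference measure is assumed to be a DTW-style sum of 0/1 mismatch costs. The point it shows is that one dynamic programming column is computed per trie node, so transcriptions with a common prefix share that part of the computation.

```python
import math


class TrieNode:
    """Plain trie node (an assumption, not the paper's improved structure)."""

    def __init__(self):
        self.children = {}   # transcription symbol -> TrieNode
        self.command = None  # command label if a transcription ends here


def insert(root, transcription, command):
    """Store one transcription (a sequence of symbols) in the trie."""
    node = root
    for symbol in transcription:
        node = node.children.setdefault(symbol, TrieNode())
    node.command = command


def local_cost(frame_label, symbol):
    """Assumed local difference measure: 0 on a match, 1 otherwise."""
    return 0.0 if frame_label == symbol else 1.0


def classify(root, frames):
    """Return (cost, command) with the smallest additive alignment cost.

    col[t] is the minimal cost of aligning the first t frames with the
    symbol prefix that leads to the current node, where every symbol
    covers one or more consecutive frames.
    """
    n = len(frames)
    best = (math.inf, None)
    root_col = [0.0] + [math.inf] * n
    stack = [(root, root_col)]
    while stack:
        node, col = stack.pop()
        # A complete transcription must account for all n frames.
        if node.command is not None and col[n] < best[0]:
            best = (col[n], node.command)
        for symbol, child in node.children.items():
            child_col = [math.inf] * (n + 1)
            for t in range(1, n + 1):
                # Frame t either continues this symbol or starts it.
                prev = min(child_col[t - 1], col[t - 1])
                if prev < math.inf:
                    child_col[t] = prev + local_cost(frames[t - 1], symbol)
            stack.append((child, child_col))
    return best


# Toy dictionary of commands; letters stand in for transcription symbols.
dictionary = TrieNode()
insert(dictionary, "stop", "stop")
insert(dictionary, "start", "start")
insert(dictionary, "go", "go")

# One frame ("a") disagrees with the best transcription.
print(classify(dictionary, list("ssttaoopp")))
```

In this toy dictionary the columns for the shared prefix "st" are computed once and reused for both "stop" and "start". With a real recognizer the 0/1 cost would be replaced by a measure of how well a signal frame fits a sound class.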
Copyright (c) Галина Владимировна Дорохина

This work is licensed under a Creative Commons Attribution 4.0 International License.