A Method to Quantitatively Compare Obfuscating Transformations
Keywords:
obfuscation, executable code, efficiency, resistance, similarity
Abstract
The paper addresses the quantitative comparison of the potency and resistance of obfuscating transformations of program code that are used in practice. A method is proposed that determines the potency and resistance of a transformation by calculating the "comprehensibility" of the obfuscated and deobfuscated versions of a program, respectively. Program comprehensibility is measured as the similarity of the program to an approximation of its "most comprehensible" version. Based on this method, a model for assessing potency and resistance was built. Its main elements are a set of obfuscating transformations under study, a similarity function, a method for approximating the most comprehensible version of the program, and a deobfuscator. To implement the model: 1) the obfuscating transformations provided by the Hikari obfuscator were chosen; 2) eight similarity functions were constructed by machine learning methods from static characteristics of programs in the CoreUtils, PolyBench and HashCat sets; 3) the smallest program version, found among the versions obtained with the optimization options of the GCC, Clang and AOCC compilers, was taken as the approximation of the most comprehensible version; 4) a program deobfuscation scheme based on the LLVM optimizing compiler was built and implemented. Potency and resistance estimates were obtained experimentally for sequences of one, two and three transformations. The results are consistent with independent potency and resistance evaluations obtained by other methods. In particular, the highest potency and resistance are shown by sequences that start with transformations of the control flow graph, while the lowest are generally shown by sequences that contain no such transformations.
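The relationship between the four model elements can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `toy_similarity`, `most_comprehensible` and `potency_and_resistance` are hypothetical. The similarity function is a simple stand-in built on `difflib` rather than one of the eight learned functions. The scoring rule (one minus the similarity to the reference version) is an assumption made only for illustration, since the abstract does not state the exact formula.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable, Tuple

# A similarity function maps two program images to a score in [0, 1].
Similarity = Callable[[bytes, bytes], float]


def toy_similarity(a: bytes, b: bytes) -> float:
    """Hypothetical stand-in for a learned similarity function."""
    return SequenceMatcher(None, a, b).ratio()


def most_comprehensible(versions: Iterable[bytes]) -> bytes:
    """Approximate the most comprehensible version by the smallest build."""
    return min(versions, key=len)


def potency_and_resistance(
    obfuscated: bytes,
    deobfuscated: bytes,
    reference: bytes,
    sim: Similarity = toy_similarity,
) -> Tuple[float, float]:
    """Assumed scoring rule: the less similar a version is to the
    reference, the higher the score. Potency uses the obfuscated
    version, resistance uses the deobfuscated one."""
    potency = 1.0 - sim(obfuscated, reference)
    resistance = 1.0 - sim(deobfuscated, reference)
    return potency, resistance


# Toy "programs" written as instruction mnemonics (illustrative data only).
builds = [b"mov add ret", b"mov add nop nop ret"]
reference = most_comprehensible(builds)
potency, resistance = potency_and_resistance(
    obfuscated=b"jmp x mov jmp y add jmp z ret",
    deobfuscated=b"mov add nop ret",
    reference=reference,
)
print(f"potency={potency:.3f} resistance={resistance:.3f}")
```

In this toy setting the deobfuscated version is closer to the reference than the obfuscated one, so its score is lower. Swapping in a different `sim` callable changes the scores without touching the rest of the sketch.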
Copyright (c) Петр Дмитриевич Борисов, Юрий Владимирович Косолапов

This work is licensed under a Creative Commons Attribution 4.0 International License.