Software similarity measurements using UML diagrams: A systematic literature review

Authors

Evi Triandini, Reza Fauzan, Daniel O. Siahaan, Siti Rochimah, I Gede Suardika, Devi Karolita

DOI:

https://doi.org/10.26594/register.v8i1.2248

Keywords:

software similarity, similarity measurement, UML diagram similarity, semantic similarity, structural similarity

Abstract

Every piece of software uses a model to derive its operational, auxiliary, and functional procedures. The Unified Modeling Language (UML) is a standard modeling language for specifying, documenting, and constructing a software product. Researchers have used several algorithms to measure similarities between UML artifacts. However, no literature study has yet examined measurements of UML diagram similarity. This paper presents the results of a systematic literature review of similarity measurements between the UML diagrams of different software products. The study identifies and reviews similarity measurements of UML artifacts; class, sequence, statechart, and use case diagrams are the UML diagrams widely used as research objects for measuring similarity. Measuring similarity addresses the problem domains of software reuse, similarity measurement, and clone detection. The instruments used to measure similarity are semantic similarity and structural similarity. The findings indicate opportunities for future research: calculating similarity for other UML diagrams, compiling calculation information for each diagram, adapting semantic and structural similarity calculation methods, determining the best weight for each item in a diagram, testing newly proposed methods, and building or finding good datasets for use as test material.

Author Biographies

Evi Triandini, Institut Teknologi dan Bisnis STIKOM Bali

Department of Information Systems

Reza Fauzan, Politeknik Negeri Banjarmasin

Department of Informatics Engineering

Daniel O. Siahaan, Institut Teknologi Sepuluh Nopember

Department of Informatics Engineering

Siti Rochimah, Institut Teknologi Sepuluh Nopember

Department of Informatics Engineering

I Gede Suardika, Institut Teknologi dan Bisnis STIKOM Bali

Department of Information Systems

Devi Karolita, Monash University

Department of Software Systems and Cybersecurity

Published

2021-05-17

How to Cite

E. Triandini, R. Fauzan, D. O. Siahaan, S. Rochimah, I. G. Suardika, and D. Karolita, “Software similarity measurements using UML diagrams: A systematic literature review,” regist. j. ilm. teknol. sist. inf., vol. 8, no. 1, pp. 10–23, May 2021.

Section

Article