Abstract
Tools and methods in the context of Model Driven Engineering (MDE) have to be evaluated and tested using appropriate models as test cases. Unfortunately, adequate test models are scarce in many application domains and have to be created artificially. Model generators have recently been proposed for this purpose. In principle, they generate test models by modifying a base model using a specified set of edit operations. The modification process should ensure that the resulting test models are as “realistic” as possible, i.e. the applied changes should resemble the evolution that one observes in real software systems at the abstraction level of models. Therefore, we have to (1) properly capture the evolution of real software models, (2) statistically model the evolution (changes) and (3) properly reproduce it in the generated test models. To this end, we reverse engineered all revisions of nine typical Java systems into their class diagrams (6,559 distinct models in total). We compared subsequent models using a state-of-the-art model differencing tool and computed the changes between them in terms of applied edit operations. To statistically model the changes, we investigated how well 60 promising distributions fit the observed frequencies of edit operations. Four of our candidate distributions modeled the changes with very good success rates. Since it was not known how to implement them, i.e. how to produce their random variates, we developed a practical implementation. The implemented distributions are then used to reproduce the real evolution of software systems in order to synthesize more realistic test models for MDE tools.
Notes
Our notion of a test model should not be confused with the notion of a test model in the context of model-based testing: there, a test model is an abstract specification of a test suite, the single test cases not necessarily being models. In our context, a test model is one test case for an MDE tool.
For instance, in state machines, createState(Name) creates a state with a given name.
To be precise: the discrete Pareto, Yule, Waring and beta-negative binomial distributions.
A set of generated test models can be found on the accompanying website of the paper at [47].
This pipeline is also applicable to other model types as long as its model type specific parts, i.e. the similarity-based configuration and the edit operations, are properly adapted.
Fig. 2: See the accompanying website of the paper at [47] for the full list of the tested distributions.
Some of these distributions are famous and have also been quite successful in other fields of research, e.g. the binomial, Poisson, gamma and chi-squared distributions.
The Descending Factorial is defined similarly.
There is also another symbol for the ascending factorial that is used in the related literature. It is called the Pochhammer Symbol and is usually denoted by \((x)_{n}\). It is also used for the descending factorial in some cases, so caution is advised.
The Riemann Zeta function is a special case of the Hurwitz Zeta Function, and the latter is itself a special case of the Lerch Transcendent Function.
Also referred to as the Zipf distribution, the Riemann Zeta distribution or the Zeta distribution.
\(\mathbb {R}\) indicates the set of all real numbers.
The General Lerch distribution is defined based on the Lerch Transcendent function. The Lerch Transcendent function gives the Riemann Zeta function as its special case.
In the literature also referred to as the Generalized Waring distribution. When its support is shifted, it is referred to as the beta-Pascal distribution.
60 distributions, 9 projects, 5 low-level operations, 15 model element types.
60 distributions, 9 projects, 188 high-level operations.
The probability plot depicts the cumulative distribution functions (ranging from 0 to 1) of two data sets or distributions against each other in order to visually assess their closeness.
CDF: Cumulative Distribution Function.
To be more precise, 2 out of 9,468 fittings, or almost 0.02 % of all possible fittings. The number 9,468 results from fitting 4 distributions on 9 projects over (75 low-level + 188 high-level) operations, i.e. \(9{,}468 = 4 \times 9 \times (75+188)\).
Actually based on the shifted data (see Sect. 5.1.1).
See [36] for a short summary of where else the Power Law is also observed.
The power of a test is the probability of rejecting the null hypothesis when it is false.
For detailed information on how other aspects of the modification process are controlled by the Stochastic Controller, please refer to [39].
In other words, we look for the smallest \(k\) such that \(u \le F ( x_k)\).
\(\rho =s-1\) is the parameter of the discrete Pareto distribution [Eq. (3)].
\(\lfloor x \rfloor\) is the floor function of \(x\), which is by definition the largest integer not greater than \(x\).
For the interpretation of the negative binomial distribution, please see Sect. 4.1.
\(\left( {\begin{array}{c}n\\ k\end{array}}\right) =\frac{n!}{k! ( n-k )!}\).
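The note on finding the smallest \(k\) such that \(u \le F(x_k)\) describes the inversion method for discrete random variates. The following is a minimal sketch of that idea, not the authors' implementation: the function names, the `max_steps` safety cap and the geometric distribution used as the example are all assumptions made for illustration (the geometric distribution is a simple stand-in and is not one of the four distributions discussed in the paper).

```python
import random
from typing import Callable, Optional


def sample_by_inversion(pmf: Callable[[int], float],
                        start: int = 0,
                        u: Optional[float] = None,
                        rng: Optional[random.Random] = None,
                        max_steps: int = 1_000_000) -> int:
    """Return the smallest k >= start with u <= F(k), where F is the CDF.

    pmf       -- probability mass function on the integers start, start+1, ...
    u         -- optional fixed uniform value (useful for testing)
    max_steps -- hypothetical safety cap for very heavy-tailed distributions
    """
    if u is None:
        rng = rng or random.Random()
        u = rng.random()          # u is uniform on [0, 1)
    k = start
    cdf = pmf(k)                  # F(start)
    steps = 0
    # Sequential search: accumulate probabilities until the CDF reaches u.
    while u > cdf and steps < max_steps:
        k += 1
        cdf += pmf(k)
        steps += 1
    return k


def geometric_pmf(k: int, p: float = 0.5) -> float:
    """Illustrative pmf on {0, 1, 2, ...}: P(X = k) = p * (1 - p)**k."""
    return p * (1.0 - p) ** k


if __name__ == "__main__":
    rng = random.Random(42)
    draws = [sample_by_inversion(geometric_pmf, rng=rng) for _ in range(10)]
    print(draws)
```

The sequential search needs on the order of \(k\) steps per variate, so for long-tailed distributions a closed-form inverse or a table-based lookup would likely be preferable where one is available.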
References
Adamic LA, Huberman BA (2002) Zipf’s law and the internet. Glottometrics 3:143–150. http://www.hpl.hp.com/research/idl/papers/ranking/adamicglottometrics.pdf
Ahrens JH, Dieter U (1974) Computer methods for sampling from gamma, beta, poisson and bionomial distributions. Computing 12(3):223–246
Altmanninger K, Seidl M, Wimmer M (2009) A survey on model versioning approaches. Int J Web Inf Syst 5(3):271–304
Banks J (ed) (1998) Handbook of simulation: principles, methodology, advances, application and practice. Wiley, New York
Banks J, Carson JS, Nelson BL, Nicol DM (2010) Discrete-event systems simulation, 5th edn. Pearson
Baxter G, Frean M, Noble J, Rickerby M, Smith H, Visser M, Melton H, Tempero E (2006) Understanding the shape of java software. SIGPLAN Not. 41
Bennett BS (1995) Simulation fundamentals, 1st edn. Prentice Hall, New Jersey
Brambilla M, Cabot J, Wimmer M (2012) Model-driven software engineering in practice. Synthesis lectures on software engineering. Morgan & Claypool Publishers
Cheng RCH (1977) The generation of gamma variables with non-integral shape parameter. Applied statistics, pp 71–75
Cheng RCH (1978) Generating beta variates with nonintegral shape parameters. Commun ACM 21(4):317–322
Concas G, Marchesi M, Pinna S, Serra N (2007) Power-laws in a large object-oriented software system. IEEE Trans Softw Eng 33
Devroye L (1986) Non-uniform random variate generation. Springer, Berlin
Embrechts P, Hofert M (2010) A note on generalized inverses. ETH Zurich (preprint)
Erdelyi A, Magnus W, Oberhettinger F, Tricomi FG (1955) Higher transcendental functions, vol 1. McGraw-Hill, New York
Fishman GS (2001) Discrete-event simulation, modeling, programming and analysis. Springer, New York
Forbes C, Evans M, Hastings N, Peacock B (2011) Statistical distributions, 4th edn. Wiley, New York
Fowler M, Beck K, Brant J, Opdyke W, Roberts D (1999) Refactoring: improving the design of existing code. Addison-Wesley Professional
Gradshteyn IS, Ryzhik IM (2007) Table of integrals, series and products, 7th edn. Academic Press, New York
Ichii M, Matsushita M, Inoue K (2008) An exploration of power-law in use-relation of java software systems. In: 19th Australian conference on software engineering ASWEC
Ilijašic L, Saitta L (2010) Long-tailed distributions in grid complex network. In: Proceedings of 2nd workshop grids meets autonomic computing GMAC. ACM, USA
Irwin JO (1975) The generalized waring distribution. Part I. J R Stat Soc Ser A (Gen) 138(1):18–31
Johnson NL, Kotz S, Balakrishnan N (1994) Continuous univariate distributions, vols 1, 2, 2nd edn. Wiley, New York
Johnson NL, Kotz S, Balakrishnan N (1997) Discrete multivariate distributions. Wiley, New York
Johnson NL, Kotz S, Kemp AW (1992) Univariate discrete distributions, 2nd edn. Wiley Interscience, New York
Johnson NL, Kotz S, Kemp AW (2005) Univariate discrete distributions, 3rd edn. Wiley Interscience, New York
Kehrer T, Kelter U, Pietsch P, Schmidt M (2012) Adaptability of model comparison tools. In: Proceedings of the 27th international conference on automated software engineering. ASE, USA
Kehrer T, Kelter U, Taentzer G (2011) A rule-based approach to the semantic lifting of model differences in the context of model versioning. In: Proceedings of the 26th IEEE/ACM international conference on automated software engineering. ASE, USA
Kleppe AG, Warmer JB, Bast W (2003) MDA explained, the model driven architecture: practice and promise. Addison-Wesley Professional
Kolovos DS, Ruscio DD, Paige RF, Pierantonio A (2009) Different models for model matching: an analysis of approaches to support model differencing. In: Proceedings of ICSE workshop comparison and versioning of software models. CVSM, USA
Krishnamoorthy K (2006) Handbook of statistical distributions with applications. Chapman & Hall/CRC
Lanza M, Marinescu R (2006) Object-oriented metrics in practice—using software metrics to characterize, evaluate, and improve the design of object-oriented systems. Springer, Berlin
Law AM (2007) Simulation modeling and analysis, 4th edn. McGraw-Hill, New York
L’Ecuyer P (1994) Uniform random number generation. Ann Oper Res 53(1):77–120
Lemeshko BY, Lemeshko S, Postovalov S (2007) The power of goodness of fit tests for close alternatives. Meas Tech 50(2):132–141
Mitzenmacher M (2004) A brief history of generative models for power law and lognormal distributions. Internet Math 1:226–251
Newman MEJ (2005) Power laws, pareto distributions and Zipf’s law. Contemp Phys 46:323–351
Olver FWJ, Lozier DW, Boisvert RF, Clark CW (2010) NIST handbook of mathematical functions. NIST and Cambridge University Press
Pietsch P, Shariat Yazdi H, Kelter U (2011) Generating realistic test models for model processing tools. In: 26th international conference on automated software engineering. ASE, USA
Pietsch P, Shariat Yazdi H, Kelter U (2012) Controlled generation of models with defined properties. In: Software engineering SE2012, Berlin
Pietsch P, Shariat Yazdi H, Kelter U, Kehrer T (2012) Assessing the quality of model differencing engines. In: Comparison and versioning of software models (CVSM 2012)
R-forge distributions Core Team (2008–2009) A guide on probability distributions
Ripley BD (1987) Stochastic simulation. Wiley, New York
Ross SM (2006) Simulation, 4th edn. Academic Press, New York
Saucier R (2000) Computer generation of statistical distributions. Army Research Laboratory
Shariat Yazdi H, Pietsch P (2011) The SiDiff model generator. http://pi.informatik.uni-siegen.de/qudimo/smg/
Shariat Yazdi H, Pietsch P, Kehrer T, Kelter U (2013) Statistical analysis of changes for synthesizing realistic test models. In: Multi-conference software engineering 2013 (SE2013). Gesellschaft Informatik
Shariat Yazdi H, Pietsch P, Kehrer T, Kelter U (2014) Accompanied material and data for the Computer Science—Research and Development (CSRD) journal paper (Springer). http://pi.informatik.uni-siegen.de/qudimo/smg/CSRD2014
Steele M, Chaseling J (2006) Powers of discrete goodness-of-fit test statistics for a uniform null against a selection of alternative distributions. Commun Stat Simul Comput 35(4):1067–1075
Steele M, Chaseling J, Hurst C (2007) Comparing the simulated power of discrete goodness-of-fit tests for small sample sizes. In: 2nd international conference on Asian simulation and modeling, towards sustainable livelihood and environment
Stephan M, Cordy JR (2012) A survey of methods and applications of model comparison, Tech. rep. Queen’s University, Ontario
Stephan M, Cordy JR (2013) A survey of model comparison approaches and applications. International conference on model-driven engineering and software development MODELSWARD (to appear)
Vasa R (2010) Growth and change dynamics in open source software systems. Ph.D. thesis, Swinburne University of Technology
Vasa R, Lumpe M, Jones A (2010) Helix—software evolution data set. http://www.ict.swin.edu.au/research/projects/helix
Vasa R, Schneider JG, Nierstrasz O (2007) The inevitable stability of software change. In: IEEE international conference on software maintenance, ICSM
Vasa R, Schneider JG, Nierstrasz O, Woodward C (2007) On the resilience of classes to change. ECEASST 8
Walck C (2007) Handbook on statistical distributions for experimentalists, Tech. rep. Particle Physics Group, Fysikum, University of Stockholm
Wenzel S (2011) Unique identification of elements in evolving models: towards fine-grained traceability in model-driven engineering. Ph.D. thesis, Uni. Siegen
Wheeldon R, Counsell S (2003) Power law distributions in class relationships. In: Proceedings of 3rd IEEE international workshop source code analysis and manipulation. IEEE
Wimmer G, Altmann G (1999) Thesaurus of univariate discrete probability distributions, 1st edn. Stamm
Zörnig P, Altmann G (1995) Unified representation of Zipf distributions. Comput Stat Data Anal 19(4):461–473
Acknowledgments
The work of the first author, Hamed Shariat Yazdi, is supported by the German Research Foundation (DFG) under Grant KE 499/5-1. The authors would like to thank the anonymous reviewers for their constructive suggestions.
Author information
Authors and Affiliations
Software Engineering Group, Department of Computer Sciences, University of Siegen, 57068 Siegen, Germany
Hamed Shariat Yazdi, Pit Pietsch, Timo Kehrer & Udo Kelter
- Hamed Shariat Yazdi
- Pit Pietsch
- Timo Kehrer
- Udo Kelter
Corresponding author
Correspondence to Hamed Shariat Yazdi.
Additional information
This paper is the revised and extended version of [46] presented at the Software Engineering Conference 2013 (SE2013) in Aachen, Germany.
About this article
Cite this article
Shariat Yazdi, H., Pietsch, P., Kehrer, T. et al. Synthesizing realistic test models. Comput Sci Res Dev 30, 231–253 (2015). https://doi.org/10.1007/s00450-014-0255-y