Dependence between components in natural systems is a well studied phenomenon in the form of biological and social networks. The concept of community structure arises from the analysis of social networks and has successfully been applied to complex networks in other fields such as biology, physics and computing. We provide empirical evidence that dependence between statements in source code gives rise to community structure. This leads to the introduction of the concept of dependence communities in software and we provide evidence that they reflect the semantic concerns of a program.
Current definitions of sliced-based cohesion and coupling metrics are not defined for procedures which do not have clearly defined output variables and definitions of output variable vary from study-to-study. We solve these problems by introducing corresponding new, more efficient forms of slice-based metrics in terms of maximal slices. We show that there is a strong correlation between these new metrics and the old metrics computed using output variables.
We conduct an investigation into dependence clusters which are closely related to dependence communities. We undertake an empirical study using definitions of dependence clusters from previous studies and show that, while programs do contain large dependence clusters, over 75% of these are not 'true' dependence clusters.
We bring together the main elements of the thesis in a study of software quality, investigating their interrelated nature. We show that procedures that are members of multiple communities have a low cohesion, programs with higher coupling have larger dependence communities, programs with large dependence clusters also have large dependence communities and programs with high modularity have low coupling.
Dependence communities and maximal-slice-based metrics have a huge number of potential applications including program comprehension, maintenance, debugging, refactoring, testing and software protection.