Conducting Quantitative Software Engineering Studies with Alitheia Core
In our work, we examine the benefits of conducting large-scale software engineering research with Alitheia Core.
In our work, we used Alitheia Core revision 88a6fcc6d15aac911aba710ea8a30e2a9a166443. (Note that after this revision the Alitheia Core data schema changed, so the datasets we provide may not work with later revisions.)
We processed two data sets:
- A four-project dataset, which includes both the raw data and the Alitheia Core metadata produced by running the first updater stage for the four projects we describe in our work (JConvert, Gnome-VFS, Evolution and FreeBSD).
- The dataset from my PhD work. This dataset was used to conduct the case study.
Using the datasets
Both datasets are essentially MySQL dumps. To use them with Alitheia Core, do the following (the instructions assume a Unix-like OS):
- Install Alitheia Core
- Load either dump into the MySQL database. This can be done with the mysql client, as shown in the dump-loading command after this list. Note that the raw data for the four-project dataset must also be placed (or linked) at the location where your Alitheia Core installation is configured to look for raw project data.
- Alitheia Core uses the machine's name to register it as a node in a cluster installation. Projects are always assigned to a cluster node through an entry in the CLUSTERNODE_PROJECT table. This means that by default Alitheia Core will not be able to use the imported data, as the projects are assigned to another (our) machine. To fix this, start Alitheia Core and open a connection to MySQL, then run an update along the lines of the cluster-node snippet after this list.
- Alitheia Core should now be able to process the data. To test the installation, start Alitheia Core, install a plug-in from the plug-ins page, go to the projects page, select a project and click Synchronize in the plug-in dialog. Check whether the number of pending jobs is decreasing and whether the failed jobs count is 0, or at least very low.
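A minimal way to load one of the dumps, assuming a database and user both named alitheia and a dump file named fourprojects.sql (all three names are placeholders for the values in your own setup), is:

```bash
# Create the database if it does not exist yet, then import the dump.
# Database name, user and dump file name are assumptions; use the values
# from your own Alitheia Core configuration and dataset download.
mysql -u root -p -e "CREATE DATABASE IF NOT EXISTS alitheia;"
mysql -u alitheia -p alitheia < fourprojects.sql
```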
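The exact fix depends on the schema revision; a sketch, assuming the cluster node table is called CLUSTERNODE and stores the node's hostname in a NAME column (both assumptions), is:

```bash
# Hypothetical: point the imported cluster node at this machine, so the
# projects referenced from CLUSTERNODE_PROJECT become visible locally.
# Table and column names are assumptions; check the actual schema first.
mysql -u alitheia -p alitheia -e \
  "UPDATE CLUSTERNODE SET NAME = '$(hostname)';"
```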
Replicating the case study
Instructions on how to replicate the experiments described in the paper using the provided datasets.
To measure the performance of Alitheia Core, we used the raw data from the first dataset. To repeat the measurements in your environment, simply add each project and start all updaters at once. For more precise time keeping, you can enable the eu.sqooss.log.perf parameter (in the top-level pom.xml file) and recompile Alitheia Core; detailed per-job performance measurements can then be found in the log output.
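If the property can be overridden from the Maven command line (an assumption; otherwise edit the value directly in the top-level pom.xml), the rebuild could look like this:

```bash
# Hypothetical: rebuild Alitheia Core with per-job performance logging enabled.
# Whether -D actually overrides eu.sqooss.log.perf depends on how the pom
# consumes the property; if it does not, set it to true in pom.xml instead.
mvn -Deu.sqooss.log.perf=true clean install
```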
Development Teams and Maintainability
The second case study was performed on projects whose primary language (as can be determined by counting the number of files in their latest version) is C (listed in this file). We added each project to the Alitheia Core system, assigned it manually to the most appropriate node in our cluster and started the source code metadata updater. After the project data were imported, we ran the required metrics. At the time the experiment took place, Alitheia Core could not handle metric dependencies, so the correct order of metric execution was preserved by manually initiating metric runs. Later versions, such as the one linked above, automatically run metrics and their dependencies in the correct order.
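As an illustration of the file-counting criterion mentioned above, a query along the following lines could be used; all table and column names are assumptions and must be adapted to the actual Alitheia Core schema.

```bash
# Hypothetical sketch: count files per extension for a project, as a proxy for
# its primary language. Schema names are assumed, and the restriction to the
# files present in the latest version is omitted for brevity.
mysql -u alitheia -p alitheia -e "
  SELECT SUBSTRING_INDEX(pf.FILE_NAME, '.', -1) AS ext, COUNT(*) AS files
  FROM PROJECT_FILE pf
  JOIN PROJECT_VERSION pv ON pv.PROJECT_VERSION_ID = pf.PROJECT_VERSION_ID
  JOIN STORED_PROJECT sp  ON sp.PROJECT_ID = pv.STORED_PROJECT_ID
  WHERE sp.PROJECT_NAME = 'gnome-vfs'
  GROUP BY ext
  ORDER BY files DESC;"
```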
To do the correlation analysis required to examine the two hypotheses, we obtained results from both the maintainability index and the developer metrics plug-ins, for each type of entity we were interested in (specifically, project versions and modules). The query we used to obtain the results at the module level for project Gnome-VFS is of the following form:
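The sketch below is only an approximation of such a query: the table names (PROJECT_FILE_MEASUREMENT, METRIC, PROJECT_FILE, PROJECT_VERSION, STORED_PROJECT) and the metric mnemonics are assumptions and have to be adapted to the schema and plug-ins of the Alitheia Core revision in use.

```bash
# Hypothetical sketch: pair maintainability-index and developer-metric results
# per module (directory) of Gnome-VFS. All table, column and mnemonic names
# are assumptions.
mysql -u alitheia -p alitheia -e "
  SELECT pf.FILE_NAME AS module,
         MAX(CASE WHEN m.MNEMONIC = 'MI'       THEN pfm.RESULT END) AS maintainability,
         MAX(CASE WHEN m.MNEMONIC = 'TEAMSIZE' THEN pfm.RESULT END) AS team_size
  FROM PROJECT_FILE_MEASUREMENT pfm
  JOIN METRIC m           ON m.METRIC_ID = pfm.METRIC_ID
  JOIN PROJECT_FILE pf    ON pf.PROJECT_FILE_ID = pfm.PROJECT_FILE_ID
  JOIN PROJECT_VERSION pv ON pv.PROJECT_VERSION_ID = pf.PROJECT_VERSION_ID
  JOIN STORED_PROJECT sp  ON sp.PROJECT_ID = pv.STORED_PROJECT_ID
  WHERE sp.PROJECT_NAME = 'gnome-vfs'
    AND pf.IS_DIRECTORY = TRUE
  GROUP BY pf.FILE_NAME;"
```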
The query to obtain the results at the project level for the same project, using a 3 month time window to determine team size (the team comprises the developers that have committed at least once within this window), is of the following form:
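Again this is only a sketch with assumed table and column names; the idea is to count, for each project version, the distinct developers that committed during the 3 months preceding it and place that next to the version-level maintainability result.

```bash
# Hypothetical sketch: trailing 3-month team size per version, next to the
# version-level maintainability measurement. Names are assumptions, and the
# TIMESTAMP column is assumed to hold seconds since the epoch (adjust the
# window arithmetic if it holds milliseconds).
mysql -u alitheia -p alitheia -e "
  SELECT pv.PROJECT_VERSION_ID AS version,
         (SELECT COUNT(DISTINCT pv2.COMMITTER_ID)
            FROM PROJECT_VERSION pv2
           WHERE pv2.STORED_PROJECT_ID = pv.STORED_PROJECT_ID
             AND pv2.TIMESTAMP BETWEEN pv.TIMESTAMP - (90 * 86400) AND pv.TIMESTAMP
         ) AS team_size,
         pvm.RESULT AS maintainability
  FROM PROJECT_VERSION pv
  JOIN PROJECT_VERSION_MEASUREMENT pvm ON pvm.PROJECT_VERSION_ID = pv.PROJECT_VERSION_ID
  JOIN METRIC m          ON m.METRIC_ID = pvm.METRIC_ID AND m.MNEMONIC = 'MI'
  JOIN STORED_PROJECT sp ON sp.PROJECT_ID = pv.STORED_PROJECT_ID
  WHERE sp.PROJECT_NAME = 'gnome-vfs';"
```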
Results at the project and module level for all projects can be obtained by means of simple shell scripts that execute the two queries for all projects in the dataset. Importing the data into R and performing the correlation analysis is then a matter of a short script along the lines of the sketch below. As the dataset to be analyzed is very big, R will need a machine with at least 4 gigabytes of memory to process it.
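A sketch of the R step, assuming the query output has been saved as a tab-separated file named results.tsv with columns team_size and maintainability (the file name, the column names and the choice of a rank correlation are all assumptions, not necessarily what the paper used):

```bash
# Hypothetical sketch: load exported query results and correlate team size
# with maintainability. File and column names are assumptions.
R --vanilla --quiet <<'EOF'
data <- read.delim("results.tsv")
cor.test(data$team_size, data$maintainability, method = "spearman")
EOF
```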